Assembly Tutorial - Syscalls via ARM

Playing around with an Amiga emulator lately has inspired me to write a few asm programs. This series will cover several popular architectures and platforms, namely DOS, *nix, and Amiga running on x86, 68k, and ARM.

To start off, you should have a VM running Linux, as the code below will be cross-compiled via GCC, and we’ll be using AT&T syntax where appropriate.

Nearly all operating systems provide a table of system calls: routines supplied by the kernel and its supporting libraries that a program can invoke at any time. To call one, you usually store the parameters in a few agreed-upon registers, load the number of the call in that table into another register, and then trigger a software interrupt that hands control to the kernel.
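To make the mechanism concrete, here is a minimal C sketch, assuming a Linux host with glibc. The helper name raw_write is made up for this example and is not part of any library; it goes through the generic syscall() wrapper, which loads the call number and arguments into the registers the kernel expects and then issues the trap instruction.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write: the table index of write() on this architecture */
#include <unistd.h>        /* syscall(): generic entry point into the kernel */

/* Hypothetical helper (illustration only): write len bytes of buf to fd
 * through the raw system call interface instead of libc's write(). */
long raw_write(int fd, const void *buf, size_t len)
{
    /* Returns the number of bytes written, or -1 with errno set on failure. */
    return syscall(SYS_write, fd, buf, len);
}
```

Calling raw_write(1, "hi\n", 3) from main() should print hi on stdout. The value behind SYS_write depends on the architecture the program is built for, which is why the header is used rather than a hard-coded number.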

Below is a ‘hello world’ example that shows how to call write on stdout and exit with a status code of 0 on ARM-based processors running Linux, as typically found on Android devices and the Raspberry Pi. The table indices are slightly different under Unix, so be sure to review the documentation when porting between the two.

A little about the ARM processor: it uses a Reduced Instruction Set (RISC) design and gives you 16 registers to work with. Thirteen of them (r0 through r12) are general-purpose registers (i.e. storage areas within the CPU), and the remaining three serve as the stack pointer (r13, sp), the link register (r14, lr), and the program counter (r15, pc).

A program file is typically broken up into several sections, namely a text section that holds our program code and a data section that holds any stored data the program uses, such as strings, pre-initialized integers, structures, and other data types. Data lives in memory just like code, but for practical reasons that we do not need to care about yet, it is usually kept together in its own section. The .data directive tells the assembler to emit what follows into the data section, and the .text directive does the same for code. So we will put our message after a .data directive and our code after a .text directive.
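As a rough illustration of the same split seen from the C side, the sketch below marks where a typical GCC/Linux toolchain tends to place each kind of object. The names (greeting, counter, scratch, bump) are made up for this example, and the placement noted in the comments is the usual default rather than a guarantee.

```c
/* Read-only string constant: typically placed in .rodata */
const char greeting[] = "Hello, World.\n";

/* Pre-initialized integer: typically placed in .data */
int counter = 41;

/* Zero-initialized buffer: typically placed in .bss (no bytes stored in the file) */
int scratch[16];

/* Machine code for functions: placed in .text */
int bump(void)
{
    counter += 1;       /* modifies the value held in the data section */
    return counter;
}
```

Running nm on the compiled object is a quick way to check where each symbol actually ended up on your toolchain.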

We’ll begin by setting up our entry point and exporting it as a global symbol so the linker can find it, then build out our own small table of functions, including the main entry point, and define the data values stored along with the program. I will include equivalent C/C++ code in the comments alongside the assembly to help guide you along.

As with K&R C, we will write our program top down. This mainly matters for different assembler implementations and for whether we’re using big-endian or little-endian byte order. For our purposes we’re working with a "bi-endian" chip that defaults to little-endian.
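If you want to confirm which byte order your target is actually running in, a small C check is enough. This is a minimal sketch; is_little_endian is a hypothetical helper name, not something from the toolchain.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian machine, 0 on a big-endian one. */
int is_little_endian(void)
{
    const uint32_t probe = 1;       /* 0x00000001 */
    unsigned char first;

    /* Copy the byte stored at the lowest address of probe. */
    memcpy(&first, &probe, 1);

    /* Little-endian stores the least significant byte (0x01) first. */
    return first == 1;
}
```

On a Raspberry Pi running a stock Linux image this would be expected to return 1, in line with the little-endian default mentioned above.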

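/* compile with `gcc -s -Wall hello.s -o hello.elf` */
/* (when cross compiling, swap in your ARM cross GCC, e.g. arm-linux-gnueabihf-gcc) */

.data
/* char msg[14] = "Hello, World.\n" (write() takes a length, so no trailing \0 is needed) */
msg:    .ascii "Hello, World.\n"
msglen = . - msg

.text
.arm                                       /* assemble as 32-bit ARM instructions, not Thumb */
.global main                               /* Entrypoint */

/* print: expects r1 = buffer address, r2 = byte count */
print:
        mov r7, #4                         /* Call write(): EABI syscall number 4 */
        mov r0, #1                         /* fd 1 = stdout */
        svc #0                             /* invoke syscall */
        bx lr                              /* return to the caller */

/* quit: expects r0 = exit status */
quit:
        mov r7, #1                         /* Call sys_exit(): EABI syscall number 1 */
        svc #0                             /* does not return */

/* Main entry point ie: int main() in C/C++ */
main:
        push {r0, lr}

        /* write(STDOUT, &msg, size_of(msg)) */
        ldr r1, =msg                       /* r1 = address of msg */
        ldr r2, =msglen                    /* r2 = size of msg [14] */
        bl print                           /* call our print function */

        /* exit(ERR_SUCCESS||0) */
        mov r0, #0
        bl quit                            /* call our quit function */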

A quick-reference sheet for the ARM register set will be helpful later.

In the next guide we’ll cover inputs and control flow.


Love assembly language.

Not so for AT&T syntax and K&R style.

Yeah, I’m a big Unix, ANSI C, and Bell Labs kind of guy, so it’s just more natural for me, but I would encourage everyone to learn the different programming styles.

Although on review, it looks like the code I posted is closer to Intel syntax, even though GAS uses AT&T by default. Guess that’s a feature and a layer 8 bug.

Pretty decent introduction for folks on a machine they could play with cheaply.

What’s the difference between “move r0, #0” and “mov r0, #0”?

move != mov. but mov ::= move

where one is the English-language verb move, and the other is the mnemonic (depending on your assembler implementation) for moving a value into a register.