The Zerofox Syndicate
Disclaimer
These posts are heavily based on lessons that DGivney originally published on asmtutor.com.
I simply wanted to port his lessons to x64 as an exercise for myself and put them online. Credit should still go to @DGivney.
This tutorial assumes that you already know what a CPU register is and what a CPU instruction is, yet are unfamiliar with how programs interact with the outside world.
Programs can execute instructions on a CPU. Most of those instructions are very basic and allow you to do arithmetic, like adding two numbers.
For any program to be useful, it needs to be able to interact with the outside world. Calculation only gets you so far. At some point, you want to display the results of those calculations. A program needs ways to receive input and ways to output the results of its calculations.
Programs cannot do these things on their own, because there are no CPU instructions that perform these actions directly. Instead, a program has to ask the kernel to perform the specific operation. The kernel then suspends the execution of the program and performs the requested task. Afterwards the program is resumed and the result is handed back to it. These interactions with the kernel are called system calls.
System calls allow programs to request operations that have to be performed by the kernel. System calls are often related to manipulations in the real world, so there has to be a way to pass data to the kernel and receive data back from it. We do this by placing parameters in certain registers, and we expect a return value in a specific register. Which register holds which argument is not hard-wired into the CPU. That decision is made by the kernel, and it is not communicated to the program at run time. Instead, the program is expected to know in which register the kernel expects each argument of a specific syscall. These expectations are called a calling convention. The convention is different if you use a BSD kernel instead of a Linux kernel, or if you are using a different CPU architecture.
The original tutorial that this blog post is based on only showed the 32-bit (x86) version. While that version is still relevant for educational purposes, it is rarely used on modern desktops anymore. That is why I wanted to update this tutorial with the 64-bit version that almost everyone is likely to run these days.
If you want to learn the original x86 calling convention, I suggest reading the original tutorial on asmtutor.com.
The x86_64 architecture, sometimes called x64, is a 64-bit extension by AMD of the original 32-bit x86 architecture by Intel.
Before diving into more theory, here is a small program.
; Hello World Program - asmtutor.com
; Compile with: nasm -f elf64 helloworld.asm
; Link with: ld -m elf_x86_64 helloworld.o -o helloworld
; Run with: ./helloworld
SECTION .data
msg db 'Hello World!', 0Ah ; assign msg variable with your message string
SECTION .text
global _start
_start:
mov rax, 1 ; invoke SYS_WRITE (syscall number 1)
mov rdi, 1 ; write to the STDOUT file (file descriptor 1)
mov rsi, msg ; move the memory address of our message string into rsi
mov rdx, 13 ; number of bytes to write - one for each character plus 0Ah (line feed character)
syscall
For the kernel to know which syscall you want to execute, it looks at the rax register. In this case, 1 means we are trying to invoke the write syscall.
The write syscall takes three arguments. The first is the file descriptor, the second is the address in memory of the data we want to write, and the third is the number of bytes we would like to write.
Our first argument is 1 because we would like to write to STDOUT, which is file descriptor 1.
Our second argument is the msg variable. NASM, together with the linker, replaces it with the address in memory that holds the “Hello World!\n” string.
Our third argument is the length of the “Hello World!\n” string. That is 12 characters plus the line feed, or 13 bytes.
On Linux x86_64, the syscall number and the arguments go into these registers:
syscall number | arg1 | arg2 | arg3 | arg4 | arg5 | arg6 |
---|---|---|---|---|---|---|
rax | rdi | rsi | rdx | r10 | r8 | r9 |
To run this, we first need to assemble the source file into an object file. That file contains all the instructions, but it is not yet in the format of an executable file that the operating system can run. We create the executable by linking the object file.
nasm -f elf64 helloworld.asm # assemble into helloworld.o (object file)
ld -m elf_x86_64 helloworld.o -o helloworld # link to create the executable
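If you run the resulting helloworld executable, you will notice that it prints the desired output but crashes immediately afterwards.
This is because we are not properly exiting the program. After the syscall instruction returns, the CPU simply continues executing whatever bytes follow in memory. Those bytes are not valid instructions of our program, so the program crashes, typically with a segmentation fault. Properly exiting the program requires another syscall.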
The exit syscall has the number 60
and takes only one argument, the exit code.
As an exercise, try adding it to the hello world example so that the program
exits cleanly without crashing.
Lesson 2 will be the solution to this exercise.
See the syscall(2) man page if you just want to know which register is used for which argument.