Advent of Languages: Day 1 – x86 assembly
03 December, 2019
9 minute read
This is a part of my ongoing challenge of solving every Advent of Code puzzle in a different language.
The challenge we're tackling today is available here.
The Tyranny of the Rocket Equation
After a bit of characteristic fluff about Santas and elves and such, we get to the meat of the problem.
Given a mass m
, divide it by 3, round it down, then subtract 2. As usual, we’re given an input file with a bunch of masses. Dividing and rounding it down is thankfully how integer division works. To be exact, we take the quotient and discard the remainder. For each mass in the input file, we’re to find the result of this equation, and then add them together. Advent of Code provides a nice input file for everyone, and taking a look at mine, it seems the result should fit in an integer, which makes this extra simple.
The Language
Since this problem is simple, we should pick a tricky language to get it out of the way. The hardest part of this problem will probably be just loading the file itself. Thus, I’ve decided to go for x86 assembly.
Assembly languages are a family of low-level languages, usually mapping 1-1 to processor instructions. This means they’re a little tough to reason about for wetware. x86 assembly is specifically the assembly that my computer (with an intel processor) is familiar with. This is what a hello world in x86 assembly for my computer looks like:
.intel_syntax noprefix
.data
.align 16
hello_msg:
.asciz "Hello, World!"
.text
.global _main
_main:
push rbp
mov rbp, rsp
lea rdi, [rip+hello_msg]
call _puts
xor rax, rax
leave
ret
To compile and run this, since I’m on macOS, I’ll use clang
. On a mac, doing direct system calls in assembly is not recommended. Instead, we’ll call into the c standard library for i/o.
The Solution: Part 1
As suspected, the solution was simple. However, brushing up on my x86 assembly took a long time, and I’m pretty sure there is a myriad of problems and major security issues with my code. For one, I didn’t optimize memory alignment at all. Please do point these issues out for me, however, since I’m not experienced with plain asm (the cool person’s way of saying assembly).
Let’s take a look at the finished code to start with, then run through the code line by line to understand what’s going on.
.intel_syntax noprefix
.data
file_name:
.asciz "input"
read_flag:
.asciz "r"
scan_digit_format:
.asciz "%d"
print_digit_format:
.asciz "%d\n"
.text
.global _main
_main:
enter 16, 0
lea rdi, [rip + file_name]
lea rsi, [rip + read_flag]
call _fopen
mov qword ptr [rbp - 8], rax
mov dword ptr [rbp - 12], 0
read_loop:
mov rdi, qword ptr [rbp - 8]
lea rsi, [rip + scan_digit_format]
lea rdx, [rbp - 16]
call _fscanf
cmp eax, 1
jne after_loop
mov eax, dword ptr [rbp - 16]
cdq
mov ecx, 3
idiv ecx
sub eax, 2
add eax, dword ptr [rbp - 12]
mov dword ptr [rbp - 12], eax
jmp read_loop
after_loop:
mov rdi, qword ptr [rbp - 8]
call _fclose
lea rdi, [rip + print_digit_format]
mov esi, dword ptr [rbp - 12]
call _printf
leave
ret
It consists of roughly four parts:
- Program setup
- Getting a handle to the input file
- The loop that reads and computes the result
- Printing the final result
Let’s get into them a bit more deeply.
Program setup
Let’s take a look at this part of the code:
.intel_syntax noprefix
.data
file_name:
.asciz "input"
read_flag:
.asciz "r"
scan_digit_format:
.asciz "%d"
print_digit_format:
.asciz "%d\n"
Let’s notice a couple of things about our source code: some stuff starts with .
and some doesn’t. The things that do are called directives, which means they aren’t part of the actual calls to the x86 processor. Rather, they’re used to put those together.
The first line declares, with a directive, what kind of assembly syntax this is, in this case, it’s non-prefixed intel syntax. I picked it after a long research process of copying what the working hello world program did. The second line declares that we’re going into a .data
section. An assembly file is split into a couple of different sections. In our program, we use the .data
and .text
sections. .data
is used to store some sort of initialized data that does not change at runtime. Take note that we can move these sections wherever we want.
Next, we set up four “strings” (really just null-terminated byte arrays) statically in the program, using a combination of labels and the .asciz
directive. A label (like file_name
) is essentially a handy name we can give to a place in our source code. We’ll use these labels later in the code itself to refer to these strings. .asciz
will just write null-terminated ASCII strings to that place in the source code.
That’s it for the data section! On to the meat.
Getting a handle to the input file
As a reminder, now we’re working on this bit of the code:
.text
.global _main
_main:
enter 16, 0
lea rdi, [rip + file_name]
lea rsi, [rip + read_flag]
call _fopen
mov qword ptr [rbp - 8], rax
mov dword ptr [rbp - 12], 0
There’s quite a bit more here to cover than in the last section. What we accomplish in this section is to start the main function and to store a pointer to the input file in memory we can work with. We start by declaring that we’re entering a .text
section, which is used to indicate that this is where the program source code lives.
Then we defined that the _main
label should be accessible outside this program, effectively turning it .global
. Then we declare where the _main
label lives, which is where our program starts. This is effectively the main
function you’re used to in languages like c or java.
Now we declare our first actual x86 instruction, using the enter
operation. enter
sets up the “stack frame” for this function. The stack frame is essentially a space in memory for stuff in your function. When you declare local variables in c, they use the stack frame. Essentially we’re making the computer store two pointers to either side of this frame here. The 16 here corresponds to how many bytes we’re going to need in our function, I reckon we’ll need 16 bytes for this task. That means the size of the stack frame we’re working with is 16 bytes. The 0 there is used for nesting frames inside each other, which we’re not going to do.
If we skip forward a bit, you can see we’re going to run call _fopen
. This means we’re going to be calling the standard library function fopen
to get a pointer to the input file. The way calling functions in asm works, is you write your arguments to different registers in the CPU, then you call the function, and then the function writes its output to another register. rdi
and rsi
correspond to the first and second argument of the function you’re going to call, respectively. In x86, the argument registers are: rdi
, rsi
, rdx
, rcx
, r8
, and r9
. This means you can call functions like this with a maximum of six arguments. After that, it gets more complicated, as you have to use the stack for any additional arguments.
What the next three lines try to accomplish is, essentially, to call fopen("input", "r")
. For folks unfamiliar with c, this means opening a file with the relative filename of “input” in read-only mode. We have those strings saved in the .data
section of our software, and now we want to load their locations into the rdi
and rsi
registers.
If we look at the next actual instruction, the operation it uses, lea
, places the memory address of the second operand into the register specified by its first operand. The brackets denote that we compute this address. The file_name
there is one of the labels we set up earlier, and rip
is the actual instruction pointer (pointer to which instruction the computer is executing). We get the address of the string by adding them together. Thus, we load the address of “input” into the first argument register and “r” into the second.
Finally, we call _fopen
.
The return value of _fopen
, the pointer to the file we just opened, is stored in the rax
return register. We need to get it out of there before we go forward, as it is used by other functions as well. Thus, we load it into ram from the register. The mov
operation is similar to lea
, but while lea
copies the address of the second argument, mov
copies the value at the second argument.
We want to mov
the value at the rax
register to ram. The value we want to move is a pointer to a file. A pointer on a 64-bit machine takes, by definition, 8 bytes. This is because 64-bit denotes that the machine’s ram is addressable (how many slots it has) to up to the maximum 64-bit value. Thus, it takes 64/8=8 bytes to store slots to ram. qword ptr
declares that the value we’re trying to move to is a pointer to a qword
. qword
, meaning quadword, means the value is 8 bytes. That’s because a “word” is a 2-byte value. Thus, a quadword means 4*2=8 bytes. This is the size we want for our pointer. We also pick a spot in our stack (that we allocated earlier) to store it at, this being the area from -8 bytes to 0 in our stack (negative, because the stack grows downwards on my system). Essentially, we’re declaring a local variable here, of type “pointer to a file”.
Finally, we create another new local variable to store the actual sum of our computation. Since this is an int
, which takes 4 bytes, this time we use the length dword
(“double word = 2 * 2 = 4 bytes), at slots -12 to -8. We set the value there to 0.
The computation loop
The code we’re now looking at is this part:
read_loop:
mov rdi, qword ptr [rbp - 8]
lea rsi, [rip + scan_digit_format]
lea rdx, [rbp - 16]
call _fscanf
cmp eax, 1
jne after_loop
mov eax, dword ptr [rbp - 16]
cdq
mov ecx, 3
idiv ecx
sub eax, 2
add eax, dword ptr [rbp - 12]
mov dword ptr [rbp - 12], eax
jmp read_loop
We won’t go as deep as we did in the last section, as a lot of it is covered already. What we do in this section is read the input file line by line, compute the mass for the line and add it to the total sum, exiting the loop if we run out of lines. We start by wanting to call the fscanf
function with three arguments. These arguments are:
- The pointer to the file we’re reading that we stored in ram earlier.
- The “scan format”, in this case, “%d”, denoting an integer. We have defined this string in our data section under the scan_digit_format section.
- The address where we want to store the read digit to.
fscanf
stores the read variables to the addresses passed in. It returns how many variables were read and stores it in the eax
register. Note: the rax
and eax
registers are essentially the same thing, but have different sizes. rax
is 8 bytes and eax
is 4. The same convention applies to the return registers. In this case, we want to read and compute until we’re not reading new variables. We use the mov
and lea
operations to put our variables into the arguments registers. Notice that we’re using the previously unused -16 to -12 ram slot to store the read digit in.
After calling fscanf
, we use a new operation, cmp
. It compares the first operand with the second and stores the result in a special register meant for comparison. We use that value in the proceeding line, using jne
. That operand means “jump if not equals”, it looks at the values in the special comparison register (which is called EFLAGS), and if they indicate that the last compared values were not equal, it will jump to the location you provide. What we’re doing here is comparing the return value of fscanf
, (which is stored in eax
) to 1, and if they don’t equal, we’re going to “jump” to the label after_loop, defined later in our software.
Now if we didn’t jump, we proceed to the computation part of the loop. We want to divide the value read with 3, subtract 2 from it and then add it to the total sum, after which we want to repeat the first part of the loop. We start by moving the value fscanf
read into eax
because the following operations will run on that register. I won’t go too deep here as division is a bit complicated. Since I don’t understand this very well, I won’t try to explain it. We run cdq
to change the value in eax
to a 64-bit value (as the div
operation wants a 64-bit value), then move 3 to the ecx
register and divide the value in eax
with the value in ecx
. The result of this is stored in eax
again. Now we take the “sum” that we declared at the beginning of the main function (at slots -12 to -8) and add it to the value in eax
, using sum
, storing the result of that (which, again, is saved in eax
), to the same “sum” slot. Finally, we “jump” back to the read_loop label to run this whole thing again.
Printing the final result
This is the code we’re working with:
after_loop:
mov rdi, qword ptr [rbp - 8]
call _fclose
lea rdi, [rip + print_digit_format]
mov esi, dword ptr [rbp - 12]
call _printf
leave
ret
Once we reach this code, we know we’ve run out of values we read, and whatever value is at rbp - 12
is the answer to the problem. We start with a little housekeeping (which I don’t do with the rest of the code) by closing the file pointer we opened earlier. Simple. Then, we call printf
with %d\n
and our sum. Finally, we leave
, which does the opposite of enter
, restoring the stack, and then, at long last, return from our main function.
The Solution: Part 2
The second part of the task is a little bit more complicated. We now have to take into account the mass of the fuel as well. Essentially, we do the same as before, but once we have the amount of fuel, we need to calculate the amount of fuel that fuel takes to transport, how much that fuel takes to transport, etc, until we reach 0.
Let’s take a look at the code for part 2.
.intel_syntax noprefix
.data
file_name:
.asciz "input"
read_flag:
.asciz "r"
scan_digit_format:
.asciz "%d"
print_digit_format:
.asciz "%d\n"
.text
.global _main
_main:
enter 32, 0
lea rdi, [rip + file_name]
lea rsi, [rip + read_flag]
call _fopen
mov qword ptr [rbp - 8], rax
mov dword ptr [rbp - 12], 0
read_loop:
mov rdi, qword ptr [rbp - 8]
lea rsi, [rip + scan_digit_format]
lea rdx, [rbp - 16]
call _fscanf
cmp eax, 1
jne after_loop
mov eax, dword ptr [rbp - 16]
cdq
mov ecx, 3
idiv ecx
sub eax, 2
mov dword ptr [rbp - 20], eax
fuel_loop:
cmp dword ptr [rbp - 20], 0
jle read_loop
mov eax, dword ptr [rbp - 20]
add eax, dword ptr [rbp - 12]
mov dword ptr [rbp - 12], eax
mov eax, dword ptr [rbp - 20]
cdq
mov ecx, 3
idiv ecx
sub eax, 2
mov dword ptr [rbp - 20], eax
jmp fuel_loop
after_loop:
mov rdi, qword ptr [rbp - 8]
call _fclose
lea rdi, [rip + print_digit_format]
mov esi, dword ptr [rbp - 12]
call _printf
leave
ret
The code is largely the same, so let’s just look at the differences.
First, we allocate a little more memory at the start as we need to keep track of a new value, the amount of fuel left to calculate for. Although we could get by with just using 20 bytes, let’s allocate 32 for it to be a power of 2 for reasons I won’t get into here.
Second, we run a new loop inside of the read loop, to continually apply the equation to the leftover fuel until we reach 0. We use a new operation here, jle
, or “jump if less than or equal to”. It’s very similar to the jne
we’ve seen before. We compare the amount of fuel left (saved in slot -20) to 0, and with jle
, we will jump to read a new value if the leftover fuel value is less than or equal to 0.
Retro
While the task itself was very simple, I had a blast at learning x86 to finish it. It did take me quite a while though, so catching up to the advent of code tasks will take longer than I planned.
If you want to look at the code in full, I’m putting all of the Advent of Code solutions on GitHub.