Advent of Languages: Day 1 – x86 assembly

03 December, 2019

9 minute read

This is a part of my ongoing challenge of solving every Advent of Code puzzle in a different language.

The challenge we're tackling today is available here.

The Tyranny of the Rocket Equation

After a bit of characteristic fluff about Santas and elves and such, we get to the meat of the problem.

Given a mass m, divide it by 3, round it down, then subtract 2. As usual, we’re given an input file with a bunch of masses. Dividing and rounding it down is thankfully how integer division works. To be exact, we take the quotient and discard the remainder. For each mass in the input file, we’re to find the result of this equation, and then add them together. Advent of Code provides a nice input file for everyone, and taking a look at mine, it seems the result should fit in an integer, which makes this extra simple.

The Language

Since this problem is simple, we should pick a tricky language to get it out of the way. The hardest part of this problem will probably be just loading the file itself. Thus, I’ve decided to go for x86 assembly.

Assembly languages are a family of low-level languages, usually mapping 1-1 to processor instructions. This means they’re a little tough to reason about for wetware. x86 assembly is specifically the assembly that my computer (with an intel processor) is familiar with. This is what a hello world in x86 assembly for my computer looks like:

.intel_syntax noprefix
.data
.align 16

hello_msg:
    .asciz "Hello, World!"

.text

.global _main
_main:
    push rbp
    mov rbp, rsp

    lea rdi, [rip+hello_msg]
    call _puts

    xor rax, rax
    leave
    ret

To compile and run this, since I’m on macOS, I’ll use clang. On a mac, doing direct system calls in assembly is not recommended. Instead, we’ll call into the c standard library for i/o.

The Solution: Part 1

As suspected, the solution was simple. However, brushing up on my x86 assembly took a long time, and I’m pretty sure there is a myriad of problems and major security issues with my code. For one, I didn’t optimize memory alignment at all. Please do point these issues out for me, however, since I’m not experienced with plain asm (the cool person’s way of saying assembly).

Let’s take a look at the finished code to start with, then run through the code line by line to understand what’s going on.

.intel_syntax noprefix
.data

file_name:
    .asciz "input"
read_flag:
    .asciz "r"
scan_digit_format:
    .asciz "%d"
print_digit_format:
    .asciz "%d\n"

.text
.global _main
_main:
    enter 16, 0

    lea rdi, [rip + file_name] 
    lea rsi, [rip + read_flag] 
    call _fopen 
    mov qword ptr [rbp - 8], rax 

    mov dword ptr [rbp - 12], 0 

read_loop: 
    mov rdi, qword ptr [rbp - 8]
    lea rsi, [rip + scan_digit_format]
    lea rdx, [rbp - 16]
    call _fscanf

    cmp eax, 1
    jne after_loop

    mov eax, dword ptr [rbp - 16]
    cdq
    mov ecx, 3
    idiv ecx
    sub eax, 2
    add eax, dword ptr [rbp - 12]
    mov dword ptr [rbp - 12], eax

    jmp read_loop

after_loop:
    mov rdi, qword ptr [rbp - 8]
    call _fclose

    lea rdi, [rip + print_digit_format]
    mov esi, dword ptr [rbp - 12]
    call _printf

    leave
    ret

It consists of roughly four parts:

Program setup
Getting a handle to the input file
The loop that reads and computes the result
Printing the final result

Let’s get into them a bit more deeply.

Program setup

Let’s take a look at this part of the code:

.intel_syntax noprefix
.data

file_name:
    .asciz "input"
read_flag:
    .asciz "r"
scan_digit_format:
    .asciz "%d"
print_digit_format:
    .asciz "%d\n"

Let’s notice a couple of things about our source code: some stuff starts with . and some doesn’t. The things that do are called directives, which means they aren’t part of the actual calls to the x86 processor. Rather, they’re used to put those together.

The first line declares, with a directive, what kind of assembly syntax this is, in this case, it’s non-prefixed intel syntax. I picked it after a long research process of copying what the working hello world program did. The second line declares that we’re going into a .data section. An assembly file is split into a couple of different sections. In our program, we use the .data and .text sections. .data is used to store some sort of initialized data that does not change at runtime. Take note that we can move these sections wherever we want.

Next, we set up four “strings” (really just null-terminated byte arrays) statically in the program, using a combination of labels and the .asciz directive. A label (like file_name) is essentially a handy name we can give to a place in our source code. We’ll use these labels later in the code itself to refer to these strings. .asciz will just write null-terminated ASCII strings to that place in the source code.

That’s it for the data section! On to the meat.

Getting a handle to the input file

As a reminder, now we’re working on this bit of the code:

.text
.global _main
_main:
    enter 16, 0

    lea rdi, [rip + file_name] 
    lea rsi, [rip + read_flag] 
    call _fopen 
    mov qword ptr [rbp - 8], rax 

    mov dword ptr [rbp - 12], 0

There’s quite a bit more here to cover than in the last section. What we accomplish in this section is to start the main function and to store a pointer to the input file in memory we can work with. We start by declaring that we’re entering a .text section, which is used to indicate that this is where the program source code lives.

Then we defined that the _main label should be accessible outside this program, effectively turning it .global. Then we declare where the _main label lives, which is where our program starts. This is effectively the main function you’re used to in languages like c or java.

Now we declare our first actual x86 instruction, using the enter operation. enter sets up the “stack frame” for this function. The stack frame is essentially a space in memory for stuff in your function. When you declare local variables in c, they use the stack frame. Essentially we’re making the computer store two pointers to either side of this frame here. The 16 here corresponds to how many bytes we’re going to need in our function, I reckon we’ll need 16 bytes for this task. That means the size of the stack frame we’re working with is 16 bytes. The 0 there is used for nesting frames inside each other, which we’re not going to do.

If we skip forward a bit, you can see we’re going to run call _fopen. This means we’re going to be calling the standard library function fopen to get a pointer to the input file. The way calling functions in asm works, is you write your arguments to different registers in the CPU, then you call the function, and then the function writes its output to another register. rdi and rsi correspond to the first and second argument of the function you’re going to call, respectively. In x86, the argument registers are: rdi, rsi, rdx, rcx, r8, and r9. This means you can call functions like this with a maximum of six arguments. After that, it gets more complicated, as you have to use the stack for any additional arguments.

What the next three lines try to accomplish is, essentially, to call fopen("input", "r"). For folks unfamiliar with c, this means opening a file with the relative filename of “input” in read-only mode. We have those strings saved in the .data section of our software, and now we want to load their locations into the rdi and rsi registers.

If we look at the next actual instruction, the operation it uses, lea, places the memory address of the second operand into the register specified by its first operand. The brackets denote that we compute this address. The file_name there is one of the labels we set up earlier, and rip is the actual instruction pointer (pointer to which instruction the computer is executing). We get the address of the string by adding them together. Thus, we load the address of “input” into the first argument register and “r” into the second.

Finally, we call _fopen.

The return value of _fopen, the pointer to the file we just opened, is stored in the rax return register. We need to get it out of there before we go forward, as it is used by other functions as well. Thus, we load it into ram from the register. The mov operation is similar to lea, but while lea copies the address of the second argument, mov copies the value at the second argument.

We want to mov the value at the rax register to ram. The value we want to move is a pointer to a file. A pointer on a 64-bit machine takes, by definition, 8 bytes. This is because 64-bit denotes that the machine’s ram is addressable (how many slots it has) to up to the maximum 64-bit value. Thus, it takes 64/8=8 bytes to store slots to ram. qword ptr declares that the value we’re trying to move to is a pointer to a qword. qword, meaning quadword, means the value is 8 bytes. That’s because a “word” is a 2-byte value. Thus, a quadword means 4*2=8 bytes. This is the size we want for our pointer. We also pick a spot in our stack (that we allocated earlier) to store it at, this being the area from -8 bytes to 0 in our stack (negative, because the stack grows downwards on my system). Essentially, we’re declaring a local variable here, of type “pointer to a file”.

Finally, we create another new local variable to store the actual sum of our computation. Since this is an int, which takes 4 bytes, this time we use the length dword (“double word = 2 * 2 = 4 bytes), at slots -12 to -8. We set the value there to 0.

The computation loop

The code we’re now looking at is this part:

read_loop: 
    mov rdi, qword ptr [rbp - 8]
    lea rsi, [rip + scan_digit_format]
    lea rdx, [rbp - 16]
    call _fscanf

    cmp eax, 1
    jne after_loop

    mov eax, dword ptr [rbp - 16]
    cdq
    mov ecx, 3
    idiv ecx
    sub eax, 2
    add eax, dword ptr [rbp - 12]
    mov dword ptr [rbp - 12], eax

    jmp read_loop

We won’t go as deep as we did in the last section, as a lot of it is covered already. What we do in this section is read the input file line by line, compute the mass for the line and add it to the total sum, exiting the loop if we run out of lines. We start by wanting to call the fscanf function with three arguments. These arguments are:

The pointer to the file we’re reading that we stored in ram earlier.
The “scan format”, in this case, “%d”, denoting an integer. We have defined this string in our data section under the scan_digit_format section.
The address where we want to store the read digit to.

fscanf stores the read variables to the addresses passed in. It returns how many variables were read and stores it in the eax register. Note: the rax and eax registers are essentially the same thing, but have different sizes. rax is 8 bytes and eax is 4. The same convention applies to the return registers. In this case, we want to read and compute until we’re not reading new variables. We use the mov and lea operations to put our variables into the arguments registers. Notice that we’re using the previously unused -16 to -12 ram slot to store the read digit in.

After calling fscanf, we use a new operation, cmp. It compares the first operand with the second and stores the result in a special register meant for comparison. We use that value in the proceeding line, using jne. That operand means “jump if not equals”, it looks at the values in the special comparison register (which is called EFLAGS), and if they indicate that the last compared values were not equal, it will jump to the location you provide. What we’re doing here is comparing the return value of fscanf, (which is stored in eax) to 1, and if they don’t equal, we’re going to “jump” to the label after_loop, defined later in our software.

Now if we didn’t jump, we proceed to the computation part of the loop. We want to divide the value read with 3, subtract 2 from it and then add it to the total sum, after which we want to repeat the first part of the loop. We start by moving the value fscanf read into eax because the following operations will run on that register. I won’t go too deep here as division is a bit complicated. Since I don’t understand this very well, I won’t try to explain it. We run cdq to change the value in eax to a 64-bit value (as the div operation wants a 64-bit value), then move 3 to the ecx register and divide the value in eax with the value in ecx. The result of this is stored in eax again. Now we take the “sum” that we declared at the beginning of the main function (at slots -12 to -8) and add it to the value in eax, using sum, storing the result of that (which, again, is saved in eax), to the same “sum” slot. Finally, we “jump” back to the read_loop label to run this whole thing again.

Printing the final result

This is the code we’re working with:

after_loop:
    mov rdi, qword ptr [rbp - 8]
    call _fclose

    lea rdi, [rip + print_digit_format]
    mov esi, dword ptr [rbp - 12]
    call _printf

    leave
    ret

Once we reach this code, we know we’ve run out of values we read, and whatever value is at rbp - 12 is the answer to the problem. We start with a little housekeeping (which I don’t do with the rest of the code) by closing the file pointer we opened earlier. Simple. Then, we call printf with %d\n and our sum. Finally, we leave, which does the opposite of enter, restoring the stack, and then, at long last, return from our main function.

The Solution: Part 2

The second part of the task is a little bit more complicated. We now have to take into account the mass of the fuel as well. Essentially, we do the same as before, but once we have the amount of fuel, we need to calculate the amount of fuel that fuel takes to transport, how much that fuel takes to transport, etc, until we reach 0.

Let’s take a look at the code for part 2.

.intel_syntax noprefix
.data

file_name:
    .asciz "input"
read_flag:
    .asciz "r"
scan_digit_format:
    .asciz "%d"
print_digit_format:
    .asciz "%d\n"

.text
.global _main
_main:
    enter 32, 0

    lea rdi, [rip + file_name] 
    lea rsi, [rip + read_flag] 
    call _fopen 
    mov qword ptr [rbp - 8], rax 

    mov dword ptr [rbp - 12], 0 

read_loop: 
    mov rdi, qword ptr [rbp - 8]
    lea rsi, [rip + scan_digit_format]
    lea rdx, [rbp - 16]
    call _fscanf

    cmp eax, 1
    jne after_loop

    mov eax, dword ptr [rbp - 16]
    cdq
    mov ecx, 3
    idiv ecx
    sub eax, 2
    mov dword ptr [rbp - 20], eax

fuel_loop:
    cmp dword ptr [rbp - 20], 0
    jle read_loop

    mov eax, dword ptr [rbp - 20]
    add eax, dword ptr [rbp - 12]
    mov dword ptr [rbp - 12], eax

    mov eax, dword ptr [rbp - 20]
    cdq
    mov ecx, 3
    idiv ecx
    sub eax, 2
    mov dword ptr [rbp - 20], eax

    jmp fuel_loop

after_loop:
    mov rdi, qword ptr [rbp - 8]
    call _fclose

    lea rdi, [rip + print_digit_format]
    mov esi, dword ptr [rbp - 12]
    call _printf

    leave
    ret

The code is largely the same, so let’s just look at the differences.

First, we allocate a little more memory at the start as we need to keep track of a new value, the amount of fuel left to calculate for. Although we could get by with just using 20 bytes, let’s allocate 32 for it to be a power of 2 for reasons I won’t get into here.

Second, we run a new loop inside of the read loop, to continually apply the equation to the leftover fuel until we reach 0. We use a new operation here, jle, or “jump if less than or equal to”. It’s very similar to the jne we’ve seen before. We compare the amount of fuel left (saved in slot -20) to 0, and with jle, we will jump to read a new value if the leftover fuel value is less than or equal to 0.

Retro

While the task itself was very simple, I had a blast at learning x86 to finish it. It did take me quite a while though, so catching up to the advent of code tasks will take longer than I planned.

If you want to look at the code in full, I’m putting all of the Advent of Code solutions on GitHub.