Assembly Language
Confession: I don’t know much about assembly. Thankfully compilers are good enough these days that it doesn’t matter, but I’ve always been curious about writing assembly as is. I ended up writing some m68k assembly last post when writing an m68k to x86_64 interpreter. So I had the bright idea to write my own assembly language. For fun.
Now the actual plan was as follows: write an assembly language spec, write an interpreter, and write an encoder/decoder (so I could test out how different design decisions affected code density in the binary). My first order of business was to decide on the size of my opcodes. I decided to go with 16-bit like the m68k – I thought it would be spartan enough to not endlessly iterate and be compact yet give me enough space to design instructions. My first decision was to reverse one design decision I think hampered the m68k, splitting registers into address and data registers.
On the m68k, there are 16 32-bit registers. This is for a CPU that came out in 1979; compare it to a similar Intel CPU at the time, the 8086. The 8086 came with 8 16-bit registers. You get 2 times the registers with 2 times the bit-width. Sadly, with the data register/address register split, the 8 data registers store the usual data and the 8 address registers are for indirect accessing of memory. I assume this was chosen because this made registers take 3-bits (2^3 = 8) instead of 4-bits (2^4 = 16), and because the m68k had support for 8 effective addressing modes on its two operands, which took up 6 bits in total (EA + Reg). Given a 4-bit group, you’d be left with another 3 bits for the register for a binary operator, or to do what you wanted.
I wanted to see if it was possible to support 16 registers, so I bit the 4-bit bullet per register. Given that an instruction set needs to have at least 30 must have instructions, you also need to reserve at least 5 bits for every instruction so you know which opcode to use. I decided to go with a scheme that would allow me to have up to 64 instructions – a 4-bit major group, with a further 2-bits subdividing each group with 4 instructions each, so I would have 16 groups with 4 subgroups in each group. This left me with 10-bits. You see the problem. If each register costs you 4-bits, and you have 2 of them, you’re left with 2 bits per instruction. But wait, there’s more. Each instruction must also carry bits for its size. I wanted 16 64-bit registers, so each instruction had to support an 8-bit, 16-bit, 32-bit, and 64-bit variant (I called these byte, short, long, word). This is 4 choices, so this takes up 2-bits.
So there you have it. Each binary instruction has 4-bits for group, 4-bits for subgroup, 2-bits for size, 4-bits for register 1 (which is the target) and 4-bits for register 2. This works out for most instructions, like arithmetic ones, but you cannot do an arithmetic operation + change the addressing mode.
To do so, I added three instructions as full 12-bit instructions.
lea, (load effective address), a load and store
instruction. These took up 4 instructions a pop (since they take up a
whole group), but I was able to cram in a 2-bit effective address for
each of these.
The effective addresses are:
- register direct, as is:
%rX - register indirect, the address of the register,
(%rX) - immediate,
$10 - offset indirect
-4(%sp)
Since you can only place this on either the src or the target, this is a bit barebones, but it works out fine.
I decided not to go with a condition flag design (like m68k, arm, x86) and take a flagless design – in exchange though, you have to implement a lot of branching operations. For the branching instructions:
- Branch if Equal
- Branch if not equal
- Branch less than
- Branch less than equal
- Branch if Zero
- Branch if not zero
I could’ve gone with a sparser set, but I figured it’d be ok to use
the instructions since branching is quite commonplace. For the cmp side,
I decided to go with a sparser set, explicitly requiring a not or a flip
of operands to get <=, >, and
>=.
- Cmp equal
- Cmp not equal
- Cmp less than equal
- Cmp less than
I also wanted compact operations that used immediates a lot, so I
provided immediate verisons of add, sub, mul, mov. If I dropped the
size, I could have 6-bits left over. So I can have immediates from
0..63. However, I really wanted to be able to use 64. It
turns out you actually can support the range of 0..64 for these ops:
For add and sub with 0, this does nothing,
so you can read this code and emit a no-op instead. For mul
and mov, it zeroes out the target register. You can do the
same with xor %rX, %rX, which zeroes out the target
register. Thus, in the textual assembly format, You can support 65
values (which requires 7-bits) by rewriting the last case to something
supported. You can also do this with div (rewrite
div %rX, $0 with trap), or mod as
well.
I decided to add a few extra assembly directives:
.byteto include a byte array,.asciito include an ascii string,.ascizto include an ascii string with a null terminator.includeto include the file copy pasted as in C, to link multiple assembly files together.
After hooking up syscall write on linux, you can write assembly pretty well:
movi %r0, $1 # syscall write
movi %r1, $1 # stdout
load.l %r2, msg # r2 = &msg
movi %r3, $12 # length
trap # perform write
ret
msg:
.asciz "hello world\n"And recursive fibonacci isn’t so bad:
# Calculate fib(10)
movi %r1, $10 # n = 10
call fib
jmp done
fib:
# Prologue: save caller-saved registers
push.w %r2
push.w %r3
# Base case: if n == 0, return 0
brz.l %r1, base_case
# Base case: if n == 1, return 1
mov.l %r3, %r1
subi.l %r3, $1
brz.l %r3, base_case
# Recursive case: fib(n-1)
subi %r1, $1
push.w %r1 # Save n-1
call fib
mov.l %r2, %r0 # r2 = fib(n-1)
pop.w %r1 # Restore n-1
# fib(n-2)
subi.l %r1, $1
push.w %r1
call fib
mov.l %r3, %r0 # r3 = fib(n-2)
pop.w %r1
# Return fib(n-1) + fib(n-2)
mov.l %r0, %r2
add.l %r0, %r3
jmp end_fib
base_case:
mov.l %r0, %r1 # Return n
end_fib:
# Epilogue: restore registers
pop.w %r3
pop.w %r2
ret
done:
nopI really enjoyed this project – it was interesting to design an assembly language, since it made me value what instructions could be supported with such little space. I still won’t be writing assembly, but at least I can understand why it’s so hard to design a good ISA!