Almost everything you'll need is in the ARMv8 Architecture Reference Manual and the Cortex-A72 Processor Technical Reference Manual. I highly recommend downloading a copy of each PDF. Some of their contents are reproduced below.
Term | Definition |
---|---|
cortex-a72 | the processor used by the Raspberry Pi 4 |
broadcom bcm2711 | the Raspberry Pi 4's system-on-chip, which contains the cortex-a72; same as above, for our purposes |
instruction encoding | the encoding for translating assembly into machine code |
These are the search terms you're looking for.
Many peripherals (external I/O devices) are accessible through special memory addresses. The Raspberry Pi 4's peripherals document has not yet been released, so for now we'll use the RPi 3's BCM2835 ARM Peripherals Manual.
`0xFE201000` on raspi4

This can be skipped on QEMU, but I recommend implementing the hardware setup procedure as early as possible.
For the following setup steps, use the BCM2835 ARM Peripherals Manual's GPIO address section, replacing `0x7E20` with `0xFE20` for raspi4. Also see the manual's UART address section.
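The address substitution above can be sketched in Python. This is only an illustration: it assumes every peripheral address printed in the manual starts with the byte `0x7E` and that the same swap to `0xFE` applies to all of them on raspi4, which generalizes the `0x7E20` to `0xFE20` replacement. The name `to_raspi4_address` is made up for this example.

```python
MANUAL_PREFIX = 0x7E  # top byte of addresses as printed in the manual
RASPI4_PREFIX = 0xFE  # top byte to use on raspi4 (assumed to hold for all peripherals)

def to_raspi4_address(manual_address: int) -> int:
    """Swap the top byte of a 32-bit peripheral address from 0x7E to 0xFE."""
    if manual_address >> 24 != MANUAL_PREFIX:
        raise ValueError("expected an address of the form 0x7E......")
    return (RASPI4_PREFIX << 24) | (manual_address & 0x00FFFFFF)

print(hex(to_raspi4_address(0x7E201000)))  # 0xfe201000
```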
- … (`BDR_I`) to the UART integer baud rate divisor register
- … (`m`) to the UART fractional baud rate divisor register

UART data can be sent by storing ASCII-encoded text in `0xFE201000`.
On real hardware, you'll want to first wait until the UART flag register says that the transmit FIFO isn't full.
Efficiently allocating memory can be a beast to write in machine code, so we'll put that off until we have a higher-level language to work with. In the meantime, this is the gist of the memory scheme we'll be using on the Raspberry Pi 4:
Memory Address(es) | Usage |
---|---|
`0x00000-0x80000` | System stack |
`0x80000-0x300000` | Machine code |
`0x300000-0x3b3fffff` | 0.99 GB of memory to be used by our machine code |
See the machine code tutorial.
Registers are the fastest place to store data. Most machine code instructions involve registers and moving data to/from/between them.
Register | Description |
---|---|
`MPIDR_EL1` | This read-only identification register, among other things, provides a core identification number (how to access) |
`r0` to `r15` | General-purpose registers. Because we're writing our own assembly language, feel free to use these however you want |
`r31` or `SP` | Depending on the instruction, this is either the stack pointer or a register that always reads zero and discards data when written |
The aarch64 instruction encoding is 32 bits wide, so we cannot store large constants into registers in a single instruction. Instead, we use multiple instructions to store the constant, such as `mov` with a bit shift followed by one or more `add` instructions.
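As a rough sketch of that idea, the Python below builds the instruction words for loading a 32-bit constant with one shifted `mov` and up to two `add`s. It uses the `mov` and `add` bit layouts listed in this document's instruction reference. The helper names (`movz`, `add_imm`, `load_constant`) are made up for this example, and the splitting strategy is just one simple option, not necessarily the shortest sequence.

```python
def movz(rd: int, imm16: int, hw: int = 0) -> int:
    """Encode 110100101 hw2 imm16 Rd5: Rd <= imm16 << (hw * 16)."""
    assert 0 <= rd < 32 and 0 <= imm16 < (1 << 16) and 0 <= hw < 4
    return (0b110100101 << 23) | (hw << 21) | (imm16 << 5) | rd

def add_imm(rd: int, rn: int, uimm12: int, shift: int = 0) -> int:
    """Encode 100100010 shift1 uimm12 Rn5 Rd5: Rd <= Rn + (uimm12 << (12 if shift else 0))."""
    assert 0 <= rd < 32 and 0 <= rn < 32
    assert 0 <= uimm12 < (1 << 12) and shift in (0, 1)
    return (0b100100010 << 23) | (shift << 22) | (uimm12 << 10) | (rn << 5) | rd

def load_constant(rd: int, value: int) -> list:
    """Return instruction words that leave the 32-bit `value` in register rd."""
    assert 0 <= value < (1 << 32)
    high, low = value >> 16, value & 0xFFFF
    if high == 0:
        return [movz(rd, low)]               # fits in a single mov
    words = [movz(rd, high, hw=1)]           # rd <= high << 16
    if low >> 12:
        words.append(add_imm(rd, rd, low >> 12, shift=1))  # add bits 12..15
    if low & 0xFFF:
        words.append(add_imm(rd, rd, low & 0xFFF))         # add bits 0..11
    return words

# Load the raspi4 UART address into r0: prints D2BFC400 then 91400400
for word in load_constant(0, 0xFE201000):
    print(f"{word:08X}")
```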
Every operation that you'll need should be in this document. There are plenty more operations out there, but for the purposes of this book, we'll only learn the basics. This is a tradeoff of efficiency (using the minimal number of instructions) versus simplicity.
Term | Definition |
---|---|
immediate | constant |
`imm` | signed immediate |
`uimm` | unsigned immediate |
This instruction family copies into a register either a constant or the value of another register.
110100101 hw2 imm16 Rd5 Rd <= imm << hw*16
1001000100000000000000 Rn5 Rd5 (Rd or *SP) <= (Rn or *SP)
11010101001 SRn16 Rt5 Rt <- SRn
System Register | SRn |
---|---|
`MPIDR_EL1` | `1100 0000 0000 0101` |
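To make the system-register form concrete, here is a small Python sketch that assembles the `11010101001 SRn16 Rt5` layout into a 32-bit word. The function name `mrs` and the constant name are choices made for this example; the bit pattern and the `SRn` value are taken from the text above.

```python
MPIDR_EL1 = 0b1100_0000_0000_0101  # SRn value from the table above

def mrs(rt: int, srn: int) -> int:
    """Encode 11010101001 SRn16 Rt5: Rt <= SRn."""
    assert 0 <= rt < 32 and 0 <= srn < (1 << 16)
    return (0b11010101001 << 21) | (srn << 5) | rt

# Copy MPIDR_EL1 into r0
print(f"{mrs(0, MPIDR_EL1):08X}")  # D53800A0
```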
Using constants in logical aarch64 operations can be surprisingly complex, so we'll only use logical operations between registers.
I won't explain all of these here, but know that `xor` is also known as `eor`.
1 opc2 010100 shift1 0 Rm5 uimm6 Rn5 Rd5 Rd <= Rn # (shift ? (Rm >> uimm) : (Rm << uimm))
opc | instruction |
---|---|
`00` | AND |
`01` | OR |
`10` | XOR |
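A minimal Python sketch of the register-to-register logical encoding follows. The `logical` function and the `OPC` dictionary are names invented for this example. Per the ARMv8 manual, the optional shift is applied to `Rm` before the logical operation; the example leaves it at zero.

```python
OPC = {"and": 0b00, "or": 0b01, "xor": 0b10}

def logical(op: str, rd: int, rn: int, rm: int, uimm6: int = 0, shift: int = 0) -> int:
    """Encode 1 opc2 010100 shift1 0 Rm5 uimm6 Rn5 Rd5."""
    assert all(0 <= r < 32 for r in (rd, rn, rm))
    assert 0 <= uimm6 < 64 and shift in (0, 1)
    return ((1 << 31) | (OPC[op] << 29) | (0b010100 << 23) | (shift << 22)
            | (rm << 16) | (uimm6 << 10) | (rn << 5) | rd)

# r0 <= r1 xor r2
print(f"{logical('xor', 0, 1, 2):08X}")  # CA020020
```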
100100010 shift1 uimm12 Rn5 Rd5 Rd <= Rn + (uimm << (shift ? 12 : 0))
shift | If one, uimm is shifted 12 bits to the left |
uimm | Unsigned constant integer |
11001011 shift2 0 Rm5 uimm6 Rn5 Rd5 Rd <= Rn - Rm (when shift and uimm are both zero; otherwise Rm is first shifted by uimm bits)
110100010 shift1 uimm12 Rn5 Rd5 Rd <= Rn - (uimm << (shift ? 12 : 0))
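The two subtraction forms can be sketched in Python as below. The function names are made up for this example, and the register form is encoded with its shift fields left at zero, so it computes a plain `Rn - Rm`.

```python
def sub_reg(rd: int, rn: int, rm: int) -> int:
    """Encode 11001011 shift2 0 Rm5 uimm6 Rn5 Rd5 with shift = uimm = 0: Rd <= Rn - Rm."""
    assert all(0 <= r < 32 for r in (rd, rn, rm))
    return (0b11001011 << 24) | (rm << 16) | (rn << 5) | rd

def sub_imm(rd: int, rn: int, uimm12: int, shift: int = 0) -> int:
    """Encode 110100010 shift1 uimm12 Rn5 Rd5: Rd <= Rn - (uimm12 << (12 if shift else 0))."""
    assert 0 <= rd < 32 and 0 <= rn < 32
    assert 0 <= uimm12 < (1 << 12) and shift in (0, 1)
    return (0b110100010 << 23) | (shift << 22) | (uimm12 << 10) | (rn << 5) | rd

print(f"{sub_reg(0, 1, 2):08X}")  # CB020020, r0 <= r1 - r2
print(f"{sub_imm(1, 1, 1):08X}")  # D1000421, r1 <= r1 - 1
```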
10111000000 imm9 00 Rn5 Rt5 *(Rn + imm) <= (Rt or SP)
0011100100 imm12 Rn5 Rt5 *(Rn + imm) <= Rt (stores the lowest byte of Rt)
LOAD BYTE https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf#page=704&zoom=auto,-4,732
0011100101 imm12 Rn5 Rt5 Rt <= *(Rn + imm)
11111000010 imm9 00 Rn5 Rt5 Rt <= *(Rn + imm)
Reads the value at the memory address `Rn + imm` and stores it into `Rt`.
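Here is a hedged Python sketch of the four load/store layouts above. The function names are invented for this example. In the ARMv8 manual, the `0011100100` pattern is the byte store (STRB) and `0011100101` is the byte load (LDRB); both take an unsigned 12-bit offset, while the `imm9` forms take a signed offset stored in two's complement.

```python
def _mem(prefix: int, prefix_shift: int, imm_field: int, rn: int, rt: int) -> int:
    """Shared tail: pack the prefix, the pre-shifted offset field, Rn5, and Rt5."""
    assert 0 <= rn < 32 and 0 <= rt < 32
    return (prefix << prefix_shift) | imm_field | (rn << 5) | rt

def store_byte(rt: int, rn: int, uimm12: int = 0) -> int:
    """0011100100 imm12 Rn5 Rt5: *(Rn + uimm12) <= low byte of Rt."""
    assert 0 <= uimm12 < (1 << 12)
    return _mem(0b0011100100, 22, uimm12 << 10, rn, rt)

def load_byte(rt: int, rn: int, uimm12: int = 0) -> int:
    """0011100101 imm12 Rn5 Rt5: Rt <= byte at *(Rn + uimm12)."""
    assert 0 <= uimm12 < (1 << 12)
    return _mem(0b0011100101, 22, uimm12 << 10, rn, rt)

def store(rt: int, rn: int, imm9: int = 0) -> int:
    """10111000000 imm9 00 Rn5 Rt5: *(Rn + imm9) <= Rt."""
    assert -256 <= imm9 < 256
    return _mem(0b10111000000, 21, (imm9 & 0x1FF) << 12, rn, rt)

def load(rt: int, rn: int, imm9: int = 0) -> int:
    """11111000010 imm9 00 Rn5 Rt5: Rt <= *(Rn + imm9)."""
    assert -256 <= imm9 < 256
    return _mem(0b11111000010, 21, (imm9 & 0x1FF) << 12, rn, rt)

print(f"{store_byte(1, 0):08X}")  # 39000001, *(r0) <= low byte of r1
print(f"{load(1, 0, -8):08X}")    # F85F8001, r1 <= *(r0 - 8)
```

With the UART address in `r0` and an ASCII code in `r1`, `store_byte(1, 0)` is the kind of instruction that sends one character.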
This is how you'll jump around the source code, which allows us to implement functions, if statements, and more.
0 0 0 1 0 1 imm26
`imm` is a signed constant that specifies how many instructions forward/backwards the processor should jump.
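The signed offset is stored in two's complement, which a short Python sketch can make explicit. The name `jump` is made up for this example. Per the ARMv8 manual, the offset is counted from the jump instruction itself, so an offset of zero jumps to itself.

```python
def jump(offset: int) -> int:
    """Encode 000101 imm26; offset counts instructions, negative values jump backwards."""
    assert -(1 << 25) <= offset < (1 << 25)
    return (0b000101 << 26) | (offset & 0x3FFFFFF)  # mask gives 26-bit two's complement

print(f"{jump(2):08X}")   # 14000002, skip ahead two instructions
print(f"{jump(-1):08X}")  # 17FFFFFF, go back one instruction
```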
11111010010 Rm5 111100 Rn5 00000 Rn ? Rm
Compares Rn to Rm. Used before a conditional jump.
01010100 imm19 0 cond4
`imm` is a signed constant that specifies how many instructions forward/backwards the processor should jump.
cond | description |
---|---|
`0000` | Equal |
`0001` | Not Equal |
`1011` | Less Than |
`1101` | Less Than or Equal |
`1100` | Greater Than |
`1010` | Greater Than or Equal |
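A compare followed by a conditional jump can be sketched in Python as follows. The names `compare`, `jump_if`, and `COND` are invented for this example; the bit layouts and condition codes are the ones listed above.

```python
COND = {"eq": 0b0000, "ne": 0b0001, "lt": 0b1011,
        "le": 0b1101, "gt": 0b1100, "ge": 0b1010}

def compare(rn: int, rm: int) -> int:
    """Encode 11111010010 Rm5 111100 Rn5 00000: compare Rn to Rm."""
    assert 0 <= rn < 32 and 0 <= rm < 32
    return (0b11111010010 << 21) | (rm << 16) | (0b111100 << 10) | (rn << 5)

def jump_if(cond: str, offset: int) -> int:
    """Encode 01010100 imm19 0 cond4; offset counts instructions and may be negative."""
    assert -(1 << 18) <= offset < (1 << 18)
    return (0b01010100 << 24) | ((offset & 0x7FFFF) << 5) | COND[cond]

# "if r1 == r2, skip ahead three instructions"
program = [compare(1, 2), jump_if("eq", 3)]
print([f"{word:08X}" for word in program])  # ['FA42F020', '54000060']
```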
1101 0101 0000 0011 0010 0000 0101 1111
You'll want to loop this instruction.
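One way to loop it is to follow it with a jump back by one instruction. The Python sketch below builds that two-instruction program and packs it into bytes. The variable names are made up for this example. The ARMv8 manual lists the instruction above as WFE (wait for event), and aarch64 instruction words are stored little-endian, hence the `<` in the `struct` format.

```python
import struct

WAIT = 0b1101_0101_0000_0011_0010_0000_0101_1111   # the instruction above
JUMP_BACK_ONE = (0b000101 << 26) | (-1 & 0x3FFFFFF)  # 000101 imm26 with imm = -1

program = [WAIT, JUMP_BACK_ONE]
image = struct.pack("<2I", *program)  # two little-endian 32-bit words
print(image.hex())  # 5f2003d5ffffff17
```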
Is something confusing? Email us!
We'd love a chance to help out and improve our documentation.
Our addresses are listed on our GitHub accounts @aaronjanse and @rohantib.