Almost everything you'll need is in the ARMv8 Architecture Reference Manual and the Cortex-A72 Processor Technical Reference Manual. I highly recommend downloading a copy of each PDF. Some of their contents are reproduced below.

Our Languages

Terminology

These are the search terms you're looking for.

Term                    Definition
cortex-a72              the processor core used by the Raspberry Pi 4
broadcom bcm2711        the Raspberry Pi 4's system-on-chip, which contains the cortex-a72 cores; same as above, for our purposes
instruction encoding    the bit layout used when translating assembly into machine code

Peripherals

Many peripherals (external I/O devices) are accessible through special memory addresses. The Raspberry Pi 4's peripherals document has not yet been released, so for now we'll use the BCM2835 ARM Peripherals Manual, which was written for earlier Raspberry Pi models.

UART

External Documentation

Setup Procedure

This can be skipped on QEMU, but I recommend implementing the hardware setup procedure as promptly as possible.

For the following setup steps, use the GPIO address section of the BCM2835 ARM Peripherals Manual, replacing the 0x7E20 address prefix with 0xFE20 for the Raspberry Pi 4 (for example, 0x7E200000 becomes 0xFE200000). Also see the manual's UART address section.

  1. disable the UART by writing zero to the UART control register
  2. disable GPIO pin pull up/down by writing zero to the GPIO pull up/down register
  3. delay for 150 cycles (create a loop with a countdown)
  4. clock that setting into the UART pins (14 and 15) using GPIO pull up/down clock register 0
  5. delay for 150 cycles
  6. write zero to GPIO pull up/down clock register 0 to remove the clock (the manual's pull up/down procedure requires this second write)
  7. clear all pending interrupts using the UART interrupt clear register
  8. set the baud rate to 115200 given a 3 MHz clock (follow the PrimeCell UART Manual's baud rate calculation example)
  9. enable the FIFOs and 8-bit data transmission using the UART line control register
  10. mask all interrupts using the interrupt mask set/clear register
  11. enable the UART, transmit, and receive using the UART control register
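
The divisor values for step 8 can be worked out ahead of time. This is a minimal sketch of that calculation, assuming the divisor is split into an integer part and a 6-bit fractional part rounded to the nearest 1/64, as in the PrimeCell UART Manual's example. The function name is made up for illustration.

```python
def uart_baud_divisors(clock_hz, baud):
    """Return (integer_part, fractional_part) of the UART baud rate divisor."""
    divisor = clock_hz / (16 * baud)       # 3 MHz / (16 * 115200) = 1.6276...
    integer_part = int(divisor)            # value for the integer baud rate register
    # The fractional register holds 6 bits, so scale by 64 and round to nearest.
    fractional_part = int((divisor - integer_part) * 64 + 0.5)
    return integer_part, fractional_part

print(uart_baud_divisors(3_000_000, 115200))  # (1, 40)
```

With a 3 MHz clock, this gives 1 for the integer divisor and 40 for the fractional divisor.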

Writing

UART data can be sent by writing ASCII-encoded text, one byte at a time, to the UART data register at 0xFE201000.

On real hardware, you'll want to first wait until the UART flag register says that the transmit FIFO isn't full.

Reading

Memory Allocation

Efficiently allocating memory can be a beast to write in machine code, so we'll put that off until we have a higher-level language to work with. In the meantime, this is the gist of the memory scheme we'll be using on the Raspberry Pi 4:

Memory Address(es)      Usage
0x00000-0x80000         System stack
0x80000-0x300000        Machine code
0x300000-0x3b3fffff     0.99 GB of memory to be used by our machine code

Machine Code Overview

Encoding (pg 223)

See the machine code tutorial.

Registers (pg 75)

These are the fastest place to store data. Most machine code instructions involve registers and moving data to/from/between them.

Register      Description
MPIDR_EL1     This read-only identification register, among other things, provides a core identification number (how to access)
r0 to r30     General-purpose registers. Because we're writing our own assembly language, feel free to use these however you want
r31 or SP     Depending on the instruction, this is either the stack pointer or a register that always reads zero and discards data when written

Constants ("immediate" values)

Every aarch64 instruction is 32 bits wide, so a large constant cannot fit inside a single instruction. Instead, we build the constant with multiple instructions, such as a mov with a bit shift followed by one or more add instructions.
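
To make the splitting concrete, here is a rough sketch of how a constant below 2^32 could be divided into one mov (with a 16-bit shift) and two adds (one shifted by 12 bits, one unshifted). The function name and the three-way split are illustrative assumptions; other splits work just as well.

```python
def split_constant(value):
    """Split a 32-bit constant into pieces that each fit in one instruction."""
    assert 0 <= value < 1 << 32
    high16 = value >> 16             # mov: Rd <= high16 << 16
    middle4 = (value >> 12) & 0xF    # add: Rd <= Rd + (middle4 << 12)
    low12 = value & 0xFFF            # add: Rd <= Rd + low12
    return high16, middle4, low12

high16, middle4, low12 = split_constant(0xFE201000)
# The three pieces must add back up to the original constant.
assert (high16 << 16) + (middle4 << 12) + low12 == 0xFE201000
print(hex(high16), middle4, low12)   # 0xfe20 1 0
```

For the UART address 0xFE201000, the last piece is zero, so one mov and one add are enough.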

Machine Code Operations (pg 224)

Every operation that you'll need should be in this document. There are plenty more operations out there, but for the purposes of this book, we'll only learn the basics. This is a tradeoff of efficiency (using the minimal number of instructions) versus simplicity.

Term          Definition
immediate     constant
imm           signed immediate
uimm          unsigned immediate

Register Movement

This instruction family copies into a register either a constant or the value of another register.

From Constant (pg 226)

110100101 hw2 imm16 Rd5
  Rd <= imm << hw*16
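
As a sketch of how this layout turns into an actual 32-bit word, the fields can be packed with shifts and ORs. The function name is hypothetical; the field positions come straight from the bit pattern above.

```python
def encode_mov_constant(rd, imm16, hw=0):
    """Encode 'Rd <= imm16 << (hw * 16)' as a 32-bit instruction word."""
    assert 0 <= rd < 32 and 0 <= imm16 < 1 << 16 and 0 <= hw < 4
    # 9 fixed bits, then hw (2 bits), imm16 (16 bits), Rd (5 bits).
    return (0b110100101 << 23) | (hw << 21) | (imm16 << 5) | rd

print(hex(encode_mov_constant(rd=0, imm16=0x41)))  # 0xd2800820
```

The example loads the ASCII code for "A" (0x41) into r0.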

From Register (pg 723)

1001000100000000000000 Rn5 Rd5
  (Rd or SP) <= (Rn or SP)

From System Register (pg 802)

11010101001 SRn16 Rt5
  Rt <= SRn
System Register    SRn
MPIDR_EL1          1100 0000 0000 0101
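
A small sketch of packing this layout, using the MPIDR_EL1 value listed above. The function and constant names are made up for illustration.

```python
MPIDR_EL1 = 0b1100_0000_0000_0101  # SRn value for MPIDR_EL1

def encode_read_system_register(rt, srn):
    """Encode 'Rt <= SRn' as a 32-bit instruction word."""
    assert 0 <= rt < 32 and 0 <= srn < 1 << 16
    # 11 fixed bits, then SRn (16 bits), Rt (5 bits).
    return (0b11010101001 << 21) | (srn << 5) | rt

print(hex(encode_read_system_register(rt=0, srn=MPIDR_EL1)))  # 0xd53800a0
```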

Logical Operations (pg 270)

Using constants in logical aarch64 operations can be surprisingly complex, so we'll only use logical operations between registers.

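I won't explain all of these here, but know that or is also known as orr and xor is also known as eor.

1 opc2 010100 shift1 0 Rm5 uimm6 Rn5 Rd5
  shifted = shift ? (Rm >> uimm) : (Rm << uimm)
  Rd <= Rn # shifted
Here # stands for the operation selected by opc. Note that the shift is applied to Rm before the operation, not to the result.
opc    instruction
00     AND
01     OR
10     XOR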
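
The bit layout above can be packed as follows. This is only a sketch; the function and constant names are hypothetical, and the field positions are read directly off the pattern.

```python
AND, OR, XOR = 0b00, 0b01, 0b10  # opc values from the table

def encode_logical(opc, rd, rn, rm, shift=0, uimm=0):
    """Encode a register-based AND/OR/XOR as a 32-bit instruction word."""
    assert opc in (AND, OR, XOR) and shift in (0, 1) and 0 <= uimm < 64
    assert all(0 <= r < 32 for r in (rd, rn, rm))
    return ((1 << 31) | (opc << 29) | (0b010100 << 23) | (shift << 22)
            | (rm << 16) | (uimm << 10) | (rn << 5) | rd)

print(hex(encode_logical(AND, rd=0, rn=1, rm=2)))  # 0x8a020020
print(hex(encode_logical(XOR, rd=0, rn=1, rm=2)))  # 0xca020020
```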

Register-based

And, immediate

Or

Xor

Arithmetic Operations

Add (pg 533)

Add, immediate (pg 531)

100100010 shift1 uimm12 Rn5 Rd5
  Rd <= Rn + (uimm << (shift ? 12 : 0))
shift    If one, uimm is shifted 12 bits to the left
uimm     Unsigned constant integer
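
Here is a rough sketch of packing the add-immediate layout into a word. The function name is an assumption made for this example.

```python
def encode_add_immediate(rd, rn, uimm, shift=0):
    """Encode 'Rd <= Rn + (uimm << (shift ? 12 : 0))' as a 32-bit word."""
    assert 0 <= rd < 32 and 0 <= rn < 32
    assert 0 <= uimm < 1 << 12 and shift in (0, 1)
    # 9 fixed bits, then shift (1 bit), uimm (12 bits), Rn (5 bits), Rd (5 bits).
    return (0b100100010 << 23) | (shift << 22) | (uimm << 10) | (rn << 5) | rd

print(hex(encode_add_immediate(rd=0, rn=1, uimm=1)))           # 0x91000420
print(hex(encode_add_immediate(rd=1, rn=1, uimm=1, shift=1)))  # 0x91400421
```

The second call adds 0x1000 to r1, which is the kind of step used when building a large constant.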

Sub (pg 961)

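11001011 shift2 0 Rm5 uimm6 Rn5 Rd5
  Rd <= Rn - (Rm shifted by uimm bits)
shift    00 shifts Rm left, 01 shifts Rm right filling with zeros, 10 shifts Rm right copying the sign bit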

Sub, immediate

110100010 shift1 uimm12 Rn5 Rd5
  Rd <= Rn - (uimm << (shift ? 12 : 0))
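
A sketch of the sub-immediate encoding, together with a small model of the value that ends up in Rd. It assumes 64-bit registers that wrap around on underflow; both function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def encode_sub_immediate(rd, rn, uimm, shift=0):
    """Encode 'Rd <= Rn - (uimm << (shift ? 12 : 0))' as a 32-bit word."""
    assert 0 <= rd < 32 and 0 <= rn < 32
    assert 0 <= uimm < 1 << 12 and shift in (0, 1)
    return (0b110100010 << 23) | (shift << 22) | (uimm << 10) | (rn << 5) | rd

def model_sub_immediate(rn_value, uimm, shift=0):
    """Value left in Rd; the result wraps around at 64 bits."""
    return (rn_value - (uimm << (12 if shift else 0))) & MASK64

print(hex(encode_sub_immediate(rd=0, rn=0, uimm=1)))  # 0xd1000400
print(hex(model_sub_immediate(0, 1)))                 # 0xffffffffffffffff
```

Subtracting one from a register like this is the core of a countdown delay loop.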

Memory Operations

Rose Lowe cs2310 Slideshow

Store (pg ???)

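10111000000 imm9 00 Rn5 Rt5
  *((Rn or SP) + imm) <= Rt

This encoding stores only the lowest 32 bits of Rt. Changing the leading 10 to 11 stores all 64 bits, matching the Load encoding.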

Store Byte (pg 702)

0011100100 uimm12 Rn5 Rt5
  *(Rn + uimm) <= Rt (lowest byte only)

LOAD BYTE https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf#page=704&zoom=auto,-4,732

Load Byte (pg 702)

0011100101 uimm12 Rn5 Rt5
  Rt <= *(Rn + uimm)

Load (pg 769)

11111000010 imm9 00 Rn5 Rt5
  Rt <= *(Rn + imm)

Reads the value at the memory address `Rn + imm` and stores it into `Rt`.
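
The sketch below packs two of the layouts above: Load, whose 9-bit offset is signed and so needs two's complement truncation, and Store Byte, whose 12-bit offset is unsigned. The function names are made up for illustration.

```python
def encode_load(rt, rn, imm=0):
    """Encode 'Rt <= *(Rn + imm)' using the Load layout (signed 9-bit offset)."""
    assert 0 <= rt < 32 and 0 <= rn < 32 and -256 <= imm <= 255
    imm9 = imm & 0x1FF  # two's complement, truncated to 9 bits
    return (0b11111000010 << 21) | (imm9 << 12) | (rn << 5) | rt

def encode_store_byte(rt, rn, uimm=0):
    """Encode the Store Byte layout (unsigned 12-bit offset)."""
    assert 0 <= rt < 32 and 0 <= rn < 32 and 0 <= uimm < 1 << 12
    return (0b0011100100 << 22) | (uimm << 10) | (rn << 5) | rt

print(hex(encode_load(rt=0, rn=1, imm=-8)))  # 0xf85f8020
print(hex(encode_store_byte(rt=0, rn=1)))    # 0x39000020
```

The store-byte example writes the lowest byte of r0 to the address held in r1, which is what a UART write needs once r1 holds 0xFE201000.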

Branching (pg 228)

Branching is how you'll jump around the code, which lets us implement functions, if statements, and more.

Unconditional Jump (pg 233)

000101 imm26

imm is a signed constant that specifies how many instructions forward/backwards the processor should jump, counted from the jump instruction itself.
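
Because the offset is signed, negative jumps have to be truncated to 26 bits of two's complement. A minimal sketch, with a hypothetical function name:

```python
def encode_jump(offset):
    """Encode an unconditional jump; offset is counted in instructions."""
    assert -(1 << 25) <= offset < (1 << 25)
    # 6 fixed bits, then the offset as 26-bit two's complement.
    return (0b000101 << 26) | (offset & 0x3FFFFFF)

print(hex(encode_jump(0)))   # 0x14000000, jumps to itself
print(hex(encode_jump(-1)))  # 0x17ffffff, jumps back one instruction
```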

Compare (pg 594)

11111010010 Rm5 111100 Rn5 00000
  Rn ? Rm

Compares Rn to Rm and records the result in the condition flags. Used before a conditional jump.

Conditional Jump (pg 228)

01010100 imm19 0 cond4

imm is a signed constant that specifies how many instructions forward/backwards the processor should jump.

cond    description
0000    Equal
0001    Not Equal
1011    Less Than
1101    Less Than or Equal
1100    Greater Than
1010    Greater Than or Equal
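
As a sketch, the Compare and Conditional Jump layouts can be packed like this. The function and constant names are assumptions made for the example; the bit patterns and cond values are the ones listed above.

```python
EQ, NE, LT, LE, GT, GE = 0b0000, 0b0001, 0b1011, 0b1101, 0b1100, 0b1010

def encode_compare(rn, rm):
    """Encode the Compare layout for Rn and Rm."""
    assert 0 <= rn < 32 and 0 <= rm < 32
    return (0b11111010010 << 21) | (rm << 16) | (0b111100 << 10) | (rn << 5)

def encode_conditional_jump(cond, offset):
    """Encode a conditional jump; offset is counted in instructions."""
    assert cond in (EQ, NE, LT, LE, GT, GE)
    assert -(1 << 18) <= offset < (1 << 18)
    # The offset is stored as 19-bit two's complement.
    return (0b01010100 << 24) | ((offset & 0x7FFFF) << 5) | cond

print(hex(encode_compare(rn=1, rm=2)))       # 0xfa42f020
print(hex(encode_conditional_jump(EQ, 2)))   # 0x54000040
print(hex(encode_conditional_jump(NE, -1)))  # 0x54ffffe1
```

A compare followed by a conditional jump with a negative offset is the usual shape of a loop.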

Miscellaneous

Enter Sleep State (pg 1000)

1101 0101 0000 0011 0010 0000 0101 1111

You'll want to loop this instruction.
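
One way to loop it is to follow the sleep instruction with an unconditional jump back by one instruction. The sketch below builds those two words and packs them into bytes, assuming a raw binary image where each instruction word is stored least significant byte first. The variable names are hypothetical.

```python
import struct

SLEEP = 0b1101_0101_0000_0011_0010_0000_0101_1111  # the sleep instruction above
JUMP_BACK_ONE = (0b000101 << 26) | (-1 & 0x3FFFFFF)  # unconditional jump, offset -1

# Pack both words as little-endian 32-bit integers.
image = struct.pack("<2I", SLEEP, JUMP_BACK_ONE)
print(image.hex(" "))  # 5f 20 03 d5 ff ff ff 17
```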


Is something confusing? Email us!
We'd love a chance to help out and improve our documentation.
Our addresses are listed on our GitHub accounts @aaronjanse and @rohantib.