Timeline

2025-09-27

init


Reference documentation:

Assemblers for ARM64:

  • ARM’s official assembler
  • GNU AS assembler: aarch64-linux-gnu-as
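  • GCC invokes GNU as as its assembler. On AArch64, as accepts the standard ARM instruction syntax together with GNU-style directives.
    • AT&T syntax: derived from Bell Labs, where it was created for UNIX system development. The AT&T/Intel distinction applies to x86, not to ARM64.
    • ARM format: ARM’s official assembly syntax.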

Syntax

  • label: Any identifier ending with a colon is considered a label.
  • Comments:
    • // denotes a comment.
    • # at the beginning of a line denotes a whole-line comment.
  • Instructions, pseudo-instructions, and registers can all be uppercase or lowercase. GNU style defaults to lowercase.

Symbol

Represents the address where it resides; can also be used as a variable or function.

  • Global symbols can be declared with .global.
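  • Local symbols are used within a local scope. Their labels are numbers (for example 0-99), may be defined more than once, and are usually referenced from branch instructions such as b.
    • f: tells the assembler to search forward, so 1f refers to the next definition of label 1.
    • b: tells the assembler to search backward, so 1b refers to the most recent definition of label 1.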
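
As a minimal sketch of numeric local labels, the routine below uses both a forward (2f) and a backward (1b) reference. The function name count_down is hypothetical and chosen only for illustration.

```asm
    .text
    .global count_down          // hypothetical function name
count_down:
    cbz     x0, 2f              // x0 == 0: jump forward to the next "2:"
1:
    subs    x0, x0, #1          // decrement and set the condition flags
    b.ne    1b                  // not zero yet: jump back to the previous "1:"
2:
    ret
```

Because numeric labels can be redefined, the same 1: and 2: can be reused in the next routine without clashing.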

Alignment Directives

  • .align: Aligns the current location by inserting padding, either zero bytes or NOP instructions.
    • Tells the assembler that whatever follows .align must start at an address divisible by 2^n.
    • In ARM64, the first argument is the exponent n, so .align 3 aligns to 2^3 = 8 bytes.
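
A short sketch of the effect, assuming hypothetical labels tag, wide, and entry:

```asm
    .data
tag:
    .byte   1                       // occupies a single byte
    .align  3                       // pad up to the next 8-byte boundary (2^3)
wide:
    .quad   0x1122334455667788      // now starts at an address divisible by 8

    .text
    .align  2                       // 4-byte boundary (2^2), the A64 instruction size
entry:
    ret
```

If a byte count reads more naturally than an exponent, GNU as also provides .balign, so .balign 8 requests the same alignment as .align 3 here.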

Data Definition Directives

Integer and Floating-Point Directives Summary

| Directive | Data type / purpose | Size | Notes |
|---|---|---|---|
| .byte | 8-bit integer | 1 B | Often used for characters and control bits |
| .hword | 16-bit integer (half-word) | 2 B | Also called .short on some architectures |
| .int / .long | 32-bit integer | 4 B | .int is an alias, same effect |
| .quad | 64-bit integer (quad-word) | 8 B | Very common in AArch64 |
| .float | IEEE-754 single-precision float (32-bit) | 4 B | Equivalent to C float |

String Definition Directives

| Directive | Description |
|---|---|
| .ascii "str" | Inserts the string as-is without an automatic \0; suitable for non-C-style strings |
| .asciz "str" | Automatically appends a null character \0 at the end; suitable for C strings (recommended) |
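
The directives in the two tables above can be combined in a single data section. This is only an illustrative sketch; every label name is hypothetical.

```asm
    .data
flags:   .byte   0x01, 0x02          // two 8-bit values
port:    .hword  8080                // one 16-bit value
count:   .int    100000              // one 32-bit value
addr:    .quad   0x0000ffff00001000  // one 64-bit value
ratio:   .float  1.5                 // IEEE-754 single precision
raw:     .ascii  "ABC"               // 3 bytes, no terminator
name:    .asciz  "ABC"               // 4 bytes, trailing \0 appended
```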

.rept ... .endr: Repeat Block Definition

Syntax:

.rept <count>
<content>
.endr

Purpose: Repeat a block of assembly code or data definitions a specified number of times, useful for initializing arrays or padding space.

Example:

.rept 3
.long 0
.endr

Equivalent to:

.long 0
.long 0
.long 0

.equ / .set: Constant Definition (Assignment)

These two directives are completely equivalent, differing only in syntax style.

.equ:

.equ abcd, 0x45

Defines the symbol abcd with the constant value 0x45.

.set:

.set abcd, 0x45

Same effect — also defines abcd as 0x45.

Typical uses: defining register addresses, constant bit masks, etc.

Common usage example (combining .equ and .rept):

.equ LED_BASE, 0x3F200000

.section .data
.rept 4
.int LED_BASE
.endr

Emits the LED_BASE value four times as 32-bit words, 4 bytes each, for a total of 16 bytes.

Differences between .equ and C #define:

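| .equ / .set | #define |
|---|---|
| Evaluated by the assembler; the symbol can be reassigned later (.equiv rejects redefinition) | Text substitution at the preprocessing stage |
| The value is an evaluated expression (e.g. .equ val, 4+5) | Only substitutes text; nothing is evaluated |
| Invisible to the C preprocessor, so #ifdef cannot test it (assembler conditionals such as .ifdef can) | Can be used with #ifdef, etc. |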

| Directive | Purpose |
|---|---|
| .global | Define a global symbol |
| .include | Include another source file |
| .if .else .endif | Control structure for conditional assembly |

if Statement Directives

| Directive | Meaning |
|---|---|
| .ifdef symbol | Check if symbol is defined |
| .ifndef symbol | Check if symbol is not defined |
| .ifc str1,str2 | Check if strings str1 and str2 are equal |
| .ifeq expr | Check if expression expr equals 0 |
| .ifeqs str1,str2 | Equivalent to .ifc str1,str2 |
| .ifge expr | Check if expression expr is ≥ 0 |
| .ifle expr | Check if expression expr is ≤ 0 |
| .ifne expr | Check if expression expr is ≠ 0 |
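
A small sketch of conditional assembly using the directives listed above. The symbol DEBUG and the labels banner and level are hypothetical.

```asm
    .equ DEBUG, 1                   // hypothetical build switch

    .data
    .ifdef DEBUG                    // true because DEBUG was defined above
banner: .asciz "debug build"
    .else
banner: .asciz "release build"
    .endif

    .ifne DEBUG                     // true because DEBUG is not 0
level:  .byte 1
    .else
level:  .byte 0
    .endif
```

Only the branch whose condition holds is assembled, so each label is defined exactly once.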

  • .section indicates which section the following assembly will be linked into (e.g., code section, data section, etc.).
  • Each section starts at its .section directive and ends at the next section directive or at the end of the file.

.section name, "flags"

Flags can be added to indicate section attributes:

| Flag | Meaning |
|---|---|
| a | allocatable: The section needs to be loaded into memory at runtime. |
| d | GNU_MBIND section: A special binding section used by GNU. |
| e | excluded: The section will not be included in the executable or shared library. |
| w | writable: The section is writable. |
| x | executable: The section contains executable code. |
| M | mergeable: Can be merged with other sections having the same attributes (typically used for read-only strings). |
| S | string: The section contains null-terminated strings. |
| G | group: The section belongs to a section group (e.g., COMDAT). |
| T | thread-local-storage: The section is used for Thread Local Storage (TLS). |
| ? | unspecified group: The section belongs to the previous section’s group (if any). |

Example:

.section ".idmap.text","awx"
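
As a further sketch, the flags can be combined to describe custom code and data sections. The section names .mycode and .mydata and the labels helper and state are hypothetical.

```asm
    .section .mycode, "ax"      // allocatable + executable
helper:
    ret

    .section .mydata, "aw"      // allocatable + writable
state:
    .quad 0
```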

.pushsection <name>: Inserts the following code or data into the specified section while saving the current section state.

.popsection: Ends the previous push and restores the original section.

  • Used in pairs.
  • Only affects code between pushsection and popsection; other code is unaffected.
  • The rest of the code still belongs to the original section, such as .text or .data.

.text
.globl _start
_start:
nop // In the default .text section

.pushsection .mydata, "a"
.long 0x12345678 // Inserted into .mydata section
.popsection

nop // Back in .text section

_start and both nops belong to the .text section; .long 0x12345678 is inserted into the custom .mydata section.

Macros
  • .macro and .endm form a macro.
  • .macro is followed by the macro name, then the macro parameters.
  • Use parameters in the macro by prefixing with \.

.macro plus1 p, p1

Defines a macro named plus1 with two parameters p and p1.
Parameters are referenced in the macro with the \ prefix: \p for the first parameter, \p1 for the second.

  • Macro parameters can have default values when defined:

.macro reserve_str p1=0 p2

The first parameter p1 has a default value of 0. You can then call reserve_str a,b or reserve_str ,b to invoke this macro.
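
Putting these pieces together, a complete macro with a defaulted parameter might look like the sketch below. The macro name add_imm and its parameters are hypothetical.

```asm
    .macro add_imm dst, src, imm=1      // hypothetical macro; imm defaults to 1
    add     \dst, \src, #\imm
    .endm

    .text
    add_imm x0, x1                      // expands to: add x0, x1, #1
    add_imm x2, x3, 16                  // expands to: add x2, x3, #16
```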

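Potential issues when using parameters in macros

When a parameter reference is immediately followed by other text, the assembler may not be able to tell where the parameter name ends, so the expansion can fail or produce the wrong name.

Solutions:

  • Separate the parameter from the following text with a space, or switch to .altmacro mode and terminate the parameter name with &.
  • Use \() to mark the end of the parameter name when concatenating.

[Figure: Linux kernel example using \()]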
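
The following sketch shows \() used for concatenation. It is not the kernel's code; the macro def_handler and its parameters are hypothetical.

```asm
    .macro def_handler el, kind         // hypothetical macro and parameter names
    .global el\el\()_\kind
el\el\()_\kind:                         // \() ends the name "el" before the "_"
    ret
    .endm

    .text
    def_handler 1, sync                 // defines the label el1_sync
    def_handler 1, irq                  // defines the label el1_irq
```

Without \(), the assembler would read \el_ as a reference to a parameter named el_, which does not exist.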

ARM64-Specific Features

ARM64 Compilation Options
  • -EB: for big-endian CPUs; -EL: for little-endian CPUs.
  • -mabi: specifies ABI mode — ilp32 for ELF32, lp64 for ELF64; default is lp64.
  • -mcpu=processor+extension: specifies CPU model, e.g., cortex-a72.
  • -march=: specifies the supported architecture, e.g., armv8.2-a.
  • ARM64 supported extensions: see GNU Assembler as_v2.34 Chapter 9.1.2.

Special Characters
  • // denotes a comment.
  • # at the beginning of a line denotes a comment; elsewhere on a line it introduces an immediate value.
  • :lo12: selects the lower 12 bits of a symbol's address (its offset within a 4 KB page) and is typically paired with adrp:

adrp x0, foo
ldr x0, [x0, #:lo12:foo]

  • ldr pseudo-instruction: ldr Xd, =value loads a constant or an address from a literal pool.
  • .bss switches to the BSS section.
  • .dword/.xword define 64-bit data.
  • name .req register_name aliases a register:

foo .req w0
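
The ARM64-specific features listed above can be seen working together in one sketch. The names counter, bump_counter, ptr, and val are hypothetical, and .skip, .unreq, and .ltorg are additional GNU as directives not covered in these notes.

```asm
    .bss
    .align  3                               // 8-byte alignment for a 64-bit load
counter:                                    // hypothetical 64-bit variable
    .skip   8                               // reserve 8 zero-initialized bytes

    .text
    .global bump_counter                    // hypothetical function name
bump_counter:
    ptr .req x1                             // alias x1 as ptr
    val .req x2                             // alias x2 as val
    adrp    ptr, counter                    // ptr = 4 KB page base of counter
    ldr     val, [ptr, #:lo12:counter]      // add the offset within the page
    add     val, val, #1
    str     val, [ptr, #:lo12:counter]
    ldr     x0, =0x1122334455667788         // ldr pseudo-instruction: literal pool load
    .unreq  ptr                             // drop the aliases again
    .unreq  val
    ret
    .ltorg                                  // emit the literal pool here
```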