Timeline

2025-09-27

init


  • A linker is a program that combines one or more object files generated by a compiler or assembler, along with libraries, into an executable file.
  • GNU ld accepts linker scripts written in a superset of AT&T’s Link Editor Command Language syntax.

ld Command

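  • aarch64-linux-gnu-ld: the GNU ld cross-linker for AArch64 Linux targets.
  • Common parameters:
    • -T: specify the linker script.
    • -Map: write a link map file showing where sections and symbols were placed.
    • -o: name the output file (the final executable).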

A Simple Example

SECTIONS
{
    . = 0x10000;
    .text : { *(.text) }
    . = 0x8000000;
    .data : { *(.data) }
    .bss : { *(.bss) }
}

Basic Concepts

  • Input sections and output sections.
  • Each section has a name and size.
  • Section attributes:
    • loadable: the section contents are loaded into memory when the output file is run.
    • allocatable: the section has no contents of its own, but an area of memory is set aside for it at runtime (for example .bss, which is usually zeroed).
  • Section addresses:
    • VMA (Virtual Memory Address): virtual address, the runtime address.
    • LMA (Load Memory Address): load address.
    • Typically, the ROM address is the load address, while the RAM address is the VMA.

Linker Script Commands

  • ENTRY(symbol): sets the program entry point.

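  • The linker chooses the entry point from the first of these that applies:

    • The -e command-line parameter.
    • ENTRY(symbol) in the linker script.
    • A target-specific symbol such as start, if defined.
    • The address of the first byte of .text, if present.
    • Address 0.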
  • INCLUDE filename: includes the linker script filename at this point.

  • OUTPUT(filename): names the output file, equivalent to using -o filename on the command line.

  • OUTPUT_FORMAT(bfdname): selects the BFD format of the output file.

  • OUTPUT_ARCH(bfdarch): selects the machine architecture of the output file.

Symbol Assignment

  • Symbols can be assigned values just like in C.

  • . represents the location counter, indicating the current position.

Symbol References

  • High-level languages often need to reference symbols defined in the linker script.
  • In C, a definition with an initializer, such as int foo = 100;, does two things:
    • The compiler defines a symbol foo in the symbol table.
    • The compiler stores 100 in memory for that symbol.
  • Defining a variable in a linker script:
    • The linker only defines the symbol in the symbol table; it does not allocate memory to store the variable’s value.
  • Accessing a linker script-defined variable: you access the variable’s address, not its value.

  • We can set symbols at each section boundary to facilitate C code accessing the start and end addresses of each section.
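
As a rough illustration of reading such boundary symbols from C, the sketch below uses etext, edata, and end, which the default GNU ld script on a Linux host defines at the end of the text, initialized data, and bss regions (see end(3)). Those three names, the Linux host, and the helpers bytes_between and uninitialized_data_size are assumptions of this example; a custom linker script would export its own names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a Linux host linked with the default GNU ld script, which
 * defines these symbols (see end(3)). Only their addresses are meaningful. */
extern char etext, edata, end;

/* Hypothetical helper: distance in bytes between two boundary symbols. */
static size_t bytes_between(const char *start, const char *stop)
{
    assert((uintptr_t)start <= (uintptr_t)stop);
    return (size_t)((uintptr_t)stop - (uintptr_t)start);
}

/* Roughly the size of the bss region: from the end of initialized data
 * to the end of bss. Note &edata and &end: we take addresses, never values. */
static size_t uninitialized_data_size(void)
{
    return bytes_between(&edata, &end);
}
```

Reading edata or end as a value instead of taking its address would just fetch whatever byte happens to sit at that boundary, which is rarely what is wanted.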

[Figure: example from the Linux kernel]

SECTIONS Command

  • The SECTIONS command tells the linker how to map input sections to output sections, and how to lay out those output sections in memory.

  • Each output section is defined by an output section descriptor:

[Figure: output section descriptor, with an explanation of its fields]

LMA Load Address

  • Every section has a VMA (virtual address, runtime address) and an LMA (load address).
  • In output section descriptors, use AT to specify the LMA.
  • If LMA is not specified via AT, typically LMA = VMA.
  • Building a ROM-based image often requires setting different virtual and load addresses for output sections.

  • The data section’s load address differs from its link address (virtual address), so program initialization must copy the data section from the ROM load address to the SDRAM virtual address.
  • The load image of the data section starts at _etext, its runtime address starts at _data, and its size is _edata - _data. Startup code therefore copies _edata - _data bytes from _etext to _data.
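
A minimal sketch of that copy step, written as a plain function so it can be tried on a host machine. The function name copy_data_section and its parameters are hypothetical; on a real target the arguments would be the linker-defined symbols _data, _edata, and _etext.

```c
#include <assert.h>

/* Copy the initialized-data image from its load address (LMA, e.g. ROM)
 * to its runtime address (VMA, e.g. RAM), one byte at a time. */
static void copy_data_section(char *vma_start, const char *vma_end,
                              const char *lma_start)
{
    assert(vma_start <= vma_end);

    char *dst = vma_start;
    const char *src = lma_start;

    while (dst < vma_end)
        *dst++ = *src++;
}

/* On a target whose linker script defines _etext, _data and _edata,
 * startup code would call it roughly like this:
 *
 *     extern char _etext[], _data[], _edata[];
 *     copy_data_section(_data, _edata, _etext);
 */
```

The copy must run before any C code reads initialized globals, which is why it usually lives in the earliest startup code.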

Common Built-in Functions

ADDR(section)

Returns the VMA of the named section, which must already be defined in the script.

ALIGN(n)

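Returns the current location counter rounded up to the next n-byte boundary; an address that is already aligned is returned unchanged.
Note: n is a byte count, not an exponent. This differs from the assembler’s .align directive on ARM and AArch64, where .align n means 2^n bytes.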
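
The rounding can be sketched in C using the formula (. + n - 1) & ~(n - 1). The function name align_up is hypothetical, and the sketch assumes n is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of n, where n is a power of two.
 * Mirrors the formula (. + n - 1) & ~(n - 1). */
static uint64_t align_up(uint64_t addr, uint64_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);   /* n must be a power of two */
    return (addr + n - 1) & ~(n - 1);
}
```

For example, align_up(0x10001, 0x1000) gives 0x11000, while align_up(0x2000, 0x1000) stays at 0x2000.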

SIZEOF(section)

Returns the size, in bytes, of the named section.

MAX(exp1, exp2) / MIN(exp1, exp2)

Returns the maximum or minimum of two expressions.

Experiment 1: Printing Memory Layout of Each Section

[Figure: program output]

C definitions to obtain addresses defined in the linker script

  1. Linker-exported symbols are addresses, not variable values.

A linker script assignment such as:

_text = .;

defines an address label (a symbol with an address), not a variable. C has no syntax for “address labels,” so the only way to reference this address is to declare the symbol as some kind of object and use its address.

Declaring it as char[] essentially says:

“This is a memory region starting at _text; I care about its address, not its specific contents.”

  2. char[] is the smallest addressable memory unit, convenient for pointer arithmetic.

char is the smallest addressable unit in C (1 byte), so declaring the symbols as char[] makes address arithmetic count in bytes:

#include <stddef.h>  /* size_t */
extern char _text[], _etext[];
/* Inside a function: section length in bytes. */
size_t text_size = (size_t)(_etext - _text);

With int[] the same subtraction would count ints rather than bytes, and arithmetic on void * is not valid standard C.
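
To make the unit difference concrete, here is a small host-side sketch. The function names distance_in_bytes and distance_in_ints are hypothetical, and an ordinary array stands in for a linker-defined region.

```c
#include <assert.h>
#include <stddef.h>

/* Distance between two addresses measured in bytes (char elements). */
static ptrdiff_t distance_in_bytes(const void *start, const void *stop)
{
    assert(start != NULL && stop != NULL);
    return (const char *)stop - (const char *)start;
}

/* The same distance measured in int elements, as int[] symbols would do. */
static ptrdiff_t distance_in_ints(const int *start, const int *stop)
{
    assert(start != NULL && stop != NULL);
    return stop - start;
}
```

For a region of four ints, distance_in_bytes reports 4 * sizeof(int) while distance_in_ints reports 4, so only the char-based version gives a section length in bytes.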

  3. Difference between char[] and char*: linker symbols are “array addresses,” not pointer variables.

While you could write:

extern char *_text;

this declares _text as a pointer variable, not an address label.

extern char *_text; tells the compiler to load the pointer value stored at the address _text, so it would read the first bytes of the section and treat them as a pointer.
extern char _text[]; declares that the linker provides this address; using _text yields the address itself, and no extra symbol or variable is generated.

So the recommended approach is:

extern char _text[];
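
The difference between the two declarations can be imitated on a host without a custom linker script. In this sketch the function names as_array and as_pointer_variable are hypothetical, a local buffer stands in for the symbol, and it assumes a pointer has the same size as uintptr_t.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What `extern char sym[];` yields: the address of the symbol itself. */
static uintptr_t as_array(const char *symbol_address)
{
    return (uintptr_t)symbol_address;
}

/* What `extern char *sym;` would yield: the bytes stored at that address,
 * reinterpreted as a pointer value. */
static uintptr_t as_pointer_variable(const char *symbol_address)
{
    uintptr_t stored;

    assert(symbol_address != NULL);
    memcpy(&stored, symbol_address, sizeof stored);
    return stored;
}
```

If the region starts with machine code or data, as_pointer_variable returns those bytes as if they were an address, which is why the pointer declaration goes wrong for linker symbols.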

Experiment 2: Load Address ≠ Runtime Address

[Figure: linker script]

The startup code must copy the code from its load address to its runtime address before executing it.

Experiment 3: Analyzing the Linux 5.0 Kernel Linker Script

[Figure: overall framework of vmlinux.lds.S]

[Figure: linker file contents]

Link Address

Definition:

The address the linker assigns to each section (such as .text, .data, .bss) when generating an executable file (such as an ELF file).

Characteristics:

  • Fixed by the linker at link time.
  • Can be explicitly set via a linker script, e.g., . = 0x80000;.
  • These addresses are recorded in the executable’s section headers or program headers.

Example:

.text 0x80000 : { *(.text) }

means the .text section’s link address is 0x80000.

Load Address

Definition:

The location in memory where the executable file’s contents are loaded, that is, where the OS or bootloader places the file in memory.

Characteristics:

  • Usually equals the link address, but can differ in certain cases (such as dynamic linking or load address relocation).
  • Determined by the OS or bootloader; the load addresses recorded in the file can also be changed with tools like objcopy.

Example:

  • Your ELF file’s .text section link address is 0x80000, but the bootloader loads it at 0x100000. Then:
    • Link address ≠ load address.
    • If no relocation is performed, code that relies on absolute addresses will fail at runtime, typically by crashing.
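
The mismatch in this example is a fixed offset, which a short sketch can make explicit. The function name loaded_address is hypothetical; it only does the arithmetic and is not a relocation routine.

```c
#include <assert.h>
#include <stdint.h>

/* Where an address computed at link time actually lives after the image
 * is loaded at load_base instead of link_base. */
static uint64_t loaded_address(uint64_t link_addr, uint64_t link_base,
                               uint64_t load_base)
{
    assert(link_addr >= link_base);
    return load_base + (link_addr - link_base);
}
```

With a link base of 0x80000 and a load base of 0x100000, an absolute reference to 0x80010 should really point at 0x100010; unrelocated code still uses 0x80010 and reads the wrong memory.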

Runtime/Execution Address

Definition:

The actual memory address accessed by the CPU during program execution.

Characteristics:

  • Typically equals the load address (wherever the program is loaded, it executes from there).
  • If MMU (Memory Management Unit) is enabled, the runtime address is a virtual address mapped by the MMU to the physical load address.
  • In simple bare-metal programs that run from where they are loaded, generally link address = load address = runtime address.