Practical Binary Analysis, Ch. 1: Anatomy of a Binary

12 July 2024 · binary-analysis · ELF · reverse-engineering

Notes on Chapter 1 of Practical Binary Analysis by Dennis Andriesse, plus my solutions to the exercises.

gcc pipeline

# ── preprocessing ────────────────────────────────────────────────
gcc -E -P <file>            # -E  stop after cpp
                            # -P  omit line markers (# directives)

# ── compilation ──────────────────────────────────────────────────
gcc -S -masm=intel <file>   # -S         emit .s, stop before assembly
                            # -masm=intel  Intel syntax  (≠ AT&T default)

# ── assembly ─────────────────────────────────────────────────────
gcc -c <file>               # compile + assemble → relocatable .o, no link

# ── full pipeline ────────────────────────────────────────────────
gcc <file> -o binary        # dynamically linked executable (default)
gcc <file> -o binary -static  # statically linked, all deps embedded

inspection

file <file>                  # ELF type + arch: relocatable / executable / shared object
readelf --syms <file>        # symbol table  (long form of -s)
readelf -r <file>            # relocation entries: what the linker must patch and how
objdump -M intel -d <file>   # disassemble (Intel syntax)
objdump -sj .rodata <file>   # hex+ascii dump of .rodata section
strip --strip-all <file>     # remove .symtab/.strtab  (runtime keeps .dynsym)

Pipeline overview

Preprocessing

cpp expands #include, #define, and conditionals into a single translation unit of pure C, with none of those directives left in the output (only #pragma lines pass through). Token pasting (##), stringification (#), and predefined macros (__FILE__, __LINE__, and the GCC extension __COUNTER__) are resolved here, before any type-checking or code generation occurs.
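To make the preprocessing step concrete, here is a minimal sketch of stringification and token pasting. The macro names (STR, PASTE), the VERSION constant, and the helper functions are hypothetical and exist only for this illustration; running gcc -E -P on the file should show the expanded identifiers and string literals directly.

```c
#include <assert.h>
#include <string.h>

/* Two-level helpers: the outer macro lets cpp expand its arguments
 * before the inner one applies # (stringify) or ## (paste). */
#define STR_RAW(x)      #x
#define STR(x)          STR_RAW(x)
#define PASTE_RAW(a, b) a##b
#define PASTE(a, b)     PASTE_RAW(a, b)

#define VERSION 3   /* hypothetical constant, for illustration only */

/* After preprocessing this is simply: int handler_3(void) { return 42; } */
int PASTE(handler_, VERSION)(void) { return 42; }

/* # applied directly sees the spelling "VERSION"; via STR it sees "3". */
const char *raw_name(void)      { return STR_RAW(VERSION); }
const char *expanded_name(void) { return STR(VERSION); }

/* __LINE__ is replaced by an integer literal before the compiler proper runs. */
int line_of_this_function(void) { return __LINE__; }

/* Sanity checks that hold only because cpp already rewrote the source. */
void check_preprocessor_results(void)
{
    assert(handler_3() == 42);
    assert(strcmp(raw_name(), "VERSION") == 0);
    assert(strcmp(expanded_name(), "3") == 0);
    assert(line_of_this_function() > 0);
}
```

The compiler never sees PASTE or STR; by the time type-checking starts, only handler_3 and two string literals remain.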

Compilation

GCC lowers the translation unit through its internal IR (GIMPLE → RTL) to architecture-specific assembly. The optimization level is the critical variable for reverse engineering. -O0 emits a nearly 1:1 mapping to the C source, with every variable on the stack and every expression evaluated by its own short instruction sequence. From -O2 up, the compiler inlines aggressively, eliminates dead branches, may unroll or vectorize loops (mostly at -O3), and can merge, clone, or reorder functions, breaking the structural correspondence you’d expect. Symbolic labels for the functions that survive are still present in the .s output regardless of level.

Assembly

as translates mnemonics to opcodes and produces a relocatable ELF object (.o). External references have no resolved address yet; they are recorded as relocation entries in .rela.text (RELA on x86-64: offset + type + symbol + addend), pointing the linker at every site that needs patching. file reports the output as ELF 64-bit LSB relocatable; a shared library built with -fPIC -shared comes out as shared object, position-independent with no fixed load address.
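As a rough sketch of what "patching" means for a PC-relative call, the linker computes S + A - P (symbol address, plus addend, minus the address of the field being patched) and writes the 32-bit result into the instruction. The function names below are hypothetical, and the sketch assumes a little-endian host and the common addend of -4 that GCC emits for a call operand; the worked numbers come from the call at 0x11d4 to show (0x1191) in the a.out listing of exercise 1.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value stored for a PC-relative 32-bit relocation: S + A - P, where
 * S = symbol address, A = addend, P = address of the patched field. */
int32_t pc32_value(uint64_t sym_addr, int64_t addend, uint64_t place)
{
    int64_t v = (int64_t)sym_addr + addend - (int64_t)place;
    assert(v >= INT32_MIN && v <= INT32_MAX);   /* must fit in 32 bits */
    return (int32_t)v;
}

/* Patch the 4-byte field at `offset` inside a copy of .text.
 * `text_base` is the virtual address assigned to text[0]. */
void apply_pc32(uint8_t *text, uint64_t text_base, uint64_t offset,
                uint64_t sym_addr, int64_t addend)
{
    assert(text != NULL);
    int32_t v = pc32_value(sym_addr, addend, text_base + offset);
    memcpy(text + offset, &v, sizeof v);        /* little-endian host assumed */
}
```

Calling apply_pc32 on the bytes e8 00 00 00 00 placed at 0x11d4, with offset 1, symbol address 0x1191, and addend -4, yields e8 b8 ff ff ff, the same encoding objdump shows for call 1191 <show>.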

Key sections in a .o:

.text: executable code
.data: initialized globals; present in file, mapped RW
.bss: zero-initialized globals; no file space, kernel zero-fills on load
.rodata: string literals and const arrays; mapped RO, inspect with objdump -sj .rodata
.symtab / .strtab: full symbol table; stripped in release builds
.dynsym: export/import table for the dynamic linker; appears only in linked executables and shared objects (not in a .o) and survives strip --strip-all

Linking

ld (invoked by gcc) consumes all .o files and library inputs, resolves every relocation entry, assigns final virtual addresses, and emits the executable ELF.

Static (-static) merges all .text / .data / … sections from .o files and .a archives, resolving all relocations to absolute addresses. The result is fully self-contained: no PT_INTERP, no dynamic PLT/GOT resolution, and ordinary calls are direct. The binary is larger (it carries the parts of libc it uses), but there are no runtime dependency surprises; preferred for forensics and sandboxed environments.

Dynamic (default) leaves external symbols as UNDEF in .dynsym. A PT_INTERP segment names the runtime linker (/lib64/ld-linux-x86-64.so.2), which maps the needed .so files and resolves symbols at load time, or lazily on first call via PLT/GOT. Each external call goes through a PLT stub; on first invocation the stub falls through to the resolver, which patches the GOT entry with the real address, and subsequent calls jump through the GOT directly.
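The lazy-binding mechanism can be modelled in plain C with a function pointer standing in for one GOT slot. This is a simulation under simplifying assumptions, not how ld.so is implemented: all names (got_slot, lazy_stub, real_double) are hypothetical, and the "symbol lookup" is a hard-coded assignment.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

static int resolver_runs = 0;            /* counts trips through the slow path */

/* Stands in for the real library function the symbol resolves to. */
static int real_double(int x) { return 2 * x; }

static int lazy_stub(int x);             /* forward declaration */

/* One simulated GOT slot: initially points back at the slow path. */
static func_t got_slot = lazy_stub;

/* Slow path: "resolve" the symbol, patch the slot, then finish the call. */
static int lazy_stub(int x)
{
    resolver_runs++;
    got_slot = real_double;              /* the patch: later calls skip this function */
    return got_slot(x);
}

/* Simulated PLT entry: always an indirect call through the GOT slot. */
int call_via_plt(int x)
{
    assert(got_slot != NULL);
    return got_slot(x);
}

int resolver_run_count(void) { return resolver_runs; }
```

The first call_via_plt pays for the resolver once; every later call goes straight to real_double, so resolver_run_count() stays at 1.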

LTO (-flto) defers optimization to link time. GCC stores GIMPLE IR in the object files (by default instead of machine code; -ffat-lto-objects keeps both). At link time ld hands that IR back to the compiler through the LTO plugin, which runs a second optimization pass across all translation units at once, enabling cross-TU inlining and dead function elimination. From an RE perspective, functions in LTO-built binaries no longer group neatly by source file, and aggressively inlined code makes CFG reconstruction significantly harder.

linking / runtime

ldd ./binary                          # runtime shared lib dependencies
readelf -d ./binary | grep NEEDED     # required .so from ELF dynamic section
readelf -W --syms ./binary | grep UNDEF  # symbols resolved at runtime by ld.so
objdump -M intel -d ./binary          # disassemble: external calls appear as FUNC@plt
strip --strip-all ./binary            # remove .symtab/.strtab  (keeps .dynsym)

After strip --strip-all, function names and local symbols are gone; only the exports and imports the dynamic linker depends on remain. CFG recovery falls back to heuristics: function prologue patterns, cross-reference analysis, and section boundary detection.

Debug info

ELF + DWARF. Debug info is embedded directly as dedicated sections (.debug_info, .debug_line, .debug_abbrev, …). strip --strip-debug removes these without touching the symbol table; strip --strip-all removes both. Shipped ELF binaries are typically fully stripped.

PE + PDB. Windows uses a separate PDB (Program Database) file linked to the binary via a GUID in IMAGE_DEBUG_DIRECTORY. The binary carries almost no debug info itself, and the PDB is rarely shipped, making it absent for most RE work unless indexed by a symbol server (symsrv).

ELF loading

Linking produces a file on disk. Loading turns it into a live process, a choreography across three actors: the kernel (parses ELF, creates the initial address space), the dynamic linker (ld.so, at /lib64/ld-linux-x86-64.so.2, a position-independent shared library that bootstraps the runtime before any user code runs), and libc, the standard C library (libc.so.6) providing malloc, printf, exit, and the startup/shutdown scaffolding every C program depends on.

fork + execve. The shell calls fork() to clone itself, then execve(path, argv, envp) in the child. The kernel’s do_execve handler tears down the existing address space and begins constructing a new one from the binary on disk.

Kernel reads ELF header. The 64-byte ELF header supplies everything needed: magic bytes (\x7fELF), e_type (ET_EXEC for a fixed-address executable, ET_DYN for a PIE), e_machine (EM_X86_64), and e_phoff, the file offset to the program header table listing all PT_* entries.
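The header check described above can be sketched by reading the fields at their fixed ELF64 offsets. The struct and function names are hypothetical; the sketch only handles little-endian ELF64 and parses bytes by hand instead of using <elf.h>, so it stays portable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Numeric values of ET_EXEC, ET_DYN, and EM_X86_64 from the ELF specification. */
enum { ELF_ET_EXEC = 2, ELF_ET_DYN = 3, ELF_EM_X86_64 = 62 };

struct elf_summary {
    uint16_t type;      /* e_type    */
    uint16_t machine;   /* e_machine */
    uint64_t entry;     /* e_entry   */
    uint64_t phoff;     /* e_phoff   */
    uint16_t phnum;     /* e_phnum   */
};

/* Read an n-byte little-endian integer. */
static uint64_t read_le(const uint8_t *p, size_t n)
{
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Returns 0 on success, -1 if buf is not a little-endian ELF64 header. */
int parse_elf64_header(const uint8_t *buf, size_t len, struct elf_summary *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 64) return -1;                           /* ELF64 header is 64 bytes */
    if (memcmp(buf, "\x7f" "ELF", 4) != 0) return -1;  /* magic bytes */
    if (buf[4] != 2 || buf[5] != 1) return -1;         /* ELFCLASS64, ELFDATA2LSB */

    out->type    = (uint16_t)read_le(buf + 16, 2);
    out->machine = (uint16_t)read_le(buf + 18, 2);
    out->entry   = read_le(buf + 24, 8);
    out->phoff   = read_le(buf + 32, 8);
    out->phnum   = (uint16_t)read_le(buf + 56, 2);
    return 0;
}
```

Feeding it the first 64 bytes of a PIE should report type ELF_ET_DYN and a phoff of 64, since the program header table normally follows the header directly.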

Kernel maps PT_LOAD segments. For each PT_LOAD entry the kernel issues an mmap(): the file range is mapped at the specified virtual address with permissions matching the segment flags (R-X for code, RW- for data). It is demand-paged, so there is no physical I/O until the first access triggers a page fault.
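A sketch of the arithmetic behind one such mapping: the start address and file offset are rounded down to a page boundary, segment flags become mmap protection bits, and any excess of MemSiz over FileSiz is zero-filled. The struct and function names are hypothetical, and the real kernel loader handles more cases (load bias for PIE, partial-page zeroing) than this shows.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>

/* Numeric values of PF_X, PF_W, PF_R from the ELF specification. */
enum { SEG_X = 1, SEG_W = 2, SEG_R = 4 };

struct map_plan {
    uint64_t map_addr;    /* page-aligned start address */
    uint64_t map_offset;  /* page-aligned file offset */
    uint64_t file_bytes;  /* file-backed bytes, counted from map_addr */
    uint64_t zero_bytes;  /* MemSiz - FileSiz: zero-filled tail */
    int      prot;        /* PROT_* bits for mmap() */
};

/* Translate ELF segment flags into mmap protection bits. */
int flags_to_prot(uint32_t p_flags)
{
    int prot = PROT_NONE;
    if (p_flags & SEG_R) prot |= PROT_READ;
    if (p_flags & SEG_W) prot |= PROT_WRITE;
    if (p_flags & SEG_X) prot |= PROT_EXEC;
    return prot;
}

struct map_plan plan_load_segment(uint64_t p_offset, uint64_t p_vaddr,
                                  uint64_t p_filesz, uint64_t p_memsz,
                                  uint32_t p_flags, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(p_memsz >= p_filesz);
    /* ELF requires offset and address to be congruent modulo the page size. */
    assert((p_offset % page_size) == (p_vaddr % page_size));

    uint64_t slack = p_vaddr & (page_size - 1);   /* distance into the first page */
    struct map_plan plan;
    plan.map_addr   = p_vaddr - slack;
    plan.map_offset = p_offset - slack;
    plan.file_bytes = p_filesz + slack;
    plan.zero_bytes = p_memsz - p_filesz;
    plan.prot       = flags_to_prot(p_flags);
    return plan;
}
```

For a data segment at offset 0x2db0 and address 0x3db0 with FileSiz 0x270 and MemSiz 0x278, the plan maps from file offset 0x2000 to address 0x3000 as read-write and leaves 8 bytes to be zero-filled.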

Kernel maps ld.so from PT_INTERP. If a PT_INTERP segment is present (every dynamically linked binary has one), the kernel reads the interpreter path it contains (/lib64/ld-linux-x86-64.so.2), opens that file, and maps the interpreter’s PT_LOAD segments at a fresh ASLR base address. ld.so is itself an ET_DYN object, so it runs fully position-independent and must self-relocate before it can call any of its own functions.

Kernel prepares the initial stack. Before handing over control, the kernel writes the initial stack frame: argc, argv[] and envp[] pointer arrays (each null-terminated), then the auxiliary vector, a sequence of AT_* key-value pairs that pass kernel-internal state to userspace without a syscall:

AT_PHDR / AT_PHNUM: address and count of the main executable’s program headers; lets ld.so locate PT_DYNAMIC
AT_ENTRY: the main executable’s entry point (_start); where ld.so will jump at the very end
AT_BASE: ld.so’s own load base; needed for its self-relocation pass
AT_RANDOM: address of 16 kernel-generated random bytes; glibc derives the stack canary and its pointer guard from them

auxiliary vector: kernel to userspace channel

LD_SHOW_AUXV=1 ./a.out 2>/dev/null | grep -E "AT_PHDR|AT_PHNUM|AT_ENTRY|AT_BASE|AT_RANDOM|AT_PAGESZ"

AT_PAGESZ:   4096                  # memory page size (arch constant, no syscall needed)
AT_PHDR:     0x555555554040        # main executable program header table
AT_PHNUM:    13                    # number of PT_* entries in that table
AT_BASE:     0x7ffff7fc5000        # ld.so load base (ASLR randomised each exec)
AT_ENTRY:    0x555555555050        # _start, ld.so jumps here after relocation
AT_RANDOM:   0x7fffffffde39        # address of 16 random bytes (stack canary seed)
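The same values can be read from inside the process. A small sketch, assuming Linux with a libc that provides getauxval() in <sys/auxv.h> (glibc 2.16 or later, also musl); the wrapper function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/auxv.h>   /* getauxval() and the AT_* keys (Linux-specific) */
#include <unistd.h>

/* Page size as the kernel passed it in the auxiliary vector. */
unsigned long auxv_page_size(void)
{
    return getauxval(AT_PAGESZ);
}

/* Entry point of the main executable (_start), as seen by ld.so. */
unsigned long auxv_entry(void)
{
    return getauxval(AT_ENTRY);
}

/* AT_RANDOM holds a pointer to the 16 random bytes, not the bytes themselves. */
const uint8_t *auxv_random_bytes(void)
{
    const uint8_t *p = (const uint8_t *)(uintptr_t)getauxval(AT_RANDOM);
    assert(p != NULL);
    return p;
}
```

auxv_page_size() should agree with sysconf(_SC_PAGESIZE), which is one quick way to confirm the vector is being read correctly.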

Dynamic linker maps shared libraries. ld.so reads the main executable’s PT_DYNAMIC segment and walks its DT_NEEDED entries, each of which names a required shared object (libc.so.6, libm.so.6, …). It resolves each name by searching, in order, DT_RPATH (only when no DT_RUNPATH is present), LD_LIBRARY_PATH, DT_RUNPATH, /etc/ld.so.cache, and the default system library directories, then maps each library’s PT_LOAD segments at ASLR-randomised addresses. The process is recursive: each newly loaded library may declare its own DT_NEEDED entries.

Dynamic linker applies relocations. With all libraries resident, ld.so processes the relocation tables (.rela.dyn / .rela.plt): for each entry, the target symbol is resolved across all loaded objects’ .dynsym tables and the result is written into the target slot, whether a GOT entry or an absolute pointer in .data. After this pass, all eagerly-bound symbols are fully resolved; PLT entries are primed for lazy resolution on first call.

Constructors, .init_array. Functions listed in .init_array run in dependency order: ld.so calls the shared libraries’ constructors first (TLS setup, libc internal init), and the executable’s own entries run afterwards, invoked by libc’s startup code just before main. C functions annotated __attribute__((constructor)) land here, and they run before main with no explicit call in the source.
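A minimal sketch of constructors in action, assuming GCC or Clang on an ELF target (the attribute is a compiler extension, not standard C). The function and variable names are hypothetical; the priorities 101 and 102 are used because GCC reserves 0 to 100 for the implementation, and lower numbers run first.

```c
#include <assert.h>

static int init_order[2];   /* records which constructor ran in which position */
static int init_count = 0;

/* Both functions are placed in .init_array and run before main(). */
__attribute__((constructor(101))) void early_init(void)
{
    init_order[init_count++] = 1;
}

__attribute__((constructor(102))) void late_init(void)
{
    init_order[init_count++] = 2;
}

/* Number of constructors that have run so far. */
int constructors_run(void) { return init_count; }

/* Identifier (1 or 2) of the n-th constructor that ran. */
int nth_constructor(int n)
{
    assert(n >= 0 && n < init_count);
    return init_order[n];
}
```

By the time main starts, constructors_run() already returns 2 even though nothing in the source calls early_init or late_init; in a disassembly they show up only as pointers in .init_array.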

Dynamic linker jumps to _start. ld.so reads AT_ENTRY from the auxiliary vector and transfers control to the main executable’s entry point. This is _start, which comes from crt1.o (Scrt1.o for PIEs), a startup object the gcc driver adds to every link; it never appears in the C source.

_start → __libc_start_main. _start zeroes rbp (marking the outermost stack frame for unwinders), extracts argc, argv, and envp from the initial stack layout, and calls __libc_start_main(main, argc, argv, …). libc is already fully mapped and relocated, so the call goes through the GOT like any other external symbol.

libc calls main. __libc_start_main registers the dynamic linker’s finalizer as an exit handler, runs any remaining init callbacks, then calls main(argc, argv, envp). When main returns, its return value is passed to exit(), which fires atexit callbacks, flushes stdio buffers, and issues the exit_group(status) syscall, terminating every thread in the process.

Exercises

1. Locating functions

Write a C program that contains several functions and compile it into an assembly file, an object file, and an executable binary, respectively. Try to locate the functions you wrote in the assembly file and in the disassembled object file and executable. Can you see the correspondence between the C code and the assembly code? Finally, strip the executable and try to identify the functions again.

locatingfunctions.c

#include <stdlib.h>
#include <stdio.h>

float multiply(float a, float b) { return a *= b; }
float divide(float a, float b)   { return a /= b; }

void show(char* str) { printf("%s\n", str); }

int main(int argc, char* argv[]) {
    printf("%s\n", "hello!");
    show("World");
    float c = multiply(divide(4.0, 2.0), 5.0);
    printf("%f\n", c);
    return 0;
}

compile at each stage

gcc -S -masm=intel locatingfunctions.c   # → .s   labels are plain text in the source
gcc -c locatingfunctions.c               # → .o   external calls show as 0x0 + reloc entry
gcc locatingfunctions.c -o a.out         # → executable, symbols intact
strip --strip-all a.out -o a2.out        # → stripped copy

In the .s file, function names appear as plain assembly labels (multiply:, divide:, …), still ordinary text. In the .o, they are symbol table entries; calls to external functions have a zeroed displacement (e8 00 00 00 00) with a relocation entry pointing the linker at the call site. In the linked binary, all relocations are resolved to final virtual addresses.

With symbols (a.out):

objdump -M intel -d a.out: user functions

0000000000001149 <multiply>:
    1149:  55                       push   rbp
    114a:  48 89 e5                 mov    rbp,rsp
    114d:  f3 0f 11 45 fc           movss  DWORD PTR [rbp-0x4],xmm0   ; spill a  (-O0)
    1152:  f3 0f 11 4d f8           movss  DWORD PTR [rbp-0x8],xmm1   ; spill b
    1157:  f3 0f 10 45 fc           movss  xmm0,DWORD PTR [rbp-0x4]   ; reload a
    115c:  f3 0f 59 45 f8           mulss  xmm0,DWORD PTR [rbp-0x8]   ; xmm0 = a * b
    1161:  f3 0f 11 45 fc           movss  DWORD PTR [rbp-0x4],xmm0   ; store result
    1166:  f3 0f 10 45 fc           movss  xmm0,DWORD PTR [rbp-0x4]   ; reload for return
    116b:  5d                       pop    rbp
    116c:  c3                       ret

000000000000116d <divide>:          ; identical structure, divss instead of mulss
    ...
    1180:  f3 0f 5e 45 f8           divss  xmm0,DWORD PTR [rbp-0x8]
    ...
    1190:  c3                       ret

0000000000001191 <show>:
    ...
    11a4:  e8 87 fe ff ff           call   1030 <puts@plt>             ; printf("%s\n") → puts
    11ab:  c3                       ret

00000000000011ac <main>:
    ...
    11c5:  e8 66 fe ff ff           call   1030 <puts@plt>             ; printf("%s\n","hello!") → puts
    11d4:  e8 b8 ff ff ff           call   1191 <show>
    11eb:  e8 7d ff ff ff           call   116d <divide>
    1200:  e8 44 ff ff ff           call   1149 <multiply>
    122e:  e8 0d fe ff ff           call   1040 <printf@plt>           ; %f needs real printf

gcc -O3: what actually changes

; multiply / divide: standalone stubs, 2 instructions, no frame, no spills
0000000000001190 <multiply>:
    1190:  f3 0f 59 c1   mulss  xmm0,xmm1
    1194:  c3            ret

00000000000011a0 <divide>:
    11a0:  f3 0f 5e c1   divss  xmm0,xmm1
    11a4:  c3            ret

; show: tail call, the function body IS the jump to puts
00000000000011b0 <show>:
    11b0:  e9 7b fe ff ff   jmp   1030 <puts@plt>

; main: multiply / divide / show all inlined, arithmetic constant-folded to a single value
0000000000001050 <main>:
    1050:  sub    rsp,0x8
    1054:  lea    rdi,[rip+0xfa9]               ; "hello!"
    105b:  call   1030 <puts@plt>               ; printf("hello!\n") → puts
    1060:  lea    rdi,[rip+0xfa4]               ; "World"
    1067:  call   1030 <puts@plt>               ; show("World") inlined → direct puts
    106c:  movsd  xmm0,QWORD PTR [rip+0xfa4]   ; multiply(divide(4.0,2.0),5.0) = 10.0 precomputed
    1074:  mov    eax,0x1                        ; 1 xmm arg for printf varargs
    1079:  lea    rdi,[rip+0xf91]               ; "%f\n"
    1080:  call   1040 <printf@plt>
    1085:  xor    eax,eax                       ; return 0
    1087:  add    rsp,0x8
    108b:  ret

PLT stub: puts@plt (lazy binding trampoline)

0000000000001030 <puts@plt>:
    1030:  ff 25 ca 2f 00 00   jmp    QWORD PTR [rip+0x2fca]   ; jump through GOT slot
    1036:  68 00 00 00 00      push   0x0                       ; reloc index for resolver
    103b:  e9 e0 ff ff ff      jmp    1020 <puts@plt-0x10>     ; → ld.so resolver

; First call: GOT slot → resolver → patches GOT with real puts address
; All subsequent calls: GOT slot → libc puts directly (no resolver overhead)

Stripped (a2.out):

After strip --strip-all, objdump shows the entire .text section as one anonymous blob. There are no function labels; only PLT entries (reconstructed from .dynsym and the PLT relocations) and ELF section names survive. The call targets themselves are unchanged, but objdump now annotates them as offsets from the nearest surviving symbol:

stripped: same opcodes, annotations lost

; ── non-stripped ────────────────────────────────────
0000000000001149 <multiply>:         ; function label present

    11d4:  e8 b8 ff ff ff   call   1191 <show>
    11eb:  e8 7d ff ff ff   call   116d <divide>
    1200:  e8 44 ff ff ff   call   1149 <multiply>

; ── stripped ─────────────────────────────────────────
0000000000001050 <.text>:            ; entire section = one blob, no function labels

    11d4:  e8 b8 ff ff ff   call   1191 <printf@plt+0x151>   ; was: <show>
    11eb:  e8 7d ff ff ff   call   116d <printf@plt+0x12d>   ; was: <divide>
    1200:  e8 44 ff ff ff   call   1149 <printf@plt+0x109>   ; was: <multiply>

; opcodes are byte-for-byte identical, only the annotations differ
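The change in annotations can be reproduced with a small model: decode the call target from the e8 rel32 bytes, then label it relative to the closest symbol at or below it. This is a simplified sketch with hypothetical names (call_target, annotate, struct sym); objdump's real symbol selection has more rules, and a little-endian host is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct sym { const char *name; uint64_t addr; };

/* Decode `e8 rel32`: target = address of the next instruction + displacement. */
uint64_t call_target(uint64_t insn_addr, const uint8_t insn[5])
{
    assert(insn[0] == 0xe8);
    int32_t rel;
    memcpy(&rel, insn + 1, sizeof rel);      /* little-endian host assumed */
    return insn_addr + 5 + (uint64_t)(int64_t)rel;
}

/* Label `target` relative to the closest symbol at or below it. */
void annotate(uint64_t target, const struct sym *syms, size_t n,
              char *out, size_t out_len)
{
    const struct sym *best = NULL;
    for (size_t i = 0; i < n; i++)
        if (syms[i].addr <= target && (best == NULL || syms[i].addr > best->addr))
            best = &syms[i];

    if (best == NULL)
        snprintf(out, out_len, "0x%llx", (unsigned long long)target);
    else if (best->addr == target)
        snprintf(out, out_len, "<%s>", best->name);
    else
        snprintf(out, out_len, "<%s+0x%llx>", best->name,
                 (unsigned long long)(target - best->addr));
}
```

With the full symbol list, the target 0x1191 is labelled <show>; with only puts@plt (0x1030) and printf@plt (0x1040) left, the same address becomes <printf@plt+0x151>, matching the stripped listing.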

2. Sections

As you’ve seen, ELF binaries (and other types of binaries) are divided into sections. Some sections contain code, and others contain data. Why do you think the distinction between code and data sections exists? How do you think the loading process differs for code and data sections? Is it necessary to copy all sections into memory when a binary is loaded for execution?

Why the distinction exists:

The core reason is memory protection. The CPU’s MMU enforces per-page permission bits set by the OS: code pages are mapped R-X (read + execute, not writable), data pages RW- (read + write, not executable). Separating them into distinct sections lets the linker group them into ELF segments with matching permissions, which the loader then passes to mmap().

How loading differs (code vs data):

The kernel ELF loader (load_elf_binary) and ld.so work with segments (program header entries), not sections. Each PT_LOAD segment groups sections of the same permission class and is mapped with a single mmap() call:

readelf -l a.out: PT_LOAD segments

Type    Offset   VirtAddr           FileSiz  MemSiz   Flg
LOAD    0x001000 0x0000000000001000  0x000249 0x000249 R E   # code:  r-x
LOAD    0x002db0 0x0000000000003db0  0x000270 0x000278 RW    # data:  rw-
# data segment: MemSiz (0x278) exceeds FileSiz (0x270) by 8 bytes; that tail is .bss,
# which takes no file space and is zero-filled at load time

Is it necessary to load all sections?

No, and the sections/segments split is precisely what makes this possible. Sections are the linker/debugger view of the file. Segments are the runtime view. Only sections that fall inside a PT_LOAD segment are ever mapped; everything else exists only on disk:

readelf -l a.out: section-to-segment mapping

Section to Segment mapping:
  Segment  Sections
  ...
  03       .init .plt .text .fini                           # r-x  → loaded
  04       .rodata .eh_frame_hdr .eh_frame                  # r    → loaded
  05       .data .bss .got .dynamic .init_array .fini_array # rw-  → loaded
  ...
  # NOT listed in any segment → never mapped:
  #   .symtab   .strtab   (stripped anyway in release)
  #   .debug_info   .debug_line   .debug_abbrev   .debug_str   (DWARF)

verify at runtime: /proc maps

cat /proc/$(pidof a.out)/maps
# 555555555000-555555556000 r-xp  ...  a.out   ← .text mapped r-x
# 555555557000-555555558000 r--p  ...  a.out   ← .rodata mapped r
# 555555558000-555555559000 rw-p  ...  a.out   ← .data/.bss mapped rw
# 7ffff7d00000-7ffff7f28000 r-xp  ...  libc.so ← shared code pages (same physical frames across all processes)
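For scripted checks, a maps line can be parsed with sscanf. A minimal sketch with hypothetical names (struct map_entry, parse_maps_line); it assumes the usual field layout of /proc/<pid>/maps and truncates paths that contain whitespace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct map_entry {
    unsigned long long start, end, offset;
    char perms[5];     /* e.g. "r-xp" */
    char path[256];    /* empty for anonymous mappings */
};

/* Parse one line of /proc/<pid>/maps. Returns 0 on success, -1 on failure. */
int parse_maps_line(const char *line, struct map_entry *e)
{
    assert(line != NULL && e != NULL);
    e->path[0] = '\0';
    /* Fields: start-end perms offset dev inode [path]; dev and inode are skipped. */
    int n = sscanf(line, "%llx-%llx %4s %llx %*s %*s %255s",
                   &e->start, &e->end, e->perms, &e->offset, e->path);
    return n >= 4 ? 0 : -1;
}

/* A mapping that is both writable and executable defeats the code/data split. */
int is_writable_and_executable(const struct map_entry *e)
{
    assert(e != NULL);
    return e->perms[1] == 'w' && e->perms[2] == 'x';
}
```

Running every line of a process's maps file through is_writable_and_executable is a quick way to confirm that no region is mapped both writable and executable.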