diff --git a/README.adoc b/README.adoc
index 82cba3b..390c69a 100644
--- a/README.adoc
+++ b/README.adoc
@@ -4023,7 +4023,7 @@ This is the default install path for `CONFIG_SOME_MOD=m` modules built with `mak
 
 Currently, there are only two kinds of kernel modules that you can try out with `modprobe`:
 
-* modules built with Buildroot, see: <<kernel_modules-package>>
+* modules built with Buildroot, see: <<kernel_modules-buildroot-package>>
 * modules built from the kernel tree itself, see: <<dummy-irq>>
 
 We are not installing out custom `./build-modules` modules there, because:
@@ -5692,7 +5692,7 @@ TODO: what for, and at which point point does Buildroot / BusyBox generate that
 
 Unlike `insmod`, <<modprobe>> deals with kernel module dependencies for us.
 
-First get <<kernel_modules-package>> working.
+First get <<kernel_modules-buildroot-package>> working.
 
 Then, for example:
 
@@ -5889,7 +5889,7 @@ TODO how to get the vermagic from running kernel from userland? https://lists.ke
 
 This option just strips `modversion` information from the module before loading, so it is not a kernel feature.
 
-==== module_init
+==== init_module
 
 `init_module` and `cleanup_module` are an older alternative to the `module_init` and `module_exit` macros:
 
@@ -6378,6 +6378,7 @@ Bibliography:
 * https://superuser.com/questions/619955/how-does-proc-work/1442571#1442571
 * https://stackoverflow.com/questions/8516021/proc-create-example-for-kernel-module/18924359#18924359
 
+[[proc-version]]
 ===== /proc/version
 
 Its data is shared with `uname()`, which is a <<posix,POSIX C>> function and has a Linux syscall to back it up.
@@ -10940,7 +10941,10 @@ In theory, the cleanest way to add m5ops to your benchmarks would be to do exact
 
 However, I think it is usually not worth the trouble of hacking up the build system of the benchmark to do this, and I recommend just hardcoding in a few raw instructions here and there, and managing it with version control + `sed`.
 
-Related: https://www.mail-archive.com/gem5-users@gem5.org/msg15418.html
+Bibliography:
+
+* https://stackoverflow.com/questions/56506154/how-to-analyze-only-interest-area-in-source-code-by-using-gem5/56506419#56506419
+* https://www.mail-archive.com/gem5-users@gem5.org/msg15418.html
 
 ===== m5ops instructions interface
 
@@ -11729,6 +11733,8 @@ The implementation lives under `libgomp` in the GCC tree, and is documented at:
 
 Programs under link:userland/cpp/[] are examples of link:https://en.wikipedia.org/wiki/C%2B%2B#Standardization[ISO C] programming.
 
+* link:userland/cpp/hello.cpp[]
+
 === POSIX
 
 Programs under link:userland/posix/[] are examples of POSIX C programming.
@@ -11783,7 +11789,7 @@ The add examples in particular:
 * introduce the basics of how a given assembly works: how many inputs / outputs, who is input and output, can it use memory or just registers, etc.
 +
 It is then a big copy paste for most other data instructions.
-* verify that the venerable `add` instruction and our assertions are working
+* verify that the venerable ADD instruction and our assertions are working
 
 Now try to modify modify the x86_64 add program to see the assertion fail:
 
@@ -11844,21 +11850,21 @@ Bibliography: <<armarm7>> A2.3 "ARM core registers".
 
 Example: link:userland/arch/aarch64/x31.S[]
 
-There is no `x31` name, and the encoding can have two different names depending on the instruction:
+There is no X31 name, and the encoding can have two different names depending on the instruction:
 
-* `xzr`: zero register:
+* XZR: zero register:
 ** https://stackoverflow.com/questions/42788696/why-might-one-use-the-xzr-register-instead-of-the-literal-0-on-armv8
 ** https://community.arm.com/processors/f/discussions/3185/wzr-xzr-register-s-purpose
-* `sp`: stack pointer
+* SP: stack pointer
 
-To make things more confusing, some aliases can take either name, which makes them alias to different things, e.g. `mov` accepts both:
+To make things more confusing, some aliases can take either name, which makes them alias to different things, e.g. MOV accepts both:
 
 ....
 mov x0, sp
 mov x0, xzr
 ....
 
-and the first one is an alias to `add` while the second an alias to `orr`.
+and the first one is an alias to ADD while the second is an alias to <<arm-bitwise-instructions,ORR>>.
 
 The difference is documented on a per instruction basis. Instructions that encode 31 as SP say:
 
@@ -12290,8 +12296,8 @@ Some of the differences include:
 
 * `#` is optional in unified syntax int literals, see <<gnu-gas-assembler-immediates>>
 * many mnemonics changed:
-** most of them are condition code position changes, e.g. `andseq` vs `andeqs`: https://stackoverflow.com/questions/51184921/wierd-gcc-behaviour-with-arm-assembler-andseq-instruction
-** but there are some more drastic ones, e.g. `swi` vs `svc`: https://stackoverflow.com/questions/8459279/are-arm-instructuons-swi-and-svc-exactly-same-thing/54078731#54078731
+** most of them are condition code position changes, e.g. ANDSEQ vs ANDEQS: https://stackoverflow.com/questions/51184921/wierd-gcc-behaviour-with-arm-assembler-andseq-instruction
+** but there are some more drastic ones, e.g. SWI vs <<arm-svc-instruction,SVC>>: https://stackoverflow.com/questions/8459279/are-arm-instructuons-swi-and-svc-exactly-same-thing/54078731#54078731
 * cannot have implicit destination with shift, see: <<arm-shift-suffixes>>
 
 ===== GNU GAS assembler ARM .n and .w suffixes
@@ -12365,23 +12371,23 @@ History:
 
 ==== x86 SSE2
 
-===== x86 addpd instruction
+===== x86 ADDPD instruction
 
-link:userland/arch/x86_64/addpd.S[]: `addps`, `addpd`
+link:userland/arch/x86_64/addpd.S[]: ADDPS, ADDPD
 
 Good first instruction to learn SIMD: <<simd-assembly>>
 
-===== x86 paddq instruction
+===== x86 PADDQ instruction
 
-link:userland/arch/x86_64/paddq.S[]: `paddq`, `paddl`, `paddw`, `paddb`
+link:userland/arch/x86_64/paddq.S[]: PADDQ, PADDD, PADDW, PADDB
 
 Good first instruction to learn SIMD: <<simd-assembly>>
 
-=== x86 rdtsc instruction
+=== x86 RDTSC instruction
 
 TODO: review this section, make a more controlled userland experiment with <<m5ops>> instrumentation.
 
-Let's have some fun and try to correlate the gem5 <<stats-txt>> `system.cpu.numCycles` cycle count with the link:https://en.wikipedia.org/wiki/Time_Stamp_Counter[x86 `rdtsc` instruction] that is supposed to do the same thing:
+Let's have some fun and try to correlate the gem5 <<stats-txt>> `system.cpu.numCycles` cycle count with the link:https://en.wikipedia.org/wiki/Time_Stamp_Counter[x86 RDTSC instruction] that is supposed to do the same thing:
 
 ....
 ./build-userland --static userland/arch/x86_64/inline_asm/rdtsc.c
@@ -12391,14 +12397,14 @@ Let's have some fun and try to correlate the gem5 <<stats-txt>> `system.cpu.numC
 
 Source: link:userland/arch/x86_64/rdtsc.c[]
 
-`rdtsc` outputs a cycle count which we compare with gem5's `gem5-stat`:
+RDTSC outputs a cycle count which we compare with gem5's `gem5-stat`:
 
-* `3828578153`: `rdtsc`
+* `3828578153`: RDTSC
 * `3830832635`: `gem5-stat`
 
 which gives pretty close results, and serve as a nice sanity check that the cycle counter is coherent.
 
-It is also nice to see that `rdtsc` is a bit smaller than the `stats.txt` value, since the latter also includes the exec syscall for `m5`.
+It is also nice to see that RDTSC is a bit smaller than the `stats.txt` value, since the latter also includes the exec syscall for `m5`.
 
 Bibliography:
 
@@ -12560,7 +12566,7 @@ Bibliography: https://www.quora.com/Why-is-it-that-you-need-a-license-from-ARM-t
 
 ==== ARM instruction encodings
 
-Understanding the basics of instruction encodings is fundamental to help you to remember what instructions do and why some things are possible or not, notably the <<arm-ldr-pseudo-instruction>> and the <<arm-adr-instruction,`adrp` instruction>>.
+Understanding the basics of instruction encodings is fundamental: it helps you remember what instructions do and why some things are possible or not, notably the <<arm-ldr-pseudo-instruction>> and the <<arm-adr-instruction,ADRP instruction>>.
 
 aarch32 has two "instruction sets", which to look just like encodings.
 
@@ -12592,7 +12598,7 @@ This RISC-y mostly fixed instruction length design likely makes processor design
 
 This design can be contrasted with x86, which has widely variable instruction length.
 
-We can swap between A32 and T32 with the `bx` and `blx` instructions: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.kui0100a/armasm_cihfddaf.htm puts it really nicely:
+We can swap between A32 and T32 with the BX and BLX instructions: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.kui0100a/armasm_cihfddaf.htm puts it really nicely:
 
 ____
 * The BL and BLX instructions copy the address of the next instruction into lr (r14, the link register).
@@ -12633,26 +12639,26 @@ We verify that with:
 ./run-toolchain --arch arm readelf -- -h "$(./getvar --arch arm userland_build_dir)/arch/arm/freestanding/linux/hello_thumb.out"
 ....
 +
-The Linux kernel must use that to decide put the CPU in thumb mode: that could be done simply with a regular `bx`.
+The Linux kernel must use that to decide to put the CPU in thumb mode: that could be done simply with a regular BX.
 * on the non-freestanding one, the linker uses some ELF metadata to decide that `main` is thumb and jumps to it appropriately: https://reverseengineering.stackexchange.com/questions/6080/how-to-detect-thumb-mode-in-arm-disassembly
 +
-TODO details. Does the linker then resolve thumbness with address relocation? Doesn't this imply that the compiler cannot generate `bl` (never changes) or `blx` (always changes) across object files, only `bx` (target state controlled by lower bit)?
+TODO details. Does the linker then resolve thumbness with address relocation? Doesn't this imply that the compiler cannot generate BL (never changes) or BLX (always changes) across object files, only BX (target state controlled by lower bit)?
 
 === ARM branch instructions
 
-==== ARM b instruction
+==== ARM B instruction
 
 Unconditional branch.
 
 Example: link:userland/arch/arm/b.S[]
 
-The encoding stores `pc` offsets in 24 bits. The destination must be a multiple of 4, which is easy since all instructions are 4 bytes.
+The encoding stores PC offsets in 24 bits. The destination must be a multiple of 4, which is easy since all instructions are 4 bytes.
 
 This allows for 26 bit long jumps, which is 64 MiB.
 
 TODO: what to do if we want to jump longer than that?
 
-==== ARM beq instruction
+==== ARM BEQ instruction
 
 Branch if equal based on the status registers.
 
@@ -12663,16 +12669,16 @@ Examples:
 
 The family of instructions includes:
 
-* `beq`: branch if equal
-* `bne`: branch if not equal
-* `ble`: less or equal
-* `bge`: greater or equal
-* `blt`: less than
-* `bgt`: greater than
+* BEQ: branch if equal
+* BNE: branch if not equal
+* BLE: less or equal
+* BGE: greater or equal
+* BLT: less than
+* BGT: greater than
 
-==== ARM bl instruction
+==== ARM BL instruction
 
-Branch with link, i.e. branch and store the return address on the `rl` register.
+Branch with link, i.e. branch and store the return address in the LR register.
 
 Example: link:userland/arch/arm/bl.S[]
 
@@ -12680,7 +12686,7 @@ This is the major way to make function calls.
 
 The current ARM / Thumb mode is encoded in the least significant bit of lr.
 
-===== ARM bx instruction
+===== ARM BX instruction
 
 See: <<arm-thumb-encoding>>
 
@@ -12690,14 +12696,14 @@ Example: link:userland/arch/aarch64/ret.S[]
 
 ARMv8 AArch64 only:
 
-* there is no `bx` in AArch64 since no Thumb to worry about, so it is called just `br`
-* the `ret` instruction was added in addition to `br`, with the following differences:
+* there is no BX in AArch64, since there is no Thumb to worry about, so it is called just BR
+* the RET instruction was added in addition to BR, with the following differences:
 ** provides a hint that this is a function call return
-** has a default argument `x30` if none is given. This is where `bl` puts the return value.
+** has a default argument X30 if none is given. This is where BL puts the return address.
 
 See also: https://stackoverflow.com/questions/32304646/arm-assembly-branch-to-address-inside-register-or-memory/54145818#54145818
 
-==== ARM cbz instruction
+==== ARM CBZ instruction
 
 Compare and branch if zero.
 
@@ -12709,11 +12715,11 @@ Very handy!
 
 ==== ARM conditional execution
 
-Weirdly, <<arm-b-instruction>> and family are not the only instructions that can execute conditionally on the flags: the same also applies to most instructions, e.g. `add`.
+Weirdly, <<arm-b-instruction>> and family are not the only instructions that can execute conditionally on the flags: the same also applies to most instructions, e.g. ADD.
 
 Example: link:userland/arch/arm/cond.S[]
 
-Just add the usual `eq`, `ne`, etc. suffixes just as for `b`.
+Just add the usual `eq`, `ne`, etc. suffixes, as for B.
 
 The list of all extensions is documented at <<armarm7>> "A8.3 Conditional execution".
 
@@ -12727,15 +12733,15 @@ This is part of the RISC-y beauty of the ARM instruction set, unlike x86 in whic
 
 This kind of architecture is called a link:https://en.wikipedia.org/wiki/Load/store_architecture[Load/store architecture].
 
-==== ARM ldr instruction
+==== ARM LDR instruction
 
-===== ARM ldr pseudo-instruction
+===== ARM LDR pseudo-instruction
 
-`ldr` can be either a regular instruction that loads stuff into memory, or also a pseudo-instruction (assembler magic): http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0041c/Babbfdih.html
+LDR can be either a regular instruction that loads data from memory into a register, or a pseudo-instruction (assembler magic): http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0041c/Babbfdih.html
 
 The pseudo instruction version is when an equal sign appears on one of the operators.
 
-The `ldr` pseudo instruction can automatically create hidden variables in a place called the "literal pool", and load them from memory with PC relative loads.
+The LDR pseudo-instruction can automatically create hidden variables in a place called the "literal pool", and load them from memory with PC-relative loads.
 
 Example: link:userland/arch/arm/ldr_pseudo.S[]
 
@@ -12790,14 +12796,14 @@ As an application of the post-indexed addressing mode, let's increment an array.
 
 Example: link:userland/arch/arm/inc_array.S[]
 
-===== ARM ldrh and ldrb instructions
+===== ARM LDRH and LDRB instructions
 
-There are `ldr` variants that load less than full 4 bytes:
+There are LDR variants that load fewer than the full 4 bytes:
 
 * link:userland/arch/arm/ldrb.S[]: load byte
 * link:userland/arch/arm/ldrh.S[]: load half word
 
-==== ARM str instruction
+==== ARM STR instruction
 
 Store from memory into registers.
 
@@ -12805,40 +12811,40 @@ Example: link:userland/arch/arm/str.S[]
 
 Basically everything that applies to <<arm-ldr-instruction>> also applies here so we won't go into much detail.
 
-===== ARMv8 aarch64 str instruction
+===== ARMv8 aarch64 STR instruction
 
-PC-relative `str` is not possible in aarch64.
+PC-relative STR is not possible in aarch64.
 
-For `ldr` it works <<arm-ldr-instruction,as in aarch32>>.
+For LDR it works <<arm-ldr-instruction,as in aarch32>>.
 
-As a result, it is not possible to load from the literal pool for `str`.
+As a result, it is not possible to store to the literal pool with STR.
 
 Example: link:userland/arch/aarch64/str.S[]
 
-This can be seen from <<armarm8>> C3.2.1 "Load/Store register": `ldr` simply has on extra PC encoding that `str` does not.
+This can be seen from <<armarm8>> C3.2.1 "Load/Store register": LDR simply has one extra PC-relative encoding that STR does not.
 
-===== ARMv8 aarch64 ldp and stp instructions
+===== ARMv8 aarch64 LDP and STP instructions
 
 Push a pair of registers to the stack.
 
 TODO minimal example. Currently used in `LKMC_PROLOGUE` at link:lkmc/aarch64.h[] since it is the main way to restore register state.
 
-==== ARM ldmia instruction
+==== ARM LDMIA instruction
 
 Pop values form stack into the register and optionally update the address register.
 
-`stmdb` is the push version.
+STMDB is the push version.
 
 Example: link:userland/arch/arm/ldmia.S[]
 
 The mnemonics stand for:
 
-* `stmdb`: STore Multiple Decrement Before
-* `ldmia`: LoaD Multiple Increment After
+* STMDB: STore Multiple Decrement Before
+* LDMIA: LoaD Multiple Increment After
 
 Example: link:userland/arch/arm/push.S[]
 
-`push` and `pop` are just mnemonics `stdmdb` and `ldmia` using the stack pointer `sp` as address register:
+PUSH and POP are just mnemonics for STMDB and LDMIA using the stack pointer SP as the address register:
 
 ....
 stmdb sp!, reglist
@@ -12863,7 +12869,7 @@ Arithmetic:
 * link:userland/arch/arm/rev.S[]: reverse byte order
 * link:userland/arch/arm/tst.S[]
 
-==== ARM cset instruction
+==== ARM CSET instruction
 
 Example: link:userland/arch/aarch64/cset.S[]
 
@@ -12874,11 +12880,11 @@ ARMv8-only, likely because in ARMv8 you can't have conditional suffixes for ever
 ==== ARM bitwise instructions
 
 * link:userland/arch/arm/and.S[]
-* `eor`: exclusive OR
-* `orr`: OR
+* EOR: exclusive OR
+* ORR: OR
 * link:userland/arch/arm/clz.S[]: count leading zeroes
 
-===== ARM bic instruction
+===== ARM BIC instruction
 
 Bitwise Bit Clear: clear some bits.
 
@@ -12888,7 +12894,7 @@ dest = `left & ~right`
 
 Example: link:userland/arch/arm/bic.S[]
 
-===== ARM ubfm instruction
+===== ARM UBFM instruction
 
 Unsigned Bitfield Move.
 
@@ -12900,7 +12906,7 @@ Example: link:userland/arch/aarch64/ubfm.S[]
 
 TODO: explain full behaviour. Very complicated. Has several simpler to understand aliases.
 
-====== ARM ubfx instruction
+====== ARM UBFX instruction
 
 Alias for:
 
@@ -12924,11 +12930,11 @@ dest = (src & ((1 << width) - 1)) >> lsb;
 
 Bibliography: https://stackoverflow.com/questions/8366625/arm-bit-field-extract
 
-===== ARM bfm instruction
+===== ARM BFM instruction
 
-TODO: explain. Similar to <<arm-ubfm-instruction,`ubfm`>> but leave untouched bits unmodified.
+TODO: explain. Similar to <<arm-ubfm-instruction,UBFM>>, but leaves the untouched bits unmodified.
 
-====== ARM bfi instruction
+====== ARM BFI instruction
 
 Examples:
 
@@ -12937,14 +12943,14 @@ Examples:
 
 Move the lower bits of source register into any position in the destination:
 
-* ARMv8: an alias for <<arm-bfm-instruction,`bfm`>>
+* ARMv8: an alias for <<arm-bfm-instruction>>
 * ARMv7: a real instruction
 
-==== ARM mov instruction
+==== ARM MOV instruction
 
 Move an immediate to a register, or a register to another register.
 
-Cannot load from or to memory, since only the `ldr` and `str` instruction families can do that in ARM: <<arm-load-and-store-instructions>>
+Cannot load from or store to memory, since only the LDR and STR instruction families can do that in ARM: <<arm-load-and-store-instructions>>
 
 Example: link:userland/arch/arm/mov.S[]
 
@@ -12960,7 +12966,7 @@ Summary of solutions:
 * <<arm-movw-and-movt-instructions>>
 * place it in memory. But then how to load the address, which is also a 32-bit value?
 ** use pc-relative addressing if the memory is close enough
-** use `orr` encodable shifted immediates
+** use <<arm-bitwise-instructions,ORR>> encodable shifted immediates
 
 The blog article summarizes nicely which immediates can be encoded and the design rationale:
 
@@ -13006,9 +13012,9 @@ Example: link:userland/arch/arm/shift.S[]
 
 The shift types are:
 
-* `lsr` and `lfl`: Logical Shift Right / Left. Insert zeroes.
-* `ror`: Rotate Right / Left. Wrap bits around.
-* `asr`: Arithmetic Shift Right. Keep sign.
+* LSR and LSL: Logical Shift Right / Left. Insert zeroes.
+* ROR: Rotate Right. Wrap bits around.
+* ASR: Arithmetic Shift Right. Keep sign.
 
 Documented at: <<armarm7>> "A4.4.1 Standard data-processing instructions"
 
@@ -13018,11 +13024,11 @@ Example: link:userland/arch/arm/s_suffix.S[]
 
 The `S` suffix, present on most <<arm-data-processing-instructions>>, makes the instruction also set the Status register flags that control conditional jumps.
 
-If the result of the operation is `0`, then it triggers `beq`, since comparison is a subtraction, with success on 0.
+If the result of the operation is `0`, then the zero flag gets set and a subsequent BEQ is taken, since comparison is a subtraction that yields 0 on equality.
 
-`cmp` sets the flags by default of course.
+CMP always sets the flags, of course.
 
-==== ARM adr instruction
+==== ARM ADR instruction
 
 Similar rationale to the <<arm-ldr-pseudo-instruction>>, allowing to easily store a PC-relative reachable address into a register in one go, to overcome the 4-byte fixed instruction size.
 
@@ -13034,15 +13040,15 @@ Examples:
 
 More details: https://stackoverflow.com/questions/41906688/what-are-the-semantics-of-adrp-and-adrl-instructions-in-arm-assembly/54042899#54042899
 
-===== ARM adrl instruction
+===== ARM ADRL instruction
 
 See: <<arm-adr-instruction>>.
 
 === ARM miscellaneous instructions
 
-==== ARM nop instruction
+==== ARM NOP instruction
 
-There are a few different ways to encode `nop`, notably `mov` a register into itself, and a dedicated miscellaneous instruction.
+There are a few different ways to encode NOP, notably a MOV of a register into itself, and a dedicated miscellaneous instruction.
 
 Example: link:userland/arch/arm/nop.S[]
 
@@ -13054,7 +13060,7 @@ gdb-multiarch -batch -ex 'arch arm' -ex "file v7/nop.out" -ex "disassemble/rs as
 
 Bibliography: https://stackoverflow.com/questions/1875491/nop-for-iphone-binaries
 
-==== ARM udf instruction
+==== ARM UDF instruction
 
 Guaranteed undefined! Therefore raise illegal instruction signal. Used by GCC `__builtin_trap` apparently: https://stackoverflow.com/questions/16081618/programmatically-cause-undefined-instruction-exception
 
@@ -13084,7 +13090,7 @@ When a certain version of VFP is present on a CPU, the compiler prefix typically
 
 Bibliography:
 
-* <<armarm7>> Appendix D6 "Common VFP Subarchitecture Specification". It is not part of the ISA, but just an extension. TODO: that spec does not seem to have the instructions documented, and instruction like `VMOV` just live with the main instructions. Is `VMOV` part of VFP?
+* <<armarm7>> Appendix D6 "Common VFP Subarchitecture Specification". It is not part of the ISA, but just an extension. TODO: that spec does not seem to have the instructions documented, and instructions like VMOV just live with the main instructions. Is VMOV part of VFP?
 * https://mindplusplus.wordpress.com/2013/06/25/arm-vfp-vector-programming-part-1-introduction/
 * https://en.wikipedia.org/wiki/ARM_architecture#Floating-point_(VFP)
 
@@ -13123,7 +13129,7 @@ And you can't access the higher bytes at D16 or greater with Sn.
 * link:userland/arch/arm/vadd_scalar.S[]: see also: <<floating-point-assembly>>
 * link:userland/arch/arm/vadd_vector.S[]: see also: <<simd-assembly>>
 
-===== ARM vcvt instruction
+===== ARM VCVT instruction
 
 Example: link:userland/arch/arm/vcvt.S[]
 
@@ -13143,7 +13149,7 @@ E.g., in our 32-bit float to 32-bit unsigned example we use:
 vld1.32.f32
 ....
 
-====== ARM vcvtr instruction
+====== ARM VCVTR instruction
 
 Example: link:userland/arch/arm/vcvtr.S[]
 
@@ -13155,7 +13161,7 @@ Rounding mode selection is exposed in the ANSI C standard through link:https://e
 
 TODO: is the initial rounding mode specified by the ELF standard? Could not find a reference.
 
-====== ARMv8 AArch32 vcvta instruction
+====== ARMv8 AArch32 VCVTA instruction
 
 Example: link:userland/arch/arm/vcvt.S[]
 
@@ -13220,13 +13226,13 @@ TODO example.
 <<armarm8>> B1.2.1 "Registers in AArch64 state" describes the registers:
 
 ____
-32 SIMD&FP registers, `V0` to `V31`. Each register can be accessed as:
+32 SIMD&FP registers, V0 to V31. Each register can be accessed as:
 
-* A 128-bit register named `Q0` to `Q31`.
-* A 64-bit register named `D0` to `D31`.
-* A 32-bit register named `S0` to `S31`.
-* A 16-bit register named `H0` to `H31`.
-* An 8-bit register named `B0` to `B31`.
+* A 128-bit register named Q0 to Q31.
+* A 64-bit register named D0 to D31.
+* A 32-bit register named S0 to S31.
+* A 16-bit register named H0 to H31.
+* An 8-bit register named B0 to B31.
 ____
 
 Notice how Sn is very different between v7 and v8! In v7 it goes across Dn, and in v8 inside each Dn.
@@ -13244,7 +13250,7 @@ Good first instruction to learn SIMD: <<simd-assembly>>
 
 ====== ARM FADD vs VADD
 
-It is very confusing, but `fadds` and `faddd` in Aarch32 are <<gnu-gas-assembler-arm-unified-syntax,pre-UAL>> for `vadd.f32` and `vadd.f64` which we use in this tutorial: <<arm-vadd-instruction>>
+It is very confusing, but FADDS and FADDD in AArch32 are the <<gnu-gas-assembler-arm-unified-syntax,pre-UAL>> names for `vadd.f32` and `vadd.f64`, which we use in this tutorial: <<arm-vadd-instruction>>
 
 The same goes for most ARMv7 mnemonics: `f*` is old, and `v*` is the newer better syntax.
 
@@ -13262,7 +13268,7 @@ We can load multiple vectors interleaved from memory in one single instruction!
 
 This is why the `ldN` instructions take an argument list denoted by `{}` for the registers, much like armv7 <<arm-ldmia-instruction>>.
 
-There are analogous `ld3` and `ld4` instruction.
+There are analogous LD3 and LD4 instructions.
 
 ==== ARM SIMD bibliography
 
@@ -13640,7 +13646,7 @@ Since I had this compiled, I also decided to try it out on userland.
 
 I was also able to run a freestanding Linux userland example on it: https://github.com/cirosantilli/arm-assembly-cheat/blob/cd232dcaf32c0ba6399b407e0b143d19b6ec15f4/v7/linux/hello.S
 
-It just ignores the `swi` however, and does not forward syscalls to the host like QEMU does.
+It just ignores the <<arm-svc-instruction>> however, and does not forward syscalls to the host like QEMU does.
 
 Then I tried a glibc example: https://github.com/cirosantilli/arm-assembly-cheat/blob/cd232dcaf32c0ba6399b407e0b143d19b6ec15f4/v7/mov.S
 
@@ -13865,7 +13871,7 @@ contains:
    4500: system.cpu A0 T0 : @vector_table+512    :   b   <_curr_el_spx_sync>  : IntAlu :   flags=(IsControl|IsDirectControl|IsUncondControl)
 ....
 
-So we see in both cases that the `svc` is done, then an exception happens, and then we just continue running from the exception handler address.
+So we see in both cases that the SVC is done, then an exception happens, and then we just continue running from the exception handler address.
 
 The vector table format is described on <<armarm8>> Table D1-7 "Vector offsets from vector table base address".
 
@@ -13895,21 +13901,21 @@ The first part of the table contains:
 
 |===
 
-and the following other parts are analogous, but referring to `SPx` and lower ELs.
+and the following other parts are analogous, but referring to SPx and lower ELs.
 
 We are going to do everything in <<arm-exception-levels,EL1>> for now.
 
 On the terminal output, we observe the initial values of:
 
-* `DAIF`: `0x3c0`, i.e. 4 bits (6 to 9) set to 1, which means that exceptions are masked for each exception type: Synchronous, System error, IRQ and FIQ.
+* DAIF: 0x3c0, i.e. 4 bits (6 to 9) set to 1, which means that exceptions are masked for each exception type: Debug, SError (System error), IRQ and FIQ.
 +
 This reset value is defined by <<armarm8>> C5.2.2 "DAIF, Interrupt Mask Bits".
-* `SPSel`: `0x1`, which means: use `SPx` instead of `SP0`.
+* SPSel: 0x1, which means: use SPx instead of SP0.
 +
 This reset value is defined by <<armarm8>> C5.2.16 "SPSel, Stack Pointer Select".
-* `VBAR_EL1`: `0x0` holds the base address of the vector table
+* VBAR_EL1: 0x0. This register holds the base address of the vector table.
 +
-This reset value is defined `UNKNOWN` by <<armarm8>> D10.2.116 "VBAR_EL1, Vector Base Address Register (EL1)", so we must set it to something ourselves to have greater portability.
+This reset value is defined as UNKNOWN by <<armarm8>> D10.2.116 "VBAR_EL1, Vector Base Address Register (EL1)", so we must set it ourselves for greater portability.
 
 Bibliography:
 
@@ -13934,9 +13940,9 @@ Sources:
 * link:baremetal/arch/aarch64/multicore.S[]
 * link:baremetal/arch/arm/multicore.S[]
 
-CPU 0 of this program enters a spinlock loop: it repeatedly checks if a given memory address is `1`.
+CPU 0 of this program enters a spinlock loop: it repeatedly checks if a given memory address is 1.
 
-So, we need CPU 1 to come to the rescue and set that memory address to `1`, otherwise CPU 0 will be stuck there forever!
+So, we need CPU 1 to come to the rescue and set that memory address to 1, otherwise CPU 0 will be stuck there forever!
 
 Don't believe me? Then try:
 
@@ -13972,16 +13978,16 @@ Bibliography: https://stackoverflow.com/questions/980999/what-does-multicore-ass
 
 ===== ARM WFE and SEV instructions
 
-The `WFE` and `SEV` instructions are just hints: a compliant implementation can treat them as NOPs.
+The WFE and SEV instructions are just hints: a compliant implementation can treat them as NOPs.
 
 However, likely no implementation likely does (TODO confirm), since:
 
-* `WFE` puts the core in a low power mode
-* `SEV` wakes up cores from a low power mode
+* WFE puts the core in a low power mode
+* SEV wakes up cores from a low power mode
 
 and power consumption is key in ARM applications.
 
-In QEMU 3.0.0, `SEV` is a NOPs, and `WFE` might be, but I'm not sure, see: https://github.com/qemu/qemu/blob/v3.0.0/target/arm/translate-a64.c#L1423
+In QEMU 3.0.0, SEV is a NOP, and WFE might be, but I'm not sure, see: https://github.com/qemu/qemu/blob/v3.0.0/target/arm/translate-a64.c#L1423
 
 ....
     case 2: /* WFE */
@@ -14007,7 +14013,7 @@ TODO: what does the WFE code do? How can it not be a NOP if SEV is a NOP? https:
  */
 ....
 
-For gem5 however, if we comment out the `SVE` instruction, then it actually exits with `simulate() limit reached`, so the CPU truly never wakes up, which is a more realistic behaviour.
+For gem5, however, if we comment out the SEV instruction, then it actually exits with `simulate() limit reached`, so the CPU truly never wakes up, which is a more realistic behaviour.
 
 The following Raspberry Pi bibliography helped us get this sample up and running:
 
@@ -14033,7 +14039,7 @@ shows something like:
 
 To wake up CPU 1 on QEMU, we must use the Power State Coordination Interface (PSCI) which is documented at: link:https://developer.arm.com/docs/den0022/latest/arm-power-state-coordination-interface-platform-design-document[].
 
-This interface uses `HVC` calls, and the calling convention is documented at "SMC CALLING CONVENTION" link:https://developer.arm.com/docs/den0028/latest[].
+This interface uses HVC calls, and the calling convention is documented at "SMC CALLING CONVENTION" link:https://developer.arm.com/docs/den0028/latest[].
 
 If we boot the Linux kernel on QEMU and <<get-device-tree-from-a-running-kernel,dump the auto-generated device tree>>, we observe that it contains the address of the PSCI CPU_ON call:
 
@@ -14050,7 +14056,7 @@ If we boot the Linux kernel on QEMU and <<get-device-tree-from-a-running-kernel,
 
 The Linux kernel wakes up the secondary cores in this exact same way at: https://github.com/torvalds/linux/blob/v4.19/drivers/firmware/psci.c#L122 We first actually got it working here by grepping the kernel and step debugging that call :-)
 
-In gem5, CPU 1 starts woken up from the start, so PSCI is not needed. TODO gem5 actually blows up if we try to do the `hvc` call, understand why.
+In gem5, CPU 1 starts woken up from the start, so PSCI is not needed. TODO gem5 actually blows up if we try to do the HVC call, understand why.
 
 Bibliography: https://stackoverflow.com/questions/20055754/arm-start-wakeup-bringup-the-other-cpu-cores-aps-and-pass-execution-start-addre/53473447#53473447
 
@@ -14926,12 +14932,14 @@ xdg-open out/README.html
 
 Source: link:build-doc[]
 
+[[documentation-verification]]
 ==== Documentation verification
 
 When running link:build-doc[], we do the following checks:
 
 * `<<>>` inner links are not broken
 * `+link:somefile[]+` links point to paths that exist via <<asciidoctor-extract-link-targets>>. Upstream wontfix at: https://github.com/asciidoctor/asciidoctor/issues/3210
+* all links in non-README files to README IDs exist via `git grep` + <<asciidoctor-extract-header-ids>>
 
 The scripts prints what you have to fix and exits with an error status if there are any errors.
 
@@ -14952,6 +14960,37 @@ Output: one link target per line.
 
 Hastily hacked from: https://asciidoctor.org/docs/user-manual/#inline-macro-processor-example
 
+[[asciidoctor-extract-header-ids]]
+===== asciidoctor/extract-header-ids
+
+Documentation for link:asciidoctor/extract-header-ids[]
+
+Extract header IDs, both auto-generated and manually given.
+
+E.g., for the document `test.adoc`:
+
+....
+= Auto generated
+
+[[explicitly-given]]
+== La la
+....
+
+the script:
+
+....
+./asciidoctor/extract-header-ids test.adoc
+....
+
+produces:
+
+....
+auto-generated
+explicitly-given
+....
+
+One motivation for this script is that, as of 2.0.10, Asciidoctor does not warn about collisions between auto-generated header IDs: https://github.com/asciidoctor/asciidoctor/issues/3147 However, the script does not detect such collisions yet, since that would require generating the section IDs without the `-N` de-duplication suffix. Section ID generation happens in `Section.generate_id` in the Asciidoctor source code.
+
 === Clean the build
 
 You did something crazy, and nothing seems to work anymore?
@@ -15391,7 +15430,7 @@ Buildroot packages are convenient, but in general, if a package if very importan
 
 A custom build script can give you more flexibility: e.g. the package can be made work with other root filesystems more easily, have better <<9p>> support, and rebuild faster as it evades some Buildroot boilerplate.
 
-===== kernel_modules package
+===== kernel_modules buildroot package
 
 Source: link:buildroot_packages/kernel_modules/[]
 
@@ -15427,7 +15466,8 @@ Implementattion described at: https://stackoverflow.com/questions/40307328/how-t
 
 ==== patches directory
 
-===== patches/global
+[[patches-global-directory]]
+===== patches/global directory
 
 Has the following structure:
 
@@ -15439,7 +15479,8 @@ The patches are then applied to the corresponding packages before build.
 
 Uses `BR2_GLOBAL_PATCH_DIR`.
 
-===== patches/manual
+[[patches-manual-directory]]
+===== patches/manual directory
 
 Patches in this directory are never applied automatically: it is up to users to manually apply them before usage following the instructions in this documentation.
 
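The README text above notes that Asciidoctor 2.0.10 does not warn when two auto-generated header IDs collide, and that `extract-header-ids` only sees the de-duplicated IDs. A rough heuristic is still possible on that output: flag any `base-N` ID whose `base` is also present. This is a minimal sketch, not part of the repository. It assumes the de-duplication suffix is `-N` with N starting at 2 (the `-` separator matches the README's IDs, but the start index is an assumption), and the function name `suspected_collisions` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: a colliding auto-generated ID gets "-N" appended, N starting at 2.
# Heuristic only: a section genuinely titled e.g. "Step 2" next to "Step" also matches.
DEDUP_SUFFIX_RE = re.compile(r'^(.+)-([2-9]|[1-9][0-9]+)$')


def suspected_collisions(header_ids: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (base_id, suffixed_id) pairs where both IDs appear in header_ids."""
    ids = list(header_ids)
    id_set = set(ids)
    pairs = []
    for header_id in ids:
        match = DEDUP_SUFFIX_RE.match(header_id)
        if match is not None and match.group(1) in id_set:
            pairs.append((match.group(1), header_id))
    return pairs
```

The input would be the lines printed by `./asciidoctor/extract-header-ids README.adoc`, e.g. via `subprocess.check_output(...).decode().splitlines()` as `build-doc` already does. Each reported pair still needs a human look because of the false positives mentioned in the comment.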
diff --git a/asciidoctor/extract-header-ids b/asciidoctor/extract-header-ids
new file mode 100755
index 0000000..1eaeba1
--- /dev/null
+++ b/asciidoctor/extract-header-ids
@@ -0,0 +1,28 @@
+#!/usr/bin/env ruby
+# https://github.com/cirosantilli/linux-kernel-module-cheat#asciidoctor-extract-header-ids
+
+require 'asciidoctor'
+require 'asciidoctor/extensions'
+
+class Main < Asciidoctor::Extensions::TreeProcessor
+  def process document
+    return unless document.blocks?
+    process_blocks document
+    nil
+  end
+
+  def process_blocks node
+    node.blocks.each do |block|
+      if block.context == :section
+        puts block.id
+      end
+      process_blocks block if block.blocks?
+    end
+  end
+end
+
+Asciidoctor::Extensions.register do
+  treeprocessor Main
+end
+
+(Asciidoctor.load_file(ARGV[0])).convert
diff --git a/asciidoctor/extract-link-targets b/asciidoctor/extract-link-targets
index b23ae75..bb45e42 100755
--- a/asciidoctor/extract-link-targets
+++ b/asciidoctor/extract-link-targets
@@ -1,6 +1,6 @@
 #!/usr/bin/env ruby
 
-# https://github.com/cirosantilli/linux-kernel-module-cheat#asciidoctor-extract-links
+# https://github.com/cirosantilli/linux-kernel-module-cheat#asciidoctor-extract-link-targets
 
 require 'asciidoctor'
 require 'asciidoctor/extensions'
diff --git a/baremetal/arch/aarch64/multicore.S b/baremetal/arch/aarch64/multicore.S
index 5c4bda6..1bbba90 100644
--- a/baremetal/arch/aarch64/multicore.S
+++ b/baremetal/arch/aarch64/multicore.S
@@ -39,7 +39,7 @@ cpu0_only:
 
 #if !LKMC_GEM5
     /* Wake up CPU 1 from initial sleep!
-     * See:https://github.com/cirosantilli/linux-kernel-module-cheat#psci
+     * See:https://github.com/cirosantilli/linux-kernel-module-cheat#arm-psci
      */
     /* PCSI function identifier: CPU_ON. */
     ldr w0, =0xc4000003
diff --git a/baremetal/arch/aarch64/svc.c b/baremetal/arch/aarch64/svc.c
index 7d441e0..81df9df 100644
--- a/baremetal/arch/aarch64/svc.c
+++ b/baremetal/arch/aarch64/svc.c
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#svc */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-svc-instruction */
 
 #include <assert.h>
 #include <inttypes.h>
diff --git a/baremetal/arch/aarch64/svc_asm.S b/baremetal/arch/aarch64/svc_asm.S
index 5adaf13..e3325ea 100644
--- a/baremetal/arch/aarch64/svc_asm.S
+++ b/baremetal/arch/aarch64/svc_asm.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#svc */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-svc-instruction */
 
 #include <lkmc.h>
 
diff --git a/build-doc b/build-doc
index 072e18f..2560436 100755
--- a/build-doc
+++ b/build-doc
@@ -32,6 +32,8 @@ https://github.com/cirosantilli/linux-kernel-module-cheat#build-the-documentatio
             ],
             out_file=self.env['build_doc_log'],
         )
+
+        # Check that all local files linked from README exist.
         external_link_re = re.compile('^https?://')
         for link in subprocess.check_output([
             os.path.join(asciidoctor_dir, 'extract-link-targets'),
@@ -39,8 +41,35 @@ https://github.com/cirosantilli/linux-kernel-module-cheat#build-the-documentatio
         ]).decode().splitlines():
             if not external_link_re.match(link):
                 if not os.path.lexists(link):
-                    print('error: broken link: ' + link)
+                    self.log_error('broken link: ' + link)
                     exit_status = 1
+
+        # Check that non-README links to README IDs exist.
+        header_ids = set()
+        grep_line_location_re = re.compile(r'^(.*?:\d+):')
+        grep_line_hash_re = re.compile('^([a-z0-9_-]+)')
+        for header_id in subprocess.check_output([
+            os.path.join(asciidoctor_dir, 'extract-header-ids'),
+            self.env['readme']
+        ]).decode().splitlines():
+            header_ids.add(header_id)
+        for grep_line in subprocess.check_output([
+            'git', 'grep',
+            '--fixed-strings',
+            '--line-number',
+            self.env['github_repo_id_url'] + '#'
+        ]).decode().splitlines():
+            url_index = grep_line.index(self.env['github_repo_id_url'] + '#')
+            hash_start_index = url_index + len(self.env['github_repo_id_url']) + 1
+            # Skip empty or non-ID fragments instead of crashing on a None match.
+            hash_match = grep_line_hash_re.search(grep_line[hash_start_index:])
+            if hash_match is not None and hash_match.group(1) not in header_ids:
+                self.log_error('broken link to {} at {}'.format(
+                    hash_match.group(1),
+                    grep_line_location_re.search(grep_line).group(1))
+                )
+                exit_status = 1
+
         return exit_status
 
 if __name__ == '__main__':
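The new `build-doc` check above mixes the parsing of `git grep` output with the calls to `git` and to `extract-header-ids`, which makes it hard to test on its own. The sketch below is a hypothetical refactoring of just the parsing step into a pure function; the names `broken_links`, `LOCATION_RE` and `FRAGMENT_RE` are illustrative and not in the repository. It assumes the input lines come from `git grep --fixed-strings --line-number`, i.e. have the form `path:lineno:content`, and it reuses the same `[a-z0-9_-]+` fragment pattern as `build-doc`. Unlike `build-doc`, it checks every occurrence of the URL on a line, not only the first.

```python
import re
from typing import Iterable, List, Set, Tuple

REPO_URL = 'https://github.com/cirosantilli/linux-kernel-module-cheat'
# `git grep --line-number` prints "path:lineno:content".
LOCATION_RE = re.compile(r'^(.*?:\d+):')
# Same character class that build-doc accepts for README IDs.
FRAGMENT_RE = re.compile(r'^([a-z0-9_-]+)')


def broken_links(grep_lines: Iterable[str], header_ids: Set[str],
                 repo_url: str = REPO_URL) -> List[Tuple[str, str]]:
    """Return (fragment, "path:lineno") for each fragment missing from header_ids."""
    prefix = repo_url + '#'
    errors = []
    for grep_line in grep_lines:
        location_match = LOCATION_RE.search(grep_line)
        location = location_match.group(1) if location_match else grep_line
        start = grep_line.find(prefix)
        while start != -1:
            fragment_start = start + len(prefix)
            fragment_match = FRAGMENT_RE.search(grep_line[fragment_start:])
            # Empty or non-ID fragments (e.g. "#" followed by a space) are skipped.
            if fragment_match is not None and fragment_match.group(1) not in header_ids:
                errors.append((fragment_match.group(1), location))
            start = grep_line.find(prefix, fragment_start)
    return errors
```

With this split, `build-doc` would only need to collect the two subprocess outputs and call `log_error` for each returned pair.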
diff --git a/build-docker b/build-docker
index 02a31cd..4b06200 100755
--- a/build-docker
+++ b/build-docker
@@ -13,7 +13,7 @@ class DockerComponent(self.Component):
             'description': '''\
 Build a guest root filesystem based on prebuilt Docker Ubuntu root filesystems.
 
-See also:https://github.com/cirosantilli/linux-kernel-module-cheat#ubuntu-guest-setup
+See also: https://github.com/cirosantilli/linux-kernel-module-cheatTODO#ubuntu-guest-setup
 '''
         }
 
diff --git a/buildroot_packages/kernel_modules/README.adoc b/buildroot_packages/kernel_modules/README.adoc
index b54c484..70314ab 100644
--- a/buildroot_packages/kernel_modules/README.adoc
+++ b/buildroot_packages/kernel_modules/README.adoc
@@ -1 +1 @@
-https://github.com/cirosantilli/linux-kernel-module-cheat#kernel_modules-package
+https://github.com/cirosantilli/linux-kernel-module-cheat#kernel_modules-buildroot-package
diff --git a/buildroot_packages/kernel_modules/buildroot_hello.c b/buildroot_packages/kernel_modules/buildroot_hello.c
index 0574b5a..537a67b 100644
--- a/buildroot_packages/kernel_modules/buildroot_hello.c
+++ b/buildroot_packages/kernel_modules/buildroot_hello.c
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#getting-started-natively */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#kernel_modules-buildroot-package */
 
 #include <linux/module.h>
 #include <linux/kernel.h>
diff --git a/common.py b/common.py
index cbfcf84..86b1f28 100644
--- a/common.py
+++ b/common.py
@@ -102,6 +102,7 @@ consts['sha'] = common.git_sha(consts['root_dir'])
 consts['release_dir'] = os.path.join(consts['out_dir'], 'release')
 consts['release_zip_file'] = os.path.join(consts['release_dir'], 'lkmc-{}.zip'.format(consts['sha']))
 consts['github_repo_id'] = 'cirosantilli/linux-kernel-module-cheat'
+consts['github_repo_id_url'] = 'https://github.com/' + consts['github_repo_id']
 consts['asm_ext'] = '.S'
 consts['c_ext'] = '.c'
 consts['cxx_ext'] = '.cpp'
@@ -1131,15 +1132,15 @@ lunch aosp_{}-eng
         return '{}{}'.format(self.env['toolchain_prefix_dash'], tool)
 
     def github_make_request(
-            self,
-            authenticate=False,
-            data=None,
-            extra_headers=None,
-            path='',
-            subdomain='api',
-            url_params=None,
-            **extra_request_args
-        ):
+        self,
+        authenticate=False,
+        data=None,
+        extra_headers=None,
+        path='',
+        subdomain='api',
+        url_params=None,
+        **extra_request_args
+    ):
         if extra_headers is None:
             extra_headers = {}
         headers = {'Accept': 'application/vnd.github.v3+json'}
diff --git a/kernel_modules/hello.c b/kernel_modules/hello.c
index 5d96741..bc92c6b 100644
--- a/kernel_modules/hello.c
+++ b/kernel_modules/hello.c
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#getting-started-natively */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#qemu-buildroot-setup-getting-started */
 
 #include <linux/module.h>
 #include <linux/kernel.h>
diff --git a/kernel_modules/hello2.c b/kernel_modules/hello2.c
index 03752a2..8235dfa 100644
--- a/kernel_modules/hello2.c
+++ b/kernel_modules/hello2.c
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#getting-started-natively */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#qemu-buildroot-setup-getting-started */
 
 #include <linux/module.h>
 #include <linux/kernel.h>
diff --git a/linux_config/README.adoc b/linux_config/README.adoc
index db9ccc4..14ea66c 100644
--- a/linux_config/README.adoc
+++ b/linux_config/README.adoc
@@ -1 +1 @@
-https://github.com/cirosantilli/linux-kernel-module-cheat#about-our-linux-kernel-configs
+https://github.com/cirosantilli/linux-kernel-module-cheat#kernel-configs-about
diff --git a/linux_config/buildroot-aarch64 b/linux_config/buildroot-aarch64
index 09adc6a..6ffe667 100644
--- a/linux_config/buildroot-aarch64
+++ b/linux_config/buildroot-aarch64
@@ -1,4 +1,4 @@
-# https://github.com/cirosantilli/linux-kernel-module-cheat#about-buildroot-s-kernel-configs
+# https://github.com/cirosantilli/linux-kernel-module-cheat#buildroot-kernel-config
 
 CONFIG_SYSVIPC=y
 CONFIG_POSIX_MQUEUE=y
diff --git a/linux_config/buildroot-arm b/linux_config/buildroot-arm
index b8e64af..8e80fa1 100644
--- a/linux_config/buildroot-arm
+++ b/linux_config/buildroot-arm
@@ -1,4 +1,4 @@
-# https://github.com/cirosantilli/linux-kernel-module-cheat#about-buildroot-s-kernel-configs
+# https://github.com/cirosantilli/linux-kernel-module-cheat#buildroot-kernel-config
 
 # CONFIG_LOCALVERSION_AUTO is not set
 CONFIG_SYSVIPC=y
diff --git a/linux_config/buildroot-x86_64 b/linux_config/buildroot-x86_64
index 41016da..0ff1194 100644
--- a/linux_config/buildroot-x86_64
+++ b/linux_config/buildroot-x86_64
@@ -1,4 +1,4 @@
-# https://github.com/cirosantilli/linux-kernel-module-cheat#about-buildroot-s-kernel-configs
+# https://github.com/cirosantilli/linux-kernel-module-cheat#buildroot-kernel-config
 
 CONFIG_SYSVIPC=y
 CONFIG_IKCONFIG=y
diff --git a/patches/global/README.adoc b/patches/global/README.adoc
index f58bf8b..059f4dd 100644
--- a/patches/global/README.adoc
+++ b/patches/global/README.adoc
@@ -1 +1 @@
-https://github.com/cirosantilli/linux-kernel-module-cheat#patches-global
+https://github.com/cirosantilli/linux-kernel-module-cheat#patches-global-directory
diff --git a/patches/manual/README.adoc b/patches/manual/README.adoc
index dc2fab1..c3fa67b 100644
--- a/patches/manual/README.adoc
+++ b/patches/manual/README.adoc
@@ -1 +1 @@
-https://github.com/cirosantilli/linux-kernel-module-cheat#patches-manual
+https://github.com/cirosantilli/linux-kernel-module-cheat#patches-manual-directory
diff --git a/rootfs_overlay/lkmc/gem5.sh b/rootfs_overlay/lkmc/gem5.sh
index e480916..60520c5 100755
--- a/rootfs_overlay/lkmc/gem5.sh
+++ b/rootfs_overlay/lkmc/gem5.sh
@@ -1,5 +1,5 @@
 #!/bin/sh
-# https://github.com/cirosantilli/linux-kernel-module-cheat#gem5-restore-new-scrip
+# https://github.com/cirosantilli/linux-kernel-module-cheat#gem5-restore-new-script
 m5 checkpoint
 m5 resetstats
 m5 readfile | sh
diff --git a/test-gdb b/test-gdb
index c29529a..35f8c64 100755
--- a/test-gdb
+++ b/test-gdb
@@ -11,7 +11,7 @@ class Main(common.TestCliFunction):
     def __init__(self):
         super().__init__(
             description='''\
-https://github.com/cirosantilli/linux-kernel-module-cheat#test-gdb
+https://github.com/cirosantilli/linux-kernel-module-cheat#gdb-tests
 ''',
             defaults={
                 'mode': 'userland',
diff --git a/userland/arch/aarch64/fadd_scalar.S b/userland/arch/aarch64/fadd_scalar.S
index a5bb610..7b9ab2d 100644
--- a/userland/arch/aarch64/fadd_scalar.S
+++ b/userland/arch/aarch64/fadd_scalar.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#advanced-simd-instructions */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-fadd-instruction */
 
 #include <lkmc.h>
 
diff --git a/userland/arch/aarch64/fadd_vector.S b/userland/arch/aarch64/fadd_vector.S
index d54415e..4ce92dc 100644
--- a/userland/arch/aarch64/fadd_vector.S
+++ b/userland/arch/aarch64/fadd_vector.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-fadd-vector-instruction
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-fadd-instruction
  *
  * Add a bunch of floating point numbers in one go.
  */
diff --git a/userland/arch/aarch64/ld2.S b/userland/arch/aarch64/ld2.S
index 16f73d4..eebde98 100644
--- a/userland/arch/aarch64/ld2.S
+++ b/userland/arch/aarch64/ld2.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ld2-instruction */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-ld2-instruction */
 
 #include <lkmc.h>
 
diff --git a/userland/arch/arm/add.S b/userland/arch/arm/add.S
index f719254..0ea1b99 100644
--- a/userland/arch/arm/add.S
+++ b/userland/arch/arm/add.S
@@ -28,7 +28,7 @@ LKMC_PROLOGUE
 
 #if 0
     /* But we cannot omit the register if there is a shift when using .syntx unified:
-     * https://github.com/cirosantilli/linux-kernel-module-cheat#shift-suffixes
+     * https://github.com/cirosantilli/linux-kernel-module-cheat#arm-shift-suffixes
      */
     .syntax unified
     /* Error: garbage following instruction */
diff --git a/userland/arch/arm/clz.S b/userland/arch/arm/clz.S
index 7d96ae6..53a910f 100644
--- a/userland/arch/arm/clz.S
+++ b/userland/arch/arm/clz.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#data-bitwise-instructions */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-bitwise-instructions */
 
 #include <lkmc.h>
 
diff --git a/userland/arch/arm/immediates.S b/userland/arch/arm/immediates.S
index 3907f03..d454a2b 100644
--- a/userland/arch/arm/immediates.S
+++ b/userland/arch/arm/immediates.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-gnu-instruction-gas-assembler-immediates */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gnu-gas-assembler-immediates */
 
 #include <lkmc.h>
 
diff --git a/userland/arch/arm/inc_array.S b/userland/arch/arm/inc_array.S
index 5433aa5..f84b50d 100644
--- a/userland/arch/arm/inc_array.S
+++ b/userland/arch/arm/inc_array.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-loop-instruction-over-array */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-loop-over-array */
 
 #include <lkmc.h>
 
diff --git a/userland/arch/arm/ldmia.S b/userland/arch/arm/ldmia.S
index c7c21b4..a52cafa 100644
--- a/userland/arch/arm/ldmia.S
+++ b/userland/arch/arm/ldmia.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-loop-instruction-over-array */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ldmia-instruction */
 
 #include <lkmc.h>
 
diff --git a/userland/arch/arm/ldr_pseudo.S b/userland/arch/arm/ldr_pseudo.S
index 9a21b05..592847f 100644
--- a/userland/arch/arm/ldr_pseudo.S
+++ b/userland/arch/arm/ldr_pseudo.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ldr-instruction-pseudo-instruction */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ldr-pseudo-instruction */
 
 #include <lkmc.h>
 
diff --git a/userland/arch/arm/ldrb.S b/userland/arch/arm/ldrb.S
index 556ed68..86ce0e5 100644
--- a/userland/arch/arm/ldrb.S
+++ b/userland/arch/arm/ldrb.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ldrh-instruction-and-ldrb */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ldrh-and-ldrb-instructions */
 
 #include <lkmc.h>
 
diff --git a/userland/arch/arm/ldrh.S b/userland/arch/arm/ldrh.S
index e4bf7dc..7efca77 100644
--- a/userland/arch/arm/ldrh.S
+++ b/userland/arch/arm/ldrh.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ldrh-instruction-and-ldrb */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ldrh-and-ldrb-instructions */
 
 #include <lkmc.h>
 
diff --git a/userland/arch/arm/rev.S b/userland/arch/arm/rev.S
index 3d62e04..2ffa3b1 100644
--- a/userland/arch/arm/rev.S
+++ b/userland/arch/arm/rev.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#data-processing-instructions
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-data-processing-instructions
  *
  * Reverse byte order.
  */
diff --git a/userland/arch/arm/str.S b/userland/arch/arm/str.S
index 15aace3..1ee6065 100644
--- a/userland/arch/arm/str.S
+++ b/userland/arch/arm/str.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#load-and-store-instructions */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-str-instruction */
 
 #include <lkmc.h>
 
@@ -44,7 +44,7 @@ LKMC_PROLOGUE
      * but it will always segfault under Linux because the text segment is read-only.
      * This is however useful in baremetal programming.
      * This construct is not possible in ARMv8 for str:
-     * https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-str
+     * https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-str-instruction
      */
     str r1, var_in_same_section
 var_in_same_section:
diff --git a/userland/arch/arm/vadd_scalar.S b/userland/arch/arm/vadd_scalar.S
index f5fc4fa..63198d1 100644
--- a/userland/arch/arm/vadd_scalar.S
+++ b/userland/arch/arm/vadd_scalar.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#vfp
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-vadd-instruction
  * Adapted from: https://mindplusplus.wordpress.com/2013/06/27/arm-vfp-vector-programming-part-2-examples/ */
 
 #include <lkmc.h>
diff --git a/userland/arch/arm/vcvta.S b/userland/arch/arm/vcvta.S
index 2da2a7a..1f7d890 100644
--- a/userland/arch/arm/vcvta.S
+++ b/userland/arch/arm/vcvta.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-vcvta-instruction */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch32-vcvta-instruction */
 
 #include <lkmc.h>
 
diff --git a/userland/arch/arm/vcvtr.S b/userland/arch/arm/vcvtr.S
index 0f1e215..3e00fe6 100644
--- a/userland/arch/arm/vcvtr.S
+++ b/userland/arch/arm/vcvtr.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-vcvtrr-instruction */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-vcvtr-instruction */
 
 #include <lkmc.h>
 
diff --git a/userland/arch/x86_64/addpd.S b/userland/arch/x86_64/addpd.S
index db91964..e00ac79 100644
--- a/userland/arch/x86_64/addpd.S
+++ b/userland/arch/x86_64/addpd.S
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-addpq-instruction
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-addpd-instruction
  *
  * Add a few floating point numbers in one go (P == packaged)
  */
diff --git a/userland/arch/x86_64/rdtsc.c b/userland/arch/x86_64/rdtsc.c
index 219553c..a4c2143 100644
--- a/userland/arch/x86_64/rdtsc.c
+++ b/userland/arch/x86_64/rdtsc.c
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#rdtsc */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-rdtsc-instruction */
 
 #include <stdint.h>
 #include <stdio.h>
diff --git a/userland/c/file_write_read.c b/userland/c/file_write_read.c
index 486b65b..63b8c16 100644
--- a/userland/c/file_write_read.c
+++ b/userland/c/file_write_read.c
@@ -1,3 +1,5 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#c */
+
 #include <lkmc.h>
 
 #include <assert.h>
diff --git a/userland/cpp/hello.cpp b/userland/cpp/hello.cpp
index b9ca976..ab0f367 100644
--- a/userland/cpp/hello.cpp
+++ b/userland/cpp/hello.cpp
@@ -1,4 +1,4 @@
-// https://github.com/cirosantilli/linux-kernel-module-cheat#sanity-checks
+// https://github.com/cirosantilli/linux-kernel-module-cheat#cpp
 
 #include <iostream>
 
diff --git a/userland/linux/sched_getaffinity.c b/userland/linux/sched_getaffinity.c
index 901fece..61649b9 100644
--- a/userland/linux/sched_getaffinity.c
+++ b/userland/linux/sched_getaffinity.c
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#gdb-step-debug-multicore */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gdb-step-debug-multicore-userland */
 
 #define _GNU_SOURCE
 #include <assert.h>
diff --git a/userland/linux/sched_getaffinity_threads.c b/userland/linux/sched_getaffinity_threads.c
index 293f4d5..38a574f 100644
--- a/userland/linux/sched_getaffinity_threads.c
+++ b/userland/linux/sched_getaffinity_threads.c
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#gdb-step-debug-multicore */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gdb-step-debug-multicore-userland */
 
 #define _GNU_SOURCE
 #include <assert.h>