diff --git a/index.html b/index.html
index 944201a..ad16af9 100644
--- a/index.html
+++ b/index.html
@@ -1026,73 +1026,74 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <li><a href="#qemu">18. QEMU</a>
 <ul class="sectlevel2">
 <li><a href="#introduction-to-qemu">18.1. Introduction to QEMU</a></li>
-<li><a href="#disk-persistency">18.2. Disk persistency</a>
+<li><a href="#binary-translation">18.2. Binary translation</a></li>
+<li><a href="#disk-persistency">18.3. Disk persistency</a>
 <ul class="sectlevel3">
-<li><a href="#gem5-disk-persistency">18.2.1. gem5 disk persistency</a></li>
+<li><a href="#gem5-disk-persistency">18.3.1. gem5 disk persistency</a></li>
 </ul>
 </li>
-<li><a href="#gem5-qcow2">18.3. gem5 qcow2</a></li>
-<li><a href="#snapshot">18.4. Snapshot</a>
+<li><a href="#gem5-qcow2">18.4. gem5 qcow2</a></li>
+<li><a href="#snapshot">18.5. Snapshot</a>
 <ul class="sectlevel3">
-<li><a href="#snapshot-internals">18.4.1. Snapshot internals</a></li>
+<li><a href="#snapshot-internals">18.5.1. Snapshot internals</a></li>
 </ul>
 </li>
-<li><a href="#device-models">18.5. Device models</a>
+<li><a href="#device-models">18.6. Device models</a>
 <ul class="sectlevel3">
-<li><a href="#pci">18.5.1. PCI</a>
+<li><a href="#pci">18.6.1. PCI</a>
 <ul class="sectlevel4">
-<li><a href="#pci-min">18.5.1.1. pci_min</a></li>
-<li><a href="#qemu-edu">18.5.1.2. QEMU edu PCI device</a></li>
-<li><a href="#manipulate-pci-registers-directly">18.5.1.3. Manipulate PCI registers directly</a></li>
-<li><a href="#pciutils">18.5.1.4. pciutils</a></li>
-<li><a href="#introduction-to-pci">18.5.1.5. Introduction to PCI</a></li>
-<li><a href="#pci-bfd">18.5.1.6. PCI BFD</a></li>
-<li><a href="#pci-bar">18.5.1.7. PCI BAR</a></li>
+<li><a href="#pci-min">18.6.1.1. pci_min</a></li>
+<li><a href="#qemu-edu">18.6.1.2. QEMU edu PCI device</a></li>
+<li><a href="#manipulate-pci-registers-directly">18.6.1.3. Manipulate PCI registers directly</a></li>
+<li><a href="#pciutils">18.6.1.4. pciutils</a></li>
+<li><a href="#introduction-to-pci">18.6.1.5. Introduction to PCI</a></li>
+<li><a href="#pci-bfd">18.6.1.6. PCI BFD</a></li>
+<li><a href="#pci-bar">18.6.1.7. PCI BAR</a></li>
 </ul>
 </li>
-<li><a href="#gpio">18.5.2. GPIO</a></li>
-<li><a href="#leds">18.5.3. LEDs</a></li>
-<li><a href="#platform-device">18.5.4. platform_device</a></li>
-<li><a href="#gem5-educational-hardware-models">18.5.5. gem5 educational hardware models</a></li>
+<li><a href="#gpio">18.6.2. GPIO</a></li>
+<li><a href="#leds">18.6.3. LEDs</a></li>
+<li><a href="#platform-device">18.6.4. platform_device</a></li>
+<li><a href="#gem5-educational-hardware-models">18.6.5. gem5 educational hardware models</a></li>
 </ul>
 </li>
-<li><a href="#qemu-monitor">18.6. QEMU monitor</a>
+<li><a href="#qemu-monitor">18.7. QEMU monitor</a>
 <ul class="sectlevel3">
-<li><a href="#qemu-monitor-from-guest">18.6.1. QEMU monitor from guest</a></li>
-<li><a href="#qemu-monitor-from-gdb">18.6.2. QEMU monitor from GDB</a></li>
+<li><a href="#qemu-monitor-from-guest">18.7.1. QEMU monitor from guest</a></li>
+<li><a href="#qemu-monitor-from-gdb">18.7.2. QEMU monitor from GDB</a></li>
 </ul>
 </li>
-<li><a href="#debug-the-emulator">18.7. Debug the emulator</a>
+<li><a href="#debug-the-emulator">18.8. Debug the emulator</a>
 <ul class="sectlevel3">
-<li><a href="#reverse-debug-the-emulator">18.7.1. Reverse debug the emulator</a></li>
-<li><a href="#debug-gem5-python-scripts">18.7.2. Debug gem5 Python scripts</a></li>
+<li><a href="#reverse-debug-the-emulator">18.8.1. Reverse debug the emulator</a></li>
+<li><a href="#debug-gem5-python-scripts">18.8.2. Debug gem5 Python scripts</a></li>
 </ul>
 </li>
-<li><a href="#tracing">18.8. Tracing</a>
+<li><a href="#tracing">18.9. Tracing</a>
 <ul class="sectlevel3">
-<li><a href="#qemu-d-tracing">18.8.1. QEMU -d tracing</a></li>
-<li><a href="#qemu-trace-register-values">18.8.2. QEMU trace register values</a></li>
-<li><a href="#qemu-trace-memory-accesses">18.8.3. QEMU trace memory accesses</a></li>
-<li><a href="#trace-source-lines">18.8.4. Trace source lines</a></li>
-<li><a href="#qemu-record-and-replay">18.8.5. QEMU record and replay</a>
+<li><a href="#qemu-d-tracing">18.9.1. QEMU -d tracing</a></li>
+<li><a href="#qemu-trace-register-values">18.9.2. QEMU trace register values</a></li>
+<li><a href="#qemu-trace-memory-accesses">18.9.3. QEMU trace memory accesses</a></li>
+<li><a href="#trace-source-lines">18.9.4. Trace source lines</a></li>
+<li><a href="#qemu-record-and-replay">18.9.5. QEMU record and replay</a>
 <ul class="sectlevel4">
-<li><a href="#qemu-reverse-debugging">18.8.5.1. QEMU reverse debugging</a></li>
+<li><a href="#qemu-reverse-debugging">18.9.5.1. QEMU reverse debugging</a></li>
 </ul>
 </li>
-<li><a href="#qemu-trace-multicore">18.8.6. QEMU trace multicore</a></li>
-<li><a href="#qemu-get-guest-instruction-count">18.8.7. QEMU get guest instruction count</a></li>
-<li><a href="#gem5-tracing">18.8.8. gem5 tracing</a>
+<li><a href="#qemu-trace-multicore">18.9.6. QEMU trace multicore</a></li>
+<li><a href="#qemu-get-guest-instruction-count">18.9.7. QEMU get guest instruction count</a></li>
+<li><a href="#gem5-tracing">18.9.8. gem5 tracing</a>
 <ul class="sectlevel4">
-<li><a href="#gem5-trace-internals">18.8.8.1. gem5 trace internals</a></li>
-<li><a href="#gem5-execall-trace-format">18.8.8.2. gem5 ExecAll trace format</a></li>
-<li><a href="#gem5-registers-trace-format">18.8.8.3. gem5 Registers trace format</a></li>
-<li><a href="#gem5-tarmac-traces">18.8.8.4. gem5 TARMAC traces</a></li>
-<li><a href="#gem5-tracing-internals">18.8.8.5. gem5 tracing internals</a></li>
+<li><a href="#gem5-trace-internals">18.9.8.1. gem5 trace internals</a></li>
+<li><a href="#gem5-execall-trace-format">18.9.8.2. gem5 ExecAll trace format</a></li>
+<li><a href="#gem5-registers-trace-format">18.9.8.3. gem5 Registers trace format</a></li>
+<li><a href="#gem5-tarmac-traces">18.9.8.4. gem5 TARMAC traces</a></li>
+<li><a href="#gem5-tracing-internals">18.9.8.5. gem5 tracing internals</a></li>
 </ul>
 </li>
 </ul>
 </li>
-<li><a href="#qemu-gui-is-unresponsive">18.9. QEMU GUI is unresponsive</a></li>
+<li><a href="#qemu-gui-is-unresponsive">18.10. QEMU GUI is unresponsive</a></li>
 </ul>
 </li>
 <li><a href="#gem5">19. gem5</a>
@@ -1140,6 +1141,7 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <li><a href="#gem5-fast-forward">19.5.4.1. gem5 fast forward</a></li>
 </ul>
 </li>
+<li><a href="#gem5-checkpoint-upgrader">19.5.5. gem5 checkpoint upgrader</a></li>
 </ul>
 </li>
 <li><a href="#pass-extra-options-to-gem5">19.6. Pass extra options to gem5</a></li>
@@ -1224,7 +1226,13 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 </ul>
 </li>
 <li><a href="#gem5-minorcpu">19.16.1.2. gem5 MinorCPU</a></li>
-<li><a href="#gem5-derivo3cpu">19.16.1.3. gem5 DerivO3CPU</a></li>
+<li><a href="#gem5-derivo3cpu">19.16.1.3. gem5 <code>DerivO3CPU</code></a>
+<ul class="sectlevel5">
+<li><a href="#gem5-derivo3cpu-pipeline-stages">19.16.1.3.1. gem5 <code>DerivO3CPU</code> pipeline stages</a></li>
+<li><a href="#gem5-utilo3-pipeview-py-o3-pipeline-viewer">19.16.1.3.2. gem5 util/o3-pipeview.py O3 pipeline viewer</a></li>
+<li><a href="#gem5-konata-o3-pipeline-viewer">19.16.1.3.3. gem5 Konata O3 pipeline viewer</a></li>
+</ul>
+</li>
 </ul>
 </li>
 <li><a href="#gem5-arm-rsk">19.16.2. gem5 ARM RSK</a></li>
@@ -1296,8 +1304,22 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <li><a href="#gem5-event-queue-atomicsimplecpu-syscall-emulation-freestanding-example-analysis-with-caches-and-multiple-cpus-and-ruby">19.20.4.4.1. gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs and Ruby</a></li>
 </ul>
 </li>
-<li><a href="#gem5-event-queue-minorcpu-syscall-emulation-freestanding-example-analysis">19.20.4.5. gem5 event queue MinorCPU syscall emulation freestanding example analysis</a></li>
-<li><a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis">19.20.4.6. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis</a></li>
+<li><a href="#gem5-event-queue-minorcpu-syscall-emulation-freestanding-example-analysis">19.20.4.5. gem5 event queue MinorCPU syscall emulation freestanding example analysis</a>
+<ul class="sectlevel5">
+<li><a href="#gem5-event-queue-minorcpu-syscall-emulation-freestanding-example-analysis-hazard">19.20.4.5.1. gem5 event queue MinorCPU syscall emulation freestanding example analysis: hazard</a></li>
+</ul>
+</li>
+<li><a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis">19.20.4.6. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis</a>
+<ul class="sectlevel5">
+<li><a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazardless">19.20.4.6.1. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazardless</a></li>
+<li><a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazard">19.20.4.6.2. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard</a></li>
+<li><a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazard4">19.20.4.6.3. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard4</a></li>
+<li><a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall">19.20.4.6.4. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall</a></li>
+<li><a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall-gain">19.20.4.6.5. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-gain</a></li>
+<li><a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall-hazard4">19.20.4.6.6. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-hazard4</a></li>
+<li><a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-speculative">19.20.4.6.7. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: speculative</a></li>
+</ul>
+</li>
 </ul>
 </li>
 <li><a href="#gem5-instruction-definitions">19.20.5. gem5 instruction definitions</a>
@@ -1336,7 +1358,12 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <li><a href="#gem5-process">19.20.7.4. gem5 <code>Process</code></a></li>
 </ul>
 </li>
-<li><a href="#gem5-functional-units">19.20.8. gem5 functional units</a></li>
+<li><a href="#gem5-functional-units">19.20.8. gem5 functional units</a>
+<ul class="sectlevel4">
+<li><a href="#gem5-minorcpu-default-functional-units">19.20.8.1. gem5 <code>MinorCPU</code> default functional units</a></li>
+<li><a href="#gem5-derivo3cpu-default-functional-units">19.20.8.2. gem5 DerivO3CPU default functional units</a></li>
+</ul>
+</li>
 <li><a href="#gem5-code-generation">19.20.9. gem5 code generation</a>
 <ul class="sectlevel4">
 <li><a href="#gem5-the-isa">19.20.9.1. gem5 THE_ISA</a></li>
@@ -1352,6 +1379,7 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 </li>
 </ul>
 </li>
+<li><a href="#gensim">19.21. Gensim</a></li>
 </ul>
 </li>
 <li><a href="#buildroot">20. Buildroot</a>
@@ -1414,7 +1442,11 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <li><a href="#gcc-c-extensions">21.1.3. GCC C extensions</a>
 <ul class="sectlevel4">
 <li><a href="#c-empty-struct">21.1.3.1. C empty struct</a></li>
-<li><a href="#openmp">21.1.3.2. OpenMP</a></li>
+<li><a href="#openmp">21.1.3.2. OpenMP</a>
+<ul class="sectlevel5">
+<li><a href="#openmp-validation">21.1.3.2.1. OpenMP validation</a></li>
+</ul>
+</li>
 </ul>
 </li>
 </ul>
@@ -1508,14 +1540,15 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <ul class="sectlevel3">
 <li><a href="#boost">21.8.1. Boost</a></li>
 <li><a href="#dhrystone">21.8.2. Dhrystone</a></li>
-<li><a href="#stream-benchmark">21.8.3. STREAM benchmark</a></li>
-<li><a href="#parsec-benchmark">21.8.4. PARSEC benchmark</a>
+<li><a href="#lmbench">21.8.3. LMbench</a></li>
+<li><a href="#stream-benchmark">21.8.4. STREAM benchmark</a></li>
+<li><a href="#parsec-benchmark">21.8.5. PARSEC benchmark</a>
 <ul class="sectlevel4">
-<li><a href="#parsec-benchmark-without-parsecmgmt">21.8.4.1. PARSEC benchmark without parsecmgmt</a></li>
-<li><a href="#parsec-change-the-input-size">21.8.4.2. PARSEC change the input size</a></li>
-<li><a href="#parsec-benchmark-with-parsecmgmt">21.8.4.3. PARSEC benchmark with parsecmgmt</a></li>
-<li><a href="#parsec-uninstall">21.8.4.4. PARSEC uninstall</a></li>
-<li><a href="#parsec-benchmark-hacking">21.8.4.5. PARSEC benchmark hacking</a></li>
+<li><a href="#parsec-benchmark-without-parsecmgmt">21.8.5.1. PARSEC benchmark without parsecmgmt</a></li>
+<li><a href="#parsec-change-the-input-size">21.8.5.2. PARSEC change the input size</a></li>
+<li><a href="#parsec-benchmark-with-parsecmgmt">21.8.5.3. PARSEC benchmark with parsecmgmt</a></li>
+<li><a href="#parsec-uninstall">21.8.5.4. PARSEC uninstall</a></li>
+<li><a href="#parsec-benchmark-hacking">21.8.5.5. PARSEC benchmark hacking</a></li>
 </ul>
 </li>
 </ul>
@@ -1800,6 +1833,11 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <ul class="sectlevel3">
 <li><a href="#arm-nop-instruction">24.5.1. ARM NOP instruction</a></li>
 <li><a href="#arm-udf-instruction">24.5.2. ARM UDF instruction</a></li>
+<li><a href="#arm-system-register-instructions">24.5.3. ARM system register instructions</a>
+<ul class="sectlevel4">
+<li><a href="#arm-system-register-encodings">24.5.3.1. ARM system register encodings</a></li>
+</ul>
+</li>
 </ul>
 </li>
 <li><a href="#arm-simd">24.6. ARM SIMD</a>
@@ -1866,13 +1904,16 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <li><a href="#armarm7">24.9.2.1. ARMv7 architecture reference manual</a></li>
 <li><a href="#armarm8">24.9.2.2. ARMv8 architecture reference manual</a></li>
 <li><a href="#armarm8-db">24.9.2.3. ARMv8 architecture reference manual db</a></li>
-<li><a href="#armv8-programmers-guide">24.9.2.4. Programmer&#8217;s Guide for ARMv8-A</a></li>
-<li><a href="#arm-a64-instruction-set-architecture-future-architecture-technologies-in-the-a-architecture-profile-documentation">24.9.2.5. Arm A64 Instruction Set Architecture: Future Architecture Technologies in the A architecture profile Documentation</a></li>
-<li><a href="#arm-processor-documentation">24.9.2.6. ARM processor documentation</a>
+<li><a href="#armarm8-fa">24.9.2.4. ARMv8 architecture reference manual db</a></li>
+<li><a href="#armv8-programmers-guide">24.9.2.5. Programmer&#8217;s Guide for ARMv8-A</a></li>
+<li><a href="#arm-a64-instruction-set-architecture-future-architecture-technologies-in-the-a-architecture-profile-documentation">24.9.2.6. Arm A64 Instruction Set Architecture: Future Architecture Technologies in the A architecture profile Documentation</a></li>
+<li><a href="#arm-processor-documentation">24.9.2.7. ARM processor documentation</a>
 <ul class="sectlevel5">
-<li><a href="#arm-cortex15-trm">24.9.2.6.1. ARM Cortex-A15 MPCore Processor Technical Reference Manual r4p0</a></li>
+<li><a href="#arm-cortex15-trm">24.9.2.7.1. ARM Cortex-A15 MPCore Processor Technical Reference Manual r4p0</a></li>
 </ul>
 </li>
+<li><a href="#arm-cortex-a77-trm">24.9.2.8. Arm Cortex‑A77 Technical Reference Manual r1p1</a></li>
+<li><a href="#arm-cortex-a77-sog">24.9.2.9. Arm Cortex‑A77 Software Optimization Guide r1p1</a></li>
 </ul>
 </li>
 </ul>
@@ -1886,7 +1927,11 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <li><a href="#baremetal-gdb-step-debug">27.1. Baremetal GDB step debug</a></li>
 <li><a href="#baremetal-bootloaders">27.2. Baremetal bootloaders</a></li>
 <li><a href="#baremetal-linker-script">27.3. Baremetal linker script</a></li>
-<li><a href="#baremetal-command-line-arguments">27.4. Baremetal command line arguments</a></li>
+<li><a href="#baremetal-command-line-arguments">27.4. Baremetal command line arguments</a>
+<ul class="sectlevel3">
+<li><a href="#gem5-baremetal-arm-cli-args">27.4.1. gem5 baremetal arm CLI args</a></li>
+</ul>
+</li>
 <li><a href="#semihosting">27.5. Semihosting</a>
 <ul class="sectlevel3">
 <li><a href="#gem5-semihosting">27.5.1. gem5 semihosting</a></li>
@@ -2034,26 +2079,41 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <li><a href="#classic-risc-pipeline">32.1.1. Classic RISC pipeline</a></li>
 </ul>
 </li>
-<li><a href="#superscalar-processor">32.2. Superscalar processor</a></li>
-<li><a href="#out-of-order-execution">32.3. Out-of-order execution</a></li>
-<li><a href="#hardware-threads">32.4. Hardware threads</a></li>
-<li><a href="#cache-coherence">32.5. Cache coherence</a>
+<li><a href="#superscalar-processor">32.2. Superscalar processor</a>
 <ul class="sectlevel3">
-<li><a href="#memory-consistency">32.5.1. Memory consistency</a>
-<ul class="sectlevel4">
-<li><a href="#sequential-consistency">32.5.1.1. Sequential Consistency</a></li>
+<li><a href="#execution-unit">32.2.1. Execution unit</a></li>
 </ul>
 </li>
-<li><a href="#can-caches-snoop-data-from-other-caches">32.5.2. Can caches snoop data from other caches?</a></li>
-<li><a href="#vi-cache-coherence-protocol">32.5.3. VI cache coherence protocol</a></li>
-<li><a href="#msi-cache-coherence-protocol">32.5.4. MSI cache coherence protocol</a>
+<li><a href="#out-of-order-execution">32.3. Out-of-order execution</a>
+<ul class="sectlevel3">
+<li><a href="#speculative-execution">32.3.1. Speculative execution</a>
 <ul class="sectlevel4">
-<li><a href="#msi-cache-coherence-protocol-with-transient-states">32.5.4.1. MSI cache coherence protocol with transient states</a></li>
+<li><a href="#branch-predictor">32.3.1.1. Branch predictor</a></li>
 </ul>
 </li>
-<li><a href="#mesi-cache-coherence-protocol">32.5.5. MESI cache coherence protocol</a></li>
-<li><a href="#mosi-cache-coherence-protocol">32.5.6. MOSI cache coherence protocol</a></li>
-<li><a href="#moesi">32.5.7. MOESI cache coherence protocol</a></li>
+<li><a href="#re-order-buffer">32.3.2. Re-order buffer</a></li>
+<li><a href="#register-renaming">32.3.3. Register renaming</a></li>
+</ul>
+</li>
+<li><a href="#instruction-level-parallelism">32.4. Instruction level parallelism</a></li>
+<li><a href="#hardware-threads">32.5. Hardware threads</a></li>
+<li><a href="#cache-coherence">32.6. Cache coherence</a>
+<ul class="sectlevel3">
+<li><a href="#memory-consistency">32.6.1. Memory consistency</a>
+<ul class="sectlevel4">
+<li><a href="#sequential-consistency">32.6.1.1. Sequential Consistency</a></li>
+</ul>
+</li>
+<li><a href="#can-caches-snoop-data-from-other-caches">32.6.2. Can caches snoop data from other caches?</a></li>
+<li><a href="#vi-cache-coherence-protocol">32.6.3. VI cache coherence protocol</a></li>
+<li><a href="#msi-cache-coherence-protocol">32.6.4. MSI cache coherence protocol</a>
+<ul class="sectlevel4">
+<li><a href="#msi-cache-coherence-protocol-with-transient-states">32.6.4.1. MSI cache coherence protocol with transient states</a></li>
+</ul>
+</li>
+<li><a href="#mesi-cache-coherence-protocol">32.6.5. MESI cache coherence protocol</a></li>
+<li><a href="#mosi-cache-coherence-protocol">32.6.6. MOSI cache coherence protocol</a></li>
+<li><a href="#moesi">32.6.7. MOESI cache coherence protocol</a></li>
 </ul>
 </li>
 </ul>
@@ -2089,7 +2149,11 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <li><a href="#ccache">33.9. ccache</a></li>
 <li><a href="#getvar">33.10. getvar</a>
 <ul class="sectlevel3">
-<li><a href="#run-toolchain">33.10.1. run-toolchain</a></li>
+<li><a href="#run-toolchain">33.10.1. run-toolchain</a>
+<ul class="sectlevel4">
+<li><a href="#disas">33.10.1.1. disas</a></li>
+</ul>
+</li>
 </ul>
 </li>
 <li><a href="#rebuild-buildroot-while-running">33.11. Rebuild Buildroot while running</a></li>
@@ -2107,79 +2171,80 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <li><a href="#buildroot-build-variants">33.13.4. Buildroot build variants</a></li>
 </ul>
 </li>
-<li><a href="#directory-structure">33.14. Directory structure</a>
+<li><a href="#optimization-level-of-a-build">33.14. Optimization level of a build</a></li>
+<li><a href="#directory-structure">33.15. Directory structure</a>
 <ul class="sectlevel3">
-<li><a href="#lkmc-directory">33.14.1. lkmc directory</a>
+<li><a href="#lkmc-directory">33.15.1. lkmc directory</a>
 <ul class="sectlevel4">
-<li><a href="#userland-objects-vs-header-only">33.14.1.1. Userland objects vs header-only</a></li>
+<li><a href="#userland-objects-vs-header-only">33.15.1.1. Userland objects vs header-only</a></li>
 </ul>
 </li>
-<li><a href="#buildroot_packages-directory">33.14.2. buildroot_packages directory</a>
+<li><a href="#buildroot_packages-directory">33.15.2. buildroot_packages directory</a>
 <ul class="sectlevel4">
-<li><a href="#kernel-modules-buildroot-package">33.14.2.1. kernel_modules buildroot package</a></li>
+<li><a href="#kernel-modules-buildroot-package">33.15.2.1. kernel_modules buildroot package</a></li>
 </ul>
 </li>
-<li><a href="#patches-directory">33.14.3. patches directory</a>
+<li><a href="#patches-directory">33.15.3. patches directory</a>
 <ul class="sectlevel4">
-<li><a href="#patches-global-directory">33.14.3.1. patches/global directory</a></li>
-<li><a href="#patches-manual-directory">33.14.3.2. patches/manual directory</a></li>
+<li><a href="#patches-global-directory">33.15.3.1. patches/global directory</a></li>
+<li><a href="#patches-manual-directory">33.15.3.2. patches/manual directory</a></li>
 </ul>
 </li>
-<li><a href="#rootfs_overlay">33.14.4. rootfs_overlay</a>
+<li><a href="#rootfs_overlay">33.15.4. rootfs_overlay</a>
 <ul class="sectlevel4">
-<li><a href="#out_rootfs_overlay_dir">33.14.4.1. out_rootfs_overlay_dir</a></li>
+<li><a href="#out_rootfs_overlay_dir">33.15.4.1. out_rootfs_overlay_dir</a></li>
 </ul>
 </li>
-<li><a href="#lkmc-c">33.14.5. lkmc.c</a></li>
-<li><a href="#lkmc_home">33.14.6. lkmc_home</a></li>
-<li><a href="#path-properties">33.14.7. path_properties.py</a></li>
-<li><a href="#rand_check-out">33.14.8. rand_check.out</a></li>
+<li><a href="#lkmc-c">33.15.5. lkmc.c</a></li>
+<li><a href="#lkmc_home">33.15.6. lkmc_home</a></li>
+<li><a href="#path-properties">33.15.7. path_properties.py</a></li>
+<li><a href="#rand_check-out">33.15.8. rand_check.out</a></li>
 </ul>
 </li>
-<li><a href="#test-this-repo">33.15. Test this repo</a>
+<li><a href="#test-this-repo">33.16. Test this repo</a>
 <ul class="sectlevel3">
-<li><a href="#automated-tests">33.15.1. Automated tests</a>
+<li><a href="#automated-tests">33.16.1. Automated tests</a>
 <ul class="sectlevel4">
-<li><a href="#test-arch-and-emulator-selection">33.15.1.1. Test arch and emulator selection</a></li>
-<li><a href="#quit-on-fail">33.15.1.2. Quit on fail</a></li>
-<li><a href="#test-userland-in-full-system">33.15.1.3. Test userland in full system</a></li>
-<li><a href="#gdb-tests">33.15.1.4. GDB tests</a></li>
-<li><a href="#magic-failure-string">33.15.1.5. Magic failure string</a></li>
+<li><a href="#test-arch-and-emulator-selection">33.16.1.1. Test arch and emulator selection</a></li>
+<li><a href="#quit-on-fail">33.16.1.2. Quit on fail</a></li>
+<li><a href="#test-userland-in-full-system">33.16.1.3. Test userland in full system</a></li>
+<li><a href="#gdb-tests">33.16.1.4. GDB tests</a></li>
+<li><a href="#magic-failure-string">33.16.1.5. Magic failure string</a></li>
 </ul>
 </li>
-<li><a href="#non-automated-tests">33.15.2. Non-automated tests</a>
+<li><a href="#non-automated-tests">33.16.2. Non-automated tests</a>
 <ul class="sectlevel4">
-<li><a href="#test-gdb-linux-kernel">33.15.2.1. Test GDB Linux kernel</a></li>
-<li><a href="#test-the-internet">33.15.2.2. Test the Internet</a></li>
-<li><a href="#cli-script-tests">33.15.2.3. CLI script tests</a></li>
+<li><a href="#test-gdb-linux-kernel">33.16.2.1. Test GDB Linux kernel</a></li>
+<li><a href="#test-the-internet">33.16.2.2. Test the Internet</a></li>
+<li><a href="#cli-script-tests">33.16.2.3. CLI script tests</a></li>
 </ul>
 </li>
 </ul>
 </li>
-<li><a href="#bisection">33.16. Bisection</a></li>
-<li><a href="#update-a-forked-submodule">33.17. Update a forked submodule</a></li>
-<li><a href="#release">33.18. Release</a>
+<li><a href="#bisection">33.17. Bisection</a></li>
+<li><a href="#update-a-forked-submodule">33.18. Update a forked submodule</a></li>
+<li><a href="#release">33.19. Release</a>
 <ul class="sectlevel3">
-<li><a href="#release-procedure">33.18.1. Release procedure</a></li>
-<li><a href="#release-zip">33.18.2. release-zip</a></li>
-<li><a href="#release-upload">33.18.3. release-upload</a></li>
+<li><a href="#release-procedure">33.19.1. Release procedure</a></li>
+<li><a href="#release-zip">33.19.2. release-zip</a></li>
+<li><a href="#release-upload">33.19.3. release-upload</a></li>
 </ul>
 </li>
-<li><a href="#design-rationale">33.19. Design rationale</a>
+<li><a href="#design-rationale">33.20. Design rationale</a>
 <ul class="sectlevel3">
-<li><a href="#design-goals">33.19.1. Design goals</a></li>
-<li><a href="#setup-trade-offs">33.19.2. Setup trade-offs</a></li>
-<li><a href="#resource-tradeoff-guidelines">33.19.3. Resource tradeoff guidelines</a></li>
-<li><a href="#linux-distro-choice">33.19.4. Linux distro choice</a></li>
+<li><a href="#design-goals">33.20.1. Design goals</a></li>
+<li><a href="#setup-trade-offs">33.20.2. Setup trade-offs</a></li>
+<li><a href="#resource-tradeoff-guidelines">33.20.3. Resource tradeoff guidelines</a></li>
+<li><a href="#linux-distro-choice">33.20.4. Linux distro choice</a></li>
 </ul>
 </li>
-<li><a href="#soft-topics">33.20. Soft topics</a>
+<li><a href="#soft-topics">33.21. Soft topics</a>
 <ul class="sectlevel3">
-<li><a href="#fairy-tale">33.20.1. Fairy tale</a></li>
-<li><a href="#should-you-waste-your-life-with-systems-programming">33.20.2. Should you waste your life with systems programming?</a></li>
+<li><a href="#fairy-tale">33.21.1. Fairy tale</a></li>
+<li><a href="#should-you-waste-your-life-with-systems-programming">33.21.2. Should you waste your life with systems programming?</a></li>
 </ul>
 </li>
-<li><a href="#bibliography">33.21. Bibliography</a></li>
+<li><a href="#bibliography">33.22. Bibliography</a></li>
 </ul>
 </li>
 </ul>
@@ -2196,7 +2261,7 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <p>If you don&#8217;t know which one to go for, start with <a href="#qemu-buildroot-setup-getting-started">QEMU Buildroot setup getting started</a>.</p>
 </div>
 <div class="paragraph">
-<p>Design goals of this project are documented at: <a href="#design-goals">Section 33.19.1, &#8220;Design goals&#8221;</a>.</p>
+<p>Design goals of this project are documented at: <a href="#design-goals">Section 33.20.1, &#8220;Design goals&#8221;</a>.</p>
 </div>
 <div class="sect2">
 <h3 id="qemu-buildroot-setup"><a class="anchor" href="#qemu-buildroot-setup"></a><a class="link" href="#qemu-buildroot-setup">1.1. QEMU Buildroot setup</a></h3>
@@ -2613,10 +2678,10 @@ hello /root/.profile
 <p>If you really want to develop semiconductors, your only choice is to join an university or a semiconductor company that has the EDA licenses.</p>
 </div>
 <div class="paragraph">
-<p>See also: <a href="#should-you-waste-your-life-with-systems-programming">Section 33.20.2, &#8220;Should you waste your life with systems programming?&#8221;</a>.</p>
+<p>See also: <a href="#should-you-waste-your-life-with-systems-programming">Section 33.21.2, &#8220;Should you waste your life with systems programming?&#8221;</a>.</p>
 </div>
 <div class="paragraph">
-<p>While hacking QEMU, you will likely want to GDB step its source. That is trivial since QEMU is just another userland program like any other, but our setup has a shortcut to make it even more convenient, see: <a href="#debug-the-emulator">Section 18.7, &#8220;Debug the emulator&#8221;</a>.</p>
+<p>While hacking QEMU, you will likely want to GDB step its source. That is trivial since QEMU is just another userland program like any other, but our setup has a shortcut to make it even more convenient, see: <a href="#debug-the-emulator">Section 18.8, &#8220;Debug the emulator&#8221;</a>.</p>
 </div>
 </div>
 <div class="sect4">
@@ -3752,7 +3817,7 @@ cd userland
 <p>Here we used <code>--force-rebuild</code> to force rebuild since the sources weren&#8217;t modified since the last build.</p>
 </div>
 <div class="paragraph">
-<p>Some CLI options have more specialized flags, e.g. <code>-O</code> optimization level:</p>
+<p>Some CLI options have more specialized flags, e.g. <code>-O</code> for the <a href="#optimization-level-of-a-build">Optimization level of a build</a>:</p>
 </div>
 <div class="literalblock">
 <div class="content">
@@ -3801,7 +3866,7 @@ cd userland
 <div class="ulist">
 <ul>
 <li>
-<p>put the host executables in a separate <a href="#build-variants">build-variant</a> to avoid conflict with Buildroot builds.</p>
+<p>put the host executables in a separate <a href="#build-variants">build variant</a> to avoid conflict with Buildroot builds.</p>
 </li>
 <li>
 <p>ran with the <code>--emulator native</code> option to run the program natively</p>
@@ -3817,7 +3882,7 @@ cd userland
 </div>
 </div>
 <div class="paragraph">
-<p>as shown at: <a href="#debug-the-emulator">Section 18.7, &#8220;Debug the emulator&#8221;</a>, although direct GDB host usage works as well of course.</p>
+<p>as shown at: <a href="#debug-the-emulator">Section 18.8, &#8220;Debug the emulator&#8221;</a>, although direct GDB host usage works as well of course.</p>
 </div>
 </div>
 <div class="sect4">
@@ -4309,6 +4374,9 @@ continue</pre>
 <div class="paragraph">
 <p>So get ready for some weird jumps, and <code>&lt;value optimized out&gt;</code> fun. Why, Linux, why.</p>
 </div>
+<div class="paragraph">
+<p>The <code>-O</code> level of some other userland content can be controlled as explained at: <a href="#optimization-level-of-a-build">Optimization level of a build</a>.</p>
+</div>
 </div>
 </div>
 <div class="sect2">
@@ -6916,7 +6984,7 @@ cat f
 <p>which can be good for automated tests, as it ensures that you are using a pristine unmodified system image every time.</p>
 </div>
 <div class="paragraph">
-<p>Not however that we already disable disk persistency by default on ext2 filesystems even without <code>--initrd</code>: <a href="#disk-persistency">Section 18.2, &#8220;Disk persistency&#8221;</a>.</p>
+<p>Note however that we already disable disk persistency by default on ext2 filesystems even without <code>--initrd</code>: <a href="#disk-persistency">Section 18.3, &#8220;Disk persistency&#8221;</a>.</p>
 </div>
 <div class="paragraph">
 <p>One downside of this method is that it has to put the entire filesystem into memory, and could lead to a panic:</p>
@@ -7469,6 +7537,19 @@ sudo ./setup -y</pre>
 </div>
 </div>
 <div class="paragraph">
+<p>also mentioned at:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><a href="https://stackoverflow.com/questions/62687463/gem5-kvm-doesnt-work-with-error-0x80000021" class="bare">https://stackoverflow.com/questions/62687463/gem5-kvm-doesnt-work-with-error-0x80000021</a></p>
+</li>
+<li>
+<p><a href="https://gem5-users.gem5.narkive.com/8DBihuUx/running-fs-py-with-x86kvmcpu-failed" class="bare">https://gem5-users.gem5.narkive.com/8DBihuUx/running-fs-py-with-x86kvmcpu-failed</a></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
 <p>Bibliography:</p>
 </div>
 <div class="ulist">
@@ -7679,7 +7760,7 @@ qw er</pre>
 <p>The gem5 tests require building statically with build id <code>static</code>, see also: <a href="#gem5-syscall-emulation-mode">Section 10.7, &#8220;gem5 syscall emulation mode&#8221;</a>. TODO automate this better.</p>
 </div>
 <div class="paragraph">
-<p>See: <a href="#test-this-repo">Section 33.15, &#8220;Test this repo&#8221;</a> for more useful testing tips.</p>
+<p>See: <a href="#test-this-repo">Section 33.16, &#8220;Test this repo&#8221;</a> for more useful testing tips.</p>
 </div>
 </div>
 <div class="sect2">
@@ -8531,7 +8612,7 @@ Program aborted at tick 0</pre>
 <div class="ulist">
 <ul>
 <li>
-<p>modules built with Buildroot, see: <a href="#kernel-modules-buildroot-package">Section 33.14.2.1, &#8220;kernel_modules buildroot package&#8221;</a></p>
+<p>modules built with Buildroot, see: <a href="#kernel-modules-buildroot-package">Section 33.15.2.1, &#8220;kernel_modules buildroot package&#8221;</a></p>
 </li>
 <li>
 <p>modules built from the kernel tree itself, see: <a href="#dummy-irq">Section 15.12.2, &#8220;dummy-irq&#8221;</a></p>
@@ -9438,7 +9519,7 @@ xeyes</pre>
 <div class="sect2">
 <h3 id="enable-networking"><a class="anchor" href="#enable-networking"></a><a class="link" href="#enable-networking">14.1. Enable networking</a></h3>
 <div class="paragraph">
-<p>We disable networking by default because it starts an userland process, and we want to keep the number of userland processes to a minimum to make the system more understandable as explained at: <a href="#resource-tradeoff-guidelines">Section 33.19.3, &#8220;Resource tradeoff guidelines&#8221;</a></p>
+<p>We disable networking by default because it starts a userland process, and we want to keep the number of userland processes to a minimum to make the system more understandable as explained at: <a href="#resource-tradeoff-guidelines">Section 33.20.3, &#8220;Resource tradeoff guidelines&#8221;</a></p>
 </div>
 <div class="paragraph">
 <p>To enable networking on Buildroot, simply run:</p>
@@ -10287,15 +10368,15 @@ git log | grep -E '    Linux [0-9]+\.' | head</pre>
 <p>This also makes this repo the perfect setup to develop the Linux kernel.</p>
 </div>
 <div class="paragraph">
-<p>In case something breaks while updating the Linux kernel, you can try to bisect it to understand the root cause, see: <a href="#bisection">Section 33.16, &#8220;Bisection&#8221;</a>.</p>
+<p>In case something breaks while updating the Linux kernel, you can try to bisect it to understand the root cause, see: <a href="#bisection">Section 33.17, &#8220;Bisection&#8221;</a>.</p>
 </div>
 <div class="sect4">
 <h5 id="update-the-linux-kernel-lkmc-procedure"><a class="anchor" href="#update-the-linux-kernel-lkmc-procedure"></a><a class="link" href="#update-the-linux-kernel-lkmc-procedure">15.2.2.1. Update the Linux kernel LKMC procedure</a></h5>
 <div class="paragraph">
-<p>First, use use the branching procedure described at: <a href="#update-a-forked-submodule">Section 33.17, &#8220;Update a forked submodule&#8221;</a></p>
+<p>First, use the branching procedure described at: <a href="#update-a-forked-submodule">Section 33.18, &#8220;Update a forked submodule&#8221;</a></p>
 </div>
 <div class="paragraph">
-<p>Because the kernel is so central to this repository, almost all tests must be re-run, so basically just follow the full testing procedure described at: <a href="#test-this-repo">Section 33.15, &#8220;Test this repo&#8221;</a>. The only tests that can be skipped are essentially the <a href="#baremetal">Baremetal</a> tests.</p>
+<p>Because the kernel is so central to this repository, almost all tests must be re-run, so basically just follow the full testing procedure described at: <a href="#test-this-repo">Section 33.16, &#8220;Test this repo&#8221;</a>. The only tests that can be skipped are essentially the <a href="#baremetal">Baremetal</a> tests.</p>
 </div>
 <div class="paragraph">
 <p>Before comitting, don&#8217;t forget to update:</p>
@@ -12587,7 +12668,10 @@ echo $?</pre>
 <div class="sect3">
 <h4 id="file-operations"><a class="anchor" href="#file-operations"></a><a class="link" href="#file-operations">15.9.1. File operations</a></h4>
 <div class="paragraph">
-<p>File operations are the main method of userland driver communication. <code>struct file_operations</code> determines what the kernel will do on filesystem system calls of <a href="#pseudo-filesystems">Pseudo filesystems</a>.</p>
+<p>File operations are the main method of userland driver communication.</p>
+</div>
+<div class="paragraph">
+<p><code>struct file_operations</code> determines what the kernel will do on filesystem system calls of <a href="#pseudo-filesystems">Pseudo filesystems</a>.</p>
 </div>
 <div class="paragraph">
 <p>This example illustrates the most basic system calls: <code>open</code>, <code>read</code>, <code>write</code>, <code>close</code> and <code>lseek</code>:</p>
@@ -12741,15 +12825,7 @@ cd</pre>
 <div class="sect3">
 <h4 id="poll"><a class="anchor" href="#poll"></a><a class="link" href="#poll">15.9.3. poll</a></h4>
 <div class="paragraph">
-<p>The poll system call allows an user process to do a non-busy wait on a kernel event:</p>
-</div>
-<div class="literalblock">
-<div class="content">
-<pre>./poll.sh</pre>
-</div>
-</div>
-<div class="paragraph">
-<p>Outcome: <code>jiffies</code> gets printed to stdout every second from userland.</p>
+<p>The poll system call allows a user process to do a non-busy wait on a kernel event.</p>
 </div>
 <div class="paragraph">
 <p>Sources:</p>
@@ -12765,6 +12841,70 @@ cd</pre>
 </ul>
 </div>
 <div class="paragraph">
+<p>Example:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./poll.sh</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Outcome: <code>jiffies</code> gets printed to stdout every second from userland, e.g.:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>poll
+&lt;6&gt;[    4.275305] poll
+&lt;6&gt;[    4.275580] return POLLIN
+revents = 1
+POLLIN n=10 buf=4294893337
+poll
+&lt;6&gt;[    4.276627] poll
+&lt;6&gt;[    4.276911] return 0
+&lt;6&gt;[    5.271193] wake_up
+&lt;6&gt;[    5.272326] poll
+&lt;6&gt;[    5.273207] return POLLIN
+revents = 1
+POLLIN n=10 buf=4294893588
+poll
+&lt;6&gt;[    5.276367] poll
+&lt;6&gt;[    5.276618] return 0
+&lt;6&gt;[    6.275178] wake_up
+&lt;6&gt;[    6.276370] poll
+&lt;6&gt;[    6.277269] return POLLIN
+revents = 1
+POLLIN n=10 buf=4294893839</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Force the poll <a href="#file-operations"><code>file_operation</code></a> to return 0 to see what happens more clearly:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./poll.sh pol0=1</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Sample output:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>poll
+&lt;6&gt;[   85.674801] poll
+&lt;6&gt;[   85.675788] return 0
+&lt;6&gt;[   86.675182] wake_up
+&lt;6&gt;[   86.676431] poll
+&lt;6&gt;[   86.677373] return 0
+&lt;6&gt;[   87.679198] wake_up
+&lt;6&gt;[   87.680515] poll
+&lt;6&gt;[   87.681564] return 0
+&lt;6&gt;[   88.683198] wake_up</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>From this we see that control is not returned to userland: the kernel just keeps calling the poll <code>file_operation</code> again and again.</p>
+</div>
+<div class="paragraph">
 <p>Typically, we are waiting for some hardware to make some piece of data available available to the kernel.</p>
 </div>
 <div class="paragraph">
@@ -12774,7 +12914,17 @@ cd</pre>
 <p>To simplify this example, we just fake the hardware interrupts with a <a href="#kthread">kthread</a> that sleeps for a second in an infinite loop.</p>
 </div>
 <div class="paragraph">
-<p>Bibliography: <a href="https://stackoverflow.com/questions/30035776/how-to-add-poll-function-to-the-kernel-module-code/44645336#44645336" class="bare">https://stackoverflow.com/questions/30035776/how-to-add-poll-function-to-the-kernel-module-code/44645336#44645336</a></p>
+<p>Bibliography:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><a href="https://stackoverflow.com/questions/30035776/how-to-add-poll-function-to-the-kernel-module-code/44645336#44645336" class="bare">https://stackoverflow.com/questions/30035776/how-to-add-poll-function-to-the-kernel-module-code/44645336#44645336</a></p>
+</li>
+<li>
+<p><a href="https://stackoverflow.com/questions/30234496/why-do-we-need-to-call-poll-wait-in-poll/44645480#44645480" class="bare">https://stackoverflow.com/questions/30234496/why-do-we-need-to-call-poll-wait-in-poll/44645480#44645480</a></p>
+</li>
+</ul>
 </div>
 </div>
 <div class="sect3">
@@ -16137,7 +16287,7 @@ ps</pre>
 <p>If you are familiar with <a href="https://en.wikipedia.org/wiki/VirtualBox">VirtualBox</a>, then QEMU then basically does the same thing: it opens a "window" inside your desktop that can run an operating system inside your operating system.</p>
 </div>
 <div class="paragraph">
-<p>Also both can use very similar techniques: either <a href="https://en.wikipedia.org/wiki/Binary_translation">binary translation</a> or <a href="#kvm">KVM</a>. VirtualBox' binary translator is / was based on QEMU&#8217;s it seems: <a href="https://en.wikipedia.org/wiki/VirtualBox#Software-based_virtualization" class="bare">https://en.wikipedia.org/wiki/VirtualBox#Software-based_virtualization</a></p>
+<p>Both can also use very similar techniques: either <a href="#binary-translation">Binary translation</a> or <a href="#kvm">KVM</a>. VirtualBox's binary translator is, or was, apparently based on QEMU&#8217;s: <a href="https://en.wikipedia.org/wiki/VirtualBox#Software-based_virtualization" class="bare">https://en.wikipedia.org/wiki/VirtualBox#Software-based_virtualization</a></p>
 </div>
 <div class="paragraph">
 <p>The huge advantage of QEMU over VirtualBox is that is supports cross arch simulation, e.g. simulate an ARM guest on an x86 host.</p>
@@ -16159,7 +16309,16 @@ ps</pre>
 </div>
 </div>
 <div class="sect2">
-<h3 id="disk-persistency"><a class="anchor" href="#disk-persistency"></a><a class="link" href="#disk-persistency">18.2. Disk persistency</a></h3>
+<h3 id="binary-translation"><a class="anchor" href="#binary-translation"></a><a class="link" href="#binary-translation">18.2. Binary translation</a></h3>
+<div class="paragraph">
+<p><a href="https://en.wikipedia.org/wiki/Binary_translation" class="bare">https://en.wikipedia.org/wiki/Binary_translation</a></p>
+</div>
+<div class="paragraph">
+<p>Used by <a href="#qemu">QEMU</a> and <a href="#gensim">Gensim</a>.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="disk-persistency"><a class="anchor" href="#disk-persistency"></a><a class="link" href="#disk-persistency">18.3. Disk persistency</a></h3>
 <div class="paragraph">
 <p>We disable disk persistency for both QEMU and gem5 by default, to prevent the emulator from putting the image in an unknown state.</p>
 </div>
@@ -16214,7 +16373,7 @@ ps</pre>
 <p>Disk persistency is useful to re-run shell commands from the history of a previous session with <code>Ctrl-R</code>, but we felt that the loss of determinism was not worth it.</p>
 </div>
 <div class="sect3">
-<h4 id="gem5-disk-persistency"><a class="anchor" href="#gem5-disk-persistency"></a><a class="link" href="#gem5-disk-persistency">18.2.1. gem5 disk persistency</a></h4>
+<h4 id="gem5-disk-persistency"><a class="anchor" href="#gem5-disk-persistency"></a><a class="link" href="#gem5-disk-persistency">18.3.1. gem5 disk persistency</a></h4>
 <div class="paragraph">
 <p>TODO how to make gem5 disk writes persistent?</p>
 </div>
@@ -16244,7 +16403,7 @@ index 17498c42b..76b8b351d 100644
 </div>
 </div>
 <div class="sect2">
-<h3 id="gem5-qcow2"><a class="anchor" href="#gem5-qcow2"></a><a class="link" href="#gem5-qcow2">18.3. gem5 qcow2</a></h3>
+<h3 id="gem5-qcow2"><a class="anchor" href="#gem5-qcow2"></a><a class="link" href="#gem5-qcow2">18.4. gem5 qcow2</a></h3>
 <div class="paragraph">
 <p>qcow2 does not appear supported, there are not hits in the source tree, and there is a mention on Nate&#8217;s 2009 wishlist: <a href="http://gem5.org/Nate%27s_Wish_List" class="bare">http://gem5.org/Nate%27s_Wish_List</a></p>
 </div>
@@ -16253,7 +16412,7 @@ index 17498c42b..76b8b351d 100644
 </div>
 </div>
 <div class="sect2">
-<h3 id="snapshot"><a class="anchor" href="#snapshot"></a><a class="link" href="#snapshot">18.4. Snapshot</a></h3>
+<h3 id="snapshot"><a class="anchor" href="#snapshot"></a><a class="link" href="#snapshot">18.5. Snapshot</a></h3>
 <div class="paragraph">
 <p>QEMU allows us to take snapshots at any time through the monitor.</p>
 </div>
@@ -16351,7 +16510,7 @@ index 17498c42b..76b8b351d 100644
 <p>Bibliography: <a href="https://stackoverflow.com/questions/40227651/does-qemu-emulator-have-checkpoint-function/48724371#48724371" class="bare">https://stackoverflow.com/questions/40227651/does-qemu-emulator-have-checkpoint-function/48724371#48724371</a></p>
 </div>
 <div class="sect3">
-<h4 id="snapshot-internals"><a class="anchor" href="#snapshot-internals"></a><a class="link" href="#snapshot-internals">18.4.1. Snapshot internals</a></h4>
+<h4 id="snapshot-internals"><a class="anchor" href="#snapshot-internals"></a><a class="link" href="#snapshot-internals">18.5.1. Snapshot internals</a></h4>
 <div class="paragraph">
 <p>Snapshots are stored inside the <code>.qcow2</code> images themselves.</p>
 </div>
@@ -16400,7 +16559,7 @@ Format specific information:
 </div>
 </div>
 <div class="sect2">
-<h3 id="device-models"><a class="anchor" href="#device-models"></a><a class="link" href="#device-models">18.5. Device models</a></h3>
+<h3 id="device-models"><a class="anchor" href="#device-models"></a><a class="link" href="#device-models">18.6. Device models</a></h3>
 <div class="paragraph">
 <p>This section documents:</p>
 </div>
@@ -16445,12 +16604,12 @@ Format specific information:
 </ul>
 </div>
 <div class="sect3">
-<h4 id="pci"><a class="anchor" href="#pci"></a><a class="link" href="#pci">18.5.1. PCI</a></h4>
+<h4 id="pci"><a class="anchor" href="#pci"></a><a class="link" href="#pci">18.6.1. PCI</a></h4>
 <div class="paragraph">
 <p>Only tested in x86.</p>
 </div>
 <div class="sect4">
-<h5 id="pci-min"><a class="anchor" href="#pci-min"></a><a class="link" href="#pci-min">18.5.1.1. pci_min</a></h5>
+<h5 id="pci-min"><a class="anchor" href="#pci-min"></a><a class="link" href="#pci-min">18.6.1.1. pci_min</a></h5>
 <div class="paragraph">
 <p>PCI driver for our minimal <code>pci_min.c</code> QEMU fork device:</p>
 </div>
@@ -16520,7 +16679,7 @@ lkmc_pci_min mmio_write addr = 4 val = 0 size = 4</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="qemu-edu"><a class="anchor" href="#qemu-edu"></a><a class="link" href="#qemu-edu">18.5.1.2. QEMU edu PCI device</a></h5>
+<h5 id="qemu-edu"><a class="anchor" href="#qemu-edu"></a><a class="link" href="#qemu-edu">18.6.1.2. QEMU edu PCI device</a></h5>
 <div class="paragraph">
 <p>Small upstream educational PCI device:</p>
 </div>
@@ -16578,16 +16737,19 @@ lkmc_pci_min mmio_write addr = 4 val = 0 size = 4</pre>
 <div class="ulist">
 <ul>
 <li>
+<p><a href="https://stackoverflow.com/questions/17913679/how-to-instantiate-and-use-a-dma-driver-linux-module" class="bare">https://stackoverflow.com/questions/17913679/how-to-instantiate-and-use-a-dma-driver-linux-module</a></p>
+</li>
+<li>
 <p><a href="https://stackoverflow.com/questions/32592734/are-there-any-dma-driver-example-pcie-and-fpga/44716747#44716747" class="bare">https://stackoverflow.com/questions/32592734/are-there-any-dma-driver-example-pcie-and-fpga/44716747#44716747</a></p>
 </li>
 <li>
-<p><a href="https://stackoverflow.com/questions/17913679/how-to-instantiate-and-use-a-dma-driver-linux-module" class="bare">https://stackoverflow.com/questions/17913679/how-to-instantiate-and-use-a-dma-driver-linux-module</a></p>
+<p><a href="https://stackoverflow.com/questions/62831327/add-memory-device-to-qemu" class="bare">https://stackoverflow.com/questions/62831327/add-memory-device-to-qemu</a></p>
 </li>
 </ul>
 </div>
 </div>
 <div class="sect4">
-<h5 id="manipulate-pci-registers-directly"><a class="anchor" href="#manipulate-pci-registers-directly"></a><a class="link" href="#manipulate-pci-registers-directly">18.5.1.3. Manipulate PCI registers directly</a></h5>
+<h5 id="manipulate-pci-registers-directly"><a class="anchor" href="#manipulate-pci-registers-directly"></a><a class="link" href="#manipulate-pci-registers-directly">18.6.1.3. Manipulate PCI registers directly</a></h5>
 <div class="paragraph">
 <p>In this section we will try to interact with PCI devices directly from userland without kernel modules.</p>
 </div>
@@ -16733,7 +16895,7 @@ devmem 0xfeb54000 w 0x12345678</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="pciutils"><a class="anchor" href="#pciutils"></a><a class="link" href="#pciutils">18.5.1.4. pciutils</a></h5>
+<h5 id="pciutils"><a class="anchor" href="#pciutils"></a><a class="link" href="#pciutils">18.6.1.4. pciutils</a></h5>
 <div class="paragraph">
 <p>There are two versions of <code>setpci</code> and <code>lspci</code>:</p>
 </div>
@@ -16749,7 +16911,7 @@ devmem 0xfeb54000 w 0x12345678</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="introduction-to-pci"><a class="anchor" href="#introduction-to-pci"></a><a class="link" href="#introduction-to-pci">18.5.1.5. Introduction to PCI</a></h5>
+<h5 id="introduction-to-pci"><a class="anchor" href="#introduction-to-pci"></a><a class="link" href="#introduction-to-pci">18.6.1.5. Introduction to PCI</a></h5>
 <div class="paragraph">
 <p>The PCI standard is non-free, obviously like everything in low level: <a href="https://pcisig.com/specifications" class="bare">https://pcisig.com/specifications</a> but Google gives several illegal PDF hits :-)</p>
 </div>
@@ -16809,7 +16971,7 @@ devmem 0xfeb54000 w 0x12345678</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="pci-bfd"><a class="anchor" href="#pci-bfd"></a><a class="link" href="#pci-bfd">18.5.1.6. PCI BFD</a></h5>
+<h5 id="pci-bfd"><a class="anchor" href="#pci-bfd"></a><a class="link" href="#pci-bfd">18.6.1.6. PCI BFD</a></h5>
 <div class="paragraph">
 <p><code>lspci -k</code> shows something like:</p>
 </div>
@@ -16863,7 +17025,7 @@ devmem 0xfeb54000 w 0x12345678</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="pci-bar"><a class="anchor" href="#pci-bar"></a><a class="link" href="#pci-bar">18.5.1.7. PCI BAR</a></h5>
+<h5 id="pci-bar"><a class="anchor" href="#pci-bar"></a><a class="link" href="#pci-bar">18.6.1.7. PCI BAR</a></h5>
 <div class="paragraph">
 <p><a href="https://stackoverflow.com/questions/30190050/what-is-base-address-register-bar-in-pcie/44716618#44716618" class="bare">https://stackoverflow.com/questions/30190050/what-is-base-address-register-bar-in-pcie/44716618#44716618</a></p>
 </div>
@@ -16905,7 +17067,7 @@ pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &amp;edu-&gt;mmio);</pr
 </div>
 </div>
 <div class="sect3">
-<h4 id="gpio"><a class="anchor" href="#gpio"></a><a class="link" href="#gpio">18.5.2. GPIO</a></h4>
+<h4 id="gpio"><a class="anchor" href="#gpio"></a><a class="link" href="#gpio">18.6.2. GPIO</a></h4>
 <div class="paragraph">
 <p>TODO: broken. Was working before we moved <code>arm</code> from <code>-M versatilepb</code> to <code>-M virt</code> around af210a76711b7fa4554dcc2abd0ddacfc810dfd4. Either make it work on <code>-M virt</code> if that is possible, or document precisely how to make it work with <code>versatilepb</code>, or hopefully <code>vexpress</code> which is newer.</p>
 </div>
@@ -16948,7 +17110,7 @@ pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &amp;edu-&gt;mmio);</pr
 </div>
 </div>
 <div class="sect3">
-<h4 id="leds"><a class="anchor" href="#leds"></a><a class="link" href="#leds">18.5.3. LEDs</a></h4>
+<h4 id="leds"><a class="anchor" href="#leds"></a><a class="link" href="#leds">18.6.3. LEDs</a></h4>
 <div class="paragraph">
 <p>TODO: broken when <code>arm</code> moved to <code>-M virt</code>, same as <a href="#gpio">GPIO</a>.</p>
 </div>
@@ -17020,7 +17182,7 @@ echo 255 &gt;brightness</pre>
 </div>
 </div>
 <div class="sect3">
-<h4 id="platform-device"><a class="anchor" href="#platform-device"></a><a class="link" href="#platform-device">18.5.4. platform_device</a></h4>
+<h4 id="platform-device"><a class="anchor" href="#platform-device"></a><a class="link" href="#platform-device">18.6.4. platform_device</a></h4>
 <div class="paragraph">
 <p>Minimal platform device example coded into the <code>-M versatilepb</code> SoC of our QEMU fork.</p>
 </div>
@@ -17098,7 +17260,7 @@ insmod platform_device.ko</pre>
 </div>
 </div>
 <div class="sect3">
-<h4 id="gem5-educational-hardware-models"><a class="anchor" href="#gem5-educational-hardware-models"></a><a class="link" href="#gem5-educational-hardware-models">18.5.5. gem5 educational hardware models</a></h4>
+<h4 id="gem5-educational-hardware-models"><a class="anchor" href="#gem5-educational-hardware-models"></a><a class="link" href="#gem5-educational-hardware-models">18.6.5. gem5 educational hardware models</a></h4>
 <div class="paragraph">
 <p>TODO get some working!</p>
 </div>
@@ -17108,7 +17270,7 @@ insmod platform_device.ko</pre>
 </div>
 </div>
 <div class="sect2">
-<h3 id="qemu-monitor"><a class="anchor" href="#qemu-monitor"></a><a class="link" href="#qemu-monitor">18.6. QEMU monitor</a></h3>
+<h3 id="qemu-monitor"><a class="anchor" href="#qemu-monitor"></a><a class="link" href="#qemu-monitor">18.7. QEMU monitor</a></h3>
 <div class="paragraph">
 <p>The QEMU monitor is a magic terminal that allows you to send text commands to the QEMU VM itself: <a href="https://en.wikibooks.org/wiki/QEMU/Monitor" class="bare">https://en.wikibooks.org/wiki/QEMU/Monitor</a></p>
 </div>
@@ -17228,7 +17390,7 @@ insmod platform_device.ko</pre>
 </ul>
 </div>
 <div class="sect3">
-<h4 id="qemu-monitor-from-guest"><a class="anchor" href="#qemu-monitor-from-guest"></a><a class="link" href="#qemu-monitor-from-guest">18.6.1. QEMU monitor from guest</a></h4>
+<h4 id="qemu-monitor-from-guest"><a class="anchor" href="#qemu-monitor-from-guest"></a><a class="link" href="#qemu-monitor-from-guest">18.7.1. QEMU monitor from guest</a></h4>
 <div class="paragraph">
 <p>Peter Maydell said potentially not possible nicely as of August 2018: <a href="https://stackoverflow.com/questions/51747744/how-to-run-a-qemu-monitor-command-from-inside-the-guest/51764110#51764110" class="bare">https://stackoverflow.com/questions/51747744/how-to-run-a-qemu-monitor-command-from-inside-the-guest/51764110#51764110</a></p>
 </div>
@@ -17245,7 +17407,7 @@ insmod platform_device.ko</pre>
 </div>
 </div>
 <div class="sect3">
-<h4 id="qemu-monitor-from-gdb"><a class="anchor" href="#qemu-monitor-from-gdb"></a><a class="link" href="#qemu-monitor-from-gdb">18.6.2. QEMU monitor from GDB</a></h4>
+<h4 id="qemu-monitor-from-gdb"><a class="anchor" href="#qemu-monitor-from-gdb"></a><a class="link" href="#qemu-monitor-from-gdb">18.7.2. QEMU monitor from GDB</a></h4>
 <div class="paragraph">
 <p>When doing <a href="#gdb">GDB step debug</a> it is possible to send QEMU monitor commands through the GDB <code>monitor</code> command, which saves you the trouble of opening yet another shell.</p>
 </div>
@@ -17261,7 +17423,7 @@ monitor info qtree</pre>
 </div>
 </div>
 <div class="sect2">
-<h3 id="debug-the-emulator"><a class="anchor" href="#debug-the-emulator"></a><a class="link" href="#debug-the-emulator">18.7. Debug the emulator</a></h3>
+<h3 id="debug-the-emulator"><a class="anchor" href="#debug-the-emulator"></a><a class="link" href="#debug-the-emulator">18.8. Debug the emulator</a></h3>
 <div class="paragraph">
 <p>When you start hacking QEMU or gem5, it is useful to see what is going on inside the emulator themselves.</p>
 </div>
@@ -17274,7 +17436,15 @@ monitor info qtree</pre>
 </div>
 </div>
 <div class="paragraph">
-<p>Or for a faster development loop:</p>
+<p>Or for a faster development loop you can pass <code>-ex</code> commands as a semicolon-separated list:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./run --debug-vm-ex 'break qemu_add_opts;run'</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>which is equivalent to the more verbose:</p>
 </div>
 <div class="literalblock">
 <div class="content">
@@ -17282,6 +17452,9 @@ monitor info qtree</pre>
 </div>
 </div>
 <div class="paragraph">
+<p>if you ever need anything besides <code>-ex</code>.</p>
+</div>
+<div class="paragraph">
 <p>Or if things get really involved and you want a debug script:</p>
 </div>
 <div class="literalblock">
@@ -17330,7 +17503,7 @@ run
 <p>You can still send key presses to QEMU however even without the mouse capture, just either click on the title bar, or alt tab to give it focus.</p>
 </div>
 <div class="sect3">
-<h4 id="reverse-debug-the-emulator"><a class="anchor" href="#reverse-debug-the-emulator"></a><a class="link" href="#reverse-debug-the-emulator">18.7.1. Reverse debug the emulator</a></h4>
+<h4 id="reverse-debug-the-emulator"><a class="anchor" href="#reverse-debug-the-emulator"></a><a class="link" href="#reverse-debug-the-emulator">18.8.1. Reverse debug the emulator</a></h4>
 <div class="paragraph">
 <p>While step debugging any complex program, you always end up feeling the need to step in reverse to reach the last call to some function that was called before the failure point, in order to trace back the problem to the actual bug source.</p>
 </div>
@@ -17419,7 +17592,7 @@ reverse-next</pre>
 </div>
 </div>
 <div class="sect3">
-<h4 id="debug-gem5-python-scripts"><a class="anchor" href="#debug-gem5-python-scripts"></a><a class="link" href="#debug-gem5-python-scripts">18.7.2. Debug gem5 Python scripts</a></h4>
+<h4 id="debug-gem5-python-scripts"><a class="anchor" href="#debug-gem5-python-scripts"></a><a class="link" href="#debug-gem5-python-scripts">18.8.2. Debug gem5 Python scripts</a></h4>
 <div class="paragraph">
 <p>Start pdb at the first instruction:</p>
 </div>
@@ -17453,7 +17626,7 @@ reverse-next</pre>
 </div>
 </div>
 <div class="sect2">
-<h3 id="tracing"><a class="anchor" href="#tracing"></a><a class="link" href="#tracing">18.8. Tracing</a></h3>
+<h3 id="tracing"><a class="anchor" href="#tracing"></a><a class="link" href="#tracing">18.9. Tracing</a></h3>
 <div class="paragraph">
 <p>QEMU can log several different events.</p>
 </div>
@@ -17544,7 +17717,7 @@ Call Trace:
 </ul>
 </div>
 <div class="sect3">
-<h4 id="qemu-d-tracing"><a class="anchor" href="#qemu-d-tracing"></a><a class="link" href="#qemu-d-tracing">18.8.1. QEMU -d tracing</a></h4>
+<h4 id="qemu-d-tracing"><a class="anchor" href="#qemu-d-tracing"></a><a class="link" href="#qemu-d-tracing">18.9.1. QEMU -d tracing</a></h4>
 <div class="paragraph">
 <p>QEMU also has a second trace mechanism in addition to <code>-trace</code>, find out the events with:</p>
 </div>
@@ -17585,7 +17758,7 @@ IN:
 </div>
 </div>
 <div class="sect3">
-<h4 id="qemu-trace-register-values"><a class="anchor" href="#qemu-trace-register-values"></a><a class="link" href="#qemu-trace-register-values">18.8.2. QEMU trace register values</a></h4>
+<h4 id="qemu-trace-register-values"><a class="anchor" href="#qemu-trace-register-values"></a><a class="link" href="#qemu-trace-register-values">18.9.2. QEMU trace register values</a></h4>
 <div class="paragraph">
 <p>TODO: is it possible to show the register values for each instruction?</p>
 </div>
@@ -17615,11 +17788,11 @@ IN:
 <p>PANDA can list memory addresses, so I bet it can also decode the instructions: <a href="https://github.com/panda-re/panda/blob/883c85fa35f35e84a323ed3d464ff40030f06bd6/panda/docs/LINE_Censorship.md" class="bare">https://github.com/panda-re/panda/blob/883c85fa35f35e84a323ed3d464ff40030f06bd6/panda/docs/LINE_Censorship.md</a> I wonder why they don&#8217;t just upstream those things to QEMU&#8217;s tracing: <a href="https://github.com/panda-re/panda/issues/290" class="bare">https://github.com/panda-re/panda/issues/290</a></p>
 </div>
 <div class="paragraph">
-<p>gem5 can do it as shown at: <a href="#gem5-tracing">Section 18.8.8, &#8220;gem5 tracing&#8221;</a>.</p>
+<p>gem5 can do it as shown at: <a href="#gem5-tracing">Section 18.9.8, &#8220;gem5 tracing&#8221;</a>.</p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="qemu-trace-memory-accesses"><a class="anchor" href="#qemu-trace-memory-accesses"></a><a class="link" href="#qemu-trace-memory-accesses">18.8.3. QEMU trace memory accesses</a></h4>
+<h4 id="qemu-trace-memory-accesses"><a class="anchor" href="#qemu-trace-memory-accesses"></a><a class="link" href="#qemu-trace-memory-accesses">18.9.3. QEMU trace memory accesses</a></h4>
 <div class="paragraph">
 <p>Not possible apparently, not even with the <code>memory_region_ops_read</code> and <code>memory_region_ops_write</code> trace events, Peter comments <a href="https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07482.html" class="bare">https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07482.html</a></p>
 </div>
@@ -17638,7 +17811,7 @@ of guest operations.</p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="trace-source-lines"><a class="anchor" href="#trace-source-lines"></a><a class="link" href="#trace-source-lines">18.8.4. Trace source lines</a></h4>
+<h4 id="trace-source-lines"><a class="anchor" href="#trace-source-lines"></a><a class="link" href="#trace-source-lines">18.9.4. Trace source lines</a></h4>
 <div class="paragraph">
 <p>We can further use Binutils' <code>addr2line</code> to get the line that corresponds to each address:</p>
 </div>
@@ -17694,7 +17867,7 @@ less "$(./getvar --arch x86_64 run_dir)/trace-lines.txt"</pre>
 </div>
 </div>
 <div class="sect3">
-<h4 id="qemu-record-and-replay"><a class="anchor" href="#qemu-record-and-replay"></a><a class="link" href="#qemu-record-and-replay">18.8.5. QEMU record and replay</a></h4>
+<h4 id="qemu-record-and-replay"><a class="anchor" href="#qemu-record-and-replay"></a><a class="link" href="#qemu-record-and-replay">18.9.5. QEMU record and replay</a></h4>
 <div class="paragraph">
 <p>QEMU runs, unlike gem5, are not deterministic by default, however it does support a record and replay mechanism that allows you to replay a previous run deterministically.</p>
 </div>
@@ -17801,7 +17974,7 @@ less "$(./getvar --arch x86_64 run_dir)/trace-lines.txt"</pre>
 <p>Solved on unmerged c42634d8e3428cfa60672c3ba89cabefc720cde9 from <a href="https://github.com/ispras/qemu/tree/rr-180725" class="bare">https://github.com/ispras/qemu/tree/rr-180725</a></p>
 </div>
 <div class="sect4">
-<h5 id="qemu-reverse-debugging"><a class="anchor" href="#qemu-reverse-debugging"></a><a class="link" href="#qemu-reverse-debugging">18.8.5.1. QEMU reverse debugging</a></h5>
+<h5 id="qemu-reverse-debugging"><a class="anchor" href="#qemu-reverse-debugging"></a><a class="link" href="#qemu-reverse-debugging">18.9.5.1. QEMU reverse debugging</a></h5>
 <div class="paragraph">
 <p>TODO get working.</p>
 </div>
@@ -17840,7 +18013,7 @@ reverse-continue</pre>
 </div>
 </div>
 <div class="sect3">
-<h4 id="qemu-trace-multicore"><a class="anchor" href="#qemu-trace-multicore"></a><a class="link" href="#qemu-trace-multicore">18.8.6. QEMU trace multicore</a></h4>
+<h4 id="qemu-trace-multicore"><a class="anchor" href="#qemu-trace-multicore"></a><a class="link" href="#qemu-trace-multicore">18.9.6. QEMU trace multicore</a></h4>
 <div class="paragraph">
 <p>TODO: is there any way to distinguish which instruction runs on each core? Doing:</p>
 </div>
@@ -17855,13 +18028,13 @@ reverse-continue</pre>
 </div>
 </div>
 <div class="sect3">
-<h4 id="qemu-get-guest-instruction-count"><a class="anchor" href="#qemu-get-guest-instruction-count"></a><a class="link" href="#qemu-get-guest-instruction-count">18.8.7. QEMU get guest instruction count</a></h4>
+<h4 id="qemu-get-guest-instruction-count"><a class="anchor" href="#qemu-get-guest-instruction-count"></a><a class="link" href="#qemu-get-guest-instruction-count">18.9.7. QEMU get guest instruction count</a></h4>
 <div class="paragraph">
 <p>TODO: <a href="https://stackoverflow.com/questions/58766571/how-to-count-the-number-of-guest-instructions-qemu-executed-from-the-beginning-t" class="bare">https://stackoverflow.com/questions/58766571/how-to-count-the-number-of-guest-instructions-qemu-executed-from-the-beginning-t</a></p>
 </div>
 </div>
 <div class="sect3">
-<h4 id="gem5-tracing"><a class="anchor" href="#gem5-tracing"></a><a class="link" href="#gem5-tracing">18.8.8. gem5 tracing</a></h4>
+<h4 id="gem5-tracing"><a class="anchor" href="#gem5-tracing"></a><a class="link" href="#gem5-tracing">18.9.8. gem5 tracing</a></h4>
 <div class="paragraph">
 <p>gem5 also provides a tracing mechanism documented at: <a href="http://www.gem5.org/Trace_Based_Debugging" class="bare">http://www.gem5.org/Trace_Based_Debugging</a>:</p>
 </div>
@@ -17972,7 +18145,7 @@ less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt"</pre>
 <p>TODO: 7452d399290c9c1fc6366cdad129ef442f323564 <code>./trace2line</code> this is too slow and takes hours. QEMU&#8217;s processing of 170k events takes 7 seconds. gem5&#8217;s processing is analogous, but there are 140M events, so it should take 7000 seconds ~ 2 hours which seems consistent with what I observe, so maybe there is no way to speed this up&#8230;&#8203; The workaround is to just use gem5&#8217;s <code>ExecSymbol</code> to get function granularity, and then GDB individually if line detail is needed?</p>
 </div>
 <div class="sect4">
-<h5 id="gem5-trace-internals"><a class="anchor" href="#gem5-trace-internals"></a><a class="link" href="#gem5-trace-internals">18.8.8.1. gem5 trace internals</a></h5>
+<h5 id="gem5-trace-internals"><a class="anchor" href="#gem5-trace-internals"></a><a class="link" href="#gem5-trace-internals">18.9.8.1. gem5 trace internals</a></h5>
 <div class="paragraph">
 <p>gem5 traces are generated from <code>DPRINTF(&lt;trace-id&gt;</code> calls scattered throughout the code, except for <code>ExecAll</code> instruction traces, which use <code>Debug::ExecEnable</code> directly.</p>
 </div>
@@ -18009,7 +18182,7 @@ extern SimpleFlag ExecEnable;
 </div>
 </div>
 <div class="sect4">
-<h5 id="gem5-execall-trace-format"><a class="anchor" href="#gem5-execall-trace-format"></a><a class="link" href="#gem5-execall-trace-format">18.8.8.2. gem5 ExecAll trace format</a></h5>
+<h5 id="gem5-execall-trace-format"><a class="anchor" href="#gem5-execall-trace-format"></a><a class="link" href="#gem5-execall-trace-format">18.9.8.2. gem5 ExecAll trace format</a></h5>
 <div class="paragraph">
 <p>This debug flag traces all instructions.</p>
 </div>
@@ -18064,7 +18237,7 @@ extern SimpleFlag ExecEnable;
 <p><code>@start_kernel</code>: we are in the <code>start_kernel</code> function. Awesome feature! Implemented with libelf <a href="https://sourceforge.net/projects/elftoolchain/" class="bare">https://sourceforge.net/projects/elftoolchain/</a> copy pasted in-tree <code>ext/libelf</code>. To get raw addresses, remove the <code>ExecSymbol</code>, which is enabled by <code>Exec</code>. This can be done with <code>Exec,-ExecSymbol</code>.</p>
 </li>
 <li>
-<p><code>.1</code> as in <code>@start_kernel.1</code>: index of the microop</p>
+<p><code>.1</code> as in <code>@start_kernel.1</code>: index of the <a href="#gem5-microops">gem5 microops</a></p>
 </li>
 <li>
 <p><code>stp</code>: instruction disassembly. Note however that the disassembly of many instructions are very broken as of 2019q2, and you can&#8217;t just trust them blindly.</p>
@@ -18092,7 +18265,7 @@ extern SimpleFlag ExecEnable;
 </div>
 </div>
 <div class="sect4">
-<h5 id="gem5-registers-trace-format"><a class="anchor" href="#gem5-registers-trace-format"></a><a class="link" href="#gem5-registers-trace-format">18.8.8.3. gem5 Registers trace format</a></h5>
+<h5 id="gem5-registers-trace-format"><a class="anchor" href="#gem5-registers-trace-format"></a><a class="link" href="#gem5-registers-trace-format">18.9.8.3. gem5 Registers trace format</a></h5>
 <div class="paragraph">
 <p>This flag shows a more detailed register usage than <a href="#gem5-execall-trace-format">gem5 ExecAll trace format</a>.</p>
 </div>
@@ -18147,13 +18320,13 @@ add x1, x0, 2</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="gem5-tarmac-traces"><a class="anchor" href="#gem5-tarmac-traces"></a><a class="link" href="#gem5-tarmac-traces">18.8.8.4. gem5 TARMAC traces</a></h5>
+<h5 id="gem5-tarmac-traces"><a class="anchor" href="#gem5-tarmac-traces"></a><a class="link" href="#gem5-tarmac-traces">18.9.8.4. gem5 TARMAC traces</a></h5>
 <div class="paragraph">
 <p><a href="https://stackoverflow.com/questions/54882466/how-to-use-the-tarmac-tracer-with-gem5" class="bare">https://stackoverflow.com/questions/54882466/how-to-use-the-tarmac-tracer-with-gem5</a></p>
 </div>
 </div>
 <div class="sect4">
-<h5 id="gem5-tracing-internals"><a class="anchor" href="#gem5-tracing-internals"></a><a class="link" href="#gem5-tracing-internals">18.8.8.5. gem5 tracing internals</a></h5>
+<h5 id="gem5-tracing-internals"><a class="anchor" href="#gem5-tracing-internals"></a><a class="link" href="#gem5-tracing-internals">18.9.8.5. gem5 tracing internals</a></h5>
 <div class="paragraph">
 <p>As of gem5 16eeee5356585441a49d05c78abc328ef09f7ace the default tracer is <code>ExeTracer</code>. It is set at:</p>
 </div>
@@ -18226,7 +18399,7 @@ src/arch/x86/nativetrace.hh:41:class X86NativeTrace : public NativeTrace</pre>
 </div>
 </div>
 <div class="sect2">
-<h3 id="qemu-gui-is-unresponsive"><a class="anchor" href="#qemu-gui-is-unresponsive"></a><a class="link" href="#qemu-gui-is-unresponsive">18.9. QEMU GUI is unresponsive</a></h3>
+<h3 id="qemu-gui-is-unresponsive"><a class="anchor" href="#qemu-gui-is-unresponsive"></a><a class="link" href="#qemu-gui-is-unresponsive">18.10. QEMU GUI is unresponsive</a></h3>
 <div class="paragraph">
 <p>Sometimes in Ubuntu 14.04, after the QEMU SDL GUI starts, it does not get updated after keyboard strokes, and there are artifacts like disappearing text.</p>
 </div>
@@ -18711,7 +18884,24 @@ ps Haux | grep qemu | wc</pre>
 <p><a href="https://stackoverflow.com/questions/50248067/how-to-run-a-gem5-arm-aarch64-full-system-simulation-with-fs-py-with-more-than-8" class="bare">https://stackoverflow.com/questions/50248067/how-to-run-a-gem5-arm-aarch64-full-system-simulation-with-fs-py-with-more-than-8</a></p>
 </div>
 <div class="paragraph">
-<p>Build the kernel with the <a href="#gem5-arm-linux-kernel-patches">gem5 arm Linux kernel patches</a>, and then run:</p>
+<p>With <a href="#arm-gic">GICv3</a>, tested at LKMC 224fae82e1a79d9551b941b19196c7e337663f22 gem5 3ca404da175a66e0b958165ad75eb5f54cb5e772 on vanilla kernel:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./run \
+  --arch aarch64 \
+  --emulator gem5 \
+  --cpus 16 \
+  -- \
+  --machine-type VExpress_GEM5_V2 \
+;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>boots to a shell and <code>nproc</code> shows <code>16</code>.</p>
+</div>
+<div class="paragraph">
+<p>For the GICv2 extension method, build the kernel with the <a href="#gem5-arm-linux-kernel-patches">gem5 arm Linux kernel patches</a>, and then run:</p>
 </div>
 <div class="literalblock">
 <div class="content">
@@ -19463,6 +19653,15 @@ Exiting @ tick 84500 because m5_exit instruction encountered</pre>
 <div class="sect3">
 <h4 id="gem5-checkpoint-internals"><a class="anchor" href="#gem5-checkpoint-internals"></a><a class="link" href="#gem5-checkpoint-internals">19.5.2. gem5 checkpoint internals</a></h4>
 <div class="paragraph">
+<p>A quick way to get a <a href="#gem5-syscall-emulation-mode">gem5 syscall emulation mode</a> or full system checkpoint to observe is:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./run --arch aarch64 --emulator gem5 --baremetal userland/freestanding/gem5_checkpoint.S --trace-insts-stdout
+./run --arch aarch64 --emulator gem5 --userland userland/freestanding/gem5_checkpoint.S --trace-insts-stdout</pre>
+</div>
+</div>
+<div class="paragraph">
 <p>Checkpoints are stored inside the <a href="#m5out-directory">m5out directory</a> at:</p>
 </div>
 <div class="literalblock">
@@ -19485,6 +19684,22 @@ Exiting @ tick 84500 because m5_exit instruction encountered</pre>
 <div class="paragraph">
 <p>The <code>-r N</code> integer value is just pure <code>fs.py</code> sugar, the backend at <code>m5.instantiate</code> just takes the actual tracepoint directory path as input.</p>
 </div>
+<div class="paragraph">
+<p>The file <code>m5out/cpt.1000/m5.cpt</code> contains almost everything in the checkpoint except memory.</p>
+</div>
+<div class="paragraph">
+<p>It is a <a href="https://docs.python.org/3/library/configparser.html">Python configparser compatible file</a> with a section structure that matches the <a href="#gem5-python-c-interaction">SimObject</a> tree e.g.:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>[system.cpu.itb.walker.power_state]
+currState=0
+prvEvalTick=0</pre>
+</div>
+</div>
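Since the file is configparser compatible, it can be inspected from a short Python script. The following is only a minimal sketch: the `SAMPLE` text reuses the section shown above, `load_checkpoint` is a hypothetical helper name, and a real `m5.cpt` is much larger and might need extra parser options (for example `strict=False`) that were not tested here.

```python
import configparser

# Tiny stand-in for m5out/cpt.1000/m5.cpt, copied from the section shown above.
SAMPLE = """\
[system.cpu.itb.walker.power_state]
currState=0
prvEvalTick=0
"""


def load_checkpoint(text):
    """Parse checkpoint text into a ConfigParser (hypothetical helper)."""
    # Disable interpolation so that any '%' in values is kept verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    # Keep the original key case: the default lowercases option names.
    parser.optionxform = str
    parser.read_string(text)
    return parser


cpt = load_checkpoint(SAMPLE)
section = cpt["system.cpu.itb.walker.power_state"]
for key, value in section.items():
    print(key, value)
```

To read an actual checkpoint, the same parser could be fed with `parser.read("m5out/cpt.1000/m5.cpt")` instead of `read_string`.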
+<div class="paragraph">
+<p>When a checkpoint is taken, each <code>SimObject</code> calls its overridden <code>serialize</code> method to generate the checkpoint, and when loading, <code>unserialize</code> is called.</p>
+</div>
 </div>
 <div class="sect3">
 <h4 id="gem5-restore-new-script"><a class="anchor" href="#gem5-restore-new-script"></a><a class="link" href="#gem5-restore-new-script">19.5.3. gem5 checkpoint restore and run a different script</a></h4>
@@ -19742,6 +19957,9 @@ cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"</pre>
 <div class="ulist">
 <ul>
 <li>
+<p><a href="https://stackoverflow.com/questions/60876259/which-system-characteristics-such-as-number-of-cores-of-cache-configurations-can" class="bare">https://stackoverflow.com/questions/60876259/which-system-characteristics-such-as-number-of-cores-of-cache-configurations-can</a></p>
+</li>
+<li>
 <p><a href="https://stackoverflow.com/questions/49011096/how-to-switch-cpu-models-in-gem5-after-restoring-a-checkpoint-and-then-observe-t" class="bare">https://stackoverflow.com/questions/49011096/how-to-switch-cpu-models-in-gem5-after-restoring-a-checkpoint-and-then-observe-t</a></p>
 </li>
 </ul>
@@ -19872,6 +20090,41 @@ FullO3CPU: Ticking main, FullO3CPU.
 </div>
 </div>
 </div>
+<div class="sect3">
+<h4 id="gem5-checkpoint-upgrader"><a class="anchor" href="#gem5-checkpoint-upgrader"></a><a class="link" href="#gem5-checkpoint-upgrader">19.5.5. gem5 checkpoint upgrader</a></h4>
+<div class="paragraph">
+<p>The in-tree <code>util/cpt_upgrader.py</code> is a tool to upgrade checkpoints taken from an older version of gem5 to be compatible with the newest version, so you can update gem5 without having to re-run the simulation that generated the checkpoints.</p>
+</div>
+<div class="paragraph">
+<p>For example, whenever a <a href="#arm-system-register-instructions">system register is added in ARMv8</a>, old checkpoints break unless upgraded.</p>
+</div>
+<div class="paragraph">
+<p>Unfortunately, since the process is not very automated (automatable?) and requires manually patching the upgrader every time a breaking change is made, as of 2020 the upgrader tends to break if you try to move many gem5 versions ahead at once. This is evidenced in bug reports such as this one: <a href="https://gem5.atlassian.net/browse/GEM5-472" class="bare">https://gem5.atlassian.net/browse/GEM5-472</a></p>
+</div>
+<div class="paragraph">
+<p>The script can be used as:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>util/cpt_upgrader.py m5out/cpt.1000/m5.cpt</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This updates the <code>m5.cpt</code> file in-place, and a <code>m5out/cpt.1000/m5.cpt.bak</code> is generated as a backup of the old file.</p>
+</div>
+<div class="paragraph">
+<p>The upgrader determines which upgrades are needed by checking the <code>version_tags</code> entry of the checkpoint:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>[Globals]
+version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Each of those tags corresponds to a Python file under <code>util/cpt_upgraders/</code> e.g. <code>util/cpt_upgraders/arm-ccregs.py</code>.</p>
+</div>
+</div>
 </div>
 <div class="sect2">
 <h3 id="pass-extra-options-to-gem5"><a class="anchor" href="#pass-extra-options-to-gem5"></a><a class="link" href="#pass-extra-options-to-gem5">19.6. Pass extra options to gem5</a></h3>
@@ -21161,6 +21414,9 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"</pre>
 </div>
 </div>
 <div class="paragraph">
+<p>Sample run time: 87 minutes on <a href="#p51">P51</a> Ubuntu 20.04 gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1.</p>
+</div>
+<div class="paragraph">
 <p>After the first run has downloaded the test binaries for you, you can speed up the process a little bit by skipping an useless SCons call:</p>
 </div>
 <div class="literalblock">
@@ -21176,7 +21432,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"</pre>
 </div>
 <div class="literalblock">
 <div class="content">
-<pre>./gem5-regression --arch aarch64 --cmd list</pre>
+<pre>./gem5-regression --arch aarch64 --cmd list -- --length quick --length long</pre>
 </div>
 </div>
 <div class="paragraph">
@@ -21333,13 +21589,13 @@ Exiting @ tick 18446744073709551615 because simulate() limit reached</pre>
 <div class="sect3">
 <h4 id="gem5-debug-build"><a class="anchor" href="#gem5-debug-build"></a><a class="link" href="#gem5-debug-build">19.15.1. gem5 debug build</a></h4>
 <div class="paragraph">
-<p>How to use it in LKMC: <a href="#debug-the-emulator">Section 18.7, &#8220;Debug the emulator&#8221;</a>.</p>
+<p>How to use it in LKMC: <a href="#debug-the-emulator">Section 18.8, &#8220;Debug the emulator&#8221;</a>.</p>
 </div>
 <div class="paragraph">
 <p>If you build gem5 with <code>scons build/ARM/gem5.debug</code>, then that is a <code>.debug</code> build.</p>
 </div>
 <div class="paragraph">
-<p>It relates to the more common <code>.opt</code> build just as explained at <a href="#debug-the-emulator">Section 18.7, &#8220;Debug the emulator&#8221;</a>: both <code>.opt</code> and <code>.debug</code> have <code>-g</code>, but <code>.opt</code> uses <code>-O2</code> while <code>.debug</code> uses <code>-O0</code>.</p>
+<p>It relates to the more common <code>.opt</code> build just as explained at <a href="#debug-the-emulator">Section 18.8, &#8220;Debug the emulator&#8221;</a>: both <code>.opt</code> and <code>.debug</code> have <code>-g</code>, but <code>.opt</code> uses <code>-O2</code> while <code>.debug</code> uses <code>-O0</code>.</p>
 </div>
 </div>
 <div class="sect3">
@@ -21533,7 +21789,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
 </ul>
 </div>
 <div class="paragraph">
-<p>Note that the <code>--ruby</code> option has some crazy side effects besides enabling Ruby, e.g. it <a href="https://github.com/gem5/gem5/blob/9fc9c67b4242c03f165951775be5cd0812f2a705/configs/ruby/Ruby.py#L61">sets the default <code>--cpu-type</code> to <code>TimingSimpleCPU</code> instead of the otherwise default <code>AtomicSimpleCPU</code></a>. But why.</p>
+<p>Note that the <code>--ruby</code> option has some crazy side effects besides enabling Ruby, e.g. it <a href="https://github.com/gem5/gem5/blob/9fc9c67b4242c03f165951775be5cd0812f2a705/configs/ruby/Ruby.py#L61">sets the default <code>--cpu-type</code> to <code>TimingSimpleCPU</code> instead of the otherwise default <code>AtomicSimpleCPU</code></a>. TODO: I have been told that this is because <a href="#gem5-functional-vs-atomic-vs-timing-memory-requests">atomic requests do not work with Ruby, only timing</a>.</p>
 </div>
 <div class="paragraph">
 <p>It is not possible to build more than one Ruby system into a single build, and this is a major pain point for testing Ruby: <a href="https://gem5.atlassian.net/browse/GEM5-467" class="bare">https://gem5.atlassian.net/browse/GEM5-467</a></p>
@@ -21786,7 +22042,7 @@ class SystemXBar(CoherentXBar):</pre>
 <div class="ulist">
 <ul>
 <li>
-<p><code>DerivO3CPU : public FullO3CPU&lt;O3CPUImpl&gt;</code>: <a href="#gem5-derivo3cpu">gem5 DerivO3CPU</a></p>
+<p><code>DerivO3CPU : public FullO3CPU&lt;O3CPUImpl&gt;</code>: <a href="#gem5-derivo3cpu">gem5 <code>DerivO3CPU</code></a></p>
 </li>
 </ul>
 </div>
@@ -21853,13 +22109,13 @@ class SystemXBar(CoherentXBar):</pre>
 <div class="sect4">
 <h5 id="gem5-minorcpu"><a class="anchor" href="#gem5-minorcpu"></a><a class="link" href="#gem5-minorcpu">19.16.1.2. gem5 MinorCPU</a></h5>
 <div class="paragraph">
-<p>Generic in-order core that does not model any specific CPU.</p>
+<p>Generic <a href="#out-of-order-execution">in-order</a> <a href="#superscalar-processor">superscalar</a> core.</p>
 </div>
 <div class="paragraph">
 <p>Its C++ implementation can be parametrized to more closely match real cores.</p>
 </div>
 <div class="paragraph">
-<p>Note that since gem5 is highly parametrizable, the parametrization could even change which instructions a CPU can execute by altering its available <a href="https://en.wikipedia.org/wiki/Execution_unit">functional units</a>, which are used to model performance.</p>
+<p>Note that since gem5 is highly parametrizable, the parametrization could even change which instructions a CPU can execute by altering its available <a href="#gem5-functional-units">functional units</a>, which are used to model performance.</p>
 </div>
 <div class="paragraph">
 <p>For example, <code>MinorCPU</code> allows all implemented instructions, including <a href="#arm-sve">ARM SVE</a> instructions, but a derived class modelling, say, an <a href="https://en.wikipedia.org/wiki/ARM_Cortex-A7">ARM Cortex A7 core</a>, might not, since SVE is a newer feature and the A7 core does not have SVE.</p>
@@ -21917,17 +22173,40 @@ class SystemXBar(CoherentXBar):</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="gem5-derivo3cpu"><a class="anchor" href="#gem5-derivo3cpu"></a><a class="link" href="#gem5-derivo3cpu">19.16.1.3. gem5 DerivO3CPU</a></h5>
+<h5 id="gem5-derivo3cpu"><a class="anchor" href="#gem5-derivo3cpu"></a><a class="link" href="#gem5-derivo3cpu">19.16.1.3. gem5 <code>DerivO3CPU</code></a></h5>
 <div class="paragraph">
 <p>Generic <a href="#out-of-order-execution">out-of-order core</a>. "O3" Stands for "Out Of Order"!</p>
 </div>
 <div class="paragraph">
+<p>Basic documentation on the old gem5 wiki: <a href="http://www.m5sim.org/O3CPU" class="bare">http://www.m5sim.org/O3CPU</a></p>
+</div>
+<div class="paragraph">
 <p>Analogous to <a href="#gem5-minorcpu">MinorCPU</a>, but modelling an out of order core instead of in order.</p>
 </div>
 <div class="paragraph">
 <p>A commented execution example can be seen at: <a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis">gem5 event queue DerivO3CPU syscall emulation freestanding example analysis</a>.</p>
 </div>
 <div class="paragraph">
+<p>The default <a href="#execution-unit">functional units</a> are described at: <a href="#gem5-derivo3cpu-default-functional-units">gem5 DerivO3CPU default functional units</a>. All default widths are set to 8 instructions, from the <a href="#gem5-config-ini"><code>config.ini</code></a>:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>[system.cpu]
+type=DerivO3CPU
+commitWidth=8
+decodeWidth=8
+dispatchWidth=8
+fetchWidth=8
+issueWidth=8
+renameWidth=8
+squashWidth=8
+wbWidth=8</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This can be observed for example at: <a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazardless">gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazardless</a>.</p>
+</div>
+<div class="paragraph">
 <p>Existing parametrizations:</p>
 </div>
 <div class="ulist">
@@ -21953,6 +22232,77 @@ class SystemXBar(CoherentXBar):</pre>
 </li>
 </ul>
 </div>
+<div class="sect5">
+<h6 id="gem5-derivo3cpu-pipeline-stages"><a class="anchor" href="#gem5-derivo3cpu-pipeline-stages"></a><a class="link" href="#gem5-derivo3cpu-pipeline-stages">19.16.1.3.1. gem5 <code>DerivO3CPU</code> pipeline stages</a></h6>
+<div class="ulist">
+<ul>
+<li>
+<p>fetch: besides obviously fetching the instruction, this is also where branch prediction runs. Presumably because you need to branch predict before deciding what to fetch next.</p>
+</li>
+<li>
+<p>retire: the instruction is completely and totally done with.</p>
+<div class="paragraph">
+<p>Misspeculated instructions never reach this stage as can be seen at: <a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-speculative">gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: speculative</a>.</p>
+</div>
+<div class="paragraph">
+<p>The <code>ExecAll</code> trace line is also emitted at this stage, and therefore <code>ExecAll</code> does not show misspeculated instructions.</p>
+</div>
+</li>
+</ul>
+</div>
+</div>
+<div class="sect5">
+<h6 id="gem5-utilo3-pipeview-py-o3-pipeline-viewer"><a class="anchor" href="#gem5-utilo3-pipeview-py-o3-pipeline-viewer"></a><a class="link" href="#gem5-utilo3-pipeview-py-o3-pipeline-viewer">19.16.1.3.2. gem5 util/o3-pipeview.py O3 pipeline viewer</a></h6>
+<div id="gem5-util-o3-pipeview-py-o3-pipeline-viewer" class="paragraph">
+<p>Mentioned at: <a href="http://www.m5sim.org/Visualization" class="bare">http://www.m5sim.org/Visualization</a></p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./run \
+  --arch aarch64 \
+  --emulator gem5 \
+  --userland userland/arch/aarch64/freestanding/linux/hello.S \
+  --trace O3PipeView \
+  --trace-stdout \
+  -- \
+  --cpu-type DerivO3CPU \
+  --caches \
+;
+"$(./getvar gem5_source_dir)/util/o3-pipeview.py" -c 500 -o o3pipeview.tmp.log --color "$(./getvar --arch aarch64 trace_txt_file)"
+less -R o3pipeview.tmp.log</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Or without color:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>"$(./getvar gem5_source_dir)/util/o3-pipeview.py" -c 500 -o o3pipeview.tmp.log "$(./getvar --arch aarch64 trace_txt_file)"
+less o3pipeview.tmp.log</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>A sample output for this can be seen at: <a href="#hazardless-o3-pipeline">[hazardless-o3-pipeline]</a>.</p>
+</div>
+</div>
+<div class="sect5">
+<h6 id="gem5-konata-o3-pipeline-viewer"><a class="anchor" href="#gem5-konata-o3-pipeline-viewer"></a><a class="link" href="#gem5-konata-o3-pipeline-viewer">19.16.1.3.3. gem5 Konata O3 pipeline viewer</a></h6>
+<div class="paragraph">
+<p><a href="https://github.com/shioyadan/Konata" class="bare">https://github.com/shioyadan/Konata</a></p>
+</div>
+<div class="paragraph">
+<p><a href="http://learning.gem5.org/tutorial/presentations/vis-o3-gem5.pdf" class="bare">http://learning.gem5.org/tutorial/presentations/vis-o3-gem5.pdf</a></p>
+</div>
+<div class="paragraph">
+<p>Appears to be browser based, so you can zoom in and out, rather than the forced wrapping as for <a href="#gem5-util-o3-pipeview-py-o3-pipeline-viewer">[gem5-util-o3-pipeview-py-o3-pipeline-viewer]</a>.</p>
+</div>
+<div class="paragraph">
+<p>Uses the same data source as <code>util/o3-pipeview.py</code>.</p>
+</div>
+<div class="paragraph">
+<p><a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall-gain">gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-gain</a> shows how the text-based visualization can get problematic due to stalls requiring wraparounds.</p>
+</div>
+</div>
 </div>
 </div>
 <div class="sect3">
@@ -22660,7 +23010,7 @@ for source in PySource.all:
 </div>
 <div class="literalblock">
 <div class="content">
-<pre>b Trace::OstreamLogger::logMessage()
+<pre>b Trace::OstreamLogger::logMessage
 b EventManager::schedule
 b EventFunctionWrapper::process</pre>
 </div>
@@ -23777,7 +24127,7 @@ DRAMCtrl::Rank::startup(Tick ref_tick)
 </div>
 </div>
 <div class="paragraph">
-<p>se we deduce that the vitual address 0x400078 maps to the physical address 0x78. But of course, <a href="https://lmgtfy.com/">let me log that for you</a> byu adding <code>--trace MMU</code>:</p>
+<p>so we deduce that the virtual address 0x400078 maps to the physical address 0x78. But of course, <a href="https://lmgtfy.com/">let me log that for you</a> by adding <code>--trace MMU</code>:</p>
 </div>
 <div class="literalblock">
 <div class="content">
@@ -23799,12 +24149,11 @@ DRAMCtrl::Rank::startup(Tick ref_tick)
 <p>Now that we are here, we might as well learn how to log the data that was fetched from DRAM.</p>
 </div>
 <div class="paragraph">
-<p>Fist we determine the expected bytes from:</p>
+<p>First we determine the expected bytes from the <a href="#disas">disassembly</a>:</p>
 </div>
 <div class="literalblock">
 <div class="content">
-<pre>./run-toolchain --arch aarch64 objdump -- \
-  -D "$(./getvar --arch aarch64 userland_build_dir)/arch/aarch64/freestanding/linux/hello.out"</pre>
+<pre>./disas --arch aarch64 --userland userland/arch/aarch64/freestanding/linux/hello.S _start</pre>
 </div>
 </div>
 <div class="paragraph">
@@ -23812,9 +24161,8 @@ DRAMCtrl::Rank::startup(Tick ref_tick)
 </div>
 <div class="literalblock">
 <div class="content">
-<pre>0000000000400078 &lt;_start&gt;:
-  400078:       d2800020        mov     x0, #0x1                        // #1
-  40007c:       100000e1        adr     x1, 400098 &lt;msg&gt;</pre>
+<pre>   0x0000000000400078 &lt;+0&gt;:     20 00 80 d2     mov     x0, #0x1                        // #1
+   0x000000000040007c &lt;+4&gt;:     e1 00 00 10     adr     x1, 0x400098 &lt;msg&gt;</pre>
 </div>
 </div>
 <div class="paragraph">
@@ -24599,6 +24947,26 @@ TimingSimpleCPU::IcachePort::ITickEvent::process</pre>
 <p>Contrast this with the non <code>--cache</code> version seen at <a href="#timingsimplecpu-analysis-5">TimingSimpleCPU analysis #5</a> in which DRAM only actually reads the 4 required bytes.</p>
 </div>
 <div class="paragraph">
+<p>The only cryptic thing about the messages is the <code>IF</code> flag, but good computer architects would have guessed it correctly, and <a href="https://github.com/gem5/gem5/blob/fa70478413e4650d0058cbfe81fd5ce362101994/src/mem/packet.cc#L372">src/mem/packet.cc</a> confirms:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>void
+Packet::print(std::ostream &amp;o, const int verbosity,
+              const std::string &amp;prefix) const
+{
+    ccprintf(o, "%s%s [%x:%x]%s%s%s%s%s%s", prefix, cmdString(),
+             getAddr(), getAddr() + getSize() - 1,
+             req-&gt;isSecure() ? " (s)" : "",
+             req-&gt;isInstFetch() ? " IF" : "",
+             req-&gt;isUncacheable() ? " UC" : "",
+             isExpressSnoop() ? " ES" : "",
+             req-&gt;isToPOC() ? " PoC" : "",
+             req-&gt;isToPOU() ? " PoU" : "");
+}</pre>
+</div>
+</div>
+<div class="paragraph">
 <p>Another interesting observation of running with <code>--trace Cache,DRAM,XBar</code> is that between the execution of both instructions, there is a <code>Cache</code> event, but no <code>DRAM</code> or <code>XBar</code> events:</p>
 </div>
 <div class="literalblock">
@@ -24930,10 +25298,7 @@ non-atomic 19</pre>
 <p>The memory system part must be similar to that of <code>TimingSimpleCPU</code> that we previously studied <a href="#gem5-event-queue-timingsimplecpu-syscall-emulation-freestanding-example-analysis">gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis</a>: the main thing we want to see is how the CPU pipeline speeds up execution by preventing some memory stalls.</p>
 </div>
 <div class="paragraph">
-<p>The <a href="#gem5-config-ini"><code>config.dot.svg</code></a> also indicates that: everything is exactly as in <a href="#gem5-event-queue-timingsimplecpu-syscall-emulation-freestanding-example-analysis-with-caches">gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches</a>, except that the CPU is a <code>MinorCPU</code> instead of <code>TimingSimpleCPU</code>, and the <code>--caches</code> are now mandatory.</p>
-</div>
-<div class="paragraph">
-<p>TODO: analyze the trace for:</p>
+<p>The <a href="#gem5-config-ini"><code>config.dot.svg</code></a> also indicates that: everything is exactly as in <a href="#gem5-event-queue-timingsimplecpu-syscall-emulation-freestanding-example-analysis-with-caches">gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches</a>, except that the CPU is a <code>MinorCPU</code> instead of <code>TimingSimpleCPU</code>, and the <code>--caches</code> are now mandatory:</p>
 </div>
 <div class="literalblock">
 <div class="content">
@@ -24941,7 +25306,7 @@ non-atomic 19</pre>
   --arch aarch64 \
   --emulator gem5 \
   --userland userland/arch/aarch64/freestanding/linux/hello.S \
-  --trace Event \
+  --trace FmtFlag,Cache,Event,ExecAll,Minor \
   --trace-stdout \
   -- \
   --cpu-type MinorCPU \
@@ -24949,11 +25314,898 @@ non-atomic 19</pre>
 ;</pre>
 </div>
 </div>
+<div class="paragraph">
+<p>and here&#8217;s a handy link to the source: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/hello.S">userland/arch/aarch64/freestanding/linux/hello.S</a>.</p>
+</div>
+<div class="paragraph">
+<p>On LKMC ce3ea9faea95daf46dea80d4236a30a0891c3ca5 gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1 we see the following.</p>
+</div>
+<div class="paragraph">
+<p>First there is a missed instruction fetch for the initial entry address which we know from <a href="#gem5-event-queue-timingsimplecpu-syscall-emulation-freestanding-example-analysis-with-caches">gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches</a> is the virtual address 0x400078 which maps to physical 0x78:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>    500: Cache: system.cpu.icache: access for ReadReq [40:7f] IF miss</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The memory request comes back later on at:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>  77000: Cache: system.cpu.icache: recvTimingResp: Handling response ReadResp [40:7f] IF</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>and soon after the CPU also ifetches across the cache line boundary, into the next 64-byte line:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>  79000: Cache: system.cpu.icache: access for ReadReq [80:bf] IF miss</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>TODO why? We have 0x78 and 0x7c, and those should be it since we <a href="#gem5-functional-units">are dual issue</a>, right? Is this prefetching at work?</p>
+</div>
+<div class="paragraph">
+<p>Later on we see the first instruction, our <a href="#arm-mov-instruction">MOVZ</a>, was decoded:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>  80000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/1/1.1 pc: 0x400078 (movz) to FU: 0</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>and that issue succeeds, because the functional unit 0 (FU 0) is an <code>IntAlu</code> as shown at <a href="#gem5-functional-units">gem5 functional units</a>:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>  80000: MinorExecute: system.cpu.execute: Issuing inst: 0/1.1/1/1.1 pc: 0x400078 (movz) into FU 0</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>At the very same tick, the second instruction is also decoded, our <a href="#arm-adr-instruction">ADR</a>:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>  80000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/1/2.2 pc: 0x40007c (adr) to FU: 0
+  80000: MinorExecute: system.cpu.execute: Can't issue as FU: 0 is already busy
+  80000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/1/2.2 pc: 0x40007c (adr) to FU: 1
+  80000: MinorExecute: system.cpu.execute: Issuing inst: 0/1.1/1/2.2 pc: 0x40007c (adr) into FU 1</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This is also an <code>IntAlu</code> instruction, and it can&#8217;t run on FU 0 because the first instruction is already running there. But to our luck, FU 1 is also an <code>IntAlu</code> unit, and so it runs there.</p>
+</div>
+<div class="paragraph">
+<p>Crap, those Minor logs should say what <code>OpClass</code> each instruction is, that would make things clearer.</p>
+</div>
+<div class="paragraph">
+<p>TODO what is that <code>0/1.1/1/1.1</code> notation that shows up everywhere? Must be important, let&#8217;s look at the source.</p>
+</div>
+<div class="paragraph">
+<p>Soon after (1500 ticks later, i.e. 3 cycles of 500 ticks each, so guessing due to <code>opLat=3</code>?), the execution appears to be over already since we see the <code>ExecAll</code> output (the <code>ExecEnable</code> lines) come through, which generally happens at the very end:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>  81500: MinorExecute: system.cpu.execute: Attempting to commit [tid:0]
+  81500: MinorExecute: system.cpu.execute: Committing micro-ops for interrupt[tid:0]
+  81500: MinorExecute: system.cpu.execute: Trying to commit canCommitInsts: 1
+  81500: MinorExecute: system.cpu.execute: Trying to commit from FUs
+  81500: MinorExecute: global: ExecContext setting PC: (0x400078=&gt;0x40007c).(0=&gt;1)
+  81500: MinorExecute: system.cpu.execute: Committing inst: 0/1.1/1/1.1 pc: 0x400078 (movz)
+  81500: MinorExecute: system.cpu.execute: Unstalling 0 for inst 0/1.1/1/1.1
+  81500: MinorExecute: system.cpu.execute: Completed inst: 0/1.1/1/1.1 pc: 0x400078 (movz)
+  81500: MinorScoreboard: system.cpu.execute.scoreboard0: Clearing inst: 0/1.1/1/1.1 pc: 0x400078 (movz) regIndex: 0 final numResults: 0
+  81500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #1, #0        : IntAlu :  D=0x0000000000000001  FetchSeq=1  CPSeq=1  flags=(IsInteger)
+  81500: MinorExecute: system.cpu.execute: Trying to commit canCommitInsts: 1
+  81500: MinorExecute: system.cpu.execute: Trying to commit from FUs
+  81500: MinorExecute: global: ExecContext setting PC: (0x40007c=&gt;0x400080).(0=&gt;1)
+  81500: MinorExecute: system.cpu.execute: Committing inst: 0/1.1/1/2.2 pc: 0x40007c (adr)
+  81500: MinorExecute: system.cpu.execute: Unstalling 1 for inst 0/1.1/1/2.2
+  81500: MinorExecute: system.cpu.execute: Completed inst: 0/1.1/1/2.2 pc: 0x40007c (adr)
+  81500: MinorScoreboard: system.cpu.execute.scoreboard0: Clearing inst: 0/1.1/1/2.2 pc: 0x40007c (adr) regIndex: 1 final numResults: 0
+  81500: MinorExecute: system.cpu.execute: Reached inst commit limit
+  81500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   adr   x1, #28            : IntAlu :  D=0x0000000000400098  FetchSeq=2  CPSeq=2  flags=(IsInteger)</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The ifetch for the third instruction returns at:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre> 129000: Cache: system.cpu.icache: recvTimingResp: Handling response ReadResp [80:bf] IF</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>so now we are ready to run the third and fourth instructions of the program:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>    ldr x2, =len
+    mov x8, 64</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The <a href="#arm-ldr-instruction">LDR</a> goes all the way down to FU 6 which is the memory one:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre> 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 0
+ 132000: MinorExecute: system.cpu.execute: Can't issue as FU: 0 isn't capable
+ 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 1
+ 132000: MinorExecute: system.cpu.execute: Can't issue as FU: 1 isn't capable
+ 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 2
+ 132000: MinorExecute: system.cpu.execute: Can't issue as FU: 2 isn't capable
+ 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 3
+ 132000: MinorExecute: system.cpu.execute: Can't issue as FU: 3 isn't capable
+ 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 4
+ 132000: MinorExecute: system.cpu.execute: Can't issue as FU: 4 isn't capable
+ 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 5
+ 132000: MinorExecute: system.cpu.execute: Can't issue as FU: 5 isn't capable
+ 132000: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) to FU: 6
+ 132000: MinorExecute: system.cpu.execute: Issuing inst: 0/1.1/2/3.3 pc: 0x400080 (ldr) into FU 6</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>and then the MOV issue follows soon afterwards (TODO why not at the same time like for the previous pair?):</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre> 132500: MinorExecute: system.cpu.execute: Trying to issue inst: 0/1.1/2/4.4 pc: 0x400084 (movz) to FU: 0
+ 132500: MinorExecute: system.cpu.execute: Issuing inst: 0/1.1/2/4.4 pc: 0x400084 (movz) into FU 0</pre>
+</div>
+</div>
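+<div class="paragraph">
+<p>The FU scan seen in the logs above can be modelled with a small Python sketch. This is a toy model and not gem5 code: <code>FU_CAPABILITIES</code> and <code>pick_fu</code> are hypothetical names, only FU 0, FU 1 and FU 6 are taken from the text, the other entries are placeholders, and treating the LDR as a <code>MemRead</code> operation is an assumption.</p>
+</div>

```python
# Toy model of the in-order functional unit (FU) scan seen in the MinorExecute log.
# The capability table is an assumption for illustration: only FU 0, FU 1 (IntAlu)
# and FU 6 (memory) come from the text, the other entries are placeholders.
FU_CAPABILITIES = [
    {"IntAlu"},        # FU 0
    {"IntAlu"},        # FU 1
    {"Placeholder2"},  # FU 2 (not identified in the text)
    {"Placeholder3"},  # FU 3 (not identified in the text)
    {"Placeholder4"},  # FU 4 (not identified in the text)
    {"Placeholder5"},  # FU 5 (not identified in the text)
    {"MemRead"},       # FU 6 (assumed operation class for loads)
]


def pick_fu(op_class, busy):
    """Return (fu_index, log) for the first capable and free FU, else (None, log)."""
    log = []
    for index, capabilities in enumerate(FU_CAPABILITIES):
        if op_class not in capabilities:
            log.append(f"FU {index} isn't capable")
        elif index in busy:
            log.append(f"FU {index} is already busy")
        else:
            log.append(f"issuing into FU {index}")
            return index, log
    return None, log


busy = set()
movz_fu, movz_log = pick_fu("IntAlu", busy)
busy.add(movz_fu)  # the MOVZ now occupies its FU
adr_fu, adr_log = pick_fu("IntAlu", busy)
ldr_fu, ldr_log = pick_fu("MemRead", set())
print(movz_fu, adr_fu, ldr_fu)
```

+<div class="paragraph">
+<p>With FU 0 taken by the MOVZ, the ADR falls through to FU 1, and the LDR skips every unit until FU 6, as in the trace.</p>
+</div>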
+<div class="sect5">
+<h6 id="gem5-event-queue-minorcpu-syscall-emulation-freestanding-example-analysis-hazard"><a class="anchor" href="#gem5-event-queue-minorcpu-syscall-emulation-freestanding-example-analysis-hazard"></a><a class="link" href="#gem5-event-queue-minorcpu-syscall-emulation-freestanding-example-analysis-hazard">19.20.4.5.1. gem5 event queue MinorCPU syscall emulation freestanding example analysis: hazard</a></h6>
+<div class="paragraph">
+<p>TODO like <a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazard">gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard</a> but with the hazard.</p>
+</div>
+</div>
 </div>
 <div class="sect4">
 <h5 id="gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis"><a class="anchor" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis"></a><a class="link" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis">19.20.4.6. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis</a></h5>
 <div class="paragraph">
-<p>TODO: like <a href="#gem5-event-queue-minorcpu-syscall-emulation-freestanding-example-analysis">gem5 event queue MinorCPU syscall emulation freestanding example analysis</a> but even more complex!</p>
+<p>Like <a href="#gem5-event-queue-minorcpu-syscall-emulation-freestanding-example-analysis">gem5 event queue MinorCPU syscall emulation freestanding example analysis</a> but even more complex, since this time it is for the <a href="#gem5-derivo3cpu">gem5 <code>DerivO3CPU</code></a>!</p>
+</div>
+<div class="paragraph">
+<p>The key new <a href="#gem5-tracing">debug flag</a> is <code>O3CPUAll</code>:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./run \
+  --arch aarch64 \
+  --emulator gem5 \
+  --userland userland/arch/aarch64/freestanding/linux/hello.S \
+  --trace FmtFlag,Cache,Event,ExecAll,O3CPUAll \
+  --trace-stdout \
+  -- \
+  --cpu-type DerivO3CPU \
+  --caches \
+;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The output is huge and contains about 7 thousand lines!!!</p>
+</div>
+<div class="paragraph">
+<p>This section and children are tested at LKMC 144a552cf926ea630ef9eadbb22b79fe2468c456.</p>
+</div>
+<div class="sect5">
+<h6 id="gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazardless"><a class="anchor" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazardless"></a><a class="link" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazardless">19.20.4.6.1. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazardless</a></h6>
+<div class="paragraph">
+<p>Let&#8217;s have a look at what is arguably the simplest example: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/hazardless.S">userland/arch/aarch64/freestanding/linux/hazardless.S</a>.</p>
+</div>
+<div class="paragraph">
+<p>First let&#8217;s start with a <a href="#gem5-util-o3-pipeview-py-o3-pipeline-viewer">gem5 <code>util/o3-pipeview.py</code> O3 pipeline viewer</a> visualization:</p>
+</div>
+<div id="hazardless-o3-pipeline" class="listingblock">
+<div class="content">
+<pre>// f = fetch, d = decode, n = rename, p = dispatch, i = issue, c = complete, r = retire
+
+                                     timeline                                             tick          pc.upc     disasm                      seq_num
+[.ic.r........................................................................fdn]-(          40000) 0x00400078.0 movz x0, #0, #0           [         1]
+[.ic.r........................................................................fdn]-(          40000) 0x0040007c.0 movz x1, #1, #0           [         2]
+[....................fdn.ic.r....................................................]-(         120000) 0x00400080.0 movz x2, #2, #0           [         3]
+[....................fdn.ic.r....................................................]-(         120000) 0x00400084.0 movz x3, #3, #0           [         4]
+[....................fdn.ic.r....................................................]-(         120000) 0x00400088.0 movz x4, #4, #0           [         5]
+[....................fdn.ic.r....................................................]-(         120000) 0x0040008c.0 movz x5, #5, #0           [         6]
+[....................fdn.ic.r....................................................]-(         120000) 0x00400090.0 movz x6, #6, #0           [         7]
+[....................fdn.ic.r....................................................]-(         120000) 0x00400094.0 movz x7, #7, #0           [         8]
+[....................fdn.pic.r...................................................]-(         120000) 0x00400098.0 movz x8, #8, #0           [         9]
+[....................fdn.pic.r...................................................]-(         120000) 0x0040009c.0 movz x9, #9, #0           [        10]
+[.....................fdn.ic.r...................................................]-(         120000) 0x004000a0.0 movz x10, #10, #0         [        11]
+[.....................fdn.ic.r...................................................]-(         120000) 0x004000a4.0 movz x11, #11, #0         [        12]
+[.....................fdn.ic.r...................................................]-(         120000) 0x004000a8.0 movz x12, #12, #0         [        13]
+[.....................fdn.ic.r...................................................]-(         120000) 0x004000ac.0 movz x13, #13, #0         [        14]
+[.....................fdn.pic.r..................................................]-(         120000) 0x004000b0.0 movz x14, #14, #0         [        15]
+[.....................fdn.pic.r..................................................]-(         120000) 0x004000b4.0 movz x15, #15, #0         [        16]
+[.....................fdn.pic.r..................................................]-(         120000) 0x004000b8.0 movz x16, #16, #0         [        17]
+[.....................fdn.pic.r..................................................]-(         120000) 0x004000bc.0 movz x17, #17, #0         [        18]
+[............................................fdn.ic.r............................]-(         160000) 0x004000c0.0 movz x18, #18, #0         [        19]
+[............................................fdn.ic.r............................]-(         160000) 0x004000c4.0 movz x19, #19, #0         [        20]
+[............................................fdn.ic.r............................]-(         160000) 0x004000c8.0 movz x20, #20, #0         [        21]
+[............................................fdn.ic.r............................]-(         160000) 0x004000cc.0 movz x21, #21, #0         [        22]
+[............................................fdn.ic.r............................]-(         160000) 0x004000d0.0 movz x22, #22, #0         [        23]
+[............................................fdn.ic.r............................]-(         160000) 0x004000d4.0 movz x23, #23, #0         [        24]
+[............................................fdn.pic.r...........................]-(         160000) 0x004000d8.0 movz x24, #24, #0         [        25]
+[............................................fdn.pic.r...........................]-(         160000) 0x004000dc.0 movz x25, #25, #0         [        26]
+[.............................................fdn.ic.r...........................]-(         160000) 0x004000e0.0 movz x26, #26, #0         [        27]
+[.............................................fdn.ic.r...........................]-(         160000) 0x004000e4.0 movz x27, #27, #0         [        28]
+[.............................................fdn.ic.r...........................]-(         160000) 0x004000e8.0 movz x28, #28, #0         [        29]
+[.............................................fdn.ic.r...........................]-(         160000) 0x004000ec.0 movz x29, #29, #0         [        30]
+[.............................................fdn.pic.r..........................]-(         160000) 0x004000f0.0 movz x0, #0, #0           [        31]
+[.............................................fdn.pic.r..........................]-(         160000) 0x004000f4.0 movz x1, #1, #0           [        32]
+[.............................................fdn.pic.r..........................]-(         160000) 0x004000f8.0 movz x2, #2, #0           [        33]
+[.............................................fdn.pic.r..........................]-(         160000) 0x004000fc.0 movz x3, #3, #0           [        34]</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The first group of instructions has only two instructions because the first instruction is at address 0x400078, so only two instructions fit on that cache line, as the next cache line starts at 0x400080!</p>
+</div>
+<div class="paragraph">
+<p>The initial <code>fdn</code> on top middle is likely bugged out, did it wrap around? But the rest makes sense.</p>
+</div>
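+<div class="paragraph">
+<p>One way to check the wrap-around guess is to map ticks to timeline columns. The sketch below assumes 80 columns per row and 500 ticks per column; both values are inferred from the listing (the row labels 40000, 120000 and 160000 are all multiples of 40000) and not read from the viewer source, and <code>timeline_position</code> is a hypothetical helper.</p>
+</div>

```python
# Map a tick to its (row base tick, column) in the pipeline viewer timeline.
# Assumptions inferred from the listing, not from the viewer source:
# 80 columns per row and 500 ticks per column, so each row spans 40000 ticks.
COLUMNS = 80
TICKS_PER_COLUMN = 500
ROW_SPAN = COLUMNS * TICKS_PER_COLUMN


def timeline_position(tick):
    """Return (row_base_tick, column) for a tick."""
    row_base = (tick // ROW_SPAN) * ROW_SPAN
    column = (tick - row_base) // TICKS_PER_COLUMN
    return row_base, column


# Fetch of the first two instructions, logged by the Fetch stage at tick 78500.
print(timeline_position(78500))   # (40000, 77)
# Fetch of the third instruction at tick 130000.
print(timeline_position(130000))  # (120000, 20)
# Four columns after the first fetch, the row boundary at 80000 has been crossed.
print(timeline_position(80500))   # (80000, 1)
```

+<div class="paragraph">
+<p>Under these assumptions a fetch at tick 78500 lands in column 77, so <code>fdn</code> fills the last three columns of the row labelled 40000 and the later stages would continue at the start of the line, which is consistent with a wrap-around.</p>
+</div>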
+<div class="paragraph">
+<p>From this, we clearly see that up to 8 instructions can be issued concurrently, which matches the default width values we had seen at <a href="#gem5-derivo3cpu">gem5 <code>DerivO3CPU</code></a>.</p>
+</div>
+<div class="paragraph">
+<p>For example, we can clearly see how:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><code>movz x2</code> through to <code>movz x9</code> start running at the exact same time. TODO why does <code>movz x7</code> do <code>fdn.ic.r</code> while <code>movz x8</code> does <code>fdn.pic.r</code>? How are they different?</p>
+</li>
+<li>
+<p><code>movz x10</code> through <code>movz x17</code> then start running one step later. This second chunk is fully pipelined with the first instruction pack.</p>
+</li>
+<li>
+<p>then comes a pause while the next fetch comes back. This group of 16 instructions took up the entire 64-byte cacheline that had been read</p>
+</li>
+</ul>
+</div>
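+<div class="paragraph">
+<p>The group sizes of 2 and 16 follow from simple cache line arithmetic, sketched below. It assumes 64-byte cache lines and 4-byte instructions as stated in the text; <code>instructions_left_in_line</code> is a hypothetical helper name.</p>
+</div>

```python
# Count how many instructions remain in the cache line that contains a PC.
# Assumes 64-byte cache lines and fixed 4-byte aarch64 instructions, as in the text.
CACHE_LINE_BYTES = 64
INSTRUCTION_BYTES = 4


def instructions_left_in_line(pc):
    """Number of instructions from pc up to the end of its cache line."""
    offset = pc % CACHE_LINE_BYTES
    return (CACHE_LINE_BYTES - offset) // INSTRUCTION_BYTES


print(instructions_left_in_line(0x400078))  # 2
print(instructions_left_in_line(0x400080))  # 16
```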
+<div class="paragraph">
+<p>First we can have a look at <code>ExecEnable</code> to get an initial idea of how many instructions are run at one time:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>  78500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  FetchSeq=1  CPSeq=1  flags=(IsInteger)
+  78500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   movz   x1, #1, #0        : IntAlu :  D=0x0000000000000001  FetchSeq=2  CPSeq=2  flags=(IsInteger)
+
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   movz   x2, #2, #0        : IntAlu :  D=0x0000000000000002  FetchSeq=3  CPSeq=3  flags=(IsInteger)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x3, #3, #0        : IntAlu :  D=0x0000000000000003  FetchSeq=4  CPSeq=4  flags=(IsInteger)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+16    :   movz   x4, #4, #0        : IntAlu :  D=0x0000000000000004  FetchSeq=5  CPSeq=5  flags=(IsInteger)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+20    :   movz   x5, #5, #0        : IntAlu :  D=0x0000000000000005  FetchSeq=6  CPSeq=6  flags=(IsInteger)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+24    :   movz   x6, #6, #0        : IntAlu :  D=0x0000000000000006  FetchSeq=7  CPSeq=7  flags=(IsInteger)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+28    :   movz   x7, #7, #0        : IntAlu :  D=0x0000000000000007  FetchSeq=8  CPSeq=8  flags=(IsInteger)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+32    :   movz   x8, #8, #0        : IntAlu :  D=0x0000000000000008  FetchSeq=9  CPSeq=9  flags=(IsInteger)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+36    :   movz   x9, #9, #0        : IntAlu :  D=0x0000000000000009  FetchSeq=10  CPSeq=10  flags=(IsInteger)
+
+ 130500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+40    :   movz   x10, #10, #0      : IntAlu :  D=0x000000000000000a  FetchSeq=11  CPSeq=11  flags=(IsInteger)
+ 130500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+44    :   movz   x11, #11, #0      : IntAlu :  D=0x000000000000000b  FetchSeq=12  CPSeq=12  flags=(IsInteger)
+ 130500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+48    :   movz   x12, #12, #0      : IntAlu :  D=0x000000000000000c  FetchSeq=13  CPSeq=13  flags=(IsInteger)
+ 130500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+52    :   movz   x13, #13, #0      : IntAlu :  D=0x000000000000000d  FetchSeq=14  CPSeq=14  flags=(IsInteger)
+ 130500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+56    :   movz   x14, #14, #0      : IntAlu :  D=0x000000000000000e  FetchSeq=15  CPSeq=15  flags=(IsInteger)
+ 130500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+60    :   movz   x15, #15, #0      : IntAlu :  D=0x000000000000000f  FetchSeq=16  CPSeq=16  flags=(IsInteger)
+ 130500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+64    :   movz   x16, #16, #0      : IntAlu :  D=0x0000000000000010  FetchSeq=17  CPSeq=17  flags=(IsInteger)
+ 130500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+68    :   movz   x17, #17, #0      : IntAlu :  D=0x0000000000000011  FetchSeq=18  CPSeq=18  flags=(IsInteger)</pre>
+</div>
+</div>
+<div class="paragraph">
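+<p>This suggests 8, but remember that the <code>ExecEnable</code> time labels do not necessarily coincide with commit times: in this trace they match the tick at which the fetch stage created the instruction (130000 for <code>movz x2</code>), while the commit log shows that instruction committing at 133500. As we saw in the pipeline viewer above, instructions 9 and 10 have one extra stage.</p>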
+</div>
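+<div class="paragraph">
+<p>Since the full trace has thousands of lines, counting how many <code>ExecEnable</code> lines share each time label is easier with a short script. The following is a minimal sketch: the sample lines are abbreviated from the trace above (the <code>...</code> stands for elided fields), and <code>count_per_tick</code> is a hypothetical helper.</p>
+</div>

```python
import re
from collections import Counter

# A few ExecEnable lines abbreviated from the trace; "..." stands for elided fields.
TRACE = """\
  78500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #0, #0 ... FetchSeq=1
  78500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   movz   x1, #1, #0 ... FetchSeq=2
 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   movz   x2, #2, #0 ... FetchSeq=3
 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x3, #3, #0 ... FetchSeq=4
 130500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+40    :   movz   x10, #10, #0 ... FetchSeq=11
"""

# Leading tick number followed by the ExecEnable flag name.
LINE_RE = re.compile(r"^\s*(\d+): ExecEnable: ")


def count_per_tick(trace):
    """Return a Counter mapping each tick label to its number of ExecEnable lines."""
    counts = Counter()
    for line in trace.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


counts = count_per_tick(TRACE)
for tick in sorted(counts):
    print(tick, counts[tick])
```

+<div class="paragraph">
+<p>Feeding the whole trace file to <code>count_per_tick</code> instead of the abbreviated sample would give the per-label group sizes for the entire run.</p>
+</div>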
+<div class="paragraph">
+<p>After the initial two execs from the first cache line, the full commit log chunk around the first group of six <code>ExecEnable</code> lines looks like:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre> 133500: Commit: system.cpu.commit: Getting instructions from Rename stage.
+ 133500: Commit: system.cpu.commit: Trying to commit instructions in the ROB.
+
+ 133500: Commit: system.cpu.commit: Trying to commit head instruction, [tid:0] [sn:3]
+ 133500: Commit: system.cpu.commit: [tid:0] [sn:3] Committing instruction with PC (0x400080=&gt;0x400084).(0=&gt;1)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   movz   x2, #2, #0        : IntAlu :  D=0x0000000000000002  FetchSeq=3  CPSeq=3  flags=(IsInteger)
+ 133500: ROB: system.cpu.rob: [tid:0] Retiring head instruction, instruction PC (0x400080=&gt;0x400084).(0=&gt;1), [sn:3]
+ 133500: O3CPU: system.cpu: Removing committed instruction [tid:0] PC (0x400080=&gt;0x400084).(0=&gt;1) [sn:3]
+
+ 133500: Commit: system.cpu.commit: Trying to commit head instruction, [tid:0] [sn:4]
+ 133500: Commit: system.cpu.commit: [tid:0] [sn:4] Committing instruction with PC (0x400084=&gt;0x400088).(0=&gt;1)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x3, #3, #0        : IntAlu :  D=0x0000000000000003  FetchSeq=4  CPSeq=4  flags=(IsInteger)
+ 133500: ROB: system.cpu.rob: [tid:0] Retiring head instruction, instruction PC (0x400084=&gt;0x400088).(0=&gt;1), [sn:4]
+ 133500: O3CPU: system.cpu: Removing committed instruction [tid:0] PC (0x400084=&gt;0x400088).(0=&gt;1) [sn:4]
+
+ 133500: Commit: system.cpu.commit: Trying to commit head instruction, [tid:0] [sn:5]
+ 133500: Commit: system.cpu.commit: [tid:0] [sn:5] Committing instruction with PC (0x400088=&gt;0x40008c).(0=&gt;1)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+16    :   movz   x4, #4, #0        : IntAlu :  D=0x0000000000000004  FetchSeq=5  CPSeq=5  flags=(IsInteger)
+ 133500: ROB: system.cpu.rob: [tid:0] Retiring head instruction, instruction PC (0x400088=&gt;0x40008c).(0=&gt;1), [sn:5]
+ 133500: O3CPU: system.cpu: Removing committed instruction [tid:0] PC (0x400088=&gt;0x40008c).(0=&gt;1) [sn:5]
+
+ 133500: Commit: system.cpu.commit: Trying to commit head instruction, [tid:0] [sn:6]
+ 133500: Commit: system.cpu.commit: [tid:0] [sn:6] Committing instruction with PC (0x40008c=&gt;0x400090).(0=&gt;1)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+20    :   movz   x5, #5, #0        : IntAlu :  D=0x0000000000000005  FetchSeq=6  CPSeq=6  flags=(IsInteger)
+ 133500: ROB: system.cpu.rob: [tid:0] Retiring head instruction, instruction PC (0x40008c=&gt;0x400090).(0=&gt;1), [sn:6]
+ 133500: O3CPU: system.cpu: Removing committed instruction [tid:0] PC (0x40008c=&gt;0x400090).(0=&gt;1) [sn:6]
+
+ 133500: Commit: system.cpu.commit: Trying to commit head instruction, [tid:0] [sn:7]
+ 133500: Commit: system.cpu.commit: [tid:0] [sn:7] Committing instruction with PC (0x400090=&gt;0x400094).(0=&gt;1)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+24    :   movz   x6, #6, #0        : IntAlu :  D=0x0000000000000006  FetchSeq=7  CPSeq=7  flags=(IsInteger)
+ 133500: ROB: system.cpu.rob: [tid:0] Retiring head instruction, instruction PC (0x400090=&gt;0x400094).(0=&gt;1), [sn:7]
+ 133500: O3CPU: system.cpu: Removing committed instruction [tid:0] PC (0x400090=&gt;0x400094).(0=&gt;1) [sn:7]
+
+ 133500: Commit: system.cpu.commit: Trying to commit head instruction, [tid:0] [sn:8]
+ 133500: Commit: system.cpu.commit: [tid:0] [sn:8] Committing instruction with PC (0x400094=&gt;0x400098).(0=&gt;1)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+28    :   movz   x7, #7, #0        : IntAlu :  D=0x0000000000000007  FetchSeq=8  CPSeq=8  flags=(IsInteger)
+ 133500: ROB: system.cpu.rob: [tid:0] Retiring head instruction, instruction PC (0x400094=&gt;0x400098).(0=&gt;1), [sn:8]
+ 133500: O3CPU: system.cpu: Removing committed instruction [tid:0] PC (0x400094=&gt;0x400098).(0=&gt;1) [sn:8]
+
+ 133500: Commit: system.cpu.commit: [tid:0] Marking PC (0x400098=&gt;0x40009c).(0=&gt;1), [sn:9] ready within ROB.
+ 133500: Commit: system.cpu.commit: [tid:0] Marking PC (0x40009c=&gt;0x4000a0).(0=&gt;1), [sn:10] ready within ROB.
+ 133500: Commit: system.cpu.commit: [tid:0] Marking PC (0x4000a0=&gt;0x4000a4).(0=&gt;1), [sn:11] ready within ROB.
+ 133500: Commit: system.cpu.commit: [tid:0] Marking PC (0x4000a4=&gt;0x4000a8).(0=&gt;1), [sn:12] ready within ROB.
+ 133500: Commit: system.cpu.commit: [tid:0] Marking PC (0x4000a8=&gt;0x4000ac).(0=&gt;1), [sn:13] ready within ROB.
+ 133500: Commit: system.cpu.commit: [tid:0] Marking PC (0x4000ac=&gt;0x4000b0).(0=&gt;1), [sn:14] ready within ROB.
+ 133500: Commit: system.cpu.commit: [tid:0] Instruction [sn:9] PC (0x400098=&gt;0x40009c).(0=&gt;1) is head of ROB and ready to commit
+ 133500: Commit: system.cpu.commit: [tid:0] ROB has 10 insts &amp; 182 free entries.</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><code>ROB</code> stands for <a href="#re-order-buffer">Re-order buffer</a>.</p>
+</div>
+<div class="paragraph">
+<p><code>0x400080=&gt;0x400084</code> is the old/new PC address pair of the first committed instruction.</p>
+</div>
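+<div class="paragraph">
+<p>The PC notation in the <code>Commit</code> and <code>ROB</code> lines can be pulled apart with a regular expression, which helps when following one instruction through the trace. This is only a sketch: <code>parse_pc</code> is a hypothetical helper, and reading the second pair as the micro-op program counter is an assumption based on the <code>pc.upc</code> column of the pipeline viewer.</p>
+</div>

```python
import re

# Parse the "(pc=>next_pc).(upc=>next_upc)" notation used in the Commit and ROB lines.
# Reading the second pair as the micro-op PC is an assumption, based on the
# "pc.upc" column header of the pipeline viewer output.
PC_RE = re.compile(r"\((0x[0-9a-f]+)=>(0x[0-9a-f]+)\)\.\((\d+)=>(\d+)\)")


def parse_pc(text):
    """Return (pc, next_pc, upc, next_upc) as integers, or None if absent."""
    match = PC_RE.search(text)
    if match is None:
        return None
    pc, next_pc, upc, next_upc = match.groups()
    return int(pc, 16), int(next_pc, 16), int(upc), int(next_upc)


line = (
    "133500: Commit: system.cpu.commit: [tid:0] [sn:3] "
    "Committing instruction with PC (0x400080=>0x400084).(0=>1)"
)
pc, next_pc, upc, next_upc = parse_pc(line)
print(hex(pc), hex(next_pc), next_pc - pc)
```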
+<div class="paragraph">
+<p>Another thing we can do is to try to follow one of the instructions back as it goes through the pipeline, for example by searching for the address <code>0x400080</code>.</p>
+</div>
+<div class="paragraph">
+<p>The first mention of the address happens when the fetch of the two initial instructions completes. TODO not sure why it doesn&#8217;t just also fetch the next cache line at the same time:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>FullO3CPU: Ticking main, FullO3CPU.
+  78500: Fetch: system.cpu.fetch: Running stage.
+  78500: Fetch: system.cpu.fetch: Attempting to fetch from [tid:0]
+  78500: Fetch: system.cpu.fetch: [tid:0] Icache miss is complete.
+  78500: Fetch: system.cpu.fetch: [tid:0] Adding instructions to queue to decode.
+  78500: DynInst: global: DynInst: [sn:1] Instruction created. Instcount for system.cpu = 1
+  78500: Fetch: system.cpu.fetch: [tid:0] Instruction PC 0x400078 (0) created [sn:1].
+  78500: Fetch: system.cpu.fetch: [tid:0] Instruction is:   movz   x0, #0, #0
+  78500: Fetch: system.cpu.fetch: [tid:0] Fetch queue entry created (1/32).
+  78500: DynInst: global: DynInst: [sn:2] Instruction created. Instcount for system.cpu = 2
+  78500: Fetch: system.cpu.fetch: [tid:0] Instruction PC 0x40007c (0) created [sn:2].
+  78500: Fetch: system.cpu.fetch: [tid:0] Instruction is:   movz   x1, #1, #0
+  78500: Fetch: system.cpu.fetch: [tid:0] Fetch queue entry created (2/32).
+  78500: Fetch: system.cpu.fetch: [tid:0] Issuing a pipelined I-cache access, starting at PC (0x400080=&gt;0x400084).(0=&gt;1).
+  78500: Fetch: system.cpu.fetch: [tid:0] Fetching cache line 0x400080 for addr 0x400080</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>so we observe that the first two instructions arrived, and the CPU noticed that 0x400080 hasn&#8217;t been fetched yet.</p>
+</div>
+<div class="paragraph">
+<p>Then for several cycles that follow, the fetch stage just says that it is blocked on data returning, e.g.:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>FullO3CPU: Ticking main, FullO3CPU.
+  79000: Fetch: system.cpu.fetch: Running stage.
+  79000: Fetch: system.cpu.fetch: There are no more threads available to fetch from.
+  79000: Fetch: system.cpu.fetch: [tid:0] Fetch is waiting cache response!</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>At the same time, the execution of the initial 2 instructions progresses through the pipeline.</p>
+</div>
+<div class="paragraph">
+<p>These progress up until:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>  88000: O3CPU: system.cpu: Idle!</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>at which point there are no more events scheduled besides waiting for the second cache line to come back.</p>
+</div>
+<div class="paragraph">
+<p>After this, some time passes without events, and the next tick happens when the fetch data returns:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>FullO3CPU: Ticking main, FullO3CPU.
+ 130000: Fetch: system.cpu.fetch: Running stage.
+ 130000: Fetch: system.cpu.fetch: Attempting to fetch from [tid:0]
+ 130000: Fetch: system.cpu.fetch: [tid:0] Icache miss is complete.
+ 130000: Fetch: system.cpu.fetch: [tid:0] Adding instructions to queue to decode.
+ 130000: DynInst: global: DynInst: [sn:3] Instruction created. Instcount for system.cpu = 1
+ 130000: Fetch: system.cpu.fetch: [tid:0] Instruction PC 0x400080 (0) created [sn:3].
+ 130000: Fetch: system.cpu.fetch: [tid:0] Instruction is:   movz   x2, #2, #0
+ 130000: Fetch: system.cpu.fetch: [tid:0] Fetch queue entry created (1/32).
+ 130000: DynInst: global: DynInst: [sn:4] Instruction created. Instcount for system.cpu = 2
+ 130000: Fetch: system.cpu.fetch: [tid:0] Instruction PC 0x400084 (0) created [sn:4].
+ 130000: Fetch: system.cpu.fetch: [tid:0] Instruction is:   movz   x3, #3, #0
+ 130000: Fetch: system.cpu.fetch: [tid:0] Fetch queue entry created (2/32).
+ 130000: DynInst: global: DynInst: [sn:5] Instruction created. Instcount for system.cpu = 3</pre>
+</div>
+</div>
+</div>
+<div class="sect5">
+<h6 id="gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazard"><a class="anchor" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazard"></a><a class="link" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazard">19.20.4.6.2. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard</a></h6>
+<div class="paragraph">
+<p>Now let&#8217;s do the same as in <a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazardless">gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazardless</a> but with a hazard: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/hazard.S">userland/arch/aarch64/freestanding/linux/hazard.S</a>.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>// f = fetch, d = decode, n = rename, p = dispatch, i = issue, c = complete, r = retire
+
+                                     timeline                                             tick          pc.upc     disasm                      seq_num
+[.ic.r........................................................................fdn]-(          40000) 0x00400078.0 movz x0, #0, #0           [         1]
+[.ic.r........................................................................fdn]-(          40000) 0x0040007c.0 movz x1, #1, #0           [         2]
+[....................fdn.ic.r....................................................]-(         120000) 0x00400080.0 movz x2, #2, #0           [         3]
+[....................fdn.pic.r...................................................]-(         120000) 0x00400084.0 add x3, x2, #1            [         4]
+[....................fdn.ic..r...................................................]-(         120000) 0x00400088.0 movz x4, #4, #0           [         5]
+[....................fdn.ic..r...................................................]-(         120000) 0x0040008c.0 movz x5, #5, #0           [         6]
+[....................fdn.ic..r...................................................]-(         120000) 0x00400090.0 movz x6, #6, #0           [         7]
+[....................fdn.ic..r...................................................]-(         120000) 0x00400094.0 movz x7, #7, #0           [         8]
+[....................fdn.ic..r...................................................]-(         120000) 0x00400098.0 movz x8, #8, #0           [         9]
+[....................fdn.pic.r...................................................]-(         120000) 0x0040009c.0 movz x9, #9, #0           [        10]
+[.....................fdn.ic.r...................................................]-(         120000) 0x004000a0.0 movz x10, #10, #0         [        11]
+[.....................fdn.ic..r..................................................]-(         120000) 0x004000a4.0 movz x11, #11, #0         [        12]
+[.....................fdn.ic..r..................................................]-(         120000) 0x004000a8.0 movz x12, #12, #0         [        13]
+[.....................fdn.ic..r..................................................]-(         120000) 0x004000ac.0 movz x13, #13, #0         [        14]
+[.....................fdn.pic.r..................................................]-(         120000) 0x004000b0.0 movz x14, #14, #0         [        15]
+[.....................fdn.pic.r..................................................]-(         120000) 0x004000b4.0 movz x15, #15, #0         [        16]
+[.....................fdn.pic.r..................................................]-(         120000) 0x004000b8.0 movz x16, #16, #0         [        17]
+[.....................fdn.pic.r..................................................]-(         120000) 0x004000bc.0 movz x17, #17, #0         [        18]
+[............................................fdn.ic.r............................]-(         160000) 0x004000c0.0 movz x18, #18, #0         [        19]
+[............................................fdn.ic.r............................]-(         160000) 0x004000c4.0 movz x19, #19, #0         [        20]
+[............................................fdn.ic.r............................]-(         160000) 0x004000c8.0 movz x20, #20, #0         [        21]
+[............................................fdn.ic.r............................]-(         160000) 0x004000cc.0 movz x21, #21, #0         [        22]
+[............................................fdn.ic.r............................]-(         160000) 0x004000d0.0 movz x22, #22, #0         [        23]
+[............................................fdn.ic.r............................]-(         160000) 0x004000d4.0 movz x23, #23, #0         [        24]
+[............................................fdn.pic.r...........................]-(         160000) 0x004000d8.0 movz x24, #24, #0         [        25]
+[............................................fdn.pic.r...........................]-(         160000) 0x004000dc.0 movz x25, #25, #0         [        26]
+[.............................................fdn.ic.r...........................]-(         160000) 0x004000e0.0 movz x0, #0, #0           [        27]
+[.............................................fdn.ic.r...........................]-(         160000) 0x004000e4.0 movz x8, #93, #0          [        28]</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>TODO understand how the hazard happens in detail.</p>
+</div>
+</div>
+<div class="sect5">
+<h6 id="gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazard4"><a class="anchor" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazard4"></a><a class="link" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazard4">19.20.4.6.3. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard4</a></h6>
+<div class="paragraph">
+<p>Like <a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazard">gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard</a> but a hazard of depth 4: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/hazard.S">userland/arch/aarch64/freestanding/linux/hazard.S</a>.</p>
+</div>
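+<div class="paragraph">
+<p>To make "a hazard of depth 4" concrete, here is a minimal sketch that measures the longest register dependency chain in a list of disassembly strings like the ones in the trace. The function name <code>longest_dependency_chain</code> is hypothetical, and the sketch assumes that the first register operand is always the destination, which holds for the MOVZ, ADD and LDR lines shown here but not for stores.</p>
+</div>

```python
import re

# Matches AArch64 general purpose register names such as x3 or w3.
REG_RE = re.compile(r"\b[xw]\d+\b")


def longest_dependency_chain(disasm_lines):
    """Return the number of dependency edges on the longest register chain.

    Assumption (not from the original text): the first register operand is
    the destination and all remaining register operands are sources.
    """
    depth_of = {}  # register name -> chain depth of its latest producer
    longest = 0
    for line in disasm_lines:
        regs = REG_RE.findall(line)
        if not regs:
            continue
        dest, sources = regs[0], regs[1:]
        # An instruction sits one level deeper than its deepest producer.
        depth = max((depth_of[s] + 1 for s in sources if s in depth_of), default=0)
        depth_of[dest] = depth
        longest = max(longest, depth)
    return longest


hazard4 = [
    "movz x2, #2, #0",
    "add x3, x2, #1",
    "add x4, x3, #1",
    "add x5, x4, #1",
    "add x6, x5, #1",
    "movz x7, #7, #0",
]
print(longest_dependency_chain(hazard4))  # prints 4
```

+<div class="paragraph">
+<p>In the trace, this chain shows up as the <code>ic</code> of each ADD sliding one column to the right of the previous one.</p>
+</div>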
+<div class="literalblock">
+<div class="content">
+<pre>// f = fetch, d = decode, n = rename, p = dispatch, i = issue, c = complete, r = retire
+
+                                     timeline                                             tick          pc.upc     disasm                      seq_num
+[.ic.r........................................................................fdn]-(          40000) 0x00400078.0 movz x0, #0, #0           [         1]
+[.ic.r........................................................................fdn]-(          40000) 0x0040007c.0 movz x1, #1, #0           [         2]
+[....................fdn.ic.r....................................................]-(         120000) 0x00400080.0 movz x2, #2, #0           [         3]
+[....................fdn.pic.r...................................................]-(         120000) 0x00400084.0 add x3, x2, #1            [         4]
+[....................fdn.p.ic.r..................................................]-(         120000) 0x00400088.0 add x4, x3, #1            [         5]
+[....................fdn.p..ic.r.................................................]-(         120000) 0x0040008c.0 add x5, x4, #1            [         6]
+[....................fdn.p...ic.r................................................]-(         120000) 0x00400090.0 add x6, x5, #1            [         7]
+[....................fdn.ic.....r................................................]-(         120000) 0x00400094.0 movz x7, #7, #0           [         8]
+[....................fdn.ic.....r................................................]-(         120000) 0x00400098.0 movz x8, #8, #0           [         9]
+[....................fdn.ic.....r................................................]-(         120000) 0x0040009c.0 movz x9, #9, #0           [        10]
+[.....................fdn.ic....r................................................]-(         120000) 0x004000a0.0 movz x10, #10, #0         [        11]
+[.....................fdn.ic....r................................................]-(         120000) 0x004000a4.0 movz x11, #11, #0         [        12]
+[.....................fdn.ic....r................................................]-(         120000) 0x004000a8.0 movz x12, #12, #0         [        13]
+[.....................fdn.ic....r................................................]-(         120000) 0x004000ac.0 movz x13, #13, #0         [        14]
+[.....................fdn.ic.....r...............................................]-(         120000) 0x004000b0.0 movz x14, #14, #0         [        15]
+[.....................fdn.pic....r...............................................]-(         120000) 0x004000b4.0 movz x15, #15, #0         [        16]
+[.....................fdn.pic....r...............................................]-(         120000) 0x004000b8.0 movz x16, #16, #0         [        17]
+[.....................fdn.pic....r...............................................]-(         120000) 0x004000bc.0 movz x17, #17, #0         [        18]
+[............................................fdn.ic.r............................]-(         160000) 0x004000c0.0 movz x18, #18, #0         [        19]
+[............................................fdn.ic.r............................]-(         160000) 0x004000c4.0 movz x19, #19, #0         [        20]
+[............................................fdn.ic.r............................]-(         160000) 0x004000c8.0 movz x20, #20, #0         [        21]
+[............................................fdn.ic.r............................]-(         160000) 0x004000cc.0 movz x21, #21, #0         [        22]
+[............................................fdn.ic.r............................]-(         160000) 0x004000d0.0 movz x22, #22, #0         [        23]
+[............................................fdn.ic.r............................]-(         160000) 0x004000d4.0 movz x23, #23, #0         [        24]
+[............................................fdn.pic.r...........................]-(         160000) 0x004000d8.0 movz x24, #24, #0         [        25]
+[............................................fdn.pic.r...........................]-(         160000) 0x004000dc.0 movz x25, #25, #0         [        26]
+[.............................................fdn.ic.r...........................]-(         160000) 0x004000e0.0 movz x0, #0, #0           [        27]
+[.............................................fdn.ic.r...........................]-(         160000) 0x004000e4.0 movz x8, #93, #0          [        28]</pre>
+</div>
+</div>
+</div>
+<div class="sect5">
+<h6 id="gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall"><a class="anchor" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall"></a><a class="link" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall">19.20.4.6.4. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall</a></h6>
+<div class="paragraph">
+<p>Like <a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-hazard">gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard</a> but now with an LDR stall: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/stall.S">userland/arch/aarch64/freestanding/linux/stall.S</a>.</p>
+</div>
+<div class="paragraph">
+<p>We can see here that:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>the addition of a data section entry shifted our previous address layout: the entry point is now 0x004000b0, so 4 instructions fit in the first cache line instead of 2</p>
+</li>
+<li>
+<p>the <a href="#arm-ldr-instruction">LDR</a> happens to be the fourth instruction, and it takes a long time to retire: about 40k ticks, which, as expected, is about the same time an instruction fetch takes.</p>
+</li>
+<li>
+<p>fetch does not continue past the LDR, and so nothing is gained in this particular example, since the next instructions haven&#8217;t been fetched from memory yet!</p>
+</li>
+</ul>
+</div>
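+<div class="paragraph">
+<p>The "4 instructions instead of 2" arithmetic can be checked with a small sketch. The 64 byte cache line size is an assumption: it is not stated here, but it is consistent with the traces, where 16 instructions of 4 bytes each are fetched together. The helper name <code>instructions_left_in_line</code> is hypothetical.</p>
+</div>

```python
CACHE_LINE_BYTES = 64   # assumption: inferred from the traces, not stated in the text
INSTRUCTION_BYTES = 4   # AArch64 instructions have a fixed 4 byte width


def instructions_left_in_line(pc, line_bytes=CACHE_LINE_BYTES,
                              insn_bytes=INSTRUCTION_BYTES):
    """Count the instructions from pc up to the end of its cache line."""
    next_line_start = (pc // line_bytes + 1) * line_bytes
    return (next_line_start - pc) // insn_bytes


# Entry point of the previous examples, then of this one.
for entry in (0x00400078, 0x004000B0):
    print(hex(entry), instructions_left_in_line(entry))
# prints:
# 0x400078 2
# 0x4000b0 4
```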
+<div class="literalblock">
+<div class="content">
+<pre>// f = fetch, d = decode, n = rename, p = dispatch, i = issue, c = complete, r = retire
+
+                                     timeline                                             tick          pc.upc     disasm                      seq_num
+[.ic.r........................................................................fdn]-(          40000) 0x004000b0.0 movz x0, #0, #0           [         1]
+[.ic.r........................................................................fdn]-(          40000) 0x004000b4.0 movz x1, #1, #0           [         2]
+[.ic.r........................................................................fdn]-(          40000) 0x004000b8.0 adr x2, #65780            [         3]
+[.............................................................................fdn]-(          40000) 0x004000bc.0 ldr x3, [x2]              [         4]
+[.pic............................................................................]-(          80000)     ...
+[................................r...............................................]-(         120000)     ...
+[....................fdn.ic......r...............................................]-(         120000) 0x004000c0.0 movz x4, #4, #0           [         5]
+[....................fdn.ic......r...............................................]-(         120000) 0x004000c4.0 movz x5, #5, #0           [         6]
+[....................fdn.ic......r...............................................]-(         120000) 0x004000c8.0 movz x6, #6, #0           [         7]
+[....................fdn.ic......r...............................................]-(         120000) 0x004000cc.0 movz x7, #7, #0           [         8]
+[....................fdn.ic......r...............................................]-(         120000) 0x004000d0.0 movz x8, #8, #0           [         9]
+[....................fdn.ic......r...............................................]-(         120000) 0x004000d4.0 movz x9, #9, #0           [        10]
+[....................fdn.pic.....r...............................................]-(         120000) 0x004000d8.0 movz x10, #10, #0         [        11]
+[....................fdn.pic......r..............................................]-(         120000) 0x004000dc.0 movz x11, #11, #0         [        12]
+[.....................fdn.ic......r..............................................]-(         120000) 0x004000e0.0 movz x12, #12, #0         [        13]
+[.....................fdn.ic......r..............................................]-(         120000) 0x004000e4.0 movz x13, #13, #0         [        14]
+[.....................fdn.ic......r..............................................]-(         120000) 0x004000e8.0 movz x14, #14, #0         [        15]
+[.....................fdn.ic......r..............................................]-(         120000) 0x004000ec.0 movz x15, #15, #0         [        16]
+[.....................fdn.pic.....r..............................................]-(         120000) 0x004000f0.0 movz x16, #16, #0         [        17]
+[.....................fdn.pic.....r..............................................]-(         120000) 0x004000f4.0 movz x17, #17, #0         [        18]
+[.....................fdn.pic.....r..............................................]-(         120000) 0x004000f8.0 movz x18, #18, #0         [        19]
+[.....................fdn.pic......r.............................................]-(         120000) 0x004000fc.0 movz x19, #19, #0         [        20]</pre>
+</div>
+</div>
+</div>
+<div class="sect5">
+<h6 id="gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall-gain"><a class="anchor" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall-gain"></a><a class="link" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall-gain">19.20.4.6.5. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-gain</a></h6>
+<div class="paragraph">
+<p>Like <a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall">gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall</a> but now with the LDR placed so that the instructions after it have already been fetched: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/stall-gain.S">userland/arch/aarch64/freestanding/linux/stall-gain.S</a>.</p>
+</div>
+<div class="paragraph">
+<p>In this case we see that there were actual potential gains: instructions after the LDR, such as <code>movz x11</code>, started running immediately instead of waiting for the LDR to retire. Progress only stopped at <code>movz x20</code> because a new ifetch was needed.</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>// f = fetch, d = decode, n = rename, p = dispatch, i = issue, c = complete, r = retire
+
+                                     timeline                                             tick          pc.upc     disasm                      seq_num
+[.ic.r........................................................................fdn]-(          40000) 0x004000b0.0 movz x0, #0, #0           [         1]
+[.ic.r........................................................................fdn]-(          40000) 0x004000b4.0 movz x1, #1, #0           [         2]
+[.ic.r........................................................................fdn]-(          40000) 0x004000b8.0 movz x2, #4, #0           [         3]
+[.ic.r........................................................................fdn]-(          40000) 0x004000bc.0 movz x3, #5, #0           [         4]
+[....................fdn.ic.r....................................................]-(         120000) 0x004000c0.0 adr x4, #65772            [         5]
+[....................fdn.pic.....................................................]-(         120000) 0x004000c4.0 ldr x5, [x4]              [         6]
+[........................................................r.......................]-(         160000)     ...
+[....................fdn.ic......................................................]-(         120000) 0x004000c8.0 movz x6, #6, #0           [         7]
+[........................................................r.......................]-(         160000)     ...
+[....................fdn.ic......................................................]-(         120000) 0x004000cc.0 movz x7, #7, #0           [         8]
+[........................................................r.......................]-(         160000)     ...
+[....................fdn.ic......................................................]-(         120000) 0x004000d0.0 movz x8, #8, #0           [         9]
+[........................................................r.......................]-(         160000)     ...
+[....................fdn.ic......................................................]-(         120000) 0x004000d4.0 movz x9, #9, #0           [        10]
+[........................................................r.......................]-(         160000)     ...
+[....................fdn.ic......................................................]-(         120000) 0x004000d8.0 movz x10, #10, #0         [        11]
+[........................................................r.......................]-(         160000)     ...
+[....................fdn.pic.....................................................]-(         120000) 0x004000dc.0 movz x11, #11, #0         [        12]
+[........................................................r.......................]-(         160000)     ...
+[.....................fdn.ic.....................................................]-(         120000) 0x004000e0.0 movz x12, #12, #0         [        13]
+[........................................................r.......................]-(         160000)     ...
+[.....................fdn.ic.....................................................]-(         120000) 0x004000e4.0 movz x13, #13, #0         [        14]
+[.........................................................r......................]-(         160000)     ...
+[.....................fdn.ic.....................................................]-(         120000) 0x004000e8.0 movz x14, #14, #0         [        15]
+[.........................................................r......................]-(         160000)     ...
+[.....................fdn.ic.....................................................]-(         120000) 0x004000ec.0 movz x15, #15, #0         [        16]
+[.........................................................r......................]-(         160000)     ...
+[.....................fdn.ic.....................................................]-(         120000) 0x004000f0.0 movz x16, #16, #0         [        17]
+[.........................................................r......................]-(         160000)     ...
+[.....................fdn.pic....................................................]-(         120000) 0x004000f4.0 movz x17, #17, #0         [        18]
+[.........................................................r......................]-(         160000)     ...
+[.....................fdn.pic....................................................]-(         120000) 0x004000f8.0 movz x18, #18, #0         [        19]
+[.........................................................r......................]-(         160000)     ...
+[.....................fdn.pic....................................................]-(         120000) 0x004000fc.0 movz x19, #19, #0         [        20]
+[.........................................................r......................]-(         160000)     ...
+[............................................fdn.ic.......r......................]-(         160000) 0x00400100.0 movz x20, #20, #0         [        21]
+[............................................fdn.ic........r.....................]-(         160000) 0x00400104.0 movz x21, #21, #0         [        22]
+[............................................fdn.ic........r.....................]-(         160000) 0x00400108.0 movz x22, #22, #0         [        23]
+[............................................fdn.ic........r.....................]-(         160000) 0x0040010c.0 movz x23, #23, #0         [        24]
+[............................................fdn.ic........r.....................]-(         160000) 0x00400110.0 movz x24, #24, #0         [        25]
+[............................................fdn.ic........r.....................]-(         160000) 0x00400114.0 movz x25, #25, #0         [        26]
+[............................................fdn.pic.......r.....................]-(         160000) 0x00400118.0 movz x26, #26, #0         [        27]
+[............................................fdn.pic.......r.....................]-(         160000) 0x0040011c.0 movz x27, #27, #0         [        28]
+[.............................................fdn.ic.......r.....................]-(         160000) 0x00400120.0 movz x28, #28, #0         [        29]
+[.............................................fdn.ic........r....................]-(         160000) 0x00400124.0 movz x29, #29, #0         [        30]
+[.............................................fdn.ic........r....................]-(         160000) 0x00400128.0 movz x0, #0, #0           [        31]
+[.............................................fdn.ic........r....................]-(         160000) 0x0040012c.0 movz x1, #1, #0           [        32]
+[.............................................fdn.pic.......r....................]-(         160000) 0x00400130.0 movz x2, #2, #0           [        33]
+[.............................................fdn.pic.......r....................]-(         160000) 0x00400134.0 movz x3, #3, #0           [        34]
+[.............................................fdn.pic.......r....................]-(         160000) 0x00400138.0 movz x4, #4, #0           [        35]
+[.............................................fdn.pic.......r....................]-(         160000) 0x0040013c.0 movz x5, #5, #0           [        36]</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>We now also understand the graph better from lines such as this:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>[....................fdn.pic.....................................................]-(         120000) 0x004000c4.0 ldr x5, [x4]              [         6]
+[........................................................r.......................]-(         160000)     ...
+[....................fdn.ic......................................................]-(         120000) 0x004000c8.0 movz x6, #6, #0           [         7]
+[........................................................r.......................]-(         160000)     ...</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>We see that extra lines are drawn (the <code>160000 &#8230;&#8203;</code> lines here) whenever something stalls for a period longer than the width of the visualisation.</p>
+</div>
+<div class="paragraph">
+<p>Things are still relatively readable because the wrapping aligns them with events that actually happened on that line directly, e.g. <code>160000) 0x00400100.0 movz x20, #20, #0</code>.</p>
+</div>
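+<div class="paragraph">
+<p>The wrapping rules can be sketched as a small parser that folds the continuation lines back into one record per instruction. This is only a sketch under stated assumptions: the names <code>parse_pipeview</code> and <code>ticks_per_line</code> are hypothetical, the 40000 ticks per line is inferred from the base ticks in the traces (40000, 80000, 120000, 160000), and the rule that stage letters to the left of <code>f</code> on an instruction line belong to the next line period is inferred from lines such as <code>[.ic.r&#8230;&#8203;fdn]</code>. It expects the raw viewer text, without the diff or HTML decoration.</p>
+</div>

```python
import re

# "[timeline]-(  tick) rest" as printed by the pipeline viewer.
LINE_RE = re.compile(r"^\[(?P<timeline>[^\]]*)\]-\(\s*(?P<tick>\d+)\)\s+(?P<rest>.*)$")
# "pc.upc disasm [seq_num]" for instruction lines; continuation lines have "..." instead.
INST_RE = re.compile(
    r"^(?P<pc>0x[0-9a-fA-F]+)\.(?P<upc>\d+)\s+(?P<disasm>.*?)\s+\[\s*(?P<seq>\d+)\]$"
)


def parse_pipeview(text, ticks_per_line=40000):
    """Return one dict per instruction with an absolute tick per stage letter.

    ticks_per_line is an assumption taken from the spacing of the base ticks.
    """
    instructions = []
    for raw in text.splitlines():
        line = LINE_RE.match(raw.strip())
        if line is None:
            continue  # legend, column header or blank line
        timeline = line.group("timeline")
        if not timeline:
            continue
        base = int(line.group("tick"))
        ticks_per_col = ticks_per_line // len(timeline)
        inst = INST_RE.match(line.group("rest").strip())
        # Only instruction lines can wrap around within the same line.
        fetch_col = timeline.find("f") if inst else -1
        stages = {}
        for col, ch in enumerate(timeline):
            if ch == ".":
                continue
            tick = base + col * ticks_per_col
            if col < fetch_col:
                tick += ticks_per_line  # wrapped event: belongs to the next period
            stages[ch] = tick
        if inst:
            instructions.append({
                "pc": int(inst.group("pc"), 16),
                "seq": int(inst.group("seq")),
                "disasm": inst.group("disasm"),
                "stages": stages,
            })
        elif instructions:
            # Continuation line: merge its events into the previous instruction.
            instructions[-1]["stages"].update(stages)
    return instructions
```

+<div class="paragraph">
+<p>With the records in hand, the fetch to retire latency of an instruction is <code>stages["r"] - stages["f"]</code>, regardless of how many lines the viewer used to draw it.</p>
+</div>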
+<div class="paragraph">
+<p>Still, this shows why a dedicated viewer is useful: <a href="#gem5-konata-o3-pipeline-viewer">gem5 Konata O3 pipeline viewer</a>.</p>
+</div>
+</div>
+<div class="sect5">
+<h6 id="gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall-hazard4"><a class="anchor" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall-hazard4"></a><a class="link" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall-hazard4">19.20.4.6.6. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-hazard4</a></h6>
+<div class="paragraph">
+<p>Like <a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-stall-gain">gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-gain</a> but now with some dependencies after the LDR: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/stall-hazard4.S">userland/arch/aarch64/freestanding/linux/stall-hazard4.S</a>.</p>
+</div>
+<div class="paragraph">
+<p>In this case, the <code>ic</code> of dependent instructions like <code>add x6, x5, #1</code> has to wait until the LDR is finished:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>// f = fetch, d = decode, n = rename, p = dispatch, i = issue, c = complete, r = retire
+
+                                     timeline                                             tick          pc.upc     disasm                      seq_num
+[.ic.r........................................................................fdn]-(          40000) 0x004000b0.0 movz x0, #0, #0           [         1]
+[.ic.r........................................................................fdn]-(          40000) 0x004000b4.0 movz x1, #1, #0           [         2]
+[.ic.r........................................................................fdn]-(          40000) 0x004000b8.0 movz x2, #4, #0           [         3]
+[.ic.r........................................................................fdn]-(          40000) 0x004000bc.0 movz x3, #5, #0           [         4]
+[....................fdn.ic.r....................................................]-(         120000) 0x004000c0.0 adr x4, #65772            [         5]
+[....................fdn.pic.....................................................]-(         120000) 0x004000c4.0 ldr x5, [x4]              [         6]
+[........................................................r.......................]-(         160000)     ...
+[....................fdn.p.......................................................]-(         120000) 0x004000c8.0 add x6, x5, #1            [         7]
+[......................................................ic.r......................]-(         160000)     ...
+[....................fdn.p.......................................................]-(         120000) 0x004000cc.0 add x7, x6, #1            [         8]
+[.......................................................ic.r.....................]-(         160000)     ...
+[....................fdn.p.......................................................]-(         120000) 0x004000d0.0 add x8, x7, #1            [         9]
+[........................................................ic.r....................]-(         160000)     ...
+[....................fdn.p.......................................................]-(         120000) 0x004000d4.0 add x9, x8, #1            [        10]
+[.........................................................ic.r...................]-(         160000)     ...
+[....................fdn.ic......................................................]-(         120000) 0x004000d8.0 movz x10, #10, #0         [        11]
+[............................................................r...................]-(         160000)     ...
+[....................fdn.ic......................................................]-(         120000) 0x004000dc.0 movz x11, #11, #0         [        12]
+[............................................................r...................]-(         160000)     ...
+[.....................fdn.ic.....................................................]-(         120000) 0x004000e0.0 movz x12, #12, #0         [        13]
+[............................................................r...................]-(         160000)     ...
+[.....................fdn.ic.....................................................]-(         120000) 0x004000e4.0 movz x13, #13, #0         [        14]
+[............................................................r...................]-(         160000)     ...
+[.....................fdn.ic.....................................................]-(         120000) 0x004000e8.0 movz x14, #14, #0         [        15]
+[............................................................r...................]-(         160000)     ...
+[.....................fdn.ic.....................................................]-(         120000) 0x004000ec.0 movz x15, #15, #0         [        16]
+[............................................................r...................]-(         160000)     ...
+[.....................fdn.ic.....................................................]-(         120000) 0x004000f0.0 movz x16, #16, #0         [        17]
+[............................................................r...................]-(         160000)     ...
+[.....................fdn.ic.....................................................]-(         120000) 0x004000f4.0 movz x17, #17, #0         [        18]
+[.............................................................r..................]-(         160000)     ...
+[.....................fdn.pic....................................................]-(         120000) 0x004000f8.0 movz x18, #18, #0         [        19]
+[.............................................................r..................]-(         160000)     ...
+[.....................fdn.pic....................................................]-(         120000) 0x004000fc.0 movz x19, #19, #0         [        20]
+[.............................................................r..................]-(         160000)     ...
+[............................................fdn.ic...........r..................]-(         160000) 0x00400100.0 movz x20, #20, #0         [        21]
+[............................................fdn.ic...........r..................]-(         160000) 0x00400104.0 movz x21, #21, #0         [        22]
+[............................................fdn.ic...........r..................]-(         160000) 0x00400108.0 movz x22, #22, #0         [        23]
+[............................................fdn.ic...........r..................]-(         160000) 0x0040010c.0 movz x23, #23, #0         [        24]
+[............................................fdn.ic...........r..................]-(         160000) 0x00400110.0 movz x24, #24, #0         [        25]
+[............................................fdn.ic............r.................]-(         160000) 0x00400114.0 movz x25, #25, #0         [        26]
+[............................................fdn.pic...........r.................]-(         160000) 0x00400118.0 movz x26, #26, #0         [        27]
+[............................................fdn.pic...........r.................]-(         160000) 0x0040011c.0 movz x27, #27, #0         [        28]
+[.............................................fdn.ic...........r.................]-(         160000) 0x00400120.0 movz x28, #28, #0         [        29]
+[.............................................fdn.ic...........r.................]-(         160000) 0x00400124.0 movz x29, #29, #0         [        30]
+[.............................................fdn.ic...........r.................]-(         160000) 0x00400128.0 movz x0, #0, #0           [        31]
+[.............................................fdn.ic...........r.................]-(         160000) 0x0040012c.0 movz x1, #1, #0           [        32]
+[.............................................fdn.pic..........r.................]-(         160000) 0x00400130.0 movz x2, #2, #0           [        33]
+[.............................................fdn.pic...........r................]-(         160000) 0x00400134.0 movz x3, #3, #0           [        34]
+[.............................................fdn.pic...........r................]-(         160000) 0x00400138.0 movz x4, #4, #0           [        35]
+[.............................................fdn.pic...........r................]-(         160000) 0x0040013c.0 movz x5, #5, #0           [        36]</pre>
+</div>
+</div>
+</div>
+<div class="sect5">
+<h6 id="gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-speculative"><a class="anchor" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-speculative"></a><a class="link" href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-speculative">19.20.4.6.7. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: speculative</a></h6>
+<div class="paragraph">
+<p>Now let&#8217;s try to see some <a href="#speculative-execution">Speculative execution</a> in action with <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/speculative.S">userland/arch/aarch64/freestanding/linux/speculative.S</a>.</p>
+</div>
+<div class="paragraph">
+<p>That program is set up such that the branch is not taken if an extra CLI argument is passed with <code>--cli-args</code>.</p>
+</div>
+<div class="paragraph">
+<p>We purposefully set things up so that speculation will be running from the icache so we can see what is going on more clearly without ifetch stalls.</p>
+</div>
+<div class="paragraph">
+<p>Without an extra CLI argument (the branch is taken):</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>// f = fetch, d = decode, n = rename, p = dispatch, i = issue, c = complete, r = retire
+
+                                     timeline                                             tick          pc.upc     disasm                      seq_num
+[.............................................................................fdn]-(          40000) 0x00400078.0 ldr x0, [sp]              [         1]
+[.ic.............................................................................]-(          80000)     ...
+[................................r...............................................]-(         120000)     ...
+[.............................................................................fdn]-(          40000) 0x0040007c.0 movz x1, #1, #0           [         2]
+[.ic.............................................................................]-(          80000)     ...
+[................................r...............................................]-(         120000)     ...
+[....................fdn.ic......r...............................................]-(         120000) 0x00400080.0 movz x2, #2, #0           [         3]
+[....................fdn.ic......r...............................................]-(         120000) 0x00400084.0 movz x3, #3, #0           [         4]
+[....................fdn.ic......r...............................................]-(         120000) 0x00400088.0 movz x4, #4, #0           [         5]
+[....................fdn.ic......r...............................................]-(         120000) 0x0040008c.0 movz x5, #5, #0           [         6]
+[....................fdn.ic......r...............................................]-(         120000) 0x00400090.0 movz x6, #6, #0           [         7]
+[....................fdn.p.....ic..r.............................................]-(         120000) 0x00400094.0 subs x0, #2               [         8]
+[....................fdn.ic........r.............................................]-(         120000) 0x00400098.0 movz x0, #3, #0           [         9]
+[....................fdn.p......ic.r.............................................]-(         120000) 0x0040009c.0 b.lt 0x400080             [        10]
+[=====================fdn=ic=====================================================]-(         120000) 0x004000a0.0 -----movz x10, #10, #0    [        11]
+[=====================fdn=ic=====================================================]-(         120000) 0x004000a4.0 -----movz x11, #11, #0    [        12]
+[=====================fdn=ic=====================================================]-(         120000) 0x004000a8.0 -----movz x12, #12, #0    [        13]
+[=====================fdn=ic=====================================================]-(         120000) 0x004000ac.0 -----movz x13, #13, #0    [        14]
+[=====================fdn=ic=====================================================]-(         120000) 0x004000b0.0 -----movz x14, #14, #0    [        15]
+[=====================fdn=ic=====================================================]-(         120000) 0x004000b4.0 -----movz x15, #15, #0    [        16]
+[=====================fdn=pic====================================================]-(         120000) 0x004000b8.0 -----movz x16, #16, #0    [        17]
+[=====================fdn=pic====================================================]-(         120000) 0x004000bc.0 -----movz x17, #17, #0    [        18]
+[.....................................fdn.ic.r...................................]-(         120000) 0x00400080.0 movz x2, #2, #0           [        19]
+[.....................................fdn.ic.r...................................]-(         120000) 0x00400084.0 movz x3, #3, #0           [        20]
+[.....................................fdn.ic.r...................................]-(         120000) 0x00400088.0 movz x4, #4, #0           [        21]
+[.....................................fdn.ic.r...................................]-(         120000) 0x0040008c.0 movz x5, #5, #0           [        22]
+[.....................................fdn.ic.r...................................]-(         120000) 0x00400090.0 movz x6, #6, #0           [        23]
+[.....................................fdn.pic.r..................................]-(         120000) 0x00400098.0 movz x0, #3, #0           [        25]
+[.....................................fdn.pic.r..................................]-(         120000) 0x0040009c.0 b.lt 0x400080             [        26]
+[......................................fdn.ic.r..................................]-(         120000) 0x004000a0.0 movz x10, #10, #0         [        27]
+[......................................fdn.ic.r..................................]-(         120000) 0x004000a4.0 movz x11, #11, #0         [        28]
+[......................................fdn.ic.r..................................]-(         120000) 0x004000a8.0 movz x12, #12, #0         [        29]
+[......................................fdn.ic.r..................................]-(         120000) 0x004000ac.0 movz x13, #13, #0         [        30]
+[......................................fdn.pic.r.................................]-(         120000) 0x004000b0.0 movz x14, #14, #0         [        31]
+[......................................fdn.pic.r.................................]-(         120000) 0x004000b4.0 movz x15, #15, #0         [        32]
+[......................................fdn.pic.r.................................]-(         120000) 0x004000b8.0 movz x16, #16, #0         [        33]
+[......................................fdn.pic.r.................................]-(         120000) 0x004000bc.0 movz x17, #17, #0         [        34]
+[.............................................fdn.ic.r...........................]-(         160000) 0x004000c0.0 movz x0, #0, #0           [        35]
+[.............................................fdn.ic.r...........................]-(         160000) 0x004000c4.0 movz x8, #93, #0          [        36]</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>So here we see that the CPU mispredicted! After the <a href="#arm-branch-instructions">BLT instruction</a>, the CPU continued to run <code>movz x10</code>, assuming that the branch would not be taken.</p>
+</div>
+<div class="paragraph">
+<p>Then, at time 120000, the LDR data came back, after the wrong prediction had already been fully executed.</p>
+</div>
+<div class="paragraph">
+<p>The CPU then noticed that it mispredicted, and so it started again from the correct branch target <code>movz x2</code>, and the instructions that were thrown away are marked as <code>=====</code> in the timeline.</p>
+</div>
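+<div class="paragraph">
+<p>To list the squashed instructions without eyeballing the timeline, the pipeline view text can be post-processed. The following is a minimal Python sketch and not part of gem5: the function name <code>parse_pipeview</code> and the regular expression are assumptions based only on the line layout shown above, and an instruction is treated as squashed when its timeline contains <code>=</code> characters.</p>
+</div>

```python
import re

# Layout assumed from the excerpt above:
#   [timeline]-(tick) pc.upc disasm [seq_num]
LINE_RE = re.compile(
    r"^\[(?P<timeline>[^\]]*)\]-\(\s*(?P<tick>\d+)\)\s+"
    r"(?P<pc>0x[0-9a-fA-F]+)\.(?P<upc>\d+)\s+"
    r"(?P<disasm>.*?)\s+\[\s*(?P<seq>\d+)\]$"
)


def parse_pipeview(lines):
    """Return (seq_num, pc, disasm, squashed) for each instruction line."""
    rows = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # Legend, header, or a "..." continuation row: skip it.
            continue
        # Squashed instructions are drawn with '=' instead of '.'.
        squashed = "=" in match.group("timeline")
        # Squashed rows prefix the disassembly with dashes; drop them.
        disasm = match.group("disasm").lstrip("-")
        rows.append((int(match.group("seq")), match.group("pc"), disasm, squashed))
    return rows
```

+<div class="paragraph">
+<p>Filtering the returned tuples on the last field would then give the sequence numbers that were thrown away, which for the run above should be 11 to 18.</p>
+</div>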
+<div class="paragraph">
+<p>We can also see some <a href="#branch-predictor">Branch predictor</a> log lines in the <code>O3CPUAll</code> log:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre> 130000: Fetch: system.cpu.fetch: [tid:0] [sn:10] Branch at PC 0x40009c predicted to be not taken
+ 130000: Fetch: system.cpu.fetch: [tid:0] [sn:10] Branch at PC 0x40009c predicted to go to (0x4000a0=&gt;0x4000a4).(0=&gt;1)
+
+ 131500: Commit: system.cpu.commit: [tid:10] [sn:0] Inserting PC (0x40009c=&gt;0x4000a0).(0=&gt;1) into ROB.
+ 131500: ROB: system.cpu.rob: Adding inst PC (0x40009c=&gt;0x4000a0).(0=&gt;1) to the ROB.
+ 131500: ROB: system.cpu.rob: [tid:0] Now has 10 instructions.
+
+ 132000: IEW: system.cpu.iew: [tid:0] Issue: Adding PC (0x40009c=&gt;0x4000a0).(0=&gt;1) [sn:10] [tid:0] to IQ.
+ 132000: IQ: system.cpu.iq: Adding instruction [sn:10] PC (0x40009c=&gt;0x4000a0).(0=&gt;1) to the IQ.
+ 132000: IQ: system.cpu.iq: Instruction PC (0x40009c=&gt;0x4000a0).(0=&gt;1) has src reg 6 (CCRegClass) that is being added to the dependency chain.
+ 132000: IQ: system.cpu.iq: Instruction PC (0x40009c=&gt;0x4000a0).(0=&gt;1) has src reg 8 (CCRegClass) that is being added to the dependency chain.
+ 132000: IQ: system.cpu.iq: Instruction PC (0x40009c=&gt;0x4000a0).(0=&gt;1) has src reg 7 (CCRegClass) that is being added to the dependency chain.
+
+ 135500: IQ: system.cpu.iq: Waking up a dependent instruction, [sn:10] PC (0x40009c=&gt;0x4000a0).(0=&gt;1).
+ 135500: IQ: global: [sn:10] has 1 ready out of 3 sources. RTI 0)
+ 135500: IQ: system.cpu.iq: Waking any dependents on register 7 (CCRegClass).
+ 135500: IQ: system.cpu.iq: Waking up a dependent instruction, [sn:10] PC (0x40009c=&gt;0x4000a0).(0=&gt;1).
+ 135500: IQ: global: [sn:10] has 2 ready out of 3 sources. RTI 0)
+ 135500: IQ: system.cpu.iq: Waking any dependents on register 8 (CCRegClass).
+ 135500: IQ: system.cpu.iq: Waking up a dependent instruction, [sn:10] PC (0x40009c=&gt;0x4000a0).(0=&gt;1).
+ 135500: IQ: global: [sn:10] has 3 ready out of 3 sources. RTI 0)
+ 135500: IQ: system.cpu.iq: Instruction is ready to issue, putting it onto the ready list, PC (0x40009c=&gt;0x4000a0).(0=&gt;1) opclass:1 [sn:10].
+ 135500: IEW: system.cpu.iew: Setting Destination Register 6 (CCRegClass)
+ 135500: Scoreboard: system.cpu.scoreboard: Setting reg 6 (CCRegClass) as ready
+ 135500: IEW: system.cpu.iew: Setting Destination Register 7 (CCRegClass)
+ 135500: Scoreboard: system.cpu.scoreboard: Setting reg 7 (CCRegClass) as ready
+ 135500: IEW: system.cpu.iew: Setting Destination Register 8 (CCRegClass)
+ 135500: Scoreboard: system.cpu.scoreboard: Setting reg 8 (CCRegClass) as ready
+ 135500: IQ: system.cpu.iq: Attempting to schedule ready instructions from the IQ.
+ 135500: IQ: system.cpu.iq: Thread 0: Issuing instruction PC (0x40009c=&gt;0x4000a0).(0=&gt;1) [sn:10]
+
+ 136000: IEW: system.cpu.iew: Execute: Processing PC (0x40009c=&gt;0x4000a0).(0=&gt;1), [tid:0] [sn:10].
+ 136000: IEW: global: RegFile: Access to cc register 6, has data 0x2
+ 136000: IEW: global: RegFile: Access to cc register 8, has data 0
+ 136000: IEW: global: RegFile: Access to cc register 7, has data 0
+ 136000: IEW: system.cpu.iew: Current wb cycle: 0, width: 8, numInst: 0
+wbActual:0
+ 136000: IEW: system.cpu.iew: [tid:0] [sn:10] Execute: Branch mispredict detected.
+ 136000: IEW: system.cpu.iew: [tid:0] [sn:10] Predicted target was PC: (0x4000a0=&gt;0x4000a4).(0=&gt;1)
+ 136000: IEW: system.cpu.iew: [tid:0] [sn:10] Execute: Redirecting fetch to PC: (0x40009c=&gt;0x400080).(0=&gt;1)
+ 136000: IEW: system.cpu.iew: [tid:0] [sn:10] Squashing from a specific instruction, PC: (0x40009c=&gt;0x400080).(0=&gt;1)
+
+ 136500: Commit: system.cpu.commit: [tid:0] Squashing due to branch mispred PC:0x40009c [sn:10]
+ 136500: Commit: system.cpu.commit: [tid:0] Redirecting to PC 0x400084
+ 136500: ROB: system.cpu.rob: Starting to squash within the ROB.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instructions until [sn:10].
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000bc=&gt;0x4000c0).(0=&gt;1), seq num 18.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000b8=&gt;0x4000bc).(0=&gt;1), seq num 17.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000b4=&gt;0x4000b8).(0=&gt;1), seq num 16.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000b0=&gt;0x4000b4).(0=&gt;1), seq num 15.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000ac=&gt;0x4000b0).(0=&gt;1), seq num 14.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000a8=&gt;0x4000ac).(0=&gt;1), seq num 13.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000a4=&gt;0x4000a8).(0=&gt;1), seq num 12.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000a0=&gt;0x4000a4).(0=&gt;1), seq num 11.
+ 136500: ROB: system.cpu.rob: [tid:0] Done squashing instructions.
+ 136500: Commit: system.cpu.commit: [tid:0] Marking PC (0x40009c=&gt;0x400080).(0=&gt;1), [sn:10] ready within ROB.
+
+ 137000: Commit: system.cpu.commit: [tid:0] [sn:10] Committing instruction with PC (0x40009c=&gt;0x400080).(0=&gt;1)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+36    :   b.lt   0x400080          : IntAlu :   FetchSeq=10  CPSeq=10  flags=(IsControl|IsDirectControl|IsCondControl)
+ 137000: ROB: system.cpu.rob: [tid:0] Retiring head instruction, instruction PC (0x40009c=&gt;0x400080).(0=&gt;1), [sn:10]
+ 137000: O3CPU: system.cpu: Removing committed instruction [tid:0] PC (0x40009c=&gt;0x400080).(0=&gt;1) [sn:10]
+ 137000: Commit: system.cpu.commit: Trying to commit head instruction, [tid:0] [sn:11]
+ 137000: Commit: system.cpu.commit: Retiring squashed instruction from ROB.
+
+ 137000: Commit: system.cpu.commit: Trying to commit head instruction, [tid:0] [sn:10]
+ 137000: Commit: system.cpu.commit: [tid:0] [sn:10] Committing instruction with PC (0x40009c=&gt;0x400080).(0=&gt;1)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+36    :   b.lt   0x400080          : IntAlu :   FetchSeq=10  CPSeq=10  flags=(IsControl|IsDirectControl|IsCondControl)
+
+ 138500: Fetch: system.cpu.fetch: [tid:0] [sn:26] Branch at PC 0x40009c predicted to be not taken
+ 138500: Fetch: system.cpu.fetch: [tid:0] [sn:26] Branch at PC 0x40009c predicted to go to (0x4000a0=&gt;0x4000a4).(0=&gt;1)
+
+ 142500: Commit: system.cpu.commit: [tid:0] [sn:26] Committing instruction with PC (0x40009c=&gt;0x4000a0).(0=&gt;1)
+ 138500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+36    :   b.lt   0x400080          : IntAlu :   FetchSeq=26  CPSeq=18  flags=(IsControl|IsDirectControl|IsCondControl)
+ 142500: ROB: system.cpu.rob: [tid:0] Retiring head instruction, instruction PC (0x40009c=&gt;0x4000a0).(0=&gt;1), [sn:26]</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>With an extra CLI argument (the branch is not taken):</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>// f = fetch, d = decode, n = rename, p = dispatch, i = issue, c = complete, r = retire
+
+                                     timeline                                             tick          pc.upc     disasm                      seq_num
+[.............................................................................fdn]-(          40000) 0x00400078.0 ldr x0, [sp]              [         1]
+[.ic.............................................................................]-(          80000)     ...
+[................................r...............................................]-(         120000)     ...
+[.............................................................................fdn]-(          40000) 0x0040007c.0 movz x1, #1, #0           [         2]
+[.ic.............................................................................]-(          80000)     ...
+[................................r...............................................]-(         120000)     ...
+[....................fdn.ic......r...............................................]-(         120000) 0x00400080.0 movz x2, #2, #0           [         3]
+[....................fdn.ic......r...............................................]-(         120000) 0x00400084.0 movz x3, #3, #0           [         4]
+[....................fdn.ic......r...............................................]-(         120000) 0x00400088.0 movz x4, #4, #0           [         5]
+[....................fdn.ic......r...............................................]-(         120000) 0x0040008c.0 movz x5, #5, #0           [         6]
+[....................fdn.ic......r...............................................]-(         120000) 0x00400090.0 movz x6, #6, #0           [         7]
+[....................fdn.ic.......r..............................................]-(         120000) 0x00400098.0 movz x0, #3, #0           [         9]
+[....................fdn.p......ic.r.............................................]-(         120000) 0x0040009c.0 b.lt 0x400080             [        10]
+[.....................fdn.ic.......r.............................................]-(         120000) 0x004000a0.0 movz x10, #10, #0         [        11]
+[.....................fdn.ic.......r.............................................]-(         120000) 0x004000a4.0 movz x11, #11, #0         [        12]
+[.....................fdn.ic.......r.............................................]-(         120000) 0x004000a8.0 movz x12, #12, #0         [        13]
+[.....................fdn.ic.......r.............................................]-(         120000) 0x004000ac.0 movz x13, #13, #0         [        14]
+[.....................fdn.ic.......r.............................................]-(         120000) 0x004000b0.0 movz x14, #14, #0         [        15]
+[.....................fdn.ic.......r.............................................]-(         120000) 0x004000b4.0 movz x15, #15, #0         [        16]
+[.....................fdn.pic......r.............................................]-(         120000) 0x004000b8.0 movz x16, #16, #0         [        17]
+[.....................fdn.pic.......r............................................]-(         120000) 0x004000bc.0 movz x17, #17, #0         [        18]
+[............................................fdn.ic.r............................]-(         160000) 0x004000c0.0 movz x0, #0, #0           [        19]
+[............................................fdn.ic.r............................]-(         160000) 0x004000c4.0 movz x8, #93, #0          [        20]</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>So this time the prediction was correct. Retire is delayed until the memory comes back, but we otherwise just kept running forward until hitting the next ifetch cache line.</p>
+</div>
 </div>
 </div>
 </div>
@@ -25315,16 +26567,61 @@ namespace ArmISAInst {
 <div class="sect4">
 <h5 id="gem5-microops"><a class="anchor" href="#gem5-microops"></a><a class="link" href="#gem5-microops">19.20.5.2. gem5 microops</a></h5>
 <div class="paragraph">
-<p>TODO</p>
-</div>
-<div class="paragraph">
 <p>Some gem5 instructions break down into multiple microops.</p>
 </div>
 <div class="paragraph">
 <p>Microops are very similar to regular instructions, and show on the <a href="#gem5-execall-trace-format">gem5 ExecAll trace format</a> since that flag implies <code>ExecMicro</code>.</p>
 </div>
 <div class="paragraph">
-<p>On aarch64 for example, one of the simplest microoped instructions is <a href="#armv8-aarch64-ldp-and-stp-instructions">STP</a>, which does the relatively complex operation of storing two values to memory at once, and is therefore a good candidate for being broken down into microops.</p>
+<p>On aarch64 for example, one of the simplest microoped instructions is <a href="#armv8-aarch64-ldp-and-stp-instructions">STP</a>, which does the relatively complex operation of storing two values to memory at once, and is therefore a good candidate for being broken down into microops. We can observe it when executing:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./run \
+  --arch aarch64 \
+  --emulator gem5 \
+  --trace-insts-stdout \
+  --userland userland/arch/aarch64/freestanding/linux/disassembly_test.S \
+;</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>whose trace shows, in gem5&#8217;s broken-ish disassembly, that the input:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>stp x1, x2, [x0, 16]</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>generated the output:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>  16500: system.cpu: A0 T0 : @_start+108    : stp
+  16500: system.cpu: A0 T0 : @_start+108. 0 :   addxi_uop   ureg0, x0, #16 : IntAlu :  D=0x0000000000420010  flags=(IsInteger|IsMicroop|IsDelayedCommit|IsFirstMicroop)
+  17000: system.cpu: A0 T0 : @_start+108. 1 :   strxi_uop   w1, [ureg0]  : MemWrite :  D=0x000000009abcdef0 A=0x420010  flags=(IsInteger|IsMemRef|IsStore|IsMicroop|IsDelayedCommit)
+  17500: system.cpu: A0 T0 : @_start+108. 2 :   strxi_uop   w2, [ureg0, #8] : MemWrite :  D=0x0000000000000002 A=0x420018  flags=(IsInteger|IsMemRef|IsStore|IsMicroop|IsLastMicroop)</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Where <code>@_start+108. 0</code>, <code>@_start+108. 1</code> and <code>@_start+108. 2</code> all happen at the same PC, and are therefore microops of STP.</p>
+</div>
+<div class="paragraph">
+<p>From their names, which are of course not specified in the <a href="#armarm8">ARMv8 architecture reference manual</a>, we guess that:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><code>addxi_uop</code>: adds 16</p>
+</li>
+<li>
+<p><code>strxi_uop</code>: stores one of the two members of the pair, like a regular STR</p>
+</li>
+</ul>
+</div>
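+<div class="paragraph">
+<p>The "same PC" observation can also be checked mechanically. The following is a minimal Python sketch, assuming the trace lines look exactly like the excerpt above; <code>group_microops</code> and its regular expression are hypothetical helpers, not part of gem5. It groups microop lines by their <code>symbol+offset</code> location and skips the macro-instruction line, which has no microop index.</p>
+</div>

```python
import re
from collections import defaultdict

# Matches microop lines such as:
#   16500: system.cpu: A0 T0 : @_start+108. 0 :   addxi_uop   ureg0, x0, #16 : ...
# The ". <n>" after the location is the microop index (assumed layout).
MICROOP_RE = re.compile(
    r"^\s*(?P<tick>\d+): \S+: A\d+ T\d+ : "
    r"@(?P<location>[^.\s]+)\.\s*(?P<upc>\d+)\s+:\s+(?P<mnemonic>\S+)"
)


def group_microops(lines):
    """Map each location to its list of (microop index, mnemonic)."""
    groups = defaultdict(list)
    for line in lines:
        match = MICROOP_RE.match(line)
        if match is None:
            # Not a microop line, e.g. the "stp" macro-instruction line.
            continue
        groups[match.group("location")].append(
            (int(match.group("upc")), match.group("mnemonic"))
        )
    return dict(groups)
```
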
+<div class="paragraph">
+<p>From the gem5 source code, we see that STP is a <code>class LdpStp : public PairMemOp</code>, and then the constructor of <code>PairMemOp</code> sets up the microops depending on the exact type of LDP/STP:</p>
 </div>
 </div>
 </div>
@@ -25347,7 +26644,7 @@ namespace ArmISAInst {
 <p>functional: get the value magically, do not update caches, see also: <a href="#gem5-functional-requests">gem5 functional requests</a></p>
 </li>
 <li>
-<p>atomic: get the value now without making a <a href="#gem5-event-queue">separate event</a>, but do not update caches</p>
+<p>atomic: get the value now without making a <a href="#gem5-event-queue">separate event</a>, but do not update caches. Cannot work in <a href="#gem5-ruby-build">Ruby</a> due to fundamental limitations, mentioned in passing at: <a href="https://gem5.atlassian.net/browse/GEM5-676" class="bare">https://gem5.atlassian.net/browse/GEM5-676</a></p>
 </li>
 <li>
 <p>timing: get the value simulating delays and updating caches</p>
@@ -25563,6 +26860,33 @@ TimingSimpleCPU::finishTranslation(WholeTranslationState *state)
 <div class="paragraph">
 <p>Therefore, here it makes sense for gem5 syscall implementation, which does not actually have a real kernel running, to just make a functional request and be done with it, since the impact of cache changes done by this read would be insignificant to the cost of an actual full context switch that would happen on a real syscall.</p>
 </div>
+<div class="paragraph">
+<p>It is generally hard to implement functional requests for <a href="#gem5-ruby-build">Ruby</a> runs, because packets are flying through the memory system in a transient state, and there is no simple way of finding exactly which ones might have the latest version of the memory. See for example:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><a href="https://gem5.atlassian.net/browse/GEM5-496" class="bare">https://gem5.atlassian.net/browse/GEM5-496</a></p>
+</li>
+<li>
+<p><a href="https://gem5.atlassian.net/browse/GEM5-604" class="bare">https://gem5.atlassian.net/browse/GEM5-604</a></p>
+</li>
+<li>
+<p><a href="https://gem5.atlassian.net/browse/GEM5-675" class="bare">https://gem5.atlassian.net/browse/GEM5-675</a></p>
+</li>
+<li>
+<p><a href="https://gem5.atlassian.net/browse/GEM5-676" class="bare">https://gem5.atlassian.net/browse/GEM5-676</a></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The typical error message in that case is:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>fatal: Ruby functional read failed for address</pre>
+</div>
+</div>
 </div>
 </div>
 </div>
@@ -26393,11 +27717,10 @@ readFunc(SyscallDesc *desc, ThreadContext *tc,
 <div class="sect3">
 <h4 id="gem5-functional-units"><a class="anchor" href="#gem5-functional-units"></a><a class="link" href="#gem5-functional-units">19.20.8. gem5 functional units</a></h4>
 <div class="paragraph">
-<p>TODO</p>
-</div>
-<div class="paragraph">
-<p>Each instruction is marked with a class, and each class can execute in a given functional unit.</p>
+<p>Each instruction is marked with a class, and each class can execute in a given <a href="#execution-unit">functional unit</a>.</p>
 </div>
+<div class="sect4">
+<h5 id="gem5-minorcpu-default-functional-units"><a class="anchor" href="#gem5-minorcpu-default-functional-units"></a><a class="link" href="#gem5-minorcpu-default-functional-units">19.20.8.1. gem5 <code>MinorCPU</code> default functional units</a></h5>
 <div class="paragraph">
 <p>Which units are available is visible for example on the <a href="#gem5-config-ini">gem5 config.ini</a> of a <a href="#gem5-minorcpu">gem5 MinorCPU</a> run. Functional units are not present in simple CPUs like <a href="#gem5-timingsimplecpu">gem5 <code>TimingSimpleCPU</code></a>.</p>
 </div>
@@ -26411,7 +27734,6 @@ readFunc(SyscallDesc *desc, ThreadContext *tc,
   --emulator gem5 \
   --userland userland/arch/aarch64/freestanding/linux/hello.S \
   --trace-insts-stdout \
-  -N1 \
   -- \
   --cpu-type MinorCPU \
   --caches</pre>
@@ -26424,7 +27746,233 @@ readFunc(SyscallDesc *desc, ThreadContext *tc,
 <div class="content">
 <pre>[system.cpu]
 type=MinorCPU
-children=branchPred dcache dtb executeFuncUnits icache interrupts isa itb power_state tracer workload</pre>
+children=branchPred dcache dtb executeFuncUnits icache interrupts isa itb power_state tracer workload
+executeInputWidth=2
+executeIssueLimit=2</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Here also note the <code>executeInputWidth=2</code> and <code>executeIssueLimit=2</code> suggesting that this is a <a href="#superscalar-processor">dual issue superscalar processor</a>.</p>
+</div>
+<div class="paragraph">
+<p>The <code>executeFuncUnits</code> child of <code>system.cpu</code> points to:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>[system.cpu.executeFuncUnits]
+type=MinorFUPool
+children=funcUnits0 funcUnits1 funcUnits2 funcUnits3 funcUnits4 funcUnits5 funcUnits6 funcUnits7</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>and the first two units are, in full:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>[system.cpu.executeFuncUnits.funcUnits0]
+type=MinorFU
+children=opClasses timings
+opClasses=system.cpu.executeFuncUnits.funcUnits0.opClasses
+opLat=3
+
+[system.cpu.executeFuncUnits.funcUnits0.opClasses]
+type=MinorOpClassSet
+children=opClasses
+
+[system.cpu.executeFuncUnits.funcUnits0.opClasses.opClasses]
+type=MinorOpClass
+opClass=IntAlu</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>and:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>[system.cpu.executeFuncUnits.funcUnits1]
+type=MinorFU
+children=opClasses timings
+opLat=3
+
+[system.cpu.executeFuncUnits.funcUnits1.opClasses]
+type=MinorOpClassSet
+children=opClasses
+opClasses=system.cpu.executeFuncUnits.funcUnits1.opClasses.opClasses
+
+[system.cpu.executeFuncUnits.funcUnits1.opClasses.opClasses]
+type=MinorOpClass
+opClass=IntAlu</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>So we understand that:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>the first and second functional units are <code>IntAlu</code>, so doing integer arithmetic operations</p>
+</li>
+<li>
+<p>both have a latency of 3</p>
+</li>
+<li>
+<p>each functional unit can have a set of <code>opClass</code> with more than one type. Those first two units just happen to have a single type.</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The full list is:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>0, 1: <code>IntAlu</code>, <code>opLat=3</code></p>
+</li>
+<li>
+<p>2: <code>IntMult</code>, <code>opLat=3</code></p>
+</li>
+<li>
+<p>3: <code>IntDiv</code>, <code>opLat=9</code>. So we see that a more complex operation such as division has higher latency.</p>
+</li>
+<li>
+<p>4: <code>FloatAdd</code>, <code>FloatCmp</code>, and a gazillion other floating point related things. <code>opLat=6</code>.</p>
+</li>
+<li>
+<p>5: <code>SimdPredAlu</code>: TODO SVE-related? <code>opLat=3</code></p>
+</li>
+<li>
+<p>6: <code>MemRead</code>, <code>MemWrite</code>, <code>FloatMemRead</code>, <code>FloatMemWrite</code>. <code>opLat=1</code></p>
+</li>
+<li>
+<p>7: <code>IprAccess</code> (TODO), <code>InstPrefetch</code></p>
+</li>
+</ul>
+</div>
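+<div class="paragraph">
+<p>Rather than reading <code>config.ini</code> by hand, such a list can be extracted with a short script. The following is a minimal Python sketch that assumes <code>config.ini</code> can be read by the standard <code>configparser</code> module and that sections nest through <code>children=</code> as in the excerpts above; <code>summarize_units</code> and <code>collect_op_classes</code> are hypothetical helper names, not gem5 APIs.</p>
+</div>

```python
import configparser


def collect_op_classes(parser, section):
    """Recursively gather every opClass value at or below a section."""
    found = []
    if not parser.has_section(section):
        # Some children (e.g. "timings") may have no section of interest.
        return found
    if parser.has_option(section, "opClass"):
        found.append(parser.get(section, "opClass"))
    for child in parser.get(section, "children", fallback="").split():
        found.extend(collect_op_classes(parser, f"{section}.{child}"))
    return found


def summarize_units(config_text, pool="system.cpu.executeFuncUnits"):
    """Map each functional unit name to (list of opClass, opLat)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep camelCase keys such as opLat
    parser.read_string(config_text)
    summary = {}
    for unit in parser.get(pool, "children").split():
        section = f"{pool}.{unit}"
        summary[unit] = (
            collect_op_classes(parser, section),
            parser.getint(section, "opLat"),
        )
    return summary
```

+<div class="paragraph">
+<p>Calling <code>summarize_units(open("config.ini").read())</code> on a real run would be the intended usage, with the path to <code>config.ini</code> depending on the output directory of the run.</p>
+</div>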
+<div class="paragraph">
+<p>These are of course all specified <a href="#gem5-python-c-interaction">from the Python</a> at <code>src/cpu/minor/MinorCPU.py</code>:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>class MinorDefaultFUPool(MinorFUPool):
+    funcUnits = [MinorDefaultIntFU(), MinorDefaultIntFU(),
+        MinorDefaultIntMulFU(), MinorDefaultIntDivFU(),
+        MinorDefaultFloatSimdFU(), MinorDefaultPredFU(),
+        MinorDefaultMemFU(), MinorDefaultMiscFU()]</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>We then expect that each instruction has a certain <code>opClass</code> that determines on which unit it can run.</p>
+</div>
+<div class="paragraph">
+<p>For example: <code>class AddImm</code>, which is what we get on a simple <code>add x1, x2, 0</code>, sets itself as an <code>IntAluOp</code> in the constructor as expected:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>    AddImm::AddImm(ExtMachInst machInst,
+                                          IntRegIndex _dest,
+                                          IntRegIndex _op1,
+                                          uint32_t _imm,
+                                          bool _rotC)
+        : DataImmOp("add", machInst, IntAluOp,
+                         _dest, _op1, _imm, _rotC)</pre>
+</div>
+</div>
+</div>
+<div class="sect4">
+<h5 id="gem5-derivo3cpu-default-functional-units"><a class="anchor" href="#gem5-derivo3cpu-default-functional-units"></a><a class="link" href="#gem5-derivo3cpu-default-functional-units">19.20.8.2. gem5 DerivO3CPU default functional units</a></h5>
+<div class="paragraph">
+<p>On gem5 3ca404da175a66e0b958165ad75eb5f54cb5e772, after running:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./run   \
+  --arch aarch64 \
+  --emulator gem5 \
+  --userland userland/arch/aarch64/freestanding/linux/hello.S \
+  --trace-insts-stdout \
+  -- \
+  --cpu-type DerivO3CPU \
+  --caches</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>we see:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>[system.cpu]
+type=DerivO3CPU
+children=branchPred dcache dtb fuPool icache interrupts isa itb power_state tracer workload</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>and following <code>fuPool</code>:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>[system.cpu.fuPool]
+type=FUPool
+children=FUList0 FUList1 FUList2 FUList3 FUList4 FUList5 FUList6 FUList7 FUList8 FUList9</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>so for example <code>FUList0</code> is:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>[system.cpu.fuPool.FUList0]
+type=FUDesc
+children=opList
+count=6
+eventq_index=0
+opList=system.cpu.fuPool.FUList0.opList
+
+[system.cpu.fuPool.FUList0.opList]
+type=OpDesc
+eventq_index=0
+opClass=IntAlu
+opLat=1
+pipelined=true</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>and <code>FUList1</code>:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>[system.cpu.fuPool.FUList1.opList0]
+type=OpDesc
+eventq_index=0
+opClass=IntMult
+opLat=3
+pipelined=true
+
+[system.cpu.fuPool.FUList1.opList1]
+type=OpDesc
+eventq_index=0
+opClass=IntDiv
+opLat=20
+pipelined=false</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>So summarizing all units we have:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>0: <code>IntAlu</code> with <code>opLat=1</code>, and <code>count=6</code> instances</p>
+</li>
+<li>
+<p>1: <code>IntMult</code> with <code>opLat=3</code> and <code>IntDiv</code> with <code>opLat=20</code></p>
+</li>
+<li>
+<p>2: <code>FloatAdd</code>, <code>FloatCmp</code>, <code>FloatCvt</code> with <code>opLat=2</code></p>
+</li>
+<li>
+<p>TODO lazy to finish the list :-)</p>
+</li>
+</ul>
 </div>
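+<div class="paragraph">
+<p>The rest of the list could be produced mechanically rather than by hand. The following is only a minimal sketch, assuming that the generated <code>config.ini</code> can be read by Python&#8217;s standard <code>configparser</code> module. <code>SAMPLE_CONFIG</code> and <code>summarize_op_latencies</code> are hypothetical names made up for illustration, and the sample text only repeats the sections quoted above:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>
```python
import configparser

# Hypothetical excerpt, copied from the config.ini sections quoted above.
SAMPLE_CONFIG = """\
[system.cpu.fuPool.FUList0.opList]
type=OpDesc
eventq_index=0
opClass=IntAlu
opLat=1
pipelined=true

[system.cpu.fuPool.FUList1.opList0]
type=OpDesc
eventq_index=0
opClass=IntMult
opLat=3
pipelined=true

[system.cpu.fuPool.FUList1.opList1]
type=OpDesc
eventq_index=0
opClass=IntDiv
opLat=20
pipelined=false
"""


def summarize_op_latencies(config_text):
    """Return a sorted list of (functional unit, opClass, opLat) tuples."""
    # Interpolation is disabled so that a literal percent sign in a value is harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    rows = []
    for section in parser.sections():
        # Only the OpDesc sections carry an opClass and an opLat.
        if parser.get(section, "type", fallback="") != "OpDesc":
            continue
        # Section names look like system.cpu.fuPool.FUList1.opList0,
        # so the functional unit is the second to last dotted component.
        unit = section.split(".")[-2]
        rows.append((unit, parser[section]["opClass"], parser.getint(section, "opLat")))
    return sorted(rows)


if __name__ == "__main__":
    for unit, op_class, op_lat in summarize_op_latencies(SAMPLE_CONFIG):
        print(unit, op_class, op_lat)
```
+</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>To try it on a real run, pass the contents of the <code>config.ini</code> from your gem5 output directory instead of <code>SAMPLE_CONFIG</code>. Where that file lives depends on your setup.</p>
+</div>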
 </div>
 </div>
@@ -26679,6 +28227,107 @@ build/ARM/config/the_isa.hh
 </div>
 </div>
 </div>
+<div class="sect2">
+<h3 id="gensim"><a class="anchor" href="#gensim"></a><a class="link" href="#gensim">19.21. Gensim</a></h3>
+<div class="paragraph">
+<p><a href="https://gensim.org" class="bare">https://gensim.org</a></p>
+</div>
+<div class="paragraph">
+<p><a href="https://bitbucket.org/gensim/gensim" class="bare">https://bitbucket.org/gensim/gensim</a></p>
+</div>
+<div class="paragraph">
+<p>MIT licensed <a href="#binary-translation">Binary translation</a> simulator, so a bit like an MIT <a href="#qemu">QEMU</a>.</p>
+</div>
+<div class="paragraph">
+<p>Video showing it boot Linux fast: <a href="https://www.youtube.com/watch?v=aZXx17oYumc" class="bare">https://www.youtube.com/watch?v=aZXx17oYumc</a></p>
+</div>
+<div class="paragraph">
+<p>Its name is unfortunately completely and totally overshadowed by an unrelated piece of software with the same name: <a href="https://radimrehurek.com/gensim/" class="bare">https://radimrehurek.com/gensim/</a></p>
+</div>
+<div class="paragraph">
+<p>TODO: advantages over QEMU. As the name implies, they seem to have a nice ISA description language. From a quick look at the internals, it seems to generate LLVM intermediate representation, which sounds good.</p>
+</div>
+<div class="paragraph">
+<p>Build on Ubuntu 20.04:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>sudo apt install libantlr3c-dev
+cd submodules/gensim
+make</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>First fails with:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>arm-none-eabi-gcc: error: unrecognized -march target: armv5</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Let&#8217;s try just armv8, who cares about armv5!!!</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>mkdir build
+cd build
+cmake -DTESTING_ENABLED=FALSE -DCMAKE_BUILD_TYPE=DEBUGOPT ..
+make -j`nproc` model-armv8</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Now fails as mentioned at <a href="https://bitbucket.org/gensim/gensim/issues/34/build-fails-with-unrecognised-intrinsic" class="bare">https://bitbucket.org/gensim/gensim/issues/34/build-fails-with-unrecognised-intrinsic</a>:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>terminate called after throwing an instance of 'std::logic_error'
+  what():  Unrecognised intrinsic: __builtin_abs64
+Aborted (core dumped)</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Get the failing command with:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>make VERBOSE=1 model-armv8</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>and we see some code generation step:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>cd /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/models/armv8 &amp;&amp; \
+  /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/build/dist/bin/gensim \
+  -a /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/models/armv8/aarch64.ac \
+  -s module,arch,decode,disasm,ee_interp,ee_blockjit,jumpinfo,function,makefile \
+  -o decode.GenerateDotGraph=1,makefile.libtrace_path=/home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/support/libtrace/inc,makefile.archsim_path=/home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/archsim/inc,makefile.llvm_path=,makefile.Optimise=2,makefile.Debug=1 \
+  -t /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/build/models/armv8/output-aarch64/</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>We can see an inclusion path:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>gensim/models/armv8/aarch64.ac
+		ac_isa("isa.ac");
+gensim/models/armv8/isa.ac
+		ac_execute("execute.simd");</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>and <code>gensim/models/armv8/execute.simd</code> contains the <code>__builtin_abs64</code> usages.</p>
+</div>
+<div class="paragraph">
+<p>GDB on <code>gensim</code> shows that the error comes from a call to <code>gci.GenerateExecuteBodyFor(body_str, *action);</code>, so it looks like there are some missing cases in <code>EmitFixedCode</code>.</p>
+</div>
+<div class="paragraph">
+<p>This is completely broken academic code! They must be using an off-tree version of part of the tool and forgot to commit it.</p>
+</div>
+</div>
 </div>
 </div>
 <div class="sect1">
@@ -27017,7 +28666,7 @@ make menuconfig</pre>
 <p>If none of those methods are flexible enough for you, you can just fork or hack up <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/buildroot_packages/sample_package">buildroot_packages/sample_package</a> the sample package to do what you want.</p>
 </div>
 <div class="paragraph">
-<p>For how to use that package, see: <a href="#buildroot_packages-directory">Section 33.14.2, &#8220;buildroot_packages directory&#8221;</a>.</p>
+<p>For how to use that package, see: <a href="#buildroot_packages-directory">Section 33.15.2, &#8220;buildroot_packages directory&#8221;</a>.</p>
 </div>
 <div class="paragraph">
 <p>Then iterate trying to do what you want and reading the manual until it works: <a href="https://buildroot.org/downloads/manual/manual.html" class="bare">https://buildroot.org/downloads/manual/manual.html</a></p>
@@ -27036,7 +28685,7 @@ make menuconfig</pre>
 <p>Also mentioned at: <a href="https://stackoverflow.com/questions/47320800/how-to-clean-only-target-in-buildroot" class="bare">https://stackoverflow.com/questions/47320800/how-to-clean-only-target-in-buildroot</a></p>
 </div>
 <div class="paragraph">
-<p>See this for a sample manual workaround: <a href="#parsec-uninstall">Section 21.8.4.4, &#8220;PARSEC uninstall&#8221;</a>.</p>
+<p>See this for a sample manual workaround: <a href="#parsec-uninstall">Section 21.8.5.4, &#8220;PARSEC uninstall&#8221;</a>.</p>
 </div>
 </div>
 <div class="sect2">
@@ -27203,7 +28852,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
 <p>Then, you will also want to do a <a href="#bisection">Bisection</a> to pinpoint the exact commit to blame, and CC that developer.</p>
 </div>
 <div class="paragraph">
-<p>Finally, give the images you used save upstream developers' time as shown at: <a href="#release-zip">Section 33.18.2, &#8220;release-zip&#8221;</a>.</p>
+<p>Finally, provide the images you used to save upstream developers' time, as shown at: <a href="#release-zip">Section 33.19.2, &#8220;release-zip&#8221;</a>.</p>
 </div>
 <div class="paragraph">
 <p>For Buildroot problems, you should wither provide the config you have:</p>
@@ -27872,17 +29521,11 @@ echo 1 &gt; /proc/sys/vm/overcommit_memory
 <p>Demonstrates <code>atomic_int</code> and <code>thrd_create</code>.</p>
 </div>
 <div class="paragraph">
-<p>Disassembly with GDB at LKMC 619fef4b04bddc4a5a38aec5e207dd4d5a25d206 + 1:</p>
+<p><a href="#disas">Disassembly with GDB</a> at LKMC 619fef4b04bddc4a5a38aec5e207dd4d5a25d206 + 1:</p>
 </div>
 <div class="literalblock">
 <div class="content">
-<pre>./run-toolchain \
-  --arch aarch64 gdb \
-  -- \
-  -batch \
-  -ex 'disas/rs my_thread_main' $(./getvar \
-  --arch aarch64 userland_build_dir)/c/atomic.out \
-;</pre>
+<pre>./disas --arch aarch64 --userland userland/c/atomic.c my_thread_main</pre>
 </div>
 </div>
 <div class="paragraph">
@@ -27982,6 +29625,101 @@ echo 1 &gt; /proc/sys/vm/overcommit_memory
 <div class="paragraph">
 <p><code>strace</code> shows that OpenMP makes <code>clone()</code> syscalls in Linux. TODO: does it actually call <code>pthread_</code> functions, or does it make syscalls directly? Or in other words, can it work on <a href="#freestanding-programs">Freestanding programs</a>? A quick grep shows many references to pthreads.</p>
 </div>
+<div class="sect5">
+<h6 id="openmp-validation"><a class="anchor" href="#openmp-validation"></a><a class="link" href="#openmp-validation">21.1.3.2.1. OpenMP validation</a></h6>
+<div class="paragraph">
+<p><a href="https://github.com/uhhpctools/omp-validation" class="bare">https://github.com/uhhpctools/omp-validation</a></p>
+</div>
+<div class="paragraph">
+<p>Host build on Ubuntu 20.04:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>git submodule update --init submodules/omp-validation
+cd submodules/omp-validation
+PERL5LIB="${PERL5LIB}:." make -j `nproc` ctest</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This both builds and runs the tests. It took about 5 minutes on <a href="#p51">P51</a>, but a few tests failed for some reason:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Summary:
+S Number of tested Open MP constructs: 62
+S Number of used tests:                123
+S Number of failed tests:              4
+S Number of successful tests:          119
+S + from this were verified:           115
+
+Normal tests:
+N Number of failed tests:              2
+N + from this fail compilation:        0
+N + from this timed out                0
+N Number of successful tests:          60
+N + from this were verified:           58
+
+Orphaned tests:
+O Number of failed tests:              2
+O + from this fail compilation:        0
+O + from this timed out                0
+O Number of successful tests:          59
+O + from this were verified:           57</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The tests and run results are placed under <code>bin/c/</code>, e.g.:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>test_omp_threadprivate
+test_omp_threadprivate.c
+test_omp_threadprivate.log
+test_omp_threadprivate.out
+test_omp_threadprivate_compile.log</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>C files are also present because some kind of code generation is used.</p>
+</div>
+<div class="paragraph">
+<p>Build only and run one of them manually:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>make -j`nproc` omp_my_sleep omp_testsuite
+PERL5LIB="${PERL5LIB}:." ./runtest.pl --lang=c --norun testlist-c.txt
+./bin/c/test_omp_barrier</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The <code>bin/c</code> directory is hardcoded in the executable, so to run it you must ensure that it exists relative to CWD, e.g.:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>cd bin/c
+mkdir -p bin/c
+./test_omp_barrier</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Manually cross compile all tests and optionally add some extra options, e.g. <code>-static</code> to <a href="#gem5-dynamic-linked-executables-in-syscall-emulation">more conveniently run in gem5</a>:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>PERL5LIB="${PERL5LIB}:." ./runtest.pl --lang=c --makeopts 'CC=aarch64-linux-gnu-gcc CFLAGS_EXTRA=-static' --norun testlist-c.txt
+./../../run --arch aarch64 --emulator gem5 --userland submodules/omp-validation/bin/c/test_omp_parallel_reduction --cpus 8 --memory 8G</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Build a single test:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>make bin/c/test_omp_sections_reduction</pre>
+</div>
+</div>
+</div>
 </div>
 </div>
 </div>
@@ -28248,7 +29986,7 @@ global 12676</pre>
 <p>The actual value is much smaller, because the threads have often overwritten one another with older values.</p>
 </div>
 <div class="paragraph">
-<p>With <code>--optimization-level 3</code>, the result almost always equals that of a single thread, e.g.:</p>
+<p>With <a href="#optimization-level-of-a-build"><code>--optimization-level 3</code></a>, the result almost always equals that of a single thread, e.g.:</p>
 </div>
 <div class="literalblock">
 <div class="content">
@@ -28414,7 +30152,6 @@ non-atomic 19</pre>
 </div>
 </div>
 <div class="paragraph">
-<div class="title">/run -aA -eg -u userland/c/atomic.c --cli-args '2 200' --cpus 3 --userland-build-id o3 -N1 --trace ExecAll&#8201;&#8212;&#8201;--caches --cpu-type TimingSimpleCPU</div>
 <p>Note that that the system is very minimal, and doesn&#8217;t even have caches, so I&#8217;m curious as to how this can happen at all.</p>
 </div>
 <div class="paragraph">
@@ -28966,6 +30703,9 @@ There are no non-locking atomic types or atomic primitives in POSIX: <a href="ht
 <li>
 <p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/rootfs_overlay/lkmc/python/count.py">rootfs_overlay/lkmc/python/count.py</a>: count once every second</p>
 </li>
+<li>
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/rootfs_overlay/lkmc/python/iter_method.py">rootfs_overlay/lkmc/python/iter_method.py</a>: how to implement <code>__iter__</code> on a class</p>
+</li>
 </ul>
 </div>
 </li>
@@ -29219,6 +30959,36 @@ There are no non-locking atomic types or atomic primitives in POSIX: <a href="ht
 </ul>
 </div>
 </li>
+<li>
+<p><code>class</code></p>
+<div class="ulist">
+<ul>
+<li>
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/rootfs_overlay/lkmc/nodejs/object_to_string.js">rootfs_overlay/lkmc/nodejs/object_to_string.js</a>: <code>util.inspect.custom</code> and <code>toString</code> override experiment: <a href="https://stackoverflow.com/questions/24902061/is-there-an-repr-equivalent-for-javascript/26698403#26698403" class="bare">https://stackoverflow.com/questions/24902061/is-there-an-repr-equivalent-for-javascript/26698403#26698403</a></p>
+<div class="paragraph">
+<p>Output:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>util.inspect
+my type is MyClassUtilInspectCustom and a is 1 and b is 2
+console.log
+my type is MyClassUtilInspectCustom and a is 1 and b is 2
+toString
+[object Object]
+
+util.inspect
+MyClassToString { a: 1, b: 2 }
+console.log
+MyClassToString { a: 1, b: 2 }
+toString
+my type is MyClassToString and a is 1 and b is 2</pre>
+</div>
+</div>
+</li>
+</ul>
+</div>
+</li>
 </ul>
 </div>
 <div class="sect4">
@@ -29779,7 +31549,125 @@ cblas_dgemm(      CblasColMajor, CblasNoTrans, CblasTrans,3,3,2  ,1,    A,3,  B,
 </div>
 </div>
 <div class="sect3">
-<h4 id="stream-benchmark"><a class="anchor" href="#stream-benchmark"></a><a class="link" href="#stream-benchmark">21.8.3. STREAM benchmark</a></h4>
+<h4 id="lmbench"><a class="anchor" href="#lmbench"></a><a class="link" href="#lmbench">21.8.3. LMbench</a></h4>
+<div class="paragraph">
+<p><a href="http://www.bitmover.com/lmbench/" class="bare">http://www.bitmover.com/lmbench/</a></p>
+</div>
+<div class="paragraph">
+<p>Canonical source at <a href="https://sourceforge.net/projects/lmbench/" class="bare">https://sourceforge.net/projects/lmbench/</a> but Intel has a fork at: <a href="https://github.com/intel/lmbench" class="bare">https://github.com/intel/lmbench</a> which has more recent build updates, so I think that&#8217;s the one I&#8217;d put my money on as of 2020.</p>
+</div>
+<div class="paragraph">
+<p>Feels old, guessing not representative anymore like <a href="#dhrystone">Dhrystone</a>. But hey, history!</p>
+</div>
+<div class="paragraph">
+<p>Ubuntu 20.04 AMD64 native build and run:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>git submodule update --init submodules/lmbench
+cd submodules/lmbench
+cd src
+make results</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>TODO it hangs for a long time at:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Hang on, we are calculating your cache line size.</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Bug report: <a href="https://github.com/intel/lmbench/issues/15" class="bare">https://github.com/intel/lmbench/issues/15</a></p>
+</div>
+<div class="paragraph">
+<p>If I kill it, the configuration process continues:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Killed
+OK, it looks like your cache line is  bytes.</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>and continues with a few more interactive questions until finally:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>Confguration done, thanks.</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>where it again hangs for at least 2 hours, so I lost patience and killed it.</p>
+</div>
+<div class="paragraph">
+<p>TODO: how to do a non-interactive config? After the above procedure, <code>bin/x86_64-linux-gnu/CONFIG.ciro-p51</code> contains:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>DISKS=""
+DISK_DESC=""
+OUTPUT=/dev/null
+ENOUGH=50000
+FASTMEM="NO"
+FILE=/var/tmp/XXX
+FSDIR=/var/tmp
+INFO=INFO.ciro-p51
+LINE_SIZE=
+LOOP_O=0.00000000
+MAIL=no
+TOTAL_MEM=31903
+MB=22332
+MHZ="-1 System too busy"
+MOTHERBOARD=""
+NETWORKS=""
+OS="x86_64-linux-gnu"
+PROCESSORS="8"
+REMOTE=""
+SLOWFS="NO"
+SYNC_MAX="1"
+LMBENCH_SCHED="DEFAULT"
+TIMING_O=0
+RSH=rsh
+RCP=rcp
+VERSION=lmbench-3alpha4
+BENCHMARK_HARDWARE=YES
+BENCHMARK_OS=YES
+BENCHMARK_SYSCALL=
+BENCHMARK_SELECT=
+BENCHMARK_PROC=
+BENCHMARK_CTX=
+BENCHMARK_PAGEFAULT=
+BENCHMARK_FILE=
+BENCHMARK_MMAP=
+BENCHMARK_PIPE=
+BENCHMARK_UNIX=
+BENCHMARK_UDP=
+BENCHMARK_TCP=
+BENCHMARK_CONNECT=
+BENCHMARK_RPC=
+BENCHMARK_HTTP=
+BENCHMARK_BCOPY=
+BENCHMARK_MEM=
+BENCHMARK_OPS=</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Native build only without running tests:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>cd src
+make</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Interestingly, one of the creators of LMbench, Larry McVoy (<a href="https://www.linkedin.com/in/larrymcvoy/" class="bare">https://www.linkedin.com/in/larrymcvoy/</a>, <a href="https://en.wikipedia.org/wiki/Larry_McVoy" class="bare">https://en.wikipedia.org/wiki/Larry_McVoy</a>), is also a co-founder of <a href="https://en.wikipedia.org/wiki/BitKeeper">BitKeeper</a>. Their SCM must be blazingly fast!!! Also his LinkedIn says Intel uses it. But they will forever be remembered as "the closed source Git precursor that died N years ago", RIP.</p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="stream-benchmark"><a class="anchor" href="#stream-benchmark"></a><a class="link" href="#stream-benchmark">21.8.4. STREAM benchmark</a></h4>
 <div class="paragraph">
 <p><a href="http://www.cs.virginia.edu/stream/ref.html" class="bare">http://www.cs.virginia.edu/stream/ref.html</a></p>
 </div>
@@ -29853,7 +31741,7 @@ times[3 * ntimes + k] = mysecond() - times[3 * ntimes + k];
 </div>
 </div>
 <div class="sect3">
-<h4 id="parsec-benchmark"><a class="anchor" href="#parsec-benchmark"></a><a class="link" href="#parsec-benchmark">21.8.4. PARSEC benchmark</a></h4>
+<h4 id="parsec-benchmark"><a class="anchor" href="#parsec-benchmark"></a><a class="link" href="#parsec-benchmark">21.8.5. PARSEC benchmark</a></h4>
 <div class="paragraph">
 <p>We have ported parts of the <a href="http://parsec.cs.princeton.edu">PARSEC benchmark</a> for cross compilation at: <a href="https://github.com/cirosantilli/parsec-benchmark" class="bare">https://github.com/cirosantilli/parsec-benchmark</a> See the documentation on that repo to find out which benchmarks have been ported. Some of the benchmarks were are segfaulting, they are documented in that repo.</p>
 </div>
@@ -29871,7 +31759,7 @@ times[3 * ntimes + k] = mysecond() - times[3 * ntimes + k];
 </ul>
 </div>
 <div class="sect4">
-<h5 id="parsec-benchmark-without-parsecmgmt"><a class="anchor" href="#parsec-benchmark-without-parsecmgmt"></a><a class="link" href="#parsec-benchmark-without-parsecmgmt">21.8.4.1. PARSEC benchmark without parsecmgmt</a></h5>
+<h5 id="parsec-benchmark-without-parsecmgmt"><a class="anchor" href="#parsec-benchmark-without-parsecmgmt"></a><a class="link" href="#parsec-benchmark-without-parsecmgmt">21.8.5.1. PARSEC benchmark without parsecmgmt</a></h5>
 <div class="literalblock">
 <div class="content">
 <pre>./build --arch arm --download-dependencies gem5-buildroot parsec-benchmark
@@ -29905,7 +31793,7 @@ times[3 * ntimes + k] = mysecond() - times[3 * ntimes + k];
 </div>
 </div>
 <div class="sect4">
-<h5 id="parsec-change-the-input-size"><a class="anchor" href="#parsec-change-the-input-size"></a><a class="link" href="#parsec-change-the-input-size">21.8.4.2. PARSEC change the input size</a></h5>
+<h5 id="parsec-change-the-input-size"><a class="anchor" href="#parsec-change-the-input-size"></a><a class="link" href="#parsec-change-the-input-size">21.8.5.2. PARSEC change the input size</a></h5>
 <div class="paragraph">
 <p>Running a benchmark of a size different than <code>test</code>, e.g. <code>simsmall</code>, requires a rebuild with:</p>
 </div>
@@ -29969,7 +31857,7 @@ times[3 * ntimes + k] = mysecond() - times[3 * ntimes + k];
 </div>
 </div>
 <div class="sect4">
-<h5 id="parsec-benchmark-with-parsecmgmt"><a class="anchor" href="#parsec-benchmark-with-parsecmgmt"></a><a class="link" href="#parsec-benchmark-with-parsecmgmt">21.8.4.3. PARSEC benchmark with parsecmgmt</a></h5>
+<h5 id="parsec-benchmark-with-parsecmgmt"><a class="anchor" href="#parsec-benchmark-with-parsecmgmt"></a><a class="link" href="#parsec-benchmark-with-parsecmgmt">21.8.5.3. PARSEC benchmark with parsecmgmt</a></h5>
 <div class="paragraph">
 <p>Most users won&#8217;t want to use this method because:</p>
 </div>
@@ -30032,7 +31920,7 @@ parsecmgmt -a run -p splash2x.fmm -i test</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="parsec-uninstall"><a class="anchor" href="#parsec-uninstall"></a><a class="link" href="#parsec-uninstall">21.8.4.4. PARSEC uninstall</a></h5>
+<h5 id="parsec-uninstall"><a class="anchor" href="#parsec-uninstall"></a><a class="link" href="#parsec-uninstall">21.8.5.4. PARSEC uninstall</a></h5>
 <div class="paragraph">
 <p>If you want to remove PARSEC later, Buildroot doesn&#8217;t provide an automated package removal mechanism as mentioned at: <a href="#remove-buildroot-packages">Section 20.6, &#8220;Remove Buildroot packages&#8221;</a>, but the following procedure should be satisfactory:</p>
 </div>
@@ -30050,7 +31938,7 @@ parsecmgmt -a run -p splash2x.fmm -i test</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="parsec-benchmark-hacking"><a class="anchor" href="#parsec-benchmark-hacking"></a><a class="link" href="#parsec-benchmark-hacking">21.8.4.5. PARSEC benchmark hacking</a></h5>
+<h5 id="parsec-benchmark-hacking"><a class="anchor" href="#parsec-benchmark-hacking"></a><a class="link" href="#parsec-benchmark-hacking">21.8.5.5. PARSEC benchmark hacking</a></h5>
 <div class="paragraph">
 <p>If you end up going inside <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/submodules/parsec-benchmark">submodules/parsec-benchmark</a> to hack up the benchmark (you will!), these tips will be helpful.</p>
 </div>
@@ -31250,10 +33138,10 @@ zmmintrin.h AVX512</pre>
 <p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/x86_64/freestanding/linux/int_system_call.S">userland/arch/x86_64/freestanding/linux/int_system_call.S</a></p>
 </li>
 <li>
-<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/x86_64/inline_asm/freestanding/linux/hello.c">userland/arch/x86_64/inline_asm/freestanding/linux/hello.c</a></p>
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/x86_64/inline_asm/freestanding/linux/hello.c">userland/arch/x86_64/inline_asm/freestanding/linux/hello.c</a>: this shows how to do system calls from inline assembly without any C standard library helpers like <code>syscall</code></p>
 </li>
 <li>
-<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/x86_64/inline_asm/freestanding/linux/hello_regvar.c">userland/arch/x86_64/inline_asm/freestanding/linux/hello_regvar.c</a></p>
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/x86_64/inline_asm/freestanding/linux/hello_regvar.c">userland/arch/x86_64/inline_asm/freestanding/linux/hello_regvar.c</a>: same as <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/x86_64/inline_asm/freestanding/linux/hello.c">userland/arch/x86_64/inline_asm/freestanding/linux/hello.c</a> but using register variables instead of register constraints</p>
 </li>
 </ul>
 </div>
@@ -31266,7 +33154,7 @@ zmmintrin.h AVX512</pre>
 <p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/arm/freestanding/linux/hello.S">userland/arch/arm/freestanding/linux/hello.S</a></p>
 </li>
 <li>
-<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/arm/inline_asm/freestanding/linux/hello.c">userland/arch/arm/inline_asm/freestanding/linux/hello.c</a></p>
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/arm/inline_asm/freestanding/linux/hello.c">userland/arch/arm/inline_asm/freestanding/linux/hello.c</a>: there are no register constraints in ARM, so register variables are the most efficient way of storing variables in specific general purpose registers: <a href="https://stackoverflow.com/questions/3929442/how-to-specify-an-individual-register-as-constraint-in-arm-gcc-inline-assembly/54845046#54845046" class="bare">https://stackoverflow.com/questions/3929442/how-to-specify-an-individual-register-as-constraint-in-arm-gcc-inline-assembly/54845046#54845046</a></p>
 </li>
 </ul>
 </div>
@@ -34587,6 +36475,158 @@ ldmia sp!, reglist</pre>
 <p>Why GNU GAS 2.29 does not have a mnemonic for it in A64 because it is very recent: shows in <a href="#armarm8-db">ARMv8 architecture reference manual db</a> but not <code>ca</code>.</p>
 </div>
 </div>
+<div class="sect3">
+<h4 id="arm-system-register-instructions"><a class="anchor" href="#arm-system-register-instructions"></a><a class="link" href="#arm-system-register-instructions">24.5.3. ARM system register instructions</a></h4>
+<div class="paragraph">
+<p>Examples of using them can be found at: <a href="#dump-regs">dump_regs</a></p>
+</div>
+<div class="paragraph">
+<p>aarch64 only uses exactly 2 instructions:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>MRS: reads a system register to a regular register</p>
+</li>
+<li>
+<p>MSR: writes to the system register</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>aarch32 is a bit more messy due to older setups, we have both:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>MRS and MSR which are much like in aarch64</p>
+</li>
+<li>
+<p>coprocessor accesses:</p>
+<div class="ulist">
+<ul>
+<li>
+<p>MRC: reads a system register. C means coprocessor, which is what system registers were previously called</p>
+</li>
+<li>
+<p>MCR: writes to the system register</p>
+</li>
+<li>
+<p>MRRC: like MRC, but used for the system registers that are marked as 64-bit, and reads into two general purpose registers</p>
+</li>
+<li>
+<p>MCRR: write version of MRRC</p>
+</li>
+</ul>
+</div>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>TODO why both? For example, as mentioned at <a href="https://stackoverflow.com/questions/62920281/cross-compilng-c-program-for-armv8-a-in-linux-x86-64-system/62922677#62922677" class="bare">https://stackoverflow.com/questions/62920281/cross-compilng-c-program-for-armv8-a-in-linux-x86-64-system/62922677#62922677</a> a register that was accessed with MRC in armv7 can move to MRS in aarch64, as is the case for:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>mrc p15, 0, r0, c0, c0, 1 /* aarch32 */
+mrs x0, ctr_el0           /* aarch64 */</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Other functionality has moved away from coprocessors into actual instructions, e.g. cache invalidation:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>/* aarch32: DCISW, Data Cache line Invalidate by Set/Way. */
+mcr     p15, 0, r5, c7, c6, 2
+
+/* aarch64: moved to one of the DC instruction variants. */
+dc isw, x5</pre>
+</div>
+</div>
+<div class="paragraph">
+<p><a href="#armarm8-fa">ARMv8 architecture reference manual fa</a> G1.19.4 "Background to the System register interface" says that only CP14 and CP15 are specified by the ISA:</p>
+</div>
+<div class="quoteblock">
+<blockquote>
+<div class="paragraph">
+<p>The interface to the System registers was originally defined as part of a generic coprocessor interface, that gave access to 16 coprocessors, CP0 - CP15. Of these, CP8 - CP15 were reserved for use by Arm, while CP0 - CP7 were available for IMPLEMENTATION DEFINED coprocessors.</p>
+</div>
+</blockquote>
+</div>
+<div class="paragraph">
+<p>and the actual coprocessor registers are specified in Chapter G7 "AArch32 System Register Encoding" at:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>CP14: Table G7-1 "Mapping of (coproc ==0b1110) MCR, MRC, and MRRC instruction arguments to System registers"</p>
+</li>
+<li>
+<p>CP15: Table G7-3 "VMSAv8-32 (coproc==0b1111) register summary, in MCR/MRC parameter order."</p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The operand order in the actual MRC/MCR assembly does not exactly match the column order of those tables. Here is how to decode it, taking a sample MCR:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>mcr     p15, 0, r5, c7, c6, 2</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>what each part means:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>mcr     p&lt;coproc&gt;, &lt;opc1&gt;, &lt;src-dest-reg&gt;, &lt;CRn&gt;, &lt;CRm&gt;, &lt;opc2&gt;</pre>
+</div>
+</div>
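+<div class="paragraph">
+<p>The operand layout above can be sketched as a small decoder. This is only an illustration in Python, not something taken from the repository: <code>CoprocAccess</code> and <code>decode_coproc_access</code> are hypothetical names, and the parser assumes the plain six-operand MCR/MRC form shown above, with no condition suffixes or comments:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>
```python
from typing import NamedTuple


class CoprocAccess(NamedTuple):
    """Named fields of an aarch32 MCR/MRC instruction (hypothetical helper type)."""
    mnemonic: str
    coproc: int
    opc1: int
    rt: str
    crn: int
    crm: int
    opc2: int


def decode_coproc_access(line):
    """Split a six-operand MCR/MRC assembly line into its named fields."""
    mnemonic, operands = line.split(None, 1)
    coproc, opc1, rt, crn, crm, opc2 = [op.strip() for op in operands.split(",")]
    return CoprocAccess(
        mnemonic=mnemonic.lower(),
        coproc=int(coproc.lstrip("pP")),  # p15 becomes 15
        opc1=int(opc1),
        rt=rt,                            # source or destination general purpose register
        crn=int(crn.lstrip("cC")),        # c7 becomes 7
        crm=int(crm.lstrip("cC")),        # c6 becomes 6
        opc2=int(opc2),
    )


if __name__ == "__main__":
    print(decode_coproc_access("mcr     p15, 0, r5, c7, c6, 2"))
```
+</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>With the fields named like this, the sample can be looked up in the CP15 table by <code>CRn</code>, <code>opc1</code>, <code>CRm</code> and <code>opc2</code>.</p>
+</div>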
+<div class="sect4">
+<h5 id="arm-system-register-encodings"><a class="anchor" href="#arm-system-register-encodings"></a><a class="link" href="#arm-system-register-encodings">24.5.3.1. ARM system register encodings</a></h5>
+<div class="paragraph">
+<p>Each aarch64 system register is specified in the encoding of <a href="#arm-system-register-instructions">ARM system register instructions</a> by 5 integer numbers:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><code>op0</code></p>
+</li>
+<li>
+<p><code>op1</code></p>
+</li>
+<li>
+<p><code>CRn</code></p>
+</li>
+<li>
+<p><code>CRm</code></p>
+</li>
+<li>
+<p><code>op2</code></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>The encodings are given in large tables in <a href="#armarm8-fa">ARMv8 architecture reference manual fa</a> Chapter D12 "AArch64 System Register Encoding".</p>
+</div>
+<div class="paragraph">
+<p>As shown in <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/baremetal/arch/aarch64/dump_regs.c">baremetal/arch/aarch64/dump_regs.c</a> as of LKMC 4e05b00d23c73cc4d3b83be94affdb6f28008d99, you can use the encoding parameters directly in GNU GAS assembly:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>uint32_t id_isar6_el1;
+__asm__ ("mrs %0, s3_0_c0_c2_7" : "=r" (id_isar6_el1) : :);
+LKMC_DUMP_SYSTEM_REGS_PRINTF("ID_ISAR6_EL1 0x%" PRIX32 "\n", id_isar6_el1);</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This can be useful to refer to new system registers which your older version of GNU GAS does not yet have a name for.</p>
+</div>
+<div class="paragraph">
+<p>The Linux kernel also uses explicit sysreg encoding extensively since it is of course a very early user of many new system registers, this is done at <a href="https://github.com/torvalds/linux/blob/v5.4/arch/arm64/include/asm/sysreg.h"><code>arch/arm64/include/asm/sysreg.h</code> in Linux v5.4</a>.</p>
+</div>
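+<div class="paragraph">
+<p>The generic register name used in the <code>mrs</code> example above is just the five encoding numbers joined together. A minimal Python sketch of that naming, where <code>sysreg_generic_name</code> and <code>parse_sysreg_generic_name</code> are hypothetical helpers made up for illustration, and the only data point is the <code>ID_ISAR6_EL1</code> encoding already shown above:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>
```python
import re


def sysreg_generic_name(op0, op1, crn, crm, op2):
    """Build the generic s(op0)_(op1)_c(CRn)_c(CRm)_(op2) register name."""
    return f"s{op0}_{op1}_c{crn}_c{crm}_{op2}"


# Matches names such as s3_0_c0_c2_7, in either letter case.
_GENERIC_NAME = re.compile(r"^s(\d)_(\d)_c(\d+)_c(\d+)_(\d)$", re.IGNORECASE)


def parse_sysreg_generic_name(name):
    """Return the (op0, op1, CRn, CRm, op2) tuple encoded in a generic name."""
    match = _GENERIC_NAME.match(name)
    if match is None:
        raise ValueError("not a generic system register name: " + name)
    return tuple(int(group) for group in match.groups())


if __name__ == "__main__":
    # ID_ISAR6_EL1 as used in the dump_regs.c snippet above.
    print(sysreg_generic_name(3, 0, 0, 2, 7))
    print(parse_sysreg_generic_name("s3_0_c0_c2_7"))
```
+</pre>
+</div>
+</div>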
+</div>
+</div>
 </div>
 <div class="sect2">
 <h3 id="arm-simd"><a class="anchor" href="#arm-simd"></a><a class="link" href="#arm-simd">24.6. ARM SIMD</a></h3>
@@ -35326,7 +37366,13 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.</p>
 </div>
 </div>
 <div class="sect4">
-<h5 id="armv8-programmers-guide"><a class="anchor" href="#armv8-programmers-guide"></a><a class="link" href="#armv8-programmers-guide">24.9.2.4. Programmer&#8217;s Guide for ARMv8-A</a></h5>
+<h5 id="armarm8-fa"><a class="anchor" href="#armarm8-fa"></a><a class="link" href="#armarm8-fa">24.9.2.4. ARMv8 architecture reference manual fa</a></h5>
+<div class="paragraph">
+<p><a href="https://static.docs.arm.com/ddi0487/fa/DDI0487F_a_armv8_arm.pdf" class="bare">https://static.docs.arm.com/ddi0487/fa/DDI0487F_a_armv8_arm.pdf</a></p>
+</div>
+</div>
+<div class="sect4">
+<h5 id="armv8-programmers-guide"><a class="anchor" href="#armv8-programmers-guide"></a><a class="link" href="#armv8-programmers-guide">24.9.2.5. Programmer&#8217;s Guide for ARMv8-A</a></h5>
 <div class="paragraph">
 <p><a href="https://static.docs.arm.com/den0024/a/DEN0024A_v8_architecture_PG.pdf" class="bare">https://static.docs.arm.com/den0024/a/DEN0024A_v8_architecture_PG.pdf</a></p>
 </div>
@@ -35341,7 +37387,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.</p>
 </div>
 </div>
 <div class="sect4">
-<h5 id="arm-a64-instruction-set-architecture-future-architecture-technologies-in-the-a-architecture-profile-documentation"><a class="anchor" href="#arm-a64-instruction-set-architecture-future-architecture-technologies-in-the-a-architecture-profile-documentation"></a><a class="link" href="#arm-a64-instruction-set-architecture-future-architecture-technologies-in-the-a-architecture-profile-documentation">24.9.2.5. Arm A64 Instruction Set Architecture: Future Architecture Technologies in the A architecture profile Documentation</a></h5>
+<h5 id="arm-a64-instruction-set-architecture-future-architecture-technologies-in-the-a-architecture-profile-documentation"><a class="anchor" href="#arm-a64-instruction-set-architecture-future-architecture-technologies-in-the-a-architecture-profile-documentation"></a><a class="link" href="#arm-a64-instruction-set-architecture-future-architecture-technologies-in-the-a-architecture-profile-documentation">24.9.2.6. Arm A64 Instruction Set Architecture: Future Architecture Technologies in the A architecture profile Documentation</a></h5>
 <div class="paragraph">
 <p><a href="https://developer.arm.com/docs/ddi0602/b" class="bare">https://developer.arm.com/docs/ddi0602/b</a></p>
 </div>
@@ -35350,15 +37396,31 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.</p>
 </div>
 </div>
 <div class="sect4">
-<h5 id="arm-processor-documentation"><a class="anchor" href="#arm-processor-documentation"></a><a class="link" href="#arm-processor-documentation">24.9.2.6. ARM processor documentation</a></h5>
+<h5 id="arm-processor-documentation"><a class="anchor" href="#arm-processor-documentation"></a><a class="link" href="#arm-processor-documentation">24.9.2.7. ARM processor documentation</a></h5>
 <div class="paragraph">
 <p>ARM also releases documentation specific to each given processor.</p>
 </div>
 <div class="paragraph">
 <p>This adds extra details to the more portable <a href="#armarm8">ARMv8 architecture reference manual</a> ISA documentation.</p>
 </div>
+<div class="paragraph">
+<p>For every processor, there are basically two key documents:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>technical reference manual, e.g.: <a href="#arm-cortex-a77-trm">Arm Cortex‑A77 Technical Reference Manual r1p1</a></p>
+</li>
+<li>
+<p>software optimization guide, e.g.: <a href="#arm-cortex-a77-sog">Arm Cortex‑A77 Software Optimization Guide r1p1</a></p>
+<div class="paragraph">
+<p>This contains some approximate instruction latencies and pipeline properties.</p>
+</div>
+</li>
+</ul>
+</div>
 <div class="sect5">
-<h6 id="arm-cortex15-trm"><a class="anchor" href="#arm-cortex15-trm"></a><a class="link" href="#arm-cortex15-trm">24.9.2.6.1. ARM Cortex-A15 MPCore Processor Technical Reference Manual r4p0</a></h6>
+<h6 id="arm-cortex15-trm"><a class="anchor" href="#arm-cortex15-trm"></a><a class="link" href="#arm-cortex15-trm">24.9.2.7.1. ARM Cortex-A15 MPCore Processor Technical Reference Manual r4p0</a></h6>
 <div class="paragraph">
 <p><a href="http://infocenter.arm.com/help/topic/com.arm.doc.ddi0438i/DDI0438I_cortex_a15_r4p0_trm.pdf" class="bare">http://infocenter.arm.com/help/topic/com.arm.doc.ddi0438i/DDI0438I_cortex_a15_r4p0_trm.pdf</a></p>
 </div>
@@ -35367,6 +37429,18 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.</p>
 </div>
 </div>
 </div>
+<div class="sect4">
+<h5 id="arm-cortex-a77-trm"><a class="anchor" href="#arm-cortex-a77-trm"></a><a class="link" href="#arm-cortex-a77-trm">24.9.2.8. Arm Cortex‑A77 Technical Reference Manual r1p1</a></h5>
+<div class="paragraph">
+<p><a href="https://static.docs.arm.com/101111/0101/arm_cortex_a77_trm_101111_0101_04_en.pdf" class="bare">https://static.docs.arm.com/101111/0101/arm_cortex_a77_trm_101111_0101_04_en.pdf</a></p>
+</div>
+</div>
+<div class="sect4">
+<h5 id="arm-cortex-a77-sog"><a class="anchor" href="#arm-cortex-a77-sog"></a><a class="link" href="#arm-cortex-a77-sog">24.9.2.9. Arm Cortex‑A77 Software Optimization Guide r1p1</a></h5>
+<div class="paragraph">
+<p><a href="https://static.docs.arm.com/swog011050/c/Arm_Cortex-A77_Software_Optimization_Guide.pdf" class="bare">https://static.docs.arm.com/swog011050/c/Arm_Cortex-A77_Software_Optimization_Guide.pdf</a></p>
+</div>
+</div>
 </div>
 </div>
 </div>
@@ -35634,6 +37708,15 @@ cc</pre>
 <div class="paragraph">
 <p>It is worth noting that e.g. ARM has a <a href="#semihosting">Semihosting</a> mechanism for loading CLI arguments through <code>SYS_GET_CMDLINE</code>, but our mechanism works in principle for any ISA.</p>
 </div>
+<div class="sect3">
+<h4 id="gem5-baremetal-arm-cli-args"><a class="anchor" href="#gem5-baremetal-arm-cli-args"></a><a class="link" href="#gem5-baremetal-arm-cli-args">27.4.1. gem5 baremetal arm CLI args</a></h4>
+<div class="paragraph">
+<p>Currently not supported, so we just hardcode argc to 0 in the <a href="#baremetal-bootloaders">arm baremetal bootloader</a>.</p>
+</div>
+<div class="paragraph">
+<p>I think we have to keep the CLI args below 32 GiB, otherwise argc cannot be correctly set up. But currently the gem5 text segment is exactly at 32 GiB, and we always place the CLI args higher in the <a href="#baremetal-linker-script">Baremetal linker script</a>.</p>
+</div>
+</div>
 </div>
 <div class="sect2">
 <h3 id="semihosting"><a class="anchor" href="#semihosting"></a><a class="link" href="#semihosting">27.5. Semihosting</a></h3>
@@ -36846,6 +38929,15 @@ IN: main
 <li>
 <p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/inline_asm/wfe_ldxr_str.cpp">userland/arch/aarch64/inline_asm/wfe_ldxr_str.cpp</a></p>
 </li>
+<li>
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/inline_asm/futex_ldxr_stxr.c">userland/arch/aarch64/inline_asm/futex_ldxr_stxr.c</a>: tests that ldxr and stxr do not interact with futexes. This was leading to problems in <a href="#gem5-syscall-emulation-mode">gem5 syscall emulation mode</a> at one point: <a href="https://gem5.atlassian.net/browse/GEM5-537" class="bare">https://gem5.atlassian.net/browse/GEM5-537</a></p>
+<div class="paragraph">
+<p>Correct outcome: <a href="#gem5-simulate-limit-reached">gem5 simulate() limit reached</a>.</p>
+</div>
+<div class="paragraph">
+<p>Incorrect behaviour due to <a href="https://gem5.atlassian.net/browse/GEM5-537" class="bare">https://gem5.atlassian.net/browse/GEM5-537</a>: exits successfully.</p>
+</div>
+</li>
 </ul>
 </div>
 <div class="paragraph">
@@ -37650,7 +39742,7 @@ ISB</pre>
 <p>In baremetal, we detect if tests failed by parsing logs for the <a href="#magic-failure-string">Magic failure string</a>.</p>
 </div>
 <div class="paragraph">
-<p>See: <a href="#test-this-repo">Section 33.15, &#8220;Test this repo&#8221;</a> for more useful testing tips.</p>
+<p>See: <a href="#test-this-repo">Section 33.16, &#8220;Test this repo&#8221;</a> for more useful testing tips.</p>
 </div>
 </div>
 </div>
@@ -38195,6 +40287,14 @@ instructions 124346081</pre>
 <div class="paragraph">
 <p>Same but with: <a href="#gem5-arm-linux-kernel-patches">gem5 arm Linux kernel patches</a> at v4.15: 73s, kernel size: 132M.</p>
 </div>
+<div class="paragraph">
+<p>On Ubuntu 20.04 gem5 3ca404da175a66e0b958165ad75eb5f54cb5e772 this took 22 minutes 53 seconds:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./run -aa -eg --cpus 2 --tmux --quit-after-boot -- --cpu-type DerivO3CPU --caches</pre>
+</div>
+</div>
 <div class="sect4">
 <h5 id="gem5-arm-hpi-boot-takes-much-longer-than-aarch64"><a class="anchor" href="#gem5-arm-hpi-boot-takes-much-longer-than-aarch64"></a><a class="link" href="#gem5-arm-hpi-boot-takes-much-longer-than-aarch64">29.2.1.1. gem5 arm HPI boot takes much longer than aarch64</a></h5>
 <div class="paragraph">
@@ -39014,11 +41114,11 @@ west build -b qemu_aarch64 samples/hello_world</pre>
 <p>The hard part is how to prevent the compiler from optimizing it away: <a href="https://stackoverflow.com/questions/7083482/how-to-prevent-gcc-from-optimizing-out-a-busy-wait-loop/58758133#58758133" class="bare">https://stackoverflow.com/questions/7083482/how-to-prevent-gcc-from-optimizing-out-a-busy-wait-loop/58758133#58758133</a></p>
 </div>
 <div class="paragraph">
-<p>Disassembly analysis:</p>
+<p><a href="#disas">Disassembly</a> analysis:</p>
 </div>
 <div class="literalblock">
 <div class="content">
-<pre>./run-toolchain --arch aarch64 gdb -- -nh -batch -ex 'disas/rs busy_loop' "$(./getvar --arch aarch64 userland_build_dir)/gcc/busy_loop.out"</pre>
+<pre>./disas --arch aarch64 --userland userland/gcc/busy_loop.out busy_loop</pre>
 </div>
 </div>
 <div class="paragraph">
@@ -39103,7 +41203,7 @@ west build -b qemu_aarch64 samples/hello_world</pre>
 <p><a href="#gem5-minorcpu">gem5 MinorCPU</a></p>
 </li>
 <li>
-<p><a href="#gem5-derivo3cpu">gem5 DerivO3CPU</a></p>
+<p><a href="#gem5-derivo3cpu">gem5 <code>DerivO3CPU</code></a></p>
 </li>
 </ul>
 </div>
@@ -39126,10 +41226,35 @@ west build -b qemu_aarch64 samples/hello_world</pre>
 <p><a href="http://www.lighterra.com/papers/modernmicroprocessors/" class="bare">http://www.lighterra.com/papers/modernmicroprocessors/</a> explains it well.</p>
 </div>
 <div class="paragraph">
-<p>You basically decode</p>
+<p>You basically decode multiple instructions in one go, and run them at the same time if they can go in separate <a href="#execution-unit">functional units</a> and have no conflicts. Genius!</p>
 </div>
 <div class="paragraph">
-<p>TODO in gem5? gem5 definitely has functional units explicitly modelled: <a href="#gem5-functional-units">gem5 functional units</a>, so do <a href="#gem5-minorcpu">gem5 MinorCPU</a> or <a href="#gem5-derivo3cpu">gem5 DerivO3CPU</a> have it?</p>
+<p>And so the concept of <a href="#branch-predictor">branch predictor</a> must come in here: when a conditional branch is reached, you have to decide which side to execute before knowing for sure.</p>
+</div>
+<div class="paragraph">
+<p>This is why it is called a type of <a href="#instruction-level-parallelism">Instruction level parallelism</a>.</p>
+</div>
+<div class="paragraph">
+<p>Although this is a microarchitectural feature, it is so important that it is publicly documented. For example:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><a href="https://en.wikipedia.org/wiki/ARM_Cortex-A77" class="bare">https://en.wikipedia.org/wiki/ARM_Cortex-A77</a>: ARM Cortex A77 (2019) has a 4-wide superscalar decode (and is <a href="#out-of-order-execution">out-of-order</a>)</p>
+</li>
+</ul>
+</div>
+<div class="sect3">
+<h4 id="execution-unit"><a class="anchor" href="#execution-unit"></a><a class="link" href="#execution-unit">32.2.1. Execution unit</a></h4>
+<div class="paragraph">
+<p><a href="https://en.wikipedia.org/wiki/Execution_unit" class="bare">https://en.wikipedia.org/wiki/Execution_unit</a></p>
+</div>
+<div class="paragraph">
+<p>gem5 calls them "functional units".</p>
+</div>
+<div class="paragraph">
+<p>gem5 has <a href="#execution-unit">functional units</a> explicitly modelled as shown at <a href="#gem5-functional-units">gem5 functional units</a>, and those are used by both <a href="#gem5-minorcpu">gem5 MinorCPU</a> and <a href="#gem5-derivo3cpu">gem5 <code>DerivO3CPU</code></a>.</p>
+</div>
 </div>
 </div>
 <div class="sect2">
@@ -39138,11 +41263,82 @@ west build -b qemu_aarch64 samples/hello_world</pre>
 <p><a href="https://en.wikipedia.org/wiki/Out-of-order_execution" class="bare">https://en.wikipedia.org/wiki/Out-of-order_execution</a></p>
 </div>
 <div class="paragraph">
-<p>gem5&#8217;s model is <a href="#gem5-derivo3cpu">gem5 DerivO3CPU</a>.</p>
+<p>gem5&#8217;s model is <a href="#gem5-derivo3cpu">gem5 <code>DerivO3CPU</code></a>.</p>
+</div>
+<div class="paragraph">
+<p>Allows working around data dependencies: if the next instruction depends on the result of the current one, the CPU can go ahead and execute the instruction after it in the meantime, as long as that one has no pending dependencies.</p>
+</div>
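+<div class="paragraph">
+<p>As a rough C-level illustration of such data dependencies (a sketch only: the function names are hypothetical, and whether the second version actually runs faster depends on the compiler output and on the CPU), compare a sum where every addition depends on the previous one with a sum split across two independent accumulators:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>#include &lt;stddef.h&gt;
+#include &lt;stdint.h&gt;
+
+/* Every addition depends on the previous one through sum,
+ * so the additions form a single long dependency chain. */
+uint64_t sum_single_chain(const uint32_t *values, size_t count) {
+    uint64_t sum = 0;
+    for (size_t i = 0; i &lt; count; i++)
+        sum += values[i];
+    return sum;
+}
+
+/* Two accumulators form two independent chains: an addition into sum1
+ * does not need to wait for the preceding addition into sum0. */
+uint64_t sum_two_chains(const uint32_t *values, size_t count) {
+    uint64_t sum0 = 0, sum1 = 0;
+    size_t i = 0;
+    for (; i + 1 &lt; count; i += 2) {
+        sum0 += values[i];
+        sum1 += values[i + 1];
+    }
+    if (i &lt; count) /* Leftover element when count is odd. */
+        sum0 += values[i];
+    return sum0 + sum1;
+}</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Both functions return the same value, but in the second one the additions into <code>sum0</code> and <code>sum1</code> do not depend on each other, which gives the CPU the opportunity to overlap them.</p>
+</div>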
+<div class="paragraph">
+<p>Likely used on basically all (?) 2020 non-power-constrained CPUs.</p>
+</div>
+<div class="paragraph">
+<p>As mentioned at: <a href="https://stackoverflow.com/questions/10074831/what-is-general-difference-between-superscalar-and-ooo-execution" class="bare">https://stackoverflow.com/questions/10074831/what-is-general-difference-between-superscalar-and-ooo-execution</a> it is in theory possible for an out-of-order CPU to not be a <a href="#superscalar-processor">Superscalar processor</a>, but the combination is so natural (since you can look ahead, you might as well run it!) that out-of-order CPUs that are not superscalar are not super common.</p>
+</div>
+<div class="sect3">
+<h4 id="speculative-execution"><a class="anchor" href="#speculative-execution"></a><a class="link" href="#speculative-execution">32.3.1. Speculative execution</a></h4>
+<div class="paragraph">
+<p><a href="https://en.wikipedia.org/wiki/Speculative_execution" class="bare">https://en.wikipedia.org/wiki/Speculative_execution</a></p>
+</div>
+<div class="paragraph">
+<p>A gem5 example can be seen at: <a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-speculative">gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: speculative</a>.</p>
+</div>
+<div class="paragraph">
+<p>Bibliography:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><a href="https://stackoverflow.com/questions/49601910/out-of-order-execution-vs-speculative-execution" class="bare">https://stackoverflow.com/questions/49601910/out-of-order-execution-vs-speculative-execution</a></p>
+</li>
+</ul>
+</div>
+<div class="sect4">
+<h5 id="branch-predictor"><a class="anchor" href="#branch-predictor"></a><a class="link" href="#branch-predictor">32.3.1.1. Branch predictor</a></h5>
+<div class="paragraph">
+<p><a href="https://en.wikipedia.org/wiki/Branch_predictor" class="bare">https://en.wikipedia.org/wiki/Branch_predictor</a></p>
+</div>
+<div class="paragraph">
+<p>Comes in for <a href="#superscalar-processor">superscalar processors</a>.</p>
+</div>
+<div class="paragraph">
+<p>A gem5 example can be seen at: <a href="#gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-speculative">gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: speculative</a>.</p>
+</div>
+</div>
+</div>
+<div class="sect3">
+<h4 id="re-order-buffer"><a class="anchor" href="#re-order-buffer"></a><a class="link" href="#re-order-buffer">32.3.2. Re-order buffer</a></h4>
+<div class="paragraph">
+<p><a href="https://en.wikipedia.org/wiki/Re-order_buffer" class="bare">https://en.wikipedia.org/wiki/Re-order_buffer</a></p>
+</div>
+</div>
+<div class="sect3">
+<h4 id="register-renaming"><a class="anchor" href="#register-renaming"></a><a class="link" href="#register-renaming">32.3.3. Register renaming</a></h4>
+<div class="paragraph">
+<p><a href="https://en.wikipedia.org/wiki/Register_renaming" class="bare">https://en.wikipedia.org/wiki/Register_renaming</a></p>
+</div>
 </div>
 </div>
 <div class="sect2">
-<h3 id="hardware-threads"><a class="anchor" href="#hardware-threads"></a><a class="link" href="#hardware-threads">32.4. Hardware threads</a></h3>
+<h3 id="instruction-level-parallelism"><a class="anchor" href="#instruction-level-parallelism"></a><a class="link" href="#instruction-level-parallelism">32.4. Instruction level parallelism</a></h3>
+<div class="paragraph">
+<p><a href="https://en.wikipedia.org/wiki/Instruction-level_parallelism" class="bare">https://en.wikipedia.org/wiki/Instruction-level_parallelism</a></p>
+</div>
+<div class="paragraph">
+<p>Basically means decoding and then potentially executing a bunch of instructions in one go.</p>
+</div>
+<div class="paragraph">
+<p>Important examples:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><a href="#superscalar-processor">Superscalar processor</a></p>
+</li>
+</ul>
+</div>
+</div>
+<div class="sect2">
+<h3 id="hardware-threads"><a class="anchor" href="#hardware-threads"></a><a class="link" href="#hardware-threads">32.5. Hardware threads</a></h3>
 <div class="paragraph">
 <p>Intel name: "Hyperthreading"</p>
 </div>
@@ -39192,7 +41388,7 @@ west build -b qemu_aarch64 samples/hello_world</pre>
 </div>
 </div>
 <div class="sect2">
-<h3 id="cache-coherence"><a class="anchor" href="#cache-coherence"></a><a class="link" href="#cache-coherence">32.5. Cache coherence</a></h3>
+<h3 id="cache-coherence"><a class="anchor" href="#cache-coherence"></a><a class="link" href="#cache-coherence">32.6. Cache coherence</a></h3>
 <div class="paragraph">
 <p><a href="https://en.wikipedia.org/wiki/Cache_coherence" class="bare">https://en.wikipedia.org/wiki/Cache_coherence</a></p>
 </div>
@@ -39234,7 +41430,7 @@ west build -b qemu_aarch64 samples/hello_world</pre>
 <p>Even if caches are coherent, this is still not enough to avoid data race conditions, because this does not enforce atomicity of read modify write sequences. This is for example shown at: <a href="#detailed-gem5-analysis-of-how-data-races-happen">Detailed gem5 analysis of how data races happen</a>.</p>
 </div>
 <div class="sect3">
-<h4 id="memory-consistency"><a class="anchor" href="#memory-consistency"></a><a class="link" href="#memory-consistency">32.5.1. Memory consistency</a></h4>
+<h4 id="memory-consistency"><a class="anchor" href="#memory-consistency"></a><a class="link" href="#memory-consistency">32.6.1. Memory consistency</a></h4>
 <div class="paragraph">
 <p>According to <a href="http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture07-sc.pdf" class="bare">http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture07-sc.pdf</a> "memory consistency" is about ordering requirements of different memory addresses.</p>
 </div>
@@ -39242,14 +41438,14 @@ west build -b qemu_aarch64 samples/hello_world</pre>
 <p>This is represented explicitly in C++ for example <a href="#cpp-memory-order">C++ std::memory_order</a>.</p>
 </div>
 <div class="sect4">
-<h5 id="sequential-consistency"><a class="anchor" href="#sequential-consistency"></a><a class="link" href="#sequential-consistency">32.5.1.1. Sequential Consistency</a></h5>
+<h5 id="sequential-consistency"><a class="anchor" href="#sequential-consistency"></a><a class="link" href="#sequential-consistency">32.6.1.1. Sequential Consistency</a></h5>
 <div class="paragraph">
 <p>According to <a href="http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture07-sc.pdf" class="bare">http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture07-sc.pdf</a>, the strongest possible consistency, everything nicely ordered as you&#8217;d expect.</p>
 </div>
 </div>
 </div>
 <div class="sect3">
-<h4 id="can-caches-snoop-data-from-other-caches"><a class="anchor" href="#can-caches-snoop-data-from-other-caches"></a><a class="link" href="#can-caches-snoop-data-from-other-caches">32.5.2. Can caches snoop data from other caches?</a></h4>
+<h4 id="can-caches-snoop-data-from-other-caches"><a class="anchor" href="#can-caches-snoop-data-from-other-caches"></a><a class="link" href="#can-caches-snoop-data-from-other-caches">32.6.2. Can caches snoop data from other caches?</a></h4>
 <div class="paragraph">
 <p>Either they can snoop only control, or both control and data can be snooped.</p>
 </div>
@@ -39264,7 +41460,7 @@ west build -b qemu_aarch64 samples/hello_world</pre>
 </div>
 </div>
 <div class="sect3">
-<h4 id="vi-cache-coherence-protocol"><a class="anchor" href="#vi-cache-coherence-protocol"></a><a class="link" href="#vi-cache-coherence-protocol">32.5.3. VI cache coherence protocol</a></h4>
+<h4 id="vi-cache-coherence-protocol"><a class="anchor" href="#vi-cache-coherence-protocol"></a><a class="link" href="#vi-cache-coherence-protocol">32.6.3. VI cache coherence protocol</a></h4>
 <div class="paragraph">
 <p>Mentioned at:</p>
 </div>
@@ -39511,7 +41707,7 @@ west build -b qemu_aarch64 samples/hello_world</pre>
 </div>
 </div>
 <div class="sect3">
-<h4 id="msi-cache-coherence-protocol"><a class="anchor" href="#msi-cache-coherence-protocol"></a><a class="link" href="#msi-cache-coherence-protocol">32.5.4. MSI cache coherence protocol</a></h4>
+<h4 id="msi-cache-coherence-protocol"><a class="anchor" href="#msi-cache-coherence-protocol"></a><a class="link" href="#msi-cache-coherence-protocol">32.6.4. MSI cache coherence protocol</a></h4>
 <div class="paragraph">
 <p><a href="https://en.wikipedia.org/wiki/MSI_protocol" class="bare">https://en.wikipedia.org/wiki/MSI_protocol</a></p>
 </div>
@@ -39823,7 +42019,7 @@ CACHE2 S nyy
 <p>TODO gem5 concrete example.</p>
 </div>
 <div class="sect4">
-<h5 id="msi-cache-coherence-protocol-with-transient-states"><a class="anchor" href="#msi-cache-coherence-protocol-with-transient-states"></a><a class="link" href="#msi-cache-coherence-protocol-with-transient-states">32.5.4.1. MSI cache coherence protocol with transient states</a></h5>
+<h5 id="msi-cache-coherence-protocol-with-transient-states"><a class="anchor" href="#msi-cache-coherence-protocol-with-transient-states"></a><a class="link" href="#msi-cache-coherence-protocol-with-transient-states">32.6.4.1. MSI cache coherence protocol with transient states</a></h5>
 <div class="paragraph">
 <p>TODO understand well why those are needed.</p>
 </div>
@@ -39843,7 +42039,7 @@ CACHE2 S nyy
 </div>
 </div>
 <div class="sect3">
-<h4 id="mesi-cache-coherence-protocol"><a class="anchor" href="#mesi-cache-coherence-protocol"></a><a class="link" href="#mesi-cache-coherence-protocol">32.5.5. MESI cache coherence protocol</a></h4>
+<h4 id="mesi-cache-coherence-protocol"><a class="anchor" href="#mesi-cache-coherence-protocol"></a><a class="link" href="#mesi-cache-coherence-protocol">32.6.5. MESI cache coherence protocol</a></h4>
 <div class="paragraph">
 <p><a href="https://en.wikipedia.org/wiki/MESI_protocol" class="bare">https://en.wikipedia.org/wiki/MESI_protocol</a></p>
 </div>
@@ -39903,7 +42099,7 @@ CACHE2 S nyy
 </div>
 </div>
 <div class="sect3">
-<h4 id="mosi-cache-coherence-protocol"><a class="anchor" href="#mosi-cache-coherence-protocol"></a><a class="link" href="#mosi-cache-coherence-protocol">32.5.6. MOSI cache coherence protocol</a></h4>
+<h4 id="mosi-cache-coherence-protocol"><a class="anchor" href="#mosi-cache-coherence-protocol"></a><a class="link" href="#mosi-cache-coherence-protocol">32.6.6. MOSI cache coherence protocol</a></h4>
 <div class="paragraph">
 <p><a href="https://en.wikipedia.org/wiki/MOSI_protocol" class="bare">https://en.wikipedia.org/wiki/MOSI_protocol</a> The critical MSI vs MOSI section was a bit bogus though: <a href="https://en.wikipedia.org/w/index.php?title=MOSI_protocol&amp;oldid=895443023" class="bare">https://en.wikipedia.org/w/index.php?title=MOSI_protocol&amp;oldid=895443023</a> but I edited it :-)</p>
 </div>
@@ -39963,7 +42159,7 @@ CACHE2 S nyy
 </div>
 </div>
 <div class="sect3">
-<h4 id="moesi"><a class="anchor" href="#moesi"></a><a class="link" href="#moesi">32.5.7. MOESI cache coherence protocol</a></h4>
+<h4 id="moesi"><a class="anchor" href="#moesi"></a><a class="link" href="#moesi">32.6.7. MOESI cache coherence protocol</a></h4>
 <div class="paragraph">
 <p><a href="https://en.wikipedia.org/wiki/MOESI_protocol" class="bare">https://en.wikipedia.org/wiki/MOESI_protocol</a></p>
 </div>
@@ -40495,7 +42691,7 @@ export CCACHE_MAXSIZE="20G"</pre>
 <div class="sect3">
 <h4 id="run-toolchain"><a class="anchor" href="#run-toolchain"></a><a class="link" href="#run-toolchain">33.10.1. run-toolchain</a></h4>
 <div class="paragraph">
-<p>While you could just manually find/learn the path to toolchain tools, e.g. in LKMC b15a0e455d691afa49f3b813ad9b09394dfb02b7 they are</p>
+<p>While you could just manually find/learn the path to toolchain tools, e.g. in LKMC b15a0e455d691afa49f3b813ad9b09394dfb02b7 they are:</p>
 </div>
 <div class="literalblock">
 <div class="content">
@@ -40513,6 +42709,17 @@ export CCACHE_MAXSIZE="20G"</pre>
 </div>
 </div>
 <div class="paragraph">
+<p>This plays nicely with <a href="#getvar">getvar</a>, e.g. you could disassemble <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/c/hello.c">userland/c/hello.c</a> with:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./run-toolchain --arch aarch64 objdump -- -D $(./getvar --arch aarch64 userland_build_dir)/c/hello.out</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>However, disassembly is such a common use case that we have a shortcut for it: <a href="#disas">disas</a>.</p>
+</div>
+<div class="paragraph">
 <p>Alternatively, if you just need a variable to feed into your own Build system, you can also use <a href="#getvar">getvar</a>:</p>
 </div>
 <div class="literalblock">
@@ -40528,6 +42735,36 @@ export CCACHE_MAXSIZE="20G"</pre>
 <pre>/path/to/linux-kernel-module-cheat/out/buildroot/build/default/aarch64/host/usr/bin/aarch64-buildroot-linux-gnu</pre>
 </div>
 </div>
+<div class="sect4">
+<h5 id="disas"><a class="anchor" href="#disas"></a><a class="link" href="#disas">33.10.1.1. disas</a></h5>
+<div class="paragraph">
+<p>Disassembling a single function of an LKMC executable with GDB, as described at <a href="https://stackoverflow.com/questions/22769246/how-to-disassemble-one-single-function-using-objdump" class="bare">https://stackoverflow.com/questions/22769246/how-to-disassemble-one-single-function-using-objdump</a>, is such a common use case for <a href="#run-toolchain">run-toolchain</a> that we have this shortcut for it.</p>
+</div>
+<div class="paragraph">
+<p>For example, to disassemble a function from a <a href="#userland-content">userland binary</a>:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./disas --arch aarch64 --userland userland/c/hello.c main</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>or to disassemble a function from the <a href="#linux-kernel">Linux kernel</a>:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./disas --arch aarch64 start_kernel</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>and a <a href="#baremetal-setup">baremetal</a> executable:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./disas --arch aarch64 --baremetal baremetal/arch/aarch64/no_bootloader/exit.S _start</pre>
+</div>
+</div>
+</div>
 </div>
 </div>
 <div class="sect2">
@@ -40889,9 +43126,36 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
 </div>
 </div>
 <div class="sect2">
-<h3 id="directory-structure"><a class="anchor" href="#directory-structure"></a><a class="link" href="#directory-structure">33.14. Directory structure</a></h3>
+<h3 id="optimization-level-of-a-build"><a class="anchor" href="#optimization-level-of-a-build"></a><a class="link" href="#optimization-level-of-a-build">33.14. Optimization level of a build</a></h3>
+<div class="paragraph">
+<p>The <code>--optimization-level</code> option is available on all build scripts and sets the given GCC <code>-O</code> optimization level where it has been implemented for guest binaries.</p>
+</div>
+<div class="paragraph">
+<p>The default optimization level is <code>-O0</code> to improve guest visibility.</p>
+</div>
+<div class="paragraph">
+<p>To keep things sane, you generally want to create a separate <a href="#build-variants">build variant</a> for each optimization level, e.g. to create an <code>-O3</code> build:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./build-userland --optimization-level 3 --userland-build-id o3
+./run --userland userland/c/hello.c --userland-build-id o3</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Note that for some guest content, there are hard technical challenges that prevent us from forwarding <code>-O</code>, notably for the Linux kernel: <a href="#kernel-o0">Disable kernel compiler optimizations</a>.</p>
+</div>
+<div class="paragraph">
+<p>Our emulators however are built with higher optimization levels by default, otherwise running anything would be unbearably slow.</p>
+</div>
+<div class="paragraph">
+<p>Emulator builds are also controlled with other mechanisms instead of <code>--optimization-level</code> as explained at: <a href="#debug-the-emulator">Debug the emulator</a>.</p>
+</div>
+</div>
+<div class="sect2">
+<h3 id="directory-structure"><a class="anchor" href="#directory-structure"></a><a class="link" href="#directory-structure">33.15. Directory structure</a></h3>
 <div class="sect3">
-<h4 id="lkmc-directory"><a class="anchor" href="#lkmc-directory"></a><a class="link" href="#lkmc-directory">33.14.1. lkmc directory</a></h4>
+<h4 id="lkmc-directory"><a class="anchor" href="#lkmc-directory"></a><a class="link" href="#lkmc-directory">33.15.1. lkmc directory</a></h4>
 <div class="paragraph">
 <p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/lkmc/">lkmc/</a> contains sources and headers that are shared across kernel modules, userland and baremetal examples.</p>
 </div>
@@ -40902,7 +43166,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
 <p>Another option would have been to name it as <code>includes/lkmc</code>, but that would make paths longer, and we might want to store source code in that directory as well in the future.</p>
 </div>
 <div class="sect4">
-<h5 id="userland-objects-vs-header-only"><a class="anchor" href="#userland-objects-vs-header-only"></a><a class="link" href="#userland-objects-vs-header-only">33.14.1.1. Userland objects vs header-only</a></h5>
+<h5 id="userland-objects-vs-header-only"><a class="anchor" href="#userland-objects-vs-header-only"></a><a class="link" href="#userland-objects-vs-header-only">33.15.1.1. Userland objects vs header-only</a></h5>
 <div class="paragraph">
 <p>When factoring out functionality across userland examples, there are two main options:</p>
 </div>
@@ -40961,7 +43225,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
 </div>
 </div>
 <div class="sect3">
-<h4 id="buildroot_packages-directory"><a class="anchor" href="#buildroot_packages-directory"></a><a class="link" href="#buildroot_packages-directory">33.14.2. buildroot_packages directory</a></h4>
+<h4 id="buildroot_packages-directory"><a class="anchor" href="#buildroot_packages-directory"></a><a class="link" href="#buildroot_packages-directory">33.15.2. buildroot_packages directory</a></h4>
 <div class="paragraph">
 <p>Source: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/buildroot_packages/">buildroot_packages/</a>.</p>
 </div>
@@ -41010,7 +43274,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
 <p>A custom build script can give you more flexibility: e.g. the package can be made work with other root filesystems more easily, have better <a href="#9p">9P</a> support, and rebuild faster as it evades some Buildroot boilerplate.</p>
 </div>
 <div class="sect4">
-<h5 id="kernel-modules-buildroot-package"><a class="anchor" href="#kernel-modules-buildroot-package"></a><a class="link" href="#kernel-modules-buildroot-package">33.14.2.1. kernel_modules buildroot package</a></h5>
+<h5 id="kernel-modules-buildroot-package"><a class="anchor" href="#kernel-modules-buildroot-package"></a><a class="link" href="#kernel-modules-buildroot-package">33.15.2.1. kernel_modules buildroot package</a></h5>
 <div class="paragraph">
 <p>Source: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/buildroot_packages/kernel_modules/">buildroot_packages/kernel_modules/</a></p>
 </div>
@@ -41057,9 +43321,9 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
 </div>
 </div>
 <div class="sect3">
-<h4 id="patches-directory"><a class="anchor" href="#patches-directory"></a><a class="link" href="#patches-directory">33.14.3. patches directory</a></h4>
+<h4 id="patches-directory"><a class="anchor" href="#patches-directory"></a><a class="link" href="#patches-directory">33.15.3. patches directory</a></h4>
 <div class="sect4">
-<h5 id="patches-global-directory"><a class="anchor" href="#patches-global-directory"></a><a class="link" href="#patches-global-directory">33.14.3.1. patches/global directory</a></h5>
+<h5 id="patches-global-directory"><a class="anchor" href="#patches-global-directory"></a><a class="link" href="#patches-global-directory">33.15.3.1. patches/global directory</a></h5>
 <div class="paragraph">
 <p>Has the following structure:</p>
 </div>
@@ -41076,7 +43340,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
 </div>
 </div>
 <div class="sect4">
-<h5 id="patches-manual-directory"><a class="anchor" href="#patches-manual-directory"></a><a class="link" href="#patches-manual-directory">33.14.3.2. patches/manual directory</a></h5>
+<h5 id="patches-manual-directory"><a class="anchor" href="#patches-manual-directory"></a><a class="link" href="#patches-manual-directory">33.15.3.2. patches/manual directory</a></h5>
 <div class="paragraph">
 <p>Patches in this directory are never applied automatically: it is up to users to manually apply them before usage following the instructions in this documentation.</p>
 </div>
@@ -41086,7 +43350,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
 </div>
 </div>
 <div class="sect3">
-<h4 id="rootfs_overlay"><a class="anchor" href="#rootfs_overlay"></a><a class="link" href="#rootfs_overlay">33.14.4. rootfs_overlay</a></h4>
+<h4 id="rootfs_overlay"><a class="anchor" href="#rootfs_overlay"></a><a class="link" href="#rootfs_overlay">33.15.4. rootfs_overlay</a></h4>
 <div class="paragraph">
 <p>Source: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/rootfs_overlay">rootfs_overlay</a>.</p>
 </div>
@@ -41133,7 +43397,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
 <p>This way you can just hack away the scripts and try them out immediately without any further operations.</p>
 </div>
 <div class="sect4">
-<h5 id="out_rootfs_overlay_dir"><a class="anchor" href="#out_rootfs_overlay_dir"></a><a class="link" href="#out_rootfs_overlay_dir">33.14.4.1. out_rootfs_overlay_dir</a></h5>
+<h5 id="out_rootfs_overlay_dir"><a class="anchor" href="#out_rootfs_overlay_dir"></a><a class="link" href="#out_rootfs_overlay_dir">33.15.4.1. out_rootfs_overlay_dir</a></h5>
 <div class="paragraph">
 <p>This path can be found with:</p>
 </div>
@@ -41167,7 +43431,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
 </div>
 </div>
 <div class="sect3">
-<h4 id="lkmc-c"><a class="anchor" href="#lkmc-c"></a><a class="link" href="#lkmc-c">33.14.5. lkmc.c</a></h4>
+<h4 id="lkmc-c"><a class="anchor" href="#lkmc-c"></a><a class="link" href="#lkmc-c">33.15.5. lkmc.c</a></h4>
 <div class="paragraph">
 <p>The files:</p>
 </div>
@@ -41197,7 +43461,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
 </div>
 </div>
 <div class="sect3">
-<h4 id="lkmc_home"><a class="anchor" href="#lkmc_home"></a><a class="link" href="#lkmc_home">33.14.6. lkmc_home</a></h4>
+<h4 id="lkmc_home"><a class="anchor" href="#lkmc_home"></a><a class="link" href="#lkmc_home">33.15.6. lkmc_home</a></h4>
 <div class="paragraph">
 <p><code>lkmc_home</code> refers to the target base directory in which we put all our custom built stuff, such as <a href="#userland-setup">userland executables</a> and <a href="#your-first-kernel-module-hack">kernel modules</a>.</p>
 </div>
@@ -41230,7 +43494,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
 </div>
 </div>
 <div class="sect3">
-<h4 id="path-properties"><a class="anchor" href="#path-properties"></a><a class="link" href="#path-properties">33.14.7. path_properties.py</a></h4>
+<h4 id="path-properties"><a class="anchor" href="#path-properties"></a><a class="link" href="#path-properties">33.15.7. path_properties.py</a></h4>
 <div class="paragraph">
 <p>In order to build and run each userland and <a href="#baremetal-setup">baremetal</a> example properly, we need per-file metadata such as compiler flags and required number of cores.</p>
 </div>
@@ -41293,7 +43557,7 @@ baremetal=True</pre>
 </div>
 </div>
 <div class="sect3">
-<h4 id="rand_check-out"><a class="anchor" href="#rand_check-out"></a><a class="link" href="#rand_check-out">33.14.8. rand_check.out</a></h4>
+<h4 id="rand_check-out"><a class="anchor" href="#rand_check-out"></a><a class="link" href="#rand_check-out">33.15.8. rand_check.out</a></h4>
 <div class="paragraph">
 <p>Print out several parameters that normally change randomly from boot to boot:</p>
 </div>
@@ -41321,9 +43585,9 @@ baremetal=True</pre>
 </div>
 </div>
 <div class="sect2">
-<h3 id="test-this-repo"><a class="anchor" href="#test-this-repo"></a><a class="link" href="#test-this-repo">33.15. Test this repo</a></h3>
+<h3 id="test-this-repo"><a class="anchor" href="#test-this-repo"></a><a class="link" href="#test-this-repo">33.16. Test this repo</a></h3>
 <div class="sect3">
-<h4 id="automated-tests"><a class="anchor" href="#automated-tests"></a><a class="link" href="#automated-tests">33.15.1. Automated tests</a></h4>
+<h4 id="automated-tests"><a class="anchor" href="#automated-tests"></a><a class="link" href="#automated-tests">33.16.1. Automated tests</a></h4>
 <div class="paragraph">
 <p>Run almost all tests:</p>
 </div>
@@ -41379,7 +43643,7 @@ echo $?</pre>
 <p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/test">test</a> does not all possible tests, because there are too many possible variations and that would take forever. The rationale is the same as for <code>./build all</code> and is explained in <code>./build --help</code>.</p>
 </div>
 <div class="sect4">
-<h5 id="test-arch-and-emulator-selection"><a class="anchor" href="#test-arch-and-emulator-selection"></a><a class="link" href="#test-arch-and-emulator-selection">33.15.1.1. Test arch and emulator selection</a></h5>
+<h5 id="test-arch-and-emulator-selection"><a class="anchor" href="#test-arch-and-emulator-selection"></a><a class="link" href="#test-arch-and-emulator-selection">33.16.1.1. Test arch and emulator selection</a></h5>
 <div class="paragraph">
 <p>You can select multiple archs and emulators of interest, as for an other command, with:</p>
 </div>
@@ -41412,7 +43676,7 @@ echo $?</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="quit-on-fail"><a class="anchor" href="#quit-on-fail"></a><a class="link" href="#quit-on-fail">33.15.1.2. Quit on fail</a></h5>
+<h5 id="quit-on-fail"><a class="anchor" href="#quit-on-fail"></a><a class="link" href="#quit-on-fail">33.16.1.2. Quit on fail</a></h5>
 <div class="paragraph">
 <p>By default, continue running even after the first failure happens, and they show a summary at the end.</p>
 </div>
@@ -41426,7 +43690,7 @@ echo $?</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="test-userland-in-full-system"><a class="anchor" href="#test-userland-in-full-system"></a><a class="link" href="#test-userland-in-full-system">33.15.1.3. Test userland in full system</a></h5>
+<h5 id="test-userland-in-full-system"><a class="anchor" href="#test-userland-in-full-system"></a><a class="link" href="#test-userland-in-full-system">33.16.1.3. Test userland in full system</a></h5>
 <div class="paragraph">
 <p>TODO: we really need a mechanism to automatically generate the test list automatically e.g. based on <a href="#path-properties">path_properties.py</a>, currently there are many tests missing, and we have to add everything manually which is very annoying.</p>
 </div>
@@ -41455,7 +43719,7 @@ echo $?</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="gdb-tests"><a class="anchor" href="#gdb-tests"></a><a class="link" href="#gdb-tests">33.15.1.4. GDB tests</a></h5>
+<h5 id="gdb-tests"><a class="anchor" href="#gdb-tests"></a><a class="link" href="#gdb-tests">33.16.1.4. GDB tests</a></h5>
 <div class="paragraph">
 <p>We have some <a href="https://github.com/pexpect/pexpect">pexpect</a> automated tests for GDB for both userland and baremetal programs!</p>
 </div>
@@ -41528,7 +43792,7 @@ echo $?</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="magic-failure-string"><a class="anchor" href="#magic-failure-string"></a><a class="link" href="#magic-failure-string">33.15.1.5. Magic failure string</a></h5>
+<h5 id="magic-failure-string"><a class="anchor" href="#magic-failure-string"></a><a class="link" href="#magic-failure-string">33.16.1.5. Magic failure string</a></h5>
 <div class="paragraph">
 <p>We do not know of any way to set the emulator exit status in QEMU arm full system.</p>
 </div>
@@ -41631,9 +43895,9 @@ echo $?</pre>
 </div>
 </div>
 <div class="sect3">
-<h4 id="non-automated-tests"><a class="anchor" href="#non-automated-tests"></a><a class="link" href="#non-automated-tests">33.15.2. Non-automated tests</a></h4>
+<h4 id="non-automated-tests"><a class="anchor" href="#non-automated-tests"></a><a class="link" href="#non-automated-tests">33.16.2. Non-automated tests</a></h4>
 <div class="sect4">
-<h5 id="test-gdb-linux-kernel"><a class="anchor" href="#test-gdb-linux-kernel"></a><a class="link" href="#test-gdb-linux-kernel">33.15.2.1. Test GDB Linux kernel</a></h5>
+<h5 id="test-gdb-linux-kernel"><a class="anchor" href="#test-gdb-linux-kernel"></a><a class="link" href="#test-gdb-linux-kernel">33.16.2.1. Test GDB Linux kernel</a></h5>
 <div class="paragraph">
 <p>For the Linux kernel, do the following manual tests for now.</p>
 </div>
@@ -41671,7 +43935,7 @@ echo $?</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="test-the-internet"><a class="anchor" href="#test-the-internet"></a><a class="link" href="#test-the-internet">33.15.2.2. Test the Internet</a></h5>
+<h5 id="test-the-internet"><a class="anchor" href="#test-the-internet"></a><a class="link" href="#test-the-internet">33.16.2.2. Test the Internet</a></h5>
 <div class="paragraph">
 <p>You should also test that the Internet works:</p>
 </div>
@@ -41682,7 +43946,7 @@ echo $?</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="cli-script-tests"><a class="anchor" href="#cli-script-tests"></a><a class="link" href="#cli-script-tests">33.15.2.3. CLI script tests</a></h5>
+<h5 id="cli-script-tests"><a class="anchor" href="#cli-script-tests"></a><a class="link" href="#cli-script-tests">33.16.2.3. CLI script tests</a></h5>
 <div class="paragraph">
 <p><code>build-userland</code> and <code>test-executables</code> have a wide variety of target selection modes, and it was hard to keep them all working without some tests:</p>
 </div>
@@ -41700,7 +43964,7 @@ echo $?</pre>
 </div>
 </div>
 <div class="sect2">
-<h3 id="bisection"><a class="anchor" href="#bisection"></a><a class="link" href="#bisection">33.16. Bisection</a></h3>
+<h3 id="bisection"><a class="anchor" href="#bisection"></a><a class="link" href="#bisection">33.17. Bisection</a></h3>
 <div class="paragraph">
 <p>When updating the Linux kernel, QEMU and gem5, things sometimes break.</p>
 </div>
@@ -41756,7 +44020,7 @@ git submodule update
 </div>
 </div>
 <div class="sect2">
-<h3 id="update-a-forked-submodule"><a class="anchor" href="#update-a-forked-submodule"></a><a class="link" href="#update-a-forked-submodule">33.17. Update a forked submodule</a></h3>
+<h3 id="update-a-forked-submodule"><a class="anchor" href="#update-a-forked-submodule"></a><a class="link" href="#update-a-forked-submodule">33.18. Update a forked submodule</a></h3>
 <div class="paragraph">
 <p>This is a template update procedure for submodules for which we have some patches on on top of mainline.</p>
 </div>
@@ -41785,9 +44049,9 @@ git commit -m "linux: update to ${next_mainline_revision}"</pre>
 </div>
 </div>
 <div class="sect2">
-<h3 id="release"><a class="anchor" href="#release"></a><a class="link" href="#release">33.18. Release</a></h3>
+<h3 id="release"><a class="anchor" href="#release"></a><a class="link" href="#release">33.19. Release</a></h3>
 <div class="sect3">
-<h4 id="release-procedure"><a class="anchor" href="#release-procedure"></a><a class="link" href="#release-procedure">33.18.1. Release procedure</a></h4>
+<h4 id="release-procedure"><a class="anchor" href="#release-procedure"></a><a class="link" href="#release-procedure">33.19.1. Release procedure</a></h4>
 <div class="paragraph">
 <p>Ensure that the <a href="#automated-tests">Automated tests</a> are passing on a clean build:</p>
 </div>
@@ -41798,7 +44062,7 @@ git commit -m "linux: update to ${next_mainline_revision}"</pre>
 </div>
 </div>
 <div class="paragraph">
-<p>The <code>./build-test</code> command builds a superset of what will be downloaded which also tests other things we would like to be working on the release. For the minimal build to generate the files to be uploaded, see: <a href="#release-zip">Section 33.18.2, &#8220;release-zip&#8221;</a></p>
+<p>The <code>./build-test</code> command builds a superset of what will be downloaded, and also tests other things that we would like to work in the release. For the minimal build to generate the files to be uploaded, see: <a href="#release-zip">Section 33.19.2, &#8220;release-zip&#8221;</a></p>
 </div>
 <div class="paragraph">
 <p>The clean build is necessary as it generates clean images since <a href="#remove-buildroot-packages">it is not possible to remove Buildroot packages</a></p>
@@ -41868,7 +44132,7 @@ git push --follow-tags
 </div>
 </div>
 <div class="sect3">
-<h4 id="release-zip"><a class="anchor" href="#release-zip"></a><a class="link" href="#release-zip">33.18.2. release-zip</a></h4>
+<h4 id="release-zip"><a class="anchor" href="#release-zip"></a><a class="link" href="#release-zip">33.19.2. release-zip</a></h4>
 <div class="paragraph">
 <p>Create a zip containing all files required for <a href="#prebuilt">Prebuilt setup</a>:</p>
 </div>
@@ -41893,7 +44157,7 @@ git push --follow-tags
 </div>
 </div>
 <div class="sect3">
-<h4 id="release-upload"><a class="anchor" href="#release-upload"></a><a class="link" href="#release-upload">33.18.3. release-upload</a></h4>
+<h4 id="release-upload"><a class="anchor" href="#release-upload"></a><a class="link" href="#release-upload">33.19.3. release-upload</a></h4>
 <div class="paragraph">
 <p>After:</p>
 </div>
@@ -41941,9 +44205,9 @@ git push --follow-tags
 </div>
 </div>
 <div class="sect2">
-<h3 id="design-rationale"><a class="anchor" href="#design-rationale"></a><a class="link" href="#design-rationale">33.19. Design rationale</a></h3>
+<h3 id="design-rationale"><a class="anchor" href="#design-rationale"></a><a class="link" href="#design-rationale">33.20. Design rationale</a></h3>
 <div class="sect3">
-<h4 id="design-goals"><a class="anchor" href="#design-goals"></a><a class="link" href="#design-goals">33.19.1. Design goals</a></h4>
+<h4 id="design-goals"><a class="anchor" href="#design-goals"></a><a class="link" href="#design-goals">33.20.1. Design goals</a></h4>
 <div class="paragraph">
 <p>This project was created to help me understand, modify and test low level system components by using system simulators.</p>
 </div>
@@ -42019,7 +44283,7 @@ git push --follow-tags
 </div>
 </div>
 <div class="sect3">
-<h4 id="setup-trade-offs"><a class="anchor" href="#setup-trade-offs"></a><a class="link" href="#setup-trade-offs">33.19.2. Setup trade-offs</a></h4>
+<h4 id="setup-trade-offs"><a class="anchor" href="#setup-trade-offs"></a><a class="link" href="#setup-trade-offs">33.20.2. Setup trade-offs</a></h4>
 <div class="paragraph">
 <p>The trade-offs between the different <a href="#getting-started">setups</a> are basically a balance between:</p>
 </div>
@@ -42044,13 +44308,13 @@ git push --follow-tags
 <p>compatibility: how likely is is that all the components will work well together: emulator, compiler, kernel, standard library, &#8230;&#8203;</p>
 </li>
 <li>
-<p>guest software availability: how wide is your choice of easily installed guest software packages? See also: <a href="#linux-distro-choice">Section 33.19.4, &#8220;Linux distro choice&#8221;</a></p>
+<p>guest software availability: how wide is your choice of easily installed guest software packages? See also: <a href="#linux-distro-choice">Section 33.20.4, &#8220;Linux distro choice&#8221;</a></p>
 </li>
 </ul>
 </div>
 </div>
 <div class="sect3">
-<h4 id="resource-tradeoff-guidelines"><a class="anchor" href="#resource-tradeoff-guidelines"></a><a class="link" href="#resource-tradeoff-guidelines">33.19.3. Resource tradeoff guidelines</a></h4>
+<h4 id="resource-tradeoff-guidelines"><a class="anchor" href="#resource-tradeoff-guidelines"></a><a class="link" href="#resource-tradeoff-guidelines">33.20.3. Resource tradeoff guidelines</a></h4>
 <div class="paragraph">
 <p>Choosing which features go into our default builds means making tradeoffs, here are our guidelines:</p>
 </div>
@@ -42095,7 +44359,7 @@ git push --follow-tags
 </div>
 </div>
 <div class="sect3">
-<h4 id="linux-distro-choice"><a class="anchor" href="#linux-distro-choice"></a><a class="link" href="#linux-distro-choice">33.19.4. Linux distro choice</a></h4>
+<h4 id="linux-distro-choice"><a class="anchor" href="#linux-distro-choice"></a><a class="link" href="#linux-distro-choice">33.20.4. Linux distro choice</a></h4>
 <div class="paragraph">
 <p>We haven&#8217;t found the ultimate distro yet, here is a summary table of trade-offs that we care about: <a href="#table-lkmc-linux-distro-comparison">Table 8, &#8220;Comparison of Linux distros for usage in this repository&#8221;</a>.</p>
 </div>
@@ -42198,9 +44462,9 @@ git push --follow-tags
 </div>
 </div>
 <div class="sect2">
-<h3 id="soft-topics"><a class="anchor" href="#soft-topics"></a><a class="link" href="#soft-topics">33.20. Soft topics</a></h3>
+<h3 id="soft-topics"><a class="anchor" href="#soft-topics"></a><a class="link" href="#soft-topics">33.21. Soft topics</a></h3>
 <div class="sect3">
-<h4 id="fairy-tale"><a class="anchor" href="#fairy-tale"></a><a class="link" href="#fairy-tale">33.20.1. Fairy tale</a></h4>
+<h4 id="fairy-tale"><a class="anchor" href="#fairy-tale"></a><a class="link" href="#fairy-tale">33.21.1. Fairy tale</a></h4>
 <div class="quoteblock">
 <blockquote>
 <div class="paragraph">
@@ -42237,7 +44501,7 @@ git push --follow-tags
 </div>
 </div>
 <div class="sect3">
-<h4 id="should-you-waste-your-life-with-systems-programming"><a class="anchor" href="#should-you-waste-your-life-with-systems-programming"></a><a class="link" href="#should-you-waste-your-life-with-systems-programming">33.20.2. Should you waste your life with systems programming?</a></h4>
+<h4 id="should-you-waste-your-life-with-systems-programming"><a class="anchor" href="#should-you-waste-your-life-with-systems-programming"></a><a class="link" href="#should-you-waste-your-life-with-systems-programming">33.21.2. Should you waste your life with systems programming?</a></h4>
 <div class="paragraph">
 <p>Being the hardcore person who fully understands an important complex system such as a computer, it does have a nice ring to it doesn&#8217;t it?</p>
 </div>
@@ -42266,6 +44530,9 @@ git push --follow-tags
 <div class="paragraph">
 <p>In that sense, therefore, the kernel is not as open as one might want to believe.</p>
 </div>
+<div class="paragraph">
+<p>Of course, if there is some <a href="https://stackoverflow.com/questions/1697842/do-graphic-cards-have-instruction-sets-of-their-own/1697883">super useful and undocumented hardware that is just waiting there to be reverse engineered</a>, then that&#8217;s a much juicier target :-)</p>
+</div>
 </li>
 <li>
 <p>it is impossible to become rich with this knowledge.</p>
@@ -42314,7 +44581,7 @@ git push --follow-tags
 </ul>
 </div>
 <div class="paragraph">
-<p>Are you fine with those points, and ready to continue wasting your life?</p>
+<p>Are you fine with those points, and ready to continue wasting your life with this crap?</p>
 </div>
 <div class="paragraph">
 <p>Good. In that case, read on, and let&#8217;s have some fun together ;-)</p>
@@ -42322,7 +44589,7 @@ git push --follow-tags
 </div>
 </div>
 <div class="sect2">
-<h3 id="bibliography"><a class="anchor" href="#bibliography"></a><a class="link" href="#bibliography">33.21. Bibliography</a></h3>
+<h3 id="bibliography"><a class="anchor" href="#bibliography"></a><a class="link" href="#bibliography">33.22. Bibliography</a></h3>
 <div class="paragraph">
 <p>Runnable stuff:</p>
 </div>
@@ -42366,6 +44633,9 @@ git push --follow-tags
 <div class="ulist">
 <ul>
 <li>
+<p><a href="http://cs241.cs.illinois.edu/coursebook/index.html" class="bare">http://cs241.cs.illinois.edu/coursebook/index.html</a> "CS 241: System Programming" from the University of Illinois at Urbana-Champaign. Has a PDF, TeX source at: <a href="https://github.com/illinois-cs241/coursebook" class="bare">https://github.com/illinois-cs241/coursebook</a> TODO any runnable code?</p>
+</li>
+<li>
 <p><a href="https://github.com/0xAX/linux-insides" class="bare">https://github.com/0xAX/linux-insides</a> wait, how come they have 10x more starts as this repo? :-) Just kidding, awesome effort.</p>
 </li>
 <li>