diff --git a/README.adoc b/README.adoc
index f54f95b..2524cfe 100644
--- a/README.adoc
+++ b/README.adoc
@@ -12643,6 +12643,56 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
 
 An example of such file can be seen at: <<config-dot-svg-timingsimplecpu>>.
 
+On Ubuntu 20.04, you can also see the dot file "directly" with xdot:
+
+....
+xdot "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot"
+....
+
+which is pretty cool because it lets you follow arrows by clicking on them.
+
+If you are running many short simulations, dot/SVG/PDF generation can have a significant impact on simulation startup time, so it is something to watch out for. As per https://gem5-review.googlesource.com/c/public/gem5/+/29232 it can be turned off with:
+
+....
+gem5.opt --dot-config=''
+....
+
+or in LKMC:
+
+....
+./run --gem5-exe-args='--dot-config= --json-config= --dump-config='
+....
+
+The time difference can be readily observed on minimal examples by running gem5 with `time`.
+
+Looking at `src/python/m5/util/dot_writer.py` in gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1, we can try to remove the SVG/PDF conversion to see if those steps dominate the runtime:
+
+....
+def do_dot(root, outdir, dotFilename):
+    if not pydot:
+        warn("No dot file generated. " +
+             "Please install pydot to generate the dot file and pdf.")
+        return
+    # * use ranksep > 1.0 for for vertical separation between nodes
+    # especially useful if you need to annotate edges using e.g. visio
+    # which accepts svg format
+    # * no need for hoizontal separation as nothing moves horizonally
+    callgraph = pydot.Dot(graph_type='digraph', ranksep='1.3')
+    dot_create_nodes(root, callgraph)
+    dot_create_edges(root, callgraph)
+    dot_filename = os.path.join(outdir, dotFilename)
+    callgraph.write(dot_filename)
+    try:
+        # dot crashes if the figure is extremely wide.
+        # So avoid terminating simulation unnecessarily
+        callgraph.write_svg(dot_filename + ".svg")
+        callgraph.write_pdf(dot_filename + ".pdf")
+    except:
+        warn("failed to generate dot output from %s", dot_filename)
+....
+
+but nope, they don't: `dot_create_nodes` and `dot_create_edges` are the culprits, so the only way to gain speed is to skip `.dot` generation altogether. It is tempting to do this by default on LKMC and add an option to enable dot generation when desired, so we can be a bit faster by default... but I'm too lazy to document the option right now. When it annoys me further maybe :-)
+
 === m5term
 
 We use the `m5term` in-tree executable to connect to the terminal instead of a direct `telnet`.
@@ -13217,7 +13267,7 @@ Implementations:
 
 Useful to <<gem5-restore-checkpoint-with-a-different-cpu,boot Linux fast and then checkpoint and switch to a more detailed CPU>>.
 
-====== gem5 `TiminSimpleCPU`
+====== gem5 `TimingSimpleCPU`
 
 `TimingSimpleCPU`: memory accesses are realistic, but the CPU has no pipeline. The simulation is faster than detailed models, but slower than `AtomicSimpleCPU`.
 
@@ -13241,6 +13291,8 @@ The weird name "Minor" stands for "M (TODO what is M) IN ONder".
 
 Its 4 stage pipeline is described at the "MinorCPU" section of <<gem5-arm-rsk>>.
 
+A commented execution example can be seen at: <<gem5-event-queue-minorcpu-syscall-emulation-freestanding-example-analysis>>.
+
 There is also an in-tree doxygen at: https://github.com/gem5/gem5/blob/9fc9c67b4242c03f165951775be5cd0812f2a705/src/doc/inside-minor.doxygen[`src/doc/inside-minor.doxygen`] and rendered at: http://pages.cs.wisc.edu/~swilson/gem5-docs/minor.html
 
 As of 2019, in-order cores are mostly present in low power/cost contexts, for example little cores of https://en.wikipedia.org/wiki/ARM_big.LITTLE[ARM bigLITTLE].
@@ -13267,10 +13319,12 @@ Implemented by Pierre-Yves Péneau from LIRMM, which is a research lab in Montpe
 
 ===== gem5 DerivO3CPU
 
-Generic out-of-order core. "O3" Stands for "Out Of Order"!
+Generic <<out-of-order-execution,out-of-order core>>. "O3" stands for "Out Of Order"!
 
 Analogous to <<gem5-minorcpu,MinorCPU>>, but modelling an out of order core instead of in order.
 
+A commented execution example can be seen at: <<gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis>>.
+
 Existing parametrizations:
 
 * `ex5_big`: big corresponding to `ex5_LITTLE`, by same author at same time. It description reads:
@@ -15789,6 +15843,16 @@ Fault STXRX64::completeAcc(PacketPtr pkt, ExecContext *xc,
 
 From GDB on <<timingsimplecpu-analysis-ldr-stall>> we see that `completeAcc` gets called from `TimingSimpleCPU::completeDataAccess`.
 
+===== gem5 microops
+
+TODO
+
+Some gem5 instructions break down into multiple microops.
+
+Microops are very similar to regular instructions, and show up in the <<gem5-execall-trace-format>> since the `ExecAll` flag implies `ExecMicro`.
+
+On aarch64 for example, one of the simplest microoped instructions is <<armv8-aarch64-ldp-and-stp-instructions,STP>>, which does the relatively complex operation of storing two values to memory at once, and is therefore a good candidate for being broken down into microops.
+
 ==== gem5 port system
 
 The gem5 memory system is connected in a very flexible way through the port system.
@@ -16652,6 +16716,36 @@ BaseSimpleCPU::BaseSimpleCPU(BaseSimpleCPUParams *p)
     }
 ....
 
+==== gem5 functional units
+
+TODO
+
+Each instruction is marked with a class, and each class can execute in a given functional unit.
+
+The available units can be seen, for example, in the <<gem5-config-ini>> of a <<gem5-minorcpu>> run. Functional units are not present in simple CPUs like <<gem5-timingsimplecpu>>.
+
+For example, on gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1, the `config.ini` of a `MinorCPU` run:
+
+....
+./run   \
+  --arch aarch64 \
+  --emulator gem5 \
+  --userland userland/arch/aarch64/freestanding/linux/hello.S \
+  --trace-insts-stdout \
+  -N1 \
+  -- \
+  --cpu-type MinorCPU \
+  --caches
+....
+
+contains:
+
+....
+[system.cpu]
+type=MinorCPU
+children=branchPred dcache dtb executeFuncUnits icache interrupts isa itb power_state tracer workload
+....
+
 ==== gem5 code generation
 
 gem5 uses a ton of code generation, which makes the project horrendous:
@@ -24159,6 +24253,35 @@ Oh my God, unoptimized code is so horrendously inefficient, even I can't stand a
 
 == Computer architecture
 
+=== Instruction pipelining
+
+In gem5, pipelining can be seen on:
+
+* <<gem5-minorcpu>>
+* <<gem5-derivo3cpu>>
+
+==== Classic RISC pipeline
+
+https://en.wikipedia.org/wiki/Classic_RISC_pipeline
+
+gem5's <<gem5-minorcpu>> implements a similar pipeline, but with 4 stages. TODO why didn't they go with the classic RISC pipeline instead?
+
+=== Superscalar processor
+
+https://en.wikipedia.org/wiki/Superscalar_processor
+
+http://www.lighterra.com/papers/modernmicroprocessors/ explains it well.
+
+You basically decode
+
+TODO in gem5? gem5 definitely models functional units explicitly: <<gem5-functional-units>>, so are <<gem5-minorcpu>> or <<gem5-derivo3cpu>> superscalar?
+
+=== Out-of-order execution
+
+https://en.wikipedia.org/wiki/Out-of-order_execution
+
+gem5's model is <<gem5-derivo3cpu>>.
+
 === Hardware threads
 
 Intel name: "Hyperthreading"
diff --git a/userland/cpp/atomic/main.hpp b/userland/cpp/atomic/main.hpp
index 1fed060..d8ea55f 100644
--- a/userland/cpp/atomic/main.hpp
+++ b/userland/cpp/atomic/main.hpp
@@ -28,7 +28,7 @@ void threadMain(size_t niters) {
             "incq %0;"
             : "+g" (global),
               "+g" (i) // to prevent loop unrolling, and make results more comparable across methods,
-                       // see also: https://cirosantilli.com/linux-kernel-module-cheat#infinite-busy-loop
+                       // see also: https://cirosantilli.com/linux-kernel-module-cheat#c-busy-loop
             :
             :
         );
diff --git a/userland/gcc/busy_loop.c b/userland/gcc/busy_loop.c
index 7a3e296..d715b6e 100644
--- a/userland/gcc/busy_loop.c
+++ b/userland/gcc/busy_loop.c
@@ -1,5 +1,5 @@
 /* https://cirosantilli.com/linux-kernel-module-cheat#micro-benchmarks
- * https://cirosantilli.com/linux-kernel-module-cheat#infinite-busy-loop
+ * https://cirosantilli.com/linux-kernel-module-cheat#c-busy-loop
  * https://cirosantilli.com/linux-kernel-module-cheat#benchmark-emulators-on-userland-executables */
 
 #include <stdlib.h>