diff --git a/README.adoc b/README.adoc
index f846bd2..a9717ed 100644
--- a/README.adoc
+++ b/README.adoc
@@ -9481,7 +9481,7 @@ https://en.wikipedia.org/wiki/QEMU[QEMU] is a system simulator: it simulates a C
 
 If you are familiar with https://en.wikipedia.org/wiki/VirtualBox[VirtualBox], then QEMU then basically does the same thing: it opens a "window" inside your desktop that can run an operating system inside your operating system.
 
-Also both can use very similar techniques: either https://en.wikipedia.org/wiki/Binary_translation[binary translation] or <<KVM>>. VirtualBox' binary translator is / was based on QEMU's it seems: https://en.wikipedia.org/wiki/VirtualBox#Software-based_virtualization
+Also both can use very similar techniques: either <<binary-translation>> or <<KVM>>. VirtualBox' binary translator is / was based on QEMU's it seems: https://en.wikipedia.org/wiki/VirtualBox#Software-based_virtualization
 
 The huge advantage of QEMU over VirtualBox is that is supports cross arch simulation, e.g. simulate an ARM guest on an x86 host.
 
@@ -9495,6 +9495,12 @@ QEMU is also supported by Buildroot in-tree, see e.g.: https://github.com/buildr
 
 All of this makes QEMU the natural choice of reference system simulator for this repo.
 
+=== Binary translation
+
+https://en.wikipedia.org/wiki/Binary_translation
+
+Used by <<qemu>> and <<gensim>>.
+
 === Disk persistency
 
 We disable disk persistency for both QEMU and gem5 by default, to prevent the emulator from putting the image in an unknown state.
@@ -13490,6 +13496,16 @@ Not sure why it has v7a in the name, since I believe the CPUs are just the micro
 +
 The CLI option is named slightly differently as: `--cpu-type O3_ARM_v7a_3`.
 
+====== gem5 `DerivO3CPU` pipeline stages
+
+* fetch: besides obviously fetching the instruction, this is also where branch prediction runs, presumably because you need to predict the branch before deciding what to fetch next.
+
+* retire: the instruction is completely and totally done with.
++
+Misspeculated instructions never reach this stage, as can be seen at: <<gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-speculative>>.
++
+The `ExecAll` happens at this time as well, and therefore `ExecAll` does not happen for misspeculated instructions.
+
 ====== gem5 util/o3-pipeview.py O3 pipeline viewer
 [[gem5-util-o3-pipeview-py-o3-pipeline-viewer]]
 
@@ -16450,6 +16466,85 @@ Then, at time 120000, the LDR data came back, after the wrong prediction had alr
 
 The CPU then noticed that it mispredicted, and so it started again from the correct branch target `movz x2`, and the instructions that were thrown away are marked as `=====` in the timeline.
 
+We can also see some <<branch-predictor>> log lines in the `O3CPUAll` log:
+
+....
+ 130000: Fetch: system.cpu.fetch: [tid:0] [sn:10] Branch at PC 0x40009c predicted to be not taken
+ 130000: Fetch: system.cpu.fetch: [tid:0] [sn:10] Branch at PC 0x40009c predicted to go to (0x4000a0=>0x4000a4).(0=>1)
+
+ 131500: Commit: system.cpu.commit: [tid:10] [sn:0] Inserting PC (0x40009c=>0x4000a0).(0=>1) into ROB.
+ 131500: ROB: system.cpu.rob: Adding inst PC (0x40009c=>0x4000a0).(0=>1) to the ROB.
+ 131500: ROB: system.cpu.rob: [tid:0] Now has 10 instructions.
+
+ 132000: IEW: system.cpu.iew: [tid:0] Issue: Adding PC (0x40009c=>0x4000a0).(0=>1) [sn:10] [tid:0] to IQ.
+ 132000: IQ: system.cpu.iq: Adding instruction [sn:10] PC (0x40009c=>0x4000a0).(0=>1) to the IQ.
+ 132000: IQ: system.cpu.iq: Instruction PC (0x40009c=>0x4000a0).(0=>1) has src reg 6 (CCRegClass) that is being added to the dependency chain.
+ 132000: IQ: system.cpu.iq: Instruction PC (0x40009c=>0x4000a0).(0=>1) has src reg 8 (CCRegClass) that is being added to the dependency chain.
+ 132000: IQ: system.cpu.iq: Instruction PC (0x40009c=>0x4000a0).(0=>1) has src reg 7 (CCRegClass) that is being added to the dependency chain.
+
+ 135500: IQ: system.cpu.iq: Waking up a dependent instruction, [sn:10] PC (0x40009c=>0x4000a0).(0=>1).
+ 135500: IQ: global: [sn:10] has 1 ready out of 3 sources. RTI 0)
+ 135500: IQ: system.cpu.iq: Waking any dependents on register 7 (CCRegClass).
+ 135500: IQ: system.cpu.iq: Waking up a dependent instruction, [sn:10] PC (0x40009c=>0x4000a0).(0=>1).
+ 135500: IQ: global: [sn:10] has 2 ready out of 3 sources. RTI 0)
+ 135500: IQ: system.cpu.iq: Waking any dependents on register 8 (CCRegClass).
+ 135500: IQ: system.cpu.iq: Waking up a dependent instruction, [sn:10] PC (0x40009c=>0x4000a0).(0=>1).
+ 135500: IQ: global: [sn:10] has 3 ready out of 3 sources. RTI 0)
+ 135500: IQ: system.cpu.iq: Instruction is ready to issue, putting it onto the ready list, PC (0x40009c=>0x4000a0).(0=>1) opclass:1 [sn:10].
+ 135500: IEW: system.cpu.iew: Setting Destination Register 6 (CCRegClass)
+ 135500: Scoreboard: system.cpu.scoreboard: Setting reg 6 (CCRegClass) as ready
+ 135500: IEW: system.cpu.iew: Setting Destination Register 7 (CCRegClass)
+ 135500: Scoreboard: system.cpu.scoreboard: Setting reg 7 (CCRegClass) as ready
+ 135500: IEW: system.cpu.iew: Setting Destination Register 8 (CCRegClass)
+ 135500: Scoreboard: system.cpu.scoreboard: Setting reg 8 (CCRegClass) as ready
+ 135500: IQ: system.cpu.iq: Attempting to schedule ready instructions from the IQ.
+ 135500: IQ: system.cpu.iq: Thread 0: Issuing instruction PC (0x40009c=>0x4000a0).(0=>1) [sn:10]
+
+ 136000: IEW: system.cpu.iew: Execute: Processing PC (0x40009c=>0x4000a0).(0=>1), [tid:0] [sn:10].
+ 136000: IEW: global: RegFile: Access to cc register 6, has data 0x2
+ 136000: IEW: global: RegFile: Access to cc register 8, has data 0
+ 136000: IEW: global: RegFile: Access to cc register 7, has data 0
+ 136000: IEW: system.cpu.iew: Current wb cycle: 0, width: 8, numInst: 0
+wbActual:0
+ 136000: IEW: system.cpu.iew: [tid:0] [sn:10] Execute: Branch mispredict detected.
+ 136000: IEW: system.cpu.iew: [tid:0] [sn:10] Predicted target was PC: (0x4000a0=>0x4000a4).(0=>1)
+ 136000: IEW: system.cpu.iew: [tid:0] [sn:10] Execute: Redirecting fetch to PC: (0x40009c=>0x400080).(0=>1)
+ 136000: IEW: system.cpu.iew: [tid:0] [sn:10] Squashing from a specific instruction, PC: (0x40009c=>0x400080).(0=>1) 
+
+ 136500: Commit: system.cpu.commit: [tid:0] Squashing due to branch mispred PC:0x40009c [sn:10]
+ 136500: Commit: system.cpu.commit: [tid:0] Redirecting to PC 0x400084
+ 136500: ROB: system.cpu.rob: Starting to squash within the ROB.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instructions until [sn:10].
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000bc=>0x4000c0).(0=>1), seq num 18.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000b8=>0x4000bc).(0=>1), seq num 17.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000b4=>0x4000b8).(0=>1), seq num 16.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000b0=>0x4000b4).(0=>1), seq num 15.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000ac=>0x4000b0).(0=>1), seq num 14.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000a8=>0x4000ac).(0=>1), seq num 13.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000a4=>0x4000a8).(0=>1), seq num 12.
+ 136500: ROB: system.cpu.rob: [tid:0] Squashing instruction PC (0x4000a0=>0x4000a4).(0=>1), seq num 11.
+ 136500: ROB: system.cpu.rob: [tid:0] Done squashing instructions.
+ 136500: Commit: system.cpu.commit: [tid:0] Marking PC (0x40009c=>0x400080).(0=>1), [sn:10] ready within ROB.
+
+ 137000: Commit: system.cpu.commit: [tid:0] [sn:10] Committing instruction with PC (0x40009c=>0x400080).(0=>1)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+36    :   b.lt   0x400080          : IntAlu :   FetchSeq=10  CPSeq=10  flags=(IsControl|IsDirectControl|IsCondControl)
+ 137000: ROB: system.cpu.rob: [tid:0] Retiring head instruction, instruction PC (0x40009c=>0x400080).(0=>1), [sn:10]
+ 137000: O3CPU: system.cpu: Removing committed instruction [tid:0] PC (0x40009c=>0x400080).(0=>1) [sn:10]
+ 137000: Commit: system.cpu.commit: Trying to commit head instruction, [tid:0] [sn:11]
+ 137000: Commit: system.cpu.commit: Retiring squashed instruction from ROB.
+
+ 137000: Commit: system.cpu.commit: Trying to commit head instruction, [tid:0] [sn:10]
+ 137000: Commit: system.cpu.commit: [tid:0] [sn:10] Committing instruction with PC (0x40009c=>0x400080).(0=>1)
+ 130000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+36    :   b.lt   0x400080          : IntAlu :   FetchSeq=10  CPSeq=10  flags=(IsControl|IsDirectControl|IsCondControl)
+
+ 138500: Fetch: system.cpu.fetch: [tid:0] [sn:26] Branch at PC 0x40009c predicted to be not taken
+ 138500: Fetch: system.cpu.fetch: [tid:0] [sn:26] Branch at PC 0x40009c predicted to go to (0x4000a0=>0x4000a4).(0=>1)
+
+ 142500: Commit: system.cpu.commit: [tid:0] [sn:26] Committing instruction with PC (0x40009c=>0x4000a0).(0=>1)
+ 138500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+36    :   b.lt   0x400080          : IntAlu :   FetchSeq=26  CPSeq=18  flags=(IsControl|IsDirectControl|IsCondControl)
+ 142500: ROB: system.cpu.rob: [tid:0] Retiring head instruction, instruction PC (0x40009c=>0x4000a0).(0=>1), [sn:26]
+....
+
 With an extra CLI (the branch is not taken):
 
 ....
@@ -18005,6 +18100,83 @@ The horrendous downsides of this are:
 * when <<debug-the-emulator,debugging the emulator>>, it shows you directories inside the build directory rather than in the source tree
 * it is harder to separate which files are <<gem5-code-generation,generated>> and which are in-tree when grepping for code generated definitions
 
+=== Gensim
+
+https://gensim.org
+
+https://bitbucket.org/gensim/gensim
+
+MIT licensed <<binary-translation>> simulator, so a bit like an MIT <<qemu>>.
+
+Video showing it boot Linux fast: https://www.youtube.com/watch?v=aZXx17oYumc
+
+Its name is unfortunately completely and totally overshadowed by unrelated software with the same name: https://radimrehurek.com/gensim/
+
+TODO: advantages over QEMU. As the name implies, they seem to have a nice ISA description language. From a quick look at the internals, it seems to generate LLVM intermediate representation, which sounds good.
+
+Build on Ubuntu 20.04:
+
+....
+sudo apt install libantlr3c-dev
+cd submodules/gensim
+make
+....
+
+First fails with:
+
+....
+arm-none-eabi-gcc: error: unrecognized -march target: armv5
+....
+
+Let's try just armv8, who cares about armv5!!!
+
+....
+mkdir build
+cd build
+cmake -DTESTING_ENABLED=FALSE -DCMAKE_BUILD_TYPE=DEBUGOPT ..
+make -j`nproc` model-armv8
+....
+
+Now fails as mentioned at https://bitbucket.org/gensim/gensim/issues/34/build-fails-with-unrecognised-intrinsic[]:
+
+....
+terminate called after throwing an instance of 'std::logic_error'
+  what():  Unrecognised intrinsic: __builtin_abs64
+Aborted (core dumped)
+....
+
+Get the failing command with:
+
+....
+make VERBOSE=1 model-armv8
+....
+
+and we see some code generation step:
+
+....
+cd /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/models/armv8 && \
+  /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/build/dist/bin/gensim \
+  -a /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/models/armv8/aarch64.ac \
+  -s module,arch,decode,disasm,ee_interp,ee_blockjit,jumpinfo,function,makefile \
+  -o decode.GenerateDotGraph=1,makefile.libtrace_path=/home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/support/libtrace/inc,makefile.archsim_path=/home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/archsim/inc,makefile.llvm_path=,makefile.Optimise=2,makefile.Debug=1 \
+  -t /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/build/models/armv8/output-aarch64/
+....
+
+We can see an inclusion path:
+
+....
+gensim/models/armv8/aarch64.ac
+		ac_isa("isa.ac");
+gensim/models/armv8/isa.ac
+		ac_execute("execute.simd");
+....
+
+and where `gensim/models/armv8/isa.ac` contains `__builtin_abs64` usages.
+
+GDB on `gensim` shows that the error comes from a call to `gci.GenerateExecuteBodyFor(body_str, *action);`, so it looks like there are some missing cases in `EmitFixedCode`.
+
+This is completely broken academic code! They must be using an out-of-tree version of part of the tool and forgot to commit it.
+
 == Buildroot
 
 === Introduction to Buildroot
@@ -19908,6 +20080,97 @@ Canonical source at https://sourceforge.net/projects/lmbench/ but Intel has a fo
 
 Feels old, guessing not representative anymore like <<dhrystone>>. But hey, history!
 
+Ubuntu 20.04 AMD64 native build and run:
+
+....
+git submodule update --init submodules/lmbench
+cd submodules/lmbench
+cd src
+make results
+....
+
+TODO it hangs for a long time at:
+
+....
+Hang on, we are calculating your cache line size.
+....
+
+Bug report: https://github.com/intel/lmbench/issues/15
+
+If I kill it, the configuration process continues:
+
+....
+Killed
+OK, it looks like your cache line is  bytes.
+....
+
+and continues with a few more interactive questions until finally:
+
+....
+Confguration done, thanks.
+....
+
+where it again hangs for at least 2 hours, so I lost patience and killed it.
+
+TODO: how to do a non-interactive config? After the above procedure, `bin/x86_64-linux-gnu/CONFIG.ciro-p51` contains:
+
+....
+DISKS=""
+DISK_DESC=""
+OUTPUT=/dev/null
+ENOUGH=50000
+FASTMEM="NO"
+FILE=/var/tmp/XXX
+FSDIR=/var/tmp
+INFO=INFO.ciro-p51
+LINE_SIZE=
+LOOP_O=0.00000000
+MAIL=no
+TOTAL_MEM=31903
+MB=22332
+MHZ="-1 System too busy"
+MOTHERBOARD=""
+NETWORKS=""
+OS="x86_64-linux-gnu"
+PROCESSORS="8"
+REMOTE=""
+SLOWFS="NO"
+SYNC_MAX="1"
+LMBENCH_SCHED="DEFAULT"
+TIMING_O=0
+RSH=rsh
+RCP=rcp
+VERSION=lmbench-3alpha4
+BENCHMARK_HARDWARE=YES
+BENCHMARK_OS=YES
+BENCHMARK_SYSCALL=
+BENCHMARK_SELECT=
+BENCHMARK_PROC=
+BENCHMARK_CTX=
+BENCHMARK_PAGEFAULT=
+BENCHMARK_FILE=
+BENCHMARK_MMAP=
+BENCHMARK_PIPE=
+BENCHMARK_UNIX=
+BENCHMARK_UDP=
+BENCHMARK_TCP=
+BENCHMARK_CONNECT=
+BENCHMARK_RPC=
+BENCHMARK_HTTP=
+BENCHMARK_BCOPY=
+BENCHMARK_MEM=
+BENCHMARK_OPS=
+....
+
+Native build only without running tests:
+
+....
+cd src
+make
+....
+
+Interestingly, one of the creators of LMbench, Larry McVoy (https://www.linkedin.com/in/larrymcvoy/[], https://en.wikipedia.org/wiki/Larry_McVoy[]), is also a co-founder of https://en.wikipedia.org/wiki/BitKeeper[BitKeeper]. Their SCM must be blazingly fast!!! Also his LinkedIn says Intel uses it. But they will forever be remembered as "the closed source Git precursor that died N years ago", RIP.
+
 ==== STREAM benchmark
 
 http://www.cs.virginia.edu/stream/ref.html
@@ -22422,20 +22685,49 @@ aarch32 is a bit more messy due to older setups, we have both:
 * coprocessor accesses:
 ** MRC: reads a system register, C means coprocessor, which is how system registers were previously known as
 ** MCR: write to the system register
-** MRRC: like MRC, but used for the system registers that are marked as 64-bit, and reads to two general purpose regis
+** MRRC: like MRC, but used for the system registers that are marked as 64-bit, and reads into two general purpose registers
 ** MCRR: write version of MCRR
 
+TODO why both? For example, as mentioned at https://stackoverflow.com/questions/62920281/cross-compilng-c-program-for-armv8-a-in-linux-x86-64-system/62922677#62922677 a register that was accessed with MRC in armv7 can move to MRS in aarch64, as is the case for:
+
+....
+mrc p15, 0, r0, c0, c0, 1 /* aarch32: read CTR */
+mrs x0, ctr_el0           /* aarch64 */
+....
+
+Other functionality has moved away from coprocessors into actual instructions, e.g. cache invalidation:
+
+....
+/* aarch32: DCISW, Data Cache line Invalidate by Set/Way. */
+mcr     p15, 0, r5, c7, c6, 2
+
+/* aarch64: moved to one of the DC instruction variants. */
+dc isw, x5
+....
+
 <<armarm8-fa>> G1.19.4 "Background to the System register interface" says that only CP14 and CP15 are specified by the ISA:
 
 ____
 The interface to the System registers was originally defined as part of a generic coprocessor interface, that gave access to 15 coprocessors, CP0 - CP15. Of these, CP8 - CP15 were reserved for use by Arm, while CP0 - CP7 were available for IMPLEMENTATION DEFINED coprocessors.
 ____
 
-and the actual coprocessor registers are specified at:
+and the actual coprocessor registers are specified in Chapter G7 "AArch32 System Register Encoding" at:
 
 * CP14: Table G7-1 "Mapping of (coproc ==0b1110) MCR, MRC, and MRRC instruction arguments to System registers"
 * CP15: Table G7-3 "VMSAv8-32 (coproc==0b1111) register summary, in MCR/MRC parameter order."
 
+The operand order of the actual MCR/MRC assembly does not exactly match the order of that table. This is how you can decode it, using a sample MCR:
+
+....
+mcr     p15, 0, r5, c7, c6, 2
+....
+
+what each part means:
+
+....
+mcr     p<coproc>, <opc1>, <src-dest-reg>, <CRn>, <CRm>, <opc2>
+....
+
 ===== ARM system register encodings
 
 Each aarch64 system register is specified in the encoding of <<arm-system-register-instructions>> by 5 integer numbers:
@@ -25570,6 +25862,14 @@ Bibliography:
 
 * https://stackoverflow.com/questions/49601910/out-of-order-execution-vs-speculative-execution
 
+===== Branch predictor
+
+https://en.wikipedia.org/wiki/Branch_predictor
+
+Comes in for <<superscalar-processor,superscalar processors>>.
+
+A gem5 example can be seen at: <<gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis-speculative>>.
+
 ==== Re-order buffer
 
 https://en.wikipedia.org/wiki/Re-order_buffer
@@ -25588,14 +25888,6 @@ Important examples:
 
 * <<superscalar-processor>>
 
-=== Branch predictor
-
-https://en.wikipedia.org/wiki/Branch_predictor
-
-Comes in for <<superscalar-processor,superscalar processors>>.
-
-TODO analysis in gem5.
-
 === Hardware threads
 
 Intel name: "Hyperthreading"