diff --git a/README.adoc b/README.adoc
index 7236988..8aefe3b 100644
--- a/README.adoc
+++ b/README.adoc
@@ -280,27 +280,6 @@ This makes things a bit more reproducible, since the microsecond in which you pr
 +
 But on the other hand maybe you are interested in observing the interrupts generated by key presses.
 
-==== Console fun
-
-You can also try those on the Ctrl + Alt + F3 of your Ubuntu host, but it is much more fun inside a VM!
-
-Stop blinking:
-
-    echo 0 > /sys/class/graphics/fbcon/cursor_blink
-
-Rotate the console 90 degrees!
-
-    echo 1 > /sys/class/graphics/fbcon/rotate
-
-Requires `CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y`.
-
-Documented under: `fb/`.
-
-TODO: font and keymap. Mentioned at: https://cmcenroe.me/2017/05/05/linux-console.html and I think can be done with Busybox `loadkmap` and `loadfont`, we just have to understand their formats, related:
-
-* https://unix.stackexchange.com/questions/177024/remap-keyboard-on-the-linux-console
-* https://superuser.com/questions/194202/remapping-keys-system-wide-in-linux-not-just-in-x
-
 === Automatic startup commands
 
 When debugging a module, it becomes tedious to wait for build and re-type:
@@ -1567,6 +1546,33 @@ Those commits change `BR2_LINUX_KERNEL_LATEST_VERSION` in `/linux/Config.in`.
 
 You should then look up if there is a branch that supports that kernel. Staying on branches is a good idea as they will get backports, in particular ones that fix the build as newer host versions come out.
 
+=== Console fun
+
+You can also try these on the Ctrl + Alt + F3 virtual console of your Ubuntu host, but it is much more fun inside a VM!
+
+Must be run in <<text-mode,graphical mode>>.
+
+Stop the cursor from blinking:
+
+....
+echo 0 > /sys/class/graphics/fbcon/cursor_blink
+....
+
+Rotate the console 90 degrees!
+
+....
+echo 1 > /sys/class/graphics/fbcon/rotate
+....
+
+Requires `CONFIG_FRAMEBUFFER_CONSOLE_ROTATION=y`.
+
+Documented under: `fb/`.
+
+TODO: font and keymap. Mentioned at: https://cmcenroe.me/2017/05/05/linux-console.html and I think can be done with Busybox `loadkmap` and `loadfont`, we just have to understand their formats, related:
+
+* https://unix.stackexchange.com/questions/177024/remap-keyboard-on-the-linux-console
+* https://superuser.com/questions/194202/remapping-keys-system-wide-in-linux-not-just-in-x
+
 === ftrace
 
 Trace a single function:
@@ -2452,38 +2458,52 @@ One methodology problem is that gem5 and QEMU were run with different kernel con
 
 OK, this is why we used gem5 in the first place, performance measurements!
 
-https://stackoverflow.com/questions/48944587/how-to-count-the-number-of-cpu-clock-cycles-between-the-start-and-end-of-a-bench
+Let's benchmark https://en.wikipedia.org/wiki/Dhrystone[Dhrystone] which Buildroot provides.
 
-Let's benchmark https://en.wikipedia.org/wiki/Dhrystone[Dhrystone] which Buildroot provides:
+The most flexible way is to do:
 
 ....
-./gem5-bench dhrystone 1000
-./gem5-bench -r dhrystone 1000
+# Generate a checkpoint after Linux boots.
+# The boot takes a while, be patient young Padawan.
+printf 'm5 exit' >readfile.gitignore
+./run -a aarch64 -g -E 'm5 checkpoint;m5 readfile > a.sh;sh a.sh'
+
+# Restore the checkpoint, and run the benchmark with parameter 1000.
+# We skip the boot completely, saving time!
+printf 'm5 resetstats;dhrystone 1000;m5 exit' >readfile.gitignore
+./run -a aarch64 -g -- -r 1
+./gem5-ncycles -a aarch64
+
+# Now with another parameter: 10000.
+printf 'm5 resetstats;dhrystone 10000;m5 exit' >readfile.gitignore
+./run -a aarch64 -g -- -r 1
+./gem5-ncycles -a aarch64
 ....
 
 These commands output the approximate number of CPU cycles it took Dhrystone to run.
 
-It works like this:
+A more naive and easier to understand approach would be to run directly:
 
-* the first command boots linux with the default simplified `AtomicSimpleCPU`, and generates a <<gem5-checkpoint,checkpoint>> after the kernel boots and before running the benchmark
-* the second command restores the checkpoint with the more detailed `HPI` CPU model, and runs the benchmark. We don't boot with it because that is much slower.
+....
+./run -a aarch64 -g -E 'm5 checkpoint;m5 resetstats;dhrystone 10000;m5 exit'
+....
 
-ARM employees have just been modifying benchmarking code with instrumentation directly: https://github.com/arm-university/arm-gem5-rsk/blob/aa3b51b175a0f3b6e75c9c856092ae0c8f2a7cdc/parsec_patches/xcompile-patch.diff#L230
+but the problem is that this method does not let you easily run a different script without booting again, see: <<gem5-restore-new-scrip>>
 
 A few imperfections of our benchmarking method are:
 
 * when we do `m5 resetstats` and `m5 exit`, there is some time passed before the `exec` system call returns and the actual benchmark starts and ends
 * the benchmark outputs to stdout, which means so extra cycles in addition to the actual computation. But TODO: how to get the output to check that it is correct without such IO cycles?
 
+Solutions to these problems include:
+
+* modify benchmark code with instrumentation directly, as PARSEC and ARM employees have been doing: https://github.com/arm-university/arm-gem5-rsk/blob/aa3b51b175a0f3b6e75c9c856092ae0c8f2a7cdc/parsec_patches/xcompile-patch.diff#L230
+* monitor known addresses
+
+Discussion at: https://stackoverflow.com/questions/48944587/how-to-count-the-number-of-cpu-clock-cycles-between-the-start-and-end-of-a-bench/48944588#48944588
+
 Those problems should be insignificant if the benchmark runs for long enough however.
 
-TODO: even if we don't switch to the detailed CPU model, the cycle counts on the original run and the one with checkpoint restore differ slightly. Why? Multiple checkpoint restores give the same results as expected however:
-
-....
-./run -a arm -E 'm5 checkpoint;m5 resetstats;dhrystone 1000;m5 exit' -g
-./run -a arm -g -- -r 1
-....
-
 Now you can play a fun little game with your friends:
 
 * pick a computational problem
@@ -2525,6 +2545,8 @@ getconf _NPROCESSORS_CONF
 
 ===== gem5 cache size
 
+https://stackoverflow.com/questions/49624061/how-to-run-gem5-simulator-in-fs-mode-without-cache/49634544#49634544
+
 A quick `+./run -g -- -h+` leads us to the options:
 
 ....
@@ -2541,10 +2563,22 @@ But keep in mind that it only affects benchmark performance of the most detailed
 [options="header"]
 |===
 |arch |CPU type |caches used
-|X86 |`AtomicSimpleCPU` | no
-|X86 |`DerivO3CPU` | ?*
-|ARM |`AtomicSimpleCPU` | no
-|ARM |`HPI` | yes
+
+|X86
+|`AtomicSimpleCPU`
+|no
+
+|X86
+|`DerivO3CPU`
+|?*
+
+|ARM
+|`AtomicSimpleCPU`
+|no
+
+|ARM
+|`HPI`
+|yes
 |===
 
 {empty}*: couldn't test because of:
@@ -2552,91 +2586,6 @@ But keep in mind that it only affects benchmark performance of the most detailed
 * https://stackoverflow.com/questions/49011096/how-to-switch-cpu-models-in-gem5-after-restoring-a-checkpoint-and-then-observe-t
 * https://github.com/gem5/gem5/issues/16
 
-This has been verified with:
-
-....
-m5 resetstats && dhrystone 10000 && m5 dumpstats
-....
-
-at commit da79d6c6cde0fbe5473ce868c9be4771160a003b with the following gem5 commands cycle counts:
-
-....
-# 11M
-./run -a arm -g
-./run -a arm -g -- --caches --l2cache
-
-# 175M
-./run -a arm -g -- --caches --l1d_size=1024   --l1i_size=1024   --l2cache --l2_size=1024   --l3_size=1024   --cpu-type=HPI
-
-# 16M
-./run -a arm -g -- --caches --l1d_size=1024MB --l1i_size=1024MB --l2cache --l2_size=1024MB --l3_size=1024MB --cpu-type=HPI
-
-# 20M
-./run -a x86_64 -g -- --caches --l1d_size=1024   --l2cache --l2_size=1024   --l3_size=1024
-./run -a x86_64 -g -- --caches --l1d_size=1024MB --l2cache --l2_size=1024MB --l3_size=1024MB
-....
-
-At commit f3503b4cc810556df3c736d0a147cc54e05efc83:
-
-....
-cmd='./run -a aarch64 -g'
-cache_small='--caches --l2cache --l1d_size=1024   --l1i_size=1024   --l2_size=1024   --l3_size=1024'
-cache_large='--caches --l2cache --l1d_size=1024kB --l1i_size=1024kB --l2_size=1024kB --l3_size=1024kB'
-
-printf '#!/bin/sh
-m5 resetstats
-dhrystone 1000
-m5 exit
-' >readfile.gitignore
-chmod +x readfile.gitignore
-
-# Create the checkpoints after the kernel boot.
-# cpt 1: no caches
-$cmd -E 'm5 checkpoint;m5 readfile > a.sh;sh a.sh'
-# cpt 2: small caches
-$cmd -E 'm5 checkpoint;m5 readfile > a.sh;sh a.sh' -- $cache_small
-# cpt 3: large caches
-$cmd -E 'm5 checkpoint;m5 readfile > a.sh;sh a.sh' -- $cache_large
-# cpt 4: no caches HPI
-$cmd -E 'm5 checkpoint;m5 readfile > a.sh;sh a.sh' -- $cache_small --cpu-type=HPI
-# cpt 5: large caches HPI
-$cmd -E 'm5 checkpoint;m5 readfile > a.sh;sh a.sh' -- $cache_large --cpu-type=HPI
-
-# dhrystone 1.000
-# 2.738.340
-$cmd -- -r 1
-# 2.738.343
-$cmd -- -r 2 $cache_small
-# 2.738.307
-$cmd -- -r 3 $cache_large
-
-sed -Ei 's/^dhrystone .*/dhrystone 10000' readfile.gitignore
-# 10.995.467
-$cmd -- -r 1
-# 10.995.470
-$cmd -- -r 2 $cache_small
-# 10.995.434
-$cmd -- -r 3 $cache_large
-
-sed -Ei 's/^dhrystone .*/dhrystone 100000' readfile.gitignore
-# 93.475.029
-$cmd -- -r 1
-# 93.475.032
-$cmd -- -r 2 $cache_small
-# 93.475.091
-$cmd -- -r 3 $cache_large
-
-# 50.193.186
-$cmd -E 'm5 checkpoint;m5 readfile > a.sh;sh a.sh' -- $cache_small --cpu-type=HPI
-# 5.924.610
-$cmd -E 'm5 checkpoint;m5 readfile > a.sh;sh a.sh' -- $cache_large --cpu-type=HPI
-
-# 2.736.509
-$cmd -E 'm5 checkpoint;m5 readfile > a.sh;sh a.sh' -- $cache_small --restore-with-cpu=HPI -r 2
-# 2.736.949
-$cmd -E 'm5 checkpoint;m5 readfile > a.sh;sh a.sh' -- $cache_large --restore-with-cpu=HPI -r 3
-....
-
 Cache sizes can in theory be checked with the methods described at: link:https://superuser.com/questions/55776/finding-l2-cache-size-in-linux[]:
 
 ....
@@ -2655,6 +2604,15 @@ Behaviour breakdown:
 * arm QEMU and gem5 (both `AtomicSimpleCPU` or `HPI`), x86 gem5: `/sys` files don't exist, and `getconf` and `lscpu` value empty
 * x86 QEMU: `/sys` files exist, but `getconf` and `lscpu` values still empty
 
+So we take a performance measurement approach instead:
+
+....
+./gem5-bench-cache -a aarch64
+cat out/aarch64/gem5/bench-cache.txt
+....
+
+TODO: sort out HPI, and then paste results here. Why does `--cpu-type=HPI` there always generate a `switch_cpu`, even if the original run was also on HPI?
+
 ===== gem5 memory latency
 
 TODO These look promising:
@@ -3126,14 +3084,15 @@ Internals:
 * the checkpoints are stored under `out/$arch/gem5/m5out/cpt.$todo_whatisthis`
 * <<m5>> is a guest utility present inside the gem5 tree which we cross-compiled and installed into the guest
 
+[[gem5-restore-new-scrip]]
 ===== gem5 checkpoint restore and run a different script
 
 You want to automate running several tests from a single pristine post-boot state.
 
 The problem is that after the checkpoint, the memory and disk states are fixed, so you can't for example:
 
-* hack up an existing rc script
-* inject new kernel boot command line options
+* hack up an existing rc script, since the disk is fixed
+* inject new kernel boot command line options, since those have already been put into memory by the bootloader
 
 There is however one loophole: <<m5-readfile>>, which reads whatever is present on the host, so we can do it like:
 
diff --git a/build b/build
index 2b14372..6fb066b 100755
--- a/build
+++ b/build
@@ -2,7 +2,6 @@
 set -eu
 . common
 set -- ${cli_build:-} "$@"
-arch=x86_64
 rm -f br2_cli.gitignore
 touch br2_cli.gitignore
 configure=true
diff --git a/common b/common
index 141be97..1afc470 100644
--- a/common
+++ b/common
@@ -1,4 +1,9 @@
 #!/usr/bin/env bash
+eeval() (
+  cmd="$1"
+  echo "$cmd" | tee -a "${2:-/dev/null}"
+  eval "$cmd"
+)
 set_common_vars() {
   arch="$1"
   gem5="$2"
@@ -14,6 +19,7 @@ set_common_vars() {
   build_dir="${buildroot_out_dir}/build"
   host_dir="${buildroot_out_dir}/host"
   gem5_out_dir="${out_arch_dir}/gem5"
+  m5out_dir="${gem5_out_dir}/m5out"
   qemu_out_dir="${out_arch_dir}/qemu"
   common_dir="${out_dir}/common"
 }
@@ -21,3 +27,5 @@ f=cli.gitignore
 if [ -f "$f" ]; then
   . "$f"
 fi
+# Default arch.
+arch=x86_64
diff --git a/gem5-bench b/gem5-bench
deleted file mode 100755
index ba8cdc4..0000000
--- a/gem5-bench
+++ /dev/null
@@ -1,19 +0,0 @@
-#!/usr/bin/env bash
-replay=false
-while getopts r OPT; do
-  case "$OPT" in
-    r)
-      replay=true
-      ;;
-  esac
-done
-shift "$(($OPTIND - 1))"
-bench="$@"
-statfile=m5out/stats.txt
-if "$replay"; then
-  ./run -a arm -g -- --caches -r 1 --restore-with-cpu=HPI
-  awk '/^system.switch_cpus.numCycles /{ print $2 }' "$statfile"
-else
-  ./run -a arm -e 'init=/eval.sh - lkmc_eval="m5 checkpoint;m5 resetstats;'"$bench"';m5 exit"' -g
-  awk '/^system.cpu.numCycles /{ print $2 }' "$statfile"
-fi
diff --git a/run b/run
index e46369e..4f66b92 100755
--- a/run
+++ b/run
@@ -4,7 +4,6 @@ set -eu
 set -- ${cli_run:-} "$@"
 
 # CLI handling.
-arch=x86_64
 cpus=1
 debug_vm=
 debug=false
@@ -132,7 +131,6 @@ if "$gem5"; then
   else
     gem5_arch=ARM
   fi
-  m5out_dir="${gem5_out_dir}/m5out"
   gem5_common="\
 M5_PATH='${gem5_build_dir}/system' \
 ${debug_vm} \
diff --git a/rungdb b/rungdb
index a00336a..84afc65 100755
--- a/rungdb
+++ b/rungdb
@@ -3,7 +3,6 @@ set -eu
 . common
 set -- ${cli_rungdb:-} "$@"
 after=
-arch='x86_64'
 before=
 gem5=false
 lx_symbols="-ex 'lx-symbols ../kernel_module-1.0/'"
diff --git a/rungdb-user b/rungdb-user
index cb0db5d..387107e 100755
--- a/rungdb-user
+++ b/rungdb-user
@@ -3,7 +3,6 @@ set -eu
 . common
 set -- ${cli_rungdb_user:-} "$@"
 usage="$0 <exec-relative-path> [<brk-symbol>]"
-arch='x86_64'
 gem5=false
 gem5_opt=
 while getopts a:gh OPT; do
diff --git a/rungdbserver b/rungdbserver
index f6f3e32..0f54cbe 100755
--- a/rungdbserver
+++ b/rungdbserver
@@ -2,7 +2,6 @@
 set -eu
 . common
 set -- ${cli_rungdbserver:-} "$@"
-arch='x86_64'
 gem5=false
 while getopts a:g OPT; do
   case "$OPT" in
diff --git a/trace-boot b/trace-boot
index a16f411..26d0cbb 100755
--- a/trace-boot
+++ b/trace-boot
@@ -2,7 +2,6 @@
 set -eu
 . common
 set -- ${cli_trace_boot:-} "$@"
-arch=x86_64
 while getopts a: OPT; do
   case "$OPT" in
     a)
diff --git a/trace2line b/trace2line
index 04f6c30..37104dd 100755
--- a/trace2line
+++ b/trace2line
@@ -2,7 +2,6 @@
 set -eu
 . common
 set -- ${cli_trace2line:-} "$@"
-arch=x86_64
 while getopts a: OPT; do
   case "$OPT" in
     a)