From d1003f1cb214c3b1d073c1d67e3803b3e34dd06f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ciro=20Santilli=20=E5=85=AD=E5=9B=9B=E4=BA=8B=E4=BB=B6=20?=
 =?UTF-8?q?=E6=B3=95=E8=BD=AE=E5=8A=9F?= <ciro.santilli@gmail.com>
Date: Thu, 16 May 2019 00:00:00 +0000
Subject: [PATCH] Make this repo good enough to move cpp-cheat,
 x86-assembly-cheat and arm-assembly-cheat in

This commit is a large squash; the full development branch is available at:
https://github.com/cirosantilli/linux-kernel-module-cheat/tree/asm

This notably means a refactor of the userland build and testing, to support:

- improved assembly infrastructure unified across arm and x86
- native in-tree build and test helpers
- parallel building and testing, which implies thread_pool.py
- selection of what to build and test from the CLI
- path_properties.py to indicate how to build and run different examples
- in full system, move all userland stuff into /lkmc
- prefix everything that we defined across files with LKMC
- --gdb uber convenient helper
- remove import imp, which is deprecated

Full commit messages from the branch follow:

1:

userland: add assembly support

Move arm assembly cheat here, and start some work on x86 cheat as well.

2:

document userland asm syscall interfaces

3:

userland assembly: structure readme

4:

x86 fail works

5:

asm: more links

6:

userland: add ported to all archs

7:

move all our stuff into /lkmc in guest

Motivation: userland is getting several new subdirectories, it would be
too insane to just dump all of that in the guest root filesystem.

To alleviate the cd pain, .profile puts user inside /lkmc by default.

8:

start the big userland migration

9:

migrate all

10:

bak

11:

build-userland-in-tree is now a Python command

./build calls it, we did this to allow --download-dependencies to work
perfectly.

12:

rename include to lkmc

13:

m5ops.h is perfect now

14:

userland: make build perfect

15:

preparing test_user_mode, need to generalize stuff as usual

16:

asm: prefix every linux specific with linux/

17:

userland: maybe it really works

18:

userland: fix kernel version to work on older ubuntu

Expose --kernel-version to allow customization.

Update LTP info.

19:

userland: build really truly working now

userland test: start work, in a working state, but no features

20:

test-user-mode: make perfect like build-userland

Multithreading and target selection.

21:

userland: get a bit closer to perfection

22:

thread_pool: support passing thread IDs

Then use that to fix gem5 error log read race.

23:

userland: native testing

24:

userland: path properties getting nice!

25:

userland: move posix/environ from cpp-cheat

26:

gem5: --debug-flags without =, looks nicer whenever it can be done

27:

run: rename --wait-gdb to --gdb-wait, --gdb prefix might become a thing

28:

run: create --tmux-program gdb to open gem5 GDB

29:

run: create the uber convenient --gdb option

30:

userland: move getchar from cpp-cheat

31:

prebuilt: kernel boot aarch64 does not work on Ubuntu 16.04

32:

userland: x86_64 linux hello world make PIE

33:

userland: try to make userland executable selection saner

Only allow existing files to be built, stop extension expansion madness.

cli_function: get_cli print booleans properly, was printing without --no-
for negations.

34:

userland: only link to lkmc.o if needed

35:

path_properties: make data very compact with only tuples and dicts

Spent 2 hours of my life thinking about low value tree walks ;-)

36:

userland: move more userland/arch/ logic into property tree

37:

userland: make libs work

Working for build, but now test-user-mode-in-tree is not using --in-tree,
TODO fix later on.

38:

userland: make libs really work

39:

userland: document path_properties

40:

userland: classify linux

41:

waste your life

42:

common: fix absolute path runs

--gdb: allow running from arbitrary directory

43:

baremetal: arm allow using floating point instructions

44:

baremetal: start preparing to make perfect like userland/

45:

run: fix image check logic accounting for userland

Was failing when trying to run userland (with abspath) while the out/
directory was not present.

46:

cli-function: raise if the config file is given and does not exist

47:

common: define missing 'ld' variable, this broke m5 build

48:

run: --qemu-which host now works for user mode as well as system

Don't fall back on host QEMU automatically, too much insanity.

49:

userland: refix silly mistakes

50:

userland: use path_properties flags for all builds, including lkmc. and userland/arch/main.c

Without this in particular, --gdb fails on assembly because main.c
was not being built with -ggdb3.

51:

userland: start refactor to show failing values on failure!

aarch64 basically done, but missing:

- other archs
- maybe convert main.c into C++ to use templates?
- full review of ASSERT_EQ calling convention issues not seen by tests
  by chance
- documentation

52:

readme: releases are more stable...

53:

submodules: sort gitmodules

54:

test-baremetal: same interface as test-user-mode

In particular, runs tests in parallel, and allows selecting given tests

55:

baremetal: allow arbitrary exit status with the magic string

test-baremetal: fix missing setting x0 return value

Examples were just returning on ret without setting x0, which led to
failures... those were not noticed because of how broken the testing system
was ;-)

56:

baremetal: ah, actually nope, it didn't work :-(

Workaround for now. Works on asserts, but not on exit 1.

Some other day, maybe.

https://github.com/cirosantilli/linux-kernel-module-cheat/issues/59

57:

panic on panic: improve behaviour description

58:

baremetal: get exit status working with on_exit :-)

59:

baremetal: implement C assert

60:

test-baremetal: remove commented out exit status workaround

61:

test-user-mode: handle exit status for signals. Fix #61.
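
A minimal sketch of the idea, assuming the tests are launched through
Python's subprocess module, where a negative returncode -N means the child
was terminated by signal N (POSIX only). The helper name
exit_status_and_signal is hypothetical and not taken from the repo:

```python
import signal


def exit_status_and_signal(returncode):
    """Split a subprocess returncode into (exit_status, signal_name).

    Hypothetical helper: exactly one of the two returned values is not None.
    """
    if returncode < 0:
        # subprocess reports death by signal N as returncode == -N.
        return None, signal.Signals(-returncode).name
    # Normal termination: the returncode is the exit status itself.
    return returncode, None
```

A test runner could then compare either value against the expected
exit status or expected signal declared for each example.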

62:

aarch64: fix ASSERT_EQ_REG tests on gem5

Was doing an 8-byte aligned store, which gem5 dislikes.

But the ARMARM says bad things may happen there, notably a signal:
"D1.8.2 SP alignment checking" so gem5 is not really too wrong,
QEMU just happens to work by chance.

63:

userland assembly: build empty.S and fail.S to toplevel and run fail.S with path_properties exit_status

They were just duplicating stuff needlessly while we don't support non-native in-tree builds,
which leads to executable conflicts for C files anyway.

64:

gem5: use a single build tree for all build types

gem5 already has different object names for each build type it seems, so
let's just make sure that works and save some disk space.

65:

userland x86_64: ASSERT_EQ show actual and expected values

66:

assert_fail.c: add to readme index

67:

userland x86_64: implement ASSERT_MEMCMP

68:

userland x86_64: allow ASSERT_EQ to take just about anything

69:

gas data sizes

70:

gas_data_sizes.S: make PIE for all ISAs

71:

x86: paddq

72:

x86 paddq: test entire family

73:

Get rid of imp, which started giving a deprecation warning every time in Python 3.7 on Ubuntu 19.04.

Please python stop torturing me with refactors.

Make ./run -u blow up if executable not found, otherwise I go crazy.

Get ./test-gdb back to life after the ./run relative path refactor, forgot to test this.
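
For reference, a minimal sketch of loading a Python file by path with
importlib instead of imp.load_source. The function name
load_module_from_path is hypothetical, and the actual helper in this repo
may differ:

```python
import importlib.machinery
import importlib.util


def load_module_from_path(name, path):
    """Load the Python source file at `path` as a module called `name`.

    Hypothetical replacement for imp.load_source(name, path).
    """
    # An explicit SourceFileLoader also handles scripts without a .py
    # extension, which spec_from_file_location alone would reject.
    loader = importlib.machinery.SourceFileLoader(name, path)
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module
```

Unlike imp.load_source, this sketch does not register the module in
sys.modules, so callers have to keep the returned module object around.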

74:

fix run-toolchain, qemu-monitor, trace-boot, trace2line, bisect-linux-boot-gem5. Fixes part of #63

I'm sad no one reported the qemu-monitor break, that one is kind of important.

count.out arguments broke it as an init program, since the kernel adds trash
parameters to every init.

Is anyone using this repo, I wonder? Keep pushing, keep pushing.
One day it gets good enough, and the whole world will see.

75:

x86 assembly: addpd

76:

Fix import_path circular dependency by splitting it out.

Use import thread_pool instead of from, from is evil.

Fix poweroff.out path for ./trace-boot.

77:

run: rename cryptic tmu to tmux-split, ./run is good now so I never use it anymore explicitly

78:

assembly SIMD add: make uniform for all ISAs, mark as entry point to learning SIMD

79:

start moving arm-assembly-cheat readme in here

80:

arm assembly: move some more in

81:

move more arm in

82:

userland: attempt to fix all assembly example links to README

83:

assembly: improve organization of simd add

84:

ld2 move in

85:

Make userland / assembly getting started more uniform / visible

Forward --gcc-which to ./run --tmux.

Use gdb-multiarch for --gcc-which host.

86:

userland: disable PIE explicitly on command line for all executables

87:

userland: make userland content a better landing page

88:

build: check git version from --version and degrade gracefully

89:

build: make --dry-run work again on all

90:

import_path: importlib explicit for Ubuntu 16.04

91:

make all submodules point to my forks

git servers are insane, submodule implementation is crap, what can you do

92:

build: log warning on git too old for --update

93:

build-linux: do olddefconfig even if no fragments

In particular, gem5 kernel 4.15 needs it

94:

userland content: improve the landing page for cpp-cheat a bit
---
 .gitignore                                    |    8 +-
 .gitmodules                                   |   21 +-
 README.adoc                                   | 3787 ++++++++++++-----
 baremetal/add.c                               |   14 +-
 baremetal/add.py                              |   10 +-
 baremetal/arch/aarch64/add.S                  |    7 +-
 baremetal/arch/aarch64/c_from_as.S            |    2 +-
 baremetal/arch/aarch64/fadd.S                 |   20 +-
 baremetal/arch/aarch64/multicore.S            |    3 +-
 .../arch/aarch64/no_bootloader/gem5_exit.S    |    4 +-
 .../aarch64/no_bootloader/semihost_exit.S     |    2 +-
 baremetal/arch/aarch64/regs.S                 |   18 +-
 baremetal/arch/aarch64/return.S               |    1 +
 baremetal/arch/aarch64/return1.S              |    5 +
 baremetal/arch/aarch64/svc_asm.S              |    1 +
 baremetal/arch/arm/add.S                      |    1 +
 baremetal/arch/arm/dump_regs.c                |   15 +
 baremetal/arch/arm/gem5_assert.S              |   12 +-
 baremetal/arch/arm/multicore.S                |    1 +
 baremetal/arch/arm/regs.S                     |    1 +
 baremetal/arch/arm/return1.S                  |    5 +
 baremetal/assert_fail.c                       |    1 +
 baremetal/exit.c                              |    6 -
 baremetal/exit0.c                             |    7 +
 baremetal/exit1.c                             |    7 +
 baremetal/{interactive/prompt.c => getchar.c} |    2 +
 baremetal/hello.c                             |    7 +-
 baremetal/{interactive => }/infinite_loop.c   |    0
 baremetal/interactive/README.adoc             |    1 -
 baremetal/interactive/exit1.c                 |    6 -
 baremetal/interactive/return1.c               |    1 -
 baremetal/lib/aarch64.S                       |    6 +
 baremetal/lib/arm.S                           |   11 +
 baremetal/lib/syscalls.c                      |   74 +-
 baremetal/lkmc_assert_fail.c                  |    1 +
 baremetal/return1.c                           |    2 +
 baremetal/return2.c                           |    2 +
 bisect-linux-boot-gem5                        |   45 +-
 bisect-qemu-linux-boot                        |    2 +-
 bst-vs-heap                                   |   44 +-
 bst-vs-heap.gnuplot                           |   25 +
 build                                         |   85 +-
 build-baremetal                               |   72 +-
 build-buildroot                               |    5 +-
 build-crosstool-ng                            |    2 +-
 build-doc                                     |   13 +-
 build-docker                                  |    1 -
 build-gem5                                    |    3 +-
 build-linux                                   |   30 +-
 build-m5                                      |    4 +-
 build-modules                                 |   33 +-
 build-userland                                |  348 +-
 build-userland-in-tree                        |   26 +-
 cli_function.py                               |   63 +-
 common.py                                     |  722 +++-
 copy-overlay                                  |    9 +-
 gem5-stat                                     |    2 +-
 getvar                                        |    2 +-
 include/README.adoc                           |    1 -
 kernel_modules/Makefile                       |   14 +-
 kernel_modules/anonymous_inode.c              |    2 +-
 kernel_modules/ioctl.c                        |    2 +-
 kernel_modules/netlink.c                      |    2 +-
 kernel_modules/pmccntr.c                      |    2 +-
 kernel_modules/ring0.c                        |    6 +-
 lkmc.c                                        |   17 +-
 lkmc.h                                        |    2 +-
 lkmc/README.adoc                              |    1 +
 lkmc/__init__.py                              |    0
 lkmc/add.c                                    |   13 +
 lkmc/add.py                                   |    9 +
 {include => lkmc}/anonymous_inode.h           |    6 +-
 lkmc/assert_fail.c                            |   18 +
 lkmc/hello.c                                  |    9 +
 lkmc/import_path.py                           |   34 +
 {include => lkmc}/ioctl.h                     |    6 +-
 .../assert_fail.c => lkmc/lkmc_assert_fail.c  |    1 -
 lkmc/m5ops.h                                  |   56 +
 lkmc/math.h                                   |   20 +
 {include => lkmc}/netlink.h                   |    6 +-
 userland/common_userland.h => lkmc/pagemap.h  |   16 +-
 {include => lkmc}/ring0.h                     |   26 +-
 path_properties.py                            |  413 ++
 qemu-monitor                                  |   65 +-
 release-zip                                   |    2 +-
 rootfs_overlay/.profile                       |    4 +
 rootfs_overlay/anonymous_inode.sh             |    5 -
 rootfs_overlay/conf.sh                        |    2 -
 rootfs_overlay/etc/init.d/S98                 |    2 +
 rootfs_overlay/eval.sh                        |   18 -
 rootfs_overlay/eval_base64.sh                 |    2 -
 rootfs_overlay/ioctl.sh                       |    7 -
 rootfs_overlay/lkmc/anonymous_inode.sh        |    6 +
 rootfs_overlay/{ => lkmc}/character_device.sh |    0
 .../{ => lkmc}/character_device_create.sh     |    0
 rootfs_overlay/lkmc/conf.sh                   |    3 +
 rootfs_overlay/{ => lkmc}/count.sh            |    1 +
 rootfs_overlay/{ => lkmc}/debugfs.sh          |    0
 rootfs_overlay/{ => lkmc}/dep.sh              |    0
 rootfs_overlay/lkmc/eval_base64.sh            |    4 +
 rootfs_overlay/{ => lkmc}/fb.sh               |    0
 rootfs_overlay/{ => lkmc}/fops.sh             |    0
 rootfs_overlay/{ => lkmc}/gdbserver.sh        |    0
 rootfs_overlay/{ => lkmc}/gem5.sh             |    0
 rootfs_overlay/{ => lkmc}/gem5_exit.sh        |    0
 rootfs_overlay/{ => lkmc}/gpio.sh             |    0
 rootfs_overlay/{ => lkmc}/init_forward.sh     |    0
 rootfs_overlay/{ => lkmc}/init_lkmc.sh        |    0
 rootfs_overlay/{ => lkmc}/init_module.sh      |    0
 rootfs_overlay/{ => lkmc}/insrm.sh            |    0
 rootfs_overlay/lkmc/ioctl.sh                  |    8 +
 rootfs_overlay/{ => lkmc}/kgdb.sh             |    0
 rootfs_overlay/{ => lkmc}/kstrto.sh           |    0
 rootfs_overlay/lkmc/loginroot.sh              |    3 +
 rootfs_overlay/{ => lkmc}/mknoddev.sh         |    0
 rootfs_overlay/lkmc/mmap.sh                   |    5 +
 rootfs_overlay/lkmc/netlink.sh                |    8 +
 rootfs_overlay/{ => lkmc}/params.sh           |    0
 rootfs_overlay/{ => lkmc}/pci_rescan.sh       |    0
 rootfs_overlay/{ => lkmc}/pmccntr.sh          |    0
 rootfs_overlay/lkmc/poll.sh                   |    6 +
 rootfs_overlay/{ => lkmc}/pr_debug.sh         |    0
 rootfs_overlay/{ => lkmc}/procfs.sh           |    0
 rootfs_overlay/{ => lkmc}/psa.sh              |    0
 rootfs_overlay/{ => lkmc}/qemu_edu.sh         |    0
 .../{ => lkmc}/rand_check_poweroff.sh         |    4 +-
 rootfs_overlay/{ => lkmc}/seq_file.sh         |    0
 .../{ => lkmc}/seq_file_single_open.sh        |    0
 rootfs_overlay/{ => lkmc}/sshd.sh             |    0
 rootfs_overlay/{ => lkmc}/sysfs.sh            |    0
 rootfs_overlay/{ => lkmc}/test_all.sh         |    2 +-
 rootfs_overlay/{ => lkmc}/test_fail.sh        |    2 +-
 rootfs_overlay/{ => lkmc}/uio_read.sh         |    2 +-
 rootfs_overlay/{ => lkmc}/vermagic.sh         |    0
 rootfs_overlay/{ => lkmc}/virt_to_phys.sh     |    0
 rootfs_overlay/loginroot.sh                   |    2 -
 rootfs_overlay/mmap.sh                        |    5 -
 rootfs_overlay/netlink.sh                     |    7 -
 rootfs_overlay/poll.sh                        |    5 -
 rootfs_overlay/root/.profile                  |    4 -
 run                                           |  427 +-
 run-gdb                                       |    6 +-
 run-gdb-user                                  |   62 +-
 run-gdbserver                                 |    2 +-
 run-toolchain                                 |   75 +-
 shell_helpers.py                              |   87 +-
 test                                          |   14 +-
 test-baremetal                                |   79 +-
 test-boot                                     |    3 +-
 test-build-userland                           |   70 +
 test-gdb                                      |   64 +-
 test-test-user-mode                           |   39 +
 test-user-mode                                |   90 +-
 test-user-mode-in-tree                        |   21 +
 test-userland-full-system                     |    6 +-
 thread_pool.py                                |  269 ++
 tmu => tmux-split                             |    1 +
 trace-boot                                    |   13 +-
 trace2line                                    |   44 +-
 userland/README.adoc                          |    1 -
 userland/add.c                                |    1 -
 userland/add.py                               |    1 -
 userland/anonymous_inode.c                    |   46 -
 userland/arch/aarch64/add.S                   |    9 +
 userland/arch/aarch64/add_vector.S            |   32 +
 userland/arch/aarch64/adr.S                   |   21 +
 userland/arch/aarch64/adrp.S                  |   13 +
 userland/arch/aarch64/asm_hello.c             |   13 -
 userland/arch/aarch64/beq.S                   |   33 +
 userland/arch/aarch64/bfi.S                   |   11 +
 userland/arch/aarch64/c/build                 |    1 +
 userland/arch/aarch64/c/earlyclobber.c        |   17 +
 userland/arch/aarch64/c/freestanding/build    |    1 +
 .../arch/aarch64/c/freestanding/linux/build   |    1 +
 .../arch/aarch64/c/freestanding/linux/hello.c |   39 +
 .../c/freestanding/linux/hello_clobbers.c     |   42 +
 .../arch/aarch64/c/freestanding/linux/test    |    1 +
 userland/arch/aarch64/c/freestanding/test     |    1 +
 userland/arch/aarch64/c/inc.c                 |   18 +
 userland/arch/aarch64/c/inc_float.c           |   25 +
 userland/arch/aarch64/c/linux/asm_from_c.c    |   39 +
 userland/arch/aarch64/c/linux/build           |    1 +
 userland/arch/aarch64/c/linux/test            |    1 +
 userland/arch/aarch64/c/multiline.cpp         |   18 +
 userland/arch/aarch64/c/reg_var.c             |   27 +
 userland/arch/aarch64/c/reg_var_float.c       |   28 +
 userland/arch/aarch64/c/test                  |    1 +
 userland/arch/aarch64/cbz.S                   |   19 +
 userland/arch/aarch64/comments.S              |   17 +
 userland/arch/aarch64/common_arch.h           |   83 +
 userland/arch/aarch64/cset.S                  |   28 +
 userland/arch/aarch64/fadd_scalar.S           |   60 +
 userland/arch/aarch64/fadd_vector.S           |   34 +
 userland/arch/aarch64/freestanding/build      |    1 +
 .../arch/aarch64/freestanding/linux/build     |    1 +
 .../arch/aarch64/freestanding/linux/hello.S   |   22 +
 userland/arch/aarch64/freestanding/linux/test |    1 +
 userland/arch/aarch64/freestanding/test       |    1 +
 userland/arch/aarch64/gas_data_sizes.S        |   29 +
 userland/arch/aarch64/immediates.S            |    9 +
 userland/arch/aarch64/ld2.S                   |   26 +
 userland/arch/aarch64/movk.S                  |   26 +
 userland/arch/aarch64/movn.S                  |    9 +
 userland/arch/aarch64/pc.S                    |   78 +
 userland/arch/aarch64/registers.S             |   47 +
 userland/arch/aarch64/ret.S                   |   28 +
 userland/arch/aarch64/str.S                   |   13 +
 userland/arch/aarch64/test                    |    1 +
 userland/arch/aarch64/ubfm.S                  |   17 +
 userland/arch/aarch64/ubfx.S                  |   15 +
 userland/arch/aarch64/x31.S                   |   51 +
 userland/arch/arm/add.S                       |   58 +
 userland/arch/arm/address_modes.S             |   51 +
 userland/arch/arm/adr.S                       |   33 +
 userland/arch/arm/and.S                       |   27 +
 userland/arch/arm/b.S                         |    9 +
 userland/arch/arm/beq.S                       |   28 +
 userland/arch/arm/bfi.S                       |   10 +
 userland/arch/arm/bic.S                       |   10 +
 userland/arch/arm/bl.S                        |   14 +
 userland/arch/arm/build                       |    1 +
 userland/arch/arm/c/add.c                     |   20 +
 userland/arch/arm/c/build                     |    1 +
 userland/arch/arm/c/freestanding/build        |    1 +
 userland/arch/arm/c/freestanding/linux/build  |    1 +
 .../arch/arm/c/freestanding/linux/hello.c     |   40 +
 userland/arch/arm/c/freestanding/linux/test   |    1 +
 userland/arch/arm/c/freestanding/test         |    1 +
 userland/arch/arm/c/inc.c                     |   18 +
 userland/arch/arm/c/inc_float.c               |   28 +
 userland/arch/arm/c/inc_memory.c              |   34 +
 userland/arch/arm/c/inc_memory_global.c       |   27 +
 userland/arch/arm/c/reg_var.c                 |   38 +
 userland/arch/arm/c/test                      |    1 +
 userland/arch/arm/clz.S                       |   17 +
 userland/arch/arm/comments.S                  |   14 +
 userland/arch/arm/common_arch.h               |   90 +
 userland/arch/arm/cond.S                      |   16 +
 userland/arch/arm/freestanding/build          |    1 +
 userland/arch/arm/freestanding/linux/build    |    1 +
 userland/arch/arm/freestanding/linux/hello.S  |   23 +
 userland/arch/arm/freestanding/linux/test     |    1 +
 userland/arch/arm/freestanding/test           |    1 +
 userland/arch/arm/gas_data_sizes.S            |   30 +
 userland/arch/arm/immediates.S                |   24 +
 userland/arch/arm/inc_array.S                 |   27 +
 userland/arch/arm/ldmia.S                     |   61 +
 userland/arch/arm/ldr_pseudo.S                |   65 +
 userland/arch/arm/ldrb.S                      |   12 +
 userland/arch/arm/ldrh.S                      |   12 +
 userland/arch/arm/linux/build                 |    1 +
 userland/arch/arm/linux/c_from_asm.S          |   59 +
 userland/arch/arm/linux/test                  |    1 +
 userland/arch/arm/mov.S                       |   19 +
 userland/arch/arm/movw.S                      |   27 +
 userland/arch/arm/mul.S                       |   15 +
 userland/arch/arm/nop.S                       |   32 +
 userland/arch/arm/push.S                      |   31 +
 userland/arch/arm/rbit.S                      |   12 +
 userland/arch/arm/registers.S                 |   69 +
 userland/arch/arm/rev.S                       |   18 +
 userland/arch/arm/s_suffix.S                  |   35 +
 userland/arch/arm/shift.S                     |   79 +
 userland/arch/arm/str.S                       |   60 +
 userland/arch/arm/sub.S                       |   14 +
 userland/arch/arm/test                        |    1 +
 userland/arch/arm/thumb.S                     |   21 +
 userland/arch/arm/tst.S                       |   22 +
 userland/arch/arm/vadd_scalar.S               |   72 +
 userland/arch/arm/vadd_vector.S               |   71 +
 userland/arch/arm/vcvt.S                      |   90 +
 userland/arch/arm/vcvta.S                     |   41 +
 userland/arch/arm/vcvtr.S                     |   46 +
 userland/arch/common.h                        |   32 +
 userland/arch/empty.S                         |    8 +
 userland/arch/fail.S                          |    9 +
 userland/arch/main.c                          |   61 +
 userland/arch/test                            |    1 +
 userland/arch/x86_64/add.S                    |    9 +
 userland/arch/x86_64/addpd.S                  |   31 +
 userland/arch/x86_64/asm_hello.c              |   16 -
 userland/arch/x86_64/binutils_hack.c          |   20 -
 userland/arch/x86_64/c/add.c                  |   18 +
 userland/arch/x86_64/c/binutils_hack.c        |   20 +
 userland/arch/x86_64/c/binutils_nohack.c      |   18 +
 userland/arch/x86_64/c/build                  |    1 +
 userland/arch/x86_64/c/freestanding/build     |    1 +
 .../arch/x86_64/c/freestanding/linux/build    |    1 +
 .../arch/x86_64/c/freestanding/linux/hello.c  |   33 +
 .../c/freestanding/linux/hello_regvar.c       |   37 +
 .../arch/x86_64/c/freestanding/linux/test     |    1 +
 userland/arch/x86_64/c/freestanding/test      |    1 +
 userland/arch/x86_64/c/inc.c                  |   15 +
 userland/arch/x86_64/c/rdtsc.c                |   14 +
 userland/arch/x86_64/c/ring0.c                |   12 +
 userland/arch/x86_64/c/scratch.c              |   22 +
 userland/arch/x86_64/c/scratch_hardcode.c     |   22 +
 userland/arch/x86_64/c/test                   |    1 +
 userland/arch/x86_64/common_arch.h            |   87 +
 userland/arch/x86_64/freestanding/hello.S     |   19 -
 userland/arch/x86_64/freestanding/linux/build |    1 +
 .../arch/x86_64/freestanding/linux/hello.S    |   22 +
 userland/arch/x86_64/freestanding/linux/test  |    1 +
 userland/arch/x86_64/freestanding/test        |    1 +
 userland/arch/x86_64/gas_data_sizes.S         |   29 +
 userland/arch/x86_64/lkmc_assert_eq_fail.S    |   16 +
 .../arch/x86_64/lkmc_assert_memcmp_fail.S     |   11 +
 userland/arch/x86_64/paddq.S                  |   38 +
 userland/arch/x86_64/test                     |    1 +
 userland/assert_fail.c                        |    1 -
 userland/c/README.adoc                        |    1 +
 userland/c/assert_fail.c                      |    1 +
 userland/c/false.c                            |   18 +
 userland/{ => c}/gcc_hack.c                   |    0
 userland/c/getchar.c                          |   21 +
 userland/c/hello.c                            |   10 +-
 userland/c/infinite_loop.c                    |   29 +
 userland/{ => c}/m5ops.c                      |   12 +-
 userland/c/print_argv.c                       |   14 +
 userland/c/stderr.c                           |    7 +
 userland/c/test                               |    1 +
 userland/count.c                              |   19 -
 userland/cpp/README.adoc                      |    2 +-
 userland/{ => cpp}/bst_vs_heap.cpp            |   20 +-
 userland/cpp/hello.cpp                        |    2 +-
 userland/cpp/test                             |    1 +
 userland/eigen_hello.cpp                      |   13 -
 userland/false.c                              |   13 -
 userland/gcc/empty_struct.c                   |    2 +-
 userland/gcc/openmp.c                         |   20 +
 userland/gcc/test                             |    1 +
 userland/init_env_poweroff.c                  |   26 -
 userland/ioctl.c                              |   67 -
 userland/kernel_modules/anonymous_inode.c     |   45 +
 userland/kernel_modules/ioctl.c               |   66 +
 userland/kernel_modules/mmap.c                |   93 +
 userland/{ => kernel_modules}/netlink.c       |    2 +-
 userland/kernel_modules/poll.c                |   41 +
 userland/kernel_modules/test                  |    1 +
 userland/{ => kernel_modules}/uio_read.c      |   20 +-
 userland/libs/README.adoc                     |    3 +
 userland/libs/build                           |    1 +
 userland/libs/eigen/build                     |    1 +
 userland/libs/eigen/hello.cpp                 |   15 +
 userland/libs/eigen/test                      |    1 +
 userland/libs/libdrm/build                    |    1 +
 .../libdrm/modeset.c}                         |  652 +--
 userland/libs/libdrm/test                     |    1 +
 userland/libs/openblas/build                  |    1 +
 userland/libs/openblas/hello.c                |   39 +
 userland/libs/openblas/test                   |    1 +
 userland/libs/test                            |    1 +
 userland/linux/README.adoc                    |    4 +-
 userland/{ => linux}/ctrl_alt_del.c           |   20 +-
 userland/linux/init_env_poweroff.c            |   25 +
 userland/linux/myinsmod.c                     |   61 +
 userland/{ => linux}/myrmmod.c                |   18 +-
 userland/linux/pagemap_dump.c                 |  116 +
 userland/{ => linux}/poweroff.c               |    2 +-
 userland/{ => linux}/proc_events.c            |    0
 userland/linux/rand_check.c                   |   41 +
 userland/{ => linux}/sched_getaffinity.c      |    0
 .../{ => linux}/sched_getaffinity_threads.c   |   20 +-
 userland/linux/test                           |    1 +
 userland/{ => linux}/time_boot.c              |    8 +-
 userland/linux/virt_to_phys_user.c            |   25 +
 userland/lkmc/README.adoc                     |    1 +
 userland/lkmc/add.c                           |    1 +
 userland/lkmc/add.py                          |    1 +
 userland/lkmc/assert_fail.c                   |    1 +
 userland/m5ops.h                              |   44 -
 userland/mmap.c                               |   94 -
 userland/myinsmod.c                           |   61 -
 userland/openblas_hello.c                     |   15 -
 userland/openmp.c                             |   19 -
 userland/pagemap_dump.c                       |  116 -
 userland/poll.c                               |   41 -
 userland/posix/count.c                        |   18 +
 userland/posix/environ.c                      |   15 +
 userland/{ => posix}/sleep_forever.c          |    6 +-
 userland/posix/test                           |    1 +
 userland/{ => posix}/virt_to_phys_test.c      |   14 +-
 userland/print_argv.c                         |   10 -
 userland/rand_check.c                         |   41 -
 userland/rdtsc.c                              |   20 -
 userland/ring0.c                              |   14 -
 userland/test                                 |    1 +
 userland/virt_to_phys_user.c                  |   26 -
 388 files changed, 9972 insertions(+), 3432 deletions(-)
 mode change 100644 => 120000 baremetal/add.c
 mode change 100644 => 120000 baremetal/add.py
 create mode 100644 baremetal/arch/aarch64/return1.S
 create mode 100644 baremetal/arch/arm/dump_regs.c
 create mode 100644 baremetal/arch/arm/return1.S
 create mode 120000 baremetal/assert_fail.c
 delete mode 100644 baremetal/exit.c
 create mode 100644 baremetal/exit0.c
 create mode 100644 baremetal/exit1.c
 rename baremetal/{interactive/prompt.c => getchar.c} (89%)
 mode change 100644 => 120000 baremetal/hello.c
 rename baremetal/{interactive => }/infinite_loop.c (100%)
 delete mode 100644 baremetal/interactive/README.adoc
 delete mode 100644 baremetal/interactive/exit1.c
 delete mode 100644 baremetal/interactive/return1.c
 create mode 120000 baremetal/lkmc_assert_fail.c
 create mode 100644 baremetal/return1.c
 create mode 100644 baremetal/return2.c
 create mode 100755 bst-vs-heap.gnuplot
 delete mode 100644 include/README.adoc
 create mode 100644 lkmc/README.adoc
 create mode 100644 lkmc/__init__.py
 create mode 100644 lkmc/add.c
 create mode 100644 lkmc/add.py
 rename {include => lkmc}/anonymous_inode.h (51%)
 create mode 100644 lkmc/assert_fail.c
 create mode 100644 lkmc/hello.c
 create mode 100644 lkmc/import_path.py
 rename {include => lkmc}/ioctl.h (92%)
 rename baremetal/interactive/assert_fail.c => lkmc/lkmc_assert_fail.c (98%)
 create mode 100644 lkmc/m5ops.h
 create mode 100644 lkmc/math.h
 rename {include => lkmc}/netlink.h (72%)
 rename userland/common_userland.h => lkmc/pagemap.h (86%)
 rename {include => lkmc}/ring0.h (75%)
 create mode 100644 path_properties.py
 create mode 100644 rootfs_overlay/.profile
 delete mode 100755 rootfs_overlay/anonymous_inode.sh
 delete mode 100755 rootfs_overlay/conf.sh
 delete mode 100755 rootfs_overlay/eval.sh
 delete mode 100755 rootfs_overlay/eval_base64.sh
 delete mode 100755 rootfs_overlay/ioctl.sh
 create mode 100755 rootfs_overlay/lkmc/anonymous_inode.sh
 rename rootfs_overlay/{ => lkmc}/character_device.sh (100%)
 rename rootfs_overlay/{ => lkmc}/character_device_create.sh (100%)
 create mode 100755 rootfs_overlay/lkmc/conf.sh
 rename rootfs_overlay/{ => lkmc}/count.sh (64%)
 rename rootfs_overlay/{ => lkmc}/debugfs.sh (100%)
 rename rootfs_overlay/{ => lkmc}/dep.sh (100%)
 create mode 100755 rootfs_overlay/lkmc/eval_base64.sh
 rename rootfs_overlay/{ => lkmc}/fb.sh (100%)
 rename rootfs_overlay/{ => lkmc}/fops.sh (100%)
 rename rootfs_overlay/{ => lkmc}/gdbserver.sh (100%)
 rename rootfs_overlay/{ => lkmc}/gem5.sh (100%)
 rename rootfs_overlay/{ => lkmc}/gem5_exit.sh (100%)
 rename rootfs_overlay/{ => lkmc}/gpio.sh (100%)
 rename rootfs_overlay/{ => lkmc}/init_forward.sh (100%)
 rename rootfs_overlay/{ => lkmc}/init_lkmc.sh (100%)
 rename rootfs_overlay/{ => lkmc}/init_module.sh (100%)
 rename rootfs_overlay/{ => lkmc}/insrm.sh (100%)
 create mode 100755 rootfs_overlay/lkmc/ioctl.sh
 rename rootfs_overlay/{ => lkmc}/kgdb.sh (100%)
 rename rootfs_overlay/{ => lkmc}/kstrto.sh (100%)
 create mode 100755 rootfs_overlay/lkmc/loginroot.sh
 rename rootfs_overlay/{ => lkmc}/mknoddev.sh (100%)
 create mode 100755 rootfs_overlay/lkmc/mmap.sh
 create mode 100755 rootfs_overlay/lkmc/netlink.sh
 rename rootfs_overlay/{ => lkmc}/params.sh (100%)
 rename rootfs_overlay/{ => lkmc}/pci_rescan.sh (100%)
 rename rootfs_overlay/{ => lkmc}/pmccntr.sh (100%)
 create mode 100755 rootfs_overlay/lkmc/poll.sh
 rename rootfs_overlay/{ => lkmc}/pr_debug.sh (100%)
 rename rootfs_overlay/{ => lkmc}/procfs.sh (100%)
 rename rootfs_overlay/{ => lkmc}/psa.sh (100%)
 rename rootfs_overlay/{ => lkmc}/qemu_edu.sh (100%)
 rename rootfs_overlay/{ => lkmc}/rand_check_poweroff.sh (88%)
 rename rootfs_overlay/{ => lkmc}/seq_file.sh (100%)
 rename rootfs_overlay/{ => lkmc}/seq_file_single_open.sh (100%)
 rename rootfs_overlay/{ => lkmc}/sshd.sh (100%)
 rename rootfs_overlay/{ => lkmc}/sysfs.sh (100%)
 rename rootfs_overlay/{ => lkmc}/test_all.sh (95%)
 rename rootfs_overlay/{ => lkmc}/test_fail.sh (79%)
 rename rootfs_overlay/{ => lkmc}/uio_read.sh (94%)
 rename rootfs_overlay/{ => lkmc}/vermagic.sh (100%)
 rename rootfs_overlay/{ => lkmc}/virt_to_phys.sh (100%)
 delete mode 100755 rootfs_overlay/loginroot.sh
 delete mode 100755 rootfs_overlay/mmap.sh
 delete mode 100755 rootfs_overlay/netlink.sh
 delete mode 100755 rootfs_overlay/poll.sh
 delete mode 100644 rootfs_overlay/root/.profile
 create mode 100755 test-build-userland
 create mode 100755 test-test-user-mode
 create mode 100755 test-user-mode-in-tree
 create mode 100644 thread_pool.py
 rename tmu => tmux-split (93%)
 delete mode 100644 userland/README.adoc
 delete mode 120000 userland/add.c
 delete mode 120000 userland/add.py
 delete mode 100644 userland/anonymous_inode.c
 create mode 100644 userland/arch/aarch64/add.S
 create mode 100644 userland/arch/aarch64/add_vector.S
 create mode 100644 userland/arch/aarch64/adr.S
 create mode 100644 userland/arch/aarch64/adrp.S
 delete mode 100644 userland/arch/aarch64/asm_hello.c
 create mode 100644 userland/arch/aarch64/beq.S
 create mode 100644 userland/arch/aarch64/bfi.S
 create mode 120000 userland/arch/aarch64/c/build
 create mode 100644 userland/arch/aarch64/c/earlyclobber.c
 create mode 120000 userland/arch/aarch64/c/freestanding/build
 create mode 120000 userland/arch/aarch64/c/freestanding/linux/build
 create mode 100644 userland/arch/aarch64/c/freestanding/linux/hello.c
 create mode 100644 userland/arch/aarch64/c/freestanding/linux/hello_clobbers.c
 create mode 120000 userland/arch/aarch64/c/freestanding/linux/test
 create mode 120000 userland/arch/aarch64/c/freestanding/test
 create mode 100644 userland/arch/aarch64/c/inc.c
 create mode 100644 userland/arch/aarch64/c/inc_float.c
 create mode 100644 userland/arch/aarch64/c/linux/asm_from_c.c
 create mode 120000 userland/arch/aarch64/c/linux/build
 create mode 120000 userland/arch/aarch64/c/linux/test
 create mode 100644 userland/arch/aarch64/c/multiline.cpp
 create mode 100644 userland/arch/aarch64/c/reg_var.c
 create mode 100644 userland/arch/aarch64/c/reg_var_float.c
 create mode 120000 userland/arch/aarch64/c/test
 create mode 100644 userland/arch/aarch64/cbz.S
 create mode 100644 userland/arch/aarch64/comments.S
 create mode 100644 userland/arch/aarch64/common_arch.h
 create mode 100644 userland/arch/aarch64/cset.S
 create mode 100644 userland/arch/aarch64/fadd_scalar.S
 create mode 100644 userland/arch/aarch64/fadd_vector.S
 create mode 120000 userland/arch/aarch64/freestanding/build
 create mode 120000 userland/arch/aarch64/freestanding/linux/build
 create mode 100644 userland/arch/aarch64/freestanding/linux/hello.S
 create mode 120000 userland/arch/aarch64/freestanding/linux/test
 create mode 120000 userland/arch/aarch64/freestanding/test
 create mode 100644 userland/arch/aarch64/gas_data_sizes.S
 create mode 100644 userland/arch/aarch64/immediates.S
 create mode 100644 userland/arch/aarch64/ld2.S
 create mode 100644 userland/arch/aarch64/movk.S
 create mode 100644 userland/arch/aarch64/movn.S
 create mode 100644 userland/arch/aarch64/pc.S
 create mode 100644 userland/arch/aarch64/registers.S
 create mode 100644 userland/arch/aarch64/ret.S
 create mode 100644 userland/arch/aarch64/str.S
 create mode 120000 userland/arch/aarch64/test
 create mode 100644 userland/arch/aarch64/ubfm.S
 create mode 100644 userland/arch/aarch64/ubfx.S
 create mode 100644 userland/arch/aarch64/x31.S
 create mode 100644 userland/arch/arm/add.S
 create mode 100644 userland/arch/arm/address_modes.S
 create mode 100644 userland/arch/arm/adr.S
 create mode 100644 userland/arch/arm/and.S
 create mode 100644 userland/arch/arm/b.S
 create mode 100644 userland/arch/arm/beq.S
 create mode 100644 userland/arch/arm/bfi.S
 create mode 100644 userland/arch/arm/bic.S
 create mode 100644 userland/arch/arm/bl.S
 create mode 120000 userland/arch/arm/build
 create mode 100644 userland/arch/arm/c/add.c
 create mode 120000 userland/arch/arm/c/build
 create mode 120000 userland/arch/arm/c/freestanding/build
 create mode 120000 userland/arch/arm/c/freestanding/linux/build
 create mode 100644 userland/arch/arm/c/freestanding/linux/hello.c
 create mode 120000 userland/arch/arm/c/freestanding/linux/test
 create mode 120000 userland/arch/arm/c/freestanding/test
 create mode 100644 userland/arch/arm/c/inc.c
 create mode 100644 userland/arch/arm/c/inc_float.c
 create mode 100644 userland/arch/arm/c/inc_memory.c
 create mode 100644 userland/arch/arm/c/inc_memory_global.c
 create mode 100644 userland/arch/arm/c/reg_var.c
 create mode 120000 userland/arch/arm/c/test
 create mode 100644 userland/arch/arm/clz.S
 create mode 100644 userland/arch/arm/comments.S
 create mode 100644 userland/arch/arm/common_arch.h
 create mode 100644 userland/arch/arm/cond.S
 create mode 120000 userland/arch/arm/freestanding/build
 create mode 120000 userland/arch/arm/freestanding/linux/build
 create mode 100644 userland/arch/arm/freestanding/linux/hello.S
 create mode 120000 userland/arch/arm/freestanding/linux/test
 create mode 120000 userland/arch/arm/freestanding/test
 create mode 100644 userland/arch/arm/gas_data_sizes.S
 create mode 100644 userland/arch/arm/immediates.S
 create mode 100644 userland/arch/arm/inc_array.S
 create mode 100644 userland/arch/arm/ldmia.S
 create mode 100644 userland/arch/arm/ldr_pseudo.S
 create mode 100644 userland/arch/arm/ldrb.S
 create mode 100644 userland/arch/arm/ldrh.S
 create mode 120000 userland/arch/arm/linux/build
 create mode 100644 userland/arch/arm/linux/c_from_asm.S
 create mode 120000 userland/arch/arm/linux/test
 create mode 100644 userland/arch/arm/mov.S
 create mode 100644 userland/arch/arm/movw.S
 create mode 100644 userland/arch/arm/mul.S
 create mode 100644 userland/arch/arm/nop.S
 create mode 100644 userland/arch/arm/push.S
 create mode 100644 userland/arch/arm/rbit.S
 create mode 100644 userland/arch/arm/registers.S
 create mode 100644 userland/arch/arm/rev.S
 create mode 100644 userland/arch/arm/s_suffix.S
 create mode 100644 userland/arch/arm/shift.S
 create mode 100644 userland/arch/arm/str.S
 create mode 100644 userland/arch/arm/sub.S
 create mode 120000 userland/arch/arm/test
 create mode 100644 userland/arch/arm/thumb.S
 create mode 100644 userland/arch/arm/tst.S
 create mode 100644 userland/arch/arm/vadd_scalar.S
 create mode 100644 userland/arch/arm/vadd_vector.S
 create mode 100644 userland/arch/arm/vcvt.S
 create mode 100644 userland/arch/arm/vcvta.S
 create mode 100644 userland/arch/arm/vcvtr.S
 create mode 100644 userland/arch/common.h
 create mode 100644 userland/arch/empty.S
 create mode 100644 userland/arch/fail.S
 create mode 100644 userland/arch/main.c
 create mode 120000 userland/arch/test
 create mode 100644 userland/arch/x86_64/add.S
 create mode 100644 userland/arch/x86_64/addpd.S
 delete mode 100644 userland/arch/x86_64/asm_hello.c
 delete mode 100644 userland/arch/x86_64/binutils_hack.c
 create mode 100644 userland/arch/x86_64/c/add.c
 create mode 100644 userland/arch/x86_64/c/binutils_hack.c
 create mode 100644 userland/arch/x86_64/c/binutils_nohack.c
 create mode 120000 userland/arch/x86_64/c/build
 create mode 120000 userland/arch/x86_64/c/freestanding/build
 create mode 120000 userland/arch/x86_64/c/freestanding/linux/build
 create mode 100644 userland/arch/x86_64/c/freestanding/linux/hello.c
 create mode 100644 userland/arch/x86_64/c/freestanding/linux/hello_regvar.c
 create mode 120000 userland/arch/x86_64/c/freestanding/linux/test
 create mode 120000 userland/arch/x86_64/c/freestanding/test
 create mode 100644 userland/arch/x86_64/c/inc.c
 create mode 100644 userland/arch/x86_64/c/rdtsc.c
 create mode 100644 userland/arch/x86_64/c/ring0.c
 create mode 100644 userland/arch/x86_64/c/scratch.c
 create mode 100644 userland/arch/x86_64/c/scratch_hardcode.c
 create mode 120000 userland/arch/x86_64/c/test
 create mode 100644 userland/arch/x86_64/common_arch.h
 delete mode 100644 userland/arch/x86_64/freestanding/hello.S
 create mode 120000 userland/arch/x86_64/freestanding/linux/build
 create mode 100644 userland/arch/x86_64/freestanding/linux/hello.S
 create mode 120000 userland/arch/x86_64/freestanding/linux/test
 create mode 120000 userland/arch/x86_64/freestanding/test
 create mode 100644 userland/arch/x86_64/gas_data_sizes.S
 create mode 100644 userland/arch/x86_64/lkmc_assert_eq_fail.S
 create mode 100644 userland/arch/x86_64/lkmc_assert_memcmp_fail.S
 create mode 100644 userland/arch/x86_64/paddq.S
 create mode 120000 userland/arch/x86_64/test
 delete mode 120000 userland/assert_fail.c
 create mode 100644 userland/c/README.adoc
 create mode 120000 userland/c/assert_fail.c
 create mode 100644 userland/c/false.c
 rename userland/{ => c}/gcc_hack.c (100%)
 create mode 100644 userland/c/getchar.c
 mode change 100644 => 120000 userland/c/hello.c
 create mode 100644 userland/c/infinite_loop.c
 rename userland/{ => c}/m5ops.c (73%)
 create mode 100644 userland/c/print_argv.c
 create mode 100644 userland/c/stderr.c
 create mode 120000 userland/c/test
 delete mode 100644 userland/count.c
 rename userland/{ => cpp}/bst_vs_heap.cpp (65%)
 create mode 120000 userland/cpp/test
 delete mode 100644 userland/eigen_hello.cpp
 delete mode 100644 userland/false.c
 create mode 100644 userland/gcc/openmp.c
 create mode 120000 userland/gcc/test
 delete mode 100644 userland/init_env_poweroff.c
 delete mode 100644 userland/ioctl.c
 create mode 100644 userland/kernel_modules/anonymous_inode.c
 create mode 100644 userland/kernel_modules/ioctl.c
 create mode 100644 userland/kernel_modules/mmap.c
 rename userland/{ => kernel_modules}/netlink.c (97%)
 create mode 100644 userland/kernel_modules/poll.c
 create mode 120000 userland/kernel_modules/test
 rename userland/{ => kernel_modules}/uio_read.c (89%)
 create mode 100644 userland/libs/README.adoc
 create mode 120000 userland/libs/build
 create mode 120000 userland/libs/eigen/build
 create mode 100644 userland/libs/eigen/hello.cpp
 create mode 120000 userland/libs/eigen/test
 create mode 120000 userland/libs/libdrm/build
 rename userland/{libdrm_modeset.c => libs/libdrm/modeset.c} (59%)
 create mode 120000 userland/libs/libdrm/test
 create mode 120000 userland/libs/openblas/build
 create mode 100644 userland/libs/openblas/hello.c
 create mode 120000 userland/libs/openblas/test
 create mode 120000 userland/libs/test
 rename userland/{ => linux}/ctrl_alt_del.c (55%)
 create mode 100644 userland/linux/init_env_poweroff.c
 create mode 100644 userland/linux/myinsmod.c
 rename userland/{ => linux}/myrmmod.c (59%)
 create mode 100644 userland/linux/pagemap_dump.c
 rename userland/{ => linux}/poweroff.c (86%)
 rename userland/{ => linux}/proc_events.c (100%)
 create mode 100644 userland/linux/rand_check.c
 rename userland/{ => linux}/sched_getaffinity.c (100%)
 rename userland/{ => linux}/sched_getaffinity_threads.c (75%)
 create mode 120000 userland/linux/test
 rename userland/{ => linux}/time_boot.c (70%)
 create mode 100644 userland/linux/virt_to_phys_user.c
 create mode 100644 userland/lkmc/README.adoc
 create mode 120000 userland/lkmc/add.c
 create mode 120000 userland/lkmc/add.py
 create mode 120000 userland/lkmc/assert_fail.c
 delete mode 100644 userland/m5ops.h
 delete mode 100644 userland/mmap.c
 delete mode 100644 userland/myinsmod.c
 delete mode 100644 userland/openblas_hello.c
 delete mode 100644 userland/openmp.c
 delete mode 100644 userland/pagemap_dump.c
 delete mode 100644 userland/poll.c
 create mode 100644 userland/posix/count.c
 create mode 100644 userland/posix/environ.c
 rename userland/{ => posix}/sleep_forever.c (73%)
 create mode 120000 userland/posix/test
 rename userland/{ => posix}/virt_to_phys_test.c (60%)
 delete mode 100644 userland/print_argv.c
 delete mode 100644 userland/rand_check.c
 delete mode 100644 userland/rdtsc.c
 delete mode 100644 userland/ring0.c
 create mode 120000 userland/test
 delete mode 100644 userland/virt_to_phys_user.c

diff --git a/.gitignore b/.gitignore
index efb9040..f8fbe8c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,9 +1,10 @@
 # Extensions and prefixes.
 *.tmp
 tmp.*
+*.tmp.*
 *~
-?*.gitignore
-gitignore*
+*.gitignore
+gitignore.*
 
 # Specific files.
 /data
@@ -22,6 +23,9 @@ __pycache__
 *.o
 *.out
 
+# Output data to be plotted.
+*.dat
+
 # Kernel modules.
 .cache.mk
 *.ko.cmd
diff --git a/.gitmodules b/.gitmodules
index 1b2b814..951732e 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -3,19 +3,28 @@
 	url = git://git.kernel.org/pub/scm/linux/kernel/git/mark/boot-wrapper-aarch64.git
 [submodule "submodules/binutils-gdb"]
 	path = submodules/binutils-gdb
+    # url = git://sourceware.org/git/binutils-gdb.git
 	url = https://github.com/cirosantilli/binutils-gdb
 [submodule "submodules/buildroot"]
 	path = submodules/buildroot
+    # url = git://git.busybox.net/buildroot
 	url = https://github.com/cirosantilli/buildroot
 	ignore = dirty
 [submodule "submodules/crosstool-ng"]
 	path = submodules/crosstool-ng
-	url = https://github.com/crosstool-ng/crosstool-ng
+    # url = https://github.com/crosstool-ng/crosstool-ng
+	url = https://github.com/cirosantilli/crosstool-ng
+[submodule "submodules/gcc"]
+	path = submodules/gcc
+    # url = git://gcc.gnu.org/git/gcc.git
+	url = https://github.com/cirosantilli/gcc
 [submodule "submodules/gem5"]
 	path = submodules/gem5
-	url = https://gem5.googlesource.com/public/gem5
+    # url = https://gem5.googlesource.com/public/gem5
+	url = https://github.com/cirosantilli/gem5
 [submodule "submodules/glibc"]
 	path = submodules/glibc
+    # url = git://sourceware.org/git/glibc.git
 	url = https://github.com/cirosantilli/glibc
 # The true upstream does not accept git submodule update --init --depth 1
 # git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
@@ -23,16 +32,16 @@
 # https://unix.stackexchange.com/questions/338578/linux-kernel-source-code-size-difference
 [submodule "submodules/linux"]
 	path = submodules/linux
+    # url = git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
 	url = https://github.com/cirosantilli/linux
 [submodule "submodules/parsec-benchmark"]
 	path = submodules/parsec-benchmark
 	url = https://github.com/cirosantilli/parsec-benchmark
 [submodule "submodules/qemu"]
 	path = submodules/qemu
+    # url = https://github.com/qemu/qemu
 	url = https://github.com/cirosantilli/qemu
 [submodule "submodules/xen"]
 	path = submodules/xen
-	url = git://xenbits.xen.org/xen.git
-[submodule "submodules/gcc"]
-	path = submodules/gcc
-	url = https://github.com/cirosantilli/gcc
+    # url = git://xenbits.xen.org/xen.git
+	url = https://github.com/cirosantilli/xen
diff --git a/README.adoc b/README.adoc
index 0a73786..1599682 100644
--- a/README.adoc
+++ b/README.adoc
@@ -9,7 +9,7 @@
 :toclevels: 6
 :toc-title:
 
-The perfect emulation setup to study and modify the <<linux-kernel>>, kernel modules, <<qemu-buildroot-setup,QEMU>> and <<gem5-buildroot-setup,gem5>>. Highly automated. Thoroughly documented. <<gdb>> and <<kgdb>> just work. Automated <<test-this-repo,tests>>. Powered by <<about-the-qemu-buildroot-setup,Buildroot>>. "Tested" in Ubuntu 18.04 host, x86_64, ARMv7 and ARMv8 guests with kernel v5.0.
+The perfect emulation setup to study and develop the <<linux-kernel>> v5.0, kernel modules, <<qemu-buildroot-setup,QEMU>>, <<gem5-buildroot-setup,gem5>> and x86_64, ARMv7 and ARMv8 <<userland-assembly,userland>> and <<baremetal-setup,baremetal>> assembly. <<gdb>> and <<kgdb>> just work. Powered by <<about-the-qemu-buildroot-setup,Buildroot>> and <<about-the-baremetal-setup,crosstool-NG>>.  Highly automated. Thoroughly documented. Automated <<test-this-repo,tests>>. "Tested" in an Ubuntu 18.04 host.
 
 TL;DR: <<qemu-buildroot-setup-getting-started>>
 
@@ -27,7 +27,7 @@ Design goals of this project are documented at: <<design-goals>>.
 
 ==== QEMU Buildroot setup getting started
 
-This setup has been mostly tested on Ubuntu. For other host operating systems see: <<supported-hosts>>.
+This setup has been mostly tested on Ubuntu. For other host operating systems see: <<supported-hosts>>. For greater stability, consider using the <<release-procedure,latest release>> instead of master: https://github.com/cirosantilli/linux-kernel-module-cheat/releases
 
 Reserve 12Gb of disk and run:
 
@@ -51,11 +51,11 @@ If you don't want to wait, you could also try the following faster but much more
 
 but you will soon find that they are simply not enough if you anywhere near serious about systems programming.
 
-After `./run`, QEMU opens up and you can start playing with the kernel modules inside the simulated system:
+After `./run`, QEMU opens up leaving you in the <<lkmc_home,`/lkmc/` directory>>, and you can start playing with the kernel modules inside the simulated system:
 
 ....
-insmod /hello.ko
-insmod /hello2.ko
+insmod hello.ko
+insmod hello2.ko
 rmmod hello
 rmmod hello2
 ....
@@ -216,7 +216,7 @@ Now there are two ways to test it out: the fast way, and the safe way.
 The fast way is, without quitting or rebooting QEMU, just directly re-insert the module with:
 
 ....
-insmod /mnt/9p/out_rootfs_overlay/hello.ko
+insmod /mnt/9p/out_rootfs_overlay/lkmc/hello.ko
 ....
 
 and the new `pr_info` message should now show on the terminal at the end of the boot.
@@ -236,7 +236,7 @@ The safe way, is to fist <<rebuild-buildroot-while-running,quit QEMU>>, rebuild
 ....
 ./build-modules
 ./build-buildroot
-./run --eval-after 'insmod /hello.ko'
+./run --eval-after 'insmod hello.ko'
 ....
 
 `./build-buildroot` is required after `./build-modules` because it re-generates the root filesystem with the modules that we compiled at `./build-modules`.
@@ -247,7 +247,7 @@ You can see that `./build` does that as well, by running:
 ./build --dry-run
 ....
 
-`--eval-after` is optional: you could just type `insmod /hello.ko` in the terminal, but this makes it run automatically at the end of boot, and then drops you into a shell.
+`--eval-after` is optional: you could just type `insmod hello.ko` in the terminal, but this makes it run automatically at the end of boot, and then drops you into a shell.
 
 If the guest and host are the same arch, typically x86_64, you can speed up boot further with <<kvm>>:
 
@@ -301,6 +301,8 @@ The only thing you can do with open source is purely functional designs with lin
 
 If you really want to develop semiconductors, your only choice is to join an university or a semiconductor company that has the EDA licenses.
 
+See also: <<should-you-waste-your-life-with-systems-programming>>.
+
 While hacking QEMU, you will likely want to GDB step its source. That is trivial since QEMU is just another userland program like any other, but our setup has a shortcut to make it even more convenient, see: <<debug-the-emulator>>.
 
 ===== Your first glibc hack
@@ -328,7 +330,7 @@ index 706b20b492..23185948f3 100644
        && _IO_putc_unlocked ('\n', _IO_stdout) != EOF)
 -    result = MIN (INT_MAX, len + 1);
 +    result = MIN (INT_MAX, len + 1 + 7);
- 
+
    _IO_release_lock (_IO_stdout);
    return result;
 ....
@@ -336,7 +338,7 @@ index 706b20b492..23185948f3 100644
 And then:
 
 ....
-./run --eval-after '/hello.out'
+./run --eval-after './c/hello.out'
 ....
 
 outputs:
@@ -350,7 +352,7 @@ Lol!
 We can also test our hacked glibc on <<user-mode-simulation>> with:
 
 ....
-./run --userland hello
+./run --userland userland/c/hello.c
 ....
 
 I just noticed that this is actually a good way to develop glibc for other archs.
@@ -369,9 +371,9 @@ Tested on a30ed0f047523ff2368d421ee2cce0800682c44e + 1.
 
 Have you ever felt that a single `inc` instruction was not enough? Really? Me too!
 
-So let's hack the link:https://en.wikipedia.org/wiki/GNU_Assembler[GNU GAS assembler], which is part of link:https://en.wikipedia.org/wiki/GNU_Binutils[GNU Binutils], to add a new shiny version of `inc` called... `myinc`!
+So let's hack the <<gnu-gas-assembler>>, which is part of link:https://en.wikipedia.org/wiki/GNU_Binutils[GNU Binutils], to add a new shiny version of `inc` called... `myinc`!
 
-GCC uses GNU GAS as its backend, so we will test out new mnemonic with an inline assembly test program: link:userland/arch/x86_64/binutils_hack.c[], which is just a copy of link:userland/arch/x86_64/asm_hello.c[] but with `myinc` instead of `inc`.
+GCC uses GNU GAS as its backend, so we will test our new mnemonic with a <<gcc-inline-assembly>> test program: link:userland/arch/x86_64/binutils_hack.c[], which is just a copy of link:userland/arch/x86_64/asm_hello.c[] but with `myinc` instead of `inc`.
 
 The inline assembly is disabled with an `#ifdef`, so first modify the source to enable that.
 
@@ -422,12 +424,12 @@ index af583ce578..3cc341f303 100644
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
 ....
 
-Finally, rebuild Binutils, userland and test our program with <<user-mode-setup>>:
+Finally, rebuild Binutils, userland and test our program with <<user-mode-simulation>>:
 
 ....
 ./build-buildroot -- host-binutils-rebuild
 ./build-userland --static
-./run --static --userland arch/x86_64/binutils_hack
+./run --static --userland userland/arch/x86_64/binutils_hack.c
 ....
 
 and we se that `myinc` worked since the assert did not fail!
@@ -438,13 +440,13 @@ Tested on b60784d59bee993bf0de5cde6c6380dd69420dda + 1.
 
 OK, now time to hack GCC.
 
-For convenience, let's use the <<user-mode-setup>>.
+For convenience, let's use the <<user-mode-simulation>>.
 
-If we run the program link:userland/gcc_hack.c[]:
+If we run the program link:userland/c/gcc_hack.c[]:
 
 ....
 ./build-userland --static
-./run --static --userland gcc_hack
+./run --static --userland userland/c/gcc_hack.c
 ....
 
 it produces the normal boring output:
@@ -496,7 +498,7 @@ Now rebuild GCC, the program and re-run it:
 ....
 ./build-buildroot -- host-gcc-final-rebuild
 ./build-userland --static
-./run --static --userland gcc_hack
+./run --static --userland userland/c/gcc_hack.c
 ....
 
 and the new ouptut is now:
@@ -744,6 +746,10 @@ unzip lkmc-*.zip
 ./run --qemu-which host
 ....
 
+You have to checkout to the latest tag to ensure that the scripts match the release format: https://stackoverflow.com/questions/1404796/how-to-get-the-latest-tag-name-in-current-branch-in-git
+
+This is known not to work for aarch64 on an Ubuntu 16.04 host with QEMU 2.5.0: the terminal does not show any output, presumably because QEMU is too old. I haven't investigated why.
+
 Or to run a baremetal example instead:
 
 ....
@@ -754,8 +760,6 @@ Or to run a baremetal example instead:
 ;
 ....
 
-You have to checkout to the latest tag to ensure that the scripts match the release format: https://stackoverflow.com/questions/1404796/how-to-get-the-latest-tag-name-in-current-branch-in-git
-
 Be saner and use our custom built QEMU instead:
 
 ....
@@ -929,6 +933,246 @@ sudo rmmod hello.ko
 dmesg
 ....
 
+=== Userland setup
+
+==== About the userland setup
+
+In order to test the kernel and emulators, userland content in the form of executables and scripts is of course required, and we store it mostly under:
+
+* link:userland/[]
+* <<rootfs_overlay>>
+* <<add-new-buildroot-packages>>
+
+When we started this repository, it only contained content that interacted very closely with the kernel, or that required performance analysis.
+
+However, we soon started to notice that this had an increasing overlap with other userland test repositories: we were duplicating build and test infrastructure and even some examples.
+
+Therefore, we decided to consolidate other userland tutorials that we had scattered around into this repository.
+
+Notable userland content already included in, or being moved into, this repository:
+
+* <<userland-assembly>>
+* <<c>>
+* <<cpp>>
+* <<posix>>
+* https://github.com/cirosantilli/algorithm-cheat TODO will be good to move here for performance analysis <<gem5-run-benchmark,with gem5>>
+
+==== Userland setup getting started
+
+There are several ways to run our userland content, notably:
+
+* natively on the host as shown at: <<userland-setup-getting-started-natively>>
++
+Can only run examples compatible with your host CPU architecture and OS, but has the fastest setup and runtimes.
+* from user mode simulation with:
++
+--
+** the host prebuilt toolchain: <<userland-setup-getting-started-with-prebuilt-toolchain-and-qemu-user-mode>>
+** the Buildroot toolchain you built yourself: <<qemu-user-mode-getting-started>>
+--
++
+This setup:
++
+--
+** can run most examples, including those for other CPU architectures, with the notable exception of examples that rely on kernel modules
+** can run reproducible approximate performance experiments with gem5, see e.g. <<bst-vs-heap>>
+--
+* from full system simulation as shown at: <<qemu-buildroot-setup-getting-started>>.
++
+This is the most reproducible and controlled environment, and all examples work there, but it is also the slowest one to set up.
+
+===== Userland setup getting started natively
+
+With this setup, we will use the host toolchain and execute executables directly on the host.
+
+No installation or toolchain build is required, so you can just jump straight into it.
+
+Build and run an example, then clean it in-tree with:
+
+....
+sudo apt-get install gcc
+cd userland
+./build c/hello
+./c/hello.out
+./build --clean
+....
+
+Source: link:userland/c/hello.c[].
+
+Build an entire directory and test it:
+
+....
+cd userland
+./build c
+./test c
+....
+
+Build the current directory and test it:
+
+....
+cd userland/c
+./build
+./test
+....
+
+As mentioned at <<user-mode-tests>>, tests under link:userland/libs[] require certain optional libraries to be installed, and are not built or tested by default.
+
+You can install those libraries with:
+
+....
+cd linux-kernel-module-cheat
+./build --download-dependencies userland-host
+....
+
+and then build the examples and test with:
+
+....
+./build --package-all
+./test --package-all
+....
+
+Pass custom compiler options:
+
+....
+./build --ccflags='-foptimize-sibling-calls -foptimize-strlen' --force-rebuild
+....
+
+Here we used `--force-rebuild` to force a rebuild, because the sources had not been modified since the last build.
+
+Some compiler options have dedicated CLI flags, e.g. the `-O` optimization level:
+
+....
+./build --optimization-level 3 --force-rebuild
+....
+
+See also <<user-mode-static-executables>> for `--static`.
+
+The `build` scripts inside link:userland/[] are just symlinks to link:build-userland-in-tree[] which you can also use from toplevel as:
+
+....
+./build-userland-in-tree
+./build-userland-in-tree userland/c
+./build-userland-in-tree userland/c/hello.c
+....
+
+`build-userland-in-tree` is in turn just a thin wrapper around link:build-userland[]:
+
+....
+./build-userland --gcc-which host --in-tree userland/c
+....
+
+So you can freely use any option supported by the `build-userland` script with `build-userland-in-tree` and `build`.
+
+The situation is analogous for link:userland/test[], link:test-user-mode-in-tree[] and link:test-user-mode[], which are further documented at: <<user-mode-tests>>.
+
+Do a cleaner out-of-tree build instead and run the program:
+
+....
+./build-userland --gcc-which host --userland-build-id host
+./run --emulator native --userland userland/c/hello.c --userland-build-id host
+....
+
+Here we:
+
+* put the host executables in a separate <<build-variants,build-variant>> to avoid conflict with Buildroot builds.
+* ran with the `--emulator native` option to run the program natively
+
+In this case you can debug the program with:
+
+....
+./run --debug-vm --emulator native --userland userland/c/hello.c --userland-build-id host
+....
+
+as shown at: <<debug-the-emulator>>, although direct GDB host usage works as well of course.
+
+===== Userland setup getting started with prebuilt toolchain and QEMU user mode
+
+If you are too lazy to build the Buildroot toolchain and QEMU, but want to run e.g. ARM <<userland-assembly>> in <<user-mode-simulation>>, you can get away on Ubuntu 18.04 with just:
+
+....
+sudo apt-get install gcc-aarch64-linux-gnu qemu-system-aarch64
+./build-userland \
+  --arch aarch64 \
+  --gcc-which host \
+  --userland-build-id host \
+;
+./run \
+  --arch aarch64 \
+  --qemu-which host \
+  --userland-build-id host \
+  --userland userland/c/print_argv.c \
+  --userland-args 'asdf "qw er"' \
+;
+....
+
+where:
+
+* `--gcc-which host`: use the host toolchain.
++
+We must pass this to `./run` as well because QEMU must know which dynamic libraries to use. See also: <<user-mode-static-executables>>.
+* `--userland-build-id host`: put the host build into a separate <<build-variants,build variant>>
+
+This presents the usual trade-offs of using prebuilts as mentioned at: <<prebuilt>>.
+
+Other functionality is analogous, e.g. testing:
+
+....
+./test-user-mode \
+  --arch aarch64 \
+  --gcc-which host \
+  --qemu-which host \
+  --userland-build-id host \
+;
+....
+
+and <<user-mode-gdb>>:
+
+....
+./run \
+  --arch aarch64 \
+  --gdb \
+  --gcc-which host \
+  --qemu-which host \
+  --userland-build-id host \
+  --userland userland/c/print_argv.c \
+  --userland-args 'asdf "qw er"' \
+;
+....
+
+===== Userland setup getting started full system
+
+First ensure that <<qemu-buildroot-setup>> is working.
+
+After doing that setup, you can already execute your userland programs from inside QEMU: the only missing step is how to rebuild executables and run them.
+
+And the answer is exactly analogous to what is shown at: <<your-first-kernel-module-hack>>
+
+For example, if we modify link:userland/c/hello.c[] to print out something different, we can just rebuild it with:
+
+....
+./build-userland
+....
+
+Source: link:build-userland[]. `./build` calls that script automatically for us when doing the initial full build.
+
+Now, either run the program without rebooting by using the <<9p>> mount:
+
+....
+/mnt/9p/out_rootfs_overlay/c/hello.out
+....
+
+or shut down QEMU, add the executable to the root filesystem:
+
+....
+./build-buildroot
+....
+
+reboot and use the root filesystem as usual:
+
+....
+./hello.out
+....
+
 === Baremetal setup
 
 ==== About the baremetal setup
@@ -953,7 +1197,7 @@ For example, to run link:baremetal/hello.c[] in QEMU do:
 
 ....
 ./build --arch aarch64 --download-dependencies qemu-baremetal
-./run --arch aarch64 --baremetal hello
+./run --arch aarch64 --baremetal baremetal/hello.c
 ....
 
 The terminal prints:
@@ -965,7 +1209,7 @@ hello
 Now let's run link:baremetal/arch/aarch64/add.S[]:
 
 ....
-./run --arch aarch64 --baremetal arch/aarch64/add
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/add.S
 ....
 
 This time, the terminal does not print anything, which indicates success.
@@ -975,7 +1219,7 @@ If you look into the source, you will see that we just have an assertion there.
 You can see a sample assertion fail in link:baremetal/interactive/assert_fail.c[]:
 
 ....
-./run --arch aarch64 --baremetal interactive/assert_fail
+./run --arch aarch64 --baremetal baremetal/interactive/assert_fail.c
 ....
 
 and the terminal contains:
@@ -991,7 +1235,7 @@ and the exit status of our script is 1:
 echo $?
 ....
 
-To modify a baremetal program, simply edit the file, .g.
+To modify a baremetal program, simply edit the file, e.g.
 
 ....
 vim baremetal/hello.c
@@ -1000,11 +1244,11 @@ vim baremetal/hello.c
 and rebuild:
 
 ....
-./build --arch aarch64 --download-dependencies qemu-baremetal
-./run --arch aarch64 --baremetal hello
+./build-baremetal --arch aarch64
+./run --arch aarch64 --baremetal baremetal/hello.c
 ....
 
-`./build qemu-baremetal` had called link:build-baremetal[] for us previously, in addition to its requirements.
+`./build qemu-baremetal`, which we ran previously, is only needed for the initial build. That script calls link:build-baremetal[] for us, in addition to building prerequisites such as QEMU and crosstool-NG.
 
 `./build-baremetal` uses crosstool-NG, and so it must be preceded by link:build-crosstool-ng[], which `./build qemu-baremetal` also calls.
 
@@ -1018,14 +1262,14 @@ Alternatively, for the sake of tab completion, we also accept relative paths ins
 Absolute paths however are used as is and must point to the actual executable:
 
 ....
-./run --arch aarch64 --baremetal "$(./getvar --arch aarch64 baremetal_build_dir)/exit.elf"
+./run --arch aarch64 --baremetal "$(./getvar --arch aarch64 baremetal_build_dir)/hello.elf"
 ....
 
 To use gem5 instead of QEMU do:
 
 ....
 ./build --download-dependencies gem5-baremetal
-./run --arch aarch64 --baremetal interactive/prompt --emulator gem5
+./run --arch aarch64 --baremetal baremetal/hello.c --emulator gem5
 ....
 
 and then <<qemu-buildroot-setup,as usual>> open a shell with:
@@ -1037,7 +1281,7 @@ and then <<qemu-buildroot-setup,as usual>> open a shell with:
 Or as usual, <<tmux>> users can do both in one go with:
 
 ....
-./run --arch aarch64 --baremetal interactive/prompt --emulator gem5 --tmux
+./run --arch aarch64 --baremetal baremetal/hello.c --emulator gem5 --tmux
 ....
 
 TODO: the carriage returns are a bit different than in QEMU, see: <<gem5-baremetal-carriage-return>>.
@@ -1045,8 +1289,8 @@ TODO: the carriage returns are a bit different than in QEMU, see: <<gem5-baremet
 Note that `./build-baremetal` requires the `--emulator gem5` option, and generates separate executable images for both, as can be seen from:
 
 ....
-echo "$(./getvar --arch aarch64 --baremetal interactive/prompt --emulator qemu image)"
-echo "$(./getvar --arch aarch64 --baremetal interactive/prompt --emulator gem5 image)"
+echo "$(./getvar --arch aarch64 --baremetal baremetal/hello.c --emulator qemu image)"
+echo "$(./getvar --arch aarch64 --baremetal baremetal/hello.c --emulator gem5 image)"
 ....
 
 This is unlike the Linux kernel that has a single image for both QEMU and gem5:
@@ -1062,22 +1306,19 @@ The reason for that is that on baremetal we don't parse the <<device-tree,device
 
 ....
 ./build-baremetal --arch aarch64 --emulator gem5 --machine RealViewPBX
-./run --arch aarch64 --baremetal interactive/prompt --emulator gem5 --machine RealViewPBX
+./run --arch aarch64 --baremetal baremetal/hello.c --emulator gem5 --machine RealViewPBX
 ....
 
 This generates yet new separate images with new magic constants:
 
 ....
-echo "$(./getvar --arch aarch64 --baremetal interactive/prompt --emulator gem5 --machine VExpress_GEM5_V1 image)"
-echo "$(./getvar --arch aarch64 --baremetal interactive/prompt --emulator gem5 --machine RealViewPBX      image)"
+echo "$(./getvar --arch aarch64 --baremetal baremetal/hello.c --emulator gem5 --machine VExpress_GEM5_V1 image)"
+echo "$(./getvar --arch aarch64 --baremetal baremetal/hello.c --emulator gem5 --machine RealViewPBX      image)"
 ....
 
 But just stick to newer and better `VExpress_GEM5_V1` unless you have a good reason to use `RealViewPBX`.
 
-When doing bare metal programming, it is likely that you will want to learn assembly language basics. Have a look at these tutorials for the userland part:
-
-* https://github.com/cirosantilli/x86-assembly-cheat
-* https://github.com/cirosantilli/arm-assembly-cheat
+When doing baremetal programming, it is likely that you will want to learn userland assembly first, see: <<userland-assembly>>.
 
 For more information on baremetal, see the section: <<baremetal>>.
 
@@ -1086,23 +1327,15 @@ The following subjects are particularly important:
 * <<tracing>>
 * <<baremetal-gdb-step-debug>>
 
-=== User mode setup
-
-Much like <<baremetal-setup>>, this is another fun setup that does not require Buildroot or the Linux kernel.
-
-Getting started at: <<qemu-user-mode-getting-started>>.
-
-Introduction at: <<user-mode-simulation>>.
-
 [[gdb]]
 == GDB step debug
 
 === GDB step debug kernel boot
 
-`--wait-gdb` makes QEMU and gem5 wait for a GDB connection, otherwise we could accidentally go past the point we want to break at:
+`--gdb-wait` makes QEMU and gem5 wait for a GDB connection, otherwise we could accidentally go past the point we want to break at:
 
 ....
-./run --wait-gdb
+./run --gdb-wait
 ....
 
 Say you want to break at `start_kernel`. So on another shell:
@@ -1135,7 +1368,7 @@ See also:
 Just don't forget to pass `--arch` to `./run-gdb`, e.g.:
 
 ....
-./run --arch aarch64 --wait-gdb
+./run --arch aarch64 --gdb-wait
 ....
 
 and:
@@ -1166,10 +1399,10 @@ Start QEMU with just:
 and after boot inside a shell run:
 
 ....
-/count.sh
+./count.sh
 ....
 
-which counts to infinity to stdout. Source: link:rootfs_overlay/count.sh[].
+which counts to infinity to stdout. Source: link:rootfs_overlay/lkmc/count.sh[].
 
 Then in another shell, run:
 
@@ -1214,16 +1447,22 @@ First start `tmux` with:
 tmux
 ....
 
-Now that you are inside a shell inside tmux, run:
+Now that you are inside a shell inside tmux, you can start GDB simply with:
 
 ....
-./run --tmux --wait-gdb
+./run --gdb
+....
+
+which is just a convenient shortcut for:
+
+....
+./run --gdb-wait --tmux --tmux-args start_kernel
 ....
 
 This splits the terminal into two panes:
 
-* left: usual QEMU
-* right: gdb
+* left: usual QEMU with terminal
+* right: GDB
 
 and focuses on the GDB pane.
 
@@ -1232,60 +1471,67 @@ Now you can navigate with the usual tmux shortcuts:
 * switch between the two panes with: `Ctrl-B O`
 * close either pane by killing its terminal with `Ctrl-D` as usual
 
-To start again, switch back to the QEMU pane, kill the emulator, and re-run:
-
-....
-./run --tmux --wait-gdb
-....
-
-This automatically clears the GDB pane, and starts a new one.
-
-Pass extra arguments to the link:run-gdb[] pane with:
-
-....
-./run --tmux-args start_kernel --wait-gdb
-....
-
-This is equivalent to:
-
-....
-./run --wait-gdb
-./run-gdb start_kernel
-....
-
-Due to Python's CLI parsing quicks, if the link:run-gdb[] arguments start with a dash `-`, you have to use the `=` sign, e.g. to <<gdb-step-debug-early-boot>>:
-
-....
-./run --tmux-args=--no-continue --wait-gdb
-....
-
 See the tmux manual for further details:
 
 ....
 man tmux
 ....
 
+To start again, switch back to the QEMU pane with `Ctrl-B O`, kill the emulator, and re-run:
+
+....
+./run --gdb
+....
+
+This automatically clears the GDB pane, and starts a new one.
+
+The option `--tmux-args` determines which options will be passed to the program running on the second tmux pane.
+
+For example, passing `--tmux-args start_kernel` is equivalent to running the following on two separate shells:
+
+....
+./run --gdb-wait
+./run-gdb start_kernel
+....
+
+Due to Python's CLI parsing quirks, if the link:run-gdb[] arguments start with a dash `-`, you have to use the `=` sign, e.g. to <<gdb-step-debug-early-boot>>:
+
+....
+./run --gdb --tmux-args=--no-continue
+....
+
 Bibliography: https://unix.stackexchange.com/questions/152738/how-to-split-a-new-window-and-run-a-command-in-this-new-window-using-tmux/432111#432111
 
 ==== tmux gem5
 
-If you are using gem5 instead of QEMU, `--tmux` has a different effect: it opens the gem5 terminal instead of the debugger:
+If you are using gem5 instead of QEMU, `--tmux` has a different effect by default: it opens the gem5 terminal instead of the debugger:
 
 ....
 ./run --emulator gem5 --tmux
 ....
 
-If you also want to use the debugger with gem5, you will need to create new terminals as usual.
-
-From inside tmux, you can do that with `Ctrl-B C` or `Ctrl-B %`.
-
-To see the debugger by default instead of the terminal, run:
+To open a new pane with GDB instead of the terminal, use:
 
 ....
-./tmu ./run-gdb
-./run --wait-gdb --emulator gem5
+./run --emulator gem5 --gdb
 ....
 
+which is equivalent to:
+
+....
+./run --emulator gem5 --gdb-wait --tmux --tmux-args start_kernel --tmux-program gdb
+....
+
+`--tmux-program` implies `--tmux`, so we can just write:
+
+....
+./run --emulator gem5 --gdb-wait --tmux-program gdb
+....
+
+If you want to see both GDB and the terminal with gem5, then you will need to open a separate shell manually as usual with `./gem5-shell`.
+
+From inside tmux, you can create new terminals on a new window with `Ctrl-B C`, or split a pane yet again vertically with `Ctrl-B %` or horizontally with `Ctrl-B "`.
+
 === GDB step debug kernel module
 
 http://stackoverflow.com/questions/28607538/how-to-debug-linux-kernel-modules-with-qemu/44095831#44095831
@@ -1305,7 +1551,7 @@ Shell 1:
 Wait for the boot to end and run:
 
 ....
-insmod /timer.ko
+insmod timer.ko
 ....
 
 Source: link:kernel_modules/timer.c[].
@@ -1365,7 +1611,7 @@ It is kind of random: if you just `insmod` manually and then immediately `./run-
 But this fails most of the time: shell 1:
 
 ....
-./run --arch arm --eval-after 'insmod /hello.ko'
+./run --arch arm --eval-after 'insmod hello.ko'
 ....
 
 shell 2:
@@ -1454,7 +1700,7 @@ So once we find the address the first time, we can just reuse it afterwards, as
 Do a fresh boot and get the module:
 
 ....
-./run --eval-after '/pr_debug.sh;insmod /fops.ko;/poweroff.out'
+./run --eval-after './pr_debug.sh;insmod fops.ko;./linux/poweroff.out'
 ....
 
 The boot must be fresh, because the load address changes every time we insert, even after removing previous modules.
@@ -1488,7 +1734,7 @@ so the offset address is `0x240` and we deduce that the function will be placed
 Now we can just do a fresh boot on shell 1:
 
 ....
-./run --eval 'insmod /fops.ko;/poweroff.out' --wait-gdb
+./run --eval 'insmod fops.ko;./linux/poweroff.out' --gdb-wait
 ....
 
 and on shell 2:
@@ -1568,7 +1814,7 @@ Useless, but a good way to show how hardcore you are. Disable `lx-symbols` with:
 From inside guest:
 
 ....
-insmod /timer.ko
+insmod timer.ko
 cat /proc/modules
 ....
 
@@ -1596,7 +1842,7 @@ Alternatively, if the module panics before you can read `/proc/modules`, there i
 ....
 echo 8 > /proc/sys/kernel/printk
 echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control
-/myinsmod.out /hello.ko
+./linux/myinsmod.out hello.ko
 ....
 
 And then search for a line of type:
@@ -1649,7 +1895,7 @@ less "$(./getvar --arch arm trace_txt_file)"
 and break there:
 
 ....
-./run --arch arm --wait-gdb
+./run --arch arm --gdb-wait
 ./run-gdb --arch arm '*0x1000'
 ....
 
@@ -1700,23 +1946,23 @@ since GDB does not know that libc is loaded.
 
 This is the userland debug setup most likely to work, since at init time there is only one userland executable running.
 
-For executables from the <<userland-directory>> such as link:userland/count.c[]:
+For executables from the link:userland/[] directory such as link:userland/posix/count.c[]:
 
 * Shell 1:
 +
 ....
-./run --wait-gdb --kernel-cli 'init=/count.out'
+./run --gdb-wait --kernel-cli 'init=/lkmc/posix/count.out'
 ....
 * Shell 2:
 +
 ....
-./run-gdb-user count main
+./run-gdb-user userland/posix/count.c main
 ....
 +
 Alternatively, we could also pass the full path to the executable:
 +
 ....
-./run-gdb-user "$(./getvar userland_build_dir)/sleep_forever.out" main
+./run-gdb-user "$(./getvar userland_build_dir)/posix/count.out" main
 ....
 +
 Path resolution is analogous to <<baremetal-setup-getting-started,that of `./run --baremetal`>>.
@@ -1730,7 +1976,7 @@ BusyBox custom init process:
 * Shell 1:
 +
 ....
-./run --wait-gdb --kernel-cli 'init=/bin/ls'
+./run --gdb-wait --kernel-cli 'init=/bin/ls'
 ....
 * Shell 2:
 +
@@ -1745,7 +1991,7 @@ BusyBox default init process:
 * Shell 1:
 +
 ....
-./run --wait-gdb
+./run --gdb-wait
 ....
 * Shell 2:
 +
@@ -1766,24 +2012,24 @@ Non-init process:
 * Shell 1:
 +
 ....
-./run --wait-gdb
+./run --gdb-wait
 ....
 * Shell 2:
 +
 ....
-./run-gdb-user myinsmod main
+./run-gdb-user userland/linux/myinsmod.c main
 ....
 * Shell 1 after the boot finishes:
 +
 ....
-/myinsmod.out /hello.ko
+./linux/myinsmod.out hello.ko
 ....
 
 This is the least reliable setup as there might be other processes that use the given virtual address.
 
-===== GDB step debug userland non-init without --wait-gdb
+===== GDB step debug userland non-init without --gdb-wait
 
-TODO: without `--wait-gdb` and the `break main` that we do inside `./run-gdb-user` says:
+TODO: without `--gdb-wait`, the `break main` that we do inside `./run-gdb-user` says:
 
 ....
 Cannot access memory at address 0x10604
@@ -1801,7 +2047,7 @@ We have also double checked the address with:
 
 ....
 ./run-toolchain --arch arm readelf -- \
-  -s "$(./getvar --arch arm kernel_modules_build_subdir)/fops.ko" | \
+  -s "$(./getvar --arch arm userland_build_dir)/linux/myinsmod.out" | \
   grep main
 ....
 
@@ -1840,7 +2086,7 @@ However this is failing for us:
 * some symbols are not visible to `call` even though `b` sees them
 * for those that are, `call` fails with an E14 error
 
-E.g.: if we break on `__x64_sys_write` on `/count.sh`:
+E.g.: if we break on `__x64_sys_write` on `count.sh`:
 
 ....
 >>> call printk(0, "asdf")
@@ -1892,10 +2138,10 @@ For a more minimal baremetal multicore setup, see: <<arm-multicore>>.
 We can set and get which cores the Linux kernel allows a program to run on with `sched_getaffinity` and `sched_setaffinity`:
 
 ....
-./run --cpus 2 --eval-after '/sched_getaffinity.out'
+./run --cpus 2 --eval-after './linux/sched_getaffinity.out'
 ....
 
-Source: link:userland/sched_getaffinity.c[]
+Source: link:userland/linux/sched_getaffinity.c[]
 
 Sample output:
 
@@ -1922,7 +2168,7 @@ The number of cores is modified as explained at: <<number-of-cores>>
   --config 'BR2_PACKAGE_UTIL_LINUX=y' \
   --config 'BR2_PACKAGE_UTIL_LINUX_SCHEDUTILS=y' \
 ;
-./run --eval-after 'taskset -c 1,1 /sched_getaffinity.out'
+./run --eval-after 'taskset -c 1,1 ./linux/sched_getaffinity.out'
 ....
 
 output:
@@ -1938,20 +2184,20 @@ so we see that the affinity was restricted to the second core from the start.
 
 Let's do a QEMU observation to justify this example being in the repository with <<gdb-step-debug-userland-non-init,userland breakpoints>>.
 
-We will run our `/sched_getaffinity.out` infinitely many time, on core 0 and core 1 alternatively:
+We will run our `./linux/sched_getaffinity.out` infinitely many times, alternating between core 0 and core 1:
 
 ....
 ./run \
   --cpus 2 \
-  --wait-gdb \
-  --eval-after 'i=0; while true; do taskset -c $i,$i /sched_getaffinity.out; i=$((! $i)); done' \
+  --eval-after 'i=0; while true; do taskset -c $i,$i ./linux/sched_getaffinity.out; i=$((! $i)); done' \
+  --gdb-wait \
 ;
 ....
 
 on another shell:
 
 ....
-./run-gdb-user "$(./getvar userland_build_dir)/sched_getaffinity.out" main
+./run-gdb-user "$(./getvar userland_build_dir)/linux/sched_getaffinity.out" main
 ....
 
 Then, inside GDB:
@@ -1976,13 +2222,13 @@ We should also try it out with kernel modules: https://stackoverflow.com/questio
 TODO we then tried:
 
 ....
-./run --cpus 2 --eval-after '/sched_getaffinity_threads.out'
+./run --cpus 2 --eval-after './linux/sched_getaffinity_threads.out'
 ....
 
 and:
 
 ....
-./run-gdb-user "$(./getvar userland_build_dir)/sched_getaffinity_threads.out"
+./run-gdb-user "$(./getvar userland_build_dir)/linux/sched_getaffinity_threads.out"
 ....
 
 to switch between two simultaneous live threads with different affinities, it just didn't break on our threads:
@@ -2172,11 +2418,11 @@ continue
 Then in QEMU:
 
 ....
-/count.sh &
-/kgdb.sh
+./count.sh &
+./kgdb.sh
 ....
 
-link:rootfs_overlay:kgdb.sh[] pauses the kernel for KGDB, and gives control back to GDB.
+link:rootfs_overlay/lkmc/kgdb.sh[] pauses the kernel for KGDB, and gives control back to GDB.
 
 And now in GDB we do the usual:
 
@@ -2210,8 +2456,8 @@ Main more generic question: https://stackoverflow.com/questions/14155577/how-to-
 Just works as you would expect:
 
 ....
-insmod /timer.ko
-/kgdb.sh
+insmod timer.ko
+./kgdb.sh
 ....
 
 In GDB:
@@ -2250,8 +2496,8 @@ And now the `kdb>` prompt is responsive because it is listening to the main cons
 After boot finishes, run the usual:
 
 ....
-/count.sh &
-/kgdb.sh
+./count.sh &
+./kgdb.sh
 ....
 
 And you are back in KDB. Now you can count with:
@@ -2328,34 +2574,34 @@ First build `gdbserver` into the root filesystem:
 ./build-buildroot --config 'BR2_PACKAGE_GDB=y'
 ....
 
-Then on guest, to debug link:userland/myinsmod.c[]:
+Then on guest, to debug link:userland/linux/myinsmod.c[]:
 
 ....
-/gdbserver.sh /myinsmod.out /hello.ko
+./gdbserver.sh ./linux/myinsmod.out hello.ko
 ....
 
-Source: link:rootfs_overlay/gdbserver.sh[].
+Source: link:rootfs_overlay/lkmc/gdbserver.sh[].
 
 And on host:
 
 ....
-./run-gdbserver myinsmod
+./run-gdbserver userland/linux/myinsmod.c
 ....
 
-or alternatively with the full path:
+or alternatively with the path to the executable itself:
 
 ....
-./run-gdbserver "$(./getvar userland_build_dir)/myinsmod.out"
+./run-gdbserver "$(./getvar userland_build_dir)/linux/myinsmod.out"
 ....
 
-https://reverseengineering.stackexchange.com/questions/8829/cross-debugging-for-arm-mips-elf-with-qemu-toolchain/16214#16214
+Bibliography: https://reverseengineering.stackexchange.com/questions/8829/cross-debugging-for-arm-mips-elf-with-qemu-toolchain/16214#16214
 
 === gdbserver BusyBox
 
 Analogous to <<gdb-step-debug-userland-processes>>:
 
 ....
-/gdbserver.sh ls
+./gdbserver.sh ls
 ....
 
 on host you need:
@@ -2371,7 +2617,7 @@ Our setup gives you the rare opportunity to step debug libc and other system lib
 For example in the guest:
 
 ....
-/gdbserver.sh /count.out
+./gdbserver.sh ./count.out
 ....
 
 Then on host:
@@ -2425,7 +2671,7 @@ To use `arm` instead of x86 for example:
 Debug:
 
 ....
-./run --arch arm --wait-gdb
+./run --arch arm --gdb-wait
 # On another terminal.
 ./run-gdb --arch arm
 ....
@@ -2466,7 +2712,7 @@ cr3 = 0xFFFFF0DCDC000
 However if we try to do it from userland:
 
 ....
-/ring0.out
+./ring0.out
 ....
 
 stdout gives:
@@ -2484,7 +2730,7 @@ traps: ring0.out[55] general protection ip:40054c sp:7fffffffec20 error:0 in rin
 Sources:
 
 * link:kernel_modules/ring0.c[]
-* link:kernel_modules/ring0.h[]
+* link:lkmc/ring0.h[]
 * link:userland/ring0.c[]
 
 In both cases, we attempt to run the exact same code which is shared on the `ring0.h` header file.
@@ -2546,7 +2792,7 @@ To have more control over the system, you can replace BusyBox's init with your o
 The most direct way to replace `init` with our own is to just use the `init=` <<kernel-command-line-parameters,command line parameter>> directly:
 
 ....
-./run --kernel-cli 'init=/count.sh'
+./run --kernel-cli 'init=/lkmc/count.sh'
 ....
 
 This just counts every second forever and does not give you a shell.
@@ -2556,18 +2802,16 @@ This method is not very flexible however, as it is hard to reliably pass multipl
 For this reason, we have created a more robust helper method with the `--eval` option:
 
 ....
-./run --eval 'echo "asdf qwer";insmod /hello.ko;/poweroff.out'
+./run --eval 'echo "asdf qwer";insmod hello.ko;./linux/poweroff.out'
 ....
 
-The `--eval` option replaces init with a shell script that just evals the given command.
-
 It is basically a shortcut for:
 
 ....
-./run --kernel-cli 'init=/eval_base64.sh - lkmc_eval="insmod /hello.ko;/poweroff.out"'
+./run --kernel-cli 'init=/lkmc/eval_base64.sh - lkmc_eval="insmod hello.ko;./linux/poweroff.out"'
 ....
 
-Source: link:rootfs_overlay/eval_base64.sh[].
+Source: link:rootfs_overlay/lkmc/eval_base64.sh[].
 
 This allows quoting and newlines by base64 encoding on host, and decoding on guest, see: <<kernel-command-line-parameters-escaping>>.
 
@@ -2588,26 +2832,28 @@ It also automatically chooses between `init=` and `rcinit=` for you, see: <<path
 
 The best way to overcome those limitations is to use: <<init-busybox>>
 
-If the script is large, you can add it to a gitignored file and pass that to `-E` as in:
+If the script is large, you can add it to a gitignored file and pass that to `--eval` as in:
 
 ....
 echo '
-insmod /hello.ko
-/poweroff.out
-' > gitignore.sh
-./run --eval "$(cat gitignore.sh)"
+cd /lkmc
+insmod hello.ko
+./linux/poweroff.out
+' > data/gitignore.sh
+./run --eval "$(cat data/gitignore.sh)"
 ....
 
 or add it to a file to the root filesystem guest and rebuild:
 
 ....
 echo '#!/bin/sh
-insmod /hello.ko
-/poweroff.out
-' > rootfs_overlay/gitignore.sh
-chmod +x rootfs_overlay/gitignore.sh
+cd /lkmc
+insmod hello.ko
+./linux/poweroff.out
+' > rootfs_overlay/lkmc/gitignore.sh
+chmod +x rootfs_overlay/lkmc/gitignore.sh
 ./build-buildroot
-./run --kernel-cli 'init=/gitignore.sh'
+./run --kernel-cli 'init=/lkmc/gitignore.sh'
 ....
 
 Remember that if your init returns, the kernel will panic, there are just two non-panic possibilities:
@@ -2627,19 +2873,19 @@ because BusyBox' `poweroff` tries to do some fancy stuff like killing init, like
 
 But this fails when we are `init` itself!
 
-`poweroff` works more brutally and effectively if you add `-f`:
+BusyBox' `poweroff` works more brutally and effectively if you add `-f`:
 
 ....
 ./run --eval 'poweroff -f'
 ....
 
-but why not just use our minimal `/poweroff.out` and be done with it?
+but why not just use our minimal `./linux/poweroff.out` and be done with it?
 
 ....
-./run --eval '/poweroff.out'
+./run --eval './linux/poweroff.out'
 ....
 
-Source: link:userland/poweroff.c[]
+Source: link:userland/linux/poweroff.c[]
 
 This also illustrates how to shutdown the computer from C: https://stackoverflow.com/questions/28812514/how-to-shutdown-linux-using-c-or-qt-without-call-to-system
 
@@ -2648,28 +2894,30 @@ This also illustrates how to shutdown the computer from C: https://stackoverflow
 I dare you to guess what this does:
 
 ....
-./run --eval '/sleep_forever.out'
+./run --eval './posix/sleep_forever.out'
 ....
 
-Source: link:userland/sleep_forever.c[]
+Source: link:userland/posix/sleep_forever.c[]
 
 This executable is a convenient simple init that does not panic and sleeps instead.
 
 ==== time_boot.out
 
-Get a reasonable answer to "how long does boot take?":
+Get a reasonable answer to "how long does boot take in guest time?":
 
 ....
-./run --eval-after '/time_boot.out'
+./run --eval-after './linux/time_boot.out'
 ....
 
-Dmesg contains a message of type:
+Source: link:userland/linux/time_boot.c[]
+
+That executable writes to `dmesg` directly through `/dev/kmsg` a message of type:
 
 ....
-[    2.188242] time_boot.c
+[    2.188242] /path/to/linux-kernel-module-cheat/userland/linux/time_boot.c
 ....
 
-which tells us that boot took `2.188242` seconds.
+which tells us that boot took `2.188242` seconds based on the dmesg timestamp.
 
 Bibliography: https://stackoverflow.com/questions/12683169/measure-time-taken-for-linux-kernel-from-bootup-to-userpace/46517014#46517014
 
@@ -2687,10 +2935,10 @@ After the commands run, you are left on an interactive shell.
 The above command is basically equivalent to:
 
 ....
-./run --kernel-cli-after-dash 'lkmc_eval="insmod /hello.ko;poweroff.out;"'
+./run --kernel-cli-after-dash 'lkmc_eval="insmod hello.ko;./linux/poweroff.out;"'
 ....
 
-where the `lkmc_eval` option gets evaled by our default link:rootfs_overlay/etc/init.d/S98[S98] startup script.
+where the `lkmc_eval` option gets evaled by our default link:rootfs_overlay/etc/init.d/S98[] startup script.
 
 Except that `--eval-after` is smarter and uses `base64` encoding.
 
@@ -2727,14 +2975,14 @@ ____
 And you can try it out with:
 
 ....
-./run --kernel-cli 'init=/init_env_poweroff.out - asdf=qwer zxcv'
+./run --kernel-cli 'init=/lkmc/linux/init_env_poweroff.out - asdf=qwer zxcv'
 ....
 
 Output:
 
 ....
 args:
-/init_env_poweroff.out
+/lkmc/linux/init_env_poweroff.out
 -
 zxcv
 
@@ -2744,7 +2992,7 @@ TERM=linux
 asdf=qwer
 ....
 
-Source: link:userland/init_env_poweroff.c[].
+Source: link:userland/linux/init_env_poweroff.c[].
 
 ==== init arguments
 
@@ -2753,21 +3001,21 @@ The annoying dash `-` gets passed as a parameter to `init`, which makes it impos
 Arguments with dots that come after `-` are still treated specially (of the form `subsystem.somevalue`) and disappear, from args, e.g.:
 
 ....
-./run --kernel-cli 'init=/init_env_poweroff.out - /poweroff.out'
+./run --kernel-cli 'init=/lkmc/linux/init_env_poweroff.out - /lkmc/linux/poweroff.out'
 ....
 
 outputs:
 
 ....
 args
-/init_env_poweroff.out
+/lkmc/linux/init_env_poweroff.out
 -
 ab
 ....
 
 so see how `a.b` is gone.
 
-The simple workaround is to just create a shell script that does it, e.g. as we've done at: link:rootfs_overlay/gem5_exit.sh[].
+The simple workaround is to just create a shell script that does it, e.g. as we've done at: link:rootfs_overlay/lkmc/gem5_exit.sh[].
 
 ==== init environment env
 
@@ -2816,20 +3064,24 @@ then it shows more variables, notably:
 PATH='/sbin:/usr/sbin:/bin:/usr/bin'
 ....
 
-Finally, login shells will source some default files, notably:
+===== BusyBox shell initrc files
+
+Login shells source some default files, notably:
 
 ....
 /etc/profile
-/root/.profile
+$HOME/.profile
 ....
 
-We currently control `/root/.profile` at link:rootfs_overlay/root/.profile[], and use the default BusyBox `/etc/profile`.
+In our case, `HOME` is set to `/` presumably by `init` at: https://git.busybox.net/busybox/tree/init/init.c?id=5059653882dbd86e3bbf48389f9f81b0fac8cd0a#n1114
+
+We provide `/.profile` from link:rootfs_overlay/.profile[], and use the default BusyBox `/etc/profile`.
 
 The shell knows that it is a login shell if the first character of `argv[0]` is `-`, see also: https://stackoverflow.com/questions/2050961/is-argv0-name-of-executable-an-accepted-standard-or-just-a-common-conventi/42291142#42291142
 
 When we use just `init=/bin/sh`, the Linux kernel sets `argv[0]` to `/bin/sh`, which does not start with `-`.
 
-However, if you use `::respawn:-/bin/sh` on inttab described at <<tty>>, BusyBox' init sets `argv[0]` to `-`, and so does `getty`. This can be observed with:
+However, if you use `::respawn:-/bin/sh` on inittab described at <<tty>>, BusyBox' init sets `argv[0][0]` to `-`, and so does `getty`. This can be observed with:
 
 ....
 cat /proc/$$/cmdline
@@ -2837,6 +3089,8 @@ cat /proc/$$/cmdline
 
 where `$$` is the PID of the shell itself: https://stackoverflow.com/questions/21063765/get-pid-in-shell-bash
 
+Bibliography: https://unix.stackexchange.com/questions/176027/ash-profile-configuration-file
+
 == initrd
 
 The kernel can boot from an CPIO file, which is a directory serialization format much like tar: https://superuser.com/questions/343915/tar-vs-cpio-what-is-the-difference
@@ -3269,26 +3523,27 @@ See also: <<user-mode-simulation-with-glibc>>
 
 === QEMU user mode getting started
 
-Let's run link:userland/print_argv.c[] built with the Buildroot toolchain on QEMU user mode:
+Let's run link:userland/c/print_argv.c[] built with the Buildroot toolchain on QEMU user mode:
 
 ....
 ./build user-mode-qemu
 ./run \
-  --userland print_argv \
-  --userland-args 'asdf "qw er"' \
+  --userland userland/c/print_argv.c \
+  --userland-args='asdf "qw er"' \
 ;
 ....
 
 Output:
 
 ....
+/path/to/linux-kernel-module-cheat/out/userland/default/x86_64/c/print_argv.out
 asdf
 qw er
 ....
 
 `./run --userland` path resolution is analogous to <<baremetal-setup-getting-started,that of `./run --baremetal`>>.
 
-`./build user-mode-qemu` first builds Buildroot, and then runs `./build-userland`, which is further documented at: <<userland-directory>>. It also builds QEMU. If you ahve already done a <<qemu-buildroot-setup>> previously, this will be very fast.
+`./build user-mode-qemu` first builds Buildroot, and then runs `./build-userland`, which is further documented at: <<userland-setup>>. It also builds QEMU. If you have already done a <<qemu-buildroot-setup>> previously, this will be very fast.
 
 If you modify the userland programs, rebuild simply with:
 
@@ -3296,66 +3551,122 @@ If you modify the userland programs, rebuild simply with:
 ./build-userland
 ....
 
-==== User mode with host toolchain and QEMU
+==== User mode GDB
 
-If you are lazy to built the Buildroot toolchain and QEMU, you can get away on Ubuntu 18.04 with just:
+It's nice when <<gdb,the obvious>> just works, right?
 
 ....
-sudo apt-get install gcc-aarch64-linux-gnu qemu-system-aarch64
-./build-userland \
-  --arch aarch64 \
-  --gcc-which host \
-  --userland-build-id host \
-;
 ./run \
   --arch aarch64 \
-  --qemu-which host
-  --userland-build-id host \
-  --userland print_argv \
+  --gdb-wait \
+  --userland userland/c/print_argv.c \
   --userland-args 'asdf "qw er"' \
 ;
 ....
 
-where:
+and on another shell:
 
-* `--gcc-which host`: use the host toolchain.
-+
-We must pass this to `./run` as well because QEMU must know which dynamic libraries to use. See also: <<user-mode-static-executables>>.
-* `--userland-build-id host`: put the host built into a <<build-variants>>
+....
+./run-gdb \
+  --arch aarch64 \
+  --userland userland/c/print_argv.c \
+  main \
+;
+....
 
-This present the usual trade-offs of using prebuilts as mentioned at: <<prebuilt>>.
+Or alternatively, if you are using <<tmux>>, do everything in one go with:
 
-==== User mode simulation with glibc
+....
+./run \
+  --arch aarch64 \
+  --gdb \
+  --userland userland/c/print_argv.c \
+  --userland-args 'asdf "qw er"' \
+;
+....
+
+To stop at the very first instruction of a freestanding program, just use `--no-continue`. TODO example.
+
+=== User mode tests
+
+Automatically run all userland tests that can be run in user mode simulation, and check that they exit with status 0:
+
+....
+./build --all-archs test-user-mode
+./test-user-mode --all-archs --all-emulators
+....
+
+Or just for QEMU:
+
+....
+./build --all-archs test-user-mode-qemu
+./test-user-mode --all-archs --emulator qemu
+....
+
+Source: link:test-user-mode[]
+
+This script skips a manually configured list of tests, notably:
+
+* tests that depend on a full running kernel and cannot be run in user mode simulation, e.g. those that rely on kernel modules
+* tests that require user interaction
+* tests that take perceptible amounts of time
+* known bugs we didn't have time to fix ;-)
+
+Tests under link:userland/libs/[] depend on certain libraries being available on the target, e.g. <<blas>> for link:userland/libs/blas[]. They are not run by default, but can be enabled with `--package` and `--package-all`.
+
+The gem5 tests require building statically with build id `static`, see also: <<gem5-syscall-emulation-mode>>. TODO automate this better.
+
+See: <<test-this-repo>> for more useful testing tips.
+
+=== User mode Buildroot executables
+
+If you followed <<qemu-buildroot-setup>>, you can now run the executables created by Buildroot directly as:
+
+....
+./run \
+  --userland "$(./getvar buildroot_target_dir)/bin/echo" \
+  --userland-args='asdf' \
+;
+....
+
+Here is an interesting example of this: <<linux-test-project>>
+
+=== User mode simulation with glibc
 
 At 125d14805f769104f93c510bedaa685a52ec025d we <<libc-choice,moved Buildroot from uClibc to glibc>>, and caused some user mode pain, which we document here.
 
-===== FATAL: kernel too old
+==== FATAL: kernel too old
 
-Happens on all gem5 setups, but not on QEMU on Ubuntu 18.04 host.
+Happens on all gem5 <<user-mode-simulation>> setups, but not on QEMU on Ubuntu 18.04 host.
 
 glibc has a check for kernel version, likely obtained from the `uname` syscall, and if the kernel is not new enough, it quits.
 
 Determining the right number to put there is of course highly non-trivial and would require an extensive userland test suite, which most emulator don't have.
 
-We don't have this failure for QEMU, only gem5. QEMU by default copies the host `uname`, but it also has the `-r` option to set it explicitly, try it out with:
+We don't have this failure for QEMU on an 18.04 host, only gem5.
+
+QEMU by default copies the host `uname` value. However, our scripts set it by default to the latest Buildroot kernel version with QEMU's `-r` option, which is exposed as `--kernel-version`:
 
 ....
-./run --arch aarch64 --userland uname -- -r v4.17.0
+./run --arch aarch64 --kernel-version 4.18 --userland userland/posix/uname.c
 ....
 
-Source: link:userland/uname.c[].
+Source: link:userland/posix/uname.c[].
+
+gem5 does not have such runtime configuration, but the error can be worked around for now by patching the hardcoded Linux version to a recent one such as `v4.17.0`, as mentioned at: https://stackoverflow.com/questions/48959349/how-to-solve-fatal-kernel-too-old-when-running-gem5-in-syscall-emulation-se-m
+
+We override the default QEMU uname because otherwise all executables fail with "kernel too old" on older Ubuntu hosts. The downside is that you might hit syscalls which your host does not have for QEMU to forward to, but we'll let you be the judge of that.
 
 The QEMU source that does this is at: https://github.com/qemu/qemu/blob/v3.1.0/linux-user/syscall.c#L8931
 
 In gem5, there are tons of missing syscalls, and that number currently just gets bumped up randomly from time to time when someone gets fed up:
 
-* https://stackoverflow.com/questions/48959349/how-to-solve-fatal-kernel-too-old-when-running-gem5-in-syscall-emulation-se-m
 * https://stackoverflow.com/questions/53085048/how-to-compile-and-run-an-executable-in-gem5-syscall-emulation-mode-with-se-py/53085049#53085049
 * https://gem5-review.googlesource.com/c/public/gem5/+/15855
 
 The ID is just hardcoded on the source:
 
-===== stack smashing detected
+==== stack smashing detected
 
 For some reason QEMU / glibc x86_64 picks up the host libc, which breaks things.
 
@@ -3367,8 +3678,8 @@ Reproduction:
 
 ....
 rm -f "$(./getvar buildroot_target_dir)/etc/ld.so.cache"
-./run --userland hello
-./run --userland hello --qemu-which host
+./run --userland userland/c/hello.c
+./run --userland userland/c/hello.c --qemu-which host
 ....
 
 Outcome:
@@ -3394,7 +3705,7 @@ A non-QEMU-specific example of stack smashing is shown at: https://stackoverflow
 
 Tested at: 2e32389ebf1bedd89c682aa7b8fe42c3c0cf96e5 + 1.
 
-==== User mode static executables
+=== User mode static executables
 
 Example:
 
@@ -3406,7 +3717,7 @@ Example:
 ./run \
   --arch aarch64 \
   --static \
-  --userland print_argv \
+  --userland userland/c/print_argv.c \
   --userland-args 'asdf "qw er"' \
 ;
 ....
@@ -3420,7 +3731,7 @@ However, in case something goes wrong, you can also try statically linked execut
 * gem5 user mode currently only supports static executables: <<gem5-syscall-emulation-mode>>
 * QEMU x86_64 guest on x86_64 host was failing with <<stack-smashing-detected>>, but we found a workaround
 
-===== User mode static executables with dynamic libraries
+==== User mode static executables with dynamic libraries
 
 One limitation of static executables is that Buildroot mostly only builds dynamic versions of libraries (the libc is an exception).
 
@@ -3429,7 +3740,7 @@ So programs that rely on those libraries might not compile as GCC can't find the
 For example, if we try to build <<blas>> statically:
 
 ....
-./build-userland --has-package openblas --static -- openblas_hello
+./build-userland --package openblas --static -- userland/libs/openblas/hello.c
 ....
 
 it fails with:
@@ -3438,43 +3749,6 @@ it fails with:
 ld: cannot find -lopenblas
 ....
 
-==== User mode GDB
-
-It's nice when <<gdb,the obvious>> just works, right?
-
-....
-./run \
-  --arch aarch64 \
-  --userland print_argv \
-  --userland-args 'asdf "qw er"' \
-  --wait-gdb \
-;
-....
-
-and on another shell:
-
-....
-./run-gdb \
-  --arch aarch64 \
-  --userland print_argv \
-  main \
-;
-....
-
-Or alternatively, if you are using <<tmux>>, do everything in one go with:
-
-....
-./run \
-  --arch aarch64 \
-  --tmux-args main \
-  --userland print_argv \
-  --userland-args 'asdf "qw er"' \
-  --wait-gdb \
-;
-....
-
-To stop at the very first instruction of a freestanding program, just use `--no-continue` TODO example.
-
 === gem5 syscall emulation mode
 
 Less robust than QEMU's, but still usable:
@@ -3505,29 +3779,29 @@ So let's just play with some static ones:
 ./run \
   --arch aarch64 \
   --emulator gem5 \
-  --userland print_argv \
+  --userland userland/c/print_argv.c \
   --userland-args 'asdf "qw er"' \
 ;
 ....
 
 TODO: how to escape spaces on the command line arguments?
 
-Step debug also works:
+<<user-mode-gdb,GDB step debug>> also works normally on gem5:
 
 ....
 ./run \
   --arch aarch64 \
   --emulator gem5 \
+  --gdb-wait \
   --static \
-  --userland print_argv \
+  --userland userland/c/print_argv.c \
   --userland-args 'asdf "qw er"' \
-  --wait-gdb \
 ;
 ./run-gdb \
   --arch aarch64 \
   --emulator gem5 \
   --static \
-  --userland print_argv \
+  --userland userland/c/print_argv.c \
   main \
 ;
 ....
@@ -3537,10 +3811,10 @@ Step debug also works:
 As of gem5 7fa4c946386e7207ad5859e8ade0bbfc14000d91, the crappy `se.py` script does not forward the exit status of syscall emulation mode, you can test it with:
 
 ....
-./run --dry-run --emulator gem5 --static --userland false
+./run --dry-run --emulator gem5 --static --userland userland/c/false.c
 ....
 
-Source: link:userland/false[].
+Source: link:userland/c/false.c[].
 
 Then manually run the generated gem5 CLI, and do:
 
@@ -3558,18 +3832,41 @@ Simulated exit code not 0! Exit code is 1
 
 which we parse in link:run[] and then exit with the correct result ourselves...
 
+Related thread: https://stackoverflow.com/questions/56032347/is-there-a-way-to-identify-if-gem5-run-got-over-successfully
+
+==== gem5 syscall emulation mode program stdin
+
+gem5 shows its own stdout on the terminal, and does not allow you to type stdin to programs.
+
+Instead, you must pass stdin non-interactively through a file with the `--se.py --input` option, e.g.:
+
+....
+printf a > f
+./run --emulator gem5 --userland userland/c/getchar.c --static -- --input f
+....
+
+leads to gem5 output:
+
+....
+enter a character: you entered: a
+....
+
+Source: link:userland/c/getchar.c[]
+
 ==== User mode vs full system benchmark
 
 Let's see if user mode runs considerably faster than full system or not.
 
+First we build Dhrystone manually statically since dynamic linking is broken in gem5: <<gem5-syscall-emulation-mode>>.
+
 gem5 user mode:
 
 ....
-./build-buildroot --config 'BR2_PACKAGE_DHRYSTONE=y' --arch arm
+./build-buildroot --arch arm --config 'BR2_PACKAGE_DHRYSTONE=y'
 make \
   -B \
   -C "$(./getvar --arch arm buildroot_build_build_dir)/dhrystone-2" \
-  CC="$(./run-toolchain --arch arm --dry gcc)" \
+  CC="$(./run-toolchain --arch arm --print-tool gcc)" \
   CFLAGS=-static \
 ;
 time \
@@ -3587,7 +3884,7 @@ gem5 full system:
 time \
   ./run \
   --arch arm \
-  --eval-after '/gem5.sh' \
+  --eval-after './gem5.sh' \
   --emulator gem5
   --gem5-readfile 'dhrystone 100000' \
 ;
@@ -3605,7 +3902,7 @@ QEMU full system:
 time \
   ./run \
   --arch arm \
-  --eval-after 'time dhrystone 100000000;/poweroff.out' \
+  --eval-after 'time dhrystone 100000000;./linux/poweroff.out' \
 ;
 ....
 
@@ -3616,29 +3913,41 @@ Result on <<p51>> at bad30f513c46c1b0995d3a10c0d9bc2a33dc4fa0:
 * QEMU user: 45 seconds
 * QEMU full system: 223 seconds
 
-=== User mode tests
+=== QEMU user mode quirks
 
-Automatically run non-interactive userland tests that can be run in user mode simulation:
+==== QEMU user mode does not show stdout immediately
+
+At 8d8307ac0710164701f6e14c99a69ee172ccbb70 + 1, I noticed that if you run link:userland/posix/count.c[]:
 
 ....
-./build --all-archs test-user-mode
-./test-user-mode --all-archs --all-emulators
+./run --userland userland/posix/count.c --userland-args 3
 ....
 
-Or just for QEMU:
+it first waits for 3 seconds, and then dumps all the output at once, instead of counting once every second as expected.
+
+The same can be reproduced by copying the raw QEMU command and piping it through `tee`, so I don't think it is a bug in our setup:
 
 ....
-./build --all-archs test-user-mode-qemu
-./test-user-mode --all-archs --emulator qemu
+/path/to/linux-kernel-module-cheat/out/qemu/default/x86_64-linux-user/qemu-x86_64 \
+  -L /path/to/linux-kernel-module-cheat/out/buildroot/build/default/x86_64/target \
+  /path/to/linux-kernel-module-cheat/out/userland/default/x86_64/posix/count.out \
+  3 \
+| tee
 ....
 
-Source: link:test-user-mode[]
+TODO: investigate further and then possibly post on QEMU mailing list.
 
-This testing excludes notably kernel module tests which depend on a full running kernel.
+===== QEMU user mode does not show errors
 
-The gem5 tests require building statically with build id `static`, see also: <<gem5-syscall-emulation-mode>>. TODO automate this better.
+Similarly to <<qemu-user-mode-does-not-show-stdout-immediately>>, QEMU error messages do not show at all through pipes.
 
-See: <<test-this-repo>> for more useful testing tips.
+In particular, it does not say anything if you pass it a non-existing executable:
+
+....
+qemu-x86_64 asdf | cat
+....
+
+So we just check for this ourselves manually.
 
 == Kernel module utilities
 
@@ -3647,7 +3956,7 @@ See: <<test-this-repo>> for more useful testing tips.
 link:https://git.busybox.net/busybox/tree/modutils/insmod.c?h=1_29_3[Provided by BusyBox]:
 
 ....
-./run --eval-after 'insmod /hello.ko'
+./run --eval-after 'insmod hello.ko'
 ....
 
 === myinsmod
@@ -3656,18 +3965,18 @@ If you are feeling raw, you can insert and remove modules with our own minimal m
 
 ....
 # init_module
-/myinsmod.out /hello.ko
+./linux/myinsmod.out hello.ko
 # finit_module
-/myinsmod.out /hello.ko "" 1
-/myrmmod.out hello
+./linux/myinsmod.out hello.ko "" 1
+./linux/myrmmod.out hello
 ....
 
 which teaches you how it is done from C code.
 
 Source:
 
-* link:userland/myinsmod.c[]
-* link:userland/myrmmod.c[]
+* link:userland/linux/myinsmod.c[]
+* link:userland/linux/myrmmod.c[]
 
 The Linux kernel offers two system calls for module insertion:
 
@@ -4378,10 +4687,10 @@ Not enabled by default due to the build / runtime overhead. To enable, build wit
 Then inside the guest turn on sshd:
 
 ....
-/sshd.sh
+./sshd.sh
 ....
 
-Source: link:rootfs_overlay/sshd.sh[]
+Source: link:rootfs_overlay/lkmc/sshd.sh[]
 
 And finally on host:
 
@@ -4641,16 +4950,16 @@ zcat /proc/config.gz
 or with our shortcut:
 
 ....
-/conf.sh
+./conf.sh
 ....
 
 or to conveniently grep for a specific option case insensitively:
 
 ....
-/conf.sh ikconfig
+./conf.sh ikconfig
 ....
 
-Source: link:rootfs_overlay/conf.sh[].
+Source: link:rootfs_overlay/lkmc/conf.sh[].
 
 This is enabled by:
 
@@ -5025,11 +5334,11 @@ and so it is Read Only as shown by `ro`.
 Disable userland address space randomization. Test it out by running <<rand_check-out>> twice:
 
 ....
-./run --eval-after '/rand_check.out;/poweroff.out'
-./run --eval-after '/rand_check.out;/poweroff.out'
+./run --eval-after './linux/rand_check.out;./linux/poweroff.out'
+./run --eval-after './linux/rand_check.out;./linux/poweroff.out'
 ....
 
-If we remove it from our link:run[] script by hacking it up, the addresses shown by `rand_check.out` vary across boots.
+If we remove it from our link:run[] script by hacking it up, the addresses shown by `linux/rand_check.out` vary across boots.
 
 Equivalent to:
 
@@ -5114,16 +5423,16 @@ But the awesome `CONFIG_DYNAMIC_DEBUG=y` option which we enable by default allow
 ....
 echo 8 > /proc/sys/kernel/printk
 echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control
-/myinsmod.out /hello.ko
+./linux/myinsmod.out hello.ko
 ....
 
 and we have a shortcut at:
 
 ....
-/pr_debug.sh
+./pr_debug.sh
 ....
 
-Source: link:rootfs_overlay/pr_debug.sh[].
+Source: link:rootfs_overlay/lkmc/pr_debug.sh[].
 
 Syntax: https://www.kernel.org/doc/html/v4.11/admin-guide/dynamic-debug-howto.html
 
@@ -5144,7 +5453,7 @@ Enable messages in specific modules:
 ....
 echo 8 > /proc/sys/kernel/printk
 echo 'module myprintk +p' > /sys/kernel/debug/dynamic_debug/control
-insmod /myprintk.ko
+insmod myprintk.ko
 ....
 
 Source: link:kernel_modules/myprintk.c[]
@@ -5159,7 +5468,7 @@ but TODO: it also shows debug messages even without enabling them explicitly:
 
 ....
 echo 8 > /proc/sys/kernel/printk
-insmod /myprintk.ko
+insmod myprintk.ko
 ....
 
 and it shows as enabled:
@@ -5226,7 +5535,7 @@ This likely comes from the ifdef split at `init/main.c`:
 The Linux kernel allows passing module parameters at insertion time <<myinsmod,through the `init_module` and `finit_module` system calls>>:
 
 ....
-/params.sh
+./params.sh
 echo $?
 ....
 
@@ -5239,14 +5548,14 @@ Outcome: the test passes:
 Sources:
 
 * link:kernel_modules/params.c[]
-* link:rootfs_overlay/params.sh[]
+* link:rootfs_overlay/lkmc/params.sh[]
 
 As shown in the example, module parameters can also be read and modified at runtime from <<sysfs>>.
 
 We can obtain the help text of the parameters with:
 
 ....
-modinfo /params.ko
+modinfo params.ko
 ....
 
 The output contains:
@@ -5280,7 +5589,7 @@ This is specially important when loading modules with <<kernel-module-dependenci
 One module can depend on symbols of another module that are exported with `EXPORT_SYMBOL`:
 
 ....
-/dep.sh
+./dep.sh
 echo $?
 ....
 
@@ -5294,14 +5603,14 @@ Sources:
 
 * link:kernel_modules/dep.c[]
 * link:kernel_modules/dep2.c[]
-* link:rootfs_overlay/dep.sh[]
+* link:rootfs_overlay/lkmc/dep.sh[]
 
 The kernel deduces dependencies based on the `EXPORT_SYMBOL` that each module uses.
 
 Symbols exported by `EXPORT_SYMBOL` can be seen with:
 
 ....
-insmod /dep.ko
+insmod dep.ko
 grep lkmc_dep /proc/kallsyms
 ....
 
@@ -5318,7 +5627,7 @@ This requires `CONFIG_KALLSYMS_ALL=y`.
 Dependency information is stored by the kernel module build system in the `.ko` files' <<module_info>>, e.g.:
 
 ....
-modinfo /dep2.ko
+modinfo dep2.ko
 ....
 
 contains:
@@ -5330,7 +5639,7 @@ depends:        dep
 We can double check with:
 
 ....
-strings 3 /dep2.ko  | grep -E 'depends'
+strings 3 dep2.ko  | grep -E 'depends'
 ....
 
 The output contains:
@@ -5414,7 +5723,7 @@ Bibliography:
 Module metadata is stored on module files at compile time. Some of the fields can be retrieved through the `THIS_MODULE` `struct module`:
 
 ....
-insmod /module_info.ko
+insmod module_info.ko
 ....
 
 Dmesg output:
@@ -5441,13 +5750,13 @@ Output:
 And we can also observe them with the `modinfo` command line utility:
 
 ....
-modinfo /module_info.ko
+modinfo module_info.ko
 ....
 
 sample output:
 
 ....
-filename:       /module_info.ko
+filename:       module_info.ko
 license:        GPL
 version:        1.0
 srcversion:     AF3DE8A8CFCDEB6B00E35B6
@@ -5458,7 +5767,7 @@ vermagic:       4.17.0 SMP mod_unload modversions
 Module information is stored in a special `.modinfo` section of the ELF file:
 
 ....
-./run-toolchain readelf -- -SW "$(./getvar target_dir)/module_info.ko"
+./run-toolchain readelf -- -SW "$(./getvar kernel_modules_build_subdir)/module_info.ko"
 ....
 
 contains:
@@ -5470,7 +5779,7 @@ contains:
 and:
 
 ....
-./run-toolchain readelf -- -x .modinfo "$(./getvar buildroot_build_build_dir)/module_info.ko"
+./run-toolchain readelf -- -x .modinfo "$(./getvar kernel_modules_build_subdir)/module_info.ko"
 ....
 
 gives:
@@ -5501,7 +5810,7 @@ Bibliography:
 Vermagic is a magic string present in the kernel and on <<module_info>> of kernel modules. It is used to verify that the kernel module was compiled against a compatible kernel version and relevant configuration:
 
 ....
-insmod /vermagic.ko
+insmod vermagic.ko
 ....
 
 Possible dmesg output:
@@ -5515,7 +5824,7 @@ Source: link:kernel_modules/vermagic.c[]
 If we artificially create a mismatch with `MODULE_INFO(vermagic`, the insmod fails with:
 
 ....
-insmod: can't insert '/vermagic_fail.ko': invalid module format
+insmod: can't insert 'vermagic_fail.ko': invalid module format
 ....
 
 and `dmesg` says the expected and found vermagic found:
@@ -5561,7 +5870,7 @@ This option just strips `modversion` information from the module before loading,
 `init_module` and `cleantup_module` are an older alternative to the `module_init` and `module_exit` macros:
 
 ....
-insmod /init_module.ko
+insmod init_module.ko
 rmmod init_module
 ....
 
@@ -5581,8 +5890,8 @@ TODO why were `module_init` and `module_exit` created? https://stackoverflow.com
 To test out kernel panics and oops in controlled circumstances, try out the modules:
 
 ....
-insmod /panic.ko
-insmod /oops.ko
+insmod panic.ko
+insmod oops.ko
 ....
 
 Source:
@@ -5722,7 +6031,9 @@ One possibility that gets close would be to use <<gdb>> to break at the `panic`
 
 ====== Exit gem5 on panic
 
-gem5 actually detects panics automatically by parsing kernel symbols and detecting when the PC reaches the address of the `panic` function. gem5 then prints to stdout:
+gem5 9048ef0ffbf21bedb803b785fb68f83e95c04db8 (January 2019) can detect panics automatically if the option `system.panic_on_panic` is on.
+
+It parses kernel symbols and detects when the PC reaches the address of the `panic` function. gem5 then prints to stdout:
 
 ....
 Kernel panic in simulated kernel
@@ -5730,17 +6041,19 @@ Kernel panic in simulated kernel
 
 and exits with status -6.
 
+At gem5 ff52563a214c71fcd1e21e9f00ad839612032e3b (July 2018) the behaviour was different, and gem5 just exited with status 0: https://www.mail-archive.com/gem5-users@gem5.org/msg15870.html TODO find fixing commit.
+
 We enable the `system.panic_on_panic` option by default on `arm` and `aarch64`, which makes gem5 exit immediately in case of panic, which is awesome!
 
-If we don't set `system.panic_on_panic`, then gem5 just hangs.
+If we don't set `system.panic_on_panic`, then gem5 just hangs on an infinite guest loop.
 
-TODO: why doesn't x86 support `system.panic_on_panic` as well? Trying to set `system.panic_on_panic` there fails with:
+TODO: why doesn't gem5 x86 ff52563a214c71fcd1e21e9f00ad839612032e3b support `system.panic_on_panic` as well? Trying to set `system.panic_on_panic` there fails with:
 
 ....
-AttributeError: Class LinuxX86System has no parameter panic_on_panic
+tried to set or access non-existentobject parameter: panic_on_panic
 ....
 
-However, as of f9eb0b72de9029ff16091a18de109c18a9ecc30a, panic on x86 makes gem5 crash with:
+However, at that commit panic on x86 makes gem5 crash with:
 
 ....
 panic: i8042 "System reset" command not implemented.
@@ -5757,6 +6070,8 @@ The implementation of panic detection happens at: https://github.com/gem5/gem5/b
 
 Here we see that the symbol `"panic"` for the `panic()` function is the one being tracked.
 
+Related thread: https://stackoverflow.com/questions/56032347/is-there-a-way-to-identify-if-gem5-run-got-over-successfully
+
 ===== Reboot on panic
 
 Make the kernel reboot after n seconds after panic:
@@ -5781,7 +6096,7 @@ If `CONFIG_KALLSYMS=n`, then addresses are shown on traces instead of symbol plu
 In v4.16 it does not seem possible to configure that at runtime. GDB step debugging with:
 
 ....
-./run --eval-after 'insmod /dump_stack.ko' --wait-gdb --tmux-args dump_stack
+./run --eval-after 'insmod dump_stack.ko' --gdb-wait --tmux-args dump_stack
 ....
 
 shows that traces are printed at `arch/x86/kernel/dumpstack.c`:
@@ -5817,7 +6132,7 @@ It is possible to make `oops` lead to panics always with:
 
 ....
 echo 1 > /proc/sys/kernel/panic_on_oops
-insmod /oops.ko
+insmod oops.ko
 ....
 
 An oops stack trace looks like:
@@ -5896,7 +6211,7 @@ This-did not work on `arm` due to <<gdb-step-debug-kernel-module-arm>> so we nee
 The `dump_stack` function produces a stack trace much like panic and oops, but causes no problems and we return to the normal control flow, and can cleanly remove the module afterwards:
 
 ....
-insmod /dump_stack.ko
+insmod dump_stack.ko
 ....
 
 Source: link:kernel_modules/dump_stack.c[]
@@ -5909,7 +6224,7 @@ One extra side effect is that we can make it also panic with:
 
 ....
 echo 1 > /proc/sys/kernel/panic_on_warn
-insmod /warn_on.ko
+insmod warn_on.ko
 ....
 
 Source: link:kernel_modules/warn_on.c[]
@@ -5932,7 +6247,7 @@ Bibliography:
 Debugfs is the simplest pseudo filesystem to play around with:
 
 ....
-/debugfs.sh
+./debugfs.sh
 echo $?
 ....
 
@@ -5945,7 +6260,7 @@ Outcome: the test passes:
 Sources:
 
 * link:kernel_modules/debugfs.c[]
-* link:rootfs_overlay/debugfs.sh[]
+* link:rootfs_overlay/lkmc/debugfs.sh[]
 
 Debugfs is made specifically to help test kernel stuff. Just mount, set <<file-operations>>, and we are done.
 
@@ -5969,7 +6284,7 @@ Bibliography: https://github.com/chadversary/debugfs-tutorial
 Procfs is just another fops entry point:
 
 ....
-/procfs.sh
+./procfs.sh
 echo $?
 ....
 
@@ -5986,7 +6301,7 @@ Procfs can run all system calls, including ones that debugfs can't, e.g. <<mmap>
 Sources:
 
 * link:kernel_modules/procfs.c[]
-* link:rootfs_overlay/procfs.sh[]
+* link:rootfs_overlay/lkmc/procfs.sh[]
 
 Bibliography: https://stackoverflow.com/questions/8516021/proc-create-example-for-kernel-module/18924359#18924359
 
@@ -6015,7 +6330,7 @@ Linux version 4.19.0-dirty (lkmc@84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d) (gcc
 Sysfs is more restricted than <<procfs>>, as it does not take an arbitrary `file_operations`:
 
 ....
-/sysfs.sh
+./sysfs.sh
 echo $?
 ....
 
@@ -6028,7 +6343,7 @@ Outcome: the test passes:
 Sources:
 
 * link:kernel_modules/sysfs.c[]
-* link:rootfs_overlay/sysfs.sh[]
+* link:rootfs_overlay/lkmc/sysfs.sh[]
 
 Vs procfs:
 
@@ -6054,7 +6369,7 @@ Bibliography:
 Character devices can have arbitrary <<file-operations>> associated to them:
 
 ....
-/character_device.sh
+./character_device.sh
 echo $?
 ....
 
@@ -6066,8 +6381,8 @@ Outcome: the test passes:
 
 Sources:
 
-* link:rootfs_overlay/character_device.sh[]
-* link:rootfs_overlay/mknoddev.sh[]
+* link:rootfs_overlay/lkmc/character_device.sh[]
+* link:rootfs_overlay/lkmc/mknoddev.sh[]
 * link:kernel_modules/character_device.c[]
 
 Unlike <<procfs>> entires, character device files are created with userland `mknod` or `mknodat` syscalls:
@@ -6109,7 +6424,7 @@ Bibliography: https://unix.stackexchange.com/questions/37829/understanding-chara
 And also destroy it on `rmmod`:
 
 ....
-/character_device_create.sh
+./character_device_create.sh
 echo $?
 ....
 
@@ -6122,7 +6437,7 @@ Outcome: the test passes:
 Sources:
 
 * link:kernel_modules/character_device_create.c[]
-* link:rootfs_overlay/character_device_create.sh[]
+* link:rootfs_overlay/lkmc/character_device_create.sh[]
 
 Bibliography: https://stackoverflow.com/questions/5970595/how-to-create-a-device-node-from-the-init-module-code-of-a-linux-kernel-module/45531867#45531867
 
@@ -6135,7 +6450,7 @@ File operations are the main method of userland driver communication. `struct fi
 This example illustrates the most basic system calls: `open`, `read`, `write`, `close` and `lseek`:
 
 ....
-/fops.sh
+./fops.sh
 echo $?
 ....
 
@@ -6148,12 +6463,12 @@ Outcome: the test passes:
 Sources:
 
 * link:kernel_modules/fops.c[]
-* link:rootfs_overlay/fops.sh[]
+* link:rootfs_overlay/lkmc/fops.sh[]
 
 Then give this a try:
 
 ....
-sh -x /fops.sh
+sh -x ./fops.sh
 ....
 
 We have put printks on each fop, so this allows you to see which system calls are being made for each command.
@@ -6165,7 +6480,7 @@ No, there no official documentation: http://stackoverflow.com/questions/15213932
 Writing trivial read <<file-operations>> is repetitive and error prone. The `seq_file` API makes the process much easier for those trivial cases:
 
 ....
-/seq_file.sh
+./seq_file.sh
 echo $?
 ....
 
@@ -6178,7 +6493,7 @@ Outcome: the test passes:
 Sources:
 
 * link:kernel_modules/seq_file.c[]
-* link:rootfs_overlay/seq_file.sh[]
+* link:rootfs_overlay/lkmc/seq_file.sh[]
 
 In this example we create a debugfs file that behaves just like a file that contains:
 
@@ -6202,7 +6517,7 @@ Bibliography:
 If you have the entire read output upfront, `single_open` is an even more convenient version of <<seq_file>>:
 
 ....
-/seq_file.sh
+./seq_file_single_open.sh
 echo $?
 ....
 
@@ -6215,7 +6530,7 @@ Outcome: the test passes:
 Sources:
 
 * link:kernel_modules/seq_file_single_open.c[]
-* link:rootfs_overlay/seq_file_single_open.sh[]
+* link:rootfs_overlay/lkmc/seq_file_single_open.sh[]
 
 This example produces a debugfs file that behaves like a file that contains:
 
@@ -6229,7 +6544,7 @@ cd
 The poll system call allows an user process to do a non-busy wait on a kernel event:
 
 ....
-/poll.sh
+./poll.sh
 ....
 
 Outcome: `jiffies` gets printed to stdout every second from userland.
@@ -6237,8 +6552,8 @@ Outcome: `jiffies` gets printed to stdout every second from userland.
 Sources:
 
 * link:kernel_modules/poll.c[]
-* link:kernel_modules/poll.c[]
-* link:rootfs_overlay/poll.sh[]
+* link:include/poll.h[]
+* link:rootfs_overlay/lkmc/poll.sh[]
 
 Typically, we are waiting for some hardware to make some piece of data available available to the kernel.
 
@@ -6253,7 +6568,7 @@ Bibliography: https://stackoverflow.com/questions/30035776/how-to-add-poll-funct
 The `ioctl` system call is the best way to pass an arbitrary number of parameters to the kernel in a single go:
 
 ....
-/ioctl.sh
+./ioctl.sh
 echo $?
 ....
 
@@ -6266,9 +6581,9 @@ Outcome: the test passes:
 Sources:
 
 * link:kernel_modules/ioctl.c[]
-* link:kernel_modules/ioctl.h[]
-* link:userland/ioctl.c[]
-* link:rootfs_overlay/ioctl.sh[]
+* link:lkmc/ioctl.h[]
+* link:userland/kernel_modules/ioctl.c[]
+* link:rootfs_overlay/lkmc/ioctl.sh[]
 
 `ioctl` is one of the most important methods of communication with real device drivers, which often take several fields as input.
 
@@ -6302,7 +6617,7 @@ Bibliography:
 The `mmap` system call allows us to share memory between user and kernel space without copying:
 
 ....
-/mmap.sh
+./mmap.sh
 echo $?
 ....
 
@@ -6315,8 +6630,8 @@ Outcome: the test passes:
 Sources:
 
 * link:kernel_modules/mmap.c[]
-* link:userland/mmap.c[]
-* link:rootfs_overlay/mmap.sh[]
+* link:userland/kernel_modules/mmap.c[]
+* link:rootfs_overlay/lkmc/mmap.sh[]
 
 In this example, we make a tiny 4 byte kernel buffer available to user-space, and we then modify it on userspace, and check that the kernel can see the modification.
 
@@ -6338,7 +6653,7 @@ Bibliography:
 Anonymous inodes allow getting multiple file descriptors from a single filesystem entry, which reduces namespace pollution compared to creating multiple device files:
 
 ....
-/anonymous_inode.sh
+./anonymous_inode.sh
 echo $?
 ....
 
@@ -6351,9 +6666,9 @@ Outcome: the test passes:
 Sources:
 
 * link:kernel_modules/anonymous_inode.c[]
-* link:kernel_modules/anonymous_inode.h[]
-* link:userland/anonymous_inode.c[]
-* link:rootfs_overlay/anonymous_inode.sh[]
+* link:lkmc/anonymous_inode.h[]
+* link:userland/kernel_modules/anonymous_inode.c[]
+* link:rootfs_overlay/lkmc/anonymous_inode.sh[]
 
 This example gets an anonymous inode via <<ioctl>> from a debugfs entry by using `anon_inode_getfd`.
 
@@ -6366,7 +6681,7 @@ Bibliography: https://stackoverflow.com/questions/4508998/what-is-an-anonymous-i
 Netlink sockets offer a socket API for kernel / userland communication:
 
 ....
-/netlink.sh
+./netlink.sh
 echo $?
 ....
 
@@ -6379,15 +6694,15 @@ Outcome: the test passes:
 Sources:
 
 * link:kernel_modules/netlink.c[]
-* link:kernel_modules/netlink.h[]
-* link:userland/netlink.c[]
-* link:rootfs_overlay/netlink.sh[]
+* link:lkmc/netlink.h[]
+* link:userland/kernel_modules/netlink.c[]
+* link:rootfs_overlay/lkmc/netlink.sh[]
 
 Launch multiple user requests in parallel to stress our socket:
 
 ....
-insmod /netlink.ko sleep=1
-for i in `seq 16`; do /netlink.out & done
+insmod netlink.ko sleep=1
+for i in `seq 16`; do ./netlink.out & done
 ....
 
 TODO: what is the advantage over `read`, `write` and `poll`? https://stackoverflow.com/questions/16727212/how-netlink-socket-in-linux-kernel-is-different-from-normal-polling-done-by-appl
@@ -6402,7 +6717,7 @@ Bibliography:
 Kernel threads are managed exactly like userland threads; they also have a backing `task_struct`, and are scheduled with the same mechanism:
 
 ....
-insmod /kthread.ko
+insmod kthread.ko
 ....
 
 Source: link:kernel_modules/kthread.c[]
@@ -6440,7 +6755,7 @@ Bibliography:
 Let's launch two threads and see if they actually run in parallel:
 
 ....
-insmod /kthreads.ko
+insmod kthreads.ko
 ....
 
 Source: link:kernel_modules/kthreads.c[]
@@ -6474,7 +6789,7 @@ The threads almost always interleaved nicely, thus confirming that they are actu
 Count to dmesg every one second from `0` up to `n - 1`:
 
 ....
-insmod /sleep.ko n=5
+insmod sleep.ko n=5
 ....
 
 Source: link:kernel_modules/sleep.c[]
@@ -6491,7 +6806,7 @@ Bibliography:
 A more convenient front-end for <<kthread>>:
 
 ....
-insmod /workqueue_cheat.ko
+insmod workqueue_cheat.ko
 ....
 
 Outcome: count from `0` to `9` infinitely many times
@@ -6515,7 +6830,7 @@ Bibliography: https://github.com/torvalds/linux/blob/v4.17/Documentation/core-ap
 Count from `0` to `9` every second infinitely many times by scheduling a new work item from a work item:
 
 ....
-insmod /work_from_work.ko
+insmod work_from_work.ko
 ....
 
 Stop:
@@ -6533,7 +6848,7 @@ Source: link:kernel_modules/work_from_work.c[]
 Let's block the entire kernel! Yay:
 
 .....
-./run --eval-after 'dmesg -n 1;insmod /schedule.ko schedule=0'
+./run --eval-after 'dmesg -n 1;insmod schedule.ko schedule=0'
 .....
 
 Outcome: the system hangs, the only way out is to kill the VM.
@@ -6547,7 +6862,7 @@ Sleep functions like `usleep_range` also end up calling schedule.
 If we allow `schedule()` to be called, then the system becomes responsive:
 
 .....
-./run --eval-after 'dmesg -n 1;insmod /schedule.ko schedule=1'
+./run --eval-after 'dmesg -n 1;insmod schedule.ko schedule=1'
 .....
 
 
@@ -6560,7 +6875,7 @@ dmesg -w
 The system also responds if we <<number-of-cores,add another core>>:
 
 ....
-./run --cpus 2 --eval-after 'dmesg -n 1;insmod /schedule.ko schedule=0'
+./run --cpus 2 --eval-after 'dmesg -n 1;insmod schedule.ko schedule=0'
 ....
 
 ==== Wait queues
@@ -6568,7 +6883,7 @@ The system also responds if we <<number-of-cores,add another core>>:
 Wait queues are a way to make a thread sleep until an event happens on the queue:
 
 ....
-insmod /wait_queue.c
+insmod wait_queue.ko
 ....
 
 Dmesg output:
@@ -6613,7 +6928,7 @@ while (!cond)
 Count from `0` to `9` infinitely many times in 1 second intervals using timers:
 
 ....
-insmod /timer.ko
+insmod timer.ko
 ....
 
 Stop counting:
@@ -6640,7 +6955,7 @@ Bibliography:
 Brute force monitor every shared interrupt that will accept us:
 
 ....
-./run --eval-after 'insmod /irq.ko' --graphic
+./run --eval-after 'insmod irq.ko' --graphic
 ....
 
 Source: link:kernel_modules/irq.c[].
@@ -6775,7 +7090,7 @@ https://github.com/torvalds/linux/blob/v4.17/Documentation/core-api/kernel-api.r
 Convert a string to an integer:
 
 ....
-/kstrto.sh
+./kstrto.sh
 echo $?
 ....
 
@@ -6788,7 +7103,7 @@ Outcome: the test passes:
 Sources:
 
 * link:kernel_modules/kstrto.c[]
-* link:rootfs_overlay/kstrto.sh[]
+* link:rootfs_overlay/lkmc/kstrto.sh[]
 
 Bibliography: https://stackoverflow.com/questions/6139493/how-convert-char-to-int-in-linux-kernel/49811658#49811658
 
@@ -6797,7 +7112,7 @@ Bibliography: https://stackoverflow.com/questions/6139493/how-convert-char-to-in
 Convert a virtual address to physical:
 
 ....
-insmod /virt_to_phys.ko
+insmod virt_to_phys.ko
 cat /sys/kernel/debug/lkmc_virt_to_phys
 ....
 
@@ -6855,10 +7170,10 @@ In this section we will play with them.
 First get a virtual address to play with:
 
 ....
-/virt_to_phys_test.out &
+./posix/virt_to_phys_test.out &
 ....
 
-Source: link:userland/virt_to_phys_test.c[]
+Source: link:userland/posix/virt_to_phys_test.c[]
 
 Sample output:
 
@@ -6876,7 +7191,7 @@ The program:
 Then, translate the virtual address to physical using `/proc/<pid>/maps` and `/proc/<pid>/pagemap`:
 
 ....
-/virt_to_phys_user.out 110 0x600800
+./linux/virt_to_phys_user.out 110 0x600800
 ....
 
 Sample output physical address:
@@ -6885,9 +7200,9 @@ Sample output physical address:
 0x7c7b800
 ....
 
-Source: link:userland/virt_to_phys_user.c[]
+Source: link:userland/linux/virt_to_phys_user.c[]
 
-Now we can verify that `virt_to_phys_user.out` gave the correct physical address in the following ways:
+Now we can verify that `linux/virt_to_phys_user.out` gave the correct physical address in the following ways:
 
 * <<qemu-xp>>
 * <<dev-mem>>
@@ -6901,7 +7216,7 @@ Bibliography:
 
 The `xp` <<qemu-monitor>> command reads memory at a given physical address.
 
-First launch `virt_to_phys_user.out` as described at <<userland-physical-address-experiments>>.
+First launch `linux/virt_to_phys_user.out` as described at <<userland-physical-address-experiments>>.
 
 On a second terminal, use QEMU to read the physical address:
 
@@ -6924,7 +7239,7 @@ We could not find however to write to memory from the QEMU monitor, boring.
 
 `/dev/mem` exposes access to physical addresses, and we use it through the convenient `devmem` BusyBox utility.
 
-First launch `virt_to_phys_user.out` as described at <<userland-physical-address-experiments>>.
+First launch `linux/virt_to_phys_user.out` as described at <<userland-physical-address-experiments>>.
 
 Next, read from the physical address:
 
@@ -6953,7 +7268,7 @@ After one second, we see on the screen:
 
 ....
 i 9abcdef0
-[1]+  Done                       /virt_to_phys_test.out
+[1]+  Done                       ./posix/virt_to_phys_test.out
 ....
 
 so the value changed, and the `while` loop exited!
@@ -6975,35 +7290,35 @@ Bibliography: https://stackoverflow.com/questions/11891979/how-to-access-mmaped-
 
 Dump the physical address of all pages mapped to a given process using `/proc/<pid>/maps` and `/proc/<pid>/pagemap`.
 
-First launch `virt_to_phys_user.out` as described at <<userland-physical-address-experiments>>. Suppose that the output was:
+First launch `linux/virt_to_phys_user.out` as described at <<userland-physical-address-experiments>>. Suppose that the output was:
 
 ....
-# /virt_to_phys_test.out &
+# ./posix/virt_to_phys_test.out &
 vaddr 0x601048
 pid 63
-# /virt_to_phys_user.out 63 0x601048
+# ./linux/virt_to_phys_user.out 63 0x601048
 0x1a61048
 ....
 
 Now obtain the page map for the process:
 
 ....
-/pagemap_dump.out 63
+./linux/pagemap_dump.out 63
 ....
 
 Sample output excerpt:
 
 ....
 vaddr pfn soft-dirty file/shared swapped present library
-400000 1ede 0 1 0 1 /virt_to_phys_test.out
-600000 1a6f 0 0 0 1 /virt_to_phys_test.out
-601000 1a61 0 0 0 1 /virt_to_phys_test.out
+400000 1ede 0 1 0 1 ./posix/virt_to_phys_test.out
+600000 1a6f 0 0 0 1 ./posix/virt_to_phys_test.out
+601000 1a61 0 0 0 1 ./posix/virt_to_phys_test.out
 602000 2208 0 0 0 1 [heap]
 603000 220b 0 0 0 1 [heap]
 7ffff78ec000 1fd4 0 1 0 1 /lib/libuClibc-1.0.30.so
 ....
 
-Source: link:userland/pagemap_dump.c[]
+Source: link:userland/linux/pagemap_dump.c[]
 
 Adapted from: https://github.com/dwks/pagemap/blob/8a25747bc79d6080c8b94eac80807a4dceeda57a/pagemap2.c
 
@@ -7012,7 +7327,7 @@ Meaning of the flags:
 * `vaddr`: first virtual address of a page the belongs to the process. Notably:
 +
 ....
-./run-toolchain readelf -- -l "$(./getvar userland_build_dir)/virt_to_phys_test.out"
+./run-toolchain readelf -- -l "$(./getvar userland_build_dir)/posix/virt_to_phys_test.out"
 ....
 +
 contains:
@@ -7043,7 +7358,7 @@ Three zeroes is 12 bits which is 4kB, which is the size of a page.
 +
 For example, the virtual address `0x601000` has `pfn` of `0x1a61`, which means that its physical address is `0x1a61000`
 +
-This is consistent with what `virt_to_phys_user.out` told us: the virtual address `0x601048` has physical address `0x1a61048`.
+This is consistent with what `linux/virt_to_phys_user.out` told us: the virtual address `0x601048` has physical address `0x1a61048`.
 +
 `048` corresponds to the three last zeroes, and is the offset within the page.
 +
@@ -7084,7 +7399,7 @@ Logs proc events such as process creation to a link:kernel_modules/netlink.c[net
 We then have a userland program that listens to the events and prints them out:
 
 ....
-# /proc_events.out &
+# ./linux/proc_events.out &
 # set mcast listen ok
 # sleep 2 & sleep 1
 fork: parent tid=48 pid=48 -> child tid=79 pid=79
@@ -7098,7 +7413,7 @@ a
 #
 ....
 
-Source: link:userland/proc_events.c[]
+Source: link:userland/linux/proc_events.c[]
 
 TODO: why `exit: tid=79` shows after `exit: tid=80`?
 
@@ -7107,7 +7422,7 @@ Note how `echo a` is a Bash built-in, and therefore does not spawn a new process
 TODO: why does this produce no output?
 
 ....
-/proc_events.out >f &
+./linux/proc_events.out >f &
 ....
 
 * https://stackoverflow.com/questions/6075013/detect-launching-of-programs-on-linux-platform/8255487#8255487
@@ -7293,7 +7608,7 @@ kprobes is an instrumentation mechanism that injects arbitrary code at a given a
 Then on guest:
 
 ....
-insmod /kprobe_example.ko
+insmod kprobe_example.ko
 sleep 4 & sleep 4 &'
 ....
 
@@ -7311,7 +7626,7 @@ Source: link:kernel_modules/kprobe_example.c[]
 TODO: it does not work if I try to immediately launch `sleep`, why?
 
 ....
-insmod /kprobe_example.ko
+insmod kprobe_example.ko
 sleep 4 & sleep 4 &
 ....
 
@@ -7422,7 +7737,7 @@ TODO `--arch arm` and `--arch aarch64` does not count firmware instructions prop
 * We can also discount the instructions after `init` runs by using `readelf` to get the initial address of `init`. One easy way to do that now is to just run:
 +
 ....
-./run-gdb-user "$(./getvar userland_build_dir)/poweroff.out" main
+./run-gdb-user "$(./getvar userland_build_dir)/linux/poweroff.out" main
 ....
 +
 And get that from the traces, e.g. if the address is `4003a0`, then we search:
@@ -7455,7 +7770,7 @@ Detects buffer overflows for us:
 ./build-modules --clean
 ./build-modules
 ./build-buildroot
-./run --eval-after 'insmod /strlen_overflow.ko' --linux-build-id fortify
+./run --eval-after 'insmod strlen_overflow.ko' --linux-build-id fortify
 ....
 
 Possible dmesg output:
@@ -7473,7 +7788,7 @@ You may not get this error because this depends on `strlen` overflowing at least
 TODO not always reproducible. Find a more reproducible failure. I could not observe it on:
 
 ....
-insmod /memcpy_overflow.ko
+insmod memcpy_overflow.ko
 ....
 
 Source: link:kernel_modules/strlen_overflow.c[]
@@ -7574,7 +7889,7 @@ TODO how to write to registers. Currently using `/dev/mem` and `lspci`.
 This example should handle interrupts from userland and print a message to stdout:
 
 ....
-/uio_read.sh
+./uio_read.sh
 ....
 
 TODO: what is the expected behaviour? I should have documented this when I wrote this stuff, and I'm that lazy right now that I'm in the middle of a refactor :-)
@@ -7587,8 +7902,8 @@ UIO interface in a nutshell:
 
 Sources:
 
-* link:userland/uio_read.c[]
-* link:rootfs_overlay/uio_read.sh[]
+* link:userland/kernel_modules/uio_read.c[]
+* link:rootfs_overlay/lkmc/uio_read.sh[]
 
 Bibliography:
 
@@ -7715,10 +8030,10 @@ echo 0 > /proc/sys/kernel/ctrl-alt-del
 Minimal example:
 
 ....
-./run --kernel-cli 'init=/ctrl_alt_del.out' --graphic
+./run --kernel-cli 'init=/lkmc/linux/ctrl_alt_del.out' --graphic
 ....
 
-Source: link:userland/ctrl_alt_del.c[]
+Source: link:userland/linux/ctrl_alt_del.c[]
 
 When you hit `Ctrl-Alt-Del` in the guest, our tiny init handles a `SIGINT` sent by the kernel and outputs to stdout:
 
@@ -7816,7 +8131,7 @@ In order to play with TTYs, do this:
 
 ....
 printf '
-tty2::respawn:/sbin/getty -n -L -l /loginroot.sh tty2 0 vt100
+tty2::respawn:/sbin/getty -n -L -l /lkmc/loginroot.sh tty2 0 vt100
 tty3::respawn:-/bin/sh
 tty4::respawn:/sbin/getty 0 tty4
 tty63::respawn:-/bin/sh
@@ -7876,7 +8191,7 @@ The trailing dash `-` can be used on any command. It makes the command that foll
 The `getty` executable however also does this operation and therefore dispenses the `-`.
 * `/sbin/getty` asks for password, and then gives you an `sh`
 +
-We can overcome the password prompt with the `-l /loginroot.sh` technique explained at: https://askubuntu.com/questions/902998/how-to-check-which-tty-am-i-using but I don't see any advantage over `-/bin/sh` currently.
+We can overcome the password prompt with the `-l /lkmc/loginroot.sh` technique explained at: https://askubuntu.com/questions/902998/how-to-check-which-tty-am-i-using but I don't see any advantage over `-/bin/sh` currently.
 
 Identify the current TTY with the command:
 
@@ -7918,10 +8233,10 @@ See also: https://stackoverflow.com/questions/16706423/two-instances-of-busybox-
 Get the TTY in bulk for all processes:
 
 ....
-/psa.sh
+./psa.sh
 ....
 
-Source: link:rootfs_overlay/psa.sh[].
+Source: link:rootfs_overlay/lkmc/psa.sh[].
 
 The TTY appears under the `TT` section, which is enabled by `-o tty`. This shows the TTY device number, e.g.:
 
@@ -7938,7 +8253,7 @@ ls -l /dev/tty1
 Next try:
 
 ....
-insmod /kthread.ko
+insmod kthread.ko
 ....
 
 and switch between virtual terminals, to understand that the dmesg goes to whatever current virtual terminal you are on, but not the others, and not to the serial terminals.
@@ -7980,14 +8295,13 @@ Outcome: `Alt-Right` cycles between three TTYs, `tty1` being the default one tha
 `man 2 setsid` says that there is only one failure possibility:
 
 ____
-
 EPERM  The process group ID of any process equals the PID of the calling process.  Thus, in particular, setsid() fails if the calling process is already a process group leader.
 ____
 
 We can get some visibility into it to try and solve the problem with:
 
 ....
-/psa.sh
+./psa.sh
 ....
 
 ===== console kernel boot parameter
@@ -8031,11 +8345,11 @@ DRM / DRI is the new interface that supersedes `fbdev`:
 
 ....
 ./build-buildroot --config 'BR2_PACKAGE_LIBDRM=y'
-./build-userland --has-package libdrm -- libdrm_modeset
-./run --eval-after '/libdrm_modeset.out' --graphic
+./build-userland --package libdrm -- userland/libs/libdrm/modeset.c
+./run --eval-after './libs/libdrm/modeset.out' --graphic
 ....
 
-Source: link:userland/libdrm_modeset.c[]
+Source: link:userland/libs/libdrm/modeset.c[]
 
 Outcome: for a few seconds, the screen that contains the terminal gets taken over by changing colors of the rainbow.
 
@@ -8043,9 +8357,9 @@ TODO not working for `aarch64`, it takes over the screen for a few seconds and t
 
 ....
 ./build-buildroot --config 'BR2_PACKAGE_LIBDRM=y'
-./build-userland --has-package libdrm
+./build-userland --package libdrm
 ./build-buildroot
-./run --eval-after '/libdrm_modeset.out' --graphic
+./run --eval-after './libs/libdrm/modeset.out' --graphic
 ....
 
 <<kmscube>> however worked, which means that it must be a bug with this demo?
@@ -8070,7 +8384,7 @@ Try creating new displays:
 to see multiple `/dev/dri/cardN`, and then use a different display with:
 
 ....
-./run --eval-after '/libdrm_modeset.out' --graphic
+./run --eval-after './libs/libdrm/modeset.out' --graphic
 ....
 
 Bibliography:
@@ -8168,13 +8482,11 @@ When I build it on Ubuntu 18.04 host, it does not generate any executable, so I'
 
 Bibliography: https://stackoverflow.com/questions/3177338/how-is-the-linux-kernel-tested
 
-==== LTP
-
-Linux Test Project
+==== Linux Test Project
 
 https://github.com/linux-test-project/ltp
 
-C userland test suite.
+Tests a lot of Linux and POSIX userland visible interfaces.
 
 Buildroot already has a package, so it is trivial to build it:
 
@@ -8182,16 +8494,27 @@ Buildroot already has a package, so it is trivial to build it:
 ./build-buildroot --config 'BR2_PACKAGE_LTP_TESTSUITE=y'
 ....
 
-Then try it out with:
+So now let's try and see if the `exit` system call is working:
 
 ....
-cd /usr/lib/ltp-testsuite/testcases
-./bin/write01
+/usr/lib/ltp-testsuite/testcases/bin/exit01
 ....
 
-There is a main executable `execltp` to run everything, but it depends on Python, so let's just run them manually.
+which gives successful output:
 
-TODO a large chunk of tests, the Open POSIX test suite, is disabled with a comment on Buildroot master saying build failed: https://github.com/buildroot/buildroot/blob/3f37dd7c3b5eb25a41edc6f72ba73e5a21b07e9b/package/ltp-testsuite/ltp-testsuite.mk#L13 However, both tickets mentioned there were closed, so we should try it out and patch Buildroot if it works now.
+....
+exit01      1  TPASS  :  exit() test PASSED
+....
+
+and has source code at: https://github.com/linux-test-project/ltp/blob/20190115/testcases/kernel/syscalls/exit/exit01.c
+
+Besides testing any kernel modifications you make, LTP can also be used to test the system call implementation of <<user-mode-simulation>> as shown at <<user-mode-buildroot-executables>>:
+
+....
+./run --userland "$(./getvar buildroot_target_dir)/usr/lib/ltp-testsuite/testcases/bin/exit01"
+....
+
+Tested at: 287c83f3f99db8c1ff9bbc85a79576da6a78e986 + 1.
 
 ==== stress
 
@@ -8354,7 +8677,7 @@ qcow2 filesystems must be used for that to work.
 To test it out, login into the VM with and run:
 
 ....
-./run --eval-after 'umount /mnt/9p/*;/count.sh'
+./run --eval-after 'umount /mnt/9p/*;./count.sh'
 ....
 
 On another shell, take a snapshot:
@@ -8479,7 +8802,7 @@ PCI driver for our minimal `pci_min.c` QEMU fork device:
 then:
 
 ....
-insmod /pci_min.ko
+insmod pci_min.ko
 ....
 
 Sources:
@@ -8516,7 +8839,7 @@ Probe already does a MMIO write, which generates an IRQ and tests everything.
 Small upstream educational PCI device:
 
 ....
-/qemu_edu.sh
+./qemu_edu.sh
 ....
 
 This tests a lot of features of the edu device, to understand the results, compare the inputs with the documentation of the hardware: https://github.com/qemu/qemu/blob/v2.12.0/docs/specs/edu.txt
@@ -8525,7 +8848,7 @@ Sources:
 
 * kernel module: link:kernel_modules/qemu_edu.c[]
 * QEMU device: https://github.com/qemu/qemu/blob/v2.12.0/hw/misc/edu.c
-* test script: link:rootfs_overlay/qemu_edu.sh[]
+* test script: link:rootfs_overlay/lkmc/qemu_edu.sh[]
 
 Works because we add to our default QEMU CLI:
 
@@ -8536,7 +8859,7 @@ Works because we add to our default QEMU CLI:
 This example uses:
 
 * the QEMU `edu` educational device, which is a minimal educational in-tree PCI example
-* out `/pci.ko` kernel module, which exercises the `edu` hardware.
+* the `pci.ko` kernel module, which exercises the `edu` hardware.
 +
 I've contacted the awesome original author author of `edu` link:https://github.com/jirislaby[Jiri Slaby], and he told there is no official kernel module example because this was created for a kernel module university course that he gives, and he didn't want to give away answers. link:https://github.com/cirosantilli/how-to-teach-efficiently[I don't agree with that philosophy], so students, cheat away with this repo and go make startups instead.
 
@@ -8649,7 +8972,7 @@ followed by a trace.
 Next, also try using our <<irq-ko>> IRQ monitoring module before triggering the interrupt:
 
 ....
-insmod /irq.ko
+insmod irq.ko
 devmem 0xfeb54000 w 0x12345678
 ....
 
@@ -8772,10 +9095,10 @@ Then compile with:
 then test it out with:
 
 ....
-/gpio.sh
+./gpio.sh
 ....
 
-Source: link:rootfs_overlay/gpio.sh[]
+Source: link:rootfs_overlay/lkmc/gpio.sh[]
 
 Buildroot's Linux tools package provides some GPIO CLI tools: `lsgpio`, `gpio-event-mon`, `gpio-hammer`, TODO document them here.
 
@@ -8861,13 +9184,13 @@ We can also observe the interrupt with <<dummy-irq>>:
 
 ....
 modprobe dummy-irq irq=34
-insmod /platform_device.ko
+insmod platform_device.ko
 ....
 
 The IRQ number `34` was found by on the dmesg after:
 
 ....
-insmod /platform_device.ko
+insmod platform_device.ko
 ....
 
 Bibliography: https://stackoverflow.com/questions/28315265/how-to-add-a-new-device-in-qemu-source-code/44612957#44612957
@@ -8880,9 +9203,9 @@ http://gedare-csphd.blogspot.co.uk/2013/02/adding-simple-io-device-to-gem5.html
 
 === QEMU monitor
 
-The QEMU monitor is a terminal that allows you to send text commands to the QEMU VM: https://en.wikibooks.org/wiki/QEMU/Monitor
+The QEMU monitor is a magic terminal that allows you to send text commands to the QEMU VM itself: https://en.wikibooks.org/wiki/QEMU/Monitor
 
-On another terminal, run:
+While QEMU is running, on another terminal, run:
 
 ....
 ./qemu-monitor
@@ -8904,7 +9227,7 @@ Source: link:qemu-monitor[]
 
 `qemu-monitor` uses the `-monitor` QEMU command line option, which makes the monitor listen from a socket.
 
-Alternatively, from text mode:
+Alternatively, we can also enter the QEMU monitor from inside `-nographic` <<qemu-text-mode>> with:
 
 ....
 Ctrl-A C
@@ -8919,7 +9242,7 @@ Ctrl-A C
 * http://stackoverflow.com/questions/14165158/how-to-switch-to-qemu-monitor-console-when-running-with-curses
 * https://superuser.com/questions/488263/how-to-switch-to-the-qemu-control-panel-with-nographics
 
-And in graphic mode from the GUI:
+When in graphic mode, we can do it from the GUI:
 
 ....
 Ctrl-Alt ?
@@ -8927,6 +9250,20 @@ Ctrl-Alt ?
 
 where `?` is a digit `1`, or `2`, or, `3`, etc. depending on what else is available on the GUI: serial, parallel and frame buffer.
 
+Finally, we can also access QEMU monitor commands directly from <<gdb>> with the `monitor` command:
+
+....
+./run-gdb
+....
+
+then inside that shell:
+
+....
+monitor info qtree
+....
+
+This way you can use both QEMU monitor and GDB commands to inspect the guest from inside a single shell! Pretty awesome.
+
 In general, `./qemu-monitor` is the best option, as it:
 
 * works on both modes
@@ -8981,13 +9318,13 @@ run
 And in QEMU:
 
 ....
-/qemu_edu.sh
+./qemu_edu.sh
 ....
 
 Or for a faster development loop:
 
 ....
-./run --debug-vm --debug-vm-args '-ex "break edu_mmio_read" -ex "run"'
+./run --debug-vm-args '-ex "break edu_mmio_read" -ex "run"'
 ....
 
 When in <<qemu-text-mode>>, using `--debug-vm` makes Ctrl-C not get passed to the QEMU guest anymore: it is instead captured by GDB itself, so allow breaking. So e.g. you won't be able to easily quit from a guest program like:
@@ -9101,7 +9438,7 @@ QEMU also has a second trace mechanism in addition to `-trace`, find out the eve
 Let's pick the one that dumps executed instructions, `in_asm`:
 
 ....
-./run --eval '/poweroff.out' -- -D out/trace.txt -d in_asm
+./run --eval './linux/poweroff.out' -- -D out/trace.txt -d in_asm
 less out/trace.txt
 ....
 
@@ -9150,6 +9487,8 @@ We can further use Binutils' `addr2line` to get the line that corresponds to eac
 less "$(./getvar --arch x86_64 run_dir)/trace-lines.txt"
 ....
 
+The last command takes several seconds.
+
 The format is as follows:
 
 ....
@@ -9177,15 +9516,15 @@ This awesome feature allows you to examine a single run as many times as you wou
 
 ....
 # Record a run.
-./run --eval-after '/rand_check.out;/poweroff.out;' --record
+./run --eval-after './linux/rand_check.out;./linux/poweroff.out;' --record
 # Replay the run.
-./run --eval-after '/rand_check.out;/poweroff.out;' --replay
+./run --eval-after './linux/rand_check.out;./linux/poweroff.out;' --replay
 ....
 
 A convenient shortcut to do both at once to test the feature is:
 
 ....
-./qemu-rr --eval-after '/rand_check.out;/poweroff.out;'
+./qemu-rr --eval-after './linux/rand_check.out;./linux/poweroff.out;'
 ....
 
 By comparing the terminal output of both runs, we can see that they are the exact same, including things which normally differ across runs:
@@ -9212,7 +9551,7 @@ EXT4-fs (sda): re-mounted. Opts: block_validity,barrier,user_xattr
 TODO replay with network gets stuck:
 
 ....
-./qemu-rr --eval-after 'ifup -a;wget -S google.com;/poweroff.out;'
+./qemu-rr --eval-after 'ifup -a;wget -S google.com;./linux/poweroff.out;'
 ....
 
 after the message:
@@ -9231,7 +9570,7 @@ Then, when I tried with <<initrd>> and no disk:
 
 ....
 ./build-buildroot --arch aarch64 --initrd
-./qemu-rr --arch aarch64 --eval-after '/rand_check.out;/poweroff.out;' --initrd
+./qemu-rr --arch aarch64 --eval-after './linux/rand_check.out;./linux/poweroff.out;' --initrd
 ....
 
 QEMU crashes with:
@@ -9251,8 +9590,8 @@ TODO get working.
 QEMU replays support checkpointing, and this allows for a simplistic "reverse debugging" implementation proposed at https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00478.html on the unmerged link:https://github.com/ispras/qemu/tree/rr-180725[]:
 
 ....
-./run --eval-after '/rand_check.out;/poweroff.out;' --record
-./run --eval-after '/rand_check.out;/poweroff.out;' --replay --wait-gdb
+./run --eval-after './linux/rand_check.out;./linux/poweroff.out;' --record
+./run --eval-after './linux/rand_check.out;./linux/poweroff.out;' --replay --gdb-wait
 ....
 
 On another shell:
@@ -9278,7 +9617,7 @@ and we are back at `start_kernel`
 TODO: is there any way to distinguish which instruction runs on each core? Doing:
 
 ....
-./run --arch x86_64 --cpus 2 --eval '/poweroff.out' --trace exec_tb
+./run --arch x86_64 --cpus 2 --eval './linux/poweroff.out' --trace exec_tb
 ./qemu-trace2txt
 ....
 
@@ -9749,31 +10088,31 @@ https://stackoverflow.com/questions/6147242/heap-vs-binary-search-tree-bst/29548
 Usage:
 
 ....
+./build-userland \
+  --arch aarch64 \
+  --ccflags='-DLKMC_M5OPS_ENABLE=1' \
+  --force-build cpp/bst_vs_heap \
+  --static \
+;
 ./run \
   --arch aarch64 \
-  --eval-after '/gem5.sh' \
   --emulator gem5 \
-  --gem5-readfile '/bst_vs_heap.out' \
+  --static \
+  --userland userland/cpp/bst_vs_heap.cpp \
+  --userland-args='1000' \
 ;
-./bst-vs-heap --arch aarch64 --emulator gem5 > bst_vs_heap.dat
+./bst-vs-heap --arch aarch64 > bst_vs_heap.dat
+./bst-vs-heap.gnuplot
+xdg-open bst-vs-heap.tmp.png
 ....
 
-and then feed `bst_vs_heap.dat` into: https://github.com/cirosantilli/cpp-cheat/blob/9d0f77792fc8e55b20b6ee32018761ef3c5a3f2f/cpp/interactive/bst_vs_heap.gnuplot
-
 Sources:
 
+* link:userland/cpp/bst_vs_heap.cpp[]
 * link:bst-vs-heap[]
-* link:userland/bst_vs_heap.cpp[]
+* link:bst-vs-heap.gnuplot[]
 
-===== OpenMP
-
-Implemented by GCC itself, so just a toolchain configuration, no external libs, and we enable it by default:
-
-....
-/openmp.out
-....
-
-Source: link:userland/openmp.c[]
+Tested on e70103b9b32e6e33dbab9eaf2ff00c358f55d8db + 1 with the workaround patch mentioned at: <<fatal-kernel-too-old>>.
 
 ===== BLAS
 
@@ -9781,8 +10120,8 @@ Buildroot supports it, which makes everything just trivial:
 
 ....
 ./build-buildroot --config 'BR2_PACKAGE_OPENBLAS=y'
-./build-userland --has-package openblas -- openblas_hello
-./run --eval-after '/openblas_hello.out; echo $?'
+./build-userland --package openblas -- userland/libs/openblas/hello.c
+./run --eval-after './libs/openblas/hello.out; echo $?'
 ....
 
 Outcome: the test passes:
@@ -9791,7 +10130,7 @@ Outcome: the test passes:
 0
 ....
 
-Source: link:userland/openblas.c[]
+Source: link:userland/libs/openblas/hello.c[]
 
 The test performs a general matrix multiplication:
 
@@ -9821,13 +10160,13 @@ Header only linear algebra library with a mainline Buildroot package:
 
 ....
 ./build-buildroot --config 'BR2_PACKAGE_EIGEN=y'
-./build-userland --has-package eigen -- eigen_hello
+./build-userland --package eigen -- userland/libs/eigen/hello.cpp
 ....
 
 Just create an array and print it:
 
 ....
-./run --eval-after '/eigen_hello.out'
+./run --eval-after './libs/eigen/hello.out'
 ....
 
 Output:
@@ -9837,7 +10176,7 @@ Output:
 2.5 1.5
 ....
 
-Source: link:userland/eigen_hello.cpp[]
+Source: link:userland/libs/eigen/hello.cpp[]
 
 This example just creates a matrix and prints it out.
 
@@ -10011,7 +10350,7 @@ You may also want to test if your patches are still functionally correct inside
 Analogous <<kernel-command-line-parameters,to QEMU>>:
 
 ....
-./run --arch arm --kernel-cli 'init=/poweroff.out' --emulator gem5
+./run --arch arm --kernel-cli 'init=/lkmc/linux/poweroff.out' --emulator gem5
 ....
 
 Internals: when we give `--command-line=` to gem5, it overrides default command lines, including some mandatory ones which are required to boot properly.
@@ -10037,7 +10376,7 @@ Kernel command line:
 Analogous <<gdb,to QEMU>>, on the first shell:
 
 ....
-./run --arch arm --wait-gdb --emulator gem5
+./run --arch arm --emulator gem5 --gdb-wait
 ....
 
 On the second shell:
@@ -10054,7 +10393,7 @@ On a third shell:
 
 When you want to break, just do a `Ctrl-C` on GDB shell, and then `continue`.
 
-And we now see the boot messages, and then get a shell. Now try the `/count.sh` procedure described for QEMU: <<gdb-step-debug-kernel-post-boot>>.
+And we now see the boot messages, and then get a shell. Now try the `./count.sh` procedure described for QEMU: <<gdb-step-debug-kernel-post-boot>>.
 
 ==== gem5 GDB step debug userland process
 
@@ -10064,7 +10403,7 @@ The alternative is to do as in <<gdb-step-debug-userland-processes>>.
 
 Next, follow the exact same steps explained at <<gdb-step-debug-userland-non-init-without--d>>, but passing `-g` to every command as usual.
 
-But then TODO (I'll still go crazy one of those days): for `arm`, while debugging `/myinsmod.out /hello.ko`, after then line:
+But then TODO (I'll still go crazy one of those days): for `arm`, while debugging `./linux/myinsmod.out hello.ko`, after the line:
 
 ....
 23     if (argc < 3) {
@@ -10196,7 +10535,7 @@ printf 'sh' > "$(./getvar gem5_readfile)"
 
 Since this is such a common setup, we provide some helpers for it as described at <<gem5-run-benchmark>>:
 
-* link:rootfs_overlay/gem5.sh[rootfs_overlay/gem5.sh]. This script is analogous to gem5's in-tree link:https://github.com/gem5/gem5/blob/2b4b94d0556c2d03172ebff63f7fc502c3c26ff8/configs/boot/hack_back_ckpt.rcS[hack_back_ckpt.rcS], but with less noise.
+* link:rootfs_overlay/lkmc/gem5.sh[]. This script is analogous to gem5's in-tree link:https://github.com/gem5/gem5/blob/2b4b94d0556c2d03172ebff63f7fc502c3c26ff8/configs/boot/hack_back_ckpt.rcS[hack_back_ckpt.rcS], but with less noise.
 * `./run --gem5-readfile` is a convenient way to set the `m5 readfile`
 
 Other loophole possibilities include:
@@ -10436,29 +10775,44 @@ adsf
 
 ==== m5ops instructions
 
-The executable `/m5ops.out` illustrates how to hard code with inline assembly the m5ops that you are most likely to hack into the benchmark you are analysing:
+gem5 allocates some magic instructions on unused instruction encodings for convenient guest instrumentation.
+
+Those instructions are exposed through the <<m5>> in-tree executable.
+
+To make things simpler to understand, you can play around with our own minimized educational `m5` subset link:userland/c/m5ops.c[].
+
+The instructions used by `./c/m5ops.out` are present in link:lkmc/m5ops.h[] in a very simple to understand and reuse inline assembly form.
+
+To use that file, first rebuild `m5ops.out` with the m5ops instructions enabled and install it on the root filesystem:
+
+....
+./build-userland \
+  --arch aarch64 \
+  --ccflags='-DLKMC_M5OPS_ENABLE=1' \
+  --force-build c/m5ops \
+  --static \
+;
+./build-buildroot --arch aarch64
+....
+
+We don't enable `-DLKMC_M5OPS_ENABLE=1` by default on userland executables because we try to use a single image for gem5, QEMU and <<userland-setup-getting-started-natively,native>>, and those instructions would break the latter two. We enable it in the <<baremetal-setup>> by default since we already have different images for QEMU and gem5 there.
+
+Then, from inside <<gem5-buildroot-setup>>, test it out with:
 
 ....
 # checkpoint
-/m5ops.out c
+./c/m5ops.out c
 
 # dumpstats
-/m5ops.out d
+./c/m5ops.out d
 
 # exit
-/m5ops.out e
+./c/m5ops.out e
 
 # dump resetstats
-/m5ops.out r
+./c/m5ops.out r
 ....
 
-Sources:
-
-* link:userland/m5ops.h[]
-* link:userland/m5ops.c[]
-
-That executable is of course a subset of <<m5>> and useless by itself: its goal is only illustrate how to hardcode some <<m5ops>> yourself as one-liners.
-
 In theory, the cleanest way to add m5ops to your benchmarks would be to do exactly what the `m5` tool does:
 
 * include link:https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/include/gem5/asm/generic/m5ops.h[`include/gem5/asm/generic/m5ops.h`]
@@ -10645,39 +10999,7 @@ system.cpu.dtb.inst_misses
 system.cpu.dtb.inst_hits
 ....
 
-==== rdtsc
-
-Let's have some fun and try to correlate the gem5 cycle count `system.cpu.numCycles` with the link:https://en.wikipedia.org/wiki/Time_Stamp_Counter[x86 `rdtsc` instruction] that is supposed to do the same thing:
-
-....
-./build-userland -- rdtsc
-./run --eval '/rdtsc.out;m5 exit;' --emulator gem5
-./gem5-stat
-....
-
-Source: link:userland/rdtsc.c[]
-
-`rdtsc` outputs a cycle count which we compare with gem5's `gem5-stat`:
-
-* `3828578153`: `rdtsc`
-* `3830832635`: `gem5-stat`
-
-which gives pretty close results, and serve as a nice sanity check that the cycle counter is coherent.
-
-It is also nice to see that `rdtsc` is a bit smaller than the `stats.txt` value, since the latter also includes the exec syscall for `m5`.
-
-Bibliography:
-
-* https://en.wikipedia.org/wiki/Time_Stamp_Counter
-* https://stackoverflow.com/questions/9887839/clock-cycle-count-wth-gcc/9887979
-
-===== pmccntr
-
-TODO We didn't manage to find a working ARM analogue to <<rdtsc>>: link:kernel_modules/pmccntr.c[] is oopsing, and even it if weren't, it likely won't give the cycle count since boot since it needs to be activate before it starts counting anything:
-
-* https://stackoverflow.com/questions/40454157/is-there-an-equivalent-instruction-to-rdtsc-in-arm
-* https://stackoverflow.com/questions/31620375/arm-cortex-a7-returning-pmccntr-0-in-kernel-mode-and-illegal-instruction-in-u/31649809#31649809
-* https://blog.regehr.org/archives/794
+For x86, it is interesting to try and correlate `numCycles` with:
 
 ==== config.ini
 
@@ -11078,7 +11400,7 @@ Note that dots cannot be used as in `1.5G`, so just use Megs as in `1500M` inste
 Unfortunately, TODO we don't have a perfect way to find the right value for `BR2_TARGET_ROOTFS_EXT2_SIZE`. One good heuristic is:
 
 ....
-du -hsx "$(./getvar --arch arm target_dir)"
+du -hsx "$(./getvar --arch arm buildroot_target_dir)"
 ....
 
 Some promising ways to overcome this problem include:
@@ -11209,30 +11531,1437 @@ git -C "$(./getvar buildroot_source_dir)" grep 'depends on BR2_TOOLCHAIN_USES_GL
 
 One "downside" of glibc is that it exercises much more kernel functionality on its more bloated pre-main init, which breaks user mode C hello worlds more often, see: <<user-mode-simulation-with-glibc>>. I quote "downside" because glibc is actually exposing emulator bugs which we should actually go and fix.
 
+== Userland content
+
+This section contains userland content, such as <<c>>, <<cpp>> and <<posix>> examples.
+
+This content makes up the bulk of the link:userland/[] directory.
+
+Getting started at: <<userland-setup>>
+
+The quickest way to run the arch agnostic examples, which comprise the majority of the examples, is natively with: <<userland-setup-getting-started-natively>>
+
+This section was originally moved in here from: https://github.com/cirosantilli/cpp-cheat
+
+=== C
+
+Programs under link:userland/c/[] are examples of link:https://en.wikipedia.org/wiki/ANSI_C[ANSI C] programming:
+
+* Standard library
+** assert.h
+*** link:userland/c/assert_fail.c[]
+
+Userland assembly content is located at: <<userland-assembly>>. It was split from this section basically because we were hitting the HTML `h6` limit, stupid web :-)
+
+==== GCC C extensions
+
+===== C empty struct
+
+Example: link:userland/gcc/empty_struct.c[]
+
+Documentation: https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Empty-Structures.html#Empty-Structures
+
+Question: https://stackoverflow.com/questions/24685399/c-empty-struct-what-does-this-mean-do
+
+===== OpenMP
+
+GCC implements the OpenMP threading API, which is compared to pthreads at: https://stackoverflow.com/questions/3949901/pthreads-vs-openmp
+
+Example: link:userland/gcc/openmp.c[]
+
+The implementation is built into GCC itself. It is enabled at GCC compile time by `BR2_GCC_ENABLE_OPENMP=y` on Buildroot, and at program compile time by `-fopenmp`.
+
+It seems to be easier to use for compute parallelism and more language agnostic than POSIX threads.
+
+pthreads are more versatile though and allow for a superset of OpenMP.
+
+The implementation lives under `libgomp` in the GCC tree, and is documented at: https://gcc.gnu.org/onlinedocs/libgomp/
+
+`strace` shows that OpenMP makes `clone()` syscalls in Linux. TODO: does it actually call `pthread_` functions, or does it make syscalls directly? Or in other words, can it work on <<freestanding-programs>>? A quick grep shows many references to pthreads.
+
+[[cpp]]
+=== C++
+
+Programs under link:userland/cpp/[] are examples of link:https://en.wikipedia.org/wiki/C%2B%2B#Standardization[ISO C++] programming.
+
+=== POSIX
+
+Programs under link:userland/posix/[] are examples of POSIX C programming.
+
+What is POSIX:
+
+* https://stackoverflow.com/questions/1780599/what-is-the-meaning-of-posix/31865755#31865755
+* https://unix.stackexchange.com/questions/11983/what-exactly-is-posix/220877#220877
+
+== Userland assembly
+
+Programs under `userland/arch/<arch>/` are examples of userland assembly programming.
+
+This section will document ISA agnostic concepts.
+
+ISA specifics are covered at:
+
+* <<x86-userland-assembly>> under link:userland/arch/x86_64/[], originally migrated from: https://github.com/cirosantilli/x86-assembly-cheat
+* <<arm-userland-assembly>>, originally migrated from https://github.com/cirosantilli/arm-assembly-cheat, under:
+** link:userland/arch/arm/[]
+** link:userland/arch/aarch64/[]
+
+Like other userland programs, these programs can be run as explained at: <<userland-setup>>.
+
+As a quick reminder, the fastest setups to get started are:
+
+* <<userland-setup-getting-started-natively>> if your host can run the examples, e.g. x86 example on an x86 host
+* <<userland-setup-getting-started-with-prebuilt-toolchain-and-qemu-user-mode>> otherwise
+
+However, as usual, it is saner to build your toolchain as explained at: <<qemu-user-mode-getting-started>>.
+
+The first example that you want to run for each arch is:
+
+....
+./run --userland userland/arch/<arch>/add.S
+....
+
+e.g.:
+
+....
+./run --userland userland/arch/x86_64/add.S
+....
+
+Sources:
+
+* link:userland/arch/x86_64/add.S[]
+* link:userland/arch/arm/add.S[]
+* link:userland/arch/aarch64/add.S[]
+
+These examples use the venerable ADD instruction to:
+
+* introduce the basics of how a given assembly language works: how many inputs / outputs there are, which operands are inputs and which are outputs, whether it can use memory or just registers, etc.
++
+It is then a big copy paste for most other data instructions.
+* verify that the venerable `add` instruction and our assertions are working
+
+Then, modify that program to make the assertion fail:
+
+....
+ASSERT_EQ(%rax, $4)
+....
+
+because 1 + 2 tends to equal 3 instead of 4.
+
+And then watch the assertion fail:
+
+....
+./build-userland
+./run --userland userland/arch/x86_64/add.S
+....
+
+with error message:
+
+....
+assert_eq_64 failed
+val1 0x3
+val2 0x4
+error: asm_main returned 1 at line 8
+....
+
+and notice how the error message gives both:
+
+* the actual assembly source line number where the failing assert was
+* the actual and expected values
+
+Other infrastructure sanity checks that you might want to look into include:
+
+* link:userland/arch/empty.S[]
+* `FAIL` tests
+** link:userland/arch/fail.S[]
+* `ASSERT_MEMCMP` tests
+** link:userland/arch/x86_64/lkmc_assert_memcmp_fail.S[]
+
+=== Assembly registers
+
+After seeing an <<userland-assembly,ADD hello world>>, you need to learn the general registers:
+
+* arm
+** link:userland/arch/arm/registers.S[]
+* aarch64
+** link:userland/arch/aarch64/registers.S[]
+** link:userland/arch/aarch64/pc.S[]
+
+Bibliography: <<armarm7>> A2.3 "ARM core registers".
+
+==== ARMv8 aarch64 x31 register
+
+Example: link:userland/arch/aarch64/x31.S[]
+
+There is no `x31` name, and the encoding can have two different names depending on the instruction:
+
+* `xzr`: zero register:
+** https://stackoverflow.com/questions/42788696/why-might-one-use-the-xzr-register-instead-of-the-literal-0-on-armv8
+** https://community.arm.com/processors/f/discussions/3185/wzr-xzr-register-s-purpose
+* `sp`: stack pointer
+
+To make things more confusing, some aliases can take either name, which makes them alias to different things, e.g. `mov` accepts both:
+
+....
+mov x0, sp
+mov x0, xzr
+....
+
+and the first one is an alias to `add` while the second is an alias to `orr`.
+
+The difference is documented on a per instruction basis. Instructions that encode 31 as SP say:
+
+....
+if d == 31 then
+  SP[] = result;
+else
+  X[d] = result;
+....
+
+For the instructions that don't say that, B1.2.1 "Registers in AArch64 state" implies the zero register:
+
+____
+In instruction encodings, the value 0b11111 (31) is used to indicate the ZR (zero register). This
+indicates that the argument takes the value zero, but does not indicate that the ZR is implemented
+as a physical register.
+____
+
+This is also described on <<armarm8>> C1.2.5 "Register names":
+
+____
+There is no register named W31 or X31.
+
+The name SP represents the stack pointer for 64-bit operands where an encoding of the value 31 in the
+corresponding register field is interpreted as a read or write of the current stack pointer. When instructions
+do not interpret this operand encoding as the stack pointer, use of the name SP is an error.
+
+The name XZR represents the zero register for 64-bit operands where an encoding of the value 31 in the
+corresponding register field is interpreted as returning zero when read or discarding the result when written.
+When instructions do not interpret this operand encoding as the zero register, use of the name XZR is an error
+____
+
+=== Floating point assembly
+
+Keep in mind that many ISAs started floating point as an optional thing, and it later got better integrated into the main CPU, side by side with SIMD.
+
+For this reason, there are sometimes multiple ways to do floating point operations in each ISA.
+
+Let's start as usual with floating point addition + register file:
+
+* arm
+** <<arm-vadd-instruction>>
+** <<arm-vfp-registers>>
+* aarch64
+** <<armv8-aarch64-fadd-instruction>>
+** <<armv8-aarch64-floating-point-registers>>
+
+=== SIMD assembly
+
+Much like ADD for non-SIMD, start learning SIMD instructions by looking at the integer and floating point SIMD ADD instructions of each ISA:
+
+* x86
+** <<x86-addpd-instruction>>
+** <<x86-paddq-instruction>>
+* arm
+** <<arm-vadd-instruction>>
+* aarch64
+** <<armv8-aarch64-add-vector-instruction>>
+** <<armv8-aarch64-fadd-instruction>>
+
+Then it is just a huge copy paste of infinite boring details:
+
+* <<x86-simd>>
+* <<arm-simd>>
+
+=== User vs system assembly
+
+By "userland assembly", we mean "the parts of the ISA which can be freely used from userland".
+
+Most ISAs are divided into a system and userland part, and running the system part requires elevated privileges such as <<ring0>> in x86.
+
+One big difference between both is that we can run userland assembly on <<userland-setup>>, which is easier to get running and debug.
+
+In particular, most userland assembly examples link to the C standard library: <<userland-assembly-c-standard-library>>.
+
+Userland assembly is generally simpler, and a pre-requisite for <<baremetal-setup>>.
+
+System-land assembly cheats will be put under: <<baremetal-setup>>.
+
+=== Userland assembly C standard library
+
+All examples except the <<freestanding-programs>> link to the C standard library.
+
+This allows using the C standard library for IO, which is very convenient and portable across host OSes.
+
+It also exposes other non-IO functionality that is very convenient such as `memcmp`.
+
+The C standard library infrastructure is implemented in the following files:
+
+* link:userland/arch/main.c[]
+* link:userland/arch/common.h[]
+* link:userland/arch/x86_64/common_arch.h[]
+* link:userland/arch/arm/common_arch.h[]
+* link:userland/arch/aarch64/common_arch.h[]
+
+==== Freestanding programs
+
+Unlike most of our other assembly examples, which use the C standard library for portability, examples under `freestanding/` directories don't link to the C standard library.
+
+As a result, those examples cannot do IO portably, and so they make raw system calls and can only be run on one given OS, e.g. Linux: <<linux-system-calls>>.
+
+Such executables are called freestanding because they don't execute the glibc initialization code, but rather start directly on our custom hand written assembly.
+
+In order to GDB step debug those executables, you will want to use `--no-continue`, e.g.:
+
+....
+./run --arch aarch64 --userland userland/arch/aarch64/freestanding/linux/hello.S --gdb-wait
+./run-gdb --arch aarch64 --no-continue --userland userland/arch/aarch64/freestanding/linux/hello.S
+....
+
+You are now left on the very first instruction of our tiny executable!
+
+=== GCC inline assembly
+
+Examples under `arch/<arch>/c/` directories show how to use inline assembly from higher level languages such as C:
+
+* x86_64
+** link:userland/arch/x86_64/c/inc.c[]
+** link:userland/arch/x86_64/c/add.c[]
+* arm
+** link:userland/arch/arm/c/inc.c[]
+** link:userland/arch/arm/c/inc_memory.c[]
+** link:userland/arch/arm/c/inc_memory_global.c[]
+** link:userland/arch/arm/c/add.c[]
+* aarch64
+** link:userland/arch/aarch64/c/earlyclobber.c[]
+** link:userland/arch/aarch64/c/inc.c[]
+** link:userland/arch/aarch64/c/multiline.cpp[]
+
+==== GCC inline assembly register variables
+
+Used notably in some of the <<linux-system-calls>> setups:
+
+* link:userland/arch/arm/reg_var.c[]
+* link:userland/arch/aarch64/reg_var.c[]
+* link:userland/arch/aarch64/reg_var_float.c[]
+
+In x86, this makes it possible to access registers not exposed with the one letter register constraints.
+
+In arm, it is the only way to achieve this effect: https://stackoverflow.com/questions/10831792/how-to-use-specific-register-in-arm-inline-assembler
+
+This feature is notably useful for making system calls from C, see: <<linux-system-calls>>.
+
+Documentation: https://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Explicit-Reg-Vars.html
+
+==== GCC inline assembly scratch registers
+
+How to use temporary registers in inline assembly:
+
+* x86_64
+** link:userland/arch/x86_64/c/scratch.c[]
+** link:userland/arch/x86_64/c/scratch_hardcode.c[]
+
+Bibliography: https://stackoverflow.com/questions/6682733/gcc-prohibit-use-of-some-registers/54963829#54963829
+
+==== GCC inline assembly early-clobbers
+
+An example of using the `&` early-clobber modifier: link:userland/arch/aarch64/c/earlyclobber.c[]
+
+More details at: https://stackoverflow.com/questions/15819794/when-to-use-earlyclobber-constraint-in-extended-gcc-inline-assembly/54853663#54853663
+
+The assertion may fail without it. It actually does fail in GCC 8.2.0.
+
+==== GCC inline assembly floating point ARM
+
+Not documented as of GCC 8.2, but possible: https://stackoverflow.com/questions/53960240/armv8-floating-point-output-inline-assembly
+
+* link:userland/arch/arm/c/inc_float.c[]
+* link:userland/arch/aarch64/c/inc_float.c[]
+
+=== Linux system calls
+
+The following <<userland-setup>> programs illustrate how to make system calls:
+
+* x86_64
+** link:userland/arch/x86_64/freestanding/linux/hello.S[]
+** link:userland/arch/x86_64/c/freestanding/linux/hello.c[]
+** link:userland/arch/x86_64/c/freestanding/linux/hello_regvar.c[]
+* arm
+** link:userland/arch/arm/freestanding/linux/hello.S[]
+** link:userland/arch/arm/c/freestanding/linux/hello.c[]
+* aarch64
+** link:userland/arch/aarch64/freestanding/linux/hello.S[]
+** link:userland/arch/aarch64/c/freestanding/linux/hello.c[]
+** link:userland/arch/aarch64/c/freestanding/linux/hello_clobbers.c[]
+
+Determining the ARM syscall numbers:
+
+* https://reverseengineering.stackexchange.com/questions/16917/arm64-syscalls-table
+* arm: https://github.com/torvalds/linux/blob/v4.17/arch/arm/tools/syscall.tbl
+* aarch64: https://github.com/torvalds/linux/blob/v4.17/include/uapi/asm-generic/unistd.h
+
+Determining the ARM syscall interface:
+
+* https://stackoverflow.com/questions/12946958/what-is-the-interface-for-arm-system-calls-and-where-is-it-defined-in-the-linux
+* https://stackoverflow.com/questions/45742869/linux-syscall-conventions-for-armv8
+
+Questions about the C inline assembly examples:
+
+* x86_64: https://stackoverflow.com/questions/9506353/how-to-invoke-a-system-call-via-sysenter-in-inline-assembly/54956854#54956854
+* ARM: https://stackoverflow.com/questions/21729497/doing-a-syscall-without-libc-using-arm-inline-assembly
+
+=== Calling conventions
+
+==== x86_64 calling convention
+
+Examples:
+
+* link:userland/arch/x86_64/common_arch.h[] `ENTRY` and `EXIT`
+
+==== ARM calling convention
+
+Call C standard library functions from assembly and vice versa.
+
+* arm
+** link:userland/arch/arm/common_arch.h[] `ENTRY` and `EXIT`
+** link:userland/arch/arm/linux/c_from_asm.S[]
+* aarch64
+** link:userland/arch/aarch64/common_arch.h[] `ENTRY` and `EXIT`
+** link:userland/arch/aarch64/c/linux/asm_from_c.c[]
+
+ARM Architecture Procedure Call Standard (AAPCS) is the name that ARM Holdings gives to the calling convention.
+
+Official specification: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042f/IHI0042F_aapcs.pdf
+
+Bibliography:
+
+* https://en.wikipedia.org/wiki/Calling_convention#ARM_(A32) Wiki contains the master list as usual.
+* http://stackoverflow.com/questions/8422287/calling-c-functions-from-arm-assembly
+* http://stackoverflow.com/questions/261419/arm-to-c-calling-convention-registers-to-save
+* https://stackoverflow.com/questions/10494848/arm-whats-the-difference-between-apcs-and-aapcs-abi
+
+=== GNU GAS assembler
+
+link:https://en.wikipedia.org/wiki/GNU_Assembler[GNU GAS] is the default assembler used by GCC, and therefore it completely dominates in Linux.
+
+The Linux kernel in particular uses GNU GAS assembly extensively for the arch specific parts under `arch/`.
+
+==== GNU GAS assembler comments
+
+In this tutorial, we use exclusively C Preprocessor `/**/` comments because:
+
+* they are the same for all archs
+* we are already stuck to the C Preprocessor because GNU GAS macros are unusable so we need `#define`
+* mixing `#` GNU GAS comments and `#define` is a bad idea ;-)
+
+But just in case you want to suffer, see this full explanation of GNU GAS comments: https://stackoverflow.com/questions/15663280/how-to-make-the-gnu-assembler-use-a-slash-for-comments/51991349#51991349
+
+Examples:
+
+* link:userland/arch/arm/comments.S[]
+* link:userland/arch/aarch64/comments.S[]
+
+==== GNU GAS assembler immediates
+
+Summary:
+
+* x86 always dollar `$` everywhere.
+* ARM: can use either `#`, `$` or nothing depending on v7 vs v8 and <<gnu-gas-assembler-arm-unified-syntax,`.syntax unified`>>.
++
+Fuller explanation at: https://stackoverflow.com/questions/21652884/is-the-hash-required-for-immediate-values-in-arm-assembly/51987780#51987780
+
+Examples:
+
+* link:userland/arch/arm/immediates.S[]
+* link:userland/arch/aarch64/immediates.S[]
+
+==== GNU GAS assembler data sizes
+
+Let's see how many bytes go into each data type:
+
+* link:userland/arch/x86_64/gas_data_sizes.S[]
+* link:userland/arch/arm/gas_data_sizes.S[]
+* link:userland/arch/aarch64/gas_data_sizes.S[]
+
+Conclusion:
+
+[options="header"]
+|===
+| |.byte |.word |.long |.quad |.octa
+
+|x86
+|1
+|2
+|4
+|8
+|16
+
+|arm
+|1
+|4
+|4
+|8
+|16
+
+|aarch64
+|1
+|4
+|4
+|8
+|16
+
+|===
+
+and also keep in mind that according to the manual:
+
+* `.int` is the same as `.long`
+* `.hword` is the same as `.short` which is usually the same as `.word`
+
+Bibliography:
+
+* https://sourceware.org/binutils/docs-2.32/as/Pseudo-Ops.html#Pseudo-Ops
+* https://stackoverflow.com/questions/43005411/how-does-the-quad-directive-work-in-assembly/43006616
+* https://gist.github.com/steakknife/d47d0b19a24817f48027
+
+==== GNU GAS assembler ARM specifics
+
+===== GNU GAS assembler ARM unified syntax
+
+There are two types of ARMv7 assemblies:
+
+* `.syntax divided`
+* `.syntax unified`
+
+They are very similar, but unified is the new and better one, which we use in this tutorial.
+
+Unfortunately, for backwards compatibility, GNU AS 2.31.1 and GCC 8.2.0 still use `.syntax divided` by default.
+
+The concept of unified assembly is mentioned in ARM's official assembler documentation: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0473c/BABJIHGJ.html and is often called Unified Assembly Language (UAL).
+
+Some of the differences include:
+
+* `#` is optional in unified syntax int literals, see <<gnu-gas-assembler-immediates>>
+* many mnemonics changed:
+** most of them are condition code position changes, e.g. `andseq` vs `andeqs`: https://stackoverflow.com/questions/51184921/wierd-gcc-behaviour-with-arm-assembler-andseq-instruction
+** but there are some more drastic ones, e.g. `swi` vs `svc`: https://stackoverflow.com/questions/8459279/are-arm-instructuons-swi-and-svc-exactly-same-thing/54078731#54078731
+* cannot have implicit destination with shift, see: <<arm-shift-suffixes>>
+
+===== GNU GAS assembler ARM .n and .w suffixes
+
+When reading disassembly, many instructions have either a `.n` or `.w` suffix.
+
+`.n` means narrow, and stands for a 16-bit <<arm-instruction-encodings,Thumb encoding>> of an instruction, while `.w` means wide and stands for a 32-bit encoding.
+
+Bibliography: https://stackoverflow.com/questions/27147043/n-suffix-to-branch-instruction
+
+== x86 userland assembly
+
+Arch agnostic infrastructure getting started at: <<userland-assembly>>.
+
+=== x86 userland assembly getting started
+
+These are the main concepts and instructions that you should learn to be able to understand what is going on.
+
+Once those are done, everything else left on userland is just to learn a huge list of instructions: <<x86-userland-assembly-instructions>>
+
+=== x86 userland assembly instructions
+
+==== x86 SIMD
+
+History:
+
+* link:https://en.wikipedia.org/wiki/MMX_(instruction_set)[MMX]: 1997
+* link:https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions[SSE]: Streaming SIMD Extensions. 1999. 128-bit XMM registers.
+* link:https://en.wikipedia.org/wiki/SSE2[SSE2]: 2000
+* link:https://en.wikipedia.org/wiki/SSE3[SSE3]: 2004
+* link:https://en.wikipedia.org/wiki/SSE4[SSE4]: 2006
+* link:https://en.wikipedia.org/wiki/Advanced_Vector_Extensions[AVX]: Advanced Vector Extensions. 2011. 256-bit YMM registers. Extension of XMM.
+* AVX2: 2013
+* AVX-512: 2016. 512-bit ZMM registers. Extension of YMM.
+
+===== x86 SSE2
+
+====== x86 addpd instruction
+
+link:userland/arch/x86_64/addpd.S[]: `addps`, `addpd`
+
+Good first instruction to learn SIMD: <<simd-assembly>>
+
+====== x86 paddq instruction
+
+link:userland/arch/x86_64/paddq.S[]: `paddq`, `paddl`, `paddw`, `paddb`
+
+Good first instruction to learn SIMD: <<simd-assembly>>
+
+=== x86 rdtsc instruction
+
+TODO: review this section, make a more controlled userland experiment with <<m5ops>> instrumentation.
+
+Let's have some fun and try to correlate the gem5 <<stats-txt>> `system.cpu.numCycles` cycle count with the link:https://en.wikipedia.org/wiki/Time_Stamp_Counter[x86 `rdtsc` instruction] that is supposed to do the same thing:
+
+....
+./build-userland --static userland/arch/x86_64/c/rdtsc.c
+./run --eval './arch/x86_64/c/rdtsc.out;m5 exit;' --emulator gem5
+./gem5-stat
+....
+
+Source: link:userland/arch/x86_64/c/rdtsc.c[]
+
+`rdtsc` outputs a cycle count which we compare with gem5's `gem5-stat`:
+
+* `3828578153`: `rdtsc`
+* `3830832635`: `gem5-stat`
+
+which gives pretty close results, and serves as a nice sanity check that the cycle counter is coherent.
+
+It is also nice to see that `rdtsc` is a bit smaller than the `stats.txt` value, since the latter also includes the exec syscall for `m5`.
+
+Bibliography:
+
+* https://en.wikipedia.org/wiki/Time_Stamp_Counter
+* https://stackoverflow.com/questions/9887839/clock-cycle-count-wth-gcc/9887979
+
+==== ARM pmccntr
+
+TODO We didn't manage to find a working ARM analogue to <<x86-rdtsc-instruction>>: link:kernel_modules/pmccntr.c[] is oopsing, and even if it weren't, it likely wouldn't give the cycle count since boot, since it needs to be activated before it starts counting anything:
+
+* https://stackoverflow.com/questions/40454157/is-there-an-equivalent-instruction-to-rdtsc-in-arm
+* https://stackoverflow.com/questions/31620375/arm-cortex-a7-returning-pmccntr-0-in-kernel-mode-and-illegal-instruction-in-u/31649809#31649809
+* https://blog.regehr.org/archives/794
+
+== ARM userland assembly
+
+Arch general getting started at: <<userland-assembly>>.
+
+Instructions here are loosely grouped following the <<armarm7>> Chapter A4 "The Instruction Sets".
+
+We cover here mostly ARMv7, and then treat aarch64 differentially, since much of the ARMv7 userland is the same in aarch32.
+
+=== Introduction to the ARM architecture
+
+The link:https://en.wikipedia.org/wiki/ARM_architecture[ARM architecture] has been used on the vast majority of mobile phones in the 2010s, and on a large fraction of microcontrollers.
+
+It competes with <<x86-userland-assembly>> because its implementations are designed for low power consumption, which is a major requirement of the cell phone market.
+
+ARM is generally considered a RISC instruction set, although there are some more complex instructions which would not generally be classified as purely RISC.
+
+ARM is developed by the British company ARM Holdings: https://en.wikipedia.org/wiki/Arm_Holdings which originated as a joint venture between Acorn Computers, Apple and VLSI Technology in 1990.
+
+ARM Holdings was bought by the Japanese giant SoftBank in 2016.
+
+==== ARMv8 vs ARMv7 vs AArch64 vs AArch32
+
+ARMv7 is the older architecture described at: <<armarm7>>.
+
+ARMv8 is the newer architecture ISA link:https://developer.arm.com/docs/den0024/latest/preface[released in 2013] and described at: <<armarm8>>. It can be in either of two states:
+
+* <<aarch32>>
+* aarch64
+
+In the loose terminology of this repository:
+
+* `arm` means basically AArch32
+* `aarch64` means ARMv8 AArch64
+
+ARMv8 has link:https://en.wikipedia.org/wiki/ARM_architecture#ARMv8-A[had several updates] since its release:
+
+* v8.1: 2014
+* v8.2: 2016
+* v8.3: 2016
+* v8.4: TODO
+* v8.5: 2018
+
+They are described at: <<armarm8>> A1.7 "ARMv8 architecture extensions".
+
+===== AArch32
+
+32-bit mode of operation of ARMv8.
+
+Userland is highly / fully backwards compatible with ARMv7:
+
+* https://stackoverflow.com/questions/42972096/armv8-backward-compatibility-with-armv7-snapdragon-820-vs-cortex-a15
+* https://stackoverflow.com/questions/31848185/does-armv8-aarch32-mode-has-backward-compatible-with-armv4-armv5-or-armv6
+
+For this reason, QEMU and GAS seem to enable both AArch32 and ARMv7 under `arm` rather than `aarch64`.
+
+There are however some extensions over ARMv7, many of them are functionality that ARMv8 has and that designers decided to backport on AArch32 as well, e.g.:
+
+* <<armv8-aarch32-vcvta-instruction>>
+
+===== AArch32 vs AArch64
+
+A great summary of differences can be found at: https://en.wikipedia.org/wiki/ARM_architecture#AArch64_features
+
+Some random ones:
+
+* aarch32 has two encodings: Thumb and ARM: <<arm-instruction-encodings>>
+* in ARMv8, the stack has to be 16-byte aligned. Therefore, the main way to push things to the stack is with 8-byte pair pushes with the <<armv8-aarch64-ldp-and-stp-instructions>>
+
+==== Free ARM implementations
+
+The ARM instruction set is itself protected by patents / copyright / whatever, and you have to pay ARM Holdings a licence to implement it, even if you are creating your own custom Verilog code.
+
+ARM has already sued people in the past for implementing ARM ISA: http://www.eetimes.com/author.asp?section_id=36&doc_id=1287452
+
+http://semiengineering.com/an-alternative-to-x86-arm-architectures/ mentions that:
+
+____
+Asanovic joked that the shortest unit of time is not the moment between a traffic light turning green in New York City and the cab driver behind the first vehicle blowing the horn; it’s someone announcing that they have created an open-source, ARM-compatible core and receiving a “cease and desist” letter from a law firm representing ARM.
+____
+
+This licensing however does have the following fairness to it: ARM Holdings invests a lot of money in making a great open source software environment for the ARM ISA, so it is only natural that it should be able to get some money from hardware manufacturers for using their ISA.
+
+Patents for very old ISAs however have expired, Amber is one implementation of those: https://en.wikipedia.org/wiki/Amber_%28processor_core%29 TODO does it have any application?
+
+Generally, it is mostly large companies that implement the CPUs themselves. For example, the link:https://en.wikipedia.org/wiki/Apple_A12[Apple A12 chip], which is used in iPhones, has verilog designs:
+
+____
+The A12 features an Apple-designed 64-bit ARMv8.3-A six-core CPU, with two high-performance cores running at 2.49 GHz called Vortex and four energy-efficient cores called Tempest.
+____
+
+ARM designed CPUs however are mostly called `Cortex-A<id>`: https://en.wikipedia.org/wiki/List_of_applications_of_ARM_cores Vortex and Tempest are Apple designed ones.
+
+Bibliography: https://www.quora.com/Why-is-it-that-you-need-a-license-from-ARM-to-design-an-ARM-CPU-How-are-the-instruction-sets-protected
+
+==== ARM instruction encodings
+
+Understanding the basics of instruction encodings is fundamental to help you to remember what instructions do and why some things are possible or not, notably the <<arm-ldr-pseudo-instruction>> and the <<arm-adr-instruction,`adrp` instruction>>.
+
+aarch32 has two "instruction sets", which to me look just like encodings.
+
+Some control bit must determine which one we are currently on, and userland can switch between them with the <<arm-bx-instruction>> TODO: details.
+
+The encodings are:
+
+* A32: every instruction is 4 bytes long. Can encode every instruction.
+* T32: most common instructions are 2 bytes long. Many other less common ones are 4 bytes long.
++
+T stands for "Thumb", which is the original name for the technology. The word "Thumb" does not appear on <<armarm8>> however. It does appear on <<armarm7>> though.
++
+Example: link:userland/arch/arm/thumb.S[]
++
+See also: <<armarm8>> F2.1.3 "Instruction encodings".
+
+Within each instruction set, there can be multiple encodings for a given function, and they are noted simply as:
+
+* A1, A2, ...: A32 encodings
+* T1, T2, ...: T32 encodings
+
+This RISC-y mostly fixed instruction length design likely makes processor design easier and allows for certain optimizations, at the cost of slightly more complex assembly, as you can't encode 4 / 8 byte addresses in a single instruction. Totally worth it IMHO.
+
+This design can be contrasted with x86, which has widely variable instruction length.
+
+Bibliography:
+
+* https://stackoverflow.com/questions/28669905/what-is-the-difference-between-the-arm-thumb-and-thumb-2-instruction-encodings
+* https://reverseengineering.stackexchange.com/questions/6080/how-to-detect-thumb-mode-in-arm-disassembly
+
+=== ARM branch instructions
+
+==== ARM b instruction
+
+Unconditional branch.
+
+Example: link:userland/arch/arm/b.S[]
+
+The encoding stores `pc` offsets in 24 bits. The destination must be a multiple of 4, which is easy since all instructions are 4 bytes.
+
+This allows for 26-bit signed byte offsets, i.e. jumps of up to 32 MiB backward or forward, a 64 MiB range in total.
+
+TODO: what to do if we want to jump longer than that?
+
+==== ARM beq instruction
+
+Branch if equal based on the status registers.
+
+Examples:
+
+* link:userland/arch/arm/beq.S[].
+* link:userland/arch/aarch64/beq.S[].
+
+The family of instructions includes:
+
+* `beq`: branch if equal
+* `bne`: branch if not equal
+* `ble`: less or equal
+* `bge`: greater or equal
+* `blt`: less than
+* `bgt`: greater than
+
+==== ARM bl instruction
+
+Branch with link, i.e. branch and store the return address on the `lr` register.
+
+Example: link:userland/arch/arm/bl.S[]
+
+This is the major way to make function calls.
+
+The current ARM / Thumb mode is encoded in the least significant bit of lr.
+
+===== ARM bx instruction
+
+`bx`: branch and switch between ARM / Thumb mode, encoded in the least significant bit of the given register.
+
+`bx lr` is the main way to return from function calls after a `bl` call.
+
+Since `bl` encodes the current ARM / Thumb mode in `lr`, `bx lr` keeps the mode unchanged by default.
+
+===== ARMv8 aarch64 ret instruction
+
+Example: link:userland/arch/aarch64/ret.S[]
+
+ARMv8 AArch64 only:
+
+* there is no `bx` in AArch64 since there is no Thumb to worry about, so it is called just `br`
+* the `ret` instruction was added in addition to `br`, with the following differences:
+** provides a hint that this is a function call return
+** has a default argument `x30` if none is given. This is where `bl` puts the return address.
+
+See also: https://stackoverflow.com/questions/32304646/arm-assembly-branch-to-address-inside-register-or-memory/54145818#54145818
+
+==== ARM cbz instruction
+
+Compare and branch if zero.
+
+Example: link:userland/arch/aarch64/cbz.S[]
+
+Only in ARMv8 and ARMv7 Thumb mode, not in armv7 ARM mode.
+
+Very handy!
+
+==== ARM conditional execution
+
+Weirdly, <<arm-b-instruction>> and family are not the only instructions that can execute conditionally on the flags: the same also applies to most instructions, e.g. `add`.
+
+Example: link:userland/arch/arm/cond.S[]
+
+Just add the usual `eq`, `ne`, etc. suffixes just as for `b`.
+
+The list of all condition suffixes is documented at <<armarm7>> "A8.3 Conditional execution".
+
+=== ARM load and store instructions
+
+In ARM, there are only two instruction families that do memory access: <<arm-ldr-instruction>> to load and <<arm-str-instruction>> to store.
+
+Everything else works on registers and immediates.
+
+This is part of the RISC-y beauty of the ARM instruction set, unlike x86 in which several operations can read from memory, and helps to predict how to optimize for a given CPU pipeline.
+
+This kind of architecture is called a link:https://en.wikipedia.org/wiki/Load/store_architecture[Load/store architecture].
+
+==== ARM ldr instruction
+
+===== ARM ldr pseudo-instruction
+
+`ldr` can be either a regular instruction that loads stuff from memory into registers, or also a pseudo-instruction (assembler magic): http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0041c/Babbfdih.html
+
+The pseudo instruction version is when an equal sign appears on one of the operands.
+
+The `ldr` pseudo instruction can automatically create hidden variables in a place called the "literal pool", and load them from memory with PC relative loads.
+
+Example: link:userland/arch/arm/ldr_pseudo.S[]
+
+This is done basically because all instructions are 32-bit wide, and there is not enough space to encode 32-bit addresses in them.
+
+Bibliography:
+
+* https://stackoverflow.com/questions/37840754/what-does-an-equals-sign-on-the-right-side-of-a-ldr-instruction-in-arm-mean
+* https://stackoverflow.com/questions/17214962/what-is-the-difference-between-label-equals-sign-and-label-brackets-in-ar
+* https://stackoverflow.com/questions/14046686/why-use-ldr-over-mov-or-vice-versa-in-arm-assembly
+
+===== ARM addressing modes
+
+Example: link:userland/arch/arm/address_modes.S[]
+
+Load and store instructions can compute the address, and optionally update the address register, with the following modes:
+
+* offset: add an offset, don't change the address register. Notation:
++
+....
+ldr r1, [r0, 4]
+....
+* pre-indexed: change the address register, and then use it modified. Notation:
++
+....
+ldr r1, [r0, 4]!
+....
+* post-indexed: use the address register unmodified, and then modify it. Notation:
++
+....
+ldr r1, [r0], 4
+....
+
+The offset itself can come from the following sources:
+
+* immediate
+* register
+* scaled register: left shift the register and use that as an offset
+
+The indexed modes are convenient to loop over arrays.
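+
+The three offset sources could look as follows in ARM mode. This is a sketch with arbitrarily chosen registers, assuming `r0` holds a valid address and `r2` an index:
+
+....
+ldr r1, [r0, #4]          /* immediate: address = r0 + 4 */
+ldr r1, [r0, r2]          /* register: address = r0 + r2 */
+ldr r1, [r0, r2, lsl #2]  /* scaled register: address = r0 + (r2 << 2) */
+....
+
+The scaled form is handy to index arrays of 4-byte words, since the shift multiplies the index by the element size.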
+
+Bibliography: <<armarm7>>:
+
+* A4.6.5 "Addressing modes"
+* A8.5 "Memory accesses"
+
+====== ARM loop over array
+
+As an application of the post-indexed addressing mode, let's increment an array.
+
+Example: link:userland/arch/arm/inc_array.S[]
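+
+A hypothetical sketch of such a loop, not necessarily what the linked file does, assuming `r0` holds the address of the first 4-byte element and `r1` a non-zero element count:
+
+....
+loop:
+    ldr r2, [r0]       /* load the current element */
+    add r2, r2, #1     /* increment it */
+    str r2, [r0], #4   /* store it back, then advance r0 to the next element */
+    subs r1, r1, #1    /* decrement the counter and set the flags */
+    bne loop           /* repeat until the counter reaches zero */
+....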
+
+===== ARM ldrh and ldrb instructions
+
+There are `ldr` variants that load fewer than the full 4 bytes:
+
+* link:userland/arch/arm/ldrb.S[]: load byte
+* link:userland/arch/arm/ldrh.S[]: load half word
+
+==== ARM str instruction
+
+Store from registers into memory.
+
+Example: link:userland/arch/arm/str.S[]
+
+Basically everything that applies to <<arm-ldr-instruction>> also applies here so we won't go into much detail.
+
+===== ARMv8 aarch64 str instruction
+
+PC-relative `str` is not possible in aarch64.
+
+For `ldr` it works <<arm-ldr-instruction,as in aarch32>>.
+
+As a result, it is not possible to load from the literal pool for `str`.
+
+Example: link:userland/arch/aarch64/str.S[]
+
+This can be seen from <<armarm8>> C3.2.1 "Load/Store register": `ldr` simply has one extra PC encoding that `str` does not.
+
+===== ARMv8 aarch64 ldp and stp instructions
+
+Load or store a pair of registers from or to memory, typically to pop or push them on the stack.
+
+TODO minimal example. Currently used on link:v8/commmon_arch.h[] since it is the main way to restore register state.
+
+==== ARM ldmia instruction
+
+Pop values from the stack into registers and optionally update the address register.
+
+`stmdb` is the push version.
+
+Example: link:userland/arch/arm/ldmia.S[]
+
+The mnemonics stand for:
+
+* `stmdb`: STore Multiple Decrement Before
+* `ldmia`: LoaD Multiple Increment After
+
+Example: link:userland/arch/arm/push.S[]
+
+`push` and `pop` are just mnemonics for `stmdb` and `ldmia` using the stack pointer `sp` as address register:
+
+....
+stmdb sp!, reglist
+ldmia sp!, reglist
+....
+
+The `!` indicates that we want to update the register.
+
+The registers are encoded as single bits inside the instruction: each bit represents one register.
+
+As a consequence, the push order is fixed no matter how you write the assembly instruction: there is just not enough space to encode ordering.
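+
+For instance, in the following sketch with arbitrarily chosen registers, the lower numbered register always ends up at the lower address, whatever order is written inside the braces:
+
+....
+push {r0, r1}   /* same as: stmdb sp!, {r0, r1} */
+/* Now [sp] holds r0 and [sp, #4] holds r1. */
+pop {r0, r1}    /* same as: ldmia sp!, {r0, r1} */
+....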
+
+AArch64 loses those instructions, likely because it was not possible anymore to encode all registers: http://stackoverflow.com/questions/27941220/push-lr-and-pop-lr-in-arm-arch64 and replaces them with the <<armv8-aarch64-ldp-and-stp-instructions>>
+
+=== ARM data processing instructions
+
+Arithmetic:
+
+* link:userland/arch/arm/mul.S[]: multiply
+* link:userland/arch/arm/sub.S[]: subtract
+* link:userland/arch/arm/rbit.S[]: reverse bit order
+* link:userland/arch/arm/rev.S[]: reverse byte order
+* link:userland/arch/arm/tst.S[]
+
+==== ARM cset instruction
+
+Example: link:userland/arch/aarch64/cset.S[]
+
+Set a register to `1` or `0` depending on the condition flags.
+
+ARMv8-only, likely because in ARMv8 you can't have conditional suffixes for every instruction.
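+
+A minimal AArch64 sketch with arbitrarily chosen registers, not taken from the linked example:
+
+....
+cmp x0, x1     /* set the flags from x0 - x1 */
+cset x2, eq    /* x2 = 1 if x0 == x1, otherwise x2 = 0 */
+....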
+
+==== ARM bitwise instructions
+
+* link:userland/arch/arm/and.S[]
+* `eor`: exclusive OR
+* `orr`: OR
+* link:userland/arch/arm/clz.S[]: count leading zeroes
+
+===== ARM bic instruction
+
+Bitwise Bit Clear: clear some bits.
+
+....
+dest = left & ~right
+....
+
+Example: link:userland/arch/arm/bic.S[]
+
+===== ARM ubfm instruction
+
+Unsigned Bitfield Move.
+
+____
+copies any number of low-order bits from a source register into the same number of adjacent bits at any position in the destination register, with zeros in the upper and lower bits.
+____
+
+Example: link:userland/arch/aarch64/ubfm.S[]
+
+TODO: explain full behaviour. Very complicated. Has several simpler to understand aliases.
+
+====== ARM ubfx instruction
+
+Alias for:
+
+....
+UBFM <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)
+....
+
+Example: link:userland/arch/aarch64/ubfx.S[]
+
+The operation:
+
+....
+UBFX dest, src, lsb, width
+....
+
+does:
+
+....
+dest = (src >> lsb) & ((1 << width) - 1);
+....
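+
+As a worked instance of `UBFX`, assuming `w1` contains `0x12345678`, extracting 8 bits starting at bit 4 gives:
+
+....
+ubfx w0, w1, #4, #8    /* w0 = (0x12345678 >> 4) & 0xFF = 0x67 */
+ubfm w0, w1, #4, #11   /* equivalent, since lsb + width - 1 = 11 */
+....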
+
+Bibliography: https://stackoverflow.com/questions/8366625/arm-bit-field-extract
+
+===== ARM bfm instruction
+
+TODO: explain. Similar to <<arm-ubfm-instruction,`ubfm`>> but leave untouched bits unmodified.
+
+====== ARM bfi instruction
+
+Examples:
+
+* link:userland/arch/arm/bfi.S[]
+* link:userland/arch/aarch64/bfi.S[]
+
+Move the lower bits of source register into any position in the destination:
+
+* ARMv8: an alias for <<arm-bfm-instruction,`bfm`>>
+* ARMv7: a real instruction
+
+==== ARM mov instruction
+
+Move an immediate to a register, or a register to another register.
+
+Cannot load from or to memory, since only the `ldr` and `str` instruction families can do that in ARM: <<arm-load-and-store-instructions>>
+
+Example: link:userland/arch/arm/mov.S[]
+
+Since every instruction <<arm-instruction-encodings,has a fixed 4 byte size>>, there is not enough space to encode arbitrary 32-bit immediates in a single instruction, since some of the bits are needed to actually encode the instruction itself.
+
+The solutions to this problem are mentioned at:
+
+* https://stackoverflow.com/questions/38689886/loading-32-bit-values-to-a-register-in-arm-assembly
+* https://community.arm.com/processors/b/blog/posts/how-to-load-constants-in-assembly-for-arm-architecture
+
+Summary of solutions:
+
+* <<arm-movw-and-movt-instructions>>
+* place it in memory. But then how to load the address, which is also a 32-bit value?
+** use pc-relative addressing if the memory is close enough
+** use `orr` encodable shifted immediates
+
+The blog article summarizes nicely which immediates can be encoded and the design rationale:
+
+____
+An Operand 2 immediate must obey the following rule to fit in the instruction: an 8-bit value rotated right by an even number of bits between 0 and 30 (inclusive). This allows for constants such as 0xFF (0xFF rotated right by 0), 0xFF00 (0xFF rotated right by 24) or 0xF000000F (0xFF rotated right by 4).
+
+In software - especially in languages like C - constants tend to be small. When they are not small they tend to be bit masks. Operand 2 immediates provide a reasonable compromise between constant coverage and encoding space; most common constants can be encoded directly.
+____
+
+Assemblers however support magic memory allocations which may hide what is truly going on: https://stackoverflow.com/questions/14046686/why-use-ldr-over-mov-or-vice-versa-in-arm-assembly Always ask your friendly disassembly for a good confirmation.
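+
+To illustrate the rule quoted above, here is a sketch in ARM mode. The exact error message and the fallback chosen for the last line depend on the assembler, so treat the comments as assumptions to be checked by disassembling:
+
+....
+mov r0, #0xFF          /* encodable: 0xFF rotated right by 0 */
+mov r0, #0xFF000000    /* encodable: 0xFF rotated right by 8 */
+/* mov r0, #0x12345678 is not encodable: it needs more than 8 significant bits. */
+ldr r0, =0x12345678    /* pseudo-instruction: typically becomes a literal pool load */
+....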
+
+===== ARM movw and movt instructions
+
+Set the higher or lower 16 bits of a register to an immediate in one go.
+
+Example: link:userland/arch/arm/movw.S[]
+
+The ARMv8 analogue is <<armv8-aarch64-movk-instruction>>.
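+
+A minimal sketch of how the two combine to build an arbitrary 32-bit constant, with an arbitrarily chosen value:
+
+....
+movw r0, #0x5678   /* r0 = 0x00005678: set the lower half, zero the upper half */
+movt r0, #0x1234   /* r0 = 0x12345678: set the upper half, keep the lower half */
+....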
+
+===== ARMv8 aarch64 movk instruction
+
+Fill a 64-bit register 16 bits at a time: each `movk` sets one 16-bit chunk and keeps the other bits, so up to 4 instructions are needed.
+
+Similar to <<arm-movw-and-movt-instructions>> in v7.
+
+Example: link:userland/arch/aarch64/movk.S[]
+
+Bibliography: https://stackoverflow.com/questions/27938768/moving-a-32-bit-constant-in-arm-arch64-register
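+
+A sketch of the full 64-bit case, with an arbitrarily chosen constant. `movz` zeroes the rest of the register, and each `movk` then patches one 16-bit chunk:
+
+....
+movz x0, #0x0001, lsl #48   /* x0 = 0x0001000000000000 */
+movk x0, #0x0002, lsl #32   /* x0 = 0x0001000200000000 */
+movk x0, #0x0003, lsl #16   /* x0 = 0x0001000200030000 */
+movk x0, #0x0004            /* x0 = 0x0001000200030004 */
+....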
+
+===== ARMv8 aarch64 movn instruction
+
+Set the register to the bitwise NOT of a 16-bit immediate: the 16 bits are negated and all the other bits are set to `1`.
+
+Example: link:userland/arch/aarch64/movn.S[]
+
+==== ARM data processing instruction suffixes
+
+===== ARM shift suffixes
+
+Most data processing instructions can also optionally shift the second register operand.
+
+Example: link:userland/arch/arm/shift.S[]
+
+The shift types are:
+
+* `lsr` and `lsl`: Logical Shift Right / Left. Insert zeroes.
+* `ror`: Rotate Right. Wrap bits around.
+* `asr`: Arithmetic Shift Right. Keep sign.
+
+Documented at: <<armarm7>> "A4.4.1 Standard data-processing instructions"
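+
+For example, in ARM mode the shift is written after the second register operand. This is a sketch with arbitrarily chosen registers:
+
+....
+add r0, r1, r2, lsl #2   /* r0 = r1 + (r2 << 2) */
+sub r0, r1, r2, asr #1   /* r0 = r1 - (r2 >> 1), with r2 treated as signed */
+orr r0, r1, r2, ror #8   /* r0 = r1 | (r2 rotated right by 8 bits) */
+....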
+
+===== ARM S suffix
+
+Example: link:userland/arch/arm/s_suffix.S[]
+
+The `S` suffix, present on most <<arm-data-processing-instructions>>, makes the instruction also set the Status register flags that control conditional jumps.
+
+If the result of the operation is `0`, then it triggers `beq`, since comparison is a subtraction, with success on 0.
+
+`cmp` sets the flags by default of course.
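+
+A minimal sketch of the difference, where `was_zero` is a hypothetical label defined elsewhere:
+
+....
+mov r0, #1
+sub r0, r0, #1    /* r0 = 0, but the flags are left untouched */
+mov r0, #1
+subs r0, r0, #1   /* r0 = 0, and the Z flag is set */
+beq was_zero      /* taken, because Z is set */
+....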
+
+==== ARM adr instruction
+
+Similar rationale to the <<arm-ldr-pseudo-instruction>>, allowing to easily store a PC-relative reachable address into a register in one go, to overcome the 4-byte fixed instruction size.
+
+Examples:
+
+* link:userland/arch/arm/adr.S[]
+* link:userland/arch/aarch64/adr.S[]
+* link:userland/arch/aarch64/adrp.S[]
+
+More details: https://stackoverflow.com/questions/41906688/what-are-the-semantics-of-adrp-and-adrl-instructions-in-arm-assembly/54042899#54042899
+
+===== ARM adrl instruction
+
+See: <<arm-adr-instruction>>.
+
+=== ARM miscellaneous instructions
+
+==== ARM nop instruction
+
+There are a few different ways to encode `nop`, notably `mov` a register into itself, and a dedicated miscellaneous instruction.
+
+Example: link:userland/arch/arm/nop.S[]
+
+Try disassembling the executable to see what the assembler is emitting:
+
+....
+gdb-multiarch -batch -ex 'arch arm' -ex "file v7/nop.out" -ex "disassemble/rs asm_main_after_prologue"
+....
+
+Bibliography: https://stackoverflow.com/questions/1875491/nop-for-iphone-binaries
+
+=== ARM SIMD
+
+==== ARM VFP
+
+The name for the ARMv7 and AArch32 floating point and SIMD instructions / registers.
+
+Vector Floating Point extension.
+
+TODO I think it was optional in ARMv7, find quote.
+
+VFP has several revisions, named as VFPv1, VFPv2, etc. TODO: announcement dates.
+
+As mentioned at: https://stackoverflow.com/questions/37790029/what-is-difference-between-arm64-and-armhf/48954012#48954012 the Linux kernel shows those capabilities in `/proc/cpuinfo` with flags such as `vfp`, `vfpv3` and others, see:
+
+* https://github.com/torvalds/linux/blob/v4.18/arch/arm/kernel/setup.c#L1199
+* https://github.com/torvalds/linux/blob/v4.18/arch/arm64/kernel/cpuinfo.c#L95
+
+When a certain version of VFP is present on a CPU, the compiler prefix typically contains the `hf` characters which stand for Hard Float, e.g.: `arm-linux-gnueabihf`. This means that the compiler will emit VFP instructions instead of just using software implementations.
+
+Bibliography:
+
+* <<armarm7>> Appendix D6 "Common VFP Subarchitecture Specification". It is not part of the ISA, but just an extension. TODO: that spec does not seem to have the instructions documented, and instructions like `VMOV` just live with the main instructions. Is `VMOV` part of VFP?
+* https://mindplusplus.wordpress.com/2013/06/25/arm-vfp-vector-programming-part-1-introduction/
+* https://en.wikipedia.org/wiki/ARM_architecture#Floating-point_(VFP)
+
+===== ARM VFP registers
+
+TODO example
+
+<<armarm8>> E1.3.1 "The SIMD and floating-point register file" Figure E1-1 "SIMD and floating-point register file, AArch32 operation":
+
+....
++-----+-----+-----+
+| S0  |     |     |
++-----+ D0  +     |
+| S1  |     |     |
++-----+-----+ Q0  |
+| S2  |     |     |
++-----+ D1  +     |
+| S3  |     |     |
++-----+-----+-----+
+| S4  |     |     |
++-----+ D2  +     |
+| S5  |     |     |
++-----+-----+ Q1  |
+| S6  |     |     |
++-----+ D3  +     |
+| S7  |     |     |
++-----+-----+-----+
+....
+
+Note how Sn is weirdly packed inside Dn, and Dn weirdly packed inside Qn, likely for historical reasons.
+
+And you can't access the higher bytes at D16 or greater with Sn.
+
+===== ARM vadd instruction
+
+* link:userland/arch/arm/vadd_scalar.S[]: see also: <<floating-point-assembly>>
+* link:userland/arch/arm/vadd_vector.S[]: see also: <<simd-assembly>>
+
+===== ARM vcvt instruction
+
+Example: link:userland/arch/arm/vcvt.S[]
+
+Convert between integers and floating point.
+
+<<armarm7>> on rounding:
+
+____
+The floating-point to fixed-point operation uses the Round towards Zero rounding mode. The fixed-point to floating-point operation uses the Round to Nearest rounding mode.
+____
+
+Notice how the opcode takes two types.
+
+E.g., in our 32-bit float to 32-bit unsigned example we use:
+
+....
+vcvt.u32.f32
+....
+
+====== ARM vcvtr instruction
+
+Example: link:userland/arch/arm/vcvtr.S[]
+
+Like <<arm-vcvt-instruction>>, but the rounding mode is selected by the FPSCR.RMode field.
+
+Selecting rounding mode explicitly per instruction was apparently not possible in ARMv7, but was made possible in <<aarch32>> e.g. with <<armv8-aarch32-vcvta-instruction>>.
+
+Rounding mode selection is exposed in the C standard since C99 through link:https://en.cppreference.com/w/c/numeric/fenv/feround[`fesetround`].
+
+TODO: is the initial rounding mode specified by the ELF standard? Could not find a reference.
+
+====== ARMv8 AArch32 vcvta instruction
+
+Example: link:userland/arch/arm/vcvt.S[]
+
+Added in ARMv8 <<aarch32>> only, not present in ARMv7.
+
+In ARMv7, to use a non-round-to-zero rounding mode, you had to set the rounding mode with FPSCR and use the R version of the instruction e.g. <<arm-vcvtr-instruction>>.
+
+Now in AArch32 it is possible to do it explicitly per-instruction.
+
+Also there was no ties to away mode in ARMv7. This mode does not exist in C99 either.
+
+==== ARMv8 Advanced SIMD and floating-point support
+
+The <<armarm8>> specifies floating point and SIMD support in the main architecture at A1.5 "Advanced SIMD and floating-point support".
+
+The feature is often referred to simply as "SIMD&FP" throughout the manual.
+
+The Linux kernel shows `/proc/cpuinfo` compatibility as `neon`, which is yet another intermediate name that came up at some point: <<arm-neon>>
+
+Vs <<arm-vfp>>: https://stackoverflow.com/questions/4097034/arm-cortex-a8-whats-the-difference-between-vfp-and-neon
+
+===== ARMv8 floating point availability
+
+Support is semi-mandatory. <<armarm8>> A1.5 "Advanced SIMD and floating-point support":
+
+____
+ARMv8 can support the following levels of support for Advanced SIMD and floating-point instructions:
+
+- Full SIMD and floating-point support without exception trapping.
+- Full SIMD and floating-point support with exception trapping.
+- No floating-point or SIMD support. This option is licensed only for implementations targeting specialized markets.
+
+Note: All systems that support standard operating systems with rich application environments provide hardware
+support for Advanced SIMD and floating-point. It is a requirement of the ARM Procedure Call Standard for
+AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.
+____
+
+Therefore it is in theory optional, but highly available.
+
+This is unlike ARMv7, where floating point is completely optional through <<arm-vfp>>.
+
+===== ARM NEON
+
+Just an informal name for the "Advanced SIMD instructions"? Very confusing.
+
+<<armarm8>> F2.9 "Additional information about Advanced SIMD and floating-point instructions" says:
+
+____
+The Advanced SIMD architecture, its associated implementations, and supporting software, are commonly referred to as NEON technology.
+____
+
+https://developer.arm.com/technologies/neon mentions that it is present on both ARMv7 and ARMv8:
+
+____
+NEON technology was introduced to the Armv7-A and Armv7-R profiles. It is also now an extension to the Armv8-A and Armv8-R profiles.
+____
+
+==== ARMv8 AArch64 floating point registers
+
+TODO example.
+
+<<armarm8>> B1.2.1 "Registers in AArch64 state" describes the registers:
+
+____
+32 SIMD&FP registers, `V0` to `V31`. Each register can be accessed as:
+
+* A 128-bit register named `Q0` to `Q31`.
+* A 64-bit register named `D0` to `D31`.
+* A 32-bit register named `S0` to `S31`.
+* A 16-bit register named `H0` to `H31`.
+* An 8-bit register named `B0` to `B31`.
+____
+
+Notice how Sn is very different between v7 and v8! In v7 it goes across Dn, and in v8 inside each Dn.
+
+===== ARMv8 aarch64 add vector instruction
+
+link:userland/arch/aarch64/add_vector.S[]
+
+Good first instruction to learn SIMD: <<simd-assembly>>
+
+===== ARMv8 aarch64 fadd instruction
+
+* link:userland/arch/aarch64/fadd_vector.S[]: see also: <<simd-assembly>>
+* link:userland/arch/aarch64/fadd_scalar.S[]: see also: <<floating-point-assembly>>
+
+====== ARM fadd vs vadd
+
+It is very confusing, but `fadds` and `faddd` in AArch32 are <<gnu-gas-assembler-arm-unified-syntax,pre-UAL>> for `vadd.f32` and `vadd.f64` which we use in this tutorial: <<arm-vadd-instruction>>
+
+The same goes for most ARMv7 mnemonics: `f*` is old, and `v*` is the newer better syntax.
+
+But then, in ARMv8, they decided to use <<armv8-aarch64-fadd-instruction>> as the main floating point add name, and get rid of `vadd`!
+
+Also keep in mind that fused multiply add is `fmadd`.
+
+Examples at: <<simd-assembly>>
+
+===== ARMv8 aarch64 ld2 instruction
+
+Example: link:userland/arch/aarch64/ld2.S[]
+
+We can load multiple vectors interleaved from memory in one single instruction!
+
+This is why the `ldN` instructions take an argument list denoted by `{}` for the registers, much like ARMv7 <<arm-ldmia-instruction>>.
+
+There are analogous `ld3` and `ld4` instructions.
+
+==== ARM SIMD bibliography
+
+* GNU GAS tests under link:https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=tree;f=gas/testsuite/gas/aarch64;hb=00f223631fa9803b783515a2f667f86997e2cdbe[`gas/testsuite/gas/aarch64`]
+* https://stackoverflow.com/questions/2851421/is-there-a-good-reference-for-arm-neon-intrinsics
+* assembly optimized libraries:
+** https://github.com/projectNe10/Ne10
+
+=== ARM assembly bibliography
+
+==== ARM non-official bibliography
+
+Good getting started tutorials:
+
+* http://www.davespace.co.uk/arm/introduction-to-arm/
+* https://azeria-labs.com/writing-arm-assembly-part-1/
+* https://thinkingeek.com/arm-assembler-raspberry-pi/
+* http://bob.cs.sonoma.edu/IntroCompOrg-RPi/app-make.html
+
+==== ARM official bibliography
+
+The official manuals were stored in http://infocenter.arm.com but as of 2017 they started to slowly move to link:https://developer.arm.com[].
+
+Each revision of a document has an "ARM DDI" unique document identifier.
+
+The "ARM Architecture Reference Manuals" are the official canonical ISA documentation. In this repository, we always reference the following revisions:
+
+Bibliography: https://www.quora.com/Where-can-I-find-the-official-documentation-of-ARM-instruction-set-architectures-ISAs
+
+[[armarm7]]
+===== ARMv7 architecture reference manual
+
+https://developer.arm.com/products/architecture/a-profile/docs/ddi0406/latest/arm-architecture-reference-manual-armv7-a-and-armv7-r-edition
+
+The official comprehensive ARMv7 reference.
+
+We use by default: DDI 0406C.d: https://static.docs.arm.com/ddi0406/cd/DDI0406C_d_armv7ar_arm.pdf
+
+[[armarm8]]
+===== ARMv8 architecture reference manual
+
+https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf
+
+Latest version: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
+
+The official comprehensive ARMv8 reference.
+
+ISA quick references can be found in some places:
+
+* https://web.archive.org/web/20161009122630/http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf
+
+[[armv8-programmers-guide]]
+===== Programmer's Guide for ARMv8-A
+
+https://static.docs.arm.com/den0024/a/DEN0024A_v8_architecture_PG.pdf
+
+A more terse human readable introduction to the ARM architecture than the reference manuals.
+
+Does not have as many assembly code examples as you'd hope however...
+
+Latest version at: https://developer.arm.com/docs/den0024/latest/preface
+
 == Baremetal
 
 Getting started at: <<baremetal-setup>>
 
 === Baremetal GDB step debug
 
-GDB step debug works on baremetal exactly as it does on the Linux kernel, except that is is even cooler here since we can easily control and understand every single instruction that is being run!
+GDB step debug works on baremetal exactly as it does on the Linux kernel: <<gdb>>.
+
+Except that it is even cooler here since we can easily control and understand every single instruction that is being run!
 
 For example, on the first shell:
 
 ....
-./run --arch arm --baremetal interactive/prompt --wait-gdb
+./run --arch arm --baremetal baremetal/hello.c --gdb-wait
 ....
 
 then on the second shell:
 
 ....
-./run-gdb --arch arm --baremetal interactive/prompt -- main
+./run-gdb --arch arm --baremetal baremetal/hello.c -- main
 ....
 
 Or if you are a <<tmux,tmux pro>>, do everything in one go with:
 
 ....
-./run --arch arm --baremetal interactive/prompt --wait-gdb --tmux-args main
+./run --arch arm --baremetal baremetal/hello.c --gdb
 ....
 
 Alternatively, to start from the very first executed instruction of our tiny <<baremetal-bootloaders>>:
@@ -11240,22 +12969,22 @@ Alternatively, to start from the very first executed instruction of our tiny <<b
 ....
 ./run \
   --arch arm \
-  --baremetal interactive/prompt \
+  --baremetal baremetal/hello.c \
+  --gdb-wait \
   --tmux-args=--no-continue \
-  --wait-gdb \
 ;
 ....
 
-Now you can just `stepi` to when jumping into main to go to the C code in link:baremetal/interactive/prompt.c[].
+Now you can just `stepi` until the jump into `main` to reach the C code in link:baremetal/hello.c[].
 
 This is specially interesting for the executables that don't use the bootloader from under `baremetal/arch/<arch>/no_bootloader/*.S`, e.g.:
 
 ....
 ./run \
   --arch arm \
-  --baremetal arch/arm/no_bootloader/semihost_exit \
+  --baremetal baremetal/arch/arm/no_bootloader/semihost_exit.S \
+  --gdb-wait \
   --tmux-args=--no-continue \
-  --wait-gdb \
 ;
 ....
 
@@ -11299,7 +13028,7 @@ It is documented at: https://developer.arm.com/docs/100863/latest/introduction
 For example, the following code makes QEMU exit:
 
 ....
-./run --arch arm --baremetal arch/arm/semihost_exit
+./run --arch arm --baremetal baremetal/arch/arm/semihost_exit.S
 ....
 
 Source: link:baremetal/arch/arm/no_bootloader/semihost_exit.S[]
@@ -11314,7 +13043,7 @@ svc 0x00123456
 
 and we can see from the docs that `0x18` stands for the `SYS_EXIT` command.
 
-This is also how we implement the `exit(0)` system call in C for QEMU for link:baremetal/exit.c[] through the Newlib via the function `_exit` at link:baremetal/lib/kwargs['c'][].
+This is also how we implement the `exit(0)` system call in C for QEMU for link:baremetal/exit0.c[] through the Newlib via the function `_exit` at link:baremetal/lib/kwargs['c'][].
 
 Other magic operations we can do with semihosting besides exiting the on the host include:
 
@@ -11383,12 +13112,12 @@ For `arm`, some baremetal examples compile fine with:
 ....
 sudo apt-get install gcc-arm-none-eabi qemu-system-arm
 ./build-baremetal --arch arm --gcc-which host-baremetal
-./run --arch arm --baremetal interactive/prompt --qemu-which host
+./run --arch arm --baremetal baremetal/hello.c --qemu-which host
 ....
 
 However, there are as usual limitations to using prebuilts:
 
-* certain examples fail to build with the Ubuntu packaged toolchain. E.g.: link:baremetal/exit.c[] fails with:
+* certain examples fail to build with the Ubuntu packaged toolchain. E.g.: link:baremetal/exit0.c[] fails with:
 +
 ....
 /usr/lib/gcc/arm-none-eabi/6.3.1/../../../arm-none-eabi/lib/libg.a(lib_a-fini.o): In function `__libc_fini_array':
@@ -11412,7 +13141,7 @@ TODO: any advantage over QEMU? I doubt it, mostly using it as as toy for now:
 Without running `./run`, do directly:
 
 ....
-./run-gdb --arch arm --baremetal interactive/prompt --sim
+./run-gdb --arch arm --baremetal baremetal/hello.c --sim
 ....
 
 Then inside GDB:
@@ -11480,8 +13209,8 @@ ARM exception levels are analogous to x86 <<ring0,rings>>.
 Print the EL at the beginning of a baremetal simulation:
 
 ....
-./run --arch arm --baremetal arch/arm/el
-./run --arch aarch64 --baremetal arch/aarch64/el
+./run --arch arm --baremetal baremetal/arch/arm/el.c
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/el.c
 ....
 
 Sources:
@@ -11496,12 +13225,12 @@ The lower ELs are not mandated by the architecture, and can be controlled throug
 In QEMU, you can configure the lowest EL as explained at https://stackoverflow.com/questions/42824706/qemu-system-aarch64-entering-el1-when-emulating-a53-power-up
 
 ....
-./run --arch arm --baremetal arch/arm/el
-./run --arch arm --baremetal arch/arm/el -- -machine virtualization=on
-./run --arch arm --baremetal arch/arm/el -- -machine secure=on
-./run --arch aarch64 --baremetal arch/aarch64/el
-./run --arch aarch64 --baremetal arch/aarch64/el -- -machine virtualization=on
-./run --arch aarch64 --baremetal arch/aarch64/el -- -machine secure=on
+./run --arch arm --baremetal baremetal/arch/arm/el.c
+./run --arch arm --baremetal baremetal/arch/arm/el.c -- -machine virtualization=on
+./run --arch arm --baremetal baremetal/arch/arm/el.c -- -machine secure=on
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/el.c
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/el.c -- -machine virtualization=on
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/el.c -- -machine secure=on
 ....
 
 outputs respectively:
@@ -11520,17 +13249,17 @@ TODO: why is `arm` stuck at `19` which equals Supervisor mode?
 In gem5, you can configure the lowest EL with:
 
 ....
-./run --arch arm --baremetal arch/arm/el --emulator gem5
+./run --arch arm --baremetal baremetal/arch/arm/el.c --emulator gem5
 cat "$(./getvar --arch arm --emulator gem5 gem5_guest_terminal_file)"
-./run --arch arm --baremetal arch/arm/el --emulator gem5 -- --param 'system.have_virtualization = True'
+./run --arch arm --baremetal baremetal/arch/arm/el.c --emulator gem5 -- --param 'system.have_virtualization = True'
 cat "$(./getvar --arch arm --emulator gem5 gem5_guest_terminal_file)"
-./run --arch arm --baremetal arch/arm/el --emulator gem5 -- --param 'system.have_security = True'
+./run --arch arm --baremetal baremetal/arch/arm/el.c --emulator gem5 -- --param 'system.have_security = True'
 cat "$(./getvar --arch arm --emulator gem5 gem5_guest_terminal_file)"
-./run --arch aarch64 --baremetal arch/aarch64/el --emulator gem5
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/el.c --emulator gem5
 cat "$(./getvar --arch aarch64 --emulator gem5 gem5_guest_terminal_file)"
-./run --arch aarch64 --baremetal arch/aarch64/el --emulator gem5 -- --param 'system.have_virtualization = True'
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/el.c --emulator gem5 -- --param 'system.have_virtualization = True'
 cat "$(./getvar --arch aarch64 --emulator gem5 gem5_guest_terminal_file)"
-./run --arch aarch64 --baremetal arch/aarch64/el --emulator gem5 -- --param 'system.have_security = True'
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/el.c --emulator gem5 -- --param 'system.have_security = True'
 cat "$(./getvar --arch aarch64 --emulator gem5 gem5_guest_terminal_file)"
 ....
 
@@ -11552,14 +13281,14 @@ This is the most basic example of exception handling we have.
 We a handler for `svc`, do an `svc`, and observe that the handler got called and returned from C and assembly:
 
 ....
-./run --arch aarch64 --baremetal arch/aarch64/svc
-./run --arch aarch64 --baremetal arch/aarch64/svc_asm
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/svc.c
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/svc_asm.S
 ....
 
 Sources:
 
-* link:baremetal/arch/aarch64/svc_asm.S[]
 * link:baremetal/arch/aarch64/svc.c[]
+* link:baremetal/arch/aarch64/svc_asm.S[]
 
 Sample output for the C one:
 
@@ -11612,7 +13341,7 @@ Both QEMU and gem5 are able to trace interrupts in addition to instructions, and
 ....
 ./run \
   --arch aarch64 \
-  --baremetal arch/aarch64/svc_asm
+  --baremetal baremetal/arch/aarch64/svc_asm.S \
   -- -d in_asm,int \
 ;
 ....
@@ -11639,7 +13368,7 @@ and:
 ....
 ./run \
   --arch aarch64 \
-  --baremetal arch/aarch64/svc_asm \
+  --baremetal baremetal/arch/aarch64/svc_asm.S \
   --trace ExecAll,Faults \
   --trace-stdout \
 ;
@@ -11711,10 +13440,10 @@ Bibliography:
 ==== ARM multicore
 
 ....
-./run --arch aarch64 --baremetal arch/aarch64/multicore --cpus 2
-./run --arch aarch64 --baremetal arch/aarch64/multicore --cpus 2 --emulator gem5
-./run --arch arm --baremetal arch/aarch64/multicore --cpus 2
-./run --arch arm --baremetal arch/aarch64/multicore --cpus 2 --emulator gem5
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/multicore.S --cpus 2
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/multicore.S --cpus 2 --emulator gem5
+./run --arch arm --baremetal baremetal/arch/aarch64/multicore.S --cpus 2
+./run --arch arm --baremetal baremetal/arch/aarch64/multicore.S --cpus 2 --emulator gem5
 ....
 
 Sources:
@@ -11729,7 +13458,7 @@ So, we need CPU 1 to come to the rescue and set that memory address to `1`, othe
 Don't believe me? Then try:
 
 ....
-./run --arch aarch64 --baremetal arch/aarch64/multicore --cpus 1
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/multicore.S --cpus 1
 ....
 
 and watch it hang forever.
@@ -11737,7 +13466,7 @@ and watch it hang forever.
 Note that if you try the same thing on gem5:
 
 ....
-./run --arch aarch64 --baremetal arch/aarch64/multicore --cpus 1 --emulator gem5
+./run --arch aarch64 --baremetal baremetal/arch/aarch64/multicore.S --cpus 1 --emulator gem5
 ....
 
 then the gem5 actually exits, but with a different message:
@@ -11848,6 +13577,8 @@ TODO: create and study a minimal examples in gem5 where the `DMB` instruction le
 
 ==== ARM baremetal bibliography
 
+First, also consider the userland bibliography: <<arm-assembly-bibliography>>.
+
 The most useful ARM baremetal example sets we've seen so far are:
 
 * https://github.com/dwelch67/raspberrypi real hardware
@@ -11865,24 +13596,6 @@ A large part of the code is taken from the awesome educational OS under 2-clause
 
 I needed the following minor patches: https://github.com/NienfengYao/armv8-bare-metal/pull/1
 
-[[armarm8]]
-===== ARMv8 architecture reference manual
-
-The official comprehensive ARMv8 reference.
-
-Latest version: https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
-
-We use: DDI 0487C.a: https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf
-
-[[armv8-programmers-guide]]
-===== Programmer's Guide for ARMv8-A
-
-A more terse human readable introduction to the ARM architecture than the reference manuals.
-
-Latest version: https://developer.arm.com/docs/den0024/latest/preface
-
-We use: DEN0024A https://static.docs.arm.com/den0024/a/DEN0024A_v8_architecture_PG.pdf
-
 === How we got some baremetal stuff to work
 
 It is nice when thing just work.
@@ -11983,14 +13696,25 @@ We then found out that QEMU starts in EL1, and so we kept just the EL1 part, and
 
 === Baremetal tests
 
-Automatically run non-interactive baremetal tests:
+Automatically run all non-interactive baremetal tests:
 
 ....
-./test-baremetal
+./test-baremetal --arch aarch64
 ....
 
 Source: link:test-baremetal[]
 
+Analogously to <<user-mode-tests>>, we can select individual tests or directories with:
+
+....
+./test-baremetal --arch aarch64 baremetal/hello.c baremetal/arch/aarch64/no_bootloader/
+....
+
+which would run all of:
+
+* link:baremetal/hello.c[]
+* all tests under the directory: link:baremetal/arch/aarch64/no_bootloader/[]
+
 We detect if tests failed by parsing logs for the <<magic-failure-string>>.
 
 We also skip tests that cannot work on certain conditions based on their basenames, e.g.:
@@ -12033,6 +13757,262 @@ make CROSS_COMPILE_DIR=/usr/bin
 ;
 ....
 
+== Android
+
+Remember: Android AOSP is a huge undocumented piece of bloatware. Its integration into this repo will likely never be super good.
+
+Verbose setup description: https://stackoverflow.com/questions/1809774/how-to-compile-the-android-aosp-kernel-and-test-it-with-the-android-emulator/48310014#48310014
+
+Download, build and run with the prebuilt AOSP QEMU emulator and the AOSP kernel:
+
+....
+./build-android \
+  --android-base-dir /path/to/your/hd \
+  --android-version 8.1.0_r60 \
+  download \
+  build \
+;
+./run-android \
+  --android-base-dir /path/to/your/hd \
+  --android-version 8.1.0_r60 \
+;
+....
+
+Sources:
+
+* link:build-android[]
+* link:run-android[]
+
+TODO how to hack the AOSP kernel, userland and emulator?
+
+Other archs work as usual with the `--arch` parameter. However, running on non-x86 is very slow due to the lack of KVM.
+
+Tested on: `8.1.0_r60`.
+
+=== Android image structure
+
+https://source.android.com/devices/bootloader/partitions-images
+
+The messy AOSP generates a ton of images instead of just one.
+
+When the emulator launches, we can see them through QEMU `-drive` arguments:
+
+....
+emulator: argv[21] = "-initrd"
+emulator: argv[22] = "/data/aosp/8.1.0_r60/out/target/product/generic_x86_64/ramdisk.img"
+emulator: argv[23] = "-drive"
+emulator: argv[24] = "if=none,index=0,id=system,file=/path/to/aosp/8.1.0_r60/out/target/product/generic_x86_64/system-qemu.img,read-only"
+emulator: argv[25] = "-device"
+emulator: argv[26] = "virtio-blk-pci,drive=system,iothread=disk-iothread,modern-pio-notify"
+emulator: argv[27] = "-drive"
+emulator: argv[28] = "if=none,index=1,id=cache,file=/path/to/aosp/8.1.0_r60/out/target/product/generic_x86_64/cache.img.qcow2,overlap-check=none,cache=unsafe,l2-cache-size=1048576"
+emulator: argv[29] = "-device"
+emulator: argv[30] = "virtio-blk-pci,drive=cache,iothread=disk-iothread,modern-pio-notify"
+emulator: argv[31] = "-drive"
+emulator: argv[32] = "if=none,index=2,id=userdata,file=/path/to/aosp/8.1.0_r60/out/target/product/generic_x86_64/userdata-qemu.img.qcow2,overlap-check=none,cache=unsafe,l2-cache-size=1048576"
+emulator: argv[33] = "-device"
+emulator: argv[34] = "virtio-blk-pci,drive=userdata,iothread=disk-iothread,modern-pio-notify"
+emulator: argv[35] = "-drive"
+emulator: argv[36] = "if=none,index=3,id=encrypt,file=/path/to/aosp/8.1.0_r60/out/target/product/generic_x86_64/encryptionkey.img.qcow2,overlap-check=none,cache=unsafe,l2-cache-size=1048576"
+emulator: argv[37] = "-device"
+emulator: argv[38] = "virtio-blk-pci,drive=encrypt,iothread=disk-iothread,modern-pio-notify"
+emulator: argv[39] = "-drive"
+emulator: argv[40] = "if=none,index=4,id=vendor,file=/path/to/aosp/8.1.0_r60/out/target/product/generic_x86_64/vendor-qemu.img,read-only"
+emulator: argv[41] = "-device"
+emulator: argv[42] = "virtio-blk-pci,drive=vendor,iothread=disk-iothread,modern-pio-notify"
+....
+
+The root directory is the <<initrd>> given on the QEMU CLI, which `/proc/mounts` reports as:
+
+....
+rootfs on / type rootfs (ro,seclabel,size=886392k,nr_inodes=221598)
+....
+
+This contains the <<android-init>>, which through `.rc` files must be mounting the drives into the right places. TODO find exact point.
+
+The drive order is:
+
+....
+system
+cache
+userdata
+encryptionkey
+vendor-qemu
+....
+
+Then, on the terminal:
+
+....
+mount | grep vd
+....
+
+gives:
+
+....
+/dev/block/vda1 on /system type ext4 (ro,seclabel,relatime,data=ordered)
+/dev/block/vde1 on /vendor type ext4 (ro,seclabel,relatime,data=ordered)
+/dev/block/vdb on /cache type ext4 (rw,seclabel,nosuid,nodev,noatime,errors=panic,data=ordered)
+....
+
+and we see that the order of `vda`, `vdb`, etc. matches the order in which the `-drive` options were given to QEMU.
+
+Tested on: `8.1.0_r60`.
+
+==== Android images read-only
+
+From `mount`, we can see that some of the mounted images are `ro`.
+
+Basically, every image that was given to QEMU as qcow2 is writable, and that qcow2 is an overlay over the actual original image.
+
+In order to make `/system` and `/vendor` writable by using qcow2 for them as well, we must use the `-writable-system` option:
+
+....
+./run-android -- -writable-system
+....
+
+* https://android.stackexchange.com/questions/110927/how-to-mount-system-rewritable-or-read-only-rw-ro/207200#207200
+* https://stackoverflow.com/questions/13089694/adb-remount-permission-denied-but-able-to-access-super-user-in-shell-android/43163693#43163693
+
+then:
+
+....
+su
+mount -o rw,remount /system
+date >/system/a
+....
+
+Now reboot, and relaunch with `-writable-system` once again to pick up the modified qcow2 images:
+
+....
+./run-android -- -writable-system
+....
+
+and the newly created file is still there:
+
+....
+cat /system/a
+....
+
+`/system` and `/vendor` can be nuked quickly with:
+
+....
+./build-android --extra-args snod
+./build-android --extra-args vnod
+....
+
+as mentioned at: https://stackoverflow.com/questions/29023406/how-to-just-build-android-system-image and on:
+
+....
+./build-android --extra-args help
+....
+
+Tested on: `8.1.0_r60`.
+
+==== Android /data partition
+
+When I install an app like F-Droid, it goes under `/data` according to:
+
+....
+find / -iname '*fdroid*'
+....
+
+and it <<disk-persistency,persists across boots>>.
+
+`/data` is behind a RW LVM device:
+
+....
+/dev/block/dm-0 on /data type ext4 (rw,seclabel,nosuid,nodev,noatime,errors=panic,data=ordered)
+....
+
+but TODO I can't find where it comes from since I don't have the CLI tools mentioned at:
+
+* https://superuser.com/questions/131519/what-is-this-dm-0-device
+* https://unix.stackexchange.com/questions/185057/where-does-lvm-store-its-configuration
+
+However, by looking at:
+
+....
+./run-android -- -help
+....
+
+we see:
+
+....
+-data <file>                   data image (default <datadir>/userdata-qemu.img
+....
+
+which confirms the suspicion that this data goes in `userdata-qemu.img`.
+
+To reset images to their original state, just remove the qcow2 overlay and regenerate it: https://stackoverflow.com/questions/54446680/how-to-reset-the-userdata-image-when-building-android-aosp-and-running-it-on-the
+
+Tested on: `8.1.0_r60`.
+
+=== Install Android apps
+
+I don't know how to download files from the web on vanilla Android: the default browser does not download anything, and there is no `wget`:
+
+* https://android.stackexchange.com/questions/6984/how-to-download-files-from-the-web-in-the-android-browser
+* https://stackoverflow.com/questions/26775079/wget-in-android-terminal
+
+Installing with `adb install` does however work: https://stackoverflow.com/questions/7076240/install-an-apk-file-from-command-prompt
+
+link:https://f-droid.org[F-Droid] installed fine like that, however it does not have permission to install apps: https://www.maketecheasier.com/install-apps-from-unknown-sources-android/
+
+And the `Settings` app crashes so I can't change it, logcat contains:
+
+....
+No service published for: wifip2p
+....
+
+which is mentioned at: https://stackoverflow.com/questions/47839955/android-8-settings-app-crashes-on-emulator-with-clean-aosp-build
+
+We also tried to enable it from the command line with:
+
+....
+settings put secure install_non_market_apps 1
+....
+
+as mentioned at: https://android.stackexchange.com/questions/77280/allow-unknown-sources-from-terminal-without-going-to-settings-app but it didn't work either.
+
+No person alive seems to know how to pre-install apps on AOSP: https://stackoverflow.com/questions/6249458/pre-installing-android-application
+
+Tested on: `8.1.0_r60`.
+
+=== Android init
+
+For Linux in general, see: <<init>>.
+
+The `/init` executable interprets the `/init.rc` file, which is written in a custom Android init system language: https://android.googlesource.com/platform/system/core/+/ee0e63f71d90537bb0570e77aa8a699cc222cfaf/init/README.md
+
+The top of that file then sources other `.rc` files present in the root directory:
+
+....
+import /init.environ.rc
+import /init.usb.rc
+import /init.${ro.hardware}.rc
+import /vendor/etc/init/hw/init.${ro.hardware}.rc
+import /init.usb.configfs.rc
+import /init.${ro.zygote}.rc
+....
+
+TODO: how is `ro.hardware` determined? https://stackoverflow.com/questions/20572781/android-boot-where-is-the-init-hardware-rc-read-in-init-c-where-are-servic It is a system property and can be obtained with:
+
+....
+getprop ro.hardware
+....
+
+This gives:
+
+....
+ranchu
+....
+
+which is the codename for the QEMU virtual platform we are running on: https://www.oreilly.com/library/view/android-system-programming/9781787125360/9736a97c-cd09-40c3-b14d-955717648302.xhtml
+
+TODO: is it possible to add a custom `.rc` file without modifying the initrd that <<android-image-structure,gets mounted on root>>? https://stackoverflow.com/questions/9768103/make-persistent-changes-to-init-rc
+
+Tested on: `8.1.0_r60`.
+
 == Benchmark this repo
 
 TODO: didn't fully port during refactor after 3b0a343647bed577586989fb702b760bd280844a. Reimplementing should not be hard.
@@ -12072,15 +14052,15 @@ cat "$(./getvar test_boot_benchmark_file)"
 Sample results at 8fb9db39316d43a6dbd571e04dd46ae73915027f:
 
 ....
-cmd ./run --arch x86_64 --eval '/poweroff.out'
+cmd ./run --arch x86_64 --eval './linux/poweroff.out'
 time 8.25
 exit_status 0
 
-cmd ./run --arch x86_64 --eval '/poweroff.out' --kvm
+cmd ./run --arch x86_64 --eval './linux/poweroff.out' --kvm
 time 1.22
 exit_status 0
 
-cmd ./run --arch x86_64 --eval '/poweroff.out' --trace exec_tb
+cmd ./run --arch x86_64 --eval './linux/poweroff.out' --trace exec_tb
 time 8.83
 exit_status 0
 instructions 2244297
@@ -12090,10 +14070,10 @@ time 213.39
 exit_status 0
 instructions 318486337
 
-cmd ./run --arch arm --eval '/poweroff.out'
+cmd ./run --arch arm --eval './linux/poweroff.out'
 time 6.62
 exit_status 0
-cmd ./run --arch arm --eval '/poweroff.out' --trace exec_tb
+cmd ./run --arch arm --eval './linux/poweroff.out' --trace exec_tb
 time 6.90
 exit_status 0
 instructions 776374
@@ -12108,11 +14088,11 @@ time 2250.40
 exit_status 0
 instructions 151981914
 
-cmd ./run --arch aarch64 --eval '/poweroff.out'
+cmd ./run --arch aarch64 --eval './linux/poweroff.out'
 time 4.94
 exit_status 0
 
-cmd ./run --arch aarch64 --eval '/poweroff.out' --trace exec_tb
+cmd ./run --arch aarch64 --eval './linux/poweroff.out' --trace exec_tb
 time 5.04
 exit_status 0
 instructions 233162
@@ -12316,266 +14296,6 @@ gem5:
 ** https://stackoverflow.com/questions/47997565/gem5-system-requirements-for-decent-performance/48941793#48941793
 ** https://github.com/gem5/gem5/issues/25
 
-== WIP
-
-Big new features that are not yet working.
-
-=== Android
-
-Remember: Android AOSP is a huge undocumented piece of bloatware. It's integration into this repo will likely never be super good.
-
-Verbose setup description: https://stackoverflow.com/questions/1809774/how-to-compile-the-android-aosp-kernel-and-test-it-with-the-android-emulator/48310014#48310014
-
-Download, build and run with the prebuilt AOSP QEMU emulator and the AOSP kernel:
-
-....
-./build-android \
-  --android-base-dir /path/to/your/hd \
-  --android-version 8.1.0_r60 \
-  download \
-  build \
-;
-./run-android \
-  --android-base-dir /path/to/your/hd \
-  --android-version 8.1.0_r60 \
-;
-....
-
-Sources:
-
-* link:build-android[]
-* link:run-android[]
-
-TODO how to hack the AOSP kernel, userland and emulator?
-
-Other archs work as well as usual with `--arch` parameter. However, running in non-x86 is very slow due to the lack of KVM.
-
-Tested on: `8.1.0_r60`.
-
-==== Android image structure
-
-https://source.android.com/devices/bootloader/partitions-images
-
-The messy AOSP generates a ton of images instead of just one.
-
-When the emulator launches, we can see them through QEMU `-drive` arguments:
-
-....
-emulator: argv[21] = "-initrd"
-emulator: argv[22] = "/data/aosp/8.1.0_r60/out/target/product/generic_x86_64/ramdisk.img"
-emulator: argv[23] = "-drive"
-emulator: argv[24] = "if=none,index=0,id=system,file=/path/to/aosp/8.1.0_r60/out/target/product/generic_x86_64/system-qemu.img,read-only"
-emulator: argv[25] = "-device"
-emulator: argv[26] = "virtio-blk-pci,drive=system,iothread=disk-iothread,modern-pio-notify"
-emulator: argv[27] = "-drive"
-emulator: argv[28] = "if=none,index=1,id=cache,file=/path/to/aosp/8.1.0_r60/out/target/product/generic_x86_64/cache.img.qcow2,overlap-check=none,cache=unsafe,l2-cache-size=1048576"
-emulator: argv[29] = "-device"
-emulator: argv[30] = "virtio-blk-pci,drive=cache,iothread=disk-iothread,modern-pio-notify"
-emulator: argv[31] = "-drive"
-emulator: argv[32] = "if=none,index=2,id=userdata,file=/path/to/aosp/8.1.0_r60/out/target/product/generic_x86_64/userdata-qemu.img.qcow2,overlap-check=none,cache=unsafe,l2-cache-size=1048576"
-emulator: argv[33] = "-device"
-emulator: argv[34] = "virtio-blk-pci,drive=userdata,iothread=disk-iothread,modern-pio-notify"
-emulator: argv[35] = "-drive"
-emulator: argv[36] = "if=none,index=3,id=encrypt,file=/path/to/aosp/8.1.0_r60/out/target/product/generic_x86_64/encryptionkey.img.qcow2,overlap-check=none,cache=unsafe,l2-cache-size=1048576"
-emulator: argv[37] = "-device"
-emulator: argv[38] = "virtio-blk-pci,drive=encrypt,iothread=disk-iothread,modern-pio-notify"
-emulator: argv[39] = "-drive"
-emulator: argv[40] = "if=none,index=4,id=vendor,file=/path/to/aosp/8.1.0_r60/out/target/product/generic_x86_64/vendor-qemu.img,read-only"
-emulator: argv[41] = "-device"
-emulator: argv[42] = "virtio-blk-pci,drive=vendor,iothread=disk-iothread,modern-pio-notify"
-....
-
-The root directory is the <<initrd>> given on the QEMU CLI, which `/proc/mounts` reports at:
-
-....
-rootfs on / type rootfs (ro,seclabel,size=886392k,nr_inodes=221598)
-....
-
-This contains the <<android-init>>, which through `.rc` must be mounting mounts the drives int o the right places TODO find exact point.
-
-The drive order is:
-
-....
-system
-cache
-userdata
-encryptionkey
-vendor-qemu
-....
-
-Then, on the terminal:
-
-....
-mount | grep vd
-....
-
-gives:
-
-....
-/dev/block/vda1 on /system type ext4 (ro,seclabel,relatime,data=ordered)
-/dev/block/vde1 on /vendor type ext4 (ro,seclabel,relatime,data=ordered)
-/dev/block/vdb on /cache type ext4 (rw,seclabel,nosuid,nodev,noatime,errors=panic,data=ordered)
-....
-
-and we see that the order of `vda`, `vdb`, etc. matches that in which `-drive` were given to QEMU.
-
-Tested on: `8.1.0_r60`.
-
-===== Android images read-only
-
-From `mount`, we can see that some of the mounted images are `ro`.
-
-Basically, every image that was given to QEMU as qcow2 is writable, and that qcow2 is an overlay over the actual original image.
-
-In order to make `/system` and `/vendor` writable by using qcow2 for them as well, we must use the `-writable-system` option:
-
-....
-./run-android -- -writable-system
-....
-
-* https://android.stackexchange.com/questions/110927/how-to-mount-system-rewritable-or-read-only-rw-ro/207200#207200
-* https://stackoverflow.com/questions/13089694/adb-remount-permission-denied-but-able-to-access-super-user-in-shell-android/43163693#43163693
-
-then:
-
-....
-su
-mount -o rw,remount /system
-date >/system/a
-....
-
-Now reboot, and relaunch with `-writable-system` once again to pick up the modified qcow2 images:
-
-....
-./run-android -- -writable-system
-....
-
-and the newly created file is still there:
-
-....
-date >/system/a
-....
-
-`/system` and `/vendor` can be nuked quickly with:
-
-....
-./build-android --extra-args snod
-./build-android --extra-args vnod
-....
-
-as mentioned at: https://stackoverflow.com/questions/29023406/how-to-just-build-android-system-image and on:
-
-....
-./build-android --extra-args help
-....
-
-Tested on: `8.1.0_r60`.
-
-===== Android /data partition
-
-When I install an app like F-Droid, it goes under `/data` according to:
-
-....
-find / -iname '*fdroid*'
-....
-
-and it <<disk-persistency,persists across boots>>.
-
-`/data` is behind a RW LVM device:
-
-....
-/dev/block/dm-0 on /data type ext4 (rw,seclabel,nosuid,nodev,noatime,errors=panic,data=ordered)
-....
-
-but TODO I can't find where it comes from since I don't have the CLI tools mentioned at:
-
-* https://superuser.com/questions/131519/what-is-this-dm-0-device
-* https://unix.stackexchange.com/questions/185057/where-does-lvm-store-its-configuration
-
-However, by looking at:
-
-....
-./run-android -- -help
-....
-
-we see:
-
-....
--data <file>                   data image (default <datadir>/userdata-qemu.img
-....
-
-which confirms the suspicion that this data goes in `userdata-qemu.img`.
-
-To reset images to their original state, just remove the qcow2 overlay and regenerate it: https://stackoverflow.com/questions/54446680/how-to-reset-the-userdata-image-when-building-android-aosp-and-running-it-on-the
-
-Tested on: `8.1.0_r60`.
-
-==== Install Android apps
-
-I don't know how to download files from the web on Vanilla android, the default browser does not download anything, and there is no `wget`:
-
-* https://android.stackexchange.com/questions/6984/how-to-download-files-from-the-web-in-the-android-browser
-* https://stackoverflow.com/questions/26775079/wget-in-android-terminal
-
-Installing with `adb install` does however work: https://stackoverflow.com/questions/7076240/install-an-apk-file-from-command-prompt
-
-link:https://f-droid.org[F-Droid] installed fine like that, however it does not have permission to install apps: https://www.maketecheasier.com/install-apps-from-unknown-sources-android/
-
-And the `Settings` app crashes so I can't change it, logcat contains:
-
-....
-No service published for: wifip2p
-....
-
-which is mentioned at: https://stackoverflow.com/questions/47839955/android-8-settings-app-crashes-on-emulator-with-clean-aosp-build
-
-We also tried to enable it from the command line with:
-
-....
-settings put secure install_non_market_apps 1
-....
-
-as mentioned at: https://android.stackexchange.com/questions/77280/allow-unknown-sources-from-terminal-without-going-to-settings-app but it didn't work either.
-
-No person alive seems to know how to pre-install apps on AOSP: https://stackoverflow.com/questions/6249458/pre-installing-android-application
-
-Tested on: `8.1.0_r60`.
-
-=== Android init
-
-For Linux in general, see: <<init>>.
-
-The `/init` executable interprets the `/init.rc` files, which is in a custom Android init system language: https://android.googlesource.com/platform/system/core/+/ee0e63f71d90537bb0570e77aa8a699cc222cfaf/init/README.md
-
-The top of that file then sources other `.rc` files present on the root directory:
-
-....
-import /init.environ.rc
-import /init.usb.rc
-import /init.${ro.hardware}.rc
-import /vendor/etc/init/hw/init.${ro.hardware}.rc
-import /init.usb.configfs.rc
-import /init.${ro.zygote}.rc
-....
-
-TODO: how is `ro.hardware` determined? https://stackoverflow.com/questions/20572781/android-boot-where-is-the-init-hardware-rc-read-in-init-c-where-are-servic It is a system property and can be obtained with:
-
-....
-getprop ro.hardware
-....
-
-This gives:
-
-....
-ranchu
-....
-
-which is the codename for the QEMU virtual platform we are running on: https://www.oreilly.com/library/view/android-system-programming/9781787125360/9736a97c-cd09-40c3-b14d-955717648302.xhtml
-
-TODO: is it possible to add a custom `.rc` file without modifying the initrd that <<android-image-structure,gets mounted on root>>? https://stackoverflow.com/questions/9768103/make-persistent-changes-to-init-rc
-
-Tested on: `8.1.0_r60`.
-
 == About this repo
 
 === Supported hosts
@@ -12587,7 +14307,7 @@ For other Linux distros, everything will likely also just work if you install th
 Find out the packages that we install with:
 
 ....
-./build --download-dependencies --dry-run | less
+./build --download-dependencies --dry-run <some-target> | less
 ....
 
 and then just look for the `apt-get` commands shown on the log.
@@ -12595,7 +14315,7 @@ and then just look for the `apt-get` commands shown on the log.
 After installing the missing packages for your distro, do the build with:
 
 ....
-./build --download-dependencies --no-apt
+./build --download-dependencies --no-apt <some-target>
 ....
 
 which does everything as normal, except that it skips any `apt` commands.
@@ -12604,7 +14324,9 @@ Ports to new host systems are welcome and will be merged.
 
 If something does not work however, <<docker>> should just work on any Linux distro.
 
-Native Windows is unlikely feasible because Buildroot is a huge set of GNU Make scripts + host tools, just do everything from inside an Ubuntu in VirtualBox instance in that case.
+Native Windows is unlikely to be feasible for Buildroot setups because Buildroot is a huge set of GNU Make scripts + host tools. In that case, just do everything from inside an Ubuntu VirtualBox instance.
+
+Some setups of this repository are however very portable, notably setups under <<userland-setup>>, e.g. <<c>>.
 
 === Common build issues
 
@@ -12770,7 +14492,7 @@ We have link:https://buildroot.org/downloads/manual/manual.html#ccache[enabled c
 * absolute paths are used and GDB can find source files
 * but builds are not reused across separated LKMC directories
 
-=== Rebuild buildroot while running
+=== Rebuild Buildroot while running
 
 It is not possible to rebuild the root filesystem while running QEMU because QEMU holds the file qcow2 file:
 
@@ -13051,93 +14773,38 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
 
 === Directory structure
 
-==== include directory
+==== lkmc directory
 
-link:include/[] contains headers that are shared across both kernel modules and userland structures.
+link:lkmc/[] contains sources and headers that are shared across kernel modules, userland and baremetal examples.
 
-They contain data structs and magic constant for kernel to userland communication.
+We chose this awkward name so that our includes will have an `lkmc/` prefix.
 
-==== userland directory
+Another option would have been to name it as `includes/lkmc`, but that would make paths longer, and we might want to store source code in that directory as well in the future.
 
-Userland test programs. They can be used in the following ways:
+===== Userland objects vs header-only
 
-* inside a full system simulation, e.g.: <<qemu-buildroot-setup>>
-* inside <<user-mode-simulation>>
-* directly on the host: <<userland-directory-host-build>>
+When factoring out functionality across userland examples, there are two main options:
 
-For usage inside full system simulation, first ensure that Buildroot has been built for the toolchain, and then build the examples with:
+* use header-only implementations
+* use separate C files and link to separate objects.
 
-....
-./build-userland
-....
+The downsides of the header-only implementation are:
 
-Source: link:build-userland[].
+* slower compilation time, especially for C++
+* cannot call C implementations from assembly files
 
-This makes them visible immediately on the <<9p>> mount of a running simulator.
+The advantages of header-only implementations are:
 
-In order to place them in the root filesystem image itself, you must also run:
+* easier to use, just `#include` and you are done, no need to modify build metadata.
 
-....
-./build-buildroot
-....
+As a result, we are currently using the following rule:
 
-===== userland directory host build
-
-It is possible to build and run some of the userland examples directly on your host:
-
-....
-cd userland
-make
-./hello.out
-make clean
-....
-
-or more cleanly out of tree:
-
-....
-./build-userland --gcc-which host --userland-build-id host
-"$(./getvar --userland-build-id host userland_build_dir)/hello.out"
-....
-
-Extra make flags may be passed as:
-
-....
-./build-userland --gcc-which host --userland-build-id host-static --make-args='-B CFLAGS_EXTRA=-static'
-"$(./getvar --userland-build-id host-static userland_build_dir)/hello.out"
-....
-
-This for example would both force a rebuild due to `-B` and link statically due to `CFLAGS_EXTRA=-static`.
-
-TODO: OpenMP does not like `-static`:
-
-....
-/usr/lib/gcc/x86_64-linux-gnu/5/libgomp.a(target.o): In function `gomp_target_init':
-(.text+0xba): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
-....
-
-See: https://stackoverflow.com/questions/23869981/linking-openmp-statically-with-gcc
-
-===== userland cheats
-
-We have accumulated considerable material in the following userland subjects.
-
-====== C
-
-Programs under link:userland/c/[] are examples of link:https://en.wikipedia.org/wiki/ANSI_C[ANSI C] programming.
-
-[[cpp]]
-====== C++
-
-Programs under link:userland/cpp/[] are examples of link:https://en.wikipedia.org/wiki/C%2B%2B#Standardization[ISO C] programming.
-
-====== POSIX
-
-Programs under link:userland/posix/[] are examples of POSIX C programming.
-
-What is POSIX:
-
-* https://stackoverflow.com/questions/1780599/what-is-the-meaning-of-posix/31865755#31865755
-* https://unix.stackexchange.com/questions/11983/what-exactly-is-posix/220877#220877
+* if something is only going to be used from C and not assembly, define it in a header which is easier to use
++
+The slower compilation should be OK as long as we split functionality amongst different headers and only include the required ones.
++
+Also, we don't have a choice in the case of C++ templates, which must stay in headers.
+* if the functionality will be called from assembly, then we don't have a choice, and must add it to a separate source file and link against it.
 
 ==== buildroot_packages directory
 
@@ -13154,7 +14821,7 @@ Those packages get automatically added to Buildroot's `BR2_EXTERNAL`, so all you
 then test it out with:
 
 ....
-./run --eval-after '/sample_package.out'
+./run --eval-after './sample_package.out'
 ....
 
 and you should see:
@@ -13171,7 +14838,7 @@ You can force a rebuild with:
 ./build-buildroot --config 'BR2_PACKAGE_SAMPLE_PACKAGE=y' -- sample_package-reconfigure
 ....
 
-Buildroot packages are convenient, but in general, if a package if very important to you, but not really mergeable back to Buildroot, you might want to just use a custom build script for it, and point it to the Buildroot toolchain, and then use `BR2_ROOTFS_OVERLAY`, much like we do for <<userland-directory>>.
+Buildroot packages are convenient, but in general, if a package is very important to you, but not really mergeable back to Buildroot, you might want to just use a custom build script for it, and point it to the Buildroot toolchain, and then use `BR2_ROOTFS_OVERLAY`, much like we do for <<userland-setup>>.
 
 A custom build script can give you more flexibility: e.g. the package can be made work with other root filesystems more easily, have better <<9p>> support, and rebuild faster as it evades some Buildroot boilerplate.
 
@@ -13274,6 +14941,40 @@ Those files also contain arch specific helpers under ifdefs like:
 
 We try to keep as much as possible in those files. It bloats builds a little, but just makes everything simpler to understand.
 
+==== rand_check.out
+
+Print out several parameters that normally change randomly from boot to boot:
+
+....
+./run --eval-after './linux/rand_check.out;./linux/poweroff.out'
+....
+
+Source: link:userland/linux/rand_check.c[]
+
+This can be used to check the determinism of:
+
+* <<norandmaps>>
+* <<qemu-record-and-replay>>
+
+==== lkmc_home
+
+`lkmc_home` refers to the target base directory in which we put all our custom built stuff, such as <<userland-setup,userland executables>> and <<your-first-kernel-module-hack,kernel modules>>.
+
+The current value can be found with:
+
+....
+./getvar guest_lkmc_home
+....
+
+In the past, we used to dump everything into the root filesystem, but as the userland structure got more complex with subfolders, we decided that the risk of conflicting with important root files was becoming too great.
+
+To save you from typing that path every time, we have made our most common commands `cd` into that directory by default for you, e.g.:
+
+* interactive shells `cd` there through <<busybox-shell-initrc-files>>
+* `--eval` and `--eval-after` through <<replace-init>> and <<init-busybox>>
+
+Whenever a relative path is used inside a guest sample command, e.g. `insmod hello.ko` or `./hello.out`, it means that the path lives in `lkmc_home` unless stated otherwise.
+
 === Test this repo
 
 ==== Automated tests
@@ -13344,6 +15045,10 @@ You can then see which tests failed on the test summary report at the end.
 
 ===== Test userland in full system
 
+TODO: we really need a mechanism to generate the test list automatically, e.g. based on <<path-properties>>. Currently there are many tests missing, and we have to add everything manually, which is very annoying.
+
+We could just generate it on the fly on the host, and forward it to guest through CLI arguments.
+
 Run all userland tests from inside full system simulation (i.e. not <<user-mode-simulation>>):
 
 ....
@@ -13352,7 +15057,7 @@ Run all userland tests from inside full system simulation (i.e. not <<user-mode-
 
 This includes, in particular, userland programs that test the kernel modules, which cannot be tested in user mode simulation.
 
-Basically just boots and runs: link:rootfs_overlay/test_all.sh[]
+Basically just boots and runs: link:rootfs_overlay/lkmc/test_all.sh[]
 
 Failure is detected by looking for the <<magic-failure-string>>
 
@@ -13375,14 +15080,14 @@ Sources:
 If a test fails, re-run the test commands manually and use `--verbose` to understand what happened:
 
 ....
-./run --arch arm --background --baremetal add --wait-gdb &
-./run-gdb --arch arm --baremetal add --verbose -- main
+./run --arch arm --background --baremetal baremetal/add.c --gdb-wait &
+./run-gdb --arch arm --baremetal baremetal/add.c --verbose -- main
 ....
 
 and possibly repeat the GDB steps manually with the usual:
 
 ....
-./run-gdb --arch arm --baremetal add --no-continue --verbose
+./run-gdb --arch arm --baremetal baremetal/add.c --no-continue --verbose
 ....
 
 To debug GDB problems on gem5, you might want to enable the following <<gem5-tracing,tracing>> options:
@@ -13390,8 +15095,8 @@ To debug GDB problems on gem5, you might want to enable the following <<gem5-tra
 ....
 ./run \
   --arch arm \
-  --baremetal add \
-  --wait-gdb \
+  --baremetal baremetal/add.c \
+  --gdb-wait \
   --trace GDBRecv,GDBSend \
   --trace-stdout \
 ;
@@ -13399,20 +15104,82 @@ To debug GDB problems on gem5, you might want to enable the following <<gem5-tra
 
 ===== Magic failure string
 
-Since there is no standardized exit status concept that works across all emulators for full system, we just parse the terminal output for a magic failure string to check if tests failed.
+We do not know of any way to set the emulator exit status in QEMU arm full system.
 
-If a full system simulation outputs a line containing only exactly the magic string:
+For other arch / emulator combinations, we know how to do it:
+
+* aarch64: aarch64 semihosting supports exit status
+* gem5: <<m5-fail>> works on all archs
+* user mode: QEMU forwards the exit status, and for gem5 we do some log parsing: <<gem5-syscall-emulation-exit-status>>
+
+Since we can't do it for QEMU arm, the only reliable solution is to just parse the guest serial output for a magic failure string to check if tests failed.
+
+Our run scripts parse the serial output looking for a line that contains exactly a match of the magic regular expression:
 
 ....
-lkmc_test_fail
+lkmc_exit_status_(\d+)
 ....
 
-to the terminal, then our run scripts detect that and exit with status `1`.
+and then exit with the status captured by that regular expression, e.g.:
 
-This magic output string is notably used by:
+....
+./run --arch aarch64 baremetal/return2.c
+echo $?
+....
 
-* the `lkmc_assert_fail()` function, which is used by <<baremetal-tests>>
-* link:rootfs_overlay/test_fail.sh[], which is used by <<test-userland-in-full-system>>
+should output:
+
+....
+2
+....
+
+This magic output string is notably generated by:
+
+* link:rootfs_overlay/lkmc/test_fail.sh[], which is used by <<test-userland-in-full-system>>
+* the `exit()` baremetal function when `status != 0`.
++
+Unfortunately the only way we found to set this up was with `on_exit`: link:https://github.com/cirosantilli/linux-kernel-module-cheat/issues/59[].
++
+Trying to patch `_exit` directly fails since at that point some de-initialization has already happened which prevents the print.
++
+So we set up this `on_exit` automatically from all our <<baremetal-bootloaders>>, so it just works for the examples that use the bootloaders: https://stackoverflow.com/questions/44097610/pass-parameter-to-atexit/49659697#49659697
++
+The following examples end up testing that our setup is working:
++
+* link:baremetal/assert_fail.c[]
+* link:baremetal/lkmc_assert_fail.c[]
+* link:baremetal/return1.c[]
+* link:baremetal/return2.c[]
+* link:baremetal/exit0.c[]
+* link:baremetal/exit1.c[]
+* link:baremetal/arch/arm/return1.S[]
+* link:baremetal/arch/aarch64/return1.S[]
+
+Beware that on Linux kernel simulations, you cannot even echo that string from userland, since userland stdout shows up on the serial.
+
+====== baremetal assert
+
+TODO: implement enough syscalls for it, so we can get the error line:
+
+....
+cd baremetal
+ln -s ../lkmc/assert_fail.c
+cd ..
+./build --arch aarch64
+....
+
+fails with:
+
+....
+/path/to/linux-kernel-module-cheat/out/crosstool-ng/build/default/install/aarch64/lib/gcc/aarch64-unknown-elf/8.1.0/../../../../aarch64-unknown-elf/lib/libg.a(lib_a-signalr.o): In function `_kill_r':
+/path/to/linux-kernel-module-cheat/out/crosstool-ng/build/default/build/aarch64-unknown-elf/src/newlib/newlib/libc/reent/signalr.c:53: undefined reference to `_kill'
+/path/to/linux-kernel-module-cheat/out/crosstool-ng/build/default/build/aarch64-unknown-elf/src/newlib/newlib/libc/reent/signalr.c:53:(.text+0x20): relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol `_kill'
+/path/to/linux-kernel-module-cheat/out/crosstool-ng/build/default/install/aarch64/lib/gcc/aarch64-unknown-elf/8.1.0/../../../../aarch64-unknown-elf/lib/libg.a(lib_a-signalr.o): In function `_getpid_r':
+/path/to/linux-kernel-module-cheat/out/crosstool-ng/build/default/build/aarch64-unknown-elf/src/newlib/newlib/libc/reent/signalr.c:83: undefined reference to `_getpid'
+/path/to/linux-kernel-module-cheat/out/crosstool-ng/build/default/build/aarch64-unknown-elf/src/newlib/newlib/libc/reent/signalr.c:83:(.text+0x44): relocation truncated to fit: R_AARCH64_JUMP26 against undefined symbol `_getpid'
+....
+
+at 406ee82cf33a6e3df0067b219b0414c59d7018b3 + 1.
 
 ==== Non-automated tests
 
@@ -13423,7 +15190,7 @@ For the Linux kernel, do the following manual tests for now.
 Shell 1:
 
 ....
-./run --wait-gdb
+./run --gdb-wait
 ....
 
 Shell 2:
@@ -13436,8 +15203,8 @@ Should break GDB at `start_kernel`.
 
 Then proceed to do the following tests:
 
-* `/count.sh` and `break __x64_sys_write`
-* `insmod /timer.ko` and `break lkmc_timer_callback`
+* `./count.sh` and `break __x64_sys_write`
+* `insmod timer.ko` and `break lkmc_timer_callback`
 
 ===== Test the Internet
 
@@ -13447,6 +15214,13 @@ You should also test that the Internet works:
 ./run --arch x86_64 --kernel-cli '- lkmc_eval="ifup -a;wget -S google.com;poweroff;"'
 ....
 
+===== CLI script tests
+
+`build-userland` and `test-user-mode` have a wide variety of target selection modes, and it was hard to keep them all working without some tests:
+
+* link:test-build-userland[]
+* link:test-test-user-mode[]
+
 === Bisection
 
 When updating the Linux kernel, QEMU and gem5, things sometimes break.
@@ -13486,6 +15260,40 @@ git submodule update
 
 TODO broken, fix: An example of Linux kernel commit bisection on gem5 boots can be found at: link:bisect-linux-boot-gem5[].
 
+[[path-properties]]
+=== path_properties
+
+In order to build and run each userland and <<baremetal-setup,baremetal>> example properly, we need per-file metadata such as compiler flags and required number of cores.
+
+This data is stored in link:path_properties.py[] at `path_properties_tuples`.
+
+Maybe we should embed it magically into source files directories to make it easier to see? But one big Python dict was easier to implement so we started like this. And it allows factoring chunks out easily.
+
+The format is as follows:
+
+....
+'path_component': (
+    {'property': value},
+    {
+        'child_path_component':
+        (
+            {'child_property': value},
+            {}
+        )
+    }
+)
+....
+
+and as a shortcut, paths that don't have any children can be written directly as:
+
+.....
+'path_component': {'property': value}
+.....
+
+Properties of parent directories apply to all children.
+
+Lists coming from parent directories are extended instead of overwritten by children. This is especially useful for C compiler flags.
+
 === Update a forked submodule
 
 This is a template update procedure for submodules for which we have some patches on on top of mainline.
@@ -13512,42 +15320,6 @@ git rebase --onto "$next_mainline_revision" "$last_mainline_revision"
 git commit -m "linux: update to ${next_mainline_revision}"
 ....
 
-=== Sanity checks
-
-Basic C and C++ hello worlds:
-
-....
-/hello.out
-/hello_cpp.out
-....
-
-Output:
-
-....
-hello
-hello cpp
-....
-
-Sources:
-
-* link:userland/hello.c[]
-* link:userland/hello_cpp.c[]
-
-==== rand_check.out
-
-Print out several parameters that normally change randomly from boot to boot:
-
-....
-./run --eval-after '/rand_check.out;/poweroff.out'
-....
-
-Source: link:userland/rand_check.c[]
-
-This can be used to check the determinism of:
-
-* <<norandmaps>>
-* <<qemu-record-and-replay>>
-
 === Release
 
 ==== Release procedure
@@ -13768,7 +15540,14 @@ We haven't found the ultimate distro yet, here is a summary table of trade-offs
 * (7): `ls recipes-* | wc`
 * (8): Poky reference system: http://git.yoctoproject.org/cgit/cgit.cgi/poky
 
-=== Fairy tale
+Other interesting possibilities that I haven't evaluated well:
+
+* NixOS https://nixos.org/ Seems to support full build from source well. Not much cross compilation information however.
+* Gentoo https://en.wikipedia.org/wiki/Gentoo_Linux Seems to support full build from source well.
+
+=== Soft topics
+
+==== Fairy tale
 
 ____
 Once upon a time, there was a boy called Linus.
@@ -13792,6 +15571,52 @@ And so everyone was happy. Except some of the old weird kernel hackers who wante
 THE END
 ____
 
+==== Should you waste your life with systems programming?
+
+Being the hardcore person who fully understands an important complex system such as a computer does have a nice ring to it, doesn't it?
+
+But before you dedicate your life to this nonsense, do consider the following points:
+
+* almost all contributions to the kernel are done by large companies, and if you are not an employee in one of them, you are likely not going to be able to do much.
++
+This can be inferred from the fact that the `drivers/` directory is by far the largest in the kernel.
++
+The kernel is of course just an interface to hardware, and the hardware developers start developing their kernel stuff even before specs are publicly released, both to help with hardware development and to have things working when the announcement is made.
++
+Furthermore, I believe that there are in-tree devices which have never been properly publicly documented. Linus is of course fine with this, since code == documentation for him, but it is not as easy for mere mortals.
++
+There are some less hardware bound higher level layers in the kernel which might not require being in a hardware company, and a few people must be living off it.
++
+But of course, those are heavily motivated by the underlying hardware characteristics, and it is very likely that most of the people working there were previously at a hardware company.
++
+In that sense, therefore, the kernel is not as open as one might want to believe.
+* it is impossible to become rich with this knowledge.
++
+This is partly implied by the fact that you need to be in a big company to make useful low level things, and therefore you will only be a tiny cog in the engine.
++
+The key problem is that the entry cost of hardware design is just too insanely high for startups in general.
+* Is learning this the most useful thing that you think you can do for society?
++
+Or are you just learning it for job security and having a nice sounding title?
++
+I'm not a huge fan of the person, but I think Jobs said it right: https://www.youtube.com/watch?v=FF-tKLISfPE
++
+First determine the useful goal, and then backtrack down to the most efficient thing you can do to reach it.
+* there are two things that sadden me compared to physics-based engineering:
++
+--
+** you will never become eternally famous. All tech disappears sooner or later, while laws of nature, at least as useful approximations, stay unchanged.
+** every problem that you face is caused by imperfections introduced by other humans.
++
+It is much easier to accept limitations of physics, and even natural selection in biology, which are not produced by a sentient being (?).
+--
++
+Physics-based engineering, just like low level hardware, is of course completely closed source however, since wrestling against the laws of physics is about the most expensive thing humans can do.
+
+Are you fine with those points, and ready to continue wasting your life?
+
+Good. In that case, read on, and let's have some fun together ;-)
+
 === Bibliography
 
 Runnable stuff:
diff --git a/baremetal/add.c b/baremetal/add.c
deleted file mode 100644
index b363317..0000000
--- a/baremetal/add.c
+++ /dev/null
@@ -1,13 +0,0 @@
-#include <lkmc.h>
-
-int main(void) {
-    int i, j, k;
-    i = 1;
-    /* test-gdb-op1 */
-    j = 2;
-    /* test-gdb-op2 */
-    k = i + j;
-    /* test-gdb-result */
-    if (k != 3)
-        lkmc_assert_fail();
-}
diff --git a/baremetal/add.c b/baremetal/add.c
new file mode 120000
index 0000000..42c36f0
--- /dev/null
+++ b/baremetal/add.c
@@ -0,0 +1 @@
+../lkmc/add.c
\ No newline at end of file
diff --git a/baremetal/add.py b/baremetal/add.py
deleted file mode 100644
index 0f42d2b..0000000
--- a/baremetal/add.py
+++ /dev/null
@@ -1,9 +0,0 @@
-def test(self):
-    self.sendline('tbreak main')
-    self.sendline('continue')
-    self.continue_to('op1')
-    assert self.get_int('i') == 1
-    self.continue_to('op2')
-    assert self.get_int('j') == 2
-    self.continue_to('result')
-    assert self.get_int('k') == 3
diff --git a/baremetal/add.py b/baremetal/add.py
new file mode 120000
index 0000000..d819366
--- /dev/null
+++ b/baremetal/add.py
@@ -0,0 +1 @@
+../lkmc/add.py
\ No newline at end of file
diff --git a/baremetal/arch/aarch64/add.S b/baremetal/arch/aarch64/add.S
index 6c08fe8..caaffae 100644
--- a/baremetal/arch/aarch64/add.S
+++ b/baremetal/arch/aarch64/add.S
@@ -1,12 +1,13 @@
 .global main
 main:
     /* 1 + 2 == 3 */
-    mov x0, #1
+    mov x0, 1
     /* test-gdb-op1 */
-    add x1, x0, #2
+    add x1, x0, 2
     /* test-gdb-result */
-    cmp x1, #3
+    cmp x1, 3
     beq 1f
     bl lkmc_assert_fail
 1:
+    mov x0, 0
     ret
diff --git a/baremetal/arch/aarch64/c_from_as.S b/baremetal/arch/aarch64/c_from_as.S
index 8359afb..c6cef1f 100644
--- a/baremetal/arch/aarch64/c_from_as.S
+++ b/baremetal/arch/aarch64/c_from_as.S
@@ -1,5 +1,5 @@
 /* Call a C function. */
 .global main
 main:
-    mov x0, #0
+    mov x0, 0
     bl exit
diff --git a/baremetal/arch/aarch64/fadd.S b/baremetal/arch/aarch64/fadd.S
index f9259f7..e974e6c 100644
--- a/baremetal/arch/aarch64/fadd.S
+++ b/baremetal/arch/aarch64/fadd.S
@@ -3,43 +3,45 @@
 .global main
 main:
     /* 1.5 + 2.5 == 4.0 */
-    fmov d0, #1.5
+    fmov d0, 1.5
     /* test-gdb-d0 */
-    fmov d1, #2.5
+    fmov d1, 2.5
     /* test-gdb-d1 */
     fadd d2, d0, d1
     /* test-gdb-d2 */
-    fmov d3, #4.0
+    fmov d3, 4.0
     fcmp d2, d3
     beq 1f
     bl lkmc_assert_fail
 1:
 
     /* Now in 32-bit. */
-    fmov s0, #1.5
+    fmov s0, 1.5
     /* test-gdb-s0 */
-    fmov s1, #2.5
+    fmov s1, 2.5
     /* test-gdb-s1 */
     fadd s2, s0, s1
     /* test-gdb-s2 */
     fadd s2, s0, s1
-    fmov s3, #4.0
+    fmov s3, 4.0
     fcmp s2, s3
     beq 1f
     bl lkmc_assert_fail
 1:
 
     /* Higher registers. */
-    fmov d28, #1.5
+    fmov d28, 1.5
     /* test-gdb-d28 */
-    fmov d29, #2.5
+    fmov d29, 2.5
     /* test-gdb-d29 */
     fadd d30, d28, d29
     /* test-gdb-d30 */
-    fmov d31, #4.0
+    fmov d31, 4.0
     /* test-gdb-d31 */
     fcmp d30, d31
     beq 1f
     bl lkmc_assert_fail
 1:
+
+    mov x0, 0
     ret
diff --git a/baremetal/arch/aarch64/multicore.S b/baremetal/arch/aarch64/multicore.S
index 45d8f8f..742366a 100644
--- a/baremetal/arch/aarch64/multicore.S
+++ b/baremetal/arch/aarch64/multicore.S
@@ -3,7 +3,7 @@
 .global main
 main:
     /* Reset spinlock. */
-    mov x0, #0
+    mov x0, 0
     ldr x1, =spinlock
     str x0, [x1]
 
@@ -66,6 +66,7 @@ spinlock_start:
     wfe
     cbz x0, spinlock_start
 
+    mov x0, 0
     ret
 
 spinlock:
diff --git a/baremetal/arch/aarch64/no_bootloader/gem5_exit.S b/baremetal/arch/aarch64/no_bootloader/gem5_exit.S
index 067f46d..0c9ba89 100644
--- a/baremetal/arch/aarch64/no_bootloader/gem5_exit.S
+++ b/baremetal/arch/aarch64/no_bootloader/gem5_exit.S
@@ -1,3 +1,5 @@
+#include <lkmc/m5ops.h>
+
 .global mystart
 mystart:
-    mov x0, #0; .inst 0XFF000110 | (0x21 << 16);
+    LKMC_M5OPS_EXIT_ASM
diff --git a/baremetal/arch/aarch64/no_bootloader/semihost_exit.S b/baremetal/arch/aarch64/no_bootloader/semihost_exit.S
index 5d87c96..634880a 100644
--- a/baremetal/arch/aarch64/no_bootloader/semihost_exit.S
+++ b/baremetal/arch/aarch64/no_bootloader/semihost_exit.S
@@ -6,7 +6,7 @@ mystart:
     movk x1, 2, lsl 16
     ldr x2, =semihost_args
     str x1, [x2, 0]
-    mov x0, #0
+    mov x0, 0
     str x0, [x2, 8]
     mov x1, x2
     mov w0, 0x18
diff --git a/baremetal/arch/aarch64/regs.S b/baremetal/arch/aarch64/regs.S
index 894e0d1..6015239 100644
--- a/baremetal/arch/aarch64/regs.S
+++ b/baremetal/arch/aarch64/regs.S
@@ -4,26 +4,26 @@
  */
 .global main
 main:
-    mov x0, #1
+    mov x0, 1
     /* test-gdb-x0 */
-    mov x1, #2
+    mov x1, 2
     /* test-gdb-x1 */
 
-    mov x29, #1
+    mov x29, 1
     /* test-gdb-x29 */
-    mov x30, #2
+    mov x30, 2
     /* test-gdb-x30 */
 
-    fmov d0, #1.5
+    fmov d0, 1.5
     /* test-gdb-d0 */
-    fmov d1, #2.5
+    fmov d1, 2.5
     /* test-gdb-d1 */
 
-    fmov d30, #1.5
+    fmov d30, 1.5
     /* test-gdb-d30 */
-    fmov d31, #2.5
+    fmov d31, 2.5
     /* test-gdb-d31 */
 
     /* Exit required since we messed up with x30 which is the lr. */
-    mov x0, #0
+    mov x0, 0
     bl exit
diff --git a/baremetal/arch/aarch64/return.S b/baremetal/arch/aarch64/return.S
index b16d907..25a43ec 100644
--- a/baremetal/arch/aarch64/return.S
+++ b/baremetal/arch/aarch64/return.S
@@ -1,4 +1,5 @@
 /* Return to ensure that the post main works. */
 .global main
 main:
+    mov x0, 0
     ret
diff --git a/baremetal/arch/aarch64/return1.S b/baremetal/arch/aarch64/return1.S
new file mode 100644
index 0000000..3e9b898
--- /dev/null
+++ b/baremetal/arch/aarch64/return1.S
@@ -0,0 +1,5 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#magic-failure-string */
+.global main
+main:
+    mov x0, 1
+    ret
diff --git a/baremetal/arch/aarch64/svc_asm.S b/baremetal/arch/aarch64/svc_asm.S
index 95f1df4..2803a2f 100644
--- a/baremetal/arch/aarch64/svc_asm.S
+++ b/baremetal/arch/aarch64/svc_asm.S
@@ -16,6 +16,7 @@ main:
 1:
 
     /* Go home. */
+    mov x0, 0
     ret
 
 LKMC_GLOBAL(lkmc_vector_trap_handler)
diff --git a/baremetal/arch/arm/add.S b/baremetal/arch/arm/add.S
index bb9eb0f..254ae1a 100644
--- a/baremetal/arch/arm/add.S
+++ b/baremetal/arch/arm/add.S
@@ -9,4 +9,5 @@ main:
     beq 1f
     bl lkmc_assert_fail
 1:
+    mov r0, #0
     bx lr
diff --git a/baremetal/arch/arm/dump_regs.c b/baremetal/arch/arm/dump_regs.c
new file mode 100644
index 0000000..5b60bd6
--- /dev/null
+++ b/baremetal/arch/arm/dump_regs.c
@@ -0,0 +1,15 @@
+/* I want to move el and all other "what's the initial value of such system register" into here. */
+
+#include <stdio.h>
+#include <inttypes.h>
+
+int main(void) {
+    uint32_t cpsr;
+    uint32_t mvfr1;
+    __asm__ ("mrs %0, cpsr" : "=r" (cpsr) : :);
+    /* TODO this is blowing up with an exception, how do I read from it? */
+    /*__asm__ ("vmrs %0, mvfr1" : "=r" (mvfr1) : :);*/
+    printf("cpsr  %" PRIx32 "\n", cpsr);
+    /*printf("mvfr1 %" PRIx32 "\n", mvfr1);*/
+    return 0;
+}
diff --git a/baremetal/arch/arm/gem5_assert.S b/baremetal/arch/arm/gem5_assert.S
index 97791c5..b6cbbe9 100644
--- a/baremetal/arch/arm/gem5_assert.S
+++ b/baremetal/arch/arm/gem5_assert.S
@@ -1,8 +1,10 @@
 /* assert 0x12345678 + 1 == 0x12345679 */
+
+#include <lkmc/m5ops.h>
+
 .global main
 main:
-    movw r0, #:lower16:myvar
-    movt r0, #:upper16:myvar
+    adr r0, myvar
     ldr r1, [r0]
     add r1, r1, #1
     str r1, [r0]
@@ -10,10 +12,8 @@ main:
     movt r2, #0x1234
     cmp r1, r2
     beq ok
-    /* m5 fail 1 */
-    mov r0, #0; mov r1, #0; mov r2, #1; mov r3, #0; .inst 0xEE000110 | (0x22 << 16);
+    LKMC_M5OPS_FAIL_1_ASM
 ok:
-    /* m5 exit */
-    mov r0, #0; mov r1, #0; .inst 0xEE000110 | (0x21 << 16);
+    LKMC_M5OPS_EXIT_ASM
 myvar:
     .word 0x12345678
diff --git a/baremetal/arch/arm/multicore.S b/baremetal/arch/arm/multicore.S
index a051de4..b56ebbb 100644
--- a/baremetal/arch/arm/multicore.S
+++ b/baremetal/arch/arm/multicore.S
@@ -32,6 +32,7 @@ spinlock_start:
     wfe
     cmp r0, #0
     beq spinlock_start
+    mov r0, #0
     bx lr
 spinlock:
     .skip 4
diff --git a/baremetal/arch/arm/regs.S b/baremetal/arch/arm/regs.S
index dfca10c..4b38b82 100644
--- a/baremetal/arch/arm/regs.S
+++ b/baremetal/arch/arm/regs.S
@@ -5,4 +5,5 @@ main:
     /* test-gdb-r0 */
     mov r1, #2
     /* test-gdb-r1 */
+    mov r0, #0
     bx lr
diff --git a/baremetal/arch/arm/return1.S b/baremetal/arch/arm/return1.S
new file mode 100644
index 0000000..70123b0
--- /dev/null
+++ b/baremetal/arch/arm/return1.S
@@ -0,0 +1,5 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#magic-failure-string */
+.global main
+main:
+    mov r0, #1
+    bx lr
diff --git a/baremetal/assert_fail.c b/baremetal/assert_fail.c
new file mode 120000
index 0000000..0263f59
--- /dev/null
+++ b/baremetal/assert_fail.c
@@ -0,0 +1 @@
+../lkmc/assert_fail.c
\ No newline at end of file
diff --git a/baremetal/exit.c b/baremetal/exit.c
deleted file mode 100644
index 98bcda2..0000000
--- a/baremetal/exit.c
+++ /dev/null
@@ -1,6 +0,0 @@
-#include <stdio.h>
-#include <stdlib.h>
-
-int main(void) {
-    exit(0);
-}
diff --git a/baremetal/exit0.c b/baremetal/exit0.c
new file mode 100644
index 0000000..736e9d3
--- /dev/null
+++ b/baremetal/exit0.c
@@ -0,0 +1,7 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#magic-failure-string */
+
+#include <stdlib.h>
+
+int main(void) {
+    exit(0);
+}
diff --git a/baremetal/exit1.c b/baremetal/exit1.c
new file mode 100644
index 0000000..dc8ceb6
--- /dev/null
+++ b/baremetal/exit1.c
@@ -0,0 +1,7 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#magic-failure-string */
+
+#include <stdlib.h>
+
+int main(void) {
+    exit(1);
+}
diff --git a/baremetal/interactive/prompt.c b/baremetal/getchar.c
similarity index 89%
rename from baremetal/interactive/prompt.c
rename to baremetal/getchar.c
index 49f4d8a..f3e01f2 100644
--- a/baremetal/interactive/prompt.c
+++ b/baremetal/getchar.c
@@ -1,3 +1,5 @@
+/* Test that input request through serial also works. */
+
 #include <stdio.h>
 #include <stdlib.h>
 
diff --git a/baremetal/hello.c b/baremetal/hello.c
deleted file mode 100644
index 20d437d..0000000
--- a/baremetal/hello.c
+++ /dev/null
@@ -1,6 +0,0 @@
-#include <stdio.h>
-
-int main(void) {
-    puts("hello");
-    return 0;
-}
diff --git a/baremetal/hello.c b/baremetal/hello.c
new file mode 120000
index 0000000..d00921e
--- /dev/null
+++ b/baremetal/hello.c
@@ -0,0 +1 @@
+../lkmc/hello.c
\ No newline at end of file
diff --git a/baremetal/interactive/infinite_loop.c b/baremetal/infinite_loop.c
similarity index 100%
rename from baremetal/interactive/infinite_loop.c
rename to baremetal/infinite_loop.c
diff --git a/baremetal/interactive/README.adoc b/baremetal/interactive/README.adoc
deleted file mode 100644
index 74158eb..0000000
--- a/baremetal/interactive/README.adoc
+++ /dev/null
@@ -1 +0,0 @@
-This folder contains examples that are not very testable: either are supposed to return 0, or are interactive, etc.
diff --git a/baremetal/interactive/exit1.c b/baremetal/interactive/exit1.c
deleted file mode 100644
index 342579e..0000000
--- a/baremetal/interactive/exit1.c
+++ /dev/null
@@ -1,6 +0,0 @@
-#include <stdio.h>
-#include <stdlib.h>
-
-int main(void) {
-    exit(1);
-}
diff --git a/baremetal/interactive/return1.c b/baremetal/interactive/return1.c
deleted file mode 100644
index 98c444a..0000000
--- a/baremetal/interactive/return1.c
+++ /dev/null
@@ -1 +0,0 @@
-int main(void) { return 1; }
diff --git a/baremetal/lib/aarch64.S b/baremetal/lib/aarch64.S
index 3c887ca..235de91 100644
--- a/baremetal/lib/aarch64.S
+++ b/baremetal/lib/aarch64.S
@@ -14,6 +14,12 @@ mystart:
     /* Prepare the stack for main, mandatory for C code. */
     ldr x0, =stack_top
     mov sp, x0
+
+    /* https://github.com/cirosantilli/linux-kernel-module-cheat#magic-failure-string */
+    adr x0, lkmc_baremetal_on_exit_callback
+    bl on_exit
+
+    /* Run main. */
     bl main
 
     /* If main returns, exit. */
diff --git a/baremetal/lib/arm.S b/baremetal/lib/arm.S
index 79fc1d0..d0fc20b 100644
--- a/baremetal/lib/arm.S
+++ b/baremetal/lib/arm.S
@@ -1,5 +1,16 @@
+#include <lkmc.h>
+
 .global mystart
 mystart:
+    /* Prepare the stack for main, mandatory for C code. */
     ldr sp, =stack_top
+
+    /* https://github.com/cirosantilli/linux-kernel-module-cheat#magic-failure-string */
+    ldr r0, =lkmc_baremetal_on_exit_callback
+    bl on_exit
+
+    /* Run main. */
     bl main
+
+    /* If main returns, exit. */
     bl exit
diff --git a/baremetal/lib/syscalls.c b/baremetal/lib/syscalls.c
index fd2d50a..da42a7b 100644
--- a/baremetal/lib/syscalls.c
+++ b/baremetal/lib/syscalls.c
@@ -2,6 +2,8 @@
 #include <stdlib.h>
 #include <sys/stat.h>
 
+#include <lkmc/m5ops.h>
+
 enum {
     UART_FR_RXFE = 0x10,
 };
@@ -11,13 +13,55 @@ enum {
 
 int _close(int file) { return -1; }
 
+void _exit(int status) {
+#if defined(GEM5)
+    LKMC_M5OPS_EXIT;
+#else
+#if defined(__arm__)
+    __asm__ __volatile__ (
+        "mov r0, #0x18\n"
+        "ldr r1, =#0x20026\n"
+        "svc 0x00123456\n"
+        :
+        :
+        : "r0", "r1"
+    );
+#elif defined(__aarch64__)
+    /* TODO actually use the exit value here, just for fun. */
+    __asm__ __volatile__ (
+        "mov x1, #0x26\n" \
+        "movk x1, #2, lsl #16\n" \
+        "str x1, [sp,#0]\n" \
+        "mov x0, #0\n" \
+        "str x0, [sp,#8]\n" \
+        "mov x1, sp\n" \
+        "mov w0, #0x18\n" \
+        "hlt 0xf000\n"
+        :
+        :
+        : "x0", "x1"
+    );
+#endif
+#endif
+}
+
 int _fstat(int file, struct stat *st) {
     st->st_mode = S_IFCHR;
     return 0;
 }
 
+/* Required by assert. */
+int _getpid(void) { return 0; }
+
+/* Required by assert. */
+int _kill(pid_t pid, int sig) {
+    exit(128 + sig);
+}
+
 int _isatty(int file) { return 1; }
+
 int _lseek(int file, int ptr, int dir) { return 0; }
+
 int _open(const char *name, int flags, int mode) { return -1; }
 
 int _read(int file, char *ptr, int len) {
@@ -59,33 +103,3 @@ int _write(int file, char *ptr, int len) {
     }
     return len;
 }
-
-/* Only 0 is supported for now, arm semihosting cannot handle other values. */
-void _exit(int status) {
-#if defined(GEM5)
-#if defined(__arm__)
-    __asm__ __volatile__ ("mov r0, #0; mov r1, #0; .inst 0xEE000110 | (0x21 << 16);" : : : "r0", "r1");
-#elif defined(__aarch64__)
-    __asm__ __volatile__ ("mov x0, #0; .inst 0XFF000110 | (0x21 << 16);" : : : "x0");
-#endif
-#else
-#if defined(__arm__)
-    __asm__ __volatile__ ("mov r0, #0x18; ldr r1, =#0x20026; svc 0x00123456" : : : "r0", "r1");
-#elif defined(__aarch64__)
-    /* TODO actually use the exit value here, just for fun. */
-    __asm__ __volatile__ (
-        "mov x1, #0x26\n" \
-        "movk x1, #2, lsl #16\n" \
-        "str x1, [sp,#0]\n" \
-        "mov x0, #0\n" \
-        "str x0, [sp,#8]\n" \
-        "mov x1, sp\n" \
-        "mov w0, #0x18\n" \
-        "hlt 0xf000\n"
-        :
-        :
-        : "x0", "x1"
-    );
-#endif
-#endif
-}
diff --git a/baremetal/lkmc_assert_fail.c b/baremetal/lkmc_assert_fail.c
new file mode 120000
index 0000000..c612c70
--- /dev/null
+++ b/baremetal/lkmc_assert_fail.c
@@ -0,0 +1 @@
+../lkmc/lkmc_assert_fail.c
\ No newline at end of file
diff --git a/baremetal/return1.c b/baremetal/return1.c
new file mode 100644
index 0000000..ef4cafb
--- /dev/null
+++ b/baremetal/return1.c
@@ -0,0 +1,2 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#magic-failure-string */
+int main(void) { return 1; }
diff --git a/baremetal/return2.c b/baremetal/return2.c
new file mode 100644
index 0000000..f160bf0
--- /dev/null
+++ b/baremetal/return2.c
@@ -0,0 +1,2 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#magic-failure-string */
+int main(void) { return 2; }
diff --git a/bisect-linux-boot-gem5 b/bisect-linux-boot-gem5
index 6d9a43c..55fe176 100755
--- a/bisect-linux-boot-gem5
+++ b/bisect-linux-boot-gem5
@@ -1,34 +1,11 @@
-#!/usr/bin/env python3
-
-import imp
-import os
-import shutil
-import sys
-
-import common
-build_linux = imp.load_source('build-linux', os.path.join(kwargs['root_dir'], 'build_linux'))
-run = imp.load_source('run', os.path.join(kwargs['root_dir'], 'run'))
-
-parser = self.get_argparse(
-    argparse_args={
-        'description': '''Bisect the Linux kernel on gem5 boots.
-
-More information at: https://github.com/cirosantilli/linux-kernel-module-cheat#bisection
-'''},
-    default_args={
-        'emulators': ['gem5'],
-        'linux_build_id': 'bisect',
-    },
-)
-args = self.setup(parser)
-# We need a clean rebuild because rebuilds at different revisions:
-# - may fail
-# - may not actually rebuild all files, e.g. on header changes
-self.rmrf(kwargs['linux_build_dir'])
-build_linux.LinuxComponent().do_build(args)
-status = run.main(args, {
-    'eval': 'm5 exit',
-})
-if status == 125 or status == 127:
-    status = 1
-sys.exit(status)
+#!/usr/bin/env bash
+set -eu
+./build-linux --clean "$@"
+./build-linux "$@"
+status=0
+./run --eval 'm5 exit' "$@" || status=$?
+# https://stackoverflow.com/questions/4713088/how-to-use-git-bisect/22592593#22592593
+if [ "$status" -eq 125 ] || [ "$status" -gt 127 ]; then
+  status=1
+fi
+exit "$status"
diff --git a/bisect-qemu-linux-boot b/bisect-qemu-linux-boot
index 6e797a9..51bcfa4 100755
--- a/bisect-qemu-linux-boot
+++ b/bisect-qemu-linux-boot
@@ -3,4 +3,4 @@ set -eu
 git submodule update --recursive
 cd ../..
 ./build-qemu --arch aarch64 --qemu-build-id bisect
-./run --arch aarch64 --kernel-cli 'init=/poweroff.out' --qemu-build-id bisect
+./run --arch aarch64 --kernel-cli 'init=/lkmc/linux/poweroff.out' --qemu-build-id bisect
diff --git a/bst-vs-heap b/bst-vs-heap
index 69ec9b9..a3b4eb5 100755
--- a/bst-vs-heap
+++ b/bst-vs-heap
@@ -1,17 +1,31 @@
 #!/usr/bin/env python3
+
 import common
-parser = self.get_argparse(
-    argparse_args={'description':'Convert a BST vs heap stat file into a gnuplot input'}
-)
-args = self.setup(parser)
-stats = self.get_stats()
-it = iter(stats)
-i = 1
-for stat in it:
-    try:
-        next_stat = next(it)
-    except StopIteration:
-        # Automatic dumpstats at end may lead to odd number of stats.
-        break
-    print('{} {} {}'.format(i, stat, next_stat))
-    i += 1
+
+class Main(common.LkmcCliFunction):
+    def __init__(self):
+        super().__init__(
+            defaults={
+                'emulator': 'gem5',
+                'show_time': False,
+            },
+            description='''\
+Convert a BST vs heap stat file into a gnuplot input
+''',
+        )
+
+    def timed_main(self):
+        stats = self.get_stats()
+        it = iter(stats)
+        i = 1
+        for stat in it:
+            try:
+                next_stat = next(it)
+            except StopIteration:
+                # Automatic dumpstats at end may lead to odd number of stats.
+                break
+            print('{} {} {}'.format(i, stat, next_stat))
+            i += 1
+
+if __name__ == '__main__':
+    Main().cli()
diff --git a/bst-vs-heap.gnuplot b/bst-vs-heap.gnuplot
new file mode 100755
index 0000000..4f025c2
--- /dev/null
+++ b/bst-vs-heap.gnuplot
@@ -0,0 +1,25 @@
+#!/usr/bin/env gnuplot
+set terminal png size 1024, 2048
+set output "bst-vs-heap.tmp.png"
+set multiplot layout 5,1 title "Heap vs BST vs Hash map insert time"
+set xlabel "size"
+set ylabel "nanoseconds"
+
+set title "Heap"
+plot "bst_vs_heap.dat" using 1:2 notitle
+
+set title "Heap (zoom)"
+set yrange [0:25]
+plot "bst_vs_heap.dat" using 1:2 notitle
+
+set title "BST"
+set yrange [*:*]
+plot "bst_vs_heap.dat" using 1:3 notitle
+
+#set title "Hash map"
+#set yrange [*:*]
+#plot "bst_vs_heap.dat" using 1:4 notitle
+#
+#set title "Hash map zoom"
+#set yrange [0:350]
+#plot "bst_vs_heap.dat" using 1:4 notitle
diff --git a/build b/build
index a4adfed..9954590 100755
--- a/build
+++ b/build
@@ -7,9 +7,12 @@ import cli_function
 import collections
 import common
 import copy
+import subprocess
 import shell_helpers
 from shell_helpers import LF
 
+import lkmc
+
 class _Component:
     '''
     Yes, we are re-inventing a crappy dependency resolution system,
@@ -107,10 +110,8 @@ so looping over all of them would waste time.
         )
         buildroot_component = _Component(
             self._build_file('build-buildroot'),
-            submodules = {
-                'buildroot',
-            },
             submodules_shallow = {
+                'buildroot',
                 'binutils-gdb',
                 'gcc',
                 'glibc',
@@ -164,7 +165,7 @@ so looping over all of them would waste time.
                 # Generate graphs of config.ini under m5out.
                 'pydot',
             },
-            'submodules': {'gem5'},
+            'submodules_shallow': {'gem5'},
         }
 
         self.name_to_component_map = {
@@ -227,7 +228,7 @@ so looping over all of them would waste time.
                     'python-dev',
                     'texinfo',
                 },
-                submodules={'crosstool-ng'},
+                submodules_shallow={'crosstool-ng'},
             ),
             'doc': _Component(
                 self._build_file('build-doc'),
@@ -272,7 +273,7 @@ so looping over all of them would waste time.
             'm5': _Component(
                 self._build_file('build-m5'),
                 dependencies=['buildroot'],
-                submodules={'gem5'},
+                submodules_shallow={'gem5'},
             ),
             'overlay': _Component(dependencies=[
                 'copy-overlay',
@@ -284,7 +285,7 @@ so looping over all of them would waste time.
                 'overlay',
             ]),
             'parsec-benchmark': _Component(
-                submodules={'parsec-benchmark'},
+                submodules_shallow={'parsec-benchmark'},
                 dependencies=['buildroot'],
             ),
             'qemu': _Component(
@@ -339,6 +340,14 @@ so looping over all of them would waste time.
                 self._build_file('build-userland'),
                 dependencies=['buildroot'],
             ),
+            'userland-host': _Component(
+                self._build_file('build-userland-in-tree'),
+                apt_get_pkgs={
+                    'libdrm-dev',
+                    'libeigen3-dev',
+                    'libopenblas-dev',
+                },
+            ),
             'userland-gem5': _Component(
                 self._build_file('build-userland', static=True, userland_build_id='static'),
                 dependencies=['buildroot'],
@@ -394,8 +403,8 @@ Which components to build. Default: qemu-buildroot
         def f():
             args = self.get_common_args()
             args.update(extra_args)
-            args['print_time'] = False
-            self.import_path_main(component_file)(**args)
+            args['show_time'] = False
+            lkmc.import_path.import_path_main(component_file)(**args)
         return f
 
     def timed_main(self):
@@ -504,46 +513,54 @@ Which components to build. Default: qemu-buildroot
                     ['python3', '-m', 'pip', 'install', '--user', LF] +
                     self.sh.add_newlines(sorted(python3_pkgs))
                 )
-            git_cmd_common = ['git', 'submodule', 'update', '--init', '--recursive']
-            if submodules:
-                # == Other nice git options for when distros move to newer Git
-                #
-                # Currently not on Ubuntu 16.04:
-                #
-                # `--progress`: added on Git 2.10:
-                #
+            git_version_tuple = tuple(int(x) for x in subprocess.check_output(['git', '--version']).decode().split(' ')[-1].split('.'))
+            git_cmd_common = [
+                'git', LF,
+                'submodule', LF,
+                'update', LF,
+                '--init', LF,
+                '--recursive', LF,
+            ]
+            if git_version_tuple >= (2, 9, 0):
+                # https://stackoverflow.com/questions/26957237/how-to-make-git-clone-faster-with-multiple-threads/52327638#52327638
+                git_cmd_common.extend(['--jobs', str(len(os.sched_getaffinity(0))), LF])
+            if git_version_tuple >= (2, 10, 0):
                 # * https://stackoverflow.com/questions/32944468/how-to-show-progress-for-submodule-fetching
                 # * https://stackoverflow.com/questions/4640020/progress-indicator-for-git-clone
-                #
-                # `--jobs"`: https://stackoverflow.com/questions/26957237/how-to-make-git-clone-faster-with-multiple-threads/52327638#52327638
-                self.sh.run_cmd(
-                    git_cmd_common + ['--', LF] +
-                    self.sh.add_newlines([os.path.join(common.consts['submodules_dir'], x) for x in sorted(submodules)])
-                )
+                git_cmd_common.extend(['--progress', LF])
+            def submodule_ids_to_cmd(submodules):
+                return self.sh.add_newlines([os.path.join(common.consts['submodules_dir'], x) for x in sorted(submodules)])
+            if submodules:
+                self.sh.run_cmd(git_cmd_common + ['--', LF] + submodule_ids_to_cmd(submodules))
             if submodules_shallow:
-                # == Shallow cloning.
-                #
                 # TODO Ideally we should shallow clone --depth 1 all of them.
                 #
                 # However, most git servers out there are crap or craply configured
                 # and don't allow shallow cloning except for branches.
                 #
-                # So for now, let's shallow clone only the Linux kernel, which has by far
-                # the largest .git repo history, and full clone the others.
+                # So for now I'm keeping all mirrors in my GitHub,
+                # and always have a lkmc-* branch pointing to it.
                 #
-                # Then we will maintain a GitHub Linux kernel mirror / fork that always has a
-                # lkmc branch, and point to it, so that it will always succeed.
+                # However, QEMU has a bunch of submodules itself, and I'm not in the mood
+                # to mirror all of them...
                 #
                 # See also:
                 #
                 # * https://stackoverflow.com/questions/3489173/how-to-clone-git-repository-with-specific-revision-changeset
                 # * https://stackoverflow.com/questions/2144406/git-shallow-submodules/47374702#47374702
                 # * https://unix.stackexchange.com/questions/338578/why-is-the-git-clone-of-the-linux-kernel-source-code-much-larger-than-the-extrac
-                #
-                self.sh.run_cmd(
-                    git_cmd_common + ['--depth', '1', '--', LF] +
-                    self.sh.add_newlines([os.path.join(common.consts['submodules_dir'], x) for x in sorted(submodules_shallow)])
-                )
+                cmd = git_cmd_common.copy()
+                if git_version_tuple > (2, 7, 4):
+                    # Only pass --depth 1 on newer git: Ubuntu 16.04's git 2.7.4 has a bug where --depth 1 fails...
+                    # OMG git submodules implementation sucks:
+                    # * https://stackoverflow.com/questions/2155887/git-submodule-head-reference-is-not-a-tree-error/25875273#25875273
+                    # * https://github.com/boostorg/boost/issues/245
+                    cmd.extend(['--depth', '1', LF])
+                else:
+                    self.log_warn('your git is too old for git submodule update --depth 1')
+                    self.log_warn('update to git 2.17 or newer and you will save clone time')
+                    self.log_warn('see: https://github.com/cirosantilli/linux-kernel-module-cheat/issues/44')
+                self.sh.run_cmd(cmd + ['--', LF] + submodule_ids_to_cmd(submodules_shallow))
 
         # Do the build.
         for component in selected_components:
diff --git a/build-baremetal b/build-baremetal
index de0efcc..f6a2977 100755
--- a/build-baremetal
+++ b/build-baremetal
@@ -15,6 +15,9 @@ Build the baremetal examples with crosstool-NG.
 ''',
             supported_archs=common.consts['crosstool_ng_supported_archs']
         )
+        self._add_argument('--ccflags')
+        self._add_argument('--force-rebuild')
+        self._add_argument('--optimization-level')
 
     def build(self):
         build_dir = self.get_build_dir()
@@ -39,11 +42,25 @@ Build the baremetal examples with crosstool-NG.
         cflags = [
             '-I', self.env['baremetal_source_lib_dir'], LF,
             '-I', self.env['root_dir'], LF,
-            '-O0', LF,
+            '-O{}'.format(self.env['optimization_level']), LF,
             '-ggdb3', LF,
             '-mcpu={}'.format(self.env['mcpu']), LF,
             '-nostartfiles', LF,
         ]
+        if self.env['arch'] == 'arm':
+            cflags.extend([
+                '-mhard-float', LF,
+                # This uses the soft float ABI for calling functions from objects in Newlib, which
+                # our crosstool-NG config compiles with soft floats, while emitting hard float
+                # from C and allowing us to use it from assembly, e.g. for the VMRS instruction,
+                # which would otherwise fail with "selected processor does not support XXX in ARM mode".
+                # Bibliography:
+                # - https://stackoverflow.com/questions/9753749/arm-compilation-error-vfp-registered-used-by-executable-not-object-file
+                # - https://stackoverflow.com/questions/41131432/cross-compiling-error-selected-processor-does-not-support-fmrx-r3-fpexc-in/41131782#41131782
+                # - https://embeddedartistry.com/blog/2017/10/9/r1q7pksku2q3gww9rpqef0dnskphtc
+                '-mfloat-abi=softfp', LF,
+                '-mfpu=crypto-neon-fp-armv8', LF,
+            ])
         cflags_after = ['-lm']
         if self.env['emulator'] == 'gem5':
             if self.env['machine'] == 'VExpress_GEM5_V1':
@@ -54,7 +71,10 @@ Build the baremetal examples with crosstool-NG.
                 uart_address = 0x10009000
             else:
                 raise Exception('unknown machine: ' + self.env['machine'])
-            cflags.extend(['-D', 'GEM5'.format(uart_address), LF])
+            cflags.extend([
+                '-D', 'GEM5', LF,
+                '-DLKMC_M5OPS_ENABLE=1', LF,
+            ])
         else:
             entry_address = 0x40000000
             uart_address = 0x09000000
@@ -67,9 +87,10 @@ Build the baremetal examples with crosstool-NG.
                 self.env['asm_ext']
             )
         )
+        cflags.extend(self.sh.shlex_split(self.env['ccflags']))
         if self.need_rebuild([src], bootloader_obj):
             self.sh.run_cmd(
-                [self.env['gcc'],  LF] +
+                [self.env['gcc_path'],  LF] +
                 cflags +
                 [
                     '-c', LF,
@@ -84,11 +105,11 @@ Build the baremetal examples with crosstool-NG.
         ]:
             if self.need_rebuild([src, self.env['common_h']], obj):
                 self.sh.run_cmd(
-                    [self.env['gcc'],  LF] +
+                    [self.env['gcc_path'],  LF] +
                     cflags +
                     [
-                        '-c', LF,
                         '-D', 'UART0_ADDR={:#x}'.format(uart_address), LF,
+                        '-c', LF,
                         '-o', obj, LF,
                         src, LF,
                     ] +
@@ -112,43 +133,34 @@ Build the baremetal examples with crosstool-NG.
                     in_name, in_ext = os.path.splitext(in_basename)
                     if  (
                         os.path.isfile(in_path) and
-                        in_ext in (self.env['c_ext'], self.env['asm_ext'])
+                        in_ext in self.env['build_in_exts']
                     ):
-                        main_obj = os.path.join(
-                            out_dir,
-                            '{}{}'.format(
-                                in_name,
-                                self.env['obj_ext']
-                            )
-                        )
-                        src = os.path.join(self.env['baremetal_source_dir'], in_path)
-                        if self.need_rebuild([src, self.env['common_h']], main_obj):
-                            self.sh.run_cmd(
-                                [self.env['gcc'],  LF] +
-                                cflags +
-                                [
-                                    '-c', LF,
-                                    '-o', main_obj, LF,
-                                    src, LF,
-                                ] +
-                                cflags_after
-                            )
-                        objs = common_objs_bootloader + [main_obj]
                         out = os.path.join(out_dir, in_name + self.env['baremetal_build_ext'])
-                        if self.need_rebuild(objs + [self.env['baremetal_link_script']], out):
+                        src = os.path.join(self.env['baremetal_source_dir'], in_path)
+                        if self.need_rebuild(
+                            common_objs_bootloader +
+                            [
+                                src,
+                                self.env['baremetal_link_script'],
+                                self.env['common_h']
+                            ],
+                            out
+                        ):
                             self.sh.run_cmd(
-                                [self.env['gcc'],  LF] +
+                                [self.env['gcc_path'],  LF] +
                                 cflags +
                                 [
                                     '-Wl,--section-start=.text={:#x}'.format(entry_address), LF,
                                     '-o', out, LF,
                                     '-T', self.env['baremetal_link_script'], LF,
                                 ] +
-                                self.sh.add_newlines(objs) +
+                                [
+                                    src, LF,
+                                ] +
+                                self.sh.add_newlines(common_objs_bootloader) +
                                 cflags_after
                             )
 
-
     def get_build_dir(self):
         return self.env['baremetal_build_dir']
 
diff --git a/build-buildroot b/build-buildroot
index f2f1375..c84cb2b 100755
--- a/build-buildroot
+++ b/build-buildroot
@@ -70,6 +70,7 @@ Extra arguments to be passed to the Buildroot make,
 usually extra Buildroot targets.
 '''
         )
+        self._add_argument('--force-rebuild')
 
     def build(self):
         build_dir = self.get_build_dir()
@@ -147,6 +148,8 @@ usually extra Buildroot targets.
             cwd=self.env['buildroot_source_dir'],
         )
         self.make_build_dirs()
+        if self.env['force_rebuild']:
+            extra_make_args.extend(['-B', LF])
         if not self.env['no_all']:
             extra_make_args.extend(['all', LF])
         self.sh.run_cmd(
@@ -158,7 +161,7 @@ usually extra Buildroot targets.
             ] +
             extra_make_args
             ,
-            out_file=os.path.join(self.env['buildroot_build_dir'], 'lkmc.log'),
+            out_file=os.path.join(self.env['buildroot_build_dir'], self.env['repo_short_id'] + '.log'),
             delete_env=['LD_LIBRARY_PATH'],
             cwd=self.env['buildroot_source_dir'],
         )
diff --git a/build-crosstool-ng b/build-crosstool-ng
index af40db3..627a1a9 100755
--- a/build-crosstool-ng
+++ b/build-crosstool-ng
@@ -61,7 +61,7 @@ Build crosstool-NG with Newlib for bare metal compilation
                 'build', LF,
                 'CT_JOBS={}'.format(str(self.env['nproc'])), LF,
             ],
-            out_file=os.path.join(build_dir, 'lkmc.log'),
+            out_file=os.path.join(build_dir, self.env['repo_short_id'] + '.log'),
             delete_env=['LD_LIBRARY_PATH'],
             extra_paths=[self.env['ccache_dir']],
         )
diff --git a/build-doc b/build-doc
index 8e958ab..002c3c9 100755
--- a/build-doc
+++ b/build-doc
@@ -9,7 +9,7 @@ class Main(common.LkmcCliFunction):
     def __init__(self):
         super().__init__(
             defaults = {
-                'print_time': False,
+                'show_time': False,
             },
             description='''\
 https://github.com/cirosantilli/linux-kernel-module-cheat#build-the-documentation
@@ -27,11 +27,12 @@ https://github.com/cirosantilli/linux-kernel-module-cheat#build-the-documentatio
         )
         error_re = re.compile('^asciidoctor: WARNING: ')
         exit_status = 0
-        with open(self.env['build_doc_log']) as f:
-            for line in f:
-                if error_re.search(line):
-                    exit_status = 1
-                    break
+        if not self.env['dry_run']:
+            with open(self.env['build_doc_log']) as f:
+                for line in f:
+                    if error_re.search(line):
+                        exit_status = 1
+                        break
         return exit_status
 
 if __name__ == '__main__':
diff --git a/build-docker b/build-docker
index 6c2ef2c..02a31cd 100755
--- a/build-docker
+++ b/build-docker
@@ -7,7 +7,6 @@ import tarfile
 import common
 from shell_helpers import LF
 
-
 class DockerComponent(self.Component):
     def get_argparse_args(self):
         return {
diff --git a/build-gem5 b/build-gem5
index 720c497..3b4ce97 100755
--- a/build-gem5
+++ b/build-gem5
@@ -64,7 +64,8 @@ https://github.com/cirosantilli/linux-kernel-module-cheat-regression#gem5-unit-t
                     self.env['gem5_source_dir'], LF,
                 ])
             else:
-                raise Exception('gem5 submodule not checked out')
+                if not self.env['dry_run']:
+                    raise Exception('gem5 submodule not checked out')
         if self.env['verbose']:
             verbose = ['--verbose', LF]
         else:
diff --git a/build-linux b/build-linux
index 4693474..a49f6f6 100755
--- a/build-linux
+++ b/build-linux
@@ -81,21 +81,19 @@ Run `make modules_install` after `make`.
             metavar='extra-make-args',
             nargs='*'
         )
+        self._add_argument('--force-rebuild')
 
     def build(self):
         build_dir = self.get_build_dir()
         os.makedirs(build_dir, exist_ok=True)
-        tool = 'gcc'
-        gcc = self.get_toolchain_tool(tool)
-        prefix = gcc[:-len(tool)]
         common_args = {
             'cwd': self.env['linux_source_dir'],
         }
         ccache = shutil.which('ccache')
         if ccache is not None:
-            cc = '{} {}'.format(ccache, gcc)
+            cc = '{} {}'.format(ccache, self.env['gcc_path'])
         else:
-            cc = gcc
+            cc = self.env['gcc_path']
         if self.env['verbose']:
             verbose = ['V=1']
         else:
@@ -104,10 +102,12 @@ Run `make modules_install` after `make`.
             'make', LF,
             '-j', str(self.env['nproc']), LF,
             'ARCH={}'.format(self.env['linux_arch']), LF,
-            'CROSS_COMPILE={}'.format(prefix), LF,
+            'CROSS_COMPILE={}-'.format(self.env['toolchain_prefix']), LF,
             'CC={}'.format(cc), LF,
             'O={}'.format(build_dir), LF,
         ] + verbose
+        if self.env['force_rebuild']:
+            common_make_args.extend(['-B', LF])
         if self.env['configure']:
             if self.env['custom_config_target']:
                 base_config_given = True
@@ -182,13 +182,13 @@ Run `make modules_install` after `make`.
                     ] +
                     self.sh.add_newlines(config_fragments)
                 )
-                self.sh.run_cmd(
-                    (
-                        common_make_args +
-                        ['olddefconfig', LF]
-                    ),
-                    **common_args
-                )
+            self.sh.run_cmd(
+                (
+                    common_make_args +
+                    ['olddefconfig', LF]
+                ),
+                **common_args
+            )
         if self.env['build']:
             self.sh.run_cmd(
                 (
@@ -199,7 +199,7 @@ Run `make modules_install` after `make`.
                 extra_env={
                     'KBUILD_BUILD_VERSION': '1',
                     'KBUILD_BUILD_TIMESTAMP': 'Thu Jan  1 00:00:00 UTC 1970',
-                    'KBUILD_BUILD_USER': 'lkmc',
+                    'KBUILD_BUILD_USER': self.env['repo_short_id'],
                     'KBUILD_BUILD_HOST': common.git_sha(self.env['linux_source_dir']),
                 },
                 **common_args
@@ -209,7 +209,7 @@ Run `make modules_install` after `make`.
                     (
                         common_make_args +
                         [
-                            'INSTALL_MOD_PATH={}'.format(self.env['out_rootfs_overlay_dir']), LF,
+                            'INSTALL_MOD_PATH={}'.format(self.env['out_rootfs_overlay_lkmc_dir']), LF,
                             'modules_install', LF,
                         ]
                     ),
diff --git a/build-m5 b/build-m5
index dba3f0f..ee7a811 100755
--- a/build-m5
+++ b/build-m5
@@ -16,8 +16,8 @@ class Main(common.BuildCliFunction):
             'make', LF,
             '-j', str(self.env['nproc']), LF,
             '-f', 'Makefile.{}'.format(arch), LF,
-            'CC={}'.format(self.env['gcc']), LF,
-            'LD={}'.format(self.env['ld']), LF,
+            'CC={}'.format(self.env['gcc_path']), LF,
+            'LD={}'.format(self.env['ld_path']), LF,
             'PWD={}'.format(self.env['gem5_m5_source_dir']), LF,
         ]
 
diff --git a/build-modules b/build-modules
index 9ed6c1d..6f1b344 100755
--- a/build-modules
+++ b/build-modules
@@ -3,6 +3,7 @@
 import distutils.dir_util
 import os
 import platform
+import shlex
 import shutil
 
 import common
@@ -19,6 +20,9 @@ See also: https://github.com/cirosantilli/linux-kernel-module-cheat#host
         self.add_argument(
             '--make-args',
             default='',
+            help='''
+Pass custom options to make.
+''',
         )
         self.add_argument(
             '--host',
@@ -34,6 +38,7 @@ Place the modules on a separate magic directory from non --host builds.
             help='Which kernel modules to build. Default: build all',
             nargs='*',
         )
+        self._add_argument('--force-rebuild')
 
     def build(self):
         build_dir = self.get_build_dir()
@@ -72,22 +77,23 @@ Place the modules on a separate magic directory from non --host builds.
             build_subdir = self.env['kernel_modules_build_host_subdir']
         else:
             build_subdir = self.env['kernel_modules_build_subdir']
-        tool = 'gcc'
-        gcc = self.get_toolchain_tool(tool)
-        prefix = gcc[:-len(tool)]
         ccache = shutil.which('ccache')
         if ccache is not None:
-            cc = '{} {}'.format(ccache, gcc)
+            cc = '{} {}'.format(ccache, self.env['gcc_path'])
         else:
-            cc = gcc
-        if self.env['verbose']:
-            verbose = ['V=1']
-        else:
-            verbose = []
+            cc = self.env['gcc_path']
         if self.env['host']:
             linux_dir = os.path.join('/lib', 'modules', platform.uname().release, 'build')
         else:
             linux_dir = self.env['linux_build_dir']
+        ccflags = [
+            '-I', self.env['root_dir'], LF,
+        ]
+        make_args_extra = []
+        if self.env['verbose']:
+            make_args_extra.extend(['V=1', LF])
+        if self.env['force_rebuild']:
+            make_args_extra.extend(['-B', LF])
         self.sh.run_cmd(
             (
                 [
@@ -95,20 +101,21 @@ Place the modules on a separate magic directory from non --host builds.
                     '-j', str(self.env['nproc']), LF,
                     'ARCH={}'.format(self.env['linux_arch']), LF,
                     'CC={}'.format(cc), LF,
-                    'CROSS_COMPILE={}'.format(prefix), LF,
+                    'CCFLAGS={}'.format(self.sh.cmd_to_string(ccflags)), LF,
+                    'CROSS_COMPILE={}-'.format(self.env['toolchain_prefix']), LF,
                     'LINUX_DIR={}'.format(linux_dir), LF,
                     'M={}'.format(build_subdir), LF,
                     'OBJECT_FILES={}'.format(' '.join(object_files)), LF,
                 ] +
-                self.sh.shlex_split(self.env['make_args']) +
-                verbose
+                make_args_extra +
+                self.sh.shlex_split(self.env['make_args'])
             ),
             cwd=os.path.join(self.env['kernel_modules_build_subdir']),
         )
         if not self.env['host']:
             self.sh.copy_dir_if_update_non_recursive(
                 srcdir=self.env['kernel_modules_build_subdir'],
-                destdir=self.env['out_rootfs_overlay_dir'],
+                destdir=self.env['out_rootfs_overlay_lkmc_dir'],
                 filter_ext=self.env['kernel_module_ext'],
             )
 
diff --git a/build-userland b/build-userland
index a7bfed3..48037de 100755
--- a/build-userland
+++ b/build-userland
@@ -2,295 +2,141 @@
 
 import os
 import shlex
-
-import common
-import threading
 import subprocess
-from shell_helpers import LF
+import threading
 
-error = False
+from shell_helpers import LF
+import common
+import thread_pool
 
 class Main(common.BuildCliFunction):
-    def __init__(self):
-        super().__init__(
-            description='''\
+    def __init__(self, *args, **kwargs):
+        if 'description' not in kwargs:
+            kwargs['description'] = '''\
 Build our compiled userland examples.
 '''
-        )
-        self.default_cstd = 'c11'
-        self.default_cxxstd = 'c++17'
-        self.add_argument(
-            '--has-package',
-            action='append',
-            default=[],
-            help='''\
-Indicate that a given package is present in the root filesystem, which
-allows us to build examples that rely on it.
-''',
-        )
-        self.add_argument(
-            '--in-tree',
-            default=False,
-            help='''\
-Magic build mode tailored to build from within the source tree:
-
-* place build output inside soure tree to conveniently run it
-* if not targets are given, build use the current working directory
-''',
-        )
+        super().__init__(*args, **kwargs)
         self.add_argument(
             'targets',
             default=[],
             help='''\
-Build only the given userland programs or all programs in the given directories.
+Select to build only the given userland programs, or all programs under
+the given directories.
+
+Default: build all.
+
+Must point to either sources or directories under userland/, or to LKMC
+toplevel which is a synonym for userland/.
 
 Default: build all examples that have their package dependencies met, e.g.:
+
 -   userland/arch/ programs only build if the target arch matches
 -   an OpenBLAS example can only be built if the target root filesystem
-    has the OpenBLAS libraries and headers installed, which you must inform with --has-package
+    has the OpenBLAS libraries and headers installed, which you must indicate
+    with --package
 ''',
             nargs='*',
         )
-
-    def _build_one(
-        self,
-        in_path,
-        out_path,
-        ccflags,
-        ccflags_after=None,
-        cstd=None,
-        cxxstd=None,
-        extra_deps=None,
-        extra_objs=None,
-        link=True,
-        raise_on_failure=True,
-        thread_limiter=None,
-    ):
-        try:
-            if extra_deps is None:
-                extra_deps = []
-            if extra_objs is None:
-                extra_objs = []
-            if ccflags_after is None:
-                ccflags_after = []
-            ret = 0
-            if self.need_rebuild([in_path] + extra_objs + extra_deps, out_path):
-                ccflags = ccflags.copy()
-                if not link:
-                    ccflags.extend(['-c', LF])
-                in_ext = os.path.splitext(in_path)[1]
-                do_compile = True
-                if in_ext == self.env['c_ext']:
-                    cc = self.env['gcc']
-                    if cstd is None:
-                        std = self.default_cstd
-                    else:
-                        std = cstd
-                    ccflags.extend([
-                        '-fopenmp', LF,
-                    ])
-                elif in_ext == self.env['cxx_ext']:
-                    cc = self.env['gxx']
-                    if cxxstd is None:
-                        std = self.default_cxxstd
-                    else:
-                        std = cxxstd
-                else:
-                    do_compile = False
-                if do_compile:
-                    ret = self.sh.run_cmd(
-                        (
-                            [
-                                cc, LF,
-                            ] +
-                            ccflags +
-                            [
-                                '-std={}'.format(std), LF,
-                                '-o', out_path, LF,
-                                in_path, LF,
-                            ] +
-                            self.sh.add_newlines(extra_objs) +
-                            [
-                                '-lm', LF,
-                                '-pthread', LF,
-                            ] +
-                            ccflags_after
-                        ),
-                        extra_paths=[self.env['ccache_dir']],
-                        raise_on_failure=raise_on_failure,
-                    )
-        finally:
-            if thread_limiter is not None:
-                thread_limiter.release()
-        if ret != 0:
-            self.error = True
-        return ret
-
-    def _get_targets(self):
-        if self.env['_args_given']['targets']:
-            targets = self.env['targets']
-            if self.env['in_tree']:
-                cwd = os.getcwd()
-                targets = [os.path.join(cwd, target) for target in targets]
-            return targets
-        else:
-            if self.env['in_tree']:
-                return [os.getcwd()]
-            else:
-                return [self.env['userland_source_dir']]
+        self._add_argument('--ccflags')
+        self._add_argument('--force-rebuild')
+        self._add_argument('--optimization-level')
 
     def build(self):
         build_dir = self.get_build_dir()
-        os.makedirs(build_dir, exist_ok=True)
-        has_packages = set(self.env['has_package'])
-        ccflags = [
+        cc_flags = [
             '-I', self.env['root_dir'], LF,
-            '-I', self.env['userland_source_dir'], LF,
-            '-O0', LF,
-            '-Wall', LF,
-            '-Werror', LF,
-            '-Wextra', LF,
-            '-Wno-unused-function', LF,
-            '-ggdb3', LF,
-        ]
+            '-O{}'.format(self.env['optimization_level']), LF,
+        ] + self.sh.shlex_split(self.env['ccflags'])
         if self.env['static']:
-            ccflags.extend(['-static', LF])
-        common_obj = os.path.join(
+            cc_flags.extend(['-static', LF])
+        extra_obj_lkmc_common = os.path.join(
             build_dir,
             self.env['common_basename_noext'] + self.env['obj_ext']
         )
         self._build_one(
             in_path=self.env['common_c'],
-            out_path=common_obj,
-            ccflags=ccflags,
+            out_path=extra_obj_lkmc_common,
+            cc_flags=cc_flags,
             extra_deps=[self.env['common_h']],
             link=False,
         )
-        pkgs = {
-            'eigen': {
-                # TODO: was failing with:
-                # fatal error: Eigen/Dense: No such file or directory as of
-                # 975ce0723ee3fa1fea1766e6683e2f3acb8558d6
-                # http://lists.busybox.net/pipermail/buildroot/2018-June/222914.html
-                'ccflags': [
-                    '-I',
-                    os.path.join(
-                        self.env['buildroot_staging_dir'],
-                        'usr',
-                        'include',
-                        'eigen3'
-                    ),
-                    LF
-                ],
-                # Header only.
-                'ccflags_after': [],
-            },
-            'libdrm': {},
-            'openblas': {},
-        }
-        rootdir_abs_len = len(self.env['userland_source_dir'])
-        thread_limiter = threading.BoundedSemaphore(self.env['nproc'])
-        self.error = False
-        for target in self._get_targets():
-            target = self.resolve_userland_source(target)
-            for path, in_dirnames, in_filenames in self.sh.walk(target):
-                in_dirnames.sort()
-                path_abs = os.path.abspath(path)
-                dirpath_relative_root = path_abs[rootdir_abs_len + 1:]
-                dirpath_relative_root_components = dirpath_relative_root.split(os.sep)
-                if (
-                    len(dirpath_relative_root_components) < 2 or
-                    dirpath_relative_root_components[0] != 'arch' or
-                    dirpath_relative_root_components[1] == self.env['arch']
-                ):
-                    out_dir = os.path.join(
-                        build_dir,
-                        dirpath_relative_root
-                    )
-                    os.makedirs(out_dir, exist_ok=True)
-                    ccflags_dir = ccflags.copy()
-                    if dirpath_relative_root_components == ['gcc']:
-                        cstd = 'gnu11'
-                        cxxstd = 'gnu++17'
-                    else:
-                        cstd = self.default_cstd
-                        cxxstd = self.default_cxxstd
-                        # -pedantic complains even if we use -std=gnu11.
-                        ccflags_dir.extend(['-pedantic', LF])
-                    for in_filename in in_filenames:
-                        in_path = os.path.join(path, in_filename)
-                        in_name, in_ext = os.path.splitext(in_filename)
-                        out_path = os.path.join(
-                            out_dir,
-                            in_name + self.env['userland_build_ext']
-                        )
-                        pkg_key = in_name.split('_')[0]
-                        ccflags_file = ccflags_dir.copy()
-                        ccflags_after = []
-                        if pkg_key in pkgs:
-                            if pkg_key not in has_packages:
-                                continue
-                            pkg = pkgs[pkg_key]
-                            if 'ccflags' in pkg:
-                                ccflags_file.extend(pkg['ccflags'])
-                            else:
-                                pkg_config_output = subprocess.check_output([
-                                    self.env['buildroot_pkg_config'],
-                                    '--cflags',
-                                    pkg_key
-                                ]).decode()
-                                ccflags_file.extend(self.sh.shlex_split(pkg_config_output))
-                            if 'ccflags_after' in pkg:
-                                ccflags_file.extend(pkg['ccflags_after'])
-                            else:
-                                pkg_config_output = subprocess.check_output([
-                                    self.env['buildroot_pkg_config'],
-                                    '--libs',
-                                    pkg_key
-                                ]).decode()
-                                ccflags_after.extend(self.sh.shlex_split(pkg_config_output))
-                        thread_limiter.acquire()
-                        if self.error:
-                            return 1
-                        thread = threading.Thread(
-                            target=self._build_one,
-                            kwargs={
-                                'in_path': in_path,
-                                'out_path': out_path,
-                                'ccflags': ccflags_file,
-                                'cstd': cstd,
-                                'cxxstd': cxxstd,
-                                'extra_objs': [common_obj],
-                                'ccflags_after': ccflags_after,
-                                'raise_on_failure': False,
-                                'thread_limiter': thread_limiter,
-                            }
-                        )
-                        thread.start()
-        self.sh.copy_dir_if_update(
-            srcdir=build_dir,
-            destdir=self.env['out_rootfs_overlay_dir'],
-            filter_ext=self.env['userland_build_ext'],
+        extra_obj_userland_asm = os.path.join(
+            build_dir,
+            'arch',
+            'main' + self.env['obj_ext']
         )
+        extra_obj_userland_asm_relpath = os.path.join(
+            'arch',
+            'main' + self.env['c_ext']
+        )
+        self._build_one(
+            in_path=os.path.join(
+                self.env['userland_source_dir'],
+                extra_obj_userland_asm_relpath
+            ),
+            out_path=extra_obj_userland_asm,
+            cc_flags=cc_flags,
+            extra_deps=[self.env['common_h']],
+            link=False,
+        )
+        with thread_pool.ThreadPool(
+            self._build_one,
+            nthreads=self.env['nproc'],
+        ) as my_thread_pool:
+            try:
+                for target in self.env['targets']:
+                    for path, in_dirnames, in_filenames in self.sh.walk(target):
+                        for in_filename in in_filenames:
+                            in_ext = os.path.splitext(in_filename)[1]
+                            if in_ext not in self.env['build_in_exts']:
+                                continue
+                            in_path = os.path.join(path, in_filename)
+                            error = my_thread_pool.submit({
+                                'in_path': in_path,
+                                'out_path': self.resolve_userland_executable(in_path),
+                                'cc_flags': cc_flags,
+                                'extra_objs_lkmc_common': [extra_obj_lkmc_common],
+                                'extra_objs_userland_asm': [extra_obj_userland_asm],
+                            })
+                            if error is not None:
+                                raise common.ExitLoop()
+            except common.ExitLoop:
+                pass
+        error = my_thread_pool.get_error()
+        if error is not None:
+            print(error)
+            return 1
+        if not self.env['in_tree']:
+            self.sh.copy_dir_if_update(
+                srcdir=build_dir,
+                destdir=self.env['out_rootfs_overlay_lkmc_dir'],
+                filter_ext=self.env['userland_executable_ext'],
+            )
         return 0
 
     def clean(self):
         if self.env['in_tree']:
-            for target in self._get_targets():
-                for path, dirnames, filenames in os.walk(target):
-                    filenames.sort()
-                    dirnames.sort()
-                    for filename in filenames:
-                        if os.path.splitext(filename)[1] in self.env['userland_out_exts']:
-                            self.sh.rmrf(os.path.join(path, filename))
+            for target in self.env['targets']:
+                if os.path.exists(target):
+                    if os.path.isfile(target):
+                        self.sh.rmrf(self.resolve_userland_executable(target))
+                    else:
+                        for path, dirnames, filenames in self.sh.walk(target):
+                            for filename in filenames:
+                                if os.path.splitext(filename)[1] in self.env['userland_out_exts']:
+                                    self.sh.rmrf(os.path.join(path, filename))
         else:
-            self.sh.rmrf(self.get_build_dir())
+            for target in self.env['targets']:
+                self.sh.rmrf(self.resolve_userland_executable(target))
 
     def get_build_dir(self):
-        if self.env['in_tree']:
-            return self.env['userland_source_dir']
-        else:
-            return self.env['userland_build_dir']
+        return self.env['userland_build_dir']
+
+    def setup_one(self):
+        self.env['targets'] = self.resolve_targets(
+            self.env['userland_source_dir'],
+            self.env['targets']
+        )
 
 if __name__ == '__main__':
     Main().cli()
diff --git a/build-userland-in-tree b/build-userland-in-tree
index 08d43af..d6a7c17 100755
--- a/build-userland-in-tree
+++ b/build-userland-in-tree
@@ -1,2 +1,24 @@
-#!/usr/bin/env bash
-"$(git rev-parse --show-toplevel)/build-userland" --gcc-which host --in-tree "$@"
+#!/usr/bin/env python3
+
+import os
+import subprocess
+
+import lkmc.import_path
+
+build_userland = lkmc.import_path.import_path_relative_root('build-userland')
+
+class Main(build_userland.Main):
+    def __init__(self):
+        super().__init__(
+            description='''\
+https://github.com/cirosantilli/linux-kernel-module-cheat#userland-setup-getting-started-natively
+''',
+            defaults={
+                'gcc_which': 'host',
+                'in_tree': True,
+                'targets': ['.'],
+            }
+        )
+
+if __name__ == '__main__':
+    Main().cli()
diff --git a/cli_function.py b/cli_function.py
index 7a96efe..e4103af 100755
--- a/cli_function.py
+++ b/cli_function.py
@@ -11,10 +11,11 @@ made to this file.
 import argparse
 import bisect
 import collections
-import imp
 import os
 import sys
 
+import lkmc.import_path
+
 class _Argument:
     def __init__(
             self,
@@ -113,8 +114,7 @@ class _Argument:
 
 class CliFunction:
     '''
-    Represent a function that can be called either from Python code, or
-    from the command line.
+    A function that can be called either from Python code, or from the command line.
 
     Features:
 
@@ -134,6 +134,10 @@ class CliFunction:
 
     * that decorator API is insane
     * CLI + Python for single functions was wontfixed: https://github.com/pallets/click/issues/40
+    +
+    Oh, and I commented on that issue pointing to this alternative and they deleted my comment:
+    https://github.com/pallets/click/issues/40#event-2088718624 Lol. It could have been useful
+    for other Googlers and as an implementation reference.
     '''
     def __call__(self, **kwargs):
         '''
@@ -147,15 +151,15 @@ class CliFunction:
     def _do_main(self, kwargs):
         return self.main(**self._get_args(kwargs))
 
-    def __init__(self, config_file=None, description=None, extra_config_params=None):
+    def __init__(self, default_config_file=None, description=None, extra_config_params=None):
         self._arguments = collections.OrderedDict()
-        self._config_file = config_file
+        self._default_config_file = default_config_file
         self._description = description
         self.extra_config_params = extra_config_params
-        if self._config_file is not None:
+        if self._default_config_file is not None:
             self.add_argument(
                 '--config-file',
-                default=self._config_file,
+                default=self._default_config_file,
                 help='Path to the configuration file to use'
             )
 
@@ -172,30 +176,35 @@ class CliFunction:
         args_with_defaults = kwargs.copy()
         # Add missing args from config file.
         config_file = None
+        args_given = {}
         if 'config_file' in args_with_defaults and args_with_defaults['config_file'] is not None:
             config_file = args_with_defaults['config_file']
+            args_given['config_file'] = True
         else:
-            config_file = self._config_file
-        args_given = {}
+            config_file = self._default_config_file
+            args_given['config_file'] = False
         for key in self._arguments:
             args_given[key] = not (
                 not key in args_with_defaults or
                 args_with_defaults[key] is None or
                 self._arguments[key].nargs == '*' and args_with_defaults[key] == []
             )
-        if config_file is not None and os.path.exists(config_file):
-            config_configs = {}
-            config = imp.load_source('config', config_file)
-            if self.extra_config_params is None:
-                config.set_args(config_configs)
-            else:
-                config.set_args(config_configs, self.extra_config_params)
-            for key in config_configs:
-                if key not in self._arguments:
-                    raise Exception('Unknown key in config file: ' + key)
-                if not args_given[key]:
-                    args_with_defaults[key] = config_configs[key]
-                    args_given[key] = True
+        if config_file is not None:
+            if os.path.exists(config_file):
+                config_configs = {}
+                config = lkmc.import_path.import_path(config_file)
+                if self.extra_config_params is None:
+                    config.set_args(config_configs)
+                else:
+                    config.set_args(config_configs, self.extra_config_params)
+                for key in config_configs:
+                    if key not in self._arguments:
+                        raise Exception('Unknown key in config file: ' + key)
+                    if not args_given[key]:
+                        args_with_defaults[key] = config_configs[key]
+                        args_given[key] = True
+            elif args_given['config_file']:
+                raise Exception('Config file does not exist: ' + config_file)
         # Add missing args from hard-coded defaults.
         for key in self._arguments:
             argument = self._arguments[key]
@@ -290,7 +299,10 @@ class CliFunction:
                 if value != default:
                     if argument.is_option:
                         if argument.is_bool:
-                            vals = [(argument.longname,)]
+                            if value:
+                                vals = [(argument.longname,)]
+                            else:
+                                vals = [('--no-' + argument.longname[2:],)]
                         elif 'action' in argument.kwargs and argument.kwargs['action'] == 'append':
                             vals = [(argument.longname, str(val)) for val in value]
                         else:
@@ -326,7 +338,7 @@ if __name__ == '__main__':
     class OneCliFunction(CliFunction):
         def __init__(self):
             super().__init__(
-                config_file='cli_function_test_config.py',
+                default_config_file='cli_function_test_config.py',
                 description = '''\
 Description of this
 amazing function!
@@ -454,7 +466,8 @@ amazing function!
     # get_cli
     assert one_cli_function.get_cli(pos_mandatory=1, asdf='B') == [('--asdf', 'B'), ('--bool-cli',), ('1',)]
     assert one_cli_function.get_cli(pos_mandatory=1, asdf='B', qwer='R') == [('--asdf', 'B'), ('--bool-cli',), ('--qwer', 'R'), ('1',)]
-    assert one_cli_function.get_cli(pos_mandatory=1, bool_true=False) == [('--bool-cli',), ('--bool-true',), ('1',)]
+    assert one_cli_function.get_cli(pos_mandatory=1, bool_true=False) == [('--bool-cli',), ('--no-bool-true',), ('1',)]
+    assert one_cli_function.get_cli(pos_mandatory=1, bool_false=True) == [('--bool-cli',), ('--bool-false',), ('1',)]
     assert one_cli_function.get_cli(pos_mandatory=1, pos_optional=2, args_star=['asdf', 'qwer']) == [('--bool-cli',), ('1',), ('2',), ('asdf',), ('qwer',)]
     assert one_cli_function.get_cli(pos_mandatory=1, append=['2', '3']) == [('--append', '2'), ('--append', '3',), ('--bool-cli',), ('1',)]
 
diff --git a/common.py b/common.py
index 27d8c6c..de61efc 100644
--- a/common.py
+++ b/common.py
@@ -1,30 +1,35 @@
 #!/usr/bin/env python3
 
 import argparse
+import bisect
 import collections
 import copy
 import datetime
 import enum
+import functools
 import glob
-import imp
 import inspect
+import itertools
 import json
 import math
-import multiprocessing
 import os
 import platform
+import pathlib
+import queue
 import re
 import shutil
 import signal
 import subprocess
 import sys
+import threading
 import time
 import urllib
 import urllib.request
 
-import cli_function
-import shell_helpers
 from shell_helpers import LF
+import cli_function
+import path_properties
+import shell_helpers
 
 common = sys.modules[__name__]
 
@@ -56,8 +61,9 @@ consts['kernel_modules_subdir'] = 'kernel_modules'
 consts['kernel_modules_source_dir'] = os.path.join(consts['root_dir'], consts['kernel_modules_subdir'])
 consts['userland_subdir'] = 'userland'
 consts['userland_source_dir'] = os.path.join(consts['root_dir'], consts['userland_subdir'])
-consts['userland_build_ext'] = '.out'
-consts['include_subdir'] = 'include'
+consts['userland_source_arch_dir'] = os.path.join(consts['userland_source_dir'], 'arch')
+consts['userland_executable_ext'] = '.out'
+consts['include_subdir'] = consts['repo_short_id']
 consts['include_source_dir'] = os.path.join(consts['root_dir'], consts['include_subdir'])
 consts['submodules_dir'] = os.path.join(consts['root_dir'], 'submodules')
 consts['buildroot_source_dir'] = os.path.join(consts['submodules_dir'], 'buildroot')
@@ -98,42 +104,67 @@ consts['cxx_ext'] = '.cpp'
 consts['header_ext'] = '.h'
 consts['kernel_module_ext'] = '.ko'
 consts['obj_ext'] = '.o'
-consts['userland_in_exts'] = [
+consts['build_in_exts'] = [
     consts['asm_ext'],
     consts['c_ext'],
     consts['cxx_ext'],
 ]
 consts['userland_out_exts'] = [
-    consts['userland_build_ext'],
+    consts['userland_executable_ext'],
     consts['obj_ext'],
 ]
-consts['config_file'] = os.path.join(consts['data_dir'], 'config.py')
-consts['magic_fail_string'] = b'lkmc_test_fail'
+consts['default_config_file'] = os.path.join(consts['data_dir'], 'config.py')
+consts['serial_magic_exit_status_regexp_string'] = rb'lkmc_exit_status_(\d+)'
 consts['baremetal_lib_basename'] = 'lib'
+consts['emulator_userland_only_short_to_long_dict'] = collections.OrderedDict([
+    ('n', 'native'),
+])
+consts['all_userland_only_emulators'] = set()
+for key in consts['emulator_userland_only_short_to_long_dict']:
+    consts['all_userland_only_emulators'].add(key)
+    consts['all_userland_only_emulators'].add(consts['emulator_userland_only_short_to_long_dict'][key])
 consts['emulator_short_to_long_dict'] = collections.OrderedDict([
     ('q', 'qemu'),
     ('g', 'gem5'),
 ])
+consts['emulator_short_to_long_dict'].update(consts['emulator_userland_only_short_to_long_dict'])
 consts['all_long_emulators'] = [consts['emulator_short_to_long_dict'][k] for k in consts['emulator_short_to_long_dict']]
 consts['emulator_choices'] = set()
 for key in consts['emulator_short_to_long_dict']:
     consts['emulator_choices'].add(key)
     consts['emulator_choices'].add(consts['emulator_short_to_long_dict'][key])
 consts['host_arch'] = platform.processor()
+consts['guest_lkmc_home'] = os.sep + consts['repo_short_id']
+
+class ExitLoop(Exception):
+    pass
 
 class LkmcCliFunction(cli_function.CliFunction):
     '''
     Common functionality shared across our CLI functions:
 
     * command timing
-    * some common flags, e.g.: --arch, --dry-run, --quiet, --verbose
+    * some common flags, e.g.: --arch, --dry-run, --quiet, --verbose
+    * a lot of helpers that depend on self.env
+    +
+    self.env contains the command line arguments + a ton of values derived from those.
+    +
+    It would be beautiful to do this evaluation in a lazy way, e.g. with functions +
+    cache decorators:
+    https://stackoverflow.com/questions/815110/is-there-a-decorator-to-simply-cache-function-return-values
     '''
-    def __init__(self, *args, defaults=None, supported_archs=None, **kwargs):
+    def __init__(
+        self,
+        *args,
+        defaults=None,
+        supported_archs=None,
+        **kwargs
+    ):
         '''
         :ptype defaults: Dict[str,Any]
         :param defaults: override the default value of an argument
         '''
-        kwargs['config_file'] = consts['config_file']
+        kwargs['default_config_file'] = consts['default_config_file']
         kwargs['extra_config_params'] = os.path.basename(inspect.getfile(self.__class__))
         if defaults is None:
             defaults = {}
@@ -142,6 +173,7 @@ class LkmcCliFunction(cli_function.CliFunction):
         self._common_args = set()
         super().__init__(*args, **kwargs)
         self.supported_archs = supported_archs
+        self.print_lock = threading.Lock()
 
         # Args for all scripts.
         arches = consts['arch_short_to_long_dict']
@@ -209,12 +241,14 @@ Which toolchain binaries to use:
 '''
         )
         self.add_argument(
-            '--print-time',
-            default=True,
-            help='''\
-Print how long it took to run the command at the end.
-Implied by --quiet.
-'''
+            '-j',
+            '--nproc',
+            default=len(os.sched_getaffinity(0)),
+            type=int,
+            help='''Number of processors to use for the action.
+This is currently only implemented for the following scripts:
+all ./build-* scripts, test-user-mode.
+''',
         )
         self.add_argument(
             '-q',
@@ -230,6 +264,14 @@ TODO: implement fully, some stuff is escaping it currently.
             default=True,
             help='''\
 Stop running at the first failed test.
+'''
+        )
+        self.add_argument(
+            '--show-time',
+            default=True,
+            help='''\
+Print how long it took to run the command at the end.
+Implied by --quiet.
 '''
         )
         self.add_argument(
@@ -324,10 +366,8 @@ See: https://github.com/cirosantilli/linux-kernel-module-cheat#initrd
             help='''\
 Use the given baremetal executable instead of the Linux kernel.
 
-If the path is absolute, it is used as is.
-
-If the path is relative, we assume that it points to a source code
-inside baremetal/ and then try to use corresponding executable.
+If the path points to a source file inside baremetal/, then the
+corresponding executable is automatically found.
 '''
         )
 
@@ -386,8 +426,8 @@ Use the docker download Ubuntu root filesystem instead of the default Buildroot
         )
         self.add_argument(
             '--qemu-which',
-            choices=['lkmc', 'host'],
-            default='lkmc',
+            choices=[consts['repo_short_id'], 'host'],
+            default=consts['repo_short_id'],
             help='''\
 Which qemu binaries to use: qemu-system-, qemu-, qemu-img, etc.:
 - lkmc: the ones we built with ./build-qemu
@@ -404,6 +444,22 @@ Machine type:
         )
 
         # Userland.
+        self.add_argument(
+            '--package',
+            action='append',
+            help='''\
+Request to install a package in the target root filesystem, or indicate that it is present
+when building examples that rely on it or running tests for those examples.
+''',
+        )
+        self.add_argument(
+            '--package-all',
+            action='store_true',
+            help='''\
+Indicate that all packages used by our userland/ examples with --package
+are available.
+''',
+        )
         self.add_argument(
             '--static',
             default=False,
@@ -413,7 +469,8 @@ if one was not given explicitly.
 ''',
         )
         self.add_argument(
-            '-u', '--userland',
+            '-u',
+            '--userland',
             help='''\
 Run the given userland executable in user mode instead of booting the Linux kernel
 in full system mode. In gem5, user mode is called Syscall Emulation (SE) mode and
@@ -433,14 +490,49 @@ CLI arguments to pass to the userland executable.
 
         # Run.
         self.add_argument(
-            '-n', '--run-id', default='0',
+            '--background',
+            default=False,
             help='''\
-ID for run outputs such as gem5's m5out. Allows you to do multiple runs,
-and then inspect separate outputs later in different output directories.
+Make programs that would take over the terminal, such as QEMU full system, run in the
+background instead.
+
+Currently only implemented for ./run.
+
+Interactive input cannot be given.
+
+Send QEMU serial output to a file instead of the host terminal.
+
+TODO: use a port instead. If only there was a way to redirect a serial to multiple
+places, both to a port and a file? We use the file currently to be able to have
+any output at all.
+https://superuser.com/questions/1373226/how-to-redirect-qemu-serial-output-to-both-a-file-and-the-terminal-or-a-port
 '''
         )
         self.add_argument(
-            '-P', '--prebuilt', default=False,
+            '--in-tree',
+            default=False,
+            help='''\
+Place build output inside source tree to conveniently run it, especially when
+building with the host native toolchain.
+
+When running, use in-tree executables instead of out-of-tree ones, e.g.
+userland/c/hello resolves to userland/c/hello.out instead of the out-of-tree one.
+
+Currently only supported by userland scripts such as ./build-userland and
+./run --userland.
+''',
+        )
+        self.add_argument(
+            '--port-offset',
+            type=int,
+            help='''\
+Increase the ports to be used such as for GDB by an offset to run multiple
+instances in parallel. Default: the run ID (--run-id) if that is an integer, otherwise 0.
+'''
+        )
+        self.add_argument(
+            '--prebuilt',
+            default=False,
             help='''\
 Use prebuilt packaged host utilities as much as possible instead
 of the ones we built ourselves. Saves build time, but decreases
@@ -448,10 +540,11 @@ the likelihood of incompatibilities.
 '''
         )
         self.add_argument(
-            '--port-offset', type=int,
+            '--run-id',
+            default='0',
             help='''\
-Increase the ports to be used such as for GDB by an offset to run multiple
-instances in parallel. Default: the run ID (-n) if that is an integer, otherwise 0.
+ID for run outputs such as gem5's m5out. Allows you to do multiple runs,
+and then inspect separate outputs later in different output directories.
 '''
         )
 
@@ -465,7 +558,7 @@ instances in parallel. Default: the run ID (-n) if that is an integer, otherwise
         self.add_argument(
             '--all-emulators', default=False,
             help='''\
-Run action for all supported --emulators emulators. Ignore --emulators.
+Run action for all supported emulators. Ignore --emulator.
 '''.format(emulators_string)
         )
         self.add_argument(
@@ -478,6 +571,11 @@ Run action for all supported --emulators emulators. Ignore --emulators.
             help='''\
 Emulator to use. If given multiple times, semantics are similar to --arch.
 Valid emulators: {}
+
+"native" means running natively on host. It is only supported for userland,
+and you must have built the program for native running, see:
+https://github.com/cirosantilli/linux-kernel-module-cheat#userland-setup-getting-started-natively
+Incompatible archs are skipped.
 '''.format(emulators_string)
         )
         self._is_common = False
@@ -587,14 +685,31 @@ Valid emulators: {}
 
         # QEMU
         env['qemu_build_dir'] = join(env['out_dir'], 'qemu', env['qemu_build_id'])
-        env['qemu_executable_basename'] = 'qemu-system-{}'.format(env['arch'])
-        env['qemu_executable'] = join(env['qemu_build_dir'], '{}-softmmu'.format(env['arch']), env['qemu_executable_basename'])
         env['qemu_img_basename'] = 'qemu-img'
         env['qemu_img_executable'] = join(env['qemu_build_dir'], env['qemu_img_basename'])
+        if env['userland'] is None:
+            env['qemu_executable_basename'] = 'qemu-system-{}'.format(env['arch'])
+        else:
+            env['qemu_executable_basename'] = 'qemu-{}'.format(env['arch'])
+        if env['qemu_which'] == 'host':
+            env['qemu_executable'] = env['qemu_executable_basename']
+        else:
+            if env['userland'] is None:
+                env['qemu_executable'] = join(
+                    env['qemu_build_dir'],
+                    '{}-softmmu'.format(env['arch']),
+                    env['qemu_executable_basename']
+                )
+            else:
+                env['qemu_executable'] = join(
+                    env['qemu_build_dir'],
+                    '{}-linux-user'.format(env['arch']),
+                    env['qemu_executable_basename']
+                )
 
         # gem5
         if not env['_args_given']['gem5_build_dir']:
-            env['gem5_build_dir'] = join(env['gem5_out_dir'], env['gem5_build_id'], env['gem5_build_type'])
+            env['gem5_build_dir'] = join(env['gem5_out_dir'], env['gem5_build_id'])
         env['gem5_fake_iso'] = join(env['gem5_out_dir'], 'fake.iso')
         env['gem5_m5term'] = join(env['gem5_build_dir'], 'm5term')
         env['gem5_build_build_dir'] = join(env['gem5_build_dir'], 'build')
@@ -674,7 +789,10 @@ Valid emulators: {}
             env['executable'] = env['qemu_executable']
             env['run_dir'] = env['qemu_run_dir']
             env['termout_file'] = env['qemu_termout_file']
-            env['guest_terminal_file'] = env['qemu_termout_file']
+            if env['background']:
+                env['guest_terminal_file'] = env['qemu_background_serial_file']
+            else:
+                env['guest_terminal_file'] = env['qemu_termout_file']
             env['trace_txt_file'] = env['qemu_trace_txt_file']
         env['run_cmd_file'] = join(env['run_dir'], 'run.sh')
 
@@ -704,9 +822,13 @@ Valid emulators: {}
             env['linux_image'] = env['lkmc_linux_image']
         env['linux_config'] = join(env['linux_build_dir'], '.config')
         if env['emulator']== 'gem5':
-            env['userland_quit_cmd'] = '/gem5_exit.sh'
+            env['userland_quit_cmd'] = './gem5_exit.sh'
         else:
-            env['userland_quit_cmd'] = '/poweroff.out'
+            env['userland_quit_cmd'] = join(
+                env['guest_lkmc_home'],
+                'linux',
+                'poweroff' + env['userland_executable_ext']
+            )
         env['ramfs'] = env['initrd'] or env['initramfs']
         if env['ramfs']:
             env['initarg'] = 'rdinit'
@@ -714,14 +836,24 @@ Valid emulators: {}
             env['initarg'] = 'init'
         env['quit_init'] = '{}={}'.format(env['initarg'], env['userland_quit_cmd'])
 
+        # Userland
+        env['userland_source_arch_arch_dir'] = join(env['userland_source_arch_dir'], env['arch'])
+        if env['in_tree']:
+            env['userland_build_dir'] = self.env['userland_source_dir']
+        else:
+            env['userland_build_dir'] = join(env['out_dir'], 'userland', env['userland_build_id'], env['arch'])
+        env['package'] = set(env['package'])
+
         # Kernel modules.
         env['kernel_modules_build_dir'] = join(env['kernel_modules_build_base_dir'], env['arch'])
         env['kernel_modules_build_subdir'] = join(env['kernel_modules_build_dir'], env['kernel_modules_subdir'])
         env['kernel_modules_build_host_dir'] = join(env['kernel_modules_build_base_dir'], 'host')
         env['kernel_modules_build_host_subdir'] = join(env['kernel_modules_build_host_dir'], env['kernel_modules_subdir'])
-        env['userland_build_dir'] = join(env['out_dir'], 'userland', env['userland_build_id'], env['arch'])
+
+        # Overlay.
         env['out_rootfs_overlay_dir'] = join(env['out_dir'], 'rootfs_overlay', env['arch'])
-        env['out_rootfs_overlay_bin_dir'] = join(env['out_rootfs_overlay_dir'], 'bin')
+        env['out_rootfs_overlay_lkmc_dir'] = join(env['out_rootfs_overlay_dir'], env['repo_short_id'])
+        env['out_rootfs_overlay_bin_dir'] = join(env['out_rootfs_overlay_lkmc_dir'], 'bin')
 
         # Baremetal.
         env['baremetal_source_dir'] = join(env['root_dir'], 'baremetal')
@@ -738,7 +870,7 @@ Valid emulators: {}
         env['baremetal_build_ext'] = '.elf'
 
         # Userland / baremetal common source.
-        env['common_basename_noext'] = 'lkmc'
+        env['common_basename_noext'] = env['repo_short_id']
         env['common_c'] = common_c = os.path.join(
             env['root_dir'],
             env['common_basename_noext'] + env['c_ext']
@@ -783,6 +915,8 @@ Valid emulators: {}
                         env['source_path'] = source_path
                         break
             env['image'] = path
+        elif env['userland'] is not None:
+            env['image'] = self.resolve_userland_executable(env['userland'])
         else:
             if env['emulator'] == 'gem5':
                 env['image'] = env['vmlinux']
@@ -816,6 +950,7 @@ lunch aosp_{}-eng
                 env['buildroot_toolchain_prefix']
             )
             env['userland_library_dir'] = env['buildroot_target_dir']
+            env['pkg_config'] = env['buildroot_pkg_config']
         elif env['gcc_which'] == 'crosstool-ng':
             env['toolchain_prefix'] = os.path.join(
                 env['crosstool_ng_bin_dir'],
@@ -829,6 +964,7 @@ lunch aosp_{}-eng
                 env['userland_library_dir'] = '/usr/arm-linux-gnueabihf'
             elif env['arch'] == 'aarch64':
                 env['userland_library_dir'] = '/usr/aarch64-linux-gnu/'
+            env['pkg_config'] = 'pkg-config'
         elif env['gcc_which'] == 'host-baremetal':
             if env['arch'] == 'arm':
                 env['toolchain_prefix'] = 'arm-none-eabi'
@@ -836,8 +972,16 @@ lunch aosp_{}-eng
                 raise Exception('There is no host baremetal chain for arch: ' + env['arch'])
         else:
             raise Exception('Unknown toolchain: ' + env['gcc_which'])
-        env['gcc'] = self.get_toolchain_tool('gcc')
-        env['gxx'] = self.get_toolchain_tool('g++')
+        env['gcc_path'] = self.get_toolchain_tool('gcc')
+        env['gxx_path'] = self.get_toolchain_tool('g++')
+        env['ld_path'] = self.get_toolchain_tool('ld')
+        if env['gcc_which'] == 'host':
+            if env['arch'] == 'x86_64':
+                env['gdb_path'] = 'gdb'
+            else:
+                env['gdb_path'] = 'gdb-multiarch'
+        else:
+            env['gdb_path'] = self.get_toolchain_tool('gdb')
 
     def add_argument(self, *args, **kwargs):
         '''
@@ -853,6 +997,15 @@ lunch aosp_{}-eng
             self._common_args.add(key)
         super().add_argument(*args, **kwargs)
 
+    def assert_is_subpath(self, subpath, parent):
+        if not self.is_subpath(subpath, parent):
+            raise Exception(
+                'Can only accept targets inside {}, given: {}'.format(
+                    parent,
+                    subpath
+                )
+            )
+
     def get_elf_entry(self, elf_file_path):
         readelf_header = subprocess.check_output([
             self.get_toolchain_tool('readelf'),
@@ -892,7 +1045,13 @@ lunch aosp_{}-eng
         of the script.
         '''
         return {
-            key:self.env[key] for key in self._common_args if self.env['_args_given'][key]
+            key:self.env[key] for key in self._common_args if
+            (
+                # Args given on command line.
+                self.env['_args_given'][key] or
+                # Inheritance changed defaults.
+                key in self._defaults
+            )
         }
 
     def get_stats(self, stat_re=None, stats_file=None):
@@ -944,32 +1103,29 @@ lunch aosp_{}-eng
             _json = {}
         return _json
 
-    def import_path(self, basename):
-        '''
-        https://stackoverflow.com/questions/2601047/import-a-python-module-without-the-py-extension
-        https://stackoverflow.com/questions/31773310/what-does-the-first-argument-of-the-imp-load-source-method-do
-        '''
-        return imp.load_source(basename.replace('-', '_'), os.path.join(self.env['root_dir'], basename))
-
-    def import_path_main(self, path):
-        '''
-        Import an object of the Main class of a given file.
-
-        By convention, we call the main object of all our CLI scripts as Main.
-        '''
-        return self.import_path(path).Main()
-
     def is_arch_supported(self, arch):
         return self.supported_archs is None or arch in self.supported_archs
 
     def log_error(self, msg):
-        print('error: {}'.format(msg), file=sys.stdout)
+        with self.print_lock:
+            print('error: {}'.format(msg), file=sys.stdout)
 
     def log_info(self, msg='', flush=False, **kwargs):
-        if not self.env['quiet']:
-            print('{}'.format(msg), **kwargs)
-        if flush:
-            sys.stdout.flush()
+        with self.print_lock:
+            if not self.env['quiet']:
+                print('{}'.format(msg), **kwargs)
+            if flush:
+                sys.stdout.flush()
+
+    def log_warn(self, msg):
+        with self.print_lock:
+            print('warning: {}'.format(msg), file=sys.stdout)
+
+    def is_subpath(self, subpath, parent):
+        '''
+        https://stackoverflow.com/questions/3812849/how-to-check-whether-a-directory-is-a-sub-directory-of-another-directory
+        '''
+        return os.path.commonpath([os.path.abspath(subpath), os.path.abspath(parent)]) == os.path.abspath(parent)
 
     def main(self, *args, **kwargs):
         '''
@@ -991,16 +1147,14 @@ lunch aosp_{}-eng
         else:
             real_emulators = env['emulators']
         return_value = 0
-        class GetOutOfLoop(Exception): pass
         try:
-            ret = self.setup()
-            if ret is not None and ret != 0:
-                return_value = ret
-                raise GetOutOfLoop()
             for emulator in real_emulators:
                 for arch in real_archs:
                     if arch in env['arch_short_to_long_dict']:
                         arch = env['arch_short_to_long_dict'][arch]
+                    if emulator == 'native':
+                        if arch != env['host_arch']:
+                            continue
                     if self.is_arch_supported(arch):
                         if not env['dry_run']:
                             start_time = time.time()
@@ -1018,6 +1172,7 @@ lunch aosp_{}-eng
                             dry_run=self.env['dry_run'],
                             quiet=self.env['quiet'],
                         )
+                        self.setup_one()
                         ret = self.timed_main()
                         if not env['dry_run']:
                             end_time = time.time()
@@ -1026,11 +1181,10 @@ lunch aosp_{}-eng
                         if ret is not None and ret != 0:
                             return_value = ret
                             if self.env['quit_on_fail']:
-                                raise GetOutOfLoop()
+                                raise ExitLoop()
                     elif not real_all_archs:
                         raise Exception('Unsupported arch for this action: ' + arch)
-
-        except GetOutOfLoop:
+        except ExitLoop:
             pass
         ret = self.teardown()
         if ret is not None and ret != 0:
@@ -1067,7 +1221,7 @@ lunch aosp_{}-eng
         return '{:02}:{:02}:{:02}'.format(int(hours), int(minutes), int(seconds))
 
     def print_time(self, ellapsed_seconds):
-        if self.env['print_time'] and not self.env['quiet']:
+        if self.env['show_time'] and not self.env['quiet']:
             print('time {}'.format(self.seconds_to_hms(ellapsed_seconds)))
 
     def raw_to_qcow2(self, qemu_which=False, reverse=False):
@@ -1103,105 +1257,72 @@ lunch aosp_{}-eng
             ]
         )
 
-    @staticmethod
-    def resolve_args(defaults, args, extra_args):
-        if extra_args is None:
-            extra_args = {}
-        argcopy = copy.copy(args)
-        argcopy.__dict__ = dict(list(defaults.items()) + list(argcopy.__dict__.items()) + list(extra_args.items()))
-        return argcopy
-
-    def resolve_source(self, in_path, magic_in_dir, in_exts):
+    def resolve_executable(
+        self,
+        in_path,
+        magic_in_dir,
+        magic_out_dir,
+        executable_ext
+    ):
         '''
-        Convert a path-like string to a source file to the full source path,
-        e.g. all follogin work and to do the same:
+        Resolve the path of a userland or baremetal executable.
 
-        - hello
-        - hello.
-        - hello.c
-        - userland/hello
-        - userland/hello.
-        - userland/hello.c
-        - /full/path/to/userland/hello
-        - /full/path/to/userland/hello.
-        - /full/path/to/userland/hello.c
+        If it is in tree, resolve source paths to their corresponding executables.
 
-        Also works on directories:
+        If it is out of tree, return the same exact path as input.
 
-        - arch
-        - userland/arch
-        - /full/path/to/userland/arch
+        If the input path is a file, add the executable extension automatically.
+
+        Directories map to the output directories that would contain their executables.
         '''
-        if os.path.isabs(in_path):
-            return in_path
-        else:
-            paths = [
-                os.path.join(magic_in_dir, in_path),
-                os.path.join(
-                    magic_in_dir,
-                    os.path.relpath(in_path, magic_in_dir),
+        if not self.env['dry_run'] and not os.path.exists(in_path):
+            raise Exception('Input path does not exist: ' + in_path)
+        if self.is_subpath(in_path, magic_in_dir):
+            # Abspath needed to remove the trailing `/.` which makes e.g. rmrf fail.
+            out = os.path.abspath(os.path.join(
+                magic_out_dir,
+                os.path.relpath(
+                    os.path.splitext(in_path)[0],
+                    magic_in_dir
                 )
-            ]
-            for path in paths:
-                name, ext = os.path.splitext(path)
-                if len(ext) > 1:
-                    try_exts = [ext]
-                else:
-                    try_exts = in_exts + ['']
-                for in_ext in try_exts:
-                    path = name + in_ext
-                    if os.path.exists(path):
-                        return path
-            if not self.env['dry_run']:
-                raise Exception('Source file not found for input: ' + in_path)
+            ))
+            if os.path.isfile(in_path):
+                out += executable_ext
+            return out
+        else:
+            return in_path
 
-    def resolve_executable(self, in_path, magic_in_dir, magic_out_dir, out_ext):
-        if os.path.isabs(in_path):
-            return in_path
-        else:
-            paths = [
-                os.path.join(magic_out_dir, in_path),
-                os.path.join(
-                    magic_out_dir,
-                    os.path.relpath(in_path, magic_in_dir),
-                )
-            ]
-            for path in paths:
-                path = os.path.splitext(path)[0] + out_ext
-                if os.path.exists(path):
-                    return path
-            if not self.env['dry_run']:
-                raise Exception('Executable file not found. Tried:\n' + '\n'.join(paths))
+    def resolve_targets(self, source_dir, targets):
+        if not targets:
+            targets = [source_dir]
+        new_targets = []
+        for target in targets:
+            target = self.toplevel_to_source_dir(target, source_dir)
+            self.assert_is_subpath(target, source_dir)
+            new_targets.append(target)
+        return new_targets
 
     def resolve_userland_executable(self, path):
-        '''
-        Convert an userland source path-like string to an
-        absolute userland build output path.
-        '''
         return self.resolve_executable(
             path,
             self.env['userland_source_dir'],
             self.env['userland_build_dir'],
-            self.env['userland_build_ext'],
+            self.env['userland_executable_ext'],
         )
 
-    def resolve_userland_source(self, path):
-        return self.resolve_source(
-            path,
-            self.env['userland_source_dir'],
-            self.env['userland_in_exts']
-        )
-
-    def setup(self):
+    def setup_one(self):
         '''
-        Similar to timed_main, but gets run only once for all --arch and --emulator,
-        before timed_main.
-
-        Different from __init__, since at this point env has already been calculated,
-        so variables that don't depend on --arch or --emulator can be used.
+        Run just before timed_main, after _init_env.
         '''
         pass
 
+    def toplevel_to_source_dir(self, path, source_dir):
+        path = os.path.abspath(path)
+        if path == self.env['root_dir']:
+            return source_dir
+        else:
+            return path
+
     def timed_main(self):
         '''
         Main action of the derived class.
@@ -1212,7 +1333,7 @@ lunch aosp_{}-eng
 
     def teardown(self):
         '''
-        Similar to setup, but run after timed_main.
+        Run only once, after all timed_main calls have finished.
         '''
         pass
 
@@ -1230,22 +1351,167 @@ class BuildCliFunction(LkmcCliFunction):
             default=False,
             help='Clean the build instead of building.',
         ),
+        self._build_arguments = {
+            '--ccflags': {
+                'default': '',
+                'help': '''\
+Pass the given compiler flags to all languages (C, C++, Fortran, etc.)
+''',
+            },
+            '--force-rebuild': {
+                'default': False,
+                "help": '''\
+Force rebuild even if sources didn't change.
+''',
+            },
+            '--optimization-level': {
+                'default': '0',
+                'help': '''\
+Use the given GCC -O optimization level.
+Some scripts cannot implement this due to hard technical challenges,
+e.g.: https://github.com/cirosantilli/linux-kernel-module-cheat#kernel-o0
+and others, such as gem5, have their own custom mechanism:
+https://github.com/cirosantilli/linux-kernel-module-cheat#gem5-debug-build
+''',
+            }
+        }
+
+    def _add_argument(self, argument_name):
         self.add_argument(
-            '--force-rebuild',
-            default=False,
-            help='''\
-Force rebuild even if sources didn't chage.
-TODO: not yet implemented on all scripts.
-'''
+            argument_name,
+            **self._build_arguments[argument_name]
         )
-        self.add_argument(
-            '-j',
-            '--nproc',
-            default=multiprocessing.cpu_count(),
-            type=int,
-            help='Number of processors to use for the build.',
-        )
-        self.test_results = []
+
+    def _build_one(
+        self,
+        in_path,
+        out_path,
+        build_exts=None,
+        cc_flags=None,
+        cc_flags_after=None,
+        extra_objs_userland_asm=None,
+        extra_objs_lkmc_common=None,
+        extra_deps=None,
+        link=True,
+    ):
+        '''
+        Build one userland or baremetal executable.
+        '''
+        if cc_flags is None:
+            cc_flags = []
+        else:
+            cc_flags = cc_flags.copy()
+        if cc_flags_after is None:
+            cc_flags_after = []
+        else:
+            cc_flags_after = cc_flags_after.copy()
+        if extra_deps is None:
+            extra_deps = []
+        ret = 0
+        in_dir, in_basename = os.path.split(in_path)
+        in_dir_abs = os.path.abspath(in_dir)
+        dirpath_relative_root = in_dir_abs[len(self.env['root_dir']) + 1:]
+        dirpath_relative_root_components = dirpath_relative_root.split(os.sep)
+        dirpath_relative_root_components_len = len(dirpath_relative_root_components)
+        my_path_properties = path_properties.get(os.path.join(
+            dirpath_relative_root,
+            in_basename
+        ))
+        if my_path_properties.should_be_built(self.env, link):
+            extra_objs = []
+            if my_path_properties['extra_objs_lkmc_common']:
+                extra_objs.extend(extra_objs_lkmc_common)
+            if my_path_properties['extra_objs_userland_asm']:
+                extra_objs.extend(extra_objs_userland_asm)
+            if self.need_rebuild([in_path] + extra_objs + extra_deps, out_path):
+                cc_flags.extend(my_path_properties['cc_flags'])
+                cc_flags_after.extend(my_path_properties['cc_flags_after'])
+                if my_path_properties['cc_pedantic']:
+                    cc_flags.extend(['-pedantic', LF])
+                if not link:
+                    cc_flags.extend(['-c', LF])
+                in_ext = os.path.splitext(in_path)[1]
+                if in_ext in (self.env['c_ext'], self.env['asm_ext']):
+                    cc = self.env['gcc_path']
+                    std = my_path_properties['c_std']
+                elif in_ext == self.env['cxx_ext']:
+                    cc = self.env['gxx_path']
+                    std = my_path_properties['cxx_std']
+                if dirpath_relative_root_components_len > 0:
+                    if dirpath_relative_root_components[0] == 'userland':
+                        if dirpath_relative_root_components_len > 1:
+                            if dirpath_relative_root_components[1] == 'arch':
+                                cc_flags.extend([
+                                    '-I', os.path.join(self.env['userland_source_arch_arch_dir']), LF,
+                                    '-I', os.path.join(self.env['userland_source_arch_dir']), LF,
+                                ])
+                            elif dirpath_relative_root_components[1] == 'libs':
+                                if dirpath_relative_root_components_len > 2:
+                                    if self.env['gcc_which'] == 'host':
+                                        eigen_root = '/'
+                                    else:
+                                        eigen_root = self.env['buildroot_staging_dir']
+                                    packages = {
+                                        'eigen': {
+                                            # TODO: was failing with:
+                                            # fatal error: Eigen/Dense: No such file or directory as of
+                                            # 975ce0723ee3fa1fea1766e6683e2f3acb8558d6
+                                            # http://lists.busybox.net/pipermail/buildroot/2018-June/222914.html
+                                            'cc_flags': [
+                                                '-I',
+                                                os.path.join(
+                                                    eigen_root,
+                                                    'usr',
+                                                    'include',
+                                                    'eigen3'
+                                                ),
+                                                LF
+                                            ],
+                                            # Header only.
+                                            'cc_flags_after': [],
+                                        },
+                                    }
+                                    package_key = dirpath_relative_root_components[2]
+                                    if package_key in packages:
+                                        package = packages[package_key]
+                                    else:
+                                        package = {}
+                                    if 'cc_flags' in package:
+                                        cc_flags.extend(package['cc_flags'])
+                                    else:
+                                        pkg_config_output = subprocess.check_output([
+                                            self.env['pkg_config'],
+                                            '--cflags',
+                                            package_key
+                                        ]).decode()
+                                        cc_flags.extend(self.sh.shlex_split(pkg_config_output))
+                                    if 'cc_flags_after' in package:
+                                        cc_flags_after.extend(package['cc_flags_after'])
+                                    else:
+                                        pkg_config_output = subprocess.check_output([
+                                            self.env['pkg_config'],
+                                            '--libs',
+                                            package_key
+                                        ]).decode()
+                                        cc_flags_after.extend(self.sh.shlex_split(pkg_config_output))
+                os.makedirs(os.path.dirname(out_path), exist_ok=True)
+                ret = self.sh.run_cmd(
+                    (
+                        [
+                            cc, LF,
+                        ] +
+                        cc_flags +
+                        [
+                            '-std={}'.format(std), LF,
+                            '-o', out_path, LF,
+                            in_path, LF,
+                        ] +
+                        self.sh.add_newlines(extra_objs) +
+                        cc_flags_after
+                    ),
+                    extra_paths=[self.env['ccache_dir']],
+                )
+        return ret
 
     def clean(self):
         build_dir = self.get_build_dir()
@@ -1282,26 +1548,36 @@ TODO: not yet implemented on all scripts.
         else:
             return self.build()
 
-# from aenum import Enum  # for the aenum version
-TestResult = enum.Enum('TestResult', ['PASS', 'FAIL'])
+TestStatus = enum.Enum('TestStatus', ['PASS', 'FAIL'])
 
-class Test:
+@functools.total_ordering
+class TestResult:
     def __init__(
         self,
-        test_id: str,
-        result : TestResult =None,
-        ellapsed_seconds : float =None
+        test_id: str ='',
+        status : TestStatus =TestStatus.PASS,
+        ellapsed_seconds : float =0,
+        reason : str =''
     ):
         self.test_id = test_id
-        self.result = result
+        self.status = status
         self.ellapsed_seconds = ellapsed_seconds
+        self.reason = reason
+
+    def __eq__(self, other):
+        return self.test_id == other.test_id
+
+    def __lt__(self, other):
+        return self.test_id < other.test_id
+
     def __str__(self):
-        out = []
-        if self.result is not None:
-            out.append(self.result.name)
-        if self.ellapsed_seconds is not None:
-            out.append(LkmcCliFunction.seconds_to_hms(self.ellapsed_seconds))
-        out.append(self.test_id)
+        out = [
+            self.status.name,
+            LkmcCliFunction.seconds_to_hms(self.ellapsed_seconds),
+            repr(self.test_id),
+        ]
+        if self.status is TestStatus.FAIL:
+            out.append(repr(self.reason))
         return ' '.join(out)
 
 class TestCliFunction(LkmcCliFunction):
@@ -1313,15 +1589,22 @@ class TestCliFunction(LkmcCliFunction):
 
     def __init__(self, *args, **kwargs):
         defaults = {
-            'print_time': False,
+            'show_time': False,
         }
         if 'defaults' in kwargs:
             defaults.update(kwargs['defaults'])
         kwargs['defaults'] = defaults
         super().__init__(*args, **kwargs)
-        self.tests = []
+        self.test_results = queue.Queue()
 
-    def run_test(self, run_obj, run_args=None, test_id=None):
+    def run_test(
+        self,
+        run_obj,
+        run_args=None,
+        test_id=None,
+        expected_exit_status=None,
+        thread_id=0,
+    ):
         '''
         This is a setup / run / teardown setup for simple tests that just do a single run.
 
@@ -1331,13 +1614,20 @@ class TestCliFunction(LkmcCliFunction):
         :param run_obj: callable object
         :param run_args: arguments to be passed to the runnable object
         :param test_id: test identifier, to be added in addition to of arch and emulator ids
+        :param thread_id: which thread the test is running under
         '''
         if run_obj.is_arch_supported(self.env['arch']):
             if run_args is None:
                 run_args = {}
+            run_args['run_id'] = thread_id
             test_id_string = self.test_setup(test_id)
             exit_status = run_obj(**run_args)
-            self.test_teardown(run_obj, exit_status, test_id_string)
+            return self.test_teardown(
+                run_obj,
+                exit_status,
+                test_id_string,
+                expected_exit_status=expected_exit_status
+            )
 
     def test_setup(self, test_id):
         test_id_string = '{} {}'.format(self.env['emulator'], self.env['arch'])
@@ -1346,41 +1636,55 @@ class TestCliFunction(LkmcCliFunction):
         self.log_info('test_id {}'.format(test_id_string), flush=True)
         return test_id_string
 
-    def test_teardown(self, run_obj, exit_status, test_id_string):
+    def test_teardown(
+        self,
+        run_obj,
+        exit_status,
+        test_id_string,
+        expected_exit_status=None
+    ):
+        if expected_exit_status is None:
+            expected_exit_status = 0
+        reason = ''
         if not self.env['dry_run']:
-            if exit_status == 0:
-                test_result = TestResult.PASS
+            if exit_status == expected_exit_status:
+                test_result = TestStatus.PASS
             else:
-                test_result = TestResult.FAIL
-                if self.env['quit_on_fail']:
-                    self.log_error('Test failed')
-                    sys.exit(1)
-            self.log_info('test_result {}'.format(test_result.name))
+                test_result = TestStatus.FAIL
+                reason = 'wrong exit status, got {} expected {}'.format(
+                    exit_status,
+                    expected_exit_status
+                )
             ellapsed_seconds = run_obj.ellapsed_seconds
         else:
-            test_result = None
-            ellapsed_seconds = None
-        self.log_info()
-        self.tests.append(Test(test_id_string, test_result, ellapsed_seconds))
+            test_result = TestStatus.PASS
+            ellapsed_seconds = 0
+        test_result = TestResult(
+            test_id_string,
+            test_result,
+            ellapsed_seconds,
+            reason
+        )
+        self.log_info(test_result)
+        self.test_results.put(test_result)
+        return test_result
 
     def teardown(self):
         '''
         :return: 1 if any test failed, 0 otherwise
         '''
-        self.log_info('Test result summary')
+        self.log_info('\nTest result summary')
         passes = []
         fails = []
-        for test in self.tests:
-            if test.result in (TestResult.PASS, None):
-                passes.append(test)
+        while not self.test_results.empty():
+            test = self.test_results.get()
+            if test.status in (TestStatus.PASS, None):
+                bisect.insort(passes, test)
             else:
-                fails.append(test)
-        if passes:
-            for test in passes:
-                self.log_info(test)
+                bisect.insort(fails, test)
+        for test in itertools.chain(passes, fails):
+            self.log_info(test)
         if fails:
-            for test in fails:
-                self.log_info(test)
             self.log_error('A test failed')
             return 1
         return 0
diff --git a/copy-overlay b/copy-overlay
index a59155a..3ad0692 100755
--- a/copy-overlay
+++ b/copy-overlay
@@ -15,12 +15,9 @@ https://github.com/cirosantilli/linux-kernel-module-cheat#rootfs_overlay
 ''')
 
     def build(self):
-        # TODO: print rsync equivalent, move into shell_helpers.
-        distutils.dir_util.copy_tree(
-            self.env['rootfs_overlay_dir'],
-            self.env['out_rootfs_overlay_dir'],
-            preserve_symlinks=True,
-            update=1,
+        self.sh.copy_dir_if_update(
+            srcdir=self.env['rootfs_overlay_dir'],
+            destdir=self.env['out_rootfs_overlay_dir'],
         )
 
 if __name__ == '__main__':
diff --git a/gem5-stat b/gem5-stat
index cf1adff..7f75ef3 100755
--- a/gem5-stat
+++ b/gem5-stat
@@ -6,7 +6,7 @@ class Main(common.LkmcCliFunction):
     def __init__(self):
         super().__init__(
             defaults={
-                'print_time': False,
+                'show_time': False,
             },
             description='''\
 Get the value of a gem5 stat from the stats.txt file.
diff --git a/getvar b/getvar
index e9c7c28..81eec00 100755
--- a/getvar
+++ b/getvar
@@ -6,7 +6,7 @@ class Main(common.LkmcCliFunction):
     def __init__(self):
         super().__init__(
             defaults = {
-                'print_time': False,
+                'show_time': False,
             },
             description='''\
 Print the value of a self.env['py'] variable.
diff --git a/include/README.adoc b/include/README.adoc
deleted file mode 100644
index f3e97b5..0000000
--- a/include/README.adoc
+++ /dev/null
@@ -1 +0,0 @@
-https://github.com/cirosantilli/linux-kernel-module-cheat#include-directory
diff --git a/kernel_modules/Makefile b/kernel_modules/Makefile
index 5ac980d..7f5d60a 100644
--- a/kernel_modules/Makefile
+++ b/kernel_modules/Makefile
@@ -1,15 +1,17 @@
-ifeq ($(OBJECT_FILES),)
-# Hardcoding  LKMC_MODULE_SUBDIRS here because is not defined.
-obj-m += $(addsuffix .o, $(notdir $(basename $(filter-out %.mod.c, $(wildcard $(BR2_EXTERNAL_LKMC_PATH)/kernel_modules/*.c)))))
-else
 # Trying to do:
 # $(MAKE) -C '$(LINUX_DIR)' M='$(M)' hello.ko hello2.ko
 # to restrict which modules are built leads to failures
 # when doing parallel builds. The only solution I could find
 # was to let the host select obj-m itself.
 obj-m += $(OBJECT_FILES)
-endif
-ccflags-y := -DDEBUG -g -std=gnu99 -Werror -Wno-declaration-after-statement -Wframe-larger-than=1000000000
+ccflags-y := \
+  -DDEBUG \
+  -ggdb3 \
+  -std=gnu99 \
+  -Werror \
+  -Wframe-larger-than=1000000000 \
+  -Wno-declaration-after-statement \
+  $(CCFLAGS)
 
 .PHONY: all
 
diff --git a/kernel_modules/anonymous_inode.c b/kernel_modules/anonymous_inode.c
index e3dbd21..d060763 100644
--- a/kernel_modules/anonymous_inode.c
+++ b/kernel_modules/anonymous_inode.c
@@ -9,7 +9,7 @@
 #include <linux/printk.h> /* printk */
 #include <linux/uaccess.h> /* copy_from_user */
 
-#include "../include/anonymous_inode.h"
+#include <lkmc/anonymous_inode.h>
 
 static struct dentry *debugfs_file;
 static u32 myval = 1;
diff --git a/kernel_modules/ioctl.c b/kernel_modules/ioctl.c
index 4079a50..7207683 100644
--- a/kernel_modules/ioctl.c
+++ b/kernel_modules/ioctl.c
@@ -5,7 +5,7 @@
 #include <linux/printk.h> /* printk */
 #include <linux/uaccess.h> /* copy_from_user, copy_to_user */
 
-#include "../include/ioctl.h"
+#include <lkmc/ioctl.h>
 
 static struct dentry *debugfs_file;
 
diff --git a/kernel_modules/netlink.c b/kernel_modules/netlink.c
index 34b5bca..0f012b6 100644
--- a/kernel_modules/netlink.c
+++ b/kernel_modules/netlink.c
@@ -6,7 +6,7 @@
 #include <linux/skbuff.h>
 #include <net/sock.h>
 
-#include "../include/netlink.h"
+#include <lkmc/netlink.h>
 
 struct sock *nl_sk = NULL;
 
diff --git a/kernel_modules/pmccntr.c b/kernel_modules/pmccntr.c
index 4f33f3d..9ddccf3 100644
--- a/kernel_modules/pmccntr.c
+++ b/kernel_modules/pmccntr.c
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#pmccntr */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-pmccntr */
 
 #include <linux/debugfs.h>
 #include <linux/errno.h> /* EFAULT */
diff --git a/kernel_modules/ring0.c b/kernel_modules/ring0.c
index 1fad122..211f342 100644
--- a/kernel_modules/ring0.c
+++ b/kernel_modules/ring0.c
@@ -3,13 +3,13 @@
 #include <linux/module.h>
 #include <linux/kernel.h>
 
-#include "../include/ring0.h"
+#include <lkmc/ring0.h>
 
 static int myinit(void)
 {
 #if defined(__x86_64__) || defined(__i386__)
-	Ring0Regs ring0_regs;
-	ring0_get_control_regs(&ring0_regs);
+	LkmcRing0Regs ring0_regs;
+	lkmc_ring0_get_control_regs(&ring0_regs);
 	pr_info("cr0 = 0x%8.8llX\n", (unsigned long long)ring0_regs.cr0);
 	pr_info("cr2 = 0x%8.8llX\n", (unsigned long long)ring0_regs.cr2);
 	pr_info("cr3 = 0x%8.8llX\n", (unsigned long long)ring0_regs.cr3);
diff --git a/lkmc.c b/lkmc.c
index 23d56d7..b8669b7 100644
--- a/lkmc.c
+++ b/lkmc.c
@@ -11,22 +11,15 @@ void lkmc_assert(bool condition) {
         lkmc_assert_fail();
 }
 
-void lkmc_assert_fail() {
-    puts("lkmc_test_fail");
+void lkmc_assert_fail(void) {
     exit(1);
 }
 
-bool lkmc_vector_equal(size_t n, double *v1, double *v2, double max_err) {
-    double sum = 0.0;
-    double diff;
-    size_t i;
-    for (i = 0; i < n; ++i) {
-        diff = v1[i] - v2[i];
-        sum += diff * diff;
+void lkmc_baremetal_on_exit_callback(int status, void *arg) {
+    (void)arg;
+    if (status != 0) {
+        printf("lkmc_exit_status_%d\n", status);
     }
-    if (sqrt(sum)/n > max_err)
-        return false;
-    return true;
 }
 
 #if defined(__aarch64__)
diff --git a/lkmc.h b/lkmc.h
index ed7df9e..7bcc931 100644
--- a/lkmc.h
+++ b/lkmc.h
@@ -11,7 +11,7 @@
 
 void lkmc_assert(bool);
 void lkmc_assert_fail();
-bool lkmc_vector_equal(size_t n, double *v1, double *v2, double max_err);
+void lkmc_baremetal_on_exit_callback(int status, void *arg);
 #endif
 
 /* https://stackoverflow.com/questions/1489932/how-to-concatenate-twice-with-the-c-preprocessor-and-expand-a-macro-as-in-arg */
diff --git a/lkmc/README.adoc b/lkmc/README.adoc
new file mode 100644
index 0000000..7c6fe7f
--- /dev/null
+++ b/lkmc/README.adoc
@@ -0,0 +1 @@
+https://github.com/cirosantilli/linux-kernel-module-cheat#lkmc-directory
diff --git a/lkmc/__init__.py b/lkmc/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/lkmc/add.c b/lkmc/add.c
new file mode 100644
index 0000000..b363317
--- /dev/null
+++ b/lkmc/add.c
@@ -0,0 +1,13 @@
+#include <lkmc.h>
+
+int main(void) {
+    int i, j, k;
+    i = 1;
+    /* test-gdb-op1 */
+    j = 2;
+    /* test-gdb-op2 */
+    k = i + j;
+    /* test-gdb-result */
+    if (k != 3)
+        lkmc_assert_fail();
+}
diff --git a/lkmc/add.py b/lkmc/add.py
new file mode 100644
index 0000000..0f42d2b
--- /dev/null
+++ b/lkmc/add.py
@@ -0,0 +1,9 @@
+def test(self):
+    self.sendline('tbreak main')
+    self.sendline('continue')
+    self.continue_to('op1')
+    assert self.get_int('i') == 1
+    self.continue_to('op2')
+    assert self.get_int('j') == 2
+    self.continue_to('result')
+    assert self.get_int('k') == 3
diff --git a/include/anonymous_inode.h b/lkmc/anonymous_inode.h
similarity index 51%
rename from include/anonymous_inode.h
rename to lkmc/anonymous_inode.h
index d35fc06..93fed14 100644
--- a/include/anonymous_inode.h
+++ b/lkmc/anonymous_inode.h
@@ -1,5 +1,7 @@
-#ifndef IOCTL_H
-#define IOCTL_H
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#anonymous-inode */
+
+#ifndef LKMC_ANONYMOUS_INODE_H
+#define LKMC_ANONYMOUS_INODE_H
 
 #include <linux/ioctl.h>
 
diff --git a/lkmc/assert_fail.c b/lkmc/assert_fail.c
new file mode 100644
index 0000000..cc881b6
--- /dev/null
+++ b/lkmc/assert_fail.c
@@ -0,0 +1,18 @@
+/* Let's see what happens when an assert fails.
+ *
+ * Outcome on Ubuntu 19.04 shows the failure line:
+ *
+ *     assert_fail.out: /path/to/linux-kernel-module-cheat/userland/c/assert_fail.c:15: main: Assertion `0' failed.
+ *
+ * and exit status 134 == 128 + 6, which corresponds to SIGABRT (6).
+ */
+
+#include <assert.h>
+#include <stdlib.h>
+#include <stdio.h>
+
+int main(void) {
+    assert(0);
+    puts("here");
+    return EXIT_SUCCESS;
+}
diff --git a/lkmc/hello.c b/lkmc/hello.c
new file mode 100644
index 0000000..42cb55d
--- /dev/null
+++ b/lkmc/hello.c
@@ -0,0 +1,9 @@
+/* Print hello to stdout ;-) */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+int main(void) {
+    puts("hello");
+    return EXIT_SUCCESS;
+}
diff --git a/lkmc/import_path.py b/lkmc/import_path.py
new file mode 100644
index 0000000..e344e5c
--- /dev/null
+++ b/lkmc/import_path.py
@@ -0,0 +1,34 @@
+#!/usr/bin/env python3
+
+import importlib.machinery
+import importlib.util
+import os
+import sys
+
+root_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
+
+def import_path(path):
+    '''
+    https://stackoverflow.com/questions/2601047/import-a-python-module-without-the-py-extension
+    https://stackoverflow.com/questions/31773310/what-does-the-first-argument-of-the-imp-load-source-method-do
+    '''
+    module_name = os.path.basename(path).replace('-', '_')
+    spec = importlib.util.spec_from_loader(
+        module_name,
+        importlib.machinery.SourceFileLoader(module_name, path)
+    )
+    module = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(module)
+    sys.modules[module_name] = module
+    return module
+
+def import_path_relative_root(basename):
+    return import_path(os.path.join(root_dir, basename))
+
+def import_path_main(basename):
+    '''
+    Import the given file and return an instance of its Main class.
+
+    By convention, the main class of each of our CLI scripts is named Main.
+    '''
+    return import_path_relative_root(basename).Main()
diff --git a/include/ioctl.h b/lkmc/ioctl.h
similarity index 92%
rename from include/ioctl.h
rename to lkmc/ioctl.h
index 14d6cdf..2469009 100644
--- a/include/ioctl.h
+++ b/lkmc/ioctl.h
@@ -1,5 +1,7 @@
-#ifndef IOCTL_H
-#define IOCTL_H
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#ioctl */
+
+#ifndef LKMC_IOCTL_H
+#define LKMC_IOCTL_H
 
 #include <linux/ioctl.h>
 
diff --git a/baremetal/interactive/assert_fail.c b/lkmc/lkmc_assert_fail.c
similarity index 98%
rename from baremetal/interactive/assert_fail.c
rename to lkmc/lkmc_assert_fail.c
index a08a498..68fbefa 100644
--- a/baremetal/interactive/assert_fail.c
+++ b/lkmc/lkmc_assert_fail.c
@@ -3,4 +3,3 @@
 int main(void) {
     lkmc_assert_fail();
 }
-
diff --git a/lkmc/m5ops.h b/lkmc/m5ops.h
new file mode 100644
index 0000000..4bdb1c6
--- /dev/null
+++ b/lkmc/m5ops.h
@@ -0,0 +1,56 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#m5ops-instructions */
+
+#ifndef LKMC_M5OPS_H
+#define LKMC_M5OPS_H
+
+#if LKMC_M5OPS_ENABLE == 1
+
+#if defined(__arm__)
+
+#define LKMC_M5OPS_CHECKPOINT_ASM mov r0, #0; mov r1, #0; mov r2, #0; mov r3, #0; .inst 0xEE000110 | (0x43 << 16)
+#define LKMC_M5OPS_DUMPSTATS_ASM  mov r0, #0; mov r1, #0; mov r2, #0; mov r3, #0; .inst 0xEE000110 | (0x41 << 16)
+#define LKMC_M5OPS_EXIT_ASM       mov r0, #0; mov r1, #0;                         .inst 0xEE000110 | (0x21 << 16)
+#define LKMC_M5OPS_FAIL_1_ASM     mov r0, #0; mov r1, #0; mov r2, #1; mov r3, #0; .inst 0xEE000110 | (0x22 << 16)
+#define LKMC_M5OPS_RESETSTATS_ASM mov r0, #0; mov r1, #0; mov r2, #0; mov r3, #0; .inst 0xEE000110 | (0x40 << 16)
+
+#define LKMC_M5OPS_CHECKPOINT __asm__ __volatile__ ("mov r0, #0; mov r1, #0; mov r2, #0; mov r3, #0; .inst 0xEE000110 | (0x43 << 16);" : : : "r0", "r1", "r2", "r3")
+#define LKMC_M5OPS_DUMPSTATS  __asm__ __volatile__ ("mov r0, #0; mov r1, #0; mov r2, #0; mov r3, #0; .inst 0xEE000110 | (0x41 << 16);" : : : "r0", "r1", "r2", "r3")
+#define LKMC_M5OPS_EXIT       __asm__ __volatile__ ("mov r0, #0; mov r1, #0;                         .inst 0xEE000110 | (0x21 << 16);" : : : "r0", "r1"            )
+#define LKMC_M5OPS_FAIL_1     __asm__ __volatile__ ("mov r0, #0; mov r1, #0; mov r2, #1; mov r3, #0; .inst 0xEE000110 | (0x22 << 16);" : : : "r0", "r1", "r2", "r3")
+#define LKMC_M5OPS_RESETSTATS __asm__ __volatile__ ("mov r0, #0; mov r1, #0; mov r2, #0; mov r3, #0; .inst 0xEE000110 | (0x40 << 16);" : : : "r0", "r1", "r2", "r3")
+
+#elif defined(__aarch64__)
+
+#define LKMC_M5OPS_CHECKPOINT_ASM mov x0, #0; mov x1, #0; .inst 0xFF000110 | (0x43 << 16);
+#define LKMC_M5OPS_DUMPSTATS_ASM  mov x0, #0; mov x1, #0; .inst 0xFF000110 | (0x41 << 16);
+#define LKMC_M5OPS_EXIT_ASM       mov x0, #0;             .inst 0XFF000110 | (0x21 << 16);
+#define LKMC_M5OPS_FAIL_1_ASM     mov x0, #0; mov x1, #1; .inst 0xFF000110 | (0x22 << 16);
+#define LKMC_M5OPS_RESETSTATS_ASM mov x0, #0; mov x1, #0; .inst 0XFF000110 | (0x40 << 16);
+
+#define LKMC_M5OPS_CHECKPOINT __asm__ __volatile__ ("mov x0, #0; mov x1, #0; .inst 0xFF000110 | (0x43 << 16);" : : : "x0", "x1")
+#define LKMC_M5OPS_DUMPSTATS  __asm__ __volatile__ ("mov x0, #0; mov x1, #0; .inst 0xFF000110 | (0x41 << 16);" : : : "x0", "x1")
+#define LKMC_M5OPS_EXIT       __asm__ __volatile__ ("mov x0, #0;             .inst 0XFF000110 | (0x21 << 16);" : : : "x0"      )
+#define LKMC_M5OPS_FAIL_1     __asm__ __volatile__ ("mov x0, #0; mov x1, #1; .inst 0xFF000110 | (0x22 << 16);" : : : "x0", "x1")
+#define LKMC_M5OPS_RESETSTATS __asm__ __volatile__ ("mov x0, #0; mov x1, #0; .inst 0XFF000110 | (0x40 << 16);" : : : "x0", "x1")
+
+#else
+#error m5ops not implemented for the current arch
+#endif
+
+#else
+
+#define LKMC_M5OPS_CHECKPOINT_ASM
+#define LKMC_M5OPS_DUMPSTATS_ASM
+#define LKMC_M5OPS_EXIT_ASM
+#define LKMC_M5OPS_FAIL_1_ASM
+#define LKMC_M5OPS_RESETSTATS_ASM
+
+#define LKMC_M5OPS_CHECKPOINT
+#define LKMC_M5OPS_DUMPSTATS
+#define LKMC_M5OPS_EXIT
+#define LKMC_M5OPS_FAIL_1
+#define LKMC_M5OPS_RESETSTATS
+
+#endif
+
+#endif
diff --git a/lkmc/math.h b/lkmc/math.h
new file mode 100644
index 0000000..80e42ca
--- /dev/null
+++ b/lkmc/math.h
@@ -0,0 +1,20 @@
+#ifndef LKMC_MATH_H
+#define LKMC_MATH_H
+
+#include <math.h>
+#include <stdbool.h>
+
+bool lkmc_vector_equal(size_t n, double *v1, double *v2, double max_err) {
+    double sum = 0.0;
+    double diff;
+    size_t i;
+    for (i = 0; i < n; ++i) {
+        diff = v1[i] - v2[i];
+        sum += diff * diff;
+    }
+    if (sqrt(sum)/n > max_err)
+        return false;
+    return true;
+}
+
+#endif
diff --git a/include/netlink.h b/lkmc/netlink.h
similarity index 72%
rename from include/netlink.h
rename to lkmc/netlink.h
index 927f51d..c31abf7 100644
--- a/include/netlink.h
+++ b/lkmc/netlink.h
@@ -1,5 +1,7 @@
-#ifndef NETLINK_H
-#define NETLINK_H
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#netlink-sockets */
+
+#ifndef LKMC_NETLINK_H
+#define LKMC_NETLINK_H
 
 /* Socket identifier, matches userland. TODO can be anything?
  * Is there a more scalable way to do it? E.g. ioctl device,
diff --git a/userland/common_userland.h b/lkmc/pagemap.h
similarity index 86%
rename from userland/common_userland.h
rename to lkmc/pagemap.h
index fbee7f4..f8891ca 100644
--- a/userland/common_userland.h
+++ b/lkmc/pagemap.h
@@ -1,5 +1,5 @@
-#ifndef COMMON_H
-#define COMMON_H
+#ifndef LKMC_PAGEMAP_H
+#define LKMC_PAGEMAP_H
 
 #define _XOPEN_SOURCE 700
 #include <fcntl.h> /* open */
@@ -17,7 +17,7 @@ typedef struct {
     unsigned int file_page : 1;
     unsigned int swapped : 1;
     unsigned int present : 1;
-} PagemapEntry;
+} LkmcPagemapEntry;
 
 /* Parse the pagemap entry for the given virtual address.
  *
@@ -26,8 +26,7 @@ typedef struct {
  * @param[in]  vaddr      virtual address to get entry for
  * @return                0 for success, 1 for failure
  */
-int pagemap_get_entry(PagemapEntry *entry, int pagemap_fd, uintptr_t vaddr)
-{
+int lkmc_pagemap_get_entry(LkmcPagemapEntry *entry, int pagemap_fd, uintptr_t vaddr) {
     size_t nread;
     ssize_t ret;
     uint64_t data;
@@ -62,8 +61,7 @@ int pagemap_get_entry(PagemapEntry *entry, int pagemap_fd, uintptr_t vaddr)
  * @param[in]  vaddr virtual address to get entry for
  * @return           0 for success, 1 for failure
  */
-int virt_to_phys_user(uintptr_t *paddr, pid_t pid, uintptr_t vaddr)
-{
+int lkmc_pagemap_virt_to_phys_user(uintptr_t *paddr, pid_t pid, uintptr_t vaddr) {
     char pagemap_file[BUFSIZ];
     int pagemap_fd;
 
@@ -72,8 +70,8 @@ int virt_to_phys_user(uintptr_t *paddr, pid_t pid, uintptr_t vaddr)
     if (pagemap_fd < 0) {
         return 1;
     }
-    PagemapEntry entry;
-    if (pagemap_get_entry(&entry, pagemap_fd, vaddr)) {
+    LkmcPagemapEntry entry;
+    if (lkmc_pagemap_get_entry(&entry, pagemap_fd, vaddr)) {
         return 1;
     }
     close(pagemap_fd);
diff --git a/include/ring0.h b/lkmc/ring0.h
similarity index 75%
rename from include/ring0.h
rename to lkmc/ring0.h
index 8c3a885..17fe7cf 100644
--- a/include/ring0.h
+++ b/lkmc/ring0.h
@@ -1,29 +1,31 @@
-#if defined(__x86_64__) || defined(__i386__)
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#ring0 */
 
+#ifndef LKMC_RING0_H
+#define LKMC_RING0_H
+#if defined(__x86_64__) || defined(__i386__)
 #ifdef THIS_MODULE
 #include <linux/kernel.h>
 #if defined(__x86_64__)
-typedef u64 T;
+typedef u64 LkmcRing0RegsType;
 #elif defined(__i386__)
-typedef u32 T;
+typedef u32 LkmcRing0RegsType;
 #endif
 #else
 #include <stdint.h>
 #if defined(__x86_64__)
-typedef uint64_t T;
+typedef uint64_t LkmcRing0RegsType;
 #elif defined(__i386__)
-typedef uint32_t T;
+typedef uint32_t LkmcRing0RegsType;
 #endif
 #endif
 
 typedef struct {
-    T cr0;
-    T cr2;
-    T cr3;
-} Ring0Regs;
+    LkmcRing0RegsType cr0;
+    LkmcRing0RegsType cr2;
+    LkmcRing0RegsType cr3;
+} LkmcRing0Regs;
 
-void ring0_get_control_regs(Ring0Regs *ring0_regs)
-{
+void lkmc_ring0_get_control_regs(LkmcRing0Regs *ring0_regs) {
 #if defined(__x86_64__)
     __asm__ __volatile__ (
         "mov %%cr0, %%rax;"
@@ -70,5 +72,5 @@ void ring0_get_control_regs(Ring0Regs *ring0_regs)
     );
 #endif
 }
-
+#endif
 #endif
diff --git a/path_properties.py b/path_properties.py
new file mode 100644
index 0000000..70c1381
--- /dev/null
+++ b/path_properties.py
@@ -0,0 +1,413 @@
+#!/usr/bin/env python3
+
+import os
+
+from shell_helpers import LF
+
+class PathProperties:
+    default_c_std = 'c11'
+    default_cxx_std = 'c++17'
+    default_properties = {
+        'allowed_archs': None,
+        'c_std': default_c_std,
+        'cc_flags': [
+            '-Wall', LF,
+            '-Werror', LF,
+            '-Wextra', LF,
+            '-Wno-unused-function', LF,
+            '-fopenmp', LF,
+            '-ggdb3', LF,
+            # PIE causes the following problems:
+            # * QEMU GDB step debug does not find breakpoints:
+            #   https://stackoverflow.com/questions/51310756/how-to-gdb-step-debug-a-dynamically-linked-executable-in-qemu-user-mode/51343326#51343326
+            # * when writing assembly code, we have to constantly think about it:
+            #   https://stackoverflow.com/questions/2463150/what-is-the-fpie-option-for-position-independent-executables-in-gcc-and-ld/51308031#51308031
+            # As of 91986fb2955f96e06d1c5ffcc5536ba9f0af1fd9, our Buildroot toolchain
+            # does not have it enabled by default, but the Ubuntu 18.04 host toolchain does.
+            '-fno-pie', LF,
+            '-no-pie', LF,
+        ],
+        'cc_flags_after': [],
+        'cc_pedantic': True,
+        'cxx_std': default_cxx_std,
+        # Expected program exit status. When signals are raised, this refers
+        # to the native exit status, as reported by Bash $?.
+        'exit_status': 0,
+        'extra_objs_baremetal_bootloader': False,
+        # We should get rid of this if we ever properly implement dependency graphs.
+        'extra_objs_lkmc_common': False,
+        'extra_objs_userland_asm': False,
+        'interactive': False,
+        # The script takes a perceptible amount of time to run. Possibly an infinite loop.
+        'more_than_1s': False,
+        # The path should not be built. E.g., it is symlinked into multiple archs.
+        'no_build': False,
+        # The path does not generate an executable in itself, e.g.
+        # it only generates intermediate object files. Therefore it
+        # should not be run while testing.
+        'no_executable': False,
+        # The test receives a signal. We skip those tests for now
+        # on userland because we are too lazy to figure out the exact semantics
+        # of how Python + QEMU + gem5 determine the exit status of signals.
+        'receives_signal': False,
+        # The script requires a non-trivial argument to be passed to run properly.
+        'requires_argument': False,
+        'requires_dynamic_library': False,
+        'requires_m5ops': False,
+        # gem5 fatal: syscall getcpu (#168) unimplemented.
+        'requires_syscall_getcpu': False,
+        'requires_semihosting': False,
+        # Requires certain of our custom kernel modules to be inserted to run.
+        'requires_kernel_modules': False,
+        # The example requires sudo, which usually implies that it can do something
+        # deep to the system it runs on, which would prevent further interactive
+        # or test usage of the system, for example poweroff or messing up the GUI.
+        'requires_sudo': False,
+        # We were too lazy to properly classify why we are skipping these tests.
+        # TODO get it done.
+        'skip_run_unclassified': False,
+        # Arguments added automatically to run when running tests,
+        # but not on manual running.
+        'test_run_args': {
+            'ctrl_c_host': True,
+            'show_stdout': False,
+            'show_time': False,
+            'background': True,
+        },
+    }
+
+    '''
+    Encodes properties of userland and baremetal paths.
+    For directories, it applies to all files under the directory.
+    Used to determine how to build and test the examples.
+    '''
+    def __init__(
+        self,
+        properties
+    ):
+        for key in properties:
+            if key not in self.default_properties:
+                raise ValueError('Unknown key: {}'.format(key))
+        self.properties = properties.copy()
+
+    def __getitem__(self, key):
+        return self.properties[key]
+
+    def __repr__(self):
+        return str(self.properties)
+
+    def set_path_components(self, path_components):
+        self.path_components = path_components
+
+    def should_be_built(self, env, link=False):
+        if len(self.path_components) > 1 and \
+                self.path_components[1] == 'libs' and \
+                not env['package_all'] and \
+                self.path_components[2] not in env['package']:
+            return False
+        return \
+            not self['no_build'] and \
+            (
+                self['allowed_archs'] is None or
+                env['arch'] in self['allowed_archs']
+            ) and \
+            not (
+                link and
+                self['no_executable']
+            )
+
+    def should_be_tested(self, env):
+        return (
+            self.should_be_built(env) and
+            not self['interactive'] and
+            not self['more_than_1s'] and
+            not self['no_executable'] and
+            not self['requires_argument'] and
+            not self['requires_kernel_modules'] and
+            not self['requires_sudo'] and
+            not self['skip_run_unclassified'] and
+            not (
+                env['emulator'] == 'gem5' and
+                (
+                    self['requires_dynamic_library'] or
+                    self['requires_semihosting'] or
+                    self['requires_syscall_getcpu']
+                )
+            ) and
+            not (
+                env['emulator'] == 'qemu' and
+                (
+                    self['requires_m5ops']
+                )
+            )
+        )
+
+    def update(self, other):
+        other_tmp_properties = other.properties.copy()
+        if 'cc_flags' in self.properties and 'cc_flags' in other_tmp_properties:
+            other_tmp_properties['cc_flags'] = \
+                self.properties['cc_flags'] + \
+                other_tmp_properties['cc_flags']
+        if 'test_run_args' in self.properties and 'test_run_args' in other_tmp_properties:
+            other_tmp_properties['test_run_args'] = {
+                **self.properties['test_run_args'],
+                **other_tmp_properties['test_run_args']
+            }
+        return self.properties.update(other_tmp_properties)
+
+class PrefixTree:
+    def __init__(self, path_properties_dict=None, children=None):
+        if path_properties_dict is None:
+            path_properties_dict = {}
+        if children is None:
+            children = {}
+        self.children = children
+        self.path_properties = PathProperties(path_properties_dict)
+
+    @staticmethod
+    def make_from_tuples(tuples):
+        def tree_from_tuples(tuple_):
+            if not type(tuple_) is tuple:
+                tuple_ = (tuple_, {})
+            cur_properties, cur_children = tuple_
+            return PrefixTree(cur_properties, cur_children)
+        top_tree = tree_from_tuples(tuples)
+        todo_trees = [top_tree]
+        while todo_trees:
+            cur_tree = todo_trees.pop()
+            cur_children = cur_tree.children
+            for child_key in cur_children:
+                new_tree = tree_from_tuples(cur_children[child_key])
+                cur_children[child_key] = new_tree
+                todo_trees.append(new_tree)
+        return top_tree
+
+def get(path):
+    cur_node = path_properties_tree
+    path_components = path.split(os.sep)
+    path_properties = PathProperties(cur_node.path_properties.properties.copy())
+    for path_component in path_components:
+        if path_component in cur_node.children:
+            cur_node = cur_node.children[path_component]
+            path_properties.update(cur_node.path_properties)
+        else:
+            break
+    path_properties.set_path_components(path_components)
+    return path_properties
+
+gnu_extension_properties = {
+    'c_std': 'gnu11',
+    'cc_pedantic': False,
+    'cxx_std': 'gnu++17'
+}
+freestanding_properties = {
+    'cc_flags': [
+        '-ffreestanding', LF,
+        '-nostdlib', LF,
+        '-static', LF,
+    ],
+    'extra_objs_userland_asm': False,
+}
+# See: https://github.com/cirosantilli/linux-kernel-module-cheat#path-properties
+path_properties_tuples = (
+    PathProperties.default_properties,
+    {
+        'baremetal': (
+            {},
+            {
+                'arch': (
+                    {},
+                    {
+                        'arm': (
+                            {'allowed_archs': {'arm'}},
+                            {
+                                'gem5_assert.S': {'requires_m5ops': True},
+                                'multicore.S': {'test_run_args': {'cpus': 2}},
+                                'no_bootloader': (
+                                    {'extra_objs_baremetal_bootloader': False},
+                                    {
+                                        'gem5_exit.S': {'requires_m5ops': True},
+                                        'semihost_exit.S': {'requires_semihosting': True},
+                                    }
+                                ),
+                                'return1.S': {'exit_status': 1},
+                                'semihost_exit.S': {'requires_semihosting': True},
+                            },
+
+                        ),
+                        'aarch64': (
+                            {'allowed_archs': {'aarch64'}},
+                            {
+                                'multicore.S': {'test_run_args': {'cpus': 2}},
+                                'no_bootloader': (
+                                    {'extra_objs_baremetal_bootloader': False},
+                                    {
+                                        'gem5_exit.S': {'requires_m5ops': True},
+                                        'semihost_exit.S': {'requires_semihosting': True},
+                                    }
+                                ),
+                                'return1.S': {'exit_status': 1},
+                                'semihost_exit.S': {'requires_semihosting': True},
+                            },
+                        )
+                    }
+                ),
+                'assert_fail.c': {'exit_status': 134},
+                'lkmc_assert_fail.c': {'exit_status': 1},
+                'exit1.c': {'exit_status': 1},
+                'infinite_loop.c': {'more_than_1s': True},
+                'lib': (
+                    {'no_executable': True},
+                    {}
+                ),
+                'getchar.c': {'interactive': True},
+                'return1.c': {'exit_status': 1},
+                'return2.c': {'exit_status': 2},
+            }
+        ),
+        'userland': (
+            {
+                'cc_flags_after': [
+                    '-lm', LF,
+                    '-pthread', LF,
+                ],
+            },
+            {
+                'arch': (
+                    {
+                        'extra_objs_userland_asm': True,
+                    },
+                    {
+                        'arm': (
+                            {
+                                'allowed_archs': {'arm'},
+                                'cc_flags': [
+                                    '-Xassembler', '-mcpu=cortex-a72', LF,
+                                    # To prevent:
+                                    # > vfp.S: Error: selected processor does not support <FPU instruction> in ARM mode
+                                    # https://stackoverflow.com/questions/41131432/cross-compiling-error-selected-processor-does-not-support-fmrx-r3-fpexc-in/52875732#52875732
+                                    # We aim to take the most extended mode currently available that works on QEMU.
+                                    '-Xassembler', '-mfpu=crypto-neon-fp-armv8.1', LF,
+                                    '-Xassembler', '-meabi=5', LF,
+                                    # Treat inline assembly as arm instead of thumb
+                                    # The opposite of -mthumb.
+                                    '-marm', LF,
+                                    # Make gcc generate .syntax unified for inline assembly.
+                                    # However, it gets ignored if -marm is given, which is a GCC bug that was recently fixed:
+                                    # https://stackoverflow.com/questions/54078112/how-to-write-syntax-unified-ual-armv7-inline-assembly-in-gcc/54132097#54132097
+                                    # So we just write divided inline assembly for now.
+                                    '-masm-syntax-unified', LF,
+                                ]
+                            },
+                            {
+                                'c': (
+                                    {
+                                        'extra_objs_userland_asm': False,
+                                    },
+                                    {
+                                        'freestanding': freestanding_properties,
+                                    },
+                                ),
+                                'freestanding': freestanding_properties,
+                            }
+                        ),
+                        'aarch64': (
+                            {'allowed_archs': {'aarch64'}},
+                            {
+                                'c': (
+                                    {
+                                        'extra_objs_userland_asm': False,
+                                    },
+                                    {
+                                        'freestanding': freestanding_properties,
+                                    },
+                                ),
+                                'freestanding': freestanding_properties,
+                            }
+                        ),
+                        'fail.S': {'exit_status': 1},
+                        'main.c': {
+                            'extra_objs_userland_asm': False,
+                            'no_executable': True
+                        },
+                        'x86_64': (
+                            {'allowed_archs': {'x86_64'}},
+                            {
+                                'c': (
+                                    {
+                                        'extra_objs_userland_asm': False,
+                                    },
+                                    {
+                                        'freestanding': freestanding_properties,
+                                        'ring0.c': {
+                                            'exit_status': 139,
+                                            'receives_signal': True
+                                        }
+                                    }
+                                ),
+                                'freestanding': freestanding_properties,
+                                'lkmc_assert_eq_fail.S': {'exit_status': 1},
+                                'lkmc_assert_memcmp_fail.S': {'exit_status': 1},
+                            }
+                        ),
+                    }
+                ),
+                'c': (
+                    {},
+                    {
+                        'assert_fail.c': {
+                            'exit_status': 134,
+                            'receives_signal': True,
+                        },
+                        'false.c': {'exit_status': 1},
+                        'getchar.c': {'interactive': True},
+                        'infinite_loop.c': {'more_than_1s': True},
+                    }
+                ),
+                'gcc': gnu_extension_properties,
+                'kernel_modules': {**gnu_extension_properties, **{'requires_kernel_modules': True}},
+                'lkmc': (
+                    {'extra_objs_lkmc_common': True},
+                    {
+                        'assert_fail.c': {'exit_status': 1},
+                    }
+                ),
+                'libs': (
+                    {'requires_dynamic_library': True},
+                    {
+                        'libdrm': {'requires_sudo': True},
+                    }
+                ),
+                'linux': (
+                    gnu_extension_properties,
+                    {
+                        'ctrl_alt_del.c': {'requires_sudo': True},
+                        'init_env_poweroff.c': {'requires_sudo': True},
+                        'myinsmod.c': {'requires_sudo': True},
+                        'myrmmod.c': {'requires_sudo': True},
+                        'pagemap_dump.c': {'requires_argument': True},
+                        'poweroff.c': {'requires_sudo': True},
+                        'proc_events.c': {'requires_sudo': True},
+                        'proc_events.c': {'requires_sudo': True},
+                        'sched_getaffinity.c': {'requires_syscall_getcpu': True},
+                        'sched_getaffinity_threads.c': {
+                            'requires_syscall_getcpu': True,
+                            'more_than_1s': True,
+                        },
+                        'time_boot.c': {'requires_sudo': True},
+                        'virt_to_phys_user.c': {'requires_argument': True},
+                    }
+                ),
+                'posix': (
+                    {},
+                    {
+                        'count.c': {'more_than_1s': True},
+                        'sleep_forever.c': {'more_than_1s': True},
+                        'virt_to_phys_test.c': {'more_than_1s': True},
+                    }
+                ),
+            }
+        ),
+    }
+)
+path_properties_tree = PrefixTree.make_from_tuples(path_properties_tuples)
diff --git a/qemu-monitor b/qemu-monitor
index 50e8f51..7da27b7 100755
--- a/qemu-monitor
+++ b/qemu-monitor
@@ -5,41 +5,44 @@ import sys
 import telnetlib
 
 import common
-from shell_helpers import LF
 
-prompt = b'\n(qemu) '
-
-parser = self.get_argparse({
-    'description': '''\
+class Main(common.LkmcCliFunction):
+    def __init__(self):
+        super().__init__(
+            description='''\
 Run a command on the QEMU monitor of a running QEMU instance
 
 If the stdin is a terminal, open an interact shell. Otherwise,
 run commands from stdin and quit.
-'''
-})
-parser.add_argument(
-    'command',
-    help='If given, run this command and quit',
-    nargs='*',
-)
-args = self.setup(parser)
+''',
+        )
+        self.add_argument(
+            'command',
+            help='If given, run this command and quit',
+            nargs='*',
+        )
 
-def write_and_read(tn, cmd, prompt):
-    tn.write(cmd.encode('utf-8'))
-    return '\n'.join(tn.read_until(prompt).decode('utf-8').splitlines()[1:])[:-len(prompt)]
+    def timed_main(self):
+        def write_and_read(tn, cmd, prompt):
+            tn.write(cmd.encode('utf-8'))
+            return '\n'.join(tn.read_until(prompt).decode('utf-8').splitlines()[1:])[:-len(prompt)]
 
-with telnetlib.Telnet('localhost', kwargs['qemu_monitor_port']) as tn:
-    # Couldn't disable server echo, so just removing the write for now.
-    # https://stackoverflow.com/questions/12421799/how-to-disable-telnet-echo-in-python-telnetlib
-    # sock = tn.get_socket()
-    # sock.send(telnetlib.IAC + telnetlib.WILL + telnetlib.ECHO)
-    if os.isatty(sys.stdin.fileno()):
-        if kwargs['command'] == []:
-            print(tn.read_until(prompt).decode('utf-8'), end='')
-            tn.interact()
-        else:
-            tn.read_until(prompt)
-            print(write_and_read(tn, ' '.join(kwargs['command']) + '\n', prompt))
-    else:
-        tn.read_until(prompt)
-        print(write_and_read(tn, sys.stdin.read() + '\n', prompt))
+        with telnetlib.Telnet('localhost', self.env['qemu_monitor_port']) as tn:
+            prompt = b'\n(qemu) '
+            # Couldn't disable server echo, so just removing the write for now.
+            # https://stackoverflow.com/questions/12421799/how-to-disable-telnet-echo-in-python-telnetlib
+            # sock = tn.get_socket()
+            # sock.send(telnetlib.IAC + telnetlib.WILL + telnetlib.ECHO)
+            if os.isatty(sys.stdin.fileno()):
+                if self.env['command'] == []:
+                    print(tn.read_until(prompt).decode('utf-8'), end='')
+                    tn.interact()
+                else:
+                    tn.read_until(prompt)
+                    print(write_and_read(tn, ' '.join(self.env['command']) + '\n', prompt))
+            else:
+                tn.read_until(prompt)
+                print(write_and_read(tn, sys.stdin.read() + '\n', prompt))
+
+if __name__ == '__main__':
+    Main().cli()
diff --git a/release-zip b/release-zip
index 2bb2477..a3f98f5 100755
--- a/release-zip
+++ b/release-zip
@@ -12,7 +12,7 @@ class Main(common.LkmcCliFunction):
 https://github.com/cirosantilli/linux-kernel-module-cheat#release-zip
 ''',
             defaults = {
-                'print_time': False,
+                'show_time': False,
             }
         )
         self.zip_files = []
diff --git a/rootfs_overlay/.profile b/rootfs_overlay/.profile
new file mode 100644
index 0000000..d7c6426
--- /dev/null
+++ b/rootfs_overlay/.profile
@@ -0,0 +1,4 @@
+# https://github.com/cirosantilli/linux-kernel-module-cheat#busybox-shell-initrc-files
+echo "hello .profile"
+export PS1='\w\n\u@\h# '
+cd /lkmc
diff --git a/rootfs_overlay/anonymous_inode.sh b/rootfs_overlay/anonymous_inode.sh
deleted file mode 100755
index 630250f..0000000
--- a/rootfs_overlay/anonymous_inode.sh
+++ /dev/null
@@ -1,5 +0,0 @@
-#!/bin/sh
-set -e
-insmod anonymous_inode.ko
-[ "$(/anonymous_inode.out /sys/kernel/debug/lkmc_anonymous_inode 3)" = "$(printf '1\n10\n100')" ]
-rmmod anonymous_inode
diff --git a/rootfs_overlay/conf.sh b/rootfs_overlay/conf.sh
deleted file mode 100755
index 2899b22..0000000
--- a/rootfs_overlay/conf.sh
+++ /dev/null
@@ -1,2 +0,0 @@
-#!/bin/sh
-zcat /proc/config.gz | grep -Ei "${1:-}"
diff --git a/rootfs_overlay/etc/init.d/S98 b/rootfs_overlay/etc/init.d/S98
index 8e534e6..2e92321 100755
--- a/rootfs_overlay/etc/init.d/S98
+++ b/rootfs_overlay/etc/init.d/S98
@@ -1,5 +1,7 @@
 #!/bin/sh
+# https://github.com/cirosantilli/linux-kernel-module-cheat#init-busybox
 echo "hello S98"
+cd "$lkmc_home"
 if [ -n "$lkmc_eval" ]; then
   eval "$lkmc_eval"
 elif [ -n "$lkmc_eval_base64" ]; then
diff --git a/rootfs_overlay/eval.sh b/rootfs_overlay/eval.sh
deleted file mode 100755
index 2fe7ccd..0000000
--- a/rootfs_overlay/eval.sh
+++ /dev/null
@@ -1,18 +0,0 @@
-#!/bin/sh
-echo "$lkmc_eval"
-eval "$lkmc_eval"
-
-# Ideally, this script would do just:
-#
-## Get rid of the '-'.
-#shift
-#echo "$@"
-#
-# However, the kernel CLI parsing is crap, and the 4.14 docs lie.
-#
-# In particular, not all that is passed after "-" goes to an argument to init,
-# e.g. stuff with dots like "- ./poweroff.out" still gets treated specially and
-# does not go to init.
-#
-# This also likely means that the above solution is also unreliable in some cases,
-# and that in the end you just have to add a script to the root filesystem.
diff --git a/rootfs_overlay/eval_base64.sh b/rootfs_overlay/eval_base64.sh
deleted file mode 100755
index c3eddcf..0000000
--- a/rootfs_overlay/eval_base64.sh
+++ /dev/null
@@ -1,2 +0,0 @@
-#!/bin/sh
-eval "$(printf "$lkmc_eval" | base64 -d)"
diff --git a/rootfs_overlay/ioctl.sh b/rootfs_overlay/ioctl.sh
deleted file mode 100755
index 095b9ed..0000000
--- a/rootfs_overlay/ioctl.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/bin/sh
-set -e
-f=/sys/kernel/debug/lkmc_ioctl
-insmod ioctl.ko
-[ "$(/ioctl.out "$f" 0 1)" = 2 ]
-[ "$(/ioctl.out "$f" 1 1 1)" = '2 0' ]
-rmmod ioctl
diff --git a/rootfs_overlay/lkmc/anonymous_inode.sh b/rootfs_overlay/lkmc/anonymous_inode.sh
new file mode 100755
index 0000000..89df545
--- /dev/null
+++ b/rootfs_overlay/lkmc/anonymous_inode.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+# https://github.com/cirosantilli/linux-kernel-module-cheat#anonymous-inode
+set -e
+insmod anonymous_inode.ko
+[ "$(./kernel_modules/anonymous_inode.out /sys/kernel/debug/lkmc_anonymous_inode 3)" = "$(printf '1\n10\n100')" ]
+rmmod anonymous_inode
diff --git a/rootfs_overlay/character_device.sh b/rootfs_overlay/lkmc/character_device.sh
similarity index 100%
rename from rootfs_overlay/character_device.sh
rename to rootfs_overlay/lkmc/character_device.sh
diff --git a/rootfs_overlay/character_device_create.sh b/rootfs_overlay/lkmc/character_device_create.sh
similarity index 100%
rename from rootfs_overlay/character_device_create.sh
rename to rootfs_overlay/lkmc/character_device_create.sh
diff --git a/rootfs_overlay/lkmc/conf.sh b/rootfs_overlay/lkmc/conf.sh
new file mode 100755
index 0000000..2d37b1c
--- /dev/null
+++ b/rootfs_overlay/lkmc/conf.sh
@@ -0,0 +1,3 @@
+#!/bin/sh
+# https://github.com/cirosantilli/linux-kernel-module-cheat#find-the-kernel-config
+zcat /proc/config.gz | grep -Ei "${1:-}"
diff --git a/rootfs_overlay/count.sh b/rootfs_overlay/lkmc/count.sh
similarity index 64%
rename from rootfs_overlay/count.sh
rename to rootfs_overlay/lkmc/count.sh
index d084b81..7d210d0 100755
--- a/rootfs_overlay/count.sh
+++ b/rootfs_overlay/lkmc/count.sh
@@ -1,4 +1,5 @@
 #!/bin/sh
+# Count to infinity with 1 second sleep between each increment.
 # Generate infinitely many system calls :-)
 i=0
 while true; do
diff --git a/rootfs_overlay/debugfs.sh b/rootfs_overlay/lkmc/debugfs.sh
similarity index 100%
rename from rootfs_overlay/debugfs.sh
rename to rootfs_overlay/lkmc/debugfs.sh
diff --git a/rootfs_overlay/dep.sh b/rootfs_overlay/lkmc/dep.sh
similarity index 100%
rename from rootfs_overlay/dep.sh
rename to rootfs_overlay/lkmc/dep.sh
diff --git a/rootfs_overlay/lkmc/eval_base64.sh b/rootfs_overlay/lkmc/eval_base64.sh
new file mode 100755
index 0000000..a4f574c
--- /dev/null
+++ b/rootfs_overlay/lkmc/eval_base64.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+# https://github.com/cirosantilli/linux-kernel-module-cheat#replace-init
+cd "$lkmc_home"
+eval "$(printf "$lkmc_eval" | base64 -d)"
diff --git a/rootfs_overlay/fb.sh b/rootfs_overlay/lkmc/fb.sh
similarity index 100%
rename from rootfs_overlay/fb.sh
rename to rootfs_overlay/lkmc/fb.sh
diff --git a/rootfs_overlay/fops.sh b/rootfs_overlay/lkmc/fops.sh
similarity index 100%
rename from rootfs_overlay/fops.sh
rename to rootfs_overlay/lkmc/fops.sh
diff --git a/rootfs_overlay/gdbserver.sh b/rootfs_overlay/lkmc/gdbserver.sh
similarity index 100%
rename from rootfs_overlay/gdbserver.sh
rename to rootfs_overlay/lkmc/gdbserver.sh
diff --git a/rootfs_overlay/gem5.sh b/rootfs_overlay/lkmc/gem5.sh
similarity index 100%
rename from rootfs_overlay/gem5.sh
rename to rootfs_overlay/lkmc/gem5.sh
diff --git a/rootfs_overlay/gem5_exit.sh b/rootfs_overlay/lkmc/gem5_exit.sh
similarity index 100%
rename from rootfs_overlay/gem5_exit.sh
rename to rootfs_overlay/lkmc/gem5_exit.sh
diff --git a/rootfs_overlay/gpio.sh b/rootfs_overlay/lkmc/gpio.sh
similarity index 100%
rename from rootfs_overlay/gpio.sh
rename to rootfs_overlay/lkmc/gpio.sh
diff --git a/rootfs_overlay/init_forward.sh b/rootfs_overlay/lkmc/init_forward.sh
similarity index 100%
rename from rootfs_overlay/init_forward.sh
rename to rootfs_overlay/lkmc/init_forward.sh
diff --git a/rootfs_overlay/init_lkmc.sh b/rootfs_overlay/lkmc/init_lkmc.sh
similarity index 100%
rename from rootfs_overlay/init_lkmc.sh
rename to rootfs_overlay/lkmc/init_lkmc.sh
diff --git a/rootfs_overlay/init_module.sh b/rootfs_overlay/lkmc/init_module.sh
similarity index 100%
rename from rootfs_overlay/init_module.sh
rename to rootfs_overlay/lkmc/init_module.sh
diff --git a/rootfs_overlay/insrm.sh b/rootfs_overlay/lkmc/insrm.sh
similarity index 100%
rename from rootfs_overlay/insrm.sh
rename to rootfs_overlay/lkmc/insrm.sh
diff --git a/rootfs_overlay/lkmc/ioctl.sh b/rootfs_overlay/lkmc/ioctl.sh
new file mode 100755
index 0000000..e5eaf30
--- /dev/null
+++ b/rootfs_overlay/lkmc/ioctl.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+# https://github.com/cirosantilli/linux-kernel-module-cheat#ioctl
+set -e
+f=/sys/kernel/debug/lkmc_ioctl
+insmod ioctl.ko
+[ "$(./kernel_modules/ioctl.out "$f" 0 1)" = 2 ]
+[ "$(./kernel_modules/ioctl.out "$f" 1 1 1)" = '2 0' ]
+rmmod ioctl
diff --git a/rootfs_overlay/kgdb.sh b/rootfs_overlay/lkmc/kgdb.sh
similarity index 100%
rename from rootfs_overlay/kgdb.sh
rename to rootfs_overlay/lkmc/kgdb.sh
diff --git a/rootfs_overlay/kstrto.sh b/rootfs_overlay/lkmc/kstrto.sh
similarity index 100%
rename from rootfs_overlay/kstrto.sh
rename to rootfs_overlay/lkmc/kstrto.sh
diff --git a/rootfs_overlay/lkmc/loginroot.sh b/rootfs_overlay/lkmc/loginroot.sh
new file mode 100755
index 0000000..565ff77
--- /dev/null
+++ b/rootfs_overlay/lkmc/loginroot.sh
@@ -0,0 +1,3 @@
+#!/bin/sh
+# https://github.com/cirosantilli/linux-kernel-module-cheat#tty
+exec /bin/login root
diff --git a/rootfs_overlay/mknoddev.sh b/rootfs_overlay/lkmc/mknoddev.sh
similarity index 100%
rename from rootfs_overlay/mknoddev.sh
rename to rootfs_overlay/lkmc/mknoddev.sh
diff --git a/rootfs_overlay/lkmc/mmap.sh b/rootfs_overlay/lkmc/mmap.sh
new file mode 100755
index 0000000..4647980
--- /dev/null
+++ b/rootfs_overlay/lkmc/mmap.sh
@@ -0,0 +1,5 @@
+#!/bin/sh
+set -e
+insmod mmap.ko
+./kernel_modules/mmap.out /proc/lkmc_mmap 2>&1 1>/dev/null
+rmmod mmap.ko
diff --git a/rootfs_overlay/lkmc/netlink.sh b/rootfs_overlay/lkmc/netlink.sh
new file mode 100755
index 0000000..58dc14d
--- /dev/null
+++ b/rootfs_overlay/lkmc/netlink.sh
@@ -0,0 +1,8 @@
+#!/bin/sh
+# https://github.com/cirosantilli/linux-kernel-module-cheat#netlink-sockets
+set -e
+insmod netlink.ko
+[ "$(./linux/netlink.out)" = 0 ]
+[ "$(./linux/netlink.out)" = 1 ]
+[ "$(./linux/netlink.out)" = 2 ]
+rmmod netlink
diff --git a/rootfs_overlay/params.sh b/rootfs_overlay/lkmc/params.sh
similarity index 100%
rename from rootfs_overlay/params.sh
rename to rootfs_overlay/lkmc/params.sh
diff --git a/rootfs_overlay/pci_rescan.sh b/rootfs_overlay/lkmc/pci_rescan.sh
similarity index 100%
rename from rootfs_overlay/pci_rescan.sh
rename to rootfs_overlay/lkmc/pci_rescan.sh
diff --git a/rootfs_overlay/pmccntr.sh b/rootfs_overlay/lkmc/pmccntr.sh
similarity index 100%
rename from rootfs_overlay/pmccntr.sh
rename to rootfs_overlay/lkmc/pmccntr.sh
diff --git a/rootfs_overlay/lkmc/poll.sh b/rootfs_overlay/lkmc/poll.sh
new file mode 100755
index 0000000..b19363b
--- /dev/null
+++ b/rootfs_overlay/lkmc/poll.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+# https://github.com/cirosantilli/linux-kernel-module-cheat#poll
+set -e
+insmod poll.ko
+./kernel_modules/poll.out /sys/kernel/debug/lkmc_poll
+#rmmod poll
diff --git a/rootfs_overlay/pr_debug.sh b/rootfs_overlay/lkmc/pr_debug.sh
similarity index 100%
rename from rootfs_overlay/pr_debug.sh
rename to rootfs_overlay/lkmc/pr_debug.sh
diff --git a/rootfs_overlay/procfs.sh b/rootfs_overlay/lkmc/procfs.sh
similarity index 100%
rename from rootfs_overlay/procfs.sh
rename to rootfs_overlay/lkmc/procfs.sh
diff --git a/rootfs_overlay/psa.sh b/rootfs_overlay/lkmc/psa.sh
similarity index 100%
rename from rootfs_overlay/psa.sh
rename to rootfs_overlay/lkmc/psa.sh
diff --git a/rootfs_overlay/qemu_edu.sh b/rootfs_overlay/lkmc/qemu_edu.sh
similarity index 100%
rename from rootfs_overlay/qemu_edu.sh
rename to rootfs_overlay/lkmc/qemu_edu.sh
diff --git a/rootfs_overlay/rand_check_poweroff.sh b/rootfs_overlay/lkmc/rand_check_poweroff.sh
similarity index 88%
rename from rootfs_overlay/rand_check_poweroff.sh
rename to rootfs_overlay/lkmc/rand_check_poweroff.sh
index 6940b62..bb1168f 100755
--- a/rootfs_overlay/rand_check_poweroff.sh
+++ b/rootfs_overlay/lkmc/rand_check_poweroff.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 set -ex
-./rand_check.out
+./linux/rand_check.out
 
 # Check if network is being replayed.
 # https://superuser.com/questions/635020/how-to-know-current-time-from-internet-from-command-line-in-linux
@@ -9,4 +9,4 @@ set -ex
 
 # busybox's poweroff panics, TODO why. Likely tries to kill shell.
 # So just use our super raw command.
-./poweroff.out
+./linux/poweroff.out
diff --git a/rootfs_overlay/seq_file.sh b/rootfs_overlay/lkmc/seq_file.sh
similarity index 100%
rename from rootfs_overlay/seq_file.sh
rename to rootfs_overlay/lkmc/seq_file.sh
diff --git a/rootfs_overlay/seq_file_single_open.sh b/rootfs_overlay/lkmc/seq_file_single_open.sh
similarity index 100%
rename from rootfs_overlay/seq_file_single_open.sh
rename to rootfs_overlay/lkmc/seq_file_single_open.sh
diff --git a/rootfs_overlay/sshd.sh b/rootfs_overlay/lkmc/sshd.sh
similarity index 100%
rename from rootfs_overlay/sshd.sh
rename to rootfs_overlay/lkmc/sshd.sh
diff --git a/rootfs_overlay/sysfs.sh b/rootfs_overlay/lkmc/sysfs.sh
similarity index 100%
rename from rootfs_overlay/sysfs.sh
rename to rootfs_overlay/lkmc/sysfs.sh
diff --git a/rootfs_overlay/test_all.sh b/rootfs_overlay/lkmc/test_all.sh
similarity index 95%
rename from rootfs_overlay/test_all.sh
rename to rootfs_overlay/lkmc/test_all.sh
index f28f2e8..795647d 100755
--- a/rootfs_overlay/test_all.sh
+++ b/rootfs_overlay/lkmc/test_all.sh
@@ -20,7 +20,7 @@ for test in \
 ; do
   if ! "${test_dir}/${test}"; then
     echo "Test failed: ${test}"
-    test_fail.sh
+    ./test_fail.sh
     exit 1
   fi
 done
diff --git a/rootfs_overlay/test_fail.sh b/rootfs_overlay/lkmc/test_fail.sh
similarity index 79%
rename from rootfs_overlay/test_fail.sh
rename to rootfs_overlay/lkmc/test_fail.sh
index 682d36b..87931c8 100755
--- a/rootfs_overlay/test_fail.sh
+++ b/rootfs_overlay/lkmc/test_fail.sh
@@ -1,3 +1,3 @@
 #!/bin/sh
 # https://github.com/cirosantilli/linux-kernel-module-cheat#magic-failure-string
-echo lkmc_test_fail
+echo lkmc_exit_status_1
diff --git a/rootfs_overlay/uio_read.sh b/rootfs_overlay/lkmc/uio_read.sh
similarity index 94%
rename from rootfs_overlay/uio_read.sh
rename to rootfs_overlay/lkmc/uio_read.sh
index 499b874..b68a51b 100755
--- a/rootfs_overlay/uio_read.sh
+++ b/rootfs_overlay/lkmc/uio_read.sh
@@ -4,7 +4,7 @@ set -e
 modprobe uio_pci_generic
 # pci_min device
 echo '1234 11e9' > /sys/bus/pci/drivers/uio_pci_generic/new_id
-./uio_read.out &
+./kernel_modules/uio_read.out &
 # Helper to observe interrupts.
 insmod irq.ko
 base="$(setpci -d 1234:11e9 BASE_ADDRESS_0)"
diff --git a/rootfs_overlay/vermagic.sh b/rootfs_overlay/lkmc/vermagic.sh
similarity index 100%
rename from rootfs_overlay/vermagic.sh
rename to rootfs_overlay/lkmc/vermagic.sh
diff --git a/rootfs_overlay/virt_to_phys.sh b/rootfs_overlay/lkmc/virt_to_phys.sh
similarity index 100%
rename from rootfs_overlay/virt_to_phys.sh
rename to rootfs_overlay/lkmc/virt_to_phys.sh
diff --git a/rootfs_overlay/loginroot.sh b/rootfs_overlay/loginroot.sh
deleted file mode 100755
index fe8718b..0000000
--- a/rootfs_overlay/loginroot.sh
+++ /dev/null
@@ -1,2 +0,0 @@
-#!/bin/sh
-exec /bin/login root
diff --git a/rootfs_overlay/mmap.sh b/rootfs_overlay/mmap.sh
deleted file mode 100755
index d09e366..0000000
--- a/rootfs_overlay/mmap.sh
+++ /dev/null
@@ -1,5 +0,0 @@
-#!/bin/sh
-set -e
-insmod mmap.ko
-./mmap.out /proc/lkmc_mmap 2>&1 1>/dev/null
-rmmod mmap.ko
diff --git a/rootfs_overlay/netlink.sh b/rootfs_overlay/netlink.sh
deleted file mode 100755
index 0698d79..0000000
--- a/rootfs_overlay/netlink.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/bin/sh
-set -e
-insmod netlink.ko
-[ "$(/netlink.out)" = 0 ]
-[ "$(/netlink.out)" = 1 ]
-[ "$(/netlink.out)" = 2 ]
-rmmod netlink
diff --git a/rootfs_overlay/poll.sh b/rootfs_overlay/poll.sh
deleted file mode 100755
index d7586b4..0000000
--- a/rootfs_overlay/poll.sh
+++ /dev/null
@@ -1,5 +0,0 @@
-#!/bin/sh
-set -e
-insmod poll.ko
-./poll.out /sys/kernel/debug/lkmc_poll
-#rmmod poll
diff --git a/rootfs_overlay/root/.profile b/rootfs_overlay/root/.profile
deleted file mode 100644
index d59f2e4..0000000
--- a/rootfs_overlay/root/.profile
+++ /dev/null
@@ -1,4 +0,0 @@
-# https://unix.stackexchange.com/questions/176027/ash-profile-configuration-file
-echo "hello $(pwd)/.profile"
-# Does not inherit init environment variables.
-#env
diff --git a/run b/run
index eea3dad..d80a4c1 100755
--- a/run
+++ b/run
@@ -18,32 +18,31 @@ Run some content on an emulator.
 '''
         )
         self.add_argument(
-            '--background', default=False,
-            help='''\
-Send QEMU output to a file instead of the terminal so it does not require a
-terminal attached to run on the background. Interactive input cannot be given.
-TODO: use a port instead. If only there was a way to redirect a serial to multiple
-places, both to a port and a file? We use the file currently to be able to have
-any output at all.
-https://superuser.com/questions/1373226/how-to-redirect-qemu-serial-output-to-both-a-file-and-the-terminal-or-a-port
-'''
-        )
-        self.add_argument(
-            '-c', '--cpus', default=1, type=int,
+            '-c',
+            '--cpus',
+            default=1,
+            type=int,
             help='Number of guest CPUs to emulate. Default: %(default)s'
         )
         self.add_argument(
-            '--ctrl-c-host', default=False,
+            '--ctrl-c-host',
+            default=False,
             help='''\
 Ctrl +C kills the QEMU simulator instead of being passed to the guest.
 '''
         )
         self.add_argument(
-            '-D', '--debug-vm', default=False,
-            help='Run GDB on the emulator itself.'
+            '-D',
+            '--debug-vm',
+            default=False,
+            help='''\
+Run GDB on the emulator itself.
+For --emulator native, this debugs the target program.
+'''
         )
         self.add_argument(
-            '--debug-vm-args', default='',
+            '--debug-vm-args',
+            default='',
             help='Pass arguments to GDB. Implies --debug-vm.'
         )
         self.add_argument(
@@ -54,10 +53,90 @@ which is what you usually want.
 '''
         )
         self.add_argument(
-            '-E', '--eval',
+            '-E',
+            '--eval',
             help='''\
-Replace the normal init with a minimal init that just evals the given string.
+Replace the normal init with a minimal init that just evals the given sh string.
 See: https://github.com/cirosantilli/linux-kernel-module-cheat#replace-init
+chdir into lkmc_home before running the command:
+https://github.com/cirosantilli/linux-kernel-module-cheat#lkmc_home
+'''
+        )
+        self.add_argument(
+            '-F',
+            '--eval-after',
+            help='''\
+Similar to --eval, but the string gets evaled at the last init script,
+after the normal init finished.
+See: https://github.com/cirosantilli/linux-kernel-module-cheat#init-busybox
+'''
+        )
+        self.add_argument(
+            '-G',
+            '--gem5-exe-args',
+            default='',
+            help='''\
+Pass extra options to the gem5 executable.
+Do not confuse with the arguments passed to config scripts,
+like `fs.py`. Example:
+./run --emulator gem5 --gem5-exe-args '--debug-flags=Exec --debug' -- --cpu-type=HPI --caches
+will run:
+gem5.opt --debug-flags=Exec fs.py --cpu-type=HPI --caches
+'''
+        )
+        self.add_argument(
+            '--gdb',
+            default=False,
+            help='''\
+Shortcut for the most common GDB options that you want most of the time. Implies:
+* --gdb-wait
+* --tmux-args <main> where <main> is:
+** start_kernel in full system
+** main in user mode
+* --tmux-program gdb
+'''
+        )
+        self.add_argument(
+            '--gdb-wait',
+            default=False,
+            help='''\
+Wait for GDB to connect before starting execution
+See: https://github.com/cirosantilli/linux-kernel-module-cheat#gdb
+'''
+        )
+        self.add_argument(
+            '--gem5-script',
+            default='fs',
+            choices=['fs', 'biglittle'],
+            help='Which gem5 script to use'
+        )
+        self.add_argument(
+            '--gem5-readfile',
+            default='',
+            help='Set the contents of m5 readfile to this string.'
+        )
+        self.add_argument(
+            '--gem5-restore',
+            type=int,
+            help='''\
+Restore the nth most recently taken gem5 checkpoint according to directory
+timestamps.
+'''
+        )
+        self.add_argument(
+            '--graphic',
+            default=False,
+            help='''\
+Run in graphic mode.
+See: http://github.com/cirosantilli/linux-kernel-module-cheat#graphics
+'''
+        )
+        self.add_argument(
+            '--kdb',
+            default=False,
+            help='''\
+Setup KDB kernel CLI options.
+See: http://github.com/cirosantilli/linux-kernel-module-cheat#kdb
 '''
         )
         self.add_argument(
@@ -66,15 +145,7 @@ See: https://github.com/cirosantilli/linux-kernel-module-cheat#replace-init
 Pass an extra Linux kernel command line options, and place them before
 the dash separator `-`. Only options that come before the `-`, i.e.
 "standard" options, should be passed with this option.
-Example: `./run --arch arm --kernel-cli 'init=/poweroff.out'`
-'''
-        )
-        self.add_argument(
-            '-F', '--eval-after',
-            help='''\
-Pass a base64 encoded command line parameter that gets evalled at the end of
-the normal init.
-See: https://github.com/cirosantilli/linux-kernel-module-cheat#init-busybox
+Example: `./run --arch arm --kernel-cli 'init=/lkmc/poweroff.out'`
 '''
         )
         self.add_argument(
@@ -88,45 +159,41 @@ Example: `./run --kernel-cli-after-dash 'lkmc_eval="wget google.com" lkmc_lala=y
 '''
         )
         self.add_argument(
-            '-G', '--gem5-exe-args', default='',
+            '--kernel-version',
+            default='5.0',
             help='''\
-Pass extra options to the gem5 executable.
-Do not confuse with the arguments passed to config scripts,
-like `fs.py`. Example:
-./run --emulator gem5 --gem5-exe-args '--debug-flags=Exec --debug' -- --cpu-type=HPI --caches
-will run:
-gem.op5 --debug-flags=Exec fs.py --cpu-type=HPI --caches
+Pass a base64 encoded command line parameter that gets evalled at the end of
+the normal init.
+See: https://github.com/cirosantilli/linux-kernel-module-cheat#init-busybox
+chdir into lkmc_home before running the command:
+https://github.com/cirosantilli/linux-kernel-module-cheat#lkmc_home
+Specify the Linux kernel version to be reported by syscall emulation.
+Defaults to the same kernel version as our default Buildroot build.
+Currently only works for QEMU.
+See: http://github.com/cirosantilli/linux-kernel-module-cheat#fatal-kernel-too-old
 '''
         )
         self.add_argument(
-            '--gem5-script',
-            default='fs',
-            choices=['fs', 'biglittle'],
-            help='Which gem5 script to use'
-        )
-        self.add_argument(
-            '--gem5-readfile', default='',
-            help='Set the contents of m5 readfile to this string.'
-        )
-        self.add_argument(
-            '-K', '--kvm', default=False,
-            help='Use KVM. Only works if guest arch == host arch'
-        )
-        self.add_argument(
-            '--kgdb', default=False,
-        )
-        self.add_argument(
-            '--kdb', default=False,
-        )
-        self.add_argument(
-            '--gem5-restore', type=int,
+            '--kgdb',
+            default=False,
             help='''\
-Restore the nth most recently taken gem5 checkpoint according to directory
-timestamps.
+Setup KGDB kernel CLI options.
+See: http://github.com/cirosantilli/linux-kernel-module-cheat#kgdb
 '''
         )
         self.add_argument(
-            '-m', '--memory', default='256M',
+            '-K',
+            '--kvm',
+            default=False,
+            help='''\
+Use KVM. Only works if guest arch == host arch.
+See: http://github.com/cirosantilli/linux-kernel-module-cheat#kvm
+'''
+        )
+        self.add_argument(
+            '-m',
+            '--memory',
+            default='256M',
             help='''\
 Set the memory size of the guest. E.g.: `-m 512M`. We try to keep the default
 at the minimal ammount amount that boots all archs. Anything lower could lead
@@ -142,21 +209,43 @@ Setup a kernel init parameter that makes the emulator quit immediately after boo
 '''
         )
         self.add_argument(
-            '-R', '--replay', default=False,
+            '-R',
+            '--replay',
+            default=False,
             help='Replay a QEMU run record deterministically'
         )
         self.add_argument(
-            '-r', '--record', default=False,
+            '-r',
+            '--record',
+            default=False,
             help='Record a QEMU run record for later replay with `-R`'
         )
         self.add_argument(
-            '-T', '--trace',
+            '--show-stdout',
+            default=True,
+            help='''Show emulator stdout and stderr on the host terminal.'''
+        )
+        self.add_argument(
+            '--terminal',
+            default=False,
+            help='''\
+Output directly to the terminal, don't pipe to tee as the default.
+With this, we don't save the output to a file as is done by default,
+but we are able to do things that require not having a pipe, such as
+using debuggers. This option is set automatically by --debug-vm, but you still need
+it to debug gem5 Python scripts with pdb.
+'''
+        )
+        self.add_argument(
+            '-T',
+            '--trace',
             help='''\
 Set trace events to be enabled. If not given, gem5 tracing is completely
 disabled, while QEMU tracing is enabled but uses default traces that are very
 rare and don't affect performance, because `./configure
 --enable-trace-backends=simple` seems to enable some traces by default, e.g.
 `pr_manager_run`, and I don't know how to get rid of them.
+See: http://github.com/cirosantilli/linux-kernel-module-cheat#tracing
 '''
         )
         self.add_argument(
@@ -174,16 +263,9 @@ Trace instructions run to stdout. Shortcut for --trace --trace-stdout.
 '''
         )
         self.add_argument(
-            '--terminal', default=False,
-            help='''\
-Output to the terminal, don't pipe to tee as the default.
-Does not save the output to a file, but allows you to use debuggers.
-Set automatically by --debug-vm, but you still need this option to debug
-gem5 Python scripts with pdb.
-'''
-        )
-        self.add_argument(
-            '-t', '--tmux', default=False,
+            '-t',
+            '--tmux',
+            default=False,
             help='''\
 Create a tmux split the window. You must already be inside of a `tmux` session
 to use this option:
@@ -201,27 +283,35 @@ Parameters to pass to the program running on the tmux split. Implies --tmux.
 '''
         )
         self.add_argument(
-            '-w', '--wait-gdb', default=False,
-            help='Wait for GDB to connect before starting execution'
+            '--tmux-program',
+            choices=('gdb', 'shell'),
+            help='''\
+Which program to run in tmux. Implies --tmux. Defaults:
+* 'gdb' in qemu
+* 'shell' in gem5. 'shell' is only supported in gem5 currently.
+'''
         )
         self.add_argument(
-            '-x', '--graphic', default=False,
-            help='Run in graphic mode. Mnemonic: X11'
-        )
-        self.add_argument(
-            '-V', '--vnc', default=False,
+            '--vnc',
+            default=False,
             help='''\
 Run QEMU with VNC instead of the default SDL. Connect to it with:
 `vinagre localhost:5900`.
 '''
         )
         self.add_argument(
-            'extra_emulator_args', nargs='*', default=[],
-            help='Extra options to append at the end of the emulator command line'
+            'extra_emulator_args',
+            nargs='*',
+            default=[],
+            help='''\
+Extra options to append at the end of the emulator command line.
+'''
         )
 
     def timed_main(self):
-        show_stdout = True
+        if self.env['emulator'] == 'native' and self.env['userland'] is None:
+            raise Exception('native emulator only supported in user mode')
+        show_stdout = self.env['show_stdout']
         # Common qemu / gem5 logic.
         # nokaslr:
         # * https://unix.stackexchange.com/questions/397939/turning-off-kaslr-to-debug-linux-kernel-using-qemu-and-gdb
@@ -232,16 +322,31 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
             kernel_cli += ' {}'.format(self.env['kernel_cli'])
         if self.env['quit_after_boot']:
             kernel_cli += ' {}'.format(self.env['quit_init'])
-        kernel_cli_after_dash = ''
+        kernel_cli_after_dash = ' lkmc_home={}'.format(self.env['guest_lkmc_home'])
         extra_emulator_args = []
         extra_qemu_args = []
-        if self.env['tmux_args'] is not None:
+        if not self.env['_args_given']['tmux_program']:
+            if self.env['emulator'] == 'qemu':
+                self.env['tmux_program'] = 'gdb'
+            elif self.env['emulator'] == 'gem5':
+                self.env['tmux_program'] = 'shell'
+        if self.env['gdb']:
+            if not self.env['_args_given']['gdb_wait']:
+                self.env['gdb_wait'] = True
+            if not self.env['_args_given']['tmux_args']:
+                if self.env['userland'] is None and self.env['baremetal'] is None:
+                    self.env['tmux_args'] = 'start_kernel'
+                else:
+                    self.env['tmux_args'] = 'main'
+            if not self.env['_args_given']['tmux_program']:
+                self.env['tmux_program'] = 'gdb'
+        if self.env['tmux_args'] is not None or self.env['_args_given']['tmux_program']:
             self.env['tmux'] = True
         if self.env['debug_vm'] or self.env['debug_vm_args']:
             debug_vm = ['gdb', LF, '-q', LF] + self.sh.shlex_split(self.env['debug_vm_args']) + ['--args', LF]
         else:
             debug_vm = []
-        if self.env['wait_gdb']:
+        if self.env['gdb_wait']:
             extra_qemu_args.extend(['-S', LF])
         if self.env['eval_after'] is not None:
             kernel_cli_after_dash += ' lkmc_eval_base64="{}"'.format(self.sh.base64_encode(self.env['eval_after']))
@@ -252,7 +357,7 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
         else:
             vnc = []
         if self.env['eval'] is not None:
-            kernel_cli += ' {}=/eval_base64.sh'.format(self.env['initarg'])
+            kernel_cli += ' {}=/lkmc/eval_base64.sh'.format(self.env['initarg'])
             kernel_cli_after_dash += ' lkmc_eval="{}"'.format(self.sh.base64_encode(self.env['eval']))
         if not self.env['graphic']:
             extra_qemu_args.extend(['-nographic', LF])
@@ -301,36 +406,40 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
             if not self.env['dry_run']:
                 raise Exception('Root filesystem not found. Did you build it? ' \
                                 'Tried to use: ' + self.env['disk_image'])
-        def raise_image_not_found():
+        def raise_image_not_found(image):
             if not self.env['dry_run']:
                 raise Exception('Executable image not found. Did you build it? ' \
-                                'Tried to use: ' + self.env['image'])
-        if self.env['image'] is None:
-            raise Exception('Baremetal ELF file not found. Tried:\n' + '\n'.join(paths))
+                                'Tried to use: ' + image)
+        if not os.path.exists(self.env['image']):
+            raise_image_not_found(self.env['image'])
         cmd = debug_vm.copy()
         if self.env['emulator'] == 'gem5':
             if self.env['quiet']:
                 show_stdout = False
-            if self.env['baremetal'] is None:
+            if self.env['baremetal'] is not None:
+                if not os.path.exists(self.env['gem5_fake_iso']):
+                    os.makedirs(os.path.dirname(self.env['gem5_fake_iso']), exist_ok=True)
+                    self.sh.write_string_to_file(self.env['gem5_fake_iso'], 'a' * 512)
+            elif self.env['userland'] is None:
                 if not os.path.exists(self.env['rootfs_raw_file']):
                     if not os.path.exists(self.env['qcow2_file']):
                         raise_rootfs_not_found()
                     self.raw_to_qcow2(qemu_which=self.env['qemu_which'], reverse=True)
-            else:
-                if not os.path.exists(self.env['gem5_fake_iso']):
-                    os.makedirs(os.path.dirname(self.env['gem5_fake_iso']), exist_ok=True)
-                    self.sh.write_string_to_file(self.env['gem5_fake_iso'], 'a' * 512)
             if not os.path.exists(self.env['image']):
                 # This is to run gem5 from a prebuilt download.
-                if (not self.env['baremetal'] is None) or (not os.path.exists(self.env['linux_image'])):
-                    raise_image_not_found()
-                self.sh.run_cmd([os.path.join(self.env['extract_vmlinux'], self.env['linux_image'])])
+                if (
+                    self.env['baremetal'] is None and
+                    self.env['userland'] is None
+                ):
+                    if not os.path.exists(self.env['linux_image']):
+                        raise_image_not_found(self.env['image'])
+                    self.sh.run_cmd([os.path.join(self.env['extract_vmlinux'], self.env['linux_image'])])
             os.makedirs(os.path.dirname(self.env['gem5_readfile']), exist_ok=True)
             self.sh.write_string_to_file(self.env['gem5_readfile'], self.env['gem5_readfile'])
             memory = '{}B'.format(self.env['memory'])
             gem5_exe_args = self.sh.shlex_split(self.env['gem5_exe_args'])
             if do_trace:
-                gem5_exe_args.extend(['--debug-flags={}'.format(trace_type), LF])
+                gem5_exe_args.extend(['--debug-flags', trace_type, LF])
             extra_env['M5_PATH'] = self.env['gem5_system_dir']
             # https://stackoverflow.com/questions/52312070/how-to-modify-a-file-under-src-python-and-run-it-without-rebuilding-in-gem5/52312071#52312071
             extra_env['M5_OVERRIDE_PY_SOURCE'] = 'true'
@@ -350,7 +459,7 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
             if self.env['userland'] is not None:
                 cmd.extend([
                     self.env['gem5_se_file'], LF,
-                    '--cmd', self.resolve_userland_executable(self.env['userland']), LF,
+                    '--cmd', self.env['image'], LF,
                 ])
                 if self.env['userland_args'] is not None:
                     cmd.extend(['--options', self.env['userland_args'], LF])
@@ -421,7 +530,7 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
                     ])
                     if self.env['dtb']:
                         cmd.extend(['--dtb', os.path.join(self.env['gem5_system_dir'], 'arm', 'dt', 'armv8_gem5_v1_big_little_2_2.dtb'), NL])
-            if self.env['wait_gdb']:
+            if self.env['gdb_wait']:
                 # https://stackoverflow.com/questions/49296092/how-to-make-gem5-wait-for-gdb-to-connect-to-reliably-break-at-start-kernel-of-th
                 cmd.extend(['--param', 'system.cpu[0].wait_for_remote_gdb = True', LF])
         elif self.env['emulator'] == 'qemu':
@@ -429,38 +538,28 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
                 '-trace', 'enable={},file={}'.format(trace_type, self.env['qemu_trace_file']), LF,
             ]
             if self.env['userland'] is not None:
-                if self.env['wait_gdb']:
+                if self.env['gdb_wait']:
                     debug_args = ['-g', str(self.env['gdb_port']), LF]
                 else:
                     debug_args = []
                 cmd.extend(
                     [
-                        os.path.join(self.env['qemu_build_dir'], '{}-linux-user'.format(self.env['arch']), 'qemu-{}'.format(self.env['arch'])), LF,
-                        '-L', self.env['userland_library_dir'], LF
+                        self.env['qemu_executable'], LF,
+                        '-L', self.env['userland_library_dir'], LF,
+                        '-r', self.env['kernel_version'], LF,
+                        '-seed', '0', LF,
                     ] +
                     qemu_user_and_system_options +
                     debug_args
                 )
             else:
-                if not os.path.exists(self.env['image']):
-                    raise_image_not_found()
                 extra_emulator_args.extend(extra_qemu_args)
                 self.make_run_dirs()
-                if self.env['qemu_which'] == 'host' or not os.path.exists(self.env['qemu_executable']):
-                    qemu_executable = self.env['qemu_executable_basename']
-                    qemu_executable_host = True
-                else:
-                    qemu_executable = self.env['qemu_executable']
-                    qemu_executable_host = False
-                qemu_executable = shutil.which(qemu_executable)
-                if qemu_executable is None:
-                    raise Exception('QEMU executable not found, did you forget to build or install it?\n' \
-                                    'Tried to use: ' + qemu_executable)
                 if self.env['debug_vm']:
                     serial_monitor = []
                 else:
                     if self.env['background']:
-                        serial_monitor = ['-serial', 'file:{}'.format(self.env['qemu_background_serial_file']), LF]
+                        serial_monitor = ['-serial', 'file:{}'.format(self.env['guest_terminal_file']), LF]
                         if self.env['quiet']:
                             show_stdout = False
                     else:
@@ -496,7 +595,7 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
                     machine2 = []
                 cmd.extend(
                     [
-                        qemu_executable, LF,
+                        self.env['qemu_executable'], LF,
                         '-machine', self.env['machine'], LF,
                     ] +
                     machine2 +
@@ -506,7 +605,11 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
                         '-kernel', self.env['image'], LF,
                         '-m', self.env['memory'], LF,
                         '-monitor', 'telnet::{},server,nowait'.format(self.env['qemu_monitor_port']), LF,
-                        '-netdev', 'user,hostfwd=tcp::{}-:{},hostfwd=tcp::{}-:22,id=net0'.format(self.env['qemu_hostfwd_generic_port'], self.env['qemu_hostfwd_generic_port'], self.env['qemu_hostfwd_ssh_port']), LF,
+                        '-netdev', 'user,hostfwd=tcp::{}-:{},hostfwd=tcp::{}-:22,id=net0'.format(
+                            self.env['qemu_hostfwd_generic_port'],
+                            self.env['qemu_hostfwd_generic_port'],
+                            self.env['qemu_hostfwd_ssh_port']
+                        ), LF,
                         '-no-reboot', LF,
                         '-smp', str(self.env['cpus']), LF,
                     ] +
@@ -516,7 +619,7 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
                 )
                 if self.env['dtb'] is not None:
                     cmd.extend(['-dtb', self.env['dtb'], LF])
-                if not qemu_executable_host:
+                if self.env['qemu_which'] != 'host':
                     cmd.extend(qemu_user_and_system_options)
                 if self.env['initrd']:
                     extra_emulator_args.extend(['-initrd', self.env['buildroot_cpio'], LF])
@@ -542,7 +645,12 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
                             self.raw_to_qcow2(qemu_which=self.env['qemu_which'])
                         extra_emulator_args.extend([
                             '-drive',
-                            'file={},format=qcow2,if={}{}{}'.format(self.env['disk_image'], driveif, snapshot, rrid),
+                            'file={},format=qcow2,if={}{}{}'.format(
+                                self.env['disk_image'],
+                                driveif,
+                                snapshot,
+                                rrid
+                            ),
                             LF,
                         ])
                         if rr:
@@ -553,7 +661,10 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
                 if rr:
                     extra_emulator_args.extend([
                         '-object', 'filter-replay,id=replay,netdev=net0',
-                        '-icount', 'shift=7,rr={},rrfile={}'.format('record' if self.env['record'] else 'replay', self.env['qemu_rrfile']),
+                        '-icount', 'shift=7,rr={},rrfile={}'.format(
+                            'record' if self.env['record'] else 'replay',
+                            self.env['qemu_rrfile']
+                        ),
                     ])
                     virtio_gpu_pci = []
                 else:
@@ -580,34 +691,44 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
                     cmd.extend(append)
         if self.env['tmux']:
             tmux_args = '--run-id {}'.format(self.env['run_id'])
-            if self.env['emulator'] == 'gem5':
-                tmux_cmd = './gem5-shell'
-            else:
-                tmux_cmd = './run-gdb'
+            if self.env['tmux_program'] == 'shell':
+                if self.env['emulator'] == 'gem5':
+                    tmux_cmd = './gem5-shell'
+                else:
+                    raise Exception('--tmux-program shell is only supported in gem5 currently.')
+            elif self.env['tmux_program'] == 'gdb':
+                tmux_cmd = os.path.join(self.env['root_dir'], 'run-gdb')
                 # TODO find a nicer way to forward all those args automatically.
                 # Part of me wants to: https://github.com/jonathanslenders/pymux
                 # but it cannot be used as a library properly it seems, and it is
                 # slower than tmux.
-                tmux_args += " --arch {} --linux-build-id '{}' --run-id '{}'".format(
+                tmux_args += " --arch {} --emulator '{}' --gcc-which '{}' --linux-build-id '{}' --run-id '{}' --userland-build-id '{}'".format(
                     self.env['arch'],
+                    self.env['emulator'],
+                    self.env['gcc_which'],
                     self.env['linux_build_id'],
                     self.env['run_id'],
+                    self.env['userland_build_id'],
                 )
                 if self.env['baremetal']:
                     tmux_args += " --baremetal '{}'".format(self.env['baremetal'])
                 if self.env['userland']:
                     tmux_args += " --userland '{}'".format(self.env['userland'])
+                if self.env['in_tree']:
+                    tmux_args += ' --in-tree'
             if self.env['tmux_args'] is not None:
                 tmux_args += ' {}'.format(self.env['tmux_args'])
-            subprocess.Popen([
-                os.path.join(self.env['root_dir'], 'tmu'),
+            tmux_cmd = [
+                os.path.join(self.env['root_dir'], 'tmux-split'),
                 "sleep 2;{} {}".format(tmux_cmd, tmux_args)
-            ])
+            ]
+            self.log_info(tmux_cmd)
+            subprocess.Popen(tmux_cmd)
         cmd.extend(extra_emulator_args)
         cmd.extend(self.env['extra_emulator_args'])
-        if self.env['emulator'] == 'qemu' and self.env['userland']:
+        if self.env['userland'] and self.env['emulator'] in ('qemu', 'native'):
             # The program and arguments must come at the every end of the CLI.
-            cmd.extend([self.resolve_userland_executable(self.env['userland']), LF])
+            cmd.extend([self.env['image'], LF])
             if self.env['userland_args'] is not None:
                 cmd.extend(self.sh.shlex_split(self.env['userland_args']))
         if debug_vm or self.env['terminal']:
@@ -623,35 +744,37 @@ Run QEMU with VNC instead of the default SDL. Connect to it with:
             show_stdout=show_stdout,
         )
         if exit_status == 0:
-            # Check if guest panicked.
-            if self.env['emulator'] == 'gem5':
-                # We have to do some parsing here because gem5 exits with status 0 even when panic happens.
-                # Grepping for '^panic: ' does not work because some errors don't show that message.
-                panic_msg = b'--- BEGIN LIBC BACKTRACE ---$'
-            else:
-                panic_msg = b'Kernel panic - not syncing'
-            panic_re = re.compile(panic_msg)
             error_string_found = False
             exit_status = 0
             if out_file is not None and not self.env['dry_run']:
-                with open(self.env['termout_file'], 'br') as logfile:
-                    line = None
-                    for line in logfile:
-                        if panic_re.search(line):
-                            exit_status = 1
-                    if line is not None:
-                        last_line = line.rstrip()
-                        match = re.search(b'Simulated exit code not 0! Exit code is (\d+)', last_line)
-                        if match:
-                            exit_status = int(match.group(1))
+                if self.env['emulator'] == 'gem5':
+                    with open(self.env['termout_file'], 'br') as logfile:
+                        # We have to do some parsing here because gem5 exits with status 0 even when panic happens.
+                        # Grepping for '^panic: ' does not work because some errors don't show that message...
+                        gem5_panic_re = re.compile(b'--- BEGIN LIBC BACKTRACE ---$')
+                        line = None
+                        for line in logfile:
+                            if gem5_panic_re.search(line):
+                                exit_status = 1
+                        if self.env['userland']:
+                            if line is not None:
+                                last_line = line.rstrip()
+                                match = re.search(br'Simulated exit code not 0! Exit code is (\d+)', last_line)
+                                if match:
+                                    exit_status = int(match.group(1))
                 if not self.env['userland']:
                     if os.path.exists(self.env['guest_terminal_file']):
                         with open(self.env['guest_terminal_file'], 'br') as logfile:
+                            linux_panic_re = re.compile(b'Kernel panic - not syncing')
+                            serial_magic_exit_status_regexp = re.compile(self.env['serial_magic_exit_status_regexp_string'])
                             for line in logfile.readlines():
-                                if line.rstrip() == self.env['magic_fail_string']:
+                                line = line.rstrip()
+                                if not self.env['baremetal'] and linux_panic_re.search(line):
                                     exit_status = 1
-                                    break
-            if exit_status != 0:
+                                match = serial_magic_exit_status_regexp.match(line)
+                                if match:
+                                    exit_status = int(match.group(1))
+            if exit_status != 0 and self.env['show_stdout']:
                 self.log_error('simulation error detected by parsing logs')
         return exit_status
 
diff --git a/run-gdb b/run-gdb
index afbaa71..eb286c5 100755
--- a/run-gdb
+++ b/run-gdb
@@ -1,12 +1,12 @@
 #!/usr/bin/env python3
 
-import imp
 import os
 import signal
 import subprocess
 import sys
 
 import common
+import lkmc.import_path
 from shell_helpers import LF
 
 class GdbTestcase:
@@ -34,7 +34,7 @@ class GdbTestcase:
         self.child.setecho(False)
         self.child.waitnoecho()
         self.child.expect(self.prompt)
-        test = imp.load_source('test', test_script_path)
+        test = lkmc.import_path.import_path(test_script_path)
         exception = None
         try:
             test.test(self)
@@ -153,7 +153,7 @@ See: https://github.com/cirosantilli/linux-kernel-module-cheat#gdb-builtin-cpu-s
         else:
             image = self.env['vmlinux']
         cmd = (
-            [self.get_toolchain_tool('gdb'), LF] +
+            [self.env['gdb_path'], LF] +
             before
         )
         if linux_full_system:
diff --git a/run-gdb-user b/run-gdb-user
index 0e2d406..be7a485 100755
--- a/run-gdb-user
+++ b/run-gdb-user
@@ -1,37 +1,43 @@
 #!/usr/bin/env python3
 
-import imp
 import os
-import sys
 
 import common
-rungdb = imp.load_source('run_gdb', os.path.join(kwargs['root_dir'], 'run-gdb'))
+import lkmc.import_path
 
-parser = self.get_argparse(argparse_args={
-    'description': '''GDB step debug guest userland processes without gdbserver.
+class Main(common.LkmcCliFunction):
+    def __init__(self):
+        super().__init__(
+            description='''GDB step debug guest userland processes without gdbserver.
 
 More information at: https://github.com/cirosantilli/linux-kernel-module-cheat#gdb-step-debug-userland-processes
 '''
-})
-parser.add_argument(
-    'executable',
-    help='Path to the executable to be debugged relative to the Buildroot build directory.'
-)
-parser.add_argument(
-    'break_at',
-    default=None,
-    help='Break at this point, e.g. main.',
-    nargs='?'
-)
-args = self.setup(parser)
-executable = self.resolve_userland_executable(kwargs['executable'])
-addr = self.get_elf_entry(os.path.join(kwargs['buildroot_build_build_dir'], executable))
-extra_args = {}
-extra_args['before'] = '-ex \"add-symbol-file {} {}\"'.format(executable, hex(addr))
-# Or else lx-symbols throws for arm:
-# gdb.MemoryError: Cannot access memory at address 0xbf0040cc
-# TODO understand better.
-# Also, lx-symbols overrides the add-symbol-file commands.
-extra_args['no_lxsymbols'] = True
-extra_args['break_at'] = kwargs['break_at']
-sys.exit(rungdb.main(args, extra_args))
+        )
+        self.add_argument(
+            'executable',
+            help='Path to the executable to be debugged relative to the Buildroot build directory.'
+        )
+        self.add_argument(
+            'break_at',
+            default=None,
+            help='Break at this point, e.g. main.',
+            nargs='?'
+        )
+
+    def timed_main(self):
+        raise Exception("This is known to be broken, but fixing shouldn't be too hard! Keyword: get_argparse. See also: https://github.com/cirosantilli/linux-kernel-module-cheat/issues/63")
+        executable = self.env['image']
+        addr = self.get_elf_entry(os.path.join(self.env['buildroot_build_build_dir'], executable))
+        args = {}
+        args['before'] = '-ex \"add-symbol-file {} {}\"'.format(executable, hex(addr))
+        # Or else lx-symbols throws for arm:
+        # gdb.MemoryError: Cannot access memory at address 0xbf0040cc
+        # TODO understand better.
+        # Also, lx-symbols overrides the add-symbol-file commands.
+        args['no_lxsymbols'] = True
+        args['break_at'] = self.env['break_at']
+        rungdb = lkmc.import_path.import_path_main('run-gdb')
+        return rungdb(**args)
+
+if __name__ == '__main__':
+    Main().cli()
diff --git a/run-gdbserver b/run-gdbserver
index fbb2a1e..0f42310 100755
--- a/run-gdbserver
+++ b/run-gdbserver
@@ -19,7 +19,7 @@ parser.add_argument(
 )
 args = self.setup(parser)
 sys.exit(subprocess.Popen([
-  self.get_toolchain_tool('gdb'),
+  self.env['gdb_path'],
   '-q',
   '-ex', 'set sysroot {}'.format(kwargs['buildroot_staging_dir']),
   '-ex', 'target remote localhost:{}'.format(kwargs['qemu_hostfwd_generic_port']),
diff --git a/run-toolchain b/run-toolchain
index 3672979..259654a 100755
--- a/run-toolchain
+++ b/run-toolchain
@@ -1,13 +1,17 @@
 #!/usr/bin/env python3
 
 import os
-import sys
 
 import common
 from shell_helpers import LF
 
-parser = self.get_argparse(argparse_args={
-    'description': '''Run a Buildroot ToolChain tool like readelf or objdump.
+class Main(common.LkmcCliFunction):
+    def __init__(self):
+        super().__init__(
+            defaults={
+                'show_time': False,
+            },
+            description='''Run a Buildroot ToolChain tool like readelf or objdump.
 
 For example, to get some information about the arm vmlinux:
 
@@ -20,31 +24,40 @@ Get the list of available tools with:
 ....
 ls "$(./getvar -a arm buildroot_host_bin_dir)"
 ....
-'''
-})
-parser.add_argument(
-    '--dry',
-    help='Just output the tool path to stdout but actually run it',
-)
-parser.add_argument('tool', help='Which tool to run.')
-parser.add_argument(
-    'extra_args',
-    default=[],
-    help='Extra arguments for the tool.',
-    metavar='extra-args',
-    nargs='*'
-)
-args = self.setup(parser)
-if kwargs['baremetal'] is None:
-    image = kwargs['vmlinux']
-else:
-    image = kwargs['image']
-tool= self.get_toolchain_tool(kwargs['tool'])
-if kwargs['dry']:
-    print(tool)
-else:
-    sys.exit(self.sh.run_cmd(
-        [tool, LF]
-        + self.sh.add_newlines(kwargs['extra_args']),
-        cmd_file=os.path.join(kwargs['run_dir'], 'run-toolchain.sh'),
-    ))
+''',
+        )
+        self.add_argument(
+            '--print-tool',
+            default=False,
+            help='''
+Just print the tool path to stdout but don't actually run it.
+Suitable for programmatic consumption by other shell programs.
+''',
+        )
+        self.add_argument('tool', help='Which tool to run.')
+        self.add_argument(
+            'extra_args',
+            default=[],
+            help='Extra arguments for the tool.',
+            metavar='extra-args',
+            nargs='*'
+        )
+
+    def timed_main(self):
+        if self.env['baremetal'] is None:
+            image = self.env['vmlinux']
+        else:
+            image = self.env['image']
+        tool = self.get_toolchain_tool(self.env['tool'])
+        if self.env['print_tool']:
+            print(tool)
+            return 0
+        else:
+            return self.sh.run_cmd(
+                [tool, LF]
+                + self.sh.add_newlines(self.env['extra_args']),
+                cmd_file=os.path.join(self.env['run_dir'], 'run-toolchain.sh'),
+            )
+
+if __name__ == '__main__':
+    Main().cli()
diff --git a/shell_helpers.py b/shell_helpers.py
index 8a57f19..40e84de 100644
--- a/shell_helpers.py
+++ b/shell_helpers.py
@@ -52,10 +52,9 @@ class ShellHelpers:
         https://stackoverflow.com/questions/3029816/how-do-i-get-a-thread-safe-print-in-python-2-6
         The initial use case was test-gdb which must create a thread for GDB to run the program in parallel.
         '''
-        cls._print_lock.acquire()
-        sys.stdout.write(string + '\n')
-        sys.stdout.flush()
-        cls._print_lock.release()
+        with cls._print_lock:
+            sys.stdout.write(string + '\n')
+            sys.stdout.flush()
 
     def add_newlines(self, cmd):
         out = []
@@ -124,21 +123,20 @@ class ShellHelpers:
         os.makedirs(destdir, exist_ok=True)
         for basename in sorted(os.listdir(srcdir)):
             src = os.path.join(srcdir, basename)
-            if os.path.isfile(src):
+            if os.path.isfile(src) or os.path.islink(src):
                 noext, ext = os.path.splitext(basename)
-                if filter_ext is not None and ext == filter_ext:
-                    distutils.file_util.copy_file(
-                        src,
-                        os.path.join(destdir, basename),
-                        update=1,
-                    )
+                dest = os.path.join(destdir, basename)
+                if (
+                    (filter_ext is None or ext == filter_ext) and
+                    (not os.path.exists(dest) or os.path.getmtime(src) > os.path.getmtime(dest))
+                ):
+                    self.cp(src, dest)
 
     def copy_dir_if_update(self, srcdir, destdir, filter_ext=None):
         self.copy_dir_if_update_non_recursive(srcdir, destdir, filter_ext)
         srcdir_abs = os.path.abspath(srcdir)
         srcdir_abs_len = len(srcdir_abs)
-        for path, dirnames, filenames in os.walk(srcdir_abs):
-            dirnames.sort()
+        for path, dirnames, filenames in self.walk(srcdir_abs):
             for dirname in dirnames:
                 dirpath = os.path.join(path, dirname)
                 dirpath_relative_root = dirpath[srcdir_abs_len + 1:]
@@ -151,7 +149,13 @@ class ShellHelpers:
     def cp(self, src, dest, **kwargs):
         self.print_cmd(['cp', src, dest])
         if not self.dry_run:
-            shutil.copy2(src, dest)
+            if os.path.islink(src):
+                if os.path.lexists(dest):
+                    os.unlink(dest)
+                linkto = os.readlink(src)
+                os.symlink(linkto, dest)
+            else:
+                shutil.copy2(src, dest)
 
     def print_cmd(self, cmd, cwd=None, cmd_file=None, extra_env=None, extra_paths=None):
         '''
@@ -170,24 +174,33 @@ class ShellHelpers:
         if not self.quiet:
             self._print_thread_safe('+ ' + cmd_string)
         if cmd_file is not None:
+            os.makedirs(os.path.dirname(cmd_file), exist_ok=True)
             with open(cmd_file, 'w') as f:
                 f.write('#!/usr/bin/env bash\n')
                 f.write(cmd_string)
             self.chmod(cmd_file)
 
+    def rmrf(self, path):
+        self.print_cmd(['rm', '-r', '-f', path, LF])
+        if not self.dry_run and os.path.exists(path):
+            if os.path.isdir(path):
+                shutil.rmtree(path)
+            else:
+                os.unlink(path)
+
     def run_cmd(
-            self,
-            cmd,
-            cmd_file=None,
-            out_file=None,
-            show_stdout=True,
-            show_cmd=True,
-            extra_env=None,
-            extra_paths=None,
-            delete_env=None,
-            raise_on_failure=True,
-            **kwargs
-        ):
+        self,
+        cmd,
+        cmd_file=None,
+        out_file=None,
+        show_stdout=True,
+        show_cmd=True,
+        extra_env=None,
+        extra_paths=None,
+        delete_env=None,
+        raise_on_failure=True,
+        **kwargs
+    ):
         '''
         Run a command. Write the command to stdout before running it.
 
@@ -211,16 +224,16 @@ class ShellHelpers:
         :return: exit status of the command
         :rtype: int
         '''
-        if out_file is not None:
-            stdout = subprocess.PIPE
-            stderr = subprocess.STDOUT
-        else:
+        if out_file is None:
             if show_stdout:
                 stdout = None
                 stderr = None
             else:
                 stdout = subprocess.DEVNULL
                 stderr = subprocess.DEVNULL
+        else:
+            stdout = subprocess.PIPE
+            stderr = subprocess.STDOUT
         if extra_env is None:
             extra_env = {}
         if delete_env is None:
@@ -283,7 +296,9 @@ class ShellHelpers:
                 #signal.signal(signal.SIGPIPE, sigpipe_old)
             returncode = proc.returncode
             if returncode != 0 and raise_on_failure:
-                raise Exception('Command exited with status: {}'.format(returncode))
+                e = Exception('Command exited with status: {}'.format(returncode))
+                e.returncode = returncode
+                raise e
             return returncode
         else:
             return 0
@@ -302,14 +317,6 @@ class ShellHelpers:
         else:
             return [x for x in cmd if x != LF]
 
-    def rmrf(self, path):
-        self.print_cmd(['rm', '-r', '-f', path, LF])
-        if not self.dry_run and os.path.exists(path):
-            if os.path.isdir(path):
-                shutil.rmtree(path)
-            else:
-                os.unlink(path)
-
     def walk(self, root):
         '''
         Extended walk that can take files or directories.
@@ -321,6 +328,8 @@ class ShellHelpers:
             yield dirname, [], [basename]
         else:
             for path, dirnames, filenames in os.walk(root):
+                dirnames.sort()
+                filenames.sort()
                 yield path, dirnames, filenames
 
     def wget(self, url, download_path):
diff --git a/test b/test
index 3389277..8f53681 100755
--- a/test
+++ b/test
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 
 import common
-import shell_helpers
+import lkmc.import_path
 from shell_helpers import LF
 
 class Main(common.TestCliFunction):
@@ -28,15 +28,15 @@ Size of the tests to run. Scale:
         run_args = self.get_common_args()
         test_boot_args = run_args.copy()
         test_boot_args['size'] = self.env['size']
-        self.run_test(self.import_path_main('test-boot'), test_boot_args, 'test-boot')
-        self.run_test(self.import_path_main('test-userland-full-system'), run_args, 'test-userland')
-        self.run_test(self.import_path_main('test-baremetal'), run_args, 'test-baremetal')
-        self.run_test(self.import_path_main('test-user-mode'), run_args, 'test-user-mode')
-        self.run_test(self.import_path_main('test-gdb'), run_args, 'test-gdb')
+        self.run_test(lkmc.import_path.import_path_main('test-boot'), test_boot_args, 'test-boot')
+        self.run_test(lkmc.import_path.import_path_main('test-userland-full-system'), run_args, 'test-userland')
+        self.run_test(lkmc.import_path.import_path_main('test-baremetal'), run_args, 'test-baremetal')
+        self.run_test(lkmc.import_path.import_path_main('test-user-mode'), run_args, 'test-user-mode')
+        self.run_test(lkmc.import_path.import_path_main('test-gdb'), run_args, 'test-gdb')
         if self.env['emulator'] == 'gem5':
             gem5_unit_test_args = run_args.copy()
             gem5_unit_test_args['unit_tests'] = True
-            self.run_test(self.import_path_main('build-gem5'), gem5_unit_test_args, 'gem5-unit-tests')
+            self.run_test(lkmc.import_path.import_path_main('build-gem5'), gem5_unit_test_args, 'gem5-unit-tests')
 
 if __name__ == '__main__':
     Main().cli()
diff --git a/test-baremetal b/test-baremetal
index bfb19fd..5802e02 100755
--- a/test-baremetal
+++ b/test-baremetal
@@ -4,6 +4,9 @@ import os
 import sys
 
 import common
+import lkmc.import_path
+import path_properties
+import thread_pool
 
 class Main(common.TestCliFunction):
     def __init__(self):
@@ -18,40 +21,54 @@ If given, run only the given tests. Otherwise, run all tests.
 '''
         )
 
+    def setup_one(self):
+        self.env['tests'] = self.resolve_targets(
+            self.env['baremetal_source_dir'],
+            self.env['tests']
+        )
+
     def timed_main(self):
-        run = self.import_path_main('run')
         run_args = self.get_common_args()
-        if self.env['emulator'] == 'gem5':
-            run_args['userland_build_id'] = 'static'
-        if self.env['tests'] == []:
-            baremetal_source_exts = (self.env['c_ext'], self.env['asm_ext'])
-            paths = []
-            for f in sorted(os.listdir(self.env['baremetal_source_dir'])):
-                path = os.path.join(self.env['baremetal_source_dir'], f)
-                if os.path.isfile(path) and os.path.splitext(path)[1] in baremetal_source_exts:
-                    paths.append(path)
-            for root, dirnames, filenames in os.walk(self.env['baremetal_source_arch_dir'], topdown=True):
-                dirnames[:] = [d for d in dirnames if d != 'interactive']
-                dirnames.sort()
-                for filename in filenames:
-                    path = os.path.join(root, filename)
-                    if os.path.splitext(path)[1] in baremetal_source_exts:
-                        paths.append(path)
-            sources = []
-            for path in paths:
-                if not (
-                        self.env['emulator'] == 'gem5' and os.path.basename(path).startswith('semihost_') or
-                        self.env['emulator'] == 'qemu' and os.path.basename(path).startswith('gem5_')
-                    ):
-                        sources.append(os.path.relpath(path, self.env['baremetal_source_dir']))
+        rootdir_abs_len = len(self.env['root_dir'])
+        with thread_pool.ThreadPool(
+            self.run_test,
+            nthreads=self.env['nproc'],
+            thread_id_arg='thread_id',
+        ) as my_thread_pool:
+            try:
+                for test in self.env['tests']:
+                    for path, in_dirnames, in_filenames in self.sh.walk(test):
+                        path_abs = os.path.abspath(path)
+                        dirpath_relative_root = path_abs[rootdir_abs_len + 1:]
+                        for in_filename in in_filenames:
+                            if os.path.splitext(in_filename)[1] in (self.env['c_ext'], self.env['asm_ext']):
+                                path_relative_root = os.path.join(dirpath_relative_root, in_filename)
+                                my_path_properties = path_properties.get(path_relative_root)
+                                if my_path_properties.should_be_tested(self.env):
+                                    cur_run_args = run_args.copy()
+                                    cur_run_args.update({
+                                        'baremetal': os.path.relpath(os.path.join(path_abs, in_filename), os.getcwd()),
+                                    })
+                                    cur_run_args.update(my_path_properties['test_run_args'])
+                                    test_args = {
+                                        'expected_exit_status': my_path_properties['exit_status'],
+                                        'run_args': cur_run_args,
+                                        'run_obj': lkmc.import_path.import_path_main('run'),
+                                        'test_id': path_relative_root,
+                                    }
+                                    error = my_thread_pool.submit(test_args)
+                                    if error is not None:
+                                        if self.env['quit_on_fail']:
+                                            raise common.ExitLoop()
+
+            except common.ExitLoop:
+                pass
+        error = my_thread_pool.get_error()
+        if error is not None:
+            print(error)
+            return 1
         else:
-            sources = self.env['tests']
-        for source in sources:
-            run_args['baremetal'] = source
-            run_args['ctrl_c_host'] = True
-            if os.path.splitext(os.path.basename(source))[0] == 'multicore':
-                run_args['cpus'] = 2
-            self.run_test(run, run_args, source)
+            return 0
 
 if __name__ == '__main__':
     Main().cli()
diff --git a/test-boot b/test-boot
index d373277..1e5ff33 100755
--- a/test-boot
+++ b/test-boot
@@ -1,6 +1,7 @@
 #!/usr/bin/env python3
 
 import common
+import lkmc.import_path
 import shell_helpers
 from shell_helpers import LF
 
@@ -45,7 +46,7 @@ See ./test --help for --size.
         #)
         #
         #rm -f "${self.env['test_boot_benchmark_file']}"
-        self.run = self.import_path_main('run')
+        self.run = lkmc.import_path.import_path_main('run')
         self.common_args = self.get_common_args()
         self.common_args['ctrl_c_host'] = True
         self.common_args['quit_after_boot'] = True
diff --git a/test-build-userland b/test-build-userland
new file mode 100755
index 0000000..4517c4e
--- /dev/null
+++ b/test-build-userland
@@ -0,0 +1,70 @@
+#!/usr/bin/env bash
+
+# https://github.com/cirosantilli/linux-kernel-module-cheat#cli-script-tests
+
+set -eux
+
+for in_tree in '' --in-tree; do
+  userland_build_dir="$(./getvar $in_tree userland_build_dir)"
+  # Toplevel.
+  ./build-userland $in_tree
+  [ -f "${userland_build_dir}/c/hello.out" ]
+  ./build-userland $in_tree --clean
+  ! [ -f "${userland_build_dir}/c/hello.out" ]
+
+  # Toplevel explicit.
+  ./build-userland $in_tree userland/
+  [ -f "${userland_build_dir}/c/hello.out" ]
+  ./build-userland $in_tree --clean
+  ! [ -f "${userland_build_dir}/c/hello.out" ]
+
+  # Toplevel root dir.
+  ./build-userland $in_tree .
+  [ -f "${userland_build_dir}/c/hello.out" ]
+  ./build-userland $in_tree --clean
+  ! [ -f "${userland_build_dir}/c/hello.out" ]
+
+  # Subdirectory.
+  ./build-userland $in_tree userland/c
+  [ -f "${userland_build_dir}/c/hello.out" ]
+  ./build-userland $in_tree --clean userland/c
+  ! [ -f "${userland_build_dir}/c/hello.out" ]
+
+  # One program.
+  ./build-userland $in_tree userland/c/hello.c
+  [ -f "${userland_build_dir}/c/hello.out" ]
+  ./build-userland $in_tree --clean userland/c/hello.c
+  ! [ -f "${userland_build_dir}/c/hello.out" ]
+
+  # Things that don't work: building:
+  # - non-existent files
+  # - paths outside of tree
+  ! ./build-userland $in_tree userland/c/hello
+  ! ./build-userland $in_tree userland/c/hello.
+  ! ./build-userland $in_tree "${userland_build_dir}/c/hello.out"
+  tmpfile="$(mktemp)"
+  ! ./build-userland $in_tree "$tmpfile"
+  ! ./build-userland --clean $in_tree "$tmpfile"
+  rm "$tmpfile"
+  ! ./build-userland $in_tree ..
+  ! ./build-userland $in_tree kernel_modules
+  ! ./build-userland --clean $in_tree userland/does_not_exist
+  ./build-userland --clean $in_tree
+done
+
+./build-userland-in-tree
+[ -f userland/c/hello.out ]
+./build-userland-in-tree --clean
+! [ -f userland/c/hello.out ]
+
+cd userland
+./build
+[ -f c/hello.out ]
+./build --clean
+! [ -f c/hello.out ]
+./build c
+[ -f c/hello.out ]
+./build --clean c
+! [ -f c/hello.out ]
+./build --clean c/hello.c
+! [ -f c/hello.out ]
diff --git a/test-gdb b/test-gdb
index 0932044..27af259 100755
--- a/test-gdb
+++ b/test-gdb
@@ -4,6 +4,7 @@ import threading
 import os
 
 import common
+import lkmc.import_path
 
 class Main(common.TestCliFunction):
     def __init__(self):
@@ -22,38 +23,51 @@ found by searching for the Python test files.
         )
 
     def timed_main(self):
-        run = self.import_path_main('run')
-        run_gdb = self.import_path_main('run-gdb')
+        run = lkmc.import_path.import_path_main('run')
+        run_gdb = lkmc.import_path.import_path_main('run-gdb')
         if self.env['arch'] in self.env['crosstool_ng_supported_archs']:
+            test_sources = []
             if self.env['tests'] == []:
-                test_scripts_noext = []
+                source_paths = []
                 for filename in sorted(os.listdir(self.env['baremetal_source_dir'])):
                     base, ext = os.path.splitext(filename)
-                    if ext == '.py':
-                        test_scripts_noext.append(base)
-                for root, dirnames, filenames in os.walk(os.path.join(self.env['baremetal_source_dir'], 'arch', self.env['arch'])):
+                    if ext in self.env['build_in_exts']:
+                        test_sources.append(
+                            os.path.join(
+                                self.env['baremetal_source_dir'],
+                                filename
+                            )
+                        )
+                for root, dirnames, filenames in os.walk(
+                    os.path.join(
+                        self.env['baremetal_source_dir'],
+                        'arch',
+                        self.env['arch']
+                    )
+                ):
                     for filename in filenames:
                         base, ext = os.path.splitext(filename)
-                        if ext == '.py':
-                            full_path = os.path.join(root, base)
-                            relpath = os.path.relpath(full_path, self.env['baremetal_source_dir'])
-                            test_scripts_noext.append(relpath)
+                        if ext in self.env['build_in_exts']:
+                            test_sources.append(os.path.join(root, filename))
             else:
-                test_scripts_noext = self.env['tests']
-            for test_script_noext in test_scripts_noext:
-                common_args = self.get_common_args()
-                common_args['baremetal'] = test_script_noext
-                test_id_string = self.test_setup(test_script_noext)
-                run_args = common_args.copy()
-                run_args['wait_gdb'] = True
-                run_args['background'] = True
-                run_thread = threading.Thread(target=lambda: run(**run_args))
-                run_thread.start()
-                gdb_args = common_args.copy()
-                gdb_args['test'] = True
-                run_gdb(**gdb_args)
-                run_thread.join()
-                self.test_teardown(run, 0, test_id_string)
+                test_sources = self.env['tests']
+            for test_source_full in test_sources:
+                base, ext = os.path.splitext(test_source_full)
+                if os.path.exists(base + '.py'):
+                    test_source_base = os.path.relpath(base, self.env['root_dir'])
+                    common_args = self.get_common_args()
+                    common_args['baremetal'] = test_source_base + ext
+                    test_id_string = self.test_setup(test_source_base)
+                    run_args = common_args.copy()
+                    run_args['gdb_wait'] = True
+                    run_args['background'] = True
+                    run_thread = threading.Thread(target=lambda: run(**run_args))
+                    run_thread.start()
+                    gdb_args = common_args.copy()
+                    gdb_args['test'] = True
+                    run_gdb(**gdb_args)
+                    run_thread.join()
+                    self.test_teardown(run, 0, test_id_string)
 
 if __name__ == '__main__':
     Main().cli()
diff --git a/test-test-user-mode b/test-test-user-mode
new file mode 100755
index 0000000..59e8364
--- /dev/null
+++ b/test-test-user-mode
@@ -0,0 +1,39 @@
+#!/usr/bin/env bash
+
+# https://github.com/cirosantilli/linux-kernel-module-cheat#cli-script-tests
+
+set -eux
+
+./build-userland
+./build-userland-in-tree
+
+f="$(mktemp)"
+
+./test-user-mode | tee "$f"
+grep -E '^PASS .* userland/c/hello' "$f"
+grep -E '^PASS .* userland/posix/uname' "$f"
+
+./test-user-mode userland | tee "$f"
+grep -E '^PASS .* userland/c/hello' "$f"
+grep -E '^PASS .* userland/posix/uname' "$f"
+
+./test-user-mode userland/c | tee "$f"
+grep -E '^PASS .* userland/c/hello' "$f"
+! grep -E '^PASS .* userland/posix/uname' "$f"
+
+./test-user-mode userland/c/hello.c | tee "$f"
+grep -E '^PASS .* userland/c/hello' "$f"
+! grep -E '^PASS .* userland/c/false' "$f"
+! grep -E '^PASS .* userland/posix/uname' "$f"
+
+./test-user-mode-in-tree | tee "$f"
+grep -E '^PASS .* userland/c/hello' "$f"
+grep -E '^PASS .* userland/posix/uname' "$f"
+
+cd userland
+./test | tee "$f"
+grep -E '^PASS .* userland/c/hello' "$f"
+grep -E '^PASS .* userland/posix/uname' "$f"
+cd ..
+
+rm "$f"
diff --git a/test-user-mode b/test-user-mode
index 651a770..8776f5d 100755
--- a/test-user-mode
+++ b/test-user-mode
@@ -4,14 +4,20 @@ import os
 import sys
 
 import common
+import lkmc.import_path
+import path_properties
+import thread_pool
 
 class Main(common.TestCliFunction):
-    def __init__(self):
-        super().__init__(
-            description='''\
+    def __init__(self, *args, **kwargs):
+        if 'description' not in kwargs:
+            kwargs['description'] = '''\
 https://github.com/cirosantilli/linux-kernel-module-cheat#user-mode-tests
+TODO: expose all userland relevant ./run args here as well somehow.
 '''
-        )
+        if 'defaults' not in kwargs:
+            kwargs['defaults'] = {}
+        super().__init__(*args, **kwargs)
         self.add_argument(
             'tests',
             nargs='*',
@@ -20,40 +26,58 @@ If given, run only the given tests. Otherwise, run all tests.
 '''
         )
 
+    def setup_one(self):
+        self.env['tests'] = self.resolve_targets(
+            self.env['userland_source_dir'],
+            self.env['tests']
+        )
+
     def timed_main(self):
-        run = self.import_path_main('run')
         run_args = self.get_common_args()
-        run_args['ctrl_c_host'] = True
         if self.env['emulator'] == 'gem5':
             run_args['userland_build_id'] = 'static'
-        if self.env['tests'] == []:
-            sources = [
-                'add.c',
-                'hello.c',
-                'hello_cpp.cpp',
-                'print_argv.c',
-            ]
-            if self.env['arch'] == 'x86_64':
-                arch_sources = [
-                    'asm_hello'
-                ]
-            elif self.env['arch'] == 'aarch64':
-                arch_sources = [
-                    'asm_hello'
-                ]
-            else:
-                arch_sources = []
-            arch_sources[:] = [
-                os.path.join('arch', self.env['arch'], arch_source)
-                for arch_source
-                in arch_sources
-            ]
-            sources.extend(arch_sources)
+        had_failure = False
+        rootdir_abs_len = len(self.env['root_dir'])
+        with thread_pool.ThreadPool(
+            self.run_test,
+            nthreads=self.env['nproc'],
+            thread_id_arg='thread_id',
+        ) as my_thread_pool:
+            try:
+                for test in self.env['tests']:
+                    for path, in_dirnames, in_filenames in self.sh.walk(test):
+                        path_abs = os.path.abspath(path)
+                        dirpath_relative_root = path_abs[rootdir_abs_len + 1:]
+                        for in_filename in in_filenames:
+                            if os.path.splitext(in_filename)[1] in self.env['build_in_exts']:
+                                path_relative_root = os.path.join(dirpath_relative_root, in_filename)
+                                my_path_properties = path_properties.get(path_relative_root)
+                                if my_path_properties.should_be_tested(self.env):
+                                    cur_run_args = run_args.copy()
+                                    cur_run_args.update({
+                                        'userland': os.path.relpath(os.path.join(path_abs, in_filename), os.getcwd()),
+                                    })
+                                    cur_run_args.update(my_path_properties['test_run_args'])
+                                    run_test_args = {
+                                        'expected_exit_status': my_path_properties['exit_status'],
+                                        'run_args': cur_run_args,
+                                        'run_obj': lkmc.import_path.import_path_main('run'),
+                                        'test_id': path_relative_root,
+                                    }
+                                    if my_path_properties['receives_signal']:
+                                        run_test_args['expected_exit_status'] = 128 - my_path_properties['exit_status']
+                                    error = my_thread_pool.submit(run_test_args)
+                                    if error is not None:
+                                        if self.env['quit_on_fail']:
+                                            raise common.ExitLoop()
+            except common.ExitLoop:
+                pass
+        error = my_thread_pool.get_error()
+        if error is not None:
+            print(error)
+            return 1
         else:
-            sources = self.env['tests']
-        for source in sources:
-            run_args['userland'] = source
-            self.run_test(run, run_args, source)
+            return 0
 
 if __name__ == '__main__':
     Main().cli()
diff --git a/test-user-mode-in-tree b/test-user-mode-in-tree
new file mode 100755
index 0000000..38fedba
--- /dev/null
+++ b/test-user-mode-in-tree
@@ -0,0 +1,21 @@
+#!/usr/bin/env python3
+
+import lkmc.import_path
+
+test_user_mode = lkmc.import_path.import_path_relative_root('test-user-mode')
+
+class Main(test_user_mode.Main):
+    def __init__(self):
+        super().__init__(
+            description='''\
+https://github.com/cirosantilli/linux-kernel-module-cheat#userland-setup-getting-started-natively
+''',
+            defaults={
+                'emulators': ['native'],
+                'in_tree': True,
+                'tests': ['.'],
+            }
+        )
+
+if __name__ == '__main__':
+    Main().cli()
diff --git a/test-userland-full-system b/test-userland-full-system
index 9178d2d..32623d1 100755
--- a/test-userland-full-system
+++ b/test-userland-full-system
@@ -1,9 +1,9 @@
 #!/usr/bin/env python3
 
 import os
-import sys
 
 import common
+import lkmc.import_path
 
 class Main(common.TestCliFunction):
     def __init__(self):
@@ -13,9 +13,9 @@ https://github.com/cirosantilli/linux-kernel-module-cheat#test-userland-in-full-
 '''
         )
     def timed_main(self):
-        run = self.import_path_main('run')
+        run = lkmc.import_path.import_path_main('run')
         run_args = self.get_common_args()
-        run_args['eval_after'] = '/test_all.sh;{};'.format(self.env['userland_quit_cmd'])
+        run_args['eval_after'] = './test_all.sh;{};'.format(self.env['userland_quit_cmd'])
         self.run_test(run, run_args)
 
 if __name__ == '__main__':
diff --git a/thread_pool.py b/thread_pool.py
new file mode 100644
index 0000000..5b0b677
--- /dev/null
+++ b/thread_pool.py
@@ -0,0 +1,269 @@
+#!/usr/bin/env python3
+
+from typing import Any, Callable, Dict, Iterable, Union
+import os
+import queue
+import sys
+import threading
+import time
+
+class ThreadPool:
+    '''
+    Start a pool of a limited number of threads to do some work.
+
+    This is similar to the stdlib concurrent.futures, but I could not find
+    how to reach all my design goals with that implementation:
+
+    * the input function does not need to be modified
+    * limit the number of threads
+    * queue sizes closely follow number of threads
+    * if an exception happens, optionally stop soon afterwards
+
+    Functional form and further discussion at:
+    https://stackoverflow.com/questions/19369724/the-right-way-to-limit-maximum-number-of-threads-running-at-once/55263676#55263676
+
+    This class form allows you to use your own while loops with submit().
+
+    Quick test with:
+
+    ....
+    python3 thread_pool.py 2 -10 20 0
+    python3 thread_pool.py 2 -10 20 1
+    python3 thread_pool.py 2 -10 20 2
+    python3 thread_pool.py 2 -10 20 3
+    python3 thread_pool.py 2 -10 20 0 1
+    ....
+
+    These ensure that execution stops neatly on error.
+    '''
+    def __init__(
+        self,
+        func: Callable,
+        handle_output: Union[Callable[[Any,Any,Exception],Any],None] = None,
+        nthreads: Union[int,None] = None,
+        thread_id_arg: Union[str,None] = None,
+    ):
+        '''
+        Start the thread pool immediately.
+
+        join() must be called afterwards at some point.
+
+        :param func: main work function to be evaluated.
+        :param handle_output: called on func return values as they
+            are returned.
+
+            Signature is: handle_output(input, output, exception) where:
+
+            * input: input given to func
+            * output: return value of func
+            * exception: the exception that func raised, or None otherwise
+
+            If this function returns non-None or raises, stop feeding
+            new input and exit ASAP when all currently running threads
+            have finished.
+
+            Default: a handler that does nothing and just exits on exception.
+        :param nthreads: number of threads to use. Default: nproc.
+        :param thread_id_arg: if not None, set the argument of func with this name
+            to a 0-indexed thread ID. This allows function calls to coordinate
+            usage of external resources such as files or ports.
+        '''
+        self.func = func
+        if handle_output is None:
+            handle_output = lambda input, output, exception: exception
+        self.handle_output = handle_output
+        if nthreads is None:
+            nthreads = len(os.sched_getaffinity(0))
+        self.thread_id_arg = thread_id_arg
+        self.nthreads = nthreads
+        self.error_output = None
+        self.error_output_lock = threading.Lock()
+        self.in_queue = queue.Queue(maxsize=nthreads)
+        self.threads = []
+        for i in range(self.nthreads):
+            thread = threading.Thread(
+                target=self._func_runner,
+                args=(i,)
+            )
+            self.threads.append(thread)
+            thread.start()
+
+    def __enter__(self):
+        '''
+        __exit__ automatically calls join() for you.
+
+        This is cool because it automatically ends the loop if an exception occurs.
+
+        But don't forget that errors may happen after the last submit is called, so you
+        likely want to check for that with get_error after the with.
+
+        get_error() returns the same as the explicit join().
+        '''
+        return self
+
+    def __exit__(self, type, value, traceback):
+        self.join()
+
+    def get_error(self):
+        return self.error_output
+
+    def submit(self, work):
+        '''
+        Submit work. Block if there is already enough work scheduled (~nthreads).
+
+        :return: if an error occurred in some previously executed thread, the error.
+                 Otherwise, None. This allows the caller to stop submitting further
+                 work if desired.
+        '''
+        self.in_queue.put(work)
+        return self.error_output
+
+    def join(self):
+        '''
+        Request all threads to stop after they finish currently submitted work.
+
+        :return: same as submit()
+        '''
+        for thread in range(self.nthreads):
+            self.in_queue.put(None)
+        for thread in self.threads:
+            thread.join()
+        return self.error_output
+
+    def _func_runner(self, thread_id):
+        while True:
+            work = self.in_queue.get(block=True)
+            if work is None:
+                break
+            if self.thread_id_arg is not None:
+                work[self.thread_id_arg] = thread_id
+            try:
+                exception = None
+                out = self.func(**work)
+            except Exception as e:
+                exception = e
+                out = None
+            try:
+                handle_output_return = self.handle_output(work, out, exception)
+            except Exception as e:
+                with self.error_output_lock:
+                    self.error_output = (work, out, e)
+            else:
+                if handle_output_return is not None:
+                    with self.error_output_lock:
+                        self.error_output = handle_output_return
+            finally:
+                self.in_queue.task_done()
+
+if __name__ == '__main__':
+    def func_maybe_raise(i):
+        '''
+        The main function that will be evaluated.
+
+        It sleeps to simulate an IO operation.
+        '''
+        time.sleep((abs(i) % 4) / 10.0)
+        return 10.0 / i
+
+    def func_get_thread(i, thread_id):
+        time.sleep((abs(i) % 4) / 10.0)
+        return thread_id
+
+    def get_work(min_, max_):
+        '''
+        Generate simple range work for my_func.
+        '''
+        for i in range(min_, max_):
+            yield {'i': i}
+
+    def handle_output_print(input, output, exception):
+        '''
+        Print outputs and exit immediately on failure.
+        '''
+        print('{!r} {!r} {!r}'.format(input, output, exception))
+        return exception
+
+    def handle_output_print_no_exit(input, output, exception):
+        '''
+        Print outputs, don't exit on failure.
+        '''
+        print('{!r} {!r} {!r}'.format(input, output, exception))
+
+    out_queue = queue.Queue()
+    def handle_output_queue(input, output, exception):
+        '''
+        Store outputs in a queue for later usage.
+        '''
+        global out_queue
+        out_queue.put((input, output, exception))
+        return exception
+
+    def handle_output_raise(input, output, exception):
+        '''
+        Raise if input == 10, to test that execution
+        stops nicely if this raises.
+        '''
+        print('{!r} {!r} {!r}'.format(input, output, exception))
+        if input['i'] == 10:
+            raise Exception
+
+    # CLI arguments.
+    argv_len = len(sys.argv)
+    if argv_len > 1:
+        nthreads = int(sys.argv[1])
+        if nthreads == 0:
+            nthreads = None
+    else:
+        nthreads = None
+
+    if argv_len > 2:
+        min_ = int(sys.argv[2])
+    else:
+        min_ = 1
+
+    if argv_len > 3:
+        max_ = int(sys.argv[3])
+    else:
+        max_ = 100
+
+    if argv_len > 4:
+        c = sys.argv[4][0]
+    else:
+        c = '0'
+    if c == '1':
+        handle_output = handle_output_print_no_exit
+    elif c == '2':
+        handle_output = handle_output_queue
+    elif c == '3':
+        handle_output = handle_output_raise
+    else:
+        handle_output = handle_output_print
+
+    if argv_len > 5:
+        c = sys.argv[5][0]
+    else:
+        c = '0'
+    if c == '1':
+        my_func = func_get_thread
+        thread_id_arg = 'thread_id'
+    else:
+        my_func = func_maybe_raise
+        thread_id_arg = None
+
+    # Action.
+    thread_pool = ThreadPool(
+        my_func,
+        handle_output,
+        nthreads,
+        thread_id_arg,
+    )
+    for work in get_work(min_, max_):
+        error = thread_pool.submit(work)
+        if error is not None:
+            break
+    error = thread_pool.join()
+    if error is not None:
+        print('error: {!r}'.format(error))
+    if handle_output == handle_output_queue:
+        while not out_queue.empty():
+            print(out_queue.get())
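The submit-until-error pattern that test-baremetal and test-user-mode build on top of thread_pool.py can be sketched in isolation as follows. This is a minimal sketch, not the class added by this patch: `MiniPool`, `check_positive` and `run_all` are hypothetical names used only for illustration, and the sketch assumes the simplest possible policy (remember the first exception, no handle_output callback, no thread IDs).

```python
import queue
import threading

class MiniPool:
    '''Hypothetical, stripped-down stand-in for a bounded thread pool.'''
    def __init__(self, func, nthreads=2):
        self.func = func
        self.error = None
        self.lock = threading.Lock()
        # Bounded queue: submit() blocks once about nthreads items are pending.
        self.in_queue = queue.Queue(maxsize=nthreads)
        self.threads = [
            threading.Thread(target=self._runner) for _ in range(nthreads)
        ]
        for thread in self.threads:
            thread.start()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # Always drain and stop the workers, even if the caller's loop raised.
        self.join()

    def submit(self, work):
        '''Queue one kwargs dict; return the first error seen so far, or None.'''
        self.in_queue.put(work)
        return self.error

    def join(self):
        # One None sentinel per worker tells each of them to exit its loop.
        for _ in self.threads:
            self.in_queue.put(None)
        for thread in self.threads:
            thread.join()
        return self.error

    def _runner(self):
        while True:
            work = self.in_queue.get()
            if work is None:
                break
            try:
                self.func(**work)
            except Exception as e:
                with self.lock:
                    # Keep only the first failure.
                    if self.error is None:
                        self.error = (work, e)

def check_positive(i):
    '''Toy work function: fails on negative input.'''
    if i < 0:
        raise ValueError(i)

def run_all(values):
    with MiniPool(check_positive, nthreads=2) as pool:
        for i in values:
            # Stop feeding new work as soon as an earlier item has failed.
            if pool.submit({'i': i}) is not None:
                break
    # An error may also surface only after the last submit, so check again
    # once the with block has joined the workers.
    return pool.error
```

The second check after the with block mirrors what the test scripts do with get_error(): a failure in the last few submitted items is only visible once the workers have been joined.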
diff --git a/tmu b/tmux-split
similarity index 93%
rename from tmu
rename to tmux-split
index 8a6bc03..f505da9 100755
--- a/tmu
+++ b/tmux-split
@@ -1,4 +1,5 @@
 #!/usr/bin/env bash
+# TODO: move to Python.
 if [ "$(tmux list-panes | wc -l | cut -d' ' -f1)" -ne 1 ]; then
   tmux kill-pane -t 1
 fi
diff --git a/trace-boot b/trace-boot
index 4336979..51458fc 100755
--- a/trace-boot
+++ b/trace-boot
@@ -1,7 +1,8 @@
 #!/usr/bin/env python3
 
-import common
 from shell_helpers import LF
+import common
+import lkmc.import_path
 
 class Main(common.LkmcCliFunction):
     def __init__(self):
@@ -14,17 +15,17 @@ More information at: https://github.com/cirosantilli/linux-kernel-module-cheat#t
 
     def timed_main(self):
         args = self.get_common_args()
-        run = self.import_path_main('run')
+        run = lkmc.import_path.import_path_main('run')
         if self.env['emulator'] == 'gem5':
             args['trace'] = 'Exec,-ExecSymbol,-ExecMicro'
-            run.main(**args)
+            run(**args)
         elif self.env['emulator'] == 'qemu':
             run_args = args.copy()
             run_args['trace'] = 'exec_tb'
             run_args['quit_after_boot'] = True
-            run.main(**run_args)
-            qemu_trace2txt = self.import_path_main('qemu-trace2txt')
-            qemu_trace2txt.main(**args)
+            run(**run_args)
+            qemu_trace2txt = lkmc.import_path.import_path_main('qemu-trace2txt')
+            qemu_trace2txt(**args)
             # Instruction count.
             # We could put this on a separate script, but it just adds more arch boilerplate to a new script.
             # So let's just leave it here for now since it did not add a significant processing time.
diff --git a/trace2line b/trace2line
index cd53d98..4563d97 100755
--- a/trace2line
+++ b/trace2line
@@ -8,27 +8,39 @@ now...
 '''
 
 import os
-import re
-import subprocess
-import sys
 
 import common
+import lkmc.import_path
 from shell_helpers import LF
 
-parser = self.get_argparse(argparse_args={
-    'description': 'Convert an execution trace containing PC values into the Linux kernel linex executed'
-})
-args = self.setup(parser)
-sys.exit(subprocess.Popen([
-    os.path.join(kwargs['root_dir'], 'trace2line.sh'),
-    'true' if kwargs['emulator'] == 'gem5' else 'false',
-    kwargs['trace_txt_file'],
-    self.get_toolchain_tool('addr2line'),
-    kwargs['vmlinux'],
-    kwargs['run_dir'],
-]).wait())
+class Main(common.LkmcCliFunction):
+    def __init__(self):
+        super().__init__(
+            defaults = {
+                'show_time': False,
+            },
+            description='''\
+Convert an execution trace containing PC values into the Linux kernel lines executed.
+''',
+        )
 
-# This was the full conversion attempt.
+    def timed_main(self):
+        self.sh.run_cmd([
+            os.path.join(self.env['root_dir'], 'trace2line.sh'), LF,
+            'true' if self.env['emulator'] == 'gem5' else 'false', LF,
+            self.env['trace_txt_file'], LF,
+            self.get_toolchain_tool('addr2line'), LF,
+            self.env['vmlinux'], LF,
+            self.env['run_dir'], LF,
+        ])
+
+if __name__ == '__main__':
+    Main().cli()
+
+# This was the old full Python port attempt that was failing:
+
+# import subprocess
+# import sys
 
 # if kwargs['emulator'] == 'gem5':
 #     def get_pc(line):
diff --git a/userland/README.adoc b/userland/README.adoc
deleted file mode 100644
index 78fdc90..0000000
--- a/userland/README.adoc
+++ /dev/null
@@ -1 +0,0 @@
-https://github.com/cirosantilli/linux-kernel-module-cheat#ansi-c
diff --git a/userland/add.c b/userland/add.c
deleted file mode 120000
index b0aea66..0000000
--- a/userland/add.c
+++ /dev/null
@@ -1 +0,0 @@
-../baremetal/add.c
\ No newline at end of file
diff --git a/userland/add.py b/userland/add.py
deleted file mode 120000
index 5b1d1e0..0000000
--- a/userland/add.py
+++ /dev/null
@@ -1 +0,0 @@
-../baremetal/add.py
\ No newline at end of file
diff --git a/userland/anonymous_inode.c b/userland/anonymous_inode.c
deleted file mode 100644
index 7049e36..0000000
--- a/userland/anonymous_inode.c
+++ /dev/null
@@ -1,46 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#anonymous-inode */
-
-#define _GNU_SOURCE
-#include <errno.h>
-#include <fcntl.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <sys/ioctl.h>
-#include <sys/stat.h>
-#include <sys/types.h>
-#include <unistd.h> /* sleep */
-
-#include "../include/anonymous_inode.h"
-
-int main(int argc, char **argv)
-{
-	char buf[1024];
-	int fd_ioctl, fd_ioctl_anon, ret;
-	size_t i, nreads;
-
-	if (argc < 2) {
-		puts("Usage: ./prog <ioctl-file> [<nreads>]");
-		return EXIT_FAILURE;
-	} else if (argc > 2) {
-		nreads = strtol(argv[2], NULL, 10);
-	} else {
-		nreads = 3;
-	}
-	fd_ioctl = open(argv[1], O_RDONLY);
-	if (fd_ioctl == -1) {
-		perror("open");
-		return EXIT_FAILURE;
-	}
-	ret = ioctl(fd_ioctl, LKMC_ANONYMOUS_INODE_GET_FD, &fd_ioctl_anon);
-	if (ret == -1) {
-		perror("ioctl");
-		return EXIT_FAILURE;
-	}
-	for (i = 0; i < nreads; ++i) {
-		ret = read(fd_ioctl_anon, buf, sizeof(buf));
-		printf("%.*s\n", ret, buf);
-	}
-	close(fd_ioctl_anon);
-	close(fd_ioctl);
-	return EXIT_SUCCESS;
-}
diff --git a/userland/arch/aarch64/add.S b/userland/arch/aarch64/add.S
new file mode 100644
index 0000000..120c746
--- /dev/null
+++ b/userland/arch/aarch64/add.S
@@ -0,0 +1,9 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly */
+
+#include "common.h"
+
+ENTRY
+    mov x0, 1
+    add x1, x0, 2
+    ASSERT_EQ(x1, 3)
+EXIT
diff --git a/userland/arch/aarch64/add_vector.S b/userland/arch/aarch64/add_vector.S
new file mode 100644
index 0000000..c919551
--- /dev/null
+++ b/userland/arch/aarch64/add_vector.S
@@ -0,0 +1,32 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-add-vector-instruction
+ *
+ * Add a bunch of integers in one go.
+ */
+
+#include "common.h"
+
+ENTRY
+.data
+    input0:    .long 0xF1F1F1F1, 0xF2F2F2F2, 0xF3F3F3F3, 0xF4F4F4F4
+    input1:    .long 0x12121212, 0x13131313, 0x14141414, 0x15151515
+    expect_4s: .long 0x04040403, 0x06060605, 0x08080807, 0x0A0A0A09
+    expect_2d: .long 0x04040403, 0x06060606, 0x08080807, 0x0A0A0A0A
+.bss
+    output:    .skip 16
+.text
+#define TEST(size) \
+    adr x0, input0; \
+    ld1 {v0. ## size}, [x0]; \
+    adr x1, input1; \
+    ld1 {v1. ## size}, [x1]; \
+    add v2. ## size, v0. ## size, v1. ## size; \
+    adr x0, output; \
+    st1 {v2. ## size}, [x0]; \
+    ASSERT_MEMCMP(output, expect_ ## size, 0x10)
+
+    /* 4x 32-bit */
+    TEST(4s)
+    /* 2x 64-bit */
+    TEST(2d)
+#undef TEST
+EXIT
diff --git a/userland/arch/aarch64/adr.S b/userland/arch/aarch64/adr.S
new file mode 100644
index 0000000..51478f5
--- /dev/null
+++ b/userland/arch/aarch64/adr.S
@@ -0,0 +1,21 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-adr-instruction */
+
+#include "common.h"
+
+.data
+data_label:
+    .word 0x1234678
+ENTRY
+    /* This is not possible in v7 because the label is in another section.
+     * objdump says that this generates a R_AARCH64_ADR_PRE relocation,
+     * which looks specific to ADR, and therefore makes it more likely
+     * that there was no such relocation in v7.
+     *
+     * This relocation is particularly important because str does not have a
+     * pc-relative mode in ARMv8.
+     */
+    adr x0, data_label
+    ldr x1, =data_label
+label:
+    ASSERT_EQ_REG(x0, x1)
+EXIT
diff --git a/userland/arch/aarch64/adrp.S b/userland/arch/aarch64/adrp.S
new file mode 100644
index 0000000..530d749
--- /dev/null
+++ b/userland/arch/aarch64/adrp.S
@@ -0,0 +1,13 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-adr-instruction */
+
+#include "common.h"
+
+ENTRY
+    adrp x0, label
+    adr x1, label
+label:
+    /* Clear the lower 12 bits. */
+    bic x1, x1, 0xFF
+    bic x1, x1, 0xF00
+    ASSERT_EQ_REG(x0, x1)
+EXIT
diff --git a/userland/arch/aarch64/asm_hello.c b/userland/arch/aarch64/asm_hello.c
deleted file mode 100644
index 8ca733a..0000000
--- a/userland/arch/aarch64/asm_hello.c
+++ /dev/null
@@ -1,13 +0,0 @@
-#include <assert.h>
-#include <inttypes.h>
-
-int main(void) {
-    uint32_t myvar = 1;
-    __asm__ (
-        "add %[myvar], %[myvar], 1;"
-        : [myvar] "=r" (myvar)
-        :
-        :
-    );
-    assert(myvar == 2);
-}
diff --git a/userland/arch/aarch64/beq.S b/userland/arch/aarch64/beq.S
new file mode 100644
index 0000000..997c8ec
--- /dev/null
+++ b/userland/arch/aarch64/beq.S
@@ -0,0 +1,33 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-beq-instruction */
+
+#include "common.h"
+
+ENTRY
+    /* cbz == 0 */
+    mov x0, 0
+    cbz x0, 1f
+    FAIL
+1:
+
+    /* cbz != 0 */
+    mov x0, 1
+    cbz x0, 1f
+    b 2f
+1:
+    FAIL
+2:
+
+    /* cbnz != 0 */
+    mov x0, 1
+    cbnz x0, 1f
+    FAIL
+1:
+
+    /* cbnz == 0 */
+    mov x0, 0
+    cbnz x0, 1f
+    b 2f
+1:
+    FAIL
+2:
+EXIT
diff --git a/userland/arch/aarch64/bfi.S b/userland/arch/aarch64/bfi.S
new file mode 100644
index 0000000..b3b6596
--- /dev/null
+++ b/userland/arch/aarch64/bfi.S
@@ -0,0 +1,11 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-bfi-instruction */
+
+#include "common.h"
+
+ENTRY
+    ldr x0, =0x1122334455667788
+
+    ldr x1, =0xFFFFFFFFFFFFFFFF
+    bfi x1, x0, 16, 32
+    ASSERT_EQ(x1, 0xFFFF55667788FFFF)
+EXIT
diff --git a/userland/arch/aarch64/c/build b/userland/arch/aarch64/c/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/aarch64/c/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/aarch64/c/earlyclobber.c b/userland/arch/aarch64/c/earlyclobber.c
new file mode 100644
index 0000000..4e9b6f2
--- /dev/null
+++ b/userland/arch/aarch64/c/earlyclobber.c
@@ -0,0 +1,17 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly-early-clobbers */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    uint64_t in = 1;
+    uint64_t out;
+    __asm__ (
+        "add %[out], %[in], 1;"
+        "add %[out], %[in], 1;"
+        : [out] "=&r" (out)
+        : [in] "r" (in)
+        :
+    );
+    assert(out == 2);
+}
diff --git a/userland/arch/aarch64/c/freestanding/build b/userland/arch/aarch64/c/freestanding/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/aarch64/c/freestanding/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/aarch64/c/freestanding/linux/build b/userland/arch/aarch64/c/freestanding/linux/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/aarch64/c/freestanding/linux/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/aarch64/c/freestanding/linux/hello.c b/userland/arch/aarch64/c/freestanding/linux/hello.c
new file mode 100644
index 0000000..f1f48e9
--- /dev/null
+++ b/userland/arch/aarch64/c/freestanding/linux/hello.c
@@ -0,0 +1,39 @@
+/* aarch64 freestanding C inline assembly Linux hello world
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#linux-system-calls
+ */
+
+#include <inttypes.h>
+
+void _start(void) {
+    uint64_t exit_status;
+
+    /* write */
+    {
+        char msg[] = "hello\n";
+        uint64_t syscall_return;
+        register uint64_t x0 __asm__ ("x0") = 1; /* stdout */
+        register char *x1 __asm__ ("x1") = msg;
+        register uint64_t x2 __asm__ ("x2") = sizeof(msg);
+        register uint64_t x8 __asm__ ("x8") = 64; /* syscall number */
+        __asm__ __volatile__ (
+            "svc 0;"
+            : "+r" (x0)
+            : "r" (x1), "r" (x2), "r" (x8)
+            : "memory"
+        );
+        syscall_return = x0;
+        exit_status = (syscall_return != sizeof(msg));
+    }
+
+    /* exit */
+    {
+        register uint64_t x0 __asm__ ("x0") = exit_status;
+        register uint64_t x8 __asm__ ("x8") = 93;
+        __asm__ __volatile__ (
+            "svc 0;"
+            : "+r" (x0)
+            : "r" (x8)
+            :
+        );
+    }
+}
diff --git a/userland/arch/aarch64/c/freestanding/linux/hello_clobbers.c b/userland/arch/aarch64/c/freestanding/linux/hello_clobbers.c
new file mode 100644
index 0000000..af0b69e
--- /dev/null
+++ b/userland/arch/aarch64/c/freestanding/linux/hello_clobbers.c
@@ -0,0 +1,42 @@
+/* Like hello.c, but without named register variables.
+ * The code is more complicated, and I was not able to make it as efficient,
+ * so it is better to just stick to named register variables.
+ *
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#linux-system-calls
+ */
+
+#include <inttypes.h>
+
+void _start(void) {
+    uint64_t exit_status;
+
+    /* write */
+    {
+        char msg[] = "hello\n";
+        uint64_t syscall_return;
+        __asm__ (
+            "mov x0, 1;" /* stdout */
+            "mov x1, %[msg];"
+            "mov x2, %[len];"
+            "mov x8, 64;" /* syscall number */
+            "svc 0;"
+            "mov %[syscall_return], x0;"
+            : [syscall_return] "=r" (syscall_return)
+            : [msg] "p" (msg),
+            [len] "i" (sizeof(msg))
+            : "x0", "x1", "x2", "x8", "memory"
+        );
+        exit_status = (syscall_return != sizeof(msg));
+    }
+
+    /* exit */
+    __asm__ (
+        "mov x0, %[exit_status];"
+        "mov x8, 93;" /* syscall number */
+        "svc 0;"
+        :
+        : [exit_status] "r" (exit_status)
+        : "x0", "x8"
+    );
+}
+
diff --git a/userland/arch/aarch64/c/freestanding/linux/test b/userland/arch/aarch64/c/freestanding/linux/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/aarch64/c/freestanding/linux/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/aarch64/c/freestanding/test b/userland/arch/aarch64/c/freestanding/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/aarch64/c/freestanding/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/aarch64/c/inc.c b/userland/arch/aarch64/c/inc.c
new file mode 100644
index 0000000..80b259c
--- /dev/null
+++ b/userland/arch/aarch64/c/inc.c
@@ -0,0 +1,18 @@
+/* Increment a variable in inline assembly.
+ *
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly
+ */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    uint64_t io = 1;
+    __asm__ (
+        "add %[io], %[io], 1;"
+        : [io] "+r" (io)
+        :
+        :
+    );
+    assert(io == 2);
+}
diff --git a/userland/arch/aarch64/c/inc_float.c b/userland/arch/aarch64/c/inc_float.c
new file mode 100644
index 0000000..bada60d
--- /dev/null
+++ b/userland/arch/aarch64/c/inc_float.c
@@ -0,0 +1,25 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly-floating-point-arm */
+
+#include <assert.h>
+
+int main(void) {
+    float my_float = 1.5;
+    __asm__ (
+        "fmov s0, 1.0;"
+        "fadd %s[my_float], %s[my_float], s0;"
+        : [my_float] "+w" (my_float)
+        :
+        : "s0"
+    );
+    assert(my_float == 2.5);
+
+    double my_double = 1.5;
+    __asm__ (
+        "fmov d0, 1.0;"
+        "fadd %d[my_double], %d[my_double], d0;"
+        : [my_double] "+w" (my_double)
+        :
+        : "d0"
+    );
+    assert(my_double == 2.5);
+}
diff --git a/userland/arch/aarch64/c/linux/asm_from_c.c b/userland/arch/aarch64/c/linux/asm_from_c.c
new file mode 100644
index 0000000..d5b6bc6
--- /dev/null
+++ b/userland/arch/aarch64/c/linux/asm_from_c.c
@@ -0,0 +1,39 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-calling-convention */
+
+#include <assert.h>
+#include <inttypes.h>
+
+uint64_t my_asm_func(void);
+/* { return 42; } */
+__asm__(
+    ".global my_asm_func;"
+    "my_asm_func:"
+    "mov x0, 42;"
+    "ret;"
+);
+
+/* Now a more complex example that also calls a C function.
+ * We have to save the return address in x30 for later because bl overwrites it.
+ * https://stackoverflow.com/questions/27941220/push-lr-and-pop-lr-in-arm-arch64/34504752#34504752
+ * We are not modifying any other callee saved register in this function,
+ * since my_c_func is not either (unless GCC has a bug ;-)), so everything else is fine.
+ */
+uint64_t my_asm_func_2(void);
+/* { return my_c_func(); } */
+__asm__(
+    ".global my_asm_func_2;"
+    "my_asm_func_2:"
+    "str x30, [sp, -16]!;"
+    "bl my_c_func;"
+    "ldr x30, [sp], 16;"
+    "ret;"
+);
+
+uint64_t my_c_func(void) {
+    return 42;
+}
+
+int main(void) {
+    assert(my_asm_func() == 42);
+    assert(my_asm_func_2() == 42);
+}
diff --git a/userland/arch/aarch64/c/linux/build b/userland/arch/aarch64/c/linux/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/aarch64/c/linux/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/aarch64/c/linux/test b/userland/arch/aarch64/c/linux/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/aarch64/c/linux/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/aarch64/c/multiline.cpp b/userland/arch/aarch64/c/multiline.cpp
new file mode 100644
index 0000000..f618db6
--- /dev/null
+++ b/userland/arch/aarch64/c/multiline.cpp
@@ -0,0 +1,18 @@
+// https://stackoverflow.com/questions/3666013/how-to-write-multiline-inline-assembly-code-in-gcc-c/54575948#54575948
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    uint64_t io = 0;
+    __asm__ (
+        R"(
+add %[io], %[io], #1
+add %[io], %[io], #1
+)"
+        : [io] "+r" (io)
+        :
+        :
+    );
+    assert(io == 2);
+}
diff --git a/userland/arch/aarch64/c/reg_var.c b/userland/arch/aarch64/c/reg_var.c
new file mode 100644
index 0000000..52ed40d
--- /dev/null
+++ b/userland/arch/aarch64/c/reg_var.c
@@ -0,0 +1,27 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly-register-variables */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    register uint32_t x0 __asm__ ("x0");
+    register uint32_t x1 __asm__ ("x1");
+    uint32_t new_x0;
+    uint32_t new_x1;
+    {
+        x0 = 1;
+        x1 = 2;
+        __asm__ (
+            "add %[x0], x0, #1;"
+            "add %[x1], x1, #1;"
+            : [x0] "+r" (x0),
+              [x1] "+r" (x1)
+            :
+            :
+        );
+        new_x0 = x0;
+        new_x1 = x1;
+    }
+    assert(new_x0 == 2);
+    assert(new_x1 == 3);
+}
diff --git a/userland/arch/aarch64/c/reg_var_float.c b/userland/arch/aarch64/c/reg_var_float.c
new file mode 100644
index 0000000..8c1c88e
--- /dev/null
+++ b/userland/arch/aarch64/c/reg_var_float.c
@@ -0,0 +1,28 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly-register-variables */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    register double d0 __asm__ ("d0");
+    register double d1 __asm__ ("d1");
+    double new_d0;
+    double new_d1;
+    {
+        d0 = 1.5;
+        d1 = 2.5;
+        __asm__ (
+            "fmov d2, 1.5;"
+            "fadd %d[d0], d0, d2;"
+            "fadd %d[d1], d1, d2;"
+            : [d0] "+w" (d0),
+              [d1] "+w" (d1)
+            :
+            : "d2"
+        );
+        new_d0 = d0;
+        new_d1 = d1;
+    }
+    assert(new_d0 == 3.0);
+    assert(new_d1 == 4.0);
+}
diff --git a/userland/arch/aarch64/c/test b/userland/arch/aarch64/c/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/aarch64/c/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/aarch64/cbz.S b/userland/arch/aarch64/cbz.S
new file mode 100644
index 0000000..31982ab
--- /dev/null
+++ b/userland/arch/aarch64/cbz.S
@@ -0,0 +1,19 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-cbz-instruction */
+
+#include "common.h"
+
+ENTRY
+
+    /* Branch. */
+    mov x0, 0x0
+    cbz x0, ok
+    FAIL
+ok:
+
+    /* Don't branch. */
+    mov x0, 0x1
+    cbz x0, ko
+
+EXIT
+ko:
+    FAIL
diff --git a/userland/arch/aarch64/comments.S b/userland/arch/aarch64/comments.S
new file mode 100644
index 0000000..4a2dd25
--- /dev/null
+++ b/userland/arch/aarch64/comments.S
@@ -0,0 +1,17 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gnu-gas-assembler-comments */
+
+#include "common.h"
+ENTRY
+    # mycomment
+
+    /* ARMv8 uses // instead of @ for comments. */
+    // mycomment
+    nop // mycomment
+
+    /* All these fail. Lol, different than v7, no consistency. */
+#if 0
+    nop # mycomment
+    @ mycomment
+    nop @ mycomment
+#endif
+EXIT
diff --git a/userland/arch/aarch64/common_arch.h b/userland/arch/aarch64/common_arch.h
new file mode 100644
index 0000000..84c3140
--- /dev/null
+++ b/userland/arch/aarch64/common_arch.h
@@ -0,0 +1,83 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly-c-standard-library */
+
+#ifndef COMMON_ARCH_H
+#define COMMON_ARCH_H
+
+#define ASSERT_EQ(reg, const) \
+    mov x0, reg; \
+    ldr x1, =const; \
+    ASSERT_EQ_DO(64); \
+;
+
+#define ASSERT_EQ_DO(bits) \
+    bl assert_eq_ ## bits; \
+    cmp x0, 0; \
+    ASSERT(beq); \
+;
+
+#define ASSERT_EQ_REG(reg1, reg2) \
+    str reg2, [sp, -16]!; \
+    mov x0, reg1; \
+    ldr x1, [sp], 16; \
+    ASSERT_EQ_DO(64); \
+;
+
+#define ASSERT_EQ_REG_32(reg1, reg2) \
+    str reg2, [sp, -16]!; \
+    mov w0, reg1; \
+    ldr w1, [sp], 16; \
+    ASSERT_EQ_DO(32); \
+;
+
+#define ASSERT_MEMCMP(label1, label2, const_size) \
+    adr x0, label1; \
+    adr x1, label2; \
+    ldr x2, =const_size; \
+    bl assert_memcmp; \
+    cmp x0, 0; \
+    ASSERT(beq); \
+;
+
+#define ENTRY \
+.text; \
+.global asm_main; \
+asm_main: \
+    sub sp, sp, 0xA0; \
+    stp x29, x30, [sp]; \
+    stp x27, x28, [sp, 0x10]; \
+    stp x25, x26, [sp, 0x20]; \
+    stp x23, x24, [sp, 0x30]; \
+    stp x21, x22, [sp, 0x40]; \
+    stp x19, x20, [sp, 0x50]; \
+    stp  x6,  x7, [sp, 0x60]; \
+    stp  x4,  x5, [sp, 0x70]; \
+    stp  x2,  x3, [sp, 0x80]; \
+    stp  x0,  x1, [sp, 0x90]; \
+asm_main_after_prologue: \
+;
+
+#define EXIT \
+    mov w0, 0; \
+    mov w1, 0; \
+    b pass; \
+fail: \
+    ldr x1, [sp, 0x90]; \
+    str w0, [x1]; \
+    mov w0, 1; \
+pass: \
+    ldp x19, x20, [sp, 0x50]; \
+    ldp x21, x22, [sp, 0x40]; \
+    ldp x23, x24, [sp, 0x30]; \
+    ldp x25, x26, [sp, 0x20]; \
+    ldp x27, x28, [sp, 0x10]; \
+    ldp x29, x30, [sp]; \
+    add sp, sp, 0xA0; \
+    ret; \
+;
+
+#define FAIL \
+    ldr w0, =__LINE__; \
+    b fail; \
+;
+
+#endif
diff --git a/userland/arch/aarch64/cset.S b/userland/arch/aarch64/cset.S
new file mode 100644
index 0000000..66a8fb1
--- /dev/null
+++ b/userland/arch/aarch64/cset.S
@@ -0,0 +1,28 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-cset-instruction */
+
+#include "common.h"
+
+ENTRY
+    /* Test values. */
+    mov x19, 0
+    mov x20, 1
+
+    /* eq is true, set x21 = 1. */
+    cmp x19, x19
+    cset x21, eq
+    ASSERT_EQ(x21, 1)
+
+    /* eq is false, set x21 = 0. */
+    cmp x19, x20
+    cset x21, eq
+    ASSERT_EQ(x21, 0)
+
+    /* Same for ne. */
+    cmp x19, x19
+    cset x21, ne
+    ASSERT_EQ(x21, 0)
+
+    cmp x19, x20
+    cset x21, ne
+    ASSERT_EQ(x21, 1)
+EXIT
diff --git a/userland/arch/aarch64/fadd_scalar.S b/userland/arch/aarch64/fadd_scalar.S
new file mode 100644
index 0000000..62b9fd7
--- /dev/null
+++ b/userland/arch/aarch64/fadd_scalar.S
@@ -0,0 +1,60 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#advanced-simd-instructions */
+
+#include "common.h"
+
+ENTRY
+    /* 1.5 + 2.5 == 4.0
+     * using 64-bit double immediates.
+     */
+    fmov d0, 1.5
+    fmov d1, 2.5
+    fadd d2, d0, d1
+    fmov d3, 4.0
+    /* Unlike the ARMv7 VFP vcmp, this sets the main
+     * condition flags (NZCV) directly.
+     */
+    fcmp d2, d3
+    ASSERT(beq)
+
+    /* Now with a memory stored value. */
+.data
+my_double_0:
+    .double 1.5
+my_double_1:
+    .double 2.5
+my_double_sum_expect:
+    .double 4.0
+.text
+    ldr d0, my_double_0
+    ldr d1, my_double_1
+    fadd d2, d0, d1
+    ldr d3, my_double_sum_expect
+    fcmp d2, d3
+    ASSERT(beq)
+
+    /* Now in 32-bit. */
+    fmov s0, 1.5
+    fmov s1, 2.5
+    fadd s2, s0, s1
+    fmov s3, 4.0
+    fcmp s2, s3
+    ASSERT(beq)
+
+    /* TODO why? What's the point of q then?
+     * Error: operand mismatch -- `fmov q0,1.5'
+     */
+#if 0
+    fmov q0, 1.5
+#endif
+
+    /* Much like integers, immediates are constrained to
+     * fit in 32-bit instructions. TODO exact rules.
+     *
+     * Assembly here would fail with:
+     *
+     * Error: invalid floating-point constant at operand 2
+     */
+#if 0
+    fmov d0, 1.23456798
+#endif
+EXIT
diff --git a/userland/arch/aarch64/fadd_vector.S b/userland/arch/aarch64/fadd_vector.S
new file mode 100644
index 0000000..d94a144
--- /dev/null
+++ b/userland/arch/aarch64/fadd_vector.S
@@ -0,0 +1,34 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-fadd-vector-instruction
+ *
+ * Add a bunch of floating point numbers in one go.
+ */
+
+#include "common.h"
+
+ENTRY
+.data
+    input0_4s: .float 1.5, 2.5,  3.5,  4.5
+    input1_4s: .float 5.5, 6.5,  7.5,  8.5
+    expect_4s: .float 7.0, 9.0, 11.0, 13.0
+    input0_2d: .double 1.5, 2.5
+    input1_2d: .double 5.5, 6.5
+    expect_2d: .double 7.0, 9.0
+.bss
+    output:    .skip 16
+.text
+#define TEST(size) \
+    adr x0, input0_ ## size; \
+    ld1 {v0. ## size}, [x0]; \
+    adr x1, input1_ ## size; \
+    ld1 {v1. ## size}, [x1]; \
+    fadd v2. ## size, v0. ## size, v1. ## size; \
+    adr x0, output; \
+    st1 {v2. ## size}, [x0]; \
+    ASSERT_MEMCMP(output, expect_ ## size, 0x10)
+
+    /* 4x 32-bit */
+    TEST(4s)
+    /* 2x 64-bit */
+    TEST(2d)
+#undef TEST
+EXIT
diff --git a/userland/arch/aarch64/freestanding/build b/userland/arch/aarch64/freestanding/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/aarch64/freestanding/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/aarch64/freestanding/linux/build b/userland/arch/aarch64/freestanding/linux/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/aarch64/freestanding/linux/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/aarch64/freestanding/linux/hello.S b/userland/arch/aarch64/freestanding/linux/hello.S
new file mode 100644
index 0000000..fa4c298
--- /dev/null
+++ b/userland/arch/aarch64/freestanding/linux/hello.S
@@ -0,0 +1,22 @@
+/* aarch64 freestanding Linux hello world
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#linux-system-calls
+ */
+
+.text
+.global _start
+_start:
+asm_main_after_prologue:
+    /* write */
+    mov x0, 1     /* stdout */
+    adr x1, msg   /* buffer */
+    ldr x2, =len  /* len */
+    mov x8, 64    /* syscall number */
+    svc 0
+
+    /* exit */
+    mov x0, 0     /* exit status */
+    mov x8, 93    /* syscall number */
+    svc 0
+msg:
+    .ascii "hello\n"
+len = . - msg
diff --git a/userland/arch/aarch64/freestanding/linux/test b/userland/arch/aarch64/freestanding/linux/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/aarch64/freestanding/linux/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/aarch64/freestanding/test b/userland/arch/aarch64/freestanding/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/aarch64/freestanding/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/aarch64/gas_data_sizes.S b/userland/arch/aarch64/gas_data_sizes.S
new file mode 100644
index 0000000..3243e59
--- /dev/null
+++ b/userland/arch/aarch64/gas_data_sizes.S
@@ -0,0 +1,29 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gnu-gas-assembler data sizes */
+
+#include "common.h"
+
+ENTRY
+#define ASSERT_DIFF(label1, label2, result) \
+    adr x0, label1; \
+    adr x1, label2; \
+    sub x0, x1, x0; \
+    ASSERT_EQ(x0, result)
+
+    ASSERT_DIFF(mybyte, myword, 1)
+    ASSERT_DIFF(myword, mylong, 4)
+    ASSERT_DIFF(mylong, myquad, 4)
+    ASSERT_DIFF(myquad, myocta, 8)
+    ASSERT_DIFF(myocta, theend, 16)
+#undef ASSERT_DIFF
+EXIT
+mybyte:
+    .byte 0x12
+myword:
+    .word 0x1234
+mylong:
+    .long 0x12345678
+myquad:
+    .quad 0x123456789ABCDEF0
+myocta:
+    .octa 0x123456789ABCDEF0123456789ABCDEF0
+theend:
diff --git a/userland/arch/aarch64/immediates.S b/userland/arch/aarch64/immediates.S
new file mode 100644
index 0000000..d60b9aa
--- /dev/null
+++ b/userland/arch/aarch64/immediates.S
@@ -0,0 +1,9 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gnu-gas-assembler-immediates */
+
+#include "common.h"
+ENTRY
+    mov x0, 1
+    mov x0, 0x1
+    mov x0, 1
+    mov x0, 0x1
+EXIT
diff --git a/userland/arch/aarch64/ld2.S b/userland/arch/aarch64/ld2.S
new file mode 100644
index 0000000..22d4304
--- /dev/null
+++ b/userland/arch/aarch64/ld2.S
@@ -0,0 +1,26 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ld2-instruction */
+
+#include "common.h"
+
+ENTRY
+.data
+    u32_interleave: .word \
+        0x11111111, 0x55555555, \
+        0x22222222, 0x66666666, \
+        0x33333333, 0x77777777, \
+        0x44444444, 0x88888888
+    u32_interleave_sum_expect: .word \
+        0x66666666, \
+        0x88888888, \
+        0xAAAAAAAA, \
+        0xCCCCCCCC
+.bss
+    u32_interleave_sum: .skip 16
+.text
+    adr x0, u32_interleave
+    ld2 {v0.4s, v1.4s}, [x0]
+    add v2.4s, v0.4s, v1.4s
+    adr x0, u32_interleave_sum
+    st1 {v2.4s}, [x0]
+    ASSERT_MEMCMP(u32_interleave_sum, u32_interleave_sum_expect, 0x10)
+EXIT
diff --git a/userland/arch/aarch64/movk.S b/userland/arch/aarch64/movk.S
new file mode 100644
index 0000000..4563c95
--- /dev/null
+++ b/userland/arch/aarch64/movk.S
@@ -0,0 +1,26 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-movk-instruction */
+
+#include "common.h"
+
+ENTRY
+    movk x0, 0x4444, lsl 0
+    movk x0, 0x3333, lsl 16
+    movk x0, 0x2222, lsl 32
+    movk x0, 0x1111, lsl 48
+    ASSERT_EQ(x0, 0x1111222233334444)
+
+    /* Set a label (addresses are 48-bit) with immediates:
+     *
+     * * https://stackoverflow.com/questions/38570495/aarch64-relocation-prefixes
+     * * https://sourceware.org/binutils/docs-2.26/as/AArch64_002dRelocations.html
+     *
+     * This could be used if the label is too far away for
+     * adr relative addressing.
+     */
+    movz x0, :abs_g2:label     /* bits 32-47, overflow check */
+    movk x0, :abs_g1_nc:label  /* bits 16-31, no overflow check */
+    movk x0, :abs_g0_nc:label  /* bits  0-15, no overflow check */
+    adr x1, label
+label:
+    ASSERT_EQ_REG(x0, x1)
+EXIT
diff --git a/userland/arch/aarch64/movn.S b/userland/arch/aarch64/movn.S
new file mode 100644
index 0000000..253e9ca
--- /dev/null
+++ b/userland/arch/aarch64/movn.S
@@ -0,0 +1,9 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-movn-instruction */
+
+#include "common.h"
+
+ENTRY
+    ldr x0, =0x123456789ABCDEF0
+    movn x0, 0x8888, lsl 16
+    ASSERT_EQ(x0, 0xFFFFFFFF7777FFFF)
+EXIT
diff --git a/userland/arch/aarch64/pc.S b/userland/arch/aarch64/pc.S
new file mode 100644
index 0000000..3497adc
--- /dev/null
+++ b/userland/arch/aarch64/pc.S
@@ -0,0 +1,78 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#assembly-registers */
+
+#include "common.h"
+
+ENTRY
+#if 0
+    /* Unlike v7, we can't use PC like any other register in ARMv8,
+     * since it is not a general purpose register anymore.
+     *
+     * Only branch instructions can modify the PC.
+     *
+     * B1.2.1 "Registers in AArch64 state" says:
+     *
+     * Software cannot write directly to the PC. It
+     * can only be updated on a branch, exception entry or
+     * exception return.
+     */
+    ldr pc, =10f
+    FAIL
+10:
+#endif
+#if 0
+    mov x0, pc
+#endif
+
+    /* LDR PC-relative loads exist in ARMv8, but they have a separate encoding
+     * "LDR (literal)" instead of "LDR (immediate)":
+     * https://stackoverflow.com/questions/28638981/howto-write-pc-relative-adressing-on-arm-asm/54480999#54480999
+     */
+    ldr x0, pc_relative_ldr
+    b 1f
+pc_relative_ldr:
+    .quad 0x123456789ABCDEF0
+1:
+    ASSERT_EQ(x0, 0x123456789ABCDEF0)
+
+    /* Just for fun, we can also use relative numbers instead of labels.
+     * https://reverseengineering.stackexchange.com/questions/17666/how-does-the-ldr-instruction-work-on-arm/20567#20567
+     */
+    ldr x0, 0x8
+    b 1f
+    .quad 0x123456789ABCDEF0
+1:
+    ASSERT_EQ(x0, 0x123456789ABCDEF0)
+
+    /* Analogous for b with PC. */
+    mov x0, 0
+    /* Jumps over mov to ASSERT_EQ. */
+    b 8
+    mov x0, 1
+    ASSERT_EQ(x0, 0)
+
+    /* Trying to use the old "LDR (immediate)" PC-relative
+     * syntax does not work.
+     */
+#if 0
+    /* 64-bit integer or SP register expected at operand 2 -- `ldr x0,[pc]' */
+    ldr x0, [pc]
+#endif
+
+    /* There is however no analogue for str. TODO rationale? */
+#if 0
+    /* Error: invalid addressing mode at operand 2 -- `str x0,pc_relative_str' */
+    str x0, pc_relative_str
+#endif
+
+    /* You just have to use adr + "STR (register)". */
+    ldr x0, pc_relative_str
+    ASSERT_EQ(x0, 0x0)
+    adr x1, pc_relative_str
+    ldr x0, pc_relative_ldr
+    str x0, [x1]
+    ldr x0, pc_relative_str
+    ASSERT_EQ(x0, 0x123456789ABCDEF0)
+EXIT
+.data
+pc_relative_str:
+    .quad 0x0000000000000000
diff --git a/userland/arch/aarch64/registers.S b/userland/arch/aarch64/registers.S
new file mode 100644
index 0000000..1466974
--- /dev/null
+++ b/userland/arch/aarch64/registers.S
@@ -0,0 +1,47 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#assembly-registers */
+
+#include "common.h"
+
+ENTRY
+
+    /* 31 64-bit eXtended general purpose registers. */
+    mov x0, 0
+    mov x1, 1
+    mov x2, 2
+    mov x3, 3
+    mov x4, 4
+    mov x5, 5
+    mov x6, 6
+    mov x7, 7
+    mov x8, 8
+    mov x9, 9
+    mov x10, 10
+    mov x11, 11
+    mov x12, 12
+    mov x13, 13
+    mov x14, 14
+    mov x15, 15
+    mov x16, 16
+    mov x17, 17
+    mov x18, 18
+    mov x19, 19
+    mov x20, 20
+    mov x21, 21
+    mov x22, 22
+    mov x23, 23
+    mov x24, 24
+    mov x25, 25
+    mov x26, 26
+    mov x27, 27
+    mov x28, 28
+    mov x29, 29
+
+    /* x30 is the link register. BL stores the return address here. */
+    /*mov x30, 30*/
+
+    /* The W form addresses the lower 4-byte word, and writes to it zero the top 4 bytes. */
+    ldr x0, =0x1111222233334444
+    ldr x1, =0x5555666677778888
+    mov w0, w1
+    ASSERT_EQ(x0, 0x0000000077778888)
+EXIT
diff --git a/userland/arch/aarch64/ret.S b/userland/arch/aarch64/ret.S
new file mode 100644
index 0000000..6570ea1
--- /dev/null
+++ b/userland/arch/aarch64/ret.S
@@ -0,0 +1,28 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-ret-instruction */
+
+#include "common.h"
+
+ENTRY
+    mov x19, 1
+    bl inc
+    ASSERT_EQ(x19, 2)
+    bl inc2
+    ASSERT_EQ(x19, 3)
+    bl inc3
+    ASSERT_EQ(x19, 4)
+EXIT
+
+/* void inc(uint64_t *i) { (*i)++; } */
+inc:
+    add x19, x19, 1
+    ret
+
+/* Same but explicit return register. */
+inc2:
+    add x19, x19, 1
+    ret x30
+
+/* Same but with br. */
+inc3:
+    add x19, x19, 1
+    br x30
diff --git a/userland/arch/aarch64/str.S b/userland/arch/aarch64/str.S
new file mode 100644
index 0000000..e65799d
--- /dev/null
+++ b/userland/arch/aarch64/str.S
@@ -0,0 +1,13 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-str-instruction */
+
+#include "common.h"
+
+ENTRY
+    ldr x0, myvar
+    ASSERT_EQ(x0, 0x12346789ABCDEF0)
+#if 0
+    /* Error: invalid addressing mode at operand 2 -- `str x0,myvar' */
+    str x0, myvar
+#endif
+EXIT
+    myvar: .quad 0x12346789ABCDEF0
diff --git a/userland/arch/aarch64/test b/userland/arch/aarch64/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/aarch64/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/aarch64/ubfm.S b/userland/arch/aarch64/ubfm.S
new file mode 100644
index 0000000..7924917
--- /dev/null
+++ b/userland/arch/aarch64/ubfm.S
@@ -0,0 +1,17 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ubfm-instruction */
+
+#include "common.h"
+
+ENTRY
+    ldr x19, =0x1122334455667788
+
+    // lsr alias: imms == 63
+
+    ldr x20, =0xFFFFFFFFFFFFFFFF
+    ubfm x20, x19, 16, 63
+    ASSERT_EQ(x20, 0x0000112233445566)
+
+    ldr x20, =0xFFFFFFFFFFFFFFFF
+    ubfm x20, x19, 32, 63
+    ASSERT_EQ(x20, 0x0000000011223344)
+EXIT
diff --git a/userland/arch/aarch64/ubfx.S b/userland/arch/aarch64/ubfx.S
new file mode 100644
index 0000000..3223235
--- /dev/null
+++ b/userland/arch/aarch64/ubfx.S
@@ -0,0 +1,15 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ubfx-instruction */
+
+#include "common.h"
+
+ENTRY
+    ldr x19, =0x1122334455667788
+
+    ldr x20, =0xFFFFFFFFFFFFFFFF
+    ubfx x20, x19, 8, 16
+    ASSERT_EQ(x20, 0x0000000000006677)
+
+    ldr x20, =0xFFFFFFFFFFFFFFFF
+    ubfx x20, x19, 8, 32
+    ASSERT_EQ(x20, 0x0000000044556677)
+EXIT
diff --git a/userland/arch/aarch64/x31.S b/userland/arch/aarch64/x31.S
new file mode 100644
index 0000000..00b9432
--- /dev/null
+++ b/userland/arch/aarch64/x31.S
@@ -0,0 +1,51 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-aarch64-x31-register */
+
+#include "common.h"
+
+ENTRY
+    /* ERROR: can never use the name x31. */
+#if 0
+    mov x31, 31
+#endif
+
+    /* mov (register) is an alias for ORR, which accepts xzr. */
+    mov x19, 1
+    mov x19, xzr
+    ASSERT_EQ(x19, 0)
+
+    /* Same encoding as the mov version. */
+    mov x19, 1
+    orr x19, xzr, xzr
+    ASSERT_EQ(x19, 0)
+
+    /* So, orr, which is not an alias, can only take xzr, not sp. */
+#if 0
+    orr sp, sp, sp
+#endif
+
+    /* Zero register discards result if written to. */
+    mov x19, 1
+    orr xzr, x19, x19
+    ASSERT_EQ(xzr, 0)
+
+    /* MOV (to/from SP) is an alias for ADD (immediate). */
+    mov x19, sp
+    mov sp, 1
+    /* Alias to add. */
+    mov x20, sp
+    /* Exact same encoding as above. */
+    add x20, sp, 0
+    mov sp, x19
+    ASSERT_EQ(x20, 1)
+
+    /* So, ADD (immediate), which is not an alias, can only take sp, not xzr. */
+#if 0
+    /* Error: integer register expected in the extended/shifted operand register at operand 3 -- `add xzr,xzr,1' */
+    add xzr, xzr, 1
+#endif
+
+    /* Note however that ADD (register), unlike ADD (immediate),
+     * does not say anything about SP, and so does accept xzr just fine.
+     */
+    add xzr, xzr, xzr
+EXIT
diff --git a/userland/arch/arm/add.S b/userland/arch/arm/add.S
new file mode 100644
index 0000000..5bec4f9
--- /dev/null
+++ b/userland/arch/arm/add.S
@@ -0,0 +1,58 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-data-processing-instructions */
+
+#include "common.h"
+
+ENTRY
+
+    /* Immediate encoding.
+     *
+     * r1 = r0 + 2
+     */
+    mov r0, 1
+    /* r1 = r0 + 2 */
+    add r1, r0, 2
+    ASSERT_EQ(r1, 3)
+
+    /* If src == dest, we can omit one of them.
+     *
+     * r0 = r0 + 2
+     */
+    mov r0, 1
+    add r0, 2
+    ASSERT_EQ(r0, 3)
+
+    /* Same as above but explicit. */
+    mov r0, 1
+    add r0, r0, 2
+    ASSERT_EQ(r0, 3)
+
+#if 0
+    /* But we cannot omit the register if there is a shift when using .syntax unified:
+     * https://github.com/cirosantilli/linux-kernel-module-cheat#shift-suffixes
+     */
+    .syntax unified
+    /* Error: garbage following instruction */
+    add r0, r1, lsl 1
+    /* OK */
+    add r0, r0, r1, lsl 1
+#endif
+
+    /* Register encoding.
+     *
+     * r2 = r0 + r1
+     */
+    mov r0, 1
+    mov r1, 2
+    add r2, r0, r1
+    ASSERT_EQ(r2, 3)
+
+    /* Register encoding, omit implicit register.
+     *
+     * r1 = r1 + r0
+     */
+    mov r0, 1
+    mov r1, 2
+    add r1, r0
+    ASSERT_EQ(r1, 3)
+
+EXIT
diff --git a/userland/arch/arm/address_modes.S b/userland/arch/arm/address_modes.S
new file mode 100644
index 0000000..66e2ad1
--- /dev/null
+++ b/userland/arch/arm/address_modes.S
@@ -0,0 +1,51 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-addressing-modes */
+
+#include "common.h"
+
+ENTRY
+
+    /* Offset mode with immediate. Add 4 to the address register,
+     * which ends up reading myvar6 instead of myvar.
+     */
+    adr r4, myvar
+    ldr r5, [r4, 4]
+    ASSERT_EQ(r5, 0x9ABCDEF0)
+    /* r4 was not modified. */
+    ASSERT_EQ(r4, myvar)
+
+    /* Pre-indexed mode: modify register, then use it. */
+    adr r4, myvar
+    ldr r5, [r4, 4]!
+    ASSERT_EQ(r5, 0x9ABCDEF0)
+    /* r4 was modified. */
+    ASSERT_EQ(r4, myvar6)
+
+    /* Post-indexed mode: use register, then modify it. */
+    adr r4, myvar
+    ldr r5, [r4], 4
+    ASSERT_EQ(r5, 0x12345678)
+    /* r4 was modified. */
+    ASSERT_EQ(r4, myvar6)
+
+    /* Offset in register. */
+    adr r4, myvar
+    mov r5, 4
+    ldr r6, [r4, r5]
+    ASSERT_EQ(r6, 0x9ABCDEF0)
+
+    /* Offset in shifted register:
+     * r6 =
+     * *(r4 + (r5 << 1))
+     * == *(myvar + (2 << 1))
+     * == *(myvar + 4)
+     */
+    adr r4, myvar
+    mov r5, 2
+    ldr r6, [r4, r5, lsl 1]
+    ASSERT_EQ(r6, 0x9ABCDEF0)
+
+EXIT
+myvar:
+    .word 0x12345678
+myvar6:
+    .word 0x9ABCDEF0
diff --git a/userland/arch/arm/adr.S b/userland/arch/arm/adr.S
new file mode 100644
index 0000000..13e1638
--- /dev/null
+++ b/userland/arch/arm/adr.S
@@ -0,0 +1,33 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-adr-instruction */
+
+#include "common.h"
+
+.data
+data_label:
+    .word 0x1234678
+ENTRY
+    adr r4, label
+    /* objdump tells us that this uses the literal pool,
+     * it does not get converted to adr, which is the better
+     * alternative here.
+     */
+    ldr r5, =label
+    adrl r6, label
+label:
+    ASSERT_EQ_REG(r4, r5)
+    ASSERT_EQ_REG(r4, r6)
+
+#if 0
+    /* Error: symbol .data is in a different section.
+     *
+     * It works however in ARMv8.
+     * I think this means that there is no relocation type
+     * that takes care of this encoding in ARMv7, but there
+     * is one in ARMv8.
+     *
+     * If you have no idea what I'm talking about, read this:
+     * https://stackoverflow.com/questions/3322911/what-do-linkers-do/33690144#33690144
+     */
+    adr r5, data_label
+#endif
+EXIT
diff --git a/userland/arch/arm/and.S b/userland/arch/arm/and.S
new file mode 100644
index 0000000..0e39161
--- /dev/null
+++ b/userland/arch/arm/and.S
@@ -0,0 +1,27 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-bitwise-instructions */
+
+#include "common.h"
+
+ENTRY
+
+    /* 0x00 & 0xFF == 0x00 */
+    mov r0, 0x00
+    and r0, 0xFF
+    ASSERT_EQ(r0, 0x00)
+
+    /* 0x0F & 0xF0 == 0x00 */
+    mov r0, 0x0F
+    and r0, 0xF0
+    ASSERT_EQ(r0, 0x00)
+
+    /* 0x0F & 0xFF == 0x0F */
+    mov r0, 0x0F
+    and r0, 0xFF
+    ASSERT_EQ(r0, 0x0F)
+
+    /* 0xF0 & 0xFF == 0xF0 */
+    mov r0, 0xF0
+    and r0, 0xFF
+    ASSERT_EQ(r0, 0xF0)
+
+EXIT
diff --git a/userland/arch/arm/b.S b/userland/arch/arm/b.S
new file mode 100644
index 0000000..bf7ab03
--- /dev/null
+++ b/userland/arch/arm/b.S
@@ -0,0 +1,9 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-b-instruction */
+
+#include "common.h"
+ENTRY
+    /* Jump over the fail. 26-bit PC-relative. */
+    b ok
+    FAIL
+ok:
+EXIT
diff --git a/userland/arch/arm/beq.S b/userland/arch/arm/beq.S
new file mode 100644
index 0000000..3c9564f
--- /dev/null
+++ b/userland/arch/arm/beq.S
@@ -0,0 +1,28 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-beq-instruction */
+
+#include "common.h"
+
+ENTRY
+
+    /* Smaller. */
+    mov r0, 1
+    cmp r0, 2
+    ASSERT(ble)
+    ASSERT(blt)
+    ASSERT(bne)
+
+    /* Equal. */
+    mov r1, 0
+    cmp r1, 0
+    ASSERT(beq)
+    ASSERT(bge)
+    ASSERT(ble)
+
+    /* Greater. */
+    mov r0, 2
+    cmp r0, 1
+    ASSERT(bge)
+    ASSERT(bgt)
+    ASSERT(bne)
+
+EXIT
diff --git a/userland/arch/arm/bfi.S b/userland/arch/arm/bfi.S
new file mode 100644
index 0000000..0f15c16
--- /dev/null
+++ b/userland/arch/arm/bfi.S
@@ -0,0 +1,10 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-bfi-instruction */
+
+#include "common.h"
+
+ENTRY
+    ldr r0, =0x11223344
+    ldr r1, =0xFFFFFFFF
+    bfi r1, r0, 8, 16
+    ASSERT_EQ(r1, 0xFF3344FF)
+EXIT
diff --git a/userland/arch/arm/bic.S b/userland/arch/arm/bic.S
new file mode 100644
index 0000000..a1aff45
--- /dev/null
+++ b/userland/arch/arm/bic.S
@@ -0,0 +1,10 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-bic-instruction */
+
+#include "common.h"
+
+ENTRY
+    /* 0x0F & ~0x55 == 0x0F & 0xAA == 0x0A */
+    mov r0, 0x0F
+    bic r0, 0x55
+    ASSERT_EQ(r0, 0x0A)
+EXIT
diff --git a/userland/arch/arm/bl.S b/userland/arch/arm/bl.S
new file mode 100644
index 0000000..4dbbdef
--- /dev/null
+++ b/userland/arch/arm/bl.S
@@ -0,0 +1,14 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-bl-instruction */
+
+#include "common.h"
+
+ENTRY
+    mov r0, 1
+    bl inc
+    ASSERT_EQ(r0, 2)
+EXIT
+
+/* void inc(int *i) { (*i)++; } */
+inc:
+    add r0, 1
+    bx lr
diff --git a/userland/arch/arm/build b/userland/arch/arm/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/arm/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/arm/c/add.c b/userland/arch/arm/c/add.c
new file mode 100644
index 0000000..d46acf3
--- /dev/null
+++ b/userland/arch/arm/c/add.c
@@ -0,0 +1,20 @@
+/* 1 + 2 == 3
+ *
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly
+ */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    uint32_t in0 = 1, in1 = 2, out;
+    __asm__ (
+        "add %[out], %[in0], %[in1];"
+        : [out] "=r" (out)
+        : [in0] "r"  (in0),
+          [in1] "r"  (in1)
+    );
+    assert(in0 == 1);
+    assert(in1 == 2);
+    assert(out == 3);
+}
diff --git a/userland/arch/arm/c/build b/userland/arch/arm/c/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/arm/c/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/arm/c/freestanding/build b/userland/arch/arm/c/freestanding/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/arm/c/freestanding/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/arm/c/freestanding/linux/build b/userland/arch/arm/c/freestanding/linux/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/arm/c/freestanding/linux/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/arm/c/freestanding/linux/hello.c b/userland/arch/arm/c/freestanding/linux/hello.c
new file mode 100644
index 0000000..d27344a
--- /dev/null
+++ b/userland/arch/arm/c/freestanding/linux/hello.c
@@ -0,0 +1,40 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#linux-system-calls
+ *
+ * arm freestanding C inline assembly Linux hello world.
+ */
+
+#include <inttypes.h>
+
+void _start(void) {
+    uint32_t exit_status;
+
+    /* write */
+    {
+        char msg[] = "hello\n";
+        uint32_t syscall_return;
+        register uint32_t r0 __asm__ ("r0") = 1; /* stdout */
+        register char *r1 __asm__ ("r1") = msg;
+        register uint32_t r2 __asm__ ("r2") = sizeof(msg);
+        register uint32_t r7 __asm__ ("r7") = 4; /* syscall number */
+        __asm__ __volatile__ (
+            "svc 0;"
+            : "+r" (r0)
+            : "r" (r1), "r" (r2), "r" (r7)
+            : "memory"
+        );
+        syscall_return = r0;
+        exit_status = (syscall_return != sizeof(msg));
+    }
+
+    /* exit */
+    {
+        register uint32_t r0 __asm__ ("r0") = exit_status;
+        register uint32_t r7 __asm__ ("r7") = 1;
+        __asm__ __volatile__ (
+            "svc 0;"
+            : "+r" (r0)
+            : "r" (r7)
+            :
+        );
+    }
+}
diff --git a/userland/arch/arm/c/freestanding/linux/test b/userland/arch/arm/c/freestanding/linux/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/arm/c/freestanding/linux/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/arm/c/freestanding/test b/userland/arch/arm/c/freestanding/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/arm/c/freestanding/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/arm/c/inc.c b/userland/arch/arm/c/inc.c
new file mode 100644
index 0000000..9649b13
--- /dev/null
+++ b/userland/arch/arm/c/inc.c
@@ -0,0 +1,18 @@
+/* Increment a variable in inline assembly.
+ *
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly
+ */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    uint32_t my_local_var = 1;
+    __asm__ (
+        "add %[my_local_var], %[my_local_var], #1;"
+        : [my_local_var] "+r" (my_local_var)
+        :
+        :
+    );
+    assert(my_local_var == 2);
+}
diff --git a/userland/arch/arm/c/inc_float.c b/userland/arch/arm/c/inc_float.c
new file mode 100644
index 0000000..8de4630
--- /dev/null
+++ b/userland/arch/arm/c/inc_float.c
@@ -0,0 +1,28 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly-floating-point-arm */
+
+#include <assert.h>
+
+int main(void) {
+    float my_float = 1.5;
+    __asm__ (
+        "vmov s0, 1.0;"
+        "vadd.f32 %[my_float], %[my_float], s0;"
+        : [my_float] "+t" (my_float)
+        :
+        : "s0"
+    );
+    assert(my_float == 2.5);
+
+    /* Undocumented %P
+     * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89482
+     */
+    double my_double = 1.5;
+    __asm__ (
+        "vmov.f64 d0, 1.0;"
+        "vadd.f64 %P[my_double], %P[my_double], d0;"
+        : [my_double] "+w" (my_double)
+        :
+        : "d0"
+    );
+    assert(my_double == 2.5);
+}
diff --git a/userland/arch/arm/c/inc_memory.c b/userland/arch/arm/c/inc_memory.c
new file mode 100644
index 0000000..4dd67a1
--- /dev/null
+++ b/userland/arch/arm/c/inc_memory.c
@@ -0,0 +1,34 @@
+/* Like inc.c but less good since we do more work ourselves.
+ *
+ * Just doing this to test out the "m" memory constraint.
+ *
+ * GCC 8.2.0 -O0 assembles ldr line to:
+ *
+ * ....
+ * ldr r0, [fp, #-12]
+ * ....
+ *
+ * and `-O3` assembles to:
+ *
+ * ....
+ * ldr r0, [sp]
+ * ....
+ *
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly
+ */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    uint32_t my_local_var = 1;
+    __asm__ (
+        "ldr r0, %[my_local_var];"
+        "add r0, r0, #1;"
+        "str r0, %[my_local_var];"
+        : [my_local_var] "+m" (my_local_var)
+        :
+        : "r0"
+    );
+    assert(my_local_var == 2);
+}
diff --git a/userland/arch/arm/c/inc_memory_global.c b/userland/arch/arm/c/inc_memory_global.c
new file mode 100644
index 0000000..41477e0
--- /dev/null
+++ b/userland/arch/arm/c/inc_memory_global.c
@@ -0,0 +1,27 @@
+/* GCC 8.2.0 -O0 and -O3 assemble the ldr line to:
+ *
+ * ....
+ * movw r3, #<lower address part>
+ * movt r3, #<higher address part>
+ * ldr r0, [r3]
+ * ....
+ *
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly
+ */
+
+#include <assert.h>
+#include <inttypes.h>
+
+uint32_t my_global_var = 1;
+
+int main(void) {
+    __asm__ (
+        "ldr r0, %[my_global_var];"
+        "add r0, r0, #1;"
+        "str r0, %[my_global_var];"
+        : [my_global_var] "+m" (my_global_var)
+        :
+        : "r0"
+    );
+    assert(my_global_var == 2);
+}
diff --git a/userland/arch/arm/c/reg_var.c b/userland/arch/arm/c/reg_var.c
new file mode 100644
index 0000000..7629dd5
--- /dev/null
+++ b/userland/arch/arm/c/reg_var.c
@@ -0,0 +1,38 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly-register-variables */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    register uint32_t r0 __asm__ ("r0");
+    register uint32_t r1 __asm__ ("r1");
+    uint32_t new_r0;
+    uint32_t new_r1;
+    {
+        /* We must set the registers immediately before calling,
+         * without making any function calls in between.
+         */
+        r0 = 1;
+        r1 = 2;
+        __asm__ (
+            /* We intentionally use an explicit r0 and r1 here,
+            * just to illustrate that we are certain that the
+            * r0 variable will go in r0. Real code would never do this.
+            */
+            "add %[r0], r0, #1;"
+            "add %[r1], r1, #1;"
+            /* We have to specify r0 in the constraints. */
+            : [r0] "+r" (r0),
+              [r1] "+r" (r1)
+            :
+            :
+        );
+        /* When we are done, we must immediately assign
+         * the register variables to regular variables.
+         */
+        new_r0 = r0;
+        new_r1 = r1;
+    }
+    assert(new_r0 == 2);
+    assert(new_r1 == 3);
+}
diff --git a/userland/arch/arm/c/test b/userland/arch/arm/c/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/arm/c/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/arm/clz.S b/userland/arch/arm/clz.S
new file mode 100644
index 0000000..058e319
--- /dev/null
+++ b/userland/arch/arm/clz.S
@@ -0,0 +1,17 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#data-bitwise-instructions */
+
+#include "common.h"
+
+ENTRY
+    ldr r0, =0x7FFFFFFF
+    clz r1, r0
+    ASSERT_EQ(r1, 1)
+
+    ldr r0, =0x3FFFFFFF
+    clz r1, r0
+    ASSERT_EQ(r1, 2)
+
+    ldr r0, =0x1FFFFFFF
+    clz r1, r0
+    ASSERT_EQ(r1, 3)
+EXIT
diff --git a/userland/arch/arm/comments.S b/userland/arch/arm/comments.S
new file mode 100644
index 0000000..1477567
--- /dev/null
+++ b/userland/arch/arm/comments.S
@@ -0,0 +1,14 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gnu-gas-assembler-comments */
+
+#include "common.h"
+ENTRY
+    # mycomment
+    @ mycomment
+    /* # only works at the beginning of the line.
+     * Error: garbage following instruction -- `nop #comment'
+     */
+#if 0
+    nop # mycomment
+#endif
+    nop @ mycomment
+EXIT
diff --git a/userland/arch/arm/common_arch.h b/userland/arch/arm/common_arch.h
new file mode 100644
index 0000000..ab65b3a
--- /dev/null
+++ b/userland/arch/arm/common_arch.h
@@ -0,0 +1,90 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly-c-standard-library */
+
+#ifndef COMMON_ARCH_H
+#define COMMON_ARCH_H
+
+.syntax unified
+
+/* Assert that a register equals a constant.
+ * * reg: the register to check
+ * * const: the constant to compare to. Only works for literals or labels, not for registers.
+ *          For register / register comparison, use ASSERT_EQ_REG.
+ */
+#define ASSERT_EQ(reg, const) \
+    mov r0, reg; \
+    ldr r1, =const; \
+    ASSERT_EQ_DO; \
+;
+
+#define ASSERT_EQ_DO \
+    bl assert_eq_32; \
+    cmp r0, 0; \
+    ASSERT(beq); \
+;
+
+#define ASSERT_EQ_REG(reg1, reg2) \
+    str reg2, [sp, -4]!; \
+    mov r0, reg1; \
+    ldr r1, [sp], 4; \
+    ASSERT_EQ_DO; \
+;
+
+/* Assert that two arrays are the same. */
+#define ASSERT_MEMCMP(label1, label2, const_size) \
+    ldr r0, =label1; \
+    ldr r1, =label2; \
+    ldr r2, =const_size; \
+    bl assert_memcmp; \
+    cmp r0, 0; \
+    ASSERT(beq); \
+;
+
+/* Store all callee saved registers, and LR in case we make further BL calls.
+ *
+ * Also save the input arguments r0-r3 on the stack, so we can access them later on,
+ * despite those registers being overwritten.
+ */
+#define ENTRY \
+.text; \
+.global asm_main; \
+asm_main: \
+    stmdb sp!, {r0-r12, lr}; \
+asm_main_after_prologue: \
+;
+
+/* Meant to be called at the end of ENTRY.
+ *
+ * Branching to "fail" makes tests fail with exit status 1.
+ *
+ * If EXIT is reached, the program ends successfully.
+ *
+ * Restore LR and bx jump to it to return from asm_main.
+ */
+#define EXIT \
+    mov r0, 0; \
+    mov r1, 0; \
+    b pass; \
+fail: \
+    ldr r1, [sp]; \
+    str r0, [r1]; \
+    mov r0, 1; \
+pass: \
+    add sp, 16; \
+    ldmia sp!, {r4-r12, lr}; \
+    bx lr; \
+;
+
+/* Always fail. */
+#define FAIL \
+    ldr r0, =__LINE__; \
+    b fail; \
+;
+
+#define MEMCMP(s1, s2, n) \
+    ldr r0, =s1; \
+    ldr r1, =s2; \
+    ldr r2, =n; \
+    bl memcmp; \
+;
+
+#endif
diff --git a/userland/arch/arm/cond.S b/userland/arch/arm/cond.S
new file mode 100644
index 0000000..25098e7
--- /dev/null
+++ b/userland/arch/arm/cond.S
@@ -0,0 +1,16 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-conditional-execution */
+
+#include "common.h"
+
+ENTRY
+    mov r0, 0
+    mov r1, 1
+    cmp r0, 1
+    /* Previous cmp failed, skip this operation. */
+    addeq r1, 1
+    ASSERT_EQ(r1, 1)
+    cmp r0, 0
+    /* Previous passed, do this operation. */
+    addeq r1, 1
+    ASSERT_EQ(r1, 2)
+EXIT
diff --git a/userland/arch/arm/freestanding/build b/userland/arch/arm/freestanding/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/arm/freestanding/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/arm/freestanding/linux/build b/userland/arch/arm/freestanding/linux/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/arm/freestanding/linux/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/arm/freestanding/linux/hello.S b/userland/arch/arm/freestanding/linux/hello.S
new file mode 100644
index 0000000..3ef842a
--- /dev/null
+++ b/userland/arch/arm/freestanding/linux/hello.S
@@ -0,0 +1,23 @@
+/* arm freestanding Linux hello world
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#linux-system-calls
+ */
+
+.syntax unified
+.text
+.global _start
+_start:
+asm_main_after_prologue:
+    /* write */
+    mov r0, 1     /* stdout */
+    adr r1, msg   /* buffer */
+    ldr r2, =len  /* len */
+    mov r7, 4     /* syscall number */
+    svc 0
+
+    /* exit */
+    mov r0, 0     /* exit status */
+    mov r7, 1     /* syscall number */
+    svc 0
+msg:
+    .ascii "hello\n"
+len = . - msg
diff --git a/userland/arch/arm/freestanding/linux/test b/userland/arch/arm/freestanding/linux/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/arm/freestanding/linux/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/arm/freestanding/test b/userland/arch/arm/freestanding/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/arm/freestanding/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/arm/gas_data_sizes.S b/userland/arch/arm/gas_data_sizes.S
new file mode 100644
index 0000000..4e7f91d
--- /dev/null
+++ b/userland/arch/arm/gas_data_sizes.S
@@ -0,0 +1,30 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gnu-gas-assembler-data-sizes */
+
+#include "common.h"
+
+.data
+mybyte:
+    .byte 0x12
+myword:
+    .word 0x1234
+mylong:
+    .long 0x12345678
+myquad:
+    .quad 0x123456789ABCDEF0
+myocta:
+    .octa 0x123456789ABCDEF0123456789ABCDEF0
+theend:
+ENTRY
+#define ASSERT_DIFF(label1, label2, result) \
+    ldr r0, =label1; \
+    ldr r1, =label2; \
+    sub r0, r1, r0; \
+    ASSERT_EQ(r0, result)
+
+    ASSERT_DIFF(mybyte, myword, 1)
+    ASSERT_DIFF(myword, mylong, 4)
+    ASSERT_DIFF(mylong, myquad, 4)
+    ASSERT_DIFF(myquad, myocta, 8)
+    ASSERT_DIFF(myocta, theend, 16)
+#undef ASSERT_DIFF
+EXIT
diff --git a/userland/arch/arm/immediates.S b/userland/arch/arm/immediates.S
new file mode 100644
index 0000000..0959a0c
--- /dev/null
+++ b/userland/arch/arm/immediates.S
@@ -0,0 +1,24 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-gnu-instruction-gas-assembler-immediates */
+
+#include "common.h"
+
+ENTRY
+    /* This is the default. We hack it in common.h however. */
+.syntax divided
+    /* These fail. */
+#if 0
+    mov r0, 1
+    mov r0, 0x1
+#endif
+    mov r0, #1
+    mov r0, #0x1
+    mov r0, $1
+    mov r0, $0x1
+.syntax unified
+    mov r0, 1
+    mov r0, 0x1
+    mov r0, 1
+    mov r0, 0x1
+    mov r0, $1
+    mov r0, $0x1
+EXIT
diff --git a/userland/arch/arm/inc_array.S b/userland/arch/arm/inc_array.S
new file mode 100644
index 0000000..6c1711c
--- /dev/null
+++ b/userland/arch/arm/inc_array.S
@@ -0,0 +1,27 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-loop-instruction-over-array */
+
+#include "common.h"
+
+#define NELEM 4
+#define ELEM_SIZE 4
+
+.data;
+my_array:
+    .word 0x11111111, 0x22222222, 0x33333333, 0x44444444
+my_array_expect:
+    .word 0x11111112, 0x22222223, 0x33333334, 0x44444445
+
+ENTRY
+    /* Increment. */
+    ldr r0, =my_array
+    mov r1, NELEM
+increment:
+    ldr r2, [r0]
+    add r2, 1
+    /* Post index usage. */
+    str r2, [r0], ELEM_SIZE
+    sub r1, 1
+    cmp r1, 0
+    bne increment
+    ASSERT_MEMCMP(my_array, my_array_expect, 0x10)
+EXIT
diff --git a/userland/arch/arm/ldmia.S b/userland/arch/arm/ldmia.S
new file mode 100644
index 0000000..80ea3b4
--- /dev/null
+++ b/userland/arch/arm/ldmia.S
@@ -0,0 +1,61 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ldmia-instruction */
+
+#include "common.h"
+
+#define NELEM 4
+#define ELEM_SIZE 4
+
+.data;
+my_array_0:
+    .word 0x11111111, 0x22222222, 0x33333333, 0x44444444
+my_array_1:
+    .word 0x55555555, 0x66666666, 0x77777777, 0x88888888
+
+ENTRY
+
+    /* Load r5, r6, r7 and r8 starting from the address in r4. Don't change r4 */
+    ldr r4, =my_array_0
+    ldr r5, =0
+    ldr r6, =0
+    ldr r7, =0
+    ldr r8, =0
+    ldmia r4, {r5-r8}
+    ASSERT_EQ(r4, my_array_0)
+    ASSERT_EQ(r5, 0x11111111)
+    ASSERT_EQ(r6, 0x22222222)
+    ASSERT_EQ(r7, 0x33333333)
+    ASSERT_EQ(r8, 0x44444444)
+
+    /* Swapping the order of r5 and r6 on the mnemonic makes no difference to load order.
+     *
+     * But it gives an assembler warning, so we won't do it by default:
+     *
+     *  ldmia.S: Assembler messages:
+     *  ldmia.S:32: Warning: register range not in ascending order
+     */
+#if 0
+    ldr r4, =my_array_0
+    ldr r5, =0
+    ldr r6, =0
+    ldmia r4, {r6,r5}
+    ASSERT_EQ(r5, 0x11111111)
+    ASSERT_EQ(r6, 0x22222222)
+#endif
+
+    /* Modify my_array_0: stmdb stores r5-r8 into the 16 bytes just below r4 (my_array_1). */
+    ldr r4, =my_array_1
+    ldr r5, =0x55555555
+    ldr r6, =0x66666666
+    ldr r7, =0x77777777
+    ldr r8, =0x88888888
+    stmdb r4, {r5-r8}
+
+    /* Verify that my_array_0 changed and is equal to my_array_1. */
+    ASSERT_MEMCMP(my_array_0, my_array_1, 0x10)
+
+    /* Load registers and increment r4. */
+    ldr r4, =my_array_0
+    ldmia r4!, {r5-r8}
+    ASSERT_EQ(r4, my_array_1)
+
+EXIT
diff --git a/userland/arch/arm/ldr_pseudo.S b/userland/arch/arm/ldr_pseudo.S
new file mode 100644
index 0000000..800444a
--- /dev/null
+++ b/userland/arch/arm/ldr_pseudo.S
@@ -0,0 +1,65 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ldr-instruction-pseudo-instruction */
+
+#include "common.h"
+
+ENTRY
+
+    /* Mnemonic for a PC relative load:
+     *
+     * ....
+     * ldr r0, [pc, offset]
+     * r0 = myvar
+     * ....
+     */
+    ldr r0, myvar
+    ASSERT_EQ(r0, 0x12345678)
+
+    /* Mnemonic PC relative load with an offset.
+     * Load myvar2 instead of myvar.
+     */
+    ldr r0, myvar + 4
+    ASSERT_EQ(r0, 0x9ABCDEF0)
+
+    /* First store the address in r0 using a magic =myvar, which creates
+     * a new variable containing the address and PC-relative addresses it
+     * https://stackoverflow.com/questions/17214962/what-is-the-difference-between-label-equals-sign-and-label-brackets-in-ar
+     *
+     * Using the adr instruction would likely be better for this application however.
+     *
+     * ....
+     * r0 = &myvar
+     * r1 = *r0
+     * ....
+     */
+    ldr r0, =myvar
+    ldr r1, [r0]
+    ASSERT_EQ(r1, 0x12345678)
+
+    /* More efficiently, use r0 as the address to read, and write to r0 itself. */
+    ldr r0, =myvar
+    ldr r0, [r0]
+    ASSERT_EQ(r0, 0x12345678)
+
+    /* Same as =myvar but store a constant to a register.
+     * Can also be done with movw and movt. */
+    ldr r0, =0x11112222
+    ASSERT_EQ(r0, 0x11112222)
+
+    /* We can also use the GAS :lower16: and :upper16: operators with movw and movt
+     * to load the address of myvar into r0 with two immediates.
+     *
+     * This results in one extra 4 byte instruction read from memory,
+     * and one less data read, so it is likely more cache efficient.
+     *
+     * https://sourceware.org/binutils/docs-2.19/as/ARM_002dRelocations.html
+     */
+    movw r0, #:lower16:myvar
+    movt r0, #:upper16:myvar
+    ldr r1, [r0]
+    ASSERT_EQ(r1, 0x12345678)
+
+EXIT
+myvar:
+    .word 0x12345678
+myvar2:
+    .word 0x9ABCDEF0
diff --git a/userland/arch/arm/ldrb.S b/userland/arch/arm/ldrb.S
new file mode 100644
index 0000000..569a6fb
--- /dev/null
+++ b/userland/arch/arm/ldrb.S
@@ -0,0 +1,12 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ldrh-instruction-and-ldrb */
+
+#include "common.h"
+
+ENTRY
+    ldr r0, =myvar
+    mov r1, 0x0
+    ldrb r1, [r0]
+    ASSERT_EQ(r1, 0x00000078)
+EXIT
+myvar:
+    .word 0x12345678
diff --git a/userland/arch/arm/ldrh.S b/userland/arch/arm/ldrh.S
new file mode 100644
index 0000000..99d00bc
--- /dev/null
+++ b/userland/arch/arm/ldrh.S
@@ -0,0 +1,12 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ldrh-instruction-and-ldrb */
+
+#include "common.h"
+
+ENTRY
+    ldr r0, =myvar
+    mov r1, 0x0
+    ldrh r1, [r0]
+    ASSERT_EQ(r1, 0x00005678)
+EXIT
+myvar:
+    .word 0x12345678
diff --git a/userland/arch/arm/linux/build b/userland/arch/arm/linux/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/arm/linux/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/arm/linux/c_from_asm.S b/userland/arch/arm/linux/c_from_asm.S
new file mode 100644
index 0000000..c469f74
--- /dev/null
+++ b/userland/arch/arm/linux/c_from_asm.S
@@ -0,0 +1,59 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#calling-convention */
+
+#include "common.h"
+
+.data
+puts_s:
+    .asciz "hello puts"
+printf_format:
+    .asciz "hello printf %x\n"
+my_array_0:
+    .word 0x11111111, 0x22222222, 0x33333333, 0x44444444
+my_array_1:
+    .word 0x55555555, 0x66666666, 0x77777777, 0x88888888
+
+ENTRY
+    /* puts("hello puts") */
+    /* r0 is first argument. */
+    ldr r0, =puts_s
+    bl puts
+    /* Check return value >= 0 for success. */
+    cmp r0, 0
+    ASSERT(bge)
+
+    /* printf */
+    ldr r0, =printf_format
+    ldr r1, =0x12345678
+    bl printf
+    cmp r0, 0
+    ASSERT(bge)
+
+    /* memcpy and memcmp. */
+
+        /* Smaller. */
+        ldr r0, =my_array_0
+        ldr r1, =my_array_1
+        ldr r2, =0x10
+        bl memcmp
+        cmp r0, 0
+        ASSERT(blt)
+
+        /* Copy. */
+        ldr r0, =my_array_0
+        ldr r1, =my_array_1
+        ldr r2, =0x10
+        bl memcpy
+
+        /* Equal. */
+        ldr r0, =my_array_0
+        ldr r1, =my_array_1
+        ldr r2, =0x10
+        bl memcmp
+        ASSERT_EQ(r0, 0)
+
+    /* exit(0) */
+    mov r0, 0
+    bl exit
+
+    /* Never reached, just for the fail symbol. */
+EXIT
diff --git a/userland/arch/arm/linux/test b/userland/arch/arm/linux/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/arm/linux/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/arm/mov.S b/userland/arch/arm/mov.S
new file mode 100644
index 0000000..c2a7477
--- /dev/null
+++ b/userland/arch/arm/mov.S
@@ -0,0 +1,19 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-mov-instruction */
+
+#include "common.h"
+
+ENTRY
+
+    /* Immediate. */
+    mov r0, 0
+    ASSERT_EQ(r0, 0)
+    mov r0, 1
+    ASSERT_EQ(r0, 1)
+
+    /* Register. */
+    mov r0, 0
+    mov r1, 1
+    mov r1, r0
+    ASSERT_EQ(r1, 0)
+
+EXIT
diff --git a/userland/arch/arm/movw.S b/userland/arch/arm/movw.S
new file mode 100644
index 0000000..4fa8d39
--- /dev/null
+++ b/userland/arch/arm/movw.S
@@ -0,0 +1,27 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-movw-and-movt-instructions */
+
+#include "common.h"
+
+ENTRY
+
+    /* movt (top) and movw (TODO what is w) set the higher
+     * and lower 16 bits of the register.
+     */
+    movw r0, 0xFFFF
+    movt r0, 0x1234
+    add r0, 1
+    ASSERT_EQ(r0, 0x12350000)
+
+    /* movw also zeroes out the top bits, allowing small 16-bit
+     * C constants to be assigned in a single instruction.
+     *
+     * It differs from mov because mov can only encode 8 bits
+     * at a time, while movw can encode 16.
+     *
+     * movt does not modify the lower bits however.
+     */
+    ldr r0, =0x12345678
+    movw r0, 0x1111
+    ASSERT_EQ(r0, 0x00001111)
+
+EXIT
diff --git a/userland/arch/arm/mul.S b/userland/arch/arm/mul.S
new file mode 100644
index 0000000..e92538c
--- /dev/null
+++ b/userland/arch/arm/mul.S
@@ -0,0 +1,15 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-data-processing-instructions
+ *
+ * Multiplication.
+ */
+
+#include "common.h"
+
+ENTRY
+    /* 2 * 3 = 6 */
+    mov r0, 0
+    mov r1, 2
+    mov r2, 3
+    mul r1, r2
+    ASSERT_EQ(r1, 6)
+EXIT
diff --git a/userland/arch/arm/nop.S b/userland/arch/arm/nop.S
new file mode 100644
index 0000000..bf0c041
--- /dev/null
+++ b/userland/arch/arm/nop.S
@@ -0,0 +1,32 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-nop-instruction */
+
+#include "common.h"
+
+ENTRY
+    /* Disassembles as:
+     *
+     * ....
+     * nop {0}
+     * ....
+     *
+     * TODO what is the `{0}`?
+     */
+    nop
+
+    /* Disassembles as:
+     *
+     * ....
+     * nop ; (mov r0, r0)
+     * ....
+     */
+    mov r0, r0
+
+    /* These disassemble as mov. TODO Why not as nop as in `mov r0, r0`?
+     * Do they have any effect?
+     */
+    mov r1, r1
+    mov r8, r8
+
+    /* And there are other nops as well? Disassembles as `and`. */
+    and r0, r0, r0
+EXIT
diff --git a/userland/arch/arm/push.S b/userland/arch/arm/push.S
new file mode 100644
index 0000000..6d226f2
--- /dev/null
+++ b/userland/arch/arm/push.S
@@ -0,0 +1,31 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-ldmia-instruction */
+
+#include "common.h"
+
+ENTRY
+
+    /* Save sp before push. */
+    mov r4, sp
+
+    /* Push. */
+    mov r5, 1
+    mov r6, 2
+    push {r5, r6}
+
+    /* Save sp after push. */
+    mov r5, sp
+
+    /* Restore. */
+    mov r7, 0
+    mov r8, 0
+    pop {r7, r8}
+    ASSERT_EQ(r7, 1)
+    ASSERT_EQ(r8, 2)
+
+    /* Check that stack pointer moved down by 8 bytes
+     * (2 registers x 4 bytes each).
+     */
+    sub r4, r5
+    ASSERT_EQ(r4, 8)
+
+EXIT
diff --git a/userland/arch/arm/rbit.S b/userland/arch/arm/rbit.S
new file mode 100644
index 0000000..f8826fd
--- /dev/null
+++ b/userland/arch/arm/rbit.S
@@ -0,0 +1,12 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-data-processing-instructions
+ *
+ * Reverse bit order.
+ */
+
+#include "common.h"
+
+ENTRY
+    ldr r0,      =0b00000001001000110100010101100101
+    rbit r1, r0
+    ASSERT_EQ(r1, 0b10100110101000101100010010000000)
+EXIT
diff --git a/userland/arch/arm/registers.S b/userland/arch/arm/registers.S
new file mode 100644
index 0000000..8545903
--- /dev/null
+++ b/userland/arch/arm/registers.S
@@ -0,0 +1,69 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#assembly-registers */
+
+#include "common.h"
+
+ENTRY
+
+    /* 13 general purpose registers. */
+    mov r0, 0
+    mov r1, 1
+    mov r2, 2
+    mov r3, 3
+    mov r4, 4
+    mov r5, 5
+    mov r6, 6
+    mov r7, 7
+    mov r8, 8
+    mov r9, 9
+    mov r10, 10
+    mov r11, 11
+    mov r12, 12
+
+    /* * r11: aliased to FP (frame pointer, debug stack trace usage only)
+     * +
+     * I think FP is only a convention with no instruction impact, but TODO:
+     * not mentioned on AAPCS. aarch64 AAPCS mentions it though.
+     * * r13: aliased to SP (stack pointer), what push / pop use
+     * * r14: aliased to LR (link register), what bl writes the return address to
+     * * r15: aliased to PC (program counter), contains the current instruction address
+     *
+     * In ARMv7, these are ordinary numbered registers. Therefore, it is
+     * possible to use them in any place other registers may be used.
+     *
+     * This is not possible in ARMv8 aarch64 anymore: SP and PC are
+     * dedicated registers, separate from the 31 general purpose ones
+     * (x0 to x30). LR (x30) is still general purpose as before.
+     *
+     *
+     * For example, we can load an address into PC, which is very similar to what B / BX does:
+     * https://stackoverflow.com/questions/32304646/arm-assembly-branch-to-address-inside-register-or-memory/54145818#54145818
+     */
+    ldr pc, =10f
+    FAIL
+10:
+
+    /* Same with r15, which is the same as pc. */
+    ldr r15, =10f
+    FAIL
+10:
+
+    /* Another example with mov reading from pc. */
+pc_addr:
+    mov r0, pc
+    /* Why sub 8:
+     * https://stackoverflow.com/questions/24091566/why-does-the-arm-pc-register-point-to-the-instruction-after-the-next-one-to-be-e
+     */
+    sub r0, r0, 8
+
+    /* pc-relative loads also work, just like with any other register. */
+    ldr r0, [pc]
+    b 1f
+    .word 0x12345678
+1:
+    ASSERT_EQ(r0, 0x12345678)
+
+    /* We can also use fp in GNU GAS assembly. */
+    mov r11, 0
+    mov fp, 1
+    ASSERT_EQ(r11, 1)
+EXIT
diff --git a/userland/arch/arm/rev.S b/userland/arch/arm/rev.S
new file mode 100644
index 0000000..05cdd3e
--- /dev/null
+++ b/userland/arch/arm/rev.S
@@ -0,0 +1,18 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#data-processing-instructions
+ *
+ * Reverse byte order.
+ */
+
+#include "common.h"
+
+ENTRY
+    /* All bytes in register. */
+    ldr r0, =0x11223344
+    rev r1, r0
+    ASSERT_EQ(r1, 0x44332211)
+
+    /* Groups of 16-bits. */
+    ldr r0, =0x11223344
+    rev16 r1, r0
+    ASSERT_EQ(r1, 0x22114433)
+EXIT
diff --git a/userland/arch/arm/s_suffix.S b/userland/arch/arm/s_suffix.S
new file mode 100644
index 0000000..c014964
--- /dev/null
+++ b/userland/arch/arm/s_suffix.S
@@ -0,0 +1,35 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-s-suffix */
+
+#include "common.h"
+
+ENTRY
+
+    /* Result is 0, so the Z flag is set and beq is taken. */
+    movs r0, 0
+    ASSERT(beq)
+
+    /* The opposite. */
+    movs r0, 1
+    ASSERT(bne)
+
+    /* mov without s does not set the status. */
+    movs r0, 0
+    mov r0, 1
+    ASSERT(beq)
+
+    /* movs still moves... */
+    mov r0, 0
+    movs r0, 1
+    ASSERT_EQ(r0, 1)
+
+    /* add: the result is 0. */
+    mov r0, 1
+    adds r0, -1
+    ASSERT(beq)
+
+    /* add: result non 0. */
+    mov r0, 1
+    adds r0, 1
+    ASSERT(bne)
+
+EXIT
diff --git a/userland/arch/arm/shift.S b/userland/arch/arm/shift.S
new file mode 100644
index 0000000..4a8694e
--- /dev/null
+++ b/userland/arch/arm/shift.S
@@ -0,0 +1,79 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-shift-suffixes */
+
+#include "common.h"
+
+ENTRY
+
+    /* lsl */
+    ldr r0, =0xFFF00FFF
+    mov r1, r0, lsl 8
+    ldr r2, =0xF00FFF00
+    ASSERT_EQ_REG(r1, r2)
+
+    /* lsr */
+    ldr r0, =0xFFF00FFF
+    mov r1, r0, lsr 8
+    ldr r2, =0x00FFF00F
+    ASSERT_EQ_REG(r1, r2)
+
+    /* ror */
+    ldr r0, =0xFFF00FFF
+    mov r1, r0, ror 8
+    ldr r2, =0xFFFFF00F
+    ASSERT_EQ_REG(r1, r2)
+
+    /* asr negative */
+    ldr r0, =0x80000008
+    mov r1, r0, asr 1
+    ldr r2, =0xC0000004
+    ASSERT_EQ_REG(r1, r2)
+
+    /* asr positive */
+    ldr r0, =0x40000008
+    mov r1, r0, asr 1
+    ldr r2, =0x20000004
+    ASSERT_EQ_REG(r1, r2)
+
+    /* There are also direct shift mnemonics for the mov shifts.
+     *
+     * They assemble to the exact same bytes as the mov version.
+     */
+    ldr r0, =0xFFF00FFF
+    lsl r1, r0, 8
+    ldr r2, =0xF00FFF00
+    ASSERT_EQ_REG(r1, r2)
+
+    /* If used with the `mov` instruction, it results in a pure shift,
+     * but the suffixes also exist for all the other data processing instructions.
+     *
+     * Here we illustrate a shifted add instruction which calculates:
+     *
+     * ....
+     * r1 = r1 + (r0 << 1)
+     * ....
+     */
+    ldr r0, =0x10
+    ldr r1, =0x100
+    add r1, r1, r0, lsl 1
+    ldr r2, =0x00000120
+    ASSERT_EQ_REG(r1, r2)
+
+    /* The shift takes up the same encoding slot as the immediate,
+     * therefore it is not possible to both use an immediate and shift.
+     *
+     * Error: shift expression expected -- `add r1,r0,1,lsl#1'
+     */
+#if 0
+    add r1, r0, 1, lsl 1
+#endif
+
+    /* However, you can still encode shifted bitmasks of
+     * limited width in immediates, so why not just use the
+     * assembler pre-processing for it?
+     */
+    ldr r1, =0x100
+    add r1, r1, (0x10 << 1)
+    ldr r2, =0x00000120
+    ASSERT_EQ_REG(r1, r2)
+
+EXIT
diff --git a/userland/arch/arm/str.S b/userland/arch/arm/str.S
new file mode 100644
index 0000000..0f08268
--- /dev/null
+++ b/userland/arch/arm/str.S
@@ -0,0 +1,60 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#load-and-store-instructions */
+
+#include "common.h"
+
+.data;
+    /* Must be in the .data section, since we want to modify it. */
+myvar:
+    .word 0x12345678
+
+ENTRY
+    /* Sanity check. */
+    ldr r0, =myvar
+    ldr r1, [r0]
+    movw r2, 0x5678
+    movt r2, 0x1234
+    ASSERT_EQ_REG(r1, r2)
+
+    /* Modify the value. */
+    ldr r0, =myvar
+    movw r1, 0xDEF0
+    movt r1, 0x9ABC
+    str r1, [r0]
+
+    /* Check that it changed. */
+    ldr r0, =myvar
+    ldr r1, [r0]
+    movw r2, 0xDEF0
+    movt r2, 0x9ABC
+    ASSERT_EQ_REG(r1, r2)
+
+    /* Cannot use PC relative addressing to a different segment,
+     * or else it fails with:
+     *
+     * ....
+     * Error: internal_relocation (type: OFFSET_IMM) not fixed up
+     * ....
+     *
+     * https://stackoverflow.com/questions/10094282/internal-relocation-not-fixed-up
+     */
+    /*ldr r0, myvar*/
+
+#if 0
+    /* We could in theory write this to store to a label in the same section,
+     * but it will always segfault under Linux because the text segment is read-only.
+     * This is however useful in baremetal programming.
+     * This construct is not possible in ARMv8 for str:
+     * https://github.com/cirosantilli/linux-kernel-module-cheat#armv8-str
+     */
+    str r1, var_in_same_section
+var_in_same_section:
+#endif
+
+    /* = sign just doesn't make sense for str, you can't set the
+     * address of a variable.
+     */
+#if 0
+    str r1, =myvar
+#endif
+
+EXIT
diff --git a/userland/arch/arm/sub.S b/userland/arch/arm/sub.S
new file mode 100644
index 0000000..9a5342c
--- /dev/null
+++ b/userland/arch/arm/sub.S
@@ -0,0 +1,14 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-data-processing-instructions
+ *
+ * Subtraction.
+ */
+
+#include "common.h"
+
+ENTRY
+    /* 3 - 2 == 1 , register version.*/
+    mov r0, 3
+    mov r1, 2
+    sub r0, r0, r1
+    ASSERT_EQ(r0, 1)
+EXIT
diff --git a/userland/arch/arm/test b/userland/arch/arm/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/arm/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/arm/thumb.S b/userland/arch/arm/thumb.S
new file mode 100644
index 0000000..6570806
--- /dev/null
+++ b/userland/arch/arm/thumb.S
@@ -0,0 +1,21 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-instruction-encodings
+ *
+ * Illustrates features that are only available in thumb.
+ * TODO ensure that we are actually inside of thumb.
+ */
+
+.syntax unified
+.text
+.thumb_func
+.global asm_main
+asm_main:
+asm_main_after_prologue:
+
+    /* CBZ: compare and branch if zero. Like CMP + BEQ, but does not modify the flags.
+     * TODO create an interesting assertion here.
+     */
+    cbz r1, 1f
+    1:
+
+    mov r0, 0
+    bx lr
diff --git a/userland/arch/arm/tst.S b/userland/arch/arm/tst.S
new file mode 100644
index 0000000..a7dc616
--- /dev/null
+++ b/userland/arch/arm/tst.S
@@ -0,0 +1,22 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-data-processing-instructions
+ *
+ * Test. Same as ands, but don't store the result, just update flags.
+ */
+
+#include "common.h"
+
+ENTRY
+
+    /* 0x0F & 0xF0 == 0x00, so beq. */
+    mov r0, 0x0F
+    tst r0, 0xF0
+    ASSERT(beq)
+
+    /* bne */
+    mov r0, 0xFF
+    tst r0, 0x0F
+    ASSERT(bne)
+    /* r0 was not modified. */
+    ASSERT_EQ(r0, 0xFF)
+
+EXIT
diff --git a/userland/arch/arm/vadd_scalar.S b/userland/arch/arm/vadd_scalar.S
new file mode 100644
index 0000000..84dba0e
--- /dev/null
+++ b/userland/arch/arm/vadd_scalar.S
@@ -0,0 +1,72 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#vfp
+ * Adapted from: https://mindplusplus.wordpress.com/2013/06/27/arm-vfp-vector-programming-part-2-examples/ */
+
+#include "common.h"
+
+ENTRY
+    /* Minimal single precision floating point example.
+     * TODO: floating point representation constraints due to 4-byte instruction?
+     */
+    vmov s0, 1.5
+    vmov s1, 2.5
+    vadd.f32 s2, s0, s1
+    vmov s3, 4.0
+    /* Compare two floating point registers. Stores the result in fpscr
+     * (floating point status and control register).
+     */
+    vcmp.f32 s2, s3
+    /* Move the nzcv bits from fpscr to apsr */
+    vmrs apsr_nzcv, fpscr
+    /* This branch uses the Z bit of apsr, which was set accordingly. */
+    ASSERT(beq)
+
+    /* Now the same from memory with vldr and vstr. */
+.data
+my_float_0:
+    .float 1.5
+my_float_1:
+    .float 2.5
+my_float_sum_expect:
+    .float 4.0
+.bss
+my_float_sum:
+    .skip 4
+.text
+    ldr r0, =my_float_0
+    vldr s0, [r0]
+    ldr r0, =my_float_1
+    vldr s1, [r0]
+    vadd.f32 s2, s0, s1
+    ldr r0, =my_float_sum
+    vstr.f32 s2, [r0]
+    ASSERT_MEMCMP(my_float_sum, my_float_sum_expect, 4)
+
+#if 0
+    /* We can't do pseudo vldr as for ldr, fails with:
+     * Error: cannot represent CP_OFF_IMM relocation in this object file format
+     * It works on ARMv8 however, so the relocation must have been added.
+     */
+    vldr s0, my_float_0
+#endif
+
+    /* Minimal double precision floating point example. */
+    vmov.f64 d0, 1.5
+    vmov.f64 d1, 2.5
+    vadd.f64 d2, d0, d1
+    vmov.f64 d3, 4.0
+    vcmp.f64 d2, d3
+    vmrs apsr_nzcv, fpscr
+    ASSERT(beq)
+
+    /* vmov can also move to general purpose registers.
+     *
+     * Just remember that we can't use float immediates with general purpose registers:
+     * https://stackoverflow.com/questions/6514537/how-do-i-specify-immediate-floating-point-numbers-with-inline-assembly/52906126#52906126
+     */
+    mov r1, 2
+    mov r0, 1
+    vmov s0, r0
+    vmov s1, s0
+    vmov r1, s1
+    ASSERT_EQ_REG(r0, r1)
+EXIT
diff --git a/userland/arch/arm/vadd_vector.S b/userland/arch/arm/vadd_vector.S
new file mode 100644
index 0000000..e688584
--- /dev/null
+++ b/userland/arch/arm/vadd_vector.S
@@ -0,0 +1,71 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-vadd-instruction */
+
+#include "common.h"
+
+.bss
+    output:      .skip 16
+ENTRY
+    /* Integer. */
+.data
+    input0_u:    .long 0xF1F1F1F1, 0xF2F2F2F2, 0xF3F3F3F3, 0xF4F4F4F4
+    input1_u:    .long 0x12121212, 0x13131313, 0x14141414, 0x15151515
+    expect_u_32: .long 0x04040403, 0x06060605, 0x08080807, 0x0A0A0A09
+    expect_u_64: .long 0x04040403, 0x06060606, 0x08080807, 0x0A0A0A0A
+.text
+#define TEST(size) \
+    ldr r0, =input0_u; \
+    vld1. ## size {q0}, [r0]; \
+    ldr r0, =input1_u; \
+    vld1. ## size {q1}, [r0]; \
+    vadd.u ## size q2, q0, q1; \
+    ldr r0, =output; \
+    vst1.u ## size {q2}, [r0]; \
+    ASSERT_MEMCMP(output, expect_u_ ## size, 0x10)
+
+    /* vadd.u32
+     *
+     * Add 4x 32-bit unsigned integers in one go.
+     *
+     * q means quad (128-bits)
+     *
+     * u32 means that we treat memory as uint32_t types.
+     *
+     * 4 is deduced: in 128 bits you can fit 4x u32.
+     */
+    TEST(32)
+    /* 2x 64-bit */
+    TEST(64)
+#undef TEST
+
+    /* Floating point. */
+.data
+    input0_f_32: .float 1.5, 2.5,  3.5,  4.5
+    input1_f_32: .float 5.5, 6.5,  7.5,  8.5
+    expect_f_32: .float 7.0, 9.0, 11.0, 13.0
+    input0_f_64: .double 1.5, 2.5
+    input1_f_64: .double 5.5, 6.5
+    expect_f_64: .double 7.0, 9.0
+.text
+#define TEST(size) \
+    ldr r0, =input0_f_ ## size; \
+    vld1. ## size {q0}, [r0]; \
+    ldr r0, =input1_f_ ## size; \
+    vld1. ## size {q1}, [r0]; \
+    vadd.f ## size q2, q0, q1; \
+    ldr r0, =output; \
+    vst1. ## size {q2}, [r0]; \
+    ASSERT_MEMCMP(output, expect_f_ ## size, 0x10)
+
+    /* 4x 32-bit. */
+    TEST(32)
+#if 0
+    /* vadd.f64: 2x 64-bit float add: appears not possible.
+     * https://stackoverflow.com/questions/36052564/does-arm-support-simd-operations-for-64-bit-floating-point-numbers
+     *
+     * Fails with:
+     * bad type in Neon instruction -- `vadd.f64 q2,q0,q1'
+     */
+    TEST(64)
+#endif
+#undef TEST
+EXIT
diff --git a/userland/arch/arm/vcvt.S b/userland/arch/arm/vcvt.S
new file mode 100644
index 0000000..cae037b
--- /dev/null
+++ b/userland/arch/arm/vcvt.S
@@ -0,0 +1,90 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-vcvt-instruction */
+
+#include "common.h"
+
+ENTRY
+    /* SIMD positive. */
+.data
+    vcvt_positive_0:      .float 1.25, 2.5, 3.75, 4.0
+    vcvt_positive_expect: .word  1,    2,   3,    4
+.bss
+    vcvt_positive_result: .skip 0x10
+.text
+    ldr r0, =vcvt_positive_0
+    vld1.32 {q0}, [r0]
+    vcvt.u32.f32 q1, q0
+    ldr r0, =vcvt_positive_result
+    vst1.32 {q1}, [r0]
+    ASSERT_MEMCMP(vcvt_positive_result, vcvt_positive_expect, 0x10)
+
+    /* SIMD negative. */
+.data
+    vcvt_negative_0:      .float -1.25, -2.5, -3.75, -4.0
+    vcvt_negative_expect: .word  -1,    -2,   -3,    -4
+.bss
+    vcvt_negative_result: .skip 0x10
+.text
+    ldr r0, =vcvt_negative_0
+    vld1.32 {q0}, [r0]
+    vcvt.s32.f32 q1, q0
+    ldr r0, =vcvt_negative_result
+    vst1.32 {q1}, [r0]
+    ASSERT_MEMCMP(vcvt_negative_result, vcvt_negative_expect, 0x10)
+
+    /* Floating point. */
+.data
+    vcvt_positive_float_0:      .float 1.5, 2.5
+    vcvt_positive_float_expect: .word  1
+                                .float      2.5
+.bss
+    vcvt_positive_float_result: .skip 0x8
+.text
+    ldr r0, =vcvt_positive_float_0
+    vld1.32 {d0}, [r0]
+    vcvt.u32.f32 s0, s0
+    ldr r0, =vcvt_positive_float_result
+    vst1.32 {d0}, [r0]
+    ASSERT_MEMCMP(vcvt_positive_float_result, vcvt_positive_float_expect, 0x8)
+
+    /* Floating point but with immediates.
+     *
+     * You have to worry of course about representability of
+     * the immediate in 4 bytes, which is even more fun for
+     * floating point numbers :-)
+     *
+     * Doing this mostly to illustrate the joys of vmov.i32.
+     *
+     * For some reason, there is no vmov.i32 sn, only dn.
+     * If you try to use sn, it does the same as .f32 and
+     * stores a float instead. Horrible!
+     */
+    vmov.f32 d0, 1.5
+    vcvt.u32.f32 s0, s0
+    vmov.i32 d1, 1
+    vcmp.f32 s0, s2
+    vmrs apsr_nzcv, fpscr
+    ASSERT(beq)
+    /* Check that s1 wasn't modified by vcvt. */
+    vmov.f32 s2, 1.5
+    vcmp.f32 s1, s2
+    vmrs apsr_nzcv, fpscr
+    ASSERT(beq)
+
+    /* Floating point double precision. */
+.data
+    vcvt_positive_double_0:      .double 1.5
+    vcvt_positive_double_expect: .word   1
+.bss
+    vcvt_positive_double_result: .skip 0x8
+.text
+    ldr r0, =vcvt_positive_double_0
+    vld1.64 {d0}, [r0]
+    vcvt.u32.f64 s0, d0
+    ldr r0, =vcvt_positive_double_result
+    vst1.32 {d0}, [r0]
+    ASSERT_MEMCMP(
+        vcvt_positive_double_result,
+        vcvt_positive_double_expect,
+        0x4
+    )
+EXIT
diff --git a/userland/arch/arm/vcvta.S b/userland/arch/arm/vcvta.S
new file mode 100644
index 0000000..539c31f
--- /dev/null
+++ b/userland/arch/arm/vcvta.S
@@ -0,0 +1,41 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-vcvta-instruction */
+
+#include "common.h"
+
+ENTRY
+    /* SIMD positive. */
+.data
+    vcvta_positive_0:      .float 1.25, 2.5, 3.75, 4.0
+    vcvta_positive_expect: .word  1,    3,   4,    4
+.bss
+    vcvta_positive_result: .skip 0x10
+.text
+    ldr r0, =vcvta_positive_0
+    vld1.32 {q0}, [r0]
+    vcvta.u32.f32 q1, q0
+    ldr r0, =vcvta_positive_result
+    vst1.32 {q1}, [r0]
+    ASSERT_MEMCMP(
+        vcvta_positive_result,
+        vcvta_positive_expect,
+        0x10
+    )
+
+    /* SIMD negative. */
+.data
+    vcvta_negative_0:      .float -1.25, -2.5, -3.75, -4.0
+    vcvta_negative_expect: .word  -1,    -3,   -4,    -4
+.bss
+    vcvta_negative_result: .skip 0x10
+.text
+    ldr r0, =vcvta_negative_0
+    vld1.32 {q0}, [r0]
+    vcvta.s32.f32 q1, q0
+    ldr r0, =vcvta_negative_result
+    vst1.32 {q1}, [r0]
+    ASSERT_MEMCMP(
+        vcvta_negative_result,
+        vcvta_negative_expect,
+        0x10
+    )
+EXIT
diff --git a/userland/arch/arm/vcvtr.S b/userland/arch/arm/vcvtr.S
new file mode 100644
index 0000000..0795366
--- /dev/null
+++ b/userland/arch/arm/vcvtr.S
@@ -0,0 +1,46 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#arm-vcvtrr-instruction */
+
+#include "common.h"
+
+ENTRY
+.data
+    vcvtr_0:                    .float 1.25, 2.5, 3.75, 4.0
+    vcvtr_expect_zero:          .word  1,    2,   3,    4
+    vcvtr_expect_plus_infinity: .word  2,    3,   4,    4
+.bss
+    vcvtr_result_zero:          .skip 0x10
+    vcvtr_result_plus_infinity: .skip 0x10
+.text
+    ldr r0, =vcvtr_0
+    vld1.32 {q0}, [r0]
+
+    /* zero */
+    vmrs r0, fpscr
+    orr r0, r0, (3 << 22)
+    vmsr fpscr, r0
+    vcvtr.u32.f32 q1, q0
+    ldr r0, =vcvtr_result_zero
+    vst1.32 {q1}, [r0]
+    ASSERT_MEMCMP(
+        vcvtr_result_zero,
+        vcvtr_expect_zero,
+        0x10
+    )
+
+#if 0
+    /* TODO why is this not working? Rounds to zero still. */
+    /* plus infinity */
+    vmrs r0, fpscr
+    mov r1, 1
+    bfi r0, r1, 22, 2
+    vmsr fpscr, r0
+    vcvtr.u32.f32 q1, q0
+    ldr r0, =vcvtr_result_plus_infinity
+    vst1.32 {q1}, [r0]
+    ASSERT_MEMCMP(
+        vcvtr_result_plus_infinity,
+        vcvtr_expect_plus_infinity,
+        0x10
+    )
+#endif
+EXIT
diff --git a/userland/arch/common.h b/userland/arch/common.h
new file mode 100644
index 0000000..d6812ec
--- /dev/null
+++ b/userland/arch/common.h
@@ -0,0 +1,32 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly-c-standard-library */
+
+#ifndef COMMON_H
+#define COMMON_H
+
+/* We define in this header only macros that are the same on all archs. */
+
+/* common_arch.h contains arch specific macros. */
+#include "common_arch.h"
+
+.extern \
+    exit, \
+    printf, \
+    puts \
+;
+
+/* Assert that the given branch instruction is taken. */
+#define ASSERT(branch_if_pass) \
+    branch_if_pass 1f; \
+    FAIL; \
+1: \
+;
+
+#ifndef ASSERT_EQ_REG
+/* Assert that a register equals another register. */
+#define ASSERT_EQ_REG(reg1, reg2) \
+    cmp reg1, reg2; \
+    ASSERT(beq); \
+;
+#endif
+
+#endif
diff --git a/userland/arch/empty.S b/userland/arch/empty.S
new file mode 100644
index 0000000..cf9ae14
--- /dev/null
+++ b/userland/arch/empty.S
@@ -0,0 +1,8 @@
+/* Please don't do anything, including crashing.
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly
+ */
+
+#include "common.h"
+
+ENTRY
+EXIT
diff --git a/userland/arch/fail.S b/userland/arch/fail.S
new file mode 100644
index 0000000..7102e95
--- /dev/null
+++ b/userland/arch/fail.S
@@ -0,0 +1,9 @@
+/* See what happens on test failure with FAIL.
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly
+ */
+
+#include "common.h"
+
+ENTRY
+    FAIL
+EXIT
diff --git a/userland/arch/main.c b/userland/arch/main.c
new file mode 100644
index 0000000..eddcbcf
--- /dev/null
+++ b/userland/arch/main.c
@@ -0,0 +1,61 @@
+/* This is the main entry point for all .S examples.
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly-c-standard-library
+ */
+
+#include <stdio.h>
+#include <inttypes.h>
+
+#include <lkmc.h>
+
+int asm_main(uint32_t *line);
+
+#define ASSERT_EQ_DEFINE(bits) \
+    int assert_eq_ ## bits(uint ## bits ## _t val1, uint ## bits ## _t val2) { \
+        if (val1 != val2) { \
+            printf("%s failed\n", __func__); \
+            printf("val1 0x%" PRIX ## bits "\n", val1); \
+            printf("val2 0x%" PRIX ## bits "\n", val2); \
+            return 1; \
+        } \
+        return 0; \
+    }
+
+ASSERT_EQ_DEFINE(32)
+ASSERT_EQ_DEFINE(64)
+
+int assert_memcmp(const void *s1, const void *s2, size_t n) {
+    int ret;
+    size_t i;
+    uint8_t *s1b, *s2b;
+    uint8_t b1, b2;
+
+    ret = 0;
+    s1b = (uint8_t *)s1;
+    s2b = (uint8_t *)s2;
+    for (i = 0; i < n; ++i) {
+        b1 = s1b[i];
+        b2 = s2b[i];
+        if (b1 != b2) {
+            printf(
+                "%s failed: "
+                "byte1, byte2, index: "
+                "0x%02" PRIX8 " 0x%02" PRIX8 " 0x%zX\n",
+                __func__,
+                b1,
+                b2,
+                i
+            );
+            ret = 1;
+        }
+    }
+    return ret;
+}
+
+int main(void) {
+    uint32_t ret, line;
+    ret = asm_main(&line);
+    if (ret) {
+        printf("error: asm_main returned %" PRIu32 " at line %" PRIu32 "\n", ret, line);
+    }
+    return ret;
+}
diff --git a/userland/arch/test b/userland/arch/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/x86_64/add.S b/userland/arch/x86_64/add.S
new file mode 100644
index 0000000..978b1a3
--- /dev/null
+++ b/userland/arch/x86_64/add.S
@@ -0,0 +1,9 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly */
+
+#include "common.h"
+
+ENTRY
+    mov $1, %rax
+    add $2, %rax
+    ASSERT_EQ(%rax, $3)
+EXIT
diff --git a/userland/arch/x86_64/addpd.S b/userland/arch/x86_64/addpd.S
new file mode 100644
index 0000000..cef8599
--- /dev/null
+++ b/userland/arch/x86_64/addpd.S
@@ -0,0 +1,31 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-addpq-instruction
+ *
+ * Add a bunch of floating point numbers in one go.
+ */
+
+#include "common.h"
+
+ENTRY
+.bss
+    output:       .skip 16
+.data
+    addps_input0: .float 1.5, 2.5,  3.5,  4.5
+    addps_input1: .float 5.5, 6.5,  7.5,  8.5
+    addps_expect: .float 7.0, 9.0, 11.0, 13.0
+    addpd_input0: .double 1.5, 2.5
+    addpd_input1: .double 5.5, 6.5
+    addpd_expect: .double 7.0, 9.0
+.text
+#define TEST(size) \
+    movups addp ## size ## _input0, %xmm0; \
+    movups addp ## size ## _input1, %xmm1; \
+    addp ## size %xmm1, %xmm0; \
+    movups %xmm0, output; \
+    ASSERT_MEMCMP(output, addp ## size ## _expect, $0x10)
+
+    /* 4x 32-bit */
+    TEST(s)
+    /* 2x 64-bit */
+    TEST(d)
+#undef TEST
+EXIT
diff --git a/userland/arch/x86_64/asm_hello.c b/userland/arch/x86_64/asm_hello.c
deleted file mode 100644
index 0158aa9..0000000
--- a/userland/arch/x86_64/asm_hello.c
+++ /dev/null
@@ -1,16 +0,0 @@
-#include <assert.h>
-#include <inttypes.h>
-
-int main(void) {
-	uint64_t in = 0xFFFFFFFF;
-	uint64_t out = 0;
-	__asm__ (
-		"mov %[in], %%rax;"
-		"inc %%rax;"
-		"movq %%rax, %[out]"
-		: [out] "=g" (out)
-		: [in] "g" (in)
-		: "rax"
-	);
-	assert(out == in + 1);
-}
diff --git a/userland/arch/x86_64/binutils_hack.c b/userland/arch/x86_64/binutils_hack.c
deleted file mode 100644
index 6871f68..0000000
--- a/userland/arch/x86_64/binutils_hack.c
+++ /dev/null
@@ -1,20 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#your-first-binutils-hack */
-
-#include <assert.h>
-#include <inttypes.h>
-
-int main(void) {
-#if 0
-	uint64_t in = 0xFFFFFFFF;
-	uint64_t out = 0;
-	__asm__ (
-		"mov %[in], %%rax;"
-		"myinc %%rax;"
-		"movq %%rax, %[out]"
-		: [out] "=g" (out)
-		: [in] "g" (in)
-		: "rax"
-	);
-	assert(out == in + 1);
-#endif
-}
diff --git a/userland/arch/x86_64/c/add.c b/userland/arch/x86_64/c/add.c
new file mode 100644
index 0000000..2d3e461
--- /dev/null
+++ b/userland/arch/x86_64/c/add.c
@@ -0,0 +1,18 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    uint64_t in1 = 0xFFFFFFFF;
+    uint64_t in2 = 0x1;
+    uint64_t out;
+    __asm__ (
+        "lea (%[in1], %[in2]), %[out];"
+        : [out] "=r" (out)
+        : [in1] "r" (in1),
+          [in2] "r" (in2)
+        :
+    );
+    assert(out == 0x100000000);
+}
diff --git a/userland/arch/x86_64/c/binutils_hack.c b/userland/arch/x86_64/c/binutils_hack.c
new file mode 100644
index 0000000..1432167
--- /dev/null
+++ b/userland/arch/x86_64/c/binutils_hack.c
@@ -0,0 +1,20 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#your-first-binutils-hack */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+#if 0
+    uint64_t in = 0xFFFFFFFF;
+    uint64_t out = 0;
+    __asm__ (
+        "mov %[in], %%rax;"
+        "myinc %%rax;"
+        "movq %%rax, %[out]"
+        : [out] "=g" (out)
+        : [in] "g" (in)
+        : "rax"
+    );
+    assert(out == in + 1);
+#endif
+}
diff --git a/userland/arch/x86_64/c/binutils_nohack.c b/userland/arch/x86_64/c/binutils_nohack.c
new file mode 100644
index 0000000..892d20a
--- /dev/null
+++ b/userland/arch/x86_64/c/binutils_nohack.c
@@ -0,0 +1,18 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#your-first-binutils-hack */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    uint64_t in = 0xFFFFFFFF;
+    uint64_t out = 0;
+    __asm__ (
+        "mov %[in], %%rax;"
+        "inc %%rax;"
+        "movq %%rax, %[out]"
+        : [out] "=g" (out)
+        : [in] "g" (in)
+        : "rax"
+    );
+    assert(out == in + 1);
+}
diff --git a/userland/arch/x86_64/c/build b/userland/arch/x86_64/c/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/x86_64/c/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/x86_64/c/freestanding/build b/userland/arch/x86_64/c/freestanding/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/x86_64/c/freestanding/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/x86_64/c/freestanding/linux/build b/userland/arch/x86_64/c/freestanding/linux/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/x86_64/c/freestanding/linux/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/x86_64/c/freestanding/linux/hello.c b/userland/arch/x86_64/c/freestanding/linux/hello.c
new file mode 100644
index 0000000..a5ed92f
--- /dev/null
+++ b/userland/arch/x86_64/c/freestanding/linux/hello.c
@@ -0,0 +1,33 @@
+/* x86_64 freestanding C inline assembly Linux hello world
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#linux-system-calls
+ */
+
+#define _XOPEN_SOURCE 700
+#include <inttypes.h>
+#include <sys/types.h>
+
+ssize_t my_write(int fd, const void *buf, size_t size) {
+    ssize_t ret;
+    __asm__ __volatile__ (
+        "syscall"
+        : "=a" (ret)
+        : "0" (1), "D" (fd), "S" (buf), "d" (size)
+        : "cc", "rcx", "r11", "memory"
+    );
+    return ret;
+}
+
+void my_exit(int exit_status) {
+    ssize_t ret;
+    __asm__ __volatile__ (
+        "syscall"
+        : "=a" (ret)
+        : "0" (60), "D" (exit_status)
+        : "cc", "rcx", "r11", "memory"
+    );
+}
+
+void _start(void) {
+    char msg[] = "hello\n";
+    my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
+}
diff --git a/userland/arch/x86_64/c/freestanding/linux/hello_regvar.c b/userland/arch/x86_64/c/freestanding/linux/hello_regvar.c
new file mode 100644
index 0000000..8f98c54
--- /dev/null
+++ b/userland/arch/x86_64/c/freestanding/linux/hello_regvar.c
@@ -0,0 +1,37 @@
+/* Same as hello.c, but with explicit register variables, see:
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#linux-system-calls
+ */
+
+#define _XOPEN_SOURCE 700
+#include <inttypes.h>
+#include <sys/types.h>
+
+ssize_t my_write(int fd, const void *buf, size_t size) {
+    register int64_t rax __asm__ ("rax") = 1;
+    register int rdi __asm__ ("rdi") = fd;
+    register const void *rsi __asm__ ("rsi") = buf;
+    register size_t rdx __asm__ ("rdx") = size;
+    __asm__ __volatile__ (
+        "syscall"
+        : "+r" (rax)
+        : "r" (rdi), "r" (rsi), "r" (rdx)
+        : "cc", "rcx", "r11", "memory"
+    );
+    return rax;
+}
+
+void my_exit(int exit_status) {
+    register int64_t rax __asm__ ("rax") = 60;
+    register int rdi __asm__ ("rdi") = exit_status;
+    __asm__ __volatile__ (
+        "syscall"
+        : "+r" (rax)
+        : "r" (rdi)
+        : "cc", "rcx", "r11", "memory"
+    );
+}
+
+void _start(void) {
+    char msg[] = "hello\n";
+    my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
+}
diff --git a/userland/arch/x86_64/c/freestanding/linux/test b/userland/arch/x86_64/c/freestanding/linux/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/x86_64/c/freestanding/linux/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/x86_64/c/freestanding/test b/userland/arch/x86_64/c/freestanding/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/x86_64/c/freestanding/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/x86_64/c/inc.c b/userland/arch/x86_64/c/inc.c
new file mode 100644
index 0000000..5106398
--- /dev/null
+++ b/userland/arch/x86_64/c/inc.c
@@ -0,0 +1,15 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    uint64_t io = 1;
+    __asm__ (
+        "lea 1(%[io]), %[io];"
+        : [io] "+r" (io)
+        :
+        :
+    );
+    assert(io == 2);
+}
diff --git a/userland/arch/x86_64/c/rdtsc.c b/userland/arch/x86_64/c/rdtsc.c
new file mode 100644
index 0000000..219553c
--- /dev/null
+++ b/userland/arch/x86_64/c/rdtsc.c
@@ -0,0 +1,14 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#rdtsc */
+
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <x86intrin.h>
+
+int main(void) {
+    uintmax_t val;
+    val = __rdtsc();
+    printf("%ju\n", val);
+    return EXIT_SUCCESS;
+}
diff --git a/userland/arch/x86_64/c/ring0.c b/userland/arch/x86_64/c/ring0.c
new file mode 100644
index 0000000..821d953
--- /dev/null
+++ b/userland/arch/x86_64/c/ring0.c
@@ -0,0 +1,12 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#ring0 */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <lkmc/ring0.h>
+
+int main(void) {
+    LkmcRing0Regs ring0_regs;
+    lkmc_ring0_get_control_regs(&ring0_regs);
+    return EXIT_SUCCESS;
+}
diff --git a/userland/arch/x86_64/c/scratch.c b/userland/arch/x86_64/c/scratch.c
new file mode 100644
index 0000000..7d91ae2
--- /dev/null
+++ b/userland/arch/x86_64/c/scratch.c
@@ -0,0 +1,22 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly-scratch-registers */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    uint64_t in1 = 0xFFFFFFFF;
+    uint64_t in2 = 1;
+    uint64_t out;
+    uint64_t scratch;
+    __asm__ (
+        "mov %[in2], %[scratch];" /* scratch = in2 */
+        "add %[in1], %[scratch];" /* scratch += in1 */
+        "mov %[scratch], %[out];" /* out = scratch */
+        : [scratch] "=&r" (scratch),
+          [out] "=r" (out)
+        : [in1] "r" (in1),
+          [in2] "r" (in2)
+        :
+    );
+    assert(out == 0x100000000);
+}
diff --git a/userland/arch/x86_64/c/scratch_hardcode.c b/userland/arch/x86_64/c/scratch_hardcode.c
new file mode 100644
index 0000000..2cd9eac
--- /dev/null
+++ b/userland/arch/x86_64/c/scratch_hardcode.c
@@ -0,0 +1,22 @@
+/* This is a worse version of scratch.c with hardcoded scratch.
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#gcc-inline-assembly-scratch-registers
+ */
+
+#include <assert.h>
+#include <inttypes.h>
+
+int main(void) {
+    uint64_t in1 = 0xFFFFFFFF;
+    uint64_t in2 = 1;
+    uint64_t out;
+    __asm__ (
+        "mov %[in2], %%rax;" /* scratch = in2 */
+        "add %[in1], %%rax;" /* scratch += in1 */
+        "mov %%rax, %[out];" /* out = scratch */
+        : [out] "=r" (out)
+        : [in1] "r" (in1),
+          [in2] "r" (in2)
+        : "rax"
+    );
+    assert(out == 0x100000000);
+}
diff --git a/userland/arch/x86_64/c/test b/userland/arch/x86_64/c/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/x86_64/c/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/x86_64/common_arch.h b/userland/arch/x86_64/common_arch.h
new file mode 100644
index 0000000..aa035ea
--- /dev/null
+++ b/userland/arch/x86_64/common_arch.h
@@ -0,0 +1,87 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly-c-standard-library */
+
+#ifndef COMMON_ARCH_H
+#define COMMON_ARCH_H
+
+/* This and other macros may make C function calls, and therefore can destroy
+ * non-callee saved registers. */
+#define ASSERT_EQ(general1, general2) \
+    mov general2, %rdi; \
+    push %rdi; \
+    mov general1, %rdi; \
+    pop %rsi; \
+    ASSERT_EQ_DO(64); \
+;
+
+#define ASSERT_EQ_DO(bits) \
+    call assert_eq_ ## bits; \
+    cmp $0, %rax; \
+    ASSERT(je); \
+;
+
+#define ASSERT_MEMCMP(label1, label2, const_size) \
+    lea label1(%rip), %rdi; \
+    lea label2(%rip), %rsi; \
+    mov const_size, %rdx; \
+    call assert_memcmp; \
+    cmp $0, %rax; \
+    ASSERT(je); \
+;
+
+/* Program entry point.
+ *
+ * Return with EXIT.
+ *
+ * Basically implements an x86_64 prologue:
+ *
+ * - save callee saved registers
+ *   x86_64 explained at: https://stackoverflow.com/questions/18024672/what-registers-are-preserved-through-a-linux-x86-64-function-call/55207335#55207335
+ * - save register arguments for later usage
+ */
+#define ENTRY \
+.text; \
+.global asm_main; \
+asm_main: \
+    push %rbp; \
+    mov %rsp, %rbp; \
+    push %r15; \
+    push %r14; \
+    push %r13; \
+    push %r12; \
+    push %rbx; \
+    push %rdi; \
+    sub $8, %rsp; \
+asm_main_after_prologue: \
+;
+
+/* Meant to be called at the end of ENTRY.
+ *
+ * Branching to "fail" makes tests fail with exit status 1.
+ *
+ * If EXIT is reached, the program ends successfully.
+ */
+#define EXIT \
+    mov $0, %rax; \
+    jmp pass; \
+fail: \
+    /* -0x30(%rbp) is argument 1 which we pushed at prologue */ \
+    mov -0x30(%rbp), %rbx; \
+    movl %eax, (%rbx); \
+    mov $1, %rax; \
+pass: \
+    add $16, %rsp; \
+    pop %rbx; \
+    pop %r12; \
+    pop %r13; \
+    pop %r14; \
+    pop %r15; \
+    pop %rbp; \
+    ret; \
+;
+
+#define FAIL \
+    mov $__LINE__, %eax; \
+    jmp fail; \
+;
+
+#endif
diff --git a/userland/arch/x86_64/freestanding/hello.S b/userland/arch/x86_64/freestanding/hello.S
deleted file mode 100644
index f58f967..0000000
--- a/userland/arch/x86_64/freestanding/hello.S
+++ /dev/null
@@ -1,19 +0,0 @@
-.data
-    s:
-        .ascii "hello\n"
-        len = . - s
-.text
-    .global _start
-    _start:
-
-    /* Write. */
-    mov $1, %rax
-    mov $1, %rdi
-    mov $s, %rsi
-    mov $len, %rdx
-    syscall
-
-    /* Exit. */
-    mov $60, %rax
-    mov $0, %rdi
-    syscall
diff --git a/userland/arch/x86_64/freestanding/linux/build b/userland/arch/x86_64/freestanding/linux/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/arch/x86_64/freestanding/linux/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/arch/x86_64/freestanding/linux/hello.S b/userland/arch/x86_64/freestanding/linux/hello.S
new file mode 100644
index 0000000..9ae517c
--- /dev/null
+++ b/userland/arch/x86_64/freestanding/linux/hello.S
@@ -0,0 +1,22 @@
+/* x86_64 freestanding Linux hello world
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#linux-system-calls
+ */
+
+.text
+.global _start
+_start:
+asm_main_after_prologue:
+    /* write */
+    mov $1, %rax   /* syscall number */
+    mov $1, %rdi   /* stdout */
+    lea msg(%rip), %rsi  /* buffer */
+    mov $len, %rdx /* len */
+    syscall
+
+    /* exit */
+    mov $60, %rax   /* syscall number */
+    mov $0, %rdi    /* exit status */
+    syscall
+msg:
+    .ascii "hello\n"
+len = . - msg
diff --git a/userland/arch/x86_64/freestanding/linux/test b/userland/arch/x86_64/freestanding/linux/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/x86_64/freestanding/linux/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/x86_64/freestanding/test b/userland/arch/x86_64/freestanding/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/x86_64/freestanding/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/arch/x86_64/gas_data_sizes.S b/userland/arch/x86_64/gas_data_sizes.S
new file mode 100644
index 0000000..459c167
--- /dev/null
+++ b/userland/arch/x86_64/gas_data_sizes.S
@@ -0,0 +1,29 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#gnu-gas-assembler data sizes */
+
+#include "common.h"
+
+ENTRY
+#define ASSERT_DIFF(label1, label2, result) \
+    lea label2(%rip), %rax; \
+    lea label1(%rip), %rbx; \
+    sub %rbx, %rax; \
+    ASSERT_EQ(%rax, $result)
+
+    ASSERT_DIFF(mybyte, myword, 1)
+    ASSERT_DIFF(myword, mylong, 2)
+    ASSERT_DIFF(mylong, myquad, 4)
+    ASSERT_DIFF(myquad, myocta, 8)
+    ASSERT_DIFF(myocta, theend, 16)
+#undef ASSERT_DIFF
+EXIT
+mybyte:
+    .byte 0x12
+myword:
+    .word 0x1234
+mylong:
+    .long 0x12345678
+myquad:
+    .quad 0x123456789ABCDEF0
+myocta:
+    .octa 0x123456789ABCDEF0123456789ABCDEF0
+theend:
diff --git a/userland/arch/x86_64/lkmc_assert_eq_fail.S b/userland/arch/x86_64/lkmc_assert_eq_fail.S
new file mode 100644
index 0000000..39a2b6a
--- /dev/null
+++ b/userland/arch/x86_64/lkmc_assert_eq_fail.S
@@ -0,0 +1,16 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly */
+
+#include "common.h"
+
+ENTRY
+    mov $0x123456789ABCDEF0, %r12
+    mov $0x123456789ABCDEF0, %r13
+    ASSERT_EQ(%r12, $0x123456789ABCDEF0)
+    ASSERT_EQ(%r12, %r13)
+    ASSERT_EQ(%r12, myvar)
+    ASSERT_EQ($0x123456789ABCDEF0, %r12)
+    ASSERT_EQ(%r13, %r12)
+    ASSERT_EQ(myvar, %r12)
+    ASSERT_EQ(%r12, $0x123456789ABCDEF1)
+EXIT
+myvar: .quad 0x123456789ABCDEF0
diff --git a/userland/arch/x86_64/lkmc_assert_memcmp_fail.S b/userland/arch/x86_64/lkmc_assert_memcmp_fail.S
new file mode 100644
index 0000000..b3b5bf0
--- /dev/null
+++ b/userland/arch/x86_64/lkmc_assert_memcmp_fail.S
@@ -0,0 +1,11 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#userland-assembly */
+
+#include "common.h"
+
+ENTRY
+    ASSERT_MEMCMP(var0, var1, $0x10)
+    ASSERT_MEMCMP(var0, var2, $0x10)
+EXIT
+    var0: .long 0x11111111, 0x22222222, 0x33333333, 0x44444444
+    var1: .long 0x11111111, 0x22222222, 0x33333333, 0x44444444
+    var2: .long 0x11111111, 0x22222223, 0x23333333, 0x44444444
diff --git a/userland/arch/x86_64/paddq.S b/userland/arch/x86_64/paddq.S
new file mode 100644
index 0000000..fbe23e3
--- /dev/null
+++ b/userland/arch/x86_64/paddq.S
@@ -0,0 +1,38 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-paddq-instruction
+ *
+ * Add a bunch of integers in one go.
+ *
+ * The different variants basically determine if carries get forwarded or not.
+ */
+
+#include "common.h"
+
+ENTRY
+.data
+    input0:       .long 0xF1F1F1F1, 0xF2F2F2F2, 0xF3F3F3F3, 0xF4F4F4F4
+    input1:       .long 0x12121212, 0x13131313, 0x14141414, 0x15151515
+    paddb_expect: .long 0x03030303, 0x05050505, 0x07070707, 0x09090909
+    paddw_expect: .long 0x04030403, 0x06050605, 0x08070807, 0x0A090A09
+    paddd_expect: .long 0x04040403, 0x06060605, 0x08080807, 0x0A0A0A09
+    paddq_expect: .long 0x04040403, 0x06060606, 0x08080807, 0x0A0A0A0A
+.bss
+    output:       .skip 16
+.text
+    movups input1, %xmm1
+#define TEST(size) \
+    movups input0, %xmm0; \
+    padd ## size %xmm1, %xmm0; \
+    movups %xmm0, output; \
+    ASSERT_MEMCMP(output, padd ## size ## _expect, $0x10)
+
+    /* 16x 8-bit */
+    TEST(b)
+    /* 8x 16-bit */
+    TEST(w)
+    /* 4x 32-bit */
+    /*  4x long */
+    TEST(d)
+    /* 2x 64-bit */
+    TEST(q)
+#undef TEST
+EXIT
diff --git a/userland/arch/x86_64/test b/userland/arch/x86_64/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/arch/x86_64/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/assert_fail.c b/userland/assert_fail.c
deleted file mode 120000
index fb1be37..0000000
--- a/userland/assert_fail.c
+++ /dev/null
@@ -1 +0,0 @@
-../baremetal/interactive/assert_fail.c
\ No newline at end of file
diff --git a/userland/c/README.adoc b/userland/c/README.adoc
new file mode 100644
index 0000000..021699d
--- /dev/null
+++ b/userland/c/README.adoc
@@ -0,0 +1 @@
+https://github.com/cirosantilli/linux-kernel-module-cheat#c
diff --git a/userland/c/assert_fail.c b/userland/c/assert_fail.c
new file mode 120000
index 0000000..c1c99d6
--- /dev/null
+++ b/userland/c/assert_fail.c
@@ -0,0 +1 @@
+../../lkmc/assert_fail.c
\ No newline at end of file
diff --git a/userland/c/false.c b/userland/c/false.c
new file mode 100644
index 0000000..1e64a6f
--- /dev/null
+++ b/userland/c/false.c
@@ -0,0 +1,18 @@
+/* Exit with status 1 like the POSIX false utility:
+ * http://pubs.opengroup.org/onlinepubs/9699919799/utilities/false.html
+ *
+ * Can be used to test that emulators forward the exit status properly.
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#gem5-syscall-emulation-exit-status
+ */
+
+#include <stdlib.h>
+
+int main(int argc, char **argv) {
+    int ret;
+    if (argc == 1) {
+        ret = 1;
+    } else {
+        ret = strtoull(argv[1], NULL, 0);
+    }
+    return ret;
+}
diff --git a/userland/gcc_hack.c b/userland/c/gcc_hack.c
similarity index 100%
rename from userland/gcc_hack.c
rename to userland/c/gcc_hack.c
diff --git a/userland/c/getchar.c b/userland/c/getchar.c
new file mode 100644
index 0000000..2430e29
--- /dev/null
+++ b/userland/c/getchar.c
@@ -0,0 +1,21 @@
+/* Get one character from stdin, and then print it back out.
+ *
+ * Same as getc(stdin).
+ *
+ * You have to press enter for the character to go through:
+ * https://stackoverflow.com/questions/1798511/how-to-avoid-pressing-enter-with-getchar
+ *
+ * Used at:
+ * https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1/53937376#53937376
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+int main(void) {
+    int c;
+    printf("enter a character: ");
+    c = getchar();
+    printf("you entered: %c\n", c);
+    return EXIT_SUCCESS;
+}
diff --git a/userland/c/hello.c b/userland/c/hello.c
deleted file mode 100644
index 81f6976..0000000
--- a/userland/c/hello.c
+++ /dev/null
@@ -1,9 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#sanity-checks */
-
-#include <stdio.h>
-#include <stdlib.h>
-
-int main(void) {
-	puts("hello");
-	return EXIT_SUCCESS;
-}
diff --git a/userland/c/hello.c b/userland/c/hello.c
new file mode 120000
index 0000000..1a462a6
--- /dev/null
+++ b/userland/c/hello.c
@@ -0,0 +1 @@
+../../lkmc/hello.c
\ No newline at end of file
diff --git a/userland/c/infinite_loop.c b/userland/c/infinite_loop.c
new file mode 100644
index 0000000..2ae905f
--- /dev/null
+++ b/userland/c/infinite_loop.c
@@ -0,0 +1,29 @@
+/* Loop infinitely. Print an integer whenever a period is reached:
+ *
+ * ....
+ * ./infinite_loop [period]
+ * ....
+ */
+
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+int main(int argc, char **argv) {
+    uintmax_t i, j, period;
+    if (argc > 1) {
+        period = strtoumax(argv[1], NULL, 10);
+    } else {
+        period = 100000000;
+    }
+    i = 0;
+    j = 0;
+    while (1) {
+        i++;
+        if (i % period == 0) {
+            printf("%ju\n", j);
+            j++;
+        }
+    }
+}
diff --git a/userland/m5ops.c b/userland/c/m5ops.c
similarity index 73%
rename from userland/m5ops.c
rename to userland/c/m5ops.c
index 16b73ab..cc7ecbf 100644
--- a/userland/m5ops.c
+++ b/userland/c/m5ops.c
@@ -4,7 +4,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 
-#include "m5ops.h"
+#include <lkmc/m5ops.h>
 
 int main(int argc, char **argv) {
     char action;
@@ -15,19 +15,19 @@ int main(int argc, char **argv) {
     }
     switch (action) {
         case 'c':
-            m5_checkpoint();
+            LKMC_M5OPS_CHECKPOINT;
             break;
         case 'd':
-            m5_dumpstats();
+            LKMC_M5OPS_DUMPSTATS;
             break;
         case 'e':
-            m5_exit();
+            LKMC_M5OPS_EXIT;
             break;
         case 'f':
-            m5_fail_1();
+            LKMC_M5OPS_FAIL_1;
             break;
         case 'r':
-            m5_resetstats();
+            LKMC_M5OPS_RESETSTATS;
             break;
     }
     return EXIT_SUCCESS;
diff --git a/userland/c/print_argv.c b/userland/c/print_argv.c
new file mode 100644
index 0000000..6984aeb
--- /dev/null
+++ b/userland/c/print_argv.c
@@ -0,0 +1,14 @@
+/* Print each command line argument received, one per line.
+ *
+ * Good sanity check for user mode:
+ * https://github.com/cirosantilli/linux-kernel-module-cheat#qemu-user-mode-getting-started
+ */
+
+#include <stdio.h>
+
+int main(int argc, char **argv) {
+    size_t i;
+    for (i = 0; i < (size_t)argc; ++i)
+        printf("%s\n", argv[i]);
+    return 0;
+}
diff --git a/userland/c/stderr.c b/userland/c/stderr.c
new file mode 100644
index 0000000..b43d067
--- /dev/null
+++ b/userland/c/stderr.c
@@ -0,0 +1,7 @@
+/* Print hello to stderr. */
+
+#include <stdio.h>
+
+int main(void) {
+    fputs("hello\n", stderr);
+}
diff --git a/userland/c/test b/userland/c/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/c/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/count.c b/userland/count.c
deleted file mode 100644
index 9b625ca..0000000
--- a/userland/count.c
+++ /dev/null
@@ -1,19 +0,0 @@
-#define _XOPEN_SOURCE 700
-#include <limits.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-
-int main(int argc, char **argv) {
-    unsigned long i = 0, max;
-    if (argc > 1) {
-        max = strtoul(argv[1], NULL, 10);
-    } else {
-        max = ULONG_MAX;
-    }
-    while (i < max) {
-        printf("%lu\n", i);
-        i++;
-        sleep(1);
-    }
-}
diff --git a/userland/cpp/README.adoc b/userland/cpp/README.adoc
index bb60001..dc55289 100644
--- a/userland/cpp/README.adoc
+++ b/userland/cpp/README.adoc
@@ -1 +1 @@
-https://github.com/cirosantilli/linux-kernel-module-cheat#ansi-cpp
+https://github.com/cirosantilli/linux-kernel-module-cheat#cpp
diff --git a/userland/bst_vs_heap.cpp b/userland/cpp/bst_vs_heap.cpp
similarity index 65%
rename from userland/bst_vs_heap.cpp
rename to userland/cpp/bst_vs_heap.cpp
index 2020714..dca4028 100644
--- a/userland/bst_vs_heap.cpp
+++ b/userland/cpp/bst_vs_heap.cpp
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#bst-vs-heap */
+// https://github.com/cirosantilli/linux-kernel-module-cheat#bst-vs-heap
 
 #include <algorithm>
 #include <iostream>
@@ -6,7 +6,7 @@
 #include <random>
 #include <set>
 
-#include "m5ops.h"
+#include <lkmc/m5ops.h>
 
 int main(int argc, char **argv) {
     typedef uint64_t I;
@@ -20,7 +20,7 @@ int main(int argc, char **argv) {
     if (argc > 1) {
         n = std::stoi(argv[1]);
     } else {
-        n = 1000;
+        n = 1;
     }
 
     // Action.
@@ -29,16 +29,16 @@ int main(int argc, char **argv) {
     }
     std::shuffle(randoms.begin(), randoms.end(), std::mt19937(seed));
     for (i = 0; i < n; ++i) {
-    	auto random = randoms[i];
+        auto random = randoms[i];
 
         // Heap.
-		m5_resetstats();
-		heap.emplace(random);
-		m5_dumpstats();
+        LKMC_M5OPS_RESETSTATS;
+        heap.emplace(random);
+        LKMC_M5OPS_DUMPSTATS;
 
         // BST.
-		m5_resetstats();
-		bst.insert(random);
-		m5_dumpstats();
+        LKMC_M5OPS_RESETSTATS;
+        bst.insert(random);
+        LKMC_M5OPS_DUMPSTATS;
     }
 }
diff --git a/userland/cpp/hello.cpp b/userland/cpp/hello.cpp
index 2a435e9..b9ca976 100644
--- a/userland/cpp/hello.cpp
+++ b/userland/cpp/hello.cpp
@@ -1,4 +1,4 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#sanity-checks */
+// https://github.com/cirosantilli/linux-kernel-module-cheat#sanity-checks
 
 #include <iostream>
 
diff --git a/userland/cpp/test b/userland/cpp/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/cpp/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/eigen_hello.cpp b/userland/eigen_hello.cpp
deleted file mode 100644
index ee79d86..0000000
--- a/userland/eigen_hello.cpp
+++ /dev/null
@@ -1,13 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#eigen
- * Adapted from: https://eigen.tuxfamily.org/dox/GettingStarted.html
- */
-#include <iostream>
-#include <Eigen/Dense>
-int main() {
-	Eigen::MatrixXd m(2,2);
-	m(0,0) = 3;
-	m(1,0) = 2.5;
-	m(0,1) = -1;
-	m(1,1) = m(1,0) + m(0,1);
-	std::cout << m << std::endl;
-}
diff --git a/userland/false.c b/userland/false.c
deleted file mode 100644
index 5425237..0000000
--- a/userland/false.c
+++ /dev/null
@@ -1,13 +0,0 @@
-/* Test that emulators forward the exit status properly. */
-
-#include <stdlib.h>
-
-int main(int argc, char **argv) {
-    int ret;
-    if (argc == 1) {
-        ret = 1;
-    } else {
-        ret = strtoull(argv[1], NULL, 0);
-    }
-	return ret;
-}
diff --git a/userland/gcc/empty_struct.c b/userland/gcc/empty_struct.c
index 1cf5bc8..2a9a2b6 100644
--- a/userland/gcc/empty_struct.c
+++ b/userland/gcc/empty_struct.c
@@ -1,4 +1,4 @@
-/* Empty struct */
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#c-empty-struct */
 
 #include <assert.h>
 #include <stdlib.h>
diff --git a/userland/gcc/openmp.c b/userland/gcc/openmp.c
new file mode 100644
index 0000000..32b920b
--- /dev/null
+++ b/userland/gcc/openmp.c
@@ -0,0 +1,20 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#openmp */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#include <omp.h>
+
+int main (void) {
+    int nthreads, tid;
+#pragma omp parallel private(nthreads, tid)
+    {
+        tid = omp_get_thread_num();
+        printf("Hello World from thread = %d\n", tid);
+        if (tid == 0) {
+            nthreads = omp_get_num_threads();
+            printf("Number of threads = %d\n", nthreads);
+        }
+    }
+    return EXIT_SUCCESS;
+}
diff --git a/userland/gcc/test b/userland/gcc/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/gcc/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/init_env_poweroff.c b/userland/init_env_poweroff.c
deleted file mode 100644
index dc275f9..0000000
--- a/userland/init_env_poweroff.c
+++ /dev/null
@@ -1,26 +0,0 @@
-#define _XOPEN_SOURCE 700
-#include <stdio.h>
-#include <sys/reboot.h>
-#include <unistd.h>
-
-int main(int argc, char **argv)
-{
-	int i;
-
-	puts("args:");
-	for (i = 0; i < argc; ++i)
-		puts(argv[i]);
-	puts("");
-
-	puts("env:");
-	extern char **environ;
-	char **env = environ;
-	while (*env) {
-		printf("%s\n", *env);
-		env++;
-	}
-	puts("");
-
-	/* Poweroff. */
-	reboot(RB_POWER_OFF);
-}
diff --git a/userland/ioctl.c b/userland/ioctl.c
deleted file mode 100644
index 7c0a811..0000000
--- a/userland/ioctl.c
+++ /dev/null
@@ -1,67 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#ioctl */
-
-#define _GNU_SOURCE
-#include <errno.h>
-#include <fcntl.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <sys/ioctl.h>
-#include <sys/stat.h>
-#include <sys/types.h>
-#include <unistd.h>
-
-#include "../include/ioctl.h"
-
-int main(int argc, char **argv)
-{
-	char *ioctl_path;
-	int fd, request, arg0, arg1, arg_int, ret;
-	lkmc_ioctl_struct arg_struct;
-
-	if (argc < 2) {
-		puts("Usage: ./prog <ioctl-file> <request> [<arg>...]");
-		return EXIT_FAILURE;
-	}
-	ioctl_path = argv[1];
-	request = strtol(argv[2], NULL, 10);
-	if (argc > 3) {
-		arg0 = strtol(argv[3], NULL, 10);
-	}
-	if (argc > 4) {
-		arg1 = strtol(argv[4], NULL, 10);
-	}
-
-	fd = open(ioctl_path, O_RDONLY);
-	if (fd == -1) {
-		perror("open");
-		return EXIT_FAILURE;
-	}
-	switch (request)
-	{
-		case 0:
-			arg_int = arg0;
-			ret = ioctl(fd, LKMC_IOCTL_INC, &arg_int);
-			if (ret != -1) {
-				printf("%d\n", arg_int);
-			}
-			break;
-		case 1:
-			arg_struct.i = arg0;
-			arg_struct.j = arg1;
-			ret = ioctl(fd, LKMC_IOCTL_INC_DEC, &arg_struct);
-			if (ret != -1) {
-				printf("%d %d\n", arg_struct.i, arg_struct.j);
-			}
-			break;
-		default:
-			puts("error: unknown request");
-			return EXIT_FAILURE;
-	}
-	if (ret == -1) {
-		perror("ioctl");
-		printf("errno = %d\n", errno);
-		return EXIT_FAILURE;
-	}
-	close(fd);
-	return EXIT_SUCCESS;
-}
diff --git a/userland/kernel_modules/anonymous_inode.c b/userland/kernel_modules/anonymous_inode.c
new file mode 100644
index 0000000..3d16086
--- /dev/null
+++ b/userland/kernel_modules/anonymous_inode.c
@@ -0,0 +1,45 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#anonymous-inode */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h> /* sleep */
+
+#include <lkmc/anonymous_inode.h>
+
+int main(int argc, char **argv) {
+    char buf[1024];
+    int fd_ioctl, fd_ioctl_anon, ret;
+    size_t i, nreads;
+
+    if (argc < 2) {
+        puts("Usage: ./prog <ioctl-file> [<nreads>]");
+        return EXIT_FAILURE;
+    } else if (argc > 2) {
+        nreads = strtol(argv[2], NULL, 10);
+    } else {
+        nreads = 3;
+    }
+    fd_ioctl = open(argv[1], O_RDONLY);
+    if (fd_ioctl == -1) {
+        perror("open");
+        return EXIT_FAILURE;
+    }
+    ret = ioctl(fd_ioctl, LKMC_ANONYMOUS_INODE_GET_FD, &fd_ioctl_anon);
+    if (ret == -1) {
+        perror("ioctl");
+        return EXIT_FAILURE;
+    }
+    for (i = 0; i < nreads; ++i) {
+        ret = read(fd_ioctl_anon, buf, sizeof(buf));
+        printf("%.*s\n", ret, buf);
+    }
+    close(fd_ioctl_anon);
+    close(fd_ioctl);
+    return EXIT_SUCCESS;
+}
diff --git a/userland/kernel_modules/ioctl.c b/userland/kernel_modules/ioctl.c
new file mode 100644
index 0000000..7c149ad
--- /dev/null
+++ b/userland/kernel_modules/ioctl.c
@@ -0,0 +1,66 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#ioctl */
+
+#define _GNU_SOURCE
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/ioctl.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <lkmc/ioctl.h>
+
+int main(int argc, char **argv) {
+    char *ioctl_path;
+    int fd, request, arg0, arg1, arg_int, ret;
+    lkmc_ioctl_struct arg_struct;
+
+    if (argc < 3) {
+        puts("Usage: ./prog <ioctl-file> <request> [<arg>...]");
+        return EXIT_FAILURE;
+    }
+    ioctl_path = argv[1];
+    request = strtol(argv[2], NULL, 10);
+    if (argc > 3) {
+        arg0 = strtol(argv[3], NULL, 10);
+    }
+    if (argc > 4) {
+        arg1 = strtol(argv[4], NULL, 10);
+    }
+
+    fd = open(ioctl_path, O_RDONLY);
+    if (fd == -1) {
+        perror("open");
+        return EXIT_FAILURE;
+    }
+    switch (request)
+    {
+        case 0:
+            arg_int = arg0;
+            ret = ioctl(fd, LKMC_IOCTL_INC, &arg_int);
+            if (ret != -1) {
+                printf("%d\n", arg_int);
+            }
+            break;
+        case 1:
+            arg_struct.i = arg0;
+            arg_struct.j = arg1;
+            ret = ioctl(fd, LKMC_IOCTL_INC_DEC, &arg_struct);
+            if (ret != -1) {
+                printf("%d %d\n", arg_struct.i, arg_struct.j);
+            }
+            break;
+        default:
+            puts("error: unknown request");
+            return EXIT_FAILURE;
+    }
+    if (ret == -1) {
+        perror("ioctl");
+        printf("errno = %d\n", errno);
+        return EXIT_FAILURE;
+    }
+    close(fd);
+    return EXIT_SUCCESS;
+}
diff --git a/userland/kernel_modules/mmap.c b/userland/kernel_modules/mmap.c
new file mode 100644
index 0000000..afdaa9e
--- /dev/null
+++ b/userland/kernel_modules/mmap.c
@@ -0,0 +1,93 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#mmap */
+
+#define _XOPEN_SOURCE 700
+#include <assert.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h> /* uintmax_t */
+#include <string.h>
+#include <sys/mman.h>
+#include <unistd.h> /* sysconf */
+
+#include <lkmc/pagemap.h> /* lkmc_pagemap_virt_to_phys_user */
+
+enum { BUFFER_SIZE = 4 };
+
+int main(int argc, char **argv) {
+    int fd;
+    long page_size;
+    char *address1, *address2;
+    char buf[BUFFER_SIZE];
+    uintptr_t paddr;
+
+    if (argc < 2) {
+        printf("Usage: %s <mmap_file>\n", argv[0]);
+        return EXIT_FAILURE;
+    }
+    page_size = sysconf(_SC_PAGE_SIZE);
+    printf("open pathname = %s\n", argv[1]);
+    fd = open(argv[1], O_RDWR | O_SYNC);
+    if (fd < 0) {
+        perror("open");
+        assert(0);
+    }
+    printf("fd = %d\n", fd);
+
+    /* mmap twice for double fun. */
+    puts("mmap 1");
+    address1 = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+    if (address1 == MAP_FAILED) {
+        perror("mmap");
+        assert(0);
+    }
+    puts("mmap 2");
+    address2 = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+    if (address2 == MAP_FAILED) {
+        perror("mmap");
+        return EXIT_FAILURE;
+    }
+    assert(address1 != address2);
+
+    /* Read and modify memory. */
+    puts("access 1");
+    assert(!strcmp(address1, "asdf"));
+    /* vm_fault */
+    puts("access 2");
+    assert(!strcmp(address2, "asdf"));
+    /* vm_fault */
+    strcpy(address1, "qwer");
+    /* Also modified. So both virtual addresses point to the same physical address. */
+    assert(!strcmp(address2, "qwer"));
+
+    /* Check that the physical addresses are the same.
+     * They are, but TODO why virt_to_phys on kernel gives a different value? */
+    assert(!lkmc_pagemap_virt_to_phys_user(&paddr, getpid(), (uintptr_t)address1));
+    printf("paddr1 = 0x%jx\n", (uintmax_t)paddr);
+    assert(!lkmc_pagemap_virt_to_phys_user(&paddr, getpid(), (uintptr_t)address2));
+    printf("paddr2 = 0x%jx\n", (uintmax_t)paddr);
+
+    /* Check that modifications made from userland are also visible from the kernel. */
+    read(fd, buf, BUFFER_SIZE);
+    assert(!memcmp(buf, "qwer", BUFFER_SIZE));
+
+    /* Modify the data from the kernel, and check that the change is visible from userland. */
+    write(fd, "zxcv", 4);
+    assert(!strcmp(address1, "zxcv"));
+    assert(!strcmp(address2, "zxcv"));
+
+    /* Cleanup. */
+    puts("munmap 1");
+    if (munmap(address1, page_size)) {
+        perror("munmap");
+        assert(0);
+    }
+    puts("munmap 2");
+    if (munmap(address2, page_size)) {
+        perror("munmap");
+        assert(0);
+    }
+    puts("close");
+    close(fd);
+    return EXIT_SUCCESS;
+}
diff --git a/userland/netlink.c b/userland/kernel_modules/netlink.c
similarity index 97%
rename from userland/netlink.c
rename to userland/kernel_modules/netlink.c
index fa37cfb..f751ed4 100644
--- a/userland/netlink.c
+++ b/userland/kernel_modules/netlink.c
@@ -7,7 +7,7 @@
 #include <sys/socket.h>
 #include <unistd.h>
 
-#include "../include/netlink.h"
+#include <lkmc/netlink.h>
 
 #define MAX_PAYLOAD 1024
 
diff --git a/userland/kernel_modules/poll.c b/userland/kernel_modules/poll.c
new file mode 100644
index 0000000..491c4d5
--- /dev/null
+++ b/userland/kernel_modules/poll.c
@@ -0,0 +1,41 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#poll */
+
+#define _XOPEN_SOURCE 700
+#include <assert.h>
+#include <fcntl.h> /* open, O_RDONLY, O_NONBLOCK */
+#include <poll.h> /* poll */
+#include <stdio.h> /* fprintf, perror, printf, puts */
+#include <stdlib.h> /* EXIT_FAILURE, EXIT_SUCCESS */
+#include <unistd.h> /* read */
+
+int main(int argc, char **argv) {
+    char buf[1024];
+    int fd, i, n;
+    short revents;
+    struct pollfd pfd;
+
+    if (argc < 2) {
+        fprintf(stderr, "usage: %s <poll-device>\n", argv[0]);
+        exit(EXIT_FAILURE);
+    }
+    fd = open(argv[1], O_RDONLY | O_NONBLOCK);
+    if (fd == -1) {
+        perror("open");
+        exit(EXIT_FAILURE);
+    }
+    pfd.fd = fd;
+    pfd.events = POLLIN;
+    while (1) {
+        puts("loop");
+        i = poll(&pfd, 1, -1);
+        if (i == -1) {
+            perror("poll");
+            assert(0);
+        }
+        revents = pfd.revents;
+        if (revents & POLLIN) {
+            n = read(pfd.fd, buf, sizeof(buf));
+            printf("POLLIN n=%d buf=%.*s\n", n, n > 0 ? n : 0, buf);
+        }
+    }
+}
diff --git a/userland/kernel_modules/test b/userland/kernel_modules/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/kernel_modules/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/uio_read.c b/userland/kernel_modules/uio_read.c
similarity index 89%
rename from userland/uio_read.c
rename to userland/kernel_modules/uio_read.c
index 6a272f1..f246861 100644
--- a/userland/uio_read.c
+++ b/userland/kernel_modules/uio_read.c
@@ -16,8 +16,7 @@
 #include <assert.h>
 #include <sys/mman.h>
 
-int main(int argc, char **argv)
-{
+int main(int argc, char **argv) {
     char *dev = "/dev/uio0";
     if (argc > 1) {
         dev = argv[1];
@@ -30,12 +29,14 @@ int main(int argc, char **argv)
     }
 
     /* TODO not supported by this kernel module? */
-	/*int *addr = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);*/
-	/*if (addr == MAP_FAILED) {*/
-		/*perror("mmap");*/
-		/*assert(0);*/
-	/*}*/
-	/**addr = 0x12345678;*/
+#if 0
+    int *addr = mmap(NULL, sysconf(_SC_PAGE_SIZE), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
+    if (addr == MAP_FAILED) {
+        perror("mmap");
+        assert(0);
+    }
+    *addr = 0x12345678;
+#endif
 
     while (1) {
         uint32_t info = 1;
@@ -66,8 +67,7 @@ int main(int argc, char **argv)
 #include <unistd.h>
 #include <unistd.h>
 
-int main(void)
-{
+int main(void) {
     int uiofd;
     int configfd;
     int err;
diff --git a/userland/libs/README.adoc b/userland/libs/README.adoc
new file mode 100644
index 0000000..4a85bf0
--- /dev/null
+++ b/userland/libs/README.adoc
@@ -0,0 +1,3 @@
+Examples in this directory rely on non-libc libraries.
+
+Each example is prefixed by an identifier of the library it depends on.
diff --git a/userland/libs/build b/userland/libs/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/libs/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/libs/eigen/build b/userland/libs/eigen/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/libs/eigen/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/libs/eigen/hello.cpp b/userland/libs/eigen/hello.cpp
new file mode 100644
index 0000000..3d8d00c
--- /dev/null
+++ b/userland/libs/eigen/hello.cpp
@@ -0,0 +1,15 @@
+// https://github.com/cirosantilli/linux-kernel-module-cheat#eigen
+// Adapted from: https://eigen.tuxfamily.org/dox/GettingStarted.html
+
+#include <iostream>
+
+#include <Eigen/Dense>
+
+int main() {
+    Eigen::MatrixXd m(2,2);
+    m(0,0) = 3;
+    m(1,0) = 2.5;
+    m(0,1) = -1;
+    m(1,1) = m(1,0) + m(0,1);
+    std::cout << m << std::endl;
+}
diff --git a/userland/libs/eigen/test b/userland/libs/eigen/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/libs/eigen/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/libs/libdrm/build b/userland/libs/libdrm/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/libs/libdrm/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/libdrm_modeset.c b/userland/libs/libdrm/modeset.c
similarity index 59%
rename from userland/libdrm_modeset.c
rename to userland/libs/libdrm/modeset.c
index 69e61eb..ca54f29 100644
--- a/userland/libdrm_modeset.c
+++ b/userland/libs/libdrm/modeset.c
@@ -45,10 +45,10 @@
 
 struct modeset_dev;
 static int modeset_find_crtc(int fd, drmModeRes *res, drmModeConnector *conn,
-			     struct modeset_dev *dev);
+                 struct modeset_dev *dev);
 static int modeset_create_fb(int fd, struct modeset_dev *dev);
 static int modeset_setup_dev(int fd, drmModeRes *res, drmModeConnector *conn,
-			     struct modeset_dev *dev);
+                 struct modeset_dev *dev);
 static int modeset_open(int *out, const char *node);
 static int modeset_prepare(int fd);
 static void modeset_draw(void);
@@ -81,26 +81,26 @@ static void modeset_cleanup(int fd);
 
 static int modeset_open(int *out, const char *node)
 {
-	int fd, ret;
-	uint64_t has_dumb;
+    int fd, ret;
+    uint64_t has_dumb;
 
-	fd = open(node, O_RDWR | O_CLOEXEC);
-	if (fd < 0) {
-		ret = -errno;
-		fprintf(stderr, "cannot open '%s': %m\n", node);
-		return ret;
-	}
+    fd = open(node, O_RDWR | O_CLOEXEC);
+    if (fd < 0) {
+        ret = -errno;
+        fprintf(stderr, "cannot open '%s': %s\n", node, strerror(errno));
+        return ret;
+    }
 
-	if (drmGetCap(fd, DRM_CAP_DUMB_BUFFER, &has_dumb) < 0 ||
-	    !has_dumb) {
-		fprintf(stderr, "drm device '%s' does not support dumb buffers\n",
-			node);
-		close(fd);
-		return -EOPNOTSUPP;
-	}
+    if (drmGetCap(fd, DRM_CAP_DUMB_BUFFER, &has_dumb) < 0 ||
+        !has_dumb) {
+        fprintf(stderr, "drm device '%s' does not support dumb buffers\n",
+            node);
+        close(fd);
+        return -EOPNOTSUPP;
+    }
 
-	*out = fd;
-	return 0;
+    *out = fd;
+    return 0;
 }
 
 /*
@@ -149,20 +149,20 @@ static int modeset_open(int *out, const char *node)
  */
 
 struct modeset_dev {
-	struct modeset_dev *next;
+    struct modeset_dev *next;
 
-	uint32_t width;
-	uint32_t height;
-	uint32_t stride;
-	uint32_t size;
-	uint32_t handle;
-	uint8_t *map;
+    uint32_t width;
+    uint32_t height;
+    uint32_t stride;
+    uint32_t size;
+    uint32_t handle;
+    uint8_t *map;
 
-	drmModeModeInfo mode;
-	uint32_t fb;
-	uint32_t conn;
-	uint32_t crtc;
-	drmModeCrtc *saved_crtc;
+    drmModeModeInfo mode;
+    uint32_t fb;
+    uint32_t conn;
+    uint32_t crtc;
+    drmModeCrtc *saved_crtc;
 };
 
 static struct modeset_dev *modeset_list = NULL;
@@ -188,57 +188,57 @@ static struct modeset_dev *modeset_list = NULL;
 
 static int modeset_prepare(int fd)
 {
-	drmModeRes *res;
-	drmModeConnector *conn;
-	unsigned int i;
-	struct modeset_dev *dev;
-	int ret;
+    drmModeRes *res;
+    drmModeConnector *conn;
+    unsigned int i;
+    struct modeset_dev *dev;
+    int ret;
 
-	/* retrieve resources */
-	res = drmModeGetResources(fd);
-	if (!res) {
-		fprintf(stderr, "cannot retrieve DRM resources (%d): %m\n",
-			errno);
-		return -errno;
-	}
+    /* retrieve resources */
+    res = drmModeGetResources(fd);
+    if (!res) {
+        fprintf(stderr, "cannot retrieve DRM resources (%d): %s\n",
+            errno, strerror(errno));
+        return -errno;
+    }
 
-	/* iterate all connectors */
-	for (i = 0; i < (unsigned int)res->count_connectors; ++i) {
-		/* get information for each connector */
-		conn = drmModeGetConnector(fd, res->connectors[i]);
-		if (!conn) {
-			fprintf(stderr, "cannot retrieve DRM connector %u:%u (%d): %m\n",
-				i, res->connectors[i], errno);
-			continue;
-		}
+    /* iterate all connectors */
+    for (i = 0; i < (unsigned int)res->count_connectors; ++i) {
+        /* get information for each connector */
+        conn = drmModeGetConnector(fd, res->connectors[i]);
+        if (!conn) {
+            fprintf(stderr, "cannot retrieve DRM connector %u:%u (%d): %s\n",
+                i, res->connectors[i], errno, strerror(errno));
+            continue;
+        }
 
-		/* create a device structure */
-		dev = malloc(sizeof(*dev));
-		memset(dev, 0, sizeof(*dev));
-		dev->conn = conn->connector_id;
+        /* create a device structure */
+        dev = malloc(sizeof(*dev));
+        memset(dev, 0, sizeof(*dev));
+        dev->conn = conn->connector_id;
 
-		/* call helper function to prepare this connector */
-		ret = modeset_setup_dev(fd, res, conn, dev);
-		if (ret) {
-			if (ret != -ENOENT) {
-				errno = -ret;
-				fprintf(stderr, "cannot setup device for connector %u:%u (%d): %m\n",
-					i, res->connectors[i], errno);
-			}
-			free(dev);
-			drmModeFreeConnector(conn);
-			continue;
-		}
+        /* call helper function to prepare this connector */
+        ret = modeset_setup_dev(fd, res, conn, dev);
+        if (ret) {
+            if (ret != -ENOENT) {
+                errno = -ret;
+                fprintf(stderr, "cannot setup device for connector %u:%u (%d): %s\n",
+                    i, res->connectors[i], errno, strerror(errno));
+            }
+            free(dev);
+            drmModeFreeConnector(conn);
+            continue;
+        }
 
-		/* free connector data and link device into global list */
-		drmModeFreeConnector(conn);
-		dev->next = modeset_list;
-		modeset_list = dev;
-	}
+        /* free connector data and link device into global list */
+        drmModeFreeConnector(conn);
+        dev->next = modeset_list;
+        modeset_list = dev;
+    }
 
-	/* free resources again */
-	drmModeFreeResources(res);
-	return 0;
+    /* free resources again */
+    drmModeFreeResources(res);
+    return 0;
 }
 
 /*
@@ -271,48 +271,48 @@ static int modeset_prepare(int fd)
  */
 
 static int modeset_setup_dev(int fd, drmModeRes *res, drmModeConnector *conn,
-			     struct modeset_dev *dev)
+                 struct modeset_dev *dev)
 {
-	int ret;
+    int ret;
 
-	/* check if a monitor is connected */
-	if (conn->connection != DRM_MODE_CONNECTED) {
-		fprintf(stderr, "ignoring unused connector %u\n",
-			conn->connector_id);
-		return -ENOENT;
-	}
+    /* check if a monitor is connected */
+    if (conn->connection != DRM_MODE_CONNECTED) {
+        fprintf(stderr, "ignoring unused connector %u\n",
+            conn->connector_id);
+        return -ENOENT;
+    }
 
-	/* check if there is at least one valid mode */
-	if (conn->count_modes == 0) {
-		fprintf(stderr, "no valid mode for connector %u\n",
-			conn->connector_id);
-		return -EFAULT;
-	}
+    /* check if there is at least one valid mode */
+    if (conn->count_modes == 0) {
+        fprintf(stderr, "no valid mode for connector %u\n",
+            conn->connector_id);
+        return -EFAULT;
+    }
 
-	/* copy the mode information into our device structure */
-	memcpy(&dev->mode, &conn->modes[0], sizeof(dev->mode));
-	dev->width = conn->modes[0].hdisplay;
-	dev->height = conn->modes[0].vdisplay;
-	fprintf(stderr, "mode for connector %u is %ux%u\n",
-		conn->connector_id, dev->width, dev->height);
+    /* copy the mode information into our device structure */
+    memcpy(&dev->mode, &conn->modes[0], sizeof(dev->mode));
+    dev->width = conn->modes[0].hdisplay;
+    dev->height = conn->modes[0].vdisplay;
+    fprintf(stderr, "mode for connector %u is %ux%u\n",
+        conn->connector_id, dev->width, dev->height);
 
-	/* find a crtc for this connector */
-	ret = modeset_find_crtc(fd, res, conn, dev);
-	if (ret) {
-		fprintf(stderr, "no valid crtc for connector %u\n",
-			conn->connector_id);
-		return ret;
-	}
+    /* find a crtc for this connector */
+    ret = modeset_find_crtc(fd, res, conn, dev);
+    if (ret) {
+        fprintf(stderr, "no valid crtc for connector %u\n",
+            conn->connector_id);
+        return ret;
+    }
 
-	/* create a framebuffer for this CRTC */
-	ret = modeset_create_fb(fd, dev);
-	if (ret) {
-		fprintf(stderr, "cannot create framebuffer for connector %u\n",
-			conn->connector_id);
-		return ret;
-	}
+    /* create a framebuffer for this CRTC */
+    ret = modeset_create_fb(fd, dev);
+    if (ret) {
+        fprintf(stderr, "cannot create framebuffer for connector %u\n",
+            conn->connector_id);
+        return ret;
+    }
 
-	return 0;
+    return 0;
 }
 
 /*
@@ -338,80 +338,80 @@ static int modeset_setup_dev(int fd, drmModeRes *res, drmModeConnector *conn,
  */
 
 static int modeset_find_crtc(int fd, drmModeRes *res, drmModeConnector *conn,
-			     struct modeset_dev *dev)
+                 struct modeset_dev *dev)
 {
-	drmModeEncoder *enc;
-	unsigned int i, j;
-	int32_t crtc;
-	struct modeset_dev *iter;
+    drmModeEncoder *enc;
+    unsigned int i, j;
+    int32_t crtc;
+    struct modeset_dev *iter;
 
-	/* first try the currently conected encoder+crtc */
-	if (conn->encoder_id)
-		enc = drmModeGetEncoder(fd, conn->encoder_id);
-	else
-		enc = NULL;
+    /* first try the currently connected encoder+crtc */
+    if (conn->encoder_id)
+        enc = drmModeGetEncoder(fd, conn->encoder_id);
+    else
+        enc = NULL;
 
-	if (enc) {
-		if (enc->crtc_id) {
-			crtc = enc->crtc_id;
-			for (iter = modeset_list; iter; iter = iter->next) {
-				if ((int32_t)iter->crtc == crtc) {
-					crtc = -1;
-					break;
-				}
-			}
+    if (enc) {
+        if (enc->crtc_id) {
+            crtc = enc->crtc_id;
+            for (iter = modeset_list; iter; iter = iter->next) {
+                if ((int32_t)iter->crtc == crtc) {
+                    crtc = -1;
+                    break;
+                }
+            }
 
-			if (crtc >= 0) {
-				drmModeFreeEncoder(enc);
-				dev->crtc = crtc;
-				return 0;
-			}
-		}
+            if (crtc >= 0) {
+                drmModeFreeEncoder(enc);
+                dev->crtc = crtc;
+                return 0;
+            }
+        }
 
-		drmModeFreeEncoder(enc);
-	}
+        drmModeFreeEncoder(enc);
+    }
 
-	/* If the connector is not currently bound to an encoder or if the
-	 * encoder+crtc is already used by another connector (actually unlikely
-	 * but lets be safe), iterate all other available encoders to find a
-	 * matching CRTC. */
-	for (i = 0; i < (unsigned int)conn->count_encoders; ++i) {
-		enc = drmModeGetEncoder(fd, conn->encoders[i]);
-		if (!enc) {
-			fprintf(stderr, "cannot retrieve encoder %u:%u (%d): %m\n",
-				i, conn->encoders[i], errno);
-			continue;
-		}
+    /* If the connector is not currently bound to an encoder or if the
+     * encoder+crtc is already used by another connector (actually unlikely
+     * but let's be safe), iterate all other available encoders to find a
+     * matching CRTC. */
+    for (i = 0; i < (unsigned int)conn->count_encoders; ++i) {
+        enc = drmModeGetEncoder(fd, conn->encoders[i]);
+        if (!enc) {
+            fprintf(stderr, "cannot retrieve encoder %u:%u (%d): %s\n",
+                i, conn->encoders[i], errno, strerror(errno));
+            continue;
+        }
 
-		/* iterate all global CRTCs */
-		for (j = 0; j < (unsigned int)res->count_crtcs; ++j) {
-			/* check whether this CRTC works with the encoder */
-			if (!(enc->possible_crtcs & (1 << j)))
-				continue;
+        /* iterate all global CRTCs */
+        for (j = 0; j < (unsigned int)res->count_crtcs; ++j) {
+            /* check whether this CRTC works with the encoder */
+            if (!(enc->possible_crtcs & (1 << j)))
+                continue;
 
-			/* check that no other device already uses this CRTC */
-			crtc = res->crtcs[j];
-			for (iter = modeset_list; iter; iter = iter->next) {
-				if ((int32_t)iter->crtc == crtc) {
-					crtc = -1;
-					break;
-				}
-			}
+            /* check that no other device already uses this CRTC */
+            crtc = res->crtcs[j];
+            for (iter = modeset_list; iter; iter = iter->next) {
+                if ((int32_t)iter->crtc == crtc) {
+                    crtc = -1;
+                    break;
+                }
+            }
 
-			/* we have found a CRTC, so save it and return */
-			if (crtc >= 0) {
-				drmModeFreeEncoder(enc);
-				dev->crtc = crtc;
-				return 0;
-			}
-		}
+            /* we have found a CRTC, so save it and return */
+            if (crtc >= 0) {
+                drmModeFreeEncoder(enc);
+                dev->crtc = crtc;
+                return 0;
+            }
+        }
 
-		drmModeFreeEncoder(enc);
-	}
+        drmModeFreeEncoder(enc);
+    }
 
-	fprintf(stderr, "cannot find suitable CRTC for connector %u\n",
-		conn->connector_id);
-	return -ENOENT;
+    fprintf(stderr, "cannot find suitable CRTC for connector %u\n",
+        conn->connector_id);
+    return -ENOENT;
 }
 
 /*
@@ -441,69 +441,69 @@ static int modeset_find_crtc(int fd, drmModeRes *res, drmModeConnector *conn,
 
 static int modeset_create_fb(int fd, struct modeset_dev *dev)
 {
-	struct drm_mode_create_dumb creq;
-	struct drm_mode_destroy_dumb dreq;
-	struct drm_mode_map_dumb mreq;
-	int ret;
+    struct drm_mode_create_dumb creq;
+    struct drm_mode_destroy_dumb dreq;
+    struct drm_mode_map_dumb mreq;
+    int ret;
 
-	/* create dumb buffer */
-	memset(&creq, 0, sizeof(creq));
-	creq.width = dev->width;
-	creq.height = dev->height;
-	creq.bpp = 32;
-	ret = drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &creq);
-	if (ret < 0) {
-		fprintf(stderr, "cannot create dumb buffer (%d): %m\n",
-			errno);
-		return -errno;
-	}
-	dev->stride = creq.pitch;
-	dev->size = creq.size;
-	dev->handle = creq.handle;
+    /* create dumb buffer */
+    memset(&creq, 0, sizeof(creq));
+    creq.width = dev->width;
+    creq.height = dev->height;
+    creq.bpp = 32;
+    ret = drmIoctl(fd, DRM_IOCTL_MODE_CREATE_DUMB, &creq);
+    if (ret < 0) {
+        fprintf(stderr, "cannot create dumb buffer (%d): %s\n",
+            errno, strerror(errno));
+        return -errno;
+    }
+    dev->stride = creq.pitch;
+    dev->size = creq.size;
+    dev->handle = creq.handle;
 
-	/* create framebuffer object for the dumb-buffer */
-	ret = drmModeAddFB(fd, dev->width, dev->height, 24, 32, dev->stride,
-			   dev->handle, &dev->fb);
-	if (ret) {
-		fprintf(stderr, "cannot create framebuffer (%d): %m\n",
-			errno);
-		ret = -errno;
-		goto err_destroy;
-	}
+    /* create framebuffer object for the dumb-buffer */
+    ret = drmModeAddFB(fd, dev->width, dev->height, 24, 32, dev->stride,
+               dev->handle, &dev->fb);
+    if (ret) {
+        fprintf(stderr, "cannot create framebuffer (%d): %s\n",
+            errno, strerror(errno));
+        ret = -errno;
+        goto err_destroy;
+    }
 
-	/* prepare buffer for memory mapping */
-	memset(&mreq, 0, sizeof(mreq));
-	mreq.handle = dev->handle;
-	ret = drmIoctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &mreq);
-	if (ret) {
-		fprintf(stderr, "cannot map dumb buffer (%d): %m\n",
-			errno);
-		ret = -errno;
-		goto err_fb;
-	}
+    /* prepare buffer for memory mapping */
+    memset(&mreq, 0, sizeof(mreq));
+    mreq.handle = dev->handle;
+    ret = drmIoctl(fd, DRM_IOCTL_MODE_MAP_DUMB, &mreq);
+    if (ret) {
+        fprintf(stderr, "cannot map dumb buffer (%d): %s\n",
+            errno, strerror(errno));
+        ret = -errno;
+        goto err_fb;
+    }
 
-	/* perform actual memory mapping */
-	dev->map = mmap(0, dev->size, PROT_READ | PROT_WRITE, MAP_SHARED,
-		        fd, mreq.offset);
-	if (dev->map == MAP_FAILED) {
-		fprintf(stderr, "cannot mmap dumb buffer (%d): %m\n",
-			errno);
-		ret = -errno;
-		goto err_fb;
-	}
+    /* perform actual memory mapping */
+    dev->map = mmap(0, dev->size, PROT_READ | PROT_WRITE, MAP_SHARED,
+                fd, mreq.offset);
+    if (dev->map == MAP_FAILED) {
+        fprintf(stderr, "cannot mmap dumb buffer (%d): %s\n",
+            errno, strerror(errno));
+        ret = -errno;
+        goto err_fb;
+    }
 
-	/* clear the framebuffer to 0 */
-	memset(dev->map, 0, dev->size);
+    /* clear the framebuffer to 0 */
+    memset(dev->map, 0, dev->size);
 
-	return 0;
+    return 0;
 
 err_fb:
-	drmModeRmFB(fd, dev->fb);
+    drmModeRmFB(fd, dev->fb);
 err_destroy:
-	memset(&dreq, 0, sizeof(dreq));
-	dreq.handle = dev->handle;
-	drmIoctl(fd, DRM_IOCTL_MODE_DESTROY_DUMB, &dreq);
-	return ret;
+    memset(&dreq, 0, sizeof(dreq));
+    dreq.handle = dev->handle;
+    drmIoctl(fd, DRM_IOCTL_MODE_DESTROY_DUMB, &dreq);
+    return ret;
 }
 
 /*
@@ -543,56 +543,56 @@ err_destroy:
 
 int main(int argc, char **argv)
 {
-	int ret, fd;
-	const char *card;
-	struct modeset_dev *iter;
+    int ret, fd;
+    const char *card;
+    struct modeset_dev *iter;
 
-	/* check which DRM device to open */
-	if (argc > 1)
-		card = argv[1];
-	else
-		card = "/dev/dri/card0";
+    /* check which DRM device to open */
+    if (argc > 1)
+        card = argv[1];
+    else
+        card = "/dev/dri/card0";
 
-	fprintf(stderr, "using card '%s'\n", card);
+    fprintf(stderr, "using card '%s'\n", card);
 
-	/* open the DRM device */
-	ret = modeset_open(&fd, card);
-	if (ret)
-		goto out_return;
+    /* open the DRM device */
+    ret = modeset_open(&fd, card);
+    if (ret)
+        goto out_return;
 
-	/* prepare all connectors and CRTCs */
-	ret = modeset_prepare(fd);
-	if (ret)
-		goto out_close;
+    /* prepare all connectors and CRTCs */
+    ret = modeset_prepare(fd);
+    if (ret)
+        goto out_close;
 
-	/* perform actual modesetting on each found connector+CRTC */
-	for (iter = modeset_list; iter; iter = iter->next) {
-		iter->saved_crtc = drmModeGetCrtc(fd, iter->crtc);
-		ret = drmModeSetCrtc(fd, iter->crtc, iter->fb, 0, 0,
-				     &iter->conn, 1, &iter->mode);
-		if (ret)
-			fprintf(stderr, "cannot set CRTC for connector %u (%d): %m\n",
-				iter->conn, errno);
-	}
+    /* perform actual modesetting on each found connector+CRTC */
+    for (iter = modeset_list; iter; iter = iter->next) {
+        iter->saved_crtc = drmModeGetCrtc(fd, iter->crtc);
+        ret = drmModeSetCrtc(fd, iter->crtc, iter->fb, 0, 0,
+                     &iter->conn, 1, &iter->mode);
+        if (ret)
+            fprintf(stderr, "cannot set CRTC for connector %u (%d): %s\n",
+                iter->conn, errno, strerror(errno));
+    }
 
-	/* draw some colors for 5seconds */
-	modeset_draw();
+    /* draw some colors for 5 seconds */
+    modeset_draw();
 
-	/* cleanup everything */
-	modeset_cleanup(fd);
+    /* cleanup everything */
+    modeset_cleanup(fd);
 
-	ret = 0;
+    ret = 0;
 
 out_close:
-	close(fd);
+    close(fd);
 out_return:
-	if (ret) {
-		errno = -ret;
-		fprintf(stderr, "modeset failed with error %d: %m\n", errno);
-	} else {
-		fprintf(stderr, "exiting\n");
-	}
-	return ret;
+    if (ret) {
+        errno = -ret;
+        fprintf(stderr, "modeset failed with error %d: %s\n", errno, strerror(errno));
+    } else {
+        fprintf(stderr, "exiting\n");
+    }
+    return ret;
 }
 
 /*
@@ -602,15 +602,15 @@ out_return:
 
 static uint8_t next_color(bool *up, uint8_t cur, unsigned int mod)
 {
-	uint8_t next;
+    uint8_t next;
 
-	next = cur + (*up ? 1 : -1) * (rand() % mod);
-	if ((*up && next < cur) || (!*up && next > cur)) {
-		*up = !*up;
-		next = cur;
-	}
+    next = cur + (*up ? 1 : -1) * (rand() % mod);
+    if ((*up && next < cur) || (!*up && next > cur)) {
+        *up = !*up;
+        next = cur;
+    }
 
-	return next;
+    return next;
 }
 
 /*
@@ -635,34 +635,34 @@ static uint8_t next_color(bool *up, uint8_t cur, unsigned int mod)
 
 static void modeset_draw(void)
 {
-	uint8_t r, g, b;
-	bool r_up, g_up, b_up;
-	unsigned int i, j, k, off;
-	struct modeset_dev *iter;
+    uint8_t r, g, b;
+    bool r_up, g_up, b_up;
+    unsigned int i, j, k, off;
+    struct modeset_dev *iter;
 
-	srand(time(NULL));
-	r = rand() % 0xff;
-	g = rand() % 0xff;
-	b = rand() % 0xff;
-	r_up = g_up = b_up = true;
+    srand(time(NULL));
+    r = rand() % 0xff;
+    g = rand() % 0xff;
+    b = rand() % 0xff;
+    r_up = g_up = b_up = true;
 
-	for (i = 0; i < 50; ++i) {
-		r = next_color(&r_up, r, 20);
-		g = next_color(&g_up, g, 10);
-		b = next_color(&b_up, b, 5);
+    for (i = 0; i < 50; ++i) {
+        r = next_color(&r_up, r, 20);
+        g = next_color(&g_up, g, 10);
+        b = next_color(&b_up, b, 5);
 
-		for (iter = modeset_list; iter; iter = iter->next) {
-			for (j = 0; j < iter->height; ++j) {
-				for (k = 0; k < iter->width; ++k) {
-					off = iter->stride * j + k * 4;
-					*(uint32_t*)&iter->map[off] =
-						     (r << 16) | (g << 8) | b;
-				}
-			}
-		}
+        for (iter = modeset_list; iter; iter = iter->next) {
+            for (j = 0; j < iter->height; ++j) {
+                for (k = 0; k < iter->width; ++k) {
+                    off = iter->stride * j + k * 4;
+                    *(uint32_t*)&iter->map[off] =
+                             (r << 16) | (g << 8) | b;
+                }
+            }
+        }
 
-		usleep(100000);
-	}
+        usleep(100000);
+    }
 }
 
 /*
@@ -674,39 +674,41 @@ static void modeset_draw(void)
 
 static void modeset_cleanup(int fd)
 {
-	struct modeset_dev *iter;
-	struct drm_mode_destroy_dumb dreq;
+    struct modeset_dev *iter;
+    struct drm_mode_destroy_dumb dreq;
 
-	while (modeset_list) {
-		/* remove from global list */
-		iter = modeset_list;
-		modeset_list = iter->next;
+    while (modeset_list) {
+        /* remove from global list */
+        iter = modeset_list;
+        modeset_list = iter->next;
 
-		/* restore saved CRTC configuration */
-		drmModeSetCrtc(fd,
-			       iter->saved_crtc->crtc_id,
-			       iter->saved_crtc->buffer_id,
-			       iter->saved_crtc->x,
-			       iter->saved_crtc->y,
-			       &iter->conn,
-			       1,
-			       &iter->saved_crtc->mode);
-		drmModeFreeCrtc(iter->saved_crtc);
+        /* restore saved CRTC configuration */
+        drmModeSetCrtc(
+            fd,
+            iter->saved_crtc->crtc_id,
+            iter->saved_crtc->buffer_id,
+            iter->saved_crtc->x,
+            iter->saved_crtc->y,
+            &iter->conn,
+            1,
+            &iter->saved_crtc->mode
+        );
+        drmModeFreeCrtc(iter->saved_crtc);
 
-		/* unmap buffer */
-		munmap(iter->map, iter->size);
+        /* unmap buffer */
+        munmap(iter->map, iter->size);
 
-		/* delete framebuffer */
-		drmModeRmFB(fd, iter->fb);
+        /* delete framebuffer */
+        drmModeRmFB(fd, iter->fb);
 
-		/* delete dumb buffer */
-		memset(&dreq, 0, sizeof(dreq));
-		dreq.handle = iter->handle;
-		drmIoctl(fd, DRM_IOCTL_MODE_DESTROY_DUMB, &dreq);
+        /* delete dumb buffer */
+        memset(&dreq, 0, sizeof(dreq));
+        dreq.handle = iter->handle;
+        drmIoctl(fd, DRM_IOCTL_MODE_DESTROY_DUMB, &dreq);
 
-		/* free allocated memory */
-		free(iter);
-	}
+        /* free allocated memory */
+        free(iter);
+    }
 }
 
 /*
diff --git a/userland/libs/libdrm/test b/userland/libs/libdrm/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/libs/libdrm/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/libs/openblas/build b/userland/libs/openblas/build
new file mode 120000
index 0000000..ab18017
--- /dev/null
+++ b/userland/libs/openblas/build
@@ -0,0 +1 @@
+../build
\ No newline at end of file
diff --git a/userland/libs/openblas/hello.c b/userland/libs/openblas/hello.c
new file mode 100644
index 0000000..16e63d0
--- /dev/null
+++ b/userland/libs/openblas/hello.c
@@ -0,0 +1,39 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#blas
+ * Adapted from: https://github.com/xianyi/OpenBLAS/wiki/User-Manual/59b62f98e7400270fb03ad1d85fba5b64ebbff2b#call-cblas-interface
+ */
+
+#include <assert.h>
+#include <cblas.h>
+
+#include <lkmc/math.h>
+
+int main(void) {
+    double A[6] = {
+         1.0, 2.0,  1.0,
+        -3.0, 4.0, -1.0
+    };
+    double B[6] = {
+         1.0,  2.0,
+         1.0, -3.0,
+         4.0, -1.0
+    };
+    double C[9] = {
+        0.5, 0.5, 0.5,
+        0.5, 0.5, 0.5,
+        0.5, 0.5, 0.5
+    };
+    cblas_dgemm(
+        CblasColMajor, CblasNoTrans, CblasTrans,
+        3, 3, 2, 1, A, 3, B, 3, 2, C, 3
+    );
+    assert(lkmc_vector_equal(
+        9,
+        C,
+        (double[]) {
+            11.0, -9.0,  5.0,
+            -9.0, 21.0, -1.0,
+             5.0, -1.0,  3.0
+        },
+        1e-6
+    ));
+}
diff --git a/userland/libs/openblas/test b/userland/libs/openblas/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/libs/openblas/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/libs/test b/userland/libs/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/libs/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/linux/README.adoc b/userland/linux/README.adoc
index fd45463..346e888 100644
--- a/userland/linux/README.adoc
+++ b/userland/linux/README.adoc
@@ -1,3 +1,5 @@
-Programs in this directory rely on Linux kernel specific functionality such as `/proc`.
+Programs in this directory rely on Linux-kernel-specific functionality such as `/proc` or raw system calls.
 
 These programs may also conform to ANSI C or POSIX, but we put them here because you can only observe them at work under Linux.
+
+Many of these programs use glibc or GCC extensions.
diff --git a/userland/ctrl_alt_del.c b/userland/linux/ctrl_alt_del.c
similarity index 55%
rename from userland/ctrl_alt_del.c
rename to userland/linux/ctrl_alt_del.c
index a3f05c9..e35c7c1 100644
--- a/userland/ctrl_alt_del.c
+++ b/userland/linux/ctrl_alt_del.c
@@ -8,19 +8,19 @@
 #include <unistd.h>
 
 void signal_handler(int sig) {
-	write(STDOUT_FILENO, "cad\n", 4);
-	signal(sig, signal_handler);
+    write(STDOUT_FILENO, "cad\n", 4);
+    signal(sig, signal_handler);
 }
 
 int main(void) {
-	int i = 0;
-	/* Disable the forced reboot, enable sending SIGINT to init. */
-	reboot(RB_DISABLE_CAD);
+    int i = 0;
+    /* Disable the forced reboot, enable sending SIGINT to init. */
+    reboot(RB_DISABLE_CAD);
     signal(SIGINT, signal_handler);
-	while (1) {
-		sleep(1);
-		printf("%d\n", i);
-		i++;
-	}
+    while (1) {
+        sleep(1);
+        printf("%d\n", i);
+        i++;
+    }
     return EXIT_SUCCESS;
 }
diff --git a/userland/linux/init_env_poweroff.c b/userland/linux/init_env_poweroff.c
new file mode 100644
index 0000000..07885b0
--- /dev/null
+++ b/userland/linux/init_env_poweroff.c
@@ -0,0 +1,25 @@
+#define _XOPEN_SOURCE 700
+#include <stdio.h>
+#include <sys/reboot.h>
+#include <unistd.h>
+
+int main(int argc, char **argv) {
+    int i;
+
+    puts("args:");
+    for (i = 0; i < argc; ++i)
+        puts(argv[i]);
+    puts("");
+
+    puts("env:");
+    extern char **environ;
+    char **env = environ;
+    while (*env) {
+        printf("%s\n", *env);
+        env++;
+    }
+    puts("");
+
+    /* Poweroff. */
+    reboot(RB_POWER_OFF);
+}
diff --git a/userland/linux/myinsmod.c b/userland/linux/myinsmod.c
new file mode 100644
index 0000000..318eef9
--- /dev/null
+++ b/userland/linux/myinsmod.c
@@ -0,0 +1,61 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#myinsmod */
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <stdio.h>
+#include <sys/stat.h>
+#include <sys/syscall.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <stdlib.h>
+
+#define init_module(module_image, len, param_values) syscall(__NR_init_module, module_image, len, param_values)
+#define finit_module(fd, param_values, flags) syscall(__NR_finit_module, fd, param_values, flags)
+
+int main(int argc, char **argv) {
+    const char *params;
+    int fd, use_finit;
+    size_t image_size;
+    struct stat st;
+    void *image;
+
+    /* CLI handling. */
+    if (argc < 2) {
+        puts("Usage ./prog mymodule.ko [args=\"\" [use_finit=0]]");
+        return EXIT_FAILURE;
+    }
+    if (argc < 3) {
+        params = "";
+    } else {
+        params = argv[2];
+    }
+    if (argc < 4) {
+        use_finit = 0;
+    } else {
+        use_finit = (argv[3][0] != '0');
+    }
+
+    /* Action. */
+    fd = open(argv[1], O_RDONLY);
+    if (use_finit) {
+        puts("finit");
+        if (finit_module(fd, params, 0) != 0) {
+            perror("finit_module");
+            return EXIT_FAILURE;
+        }
+        close(fd);
+    } else {
+        puts("init");
+        fstat(fd, &st);
+        image_size = st.st_size;
+        image = malloc(image_size);
+        read(fd, image, image_size);
+        close(fd);
+        if (init_module(image, image_size, params) != 0) {
+            perror("init_module");
+            return EXIT_FAILURE;
+        }
+        free(image);
+    }
+    return EXIT_SUCCESS;
+}
diff --git a/userland/myrmmod.c b/userland/linux/myrmmod.c
similarity index 59%
rename from userland/myrmmod.c
rename to userland/linux/myrmmod.c
index e4d68b0..3e09870 100644
--- a/userland/myrmmod.c
+++ b/userland/linux/myrmmod.c
@@ -12,13 +12,13 @@
 #define delete_module(name, flags) syscall(__NR_delete_module, name, flags)
 
 int main(int argc, char **argv) {
-	if (argc != 2) {
-		puts("Usage ./prog mymodule");
-		return EXIT_FAILURE;
-	}
-	if (delete_module(argv[1], O_NONBLOCK) != 0) {
-		perror("delete_module");
-		return EXIT_FAILURE;
-	}
-	return EXIT_SUCCESS;
+    if (argc != 2) {
+        puts("Usage: ./prog mymodule");
+        return EXIT_FAILURE;
+    }
+    if (delete_module(argv[1], O_NONBLOCK) != 0) {
+        perror("delete_module");
+        return EXIT_FAILURE;
+    }
+    return EXIT_SUCCESS;
 }
diff --git a/userland/linux/pagemap_dump.c b/userland/linux/pagemap_dump.c
new file mode 100644
index 0000000..e6669c9
--- /dev/null
+++ b/userland/linux/pagemap_dump.c
@@ -0,0 +1,116 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#pagemap_dump-out */
+
+#define _XOPEN_SOURCE 700
+#include <errno.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#include <lkmc/pagemap.h> /* LkmcPagemapEntry, lkmc_pagemap_get_entry */
+
+int main(int argc, char **argv) {
+    char buffer[BUFSIZ];
+    char maps_file[BUFSIZ];
+    char pagemap_file[BUFSIZ];
+    int maps_fd;
+    int offset = 0;
+    int pagemap_fd;
+    pid_t pid;
+
+    if (argc < 2) {
+        printf("Usage: %s pid\n", argv[0]);
+        return EXIT_FAILURE;
+    }
+    pid = strtoull(argv[1], NULL, 0);
+    snprintf(maps_file, sizeof(maps_file), "/proc/%ju/maps", (uintmax_t)pid);
+    snprintf(pagemap_file, sizeof(pagemap_file), "/proc/%ju/pagemap", (uintmax_t)pid);
+    maps_fd = open(maps_file, O_RDONLY);
+    if (maps_fd < 0) {
+        perror("open maps");
+        return EXIT_FAILURE;
+    }
+    pagemap_fd = open(pagemap_file, O_RDONLY);
+    if (pagemap_fd < 0) {
+        perror("open pagemap");
+        return EXIT_FAILURE;
+    }
+    printf("vaddr pfn soft-dirty file/shared swapped present library\n");
+    for (;;) {
+        ssize_t length = read(maps_fd, buffer + offset, sizeof buffer - offset);
+        if (length <= 0) break;
+        length += offset;
+        for (size_t i = offset; i < (size_t)length; i++) {
+            uintptr_t low = 0, high = 0;
+            if (buffer[i] == '\n' && i) {
+                const char *lib_name;
+                size_t y;
+                /* Parse a line from maps. Each line contains a range that contains many pages. */
+                {
+                    size_t x = i - 1;
+                    while (x && buffer[x] != '\n') x--;
+                    if (buffer[x] == '\n') x++;
+                    while (x < sizeof buffer && buffer[x] != '-') {
+                        char c = buffer[x++];
+                        low *= 16;
+                        if (c >= '0' && c <= '9') {
+                            low += c - '0';
+                        } else if (c >= 'a' && c <= 'f') {
+                            low += c - 'a' + 10;
+                        } else {
+                            break;
+                        }
+                    }
+                    while (x < sizeof buffer && buffer[x] != '-') x++;
+                    if (x < sizeof buffer && buffer[x] == '-') x++;
+                    while (x < sizeof buffer && buffer[x] != ' ') {
+                        char c = buffer[x++];
+                        high *= 16;
+                        if (c >= '0' && c <= '9') {
+                            high += c - '0';
+                        } else if (c >= 'a' && c <= 'f') {
+                            high += c - 'a' + 10;
+                        } else {
+                            break;
+                        }
+                    }
+                    lib_name = 0;
+                    for (int field = 0; field < 4; field++) {
+                        x++;
+                        while (x < sizeof buffer && buffer[x] != ' ') x++;
+                    }
+                    while (x < sizeof buffer && buffer[x] == ' ') x++;
+                    y = x;
+                    while (y < sizeof buffer && buffer[y] != '\n') y++;
+                    buffer[y] = 0;
+                    lib_name = buffer + x;
+                }
+                /* Get info about all pages in this page range with pagemap. */
+                {
+                    LkmcPagemapEntry entry;
+                    for (uintptr_t vaddr = low; vaddr < high; vaddr += sysconf(_SC_PAGE_SIZE)) {
+                        /* TODO always fails for the last page (vsyscall), why? pread returns 0. */
+                        if (!lkmc_pagemap_get_entry(&entry, pagemap_fd, vaddr)) {
+                            printf(
+                                "%jx %jx %u %u %u %u %s\n",
+                                (uintmax_t)vaddr,
+                                (uintmax_t)entry.pfn,
+                                entry.soft_dirty,
+                                entry.file_page,
+                                entry.swapped,
+                                entry.present,
+                                lib_name
+                            );
+                        }
+                    }
+                }
+                buffer[y] = '\n';
+            }
+        }
+    }
+    close(maps_fd);
+    close(pagemap_fd);
+    return EXIT_SUCCESS;
+}
diff --git a/userland/poweroff.c b/userland/linux/poweroff.c
similarity index 86%
rename from userland/poweroff.c
rename to userland/linux/poweroff.c
index de4a645..bcae5a9 100644
--- a/userland/poweroff.c
+++ b/userland/linux/poweroff.c
@@ -5,5 +5,5 @@
 #include <unistd.h>
 
 int main(void) {
-	reboot(RB_POWER_OFF);
+    reboot(RB_POWER_OFF);
 }
diff --git a/userland/proc_events.c b/userland/linux/proc_events.c
similarity index 100%
rename from userland/proc_events.c
rename to userland/linux/proc_events.c
diff --git a/userland/linux/rand_check.c b/userland/linux/rand_check.c
new file mode 100644
index 0000000..0e0f29f
--- /dev/null
+++ b/userland/linux/rand_check.c
@@ -0,0 +1,41 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#rand_check-out */
+
+#include <inttypes.h>
+#include <signal.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <time.h>
+#include <unistd.h>
+
+int bss = 0;
+int data = 1;
+
+int main(__attribute__((unused)) int argc, char **argv) {
+    int i, *ip;
+    uint64_t uint64;
+    FILE *fp;
+
+    /* Loaded addresses. */
+    printf("&i         = %p\n", (void *)&i);
+    printf("&argv[0]   = %p\n", (void *)&argv[0]);
+    printf("&main      = %p\n", (void *)(intptr_t)main);
+    printf("&bss       = %p\n", (void *)&bss);
+    printf("&data      = %p\n", (void *)&data);
+
+    /* Misc syscalls. */
+    printf("time(NULL) = %ju\n", (uintmax_t)time(NULL));
+    printf("pid        = %ju\n", (uintmax_t)getpid());
+
+    /* malloc */
+    ip = malloc(sizeof(*ip));
+    printf("&malloc    = %p\n", (void *)ip);
+    free(ip);
+
+    /* /dev/urandom */
+    fp = fopen("/dev/urandom", "rb");
+    fread(&uint64, sizeof(uint64), 1, fp);
+    printf("/dev/urandom = %" PRIx64 "\n", uint64);
+    fclose(fp);
+}
diff --git a/userland/sched_getaffinity.c b/userland/linux/sched_getaffinity.c
similarity index 100%
rename from userland/sched_getaffinity.c
rename to userland/linux/sched_getaffinity.c
diff --git a/userland/sched_getaffinity_threads.c b/userland/linux/sched_getaffinity_threads.c
similarity index 75%
rename from userland/sched_getaffinity_threads.c
rename to userland/linux/sched_getaffinity_threads.c
index 4e56996..293f4d5 100644
--- a/userland/sched_getaffinity_threads.c
+++ b/userland/linux/sched_getaffinity_threads.c
@@ -10,31 +10,31 @@
 #include <unistd.h>
 
 void* main_thread_0(void *arg) {
-	int i;
+    int i;
     cpu_set_t mask;
     CPU_ZERO(&mask);
     CPU_SET(*((int*)arg), &mask);
     sched_setaffinity(0, sizeof(cpu_set_t), &mask);
     i = 0;
     while (true) {
-    	printf("0 %d\n", i);
-    	sleep(1);
-    	i++;
+        printf("0 %d\n", i);
+        sleep(1);
+        i++;
     }
     return NULL;
 }
 
 void* main_thread_1(void *arg) {
-	int i;
+    int i;
     cpu_set_t mask;
     CPU_ZERO(&mask);
     CPU_SET(*((int*)arg), &mask);
     sched_setaffinity(1, sizeof(cpu_set_t), &mask);
     i = 0;
     while (true) {
-    	printf("1 %d\n", i);
-    	sleep(1);
-    	i++;
+        printf("1 %d\n", i);
+        sleep(1);
+        i++;
     }
     return NULL;
 }
@@ -43,8 +43,8 @@ int main(void) {
     enum NUM_THREADS {NUM_THREADS = 2};
     pthread_t threads[NUM_THREADS];
     int thread_args[NUM_THREADS];
-	pthread_create(&threads[0], NULL, main_thread_0, (void*)&thread_args[0]);
-	pthread_create(&threads[1], NULL, main_thread_1, (void*)&thread_args[1]);
+    pthread_create(&threads[0], NULL, main_thread_0, (void*)&thread_args[0]);
+    pthread_create(&threads[1], NULL, main_thread_1, (void*)&thread_args[1]);
     pthread_join(threads[0], NULL);
     pthread_join(threads[1], NULL);
     return EXIT_SUCCESS;
diff --git a/userland/linux/test b/userland/linux/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/linux/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/time_boot.c b/userland/linux/time_boot.c
similarity index 70%
rename from userland/time_boot.c
rename to userland/linux/time_boot.c
index 5b38929..3a5efe9 100644
--- a/userland/time_boot.c
+++ b/userland/linux/time_boot.c
@@ -7,8 +7,8 @@
 int main(void) {
     FILE *fp;
     fp = fopen("/dev/kmsg", "w");
-	fputs(__FILE__ "\n", fp);
-	fclose(fp);
-	while (1)
-		sleep(0xFFFFFFFF);
+    fputs(__FILE__ "\n", fp);
+    fclose(fp);
+    while (1)
+        sleep(0xFFFFFFFF);
 }
diff --git a/userland/linux/virt_to_phys_user.c b/userland/linux/virt_to_phys_user.c
new file mode 100644
index 0000000..c25b08c
--- /dev/null
+++ b/userland/linux/virt_to_phys_user.c
@@ -0,0 +1,25 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#userland-physical-address-experiments */
+
+#define _XOPEN_SOURCE 700
+#include <stdio.h> /* printf */
+#include <stdlib.h> /* EXIT_SUCCESS, EXIT_FAILURE, strtoull */
+
+#include <lkmc/pagemap.h> /* lkmc_pagemap_virt_to_phys_user */
+
+int main(int argc, char **argv) {
+    pid_t pid;
+    uintptr_t vaddr, paddr = 0;
+
+    if (argc < 3) {
+        printf("Usage: %s pid vaddr\n", argv[0]);
+        return EXIT_FAILURE;
+    }
+    pid = strtoull(argv[1], NULL, 0);
+    vaddr = strtoull(argv[2], NULL, 0);
+    if (lkmc_pagemap_virt_to_phys_user(&paddr, pid, vaddr)) {
+        fprintf(stderr, "error: lkmc_pagemap_virt_to_phys_user\n");
+        return EXIT_FAILURE;
+    }
+    printf("0x%jx\n", (uintmax_t)paddr);
+    return EXIT_SUCCESS;
+}
diff --git a/userland/lkmc/README.adoc b/userland/lkmc/README.adoc
new file mode 100644
index 0000000..f06563e
--- /dev/null
+++ b/userland/lkmc/README.adoc
@@ -0,0 +1 @@
+Mostly tests the infrastructure of this repository rather than anything else.
diff --git a/userland/lkmc/add.c b/userland/lkmc/add.c
new file mode 120000
index 0000000..01339d7
--- /dev/null
+++ b/userland/lkmc/add.c
@@ -0,0 +1 @@
+../../lkmc/add.c
\ No newline at end of file
diff --git a/userland/lkmc/add.py b/userland/lkmc/add.py
new file mode 120000
index 0000000..6c162bc
--- /dev/null
+++ b/userland/lkmc/add.py
@@ -0,0 +1 @@
+../../lkmc/add.py
\ No newline at end of file
diff --git a/userland/lkmc/assert_fail.c b/userland/lkmc/assert_fail.c
new file mode 120000
index 0000000..73df102
--- /dev/null
+++ b/userland/lkmc/assert_fail.c
@@ -0,0 +1 @@
+../../lkmc/lkmc_assert_fail.c
\ No newline at end of file
diff --git a/userland/m5ops.h b/userland/m5ops.h
deleted file mode 100644
index 82dce5c..0000000
--- a/userland/m5ops.h
+++ /dev/null
@@ -1,44 +0,0 @@
-#ifndef M5OPS_H
-#define M5OPS_H
-
-#if defined(__arm__)
-static void m5_checkpoint(void) {
-    __asm__ __volatile__ ("mov r0, #0; mov r1, #0; mov r2, #0; mov r3, #0; .inst 0xEE000110 | (0x43 << 16);" : : : "r0", "r1", "r2", "r3");
-}
-static void m5_dumpstats(void) {
-    __asm__ __volatile__ ("mov r0, #0; mov r1, #0; mov r2, #0; mov r3, #0; .inst 0xEE000110 | (0x41 << 16);" : : : "r0", "r1", "r2", "r3");
-}
-static void m5_exit() {
-    __asm__ __volatile__ ("mov r0, #0; mov r1, #0; .inst 0xEE000110 | (0x21 << 16);" : : : "r0", "r1");
-}
-static void m5_fail_1(void) {
-    __asm__ __volatile__ ("mov r0, #0; mov r1, #0; mov r2, #1; mov r3, #0; .inst 0xEE000110 | (0x22 << 16);" : : : "r0", "r1", "r2", "r3");
-}
-static void m5_resetstats(void) {
-    __asm__ __volatile__ ("mov r0, #0; mov r1, #0; mov r2, #0; mov r3, #0; .inst 0xEE000110 | (0x40 << 16);" : : : "r0", "r1", "r2", "r3");
-}
-#elif defined(__aarch64__)
-static void m5_checkpoint(void) {
-    __asm__ __volatile__ ("mov x0, #0; mov x1, #0; .inst 0xFF000110 | (0x43 << 16);" : : : "x0", "x1");
-}
-static void m5_dumpstats(void) {
-    __asm__ __volatile__ ("mov x0, #0; mov x1, #0; .inst 0xFF000110 | (0x41 << 16);" : : : "x0", "x1");
-}
-static void m5_exit(void) {
-    __asm__ __volatile__ ("mov x0, #0; .inst 0XFF000110 | (0x21 << 16);" : : : "x0");
-}
-static void m5_fail_1(void) {
-    __asm__ __volatile__ ("mov x0, #0; mov x1, #1; .inst 0xFF000110 | (0x22 << 16);" : : : "x0", "x1");
-}
-static void m5_resetstats(void) {
-    __asm__ __volatile__ ("mov x0, #0; mov x1, #0; .inst 0XFF000110 | (0x40 << 16);" : : : "x0", "x1");
-}
-#else
-static void m5_checkpoint(void) {}
-static void m5_dumpstats(void) {}
-static void m5_exit(void) {}
-static void m5_fail_1(void) {}
-static void m5_resetstats(void) {}
-#endif
-
-#endif
diff --git a/userland/mmap.c b/userland/mmap.c
deleted file mode 100644
index 321c32f..0000000
--- a/userland/mmap.c
+++ /dev/null
@@ -1,94 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#mmap */
-
-#define _XOPEN_SOURCE 700
-#include <assert.h>
-#include <fcntl.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <stdint.h> /* uintmax_t */
-#include <string.h>
-#include <sys/mman.h>
-#include <unistd.h> /* sysconf */
-
-#include "common_userland.h" /* virt_to_phys_user */
-
-enum { BUFFER_SIZE = 4 };
-
-int main(int argc, char **argv)
-{
-	int fd;
-	long page_size;
-	char *address1, *address2;
-	char buf[BUFFER_SIZE];
-	uintptr_t paddr;
-
-	if (argc < 2) {
-		printf("Usage: %s <mmap_file>\n", argv[0]);
-		return EXIT_FAILURE;
-	}
-	page_size = sysconf(_SC_PAGE_SIZE);
-	printf("open pathname = %s\n", argv[1]);
-	fd = open(argv[1], O_RDWR | O_SYNC);
-	if (fd < 0) {
-		perror("open");
-		assert(0);
-	}
-	printf("fd = %d\n", fd);
-
-    /* mmap twice for double fun. */
-	puts("mmap 1");
-	address1 = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-	if (address1 == MAP_FAILED) {
-		perror("mmap");
-		assert(0);
-	}
-	puts("mmap 2");
-	address2 = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-	if (address2 == MAP_FAILED) {
-		perror("mmap");
-		return EXIT_FAILURE;
-	}
-	assert(address1 != address2);
-
-    /* Read and modify memory. */
-	puts("access 1");
-	assert(!strcmp(address1, "asdf"));
-	/* vm_fault */
-	puts("access 2");
-	assert(!strcmp(address2, "asdf"));
-	/* vm_fault */
-	strcpy(address1, "qwer");
-	/* Also modified. So both virtual addresses point to the same physical address. */
-	assert(!strcmp(address2, "qwer"));
-
-	/* Check that the physical addresses are the same.
-	 * They are, but TODO why virt_to_phys on kernel gives a different value? */
-	assert(!virt_to_phys_user(&paddr, getpid(), (uintptr_t)address1));
-	printf("paddr1 = 0x%jx\n", (uintmax_t)paddr);
-	assert(!virt_to_phys_user(&paddr, getpid(), (uintptr_t)address2));
-	printf("paddr2 = 0x%jx\n", (uintmax_t)paddr);
-
-    /* Check that modifications made from userland are also visible from the kernel. */
-	read(fd, buf, BUFFER_SIZE);
-	assert(!memcmp(buf, "qwer", BUFFER_SIZE));
-
-	/* Modify the data from the kernel, and check that the change is visible from userland. */
-	write(fd, "zxcv", 4);
-	assert(!strcmp(address1, "zxcv"));
-	assert(!strcmp(address2, "zxcv"));
-
-    /* Cleanup. */
-    puts("munmap 1");
-	if (munmap(address1, page_size)) {
-		perror("munmap");
-		assert(0);
-	}
-    puts("munmap 2");
-	if (munmap(address2, page_size)) {
-		perror("munmap");
-		assert(0);
-	}
-    puts("close");
-	close(fd);
-	return EXIT_SUCCESS;
-}
diff --git a/userland/myinsmod.c b/userland/myinsmod.c
deleted file mode 100644
index 2a91670..0000000
--- a/userland/myinsmod.c
+++ /dev/null
@@ -1,61 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#myinsmod */
-
-#define _GNU_SOURCE
-#include <fcntl.h>
-#include <stdio.h>
-#include <sys/stat.h>
-#include <sys/syscall.h>
-#include <sys/types.h>
-#include <unistd.h>
-#include <stdlib.h>
-
-#define init_module(module_image, len, param_values) syscall(__NR_init_module, module_image, len, param_values)
-#define finit_module(fd, param_values, flags) syscall(__NR_finit_module, fd, param_values, flags)
-
-int main(int argc, char **argv) {
-	const char *params;
-	int fd, use_finit;
-	size_t image_size;
-	struct stat st;
-	void *image;
-
-	/* CLI handling. */
-	if (argc < 2) {
-		puts("Usage ./prog mymodule.ko [args="" [use_finit=0]");
-		return EXIT_FAILURE;
-	}
-	if (argc < 3) {
-		params = "";
-	} else {
-		params = argv[2];
-	}
-	if (argc < 4) {
-		use_finit = 0;
-	} else {
-		use_finit = (argv[3][0] != '0');
-	}
-
-	/* Action. */
-	fd = open(argv[1], O_RDONLY);
-	if (use_finit) {
-		puts("finit");
-		if (finit_module(fd, params, 0) != 0) {
-			perror("finit_module");
-			return EXIT_FAILURE;
-		}
-		close(fd);
-	} else {
-		puts("init");
-		fstat(fd, &st);
-		image_size = st.st_size;
-		image = malloc(image_size);
-		read(fd, image, image_size);
-		close(fd);
-		if (init_module(image, image_size, params) != 0) {
-			perror("init_module");
-			return EXIT_FAILURE;
-		}
-		free(image);
-	}
-	return EXIT_SUCCESS;
-}
diff --git a/userland/openblas_hello.c b/userland/openblas_hello.c
deleted file mode 100644
index 581bcfb..0000000
--- a/userland/openblas_hello.c
+++ /dev/null
@@ -1,15 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#blas
- * Adapted from: https://github.com/xianyi/OpenBLAS/wiki/User-Manual/59b62f98e7400270fb03ad1d85fba5b64ebbff2b#call-cblas-interface */
-
-#include "lkmc.h"
-
-#include <assert.h>
-#include <cblas.h>
-
-int main(void) {
-    double A[6] = {1.0, 2.0, 1.0, -3.0, 4.0, -1.0};
-    double B[6] = {1.0, 2.0, 1.0, -3.0, 4.0, -1.0};
-    double C[9] = {0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5};
-    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans, 3, 3, 2, 1, A, 3, B, 3, 2, C, 3);
-    assert(lkmc_vector_equal(9, C, (double[]){11.0, -9.0, 5.0, -9.0, 21.0, -1.0, 5.0, -1.0, 3.0}, 1e-6));
-}
diff --git a/userland/openmp.c b/userland/openmp.c
deleted file mode 100644
index 2074aa4..0000000
--- a/userland/openmp.c
+++ /dev/null
@@ -1,19 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#openmp */
-
-#include <omp.h>
-#include <stdio.h>
-#include <stdlib.h>
-
-int main () {
-	int nthreads, tid;
-#pragma omp parallel private(nthreads, tid)
-	{
-		tid = omp_get_thread_num();
-		printf("Hello World from thread = %d\n", tid);
-		if (tid == 0) {
-			nthreads = omp_get_num_threads();
-			printf("Number of threads = %d\n", nthreads);
-		}
-	}
-	return EXIT_SUCCESS;
-}
diff --git a/userland/pagemap_dump.c b/userland/pagemap_dump.c
deleted file mode 100644
index f3553a4..0000000
--- a/userland/pagemap_dump.c
+++ /dev/null
@@ -1,116 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#pagemap_dump-out */
-
-#define _XOPEN_SOURCE 700
-#include <errno.h>
-#include <fcntl.h>
-#include <stdint.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <sys/types.h>
-#include <unistd.h>
-
-#include "common_userland.h" /* pagemap_get_entry */
-
-int main(int argc, char **argv)
-{
-	char buffer[BUFSIZ];
-	char maps_file[BUFSIZ];
-	char pagemap_file[BUFSIZ];
-	int maps_fd;
-	int offset = 0;
-	int pagemap_fd;
-	pid_t pid;
-
-	if (argc < 2) {
-		printf("Usage: %s pid\n", argv[0]);
-		return EXIT_FAILURE;
-	}
-	pid = strtoull(argv[1], NULL, 0);
-	snprintf(maps_file, sizeof(maps_file), "/proc/%ju/maps", (uintmax_t)pid);
-	snprintf(pagemap_file, sizeof(pagemap_file), "/proc/%ju/pagemap", (uintmax_t)pid);
-	maps_fd = open(maps_file, O_RDONLY);
-	if (maps_fd < 0) {
-		perror("open maps");
-		return EXIT_FAILURE;
-	}
-	pagemap_fd = open(pagemap_file, O_RDONLY);
-	if (pagemap_fd < 0) {
-		perror("open pagemap");
-		return EXIT_FAILURE;
-	}
-	printf("vaddr pfn soft-dirty file/shared swapped present library\n");
-	for (;;) {
-		ssize_t length = read(maps_fd, buffer + offset, sizeof buffer - offset);
-		if (length <= 0) break;
-		length += offset;
-		for (size_t i = offset; i < (size_t)length; i++) {
-			uintptr_t low = 0, high = 0;
-			if (buffer[i] == '\n' && i) {
-				const char *lib_name;
-				size_t y;
-				/* Parse a line from maps. Each line contains a range that contains many pages. */
-				{
-					size_t x = i - 1;
-					while (x && buffer[x] != '\n') x--;
-					if (buffer[x] == '\n') x++;
-					while (buffer[x] != '-' && x < sizeof buffer) {
-						char c = buffer[x++];
-						low *= 16;
-						if (c >= '0' && c <= '9') {
-							low += c - '0';
-						} else if (c >= 'a' && c <= 'f') {
-							low += c - 'a' + 10;
-						} else {
-						    break;
-						}
-					}
-					while (buffer[x] != '-' && x < sizeof buffer) x++;
-					if (buffer[x] == '-') x++;
-					while (buffer[x] != ' ' && x < sizeof buffer) {
-						char c = buffer[x++];
-						high *= 16;
-						if (c >= '0' && c <= '9') {
-							high += c - '0';
-						} else if (c >= 'a' && c <= 'f') {
-							high += c - 'a' + 10;
-						} else {
-							break;
-						}
-					}
-					lib_name = 0;
-					for (int field = 0; field < 4; field++) {
-						x++;
-						while(buffer[x] != ' ' && x < sizeof buffer) x++;
-					}
-					while (buffer[x] == ' ' && x < sizeof buffer) x++;
-					y = x;
-					while (buffer[y] != '\n' && y < sizeof buffer) y++;
-					buffer[y] = 0;
-					lib_name = buffer + x;
-				}
-				/* Get info about all pages in this page range with pagemap. */
-				{
-					PagemapEntry entry;
-					for (uintptr_t vaddr = low; vaddr < high; vaddr += sysconf(_SC_PAGE_SIZE)) {
-						/* TODO always fails for the last page (vsyscall), why? pread returns 0. */
-						if (!pagemap_get_entry(&entry, pagemap_fd, vaddr)) {
-							printf("%jx %jx %u %u %u %u %s\n",
-								(uintmax_t)vaddr,
-								(uintmax_t)entry.pfn,
-								entry.soft_dirty,
-								entry.file_page,
-								entry.swapped,
-								entry.present,
-								lib_name
-							);
-						}
-					}
-				}
-				buffer[y] = '\n';
-			}
-		}
-	}
-	close(maps_fd);
-	close(pagemap_fd);
-	return EXIT_SUCCESS;
-}
diff --git a/userland/poll.c b/userland/poll.c
deleted file mode 100644
index c15821b..0000000
--- a/userland/poll.c
+++ /dev/null
@@ -1,41 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#poll */
-
-#define _XOPEN_SOURCE 700
-#include <assert.h>
-#include <fcntl.h> /* creat, O_CREAT */
-#include <poll.h> /* poll */
-#include <stdio.h> /* printf, puts, snprintf */
-#include <stdlib.h> /* EXIT_FAILURE, EXIT_SUCCESS */
-#include <unistd.h> /* read */
-
-int main(int argc, char **argv) {
-	char buf[1024];
-	int fd, i, n;
-	short revents;
-	struct pollfd pfd;
-
-	if (argc < 2) {
-		fprintf(stderr, "usage: %s <poll-device>\n", argv[0]);
-		exit(EXIT_FAILURE);
-	}
-	fd = open(argv[1], O_RDONLY | O_NONBLOCK);
-	if (fd == -1) {
-		perror("open");
-		exit(EXIT_FAILURE);
-	}
-	pfd.fd = fd;
-	pfd.events = POLLIN;
-	while (1) {
-		puts("loop");
-		i = poll(&pfd, 1, -1);
-		if (i == -1) {
-			perror("poll");
-			assert(0);
-		}
-		revents = pfd.revents;
-		if (revents & POLLIN) {
-			n = read(pfd.fd, buf, sizeof(buf));
-			printf("POLLIN n=%d buf=%.*s\n", n, n, buf);
-		}
-	}
-}
diff --git a/userland/posix/count.c b/userland/posix/count.c
new file mode 100644
index 0000000..5aacc16
--- /dev/null
+++ b/userland/posix/count.c
@@ -0,0 +1,18 @@
+/* Count up forever, sleeping 1 second between increments.
+ * Sample application: https://github.com/cirosantilli/linux-kernel-module-cheat#gdb-step-debug-userland-custom-init
+ */
+
+#define _XOPEN_SOURCE 700
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+int main(void) {
+    unsigned long i = 0;
+    while (1) {
+        printf("%lu\n", i);
+        i++;
+        sleep(1);
+    }
+}
diff --git a/userland/posix/environ.c b/userland/posix/environ.c
new file mode 100644
index 0000000..39ef1cb
--- /dev/null
+++ b/userland/posix/environ.c
@@ -0,0 +1,15 @@
+/* Print all environment variables. */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+extern char **environ;
+
+int main(void) {
+    char **env = environ;
+    while (*env) {
+        printf("%s\n", *env);
+        env++;
+    }
+    return EXIT_SUCCESS;
+}
diff --git a/userland/sleep_forever.c b/userland/posix/sleep_forever.c
similarity index 73%
rename from userland/sleep_forever.c
rename to userland/posix/sleep_forever.c
index cba5228..46cd15e 100644
--- a/userland/sleep_forever.c
+++ b/userland/posix/sleep_forever.c
@@ -5,7 +5,7 @@
 #include <unistd.h>
 
 int main(void) {
-	puts(__FILE__);
-	while (1)
-		sleep(0xFFFFFFFF);
+    puts(__FILE__);
+    while (1)
+        sleep(0xFFFFFFFF);
 }
diff --git a/userland/posix/test b/userland/posix/test
new file mode 120000
index 0000000..419df4f
--- /dev/null
+++ b/userland/posix/test
@@ -0,0 +1 @@
+../test
\ No newline at end of file
diff --git a/userland/virt_to_phys_test.c b/userland/posix/virt_to_phys_test.c
similarity index 60%
rename from userland/virt_to_phys_test.c
rename to userland/posix/virt_to_phys_test.c
index 830b8f6..0f08c62 100644
--- a/userland/virt_to_phys_test.c
+++ b/userland/posix/virt_to_phys_test.c
@@ -11,11 +11,11 @@ enum { I0 = 0x12345678 };
 static volatile uint32_t i = I0;
 
 int main(void) {
-	printf("vaddr %p\n", (void *)&i);
-	printf("pid %ju\n", (uintmax_t)getpid());
-	while (i == I0) {
-		sleep(1);
-	}
-	printf("i %jx\n", (uintmax_t)i);
-	return EXIT_SUCCESS;
+    printf("vaddr %p\n", (void *)&i);
+    printf("pid %ju\n", (uintmax_t)getpid());
+    while (i == I0) {
+        sleep(1);
+    }
+    printf("i %jx\n", (uintmax_t)i);
+    return EXIT_SUCCESS;
 }
diff --git a/userland/print_argv.c b/userland/print_argv.c
deleted file mode 100644
index da7bc6a..0000000
--- a/userland/print_argv.c
+++ /dev/null
@@ -1,10 +0,0 @@
-/* Print each command line argument received, one per line. */
-
-#include <stdio.h>
-
-int main(int argc, char **argv) {
-	size_t i;
-	for (i = 0; i < (size_t)argc; ++i)
-		printf("%s\n", argv[i]);
-	return 0;
-}
diff --git a/userland/rand_check.c b/userland/rand_check.c
deleted file mode 100644
index 7b5a5a5..0000000
--- a/userland/rand_check.c
+++ /dev/null
@@ -1,41 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#rand_check-out */
-
-#include <inttypes.h>
-#include <signal.h>
-#include <stdint.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <sys/types.h>
-#include <time.h>
-#include <unistd.h>
-
-int bss = 0;
-int data = 1;
-
-int main(__attribute__((unused)) int argc, char **argv) {
-	int i, *ip;
-	uint64_t uint64;
-	FILE *fp;
-
-	/* Loaded addresses. */
-	printf("&i         = %p\n", (void *)&i);
-	printf("&argv[0]   = %p\n", (void *)&argv[0]);
-	printf("&main      = %p\n", (void *)(intptr_t)main);
-	printf("&bss       = %p\n", (void *)&bss);
-	printf("&data      = %p\n", (void *)&data);
-
-	/* Misc syscalls. */
-	printf("time(NULL) = %ju\n", (uintmax_t)time(NULL));
-	printf("pid = %ju\n", (uintmax_t)getpid());
-
-	/* malloc */
-	ip = malloc(sizeof(*ip));
-	printf("&malloc    = %p\n", (void *)ip);
-	free(ip);
-
-	/* /dev/urandom */
-	fp = fopen("/dev/urandom", "rb");
-	fread(&uint64, sizeof(uint64), 1, fp);
-	printf("/dev/urandom = %" PRIx64 "\n", uint64);
-	fclose(fp);
-}
diff --git a/userland/rdtsc.c b/userland/rdtsc.c
deleted file mode 100644
index 6159271..0000000
--- a/userland/rdtsc.c
+++ /dev/null
@@ -1,20 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#rdtsc */
-
-#include <stdint.h>
-#include <stdio.h>
-#include <stdlib.h>
-
-#if defined(__i386__) || defined(__x86_64__)
-#include <x86intrin.h>
-#endif
-
-int main(void) {
-	uintmax_t val;
-#if defined(__i386__) || defined(__x86_64__)
-	val = __rdtsc();
-#else
-	val = 0;
-#endif
-	printf("%ju\n", val);
-	return EXIT_SUCCESS;
-}
diff --git a/userland/ring0.c b/userland/ring0.c
deleted file mode 100644
index 5ff5daf..0000000
--- a/userland/ring0.c
+++ /dev/null
@@ -1,14 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#ring0 */
-
-#include <stdio.h>
-#include <stdlib.h>
-
-#include "../include/ring0.h"
-
-int main(void) {
-#if defined(__x86_64__) || defined(__i386__)
-	Ring0Regs ring0_regs;
-	ring0_get_control_regs(&ring0_regs);
-#endif
-	return EXIT_SUCCESS;
-}
diff --git a/userland/test b/userland/test
new file mode 120000
index 0000000..2eb7181
--- /dev/null
+++ b/userland/test
@@ -0,0 +1 @@
+../test-user-mode-in-tree
\ No newline at end of file
diff --git a/userland/virt_to_phys_user.c b/userland/virt_to_phys_user.c
deleted file mode 100644
index 72cf1fd..0000000
--- a/userland/virt_to_phys_user.c
+++ /dev/null
@@ -1,26 +0,0 @@
-/* https://github.com/cirosantilli/linux-kernel-module-cheat#userland-physical-address-experiments */
-
-#define _XOPEN_SOURCE 700
-#include <stdio.h> /* printf */
-#include <stdlib.h> /* EXIT_SUCCESS, EXIT_FAILURE, strtoull */
-
-#include "common_userland.h" /* virt_to_phys_user */
-
-int main(int argc, char **argv)
-{
-	pid_t pid;
-	uintptr_t vaddr, paddr = 0;
-
-	if (argc < 3) {
-		printf("Usage: %s pid vaddr\n", argv[0]);
-		return EXIT_FAILURE;
-	}
-	pid = strtoull(argv[1], NULL, 0);
-	vaddr = strtoull(argv[2], NULL, 0);
-	if (virt_to_phys_user(&paddr, pid, vaddr)) {
-		fprintf(stderr, "error: virt_to_phys_user\n");
-		return EXIT_FAILURE;
-	};
-	printf("0x%jx\n", (uintmax_t)paddr);
-	return EXIT_SUCCESS;
-}