diff --git a/index.html b/index.html index a64bab1..5f6368e 100644 --- a/index.html +++ b/index.html @@ -1003,430 +1003,433 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 16. Xen
  • -
  • 17. QEMU +
  • 17. U-Boot
  • +
  • 18. QEMU
  • -
  • 18. gem5 +
  • 19. gem5
  • -
  • 19. Buildroot +
  • 20. Buildroot
  • -
  • 20. Userland content +
  • 21. Userland content
  • -
  • 21. Userland assembly +
  • 22. Userland assembly
  • -
  • 22. x86 userland assembly +
  • 23. x86 userland assembly
    -

    and can therefore be used to estimate system performance, see: Section 18.2, “gem5 run benchmark” for an example.

    +

    and can therefore be used to estimate system performance, see: Section 19.2, “gem5 run benchmark” for an example.

    The downside of gem5 is that it is much slower than QEMU because of the greater simulation detail.

    @@ -2682,7 +2685,7 @@ j = 0
    -

    More gem5 information is present at: Section 18, “gem5”

    +

    More gem5 information is present at: Section 19, “gem5”

    Good next steps are:

    @@ -2708,7 +2711,7 @@ j = 0

    This repository has been tested inside clean Docker containers.

    -

    This is a good option if you are on a Linux host, but the native setup failed due to your weird host distribution, and you have better things to do with your life than to debug it. See also: Section 29.1, “Supported hosts”.

    +

    This is a good option if you are on a Linux host, but the native setup failed due to your weird host distribution, and you have better things to do with your life than to debug it. See also: Section 30.1, “Supported hosts”.

    For example, to do a QEMU Buildroot setup inside Docker, run:

    @@ -2896,7 +2899,7 @@ j = 0
    -

    as shown at: Section 17.7, “Debug the emulator”, although direct GDB host usage works as well of course.

    +

    as shown at: Section 18.7, “Debug the emulator”, although direct GDB host usage works as well of course.

    @@ -3670,7 +3673,7 @@ error: simulation error detected by parsing logs
    -

    TODO: the carriage returns are a bit different than in QEMU, see: Section 26.4, “gem5 baremetal carriage return”.

    +

    TODO: the carriage returns are a bit different than in QEMU, see: Section 27.4, “gem5 baremetal carriage return”.

    Note that ./build-baremetal requires the --emulator gem5 option, and generates separate executable images for both, as can be seen from:

    @@ -3703,7 +3706,7 @@ echo "$(./getvar --arch aarch64 --emulator gem5 image)"
    -

    see also: Section 18.18, “gem5 ARM platforms”.

    +

    see also: Section 19.18, “gem5 ARM platforms”.

    This generates yet new separate images with new magic constants:

    @@ -3718,10 +3721,10 @@ echo "$(./getvar --arch aarch64 --baremetal userland/c/hello.c --emulator gem5 -

    But just stick to newer and better VExpress_GEM5_V1 unless you have a good reason to use RealViewPBX.

    -

    When doing baremetal programming, it is likely that you will want to learn userland assembly first, see: Section 21, “Userland assembly”.

    +

    When doing baremetal programming, it is likely that you will want to learn userland assembly first, see: Section 22, “Userland assembly”.

    -

    For more information on baremetal, see the section: Section 26, “Baremetal”.

    +

    For more information on baremetal, see the section: Section 27, “Baremetal”.

    The following subjects are particularly important:

    @@ -3786,7 +3789,7 @@ xdg-open README.html
    -

    More information about our documentation internals can be found at: Section 29.5, “Documentation”

    +

    More information about our documentation internals can be found at: Section 30.5, “Documentation”

    @@ -4978,7 +4981,7 @@ Breakpoint 3 at 0xffffffff811615e3: fdget_pos. (9 locations)

    2.9. GDB step debug multicore userland

    -

    For a more minimal baremetal multicore setup, see: Section 26.8.3, “ARM multicore”.

    +

    For a more minimal baremetal multicore setup, see: Section 27.8.3, “ARM multicore”.

    We can set and get which cores the Linux kernel allows a program to run on with sched_getaffinity and sched_setaffinity:
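As a rough illustration of those two calls, here is a minimal sketch that uses Python's `os.sched_getaffinity` and `os.sched_setaffinity` wrappers around the same Linux system calls. It is not the example shipped in this repository, and the helper names `allowed_cpus` and `pin_to_one_cpu` are made up for this sketch. It assumes a Linux host or guest with Python 3.

```python
import os


def allowed_cpus(pid=0):
    """Return the set of CPU indices the kernel lets `pid` run on (0 = this process)."""
    return os.sched_getaffinity(pid)


def pin_to_one_cpu(pid=0):
    """Restrict `pid` to the lowest-numbered CPU it may currently use and return it."""
    cpu = min(os.sched_getaffinity(pid))
    os.sched_setaffinity(pid, {cpu})
    return cpu


if __name__ == "__main__":
    before = allowed_cpus()
    print("allowed before:", sorted(before))
    pin_to_one_cpu()
    print("allowed after: ", sorted(allowed_cpus()))
    # Restore the original mask so the rest of the session is unaffected.
    os.sched_setaffinity(0, before)
```

Pinning to the lowest allowed CPU rather than to CPU 0 keeps the sketch working inside containers or cpusets where CPU 0 may not be available.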

    @@ -5026,7 +5029,7 @@ sched_getcpu = 0
    -

    The number of cores is modified as explained at: Section 18.2.2.1, “Number of cores”

    +

    The number of cores is modified as explained at: Section 19.2.2.1, “Number of cores”

    taskset from the util-linux package sets the initial core affinity of a program:

    @@ -6410,7 +6413,7 @@ cat f

    which can be good for automated tests, as it ensures that you are using a pristine unmodified system image every time.

    -

    Note however that we already disable disk persistency by default on ext2 filesystems even without --initrd: Section 17.2, “Disk persistency”.

    +

    Note however that we already disable disk persistency by default on ext2 filesystems even without --initrd: Section 18.2, “Disk persistency”.

    One downside of this method is that it has to put the entire filesystem into memory, and could lead to a panic:

    @@ -7104,7 +7107,7 @@ qw er
    -

    To stop at the very first instruction of a freestanding program, just use --no-continue. A good example of this is shown at: Section 21.5.1, “Freestanding programs”.

    +

    To stop at the very first instruction of a freestanding program, just use --no-continue. A good example of this is shown at: Section 22.5.1, “Freestanding programs”.

    @@ -7157,7 +7160,7 @@ qw er

    The gem5 tests require building statically with build id static, see also: Section 10.6, “gem5 syscall emulation mode”. TODO automate this better.

    -

    See: Section 29.13, “Test this repo” for more useful testing tips.

    +

    See: Section 30.13, “Test this repo” for more useful testing tips.

    @@ -7835,7 +7838,7 @@ hello
    @@ -8742,7 +8745,7 @@ xeyes

    14.1. Enable networking

    -

    We disable networking by default because it starts a userland process, and we want to keep the number of userland processes to a minimum to make the system more understandable as explained at: Section 29.18.3, “Resource tradeoff guidelines”

    +

    We disable networking by default because it starts a userland process, and we want to keep the number of userland processes to a minimum to make the system more understandable as explained at: Section 30.18.3, “Resource tradeoff guidelines”

    To enable networking on Buildroot, simply run:

    @@ -9513,7 +9516,7 @@ CONFIG_IKCONFIG_PROC=y
    -

    In case something breaks while updating the Linux kernel, you can try to bisect it to understand the root cause, see: Section 29.14, “Bisection”.

    +

    In case something breaks while updating the Linux kernel, you can try to bisect it to understand the root cause, see: Section 30.14, “Bisection”.

    15.2.2.1. Update the Linux kernel LKMC procedure
    -

    First, use the branching procedure described at: Section 29.16, “Update a forked submodule”

    +

    First, use the branching procedure described at: Section 30.16, “Update a forked submodule”

    -

    Because the kernel is so central to this repository, almost all tests must be re-run, so basically just follow the full testing procedure described at: Section 29.13, “Test this repo”. The only tests that can be skipped are essentially the Baremetal tests.

    +

    Because the kernel is so central to this repository, almost all tests must be re-run, so basically just follow the full testing procedure described at: Section 30.13, “Test this repo”. The only tests that can be skipped are essentially the Baremetal tests.

    Before committing, don’t forget to update:

    @@ -13996,7 +13999,7 @@ detected buffer overflow in strlen
    -

    SELinux requires glibc as mentioned at: Section 19.10, “libc choice”.

    +

    SELinux requires glibc as mentioned at: Section 20.10, “libc choice”.

    @@ -15127,7 +15130,7 @@ wget \
    -

    STRESS_NG is likely the best, but it requires glibc, see: Section 19.10, “libc choice”.

    +

    STRESS_NG is likely the best, but it requires glibc, see: Section 20.10, “libc choice”.

    Websites:

    @@ -15266,10 +15269,24 @@ ps
    -

    17. QEMU

    +

    17. U-Boot

    +
    + +
    +

    U-Boot is a popular bootloader.

    +
    +
    +

    It can read disk filesystems, and Buildroot supports it, so we could in theory put it into memory, and let it find a kernel image from the root filesystem and boot that, but I didn’t manage to get it working yet: https://stackoverflow.com/questions/58028789/how-to-boot-linux-aarch64-with-u-boot-with-buildroot-on-qemu

    +
    +
    +
    +
    +

    18. QEMU

    -

    17.1. Introduction to QEMU

    +

    18.1. Introduction to QEMU

    QEMU is a system simulator: it simulates a CPU and devices such as interrupt handlers, timers, UART, screen, keyboard, etc.

    @@ -15299,7 +15316,7 @@ ps
    -

    17.2. Disk persistency

    +

    18.2. Disk persistency

    We disable disk persistency for both QEMU and gem5 by default, to prevent the emulator from putting the image in an unknown state.

    @@ -15354,7 +15371,7 @@ ps

    Disk persistency is useful to re-run shell commands from the history of a previous session with Ctrl-R, but we felt that the loss of determinism was not worth it.

    -

    17.2.1. gem5 disk persistency

    +

    18.2.1. gem5 disk persistency

    TODO how to make gem5 disk writes persistent?

    @@ -15384,7 +15401,7 @@ index 17498c42b..76b8b351d 100644
    -

    17.3. gem5 qcow2

    +

    18.3. gem5 qcow2

    qcow2 does not appear to be supported: there are no hits in the source tree, and there is a mention on Nate’s 2009 wishlist: http://gem5.org/Nate%27s_Wish_List

    @@ -15393,7 +15410,7 @@ index 17498c42b..76b8b351d 100644
    -

    17.4. Snapshot

    +

    18.4. Snapshot

    QEMU allows us to take snapshots at any time through the monitor.

    @@ -15491,7 +15508,7 @@ index 17498c42b..76b8b351d 100644

    Bibliography: https://stackoverflow.com/questions/40227651/does-qemu-emulator-have-checkpoint-function/48724371#48724371

    -

    17.4.1. Snapshot internals

    +

    18.4.1. Snapshot internals

    Snapshots are stored inside the .qcow2 images themselves.

    @@ -15540,7 +15557,7 @@ Format specific information:
    -

    17.5. Device models

    +

    18.5. Device models

    This section documents:

    @@ -15585,12 +15602,12 @@ Format specific information:
    -

    17.5.1. PCI

    +

    18.5.1. PCI

    Only tested in x86.

    -
    17.5.1.1. pci_min
    +
    18.5.1.1. pci_min

    PCI driver for our minimal pci_min.c QEMU fork device:

    @@ -15660,7 +15677,7 @@ lkmc_pci_min mmio_write addr = 4 val = 0 size = 4
    -
    17.5.1.2. QEMU edu PCI device
    +
    18.5.1.2. QEMU edu PCI device

    Small upstream educational PCI device:

    @@ -15727,7 +15744,7 @@ lkmc_pci_min mmio_write addr = 4 val = 0 size = 4
    -
    17.5.1.3. Manipulate PCI registers directly
    +
    18.5.1.3. Manipulate PCI registers directly

    In this section we will try to interact with PCI devices directly from userland without kernel modules.

    @@ -15873,7 +15890,7 @@ devmem 0xfeb54000 w 0x12345678
    -
    17.5.1.4. pciutils
    +
    18.5.1.4. pciutils

    There are two versions of setpci and lspci:

    @@ -15889,7 +15906,7 @@ devmem 0xfeb54000 w 0x12345678
    -
    17.5.1.5. Introduction to PCI
    +
    18.5.1.5. Introduction to PCI

    The PCI standard is non-free, obviously like everything in low level: https://pcisig.com/specifications but Google gives several illegal PDF hits :-)

    @@ -15949,7 +15966,7 @@ devmem 0xfeb54000 w 0x12345678
    -
    17.5.1.6. PCI BFD
    +
    18.5.1.6. PCI BFD

    lspci -k shows something like:

    @@ -16003,7 +16020,7 @@ devmem 0xfeb54000 w 0x12345678
    -
    17.5.1.7. PCI BAR
    +
    18.5.1.7. PCI BAR
    @@ -16045,7 +16062,7 @@ pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &edu->mmio);
    -

    17.5.2. GPIO

    +

    18.5.2. GPIO

    TODO: broken. Was working before we moved arm from -M versatilepb to -M virt around af210a76711b7fa4554dcc2abd0ddacfc810dfd4. Either make it work on -M virt if that is possible, or document precisely how to make it work with versatilepb, or hopefully vexpress which is newer.

    @@ -16088,7 +16105,7 @@ pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &edu->mmio);
    -

    17.5.3. LEDs

    +

    18.5.3. LEDs

    TODO: broken when arm moved to -M virt, same as GPIO.

    @@ -16160,7 +16177,7 @@ echo 255 >brightness
    -

    17.5.4. platform_device

    +

    18.5.4. platform_device

    Minimal platform device example coded into the -M versatilepb SoC of our QEMU fork.

    @@ -16238,7 +16255,7 @@ insmod platform_device.ko
    -

    17.5.5. gem5 educational hardware models

    +

    18.5.5. gem5 educational hardware models

    TODO get some working!

    @@ -16248,7 +16265,7 @@ insmod platform_device.ko
    -

    17.6. QEMU monitor

    +

    18.6. QEMU monitor

    The QEMU monitor is a magic terminal that allows you to send text commands to the QEMU VM itself: https://en.wikibooks.org/wiki/QEMU/Monitor

    @@ -16368,7 +16385,7 @@ insmod platform_device.ko
    -

    17.6.1. QEMU monitor from guest

    +

    18.6.1. QEMU monitor from guest

    @@ -16385,7 +16402,7 @@ insmod platform_device.ko
    -

    17.6.2. QEMU monitor from GDB

    +

    18.6.2. QEMU monitor from GDB

    When doing GDB step debug it is possible to send QEMU monitor commands through the GDB monitor command, which saves you the trouble of opening yet another shell.

    @@ -16401,7 +16418,7 @@ monitor info qtree
    -

    17.7. Debug the emulator

    +

    18.7. Debug the emulator

    When you start hacking QEMU or gem5, it is useful to see what is going on inside the emulators themselves.

    @@ -16439,7 +16456,7 @@ monitor info qtree

    The build outputs are automatically stored in different directories for optimized and debug builds, which prevents debug files from overwriting opt ones. Therefore, --gem5-build-id is not required:

    -

    The price to pay for debuggability is high however: a Linux kernel boot was about 3x slower in QEMU and 14 times slower in gem5 debug compared to opt, see benchmarks at: Section 28.2.1, “Benchmark Linux kernel boot”

    +

    The price to pay for debuggability is high however: a Linux kernel boot was about 3x slower in QEMU and 14 times slower in gem5 debug compared to opt, see benchmarks at: Section 29.2.1, “Benchmark Linux kernel boot”

    When in QEMU text mode, using --debug-vm makes Ctrl-C not get passed to the QEMU guest anymore: it is instead captured by GDB itself, to allow breaking. So e.g. you won’t be able to easily quit from a guest program like:

    @@ -16456,7 +16473,7 @@ monitor info qtree

    You can still send key presses to QEMU however even without the mouse capture, just either click on the title bar, or alt tab to give it focus.

    -

    17.7.1. Reverse debug the emulator

    +

    18.7.1. Reverse debug the emulator

    While step debugging any complex program, you always end up feeling the need to step in reverse to reach the last call to some function that was called before the failure point, in order to trace back the problem to the actual bug source.

    @@ -16527,7 +16544,7 @@ reverse-next
    -

    17.7.2. Debug gem5 Python scripts

    +

    18.7.2. Debug gem5 Python scripts

    Start pdb at the first instruction:

    @@ -16561,7 +16578,7 @@ reverse-next
    -

    17.8. Tracing

    +

    18.8. Tracing

    QEMU can log several different events.

    @@ -16652,7 +16669,7 @@ Call Trace:
    -

    17.8.1. QEMU -d tracing

    +

    18.8.1. QEMU -d tracing

    QEMU also has a second trace mechanism in addition to -trace, find out the events with:

    @@ -16693,7 +16710,7 @@ IN:
    -

    17.8.2. QEMU trace register values

    +

    18.8.2. QEMU trace register values

    TODO: is it possible to show the register values for each instruction?

    @@ -16723,11 +16740,11 @@ IN:

    PANDA can list memory addresses, so I bet it can also decode the instructions: https://github.com/panda-re/panda/blob/883c85fa35f35e84a323ed3d464ff40030f06bd6/panda/docs/LINE_Censorship.md I wonder why they don’t just upstream those things to QEMU’s tracing: https://github.com/panda-re/panda/issues/290

    -

    gem5 can do it as shown at: Section 17.8.6, “gem5 tracing”.

    +

    gem5 can do it as shown at: Section 18.8.6, “gem5 tracing”.

    -

    17.8.3. Trace source lines

    +

    18.8.3. Trace source lines

    We can further use Binutils' addr2line to get the line that corresponds to each address:

    @@ -16783,7 +16800,7 @@ less "$(./getvar --arch x86_64 run_dir)/trace-lines.txt"
    -

    17.8.4. QEMU record and replay

    +

    18.8.4. QEMU record and replay

    QEMU runs, unlike gem5, are not deterministic by default, however it does support a record and replay mechanism that allows you to replay a previous run deterministically.

    @@ -16890,7 +16907,7 @@ less "$(./getvar --arch x86_64 run_dir)/trace-lines.txt"

    Solved on unmerged c42634d8e3428cfa60672c3ba89cabefc720cde9 from https://github.com/ispras/qemu/tree/rr-180725

    -
    17.8.4.1. QEMU reverse debugging
    +
    18.8.4.1. QEMU reverse debugging

    TODO get working.

    @@ -16929,7 +16946,7 @@ reverse-continue
    -

    17.8.5. QEMU trace multicore

    +

    18.8.5. QEMU trace multicore

    TODO: is there any way to distinguish which instruction runs on each core? Doing:

    @@ -16944,7 +16961,7 @@ reverse-continue
    -

    17.8.6. gem5 tracing

    +

    18.8.6. gem5 tracing

    gem5 also provides a tracing mechanism documented at: http://www.gem5.org/Trace_Based_Debugging:

    @@ -17052,7 +17069,7 @@ less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt"

    TODO: 7452d399290c9c1fc6366cdad129ef442f323564 ./trace2line this is too slow and takes hours. QEMU’s processing of 170k events takes 7 seconds. gem5’s processing is analogous, but there are 140M events, so it should take 7000 seconds ~ 2 hours which seems consistent with what I observe, so maybe there is no way to speed this up…​ The workaround is to just use gem5’s ExecSymbol to get function granularity, and then GDB individually if line detail is needed?
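The estimate above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes that processing time scales linearly with the number of events, and the function name `estimate_seconds` is made up for illustration. The input figures (170k events in 7 seconds, 140M events) are the ones quoted in the TODO.

```python
def estimate_seconds(ref_events, ref_seconds, target_events):
    """Linearly scale a measured processing time to a different event count."""
    return ref_seconds * target_events / ref_events


if __name__ == "__main__":
    seconds = estimate_seconds(
        ref_events=170_000,         # QEMU trace size quoted in the text
        ref_seconds=7,              # measured QEMU processing time
        target_events=140_000_000,  # gem5 trace size quoted in the text
    )
    print(f"{seconds:.0f} s, about {seconds / 3600:.1f} h")
```

Under the linear assumption this comes out a little under the rounded "7000 seconds ~ 2 hours" figure, so the observed runtime of hours is in the expected ballpark.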

    -
    17.8.6.1. gem5 ExecAll trace format
    +
    18.8.6.1. gem5 ExecAll trace format

    This debug flag traces all instructions.

    @@ -17090,7 +17107,7 @@ less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt"

    25007500: time count in some unit. Note how the microops execute at further timestamps.

  • -

    system.cpu: distinguishes between CPUs when there are more than one. For example, running Section 26.8.3, “ARM multicore” with two cores produces system.cpu0 and system.cpu1

    +

    system.cpu: distinguishes between CPUs when there are more than one. For example, running Section 27.8.3, “ARM multicore” with two cores produces system.cpu0 and system.cpu1

  • T0: thread number. TODO: hyperthread? How to play with it?

    @@ -17135,7 +17152,7 @@ less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt"
    -
    17.8.6.2. gem5 Registers trace format
    +
    18.8.6.2. gem5 Registers trace format

    This flag shows a more detailed register usage than gem5 ExecAll trace format.

    @@ -17190,13 +17207,13 @@ add x1, x0, 2
    -
    17.8.6.3. gem5 TARMAC traces
    +
    18.8.6.3. gem5 TARMAC traces
    -
    17.8.6.4. gem5 tracing internals
    +
    18.8.6.4. gem5 tracing internals

    As of gem5 16eeee5356585441a49d05c78abc328ef09f7ace the default tracer is ExeTracer. It is set at:

    @@ -17269,7 +17286,7 @@ src/arch/x86/nativetrace.hh:41:class X86NativeTrace : public NativeTrace
    -

    17.9. QEMU GUI is unresponsive

    +

    18.9. QEMU GUI is unresponsive

    Sometimes in Ubuntu 14.04, after the QEMU SDL GUI starts, it does not get updated after keyboard strokes, and there are artifacts like disappearing text.

    @@ -17293,13 +17310,13 @@ root
    -

    18. gem5

    +

    19. gem5

    -

    18.1. gem5 vs QEMU

    +

    19.1. gem5 vs QEMU

    @@ -17380,7 +17397,7 @@ root
    -

    18.2. gem5 run benchmark

    +

    19.2. gem5 run benchmark

    OK, this is why we used gem5 in the first place, performance measurements!

    @@ -17550,7 +17567,7 @@ cat out/gem5-bench-dhrystone.txt
    -

    but the problem is that this method does not make it easy to run a different script without running the boot again. The ./gem5.sh script works around that by using m5 readfile as explained further at: Section 18.5.2, “gem5 checkpoint restore and run a different script”.

    +

    but the problem is that this method does not make it easy to run a different script without running the boot again. The ./gem5.sh script works around that by using m5 readfile as explained further at: Section 19.5.2, “gem5 checkpoint restore and run a different script”.

    Now you can play a fun little game with your friends:

    @@ -17572,7 +17589,7 @@ cat out/gem5-bench-dhrystone.txt

    To find out why your program is slow, a good first step is to have a look at the gem5 m5out/stats.txt file.

    -

    18.2.1. Skip extra benchmark instructions

    +

    19.2.1. Skip extra benchmark instructions

    A few imperfections of our benchmarking method are:

    @@ -17607,7 +17624,7 @@ cat out/gem5-bench-dhrystone.txt
    -

    18.2.2. gem5 system parameters

    +

    19.2.2. gem5 system parameters

    Besides optimizing a program for a given CPU setup, chip developers can also do the inverse, and optimize the chip for a given benchmark!

    @@ -17615,7 +17632,7 @@ cat out/gem5-bench-dhrystone.txt

    The rabbit hole is likely deep, but let’s scratch a bit of the surface.

    -
    18.2.2.1. Number of cores
    +
    19.2.2.1. Number of cores
    ./run --arch arm --cpus 2 --emulator gem5
    @@ -17631,7 +17648,7 @@ getconf _NPROCESSORS_CONF
    -
    18.2.2.1.1. QEMU user mode multithreading
    +
    19.2.2.1.1. QEMU user mode multithreading

    TODO why in User mode simulation QEMU always shows the number of cores of the host. E.g., both of the following output the same as nproc on the host:

    @@ -17662,7 +17679,7 @@ ps Haux | grep qemu | wc
    -
    18.2.2.1.2. gem5 syscall emulation multithreading
    +
    19.2.2.1.2. gem5 syscall emulation multithreading

    gem5 user mode multithreading has been particularly flaky compared to QEMU’s.

    @@ -17723,7 +17740,7 @@ ps Haux | grep qemu | wc
    -
    18.2.2.1.3. gem5 se.py user mode with 2 or more pthreads fails with because simulate() limit reached
    +
    19.2.2.1.3. gem5 se.py user mode with 2 or more pthreads fails with because simulate() limit reached
    @@ -17732,7 +17749,7 @@ ps Haux | grep qemu | wc
    -
    18.2.2.1.4. gem5 ARM full system with more than 8 cores
    +
    19.2.2.1.4. gem5 ARM full system with more than 8 cores
    @@ -17754,7 +17771,7 @@ ps Haux | grep qemu | wc
    -
    18.2.2.2. gem5 cache size
    +
    19.2.2.2. gem5 cache size
    @@ -17923,7 +17940,7 @@ instructions 91738770
    -
    18.2.2.3. gem5 memory latency
    +
    19.2.2.3. gem5 memory latency

    TODO These look promising:

    @@ -17941,7 +17958,7 @@ instructions 91738770
    -
    18.2.2.4. Memory size
    +
    19.2.2.4. Memory size
    ./run --memory 512M
    @@ -18042,7 +18059,7 @@ get_avphys_pages() * sysconf(_SC_PAGESIZE) = 0x1D178000
    -
    18.2.2.5. gem5 disk and network latency
    +
    19.2.2.5. gem5 disk and network latency

    TODO These look promising:

    @@ -18057,7 +18074,7 @@ get_avphys_pages() * sysconf(_SC_PAGESIZE) = 0x1D178000
    -
    18.2.2.6. gem5 clock frequency
    +
    19.2.2.6. gem5 clock frequency

    Clock frequency: TODO how does it affect performance in benchmarks?

    @@ -18095,7 +18112,7 @@ m5 dumpstats
    -

    18.2.3. Interesting benchmarks

    +

    19.2.3. Interesting benchmarks

    Buildroot built-in libraries, mostly under Libraries > Other:

    @@ -18128,10 +18145,10 @@ m5 dumpstats
    -

    These are not yet enabled, but it should be easy to do so, see: Section 19.5, “Add new Buildroot packages”

    +

    These are not yet enabled, but it should be easy to do so, see: Section 20.5, “Add new Buildroot packages”

    -
    18.2.3.1. BST vs heap vs hashmap
    +
    19.2.3.1. BST vs heap vs hashmap

    The following benchmark setup works both:

    @@ -18224,7 +18241,7 @@ xdg-open bst_vs_heap_vs_hashmap_gem5.tmp.png

    The cache sizes were chosen to match the host P51 to improve the comparison. Ideally we should also use the same standard library.

    -

    Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 18.9.2.1, “gem5 only dump selected stats”

    +

    Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 19.9.2.1, “gem5 only dump selected stats”

    Sources:

    @@ -18244,7 +18261,7 @@ xdg-open bst_vs_heap_vs_hashmap_gem5.tmp.png
    -
    18.2.3.2. BLAS
    +
    19.2.3.2. BLAS

    Buildroot supports it, which makes everything just trivial:

    @@ -18296,7 +18313,7 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    -
    18.2.3.3. Eigen
    +
    19.2.3.3. Eigen

    Header only linear algebra library with a mainline Buildroot package:

    @@ -18334,7 +18351,7 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    -
    18.2.3.4. PARSEC benchmark
    +
    19.2.3.4. PARSEC benchmark

    We have ported parts of the PARSEC benchmark for cross compilation at: https://github.com/cirosantilli/parsec-benchmark See the documentation on that repo to find out which benchmarks have been ported. Some of the benchmarks are segfaulting, they are documented in that repo.

    @@ -18352,7 +18369,7 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    -
    18.2.3.4.1. PARSEC benchmark without parsecmgmt
    +
    19.2.3.4.1. PARSEC benchmark without parsecmgmt
    ./build --arch arm --download-dependencies gem5-buildroot parsec-benchmark
    @@ -18386,7 +18403,7 @@ cblas_dgemm(      CblasColMajor, CblasNoTrans, CblasTrans,3,3,2  ,1,    A,3,  B,
     
    -
    18.2.3.4.2. PARSEC change the input size
    +
    19.2.3.4.2. PARSEC change the input size

    Running a benchmark of a size different than test, e.g. simsmall, requires a rebuild with:

    @@ -18450,7 +18467,7 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    -
    18.2.3.4.3. PARSEC benchmark with parsecmgmt
    +
    19.2.3.4.3. PARSEC benchmark with parsecmgmt

    Most users won’t want to use this method because:

    @@ -18513,9 +18530,9 @@ parsecmgmt -a run -p splash2x.fmm -i test
    -
    18.2.3.4.4. PARSEC uninstall
    +
    19.2.3.4.4. PARSEC uninstall
    -

    If you want to remove PARSEC later, Buildroot doesn’t provide an automated package removal mechanism as mentioned at: Section 19.6, “Remove Buildroot packages”, but the following procedure should be satisfactory:

    +

    If you want to remove PARSEC later, Buildroot doesn’t provide an automated package removal mechanism as mentioned at: Section 20.6, “Remove Buildroot packages”, but the following procedure should be satisfactory:

    @@ -18531,7 +18548,7 @@ parsecmgmt -a run -p splash2x.fmm -i test
    -
    18.2.3.4.5. PARSEC benchmark hacking
    +
    19.2.3.4.5. PARSEC benchmark hacking

    If you end up going inside submodules/parsec-benchmark to hack up the benchmark (you will!), these tips will be helpful.

    @@ -18586,7 +18603,7 @@ git clean -xdf .
    -

    18.3. gem5 kernel command line parameters

    +

    19.3. gem5 kernel command line parameters

    Analogous to QEMU:

    @@ -18619,9 +18636,9 @@ git clean -xdf .
    -

    18.4. gem5 GDB step debug

    +

    19.4. gem5 GDB step debug

    -

    18.4.1. gem5 GDB step debug kernel

    +

    19.4.1. gem5 GDB step debug kernel

    Analogous to QEMU, on the first shell:

    @@ -18654,7 +18671,7 @@ git clean -xdf .
    -

    18.4.2. gem5 GDB step debug userland process

    +

    19.4.2. gem5 GDB step debug userland process

    We are unable to use gdbserver because of networking as mentioned at: Section 14.3.1.3, “gem5 host to guest networking”

    @@ -18690,7 +18707,7 @@ git clean -xdf .
    -

    18.5. gem5 checkpoint

    +

    19.5. gem5 checkpoint

    Analogous to QEMU’s Snapshot, but better since it can be started from inside the guest, so we can easily checkpoint after a specific guest event, e.g. just before init is done.

    @@ -18775,7 +18792,7 @@ m5 checkpoint

    since boot has already happened, and the parameters are already in the RAM of the snapshot.

    -

    18.5.1. gem5 checkpoint internals

    +

    19.5.1. gem5 checkpoint internals

    Checkpoints are stored inside the m5out directory at:

    @@ -18801,7 +18818,7 @@ m5 checkpoint
    -

    18.5.2. gem5 checkpoint restore and run a different script

    +

    19.5.2. gem5 checkpoint restore and run a different script

    You want to automate running several tests from a single pristine post-boot state.

    @@ -18894,7 +18911,7 @@ expect eof
    -

    18.5.3. gem5 restore checkpoint with a different CPU

    +

    19.5.3. gem5 restore checkpoint with a different CPU

    gem5 can switch to a different CPU model when restoring a checkpoint.

    @@ -18928,7 +18945,7 @@ expect eof
    -

    18.6. Pass extra options to gem5

    +

    19.6. Pass extra options to gem5

    Remember that in the gem5 command line, we can either pass options to the script being run as in:

    @@ -18985,7 +19002,7 @@ expect eof
    -

    18.7. m5ops

    +

    19.7. m5ops

    m5ops are magic instructions which lead gem5 to do magic things, like quitting or dumping stats.

    @@ -19025,7 +19042,7 @@ expect eof
    -

    18.7.1. m5

    +

    19.7.1. m5

    m5 is a guest command line utility that is installed and run on the guest, that serves as a CLI front-end for the m5ops

    @@ -19036,7 +19053,7 @@ expect eof

    It is possible to guess what most tools do from the corresponding m5ops, but let’s at least document the less obvious ones here.

    -
    18.7.1.1. m5 exit
    +
    19.7.1.1. m5 exit

    End the simulation.

    @@ -19045,7 +19062,7 @@ expect eof
    -
    18.7.1.2. m5 fail
    +
    19.7.1.2. m5 fail

    End the simulation with a failure exit event:

    @@ -19084,7 +19101,7 @@ expect eof
    -
    18.7.1.3. m5 writefile
    +
    19.7.1.3. m5 writefile

    Send a guest file to the host. 9P is a more advanced alternative.

    @@ -19115,7 +19132,7 @@ m5 writefile myfileguest myfilehost
    -
    18.7.1.4. m5 readfile
    +
    19.7.1.4. m5 readfile

    Read a host file pointed to by the fs.py --script option to stdout.

    @@ -19143,7 +19160,7 @@ m5 writefile myfileguest myfilehost
    -
    18.7.1.5. m5 initparam
    +
    19.7.1.5. m5 initparam

    Ermm, just another m5 readfile that only takes integers and only from CLI options? Is this software so redundant?

    @@ -19169,7 +19186,7 @@ m5 writefile myfileguest myfilehost
    -
    18.7.1.6. m5 execfile
    +
    19.7.1.6. m5 execfile

    Trivial combination of m5 readfile + execute the script.

    @@ -19204,7 +19221,7 @@ m5 execfile
    -

    18.7.2. m5ops instructions

    +

    19.7.2. m5ops instructions

    gem5 allocates some magic instructions on unused instruction encodings for convenient guest instrumentation.

    @@ -19283,7 +19300,7 @@ m5 execfile
    -
    18.7.2.1. m5ops instructions interface
    +
    19.7.2.1. m5ops instructions interface

    Let’s study how m5 uses them:

    @@ -19397,7 +19414,7 @@ m5_fail(ints[1], ints[0]);
    -
    18.7.2.2. m5op annotations
    +
    19.7.2.2. m5op annotations

    include/gem5/asm/generic/m5ops.h also describes some annotation instructions.

    @@ -19408,7 +19425,7 @@ m5_fail(ints[1], ints[0]);
    -

    18.8. gem5 arm Linux kernel patches

    +

    19.8. gem5 arm Linux kernel patches

    https://gem5.googlesource.com/arm/linux/ contains an ARM Linux kernel fork with a few gem5 specific Linux kernel patches, created by ARM Holdings on top of a few upstream kernel releases.

    @@ -19485,7 +19502,7 @@ git -C "$(./getvar linux_source_dir)" checkout -

    drm: Add component-aware simple encoder allows you to see images through VNC, see: Section 13.3, “gem5 graphic mode”

  • -

    gem5: Add support for gem5’s extended GIC mode adds support for more than 8 cores, see: Section 18.2.2.1.4, “gem5 ARM full system with more than 8 cores”

    +

    gem5: Add support for gem5’s extended GIC mode adds support for more than 8 cores, see: Section 19.2.2.1.4, “gem5 ARM full system with more than 8 cores”

  • @@ -19493,7 +19510,7 @@ git -C "$(./getvar linux_source_dir)" checkout -

    Tested on 649d06d6758cefd080d04dc47fd6a5a26a620874 + 1.

    -

    18.8.1. gem5 arm Linux kernel patches boot speedup

    +

    19.8.1. gem5 arm Linux kernel patches boot speedup

    We have observed that with the kernel patches, boot is 2x faster, falling from 1m40s to 50s.

    @@ -19511,7 +19528,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    18.9. m5out directory

    +

    19.9. m5out directory

    When you run gem5, it generates an m5out directory at:

    @@ -19527,7 +19544,7 @@ git -C "$(./getvar linux_source_dir)" checkout -

    The files in that directory contain some very important information about the run, and you should become familiar with every one of them.

    -

    18.9.1. gem5 m5out/system.terminal file

    +

    19.9.1. gem5 m5out/system.terminal file

    Contains UART output, either from the Linux kernel or from the baremetal system.

    @@ -19536,7 +19553,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    18.9.2. gem5 m5out/stats.txt file

    +

    19.9.2. gem5 m5out/stats.txt file

    This file contains important statistics about the run:

    @@ -19569,7 +19586,7 @@ system.cpu.dtb.inst_hits

    For x86, it is interesting to try and correlate numCycles with:

    -
    18.9.2.1. gem5 only dump selected stats
    +
    19.9.2.1. gem5 only dump selected stats

    TODO

    @@ -19582,7 +19599,7 @@ system.cpu.dtb.inst_hits
    -

    18.9.3. gem5 config.ini

    +

    19.9.3. gem5 config.ini

    The m5out/config.ini file contains a very good high level description of the system:

    @@ -19665,7 +19682,7 @@ clock=500
    -

    18.10. m5term

    +

    19.10. m5term

    We use the m5term in-tree executable to connect to the terminal instead of a direct telnet.

    @@ -19690,7 +19707,7 @@ clock=500
    -

    18.11. gem5 Python scripts without rebuild

    +

    19.11. gem5 Python scripts without rebuild

    We have made a crazy setup that allows you to just cd into submodules/gem5, and edit Python scripts directly there.

    @@ -19724,7 +19741,7 @@ clock=500
    -

    18.12. gem5 fs_bigLITTLE

    +

    19.12. gem5 fs_bigLITTLE

    By default, we use configs/example/fs.py script.

    @@ -19784,7 +19801,7 @@ clock=500
    -

    18.13. gem5 unit tests

    +

    19.13. gem5 unit tests

    https://stackoverflow.com/questions/52279971/how-to-run-the-gem5-unit-tests

    @@ -19842,7 +19859,7 @@ clock=500
    -

    18.14. gem5 regression tests

    +

    19.14. gem5 regression tests

    https://stackoverflow.com/questions/52279971/how-to-run-the-gem5-unit-tests

    @@ -19859,7 +19876,7 @@ clock=500
    -

    18.15. gem5 simulate() limit reached

    +

    19.15. gem5 simulate() limit reached

    This error happens when the following instruction limits are reached:

    @@ -19988,18 +20005,18 @@ Exiting @ tick 18446744073709551615 because simulate() limit reached
    -

    18.16. gem5 build options

    +

    19.16. gem5 build options

    In order to use different build options, you might also want to use gem5 build variants to keep the build outputs separate from one another.

    -

    18.16.1. gem5 debug build

    +

    19.16.1. gem5 debug build

    -

    18.16.2. gem5 clang build

    +

    19.16.2. gem5 clang build

    TODO test properly, benchmark vs GCC.

    @@ -20012,7 +20029,7 @@ Exiting @ tick 18446744073709551615 because simulate() limit reached
    -

    18.16.3. gem5 sanitation build

    +

    19.16.3. gem5 sanitation build

    If gem5 appears to have a C++ undefined behaviour bug, which is often very difficult to track down, you can try to build it with the following extra SCons options:

    @@ -20074,7 +20091,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    18.16.4. gem5 Ruby build

    +

    19.16.4. gem5 Ruby build

    Ruby is a system that includes the SLICC domain specific language to describe memory systems: http://gem5.org/Ruby

    @@ -20128,7 +20145,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    18.16.5. gem5 Python 3 build

    +

    19.16.5. gem5 Python 3 build

    Python 3 support was mostly added in 2019 Q3 at around a347a1a68b8a6e370334be3a1d2d66675891e0f1 but remained buggy for some time afterwards.

    @@ -20146,7 +20163,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    18.17. gem5 CPU types

    +

    19.17. gem5 CPU types

    gem5 has a few in tree CPU models for different purposes. In fs.py and se.py, those are selectable with the --cpu-type option. Here is an overview of the most interesting ones:

    @@ -20181,7 +20198,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    18.17.1. gem5 ARM RSK

    +

    19.17.1. gem5 ARM RSK

    https://github.com/arm-university/arm-gem5-rsk/blob/aa3b51b175a0f3b6e75c9c856092ae0c8f2a7cdc/gem5_rsk.pdf

    @@ -20191,7 +20208,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    18.18. gem5 ARM platforms

    +

    19.18. gem5 ARM platforms

    The gem5 platform is selectable with the --machine option, which is named after the analogous QEMU -machine option, and which sets the --machine-type.

    @@ -20219,9 +20236,9 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    18.19. gem5 internals

    +

    19.19. gem5 internals

    -

    18.19.1. gem5 Python C++ interaction

    +

    19.19.1. gem5 Python C++ interaction

    The interaction uses the Python C extension interface https://docs.python.org/2/extending/extending.html through the pybind11 helper library: https://github.com/pybind/pybind11

    @@ -20390,10 +20407,10 @@ static EmbeddedPyBind embed_obj("BadDevice", module_init, "BasicPioDevice");
    -

    19. Buildroot

    +

    20. Buildroot

    -

    19.1. Introduction to Buildroot

    +

    20.1. Introduction to Buildroot

    Buildroot is a set of Make scripts that download and compile from source compatible versions of:

    @@ -20406,7 +20423,7 @@ static EmbeddedPyBind embed_obj("BadDevice", module_init, "BasicPioDevice");Linux kernel

  • -

    C standard library: Buildroot supports several implementations, see: Section 19.10, “libc choice”

    +

    C standard library: Buildroot supports several implementations, see: Section 20.10, “libc choice”

  • BusyBox: provides the shell and basic command line utilities

    @@ -20454,7 +20471,7 @@ qemu-system-aarch64 -M virt -cpu cortex-a57 -nographic -smp 1 -kernel output/ima
  • -

    19.2. Custom Buildroot configs

    +

    20.2. Custom Buildroot configs

    We provide the following mechanisms:

    @@ -20489,10 +20506,10 @@ qemu-system-aarch64 -M virt -cpu cortex-a57 -nographic -smp 1 -kernel output/ima

    The clean is necessary because the source files didn’t change, so make would just check the timestamps and not build anything.

    -

    You will then likely want to make those more permanent as explained at: Section 29.4, “Default command line arguments”.

    +

    You will then likely want to make those more permanent as explained at: Section 30.4, “Default command line arguments”.

    -

    19.2.1. Enable Buildroot compiler optimizations

    +

    20.2.1. Enable Buildroot compiler optimizations

    If you are benchmarking compiled programs instead of hand written assembly, remember that we configure Buildroot to disable optimizations by default with:

    @@ -20524,7 +20541,7 @@ qemu-system-aarch64 -M virt -cpu cortex-a57 -nographic -smp 1 -kernel output/ima
    -

    19.4. Change user

    +

    20.4. Change user

    At startup, we login automatically as the root user.

    @@ -20623,7 +20640,7 @@ make menuconfig
    -

    19.5. Add new Buildroot packages

    +

    20.5. Add new Buildroot packages

    First, see if you can’t get away without actually adding a new package, for example:

    @@ -20656,7 +20673,7 @@ make menuconfig

    if you have a standalone C file with no dependencies besides the C standard library to be compiled with GCC, just add a new file under buildroot_packages/sample_package and you are done

  • -

    if you have a dependency on a library, first check if Buildroot doesn’t have a package for it already with ls buildroot/package. If yes, just enable that package as explained at: Section 19.2, “Custom Buildroot configs”

    +

    if you have a dependency on a library, first check if Buildroot doesn’t have a package for it already with ls buildroot/package. If yes, just enable that package as explained at: Section 20.2, “Custom Buildroot configs”

  • @@ -20664,14 +20681,14 @@ make menuconfig

    If none of those methods are flexible enough for you, you can just fork or hack up the sample package at buildroot_packages/sample_package to do what you want.

    -

    For how to use that package, see: Section 29.12.2, “buildroot_packages directory”.

    +

    For how to use that package, see: Section 30.12.2, “buildroot_packages directory”.

    Then iterate trying to do what you want and reading the manual until it works: https://buildroot.org/downloads/manual/manual.html

    -

    19.6. Remove Buildroot packages

    +

    20.6. Remove Buildroot packages

    Once you’ve built a package into the image, there is no easy way to remove it.

    @@ -20682,11 +20699,11 @@ make menuconfig

    Also mentioned at: https://stackoverflow.com/questions/47320800/how-to-clean-only-target-in-buildroot

    -

    See this for a sample manual workaround: Section 18.2.3.4.4, “PARSEC uninstall”.

    +

    See this for a sample manual workaround: Section 19.2.3.4.4, “PARSEC uninstall”.

    -

    19.7. BR2_TARGET_ROOTFS_EXT2_SIZE

    +

    20.7. BR2_TARGET_ROOTFS_EXT2_SIZE

    When adding a new large package to the Buildroot root filesystem, it may fail with the message:

    @@ -20730,7 +20747,7 @@ TODO benchmark: would gem5 suffer a considerable disk read performance hit due t

    libguestfs: https://serverfault.com/questions/246835/convert-directory-to-qemu-kvm-virtual-disk-image/916697#916697, in particular vfs-minimum-size

  • -

    use methods described at: Section 18.5.2, “gem5 checkpoint restore and run a different script” instead of putting builds on the root filesystem

    +

    use methods described at: Section 19.5.2, “gem5 checkpoint restore and run a different script” instead of putting builds on the root filesystem

  • @@ -20738,7 +20755,7 @@ TODO benchmark: would gem5 suffer a considerable disk read performance hit due t

    Bibliography: https://stackoverflow.com/questions/49211241/is-there-a-way-to-automatically-detect-the-minimum-required-br2-target-rootfs-ex

    -

    19.7.1. SquashFS

    +

    20.7.1. SquashFS

    SquashFS creation with mksquashfs does not take fixed sizes, and I have successfully booted from it, but it is readonly, which is unacceptable.

    @@ -20751,7 +20768,7 @@ TODO benchmark: would gem5 suffer a considerable disk read performance hit due t
    -

    19.8. Buildroot rebuild is slow when the root filesystem is large

    +

    20.8. Buildroot rebuild is slow when the root filesystem is large

    Buildroot is not designed for large root filesystem images, and the rebuild becomes very slow when we add a large package to it.

    @@ -20789,7 +20806,7 @@ TODO benchmark: would gem5 suffer a considerable disk read performance hit due t
    -

    19.9. Report upstream bugs

    +

    20.9. Report upstream bugs

    When asking for help on upstream repositories outside of this repository, you will need to provide the commands that you are running in detail without referencing our scripts.

    @@ -20849,7 +20866,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -

    Then, you will also want to do a Bisection to pinpoint the exact commit to blame, and CC that developer.

    -

    Finally, give the images you used to save upstream developers' time as shown at: Section 29.17.2, “release-zip”.

    +

    Finally, give the images you used to save upstream developers' time as shown at: Section 30.17.2, “release-zip”.

    For Buildroot problems, you should either provide the config you have:

    @@ -20864,7 +20881,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    -

    19.10. libc choice

    +

    20.10. libc choice

    Buildroot supports several libc implementations, including:

    @@ -20911,10 +20928,29 @@ git -C "$(./getvar qemu_source_dir)" checkout -

    One "downside" of glibc is that it exercises much more kernel functionality on its more bloated pre-main init, which breaks user mode C hello worlds more often, see: Section 10.4, “User mode simulation with glibc”. I quote "downside" because glibc is actually exposing emulator bugs which we should actually go and fix.

    +
    +

    20.11. Buildroot hello world

    +
    +

    This repo doesn’t do much more than set a bunch of Buildroot configurations and build it. The minimal work you have to do to get QEMU to boot Buildroot from scratch is tiny, which is handy if you want to quickly test Buildroot specifics, for example:

    +
    +
    + +
    +
    -

    20. Userland content

    +

    21. Userland content

    This section contains userland content, such as C, C++ and POSIX examples.

    @@ -20923,7 +20959,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -

    Getting started at: Section 1.6, “Userland setup”

    -

    Userland assembly content is located at: Section 21, “Userland assembly”. It was split from this section basically because we were hitting the HTML h6 limit, stupid web :-)

    +

    Userland assembly content is located at: Section 22, “Userland assembly”. It was split from this section basically because we were hitting the HTML h6 limit, stupid web :-)

    This content makes up the bulk of the userland/ directory.

    @@ -20935,7 +20971,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -

    This section was originally moved in here from: https://github.com/cirosantilli/cpp-cheat

    -

    20.1. C

    +

    21.1. C

    Programs under userland/c/ are examples of ANSI C programming:

    @@ -21045,7 +21081,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    -

    20.1.1. malloc

    +

    21.1.1. malloc

    @@ -21059,7 +21095,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -

    malloc leads to the infinite joys of Memory leaks.

    -
    20.1.1.1. malloc implementation
    +
    21.1.1.1. malloc implementation

    TODO: the exact answer is going to be hard.

    @@ -21104,7 +21140,7 @@ printf '%x\n' 4198400
    -
    20.1.1.2. malloc maximum size
    +
    21.1.1.2. malloc maximum size
    @@ -21170,7 +21206,7 @@ echo 1 > /proc/sys/vm/overcommit_memory

    If we start using the pages, the OOM killer would sooner or later step in and kill our process: Linux out-of-memory killer.

    -
    20.1.1.2.1. Linux out-of-memory killer
    +
    21.1.1.2.1. Linux out-of-memory killer

    We can observe the OOM in LKMC 1e969e832f66cb5a72d12d57c53fb09e9721d589 which defaults to 256MiB of memory with:

    @@ -21193,7 +21229,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    20.1.2. C multithreading

    +

    21.1.2. C multithreading

    Added in C11!

    @@ -21216,9 +21252,9 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    20.1.3. GCC C extensions

    +

    21.1.3. GCC C extensions

    -
    20.1.3.1. C empty struct
    +
    21.1.3.1. C empty struct
    @@ -21230,7 +21266,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -
    20.1.3.2. OpenMP
    +
    21.1.3.2. OpenMP

    GCC implements the OpenMP threading API: https://stackoverflow.com/questions/3949901/pthreads-vs-openmp

    @@ -21256,7 +21292,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    20.2. C++

    +

    21.2. C++

    Programs under userland/cpp/ are examples of ISO C++ programming.

    @@ -21287,7 +21323,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    20.2.1. C++ multithreading

    +

    21.2.1. C++ multithreading

    -

    20.2.2. C++ standards

    +

    21.2.2. C++ standards

    Like for C, you have to pay for the standards…​ insane. So we just use the closest free drafts instead.

    @@ -21331,7 +21367,7 @@ echo 1 > /proc/sys/vm/overcommit_memory

    https://stackoverflow.com/questions/81656/where-do-i-find-the-current-c-or-c-standard-documents

    -
    20.2.2.1. C++17 N4659 standards draft
    +
    21.2.2.1. C++17 N4659 standards draft

    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf

    @@ -21339,7 +21375,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    20.3. POSIX

    +

    21.3. POSIX

    Programs under userland/posix/ are examples of POSIX C programming.

    @@ -21357,7 +21393,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    20.3.1. unistd.h

    +

    21.3.1. unistd.h

    -

    20.3.2. pthreads

    +

    21.3.2. pthreads

    POSIX' multithreading API. This was for a looong time the only "portable" multithreading alternative, until C++11 finally added threads, thus also extending the portability to Windows.

    @@ -21389,7 +21425,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    20.3.3. sysconf

    +

    21.3.3. sysconf

    https://pubs.opengroup.org/onlinepubs/9699919799/functions/sysconf.html

    @@ -21422,7 +21458,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    20.3.4. mmap

    +

    21.3.4. mmap

    The mmap system call allows advanced memory operations.

    @@ -21433,7 +21469,7 @@ echo 1 > /proc/sys/vm/overcommit_memory

    Linux adds several extension flags to it on top of POSIX.

    -
    20.3.4.1. mmap MAP_ANONYMOUS
    +
    21.3.4.1. mmap MAP_ANONYMOUS

    Basic mmap example, do the same as userland/c/malloc.c, but with mmap.

    @@ -21448,7 +21484,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -
    20.3.4.2. mmap file
    +
    21.3.4.2. mmap file

    Memory mapped file example: userland/posix/mmap_file.c

    @@ -21460,7 +21496,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -
    20.3.4.3. brk
    +
    21.3.4.3. brk

    Previously POSIX, but was deprecated in favor of malloc

    @@ -21475,9 +21511,22 @@ echo 1 > /proc/sys/vm/overcommit_memory
    +
    +

    21.3.5. socket

    +
    +

    A bit like read and write, but from / to the Internet!

    +
    +
    + +
    +
    -

    20.4. Userland multithreading

    +

    21.4. Userland multithreading

    The following sections are related to multithreading in userland:

    @@ -21529,12 +21578,12 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    20.5. C debugging

    +

    21.5. C debugging

    Let’s group the hard-to-debug undefined-behaviour-like stuff found in C / C++ here and how to tackle those problems.

    -

    20.5.1. Stack smashing

    +

    21.5.1. Stack smashing

    @@ -21551,7 +21600,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    20.5.2. Memory leaks

    +

    21.5.2. Memory leaks

    How to debug: https://stackoverflow.com/questions/6261201/how-to-find-memory-leak-in-a-c-code-project/57877190#57877190

    @@ -21561,7 +21610,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    20.6. Userland content bibliography

    +

    21.6. Userland content bibliography

    -

    21. Userland assembly

    +

    22. Userland assembly

    Programs under userland/arch/<arch>/ are examples of userland assembly programming.

    @@ -21674,7 +21723,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
  • -

    registers, see: Section 21.1, “Assembly registers”

    +

    registers, see: Section 22.1, “Assembly registers”

  • jumping:

    @@ -21817,14 +21866,14 @@ error: asm_main returned 1 at line 8
  • -

    21.1. Assembly registers

    +

    22.1. Assembly registers

    After seeing an ADD hello world, you need to learn the general registers:

    -

    21.1.1. ARMv8 aarch64 x31 register

    +

    22.1.1. ARMv8 aarch64 x31 register

    @@ -21940,7 +21989,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    21.2. Floating point assembly

    +

    22.2. Floating point assembly

    Keep in mind that many ISAs started floating point as an optional thing, and it later got better integrated into the main CPU, side by side with SIMD.

    @@ -21982,7 +22031,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    21.3. SIMD assembly

    +

    22.3. SIMD assembly

    Much like ADD for non-SIMD, start learning SIMD instructions by looking at the integer and floating point SIMD ADD instructions of each ISA:

    @@ -22072,14 +22121,14 @@ When instructions do not interpret this operand encoding as the zero register, u

    Bibliography: https://stackoverflow.com/questions/1389712/getting-started-with-intel-x86-sse-simd-instructions/56409539#56409539

    -

    21.3.1. FMA instruction

    +

    22.3.1. FMA instruction

    Fused multiply add:

    @@ -22121,7 +22170,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    21.4. User vs system assembly

    +

    22.4. User vs system assembly

    By "userland assembly", we mean "the parts of the ISA which can be freely used from userland".

    @@ -22132,7 +22181,7 @@ When instructions do not interpret this operand encoding as the zero register, u

    One big difference between both is that we can run userland assembly on Userland setup, which is easier to get running and debug.

    -

    In particular, most userland assembly examples link to the C standard library, see: Section 21.5, “Userland assembly C standard library”.

    +

    In particular, most userland assembly examples link to the C standard library, see: Section 22.5, “Userland assembly C standard library”.

    Userland assembly is generally simpler, and a pre-requisite for Baremetal setup.

    @@ -22142,7 +22191,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    21.5. Userland assembly C standard library

    +

    22.5. Userland assembly C standard library

    All examples except the Freestanding programs link to the C standard library.

    @@ -22175,7 +22224,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    21.5.1. Freestanding programs

    +

    22.5.1. Freestanding programs

    Unlike most of our other assembly examples, which use the C standard library for portability, examples under freestanding/ directories don’t link to the C standard library:

    @@ -22227,7 +22276,7 @@ When instructions do not interpret this operand encoding as the zero register, u

    This is analogous to step debugging baremetal examples.

    -
    21.5.1.1. nostartfiles programs
    +
    22.5.1.1. nostartfiles programs

    Assembly examples under nostartfiles directories can use the standard library, but they don’t use the pre-main boilerplate and start directly at our explicitly given _start:

    @@ -22245,7 +22294,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    21.6. GCC inline assembly

    +

    22.6. GCC inline assembly

    Examples under arch/<arch>/c/ directories show how to use inline assembly from higher level languages such as C:

    @@ -22305,7 +22354,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    21.6.1. GCC inline assembly register variables

    +

    22.6.1. GCC inline assembly register variables

    Used notably in some of the Linux system calls setups:

    @@ -22329,14 +22378,14 @@ When instructions do not interpret this operand encoding as the zero register, u

    In arm, it is the only way to achieve this effect: https://stackoverflow.com/questions/10831792/how-to-use-specific-register-in-arm-inline-assembler

    -

    This feature is notably useful for making system calls from C, see: Section 21.7, “Linux system calls”.

    +

    This feature is notably useful for making system calls from C, see: Section 22.7, “Linux system calls”.

    Documentation: https://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Explicit-Reg-Vars.html

    -

    21.6.2. GCC inline assembly scratch registers

    +

    22.6.2. GCC inline assembly scratch registers

    How to use temporary registers in inline assembly:

    @@ -22362,7 +22411,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    21.6.3. GCC inline assembly early-clobbers

    +

    22.6.3. GCC inline assembly early-clobbers

    An example of using the & early-clobber modifier: userland/arch/aarch64/earlyclobber.c

    @@ -22374,7 +22423,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    21.6.4. GCC inline assembly floating point ARM

    +

    22.6.4. GCC inline assembly floating point ARM

    Not documented as of GCC 8.2, but possible: https://stackoverflow.com/questions/53960240/armv8-floating-point-output-inline-assembly

    @@ -22390,7 +22439,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    21.6.5. GCC intrinsics

    +

    22.6.5. GCC intrinsics

    Pre-existing C wrappers using inline assembly, this is what production programs should use instead of inline assembly for SIMD:

    @@ -22412,7 +22461,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -
    21.6.5.1. GCC x86 intrinsics
    +
    22.6.5.1. GCC x86 intrinsics

    Good official cheatsheet with all intrinsics and what they expand to: https://software.intel.com/sites/landingpage/IntrinsicsGuide

    @@ -22540,7 +22589,7 @@ zmmintrin.h AVX512
    -

    21.7. Linux system calls

    +

    22.7. Linux system calls

    The following Userland setup programs illustrate how to make system calls:

    @@ -22640,7 +22689,7 @@ zmmintrin.h AVX512
    -

    21.8. Linux calling conventions

    +

    22.8. Linux calling conventions

    A summary of results is shown at: Table 3, “Summary of Linux calling conventions for several architectures”.

    @@ -22682,7 +22731,7 @@ zmmintrin.h AVX512
    -

    21.8.1. x86_64 calling convention

    +

    22.8.1. x86_64 calling convention

    Examples:

    @@ -22711,7 +22760,7 @@ zmmintrin.h AVX512
    -

    21.8.2. ARM calling convention

    +

    22.8.2. ARM calling convention

    Call C standard library functions from assembly and vice versa.

    @@ -22773,7 +22822,7 @@ zmmintrin.h AVX512
    -

    21.9. GNU GAS assembler

    +

    22.9. GNU GAS assembler

    GNU GAS is the default assembler used by GCC, and therefore it completely dominates in Linux.

    @@ -22781,7 +22830,7 @@ zmmintrin.h AVX512

    The Linux kernel in particular uses GNU GAS assembly extensively for the arch specific parts under arch/.

    -

    21.9.1. GNU GAS assembler comments

    +

    22.9.1. GNU GAS assembler comments

    In this tutorial, we use exclusively C Preprocessor /**/ comments because:

    @@ -22816,7 +22865,7 @@ zmmintrin.h AVX512
    -

    21.9.2. GNU GAS assembler immediates

    +

    22.9.2. GNU GAS assembler immediates

    Summary:

    @@ -22848,7 +22897,7 @@ zmmintrin.h AVX512
    -

    21.9.3. GNU GAS assembler data sizes

    +

    22.9.3. GNU GAS assembler data sizes

    Let’s see how many bytes go into each data type:

    @@ -22940,9 +22989,9 @@ zmmintrin.h AVX512
    -
    21.9.3.1. GNU GAS assembler ARM specifics
    +
    22.9.3.1. GNU GAS assembler ARM specifics
    -
    21.9.3.1.1. GNU GAS assembler ARM unified syntax
    +
    22.9.3.1.1. GNU GAS assembler ARM unified syntax

    There are two types of ARMv7 assemblies:

    @@ -22987,14 +23036,14 @@ zmmintrin.h AVX512
  • -

    cannot have implicit destination with shift, see: Section 23.4.4.1, “ARM shift suffixes”

    +

    cannot have implicit destination with shift, see: Section 24.4.4.1, “ARM shift suffixes”

  • -
    21.9.3.2. GNU GAS assembler ARM .n and .w suffixes
    +
    22.9.3.2. GNU GAS assembler ARM .n and .w suffixes

    When reading disassembly, many instructions have either a .n or .w suffix.

    @@ -23007,7 +23056,7 @@ zmmintrin.h AVX512
    -

    21.9.4. GNU GAS assembler char literals

    +

    22.9.4. GNU GAS assembler char literals

    userland/arch/x86_64/char_literals.S

    @@ -23028,14 +23077,14 @@ zmmintrin.h AVX512
    -

    21.10. NOP instructions

    +

    22.10. NOP instructions

    @@ -23052,13 +23101,13 @@ zmmintrin.h AVX512
    -

    22. x86 userland assembly

    +

    23. x86 userland assembly

    -

    Arch agnostic infrastructure getting started at: Section 21, “Userland assembly”.

    +

    Arch agnostic infrastructure getting started at: Section 22, “Userland assembly”.

    -

    22.1. x86 registers

    +

    23.1. x86 registers

    userland/arch/x86_64/registers.S

    @@ -23109,7 +23158,7 @@ zmmintrin.h AVX512
    -

    22.2. x86 addressing modes

    +

    23.2. x86 addressing modes

    @@ -23192,7 +23241,7 @@ zmmintrin.h AVX512
    -

    22.3. x86 data transfer instructions

    +

    23.3. x86 data transfer instructions

    5.1.1 "Data Transfer Instructions"

    @@ -23223,7 +23272,7 @@ zmmintrin.h AVX512
    -

    22.3.1. x86 exchange instructions

    +

    23.3.1. x86 exchange instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 7.3.1.2 "Exchange Instructions":

    @@ -23241,7 +23290,7 @@ zmmintrin.h AVX512

    TODO: concrete multi-thread GCC inline assembly examples of how all those instructions are normally used as synchronization primitives.

    -
    22.3.1.1. x86 CMPXCHG instruction
    +
    23.3.1.1. x86 CMPXCHG instruction

    userland/arch/x86_64/cmpxchg.S

    @@ -23265,7 +23314,7 @@ zmmintrin.h AVX512
    -

    22.3.2. x86 PUSH and POP instructions

    +

    23.3.2. x86 PUSH and POP instructions

    userland/arch/x86_64/push.S

    @@ -23292,7 +23341,7 @@ add $8, %rsp
    -

    22.3.3. x86 CQTO and CLTQ instructions

    +

    23.3.3. x86 CQTO and CLTQ instructions

    Examples:

    @@ -23393,7 +23442,7 @@ add $8, %rsp
    -

    22.3.4. x86 CMOVcc instructions

    +

    23.3.4. x86 CMOVcc instructions

    -

    It is interesting to compare this with ARMv7 conditional execution: which is available for all instructions, as shown at: Section 23.2.5, “ARM conditional execution”.

    +

    It is interesting to compare this with ARMv7 conditional execution: which is available for all instructions, as shown at: Section 24.2.5, “ARM conditional execution”.

    -

    22.4. x86 binary arithmetic instructions

    +

    23.4. x86 binary arithmetic instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.2 "Binary Arithmetic Instructions":

    @@ -23519,7 +23568,7 @@ add $8, %rsp
    -

    22.5. x86 logical instructions

    +

    23.5. x86 logical instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.4 "Logical Instructions"

    @@ -23541,7 +23590,7 @@ add $8, %rsp
    -

    22.6. x86 shift and rotate instructions

    +

    23.6. x86 shift and rotate instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.5 "Shift and Rotate Instructions"

    @@ -23593,7 +23642,7 @@ add $8, %rsp
    -

    22.7. x86 bit and byte instructions

    +

    23.7. x86 bit and byte instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.6 "Bit and Byte Instructions"

    @@ -23652,7 +23701,7 @@ add $8, %rsp
    -

    22.8. x86 control transfer instructions

    +

    23.8. x86 control transfer instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.7 "Control Transfer Instructions"

    @@ -23671,7 +23720,7 @@ add $8, %rsp
    -

    22.8.1. x86 Jcc instructions

    +

    23.8.1. x86 Jcc instructions

    userland/arch/x86_64/jcc.S

    @@ -23745,7 +23794,7 @@ add $8, %rsp
    -

    22.8.2. x86 LOOP instruction

    +

    23.8.2. x86 LOOP instruction

    userland/arch/x86_64/loop.S

    @@ -23754,7 +23803,7 @@ add $8, %rsp
    -

    22.8.3. x86 string instructions

    +

    23.8.3. x86 string instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.8 "String Instructions"

    @@ -23807,7 +23856,7 @@ add $8, %rsp

    However, as computer architecture evolved, those instructions might not offer considerable speedups anymore, and modern glibc such as 2.29 just uses x86 SIMD operations instead, see also: https://stackoverflow.com/questions/33480999/how-can-the-rep-stosb-instruction-execute-faster-than-the-equivalent-loop

    -
    22.8.3.1. x86 REP prefix
    +
    23.8.3.1. x86 REP prefix

    Example: userland/arch/x86_64/rep.S

    @@ -23846,7 +23895,7 @@ add $8, %rsp
    -

    22.8.4. x86 ENTER and LEAVE instructions

    +

    23.8.4. x86 ENTER and LEAVE instructions

    userland/arch/x86_64/enter.S

    @@ -23897,16 +23946,16 @@ pop %rbp
    -

    22.9. x86 miscellaneous instructions

    +

    23.9. x86 miscellaneous instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.13 "Miscellaneous Instructions"

    -

    NOP: Section 21.10, “NOP instructions”

    +

    NOP: Section 22.10, “NOP instructions”

    -

    22.10. x86 random number generator instructions

    +

    23.10. x86 random number generator instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.15 Random Number Generator Instructions

    @@ -23929,7 +23978,7 @@ pop %rbp

    RDRAND sets the carry flag when data is ready so we must loop if the carry flag isn’t set.

    -

    22.10.1. x86 CPUID instruction

    +

    23.10.1. x86 CPUID instruction

    Example: userland/arch/x86_64/cpuid.S

    @@ -24000,7 +24049,7 @@ pop %rbp
    -

    22.11. x86 x87 FPU instructions

    +

    23.11. x86 x87 FPU instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.2 "X87 FPU INSTRUCTIONS"

    @@ -24093,7 +24142,7 @@ pop %rbp
    -

    22.11.1. x86 x87 FPU vs SIMD

    +

    23.11.1. x86 x87 FPU vs SIMD

    https://stackoverflow.com/questions/1844669/benefits-of-x87-over-sse

    @@ -24132,9 +24181,9 @@ pop %rbp
    -

    22.12. x86 SIMD

    +

    23.12. x86 SIMD

    -

    Parent section: Section 21.3, “SIMD assembly”

    +

    Parent section: Section 22.3, “SIMD assembly”

    History:

    @@ -24168,12 +24217,12 @@ pop %rbp
    -

    22.12.1. x86 SSE instructions

    +

    23.12.1. x86 SSE instructions

    -
    22.12.1.2. x86 SSE packed arithmetic instructions
    +
    23.12.1.2. x86 SSE packed arithmetic instructions
    @@ -24205,14 +24254,14 @@ pop %rbp
    -
    22.12.1.3. x86 SSE conversion instructions
    +
    23.12.1.3. x86 SSE conversion instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.5.1.6 "SSE Conversion Instructions"

    -

    22.12.2. x86 SSE2 instructions

    +

    23.12.2. x86 SSE2 instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.6 "SSE2 INSTRUCTIONS"

    @@ -24224,7 +24273,7 @@ pop %rbp
    -
    22.12.2.1. x86 PADDQ instruction
    +
    23.12.2.1. x86 PADDQ instruction

    userland/arch/x86_64/paddq.S: PADDQ, PADDL, PADDW, PADDB

    @@ -24234,7 +24283,7 @@ pop %rbp
    -

    22.12.3. x86 fused multiply add (FMA)

    +

    23.12.3. x86 fused multiply add (FMA)

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.15 "FUSED-MULTIPLY-ADD (FMA)"

    @@ -24254,12 +24303,12 @@ pop %rbp
    -

    22.13. x86 system instructions

    +

    23.13. x86 system instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.20 "SYSTEM INSTRUCTIONS"

    -

    22.13.1. x86 RDTSC instruction

    +

    23.13.1. x86 RDTSC instruction

    Sources:

    @@ -24333,7 +24382,7 @@ pop %rbp
    -
    22.13.1.1. x86 RDTSCP instruction
    +
    23.13.1.1. x86 RDTSCP instruction

    RDTSCP is like RDTSC, but it also stores the CPU ID into ECX: this is convenient because the value of RDTSC depends on which core we are currently on, so you often also want the core ID when you want the RDTSC.

    @@ -24376,7 +24425,7 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1
    -
    22.13.1.2. ARM PMCCNTR register
    +
    23.13.1.2. ARM PMCCNTR register

    TODO We didn’t manage to find a working ARM analogue to the x86 RDTSC instruction: kernel_modules/pmccntr.c is oopsing, and even if it weren’t, it likely won’t give the cycle count since boot, since it needs to be activated before it starts counting anything:

    @@ -24397,9 +24446,9 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1
    -

    22.14. x86 thread synchronization primitives

    +

    23.14. x86 thread synchronization primitives

    -

    22.14.1. x86 LOCK prefix

    +

    23.14.1. x86 LOCK prefix

    Inline assembly example at: userland/cpp/atomic.cpp

    @@ -24425,11 +24474,11 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1
    -

    22.15. x86 assembly bibliography

    +

    23.15. x86 assembly bibliography

    -

    22.15.1. x86 official bibliography

    +

    23.15.1. x86 official bibliography

    -
    22.15.1.1. Intel 64 and IA-32 Architectures Software Developer’s Manuals
    +
    23.15.1.1. Intel 64 and IA-32 Architectures Software Developer’s Manuals

    We are using the May 2019 version unless otherwise noted.

    @@ -24446,25 +24495,25 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1

    Also I can’t find older versions on the website easily, so I just web archive everything.

    -

    23. ARM userland assembly

    +

    24. ARM userland assembly

    -

    Arch general getting started at: Section 21, “Userland assembly”.

    +

    Arch general getting started at: Section 22, “Userland assembly”.

    Instructions here are loosely grouped following the ARMv7 architecture reference manual Chapter A4 "The Instruction Sets".

    @@ -24487,7 +24536,7 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1

    We cover here mostly ARMv7, and then treat aarch64 differentially, since much of the ARMv7 userland is the same in aarch32.

    -

    23.1. Introduction to the ARM architecture

    +

    24.1. Introduction to the ARM architecture

    The ARM architecture has been used on the vast majority of mobile phones in the 2010s, and on a large fraction of microcontrollers.

    @@ -24504,7 +24553,7 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1

    ARM Holdings was bought by the Japanese giant SoftBank in 2016.

    -

    23.1.1. ARMv8 vs ARMv7 vs AArch64 vs AArch32

    +

    24.1.1. ARMv8 vs ARMv7 vs AArch64 vs AArch32

    ARMv7 is the older architecture described at: ARMv7 architecture reference manual.

    @@ -24560,7 +24609,7 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1

    They are described at: ARMv8 architecture reference manual A1.7 "ARMv8 architecture extensions".

    -
    23.1.1.1. AArch32
    +
    24.1.1.1. AArch32

    32-bit mode of operation of ARMv8.

    @@ -24592,7 +24641,7 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1
    -
    23.1.1.2. AArch32 vs AArch64
    +
    24.1.1.2. AArch32 vs AArch64

    A great summary of differences can be found at: https://en.wikipedia.org/wiki/ARM_architecture#AArch64_features

    @@ -24602,17 +24651,17 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1
    -

    23.1.2. Free ARM implementations

    +

    24.1.2. Free ARM implementations

    The ARM instruction set is itself protected by patents / copyright / whatever, and you have to pay ARM Holdings for a licence to implement it, even if you are creating your own custom Verilog code.

    @@ -24651,7 +24700,7 @@ Bibliography: -

    23.1.3. ARM instruction encodings

    +

    24.1.3. ARM instruction encodings

    Understanding the basics of instruction encodings is fundamental to help you to remember what instructions do and why some things are possible or not, notably the ARM LDR pseudo-instruction and the ADRP instruction.

    @@ -24763,7 +24812,7 @@ Bibliography: -
    23.1.3.1. ARM Thumb encoding
    +
    24.1.3.1. ARM Thumb encoding

    Thumb examples are available at:

    @@ -24822,7 +24871,7 @@ Bibliography: -
    23.1.3.2. ARM big endian mode
    +
    24.1.3.2. ARM big endian mode

    ARM can switch between big and little endian mode on the fly!

    @@ -24918,9 +24967,9 @@ Bibliography: -

    23.2. ARM branch instructions

    +

    24.2. ARM branch instructions

    -

    23.2.1. ARM B instruction

    +

    24.2.1. ARM B instruction

    Unconditional branch.

    @@ -24938,7 +24987,7 @@ Bibliography: -

    23.2.2. ARM BEQ instruction

    +

    24.2.2. ARM BEQ instruction

    Branch if equal based on the status registers.

    @@ -24982,7 +25031,7 @@ Bibliography: -

    23.2.3. ARM BL instruction

    +

    24.2.3. ARM BL instruction

    Branch with link, i.e. branch and store the return address in the LR register.

    @@ -24996,13 +25045,13 @@ Bibliography: -
    23.2.3.1. ARM BX instruction
    +
    24.2.3.1. ARM BX instruction
    -
    23.2.3.2. ARMv8 aarch64 ret instruction
    +
    24.2.3.2. ARMv8 aarch64 ret instruction
    @@ -25035,7 +25084,7 @@ Bibliography: -

    23.2.4. ARM CBZ instruction

    +

    24.2.4. ARM CBZ instruction

    Compare and branch if zero.

    @@ -25050,7 +25099,7 @@ Bibliography: -

    23.2.5. ARM conditional execution

    +

    24.2.5. ARM conditional execution

    Weirdly, ARM B instruction and family are not the only instructions that can execute conditionally on the flags: the same also applies to most instructions, e.g. ADD.

    @@ -25066,7 +25115,7 @@ Bibliography: -

    23.3. ARM load and store instructions

    +

    24.3. ARM load and store instructions

    In ARM, there are only two instruction families that do memory access:

    @@ -25090,9 +25139,9 @@ Bibliography: Load/store architecture.

    -

    23.3.1. ARM LDR instruction

    +

    24.3.1. ARM LDR instruction

    -
    23.3.1.1. ARM LDR pseudo-instruction
    +
    24.3.1.1. ARM LDR pseudo-instruction

    LDR can be either a regular instruction that loads stuff from memory into a register, or also a pseudo-instruction (assembler magic): http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0041c/Babbfdih.html

    @@ -25126,7 +25175,7 @@ Bibliography: -
    23.3.1.2. ARM addressing modes
    +
    24.3.1.2. ARM addressing modes
    @@ -25197,7 +25246,7 @@ Bibliography: ARMv8 architecture reference manual: C1.3.3 "Load/Store addressing modes"

    -
    23.3.1.2.1. ARM loop over array
    +
    24.3.1.2.1. ARM loop over array

    As an application of the post-indexed addressing mode, let’s increment an array.
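
    A rough C model of the idea, not the contents of the example file: a post-indexed store such as `str r1, [r0], #4` writes through the address register and then advances it, which is what `*p++ = value` does in C. The function name `increment_array` and the register names in the comments are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add 1 to each of the n elements of array. */
void increment_array(uint32_t *array, size_t n) {
    uint32_t *p = array;
    uint32_t *end = array + n;
    while (p != end) {
        uint32_t value = *p; /* like: ldr r1, [r0]     (plain load)          */
        value += 1;          /* like: add r1, r1, #1                         */
        *p++ = value;        /* like: str r1, [r0], #4 (store, then advance) */
    }
}
```

    Calling `increment_array` on `{1, 2, 3}` with `n = 3` leaves `{2, 3, 4}` in the array.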

    @@ -25207,7 +25256,7 @@ Bibliography: -
    23.3.1.3. ARM LDRH and LDRB instructions
    +
    24.3.1.3. ARM LDRH and LDRB instructions

    There are LDR variants that load less than full 4 bytes:

    @@ -25234,7 +25283,7 @@ Bibliography: -

    23.3.2. ARM STR instruction

    +

    24.3.2. ARM STR instruction

    Store from registers into memory.

    @@ -25245,7 +25294,7 @@ Bibliography: ARM LDR instruction also applies here so we won’t go into much detail.

    -
    23.3.2.1. ARMv8 aarch64 STR instruction
    +
    24.3.2.1. ARMv8 aarch64 STR instruction

    PC-relative STR is not possible in aarch64.

    @@ -25263,7 +25312,7 @@ Bibliography: -
    23.3.2.2. ARMv8 aarch64 LDP and STP instructions
    +
    24.3.2.2. ARMv8 aarch64 LDP and STP instructions

    Push a pair of registers to the stack.

    @@ -25271,7 +25320,7 @@ Bibliography: lkmc/aarch64.h since it is the main way to restore register state.

    -
    23.3.2.2.1. ARMV8 aarch64 stack alignment
    +
    24.3.2.2.1. ARMV8 aarch64 stack alignment

    In ARMv8, the stack can be enforced to 16-byte alignment.

    @@ -25318,7 +25367,7 @@ Bibliography: -

    23.3.3. ARM LDMIA instruction

    +

    24.3.3. ARM LDMIA instruction

    Pop values from the stack into registers and optionally update the address register.

    @@ -25368,7 +25417,7 @@ ldmia sp!, reglist
    -

    23.4. ARM data processing instructions

    +

    24.4. ARM data processing instructions

    Arithmetic:

    @@ -25392,7 +25441,7 @@ ldmia sp!, reglist
    -

    23.4.1. ARM CSET instruction

    +

    24.4.1. ARM CSET instruction

    @@ -25404,7 +25453,7 @@ ldmia sp!, reglist
    -

    23.4.2. ARM bitwise instructions

    +

    24.4.2. ARM bitwise instructions

    • @@ -25422,7 +25471,7 @@ ldmia sp!, reglist
    -
    23.4.2.1. ARM BIC instruction
    +
    24.4.2.1. ARM BIC instruction

    Bitwise Bit Clear: clear some bits.
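
    As a minimal sketch of the semantics in plain C (a model, not the assembly example itself): BIC computes the first operand AND NOT the second, so every bit set in the second operand is cleared in the result. The function name `bic` is only an illustrative stand-in for the instruction.

```c
#include <stdint.h>

/* Model of "bic rd, rn, rm": clear in rn every bit that is set in rm. */
uint32_t bic(uint32_t rn, uint32_t rm) {
    return rn & ~rm;
}
```

    For example, `bic(0xFF, 0x0F)` gives `0xF0`: the low four bits were cleared and the rest were kept.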

    @@ -25436,7 +25485,7 @@ ldmia sp!, reglist
    -
    23.4.2.2. ARM UBFM instruction
    +
    24.4.2.2. ARM UBFM instruction

    Unsigned Bitfield Move.

    @@ -25454,7 +25503,7 @@ ldmia sp!, reglist

    TODO: explain full behaviour. Very complicated. Has several simpler to understand aliases.

    -
    23.4.2.2.1. ARM UBFX instruction
    +
    24.4.2.2.1. ARM UBFX instruction

    Alias for:

    @@ -25488,12 +25537,12 @@ ldmia sp!, reglist
    -
    23.4.2.3. ARM BFM instruction
    +
    24.4.2.3. ARM BFM instruction

    TODO: explain. Similar to UBFM but leave untouched bits unmodified.
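
    Until the TODO is resolved, the BFI alias is probably the easiest way to see what "leave untouched bits unmodified" means. The following is a hedged C model of the 32-bit form `bfi wd, wn, #lsb, #width`, assuming `1 <= width` and `lsb + width <= 32`; the function name `bfi` is hypothetical and the code is not taken from the example files.

```c
#include <stdint.h>

/*
 * Model of "bfi wd, wn, #lsb, #width": copy the low `width` bits of rn
 * into rd starting at bit `lsb`, and leave all other bits of rd as they were.
 * Assumes 1 <= width and lsb + width <= 32.
 */
uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width) {
    /* width == 32 is special-cased because shifting by 32 is undefined in C. */
    uint32_t field = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1);
    uint32_t mask = field << lsb;
    return (rd & ~mask) | ((rn << lsb) & mask);
}
```

    For example, `bfi(0xFFFFFFFF, 0x0, 4, 8)` gives `0xFFFFF00F`: only bits 4 to 11 of the destination were overwritten.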

    -
    23.4.2.3.1. ARM BFI instruction
    +
    24.4.2.3.1. ARM BFI instruction

    Examples:

    @@ -25524,12 +25573,12 @@ ldmia sp!, reglist
    -

    23.4.3. ARM MOV instruction

    +

    24.4.3. ARM MOV instruction

    Move an immediate to a register, or a register to another register.

    -

    Cannot load from or to memory, since only the LDR and STR instruction families can do that in ARM as mentioned at: Section 23.3, “ARM load and store instructions”.

    +

    Cannot load from or to memory, since only the LDR and STR instruction families can do that in ARM as mentioned at: Section 24.3, “ARM load and store instructions”.

    Example: userland/arch/arm/mov.S

    @@ -25590,7 +25639,7 @@ ldmia sp!, reglist

    Assemblers however support magic memory allocations which may hide what is truly going on: https://stackoverflow.com/questions/14046686/why-use-ldr-over-mov-or-vice-versa-in-arm-assembly Always ask your friendly disassembler for a good confirmation.

    -
    23.4.3.1. ARM movw and movt instructions
    +
    24.4.3.1. ARM movw and movt instructions

    Set the higher or lower 16 bits of a register to an immediate in one go.
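
    A small C model of how the pair is typically combined to build a full 32-bit constant: MOVW writes the lower half and zeroes the upper half, then MOVT writes the upper half and keeps the lower one. The function names `movw` and `movt` below are just illustrative models of the instructions.

```c
#include <stdint.h>

/* Model of "movw rd, #imm16": set the low 16 bits, zero the high 16 bits. */
uint32_t movw(uint16_t imm16) {
    return imm16;
}

/* Model of "movt rd, #imm16": set the high 16 bits, keep the low 16 bits. */
uint32_t movt(uint32_t rd, uint16_t imm16) {
    return (rd & UINT32_C(0x0000FFFF)) | ((uint32_t)imm16 << 16);
}
```

    For example, `movt(movw(0x5678), 0x1234)` gives `0x12345678`.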

    @@ -25602,7 +25651,7 @@ ldmia sp!, reglist
    -
    23.4.3.2. ARMv8 aarch64 movk instruction
    +
    24.4.3.2. ARMv8 aarch64 movk instruction

    Fill a 64-bit register 16 bits at a time, using up to four instructions.
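
    A hedged C model of the usual sequence, assuming the first instruction is a MOVZ (often written as plain `mov`), which zeroes the rest of the register, followed by MOVK instructions that each replace one 16-bit slice and keep everything else. The function names are illustrative only, and `shift` must be 0, 16, 32 or 48.

```c
#include <stdint.h>

/* Model of "movz xd, #imm16, lsl #shift": set one slice, zero the rest. */
uint64_t movz(uint16_t imm16, unsigned shift) {
    return (uint64_t)imm16 << shift;
}

/* Model of "movk xd, #imm16, lsl #shift": set one slice, keep the rest. */
uint64_t movk(uint64_t rd, uint16_t imm16, unsigned shift) {
    uint64_t mask = UINT64_C(0xFFFF) << shift;
    return (rd & ~mask) | ((uint64_t)imm16 << shift);
}
```

    Chaining `movz(0xcdef, 0)`, then `movk` with `0x89ab` at shift 16, `0x4567` at shift 32 and `0x0123` at shift 48 builds `0x0123456789abcdef`.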

    @@ -25617,7 +25666,7 @@ ldmia sp!, reglist
    -
    23.4.3.3. ARMv8 aarch64 movn instruction
    +
    24.4.3.3. ARMv8 aarch64 movn instruction

    Set 16-bits negated and the rest to 1.
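
    In C terms, the 64-bit form can be sketched as the bitwise NOT of the shifted immediate, which is why it is handy for loading small negative numbers in a single instruction. The function name `movn` is an illustrative model, and `shift` must be 0, 16, 32 or 48.

```c
#include <stdint.h>

/* Model of "movn xd, #imm16, lsl #shift": NOT of the shifted immediate. */
uint64_t movn(uint16_t imm16, unsigned shift) {
    return ~((uint64_t)imm16 << shift);
}
```

    For example, `movn(0, 0)` gives all ones, i.e. -1 as a signed value, and `movn(0x1234, 0)` gives `0xFFFFFFFFFFFFEDCB`.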

    @@ -25627,9 +25676,9 @@ ldmia sp!, reglist
    -

    23.4.4. ARM data processing instruction suffixes

    +

    24.4.4. ARM data processing instruction suffixes

    -
    23.4.4.1. ARM shift suffixes
    +
    24.4.4.1. ARM shift suffixes

    Most data processing instructions can also optionally shift the second register operand.
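
    For instance, an instruction of the form `add r0, r1, r2, lsl #2` can be modelled in C as below. This is a sketch of the semantics only, with a hypothetical function name, and it assumes a shift amount between 0 and 31.

```c
#include <stdint.h>

/* Model of "add rd, rn, rm, lsl #shift": shift rm left first, then add. */
uint32_t add_lsl(uint32_t rn, uint32_t rm, unsigned shift) {
    return rn + (rm << shift);
}
```

    For example, `add_lsl(1, 2, 1)` gives 5, which is a common trick to index arrays: base address plus index times a power of two.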

    @@ -25657,7 +25706,7 @@ ldmia sp!, reglist
    -
    23.4.4.2. ARM S suffix
    +
    24.4.4.2. ARM S suffix

    Example: userland/arch/arm/s_suffix.S

    @@ -25673,7 +25722,7 @@ ldmia sp!, reglist
    -

    23.4.5. ARM ADR instruction

    +

    24.4.5. ARM ADR instruction

    Similar rationale to the ARM LDR pseudo-instruction: it allows us to easily store a PC-relative reachable address into a register in one go, to overcome the 4-byte fixed instruction size.

    @@ -25697,19 +25746,19 @@ ldmia sp!, reglist

    More details: https://stackoverflow.com/questions/41906688/what-are-the-semantics-of-adrp-and-adrl-instructions-in-arm-assembly/54042899#54042899

    -
    23.4.5.1. ARM ADRL instruction
    +
    24.4.5.1. ARM ADRL instruction
    -

    See: Section 23.4.5, “ARM ADR instruction”.

    +

    See: Section 24.4.5, “ARM ADR instruction”.

    -

    23.5. ARM miscellaneous instructions

    +

    24.5. ARM miscellaneous instructions

    -

    23.5.1. ARM NOP instruction

    +

    24.5.1. ARM NOP instruction

    There are a few different ways to encode NOP, notably MOV a register into itself, and a dedicated miscellaneous instruction.

    @@ -25730,7 +25779,7 @@ ldmia sp!, reglist
    -

    23.5.2. ARM UDF instruction

    +

    24.5.2. ARM UDF instruction

    Guaranteed undefined! Therefore it raises an illegal instruction signal. Used by GCC __builtin_trap apparently: https://stackoverflow.com/questions/16081618/programmatically-cause-undefined-instruction-exception

    @@ -25750,12 +25799,12 @@ ldmia sp!, reglist
    -

    23.6. ARM SIMD

    +

    24.6. ARM SIMD

    -

    Parent section: Section 21.3, “SIMD assembly”

    +

    Parent section: Section 22.3, “SIMD assembly”

    -

    23.6.1. ARM VFP

    +

    24.6.1. ARM VFP

    The name for the ARMv7 and AArch32 floating point and SIMD instructions / registers.

    @@ -25801,7 +25850,7 @@ ldmia sp!, reglist
    -
    23.6.1.1. ARM VFP registers
    +
    24.6.1.1. ARM VFP registers

    TODO example

    @@ -25837,20 +25886,20 @@ ldmia sp!, reglist
    -
    23.6.1.2. ARM VADD instruction
    +
    24.6.1.2. ARM VADD instruction
    -
    23.6.1.3. ARM VCVT instruction
    +
    24.6.1.3. ARM VCVT instruction

    Example: userland/arch/arm/vcvt.S

    @@ -25879,7 +25928,7 @@ ldmia sp!, reglist
    -
    23.6.1.3.1. ARM VCVTR instruction
    +
    24.6.1.3.1. ARM VCVTR instruction

    Example: userland/arch/arm/vcvtr.S

    @@ -25897,7 +25946,7 @@ ldmia sp!, reglist
    -
    23.6.1.3.2. ARMv8 AArch32 VCVTA instruction
    +
    24.6.1.3.2. ARMv8 AArch32 VCVTA instruction

    Example: userland/arch/arm/vcvt.S

    @@ -25917,7 +25966,7 @@ ldmia sp!, reglist
    -

    23.6.2. ARMv8 Advanced SIMD and floating-point support

    +

    24.6.2. ARMv8 Advanced SIMD and floating-point support

    The ARMv8 architecture reference manual specifies floating point and SIMD support in the main architecture at A1.5 "Advanced SIMD and floating-point support".

    @@ -25925,13 +25974,13 @@ ldmia sp!, reglist

    The feature is often referred to simply as "SIMD&FP" throughout the manual.

    -

    The Linux kernel shows /proc/cpuinfo compatibility as neon, which is yet another intermediate name that came up at some point, see: Section 23.6.2.2, “ARM NEON”.

    +

    The Linux kernel shows /proc/cpuinfo compatibility as neon, which is yet another intermediate name that came up at some point, see: Section 24.6.2.2, “ARM NEON”.

    Vs ARM VFP: https://stackoverflow.com/questions/4097034/arm-cortex-a8-whats-the-difference-between-vfp-and-neon

    -
    23.6.2.1. ARMv8 floating point availability
    +
    24.6.2.1. ARMv8 floating point availability

    Support is semi-mandatory. ARMv8 architecture reference manual A1.5 "Advanced SIMD and floating-point support":

    @@ -25968,7 +26017,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    23.6.2.2. ARM NEON
    +
    24.6.2.2. ARM NEON

    Just an informal name for the "Advanced SIMD instructions"? Very confusing.

    @@ -25995,7 +26044,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    23.6.3. ARMv8 AArch64 floating point registers

    +

    24.6.3. ARMv8 AArch64 floating point registers

    TODO example.

    @@ -26050,7 +26099,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    23.6.3.1. ARMv8 aarch64 add vector instruction
    +
    24.6.3.1. ARMv8 aarch64 add vector instruction

    userland/arch/aarch64/add_vector.S

    @@ -26059,21 +26108,21 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    23.6.3.2. ARMv8 aarch64 FADD instruction
    +
    24.6.3.2. ARMv8 aarch64 FADD instruction
    -
    23.6.3.2.1. ARM FADD vs VADD
    +
    24.6.3.2.1. ARM FADD vs VADD
    -

    It is very confusing, but FADDS and FADDD in Aarch32 are pre-UAL for vadd.f32 and vadd.f64 which we use in this tutorial, see: Section 23.6.1.2, “ARM VADD instruction”

    +

    It is very confusing, but FADDS and FADDD in Aarch32 are pre-UAL for vadd.f32 and vadd.f64 which we use in this tutorial, see: Section 24.6.1.2, “ARM VADD instruction”

    The same goes for most ARMv7 mnemonics: f* is old, and v* is the newer better syntax.

    @@ -26085,12 +26134,12 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    Also keep in mind that fused multiply add is FMADD.

    -
    23.6.3.3. ARMv8 aarch64 ld2 instruction
    +
    24.6.3.3. ARMv8 aarch64 ld2 instruction

    Example: userland/arch/aarch64/ld2.S

    @@ -26106,7 +26155,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    23.6.4. ARM SIMD bibliography

    +

    24.6.4. ARM SIMD bibliography

    -

    23.6.5. ARM SVE

    +

    24.6.5. ARM SVE

    Scalable Vector Extension.

    @@ -26177,7 +26226,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    Using SVE normally requires setting the CPACR_EL1.FPEN and ZEN bits, which as of lkmc 29fd625f3fda79f5e0ee6cac43517ba74340d513 + 1 we also enable in our Baremetal bootloaders, see also: aarch64 baremetal NEON setup.

    -
    23.6.5.1. SVE bibliography
    +
    24.6.5.1. SVE bibliography
    -
    23.6.5.1.1. SVE spec
    +
    24.6.5.1.1. SVE spec

    ARMv8 architecture reference manual A1.7 "ARMv8 architecture extensions" says:

    @@ -26217,14 +26266,14 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    23.7. ARMv8 architecture extensions

    +

    24.7. ARMv8 architecture extensions

    -

    23.7.1. ARMv8.1 architecture extension

    +

    24.7.1. ARMv8.1 architecture extension

    ARMv8 architecture reference manual db A1.7.3 "The ARMv8.1 architecture extension"

    -
    23.7.1.1. ARM Large System Extensions (LSE)
    +
    24.7.1.1. ARM Large System Extensions (LSE)

    ARMv8 architecture reference manual db "ARMv8.1-LSE, ARMv8.1 Large System Extensions"

    @@ -26252,9 +26301,9 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    23.8. ARM assembly bibliography

    +

    24.8. ARM assembly bibliography

    -

    23.8.1. ARM non-official bibliography

    +

    24.8.1. ARM non-official bibliography

    Good getting started tutorials:

    @@ -26276,7 +26325,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    23.8.2. ARM official bibliography

    +

    24.8.2. ARM official bibliography

    The official manuals were stored in http://infocenter.arm.com but as of 2017 they started to slowly move to https://developer.arm.com.

    @@ -26290,7 +26339,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    Bibliography: https://www.quora.com/Where-can-I-find-the-official-documentation-of-ARM-instruction-set-architectures-ISAs

    -
    23.8.2.1. ARMv7 architecture reference manual
    +
    24.8.2.1. ARMv7 architecture reference manual
    @@ -26302,7 +26351,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    23.8.2.2. ARMv8 architecture reference manual
    +
    24.8.2.2. ARMv8 architecture reference manual

    https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf

    @@ -26358,13 +26407,13 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    23.8.2.3. ARMv8 architecture reference manual db
    +
    24.8.2.3. ARMv8 architecture reference manual db

    https://static.docs.arm.com/ddi0487/db/DDI0487D_b_armv8_arm.pdf

    -
    23.8.2.4. Programmer’s Guide for ARMv8-A
    +
    24.8.2.4. Programmer’s Guide for ARMv8-A

    https://static.docs.arm.com/den0024/a/DEN0024A_v8_architecture_PG.pdf

    @@ -26379,7 +26428,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    23.8.2.5. Arm A64 Instruction Set Architecture: Future Architecture Technologies in the A architecture profile Documentation
    +
    24.8.2.5. Arm A64 Instruction Set Architecture: Future Architecture Technologies in the A architecture profile Documentation

    https://developer.arm.com/docs/ddi0602/b

    @@ -26388,7 +26437,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    23.8.2.6. ARM processor documentation
    +
    24.8.2.6. ARM processor documentation

    ARM also releases documentation specific to each given processor.

    @@ -26396,7 +26445,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    This adds extra details to the more portable ARMv8 architecture reference manual ISA documentation.

    -
    23.8.2.6.1. ARM Cortex-A15 MPCore Processor Technical Reference Manual r4p0
    +
    24.8.2.6.1. ARM Cortex-A15 MPCore Processor Technical Reference Manual r4p0

    http://infocenter.arm.com/help/topic/com.arm.doc.ddi0438i/DDI0438I_cortex_a15_r4p0_trm.pdf

    @@ -26410,7 +26459,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    24. ELF

    +

    25. ELF

    https://en.wikipedia.org/wiki/Executable_and_Linkable_Format

    @@ -26424,7 +26473,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    25. IEEE 754

    +

    26. IEEE 754

    https://en.wikipedia.org/wiki/IEEE_754

    @@ -26454,13 +26503,13 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    26. Baremetal

    +

    27. Baremetal

    -

    26.1. Baremetal GDB step debug

    +

    27.1. Baremetal GDB step debug

    GDB step debug works on baremetal exactly as it does on the Linux kernel, which is described at: Section 2, “GDB step debug”.

    @@ -26531,7 +26580,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    26.2. Baremetal bootloaders

    +

    27.2. Baremetal bootloaders

    As can be seen from Baremetal GDB step debug, all examples under baremetal/, with the exception of baremetal/arch/<arch>/no_bootloader, start from our tiny bootloaders:

    @@ -26567,7 +26616,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    the stack pointer

  • -

    NEON: Section 26.9.2, “aarch64 baremetal NEON setup”

    +

    NEON: Section 27.9.2, “aarch64 baremetal NEON setup”

  • TODO: we don’t do this currently but maybe we should setup BSS

    @@ -26595,7 +26644,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

  • -

    26.3. Semihosting

    +

    27.3. Semihosting

    Semihosting is a publicly documented interface specified by ARM Holdings that allows us to do some magic operations very useful in development.

    @@ -26716,7 +26765,7 @@ svc 0x00123456
    -

    26.3.1. gem5 semihosting

    +

    27.3.1. gem5 semihosting

    For gem5, you need:

    @@ -26731,7 +26780,7 @@ svc 0x00123456
    -

    26.4. gem5 baremetal carriage return

    +

    27.4. gem5 baremetal carriage return

    TODO: our example is printing newlines without automatic carriage return \r as in:

    @@ -26754,7 +26803,7 @@ svc 0x00123456
    -

    26.5. Baremetal host packaged toolchain

    +

    27.5. Baremetal host packaged toolchain

    For arm, some baremetal examples compile fine with:

    @@ -26790,7 +26839,7 @@ collect2: error: ld returned 1 exit status
    -

    26.6. Baremetal C++

    +

    27.6. Baremetal C++

    TODO not working as of 8825222579767f2ee7e46ffd8204b9e509440759 + 1. Not yet properly researched / reported upstream.

    @@ -26850,7 +26899,7 @@ collect2: error: ld returned 1 exit status
    -

    26.7. GDB builtin CPU simulator

    +

    27.7. GDB builtin CPU simulator

    It is incredible, but GDB also has a CPU simulator inside of it as documented at: https://sourceware.org/gdb/onlinedocs/gdb/Target-Commands.html

    @@ -26910,7 +26959,7 @@ starti
    -

    26.7.1. GDB builtin CPU simulator userland

    +

    27.7.1. GDB builtin CPU simulator userland

    Since I had this compiled, I also decided to try it out on userland.

    @@ -26945,7 +26994,7 @@ starti
    -

    26.8. ARM baremetal

    +

    27.8. ARM baremetal

    In this section we will focus on learning ARM architecture concepts that can only be learnt on baremetal setups.

    @@ -26953,7 +27002,7 @@ starti

    Userland information can be found at: https://github.com/cirosantilli/arm-assembly-cheat

    -

    26.8.1. ARM exception levels

    +

    27.8.1. ARM exception levels

    ARM exception levels are analogous to x86 rings.

    @@ -27082,13 +27131,13 @@ CurrentEL.EL 0x3

    According to the ARMv7 architecture reference manual, access to that register is controlled by other registers NSACR.{CP11, CP10} and HCPTR so those must be turned off, but I’m too lazy to investigate now, even just trying to dump those registers in userland/arch/arm/dump_regs.c also leads to exceptions…​

    -
    26.8.1.1. ARM change exception level
    +
    27.8.1.1. ARM change exception level

    TODO. Create a minimal runnable example of going into EL0 and jumping to EL1.

    -
    26.8.1.2. ARM SP0 vs SPx
    +
    27.8.1.2. ARM SP0 vs SPx

    See ARMv8 architecture reference manual db D1.6.2 "The stack pointer registers".

    @@ -27101,7 +27150,7 @@ CurrentEL.EL 0x3
    -

    26.8.2. ARM SVC instruction

    +

    27.8.2. ARM SVC instruction

    This is the most basic example of exception handling we have.

    @@ -27453,7 +27502,7 @@ IN: main
    -
    26.8.2.1. ARMv8 exception vector table format
    +
    27.8.2.1. ARMv8 exception vector table format

    The vector table format is described on ARMv8 architecture reference manual Table D1-7 "Vector offsets from vector table base address".

    @@ -27593,29 +27642,29 @@ IN: main
    -
    26.8.2.2. ARM ESR register
    +
    27.8.2.2. ARM ESR register

    Exception Syndrome Register.

    -

    See example at: Section 26.8.2, “ARM SVC instruction”

    +

    See example at: Section 27.8.2, “ARM SVC instruction”

    Documentation: ARMv8 architecture reference manual db D12.2.36 "ESR_EL1, Exception Syndrome Register (EL1)".

    -
    26.8.2.3. ARM ELR register
    +
    27.8.2.3. ARM ELR register

    Exception Link Register.

    -

    See the example at: Section 26.8.2, “ARM SVC instruction”

    +

    See the example at: Section 27.8.2, “ARM SVC instruction”

    -

    26.8.3. ARM multicore

    +

    27.8.3. ARM multicore

    Examples:

    @@ -27694,7 +27743,7 @@ IN: main

    Bibliography: https://stackoverflow.com/questions/980999/what-does-multicore-assembly-language-look-like/33651438#33651438

    -
    26.8.3.1. ARM WFE and SEV instructions
    +
    27.8.3.1. ARM WFE and SEV instructions

    The WFE and SEV instructions are just hints: a compliant implementation can treat them as NOPs.

    @@ -27768,7 +27817,7 @@ IN: main
    -
    26.8.3.2. ARM PSCI
    +
    27.8.3.2. ARM PSCI

    In QEMU, CPU 1 starts in a halted state. This can be observed from GDB, where:

    @@ -27818,14 +27867,14 @@ IN: main
    -
    26.8.3.3. ARM DMB instruction
    +
    27.8.3.3. ARM DMB instruction

    TODO: create and study a minimal example in gem5 where the DMB instruction leads to fewer cycles: https://stackoverflow.com/questions/15491751/real-life-use-cases-of-barriers-dsb-dmb-isb-in-arm

    -

    26.8.4. ARM timer

    +

    27.8.4. ARM timer

    The ARM timer is the simplest way to generate hardware interrupts periodically, and therefore serves as the simplest example of ARM GIC usage.

    @@ -27978,7 +28027,7 @@ cntvct_el0 0x3CF516F
    -

    26.8.5. ARM GIC

    +

    27.8.5. ARM GIC

    Generic Interrupt Controller.

    @@ -28020,7 +28069,7 @@ cntvct_el0 0x3CF516F
    -

    26.8.6. ARM paging

    +

    27.8.6. ARM paging

    TODO create a minimal working aarch64 example analogous to the x86 one at: https://github.com/cirosantilli/x86-bare-metal-examples/blob/6dc9a73830fc05358d8d66128f740ef9906f7677/paging.S

    @@ -28050,9 +28099,9 @@ cntvct_el0 0x3CF516F
    -

    26.8.7. ARM baremetal bibliography

    +

    27.8.7. ARM baremetal bibliography

    -

    First, also consider the userland bibliography: Section 23.8, “ARM assembly bibliography”.

    +

    First, also consider the userland bibliography: Section 24.8, “ARM assembly bibliography”.

    The most useful ARM baremetal example sets we’ve seen so far are:

    @@ -28077,7 +28126,7 @@ cntvct_el0 0x3CF516F
    -
    26.8.7.1. NienfengYao/armv8-bare-metal
    +
    27.8.7.1. NienfengYao/armv8-bare-metal
    @@ -28136,7 +28185,7 @@ cntvct_el0 0x3CF516F
    -
    26.8.7.2. tukl-msd/gem5.bare-metal
    +
    27.8.7.2. tukl-msd/gem5.bare-metal

    https://github.com/tukl-msd/gem5.bare-metal

    @@ -28178,7 +28227,7 @@ make CROSS_COMPILE_DIR=/usr/bin
    -

    26.9. How we got some baremetal stuff to work

    +

    27.9. How we got some baremetal stuff to work

    It is nice when things just work.

    @@ -28186,7 +28235,7 @@ make CROSS_COMPILE_DIR=/usr/bin

    But you can also learn a thing or two from how I actually made them work in the first place.

    -

    26.9.1. Find the UART address

    +

    27.9.1. Find the UART address

    Enter the QEMU console:

    @@ -28222,7 +28271,7 @@ make CROSS_COMPILE_DIR=/usr/bin
    -

    26.9.2. aarch64 baremetal NEON setup

    +

    27.9.2. aarch64 baremetal NEON setup

    Inside baremetal/lib/aarch64.S there is a chunk of code that enables floating point operations:

    @@ -28346,7 +28395,7 @@ ISB
    -

    26.10. Baremetal tests

    +

    27.10. Baremetal tests

    Baremetal tests work exactly like User mode tests, except that you have to add the --mode baremetal option, for example:

    @@ -28359,13 +28408,13 @@ ISB

    In baremetal, we detect if tests failed by parsing logs for the Magic failure string.
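
    The general idea of detecting a failure from the logs can be sketched as follows. This is not the repository's implementation: the marker value and the function name are hypothetical placeholders chosen for illustration.

```python
# Sketch: decide whether a run failed by scanning its log for a marker string.
# MAGIC_FAILURE_STRING is a hypothetical placeholder, not the real marker.
from typing import Iterable

MAGIC_FAILURE_STRING = "EXAMPLE_MAGIC_FAILURE"


def log_indicates_failure(
    lines: Iterable[str], magic: str = MAGIC_FAILURE_STRING
) -> bool:
    """Return True if any log line contains the failure marker."""
    return any(magic in line for line in lines)


if __name__ == "__main__":
    sample_log = [
        "starting test",
        "some output EXAMPLE_MAGIC_FAILURE more output",
    ]
    print("failed" if log_indicates_failure(sample_log) else "passed")
```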

    -

    See: Section 29.13, “Test this repo” for more useful testing tips.

    +

    See: Section 30.13, “Test this repo” for more useful testing tips.

    -

    27. Android

    +

    28. Android

    Remember: Android AOSP is a huge undocumented piece of bloatware. Its integration into this repo will likely never be super good.

    @@ -28413,7 +28462,7 @@ ISB

    Tested on: 8.1.0_r60.

    -

    27.1.1. Android images read-only

    +

    28.1.1. Android images read-only

    From mount, we can see that some of the mounted images are ro.

    @@ -28570,7 +28619,7 @@ date >/system/a
    -

    27.1.2. Android /data partition

    +

    28.1.2. Android /data partition

    When I install an app like F-Droid, it goes under /data according to:

    @@ -28631,7 +28680,7 @@ date >/system/a
    -

    27.2. Install Android apps

    +

    28.2. Install Android apps

    I don’t know how to download files from the web on Vanilla android, the default browser does not download anything, and there is no wget:

    @@ -28681,7 +28730,7 @@ date >/system/a
    -

    27.3. Android init

    +

    28.3. Android init

    For Linux in general, see: Section 6, “init”.

    @@ -28730,7 +28779,7 @@ import /init.${ro.zygote}.rc
    -

    28. Benchmark this repo

    +

    29. Benchmark this repo

    TODO: didn’t fully port during refactor after 3b0a343647bed577586989fb702b760bd280844a. Reimplementing should not be hard.

    @@ -28759,7 +28808,7 @@ cd -
    -

    28.1. Continuous integration

    +

    29.1. Continuous integration

    We have explored a few Continuous integration solutions.

    @@ -28767,13 +28816,13 @@ cd -

    We haven’t set up any of them yet.

    -

    28.1.1. Travis

    +

    29.1.1. Travis

    We tried to automate it on Travis with .travis.yml but it hits the current 50 minute job timeout: https://travis-ci.org/cirosantilli/linux-kernel-module-cheat/builds/296454523 And I bet it would likely hit a disk maxout either way if it went on.

    -

    28.1.2. CircleCI

    +

    29.1.2. CircleCI

    This setup successfully built gem5 on every commit: .circleci/config.yml

    @@ -28802,9 +28851,9 @@ cd -
    -

    28.2. Benchmark this repo benchmarks

    +

    29.2. Benchmark this repo benchmarks

    -

    28.2.1. Benchmark Linux kernel boot

    +

    29.2.1. Benchmark Linux kernel boot

    Run all kernel boot benchmarks for one arch:

    @@ -28889,7 +28938,7 @@ instructions 124346081

    TODO: aarch64 gem5 and QEMU use the same kernel, so why is the gem5 instruction count so much higher?

    -
    28.2.1.1. gem5 arm HPI boot takes much longer than aarch64
    +
    29.2.1.1. gem5 arm HPI boot takes much longer than aarch64

    TODO 62f6870e4e0b384c4bd2d514116247e81b241251 takes 33 minutes to finish at 62f6870e4e0b384c4bd2d514116247e81b241251:

    @@ -28915,7 +28964,7 @@ instructions 124346081
    -
    28.2.1.2. gem5 x86_64 DerivO3CPU boot panics
    +
    29.2.1.2. gem5 x86_64 DerivO3CPU boot panics

    https://github.com/cirosantilli-work/gem5-issues/issues/2

    @@ -28927,7 +28976,7 @@ instructions 124346081
    -

    28.2.2. Benchmark builds

    +

    29.2.2. Benchmark builds

    The build times are calculated after doing ./configure and make source, which downloads the sources, and basically benchmarks the Internet.

    @@ -28952,7 +29001,7 @@ cat ../linux-kernel-module-cheat-regression/*/build-time.log
    -
    28.2.2.1. Find which Buildroot packages are making the build slow and big
    +
    29.2.2.1. Find which Buildroot packages are making the build slow and big
    ./build-buildroot -- graph-build graph-size graph-depends
    @@ -28963,14 +29012,14 @@ xdg-open graph-size.pdf
    -
    28.2.2.1.1. Buildroot use prebuilt host toolchain
    +
    29.2.2.1.1. Buildroot use prebuilt host toolchain

    The biggest build time hog is always GCC, and it does not look like we can use a precompiled one: https://stackoverflow.com/questions/10833672/buildroot-environment-with-host-toolchain

    -
    28.2.2.2. Benchmark Buildroot build baseline
    +
    29.2.2.2. Benchmark Buildroot build baseline

    This is the minimal build we could expect to get away with.

    @@ -29038,7 +29087,7 @@ xdg-open graph-size.pdf
    -
    28.2.2.3. Benchmark gem5 build
    +
    29.2.2.3. Benchmark gem5 build

    How long it takes to build gem5 itself.

    @@ -29058,7 +29107,7 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -
    28.2.2.3.1. Benchmark gem5 single file change rebuild time
    +
    29.2.2.3.1. Benchmark gem5 single file change rebuild time

    This is the critical development parameter, and is dominated by the link time of huge binaries.

    @@ -29135,9 +29184,9 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -

    28.3. Benchmark machines

    +

    29.3. Benchmark machines

    -

    28.3.1. P51

    +

    29.3.1. P51

    Lenovo ThinkPad P51 laptop:

    @@ -29166,9 +29215,9 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -

    28.4. Benchmark Internets

    +

    29.4. Benchmark Internets

    -

    28.4.1. 38Mbps internet

    +

    29.4.1. 38Mbps internet

    2c12b21b304178a81c9912817b782ead0286d282:

    @@ -29188,7 +29237,7 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -

    28.5. Benchmark this repo bibliography

    +

    29.5. Benchmark this repo bibliography

    gem5:

    @@ -29216,10 +29265,10 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -

    29. About this repo

    +

    30. About this repo

    -

    29.1. Supported hosts

    +

    30.1. Supported hosts

    The host requirements depend a lot on which examples you want to run.

    @@ -29268,9 +29317,9 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -

    29.2. Common build issues

    +

    30.2. Common build issues

    -

    29.2.1. You must put some 'source' URIs in your sources.list

    +

    30.2.1. You must put some 'source' URIs in your sources.list

    If ./build --download-dependencies fails with:

    @@ -29284,7 +29333,7 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -

    29.2.2. Build from downloaded source zip files

    +

    30.2.2. Build from downloaded source zip files

    It does not work if you just download the .zip with the sources for this repository from GitHub, because we use Git submodules: you must clone this repo.

    @@ -29294,7 +29343,7 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -

    29.3. Run command after boot

    +

    30.3. Run command after boot

    If you just want to run a command after boot ends without thinking much about it, just use the --eval-after option, e.g.:

    @@ -29311,7 +29360,7 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -

    29.4. Default command line arguments

    +

    30.4. Default command line arguments

    It gets annoying to retype --arch aarch64 for every single command, or to remember --config setups.

    @@ -29356,12 +29405,12 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -

    29.5. Documentation

    +

    30.5. Documentation

    To learn how to build the documentation see: Section 1.8, “Build the documentation”.

    -

    29.5.1. Documentation verification

    +

    30.5.1. Documentation verification

    When running build-doc, we do the following checks:

    @@ -29382,7 +29431,7 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt

    The script prints what you have to fix and exits with an error status if there are any errors.

    - + @@ -29405,7 +29454,7 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -
    29.5.1.2. asciidoctor/extract-header-ids
    +
    30.5.1.2. asciidoctor/extract-header-ids

    Documentation for asciidoctor/extract-header-ids

    @@ -29450,7 +29499,7 @@ explicitly-given
    - +

    The Asciidoctor extension scripts:

    @@ -29478,7 +29527,7 @@ explicitly-given
    -

    29.6.1. GitHub pages

    +

    30.6.1. GitHub pages

    As mentioned before the TOC, we have to push this README to GitHub pages due to: https://github.com/isaacs/github/issues/1610

    @@ -29528,7 +29577,7 @@ explicitly-given
    -

    29.7. Clean the build

    +

    30.7. Clean the build

    You did something crazy, and nothing seems to work anymore?

    @@ -29592,7 +29641,7 @@ ls "$(./getvar buildroot_build_dir)"
    -

    29.8. ccache

    +

    30.8. ccache

    ccache might save you a lot of re-build when you decide to Clean the build or create a new build variant.

    @@ -29661,7 +29710,7 @@ export CCACHE_MAXSIZE="20G"
    -

    29.9. Rebuild Buildroot while running

    +

    30.9. Rebuild Buildroot while running

    It is not possible to rebuild the root filesystem while running QEMU because QEMU holds the qcow2 file:

    @@ -29672,7 +29721,7 @@ export CCACHE_MAXSIZE="20G"
    -

    29.10. Simultaneous runs

    +

    30.10. Simultaneous runs

    When doing long simulations sweeping across multiple system parameters, it becomes fundamental to do multiple simulations in parallel.

    @@ -29768,7 +29817,7 @@ less "$(./getvar --arch aarch64 --emulator gem5 --run-id 1 termout_file)"
    -

    To run multiple gem5 checkouts, see: Section 29.11.3.1, “gem5 worktree”.

    +

    To run multiple gem5 checkouts, see: Section 30.11.3.1, “gem5 worktree”.

    Implementation note: we create multiple namespaces for two things:

    @@ -29807,7 +29856,7 @@ less "$(./getvar --arch aarch64 --emulator gem5 --run-id 1 termout_file)"
    -

    29.11. Build variants

    +

    30.11. Build variants

    It often happens that you are comparing two versions of the build, a good and a bad one, and trying to figure out why the bad one is bad.

    @@ -29815,7 +29864,7 @@ less "$(./getvar --arch aarch64 --emulator gem5 --run-id 1 termout_file)"

    Our build variants system allows you to keep multiple built versions of all major components, so that you can easily switch between running one or the other.

    -

    29.11.1. Linux kernel build variants

    +

    30.11.1. Linux kernel build variants

    If you want to keep two builds around, one for the latest Linux version, and the other for Linux v4.16:

    @@ -29851,11 +29900,11 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    To run both kernels simultaneously, one on each QEMU instance, see: Section 29.10, “Simultaneous runs”.

    +

    To run both kernels simultaneously, one on each QEMU instance, see: Section 30.10, “Simultaneous runs”.

    -

    29.11.2. QEMU build variants

    +

    30.11.2. QEMU build variants

    Analogous to the Linux kernel build variants but with the --qemu-build-id option instead:

    @@ -29871,7 +29920,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    -

    29.11.3. gem5 build variants

    +

    30.11.3. gem5 build variants

    Analogous to the Linux kernel build variants but with the --gem5-build-id option instead:

    @@ -29902,7 +29951,7 @@ git -C "$(./getvar gem5_source_dir)" checkout some-branch

    Therefore, you must remember to check out the sources that match the corresponding build before running, unless you explicitly tell gem5 to use a non-default source tree with gem5 worktree. This becomes inevitable when you want to launch multiple simultaneous runs at different checkouts.

    -
    29.11.3.1. gem5 worktree
    +
    30.11.3.1. gem5 worktree

    --gem5-build-id goes a long way, but if you want to seamlessly switch between two gem5 trees without checking out multiple times, then --gem5-worktree is for you.

    @@ -29955,7 +30004,7 @@ cd -
    -
    29.11.3.2. gem5 private source trees
    +
    30.11.3.2. gem5 private source trees

    Suppose that you are working on a private fork of gem5, but you want to use this repository to develop it as well.

    @@ -29999,7 +30048,7 @@ gem5_internal="$(pwd)/gem5-internal"
    -

    29.11.4. Buildroot build variants

    +

    30.11.4. Buildroot build variants

    Allows you to have multiple versions of the GCC toolchain or root filesystem.

    @@ -30019,9 +30068,9 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    29.12. Directory structure

    +

    30.12. Directory structure

    -

    29.12.1. lkmc directory

    +

    30.12.1. lkmc directory

    lkmc/ contains sources and headers that are shared across kernel modules, userland and baremetal examples.

    @@ -30032,7 +30081,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -

    Another option would have been to name it as includes/lkmc, but that would make paths longer, and we might want to store source code in that directory as well in the future.

    -
    29.12.1.1. Userland objects vs header-only
    +
    30.12.1.1. Userland objects vs header-only

    When factoring out functionality across userland examples, there are two main options:

    @@ -30091,7 +30140,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    29.12.2. buildroot_packages directory

    +

    30.12.2. buildroot_packages directory

    Source: buildroot_packages/

    @@ -30140,7 +30189,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -

    A custom build script can give you more flexibility: e.g. the package can be made to work with other root filesystems more easily, have better 9P support, and rebuild faster as it evades some Buildroot boilerplate.

    -
    29.12.2.1. kernel_modules buildroot package
    +
    30.12.2.1. kernel_modules buildroot package

    Source: buildroot_packages/kernel_modules/

    @@ -30187,9 +30236,9 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    29.12.3. patches directory

    +

    30.12.3. patches directory

    -
    29.12.3.1. patches/global directory
    +
    30.12.3.1. patches/global directory

    Has the following structure:

    @@ -30206,7 +30255,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -
    29.12.3.2. patches/manual directory
    +
    30.12.3.2. patches/manual directory

    Patches in this directory are never applied automatically: it is up to users to manually apply them before usage following the instructions in this documentation.

    @@ -30216,7 +30265,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    29.12.4. rootfs_overlay

    +

    30.12.4. rootfs_overlay

    We use this directory for:

    @@ -30261,7 +30310,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    29.12.5. lkmc.c

    +

    30.12.5. lkmc.c

    The files:

    @@ -30291,7 +30340,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    29.12.6. rand_check.out

    +

    30.12.6. rand_check.out

    Print out several parameters that normally change randomly from boot to boot:

    @@ -30318,7 +30367,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    29.12.7. lkmc_home

    +

    30.12.7. lkmc_home

    lkmc_home refers to the target base directory in which we put all our custom built stuff, such as userland executables and kernel modules.

    @@ -30352,9 +30401,9 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    29.13. Test this repo

    +

    30.13. Test this repo

    -

    29.13.1. Automated tests

    +

    30.13.1. Automated tests

    Run almost all tests:

    @@ -30410,7 +30459,7 @@ echo $?

    test does not run all possible tests, because there are too many possible variations and that would take forever. The rationale is the same as for ./build all and is explained in ./build --help.

    -
    29.13.1.1. Test arch and emulator selection
    +
    30.13.1.1. Test arch and emulator selection

    You can select multiple archs and emulators of interest, as for any other command, with:

    @@ -30443,7 +30492,7 @@ echo $?
    -
    29.13.1.2. Quit on fail
    +
    30.13.1.2. Quit on fail

    By default, we continue running even after the first failure happens, and then show a summary at the end.

    @@ -30457,7 +30506,7 @@ echo $?
    -
    29.13.1.3. Test userland in full system
    +
    30.13.1.3. Test userland in full system

    TODO: we really need a mechanism to automatically generate the test list, e.g. based on path_properties. Currently there are many tests missing, and we have to add everything manually, which is very annoying.

    @@ -30486,7 +30535,7 @@ echo $?
    -
    29.13.1.4. GDB tests
    +
    30.13.1.4. GDB tests

    We have some pexpect automated tests for GDB for both userland and baremetal programs!

    @@ -30559,7 +30608,7 @@ echo $?
    -
    29.13.1.5. Magic failure string
    +
    30.13.1.5. Magic failure string

    We do not know of any way to set the emulator exit status in QEMU arm full system.

    @@ -30662,9 +30711,9 @@ echo $?
    -

    29.13.2. Non-automated tests

    +

    30.13.2. Non-automated tests

    -
    29.13.2.1. Test GDB Linux kernel
    +
    30.13.2.1. Test GDB Linux kernel

    For the Linux kernel, do the following manual tests for now.

    @@ -30702,7 +30751,7 @@ echo $?
    -
    29.13.2.2. Test the Internet
    +
    30.13.2.2. Test the Internet

    You should also test that the Internet works:

    @@ -30713,7 +30762,7 @@ echo $?
    -
    29.13.2.3. CLI script tests
    +
    30.13.2.3. CLI script tests

    build-userland and test-executables have a wide variety of target selection modes, and it was hard to keep them all working without some tests:

    @@ -30731,7 +30780,7 @@ echo $?
    -

    29.14. Bisection

    +

    30.14. Bisection

    When updating the Linux kernel, QEMU and gem5, things sometimes break.

    @@ -30787,7 +30836,7 @@ git submodule update
    -

    29.15. path_properties

    +

    30.15. path_properties

    In order to build and run each userland and baremetal example properly, we need per-file metadata such as compiler flags and required number of cores.
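
    One way to picture such per-file metadata is a mapping from paths to properties, where a file inherits the properties of its parent directories and can override them. The sketch below is only an assumption about how this could look: the paths, property names, flag values and lookup function are hypothetical and are not taken from this repository.

```python
# Sketch: per-path build/run metadata with inheritance from parent directories.
# Every path, property name and flag value below is a hypothetical example.
from pathlib import PurePosixPath

DEFAULT_PROPERTIES = {"cc_flags": [], "cores": 1}

PATH_PROPERTIES = {
    "examples/freestanding": {"cc_flags": ["-nostdlib"]},
    "examples/freestanding/multicore.S": {"cores": 2},
}


def get_properties(path: str) -> dict:
    """Merge the defaults with the properties of each ancestor, outermost first."""
    properties = dict(DEFAULT_PROPERTIES)
    parts = PurePosixPath(path).parts
    for depth in range(1, len(parts) + 1):
        key = "/".join(parts[:depth])
        properties.update(PATH_PROPERTIES.get(key, {}))
    return properties


if __name__ == "__main__":
    print(get_properties("examples/freestanding/multicore.S"))
    print(get_properties("examples/hello.c"))
```

    The more specific entry wins, so a single file can raise the core count while still picking up the compiler flags set for its directory.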

    @@ -30830,7 +30879,7 @@ git submodule update
    -

    29.16. Update a forked submodule

    +

    30.16. Update a forked submodule

    This is a template update procedure for submodules for which we have some patches on top of mainline.

    @@ -30860,9 +30909,9 @@ git commit -m "linux: update to ${next_mainline_revision}"
    -

    29.17. Release

    +

    30.17. Release

    -

    29.17.1. Release procedure

    +

    30.17.1. Release procedure

    Ensure that the Automated tests are passing on a clean build:

    @@ -30873,7 +30922,7 @@ git commit -m "linux: update to ${next_mainline_revision}"
    -

    The ./build-test command builds a superset of what will be downloaded which also tests other things we would like to be working on the release. For the minimal build to generate the files to be uploaded, see: Section 29.17.2, “release-zip”

    +

    The ./build-test command builds a superset of what will be downloaded which also tests other things we would like to be working on the release. For the minimal build to generate the files to be uploaded, see: Section 30.17.2, “release-zip”

    The clean build is necessary as it generates clean images since it is not possible to remove Buildroot packages

    @@ -30943,7 +30992,7 @@ git push --follow-tags
    -

    29.17.2. release-zip

    +

    30.17.2. release-zip

    Create a zip containing all files required for Prebuilt setup:

    @@ -30968,7 +31017,7 @@ git push --follow-tags
    -

    29.17.3. release-upload

    +

    30.17.3. release-upload

    After:

    @@ -31016,9 +31065,9 @@ git push --follow-tags
    -

    29.18. Design rationale

    +

    30.18. Design rationale

    -

    29.18.1. Design goals

    +

    30.18.1. Design goals

    This project was created to help me understand, modify and test low level system components by using system simulators.

    @@ -31094,7 +31143,7 @@ git push --follow-tags
    -

    29.18.2. Setup trade-offs

    +

    30.18.2. Setup trade-offs

    The trade-offs between the different setups are basically a balance between:

    @@ -31119,13 +31168,13 @@ git push --follow-tags

    compatibility: how likely is it that all the components will work well together: emulator, compiler, kernel, standard library, …​

  • -

    guest software availability: how wide is your choice of easily installed guest software packages? See also: Section 29.18.4, “Linux distro choice”

    +

    guest software availability: how wide is your choice of easily installed guest software packages? See also: Section 30.18.4, “Linux distro choice”

  • -

    29.18.3. Resource tradeoff guidelines

    +

    30.18.3. Resource tradeoff guidelines

    Choosing which features go into our default builds means making tradeoffs, here are our guidelines:

    @@ -31166,11 +31215,11 @@ git push --follow-tags
    -

    In order to learn how to measure some of those aspects, see: Section 28, “Benchmark this repo”.

    +

    In order to learn how to measure some of those aspects, see: Section 29, “Benchmark this repo”.

    -

    29.18.4. Linux distro choice

    +

    30.18.4. Linux distro choice

    We haven’t found the ultimate distro yet, here is a summary table of trade-offs that we care about: Table 7, “Comparison of Linux distros for usage in this repository”.

    @@ -31273,9 +31322,9 @@ git push --follow-tags
    -

    29.19. Soft topics

    +

    30.19. Soft topics

    -

    29.19.1. Fairy tale

    +

    30.19.1. Fairy tale

    @@ -31312,7 +31361,7 @@ git push --follow-tags
    -

    29.19.2. Should you waste your life with systems programming?

    +

    30.19.2. Should you waste your life with systems programming?

    Being the hardcore person who fully understands an important complex system such as a computer, it does have a nice ring to it, doesn’t it?

    @@ -31397,7 +31446,7 @@ git push --follow-tags
    -

    29.20. Bibliography

    +

    30.20. Bibliography

    Runnable stuff: