diff --git a/index.html b/index.html index 3204859..cf410a7 100644 --- a/index.html +++ b/index.html @@ -461,1333 +461,1339 @@ pre{ white-space:pre }

The perfect emulation setup to study and develop the Linux kernel v5.9.2, kernel modules, QEMU, gem5 and x86_64, ARMv7 and ARMv8 userland and baremetal assembly, ANSI C, C++ and POSIX. GDB step debug and KGDB just work. Powered by Buildroot and crosstool-NG. Highly automated. Thoroughly documented. Automated tests. "Tested" in an Ubuntu 20.04 host.

-

TL;DR: Section 1.2.1, “QEMU Buildroot setup getting started”

+

TL;DR: Section 2.2.1, “QEMU Buildroot setup getting started”

The source code for this page is located at: https://github.com/cirosantilli/linux-kernel-module-cheat. Due to a GitHub limitation, this README is too long and not fully rendered on github.com, so either use: https://cirosantilli.com/linux-kernel-module-cheat or build the docs yourself.

+
+
+[Image: Xinjiang prisoners sitting identified]
+
-

1. Getting started

+

1. --china

+
+
+

The most important functionality of this repository is the --china option, sample usage:

+
+
+
+
+./setup
+./run --china > index.html
+firefox index.html
+
+
+ +
+

The secondary systems programming functionality is described on the sections below starting from Getting started.

+
+
+
+[Image: Tiananmen cute girls]
+
+
+
+
+

2. Getting started

Each child section describes a different possible setup for this repo.

@@ -2341,10 +2373,10 @@ pre{ white-space:pre }

If you don’t know which one to go for, start with QEMU Buildroot setup getting started.

-

Design goals of this project are documented at: Section 37.20.1, “Design goals”.

+

Design goals of this project are documented at: Section 38.20.1, “Design goals”.

-

1.1. Should you waste your life with systems programming?

+

2.1. Should you waste your life with systems programming?

Being the hardcore person who fully understands an important complex system such as a computer, it does have a nice ring to it, doesn’t it?

@@ -2434,11 +2466,11 @@ pre{ white-space:pre }
-

1.2. QEMU Buildroot setup

+

2.2. QEMU Buildroot setup

-

1.2.1. QEMU Buildroot setup getting started

+

2.2.1. QEMU Buildroot setup getting started

-

This setup has been mostly tested on Ubuntu. For other host operating systems see: Section 37.1, “Supported hosts”. For greater stability, consider using the latest release instead of master: https://github.com/cirosantilli/linux-kernel-module-cheat/releases

+

This setup has been mostly tested on Ubuntu. For other host operating systems see: Section 38.1, “Supported hosts”. For greater stability, consider using the latest release instead of master: https://github.com/cirosantilli/linux-kernel-module-cheat/releases

Reserve 12 GB of disk space and run:

@@ -2456,7 +2488,7 @@ cd linux-kernel-module-cheat

You don’t need to clone recursively even though we have .git submodules: download-dependencies fetches just the submodules that you need for this build to save time.

The initial build will take a while (30 minutes to 2 hours) to clone and build, see Benchmark builds for more details.

@@ -2524,7 +2556,7 @@ hello2 cleanup

All available modules can be found in the kernel_modules directory.

@@ -2540,7 +2572,7 @@ hello2 cleanup
-

To avoid typing --arch aarch64 many times, you can set the default arch as explained at: Section 37.4, “Default command line arguments”

+

To avoid typing --arch aarch64 many times, you can set the default arch as explained at: Section 38.4, “Default command line arguments”

I now urge you to read the following sections which contain widely applicable information:

@@ -2619,12 +2651,12 @@ hello /root/.profile
-

1.2.2. How to hack stuff

+

2.2.2. How to hack stuff

Besides a seamless initial build, this project also aims to make it effortless to modify and rebuild several major components of the system, to serve as an awesome development setup.

-
1.2.2.1. Your first Linux kernel hack
+
2.2.2.1. Your first Linux kernel hack

Let’s hack up the Linux kernel entry point, which is an easy place to start.

@@ -2692,11 +2724,11 @@ hello /root/.profile

See also: Dry run to get commands for your project.

-

When you reach difficulties, QEMU makes it possible to easily GDB step debug the Linux kernel source code, see: Section 2, “GDB step debug”.

+

When you reach difficulties, QEMU makes it possible to easily GDB step debug the Linux kernel source code, see: Section 3, “GDB step debug”.

-
1.2.2.2. Your first kernel module hack
+
2.2.2.2. Your first kernel module hack

Edit kernel_modules/hello.c to contain:

@@ -2780,11 +2812,11 @@ hello /root/.profile

All of this put together makes the safe procedure acceptably fast for regular development as well.

-

It is also easy to GDB step debug kernel modules with our setup, see: Section 2.4, “GDB step debug kernel module”.

+

It is also easy to GDB step debug kernel modules with our setup, see: Section 3.4, “GDB step debug kernel module”.

-
1.2.2.3. Your first glibc hack
+
2.2.2.3. Your first glibc hack

We use glibc as our default libc now, and it is tracked as an unmodified submodule at submodules/glibc, at the exact same version that Buildroot has it, which can be found at: package/glibc/glibc.mk. Buildroot 2018.05 applies no patches.

@@ -2872,7 +2904,7 @@ index 706b20b492..23185948f3 100644
-
1.2.2.4. Your first Binutils hack
+
2.2.2.4. Your first Binutils hack

Have you ever felt that a single inc instruction was not enough? Really? Me too!

@@ -2958,7 +2990,7 @@ index af583ce578..3cc341f303 100644
-
1.2.2.5. Your first GCC hack
+
2.2.2.5. Your first GCC hack

OK, now time to hack GCC.

@@ -3061,7 +3093,7 @@ j = 0
-

1.2.3. About the QEMU Buildroot setup

+

2.2.3. About the QEMU Buildroot setup

What QEMU and Buildroot are:

@@ -3097,7 +3129,7 @@ j = 0
-

1.3. Dry run to get commands for your project

+

2.3. Dry run to get commands for your project

One of the major features of this repository is that we try to support the --dry-run option really well for all scripts.

@@ -3183,9 +3215,9 @@ j = 0
-

1.4. gem5 Buildroot setup

+

2.4. gem5 Buildroot setup

-

1.4.1. About the gem5 Buildroot setup

+

2.4.1. About the gem5 Buildroot setup

This setup is like the QEMU Buildroot setup, but it uses gem5 instead of QEMU as a system simulator.

@@ -3212,7 +3244,7 @@ j = 0
-

and can therefore be used to estimate system performance, see: Section 23.2, “gem5 run benchmark” for an example.

+

and can therefore be used to estimate system performance, see: Section 24.2, “gem5 run benchmark” for an example.

The downside of gem5 is that it is much slower than QEMU because of the greater simulation detail.

@@ -3222,7 +3254,7 @@ j = 0
-

1.4.2. gem5 Buildroot setup getting started

+

2.4.2. gem5 Buildroot setup getting started

For the most part, if you just add the --emulator gem5 option or *-gem5 suffix to all commands, everything should magically work.

@@ -3270,7 +3302,7 @@ j = 0
-

See also: Section 2.3.1, “tmux gem5”.

+

See also: Section 3.3.1, “tmux gem5”.

At the end of boot, it might not be very clear that you have the shell since some printk messages may appear in front of the prompt like this:

@@ -3293,7 +3325,7 @@ j = 0
-

More gem5 information is present at: Section 23, “gem5”

+

More gem5 information is present at: Section 24, “gem5”

Good next steps are:

@@ -3317,12 +3349,12 @@ j = 0
-

1.5. Docker host setup

+

2.5. Docker host setup

This repository has been tested inside clean Docker containers.

-

This is a good option if you are on a Linux host, but the native setup failed due to your weird host distribution, and you have better things to do with your life than to debug it. See also: Section 37.1, “Supported hosts”.

+

This is a good option if you are on a Linux host, but the native setup failed due to your weird host distribution, and you have better things to do with your life than to debug it. See also: Section 38.1, “Supported hosts”.

For example, to do a QEMU Buildroot setup inside Docker, run:

@@ -3332,7 +3364,7 @@ j = 0
sudo apt-get install docker.io
 ./run-docker create && \
 ./run-docker sh -- ./build --download-dependencies qemu-buildroot
-./run-docker sh
+./run-docker
@@ -3374,7 +3406,7 @@ j = 0
  • -

    ./run-docker sh: open a shell on the container.

    +

    ./run-docker: open a shell on the container.

    If it has not been started previously, start it. This can also be done explicitly with:

    @@ -3417,7 +3449,7 @@ j = 0
    -
    ./run-docker sh
    +
    ./run-docker
    @@ -3466,9 +3498,9 @@ j = 0
    -

    1.6. Prebuilt setup

    +

    2.6. Prebuilt setup

    -

    1.6.1. About the prebuilt setup

    +

    2.6.1. About the prebuilt setup

    This setup uses prebuilt binaries that we upload to GitHub from time to time.

    @@ -3510,7 +3542,7 @@ j = 0
    -

    1.6.2. Prebuilt setup getting started

    +

    2.6.2. Prebuilt setup getting started

    Checkout to the latest tag and use the Ubuntu packaged QEMU to boot Linux:

    @@ -3624,7 +3656,7 @@ unzip lkmc-*.zip
    -

    1.7. Host kernel module setup

    +

    2.7. Host kernel module setup

    THIS IS DANGEROUS (AND FUN), YOU HAVE BEEN WARNED

    @@ -3729,7 +3761,7 @@ sudo lsmod | grep hello
    -

    1.7.1. Hello host

    +

    2.7.1. Hello host

    Minimal host build system example:

    @@ -3746,9 +3778,9 @@ dmesg
    -

    1.8. Userland setup

    +

    2.8. Userland setup

    -

    1.8.1. About the userland setup

    +

    2.8.1. About the userland setup

    In order to test the kernel and emulators, userland content in the form of executables and scripts is of course required, and we store it mostly under:

    @@ -3798,14 +3830,14 @@ dmesg
    -

    1.8.2. Userland setup getting started

    +

    2.8.2. Userland setup getting started

    There are several ways to run our Userland content, notably:

  • -

    from full system simulation as shown at: Section 1.2.1, “QEMU Buildroot setup getting started”.

    +

    from full system simulation as shown at: Section 2.2.1, “QEMU Buildroot setup getting started”.

    This is the most reproducible and controlled environment, and all examples work there, but it is also the slowest one to set up.

    @@ -3853,7 +3885,7 @@ dmesg
    -
    1.8.2.1. Userland setup getting started natively
    +
    2.8.2.1. Userland setup getting started natively

    With this setup, we will use the host toolchain and execute executables directly on the host.

    @@ -3961,7 +3993,7 @@ cd userland

    So you can freely use any option supported by the build-userland script with build-userland-in-tree and build.

    -

    The situation is analogous for userland/test, test-executables-in-tree and test-executables, which are further documented at: Section 10.2, “User mode tests”.

    +

    The situation is analogous for userland/test, test-executables-in-tree and test-executables, which are further documented at: Section 11.2, “User mode tests”.

    Do a cleaner out-of-tree build instead and run the program:

    @@ -3994,11 +4026,11 @@ cd userland
    -

    as shown at: Section 22.8, “Debug the emulator”, although direct GDB host usage works as well of course.

    +

    as shown at: Section 23.8, “Debug the emulator”, although direct GDB host usage works as well of course.

    -
    1.8.2.2. Userland setup getting started with prebuilt toolchain and QEMU user mode
    +
    2.8.2.2. Userland setup getting started with prebuilt toolchain and QEMU user mode

    If you are too lazy to build the Buildroot toolchain and QEMU, but want to run e.g. ARM Userland assembly in User mode simulation, you can get away on Ubuntu 18.04 with just:

    @@ -4027,7 +4059,7 @@ cd userland
  • --gcc-which host: use the host toolchain.

    -

    We must pass this to ./run as well because QEMU must know which dynamic libraries to use. See also: Section 10.5, “User mode static executables”.

    +

    We must pass this to ./run as well because QEMU must know which dynamic libraries to use. See also: Section 11.5, “User mode static executables”.

  • @@ -4036,7 +4068,7 @@ cd userland
    -

    This presents the usual trade-offs of using prebuilts as mentioned at: Section 1.6, “Prebuilt setup”.

    +

    This presents the usual trade-offs of using prebuilts as mentioned at: Section 2.6, “Prebuilt setup”.

    Other functionality is analogous, e.g. testing:

    @@ -4069,7 +4101,7 @@ cd userland
    -
    1.8.2.3. Userland setup getting started full system
    +
    2.8.2.3. Userland setup getting started full system

    First ensure that QEMU Buildroot setup is working.

    @@ -4077,7 +4109,7 @@ cd userland

    After doing that setup, you can already execute your userland programs from inside QEMU: the only missing step is how to rebuild executables and run them.

    -

    And the answer is exactly analogous to what is shown at: Section 1.2.2.2, “Your first kernel module hack”

    +

    And the answer is exactly analogous to what is shown at: Section 2.2.2.2, “Your first kernel module hack”

    For example, if we modify userland/c/hello.c to print out something different, we can just rebuild it with:

    @@ -4118,9 +4150,9 @@ cd userland
    -

    1.9. Baremetal setup

    +

    2.9. Baremetal setup

    -

    1.9.1. About the baremetal setup

    +

    2.9.1. About the baremetal setup

    This setup does not use the Linux kernel nor Buildroot at all: it just runs your very own minimal OS.

    @@ -4151,7 +4183,7 @@ cd userland
    -

    1.9.2. Baremetal setup getting started

    +

    2.9.2. Baremetal setup getting started

    Every .c file inside baremetal/ and .S file inside baremetal/arch/<arch>/ generates a separate baremetal image.

    @@ -4295,7 +4327,7 @@ error: simulation error detected by parsing logs
    -

    TODO: the carriage returns are a bit different than in QEMU, see: Section 32.6, “gem5 baremetal carriage return”.

    +

    TODO: the carriage returns are a bit different than in QEMU, see: Section 33.6, “gem5 baremetal carriage return”.

    Note that ./build-baremetal requires the --emulator gem5 option, and generates separate executable images for both, as can be seen from:

    @@ -4328,7 +4360,7 @@ echo "$(./getvar --arch aarch64 --emulator gem5 image)"
    -

    see also: Section 23.18, “gem5 ARM platforms”.

    +

    see also: Section 24.18, “gem5 ARM platforms”.

    This generates yet new separate images with new magic constants:

    @@ -4343,10 +4375,10 @@ echo "$(./getvar --arch aarch64 --baremetal userland/c/hello.c --emulator gem5 -

    But just stick to newer and better VExpress_GEM5_V1 unless you have a good reason to use RealViewPBX.

    -

    When doing baremetal programming, it is likely that you will want to learn userland assembly first, see: Section 27, “Userland assembly”.

    +

    When doing baremetal programming, it is likely that you will want to learn userland assembly first, see: Section 28, “Userland assembly”.

    -

    For more information on baremetal, see the section: Section 32, “Baremetal”.

    +

    For more information on baremetal, see the section: Section 33, “Baremetal”.

    The following subjects are particularly important:

    @@ -4364,7 +4396,7 @@ echo "$(./getvar --arch aarch64 --baremetal userland/c/hello.c --emulator gem5 -
    -

    1.10. Build the documentation

    +

    2.10. Build the documentation

    You don’t need to depend on GitHub.

    @@ -4412,16 +4444,16 @@ xdg-open README.html
    -

    More information about our documentation internals can be found at: Section 37.5, “Documentation”

    +

    More information about our documentation internals can be found at: Section 38.5, “Documentation”

    -

    2. GDB step debug

    +

    3. GDB step debug

    -

    2.1. GDB step debug kernel boot

    +

    3.1. GDB step debug kernel boot

    --gdb-wait makes QEMU and gem5 wait for a GDB connection, otherwise we could accidentally go past the point we want to break at:

    @@ -4470,7 +4502,7 @@ continue
    -

    2.1.1. GDB step debug kernel boot other archs

    +

    3.1.1. GDB step debug kernel boot other archs

    Just don’t forget to pass --arch to ./run-gdb, e.g.:

    @@ -4489,7 +4521,7 @@ continue
    -

    2.2. GDB step debug kernel post-boot

    +

    3.2. GDB step debug kernel post-boot

    Let’s observe the kernel write system call as it reacts to some userland actions.

    @@ -4567,7 +4599,7 @@ continue
    -

    2.3. tmux

    +

    3.3. tmux

    tmux just makes things even more fun by allowing us to see both the terminal for:

    @@ -4680,7 +4712,7 @@ continue

    Bibliography: https://unix.stackexchange.com/questions/152738/how-to-split-a-new-window-and-run-a-command-in-this-new-window-using-tmux/432111#432111

    -

    2.3.1. tmux gem5

    +

    3.3.1. tmux gem5

    If you are using gem5 instead of QEMU, --tmux has a different effect by default: it opens the gem5 terminal instead of the debugger:

    @@ -4722,7 +4754,7 @@ continue
    -

    2.4. GDB step debug kernel module

    +

    3.4. GDB step debug kernel module

    @@ -4795,7 +4827,7 @@ continue

    TODO: why does break work_func for insmod kthread.ko not work very well? Sometimes it breaks and sometimes it does not.

    -

    2.4.1. GDB step debug kernel module insmodded by init on ARM

    +

    3.4.1. GDB step debug kernel module insmodded by init on ARM

    TODO on arm 51e31cdc2933a774c2a0dc62664ad8acec1d2dbe it does not always work, and lx-symbols fails with the message:

    @@ -4870,7 +4902,7 @@ Error occurred in Python command: Cannot access memory at address 0xbf0000cc
    -

    2.4.2. GDB module_init

    +

    3.4.2. GDB module_init

    TODO find a more convenient method. We have working methods, but they are not ideal.

    @@ -4891,7 +4923,7 @@ Error occurred in Python command: Cannot access memory at address 0xbf0000cc
    -
    2.4.2.1. GDB module_init step into it
    +
    3.4.2.1. GDB module_init step into it

    This is the best method we’ve found so far.

    @@ -4938,7 +4970,7 @@ Error occurred in Python command: Cannot access memory at address 0xbf0000cc
    -
    2.4.2.2. GDB module_init calculate entry address
    +
    3.4.2.2. GDB module_init calculate entry address

    This works, but is a bit annoying.

    @@ -5014,7 +5046,7 @@ Error occurred in Python command: Cannot access memory at address 0xbf0000cc
    -
    2.4.2.3. GDB module_init break at the end of sys_init_module
    +
    3.4.2.3. GDB module_init break at the end of sys_init_module

    TODO not working. This could be potentially very convenient.

    @@ -5073,7 +5105,7 @@ Error occurred in Python command: Cannot access memory at address 0xbf0000cc
    -
    2.4.2.4. GDB module_init add trap instruction
    +
    3.4.2.4. GDB module_init add trap instruction

    This is another possibility: we could modify the module source by adding a trap instruction of some kind.

    @@ -5097,7 +5129,7 @@ Error occurred in Python command: Cannot access memory at address 0xbf0000cc
    -

    2.4.3. Bypass lx-symbols

    +

    3.4.3. Bypass lx-symbols

    Useless, but a good way to show how hardcore you are. Disable lx-symbols with:

    @@ -5170,7 +5202,7 @@ echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control
    -

    2.5. GDB step debug early boot

    +

    3.5. GDB step debug early boot

    TODO successfully debug the very first instruction that the Linux kernel runs, before start_kernel!

    @@ -5245,7 +5277,7 @@ echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control
    -

    2.5.1. Linux kernel entry point

    +

    3.5.1. Linux kernel entry point

    @@ -5284,7 +5316,7 @@ echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control

    and no, I do have the symbols from arch/arm/boot/compressed/vmlinux, but the breakpoints still don’t work.

    -

    v4.19 also added a CONFIG_HAVE_KERNEL_UNCOMPRESSED=y option for having the kernel uncompressed which could make following the startup easier, but it is only available on s390. aarch64 however is already uncompressed by default, so might be the easiest one. See also: Section 16.20.1, “vmlinux vs bzImage vs zImage vs Image”.

    +

    v4.19 also added a CONFIG_HAVE_KERNEL_UNCOMPRESSED=y option for having the kernel uncompressed which could make following the startup easier, but it is only available on s390. aarch64 however is already uncompressed by default, so might be the easiest one. See also: Section 17.20.1, “vmlinux vs bzImage vs zImage vs Image”.

    You then need the associated KERNEL_UNCOMPRESSED to enable it if available:

    @@ -5297,7 +5329,7 @@ echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control
    -
    2.5.1.1. arm64 secondary CPU entry point
    +
    3.5.1.1. arm64 secondary CPU entry point

    In gem5 aarch64 Linux v4.18, experimentally the entry point of secondary CPUs seems to be secondary_holding_pen as shown at https://gist.github.com/cirosantilli2/34a7bc450fcb6c1c1a910369be1fdd90

    @@ -5384,7 +5416,7 @@ echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control
    -

    2.5.2. Linux kernel arch-agnostic entry point

    +

    3.5.2. Linux kernel arch-agnostic entry point

    start_kernel is the first C function to be executed basically: https://stackoverflow.com/questions/18266063/does-kernel-have-main-function/33422401#33422401

    @@ -5393,7 +5425,7 @@ echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control
    -

    2.5.3. Linux kernel early boot messages

    +

    3.5.3. Linux kernel early boot messages

    When booting Linux on a slow emulator like gem5, what you observe is that:

    @@ -5435,7 +5467,7 @@ echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control
    -

    2.6. GDB step debug userland processes

    +

    3.6. GDB step debug userland processes

    QEMU’s -gdb GDB breakpoints are set on virtual addresses, so you can in theory debug userland processes as well.

    @@ -5455,7 +5487,7 @@ echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control
    • -

      the emulator does not support host to guest networking. This seems to be the case for gem5 as explained at: Section 14.3.1.3, “gem5 host to guest networking”

      +

      the emulator does not support host to guest networking. This seems to be the case for gem5 as explained at: Section 15.3.1.3, “gem5 host to guest networking”

    • cannot see the start of the init process easily

      @@ -5473,7 +5505,7 @@ echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control
    • the kernel might switch context to another process or to the kernel itself, e.g. on a system call, and then (TODO confirm) the PC would go to weird places and source code would be missing.

      -

      Solutions to this are being researched at: Section 2.10.1, “lx-ps”.

      +

      Solutions to this are being researched at: Section 3.10.1, “lx-ps”.

    • @@ -5491,7 +5523,7 @@ No loaded shared libraries match the pattern `../../staging/lib/libc.so.0'.
    -

    2.6.1. GDB step debug userland custom init

    +

    3.6.1. GDB step debug userland custom init

    This is the userland debug setup most likely to work, since at init time there is only one userland executable running.

    @@ -5534,7 +5566,7 @@ No loaded shared libraries match the pattern `../../staging/lib/libc.so.0'.
    -

    2.6.2. GDB step debug userland BusyBox init

    +

    3.6.2. GDB step debug userland BusyBox init

    BusyBox custom init process:

    @@ -5594,7 +5626,7 @@ No loaded shared libraries match the pattern `../../staging/lib/libc.so.0'.
    -

    2.6.3. GDB step debug userland non-init

    +

    3.6.3. GDB step debug userland non-init

    Non-init process:

    @@ -5630,7 +5662,7 @@ No loaded shared libraries match the pattern `../../staging/lib/libc.so.0'.

    This is the least reliable setup as there might be other processes that use the given virtual address.

    -
    2.6.3.1. GDB step debug userland non-init without --gdb-wait
    +
    3.6.3.1. GDB step debug userland non-init without --gdb-wait

    TODO: if I try GDB step debug userland non-init without --gdb-wait and the break main that we do inside ./run-gdb says:

    @@ -5708,7 +5740,7 @@ No loaded shared libraries match the pattern `../../staging/lib/libc.so.0'.
    -

    2.7. GDB call

    +

    3.7. GDB call

    @@ -5776,7 +5808,7 @@ Breakpoint 3 at 0xffffffff811615e3: fdget_pos. (9 locations)
    -

    2.8. GDB view ARM system registers

    +

    3.8. GDB view ARM system registers

    info all-registers shows some of them.

    @@ -5785,9 +5817,9 @@ Breakpoint 3 at 0xffffffff811615e3: fdget_pos. (9 locations)
    -

    2.9. GDB step debug multicore userland

    +

    3.9. GDB step debug multicore userland

    -

    For a more minimal baremetal multicore setup, see: Section 32.10.3, “ARM baremetal multicore”.

    +

    For a more minimal baremetal multicore setup, see: Section 33.10.3, “ARM baremetal multicore”.

    We can set and get which cores the Linux kernel allows a program to run on with sched_getaffinity and sched_setaffinity:

    @@ -5835,7 +5867,7 @@ sched_getcpu = 0
    -

    The number of cores is modified as explained at: Section 23.3.1, “Number of cores”

    +

    The number of cores is modified as explained at: Section 24.3.1, “Number of cores”

    taskset from the util-linux package sets the initial core affinity of a program:

    @@ -5964,7 +5996,7 @@ sched_getcpu = 0
    -

    2.10. Linux kernel GDB scripts

    +

    3.10. Linux kernel GDB scripts

    We source the Linux kernel GDB scripts by default for lx-symbols, but they also contain some other goodies worth looking into.

    @@ -6033,7 +6065,7 @@ pwd
    -

    2.10.1. lx-ps

    +

    3.10.1. lx-ps

    List all processes:

    @@ -6104,7 +6136,7 @@ pwd
    -
    2.10.1.1. CONFIG_PID_IN_CONTEXTIDR
    +
    3.10.1.1. CONFIG_PID_IN_CONTEXTIDR

    https://stackoverflow.com/questions/54133479/accessing-logical-software-thread-id-in-gem5 on ARM the kernel can store an indication of PID in the CONTEXTIDR_EL1 register, making that much easier to observe from simulators.

    @@ -6302,7 +6334,7 @@ pid=45
    -

    2.11. Debug the GDB remote protocol

    +

    3.11. Debug the GDB remote protocol

    For when it breaks again, or you want to add a new feature!

    @@ -6316,7 +6348,7 @@ pid=45

    See also: https://stackoverflow.com/questions/13496389/gdb-remote-protocol-how-to-analyse-packets

    -

    2.11.1. Remote 'g' packet reply is too long

    +

    3.11.1. Remote 'g' packet reply is too long

    This error means that the GDB server, e.g. in QEMU, sent more registers than the GDB client expected.

    @@ -6344,7 +6376,7 @@ pid=45
    -

    3. KGDB

    +

    4. KGDB

    KGDB is kernel dark magic that allows you to GDB the kernel on real hardware without any extra hardware support.

    @@ -6393,7 +6425,7 @@ Entering kdb (current=0x(____ptrval____), pid 1) on processor 0 due to Keyboard

    KGDB expects the connection at ttyS1, our second serial port after ttyS0 which contains the terminal.

    -

    The last line is the KDB prompt, and is covered at: Section 3.3, “KDB”. Typing now shows nothing because that prompt is expecting input from ttyS1.

    +

    The last line is the KDB prompt, and is covered at: Section 4.3, “KDB”. Typing now shows nothing because that prompt is expecting input from ttyS1.

    Instead, we connect to the serial port ttyS1 with GDB:

    @@ -6461,7 +6493,7 @@ continue
    -

    3.1. KGDB ARM

    +

    4.1. KGDB ARM

    TODO: we would need a second serial for KGDB to work, but it is not currently supported on arm and aarch64 with -M virt that we use: https://unix.stackexchange.com/questions/479085/can-qemu-m-virt-on-arm-aarch64-have-multiple-serial-ttys-like-such-as-pl011-t/479340#479340

    @@ -6473,7 +6505,7 @@ continue
    -

    3.2. KGDB kernel modules

    +

    4.2. KGDB kernel modules

    Just works as you would expect:

    @@ -6499,7 +6531,7 @@ continue
    -

    3.3. KDB

    +

    4.3. KDB

    KDB is a way to debug the kernel directly from your main console, without GDB.

    @@ -6564,7 +6596,7 @@ continue

    The other KDB commands allow you to step instructions, view memory, registers and some higher level kernel runtime data similar to the superior GDB Python scripts.

    -

    3.3.1. KDB graphic

    +

    4.3.1. KDB graphic

    You can also use KDB directly from the graphic window with:

    @@ -6581,7 +6613,7 @@ continue
    -

    3.3.2. KDB ARM

    +

    4.3.2. KDB ARM

    TODO neither arm nor aarch64 is working as of 1cd1e58b023791606498ca509256cc48e95e4f5b + 1.

    @@ -6624,7 +6656,7 @@ el0_svc+0x8/0xc
    -

    4. gdbserver

    +

    5. gdbserver

    Step debug userland processes to understand how they are talking to the kernel.

    @@ -6668,7 +6700,7 @@ el0_svc+0x8/0xc

    Bibliography: https://reverseengineering.stackexchange.com/questions/8829/cross-debugging-for-arm-mips-elf-with-qemu-toolchain/16214#16214

    -

    4.1. gdbserver BusyBox

    +

    5.1. gdbserver BusyBox

    @@ -6687,7 +6719,7 @@ el0_svc+0x8/0xc
    -

    4.2. gdbserver libc

    +

    5.2. gdbserver libc

    Our setup gives you the rare opportunity to step debug libc and other system libraries.

    @@ -6743,7 +6775,7 @@ continue
    -

    4.3. gdbserver dynamic loader

    +

    5.3. gdbserver dynamic loader

    TODO: try to step debug the dynamic loader. Would be even easier if starti is available: https://stackoverflow.com/questions/10483544/stopping-at-the-first-machine-code-instruction-in-gdb

    @@ -6754,7 +6786,7 @@ continue
    -

    5. CPU architecture

    +

    6. CPU architecture

    The portability of the kernel and toolchains is amazing: change an option and most things magically work on completely different hardware.

    @@ -6795,9 +6827,9 @@ continue

    Known quirks of the supported architectures are documented in this section.

    -

    5.1. x86_64

    +

    6.1. x86_64

    -

    5.1.1. ring0

    +

    6.1.1. ring0

    This example illustrates how reading from the x86 control registers with mov crX, rax can only be done from kernel land on ring0.

    @@ -6878,9 +6910,9 @@ cr3 = 0xFFFFF0DCDC000
    -

    5.3. MIPS

    +

    6.3. MIPS

    We used to "support" it until f8c0502bb2680f2dbe7c1f3d7958f60265347005 (it booted) but dropped it since no one was testing it often.

    @@ -6913,7 +6945,7 @@ cr3 = 0xFFFFF0DCDC000
    -

    5.4. Other architectures

    +

    6.4. Other architectures

    It should not be too hard to port this repository to any architecture that Buildroot supports. Pull requests are welcome.

    @@ -6921,7 +6953,7 @@ cr3 = 0xFFFFF0DCDC000
    -

    6. init

    +

    7. init

    When the Linux kernel finishes booting, it runs an executable as the first and only userland process. This executable is called the init program.

    @@ -6942,10 +6974,10 @@ cr3 = 0xFFFFF0DCDC000

    The init program can be either an executable shell text file, or a compiled ELF file. It becomes easy to accept this once you see that the exec system call handles both cases equally: https://unix.stackexchange.com/questions/174062/can-the-init-process-be-a-shell-script-in-linux/395375#395375

    -

    The init executable is searched for in a list of paths in the root filesystem, including /init, /sbin/init and a few others. For more details see: Section 6.3, “Path to init”

    +

    The init executable is searched for in a list of paths in the root filesystem, including /init, /sbin/init and a few others. For more details see: Section 7.3, “Path to init”

    -

    6.1. Replace init

    +

    7.1. Replace init

    To have more control over the system, you can replace BusyBox’s init with your own.

    @@ -6961,7 +6993,7 @@ cr3 = 0xFFFFF0DCDC000

    This just counts every second forever and does not give you a shell.

    -

    This method is not very flexible however, as it is hard to reliably pass multiple commands and command line arguments to the init with it, as explained at: Section 6.4, “Init environment”.

    +

    This method is not very flexible however, as it is hard to reliably pass multiple commands and command line arguments to the init with it, as explained at: Section 7.4, “Init environment”.

    For this reason, we have created a more robust helper method with the --eval option:

    @@ -6983,10 +7015,10 @@ cr3 = 0xFFFFF0DCDC000

    Source: rootfs_overlay/lkmc/eval_base64.sh.

    -

    This allows quoting and newlines by base64 encoding on host, and decoding on guest, see: Section 16.3.1, “Kernel command line parameters escaping”.

    +

    This allows quoting and newlines by base64 encoding on host, and decoding on guest, see: Section 17.3.1, “Kernel command line parameters escaping”.
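    Here is a minimal sketch of the host side of that idea, assuming standard RFC 4648 base64. Whatever bytes the script contains, the encoded token only uses A-Z, a-z, 0-9, +, / and =, so it survives the kernel command line without any quoting. The function b64_encode() is hypothetical and is not the code that ./run actually uses. Decoding on the guest is done by rootfs_overlay/lkmc/eval_base64.sh, as mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encode len bytes of in into out, which must hold
 * 4 * ((len + 2) / 3) + 1 bytes. Returns the length of the NUL-terminated
 * result. */
static size_t b64_encode(const unsigned char *in, size_t len, char *out) {
    size_t o = 0;
    assert(out != NULL && (in != NULL || len == 0));
    for (size_t i = 0; i < len; i += 3) {
        /* Pack up to 3 input bytes into one 24-bit group. */
        unsigned long group = (unsigned long)in[i] << 16;
        if (i + 1 < len) group |= (unsigned long)in[i + 1] << 8;
        if (i + 2 < len) group |= in[i + 2];
        /* Emit 4 sextets, padding with '=' past the end of the input. */
        out[o++] = B64_ALPHABET[(group >> 18) & 0x3F];
        out[o++] = B64_ALPHABET[(group >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? B64_ALPHABET[(group >> 6) & 0x3F] : '=';
        out[o++] = (i + 2 < len) ? B64_ALPHABET[group & 0x3F] : '=';
    }
    out[o] = '\0';
    return o;
}
```

    For instance, encoding the 5 bytes of "hello" yields "aGVsbG8=". An input containing spaces, double quotes or newlines comes out free of all three.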

    -

    It also automatically chooses between init= and rcinit= for you, see: Section 6.3, “Path to init”

    +

    It also automatically chooses between init= and rcinit= for you, see: Section 7.3, “Path to init”

    --eval replaces BusyBox' init completely, which makes things more minimal, but also has has the following consequences:

    @@ -7012,7 +7044,7 @@ cr3 = 0xFFFFF0DCDC000
    -

    The best way to overcome those limitations is to use: Section 6.2, “Run command at the end of BusyBox init”

    +

    The best way to overcome those limitations is to use: Section 7.2, “Run command at the end of BusyBox init”

    If the script is large, you can add it to a gitignored file and pass that to --eval as in:

    @@ -7056,7 +7088,7 @@ chmod +x rootfs_overlay/lkmc/gitignore.sh
    -

    6.1.1. poweroff.out

    +

    7.1.1. poweroff.out

    Just using BusyBox' poweroff at the end of the init does not work and the kernel panics:

    @@ -7095,7 +7127,7 @@ chmod +x rootfs_overlay/lkmc/gitignore.sh
    -

    6.1.2. sleep_forever.out

    +

    7.1.2. sleep_forever.out

    I dare you to guess what this does:

    @@ -7112,7 +7144,7 @@ chmod +x rootfs_overlay/lkmc/gitignore.sh
    -

    6.1.3. time_boot.out

    +

    7.1.3. time_boot.out

    Get a reasonable answer to "how long does boot take in guest time?":

    @@ -7141,7 +7173,7 @@ chmod +x rootfs_overlay/lkmc/gitignore.sh
    -

    6.2. Run command at the end of BusyBox init

    +

    7.2. Run command at the end of BusyBox init

    Use the --eval-after option is for you rely on something that BusyBox' init set up for you like /etc/fstab:

    @@ -7186,7 +7218,7 @@ vim rootfs_overlay/etc/init.d/S99.gitignore
    -

    6.3. Path to init

    +

    7.3. Path to init

    The init is selected at:

    @@ -7216,7 +7248,7 @@ vim rootfs_overlay/etc/init.d/S99.gitignore
    -

    6.4. Init environment

    +

    7.4. Init environment

    @@ -7281,7 +7313,7 @@ asdf=qwer
    -

    6.4.1. init arguments

    +

    7.4.1. init arguments

    The annoying dash - gets passed as a parameter to init, which makes it impossible to use this method for most non custom executables.

    @@ -7312,7 +7344,7 @@ ab
    -

    6.4.2. init environment env

    +

    7.4.2. init environment env

    Wait, where do HOME and TERM come from? (greps the kernel). Ah, OK, the kernel sets those by default: https://github.com/torvalds/linux/blob/94710cac0ef4ee177a63b5227664b38c95bbf703/init/main.c#L173

    @@ -7323,7 +7355,7 @@ ab
    -

    6.4.3. BusyBox shell init environment

    +

    7.4.3. BusyBox shell init environment

    On top of the Linux kernel, the BusyBox /bin/sh shell will also define other variables.

    @@ -7374,7 +7406,7 @@ PWD=/
    -
    6.4.3.1. BusyBox shell initrc files
    +
    7.4.3.1. BusyBox shell initrc files

    Login shells source some default files, notably:

    @@ -7416,7 +7448,7 @@ $HOME/.profile
    -

    7. initrd

    +

    8. initrd

    The kernel can boot from an CPIO file, which is a directory serialization format much like tar: https://superuser.com/questions/343915/tar-vs-cpio-what-is-the-difference
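    For the initramfs case, the kernel expects the "newc" flavor of CPIO (magic 070701), as described in the buffer format notes of the kernel's early userspace documentation. The following sketch, written under that assumption, serializes one entry. Each entry has three parts:

    - a 110 byte ASCII header made of the magic plus 13 fields of 8 hex digits,
    - the NUL-terminated name,
    - the file data.

    Both the name and the data are padded to a 4 byte boundary. cpio_newc_entry() is a hypothetical helper. Real archives are normally produced with cpio -o -H newc or by the build system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Round n up to a multiple of 4, as newc requires after the name and data. */
static size_t pad4(size_t n) { return (n + 3) & ~(size_t)3; }

/* Hypothetical helper: append one newc entry (header, name, data) to out and
 * return the number of bytes written. out must be large enough. mode is for
 * example 0100644 for a regular file. */
static size_t cpio_newc_entry(char *out, const char *name, unsigned mode,
                              const char *data, size_t datalen) {
    size_t namesize = strlen(name) + 1; /* includes the trailing NUL */
    size_t off;
    int n;

    assert(out != NULL && data != NULL);
    /* 6 byte magic + 13 fields of 8 hex digits = 110 bytes:
     * ino mode uid gid nlink mtime filesize devmajor devminor
     * rdevmajor rdevminor namesize check */
    n = sprintf(out, "070701%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X%08X",
                0u, mode, 0u, 0u, 1u, 0u, (unsigned)datalen,
                0u, 0u, 0u, 0u, (unsigned)namesize, 0u);
    off = (size_t)n;
    memcpy(out + off, name, namesize);
    off += namesize;
    memset(out + off, 0, pad4(off) - off); /* pad header + name */
    off = pad4(off);
    memcpy(out + off, data, datalen);
    off += datalen;
    memset(out + off, 0, pad4(off) - off); /* pad data */
    return pad4(off);
}
```

    A complete archive is a sequence of such entries terminated by one named TRAILER!!!, for example cpio_newc_entry(buf + n, "TRAILER!!!", 0, "", 0).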

    @@ -7468,7 +7500,7 @@ cat f

    which can be good for automated tests, as it ensures that you are using a pristine unmodified system image every time.

    -

    Not however that we already disable disk persistency by default on ext2 filesystems even without --initrd: Section 22.3, “Disk persistency”.

    +

    Note however that we already disable disk persistency by default on ext2 filesystems even without --initrd: Section 23.3, “Disk persistency”.

    One downside of this method is that it has to put the entire filesystem into memory, and could lead to a panic:

    @@ -7512,7 +7544,7 @@ cat f

    TODO: how does the bootloader inform the kernel where to find initrd? https://unix.stackexchange.com/questions/89923/how-does-linux-load-the-initrd-image

    -

    7.1. initrd in desktop distros

    +

    8.1. initrd in desktop distros

    Most modern desktop distributions have an initrd in their root disk to do early setup.

    @@ -7535,7 +7567,7 @@ cat f
    -

    7.2. initramfs

    +

    8.2. initramfs

    initramfs is just like initrd, but you also glue the image directly to the kernel image itself using the kernel’s build system.

    @@ -7598,7 +7630,7 @@ cat f
    -

    7.3. rootfs

    +

    8.3. rootfs

    This is how /proc/mounts shows the root filesystem:

    @@ -7629,14 +7661,14 @@ cat f
    -

    7.3.1. /dev/root

    +

    8.3.1. /dev/root

    -

    7.4. gem5 initrd

    +

    8.4. gem5 initrd

    @@ -7653,7 +7685,7 @@ cat f
    -

    7.5. gem5 initramfs

    +

    8.5. gem5 initramfs

    This could in theory be easier to make work than initrd since the emulator does not have to do anything special.

    @@ -7666,7 +7698,7 @@ cat f
    -

    We think that this might be because gem5 boots directly vmlinux, and not from the final compressed images that contain the attached rootfs such as bzImage, which is what QEMU does, see also: Section 16.20.1, “vmlinux vs bzImage vs zImage vs Image”.

    +

    We think that this might be because gem5 boots vmlinux directly, and not the final compressed images that contain the attached rootfs such as bzImage, which is what QEMU does, see also: Section 17.20.1, “vmlinux vs bzImage vs zImage vs Image”.

    To do this failed test, we automatically pass a dummy disk image as of gem5 7fa4c946386e7207ad5859e8ade0bbfc14000d91 since the scripts don’t handle a missing --disk-image well, much like is currently done for Baremetal.

    @@ -7678,7 +7710,7 @@ cat f
    -

    8. Device tree

    +

    9. Device tree

    The device tree is a Linux kernel defined data structure that serves to inform the kernel how the hardware is setup.

    @@ -7706,7 +7738,7 @@ cat f

    The Linux kernel itself has several device trees under ./arch/<arch>/boot/dts, see also: https://stackoverflow.com/questions/21670967/how-to-compile-dts-linux-device-tree-source-files-to-dtb/42839737#42839737

    -

    8.1. DTB files

    +

    9.1. DTB files

    Files that contain device trees have the .dtb extension when compiled, and .dts when in text form.
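    As an illustration of what "compiled" means here, every .dtb starts with a flattened device tree header made of big-endian 32-bit fields. The first field is the magic 0xd00dfeed, the second is totalsize, and the format version sits at byte offset 20. The sketch below, with the hypothetical name fdt_parse_header(), only checks those fields. For real work, libfdt from the dtc project provides a complete parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedu

/* Hypothetical subset of the header fields. */
struct fdt_info {
    uint32_t totalsize;
    uint32_t version;
};

/* All FDT header fields are big-endian regardless of the CPU. */
static uint32_t be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return 0 and fill info if buf starts with a plausible FDT header that fits
 * in len bytes, -1 otherwise. The v17 header is 40 bytes long. */
static int fdt_parse_header(const unsigned char *buf, size_t len,
                            struct fdt_info *info) {
    assert(info != NULL);
    if (len < 40 || be32(buf) != FDT_MAGIC)
        return -1;
    info->totalsize = be32(buf + 4);
    info->version = be32(buf + 20);
    if (info->totalsize > len)
        return -1; /* truncated blob */
    return 0;
}
```

    This could be pointed, for instance, at the bytes of a .dtb produced by dtc or dumped from a running kernel.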

    @@ -7754,7 +7786,7 @@ cat f
    -

    8.2. Device tree syntax

    +

    9.2. Device tree syntax

    Good format descriptions:

    @@ -7816,7 +7848,7 @@ cat f
    -

    8.3. Get device tree from a running kernel

    +

    9.3. Get device tree from a running kernel

    @@ -7873,7 +7905,7 @@ cat f
    -

    8.4. Device tree emulator generation

    +

    9.4. Device tree emulator generation

    Since emulators know everything about the hardware, they can automatically generate device trees for us, which is very convenient.

    @@ -7921,7 +7953,7 @@ cat f
    -

    9. KVM

    +

    10. KVM

    KVM is Linux kernel interface that greatly speeds up execution of virtual machines.

    @@ -7970,7 +8002,7 @@ cat f

    One important use case for KVM is to fast forward gem5 execution, often to skip boot, take a gem5 checkpoint, and then move on to a more detailed and slow simulation

    -

    9.1. KVM arm

    +

    10.1. KVM arm

    TODO: we haven’t gotten it to work yet, but it should be doable, and this is an outline of how to do it. Just don’t expect this to tested very often for now.

    @@ -8005,7 +8037,7 @@ cd linux-kernel-module-cheat
    -

    9.2. gem5 KVM

    +

    10.2. gem5 KVM

    While gem5 does have KVM, as of 2019 its support has not been very good, because debugging it is harder and people haven’t focused intensively on it.

    @@ -8044,7 +8076,7 @@ cd linux-kernel-module-cheat
    -

    10. User mode simulation

    +

    11. User mode simulation

    Both QEMU and gem5 have an user mode simulation mode in addition to full system simulation that we consider elsewhere in this project.

    @@ -8098,7 +8130,7 @@ cd linux-kernel-module-cheat
  • emulator implementers have to keep up with libc changes, some of which break even a C hello world due setup code executed before main.

    -

    See also: Section 10.4, “User mode simulation with glibc”

    +

    See also: Section 11.4, “User mode simulation with glibc”

  • @@ -8110,7 +8142,7 @@ cd linux-kernel-module-cheat
    -

    10.1. QEMU user mode getting started

    +

    11.1. QEMU user mode getting started

    Let’s run userland/c/command_line_arguments.c built with the Buildroot toolchain on QEMU user mode:

    @@ -8137,7 +8169,7 @@ qw er

    ./run --userland path resolution is analogous to that of ./run --baremetal.

    -

    ./build user-mode-qemu first builds Buildroot, and then runs ./build-userland, which is further documented at: Section 1.8, “Userland setup”. It also builds QEMU. If you ahve already done a QEMU Buildroot setup previously, this will be very fast.

    +

    ./build user-mode-qemu first builds Buildroot, and then runs ./build-userland, which is further documented at: Section 2.8, “Userland setup”. It also builds QEMU. If you have already done a QEMU Buildroot setup previously, this will be very fast.

    If you modify the userland programs, rebuild simply with:

    @@ -8177,7 +8209,7 @@ qw er
    -

    10.1.1. User mode GDB

    +

    11.1.1. User mode GDB

    It’s nice when the obvious just works, right?

    @@ -8217,12 +8249,12 @@ qw er
    -

    To stop at the very first instruction of a freestanding program, just use --no-continue. A good example of this is shown at: Section 27.5.1, “Freestanding programs”.

    +

    To stop at the very first instruction of a freestanding program, just use --no-continue. A good example of this is shown at: Section 28.5.1, “Freestanding programs”.

    -

    10.2. User mode tests

    +

    11.2. User mode tests

    Automatically run all userland tests that can be run in user mode simulation, and check that they exit with status 0:

    @@ -8267,14 +8299,14 @@ qw er

    Tests under userland/libs/ are only run if --package or --package-all are given as described at userland/libs directory.

    -

    The gem5 tests require building statically with build id static, see also: Section 10.7, “gem5 syscall emulation mode”. TODO automate this better.

    +

    The gem5 tests require building statically with build id static, see also: Section 11.7, “gem5 syscall emulation mode”. TODO automate this better.

    -

    See: Section 37.16, “Test this repo” for more useful testing tips.

    +

    See: Section 38.16, “Test this repo” for more useful testing tips.

    -

    10.3. User mode Buildroot executables

    +

    11.3. User mode Buildroot executables

    If you followed QEMU Buildroot setup, you can now run the executables created by Buildroot directly as:

    @@ -8311,16 +8343,16 @@ qw er
    -

    Here is an interesting examples of this: Section 16.19.1, “Linux Test Project”

    +

    Here is an interesting example of this: Section 17.19.1, “Linux Test Project”

    -

    10.4. User mode simulation with glibc

    +

    11.4. User mode simulation with glibc

    At 125d14805f769104f93c510bedaa685a52ec025d we moved Buildroot from uClibc to glibc, and caused some user mode pain, which we document here.

    -

    10.4.1. FATAL: kernel too old failure in userland simulation

    +

    11.4.1. FATAL: kernel too old failure in userland simulation

    glibc has a check for kernel version, likely obtained from the uname syscall, and if the kernel is not new enough, it quits.
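    The exact glibc logic is internal, and it also depends on the minimum version glibc was configured for (--enable-kernel). The following is therefore only a sketch of the idea, assuming the release string returned by uname() starts with major.minor[.patch]. It packs the version into one integer, in the same way as the kernel's KERNEL_VERSION() macro, and then compares. release_to_version() and kernel_at_least() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: pack "major.minor[.patch]" into one integer so that
 * versions compare with <. Suffixes such as "-generic" are ignored. Returns
 * -1 if the string does not start with at least "major.minor". */
static long release_to_version(const char *release) {
    unsigned major = 0, minor = 0, patch = 0;
    if (sscanf(release, "%u.%u.%u", &major, &minor, &patch) < 2)
        return -1;
    return ((long)major << 16) | ((long)minor << 8) |
           (patch > 255 ? 255 : patch);
}

/* Hypothetical helper: 1 if the kernel reported by uname() is at least min. */
static int kernel_at_least(long min) {
    struct utsname u;
    long v;
    assert(min >= 0);
    if (uname(&u) != 0)
        return 0;
    v = release_to_version(u.release);
    return v >= min;
}
```

    Under user mode simulation, the release string seen here is whatever the emulator reports for the uname syscall, which is why the check can fail even on a recent host.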

    @@ -8365,7 +8397,7 @@ qw er
    -

    10.4.2. stack smashing detected when using glibc

    +

    11.4.2. stack smashing detected when using glibc

    For some reason QEMU / glibc x86_64 picks up the host libc, which breaks things.

    @@ -8419,7 +8451,7 @@ qemu: uncaught target signal 6 (Aborted) - core dumped
    -

    10.5. User mode static executables

    +

    11.5. User mode static executables

    Example:

    @@ -8462,7 +8494,7 @@ qemu: uncaught target signal 6 (Aborted) - core dumped

    QEMU x86_64 guest on x86_64 host was failing with stack smashing detected when using glibc, but we found a workaround

  • -

    gem5 user only supported static executables in the past, as mentioned at: Section 10.7, “gem5 syscall emulation mode”

    +

    gem5 user mode only supported static executables in the past, as mentioned at: Section 11.7, “gem5 syscall emulation mode”

  • @@ -8493,7 +8525,7 @@ qemu: uncaught target signal 6 (Aborted) - core dumped
    -

    10.5.1. User mode static executables with dynamic libraries

    +

    11.5.1. User mode static executables with dynamic libraries

    One limitation of static executables is that Buildroot mostly only builds dynamic versions of libraries (the libc is an exception).

    @@ -8517,7 +8549,7 @@ qemu: uncaught target signal 6 (Aborted) - core dumped
    -
    10.5.1.1. C++ static and pthreads
    +
    11.5.1.1. C++ static and pthreads

    g++ and pthreads also causes issues:

    @@ -8584,7 +8616,7 @@ qemu: uncaught target signal 6 (Aborted) - core dumped
    -

    10.6. syscall emulation mode program stdin

    +

    11.6. syscall emulation mode program stdin

    The following work on both QEMU and gem5 as of LKMC 99d6bc6bc19d4c7f62b172643be95d9c43c26145 + 1. Interactive input:

    @@ -8638,7 +8670,7 @@ qemu: uncaught target signal 6 (Aborted) - core dumped
    -

    10.7. gem5 syscall emulation mode

    +

    11.7. gem5 syscall emulation mode

    Less robust than QEMU’s, but still usable:

    @@ -8690,7 +8722,7 @@ qemu: uncaught target signal 6 (Aborted) - core dumped
    -

    10.7.1. gem5 dynamic linked executables in syscall emulation

    +

    11.7.1. gem5 dynamic linked executables in syscall emulation

    Support for dynamic linking was added in November 2019:

    @@ -8705,11 +8737,11 @@ qemu: uncaught target signal 6 (Aborted) - core dumped
    -

    Note that as shown at Section 34.2.2, “Benchmark emulators on userland executables”, the dynamic version runs 200x more instructions, which might have an impact on smaller simulations in detailed CPUs.

    +

    Note that as shown at Section 35.2.2, “Benchmark emulators on userland executables”, the dynamic version runs 200x more instructions, which might have an impact on smaller simulations in detailed CPUs.

    -

    10.7.2. gem5 syscall emulation exit status

    +

    11.7.2. gem5 syscall emulation exit status

    As of gem5 7fa4c946386e7207ad5859e8ade0bbfc14000d91, the crappy se.py script does not forward the exit status of syscall emulation mode, you can test it with:

    @@ -8748,7 +8780,7 @@ qemu: uncaught target signal 6 (Aborted) - core dumped
    -

    10.7.3. gem5 syscall emulation mode syscall tracing

    +

    11.7.3. gem5 syscall emulation mode syscall tracing

    Since gem5 has to implement syscalls itself in syscall emulation mode, it can of course clearly see which syscalls are being made, and we can log them for debug purposes with gem5 tracing, e.g.:

    @@ -8794,7 +8826,7 @@ hello
    -

    10.7.4. gem5 syscall emulation multithreading

    +

    11.7.4. gem5 syscall emulation multithreading

    gem5 user mode multithreading has been particularly flaky compared to QEMU’s, but work is being put into improving it.

    @@ -8881,7 +8913,7 @@ hello
    -

    10.7.5. gem5 syscall emulation multiple executables

    +

    11.7.5. gem5 syscall emulation multiple executables

    gem5 syscall emulation has the nice feature of allowing you to run multiple executables "at once".

    @@ -8952,7 +8984,7 @@ pid=100

    and therefore shows one instruction running on each CPU for each process at the same time.

    -
    10.7.5.1. gem5 syscall emulation --smt
    +
    11.7.5.1. gem5 syscall emulation --smt

    gem5 b1623cb2087873f64197e503ab8894b5e4d4c7b4 syscall emulation has an --smt option presumably for Hardware threads but it has been neglected forever it seems: https://github.com/cirosantilli/linux-kernel-module-cheat/issues/104

    @@ -8985,9 +9017,9 @@ Program aborted at tick 0
    -

    10.8. QEMU user mode quirks

    +

    11.8. QEMU user mode quirks

    -

    10.8.1. QEMU user mode does not show stdout immediately

    +

    11.8.1. QEMU user mode does not show stdout immediately

    At 8d8307ac0710164701f6e14c99a69ee172ccbb70 + 1, I noticed that if you run userland/posix/count.c:

    @@ -9015,7 +9047,7 @@ Program aborted at tick 0

    TODO: investigate further and then possibly post on QEMU mailing list.

    -
    10.8.1.1. QEMU user mode does not show errors
    +
    11.8.1.1. QEMU user mode does not show errors

    Similarly to QEMU user mode does not show stdout immediately, QEMU error messages do not show at all through pipes.

    @@ -9036,10 +9068,10 @@ Program aborted at tick 0
    -

    11. Kernel module utilities

    +

    12. Kernel module utilities

    -

    11.1. insmod

    +

    12.1. insmod

    @@ -9050,7 +9082,7 @@ Program aborted at tick 0
    -

    11.2. myinsmod

    +

    12.2. myinsmod

    If you are feeling raw, you can insert and remove modules with our own minimal module inserter and remover!

    @@ -9118,7 +9150,7 @@ Program aborted at tick 0
    -

    11.3. modprobe

    +

    12.3. modprobe

    Implemented as a BusyBox applet by default: https://git.busybox.net/busybox/tree/modutils/modprobe.c?h=1_29_stable

    @@ -9142,10 +9174,10 @@ Program aborted at tick 0
    @@ -9161,13 +9193,13 @@ Program aborted at tick 0
  • -

    we would have to think how to not have to include the kernel modules twice in the root filesystem, but still have 9P working for fast development as described at: Section 1.2.2.2, “Your first kernel module hack”

    +

    we would have to figure out how to avoid including the kernel modules twice in the root filesystem while still having 9P working for fast development as described at: Section 2.2.2.2, “Your first kernel module hack”

  • -

    11.4. kmod

    +

    12.4. kmod

    The more "reference" kernel.org implementation of lsmod, insmod, rmmod, etc.: https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git

    @@ -9207,13 +9239,13 @@ Program aborted at tick 0

    BusyBox also implements its own version of those executables, see e.g. modprobe. Here we will only describe features that differ from kmod to the BusyBox implementation.

    -

    11.4.1. module-init-tools

    +

    12.4.1. module-init-tools

    Name of a predecessor set of tools.

    -

    11.4.2. kmod modprobe

    +

    12.4.2. kmod modprobe

    kmod’s modprobe can also load modules under different names to avoid conflicts, e.g.:

    @@ -9227,10 +9259,10 @@ Program aborted at tick 0
    -

    12. Filesystems

    +

    13. Filesystems

    -

    12.1. OverlayFS

    +

    13.1. OverlayFS

    OverlayFS is a filesystem merged in the Linux kernel in 3.18.

    @@ -9266,7 +9298,7 @@ Program aborted at tick 0

    no need to regenerate the root filesystem at all and reboot

  • -

    overcomes the check_bin_arch problem as shown at: Section 25.8, “Buildroot rebuild is slow when the root filesystem is large”

    +

    overcomes the check_bin_arch problem as shown at: Section 26.8, “Buildroot rebuild is slow when the root filesystem is large”

  • @@ -9353,7 +9385,7 @@ a crash or deadlock.
    -

    12.2. Secondary disk

    +

    13.2. Secondary disk

    A simpler and possibly less overhead alternative to 9P would be to generate a secondary disk image with the benchmark you want to rebuild.

    @@ -9413,7 +9445,7 @@ vim userland/c/hello.c
    -

    13. Graphics

    +

    14. Graphics

    Both QEMU and gem5 are capable of outputting graphics to the screen, and taking mouse and keyboard input.

    @@ -9422,7 +9454,7 @@ vim userland/c/hello.c

    https://unix.stackexchange.com/questions/307390/what-is-the-difference-between-ttys0-ttyusb0-and-ttyama0-in-linux

    -

    13.1. QEMU text mode

    +

    14.1. QEMU text mode

    Text mode is the default mode for QEMU.

    @@ -9438,7 +9470,7 @@ vim userland/c/hello.c
    -

    13.2. QEMU graphic mode

    +

    14.2. QEMU graphic mode

    Enable graphic mode with:

    @@ -9531,7 +9563,7 @@ vim userland/c/hello.c

    Outcome: you see a penguin due to CONFIG_LOGO.

    -

    For a more exciting GUI experience, see: Section 13.4, “X11 Buildroot”

    +

    For a more exciting GUI experience, see: Section 14.4, “X11 Buildroot”

    Text mode is the default due to the following considerable advantages:

    @@ -9588,7 +9620,7 @@ vim userland/c/hello.c

    flooding the screen with colors. See also: https://superuser.com/questions/223094/how-do-i-know-if-i-have-kms-enabled

    -

    13.2.1. Scroll up in graphic mode

    +

    14.2.1. Scroll up in graphic mode

    Scroll up in QEMU graphic mode:

    @@ -9615,9 +9647,9 @@ vim userland/c/hello.c
    -

    13.2.2. QEMU Graphic mode arm

    +

    14.2.2. QEMU Graphic mode arm

    -
    13.2.2.1. QEMU graphic mode arm terminal
    +
    14.2.2.1. QEMU graphic mode arm terminal

    TODO: on arm, we see the penguin and some boot messages, but don’t get a shell at then end:

    @@ -9650,7 +9682,7 @@ vim userland/c/hello.c
    -
    13.2.2.2. QEMU graphic mode arm terminal implementation
    +
    14.2.2.2. QEMU graphic mode arm terminal implementation

    arm and aarch64 rely on the QEMU CLI option:

    @@ -9676,7 +9708,7 @@ CONFIG_DRM_VIRTIO_GPU=y
    -
    13.2.2.3. QEMU graphic mode arm VGA
    +
    14.2.2.3. QEMU graphic mode arm VGA

    TODO: how to use VGA on ARM? https://stackoverflow.com/questions/20811203/how-can-i-output-to-vga-through-qemu-arm Tried:

    @@ -9702,7 +9734,7 @@ CONFIG_DRM_VIRTIO_GPU=y
    -

    13.3. gem5 graphic mode

    +

    14.3. gem5 graphic mode

    gem5 does not have a "text mode", since it cannot redirect the Linux terminal to same host terminal where the executable is running: you are always forced to connect to the terminal with gem-shell.

    @@ -9802,7 +9834,7 @@ CONFIG_DRM_VIRTIO_GPU=y

    Tested on: 38fd6153d965ba20145f53dc1bb3ba34b336bde9

    -

    13.3.1. Graphic mode gem5 aarch64

    +

    14.3.1. Graphic mode gem5 aarch64

    For aarch64 we also need to configure the kernel with linux_config/display:

    @@ -9825,7 +9857,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    13.3.2. gem5 graphic mode DP650

    +

    14.3.2. gem5 graphic mode DP650

    TODO get working. There is an unmerged patchset at: https://gem5-review.googlesource.com/c/public/gem5/+/11036/1

    @@ -9845,7 +9877,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    13.3.3. gem5 graphic mode internals

    +

    14.3.3. gem5 graphic mode internals

    We cannot use mainline Linux because the gem5 arm Linux kernel patches are required at least to provide the CONFIG_DRM_VIRT_ENCODER option.

    @@ -9908,7 +9940,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    13.4. X11 Buildroot

    +

    14.4. X11 Buildroot

    Once you’ve seen the CONFIG_LOGO penguin as a sanity check, you can try to go for a cooler X11 Buildroot setup.

    @@ -9976,7 +10008,7 @@ xeyes
    -

    13.4.1. X11 Buildroot mouse not moving

    +

    14.4.1. X11 Buildroot mouse not moving

    TODO 9076c1d9bcc13b6efdb8ef502274f846d8d4e6a1 I’m 100% sure that it was working before, but I didn’t run it forever, and it stopped working at some point. Needs bisection, on whatever commit last touched x11 stuff.

    @@ -10044,7 +10076,7 @@ xeyes
    -

    13.4.2. X11 Buildroot ARM

    +

    14.4.2. X11 Buildroot ARM

    On ARM, startx hangs at a message:

    @@ -10087,12 +10119,12 @@ xeyes
    -

    14. Networking

    +

    15. Networking

    -

    14.1. Enable networking

    +

    15.1. Enable networking

    -

    We disable networking by default because it starts an userland process, and we want to keep the number of userland processes to a minimum to make the system more understandable as explained at: Section 37.20.3, “Resource tradeoff guidelines”

    +

    We disable networking by default because it starts a userland process, and we want to keep the number of userland processes to a minimum to make the system more understandable, as explained at: Section 38.20.3, “Resource tradeoff guidelines”

    To enable networking on Buildroot, simply run:

    @@ -10127,7 +10159,7 @@ cat index.html
    -

    14.2. ping

    +

    15.2. ping

    ping does not work within QEMU by default, e.g.:

    @@ -10152,17 +10184,17 @@ cat index.html
    -

    14.3. Guest host networking

    +

    15.3. Guest host networking

    In this section we discuss how to interact between the guest and the host through networking.

    -

    First ensure that you can access the external network since that is easier to get working, see: Section 14, “Networking”.

    +

    First ensure that you can access the external network since that is easier to get working, see: Section 15, “Networking”.

    -

    14.3.1. Host to guest networking

    +

    15.3.1. Host to guest networking

    -
    14.3.1.1. nc host to guest
    +
    15.3.1.1. nc host to guest

    With nc we can create the most minimal example possible as a sanity check.

    @@ -10206,7 +10238,7 @@ cat index.html
    -
    14.3.1.2. ssh into guest
    +
    15.3.1.2. ssh into guest

    Not enabled by default due to the build / runtime overhead. To enable, build with:

    @@ -10239,14 +10271,14 @@ cat index.html
    -
    14.3.1.3. gem5 host to guest networking
    +
    15.3.1.3. gem5 host to guest networking

    Could not do port forwarding from host to guest, and therefore could not use gdbserver: https://stackoverflow.com/questions/48941494/how-to-do-port-forwarding-from-guest-to-host-in-gem5

    -

    14.3.2. Guest to host networking

    +

    15.3.2. Guest to host networking

    First Enable networking.

    @@ -10299,7 +10331,7 @@ cat index.html
    -

    14.4. 9P

    +

    15.4. 9P

    The 9p protocol allows the guest to mount a host directory.

    @@ -10307,7 +10339,7 @@ cat index.html

    Both QEMU and gem5 9P support 9P.

    -

    14.4.1. 9P vs NFS

    +

    15.4.1. 9P vs NFS

    All of 9P and NFS (and sshfs) allow sharing directories between guest and host.

    @@ -10348,7 +10380,7 @@ cat index.html
    -

    14.4.2. 9P getting started

    +

    15.4.2. 9P getting started

    As usual, we have already set everything up for you. On host:

    @@ -10430,7 +10462,7 @@ mount -t 9p -o trans=virtio,version=9p2000.L host0 /mnt/my9p
    -

    14.4.3. gem5 9P

    +

    15.4.3. gem5 9P

    Is possible on aarch64 as shown at: https://gem5-review.googlesource.com/c/public/gem5/+/22831, and it is just a matter of exposing to X86 for those that want it.

    @@ -10515,7 +10547,7 @@ m5 checkpoint
    -

    14.4.4. NFS

    +

    15.4.4. NFS

    TODO: get working.

    @@ -10523,7 +10555,7 @@ m5 checkpoint

    9P is better with emulation, but let’s just get this working for fun.

    -

    First make sure that this works: Section 14.3.2, “Guest to host networking”.

    +

    First make sure that this works: Section 15.3.2, “Guest to host networking”.

    Then, build the kernel with NFS support:

    @@ -10590,7 +10622,7 @@ mount -t nfs 10.0.2.2:/tmp /mnt/nfs
    -

    15. Operating systems

    +

    16. Operating systems

    https://en.wikipedia.org/wiki/Operating_system

    @@ -10617,15 +10649,15 @@ mount -t nfs 10.0.2.2:/tmp /mnt/nfs
    -

    16. Linux kernel

    +

    17. Linux kernel

    -

    16.1. Linux kernel configuration

    +

    17.1. Linux kernel configuration

    -

    16.1.1. Modify kernel config

    +

    17.1.1. Modify kernel config

    To modify a single option on top of our default kernel configs, do:

    @@ -10716,11 +10748,11 @@ cp "$(./getvar linux_build_dir)/defconfig" data/myconfig
    -

    You can also use other config generating targets such as defconfig with the same method as shown at: Section 16.1.3.1.1, “Linux kernel defconfig”.

    +

    You can also use other config generating targets such as defconfig with the same method as shown at: Section 17.1.3.1.1, “Linux kernel defconfig”.

    -

    16.1.2. Find the kernel config

    +

    17.1.2. Find the kernel config

    Get the build config in guest:

    @@ -10778,14 +10810,14 @@ CONFIG_IKCONFIG_PROC=y
    -

    16.1.3. About our Linux kernel configs

    +

    17.1.3. About our Linux kernel configs

    By default, build-linux generates a .config that is a mixture of:

    -
    16.1.3.1.2. Linux kernel min config
    +
    17.1.3.1.2. Linux kernel min config

    linux_config/min contains minimal tweaks required to boot gem5 or for using our slightly different QEMU command line options than Buildroot on all archs.

    -

    It is one of the default config fragments we use, as explained at: Section 16.1.3, “About our Linux kernel configs”>.

    +

    It is one of the default config fragments we use, as explained at: Section 17.1.3, “About our Linux kernel configs”.

    Having the same config working for both QEMU and gem5 (oh, the hours of bisection) means that you can deal with functional matters in QEMU, which runs much faster, and switch to gem5 only for performance issues.

    @@ -10971,14 +11003,14 @@ CONFIG_IKCONFIG_PROC=y
    -
    16.1.3.2. Notable alternate gem5 kernel configs
    +
    17.1.3.2. Notable alternate gem5 kernel configs

    Other configs which we had previously tested at 4e0d9af81fcce2ce4e777cb82a1990d7c2ca7c1e are:

    -

    16.2. Kernel version

    +

    17.2. Kernel version

    -

    16.2.1. Find the kernel version

    +

    17.2.1. Find the kernel version

    We try to use the latest possible kernel major release version.

    @@ -11014,7 +11046,7 @@ git log | grep -E ' Linux [0-9]+\.' | head
    -

    16.2.2. Update the Linux kernel

    +

    17.2.2. Update the Linux kernel

    During update all you kernel modules may break since the kernel API is not stable.

    @@ -11031,15 +11063,15 @@ git log | grep -E ' Linux [0-9]+\.' | head

    This also makes this repo the perfect setup to develop the Linux kernel.

    -

    In case something breaks while updating the Linux kernel, you can try to bisect it to understand the root cause, see: Section 37.17, “Bisection”.

    +

    In case something breaks while updating the Linux kernel, you can try to bisect it to understand the root cause, see: Section 38.17, “Bisection”.

    -
    16.2.2.1. Update the Linux kernel LKMC procedure
    +
    17.2.2.1. Update the Linux kernel LKMC procedure
    -

    First, use use the branching procedure described at: Section 37.18, “Update a forked submodule”

    +

    First, use the branching procedure described at: Section 38.18, “Update a forked submodule”

    -

    Because the kernel is so central to this repository, almost all tests must be re-run, so basically just follow the full testing procedure described at: Section 37.16, “Test this repo”. The only tests that can be skipped are essentially the Baremetal tests.

    +

    Because the kernel is so central to this repository, almost all tests must be re-run, so basically just follow the full testing procedure described at: Section 38.16, “Test this repo”. The only tests that can be skipped are essentially the Baremetal tests.

    Before comitting, don’t forget to update:

    @@ -11067,7 +11099,7 @@ git log | grep -E ' Linux [0-9]+\.' | head
    -

    16.2.3. Downgrade the Linux kernel

    +

    17.2.3. Downgrade the Linux kernel

    The kernel is not forward compatible, however, so downgrading the Linux kernel requires downgrading the userland too to the latest Buildroot branch that supports it.

    @@ -11143,7 +11175,7 @@ git log | grep -E ' Linux [0-9]+\.' | head
    -

    16.3. Kernel command line parameters

    +

    17.3. Kernel command line parameters

    Bootloaders can pass a string as input to the Linux kernel when it is booting to control its behaviour, much like the execve system call does to userland processes.

    @@ -11204,7 +11236,7 @@ git log | grep -E ' Linux [0-9]+\.' | head
    -

    16.3.1. Kernel command line parameters escaping

    +

    17.3.1. Kernel command line parameters escaping

    Double quotes can be used to escape spaces as in opt="a b", but double quotes themselves cannot be escaped, e.g. opt="a\"b"
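To make the quoting rules concrete, here is a small Python model of how the kernel splits its command line into parameters. It is a simplified re-implementation written for illustration, loosely following the logic of the kernel's C helper next_arg(); the Python function names are hypothetical and this is not the kernel code itself. Note how in the last call the backslash escapes nothing: the value keeps the backslash, and the unbalanced quote swallows the following x=1.

```python
def next_arg(args):
    """Split the first parameter off a kernel command line string.

    Returns (param, val, rest), with val None when there is no '='.
    Simplified model of the kernel's next_arg() C helper.
    """
    quoted = args.startswith('"')
    if quoted:
        args = args[1:]
    in_quote = quoted
    equals = None
    i = 0
    while i < len(args):
        c = args[i]
        # Unquoted whitespace ends the parameter.
        if c.isspace() and not in_quote:
            break
        if equals is None and c == '=':
            equals = i
        # Each double quote toggles quoting; there is no escape character.
        if c == '"':
            in_quote = not in_quote
        i += 1
    token, rest = args[:i], args[i:].lstrip()
    if equals is None:
        param, val = token, None
        if quoted and param.endswith('"'):
            param = param[:-1]
    else:
        param, val = token[:equals], token[equals + 1:]
        if val.startswith('"'):
            # Drop one pair of surrounding quotes from the value.
            val = val[1:]
            if val.endswith('"'):
                val = val[:-1]
        elif quoted and val.endswith('"'):
            val = val[:-1]
    return param, val, rest


def parse_cmdline(cmdline):
    """Return the list of (param, val) pairs of a whole command line."""
    params = []
    rest = cmdline.lstrip()
    while rest:
        param, val, rest = next_arg(rest)
        params.append((param, val))
    return params


if __name__ == "__main__":
    print(parse_cmdline('console=ttyS0 opt="a b" rw'))
    # [('console', 'ttyS0'), ('opt', 'a b'), ('rw', None)]
    print(parse_cmdline('opt="a\\"b" x=1'))
    # [('opt', 'a\\"b" x=1')]
```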

    @@ -11213,7 +11245,7 @@ git log | grep -E ' Linux [0-9]+\.' | head
    -

    16.3.2. Kernel command line parameters definition points

    +

    17.3.2. Kernel command line parameters definition points

    There are two methods:

    @@ -11256,7 +11288,7 @@ git log | grep -E ' Linux [0-9]+\.' | head
    -

    16.3.3. rw

    +

    17.3.3. rw

    By default, the Linux kernel mounts the root filesystem as readonly. TODO rationale?

    @@ -11328,7 +11360,7 @@ mount
    -

    16.3.4. norandmaps

    +

    17.3.4. norandmaps

    Disable userland address space randomization. Test it out by running rand_check.out twice:

    @@ -11352,7 +11384,7 @@ mount
    -

    16.4. printk

    +

    17.4. printk

    printk is the simplest and most widely used way of getting information from the kernel, so you should familiarize yourself with its basic configuration.
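As a quick reference, the following Python sketch decodes the four integers of /proc/sys/kernel/printk (current console level, default message level, minimum console level and boot-time default console level) and applies the rule that a message only reaches the console when its level is numerically lower than the console log level. The helper names are our own; the level numbering from 0 (emerg) to 7 (debug) is the standard KERN_* one.

```python
# Fields of /proc/sys/kernel/printk, in file order.
FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)

# KERN_* level names, indexed by their numeric value.
LEVELS = ("emerg", "alert", "crit", "err", "warning", "notice", "info", "debug")


def parse_printk_sysctl(text):
    """Turn the whitespace separated contents of the file into a dict."""
    values = [int(x) for x in text.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected 4 integers, got %r" % text)
    return dict(zip(FIELDS, values))


def reaches_console(msg_level, console_loglevel):
    # Printed on the console only if more severe (numerically lower).
    return msg_level < console_loglevel


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/printk") as f:
            cfg = parse_printk_sysctl(f.read())
    except OSError:
        cfg = parse_printk_sysctl("7 4 1 7")  # sample values if not on Linux
    for level, name in enumerate(LEVELS):
        print(level, name, reaches_console(level, cfg["console_loglevel"]))
```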

    @@ -11432,10 +11464,10 @@ mount
    -

    The highest level, debug, is a bit more magic, see: Section 16.4.3, “pr_debug” for more info.

    +

    The highest level, debug, is a bit more magic, see: Section 17.4.3, “pr_debug” for more info.

    -

    16.4.1. /proc/sys/kernel/printk

    +

    17.4.1. /proc/sys/kernel/printk

    The current printk level can be obtained with:

    @@ -11622,7 +11654,7 @@ early_param("quiet", quiet_kernel);
    -

    16.4.2. ignore_loglevel

    +

    17.4.2. ignore_loglevel

    ./run --kernel-cli 'ignore_loglevel'
    @@ -11641,7 +11673,7 @@ early_param("quiet", quiet_kernel);
    -

    16.4.3. pr_debug

    +

    17.4.3. pr_debug

    @@ -11739,7 +11771,7 @@ insmod myprintk.ko

    Get ready for the noisiest boot ever: I think it overflows the printk buffer and funny things happen.

    -
    16.4.3.1. pr_debug != printk(KERN_DEBUG
    +
    17.4.3.1. pr_debug != printk(KERN_DEBUG

    When CONFIG_DYNAMIC_DEBUG is set, printk(KERN_DEBUG is not exactly the same as pr_debug(, since printk(KERN_DEBUG messages are visible with:

    @@ -11788,9 +11820,9 @@ insmod myprintk.ko
    -

    16.5. Kernel module APIs

    +

    17.5. Kernel module APIs

    -

    16.5.1. Kernel module parameters

    +

    17.5.1. Kernel module parameters

    The Linux kernel allows passing module parameters at insertion time through the init_module and finit_module system calls.

    @@ -11863,7 +11895,7 @@ parm: i:my favorite int
    -
    16.5.1.1. modprobe.conf
    +
    17.5.1.1. modprobe.conf

    modprobe insertion can also set default parameters via the /etc/modprobe.conf file:

    @@ -11890,7 +11922,7 @@ cat /sys/kernel/debug/lkmc_params
    -

    16.5.2. Kernel module dependencies

    +

    17.5.2. Kernel module dependencies

    One module can depend on symbols of another module that are exported with EXPORT_SYMBOL:

    @@ -12003,7 +12035,7 @@ extra/dep.ko:

    TODO: what for, and at which point does Buildroot / BusyBox generate that file?

    -
    16.5.2.1. Kernel module dependencies with modprobe
    +
    17.5.2.1. Kernel module dependencies with modprobe

    Unlike insmod, modprobe deals with kernel module dependencies for us.
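The dependency resolution that modprobe does for us can be sketched in a few lines of Python: parse modules.dep, where each line has the form "module: dependency ...", then insert dependencies before the modules that need them. This is a simplified model for illustration, not how modprobe is actually implemented, and the sample paths are hypothetical ones modeled on extra/dep.ko, assuming dep2 depends on dep.

```python
def parse_modules_dep(text):
    """Map each module path to the list of module paths it depends on."""
    deps = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        module, _, rest = line.partition(":")
        deps[module.strip()] = rest.split()
    return deps


def load_order(module, deps):
    """Return modules in insertion order: dependencies before dependents."""
    order = []
    visiting = set()

    def visit(m):
        if m in order:
            return
        if m in visiting:
            raise ValueError("dependency cycle involving " + m)
        visiting.add(m)
        for dep in deps.get(m, []):
            visit(dep)
        visiting.remove(m)
        order.append(m)

    visit(module)
    return order


if __name__ == "__main__":
    sample = "extra/dep.ko:\nextra/dep2.ko: extra/dep.ko\n"
    print(load_order("extra/dep2.ko", parse_modules_dep(sample)))
    # ['extra/dep.ko', 'extra/dep2.ko']
```

Removal (modprobe -r) would simply walk the same list in reverse.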

    @@ -12089,7 +12121,7 @@ buildroot_dep 16384 1 buildroot_dep2
    -

    16.5.3. MODULE_INFO

    +

    17.5.3. MODULE_INFO

    Module metadata is stored on module files at compile time. Some of the fields can be retrieved through the THIS_MODULE struct module:

    @@ -12209,7 +12241,7 @@ vermagic: 4.17.0 SMP mod_unload modversions
    -

    16.5.4. vermagic

    +

    17.5.4. vermagic

    kernel_modules/vermagic.c
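The vermagic string, like the other MODULE_INFO fields, ends up in the .modinfo ELF section of the .ko file as a sequence of NUL-terminated key=value strings, which is what modinfo reads. Here is a minimal Python sketch that decodes such a section, assuming you have already dumped its raw bytes, for example with objcopy -O binary --only-section=.modinfo vermagic.ko modinfo.bin. Keys such as parm can repeat, so values are kept in lists.

```python
import sys


def parse_modinfo(section):
    """Decode raw .modinfo bytes into {key: [values]}."""
    info = {}
    for entry in section.split(b"\0"):
        if not entry:
            continue  # skip padding between entries
        key, sep, value = entry.partition(b"=")
        if sep:
            info.setdefault(key.decode(), []).append(value.decode())
    return info


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for key, values in parse_modinfo(f.read()).items():
            for value in values:
                print("%s: %s" % (key, value))
```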

    @@ -12304,7 +12336,7 @@ vermagic: 4.17.0 SMP mod_unload modversions
    -

    16.5.5. init_module

    +

    17.5.5. init_module

    init_module and cleanup_module are an older alternative to the module_init and module_exit macros:

    @@ -12331,7 +12363,7 @@ cleanup_module
    -

    16.5.6. Floating point in kernel modules

    +

    17.5.6. Floating point in kernel modules

    It is generally hard / impossible to use floating point operations in the kernel. TODO understand details.

    @@ -12402,7 +12434,7 @@ cleanup_module
    -

    16.6. Kernel panic and oops

    +

    17.6. Kernel panic and oops

    To test out kernel panics and oops in controlled circumstances, try out the modules:

    @@ -12466,7 +12498,7 @@ insmod oops.ko
    -

    16.6.1. Kernel panic

    +

    17.6.1. Kernel panic

    On panic, the kernel dies, and so does our terminal.

    @@ -12516,7 +12548,7 @@ Kernel Offset: disabled
    -
    16.6.1.1. Kernel module stack trace to source line
    +
    17.6.1.1. Kernel module stack trace to source line

    The log shows which module each symbol belongs to if any, e.g.:

    @@ -12582,25 +12614,25 @@ Kernel Offset: disabled
    -
    16.6.1.2. BUG_ON
    +
    17.6.1.2. BUG_ON

    Basically just calls panic("BUG!") for most archs.

    -
    16.6.1.3. Exit emulator on panic
    +
    17.6.1.3. Exit emulator on panic

    For testing purposes, it is very useful to quit the emulator automatically with a non-zero exit status in case of a kernel panic, instead of just hanging forever.

    -
    16.6.1.3.1. Exit QEMU on panic
    +
    17.6.1.3.1. Exit QEMU on panic

    Enabled by default with:

    -
    16.6.1.3.2. Exit gem5 on panic
    +
    17.6.1.3.2. Exit gem5 on panic

    gem5 9048ef0ffbf21bedb803b785fb68f83e95c04db8 (January 2019) can detect panics automatically if the option system.panic_on_panic is on.

    @@ -12679,7 +12711,7 @@ Kernel Offset: disabled
    -
    16.6.1.4. Reboot on panic
    +
    17.6.1.4. Reboot on panic

    Make the kernel reboot n seconds after a panic:

    @@ -12709,7 +12741,7 @@ Kernel Offset: disabled
    -
    16.6.1.5. Panic trace show addresses instead of symbols
    +
    17.6.1.5. Panic trace show addresses instead of symbols

    If CONFIG_KALLSYMS=n, then addresses are shown on traces instead of symbol plus offset.
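To see what kallsyms buys us, here is a rough Python model of turning a raw address into the symbol+offset/size form seen on traces, using a symbol table such as the System.map produced by the kernel build. This is an illustration only: the kernel does this lookup internally, and approximating a symbol's size as the distance to the next symbol is our own simplification.

```python
import bisect


def parse_system_map(text):
    """Parse System.map lines such as 'ffffffff81000000 T _text'."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    return symbols


def symbolize(addr, symbols):
    """Format addr as 'name+0xoff/0xsize', or as a bare hex address
    (what traces show with CONFIG_KALLSYMS=n) when no symbol covers it."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    # Before the first symbol, or inside the last one, whose size is unknown.
    if i < 0 or i + 1 >= len(symbols):
        return hex(addr)
    start, name = symbols[i]
    size = symbols[i + 1][0] - start
    return "%s+%#x/%#x" % (name, addr - start, size)


if __name__ == "__main__":
    syms = parse_system_map(
        "0000000000001000 T foo\n"
        "0000000000001040 T bar\n"
        "0000000000001100 T baz\n"
    )
    print(symbolize(0x1010, syms))  # foo+0x10/0x40
```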

    @@ -12748,7 +12780,7 @@ Kernel Offset: disabled
    -

    16.6.2. Kernel oops

    +

    17.6.2. Kernel oops

    On oops, the shell still lives after.

    @@ -12865,7 +12897,7 @@ CR2: 0000000000000000
    -

    16.6.3. dump_stack

    +

    17.6.3. dump_stack

    The dump_stack function produces a stack trace much like panic and oops, but causes no problems; control returns to the normal flow, and the module can be cleanly removed afterwards:

    @@ -12879,7 +12911,7 @@ CR2: 0000000000000000
    -

    16.6.4. WARN_ON

    +

    17.6.4. WARN_ON

    The WARN_ON macro basically just calls dump_stack.

    @@ -12900,7 +12932,7 @@ insmod warn_on.ko
    -

    16.6.5. not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

    +

    17.6.5. not syncing: VFS: Unable to mount root fs on unknown-block(0,0)

    Let’s learn how to diagnose problems with the root filesystem not being found. TODO add a sample panic error message for each error type:

    @@ -13051,7 +13083,7 @@ CONFIG_VIRTIO_PCI=y
    -

    16.7. Pseudo filesystems

    +

    17.7. Pseudo filesystems

    Pseudo filesystems are filesystems that don’t represent actual files in a hard disk, but rather allow us to do special operations on filesystem-related system calls.

    @@ -13072,7 +13104,7 @@ CONFIG_VIRTIO_PCI=y
    -

    16.7.1. debugfs

    +

    17.7.1. debugfs

    Debugfs is the simplest pseudo filesystem to play around with:

    @@ -13136,7 +13168,7 @@ echo $?
    -

    16.7.2. procfs

    +

    17.7.2. procfs

    Procfs is just another fops entry point:

    @@ -13187,7 +13219,7 @@ echo $?
    -
    16.7.2.1. /proc/version
    +
    17.7.2.1. /proc/version

    Its data is shared with uname(), which is a POSIX C function and has a Linux syscall to back it up.
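A small Python check of that claim: extract the release field from /proc/version and compare it with what the uname() call returns through Python's os.uname() wrapper. The parsing assumes the usual "Linux version <release> ..." prefix.

```python
import os


def release_from_proc_version(text):
    """Extract the kernel release from /proc/version contents."""
    prefix = "Linux version "
    if not text.startswith(prefix):
        raise ValueError("unexpected /proc/version format: %r" % text[:40])
    return text[len(prefix):].split()[0]


if __name__ == "__main__" and os.path.exists("/proc/version"):
    with open("/proc/version") as f:
        proc_release = release_from_proc_version(f.read())
    # Both values should come from the same kernel utsname data.
    print(proc_release, os.uname().release, proc_release == os.uname().release)
```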

    @@ -13228,7 +13260,7 @@ echo $?
    -

    16.7.3. sysfs

    +

    17.7.3. sysfs

    Sysfs is more restricted than procfs, as it does not take an arbitrary file_operations:

    @@ -13308,7 +13340,7 @@ echo $?
    -

    16.7.4. Character devices

    +

    17.7.4. Character devices

    Character devices can have arbitrary File operations associated with them:

    @@ -13402,7 +13434,7 @@ echo $?

    Bibliography: https://unix.stackexchange.com/questions/37829/understanding-character-device-or-character-special-files/371758#371758

    -
    16.7.4.1. Automatically create character device file on insmod
    +
    17.7.4.1. Automatically create character device file on insmod

    And also destroy it on rmmod:

    @@ -13440,9 +13472,9 @@ echo $?
    -

    16.8. Pseudo files

    +

    17.8. Pseudo files

    -

    16.8.1. File operations

    +

    17.8.1. File operations

    File operations are the main method of userland driver communication.

    @@ -13495,7 +13527,7 @@ echo $?
    -

    16.8.2. seq_file

    +

    17.8.2. seq_file

    Writing trivial read File operations is repetitive and error prone. The seq_file API makes the process much easier for those trivial cases:

    @@ -13556,7 +13588,7 @@ echo $?
    -
    16.8.2.1. seq_file single_open
    +
    17.8.2.1. seq_file single_open

    If you have the entire read output upfront, single_open is an even more convenient version of seq_file:

    @@ -13599,7 +13631,7 @@ cd
    -

    16.8.3. poll

    +

    17.8.3. poll

    The poll system call allows a user process to do a non-busy wait on a kernel event.
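The userland side of this pattern can be sketched with Python's select.poll wrapper. To keep the sketch self-contained, it polls a pipe rather than one of our kernel module files; with a module, the driver implements the poll file operation instead, typically calling poll_wait() and returning a mask that includes EPOLLIN when data is ready.

```python
import os
import select


def wait_readable(fd, timeout_ms):
    """Sleep in the kernel until fd is readable or the timeout expires.

    Returns True if POLLIN was reported, False on timeout.
    """
    p = select.poll()
    p.register(fd, select.POLLIN)
    events = p.poll(timeout_ms)  # no busy loop: the process is put to sleep
    return any(ev & select.POLLIN for _, ev in events)


if __name__ == "__main__":
    r, w = os.pipe()
    print(wait_readable(r, 100))  # False: nothing written yet, times out
    os.write(w, b"x")
    print(wait_readable(r, 100))  # True
```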

    @@ -13704,7 +13736,7 @@ POLLIN n=10 buf=4294893839
    -

    16.8.4. ioctl

    +

    17.8.4. ioctl

    The ioctl system call is the best way to pass an arbitrary number of parameters to the kernel in a single go:

    @@ -13801,7 +13833,7 @@ echo $?
    -

    16.8.5. mmap

    +

    17.8.5. mmap

    The mmap system call allows us to share memory between user and kernel space without copying:

    @@ -13871,7 +13903,7 @@ echo $?
    -

    16.8.6. Anonymous inode

    +

    17.8.6. Anonymous inode

    Anonymous inodes allow getting multiple file descriptors from a single filesystem entry, which reduces namespace pollution compared to creating multiple device files:

    @@ -13919,7 +13951,7 @@ echo $?
    - +

    Netlink sockets offer a socket API for kernel / userland communication:

    @@ -13984,7 +14016,7 @@ for i in `seq 16`; do ./netlink.out & done
    -

    16.9. kthread

    +

    17.9. kthread

    Kernel threads are managed exactly like userland threads; they also have a backing task_struct, and are scheduled with the same mechanism:

    @@ -14022,7 +14054,7 @@ for i in `seq 16`; do ./netlink.out & done
    -

    The sleep is done with usleep_range, see: Section 16.9.2, “sleep”.

    +

    The sleep is done with usleep_range, see: Section 17.9.2, “sleep”.

    Bibliography:

    @@ -14038,7 +14070,7 @@ for i in `seq 16`; do ./netlink.out & done
    -

    16.9.1. kthreads

    +

    17.9.1. kthreads

    Let’s launch two threads and see if they actually run in parallel:

    @@ -14081,7 +14113,7 @@ for i in `seq 16`; do ./netlink.out & done
    -

    16.9.2. sleep

    +

    17.9.2. sleep

    Print a count to dmesg once per second, from 0 up to n - 1:

    @@ -14111,7 +14143,7 @@ for i in `seq 16`; do ./netlink.out & done
    -

    16.9.3. Workqueues

    +

    17.9.3. Workqueues

    A more convenient front-end for kthread:

    @@ -14144,7 +14176,7 @@ for i in `seq 16`; do ./netlink.out & done

    Bibliography: https://github.com/torvalds/linux/blob/v4.17/Documentation/core-api/workqueue.rst

    -
    16.9.3.1. Workqueue from workqueue
    +
    17.9.3.1. Workqueue from workqueue

    Count from 0 to 9 every second infinitely many times by scheduling a new work item from a work item:

    @@ -14170,7 +14202,7 @@ for i in `seq 16`; do ./netlink.out & done
    -

    16.9.4. schedule

    +

    17.9.4. schedule

    Let’s block the entire kernel! Yay:

    @@ -14217,7 +14249,7 @@ for i in `seq 16`; do ./netlink.out & done
    -

    16.9.5. Wait queues

    +

    17.9.5. Wait queues

    Wait queues are a way to make a thread sleep until an event happens on the queue:

    @@ -14281,7 +14313,7 @@ for i in `seq 16`; do ./netlink.out & done
    -

    16.10. Timers

    +

    17.10. Timers

    Count from 0 to 9 infinitely many times in 1 second intervals using timers:

    @@ -14322,9 +14354,9 @@ for i in `seq 16`; do ./netlink.out & done
    -

    16.11. IRQ

    +

    17.11. IRQ

    -

    16.11.1. irq.ko

    +

    17.11.1. irq.ko

    Brute force monitor every shared interrupt that will accept us:

    @@ -14430,7 +14462,7 @@ request_irq irq = 1 ret = 0
    -

    16.11.2. dummy-irq

    +

    17.11.2. dummy-irq

    The Linux kernel v4.16 mainline also has a dummy-irq module at drivers/misc/dummy-irq.c for monitoring a single IRQ.

    @@ -14485,7 +14517,7 @@ request_irq irq = 1 ret = 0
    -

    16.11.3. /proc/interrupts

    +

    17.11.3. /proc/interrupts

    In the guest with QEMU graphic mode:

    @@ -14519,12 +14551,12 @@ request_irq irq = 1 ret = 0
    -

    16.12. Kernel utility functions

    +

    17.12. Kernel utility functions

    https://github.com/torvalds/linux/blob/v4.17/Documentation/core-api/kernel-api.rst

    -

    16.12.1. kstrto

    +

    17.12.1. kstrto

    Convert a string to an integer:

    @@ -14560,7 +14592,7 @@ echo $?
    -

    16.12.2. virt_to_phys

    +

    17.12.2. virt_to_phys

    Convert a virtual address to physical:

    @@ -14630,7 +14662,7 @@ virt_to_phys(&static_var) = 0x40002308
    -
    16.12.2.1. Userland physical address experiments
    +
    17.12.2.1. Userland physical address experiments

    Only tested in x86_64.

    @@ -14751,7 +14783,7 @@ pid 110
    -
    16.12.2.1.1. QEMU xp
    +
    17.12.2.1.1. QEMU xp

    The xp QEMU monitor command reads memory at a given physical address.

    @@ -14782,7 +14814,7 @@ pid 110
    -
    16.12.2.1.2. /dev/mem
    +
    17.12.2.1.2. /dev/mem

    /dev/mem exposes access to physical addresses, and we use it through the convenient devmem BusyBox utility.

    @@ -14858,7 +14890,7 @@ Value at address 0X7C7B800 (0x7ff7dbe01800): 0x12345678
    -
    16.12.2.1.3. pagemap_dump.out
    +
    17.12.2.1.3. pagemap_dump.out

    Dump the physical address of all pages mapped to a given process using /proc/<pid>/maps and /proc/<pid>/pagemap.
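The core of that translation fits in a few lines. Each virtual page has one 64-bit entry in /proc/<pid>/pagemap: bits 0-54 hold the page frame number, bit 62 means swapped and bit 63 means present, so the physical address is the frame number times the page size plus the offset inside the page. Here is a Python sketch; note that without CAP_SYS_ADMIN, recent kernels report the frame number as zero, so run it as root in the guest. The worked example uses a hypothetical entry chosen to reproduce the 0x7ff7dbe01800 to 0X7C7B800 mapping from the /dev/mem output, assuming 4 KiB pages.

```python
import ctypes
import os
import struct

PFN_MASK = (1 << 55) - 1  # bits 0-54: page frame number
PAGE_SWAPPED = 1 << 62
PAGE_PRESENT = 1 << 63


def read_pagemap_entry(pid, vaddr, page_size):
    """Read the raw 64-bit pagemap entry that covers vaddr in process pid."""
    with open("/proc/%s/pagemap" % pid, "rb") as f:
        f.seek(vaddr // page_size * 8)
        return struct.unpack("=Q", f.read(8))[0]


def phys_addr(entry, vaddr, page_size):
    """Physical address for vaddr, or None if the page is not in RAM."""
    if not entry & PAGE_PRESENT or entry & PAGE_SWAPPED:
        return None
    return (entry & PFN_MASK) * page_size + vaddr % page_size


if __name__ == "__main__":
    # Worked example with a hypothetical entry: PFN 0x7c7b, 4 KiB pages.
    print(hex(phys_addr(PAGE_PRESENT | 0x7c7b, 0x7ff7dbe01800, 4096)))  # 0x7c7b800
    buf = ctypes.create_string_buffer(b"hello")  # make sure the page is mapped
    vaddr = ctypes.addressof(buf)
    page_size = os.sysconf("SC_PAGE_SIZE")
    try:
        entry = read_pagemap_entry("self", vaddr, page_size)
        print(hex(vaddr), phys_addr(entry, vaddr, page_size))
    except OSError as e:
        print("cannot read pagemap:", e)
```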

    @@ -15029,7 +15061,7 @@ pid 63
    -

    16.13. Linux kernel tracing

    +

    17.13. Linux kernel tracing

    Good overviews:

    @@ -15047,7 +15079,7 @@ pid 63

    I hope to have examples of all methods some day, since I’m obsessed with visibility.

    -

    16.13.1. CONFIG_PROC_EVENTS

    +

    17.13.1. CONFIG_PROC_EVENTS

    Logs proc events such as process creation to a netlink socket.

    @@ -15114,7 +15146,7 @@ a
    -
    16.13.1.1. CONFIG_PROC_EVENTS aarch64
    +
    17.13.1.1. CONFIG_PROC_EVENTS aarch64

    0111ca406bdfa6fd65a2605d353583b4c4051781 was failing with:

    @@ -15184,7 +15216,7 @@ make: *** [_all] Error 2
    -

    16.13.2. ftrace

    +

    17.13.2. ftrace

    Trace a single function:

    @@ -15292,13 +15324,13 @@ echo function_graph > current_tracer

    TODO: can you get function arguments? https://stackoverflow.com/questions/27608752/does-ftrace-allow-capture-of-system-call-arguments-to-the-linux-kernel-or-only

    -
    16.13.2.1. ftrace system calls
    +
    17.13.2.1. ftrace system calls

    https://stackoverflow.com/questions/29840213/how-do-i-trace-a-system-call-in-linux/51856306#51856306

    -
    16.13.2.2. trace-cmd
    +
    17.13.2.2. trace-cmd

    TODO example:

    @@ -15310,7 +15342,7 @@ echo function_graph > current_tracer
    -

    16.13.3. Kprobes

    +

    17.13.3. Kprobes

    kprobes is an instrumentation mechanism that injects arbitrary code at a given address in a trap instruction, much like GDB. Oh, the good old kernel. :-)

    @@ -15375,7 +15407,7 @@ sleep 4 & sleep 4 &
    -

    16.13.4. Count boot instructions

    +

    17.13.4. Count boot instructions

    TODO: didn’t port during refactor after 3b0a343647bed577586989fb702b760bd280844a. Reimplementing should not be hard.

    @@ -15567,12 +15599,12 @@ instructions_firmware 20708
    -

    16.14. Linux kernel hardening

    +

    17.14. Linux kernel hardening

    Make it harder to get hacked and easier to notice that you were, at the cost of some (small?) runtime overhead.

    -

    16.14.1. CONFIG_FORTIFY_SOURCE

    +

    17.14.1. CONFIG_FORTIFY_SOURCE

    Detects buffer overflows for us:

    @@ -15617,12 +15649,12 @@ detected buffer overflow in strlen
    -

    16.14.2. Linux security modules

    +

    17.14.2. Linux security modules

    https://en.wikipedia.org/wiki/Linux_Security_Modules

    -
    16.14.2.1. SELinux
    +
    17.14.2.1. SELinux

    TODO get a hello world permission control working:

    @@ -15713,13 +15745,13 @@ detected buffer overflow in strlen
    -

    SELinux requires glibc as mentioned at: Section 25.10, “libc choice”.

    +

    SELinux requires glibc as mentioned at: Section 26.10, “libc choice”.

    -

    16.15. User mode Linux

    +

    17.15. User mode Linux

    I once got UML running on a minimal Buildroot setup at: https://unix.stackexchange.com/questions/73203/how-to-create-rootfs-for-user-mode-linux-on-fedora-18/372207#372207

    @@ -15731,7 +15763,7 @@ detected buffer overflow in strlen
    -

    16.16. UIO

    +

    17.16. UIO

    UIO is a kernel subsystem that allows certain types of driver operations to be done from userland.

    @@ -15839,9 +15871,9 @@ detected buffer overflow in strlen
    -

    16.17. Linux kernel interactive stuff

    +

    17.17. Linux kernel interactive stuff

    -

    16.17.1. Linux kernel console fun

    +

    17.17.1. Linux kernel console fun

    Requires Graphics.

    @@ -15885,7 +15917,7 @@ detected buffer overflow in strlen
    -

    16.17.2. Linux kernel magic keys

    +

    17.17.2. Linux kernel magic keys

    Requires Graphics.

    @@ -15922,7 +15954,7 @@ sendkey shift-pgdown
    -
    16.17.2.1. Ctrl Alt Del
    +
    17.17.2.1. Ctrl Alt Del

    If you run in QEMU graphic mode:

    @@ -15956,7 +15988,7 @@ sendkey shift-pgdown
    -

    This leads Linux to try to reboot, and QEMU shuts down due to the -no-reboot option, which we set by default, see: Section 16.6.1.3, “Exit emulator on panic”.

    +

    This leads Linux to try to reboot, and QEMU shuts down due to the -no-reboot option, which we set by default, see: Section 17.6.1.3, “Exit emulator on panic”.

    Here is a minimal example of Ctrl Alt Del:

    @@ -16140,7 +16172,7 @@ static void halt_reboot_pwoff(int sig)
    -
    16.17.2.2. SysRq
    +
    17.17.2.2. SysRq

    We cannot test these actual shortcuts on QEMU since the host captures them at a lower level, but from:

    @@ -16211,7 +16243,7 @@ static void halt_reboot_pwoff(int sig)
    -

    16.17.3. TTY

    +

    17.17.3. TTY

    In order to play with TTYs, do this:

    @@ -16450,7 +16482,7 @@ tty63::respawn:-/bin/sh
    -
    16.17.3.1. Start a getty from outside of init
    +
    17.17.3.1. Start a getty from outside of init

    TODO: https://unix.stackexchange.com/questions/196704/getty-start-from-command-line

    @@ -16507,7 +16539,7 @@ tty63::respawn:-/bin/sh
    -
    16.17.3.2. console kernel boot parameter
    +
    17.17.3.2. console kernel boot parameter

    Take the command described at TTY and try adding the following:

    @@ -16543,7 +16575,7 @@ tty63::respawn:-/bin/sh
    - +

    If you run in Graphics, then you get a Penguin image for every core above the console! https://askubuntu.com/questions/80938/is-it-possible-to-get-the-tux-logo-on-the-text-based-boot

    @@ -16587,7 +16619,7 @@ tty63::respawn:-/bin/sh
    -

    16.18. DRM

    +

    17.18. DRM

    DRM / DRI is the new interface that supersedes fbdev:

    @@ -16670,7 +16702,7 @@ crw------- 1 root root 226, 0 May 28 09:41 card0

    Tested on: 93e383902ebcc03d8a7ac0d65961c0e62af9612b

    -

    16.18.1. kmscube

    +

    17.18.1. kmscube

    ./build-buildroot --config-fragment buildroot_config/kmscube
    @@ -16745,7 +16777,7 @@ failed to initialize legacy DRM
    -

    16.18.2. kmscon

    +

    17.18.2. kmscon

    TODO get working.

    @@ -16766,7 +16798,7 @@ failed to initialize legacy DRM
    -

    16.18.3. libdri2

    +

    17.18.3. libdri2

    TODO get working.

    @@ -16792,12 +16824,12 @@ wget \
    -

    16.19. Linux kernel testing

    +

    17.19. Linux kernel testing

    Bibliography: https://stackoverflow.com/questions/3177338/how-is-the-linux-kernel-tested

    -

    16.19.1. Linux Test Project

    +

    17.19.1. Linux Test Project

    @@ -16844,7 +16876,7 @@ wget \
    -

    16.19.2. stress

    +

    17.19.2. stress

    POSIX userland stress. Two versions:

    @@ -16857,7 +16889,7 @@ wget \
    -

    STRESS_NG is likely the best, but it requires glibc, see: Section 25.10, “libc choice”.

    +

    STRESS_NG is likely the best, but it requires glibc, see: Section 26.10, “libc choice”.

    Websites:

    @@ -16899,9 +16931,9 @@ ps
    -

    16.20. Linux kernel build system

    +

    17.20. Linux kernel build system

    -

    16.20.1. vmlinux vs bzImage vs zImage vs Image

    +

    17.20.1. vmlinux vs bzImage vs zImage vs Image

    Across all archs on QEMU and gem5, we use every one of those kernel build output files.

    @@ -16917,7 +16949,7 @@ ps
    -

    16.21. Virtio

    +

    17.21. Virtio

    https://www.linux-kvm.org/page/Virtio

    @@ -16935,9 +16967,9 @@ ps
    -

    16.22. Kernel modules

    +

    17.22. Kernel modules

    -

    16.22.1. dump_regs

    +

    17.22.1. dump_regs

    The following kernel modules and Baremetal executables dump and disassemble various registers which cannot be observed from userland (usually "system registers", "control registers"):

    @@ -17015,7 +17047,7 @@ ps
    -

    17. FreeBSD

    +

    18. FreeBSD

    https://en.wikipedia.org/wiki/FreeBSD

    @@ -17029,13 +17061,13 @@ ps
    -

    18. RTOS

    +

    19. RTOS

    -

    18.2. ARM Mbed

    +

    19.2. ARM Mbed

    @@ -17089,7 +17121,7 @@ west build -b qemu_aarch64 samples/hello_world
    -

    19. Xen

    +

    20. Xen

    https://en.wikipedia.org/wiki/Xen

    @@ -17145,7 +17177,7 @@ west build -b qemu_aarch64 samples/hello_world
    -

    so as long as we craft the correct DTB and feed it into Xen so that it can see the kernel, it should work. TODO: does QEMU support patching the auto-generated DTB with pre-generated options? In the worst case, we can just dump it with -machine dumpdtb and hack it up by hand, see: Section 8.4, “Device tree emulator generation”.

    +

    so as long as we craft the correct DTB and feed it into Xen so that it can see the kernel, it should work. TODO: does QEMU support patching the auto-generated DTB with pre-generated options? In the worst case, we can just dump it with -machine dumpdtb and hack it up by hand, see: Section 9.4, “Device tree emulator generation”.

    Bibliography:

    @@ -17169,7 +17201,7 @@ west build -b qemu_aarch64 samples/hello_world
    -

    20. U-Boot

    +

    21. U-Boot

    https://en.wikipedia.org/wiki/Das_U-Boot

    @@ -17183,7 +17215,7 @@ west build -b qemu_aarch64 samples/hello_world
    -

    21. Emulators

    +

    22. Emulators

    https://en.wikipedia.org/wiki/Emulator

    @@ -17204,10 +17236,10 @@ west build -b qemu_aarch64 samples/hello_world
    -

    22. QEMU

    +

    23. QEMU

    -

    22.1. Introduction to QEMU

    +

    23.1. Introduction to QEMU

    QEMU is a system simulator: it simulates a CPU and devices such as interrupt handlers, timers, UART, screen, keyboard, etc.

    @@ -17237,7 +17269,7 @@ west build -b qemu_aarch64 samples/hello_world
    -

    22.2. Binary translation

    +

    23.2. Binary translation

    @@ -17246,7 +17278,7 @@ west build -b qemu_aarch64 samples/hello_world
    -

    22.3. Disk persistency

    +

    23.3. Disk persistency

    We disable disk persistency for both QEMU and gem5 by default, to prevent the emulator from putting the image in an unknown state.

    @@ -17301,7 +17333,7 @@ west build -b qemu_aarch64 samples/hello_world

    Disk persistency is useful to re-run shell commands from the history of a previous session with Ctrl-R, but we felt that the loss of determinism was not worth it.

    -

    22.3.1. gem5 disk persistency

    +

    23.3.1. gem5 disk persistency

    TODO how to make gem5 disk writes persistent?

    @@ -17331,7 +17363,7 @@ index 17498c42b..76b8b351d 100644
    -

    22.4. gem5 qcow2

    +

    23.4. gem5 qcow2

    qcow2 does not appear to be supported; there are no hits in the source tree, and it is mentioned on Nate’s 2009 wishlist: http://gem5.org/Nate%27s_Wish_List

    @@ -17340,7 +17372,7 @@ index 17498c42b..76b8b351d 100644
    -

    22.5. Snapshot

    +

    23.5. Snapshot

    QEMU allows us to take snapshots at any time through the monitor.

    @@ -17438,7 +17470,7 @@ index 17498c42b..76b8b351d 100644

    Bibliography: https://stackoverflow.com/questions/40227651/does-qemu-emulator-have-checkpoint-function/48724371#48724371

    -

    22.5.1. Snapshot internals

    +

    23.5.1. Snapshot internals

    Snapshots are stored inside the .qcow2 images themselves.
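One way to convince yourself of this is to look at the qcow2 header, which, according to QEMU's qcow2 format specification, stores the number of internal snapshots and the offset of the snapshot table. Here is a Python sketch that decodes the fixed 72-byte part of the header, where all fields are big endian; qemu-img snapshot -l image.qcow2 remains the normal way to list the snapshots themselves.

```python
import struct
import sys

# Fixed part of the qcow2 header: 72 bytes, big endian.
QCOW2_HEADER = struct.Struct(">4sIQIIQIIQQIIQ")
QCOW2_FIELDS = (
    "magic", "version", "backing_file_offset", "backing_file_size",
    "cluster_bits", "size", "crypt_method", "l1_size", "l1_table_offset",
    "refcount_table_offset", "refcount_table_clusters",
    "nb_snapshots", "snapshots_offset",
)


def read_qcow2_header(data):
    """Decode the fixed qcow2 header fields from the start of an image."""
    header = dict(zip(QCOW2_FIELDS, QCOW2_HEADER.unpack_from(data)))
    if header["magic"] != b"QFI\xfb":
        raise ValueError("not a qcow2 image")
    return header


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        header = read_qcow2_header(f.read(QCOW2_HEADER.size))
    print("version", header["version"], "snapshots", header["nb_snapshots"])
```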

    @@ -17487,7 +17519,7 @@ Format specific information:
    -

    22.6. Device models

    +

    23.6. Device models

    This section documents:

    @@ -17519,12 +17551,12 @@ Format specific information:
    -

    22.6.1. PCI

    +

    23.6.1. PCI

    Only tested in x86.

    -
    22.6.1.1. QEMU edu PCI device
    +
    23.6.1.1. QEMU edu PCI device

    Small upstream educational PCI device:

    @@ -17600,7 +17632,7 @@ Format specific information:
    -
    22.6.1.2. Manipulate PCI registers directly
    +
    23.6.1.2. Manipulate PCI registers directly

    In this section we will try to interact with PCI devices directly from userland without kernel modules.
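As a starting point, sysfs exposes each device's configuration space as a binary file, e.g. /sys/bus/pci/devices/0000:00:04.0/config (that device address is hypothetical, check lspci for yours); non-root users can typically only read the first 64 bytes. The Python sketch below decodes standard type 0 header fields from those bytes; for the QEMU edu device, the vendor and device IDs should read 1234:11e8.

```python
import struct
import sys


def parse_pci_config(cfg):
    """Decode fields of a type 0 PCI configuration header (little endian)."""
    vendor, device, command, status, revision = struct.unpack_from("<HHHHB", cfg, 0)
    return {
        "vendor": vendor,
        "device": device,
        "command": command,
        "status": status,
        "revision": revision,
        # Class code bytes: programming interface, subclass, base class.
        "prog_if": cfg[0x09],
        "subclass": cfg[0x0a],
        "class": cfg[0x0b],
        "header_type": cfg[0x0e] & 0x7f,  # bit 7 flags a multi-function device
        "bars": struct.unpack_from("<6I", cfg, 0x10),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        info = parse_pci_config(f.read(64))
    print("%04x:%04x" % (info["vendor"], info["device"]))
    for i, bar in enumerate(info["bars"]):
        print("BAR%d 0x%08x" % (i, bar))
```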

    @@ -17734,7 +17766,7 @@ devmem 0xfeb54000 w 0x12345678
    -
    22.6.1.3. pciutils
    +
    23.6.1.3. pciutils

    There are two versions of setpci and lspci:

    @@ -17750,7 +17782,7 @@ devmem 0xfeb54000 w 0x12345678
    -
    22.6.1.4. Introduction to PCI
    +
    23.6.1.4. Introduction to PCI

    The PCI standard is non-free, obviously, like everything low level: https://pcisig.com/specifications but Google gives several illegal PDF hits :-)

    @@ -17810,7 +17842,7 @@ devmem 0xfeb54000 w 0x12345678
    -
    22.6.1.5. PCI BFD
    +
    23.6.1.5. PCI BFD

    lspci -k shows something like:

    @@ -17864,7 +17896,7 @@ devmem 0xfeb54000 w 0x12345678
    -
    22.6.1.6. PCI BAR
    +
    23.6.1.6. PCI BAR

    https://stackoverflow.com/questions/30190050/what-is-base-address-register-bar-in-pcie/44716618#44716618
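The low bits of a BAR tell how to interpret it, and its size is found by the usual probe: write all ones, read the value back, and invert the address bits. Here is a Python sketch of that decoding for 32-bit BARs. The 0xfeb54000 input is just an address like the one used with devmem, and you should not do the write-all-ones probe yourself on a device that is in use, since that is normally the job of the firmware or kernel.

```python
def decode_bar(bar):
    """Decode a 32-bit BAR register value."""
    if bar & 0x1:
        # I/O space BAR: bits 31..2 hold the port address.
        return {"space": "io", "address": bar & 0xfffffffc}
    return {
        "space": "memory",
        # Bits 2..1: 0b00 is 32-bit, 0b10 is 64-bit (upper half in the next BAR).
        "is_64bit": ((bar >> 1) & 0x3) == 0x2,
        "prefetchable": bool(bar & 0x8),
        "address": bar & 0xfffffff0,
    }


def memory_bar_size(readback):
    """Size of a 32-bit memory BAR from the value read back after writing 0xffffffff."""
    return ((~(readback & 0xfffffff0)) & 0xffffffff) + 1


if __name__ == "__main__":
    print(decode_bar(0xfeb54000))
    print(hex(memory_bar_size(0xfff00000)))  # 0x100000, i.e. 1 MiB
```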

    @@ -17906,7 +17938,7 @@ pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &edu->mmio);
    -

    22.6.2. GPIO

    +

    23.6.2. GPIO

    TODO: broken. Was working before we moved arm from -M versatilepb to -M virt around af210a76711b7fa4554dcc2abd0ddacfc810dfd4. Either make it work on -M virt if that is possible, or document precisely how to make it work with versatilepb, or hopefully vexpress which is newer.

    @@ -17949,7 +17981,7 @@ pci_register_bar(pdev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &edu->mmio);
    -

    22.6.3. LEDs

    +

    23.6.3. LEDs

    TODO: broken when arm moved to -M virt, same as GPIO.

    @@ -18021,7 +18053,7 @@ echo 255 >brightness
    -

    22.6.4. gem5 educational hardware models

    +

    23.6.4. gem5 educational hardware models

    TODO get some working!

    @@ -18031,7 +18063,7 @@ echo 255 >brightness
    -

    22.7. QEMU monitor

    +

    23.7. QEMU monitor

    The QEMU monitor is a magic terminal that allows you to send text commands to the QEMU VM itself: https://en.wikibooks.org/wiki/QEMU/Monitor

    @@ -18151,7 +18183,7 @@ echo 255 >brightness
    -

    22.7.1. QEMU monitor from guest

    +

    23.7.1. QEMU monitor from guest

    Peter Maydell said potentially not possible nicely as of August 2018: https://stackoverflow.com/questions/51747744/how-to-run-a-qemu-monitor-command-from-inside-the-guest/51764110#51764110

    @@ -18168,7 +18200,7 @@ echo 255 >brightness
    -

    22.7.2. QEMU monitor from GDB

    +

    23.7.2. QEMU monitor from GDB

    When doing GDB step debug it is possible to send QEMU monitor commands through the GDB monitor command, which saves you the trouble of opening yet another shell.

    @@ -18184,7 +18216,7 @@ monitor info qtree
    -

    22.8. Debug the emulator

    +

    23.8. Debug the emulator

    When you start hacking QEMU or gem5, it is useful to see what is going on inside the emulators themselves.

    @@ -18244,10 +18276,10 @@ run

    The build outputs are automatically stored in different directories for optimized and debug builds, which prevents debug files from overwriting opt ones. Therefore, --gem5-build-id is not required.

    -

    The price to pay for debuggability is high, however: a Linux kernel boot was about 3x slower in QEMU and 14x slower in gem5 with debug builds compared to opt, see benchmarks at: Section 34.2.1, “Benchmark Linux kernel boot”.

    +

    The price to pay for debuggability is high, however: a Linux kernel boot was about 3x slower in QEMU and 14x slower in gem5 with debug builds compared to opt, see benchmarks at: Section 35.2.1, “Benchmark Linux kernel boot”.

    -

    Similar slowdowns can be observed at: Section 34.2.2, “Benchmark emulators on userland executables”.

    +

    Similar slowdowns can be observed at: Section 35.2.2, “Benchmark emulators on userland executables”.

    When in QEMU text mode, using --debug-vm stops Ctrl-C from being passed to the QEMU guest: it is instead captured by GDB itself, which allows breaking into the debugger. So e.g. you won’t be able to easily quit from a guest program like:

    @@ -18264,7 +18296,7 @@ run

    You can still send key presses to QEMU however even without the mouse capture, just either click on the title bar, or alt tab to give it focus.

    -

    22.8.1. Reverse debug the emulator

    +

    23.8.1. Reverse debug the emulator

    While step debugging any complex program, you always end up feeling the need to step in reverse to reach the last call to some function that was called before the failure point, in order to trace back the problem to the actual bug source.

    @@ -18353,7 +18385,7 @@ reverse-next
    -

    22.8.2. Debug gem5 Python scripts

    +

    23.8.2. Debug gem5 Python scripts

    Start pdb at the first instruction:

    @@ -18387,7 +18419,7 @@ reverse-next
    -

    22.9. Tracing

    +

    23.9. Tracing

    QEMU can log several different events.

    @@ -18478,7 +18510,7 @@ Call Trace:
    -

    22.9.1. QEMU -d tracing

    +

    23.9.1. QEMU -d tracing

    QEMU also has a second trace mechanism in addition to -trace, find out the events with:

    @@ -18519,7 +18551,7 @@ IN:
    -

    22.9.2. QEMU trace register values

    +

    23.9.2. QEMU trace register values

    TODO: is it possible to show the register values for each instruction?

    @@ -18549,11 +18581,11 @@ IN:

    PANDA can list memory addresses, so I bet it can also decode the instructions: https://github.com/panda-re/panda/blob/883c85fa35f35e84a323ed3d464ff40030f06bd6/panda/docs/LINE_Censorship.md I wonder why they don’t just upstream those things to QEMU’s tracing: https://github.com/panda-re/panda/issues/290

    -

    gem5 can do it as shown at: Section 22.9.8, “gem5 tracing”.

    +

    gem5 can do it as shown at: Section 23.9.8, “gem5 tracing”.

    -

    22.9.3. QEMU trace memory accesses

    +

    23.9.3. QEMU trace memory accesses

    Apparently not possible, not even with the memory_region_ops_read and memory_region_ops_write trace events, as Peter comments at https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07482.html

    @@ -18572,7 +18604,7 @@ of guest operations.

    -

    22.9.4. Trace source lines

    +

    23.9.4. Trace source lines

    We can further use Binutils' addr2line to get the line that corresponds to each address:

    @@ -18628,7 +18660,7 @@ less "$(./getvar --arch x86_64 run_dir)/trace-lines.txt"
    -

    22.9.5. QEMU record and replay

    +

    23.9.5. QEMU record and replay

    Unlike gem5 runs, QEMU runs are not deterministic by default. However, QEMU does support a record and replay mechanism that allows you to replay a previous run deterministically.

    @@ -18735,7 +18767,7 @@ less "$(./getvar --arch x86_64 run_dir)/trace-lines.txt"

    Solved on unmerged c42634d8e3428cfa60672c3ba89cabefc720cde9 from https://github.com/ispras/qemu/tree/rr-180725

    -
    22.9.5.1. QEMU reverse debugging
    +
    23.9.5.1. QEMU reverse debugging

    TODO get working.

    @@ -18774,7 +18806,7 @@ reverse-continue
    -

    22.9.6. QEMU trace multicore

    +

    23.9.6. QEMU trace multicore

    TODO: is there any way to distinguish which instruction runs on each core? Doing:

    @@ -18789,13 +18821,13 @@ reverse-continue
    -

    22.9.7. QEMU get guest instruction count

    +

    23.9.7. QEMU get guest instruction count

    TODO: https://stackoverflow.com/questions/58766571/how-to-count-the-number-of-guest-instructions-qemu-executed-from-the-beginning-t

    -

    22.9.8. gem5 tracing

    +

    23.9.8. gem5 tracing

    gem5 also provides a tracing mechanism, documented at: http://www.gem5.org/Trace_Based_Debugging:

    @@ -18906,7 +18938,7 @@ less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt"

    TODO: 7452d399290c9c1fc6366cdad129ef442f323564 ./trace2line is too slow and takes hours. QEMU’s processing of 170k events takes 7 seconds. gem5’s processing is analogous, but there are 140M events, so it should take 7000 seconds ~ 2 hours, which seems consistent with what I observe, so maybe there is no way to speed this up… The workaround is to just use gem5’s ExecSymbol to get function granularity, and then use GDB individually if line detail is needed?

    -
    22.9.8.1. gem5 trace internals
    +
    23.9.8.1. gem5 trace internals

    gem5 traces are generated from DPRINTF(<trace-id>, ...) calls scattered throughout the code, except for ExecAll instruction traces, which use Debug::ExecEnable directly.

    @@ -18943,7 +18975,7 @@ extern SimpleFlag ExecEnable;
    -
    22.9.8.2. gem5 ExecAll trace format
    +
    23.9.8.2. gem5 ExecAll trace format

    This debug flag traces all instructions.

    @@ -18981,7 +19013,7 @@ extern SimpleFlag ExecEnable;

    25007500: time count in some unit. Note how the microops execute at further timestamps.

  • -

    system.cpu: distinguishes between CPUs when there are more than one. For example, running Section 32.10.3, “ARM baremetal multicore” with two cores produces system.cpu0 and system.cpu1

    +

    system.cpu: distinguishes between CPUs when there are more than one. For example, running Section 33.10.3, “ARM baremetal multicore” with two cores produces system.cpu0 and system.cpu1

  • T0: thread number. TODO: hyperthread? How to play with it?

    @@ -19026,7 +19058,7 @@ extern SimpleFlag ExecEnable;
  • -
    22.9.8.3. gem5 Registers trace format
    +
    23.9.8.3. gem5 Registers trace format

    This flag shows a more detailed register usage than gem5 ExecAll trace format.

    @@ -19081,13 +19113,13 @@ add x1, x0, 2
    -
    22.9.8.4. gem5 TARMAC traces
    +
    23.9.8.4. gem5 TARMAC traces

    https://stackoverflow.com/questions/54882466/how-to-use-the-tarmac-tracer-with-gem5

    -
    22.9.8.5. gem5 tracing internals
    +
    23.9.8.5. gem5 tracing internals

    As of gem5 16eeee5356585441a49d05c78abc328ef09f7ace the default tracer is ExeTracer. It is set at:

    @@ -19160,7 +19192,7 @@ src/arch/x86/nativetrace.hh:41:class X86NativeTrace : public NativeTrace
    -

    22.10. QEMU GUI is unresponsive

    +

    23.10. QEMU GUI is unresponsive

    Sometimes in Ubuntu 14.04, after the QEMU SDL GUI starts, it does not get updated after keyboard strokes, and there are artifacts like disappearing text.

    @@ -19184,10 +19216,10 @@ root
    -

    23. gem5

    +

    24. gem5

    gem5 has a bunch of crappiness, mostly described at: gem5 vs QEMU, but it does deserve some credit on the following points:

    @@ -19203,7 +19235,7 @@ root
    -

    23.1. gem5 vs QEMU

    +

    24.1. gem5 vs QEMU

    -

    23.2. gem5 run benchmark

    +

    24.2. gem5 run benchmark

    OK, this is why we used gem5 in the first place, performance measurements!

    @@ -19478,7 +19510,7 @@ cat out/gem5-bench-dhrystone.txt
    -

    but the problem is that this method does not allow to easily run a different script without running the boot again. The ./gem5.sh script works around that by using m5 readfile as explained further at: Section 23.6.3, “gem5 checkpoint restore and run a different script”.

    +

    but the problem is that this method does not allow to easily run a different script without running the boot again. The ./gem5.sh script works around that by using m5 readfile as explained further at: Section 24.6.3, “gem5 checkpoint restore and run a different script”.

    Now you can play a fun little game with your friends:

    @@ -19513,7 +19545,7 @@ cat out/gem5-bench-dhrystone.txt

    To find out why your program is slow, a good first step is to have a look at the gem5 m5out/stats.txt file.

    -

    23.2.1. Skip extra benchmark instructions

    +

    24.2.1. Skip extra benchmark instructions

    A few imperfections of our benchmarking method are:

    @@ -19549,7 +19581,7 @@ cat out/gem5-bench-dhrystone.txt
    -

    23.3. gem5 system parameters

    +

    24.3. gem5 system parameters

    Besides optimizing a program for a given CPU setup, chip developers can also do the inverse, and optimize the chip for a given benchmark!

    @@ -19557,7 +19589,7 @@ cat out/gem5-bench-dhrystone.txt

    The rabbit hole is likely deep, but let’s scratch a bit of the surface.

    -

    23.3.1. Number of cores

    +

    24.3.1. Number of cores

    ./run --arch arm --cpus 2 --emulator gem5
    @@ -19604,7 +19636,7 @@ getconf _NPROCESSORS_CONF
    -
    23.3.1.1. QEMU user mode multithreading
    +
    24.3.1.1. QEMU user mode multithreading

    User mode simulation QEMU v4.0.0 always shows the number of cores of the host, presumably because the thread switching uses host threads directly which would make that harder to implement.

    @@ -19641,7 +19673,7 @@ ps Haux | grep qemu | wc
    -

    23.3.2. gem5 cache size

    +

    24.3.2. gem5 cache size

    @@ -19870,12 +19902,12 @@ instructions 91738770
    -

    23.3.3. gem5 DRAM model

    +

    24.3.3. gem5 DRAM model

    Some info at: TimingSimpleCPU analysis #1 but highly TODO :-)

    -
    23.3.3.1. gem5 memory latency
    +
    24.3.3.1. gem5 memory latency

    TODO These look promising:

    @@ -19916,7 +19948,7 @@ instructions 91738770

    we have no caches, each instruction is fetched from memory

  • -

    each loop contains 11 instructions as shown at Section 35.2, “C busy loop”

    +

    each loop contains 11 instructions as shown at Section 36.2, “C busy loop”

  • and supposing that the loop dominates the executable's pre/post main code. We know this is true because, as shown in Benchmark emulators on userland executables, an empty dynamically linked C program only has about 100k instructions, while our loop runs 1000000 * 11 = 11M.

    @@ -19944,7 +19976,7 @@ instructions 91738770
  • -
    23.3.3.2. Memory size
    +
    24.3.3.2. Memory size

    Can be set across emulators with:

    @@ -20048,7 +20080,7 @@ get_avphys_pages() * sysconf(_SC_PAGESIZE) = 0x1D178000
    -
    23.3.3.3. gem5 DRAM setup
    +
    24.3.3.3. gem5 DRAM setup

    This can be explored pretty well from gem5 config.ini.

    @@ -20106,7 +20138,7 @@ get_avphys_pages() * sysconf(_SC_PAGESIZE) = 0x1D178000
    -

    23.3.4. gem5 disk and network latency

    +

    24.3.4. gem5 disk and network latency

    TODO These look promising:

    @@ -20121,7 +20153,7 @@ get_avphys_pages() * sysconf(_SC_PAGESIZE) = 0x1D178000
    -

    23.3.5. gem5 clock frequency

    +

    24.3.5. gem5 clock frequency

    As of gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1 defaults to 2GHz for fs.py:

    @@ -20206,7 +20238,7 @@ hello
    -

    23.4. gem5 kernel command line parameters

    +

    24.4. gem5 kernel command line parameters

    Analogous to QEMU:

    @@ -20239,9 +20271,9 @@ hello
    -

    23.5. gem5 GDB step debug

    +

    24.5. gem5 GDB step debug

    -

    23.5.1. gem5 GDB step debug kernel

    +

    24.5.1. gem5 GDB step debug kernel

    Analogous to QEMU, on the first shell:

    @@ -20270,13 +20302,13 @@ hello

    When you want to break, just do a Ctrl-C on GDB shell, and then continue.

    -

    And we now see the boot messages, and then get a shell. Now try the ./count.sh procedure described for QEMU at: Section 2.2, “GDB step debug kernel post-boot”.

    +

    And we now see the boot messages, and then get a shell. Now try the ./count.sh procedure described for QEMU at: Section 3.2, “GDB step debug kernel post-boot”.

    -

    23.5.2. gem5 GDB step debug userland process

    +

    24.5.2. gem5 GDB step debug userland process

    -

    We are unable to use gdbserver because of networking as mentioned at: Section 14.3.1.3, “gem5 host to guest networking”

    +

    We are unable to use gdbserver because of networking as mentioned at: Section 15.3.1.3, “gem5 host to guest networking”

    The alternative is to do as in GDB step debug userland processes.

    @@ -20309,7 +20341,7 @@ hello
    -

    23.5.3. gem5 GDB step debug secondary cores

    +

    24.5.3. gem5 GDB step debug secondary cores

    gem5’s secondary core GDB setup is a hack and spawns one gdbserver for each core in separate ports, e.g. 7000, 7001, etc.

    @@ -20330,7 +20362,7 @@ hello
    -

    23.6. gem5 checkpoint

    +

    24.6. gem5 checkpoint

    Analogous to QEMU’s Snapshot, but better since it can be started from inside the guest, so we can easily checkpoint after a specific guest event, e.g. just before init is done.

    @@ -20418,7 +20450,7 @@ m5 checkpoint

    since boot has already happened, and the parameters are already in the RAM of the snapshot.

    -

    23.6.1. gem5 checkpoint userland minimal example

    +

    24.6.1. gem5 checkpoint userland minimal example

    In order to debug checkpoint restore bugs, this minimal setup using userland/freestanding/gem5_checkpoint.S can be handy:

    @@ -20472,7 +20504,7 @@ Exiting @ tick 84500 because m5_exit instruction encountered
    -

    23.6.2. gem5 checkpoint internals

    +

    24.6.2. gem5 checkpoint internals

    A quick way to get a gem5 syscall emulation mode or full system checkpoint to observe is:

    @@ -20523,7 +20555,7 @@ prvEvalTick=0
    -

    23.6.3. gem5 checkpoint restore and run a different script

    +

    24.6.3. gem5 checkpoint restore and run a different script

    You want to automate running several tests from a single pristine post-boot state.

    @@ -20671,7 +20703,7 @@ expect eof
    -

    23.6.4. gem5 restore checkpoint with a different CPU

    +

    24.6.4. gem5 restore checkpoint with a different CPU

    gem5 can switch to a different CPU model when restoring a checkpoint.

    @@ -20786,7 +20818,7 @@ cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"
    -
    23.6.4.1. gem5 fast forward
    +
    24.6.4.1. gem5 fast forward

    Besides switching CPUs after a checkpoint restore, fs.py also has the --fast-forward option to automatically run the script from the start on a less detailed CPU, and switch to a more detailed CPU at a given tick.

    @@ -20912,7 +20944,7 @@ FullO3CPU: Ticking main, FullO3CPU.
    -

    23.6.5. gem5 checkpoint upgrader

    +

    24.6.5. gem5 checkpoint upgrader

    The in-tree util/cpt_upgrader.py is a tool to upgrade checkpoints taken from an older version of gem5 to be compatible with the newest version, so you can update gem5 without having to re-run the simulation that generated the checkpoints.

    @@ -20948,7 +20980,7 @@ version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...
    -

    23.7. Pass extra options to gem5

    +

    24.7. Pass extra options to gem5

    Remember that in the gem5 command line, we can either pass options to the script being run as in:

    @@ -21005,7 +21037,7 @@ version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...
    -

    23.8. m5ops

    +

    24.8. m5ops

    m5ops are magic instructions which lead gem5 to do magic things, like quitting or dumping stats.

    @@ -21045,7 +21077,7 @@ version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...
    -

    23.8.1. gem5 m5 executable

    +

    24.8.1. gem5 m5 executable

    m5 is a guest command-line utility, installed and run on the guest, that serves as a CLI front-end for the m5ops.

    @@ -21075,7 +21107,7 @@ version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...

    This can be a good m5op to test, since it executes very quickly.

    -
    23.8.1.1. m5 exit
    +
    24.8.1.1. m5 exit

    End the simulation.

    @@ -21084,13 +21116,13 @@ version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...
    -
    23.8.1.2. m5 dumpstats
    +
    24.8.1.2. m5 dumpstats

    Makes gem5 dump one more statistics entry to the gem5 m5out/stats.txt file.

    -
    23.8.1.3. m5 fail
    +
    24.8.1.3. m5 fail

    End the simulation with a failure exit event:

    @@ -21129,7 +21161,7 @@ version_tags=arm-ccregs arm-contextidr-el2 arm-gem5-gic-ext ...
    -
    23.8.1.4. m5 writefile
    +
    24.8.1.4. m5 writefile

    Send a guest file to the host. 9P is a more advanced alternative.

    @@ -21160,7 +21192,7 @@ m5 writefile myfileguest myfilehost
    -
    23.8.1.5. m5 readfile
    +
    24.8.1.5. m5 readfile

    Read a host file pointed to by the fs.py --script option to stdout.

    @@ -21188,7 +21220,7 @@ m5 writefile myfileguest myfilehost
    -
    23.8.1.6. m5 initparam
    +
    24.8.1.6. m5 initparam

    Ermm, just another m5 readfile that only takes integers and only from CLI options? Is this software so redundant?

    @@ -21214,7 +21246,7 @@ m5 writefile myfileguest myfilehost
    -
    23.8.1.7. m5 execfile
    +
    24.8.1.7. m5 execfile

    Trivial combination of m5 readfile + execute the script.

    @@ -21249,7 +21281,7 @@ m5 execfile
    -

    23.8.2. m5ops instructions

    +

    24.8.2. m5ops instructions

    There are a few different possible instructions that can be used to implement identical m5ops:

    @@ -21365,7 +21397,7 @@ m5 --semi exit
    -
    23.8.2.1. m5ops magic addresses
    +
    24.8.2.1. m5ops magic addresses

    These are magic addresses that when accessed lead to an m5op.

    @@ -21403,7 +21435,7 @@ fatal: Unable to find destination for [0x10012100:0x10012108] on system.iobus

    -
    23.8.2.2. m5ops instructions interface
    +
    24.8.2.2. m5ops instructions interface

    Let’s study how the gem5 m5 executable uses them:

    @@ -21514,7 +21546,7 @@ m5_fail(ints[1], ints[0]);
    -
    23.8.2.3. m5op annotations
    +
    24.8.2.3. m5op annotations

    include/gem5/asm/generic/m5ops.h also describes some annotation instructions.

    @@ -21525,7 +21557,7 @@ m5_fail(ints[1], ints[0]);
    -

    23.9. gem5 arm Linux kernel patches

    +

    24.9. gem5 arm Linux kernel patches

    https://gem5.googlesource.com/arm/linux/ contains an ARM Linux kernel fork with a few gem5-specific Linux kernel patches, created by ARM Holdings on top of a few upstream kernel releases.

    @@ -21581,7 +21613,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    because glibc was built to expect a newer Linux kernel as shown at: Section 10.4.1, “FATAL: kernel too old failure in userland simulation”. Your choices to solve this are:

    +

    because glibc was built to expect a newer Linux kernel as shown at: Section 11.4.1, “FATAL: kernel too old failure in userland simulation”. Your choices to solve this are:

    -

    23.9.1. gem5 arm Linux kernel patches boot speedup

    +

    24.9.1. gem5 arm Linux kernel patches boot speedup

    We have observed that with the kernel patches, boot is 2x faster, falling from 1m40s to 50s.

    @@ -21631,7 +21663,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    23.10. m5out directory

    +

    24.10. m5out directory

    When you run gem5, it generates an m5out directory at:

    @@ -21647,7 +21679,7 @@ git -C "$(./getvar linux_source_dir)" checkout -

    The files in that directory contain some very important information about the run, and you should become familiar with every one of them.

    -

    23.10.1. gem5 m5out/system.terminal file

    +

    24.10.1. gem5 m5out/system.terminal file

    Contains UART output, either from the Linux kernel or from the baremetal system.

    @@ -21656,7 +21688,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    23.10.2. gem5 m5out/system.workload.dmesg file

    +

    24.10.2. gem5 m5out/system.workload.dmesg file

    This file used to be called just m5out/system.dmesg, but the name was changed after the workload refactorings of March 2020.

    @@ -21730,7 +21762,7 @@ index f296d89be757..3e79916322c2 100644
    -

    23.10.3. gem5 m5out/stats.txt file

    +

    24.10.3. gem5 m5out/stats.txt file

    This file contains important statistics about the run:

    @@ -21829,7 +21861,7 @@ system.cpu.dtb.inst_hits

    and after that the file size went down to 21KB.

    -
    23.10.3.1. gem5 HDF5 statistics
    +
    24.10.3.1. gem5 HDF5 statistics

    We can make gem5 dump statistics in the HDF5 format by adding the magic h5:// prefix to the file name as in:

    @@ -21879,7 +21911,7 @@ system.cpu.dtb.inst_hits
    -
    23.10.3.2. gem5 only dump selected stats
    +
    24.10.3.2. gem5 only dump selected stats

    https://stackoverflow.com/questions/52014953/how-to-dump-only-a-single-or-certain-selected-stats-in-gem5

    @@ -21891,7 +21923,7 @@ system.cpu.dtb.inst_hits
    -
    23.10.3.3. Meaning of each gem5 stat
    +
    24.10.3.3. Meaning of each gem5 stat

    Well, run minimal examples, and reverse engineer them up!

    @@ -21949,7 +21981,7 @@ sim_ops 6 # Number of ops (including micro ops) simulated
    -
    23.10.3.4. gem5 stats internals
    +
    24.10.3.4. gem5 stats internals

    This describes the internals of the gem5 m5out/stats.txt file.

    @@ -22023,7 +22055,7 @@ Text::end()
    -

    23.10.4. gem5 config.ini

    +

    24.10.4. gem5 config.ini

    The m5out/config.ini file contains a very good high-level description of the system:

    @@ -22096,7 +22128,7 @@ clock=500

    Modifying the config.ini file manually does nothing since it gets overwritten every time.

    -
    23.10.4.1. gem5 config.dot
    +
    24.10.4.1. gem5 config.dot

    The m5out/config.dot file contains a graphviz .dot file that provides a simplified graphical view of a subset of the gem5 config.ini.

    @@ -22177,7 +22209,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
    -

    23.11. m5term

    +

    24.11. m5term

    We use the m5term in-tree executable to connect to the terminal instead of a direct telnet.

    @@ -22202,7 +22234,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
    -

    23.12. gem5 Python scripts without rebuild

    +

    24.12. gem5 Python scripts without rebuild

    We have made a crazy setup that allows you to just cd into submodules/gem5, and edit Python scripts directly there.

    @@ -22236,7 +22268,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
    -

    23.13. gem5 fs_bigLITTLE

    +

    24.13. gem5 fs_bigLITTLE

    By default, we use configs/example/fs.py script.

    @@ -22285,7 +22317,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
    -

    23.14. gem5 in-tree tests

    +

    24.14. gem5 in-tree tests

    https://stackoverflow.com/questions/52279971/how-to-run-the-gem5-unit-tests

    @@ -22296,7 +22328,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"

    But can the people from the project be convinced of that?

    -

    23.14.1. gem5 unit tests

    +

    24.14.1. gem5 unit tests

    These are just very small GTest tests that test a single class in isolation, they don’t run any executables.

    @@ -22351,7 +22383,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
    -

    23.14.2. gem5 regression tests

    +

    24.14.2. gem5 regression tests

    This section is about running the gem5 in-tree tests.

    @@ -22400,7 +22432,7 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
    -

    23.15. gem5 simulate() limit reached

    +

    24.15. gem5 simulate() limit reached

    This error happens when the following instruction limits are reached:

    @@ -22536,24 +22568,24 @@ Exiting @ tick 18446744073709551615 because simulate() limit reached
    -

    23.16. gem5 build options

    +

    24.16. gem5 build options

    In order to use different build options, you might also want to use gem5 build variants to keep the build outputs separate from one another.

    -

    23.16.1. gem5 debug build

    +

    24.16.1. gem5 debug build

    -

    How to use it in LKMC: Section 22.8, “Debug the emulator”.

    +

    How to use it in LKMC: Section 23.8, “Debug the emulator”.

    If you build gem5 with scons build/ARM/gem5.debug, then that is a .debug build.

    -

    It relates to the more common .opt build just as explained at Section 22.8, “Debug the emulator”: both .opt and .debug have -g, but .opt uses -O2 while .debug uses -O0.

    +

    It relates to the more common .opt build just as explained at Section 23.8, “Debug the emulator”: both .opt and .debug have -g, but .opt uses -O2 while .debug uses -O0.

    -

    23.16.2. gem5 fast build

    +

    24.16.2. gem5 fast build

    ./build-gem5 --gem5-build-type fast
    @@ -22571,13 +22603,13 @@ Exiting @ tick 18446744073709551615 because simulate() limit reached
    -

    23.16.3. gem5 prof and perf builds

    +

    24.16.3. gem5 prof and perf builds

    Profiling builds as of 3cea7d9ce49bda49c50e756339ff1287fd55df77 both use: -g -O3 and disable asserts and logging like the gem5 fast build and:

    @@ -22605,7 +22637,7 @@ gprof "$(./getvar --arch aarch64 gem5_executable)" > tmp.gprof
    -

    23.16.4. gem5 clang build

    +

    24.16.4. gem5 clang build

    TODO test properly, benchmark vs GCC.

    @@ -22618,7 +22650,7 @@ gprof "$(./getvar --arch aarch64 gem5_executable)" > tmp.gprof
    -

    23.16.5. gem5 sanitation build

    +

    24.16.5. gem5 sanitation build

    If gem5 appears to have a C++ undefined behaviour bug, which is often very difficult to track down, you can try to build it with the following extra SCons options:

    @@ -22692,7 +22724,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    23.16.6. gem5 Ruby build

    +

    24.16.6. gem5 Ruby build

    gem5 has two types of memory system:

    @@ -22828,7 +22860,7 @@ cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"

    Tested in gem5 d7d9bc240615625141cd6feddbadd392457e49eb.

    -
    23.16.6.1. gem5 Ruby MI_example protocol
    +
    24.16.6.1. gem5 Ruby MI_example protocol

    This is the simplest of all protocols, and therefore the first one you should study to learn how Ruby works.

    @@ -22864,7 +22896,7 @@ cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"
    -
    23.16.6.2. gem5 crossbar interconnect
    +
    24.16.6.2. gem5 crossbar interconnect

    Crossbar or XBar in the code, is the default CPU interconnect that gets used by fs.py if --ruby is not given.

    @@ -22915,7 +22947,7 @@ class SystemXBar(CoherentXBar):
    -

    23.16.7. gem5 Python 3 build

    +

    24.16.7. gem5 Python 3 build

    Python 3 support was mostly added in 2019 Q3 at around a347a1a68b8a6e370334be3a1d2d66675891e0f1, but remained buggy for some time afterwards.

    @@ -22933,7 +22965,7 @@ class SystemXBar(CoherentXBar):
    -

    23.17. gem5 CPU types

    +

    24.17. gem5 CPU types

    gem5 has a few in-tree CPU models for different purposes.

    @@ -23016,9 +23048,9 @@ class SystemXBar(CoherentXBar):

    From this we see that there are basically only 4 C++ CPU models in gem5: Atomic, Timing, Minor and O3. All others are basically parametrizations of those base types.

    -

    23.17.1. List of gem5 CPU types

    +

    24.17.1. List of gem5 CPU types

    -
    23.17.1.1. gem5 BaseSimpleCPU
    +
    24.17.1.1. gem5 BaseSimpleCPU

    Simple abstract CPU without a pipeline.

    @@ -23039,7 +23071,7 @@ class SystemXBar(CoherentXBar):
    -
    23.17.1.1.1. gem5 AtomicSimpleCPU
    +
    24.17.1.1.1. gem5 AtomicSimpleCPU

    AtomicSimpleCPU: the default one. Memory accesses happen instantaneously. The fastest simulation except for KVM, but not realistic at all.

    @@ -23048,7 +23080,7 @@ class SystemXBar(CoherentXBar):
    -
    23.17.1.1.2. gem5 TimingSimpleCPU
    +
    24.17.1.1.2. gem5 TimingSimpleCPU

    TimingSimpleCPU: memory accesses are realistic, but the CPU has no pipeline. The simulation is faster than detailed models, but slower than AtomicSimpleCPU.

    @@ -23064,7 +23096,7 @@ class SystemXBar(CoherentXBar):
    -
    23.17.1.2. gem5 MinorCPU
    +
    24.17.1.2. gem5 MinorCPU

    Generic in-order superscalar core.

    @@ -23130,7 +23162,7 @@ class SystemXBar(CoherentXBar):
    -
    23.17.1.3. gem5 DerivO3CPU
    +
    24.17.1.3. gem5 DerivO3CPU

    Generic out-of-order core. "O3" stands for "Out Of Order"!

    @@ -23190,7 +23222,7 @@ wbWidth=8
    -
    23.17.1.3.1. gem5 DerivO3CPU pipeline stages
    +
    24.17.1.3.1. gem5 DerivO3CPU pipeline stages
    -
    23.17.1.3.2. gem5 util/o3-pipeview.py O3 pipeline viewer
    +
    24.17.1.3.2. gem5 util/o3-pipeview.py O3 pipeline viewer

    Mentioned at: http://www.m5sim.org/Visualization

    @@ -23243,7 +23275,7 @@ less o3pipeview.tmp.log
    -
    23.17.1.3.3. gem5 Konata O3 pipeline viewer
    +
    24.17.1.3.3. gem5 Konata O3 pipeline viewer

    https://github.com/shioyadan/Konata

    @@ -23263,7 +23295,7 @@ less o3pipeview.tmp.log
    -

    23.17.2. gem5 ARM RSK

    +

    24.17.2. gem5 ARM RSK

    https://github.com/arm-university/arm-gem5-rsk/blob/aa3b51b175a0f3b6e75c9c856092ae0c8f2a7cdc/gem5_rsk.pdf

    @@ -23273,7 +23305,7 @@ less o3pipeview.tmp.log
    -

    23.18. gem5 ARM platforms

    +

    24.18. gem5 ARM platforms

    The gem5 platform is selectable with the --machine option, which is named after the analogous QEMU -machine option, and which sets the --machine-type.

    @@ -23301,7 +23333,7 @@ less o3pipeview.tmp.log
    -

    23.19. gem5 upstream images

    +

    24.19. gem5 upstream images

    Present at:

    @@ -23355,7 +23387,7 @@ cd ..
    -

    23.20. gem5 bootloaders

    +

    24.20. gem5 bootloaders

    Certain ISAs like ARM have bootloaders that are automatically run before the main image to setup basic system state.

    @@ -23388,12 +23420,12 @@ cd ..
    -

    23.21. gem5 memory system

    +

    24.21. gem5 memory system

    Parent section: gem5 internals.

    -

    23.21.1. gem5 port system

    +

    24.21.1. gem5 port system

    The gem5 memory system is connected in a very flexible way through the port system.

    @@ -23404,7 +23436,7 @@ cd ..

    A Packet is the basic information unit that gets sent across ports.

    -
    23.21.1.1. gem5 functional vs atomic vs timing memory requests
    +
    24.21.1.1. gem5 functional vs atomic vs timing memory requests

    gem5 memory requests can be classified in the following broad categories:

    @@ -23614,7 +23646,7 @@ TimingSimpleCPU::finishTranslation(WholeTranslationState *state)

    Tested in gem5 b1623cb2087873f64197e503ab8894b5e4d4c7b4.

    -
    23.21.1.1.1. gem5 functional requests
    +
    24.21.1.1.1. gem5 functional requests

    As seen at gem5 functional vs atomic vs timing memory requests, functional requests are not used in common simulation, since the core must always go through caches.

    @@ -23661,9 +23693,9 @@ TimingSimpleCPU::finishTranslation(WholeTranslationState *state)
    -

    23.21.2. gem5 Packet vs Request

    +

    24.21.2. gem5 Packet vs Request

    -
    23.21.2.1. gem5 Packet
    +
    24.21.2.1. gem5 Packet

    Packet is what goes through ports: a single packet is sent out to the memory system, gets modified when it hits valid data, and then returns with the reply.

    @@ -23746,7 +23778,7 @@ Addr addr;
    -
    23.21.2.1.1. gem5 MemCmd
    +
    24.21.2.1.1. gem5 MemCmd

    Each gem5 Packet contains a MemCmd

    @@ -23832,7 +23864,7 @@ MemCmd::commandInfo[] =
    -
    23.21.2.2. gem5 Request
    +
    24.21.2.2. gem5 Request

    One good way to think about Request vs Packet could be "it is what the instruction definitions see", a bit like ExecContext vs ThreadContext.

    @@ -23899,7 +23931,7 @@ Addr _vaddr = MaxAddr;
    -
    23.21.2.2.1. gem5 Request in AtomicSimpleCPU
    +
    24.21.2.2.1. gem5 Request in AtomicSimpleCPU

    In AtomicSimpleCPU, a single packet of each type is kept for the entire CPU, e.g.:

    @@ -23960,7 +23992,7 @@ TLB::translateMmuOn(ThreadContext* tc, const RequestPtr &req, Mode mode,
    -
    23.21.2.2.2. gem5 Request in TimingSimpleCPU
    +
    24.21.2.2.2. gem5 Request in TimingSimpleCPU

    In TimingSimpleCPU, the request gets created per memory read:

    @@ -23995,7 +24027,7 @@ TimingSimpleCPU::initiateMemRead(Addr addr, unsigned size,
    -

    23.21.3. gem5 MSHR

    +

    24.21.3. gem5 MSHR

    Mentioned at: http://pages.cs.wisc.edu/~swilson/gem5-docs/gem5MemorySystem.html

    @@ -24057,7 +24089,7 @@ TimingSimpleCPU::initiateMemRead(Addr addr, unsigned size,
    -

    23.21.4. gem5 CommMonitor

    +

    24.21.4. gem5 CommMonitor

    You can place this SimObject in between two ports to get extra statistics about the packets that are going through.

    @@ -24103,7 +24135,7 @@ TimingSimpleCPU::initiateMemRead(Addr addr, unsigned size,
    -

    23.21.5. gem5 SimpleMemory

    +

    24.21.5. gem5 SimpleMemory

    SimpleMemory is a highly simplified memory system. It can replace a more complex DRAM model if you use it e.g. as:

    @@ -24127,7 +24159,7 @@ type=SimpleMemory
    -

    23.22. gem5 internals

    +

    24.22. gem5 internals

    Internals under other sections:

    @@ -24148,7 +24180,7 @@ type=SimpleMemory
    -

    23.22.1. gem5 Eclipse configuration

    +

    24.22.1. gem5 Eclipse configuration

    https://stackoverflow.com/questions/61656709/how-to-setup-eclipse-ide-for-gem5-development

    @@ -24213,7 +24245,7 @@ type=SimpleMemory
    -

    23.22.2. gem5 Python C++ interaction

    +

    24.22.2. gem5 Python C++ interaction

    The interaction uses the Python C extension interface https://docs.python.org/2/extending/extending.html through the pybind11 helper library: https://github.com/pybind/pybind11

    @@ -24398,7 +24430,7 @@ static EmbeddedPyBind embed_obj("BadDevice", module_init, "BasicPioDevice");
    -

    23.22.3. gem5 entry point

    +

    24.22.3. gem5 entry point

    The main is at: src/sim/main.cc. It calls:

    @@ -24486,7 +24518,7 @@ exec filecode in scope

    Tested at gem5 b4879ae5b0b6644e6836b0881e4da05c64a6550d.

    -
    23.22.3.1. gem5 m5.objects module
    +
    24.22.3.1. gem5 m5.objects module

    All SimObjects seem to be automatically added to the m5.objects namespace. This is done in a very convoluted way, so let’s try to understand it a bit:

    @@ -24651,7 +24683,7 @@ for source in PySource.all:
    -

    23.22.4. gem5 event queue

    +

    24.22.4. gem5 event queue

    gem5 is an event based simulator, and as such the event queue is one of the crucial elements in the system.

    @@ -24757,7 +24789,7 @@ b EventFunctionWrapper::process

    Then, once we have that, the most perfect thing ever would be to build the full event graph showing which events schedule which other events!

    -
    23.22.4.1. gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis
    +
    24.22.4.1. gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis

    Let’s now analyze every single event on a minimal gem5 syscall emulation mode in the simplest CPU that we have:

    @@ -24893,7 +24925,7 @@ AtomicSimpleCPU::tick() at atomic.cc:757 0x55555907834c

    Tested in gem5 12c917de54145d2d50260035ba7fa614e25317a3.

    -
    23.22.4.1.1. AtomicSimpleCPU initial events
    +
    24.22.4.1.1. AtomicSimpleCPU initial events

    Let’s have a closer look at the initial magically scheduled events of the simulation.

    @@ -25112,7 +25144,7 @@ simulate() at simulate.cc:104 0x555559476d6f
    -
    23.22.4.1.2. AtomicSimpleCPU tick reschedule timing
    +
    24.22.4.1.2. AtomicSimpleCPU tick reschedule timing

    Inside AtomicSimpleCPU::tick() we saw previously that the reschedule happens at:

    @@ -25152,7 +25184,7 @@ clock=500
    -
    23.22.4.1.3. AtomicSimpleCPU memory access
    +
    24.22.4.1.3. AtomicSimpleCPU memory access

    It will be interesting to see how AtomicSimpleCPU makes memory accesses in GDB and to compare that with TimingSimpleCPU.

    @@ -25206,7 +25238,7 @@ clock=500
    -
    23.22.4.1.4. gem5 se.py page translation
    +
    24.22.4.1.4. gem5 se.py page translation

    Page translation happens in EmulationPageTable, and it seems to happen atomically without making any extra memory requests.

    @@ -25277,7 +25309,7 @@ Exiting @ tick 3500 because exiting with last active thread context
    -
    23.22.4.2. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis
    +
    24.22.4.2. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis

    Now, let’s move on to TimingSimpleCPU, which is just like AtomicSimpleCPU internally, but now the memory requests don’t actually finish immediately: gem5 CPU types!

    @@ -25558,7 +25590,7 @@ info: Entering event queue @ 0. Starting simulation...
    -
    23.22.4.2.1. TimingSimpleCPU analysis #0
    +
    24.22.4.2.1. TimingSimpleCPU analysis #0

    Schedules TimingSimpleCPU::fetch through:

    @@ -25603,7 +25635,7 @@ ArmLinuxProcess64::initState
    -
    23.22.4.2.2. TimingSimpleCPU analysis #1
    +
    24.22.4.2.2. TimingSimpleCPU analysis #1

    Backtrace:

    @@ -25734,7 +25766,7 @@ DRAMCtrl::Rank::startup(Tick ref_tick)
    -
    23.22.4.2.3. TimingSimpleCPU analysis #2
    +
    24.22.4.2.3. TimingSimpleCPU analysis #2

    This is just the startup of the second rank, see: TimingSimpleCPU analysis #1.

    @@ -25767,13 +25799,13 @@ DRAMCtrl::Rank::startup(Tick ref_tick)
    -
    23.22.4.2.4. TimingSimpleCPU analysis #3 and #4
    +
    24.22.4.2.4. TimingSimpleCPU analysis #3 and #4

    From the timing we know what that one is: the end of time exit event, like for AtomicSimpleCPU.

    -
    23.22.4.2.5. TimingSimpleCPU analysis #5
    +
    24.22.4.2.5. TimingSimpleCPU analysis #5

    Executes TimingSimpleCPU::fetch().

    @@ -25881,7 +25913,7 @@ DRAMCtrl::Rank::startup(Tick ref_tick)
    -
    23.22.4.2.6. TimingSimpleCPU analysis #6
    +
    24.22.4.2.6. TimingSimpleCPU analysis #6

    Schedules DRAMCtrl::processNextReqEvent through:

    @@ -26018,7 +26050,7 @@ TimingSimpleCPU::fetch
    -
    23.22.4.2.7. TimingSimpleCPU analysis #7
    +
    24.22.4.2.7. TimingSimpleCPU analysis #7

    Schedules BaseXBar::Layer::releaseLayer through:

    @@ -26044,13 +26076,13 @@ TimingSimpleCPU::fetch
    -
    23.22.4.2.8. TimingSimpleCPU analysis #8
    +
    24.22.4.2.8. TimingSimpleCPU analysis #8

    Executes DRAMCtrl::processNextReqEvent.

    -
    23.22.4.2.9. TimingSimpleCPU analysis #9
    +
    24.22.4.2.9. TimingSimpleCPU analysis #9

    Schedules DRAMCtrl::Rank::processActivateEvent through:

    @@ -26064,7 +26096,7 @@ DRAMCtrl::processNextReqEvent
    -
    23.22.4.2.10. TimingSimpleCPU analysis #10
    +
    24.22.4.2.10. TimingSimpleCPU analysis #10

    Schedules DRAMCtrl::processRespondEvent through:

    @@ -26076,7 +26108,7 @@ DRAMCtrl::processNextReqEvent
    -
    23.22.4.2.11. TimingSimpleCPU analysis #11
    +
    24.22.4.2.11. TimingSimpleCPU analysis #11

    Schedules DRAMCtrl::processNextReqEvent through:

    @@ -26088,7 +26120,7 @@ DRAMCtrl::processNextReqEvent
    -
    23.22.4.2.12. TimingSimpleCPU analysis #12
    +
    24.22.4.2.12. TimingSimpleCPU analysis #12

    Executes DRAMCtrl::Rank::processActivateEvent.

    @@ -26097,7 +26129,7 @@ DRAMCtrl::processNextReqEvent
    -
    23.22.4.2.13. TimingSimpleCPU analysis #13
    +
    24.22.4.2.13. TimingSimpleCPU analysis #13

    Schedules DRAMCtrl::Rank::processPowerEvent through:

    @@ -26110,7 +26142,7 @@ DRAMCtrl::Rank::processActivateEvent
    -
    23.22.4.2.14. TimingSimpleCPU analysis #14
    +
    24.22.4.2.14. TimingSimpleCPU analysis #14

    Executes DRAMCtrl::Rank::processPowerEvent.

    @@ -26119,25 +26151,25 @@ DRAMCtrl::Rank::processActivateEvent
    -
    23.22.4.2.15. TimingSimpleCPU analysis #15
    +
    24.22.4.2.15. TimingSimpleCPU analysis #15

    Executes BaseXBar::Layer<SrcType, DstType>::releaseLayer.

    -
    23.22.4.2.16. TimingSimpleCPU analysis #16
    +
    24.22.4.2.16. TimingSimpleCPU analysis #16

    Executes DRAMCtrl::processNextReqEvent().

    -
    23.22.4.2.17. TimingSimpleCPU analysis #17
    +
    24.22.4.2.17. TimingSimpleCPU analysis #17

    Executes DRAMCtrl::processRespondEvent().

    -
    23.22.4.2.18. TimingSimpleCPU analysis #18
    +
    24.22.4.2.18. TimingSimpleCPU analysis #18

    Schedules PacketQueue::processSendEvent() through:

    @@ -26152,13 +26184,13 @@ DRAMCtrl::processRespondEvent
    -
    23.22.4.2.19. TimingSimpleCPU analysis #19
    +
    24.22.4.2.19. TimingSimpleCPU analysis #19

    Executes PacketQueue::processSendEvent().

    -
    23.22.4.2.20. TimingSimpleCPU analysis #20
    +
    24.22.4.2.20. TimingSimpleCPU analysis #20

    Schedules PacketQueue::processSendEvent through:

    @@ -26182,7 +26214,7 @@ PacketQueue::processSendEvent
    -
    23.22.4.2.21. TimingSimpleCPU analysis #21
    +
    24.22.4.2.21. TimingSimpleCPU analysis #21

    Schedules BaseXBar::Layer<SrcType, DstType>::releaseLayer through:

    @@ -26202,19 +26234,19 @@ PacketQueue::processSendEvent
    -
    23.22.4.2.22. TimingSimpleCPU analysis #22
    +
    24.22.4.2.22. TimingSimpleCPU analysis #22

    Executes BaseXBar::Layer<SrcType, DstType>::releaseLayer.

    -
    23.22.4.2.23. TimingSimpleCPU analysis #23
    +
    24.22.4.2.23. TimingSimpleCPU analysis #23

    Executes PacketQueue::processSendEvent.

    -
    23.22.4.2.24. TimingSimpleCPU analysis #24
    +
    24.22.4.2.24. TimingSimpleCPU analysis #24

    Schedules TimingSimpleCPU::IcachePort::ITickEvent::process() through:

    @@ -26232,7 +26264,7 @@ PacketQueue::processSendEvent
    -
    23.22.4.2.25. TimingSimpleCPU analysis #25
    +
    24.22.4.2.25. TimingSimpleCPU analysis #25

    Executes TimingSimpleCPU::IcachePort::ITickEvent::process().

    @@ -26252,7 +26284,7 @@ PacketQueue::processSendEvent
    -
    23.22.4.2.26. TimingSimpleCPU analysis #26
    +
    24.22.4.2.26. TimingSimpleCPU analysis #26

    Schedules DRAMCtrl::processNextReqEvent through:

    @@ -26281,7 +26313,7 @@ TimingSimpleCPU::IcachePort::ITickEvent::process
    -
    23.22.4.2.27. TimingSimpleCPU analysis #27
    +
    24.22.4.2.27. TimingSimpleCPU analysis #27

    Schedules BaseXBar::Layer<SrcType, DstType>::releaseLayer through:

    @@ -26307,19 +26339,19 @@ TimingSimpleCPU::IcachePort::ITickEvent::process
    -
    23.22.4.2.28. TimingSimpleCPU analysis #28
    +
    24.22.4.2.28. TimingSimpleCPU analysis #28

    Executes DRAMCtrl::processNextReqEvent.

    -
    23.22.4.2.29. TimingSimpleCPU analysis #29
    +
    24.22.4.2.29. TimingSimpleCPU analysis #29

    Schedules DRAMCtrl::processRespondEvent().

    -
    23.22.4.2.30. TimingSimpleCPU analysis: LDR stall
    +
    24.22.4.2.30. TimingSimpleCPU analysis: LDR stall

    One important thing we want to check now is how the memory reads are going to make the processor stall in the middle of an instruction.

    @@ -26437,7 +26469,7 @@ TimingSimpleCPU::IcachePort::ITickEvent::process
    -
    23.22.4.3. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches
    +
    24.22.4.3. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches

    Let’s just add --caches to gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis to see if things go any faster, and add Cache to --trace as in:

    @@ -26753,7 +26785,7 @@ type=SetAssociative

    At 1000, the future event is executed, and so it reads the original packet from the MSHR, and uses that to create a new request [40:7f] which gets forwarded.

    -
    23.22.4.3.1. What is the coherency protocol implemented by the classic cache system in gem5?
    +
    24.22.4.3.1. What is the coherency protocol implemented by the classic cache system in gem5?

    MOESI cache coherence protocol: https://github.com/gem5/gem5/blob/9fc9c67b4242c03f165951775be5cd0812f2a705/src/mem/cache/cache_blk.hh#L352

    @@ -26761,12 +26793,12 @@ type=SetAssociative

    The actual representation is done via separate state bits: https://github.com/gem5/gem5/blob/9fc9c67b4242c03f165951775be5cd0812f2a705/src/mem/cache/cache_blk.hh#L66 and MOESI appears explicitly only in the pretty printing.

    -

    This pretty printing appears for example in the --trace Cache lines as shown at gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and with a few more transitions visible at Section 23.22.4.4, “gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs”.

    +

    This pretty printing appears for example in the --trace Cache lines as shown at gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and with a few more transitions visible at Section 24.22.4.4, “gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs”.

    -
    23.22.4.4. gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs
    +
    24.22.4.4. gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs

    It would be amazing to analyze a simple example with interconnect packets possibly invalidating caches of other CPUs.

    @@ -26976,7 +27008,7 @@ type=SetAssociative
    -
    23.22.4.5. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs
    +
    24.22.4.5. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs

    Like gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs but with gem5 TimingSimpleCPU and userland/c/atomic/aarch64_add.c:

    @@ -27301,7 +27333,7 @@ global 147
    -
    23.22.4.6. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs and Ruby
    +
    24.22.4.6. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs and Ruby

    Now let’s do the exact same thing we did for gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs, but with Ruby rather than the classic system and TimingSimpleCPU (atomic does not work with Ruby).

    @@ -27343,7 +27375,7 @@ non-atomic 19
    -
    23.22.4.7. gem5 event queue MinorCPU syscall emulation freestanding example analysis
    +
    24.22.4.7. gem5 event queue MinorCPU syscall emulation freestanding example analysis

    The events for the Atomic CPU were pretty simple: basically just ticks.

    @@ -27513,14 +27545,14 @@ non-atomic 19
    -
    23.22.4.7.1. gem5 event queue MinorCPU syscall emulation freestanding example analysis: hazard
    +
    24.22.4.7.1. gem5 event queue MinorCPU syscall emulation freestanding example analysis: hazard

    TODO like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard but with the hazard.

    -
    23.22.4.8. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis
    +
    24.22.4.8. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis

    Like gem5 event queue MinorCPU syscall emulation freestanding example analysis but even more complex, since it is for the gem5 DerivO3CPU!

    @@ -27548,7 +27580,7 @@ non-atomic 19

    This section and children are tested at LKMC 144a552cf926ea630ef9eadbb22b79fe2468c456.

    -
    23.22.4.8.1. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazardless
    +
    24.22.4.8.1. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazardless

    Let’s have a look at the arguably simplest example userland/arch/aarch64/freestanding/linux/hazardless.S.

    @@ -27787,7 +27819,7 @@ non-atomic 19
    -
    23.22.4.8.2. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard
    +
    24.22.4.8.2. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard

    Now let’s do the same as in gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazardless but with a hazard: userland/arch/aarch64/freestanding/linux/hazard.S.

    @@ -27831,7 +27863,7 @@ non-atomic 19
    -
    23.22.4.8.3. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard4
    +
    24.22.4.8.3. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard4

    Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard but a hazard of depth 4: userland/arch/aarch64/freestanding/linux/hazard.S.

    @@ -27872,7 +27904,7 @@ non-atomic 19
    -
    23.22.4.8.4. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall
    +
    24.22.4.8.4. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall

    Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard but now with an LDR stall: userland/arch/aarch64/freestanding/linux/stall.S.

    @@ -27923,7 +27955,7 @@ non-atomic 19
    -
    23.22.4.8.5. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall_gain
    +
    24.22.4.8.5. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall_gain

    Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall but now with an LDR stall: userland/arch/aarch64/freestanding/linux/stall_gain.S.

    @@ -28010,7 +28042,7 @@ non-atomic 19
    -
    23.22.4.8.6. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall_hazard4
    +
    24.22.4.8.6. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall_hazard4

    Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall_gain but now with some dependencies after the LDR: userland/arch/aarch64/freestanding/linux/stall_hazard4.S.

    @@ -28077,7 +28109,7 @@ non-atomic 19
    -
    23.22.4.8.7. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: speculative
    +
    24.22.4.8.7. gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: speculative

    Now let’s try to see some Speculative execution in action with userland/arch/aarch64/freestanding/linux/speculative.S.

    @@ -28266,7 +28298,7 @@ wbActual:0
    -

    23.22.5. gem5 instruction definitions

    +

    24.22.5. gem5 instruction definitions

    This is one of the parts of gem5 that rely on semi-useless code generation inside the .isa sublanguage.

    @@ -28309,7 +28341,7 @@ wbActual:0
    -

    We also notice that the key argument passed to those instructions is of type ExecContext, which is discussed further at: Section 23.22.6.3, “gem5 ExecContext”.

    +

    We also notice that the key argument passed to those instructions is of type ExecContext, which is discussed further at: Section 24.22.6.3, “gem5 ExecContext”.

    The file is an include so that compilation can be split up into chunks by the autogenerated includers.

    @@ -28514,7 +28546,7 @@ namespace ArmISAInst {

    Tested in gem5 b1623cb2087873f64197e503ab8894b5e4d4c7b4.

    -
    23.22.5.1. gem5 execute vs initiateAcc vs completeAcc
    +
    24.22.5.1. gem5 execute vs initiateAcc vs completeAcc

    These are the key methods defined in instruction definitions, so let’s see when each one gets called and roughly what each one does.

    @@ -28568,7 +28600,7 @@ namespace ArmISAInst {

    This can be seen concretely in GDB from the analysis done at: TimingSimpleCPU analysis: LDR stall and for more memory details see gem5 functional vs atomic vs timing memory requests.

    -
    23.22.5.1.1. gem5 completeAcc
    +
    24.22.5.1.1. gem5 completeAcc

    completeAcc is boring on most simple store memory instructions, e.g. a simple STR:

    @@ -28621,7 +28653,7 @@ namespace ArmISAInst {
    -
    23.22.5.2. gem5 microops
    +
    24.22.5.2. gem5 microops

    Some gem5 instructions break down into multiple microops.

    @@ -28682,7 +28714,7 @@ namespace ArmISAInst {
    -

    23.22.6. gem5 ThreadContext vs ThreadState vs ExecContext vs Process

    +

    24.22.6. gem5 ThreadContext vs ThreadState vs ExecContext vs Process

    These classes get used everywhere, and they have a somewhat convoluted relation with one another, so let’s figure out this mess.

    @@ -28693,7 +28725,7 @@ namespace ArmISAInst {

    This section and all children tested at gem5 b1623cb2087873f64197e503ab8894b5e4d4c7b4.

    -
    23.22.6.1. gem5 ThreadContext
    +
    24.22.6.1. gem5 ThreadContext

    As we delve into more details below, we will reach the following conclusion: a ThreadContext represents one thread of a CPU with multiple Hardware threads.

    @@ -28743,7 +28775,7 @@ typedef SimpleThread MinorThread;

    Essentially all methods of the base ThreadContext are pure virtual.

    -
    23.22.6.1.1. gem5 SimpleThread
    +
    24.22.6.1.1. gem5 SimpleThread

    SimpleThread storage is defined in BaseSimpleCPU for simple CPUs like AtomicSimpleCPU:

    @@ -28838,7 +28870,7 @@ typedef SimpleThread MinorThread;
    -
    23.22.6.1.2. gem5 O3ThreadContext
    +
    24.22.6.1.2. gem5 O3ThreadContext

    Instantiation happens in the FullO3CPU constructor:

    @@ -28942,7 +28974,7 @@ FullO3CPU<Impl>::readArchIntReg(int reg_idx, ThreadID tid)
    -
    23.22.6.2. gem5 ThreadState
    +
    24.22.6.2. gem5 ThreadState

    One ThreadState is owned per ThreadContext.

    @@ -28988,7 +29020,7 @@ class O3ThreadContext : public ThreadContext
    -
    23.22.6.3. gem5 ExecContext
    +
    24.22.6.3. gem5 ExecContext

    ExecContext gets used in gem5 instruction definitions, e.g.:

    @@ -29148,7 +29180,7 @@ class O3ThreadContext : public ThreadContext

    This makes sense, since each ThreadContext represents one CPU register set, and therefore needs a separate ExecContext which allows instruction implementations to access those registers.

    -
    23.22.6.3.1. gem5 ExecContext::readIntRegOperand register resolution
    +
    24.22.6.3.1. gem5 ExecContext::readIntRegOperand register resolution

    Let’s have a look at how ExecContext::readIntRegOperand actually matches registers to decoded register IDs, since it is not obvious.

    @@ -29187,7 +29219,7 @@ class O3ThreadContext : public ThreadContext

    First, we guess that they must be related to the reading of x1 and x2, which are the inputs of the addition.

    -

    Next, we also guess that the 0 read must correspond to x2, since it later gets potentially shifted as mentioned at Section 29.4.4.1, “ARM shift suffixes”.

    +

    Next, we also guess that the 0 read must correspond to x2, since it later gets potentially shifted as mentioned at Section 30.4.4.1, “ARM shift suffixes”.

    Let’s also have a look at the decoder code that builds the instruction instance in build/ARM/arch/arm/generated/decoder-ns.cc.inc:

    @@ -29421,7 +29453,7 @@ flattenIntIndex(int reg) const
    -
    23.22.6.4. gem5 Process
    +
    24.22.6.4. gem5 Process

    The Process class is used only for gem5 syscall emulation mode, and it represents a process like a Linux userland process, in addition to any further gem5 specific data needed to represent the process.

    @@ -29509,12 +29541,12 @@ readFunc(SyscallDesc *desc, ThreadContext *tc,
    -

    23.22.7. gem5 functional units

    +

    24.22.7. gem5 functional units

    Each instruction is marked with a class, and each class can execute in a given functional unit.

    -
    23.22.7.1. gem5 MinorCPU default functional units
    +
    24.22.7.1. gem5 MinorCPU default functional units

    Which units are available is visible for example on the gem5 config.ini of a gem5 MinorCPU run. Functional units are not present in simple CPUs like gem5 TimingSimpleCPU.

    @@ -29673,7 +29705,7 @@ opClass=IntAlu
    -
    23.22.7.2. gem5 DerivO3CPU default functional units
    +
    24.22.7.2. gem5 DerivO3CPU default functional units

    On gem5 3ca404da175a66e0b958165ad75eb5f54cb5e772, after running:

    @@ -29771,7 +29803,7 @@ pipelined=false
    -

    23.22.8. gem5 code generation

    +

    24.22.8. gem5 code generation

    gem5 uses a ton of code generation, which makes the project horrendous:

    @@ -29816,7 +29848,7 @@ pipelined=false

    But it has been widely overused to insanity. It likely also exists partly because when the project started in 2003 C++ compilers weren’t that good, so you couldn’t rely on features like templates that much.

    -
    23.22.8.1. gem5 THE_ISA
    +
    24.22.8.1. gem5 THE_ISA

    Generated code at: build/<ISA>/config/the_isa.hh which e.g. for ARM contains:

    @@ -29862,9 +29894,9 @@ enum class Arch {
    -

    23.22.9. gem5 build system

    +

    24.22.9. gem5 build system

    -
    23.22.9.1. M5_OVERRIDE_PY_SOURCE
    +
    24.22.9.1. M5_OVERRIDE_PY_SOURCE
    @@ -29879,7 +29911,7 @@ enum class Arch {
    -
    23.22.9.2. gem5 build broken on recent compiler version
    +
    24.22.9.2. gem5 build broken on recent compiler version

    gem5 moves a bit slowly, and if your host compiler is very new, the gem5 build might be broken for it, e.g. this was the case for Ubuntu 19.10 with GCC 9 and gem5 62d75e7105fe172eb906d4f80f360ff8591d4178 from Dec 2019.

    @@ -29904,7 +29936,7 @@ enum class Arch {
    -
    23.22.9.3. gem5 polymorphic ISA includes
    +
    24.22.9.3. gem5 polymorphic ISA includes

    E.g. src/cpu/decode_cache.hh includes:

    @@ -29983,7 +30015,7 @@ build/ARM/config/the_isa.hh
    -
    23.22.9.4. Why are all C++ symlinked into the gem5 build dir?
    +
    24.22.9.4. Why are all C++ symlinked into the gem5 build dir?

    Upstream request: https://gem5.atlassian.net/browse/GEM5-469

    @@ -30024,7 +30056,7 @@ build/ARM/config/the_isa.hh
    -

    24. Gensim

    +

    25. Gensim

    https://gensim.org

    @@ -30128,10 +30160,10 @@ gensim/models/armv8/isa.ac
    -

    25. Buildroot

    +

    26. Buildroot

    -

    25.1. Introduction to Buildroot

    +

    26.1. Introduction to Buildroot

    Buildroot is a set of Make scripts that download and compile from source compatible versions of:

    @@ -30144,7 +30176,7 @@ gensim/models/armv8/isa.ac

    Linux kernel

    -

    C standard library: Buildroot supports several implementations, see: Section 25.10, “libc choice”

    +

    C standard library: Buildroot supports several implementations, see: Section 26.10, “libc choice”

  • BusyBox: provides the shell and basic command line utilities

    @@ -30155,7 +30187,7 @@ gensim/models/armv8/isa.ac

    It therefore produces a pristine, blob-less, debuggable setup, where all moving parts are configured to work perfectly together.

    -

    Perhaps the awesomeness of Buildroot only sinks in once you notice that all it takes is 4 commands as explained at Section 25.11, “Buildroot hello world”.

    +

    Perhaps the awesomeness of Buildroot only sinks in once you notice that all it takes is 4 commands as explained at Section 26.11, “Buildroot hello world”.

    The downsides of Buildroot are:

    @@ -30203,7 +30235,7 @@ gensim/models/armv8/isa.ac
    -

    25.2. Custom Buildroot configs

    +

    26.2. Custom Buildroot configs

    We provide the following mechanisms:

    @@ -30238,10 +30270,10 @@ gensim/models/armv8/isa.ac

    The clean is necessary because the source files didn’t change, so make would just check the timestamps and not build anything.

    -

    You will then likely want to make those more permanent as explained at: Section 37.4, “Default command line arguments”.

    +

    You will then likely want to make those more permanent as explained at: Section 38.4, “Default command line arguments”.

    -

    25.2.1. Enable Buildroot compiler optimizations

    +

    26.2.1. Enable Buildroot compiler optimizations

    If you are benchmarking compiled programs instead of hand written assembly, remember that we configure Buildroot to disable optimizations by default with:

    @@ -30273,7 +30305,7 @@ gensim/models/armv8/isa.ac
    -

    25.3. Find Buildroot options with make menuconfig

    +

    26.3. Find Buildroot options with make menuconfig

    make menuconfig is a convenient way to find Buildroot configurations:

    @@ -30335,7 +30367,7 @@ make menuconfig
    -

    25.4. Change user

    +

    26.4. Change user

    At startup, we log in automatically as the root user.

    @@ -30372,7 +30404,7 @@ make menuconfig
    -

    25.4.1. Login as a non-root user without password

    +

    26.4.1. Login as a non-root user without password

    Replace in inittab:

    @@ -30395,7 +30427,7 @@ make menuconfig
    -

    25.5. Add new files to the Buildroot image

    +

    26.5. Add new files to the Buildroot image

    These are your options:

    @@ -30459,7 +30491,7 @@ make menuconfig
    -

    25.5.1. Add new Buildroot packages

    +

    26.5.1. Add new Buildroot packages

    First, see if you can’t get away without actually adding a new package, for example:

    @@ -30469,7 +30501,7 @@ make menuconfig

    if you have a standalone C file with no dependencies besides the C standard library to be compiled with GCC, just add a new file under buildroot_packages/sample_package and you are done

    -

    if you have a dependency on a library, first check whether Buildroot already has a package for it with ls buildroot/package. If yes, just enable that package as explained at: Section 25.2, “Custom Buildroot configs”

    +

    if you have a dependency on a library, first check whether Buildroot already has a package for it with ls buildroot/package. If yes, just enable that package as explained at: Section 26.2, “Custom Buildroot configs”

    @@ -30477,7 +30509,7 @@ make menuconfig

    If none of those methods are flexible enough for you, you can just fork or hack up the sample package buildroot_packages/sample_package to do what you want.

    -

    For how to use that package, see: Section 37.15.2, “buildroot_packages directory”.

    +

    For how to use that package, see: Section 38.15.2, “buildroot_packages directory”.

    Then iterate trying to do what you want and reading the manual until it works: https://buildroot.org/downloads/manual/manual.html

    @@ -30485,7 +30517,7 @@ make menuconfig
    -

    25.6. Remove Buildroot packages

    +

    26.6. Remove Buildroot packages

    Once you’ve built a package into the image, there is no easy way to remove it.

    @@ -30497,7 +30529,7 @@ make menuconfig
    -

    25.7. BR2_TARGET_ROOTFS_EXT2_SIZE

    +

    26.7. BR2_TARGET_ROOTFS_EXT2_SIZE

    When adding a new large package to the Buildroot root filesystem, it may fail with the message:

    @@ -30541,7 +30573,7 @@ TODO benchmark: would gem5 suffer a considerable disk read performance hit due t

    libguestfs: https://serverfault.com/questions/246835/convert-directory-to-qemu-kvm-virtual-disk-image/916697#916697, in particular vfs-minimum-size

    -

    use methods described at: Section 23.6.3, “gem5 checkpoint restore and run a different script” instead of putting builds on the root filesystem

    +

    use methods described at: Section 24.6.3, “gem5 checkpoint restore and run a different script” instead of putting builds on the root filesystem

    @@ -30549,7 +30581,7 @@ TODO benchmark: would gem5 suffer a considerable disk read performance hit due t

    Bibliography: https://stackoverflow.com/questions/49211241/is-there-a-way-to-automatically-detect-the-minimum-required-br2-target-rootfs-ex

    -

    25.7.1. SquashFS

    +

    26.7.1. SquashFS

    SquashFS creation with mksquashfs does not take fixed sizes, and I have successfully booted from it, but it is read-only, which is unacceptable.

    @@ -30562,7 +30594,7 @@ TODO benchmark: would gem5 suffer a considerable disk read performance hit due t
    -

    25.8. Buildroot rebuild is slow when the root filesystem is large

    +

    26.8. Buildroot rebuild is slow when the root filesystem is large

    Buildroot is not designed for large root filesystem images, and the rebuild becomes very slow when we add a large package to it.

    @@ -30600,7 +30632,7 @@ TODO benchmark: would gem5 suffer a considerable disk read performance hit due t
    -

    25.9. Report upstream bugs

    +

    26.9. Report upstream bugs

    When asking for help on upstream repositories outside of this repository, you will need to provide the commands that you are running in detail without referencing our scripts.

    @@ -30660,7 +30692,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -

    Then, you will also want to do a Bisection to pinpoint the exact commit to blame, and CC that developer.

    -

    Finally, give the images you used to save upstream developers' time, as shown at: Section 37.19.2, “release-zip”.

    +

    Finally, give the images you used to save upstream developers' time, as shown at: Section 38.19.2, “release-zip”.

    For Buildroot problems, you should either provide the config you have:

    @@ -30675,7 +30707,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    -

    25.10. libc choice

    +

    26.10. libc choice

    Buildroot supports several libc implementations, including:

    @@ -30719,11 +30751,11 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    -

    One "downside" of glibc is that it exercises much more kernel functionality in its more bloated pre-main init, which breaks user mode C hello worlds more often, see: Section 10.4, “User mode simulation with glibc”. I quote "downside" because glibc is actually exposing emulator bugs which we should go and fix.

    +

    One "downside" of glibc is that it exercises much more kernel functionality in its more bloated pre-main init, which breaks user mode C hello worlds more often, see: Section 11.4, “User mode simulation with glibc”. I quote "downside" because glibc is actually exposing emulator bugs which we should go and fix.

    -

    25.11. Buildroot hello world

    +

    26.11. Buildroot hello world

    This repo doesn’t do much more than set a bunch of Buildroot configurations and build it.

    @@ -30740,7 +30772,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    @@ -30768,7 +30800,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    -

    25.12. Update the Buildroot toolchain

    +

    26.12. Update the Buildroot toolchain

    Users of this repo will often want to update the compilation toolchain to the latest version to get fresh new features like new ISA instructions.

    @@ -30782,7 +30814,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -

    In this section we cover the most common cases.

    -

    25.12.1. Update GCC: GCC supported by Buildroot

    +

    26.12.1. Update GCC: GCC supported by Buildroot

    This is of course the simplest case.

    @@ -30900,9 +30932,9 @@ cd ../..
    -

    25.12.2. Update GCC: GCC not supported by Buildroot

    +

    26.12.2. Update GCC: GCC not supported by Buildroot

    -

    Now it gets fun, but well, guess what, we will try to do the same as Section 25.12.1, “Update GCC: GCC supported by Buildroot” but:

    +

    Now it gets fun, but well, guess what, we will try to do the same as Section 26.12.1, “Update GCC: GCC supported by Buildroot” but:

    -

    25.13. Buildroot vanilla kernel

    +

    26.13. Buildroot vanilla kernel

    By default, our build system uses build-linux, and the Buildroot kernel build is disabled: https://stackoverflow.com/questions/52231793/can-buildroot-build-the-root-filesystem-without-building-the-linux-kernel

    @@ -30992,28 +31024,28 @@ cd ../..
    -

    26. Userland content

    +

    27. Userland content

    This section documents our test and educational userland content, such as C, C++ and POSIX examples, present mostly under userland/.

    -

    Getting started at: Section 1.8, “Userland setup”

    +

    Getting started at: Section 2.8, “Userland setup”

    -

    Userland assembly content is located at: Section 27, “Userland assembly”. It was split from this section basically because we were hitting the HTML h6 limit, stupid web :-)

    +

    Userland assembly content is located at: Section 28, “Userland assembly”. It was split from this section basically because we were hitting the HTML h6 limit, stupid web :-)

    This content makes up the bulk of the userland/ directory.

    -

    The quickest way to run the arch agnostic examples, which comprise the majority of the examples, is natively as shown at: Section 1.8.2.1, “Userland setup getting started natively”

    +

    The quickest way to run the arch agnostic examples, which comprise the majority of the examples, is natively as shown at: Section 2.8.2.1, “Userland setup getting started natively”

    This section was originally moved in here from: https://github.com/cirosantilli/cpp-cheat

    -

    26.1. build-userland

    +

    27.1. build-userland

    @@ -31070,7 +31102,7 @@ cd ../..
    -

    26.2. C

    +

    27.2. C

    Programs under userland/c/ are examples of ANSI C programming:

    @@ -31209,7 +31241,7 @@ cd ../..
    -

    26.2.1. malloc

    +

    27.2.1. malloc

    @@ -31223,7 +31255,7 @@ cd ../..

    malloc leads to the infinite joys of Memory leaks.

    -
    26.2.1.1. malloc implementation
    +
    27.2.1.1. malloc implementation

    TODO: the exact answer is going to be hard.

    @@ -31268,7 +31300,7 @@ printf '%x\n' 4198400
    -
    26.2.1.2. malloc maximum size
    +
    27.2.1.2. malloc maximum size

    General overview at: https://stackoverflow.com/questions/2798330/maximum-memory-which-malloc-can-allocate

    @@ -31334,7 +31366,7 @@ echo 1 > /proc/sys/vm/overcommit_memory

    If we start using the pages, the OOM killer would sooner or later step in and kill our process: Linux out-of-memory killer.

    -
    26.2.1.2.1. Linux out-of-memory killer
    +
    27.2.1.2.1. Linux out-of-memory killer

    We can observe the OOM in LKMC 1e969e832f66cb5a72d12d57c53fb09e9721d589 which defaults to 256MiB of memory with:

    @@ -31360,7 +31392,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    26.2.2. C multithreading

    +

    27.2.2. C multithreading

    Added in C11!

    @@ -31378,7 +31410,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -
    26.2.2.1. atomic.c
    +
    27.2.2.1. atomic.c
    -

    26.2.3. GCC C extensions

    +

    27.2.3. GCC C extensions

    -
    26.2.3.1. C empty struct
    +
    27.2.3.1. C empty struct
    @@ -31484,7 +31516,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -
    26.2.3.2. OpenMP
    +
    27.2.3.2. OpenMP

    GCC implements the OpenMP threading API: https://stackoverflow.com/questions/3949901/pthreads-vs-openmp

    @@ -31510,7 +31542,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    26.3. C++

    +

    27.3. C++

    Programs under userland/cpp/ are examples of ISO C++ programming.

    @@ -31584,9 +31616,9 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    26.3.1. C++ classes

    +

    27.3.1. C++ classes

    -
    26.3.1.1. C++ constructor
    +
    27.3.1.1. C++ constructor
    @@ -31618,7 +31650,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    26.3.2. C++ standards

    +

    27.3.2. C++ standards

    Like for C, you have to pay for the standards… insane. So we just use the closest free drafts instead.

    @@ -31675,7 +31707,7 @@ destructor
    -

    26.3.3. C++ initialization types

    +

    27.3.3. C++ initialization types

    OMG this is hell, understand when primitive variables are initialized or not:

    @@ -31723,7 +31755,7 @@ destructor
    -

    26.3.4. C++ multithreading

    +

    27.3.4. C++ multithreading

    -
    26.3.4.1. atomic.cpp
    +
    27.3.4.1. atomic.cpp
    @@ -31957,7 +31989,7 @@ time ./mutex.out 4 100000000
    -
    26.3.4.1.1. Detailed gem5 analysis of how data races happen
    +
    27.3.4.1.1. Detailed gem5 analysis of how data races happen

    The smallest data race we managed to come up with as of LKMC 7c01b29f1ee7da878c7cc9cb4565f3f3cf516a92 and gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1 was with userland/c/atomic.c (see also C multithreading):

    @@ -32062,7 +32094,7 @@ non-atomic 19
    -
    26.3.4.2. C++ std::memory_order
    +
    27.3.4.2. C++ std::memory_order

    https://stackoverflow.com/questions/12346487/what-do-each-memory-order-mean

    @@ -32074,7 +32106,7 @@ non-atomic 19
    -
    26.3.4.3. C++ parallel algorithms
    +
    27.3.4.3. C++ parallel algorithms

    https://stackoverflow.com/questions/51031060/are-c17-parallel-algorithms-implemented-already/55989883#55989883

    @@ -32083,14 +32115,14 @@ non-atomic 19
    -
    26.3.4.4. C++17 N4659 standards draft
    +
    27.3.4.4. C++17 N4659 standards draft

    http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf

    -

    26.3.5. C++ templates

    +

    27.3.5. C++ templates

    -
    26.3.5.1. SFINAE
    +
    27.3.5.1. SFINAE
    @@ -32112,7 +32144,7 @@ non-atomic 19
    -

    26.3.6. C++ type casting

    +

    27.3.6. C++ type casting

    userland/cpp/static_dynamic_reinterpret_cast.cpp

    @@ -32121,7 +32153,7 @@ non-atomic 19
    -

    26.3.7. C++ compile time magic

    +

    27.3.7. C++ compile time magic

    -
    26.3.7.1. C++ decltype
    +
    27.3.7.1. C++ decltype
    @@ -32146,9 +32178,9 @@ non-atomic 19
    -

    26.3.8. C++ concepts

    +

    27.3.8. C++ concepts

    -
    26.3.8.1. C++ iterators
    +
    27.3.8.1. C++ iterators
    @@ -32171,12 +32203,12 @@ non-atomic 19
    -

    26.3.9. C++ third-party libraries

    +

    27.3.9. C++ third-party libraries

    Under: userland/libs directory.

    -
    26.3.9.1. Boost
    +
    27.3.9.1. Boost
    @@ -32192,7 +32224,7 @@ non-atomic 19
    -
    26.3.9.2. GoogleTest
    +
    27.3.9.2. GoogleTest

    https://github.com/google/googletest

    @@ -32244,7 +32276,7 @@ cd ../../userland/libs/googletest
    -
    26.3.9.3. HDF5
    +
    27.3.9.3. HDF5

    https://en.wikipedia.org/wiki/Hierarchical_Data_Format

    @@ -32268,7 +32300,7 @@ cd ../../userland/libs/googletest
    -

    26.4. POSIX

    +

    27.4. POSIX

    Programs under userland/posix/ are examples of POSIX C programming.

    @@ -32286,13 +32318,13 @@ cd ../../userland/libs/googletest
    -

    26.4.1. Environment variables

    +

    27.4.1. Environment variables

    POSIX C example that prints all environment variables: userland/posix/environ.c

    -

    26.4.2. unistd.h

    +

    27.4.2. unistd.h

    -

    26.4.3. fork

    +

    27.4.3. fork

    POSIX' multiprocess API. Contrast with pthreads which are for threads.

    @@ -32330,7 +32362,7 @@ fork() return = 13039

    Read the source comments and understand everything that is going on!

    -
    26.4.3.1. getpid
    +
    27.4.3.1. getpid

    The minimal interesting example is to use fork and observe different PIDs.

    @@ -32342,7 +32374,7 @@ fork() return = 13039
    -
    26.4.3.2. Fork bomb
    +
    27.4.3.2. Fork bomb

    https://en.wikipedia.org/wiki/Fork_bomb

    @@ -32377,7 +32409,7 @@ fork() return = 13039
    -

    26.4.4. pthreads

    +

    27.4.4. pthreads

    POSIX' multithreading API. Contrast with fork which is for processes.

    @@ -32401,7 +32433,7 @@ fork() return = 13039
    -
    26.4.4.1. pthread_mutex
    +
    27.4.4.1. pthread_mutex

    userland/posix/pthread_count.c exemplifies the functions:

    @@ -32438,7 +32470,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -

    26.4.5. sysconf

    +

    27.4.5. sysconf

    https://pubs.opengroup.org/onlinepubs/9699919799/functions/sysconf.html

    @@ -32484,7 +32516,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -

    26.4.6. mmap

    +

    27.4.6. mmap

    The mmap system call maps files or anonymous memory into the process address space, which allows more advanced memory operations than malloc.

    @@ -32495,7 +32527,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -
    26.4.6.1. mmap MAP_ANONYMOUS
    +
    27.4.6.1. mmap MAP_ANONYMOUS

    Basic mmap example: does the same as userland/c/malloc.c, but with mmap.

    @@ -32513,7 +32545,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -
    26.4.6.2. mmap file
    +
    27.4.6.2. mmap file

    Memory mapped file example: userland/posix/mmap_file.c

    @@ -32525,7 +32557,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -
    26.4.6.3. brk
    +
    27.4.6.3. brk

    Previously POSIX, but was deprecated in favor of malloc

    @@ -32541,7 +32573,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -

    26.4.7. socket

    +

    27.4.7. socket

    A bit like read and write, but from / to the Internet!

    @@ -32555,7 +32587,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -

    26.5. Userland multithreading

    +

    27.5. Userland multithreading

    The following sections are related to multithreading in userland:

    @@ -32617,12 +32649,12 @@ There are no non-locking atomic types or atomic primitives in POSIX: -

    26.6. C debugging

    +

    27.6. C debugging

    Let’s group here the hard-to-debug, undefined-behaviour-like problems found in C / C++, and how to tackle them.

    -

    26.6.1. Stack smashing

    +

    27.6.1. Stack smashing

    @@ -32642,7 +32674,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -

    26.6.2. Memory leaks

    +

    27.6.2. Memory leaks

    @@ -32651,7 +32683,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -

    26.6.3. Profiling userland programs

    +

    27.6.3. Profiling userland programs

    @@ -32671,12 +32703,12 @@ There are no non-locking atomic types or atomic primitives in POSIX: -

    26.7. Interpreted languages

    +

    27.7. Interpreted languages

    Maybe some day someone will use this setup to study the performance of interpreters.

    -

    26.7.1. Python

    +

    27.7.1. Python

    @@ -32704,9 +32736,9 @@ There are no non-locking atomic types or atomic primitives in POSIX: -
    26.7.1.1. Python standard library
    +
    27.7.1.1. Python standard library
    -
    26.7.1.1.1. Python unittest
    +
    27.7.1.1.1. Python unittest

    rootfs_overlay/lkmc/python/unittest_find/ contains examples to test how tests are found by unittest within directories. Related questions:

    @@ -32722,7 +32754,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -
    26.7.1.1.2. Python relative imports
    +
    27.7.1.1.2. Python relative imports

    rootfs_overlay/lkmc/python/relative_import/ contains examples to test how to do relative imports in Python.

    @@ -32751,7 +32783,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -
    26.7.1.2. Build and install the interpreter
    +
    27.7.1.2. Build and install the interpreter

    Buildroot has a Python package that can be added to the guest image:

    @@ -32810,7 +32842,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -
    26.7.1.3. Python gem5 user mode simulation
    +
    27.7.1.3. Python gem5 user mode simulation

    At LKMC 50ac89b779363774325c81157ec8b9a6bdb50a2f gem5 390a74f59934b85d91489f8a563450d8321b602da:

    @@ -32870,7 +32902,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -
    26.7.1.4. Embedding Python in another application
    +
    27.7.1.4. Embedding Python in another application

    Here we will add some better examples and explanations for: https://docs.python.org/3/extending/embedding.html#very-high-level-embedding

    @@ -32921,7 +32953,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -
    26.7.1.5. pybind11
    +
    27.7.1.5. pybind11
    @@ -32944,7 +32976,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -

    26.7.2. Node.js

    +

    27.7.2. Node.js

    @@ -33030,7 +33062,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -
    26.7.2.1. Node.js step debugging
    +
    27.7.2.1. Node.js step debugging

    Overviews:

    @@ -33055,7 +33087,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -
    26.7.2.2. NPM
    +
    27.7.2.2. NPM
    @@ -33074,7 +33106,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -
    26.7.2.2.1. NPM data-files
    +
    27.7.2.2.1. NPM data-files

    Illustrates how to add extra non-code data files to an NPM package, and then use those files at runtime.

    @@ -33085,7 +33117,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -

    26.7.3. Java

    +

    27.7.3. Java

    @@ -33104,7 +33136,7 @@ There are no non-locking atomic types or atomic primitives in POSIX: -

    26.8. Algorithms

    +

    27.8. Algorithms

    @@ -33264,7 +33296,7 @@ cmp tmp.o tmp.e

    These are good targets for performance analysis with gem5, and there is some overlap between this section and Benchmarks.

    -

    26.8.1. BST vs heap vs hashmap

    +

    27.8.1. BST vs heap vs hashmap

    @@ -33362,7 +33394,7 @@ xdg-open bst_vs_heap_vs_hashmap_gem5.tmp.png

    The cache sizes were chosen to match the host 2017 Lenovo ThinkPad P51 to improve the comparison. Ideally we should also use the same standard library.

    -

    Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 23.10.3.2, “gem5 only dump selected stats”

    +

    Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 24.10.3.2, “gem5 only dump selected stats”

    Sources:

    @@ -33382,7 +33414,7 @@ xdg-open bst_vs_heap_vs_hashmap_gem5.tmp.png
    -

    26.8.2. BLAS

    +

    27.8.2. BLAS

    Buildroot supports it, which makes everything just trivial:

    @@ -33434,7 +33466,7 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    -

    26.8.3. Eigen

    +

    27.8.3. Eigen

    Header only linear algebra library with a mainline Buildroot package:

    @@ -33473,7 +33505,7 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    -

    26.9. Benchmarks

    +

    27.9. Benchmarks

    These are good targets for performance analysis with gem5.

    @@ -33491,7 +33523,7 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    -

    26.9.1. Microbenchmarks

    +

    27.9.1. Microbenchmarks

    It eventually has to come to that, hasn’t it?

    @@ -33530,7 +33562,7 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    -

    26.10. userland/libs directory

    +

    27.10. userland/libs directory

    Tests under userland/libs require certain optional libraries to be installed on the target, and are not built or tested by default; you must enable them with either:

    @@ -33574,7 +33606,7 @@ cd userland/libs/eigen
    -

    26.11. Userland content filename conventions

    +

    27.11. Userland content filename conventions

    The following basenames should always refer to programs that do the same thing, but in different languages:

    @@ -33603,7 +33635,7 @@ cd userland/libs/eigen
    -

    26.12. Userland content bibliography

    +

    27.12. Userland content bibliography

    -

    27. Userland assembly

    +

    28. Userland assembly

    Programs under userland/arch/<arch>/ are examples of userland assembly programming.

    @@ -33647,7 +33679,7 @@ cd userland/libs/eigen
    -

    Like other userland programs, these programs can be run as explained at: Section 1.8, “Userland setup”.

    +

    Like other userland programs, these programs can be run as explained at: Section 2.8, “Userland setup”.

    As a quick reminder, the fastest setups to get started are:

    @@ -33663,7 +33695,7 @@ cd userland/libs/eigen
    -

    However, as usual, it is saner to build your toolchain as explained at: Section 10.1, “QEMU user mode getting started”.

    +

    However, as usual, it is saner to build your toolchain as explained at: Section 11.1, “QEMU user mode getting started”.

    The first examples you should look into are:

    @@ -33716,7 +33748,7 @@ cd userland/libs/eigen
  • -

    registers, see: Section 27.1, “Assembly registers”

    +

    registers, see: Section 28.1, “Assembly registers”

  • jumping:

    @@ -33859,14 +33891,14 @@ error: asm_main returned 1 at line 8
    -

    27.1. Assembly registers

    +

    28.1. Assembly registers

    After seeing an ADD hello world, you need to learn the general registers:

    -

    27.1.1. ARMv8 aarch64 x31 register

    +

    28.1.1. ARMv8 aarch64 x31 register

    @@ -33982,7 +34014,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    27.2. Floating point assembly

    +

    28.2. Floating point assembly

    Keep in mind that many ISAs started with floating point as an optional extension, which later got better integrated into the main CPU, side by side with SIMD.

    @@ -34024,7 +34056,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    27.3. SIMD assembly

    +

    28.3. SIMD assembly

    Much like ADD for non-SIMD, start learning SIMD instructions by looking at the integer and floating point SIMD ADD instructions of each ISA:

    @@ -34114,14 +34146,14 @@ When instructions do not interpret this operand encoding as the zero register, u

    Bibliography: https://stackoverflow.com/questions/1389712/getting-started-with-intel-x86-sse-simd-instructions/56409539#56409539

    -

    27.3.1. FMA instruction

    +

    28.3.1. FMA instruction

    Fused multiply add:

    @@ -34163,7 +34195,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    27.4. User vs system assembly

    +

    28.4. User vs system assembly

    By "userland assembly", we mean "the parts of the ISA which can be freely used from userland".

    @@ -34174,17 +34206,17 @@ When instructions do not interpret this operand encoding as the zero register, u

    One big difference between the two is that userland assembly can be run on the Userland setup, which is easier to get running and to debug.

    -

    In particular, most userland assembly examples link to the C standard library, see: Section 27.5, “Userland assembly C standard library”.

    +

    In particular, most userland assembly examples link to the C standard library, see: Section 28.5, “Userland assembly C standard library”.

    Userland assembly is generally simpler, and a prerequisite for the Baremetal setup.

    -

    System-land assembly cheats will be put under: Section 1.9, “Baremetal setup”.

    +

    System-land assembly cheats will be put under: Section 2.9, “Baremetal setup”.

    -

    27.5. Userland assembly C standard library

    +

    28.5. Userland assembly C standard library

    All examples except the Freestanding programs link to the C standard library.

    @@ -34217,7 +34249,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -

    27.5.1. Freestanding programs

    +

    28.5.1. Freestanding programs

    Unlike most of our other assembly examples, which use the C standard library for portability, examples under freestanding/ directories don’t link to the C standard library:

    @@ -34297,7 +34329,7 @@ When instructions do not interpret this operand encoding as the zero register, u
    -
    27.5.1.1. nostartfiles programs
    +
    28.5.1.1. nostartfiles programs

    Assembly examples under nostartfiles directories can use the standard library, but they don’t use the pre-main boilerplate and start directly at our explicitly given _start:

    @@ -34380,7 +34412,7 @@ Is it any easy to determine which functions I can use or not, in case there are
    -

    27.6. GCC inline assembly

    +

    28.6. GCC inline assembly

    Examples under arch/<arch>/c/ directories show how to use inline assembly from higher level languages such as C:

    @@ -34443,7 +34475,7 @@ Is it any easy to determine which functions I can use or not, in case there are
    -

    27.6.1. GCC inline assembly register variables

    +

    28.6.1. GCC inline assembly register variables

    Used notably in some of the Linux system calls setups:

    @@ -34467,14 +34499,14 @@ Is it any easy to determine which functions I can use or not, in case there are

    In arm, it is the only way to achieve this effect: https://stackoverflow.com/questions/10831792/how-to-use-specific-register-in-arm-inline-assembler

    -

    This feature is notably useful for making system calls from C, see: Section 27.7, “Linux system calls”.

    +

    This feature is notably useful for making system calls from C, see: Section 28.7, “Linux system calls”.

    Documentation: https://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Explicit-Reg-Vars.html

    -

    27.6.2. GCC inline assembly scratch registers

    +

    28.6.2. GCC inline assembly scratch registers

    How to use temporary registers in inline assembly:

    @@ -34500,7 +34532,7 @@ Is it any easy to determine which functions I can use or not, in case there are
    -

    27.6.3. GCC inline assembly early-clobbers

    +

    28.6.3. GCC inline assembly early-clobbers

    An example of using the & early-clobber modifier: userland/arch/aarch64/earlyclobber.c

    @@ -34512,7 +34544,7 @@ Is it any easy to determine which functions I can use or not, in case there are
    -

    27.6.4. GCC inline assembly floating point ARM

    +

    28.6.4. GCC inline assembly floating point ARM

    Not documented as of GCC 8.2, but possible: https://stackoverflow.com/questions/53960240/armv8-floating-point-output-inline-assembly

    @@ -34528,7 +34560,7 @@ Is it any easy to determine which functions I can use or not, in case there are
    -

    27.6.5. GCC intrinsics

    +

    28.6.5. GCC intrinsics

    Pre-existing C wrappers for the instructions. This is what production programs should use instead of inline assembly for SIMD:

    @@ -34550,7 +34582,7 @@ Is it any easy to determine which functions I can use or not, in case there are
    -
    27.6.5.1. GCC x86 intrinsics
    +
    28.6.5.1. GCC x86 intrinsics

    Good official cheatsheet with all intrinsics and what they expand to: https://software.intel.com/sites/landingpage/IntrinsicsGuide

    @@ -34678,7 +34710,7 @@ zmmintrin.h AVX512
    -

    27.7. Linux system calls

    +

    28.7. Linux system calls

    The following Userland setup programs illustrate how to make system calls:

    @@ -34777,7 +34809,7 @@ zmmintrin.h AVX512
    -

    27.7.1. futex system call

    +

    28.7.1. futex system call

    This is how threads either:

    @@ -34839,7 +34871,7 @@ child after parent sleep
    -
    27.7.1.1. Userland mutex implementation
    +
    28.7.1.1. Userland mutex implementation

    The best article to understand spinlocks is: https://eli.thegreenplace.net/2018/basics-of-futexes/

    @@ -34849,7 +34881,7 @@ child after parent sleep
    -

    27.7.2. getcpu system call and the sched_getaffinity glibc wrapper

    +

    28.7.2. getcpu system call and the sched_getaffinity glibc wrapper

    Examples:

    @@ -34923,7 +34955,7 @@ child after parent sleep
    -

    27.7.3. perf_event_open system call

    +

    28.7.3. perf_event_open system call

    userland/linux/perf_event_open.c

    @@ -34974,7 +35006,7 @@ child after parent sleep
    -

    27.8. Linux calling conventions

    +

    28.8. Linux calling conventions

    A summary of results is shown at: Table 3, “Summary of Linux calling conventions for several architectures”.

    @@ -35016,7 +35048,7 @@ child after parent sleep
    -

    27.8.1. x86_64 calling convention

    +

    28.8.1. x86_64 calling convention

    Examples:

    @@ -35045,7 +35077,7 @@ child after parent sleep
    -

    27.8.2. ARM calling convention

    +

    28.8.2. ARM calling convention

    Call C standard library functions from assembly and vice versa.

    @@ -35107,7 +35139,7 @@ child after parent sleep
    -

    27.9. GNU GAS assembler

    +

    28.9. GNU GAS assembler

    GNU GAS is the default assembler used by GCC, and therefore it completely dominates in Linux.

    @@ -35115,7 +35147,7 @@ child after parent sleep

    The Linux kernel in particular uses GNU GAS assembly extensively for the arch specific parts under arch/.

    -

    27.9.1. GNU GAS assembler comments

    +

    28.9.1. GNU GAS assembler comments

    In this tutorial, we use exclusively C Preprocessor /**/ comments because:

    @@ -35150,7 +35182,7 @@ child after parent sleep
    -

    27.9.2. GNU GAS assembler immediates

    +

    28.9.2. GNU GAS assembler immediates

    Summary:

    @@ -35182,7 +35214,7 @@ child after parent sleep
    -

    27.9.3. GNU GAS assembler data sizes

    +

    28.9.3. GNU GAS assembler data sizes

    Let’s see how many bytes go into each data type:

    @@ -35274,9 +35306,9 @@ child after parent sleep
    -
    27.9.3.1. GNU GAS assembler ARM specifics
    +
    28.9.3.1. GNU GAS assembler ARM specifics
    -
    27.9.3.1.1. GNU GAS assembler ARM unified syntax
    +
    28.9.3.1.1. GNU GAS assembler ARM unified syntax

    There are two types of ARMv7 assemblies:

    @@ -35321,14 +35353,14 @@ child after parent sleep
  • -

    cannot have implicit destination with shift, see: Section 29.4.4.1, “ARM shift suffixes”

    +

    cannot have implicit destination with shift, see: Section 30.4.4.1, “ARM shift suffixes”

    -
    27.9.3.2. GNU GAS assembler ARM .n and .w suffixes
    +
    28.9.3.2. GNU GAS assembler ARM .n and .w suffixes

    When reading disassembly, many instructions have either a .n or .w suffix.

    @@ -35341,7 +35373,7 @@ child after parent sleep
    -

    27.9.4. GNU GAS assembler char literals

    +

    28.9.4. GNU GAS assembler char literals

    userland/arch/x86_64/char_literals.S

    @@ -35362,14 +35394,14 @@ child after parent sleep
    -

    27.10. NOP instructions

    +

    28.10. NOP instructions

    @@ -35386,13 +35418,13 @@ child after parent sleep
    -

    28. x86 userland assembly

    +

    29. x86 userland assembly

    -

    Arch agnostic infrastructure getting started at: Section 27, “Userland assembly”.

    +

    Arch agnostic infrastructure getting started at: Section 28, “Userland assembly”.

    -

    28.1. x86 registers

    +

    29.1. x86 registers

    userland/arch/x86_64/registers.S

    @@ -35443,7 +35475,7 @@ child after parent sleep
    -

    28.2. x86 addressing modes

    +

    29.2. x86 addressing modes

    @@ -35526,7 +35558,7 @@ child after parent sleep
    -

    28.3. x86 data transfer instructions

    +

    29.3. x86 data transfer instructions

    5.1.1 "Data Transfer Instructions"

    @@ -35557,7 +35589,7 @@ child after parent sleep
    -

    28.3.1. x86 exchange instructions

    +

    29.3.1. x86 exchange instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 7.3.1.2 "Exchange Instructions":

    @@ -35575,7 +35607,7 @@ child after parent sleep

    TODO: concrete multi-thread GCC inline assembly examples of how all those instructions are normally used as synchronization primitives.

    -
    28.3.1.1. x86 CMPXCHG instruction
    +
    29.3.1.1. x86 CMPXCHG instruction

    userland/arch/x86_64/cmpxchg.S

    @@ -35599,7 +35631,7 @@ child after parent sleep
    -

    28.3.2. x86 PUSH and POP instructions

    +

    29.3.2. x86 PUSH and POP instructions

    userland/arch/x86_64/push.S

    @@ -35626,7 +35658,7 @@ add $8, %rsp
    -

    28.3.3. x86 CQTO and CLTQ instructions

    +

    29.3.3. x86 CQTO and CLTQ instructions

    Examples:

    @@ -35727,7 +35759,7 @@ add $8, %rsp
    -

    28.3.4. x86 CMOVcc instructions

    +

    29.3.4. x86 CMOVcc instructions

    -

    It is interesting to compare this with ARMv7 conditional execution, which is available for all instructions, as shown at: Section 29.2.5, “ARM conditional execution”.

    +

    It is interesting to compare this with ARMv7 conditional execution, which is available for all instructions, as shown at: Section 30.2.5, “ARM conditional execution”.

    -

    28.4. x86 binary arithmetic instructions

    +

    29.4. x86 binary arithmetic instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.2 "Binary Arithmetic Instructions":

    @@ -35853,7 +35885,7 @@ add $8, %rsp
    -

    28.5. x86 logical instructions

    +

    29.5. x86 logical instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.4 "Logical Instructions"

    @@ -35875,7 +35907,7 @@ add $8, %rsp
    -

    28.6. x86 shift and rotate instructions

    +

    29.6. x86 shift and rotate instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.5 "Shift and Rotate Instructions"

    @@ -35927,7 +35959,7 @@ add $8, %rsp
    -

    28.7. x86 bit and byte instructions

    +

    29.7. x86 bit and byte instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.6 "Bit and Byte Instructions"

    @@ -35986,7 +36018,7 @@ add $8, %rsp
    -

    28.8. x86 control transfer instructions

    +

    29.8. x86 control transfer instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.7 "Control Transfer Instructions"

    @@ -36005,7 +36037,7 @@ add $8, %rsp
    -

    28.8.1. x86 Jcc instructions

    +

    29.8.1. x86 Jcc instructions

    userland/arch/x86_64/jcc.S

    @@ -36079,7 +36111,7 @@ add $8, %rsp
    -

    28.8.2. x86 LOOP instruction

    +

    29.8.2. x86 LOOP instruction

    userland/arch/x86_64/loop.S

    @@ -36088,7 +36120,7 @@ add $8, %rsp
    -

    28.8.3. x86 string instructions

    +

    29.8.3. x86 string instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.8 "String Instructions"

    @@ -36141,7 +36173,7 @@ add $8, %rsp

    However, as computer architecture evolved, those instructions might not offer considerable speedups anymore, and modern glibc such as 2.29 just uses x86 SIMD operations instead:, see also: https://stackoverflow.com/questions/33480999/how-can-the-rep-stosb-instruction-execute-faster-than-the-equivalent-loop

    -
    28.8.3.1. x86 REP prefix
    +
    29.8.3.1. x86 REP prefix

    Example: userland/arch/x86_64/rep.S

    @@ -36180,7 +36212,7 @@ add $8, %rsp
    -

    28.8.4. x86 ENTER and LEAVE instructions

    +

    29.8.4. x86 ENTER and LEAVE instructions

    userland/arch/x86_64/enter.S

    @@ -36231,16 +36263,16 @@ pop %rbp
    -

    28.9. x86 miscellaneous instructions

    +

    29.9. x86 miscellaneous instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.13 "Miscellaneous Instructions"

    -

    NOP: Section 27.10, “NOP instructions”

    +

    NOP: Section 28.10, “NOP instructions”

    -

    28.10. x86 random number generator instructions

    +

    29.10. x86 random number generator instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.15 Random Number Generator Instructions

    @@ -36263,7 +36295,7 @@ pop %rbp

    RDRAND sets the carry flag when data is ready so we must loop if the carry flag isn’t set.

    -

    28.10.1. x86 CPUID instruction

    +

    29.10.1. x86 CPUID instruction

    Example: userland/arch/x86_64/cpuid.S

    @@ -36334,7 +36366,7 @@ pop %rbp
    -

    28.11. x86 x87 FPU instructions

    +

    29.11. x86 x87 FPU instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.2 "X87 FPU INSTRUCTIONS"

    @@ -36427,7 +36459,7 @@ pop %rbp
    -

    28.11.1. x86 x87 FPU vs SIMD

    +

    29.11.1. x86 x87 FPU vs SIMD

    https://stackoverflow.com/questions/1844669/benefits-of-x87-over-sse

    @@ -36466,9 +36498,9 @@ pop %rbp
    -

    28.12. x86 SIMD

    +

    29.12. x86 SIMD

    -

    Parent section: Section 27.3, “SIMD assembly”

    +

    Parent section: Section 28.3, “SIMD assembly”

    History:

    @@ -36502,12 +36534,12 @@ pop %rbp
    -

    28.12.1. x86 SSE instructions

    +

    29.12.1. x86 SSE instructions

    -
    28.12.1.2. x86 SSE packed arithmetic instructions
    +
    29.12.1.2. x86 SSE packed arithmetic instructions
    @@ -36539,14 +36571,14 @@ pop %rbp
    -
    28.12.1.3. x86 SSE conversion instructions
    +
    29.12.1.3. x86 SSE conversion instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.5.1.6 "SSE Conversion Instructions"

    -

    28.12.2. x86 SSE2 instructions

    +

    29.12.2. x86 SSE2 instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.6 "SSE2 INSTRUCTIONS"

    @@ -36558,7 +36590,7 @@ pop %rbp
    -
    28.12.2.1. x86 PADDQ instruction
    +
    29.12.2.1. x86 PADDQ instruction

    userland/arch/x86_64/paddq.S: PADDQ, PADDL, PADDW, PADDB

    @@ -36568,7 +36600,7 @@ pop %rbp
    -

    28.12.3. x86 fused multiply add (FMA)

    +

    29.12.3. x86 fused multiply add (FMA)

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.15 "FUSED-MULTIPLY-ADD (FMA)"

    @@ -36588,12 +36620,12 @@ pop %rbp
    -

    28.13. x86 system instructions

    +

    29.13. x86 system instructions

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.20 "SYSTEM INSTRUCTIONS"

    -

    28.13.1. x86 RDTSC instruction

    +

    29.13.1. x86 RDTSC instruction

    Sources:

    @@ -36667,7 +36699,7 @@ pop %rbp
    -
    28.13.1.1. x86 RDTSCP instruction
    +
    29.13.1.1. x86 RDTSCP instruction

    RDTSCP is like RDTSP, but it also stores the CPU ID into ECX: this is convenient because the value of RDTSC depends on which core we are currently on, so you often also want the core ID when you want the RDTSC.

    @@ -36722,9 +36754,9 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1
    -

    28.14. x86 thread synchronization primitives

    +

    29.14. x86 thread synchronization primitives

    -

    28.14.1. x86 LOCK prefix

    +

    29.14.1. x86 LOCK prefix

    Inline assembly example at: userland/cpp/atomic/x86_64_lock_inc.cpp, see also: atomic.cpp.

    @@ -36750,11 +36782,11 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1
    -

    28.15. x86 assembly bibliography

    +

    29.15. x86 assembly bibliography

    -

    28.15.1. x86 official bibliography

    +

    29.15.1. x86 official bibliography

    -
    28.15.1.1. Intel 64 and IA-32 Architectures Software Developer’s Manuals
    +
    29.15.1.1. Intel 64 and IA-32 Architectures Software Developer’s Manuals

    We are using the May 2019 version unless otherwise noted.

    @@ -36771,25 +36803,25 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1

    Also I can’t find older versions on the website easily, so I just web archive everything.

    -

    29. ARM userland assembly

    +

    30. ARM userland assembly

    -

    Arch general getting started at: Section 27, “Userland assembly”.

    +

    Arch general getting started at: Section 28, “Userland assembly”.

    Instructions here loosely grouped based on that of the ARMv7 architecture reference manual Chapter A4 "The Instruction Sets".

    @@ -36812,7 +36844,7 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1

    We cover here mostly ARMv7, and then treat aarch64 differentially, since much of the ARMv7 userland is the same in aarch32.

    -

    29.1. Introduction to the ARM architecture

    +

    30.1. Introduction to the ARM architecture

    The ARM architecture is has been used on the vast majority of mobile phones in the 2010’s, and on a large fraction of micro controllers.

    @@ -36829,7 +36861,7 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1

    ARM Holdings was bought by the Japanese giant SoftBank in 2016.

    -

    29.1.1. ARMv8 vs ARMv7 vs AArch64 vs AArch32

    +

    30.1.1. ARMv8 vs ARMv7 vs AArch64 vs AArch32

    ARMv7 is the older architecture described at: ARMv7 architecture reference manual.

    @@ -36885,7 +36917,7 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1

    They are described at: ARMv8 architecture reference manual A1.7 "ARMv8 architecture extensions".

    -
    29.1.1.1. AArch32
    +
    30.1.1.1. AArch32

    32-bit mode of operation of ARMv8.

    @@ -36917,7 +36949,7 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1
    -
    29.1.1.2. AArch32 vs AArch64
    +
    30.1.1.2. AArch32 vs AArch64

    A great summary of differences can be found at: https://en.wikipedia.org/wiki/ARM_architecture#AArch64_features

    @@ -36927,17 +36959,17 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1
    -

    29.1.2. Free ARM implementations

    +

    30.1.2. Free ARM implementations

    The ARM instruction set is itself protected by patents / copyright / whatever, and you have to pay ARM Holdings a licence to implement it, even if you are creating your own custom Verilog code.

    @@ -36976,7 +37008,7 @@ Bibliography:
    -

    29.1.3. ARM instruction encodings

    +

    30.1.3. ARM instruction encodings

    Understanding the basics of instruction encodings is fundamental to help you to remember what instructions do and why some things are possible or not, notably the ARM LDR pseudo-instruction and the ADRP instruction.

    @@ -37088,7 +37120,7 @@ Bibliography:
    -
    29.1.3.1. ARM Thumb encoding
    +
    30.1.3.1. ARM Thumb encoding

    Thumb examples are available at:

    @@ -37147,7 +37179,7 @@ Bibliography:
    -
    29.1.3.2. ARM big endian mode
    +
    30.1.3.2. ARM big endian mode

    ARM can switch between big and little endian mode on the fly!

    @@ -37243,9 +37275,9 @@ Bibliography:
    -

    29.2. ARM branch instructions

    +

    30.2. ARM branch instructions

    -

    29.2.1. ARM B instruction

    +

    30.2.1. ARM B instruction

    Unconditional branch.

    @@ -37263,7 +37295,7 @@ Bibliography:
    -

    29.2.2. ARM BEQ instruction

    +

    30.2.2. ARM BEQ instruction

    Branch if equal based on the status registers.

    @@ -37307,7 +37339,7 @@ Bibliography:
    -

    29.2.3. ARM BL instruction

    +

    30.2.3. ARM BL instruction

    Branch with link, i.e. branch and store the return address on the RL register.

    @@ -37321,13 +37353,13 @@ Bibliography:
    -
    29.2.3.1. ARM BX instruction
    +
    30.2.3.1. ARM BX instruction
    -
    29.2.3.2. ARMv8 aarch64 ret instruction
    +
    30.2.3.2. ARMv8 aarch64 ret instruction
    @@ -37360,7 +37392,7 @@ Bibliography:
    -

    29.2.4. ARM CBZ instruction

    +

    30.2.4. ARM CBZ instruction

    Compare and branch if zero.

    @@ -37375,7 +37407,7 @@ Bibliography:
    -

    29.2.5. ARM conditional execution

    +

    30.2.5. ARM conditional execution

    Weirdly, ARM B instruction and family are not the only instructions that can execute conditionally on the flags: the same also applies to most instructions, e.g. ADD.

    @@ -37391,7 +37423,7 @@ Bibliography:
    -

    29.3. ARM load and store instructions

    +

    30.3. ARM load and store instructions

    In ARM, there are only two instruction families that do memory access:

    @@ -37415,9 +37447,9 @@ Bibliography: Load/store architecture.

    -

    29.3.1. ARM LDR instruction

    +

    30.3.1. ARM LDR instruction

    -
    29.3.1.1. ARM LDR pseudo-instruction
    +
    30.3.1.1. ARM LDR pseudo-instruction

    LDR can be either a regular instruction that loads stuff into memory, or also a pseudo-instruction (assembler magic): http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0041c/Babbfdih.html

    @@ -37451,7 +37483,7 @@ Bibliography:
    -
    29.3.1.2. ARM addressing modes
    +
    30.3.1.2. ARM addressing modes
    @@ -37522,7 +37554,7 @@ Bibliography: ARMv8 architecture reference manual: C1.3.3 "Load/Store addressing modes"

    -
    29.3.1.2.1. ARM loop over array
    +
    30.3.1.2.1. ARM loop over array

    As an application of the post-indexed addressing mode, let’s increment an array.

    @@ -37532,7 +37564,7 @@ Bibliography:
    -
    29.3.1.3. ARM LDRH and LDRB instructions
    +
    30.3.1.3. ARM LDRH and LDRB instructions

    There are LDR variants that load less than full 4 bytes:

    @@ -37559,7 +37591,7 @@ Bibliography:
    -

    29.3.2. ARM STR instruction

    +

    30.3.2. ARM STR instruction

    Store from memory into registers.

    @@ -37570,7 +37602,7 @@ Bibliography: ARM LDR instruction also applies here so we won’t go into much detail.

    -
    29.3.2.1. ARMv8 aarch64 STR instruction
    +
    30.3.2.1. ARMv8 aarch64 STR instruction

    PC-relative STR is not possible in aarch64.

    @@ -37588,7 +37620,7 @@ Bibliography:
    -
    29.3.2.2. ARMv8 aarch64 LDP and STP instructions
    +
    30.3.2.2. ARMv8 aarch64 LDP and STP instructions

    Push a pair of registers to the stack.

    @@ -37596,7 +37628,7 @@ Bibliography: lkmc/aarch64.h since it is the main way to restore register state.

    -
    29.3.2.2.1. ARMV8 aarch64 stack alignment
    +
    30.3.2.2.1. ARMV8 aarch64 stack alignment

    In ARMv8, the stack can be enforced to 16-byte alignment.

    @@ -37643,7 +37675,7 @@ Bibliography:
    -

    29.3.3. ARM LDMIA instruction

    +

    30.3.3. ARM LDMIA instruction

    Pop values form stack into the register and optionally update the address register.

    @@ -37693,7 +37725,7 @@ ldmia sp!, reglist
    -

    29.4. ARM data processing instructions

    +

    30.4. ARM data processing instructions

    Arithmetic:

    @@ -37717,7 +37749,7 @@ ldmia sp!, reglist
    -

    29.4.1. ARM CSET instruction

    +

    30.4.1. ARM CSET instruction

    @@ -37729,7 +37761,7 @@ ldmia sp!, reglist
    -

    29.4.2. ARM bitwise instructions

    +

    30.4.2. ARM bitwise instructions

    •
    @@ -37747,7 +37779,7 @@ ldmia sp!, reglist
    -
    29.4.2.1. ARM BIC instruction
    +
    30.4.2.1. ARM BIC instruction

    Bitwise Bit Clear: clear some bits.

    @@ -37761,7 +37793,7 @@ ldmia sp!, reglist
    -
    29.4.2.2. ARM UBFM instruction
    +
    30.4.2.2. ARM UBFM instruction

    Unsigned Bitfield Move.

    @@ -37779,7 +37811,7 @@ ldmia sp!, reglist

    TODO: explain full behaviour. Very complicated. Has several simpler to understand aliases.

    -
    29.4.2.2.1. ARM UBFX instruction
    +
    30.4.2.2.1. ARM UBFX instruction

    Alias for:

    @@ -37813,12 +37845,12 @@ ldmia sp!, reglist
    -
    29.4.2.3. ARM BFM instruction
    +
    30.4.2.3. ARM BFM instruction

    TODO: explain. Similar to UBFM but leave untouched bits unmodified.

    -
    29.4.2.3.1. ARM BFI instruction
    +
    30.4.2.3.1. ARM BFI instruction

    Examples:

    @@ -37849,12 +37881,12 @@ ldmia sp!, reglist
    -

    29.4.3. ARM MOV instruction

    +

    30.4.3. ARM MOV instruction

    Move an immediate to a register, or a register to another register.

    -

    Cannot load from or to memory, since only the LDR and STR instruction families can do that in ARM as mentioned at: Section 29.3, “ARM load and store instructions”.

    +

    Cannot load from or to memory, since only the LDR and STR instruction families can do that in ARM as mentioned at: Section 30.3, “ARM load and store instructions”.

    Example: userland/arch/arm/mov.S

    @@ -37915,7 +37947,7 @@ ldmia sp!, reglist

    Assemblers however support magic memory allocations which may hide what is truly going on: https://stackoverflow.com/questions/14046686/why-use-ldr-over-mov-or-vice-versa-in-arm-assembly Always ask your friendly disassembly for a good confirmation.

    -
    29.4.3.1. ARM movw and movt instructions
    +
    30.4.3.1. ARM movw and movt instructions

    Set the higher or lower 16 bits of a register to an immediate in one go.

    @@ -37927,7 +37959,7 @@ ldmia sp!, reglist
    -
    29.4.3.2. ARMv8 aarch64 movk instruction
    +
    30.4.3.2. ARMv8 aarch64 movk instruction

    Fill a 64 bit register with 4 16-bit instructions one at a time.

    @@ -37942,7 +37974,7 @@ ldmia sp!, reglist
    -
    29.4.3.3. ARMv8 aarch64 movn instruction
    +
    30.4.3.3. ARMv8 aarch64 movn instruction

    Set 16-bits negated and the rest to 1.

    @@ -37952,9 +37984,9 @@ ldmia sp!, reglist
    -

    29.4.4. ARM data processing instruction suffixes

    +

    30.4.4. ARM data processing instruction suffixes

    -
    29.4.4.1. ARM shift suffixes
    +
    30.4.4.1. ARM shift suffixes

    Most data processing instructions can also optionally shift the second register operand.

    @@ -37982,7 +38014,7 @@ ldmia sp!, reglist
    -
    29.4.4.2. ARM S suffix
    +
    30.4.4.2. ARM S suffix

    Example: userland/arch/arm/s_suffix.S

    @@ -37998,7 +38030,7 @@ ldmia sp!, reglist
    -

    29.4.5. ARM ADR instruction

    +

    30.4.5. ARM ADR instruction

    Similar rationale to the ARM LDR pseudo-instruction, allowing to easily store a PC-relative reachable address into a register in one go, to overcome the 4-byte fixed instruction size.

    @@ -38022,19 +38054,19 @@ ldmia sp!, reglist

    More details: https://stackoverflow.com/questions/41906688/what-are-the-semantics-of-adrp-and-adrl-instructions-in-arm-assembly/54042899#54042899

    -
    29.4.5.1. ARM ADRL instruction
    +
    30.4.5.1. ARM ADRL instruction
    -

    See: Section 29.4.5, “ARM ADR instruction”.

    +

    See: Section 30.4.5, “ARM ADR instruction”.

    -

    29.5. ARM miscellaneous instructions

    +

    30.5. ARM miscellaneous instructions

    -

    29.5.1. ARM NOP instruction

    +

    30.5.1. ARM NOP instruction

    There are a few different ways to encode NOP, notably MOV a register into itself, and a dedicated miscellaneous instruction.

    @@ -38055,7 +38087,7 @@ ldmia sp!, reglist
    -

    29.5.2. ARM UDF instruction

    +

    30.5.2. ARM UDF instruction

    Guaranteed undefined! Therefore raise illegal instruction signal. Used by GCC __builtin_trap apparently: https://stackoverflow.com/questions/16081618/programmatically-cause-undefined-instruction-exception

    @@ -38074,7 +38106,7 @@ ldmia sp!, reglist
    -

    29.5.3. ARM system register instructions

    +

    30.5.3. ARM system register instructions

    Examples of using them can be found at: dump_regs

    @@ -38181,7 +38213,7 @@ dc isw
    -
    29.5.3.1. ARM system register encodings
    +
    30.5.3.1. ARM system register encodings

    Each aarch64 system register is specified in the encoding of ARM system register instructions by 5 integer numbers:

    @@ -38227,12 +38259,12 @@ LKMC_DUMP_SYSTEM_REGS_PRINTF("ID_ISAR6_EL1 0x%" PRIX32 "\n", id_isar6_el1);
    -

    29.6. ARM SIMD

    +

    30.6. ARM SIMD

    -

    Parent section: Section 27.3, “SIMD assembly”

    +

    Parent section: Section 28.3, “SIMD assembly”

    -

    29.6.1. ARM VFP

    +

    30.6.1. ARM VFP

    The name for the ARMv7 and AArch32 floating point and SIMD instructions / registers.

    @@ -38278,7 +38310,7 @@ LKMC_DUMP_SYSTEM_REGS_PRINTF("ID_ISAR6_EL1 0x%" PRIX32 "\n", id_isar6_el1);
    -
    29.6.1.1. ARM VFP registers
    +
    30.6.1.1. ARM VFP registers

    TODO example

    @@ -38314,20 +38346,20 @@ LKMC_DUMP_SYSTEM_REGS_PRINTF("ID_ISAR6_EL1 0x%" PRIX32 "\n", id_isar6_el1);
    -
    29.6.1.2. ARM VADD instruction
    +
    30.6.1.2. ARM VADD instruction
    -
    29.6.1.3. ARM VCVT instruction
    +
    30.6.1.3. ARM VCVT instruction
    @@ -38356,7 +38388,7 @@ LKMC_DUMP_SYSTEM_REGS_PRINTF("ID_ISAR6_EL1 0x%" PRIX32 "\n", id_isar6_el1);
    -
    29.6.1.3.1. ARM VCVTR instruction
    +
    30.6.1.3.1. ARM VCVTR instruction
    @@ -38374,7 +38406,7 @@ LKMC_DUMP_SYSTEM_REGS_PRINTF("ID_ISAR6_EL1 0x%" PRIX32 "\n", id_isar6_el1);
    -
    29.6.1.3.2. ARMv8 AArch32 VCVTA instruction
    +
    30.6.1.3.2. ARMv8 AArch32 VCVTA instruction
    @@ -38394,7 +38426,7 @@ LKMC_DUMP_SYSTEM_REGS_PRINTF("ID_ISAR6_EL1 0x%" PRIX32 "\n", id_isar6_el1);
    -

    29.6.2. ARMv8 Advanced SIMD and floating-point support

    +

    30.6.2. ARMv8 Advanced SIMD and floating-point support

    The ARMv8 architecture reference manual specifies floating point and SIMD support in the main architecture at A1.5 "Advanced SIMD and floating-point support".

    @@ -38402,13 +38434,13 @@ LKMC_DUMP_SYSTEM_REGS_PRINTF("ID_ISAR6_EL1 0x%" PRIX32 "\n", id_isar6_el1);
    The feature is often refered to simply as "SIMD&FP" throughout the manual.

    -

    The Linux kernel shows /proc/cpuinfo compatibility as neon, which is yet another intermediate name that came up at some point, see: Section 29.6.2.2, “ARM NEON”.

    +

    The Linux kernel shows /proc/cpuinfo compatibility as neon, which is yet another intermediate name that came up at some point, see: Section 30.6.2.2, “ARM NEON”.

    Vs ARM VFP: https://stackoverflow.com/questions/4097034/arm-cortex-a8-whats-the-difference-between-vfp-and-neon

    -
    29.6.2.1. ARMv8 floating point availability
    +
    30.6.2.1. ARMv8 floating point availability

    Support is semi-mandatory. ARMv8 architecture reference manual A1.5 "Advanced SIMD and floating-point support":

    @@ -38445,7 +38477,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    29.6.2.2. ARM NEON
    +
    30.6.2.2. ARM NEON

    Just an informal name for the "Advanced SIMD instructions"? Very confusing.

    @@ -38472,7 +38504,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    29.6.3. ARMv8 AArch64 floating point registers

    +

    30.6.3. ARMv8 AArch64 floating point registers

    TODO example.

    @@ -38527,7 +38559,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    29.6.3.1. ARMv8 aarch64 add vector instruction
    +
    30.6.3.1. ARMv8 aarch64 add vector instruction

    userland/arch/aarch64/add_vector.S

    @@ -38536,21 +38568,21 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    29.6.3.2. ARMv8 aarch64 FADD instruction
    +
    30.6.3.2. ARMv8 aarch64 FADD instruction
    -
    29.6.3.2.1. ARM FADD vs VADD
    +
    30.6.3.2.1. ARM FADD vs VADD
    -

    It is very confusing, but FADDS and FADDD in Aarch32 are pre-UAL for vadd.f32 and vadd.f64 which we use in this tutorial, see: Section 29.6.1.2, “ARM VADD instruction”

    +

    It is very confusing, but FADDS and FADDD in Aarch32 are pre-UAL for vadd.f32 and vadd.f64 which we use in this tutorial, see: Section 30.6.1.2, “ARM VADD instruction”

    The same goes for most ARMv7 mnemonics: f* is old, and v* is the newer better syntax.

    @@ -38562,12 +38594,12 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    Also keep in mind that fused multiply add is FMADD.

    -
    29.6.3.3. ARMv8 aarch64 LD2 instruction
    +
    30.6.3.3. ARMv8 aarch64 LD2 instruction

    Example: userland/arch/aarch64/ld2.S

    @@ -38583,7 +38615,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    29.6.4. ARM SIMD bibliography

    +

    30.6.4. ARM SIMD bibliography

    -

    29.6.5. ARM SVE

    +

    30.6.5. ARM SVE

    Scalable Vector Extension.

    @@ -38661,7 +38693,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    Using SVE normally requires setting the CPACR_EL1.FPEN and ZEN bits, which as as of lkmc 29fd625f3fda79f5e0ee6cac43517ba74340d513 + 1 we also enable in our Baremetal bootloaders, see also: aarch64 baremetal NEON setup.

    -
    29.6.5.1. ARM SVE VADDL instruction
    +
    30.6.5.1. ARM SVE VADDL instruction

    Get the SVE vector length. The following programs do that and print it to stdout:

    @@ -38677,7 +38709,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    29.6.5.2. Change ARM SVE vector length in emulators
    +
    30.6.5.2. Change ARM SVE vector length in emulators

    gem5 covered at: https://stackoverflow.com/questions/57692765/how-to-change-the-gem5-arm-sve-vector-length

    @@ -38714,7 +38746,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    29.6.5.3. SVE bibliography
    +
    30.6.5.3. SVE bibliography
    -
    29.6.5.3.1. SVE spec
    +
    30.6.5.3.1. SVE spec

    ARMv8 architecture reference manual A1.7 "ARMv8 architecture extensions" says:

    @@ -38754,12 +38786,12 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    29.7. ARM thread synchronization primitives

    +

    30.7. ARM thread synchronization primitives

    Parent section: Userland multithreading.

    -

    29.7.1. ARM LDXR and STXR instructions

    +

    30.7.1. ARM LDXR and STXR instructions

    Parent section: atomic.cpp

    @@ -38809,7 +38841,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    29.7.2. ARM Large System Extensions (LSE)

    +

    30.7.2. ARM Large System Extensions (LSE)

    Set of atomic and synchronization primitives added in ARMv8.1 architecture extension.

    @@ -38836,9 +38868,9 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    29.8. ARMv8 architecture extensions

    +

    30.8. ARMv8 architecture extensions

    -

    29.8.1. ARMv8.1 architecture extension

    +

    30.8.1. ARMv8.1 architecture extension

    ARMv8 architecture reference manual db A1.7.3 "The ARMv8.1 architecture extension"

    @@ -38852,7 +38884,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    29.9. ARM PMU

    +

    30.9. ARM PMU

    The PMU (Performance Monitor Unit) is an unit in the ARM CPU that counts performance events of interest. These can be used to benchmark, and sometimes debug, code running on ARM CPUs.

    @@ -38907,7 +38939,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    Bibliography: https://community.arm.com/developer/ip-products/system/b/embedded-blog/posts/using-the-arm-performance-monitor-unit-pmu-linux-driver

    -

    29.9.1. ARM PMCCNTR register

    +

    30.9.1. ARM PMCCNTR register

    TODO We didn’t manage to find a working ARM analogue to x86 RDTSC instruction: kernel_modules/pmccntr.c is oopsing, and even it if weren’t, it likely won’t give the cycle count since boot since it needs to be activate before it starts counting anything:

    @@ -38927,9 +38959,9 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    29.10. ARM assembly bibliography

    +

    30.10. ARM assembly bibliography

    -

    29.10.1. ARM non-official bibliography

    +

    30.10.1. ARM non-official bibliography

    Good getting started tutorials:

    @@ -38951,7 +38983,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    29.10.2. ARM official bibliography

    +

    30.10.2. ARM official bibliography

    The official manuals were stored in http://infocenter.arm.com but as of 2017 they started to slowly move to https://developer.arm.com.

    @@ -38965,7 +38997,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    Bibliography: https://www.quora.com/Where-can-I-find-the-official-documentation-of-ARM-instruction-set-architectures-ISAs

    -
    29.10.2.1. ARMv7 architecture reference manual
    +
    30.10.2.1. ARMv7 architecture reference manual

    https://developer.arm.com/products/architecture/a-profile/docs/ddi0406/latest/arm-architecture-reference-manual-armv7-a-and-armv7-r-edition

    @@ -38977,7 +39009,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    29.10.2.2. ARMv8 architecture reference manual
    +
    30.10.2.2. ARMv8 architecture reference manual

    https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf

    @@ -39033,19 +39065,19 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    29.10.2.3. ARMv8 architecture reference manual db
    +
    30.10.2.3. ARMv8 architecture reference manual db

    https://static.docs.arm.com/ddi0487/db/DDI0487D_b_armv8_arm.pdf

    -
    29.10.2.4. ARMv8 architecture reference manual db
    +
    30.10.2.4. ARMv8 architecture reference manual db

    https://static.docs.arm.com/ddi0487/fa/DDI0487F_a_armv8_arm.pdf

    -
    29.10.2.5. Programmer’s Guide for ARMv8-A
    +
    30.10.2.5. Programmer’s Guide for ARMv8-A

    https://static.docs.arm.com/den0024/a/DEN0024A_v8_architecture_PG.pdf

    @@ -39060,7 +39092,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    29.10.2.6. Arm A64 Instruction Set Architecture: Future Architecture Technologies in the A architecture profile Documentation
    +
    30.10.2.6. Arm A64 Instruction Set Architecture: Future Architecture Technologies in the A architecture profile Documentation

    https://developer.arm.com/docs/ddi0602/b

    @@ -39069,7 +39101,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    29.10.2.7. ARM processor documentation
    +
    30.10.2.7. ARM processor documentation

    ARM also releases documentation specific to each given processor.

    @@ -39093,7 +39125,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    29.10.2.7.1. ARM Cortex-A15 MPCore Processor Technical Reference Manual r4p0
    +
    30.10.2.7.1. ARM Cortex-A15 MPCore Processor Technical Reference Manual r4p0

    http://infocenter.arm.com/help/topic/com.arm.doc.ddi0438i/DDI0438I_cortex_a15_r4p0_trm.pdf

    @@ -39103,13 +39135,13 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -
    29.10.2.8. Arm Cortex‑A77 Technical Reference Manual r1p1
    +
    30.10.2.8. Arm Cortex‑A77 Technical Reference Manual r1p1

    https://static.docs.arm.com/101111/0101/arm_cortex_a77_trm_101111_0101_04_en.pdf

    -
    29.10.2.9. Arm Cortex‑A77 Software Optimization Guide r1p1
    +
    30.10.2.9. Arm Cortex‑A77 Software Optimization Guide r1p1

    https://static.docs.arm.com/swog011050/c/Arm_Cortex-A77_Software_Optimization_Guide.pdf

    @@ -39119,7 +39151,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    30. ELF

    +

    31. ELF

    https://en.wikipedia.org/wiki/Executable_and_Linkable_Format

    @@ -39133,7 +39165,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    31. IEEE 754

    +

    32. IEEE 754

    https://en.wikipedia.org/wiki/IEEE_754

    @@ -39163,15 +39195,15 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    32. Baremetal

    +

    33. Baremetal

    -

    32.1. Baremetal GDB step debug

    +

    33.1. Baremetal GDB step debug

    -

    GDB step debug works on baremetal exactly as it does on the Linux kernel, which is described at: Section 2, “GDB step debug”.

    +

    GDB step debug works on baremetal exactly as it does on the Linux kernel, which is described at: Section 3, “GDB step debug”.

    Except that is is even cooler here since we can easily control and understand every single instruction that is being run!

    @@ -39240,7 +39272,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    -

    32.2. Baremetal bootloaders

    +

    33.2. Baremetal bootloaders

    As can be seen from Baremetal GDB step debug, all examples under baremetal/, with the exception of baremetal/arch/<arch>/no_bootloader, start from our tiny bootloaders:

    @@ -39276,7 +39308,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    the stack pointer

  •
    -

    NEON: Section 32.11.2, “aarch64 baremetal NEON setup”

    +

    NEON: Section 33.11.2, “aarch64 baremetal NEON setup”

  • TODO: we don’t do this currently but maybe we should setup BSS

    @@ -39304,7 +39336,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

  •
    -

    32.3. Baremetal linker script

    +

    33.3. Baremetal linker script

    For things to work in baremetal, we often have to layout memory in specific ways.

    @@ -39333,7 +39365,7 @@ lkmc_heap_top = .;
    -

    32.4. Baremetal command line arguments

    +

    33.4. Baremetal command line arguments

    QEMU and gem5 currently supports baremetal CLI arguments!

    @@ -39382,7 +39414,7 @@ cc

    It is worth noting that e.g. ARM has a Semihosting mechanism for loading CLI arguments through SYS_GET_CMDLINE, but our mechanism works in principle for any ISA.

    -

    32.4.1. gem5 baremetal arm CLI args

    +

    33.4.1. gem5 baremetal arm CLI args

    Currently not supported, so we just hardcode argc 0 on the arm baremetal bootloader.

    @@ -39392,7 +39424,7 @@ cc
    -

    32.5. Semihosting

    +

    33.5. Semihosting

    Semihosting is a publicly documented interface specified by ARM Holdings that allows us to do some magic operations very useful in development, such as writting to the terminal or reading and writing host files.

    @@ -39510,7 +39542,7 @@ svc 0x00123456
    -

    32.5.1. gem5 semihosting

    +

    33.5.1. gem5 semihosting

    @@ -39525,7 +39557,7 @@ svc 0x00123456
    -

    32.6. gem5 baremetal carriage return

    +

    33.6. gem5 baremetal carriage return

    TODO: our example is printing newlines without automatic carriage return \r as in:

    @@ -39548,7 +39580,7 @@ svc 0x00123456
    -

    32.7. Baremetal host packaged toolchain

    +

    33.7. Baremetal host packaged toolchain

    For arm, some baremetal examples compile fine with:

    @@ -39584,13 +39616,13 @@ collect2: error: ld returned 1 exit status
    -

    32.8. Baremetal C++

    +

    33.8. Baremetal C++

    Didn’t get it working, traking at: https://github.com/cirosantilli/linux-kernel-module-cheat/issues/119

    -

    32.9. GDB builtin CPU simulator

    +

    33.9. GDB builtin CPU simulator

    It is incredible, but GDB also has a CPU simulator inside of it as documented at: https://sourceware.org/gdb/onlinedocs/gdb/Target-Commands.html

    @@ -39650,7 +39682,7 @@ starti
    -

    32.9.1. GDB builtin CPU simulator userland

    +

    33.9.1. GDB builtin CPU simulator userland

    Since I had this compiled, I also decided to try it out on userland.

    @@ -39685,7 +39717,7 @@ starti
    -

    32.10. ARM baremetal

    +

    33.10. ARM baremetal

    In this section we will focus on learning ARM architecture concepts that can only be learnt on baremetal setups.

    @@ -39693,7 +39725,7 @@ starti

    Userland information can be found at: https://github.com/cirosantilli/arm-assembly-cheat

    -

    32.10.1. ARM exception levels

    +

    33.10.1. ARM exception levels

    ARM exception levels are analogous to x86 rings.
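On ARMv8 the current exception level can be read from the CurrentEL system register, whose EL field occupies bits [3:2]: a raw value of 0xC therefore means EL3. The sketch below does only that decoding, in plain C so that it runs on any host. The function name is hypothetical, and on an aarch64 target the raw value would come from `mrs x0, CurrentEL`.

```c
#include <stdint.h>

/* Extract the EL field, bits [3:2], from a raw CurrentEL value.
 * On an aarch64 target the raw value would be read with: mrs x0, CurrentEL */
static unsigned current_el_from_raw(uint64_t currentel)
{
    return (unsigned)((currentel >> 2) & 0x3);
}
```

For example, `current_el_from_raw(0xC)` gives 3 and `current_el_from_raw(0x4)` gives 1.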

    @@ -39822,13 +39854,13 @@ CurrentEL.EL 0x3

    According to ARMv7 architecture reference manual, access to that register is controlled by other registers NSACR.{CP11, CP10} and HCPTR, so those must be configured to allow it, but I’m too lazy to investigate now: even just trying to dump those registers in userland/arch/arm/dump_regs.c also leads to exceptions…

    -
    32.10.1.1. ARM change exception level
    +
    33.10.1.1. ARM change exception level

    TODO. Create a minimal runnable example of going into EL0 and jumping to EL1.

    -
    32.10.1.2. ARM SP0 vs SPx
    +
    33.10.1.2. ARM SP0 vs SPx

    See ARMv8 architecture reference manual db D1.6.2 "The stack pointer registers".

    @@ -39847,7 +39879,7 @@ CurrentEL.EL 0x3
    -

    32.10.2. ARM SVC instruction

    +

    33.10.2. ARM SVC instruction

    This is the most basic example of exception handling we have.

    @@ -40196,7 +40228,7 @@ IN: main
    -
    32.10.2.1. ARMv8 exception vector table format
    +
    33.10.2.1. ARMv8 exception vector table format

    The vector table format is described on ARMv8 architecture reference manual Table D1-7 "Vector offsets from vector table base address".
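As we read that table, the 0x800-byte vector table is split into four 0x200-byte groups, selected by where the exception came from (current EL with SP_EL0, current EL with SP_ELx, lower EL in AArch64, lower EL in AArch32), and each group holds four 0x80-byte entries: synchronous, IRQ, FIQ and SError. A minimal sketch of the offset computation, with hypothetical names:

```c
#include <stdint.h>

/* Where the exception was taken from: the four groups of the table. */
enum vector_source {
    CURRENT_EL_SP0 = 0, /* current EL, using SP_EL0 */
    CURRENT_EL_SPX = 1, /* current EL, using SP_ELx */
    LOWER_EL_A64   = 2, /* lower EL running AArch64 */
    LOWER_EL_A32   = 3, /* lower EL running AArch32 */
};

/* Exception type: the four entries inside each group. */
enum vector_type {
    VEC_SYNC   = 0,
    VEC_IRQ    = 1,
    VEC_FIQ    = 2,
    VEC_SERROR = 3,
};

/* Each entry is 0x80 bytes (32 instructions), so each group of 4 entries is 0x200 bytes. */
static uint32_t vector_offset(enum vector_source src, enum vector_type type)
{
    return (uint32_t)src * 0x200u + (uint32_t)type * 0x80u;
}
```

For example, an SVC executed at EL0 in AArch64 and taken to EL1 lands on the lower EL AArch64 synchronous entry, offset 0x400 from VBAR_EL1. Because the whole table spans 0x800 bytes, VBAR_ELn must be 2 KiB aligned.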

    @@ -40336,29 +40368,29 @@ IN: main
    -
    32.10.2.2. ARM ESR register
    +
    33.10.2.2. ARM ESR register

    Exception Syndrome Register.

    -

    See example at: Section 32.10.2, “ARM SVC instruction”

    +

    See example at: Section 33.10.2, “ARM SVC instruction”

    Documentation: ARMv8 architecture reference manual db D12.2.36 "ESR_EL1, Exception Syndrome Register (EL1)".
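The top-level layout of ESR_ELx is EC (Exception Class) in bits [31:26], IL (Instruction Length) in bit 25 and ISS (Instruction Specific Syndrome) in bits [24:0]. For an SVC executed in AArch64 state, EC is 0x15 and the low 16 bits of ISS hold the SVC immediate. A minimal decoding sketch, where the struct and function names are hypothetical and, on a real target, the raw value would come from `mrs x0, esr_el1` inside the exception handler:

```c
#include <stdint.h>

/* Top-level fields of ESR_ELx. */
struct esr_fields {
    unsigned ec;  /* Exception Class, bits [31:26] */
    unsigned il;  /* Instruction Length, bit [25]: 1 means a 32-bit instruction */
    uint32_t iss; /* Instruction Specific Syndrome, bits [24:0] */
};

#define ESR_EC_SVC64 0x15u /* SVC instruction executed in AArch64 state */

static struct esr_fields esr_decode(uint64_t esr)
{
    struct esr_fields f;
    f.ec  = (unsigned)((esr >> 26) & 0x3f);
    f.il  = (unsigned)((esr >> 25) & 0x1);
    f.iss = (uint32_t)(esr & 0x1ffffff);
    return f;
}
```

For instance, a raw value of 0x56000042 decodes to EC 0x15, IL 1 and an SVC immediate of 0x42, i.e. `svc #0x42`.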

    -
    32.10.2.3. ARM ELR register
    +
    33.10.2.3. ARM ELR register

    Exception Link Register.

    -

    See the example at: Section 32.10.2, “ARM SVC instruction”

    +

    See the example at: Section 33.10.2, “ARM SVC instruction”

    -

    32.10.3. ARM baremetal multicore

    +

    33.10.3. ARM baremetal multicore

    Examples:

    @@ -40431,13 +40463,13 @@ IN: main

    since gem5 is able to detect when nothing will ever happen, and exits.

    -

    When GDB step debugging, switch between cores with the usual thread commands, see also: Section 2.9, “GDB step debug multicore userland”.

    +

    When GDB step debugging, switch between cores with the usual thread commands, see also: Section 3.9, “GDB step debug multicore userland”.

    Bibliography: https://stackoverflow.com/questions/980999/what-does-multicore-assembly-language-look-like/33651438#33651438

    -
    32.10.3.1. ARM WFE and SEV instructions
    +
    33.10.3.1. ARM WFE and SEV instructions

    The WFE and SEV instructions are just hints: a compliant implementation can treat them as NOPs.

    @@ -40590,7 +40622,7 @@ IN: main

    For how userland spinlocks and mutexes are implemented see Userland mutex implementation.

    -
    32.10.3.1.1. ARM WFE global monitor events
    +
    33.10.3.1.1. ARM WFE global monitor events

    Examples:

    @@ -40630,7 +40662,7 @@ IN: main
    -
    32.10.3.1.2. WFE from userland
    +
    33.10.3.1.2. WFE from userland

    WFE and SEV are usable from userland, and are part of an efficient spinlock implementation. But maybe userland should arguably never use spinlocks at all, and just stick to mutexes, which are built on the futex system call and allow for non-busy sleep instead?

    @@ -40737,7 +40769,7 @@ IN: main
    -
    32.10.3.1.3. ARMv8 spinlock pattern
    +
    33.10.3.1.3. ARMv8 spinlock pattern

    http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka16277.html
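The ARM pattern referenced above is, as we understand it, built from exclusive load/store instructions plus WFE/SEV. As a portable point of comparison, and not the ARM pattern itself, here is a minimal C11 test-and-set spinlock sketch with hypothetical names. When compiled for aarch64, the compiler typically lowers `atomic_flag_test_and_set_explicit` to an exclusive load/store loop (or to an LSE atomic when available), but it does not insert WFE, so waiters busy-spin instead of sleeping until an event.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Minimal test-and-set spinlock (hypothetical names). */
typedef struct {
    atomic_flag flag;
} tas_lock;

static void tas_lock_acquire(tas_lock *l)
{
    /* Acquire ordering: critical section accesses cannot move before this. */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire)) {
        /* Busy wait. The ARM pattern would execute WFE here to sleep until an event. */
    }
}

static void tas_lock_release(tas_lock *l)
{
    /* Release ordering: critical section writes become visible before the unlock. */
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Demo: several threads increment a shared counter under the lock. */
static tas_lock demo_lock = { ATOMIC_FLAG_INIT };
static long demo_counter;

static void *demo_worker(void *arg)
{
    long iters = *(const long *)arg;
    for (long i = 0; i < iters; i++) {
        tas_lock_acquire(&demo_lock);
        demo_counter++;
        tas_lock_release(&demo_lock);
    }
    return NULL;
}

/* Runs nthreads workers (capped at 16) and returns the final counter value. */
static long demo_run(int nthreads, long iters)
{
    pthread_t threads[16];
    if (nthreads > 16)
        nthreads = 16;
    demo_counter = 0;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&threads[i], NULL, demo_worker, &iters);
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return demo_counter;
}
```

With a correct lock, `demo_run(4, 100000)` returns exactly 400000; without it, lost updates would typically make it smaller.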

    @@ -40756,7 +40788,7 @@ IN: main
    -
    32.10.3.1.4. gem5 ARM WFE
    +
    33.10.3.1.4. gem5 ARM WFE

    gem5 390a74f59934b85d91489f8a563450d8321b602d does not sleep on the first WFE on either syscall emulation or full system, because the code does:

    @@ -40798,14 +40830,14 @@ IN: main
    -
    32.10.3.1.5. ARM YIELD instruction
    +
    33.10.3.1.5. ARM YIELD instruction

    https://stackoverflow.com/questions/59311066/how-does-the-arm-yield-instruction-inform-other-threads-that-they-could-start-a

    -
    32.10.3.2. ARM LDAXR and STLXR instructions
    +
    33.10.3.2. ARM LDAXR and STLXR instructions

    Can be used to implement atomic variables, see also:

    @@ -40824,7 +40856,7 @@ IN: main
    -
    32.10.3.3. ARM PSCI
    +
    33.10.3.3. ARM PSCI

    In QEMU, CPU 1 starts in a halted state. This can be observed from GDB, where:

    @@ -40874,14 +40906,14 @@ IN: main
    -
    32.10.3.4. ARM DMB instruction
    +
    33.10.3.4. ARM DMB instruction

    TODO: create and study a minimal example in gem5 where the DMB instruction leads to fewer cycles: https://stackoverflow.com/questions/15491751/real-life-use-cases-of-barriers-dsb-dmb-isb-in-arm

    -

    32.10.4. ARM timer

    +

    33.10.4. ARM timer

    The ARM timer is the simplest way to generate hardware interrupts periodically, and therefore serves as the simplest example of ARM GIC usage.
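When programming the virtual timer, the period is expressed in counter ticks: the counter frequency is read from CNTFRQ_EL0, and writing a tick count to CNTV_TVAL_EL0 makes the hardware set CNTV_CVAL_EL0 to CNTVCT_EL0 plus that value, with the timer condition met once the counter reaches CVAL. A minimal sketch of that arithmetic; the function names are hypothetical, and the 62.5 MHz frequency used in the example below is only an assumption, so read CNTFRQ_EL0 on your own setup.

```c
#include <stdint.h>

/* Ticks to program into CNTV_TVAL_EL0 for an interrupt period_us microseconds
 * from now, given the counter frequency read from CNTFRQ_EL0. */
static uint64_t timer_ticks_for_us(uint64_t cntfrq_hz, uint64_t period_us)
{
    return cntfrq_hz * period_us / 1000000u;
}

/* Writing TVAL makes the hardware set CVAL = CNTVCT + TVAL;
 * the timer condition is met once CNTVCT >= CVAL. */
static uint64_t cval_after_tval_write(uint64_t cntvct, uint64_t tval)
{
    return cntvct + tval;
}
```

With an assumed CNTFRQ_EL0 of 62500000, a 1 ms period corresponds to 62500 ticks, so a periodic handler would rewrite CNTV_TVAL_EL0 with 62500 each time it runs.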

    @@ -41034,7 +41066,7 @@ cntvct_el0 0x3CF516F
    -

    32.10.5. ARM GIC

    +

    33.10.5. ARM GIC

    Generic Interrupt Controller.

    @@ -41076,7 +41108,7 @@ cntvct_el0 0x3CF516F
    -

    32.10.6. ARM paging

    +

    33.10.6. ARM paging

    TODO create a minimal working aarch64 example analogous to the x86 one at: https://github.com/cirosantilli/x86-bare-metal-examples/blob/6dc9a73830fc05358d8d66128f740ef9906f7677/paging.S

    @@ -41109,9 +41141,9 @@ cntvct_el0 0x3CF516F
    -

    32.10.7. ARM baremetal bibliography

    +

    33.10.7. ARM baremetal bibliography

    -

    First, also consider the userland bibliography: Section 29.10, “ARM assembly bibliography”.

    +

    First, also consider the userland bibliography: Section 30.10, “ARM assembly bibliography”.

    The most useful ARM baremetal example sets we’ve seen so far are:

    @@ -41136,7 +41168,7 @@ cntvct_el0 0x3CF516F
    -
    32.10.7.1. NienfengYao/armv8-bare-metal
    +
    33.10.7.1. NienfengYao/armv8-bare-metal
    @@ -41195,7 +41227,7 @@ cntvct_el0 0x3CF516F
    -
    32.10.7.2. tukl-msd/gem5.bare-metal
    +
    33.10.7.2. tukl-msd/gem5.bare-metal

    https://github.com/tukl-msd/gem5.bare-metal

    @@ -41237,7 +41269,7 @@ make CROSS_COMPILE_DIR=/usr/bin
    -

    32.11. How we got some baremetal stuff to work

    +

    33.11. How we got some baremetal stuff to work

    It is nice when things just work.

    @@ -41245,7 +41277,7 @@ make CROSS_COMPILE_DIR=/usr/bin

    But you can also learn a thing or two from how I actually made them work in the first place.

    -

    32.11.1. Find the UART address

    +

    33.11.1. Find the UART address

    Enter the QEMU console:

    @@ -41281,7 +41313,7 @@ make CROSS_COMPILE_DIR=/usr/bin
    -

    32.11.2. aarch64 baremetal NEON setup

    +

    33.11.2. aarch64 baremetal NEON setup

    Inside baremetal/lib/aarch64.S there is a chunk of code that enables floating point operations:

    @@ -41405,7 +41437,7 @@ ISB
    -

    32.12. Baremetal tests

    +

    33.12. Baremetal tests

    Baremetal tests work exactly like User mode tests, except that you have to add the --mode baremetal option, for example:

    @@ -41418,13 +41450,13 @@ ISB

    In baremetal, we detect if tests failed by parsing logs for the Magic failure string.

    -

    See: Section 37.16, “Test this repo” for more useful testing tips.

    +

    See: Section 38.16, “Test this repo” for more useful testing tips.

    -

    33. Android

    +

    34. Android

    Remember: Android AOSP is a huge undocumented piece of bloatware. Its integration into this repo will likely never be super good. See also: https://cirosantilli.com#android

    @@ -41472,7 +41504,7 @@ ISB

    Tested on: 8.1.0_r60.

    -

    33.1.1. Android images read-only

    +

    34.1.1. Android images read-only

    From mount, we can see that some of the mounted images are ro.

    @@ -41629,7 +41661,7 @@ date >/system/a
    -

    33.1.2. Android /data partition

    +

    34.1.2. Android /data partition

    When I install an app like F-Droid, it goes under /data according to:

    @@ -41690,7 +41722,7 @@ date >/system/a
    -

    33.2. Install Android apps

    +

    34.2. Install Android apps

    I don’t know how to download files from the web on vanilla Android: the default browser does not download anything, and there is no wget:

    @@ -41740,9 +41772,9 @@ date >/system/a
    -

    33.3. Android init

    +

    34.3. Android init

    -

    For Linux in general, see: Section 6, “init”.

    +

    For Linux in general, see: Section 7, “init”.

    The /init executable interprets the /init.rc files, which are written in a custom Android init system language: https://android.googlesource.com/platform/system/core/+/ee0e63f71d90537bb0570e77aa8a699cc222cfaf/init/README.md

    @@ -41789,7 +41821,7 @@ import /init.${ro.zygote}.rc
    -

    34. Benchmark this repo

    +

    35. Benchmark this repo

    TODO: didn’t fully port during refactor after 3b0a343647bed577586989fb702b760bd280844a. Reimplementing should not be hard.

    @@ -41818,7 +41850,7 @@ cd -
    -

    34.1. Continuous integration

    +

    35.1. Continuous integration

    We have explored a few Continuous integration solutions.

    @@ -41826,13 +41858,13 @@ cd -

    We haven’t setup any of them yet.

    -

    34.1.1. Travis

    +

    35.1.1. Travis

    We tried to automate it on Travis with .travis.yml but it hits the current 50 minute job timeout: https://travis-ci.org/cirosantilli/linux-kernel-module-cheat/builds/296454523 and I bet it would also max out the disk if it went on.

    -

    34.1.2. CircleCI

    +

    35.1.2. CircleCI

    This setup successfully built gem5 on every commit: .circleci/config.yml

    @@ -41861,9 +41893,9 @@ cd -
    -

    34.2. Benchmark this repo benchmarks

    +

    35.2. Benchmark this repo benchmarks

    -

    34.2.1. Benchmark Linux kernel boot

    +

    35.2.1. Benchmark Linux kernel boot

    Run all kernel boot benchmarks for one arch:

    @@ -41993,7 +42025,7 @@ instructions 124346081
    -
    34.2.1.1. gem5 arm HPI boot takes much longer than aarch64
    +
    35.2.1.1. gem5 arm HPI boot takes much longer than aarch64

    TODO: at 62f6870e4e0b384c4bd2d514116247e81b241251, the arm HPI boot takes 33 minutes to finish:

    @@ -42019,7 +42051,7 @@ instructions 124346081
    -
    34.2.1.2. gem5 x86_64 DerivO3CPU boot panics
    +
    35.2.1.2. gem5 x86_64 DerivO3CPU boot panics

    https://github.com/cirosantilli2/gem5-issues/issues/2

    @@ -42031,7 +42063,7 @@ instructions 124346081
    -

    34.2.2. Benchmark emulators on userland executables

    +

    35.2.2. Benchmark emulators on userland executables

    Let’s see how fast our simulators are running some well known or easy to understand userland benchmarks!

    @@ -42326,7 +42358,7 @@ instructions 124346081

    so ~ 110 million instructions / 100 seconds makes ~ 1 MIPS (million instructions per second).

    -

    This experiment also suggests that each loop is about 11 instructions long (110M instructions / 10M loops), which we confirm at Section 35.2, “C busy loop”, bingo!

    +

    This experiment also suggests that each loop is about 11 instructions long (110M instructions / 10M loops), which we confirm at Section 36.2, “C busy loop”, bingo!

    Then for QEMU, we experimentally turn the number of loops up to 10^10 loops (100000 100000), which corresponds to an expected 11 * 10^10 instructions, and the runtime is 00:01:08, so we have 1.1 * 10^11 instructions / 68 seconds ~ 1.6 * 10^9 instructions per second, i.e. roughly 1600 MIPS!
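The same arithmetic as a small C sketch, with hypothetical function names and the measurements quoted above as inputs. For the QEMU run it gives 1.1 * 10^11 / 68 ≈ 1.6 * 10^9 instructions per second, i.e. about 1600 MIPS.

```c
/* Average instructions executed per loop iteration. */
static double instructions_per_loop(double instructions, double loops)
{
    return instructions / loops;
}

/* Millions of instructions per second. */
static double mips(double instructions, double seconds)
{
    return instructions / seconds / 1e6;
}
```

For example, `instructions_per_loop(110e6, 10e6)` is 11, `mips(110e6, 100.0)` is about 1.1 for gem5, and `mips(1.1e11, 68.0)` is about 1618 for QEMU.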

    @@ -42335,12 +42367,12 @@ instructions 124346081

    We can then repeat the experiment for other gem5 CPUs to see how they compare.

    -
    34.2.2.1. User mode vs full system benchmark
    +
    35.2.2.1. User mode vs full system benchmark

    Let’s see if user mode runs considerably faster than full system or not, ignoring the kernel boot.

    -

    First we build dhrystone manually and statically, since dynamic linking is broken in gem5 as explained at: Section 10.7, “gem5 syscall emulation mode”.

    +

    First we build dhrystone manually and statically, since dynamic linking is broken in gem5 as explained at: Section 11.7, “gem5 syscall emulation mode”.

    gem5 user mode:

    @@ -42419,7 +42451,7 @@ time \
    -

    34.2.3. Benchmark builds

    +

    35.2.3. Benchmark builds

    The build times are calculated after doing ./configure and make source, which downloads the sources, and basically benchmarks the Internet.

    @@ -42444,7 +42476,7 @@ cat ../linux-kernel-module-cheat-regression/*/build-time.log
    -
    34.2.3.1. Find which Buildroot packages are making the build slow and big
    +
    35.2.3.1. Find which Buildroot packages are making the build slow and big
    ./build-buildroot -- graph-build graph-size graph-depends
    @@ -42455,14 +42487,14 @@ xdg-open graph-size.pdf
    -
    34.2.3.1.1. Buildroot use prebuilt host toolchain
    +
    35.2.3.1.1. Buildroot use prebuilt host toolchain

    The biggest build time hog is always GCC, and it does not look like we can use a precompiled one: https://stackoverflow.com/questions/10833672/buildroot-environment-with-host-toolchain

    -
    34.2.3.2. Benchmark Buildroot build baseline
    +
    35.2.3.2. Benchmark Buildroot build baseline

    This is the minimal build we could expect to get away with.

    @@ -42530,7 +42562,7 @@ xdg-open graph-size.pdf
    -
    34.2.3.3. Benchmark gem5 build
    +
    35.2.3.3. Benchmark gem5 build

    How long it takes to build gem5 itself.

    @@ -42562,7 +42594,7 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt

    A profiling of the build has been done at: https://gem5.atlassian.net/browse/GEM5-277 Analysis there showed that at d7d9bc240615625141cd6feddbadd392457e49eb (2018-06-17) the build is also about 50% pybind11, with no other obvious time sinks.

    -
    34.2.3.3.1. pybind11 accounts for 50% of gem5 build time
    +
    35.2.3.3.1. pybind11 accounts for 50% of gem5 build time

    https://gem5.atlassian.net/browse/GEM5-366

    @@ -42574,7 +42606,7 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -
    34.2.3.3.2. Benchmark gem5 single file change rebuild time
    +
    35.2.3.3.2. Benchmark gem5 single file change rebuild time

    This is the critical development parameter, and is dominated by the link time of huge binaries.

    @@ -42710,9 +42742,9 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -

    34.3. Benchmark machines

    +

    35.3. Benchmark machines

    -

    34.3.1. 2017 Lenovo ThinkPad P51

    +

    35.3.1. 2017 Lenovo ThinkPad P51

    Serial number: TYPE 20HH-CTO1WW S/N PF-0V5V5N 17/11

    @@ -42818,11 +42850,11 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -
    34.3.1.1. P51 benchmarks
    +
    35.3.1.1. P51 benchmarks
    -
    34.3.1.2. P51 maintenance history
    +
    35.3.1.2. P51 maintenance history

    Bought: 2017 for approximately 2400 pounds.

    @@ -42880,7 +42912,7 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -
    34.3.1.3. Intel Core i7-7820HQ CPU
    +
    35.3.1.3. Intel Core i7-7820HQ CPU

    https://ark.intel.com/products/97496/Intel-Core-i7-7820HQ-Processor-8M-Cache-up-to-3-90-GHz- (archive).

    @@ -42962,7 +42994,7 @@ LEVEL4_CACHE_LINESIZE 0
    -
    34.3.1.4. Samsung M471A2K43BB1-CRC 16GB DRAM
    +
    35.3.1.4. Samsung M471A2K43BB1-CRC 16GB DRAM

    Nominal speed: 2400 Mbps

    @@ -42977,7 +43009,7 @@ LEVEL4_CACHE_LINESIZE 0
    -
    34.3.1.5. Samsung MZVLB512HAJQ-000L7 512GB SSD
    +
    35.3.1.5. Samsung MZVLB512HAJQ-000L7 512GB SSD

    PCIe TLC OPAL2.

    @@ -43002,7 +43034,7 @@ LEVEL4_CACHE_LINESIZE 0
    -
    34.3.1.6. Seagate ST1000LM035-1RK1 1TB hard disk
    +
    35.3.1.6. Seagate ST1000LM035-1RK1 1TB hard disk

    1TB.

    @@ -43026,15 +43058,15 @@ LEVEL4_CACHE_LINESIZE 0
    -
    34.3.1.7. NVIDIA Quadro M1200 4GB GDDR5 GPU
    +
    35.3.1.7. NVIDIA Quadro M1200 4GB GDDR5 GPU
    -

    34.4. Benchmark Internets

    +

    35.4. Benchmark Internets

    -

    34.4.1. 38Mbps internet

    +

    35.4.1. 38Mbps internet

    2c12b21b304178a81c9912817b782ead0286d282:

    @@ -43054,7 +43086,7 @@ LEVEL4_CACHE_LINESIZE 0
    -

    34.5. Benchmark this repo bibliography

    +

    35.5. Benchmark this repo bibliography

    gem5:

    @@ -43082,13 +43114,13 @@ LEVEL4_CACHE_LINESIZE 0
    -

    35. Compilers

    +

    36. Compilers

    Argh, compilers are boring, let’s learn a bit about them.

    -

    35.2. C busy loop

    +

    36.2. C busy loop

    @@ -43184,10 +43216,10 @@ LEVEL4_CACHE_LINESIZE 0
    -

    36. Computer architecture

    +

    37. Computer architecture

    -

    36.1. Instruction pipelining

    +

    37.1. Instruction pipelining

    In gem5, can be seen on:

    @@ -43202,7 +43234,7 @@ LEVEL4_CACHE_LINESIZE 0
    -

    36.2. Superscalar processor

    +

    37.2. Superscalar processor

    @@ -43239,7 +43271,7 @@ LEVEL4_CACHE_LINESIZE 0
    -

    36.2.1. Execution unit

    +

    37.2.1. Execution unit

    @@ -43252,7 +43284,7 @@ LEVEL4_CACHE_LINESIZE 0
    -

    36.3. Out-of-order execution

    +

    37.3. Out-of-order execution

    https://en.wikipedia.org/wiki/Out-of-order_execution

    @@ -43269,7 +43301,7 @@ LEVEL4_CACHE_LINESIZE 0

    As mentioned at: https://stackoverflow.com/questions/10074831/what-is-general-difference-between-superscalar-and-ooo-execution it is in theory possible for an out-of-order CPU to not be a Superscalar processor, but the combination is so natural (since you can look ahead, you might as well run it!) that such non-superscalar out-of-order designs are not common.

    -

    36.3.1. Speculative execution

    +

    37.3.1. Speculative execution

    https://en.wikipedia.org/wiki/Speculative_execution

    @@ -43287,7 +43319,7 @@ LEVEL4_CACHE_LINESIZE 0
    -
    36.3.1.1. Branch predictor
    +
    37.3.1.1. Branch predictor

    https://en.wikipedia.org/wiki/Branch_predictor

    @@ -43300,20 +43332,20 @@ LEVEL4_CACHE_LINESIZE 0
    -

    36.3.2. Re-order buffer

    +

    37.3.2. Re-order buffer

    https://en.wikipedia.org/wiki/Re-order_buffer

    -

    36.3.3. Register renaming

    +

    37.3.3. Register renaming

    https://en.wikipedia.org/wiki/Register_renaming

    -

    36.4. Instruction level parallelism

    +

    37.4. Instruction level parallelism

    https://en.wikipedia.org/wiki/Instruction-level_parallelism

    @@ -43332,7 +43364,7 @@ LEVEL4_CACHE_LINESIZE 0
    -

    36.5. Hardware threads

    +

    37.5. Hardware threads

    Intel name: "Hyperthreading"

    @@ -43382,7 +43414,7 @@ LEVEL4_CACHE_LINESIZE 0
    -

    36.6. Caches

    +

    37.6. Caches

    https://courses.cs.washington.edu/courses/cse378/09wi/lectures/lec15.pdf contains some of the first pictures you should see.

    @@ -43424,7 +43456,7 @@ LEVEL4_CACHE_LINESIZE 0

    For example, for a 2-way associative cache, we remove one bit from the index, and add it to the tag.
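To make the bit accounting concrete, here is a sketch assuming 32-bit addresses, a 32 KiB cache and 64-byte lines, all numbers chosen purely for illustration, with hypothetical names. Direct mapped gives 6 offset, 9 index and 17 tag bits, while 2-way associative halves the number of sets, giving 8 index and 18 tag bits.

```c
#include <stdint.h>

/* Bit widths of the offset, index and tag fields of an address. */
struct cache_split {
    unsigned offset_bits;
    unsigned index_bits;
    unsigned tag_bits;
};

/* log2 of an exact power of two. */
static unsigned log2_pow2(uint64_t x)
{
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* All sizes must be powers of two; ways == 1 means direct mapped. */
static struct cache_split split_address(unsigned addr_bits, uint64_t cache_bytes,
                                        uint64_t line_bytes, uint64_t ways)
{
    struct cache_split s;
    uint64_t sets = cache_bytes / (line_bytes * ways);
    s.offset_bits = log2_pow2(line_bytes); /* selects the byte within a line */
    s.index_bits = log2_pow2(sets);        /* selects the set */
    s.tag_bits = addr_bits - s.index_bits - s.offset_bits; /* stored and compared on lookup */
    return s;
}
```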

    -

    36.6.1. Cache coherence

    +

    37.6.1. Cache coherence

    https://en.wikipedia.org/wiki/Cache_coherence

    @@ -43466,7 +43498,7 @@ LEVEL4_CACHE_LINESIZE 0

    Even if caches are coherent, this is still not enough to avoid data race conditions, because coherence does not enforce atomicity of read-modify-write sequences. This is for example shown at: Detailed gem5 analysis of how data races happen.

    -
    36.6.1.1. Memory consistency
    +
    37.6.1.1. Memory consistency

    According to http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture07-sc.pdf "memory consistency" is about ordering requirements of different memory addresses.

    @@ -43474,14 +43506,14 @@ LEVEL4_CACHE_LINESIZE 0

    This is represented explicitly for example in C++ by C++ std::memory_order.
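C11 exposes the same orderings as C++ std::memory_order through stdatomic.h. Below is a minimal message passing sketch with hypothetical names: the release store to the flag, paired with the acquire load that observes it, guarantees that the consumer sees the payload written before the flag. If both atomics used memory_order_relaxed instead, nothing would order the payload write before the flag write, so on a weakly ordered CPU such as ARM the consumer could read a stale payload, and in C11 terms the plain access to the payload would become a data race.

```c
#include <pthread.h>
#include <stdatomic.h>

static int mp_data;         /* plain, non-atomic payload */
static atomic_int mp_ready; /* flag published with release ordering */

static void *mp_producer(void *arg)
{
    (void)arg;
    mp_data = 42;                                              /* 1: write the payload */
    atomic_store_explicit(&mp_ready, 1, memory_order_release); /* 2: publish it */
    return NULL;
}

static void *mp_consumer(void *arg)
{
    int *out = arg;
    /* Acquire pairs with the release above: once 1 is seen, the write of 42 is visible. */
    while (atomic_load_explicit(&mp_ready, memory_order_acquire) == 0) {
    }
    *out = mp_data;
    return NULL;
}

/* Runs one producer and one consumer and returns what the consumer read. */
static int mp_run(void)
{
    pthread_t p, c;
    int seen = -1;
    mp_data = 0;
    atomic_store(&mp_ready, 0);
    pthread_create(&c, NULL, mp_consumer, &seen);
    pthread_create(&p, NULL, mp_producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```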

    -
    36.6.1.1.1. Sequential Consistency
    +
    37.6.1.1.1. Sequential Consistency

    According to http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture07-sc.pdf, this is the strongest possible consistency model: everything is nicely ordered as you’d expect.

    -
    36.6.1.2. Can caches snoop data from other caches?
    +
    37.6.1.2. Can caches snoop data from other caches?

    Either they can snoop only control, or both control and data can be snooped.

    @@ -43496,7 +43528,7 @@ LEVEL4_CACHE_LINESIZE 0
    -
    36.6.1.3. VI cache coherence protocol
    +
    37.6.1.3. VI cache coherence protocol

    Mentioned at:

    @@ -43743,7 +43775,7 @@ LEVEL4_CACHE_LINESIZE 0
    -
    36.6.1.4. MSI cache coherence protocol
    +
    37.6.1.4. MSI cache coherence protocol

    https://en.wikipedia.org/wiki/MSI_protocol
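A minimal sketch of the textbook snooping-bus MSI transitions for a single line in a single cache, with hypothetical names. It ignores transient states and bus arbitration, and uses BusRdX both for a write miss and for the S to M upgrade, whereas some descriptions use a separate BusUpgr transaction for the latter.

```c
/* MSI states of one cache line in one cache. */
enum msi_state { MSI_I, MSI_S, MSI_M };

/* Events: the local processor's accesses, or bus transactions snooped from other caches. */
enum msi_event {
    PR_RD,   /* local processor read */
    PR_WR,   /* local processor write */
    BUS_RD,  /* another cache reads the line */
    BUS_RDX, /* another cache reads the line with intent to modify */
};

/* Returns the next state. *flush is set when this cache must supply or write back its dirty copy. */
static enum msi_state msi_next(enum msi_state s, enum msi_event e, int *flush)
{
    *flush = 0;
    switch (s) {
    case MSI_I:
        if (e == PR_RD) return MSI_S;   /* miss: issue BusRd */
        if (e == PR_WR) return MSI_M;   /* miss: issue BusRdX */
        return MSI_I;                   /* snooped traffic does not affect an invalid line */
    case MSI_S:
        if (e == PR_WR) return MSI_M;   /* issue BusRdX to invalidate the other copies */
        if (e == BUS_RDX) return MSI_I; /* another cache is about to write */
        return MSI_S;                   /* PR_RD hits, BUS_RD keeps sharing */
    case MSI_M:
        if (e == BUS_RD) { *flush = 1; return MSI_S; }
        if (e == BUS_RDX) { *flush = 1; return MSI_I; }
        return MSI_M;                   /* local reads and writes hit */
    }
    return s;
}
```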

    @@ -44055,7 +44087,7 @@ CACHE2 S nyy

    TODO gem5 concrete example.

    -
    36.6.1.4.1. MSI cache coherence protocol with transient states
    +
    37.6.1.4.1. MSI cache coherence protocol with transient states

    TODO understand well why those are needed.

    @@ -44075,7 +44107,7 @@ CACHE2 S nyy
    -
    36.6.1.5. MESI cache coherence protocol
    +
    37.6.1.5. MESI cache coherence protocol

    https://en.wikipedia.org/wiki/MESI_protocol

    @@ -44135,7 +44167,7 @@ CACHE2 S nyy
    -
    36.6.1.6. MOSI cache coherence protocol
    +
    37.6.1.6. MOSI cache coherence protocol

    https://en.wikipedia.org/wiki/MOSI_protocol The critical MSI vs MOSI section was a bit bogus though: https://en.wikipedia.org/w/index.php?title=MOSI_protocol&oldid=895443023 but I edited it :-)

    @@ -44195,7 +44227,7 @@ CACHE2 S nyy
    -
    36.6.1.7. MOESI cache coherence protocol
    +
    37.6.1.7. MOESI cache coherence protocol

    https://en.wikipedia.org/wiki/MOESI_protocol

    @@ -44203,10 +44235,10 @@ CACHE2 S nyy

    MESI cache coherence protocol + MOSI cache coherence protocol, not much else to it!

    -

    In gem5 9fc9c67b4242c03f165951775be5cd0812f2a705, MOESI is the default cache coherency protocol of the classic memory system as shown at Section 23.22.4.3.1, “What is the coherency protocol implemented by the classic cache system in gem5?”.

    +

    In gem5 9fc9c67b4242c03f165951775be5cd0812f2a705, MOESI is the default cache coherency protocol of the classic memory system as shown at Section 24.22.4.3.1, “What is the coherency protocol implemented by the classic cache system in gem5?”.

    -

    A good and simple example showing several MOESI transitions in the classic memory model can be seen at: Section 23.22.4.4, “gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs”.

    +

    A good and simple example showing several MOESI transitions in the classic memory model can be seen at: Section 24.22.4.4, “gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs”.

    gem5 12c917de54145d2d50260035ba7fa614e25317a3 has several Ruby MOESI models implemented: MOESI_AMD_Base, MOESI_CMP_directory, MOESI_CMP_token and MOESI_hammer.

    @@ -44217,10 +44249,10 @@ CACHE2 S nyy
    -

    37. About this repo

    +

    38. About this repo

    -

    37.1. Supported hosts

    +

    38.1. Supported hosts

    The host requirements depend a lot on which examples you want to run.

    @@ -44270,9 +44302,9 @@ CACHE2 S nyy
    -

    37.2. Common build issues

    +

    38.2. Common build issues

    -

    37.2.1. You must put some 'source' URIs in your sources.list

    +

    38.2.1. You must put some 'source' URIs in your sources.list

    If ./build --download-dependencies fails with:

    @@ -44286,7 +44318,7 @@ CACHE2 S nyy
    -

    37.2.2. Build from downloaded source zip files

    +

    38.2.2. Build from downloaded source zip files

    It does not work if you just download the .zip with the sources for this repository from GitHub, because we use Git submodules: you must clone this repo.

    @@ -44296,7 +44328,7 @@ CACHE2 S nyy
    -

    37.3. Run command after boot

    +

    38.3. Run command after boot

    If you just want to run a command after boot ends without thinking much about it, just use the --eval-after option, e.g.:

    @@ -44313,7 +44345,7 @@ CACHE2 S nyy
    -

    37.4. Default command line arguments

    +

    38.4. Default command line arguments

    It gets annoying to retype --arch aarch64 for every single command, or to remember --config setups.

    @@ -44358,12 +44390,12 @@ CACHE2 S nyy
    -

    37.5. Documentation

    +

    38.5. Documentation

    -

    To learn how to build the documentation see: Section 1.10, “Build the documentation”.

    +

    To learn how to build the documentation see: Section 2.10, “Build the documentation”.

    -

    37.5.1. Documentation verification

    +

    38.5.1. Documentation verification

    When running build-doc, we do the following checks:

    @@ -44384,7 +44416,7 @@ CACHE2 S nyy

    The script prints what you have to fix and exits with an error status if there are any errors.

    -

    +

    @@ -44407,7 +44439,7 @@ CACHE2 S nyy
    -
    37.5.1.2. asciidoctor/extract-header-ids
    +
    38.5.1.2. asciidoctor/extract-header-ids

    Documentation for asciidoctor/extract-header-ids

    @@ -44452,7 +44484,7 @@ explicitly-given
    -

    +

    The Asciidoctor extension scripts:

    @@ -44480,7 +44512,7 @@ explicitly-given
    -

    37.6.1. GitHub pages

    +

    38.6.1. GitHub pages

    As mentioned before the TOC, we have to push this README to GitHub pages due to: https://github.com/isaacs/github/issues/1610

    @@ -44530,7 +44562,7 @@ explicitly-given
    -

    37.7. Clean the build

    +

    38.7. Clean the build

    You did something crazy, and nothing seems to work anymore?

    @@ -44594,7 +44626,7 @@ ls "$(./getvar buildroot_build_dir)"
    -

    37.8. Custom build directory

    +

    38.8. Custom build directory

    For now there is no way to change the build directory from out/ (resp. out.docker for Docker) to something else.

    @@ -44609,7 +44641,7 @@ ln -s out /mnt/hd/linux-kernel-module-cheat-out
    -

    37.9. ccache

    +

    38.9. ccache

    ccache might save you a lot of rebuild time when you decide to Clean the build or create a new build variant.

    @@ -44689,7 +44721,7 @@ export CCACHE_MAXSIZE="20G"
    -

    37.10. getvar

    +

    38.10. getvar

    The getvar helper script can print the values of internal LKMC variables.

    @@ -44727,7 +44759,7 @@ export CCACHE_MAXSIZE="20G"

    For this reason, we use it particularly often in this README to reduce the need for refactoring.

    -

    37.10.1. run-toolchain

    +

    38.10.1. run-toolchain

    While you could just manually find/learn the path to toolchain tools, e.g. in LKMC b15a0e455d691afa49f3b813ad9b09394dfb02b7 they are:

    @@ -44774,7 +44806,7 @@ export CCACHE_MAXSIZE="20G"
    -
    37.10.1.1. disas
    +
    38.10.1.1. disas

    Since disassembly of a single function of a LKMC executable with GDB is such a common use case for run-toolchain via https://stackoverflow.com/questions/22769246/how-to-disassemble-one-single-function-using-objdump, we have this shortcut for it.

    @@ -44806,7 +44838,7 @@ export CCACHE_MAXSIZE="20G"
    -

    37.11. Rebuild Buildroot while running

    +

    38.11. Rebuild Buildroot while running

    It is not possible to rebuild the root filesystem while running QEMU because QEMU holds the qcow2 file:

    @@ -44817,7 +44849,7 @@ export CCACHE_MAXSIZE="20G"
    -

    37.12. Simultaneous runs

    +

    38.12. Simultaneous runs

    When doing long simulations sweeping across multiple system parameters, it becomes fundamental to do multiple simulations in parallel.

    @@ -44913,7 +44945,7 @@ less "$(./getvar --arch aarch64 --emulator gem5 --run-id 1 termout_file)"
    -

    To run multiple gem5 checkouts, see: Section 37.13.3.1, “gem5 worktree”.

    +

    To run multiple gem5 checkouts, see: Section 38.13.3.1, “gem5 worktree”.

    Implementation note: we create multiple namespaces for two things:

    @@ -44952,7 +44984,7 @@ less "$(./getvar --arch aarch64 --emulator gem5 --run-id 1 termout_file)"
    -

    37.13. Build variants

    +

    38.13. Build variants

    It often happens that you are comparing two versions of the build, a good and a bad one, and trying to figure out why the bad one is bad.

    @@ -44960,7 +44992,7 @@ less "$(./getvar --arch aarch64 --emulator gem5 --run-id 1 termout_file)"

    Our build variants system allows you to keep multiple built versions of all major components, so that you can easily switch between running one or the other.

    -

    37.13.1. Linux kernel build variants

    +

    38.13.1. Linux kernel build variants

    If you want to keep two builds around, one for the latest Linux version, and the other for Linux v4.16:

    @@ -44996,11 +45028,11 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    To run both kernels simultaneously, one on each QEMU instance, see: Section 37.12, “Simultaneous runs”.

    +

    To run both kernels simultaneously, one on each QEMU instance, see: Section 38.12, “Simultaneous runs”.

    -

    37.13.2. QEMU build variants

    +

    38.13.2. QEMU build variants

    Analogous to the Linux kernel build variants but with the --qemu-build-id option instead:

    @@ -45016,7 +45048,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    -

    37.13.3. gem5 build variants

    +

    38.13.3. gem5 build variants

    Analogous to the Linux kernel build variants but with the --gem5-build-id option instead:

    @@ -45047,7 +45079,7 @@ git -C "$(./getvar gem5_source_dir)" checkout some-branch

    Therefore, you must not forget to check out the sources of the corresponding build before running, unless you explicitly tell gem5 to use a non-default source tree with gem5 worktree. This becomes inevitable when you want to launch multiple simultaneous runs at different checkouts.

    -
    37.13.3.1. gem5 worktree
    +
    38.13.3.1. gem5 worktree

    --gem5-build-id goes a long way, but if you want to seamlessly switch between two gem5 trees without checking out multiple times, then --gem5-worktree is for you.

    @@ -45100,7 +45132,7 @@ cd -
    -
    37.13.3.2. gem5 private source trees
    +
    38.13.3.2. gem5 private source trees

    Suppose that you are working on a private fork of gem5, but you want to use this repository to develop it as well.

    @@ -45144,7 +45176,7 @@ gem5_internal="$(pwd)/gem5-internal"
    -

    37.13.4. Buildroot build variants

    +

    38.13.4. Buildroot build variants

    Allows you to have multiple versions of the GCC toolchain or root filesystem.

    @@ -45164,7 +45196,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    37.14. Optimization level of a build

    +

    38.14. Optimization level of a build

    The --optimization-level option is available on all build scripts and sets the given GCC -O optimization level where it has been implemented for guest binaries.

    @@ -45191,9 +45223,9 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    37.15. Directory structure

    +

    38.15. Directory structure

    -

    37.15.1. lkmc directory

    +

    38.15.1. lkmc directory

    lkmc/ contains sources and headers that are shared across kernel modules, userland and baremetal examples.

    @@ -45204,7 +45236,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -

    Another option would have been to name it as includes/lkmc, but that would make paths longer, and we might want to store source code in that directory as well in the future.

    -
    37.15.1.1. Userland objects vs header-only
    +
    38.15.1.1. Userland objects vs header-only

    When factoring out functionality across userland examples, there are two main options:

    @@ -45263,7 +45295,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    37.15.2. buildroot_packages directory

    +

    38.15.2. buildroot_packages directory

    Source: buildroot_packages/.

    @@ -45312,7 +45344,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -

    A custom build script can give you more flexibility: e.g. the package can more easily be made to work with other root filesystems, can have better 9P support, and rebuilds faster because it bypasses some Buildroot boilerplate.

    -
    37.15.2.1. kernel_modules buildroot package
    +
    38.15.2.1. kernel_modules buildroot package

    Source: buildroot_packages/kernel_modules/

    @@ -45359,9 +45391,9 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    37.15.3. patches directory

    +

    38.15.3. patches directory

    -
    37.15.3.1. patches/global directory
    +
    38.15.3.1. patches/global directory

    Has the following structure:

    @@ -45378,7 +45410,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -
    37.15.3.2. patches/manual directory
    +
    38.15.3.2. patches/manual directory

    Patches in this directory are never applied automatically: it is up to users to manually apply them before usage following the instructions in this documentation.

    @@ -45388,7 +45420,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    37.15.4. rootfs_overlay

    +

    38.15.4. rootfs_overlay

    Source: rootfs_overlay.

    @@ -45421,7 +45453,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -

    Source: copy-overlay

    -

    Build Buildroot is required for the same reason as described at: Section 1.2.2.2, “Your first kernel module hack”.

    +

    Build Buildroot is required for the same reason as described at: Section 2.2.2.2, “Your first kernel module hack”.

    However, since the rootfs_overlay directory does not require compilation, unlike e.g. kernel modules, we also make it available to the guest directly over 9P, even without ./copy-overlay, at:

    @@ -45435,7 +45467,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -

    This way you can just hack away at the scripts and try them out immediately without any further steps.

    -
    37.15.4.1. out_rootfs_overlay_dir
    +
    38.15.4.1. out_rootfs_overlay_dir

    This path can be found with:

    @@ -45467,7 +45499,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -

    This does not include native image modification mechanisms such as Buildroot packages, which we let Buildroot itself manage.

    -
    37.15.4.1.1. disk_image_2
    +
    38.15.4.1.1. disk_image_2

    A squashfs of out_rootfs_overlay_dir that gets passed as the second argument.

    @@ -45478,7 +45510,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    37.15.5. lkmc.c

    +

    38.15.5. lkmc.c

    The files:

    @@ -45516,7 +45548,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    37.15.6. lkmc_home

    +

    38.15.6. lkmc_home

    lkmc_home refers to the target base directory in which we put all our custom built stuff, such as userland executables and kernel modules.

    @@ -45549,7 +45581,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
    -

    37.15.7. path_properties.py

    +

    38.15.7. path_properties.py

    In order to build and run each userland and baremetal example properly, we need per-file metadata such as compiler flags and required number of cores.
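    As a rough illustration only, here is a minimal Python sketch of such a per-file lookup, assuming a hypothetical table keyed by path prefix in which deeper entries override shallower ones. The property names cc_flags and n_cpus, the flag values and the example paths are all made up for this sketch; they are not the actual contents or API of path_properties.py.

```python
from typing import Any, Dict

# Hypothetical property table: keys are path prefixes ("" is the root),
# values are metadata dicts. Longer (deeper) prefixes override shorter ones.
PROPERTIES: Dict[str, Dict[str, Any]] = {
    "": {"cc_flags": ["-O0"], "n_cpus": 1},
    "userland": {"cc_flags": ["-O0", "-std=c99"]},
    "userland/example/needs_two_cores.c": {"n_cpus": 2},
}


def get_properties(path: str) -> Dict[str, Any]:
    """Merge the root entry with every entry whose key is a leading
    component sequence of path, from shallowest to deepest."""
    parts = path.split("/")
    merged: Dict[str, Any] = dict(PROPERTIES[""])
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        merged.update(PROPERTIES.get(prefix, {}))
    return merged


if __name__ == "__main__":
    print(get_properties("userland/example/needs_two_cores.c"))
    print(get_properties("baremetal/hypothetical.c"))
```

    The point of the prefix design is that a whole directory can share defaults while a single file only states what differs, e.g. that it needs more cores.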

    @@ -45612,7 +45644,7 @@ baremetal=True
    -

    37.15.8. rand_check.out

    +

    38.15.8. rand_check.out

    Print out several parameters that normally change randomly from boot to boot:

    @@ -45640,9 +45672,9 @@ baremetal=True
    -

    37.16. Test this repo

    +

    38.16. Test this repo

    -

    37.16.1. Automated tests

    +

    38.16.1. Automated tests

    Run almost all tests:

    @@ -45698,7 +45730,7 @@ echo $?

    test does not run all possible tests, because there are too many possible variations and that would take forever. The rationale is the same as for ./build all and is explained in ./build --help.
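    To see why the full matrix grows so quickly, here is a hypothetical Python sketch that enumerates run configurations as the Cartesian product of the selected archs and emulators. The arch and emulator names are assumptions for illustration, and test_configs is not a function of this repo; every additional dimension, such as build variants or optimization levels, would multiply the count again.

```python
import itertools
from typing import List, Optional, Tuple

# Assumed names, for illustration only.
ALL_ARCHS = ["x86_64", "arm", "aarch64"]
ALL_EMULATORS = ["qemu", "gem5"]


def test_configs(
    archs: Optional[List[str]] = None,
    emulators: Optional[List[str]] = None,
) -> List[Tuple[str, str]]:
    """Return every (arch, emulator) pair to run; default to all of them."""
    archs = archs or ALL_ARCHS
    emulators = emulators or ALL_EMULATORS
    return list(itertools.product(archs, emulators))


if __name__ == "__main__":
    print(len(test_configs()))                # full matrix
    print(test_configs(["aarch64"], ["gem5"]))  # narrowed selection
```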

    -
    37.16.1.1. Test arch and emulator selection
    +
    38.16.1.1. Test arch and emulator selection

    You can select multiple archs and emulators of interest, as for any other command, with:

    @@ -45731,7 +45763,7 @@ echo $?
    -
    37.16.1.2. Quit on fail
    +
    38.16.1.2. Quit on fail

    By default, tests continue running even after the first failure, and a summary is shown at the end.
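    A minimal Python sketch of this policy, assuming a hypothetical run_tests helper that takes a mapping from test name to a callable returning whether the test passed. It only illustrates the continue-by-default versus quit-on-fail behaviour and is not the actual test driver of this repo.

```python
from typing import Callable, Dict, List, Tuple


def run_tests(
    tests: Dict[str, Callable[[], bool]],
    quit_on_fail: bool = False,
) -> Tuple[List[str], List[str]]:
    """Run tests in insertion order and return (passed, failed) names."""
    passed: List[str] = []
    failed: List[str] = []
    for name, test in tests.items():
        if test():
            passed.append(name)
        else:
            failed.append(name)
            if quit_on_fail:
                # Stop at the first failure instead of running the rest.
                break
    # Summary at the end, whether or not we stopped early.
    print(f"passed: {len(passed)}  failed: {len(failed)}")
    for name in failed:
        print(f"FAIL {name}")
    return passed, failed


if __name__ == "__main__":
    demo = {"a": lambda: True, "b": lambda: False, "c": lambda: True}
    run_tests(demo)
    run_tests(demo, quit_on_fail=True)
```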

    @@ -45745,7 +45777,7 @@ echo $?
    -
    37.16.1.3. Test userland in full system
    +
    38.16.1.3. Test userland in full system

    TODO: we really need a mechanism to generate the test list automatically, e.g. based on path_properties.py. Currently many tests are missing, and we have to add everything manually, which is very annoying.

    @@ -45770,11 +45802,11 @@ echo $?

    Failure is detected by looking for the Magic failure string.

    -

    Most userland programs that don’t rely on kernel modules can also be tested in user mode simulation as explained at: Section 10.2, “User mode tests”.

    +

    Most userland programs that don’t rely on kernel modules can also be tested in user mode simulation as explained at: Section 11.2, “User mode tests”.

    -
    37.16.1.4. GDB tests
    +
    38.16.1.4. GDB tests

    We have some pexpect automated tests for GDB for both userland and baremetal programs!

    @@ -45847,7 +45879,7 @@ echo $?
    -
    37.16.1.5. Magic failure string
    +
    38.16.1.5. Magic failure string

    We do not know of any way to set the emulator exit status in QEMU arm full system.
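    Since the emulator exit status cannot carry the result in that case, the host has to infer it from the guest output instead. A minimal Python sketch of that idea, where MAGIC_FAIL is a hypothetical placeholder rather than the actual string printed by this repository, and exit_status_from_output is not a real helper of this repo:

```python
from typing import Iterable

# Hypothetical placeholder, not the real marker used by this repo.
MAGIC_FAIL = "HYPOTHETICAL_TEST_FAIL"


def exit_status_from_output(lines: Iterable[str], emulator_status: int = 0) -> int:
    """Return 1 if any guest output line contains the magic failure string,
    otherwise fall back to whatever status the emulator itself reported."""
    for line in lines:
        if MAGIC_FAIL in line:
            return 1
    return emulator_status


if __name__ == "__main__":
    print(exit_status_from_output(["boot ok", "HYPOTHETICAL_TEST_FAIL in foo"]))
    print(exit_status_from_output(["boot ok"]))
```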

    @@ -45863,7 +45895,7 @@ echo $?

    gem5: m5 fail works on all archs

  • -

    user mode: QEMU forwards exit status, for gem5 we do some log parsing as described at: Section 10.7.2, “gem5 syscall emulation exit status”

    +

    user mode: QEMU forwards exit status, for gem5 we do some log parsing as described at: Section 11.7.2, “gem5 syscall emulation exit status”

    @@ -45950,9 +45982,9 @@ echo $?
    -

    37.16.2. Non-automated tests

    +

    38.16.2. Non-automated tests

    -
    37.16.2.1. Test GDB Linux kernel
    +
    38.16.2.1. Test GDB Linux kernel

    For the Linux kernel, do the following manual tests for now.

    @@ -45990,7 +46022,7 @@ echo $?
    -
    37.16.2.2. Test the Internet
    +
    38.16.2.2. Test the Internet

    You should also test that the Internet works:

    @@ -46001,7 +46033,7 @@ echo $?
    -
    37.16.2.3. CLI script tests
    +
    38.16.2.3. CLI script tests

    build-userland and test-executables have a wide variety of target selection modes, and it was hard to keep them all working without some tests:

    @@ -46019,12 +46051,12 @@ echo $?
    -

    37.17. Bisection

    +

    38.17. Bisection

    When updating the Linux kernel, QEMU and gem5, things sometimes break.

    -

    However, for many types of crashes, it is trivial to bisect down to the offending commit, in particular because we can make QEMU and gem5 exit with status 1 on kernel panic as mentioned at: Section 16.6.1.3, “Exit emulator on panic”.

    +

    However, for many types of crashes, it is trivial to bisect down to the offending commit, in particular because we can make QEMU and gem5 exit with status 1 on kernel panic as mentioned at: Section 17.6.1.3, “Exit emulator on panic”.

    For example, when updating from QEMU v2.12.0 to v3.0.0-rc3, the Linux kernel boot started to panic for arm.
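    Because a panic makes the emulator exit with status 1, the check can be scripted end to end, which is what git bisect run automates: exit status 0 marks a commit good, 125 skips it, and any other status from 1 to 127 marks it bad. The Python sketch below only shows the underlying binary search over a linear list of commits. Here is_bad is a hypothetical stand-in for "build and boot at this commit and check for a panic", and the sketch assumes that the oldest commit is good, the newest is bad, and that once a commit is bad every later one is too.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    commits are ordered oldest to newest; commits[0] is assumed good
    and commits[-1] is assumed bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(10)]
    # Pretend the regression was introduced in c6.
    print(first_bad(history, lambda c: int(c[1:]) >= 6))
```

    Each step halves the remaining range, so even hundreds of commits between two releases need only a handful of build-and-boot cycles.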

    @@ -46075,7 +46107,7 @@ git submodule update
    -

    37.18. Update a forked submodule

    +

    38.18. Update a forked submodule

    This is a template update procedure for submodules for which we have some patches on top of mainline.

    @@ -46104,9 +46136,9 @@ git commit -m "linux: update to ${next_mainline_revision}"
    -

    37.19. Release

    +

    38.19. Release

    -

    37.19.1. Release procedure

    +

    38.19.1. Release procedure

    Ensure that the Automated tests are passing on a clean build:

    @@ -46117,7 +46149,7 @@ git commit -m "linux: update to ${next_mainline_revision}"
    -

    The ./build-test command builds a superset of what will be downloaded which also tests other things we would like to be working on the release. For the minimal build to generate the files to be uploaded, see: Section 37.19.2, “release-zip”

    +

    The ./build-test command builds a superset of what will be downloaded which also tests other things we would like to be working on the release. For the minimal build to generate the files to be uploaded, see: Section 38.19.2, “release-zip”

    The clean build is necessary because it generates clean images, since it is not possible to remove Buildroot packages.

    @@ -46187,7 +46219,7 @@ git push --follow-tags
    -

    37.19.2. release-zip

    +

    38.19.2. release-zip

    Create a zip containing all files required for Prebuilt setup:

    @@ -46212,7 +46244,7 @@ git push --follow-tags
    -

    37.19.3. release-upload

    +

    38.19.3. release-upload

    After:

    @@ -46260,9 +46292,9 @@ git push --follow-tags
    -

    37.20. Design rationale

    +

    38.20. Design rationale

    -

    37.20.1. Design goals

    +

    38.20.1. Design goals

    This project was created to help me understand, modify and test low level system components by using system simulators.

    @@ -46338,7 +46370,7 @@ git push --follow-tags
    -

    37.20.2. Setup trade-offs

    +

    38.20.2. Setup trade-offs

    The trade-offs between the different setups are basically a balance between:

    @@ -46363,13 +46395,13 @@ git push --follow-tags

    compatibility: how likely it is that all the components will work well together: emulator, compiler, kernel, standard library, …

  • -

    guest software availability: how wide is your choice of easily installed guest software packages? See also: Section 37.20.4, “Linux distro choice”

    +

    guest software availability: how wide is your choice of easily installed guest software packages? See also: Section 38.20.4, “Linux distro choice”

  • -

    37.20.3. Resource tradeoff guidelines

    +

    38.20.3. Resource tradeoff guidelines

    Choosing which features go into our default builds means making tradeoffs; here are our guidelines:

    @@ -46410,11 +46442,11 @@ git push --follow-tags
    -

    In order to learn how to measure some of those aspects, see: Section 34, “Benchmark this repo”.

    +

    In order to learn how to measure some of those aspects, see: Section 35, “Benchmark this repo”.

    -

    37.20.4. Linux distro choice

    +

    38.20.4. Linux distro choice

    We haven’t found the ultimate distro yet, here is a summary table of trade-offs that we care about: Table 8, “Comparison of Linux distros for usage in this repository”.

    @@ -46517,9 +46549,9 @@ git push --follow-tags
    -

    37.21. Soft topics

    +

    38.21. Soft topics

    -

    37.21.1. Fairy tale

    +

    38.21.1. Fairy tale

    @@ -46557,7 +46589,7 @@ git push --follow-tags