diff --git a/index.html b/index.html index 4bbbe79..4bcfd3f 100644 --- a/index.html +++ b/index.html @@ -445,10 +445,13 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
+

64534859

+
+

The perfect emulation setup to study and develop the Linux kernel v5.2.1, kernel modules, QEMU, gem5 and x86_64, ARMv7 and ARMv8 userland and baremetal assembly, ANSI C, C++ and POSIX. GDB step debug and KGDB just work. Powered by Buildroot and crosstool-NG. Highly automated. Thoroughly documented. Automated tests. "Tested" in an Ubuntu 18.04 host.

The source code for this page is located at: https://github.com/cirosantilli/linux-kernel-module-cheat. Due to a GitHub limitation, this README is too long and not fully rendered on github.com. Either use: README.adoc, https://cirosantilli.com/linux-kernel-module-cheat or build the docs yourself.

@@ -510,6 +513,7 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 1.7.2. Baremetal setup getting started
  • +
  • 1.8. Build the documentation
  • 2. GDB step debug @@ -1068,7 +1072,9 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 19. Buildroot @@ -1196,7 +1208,7 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 20.2. C++
  • -

    Design goals of this project are documented at: Design goals.

    +

    Design goals of this project are documented at: Section 29.18.1, “Design goals”.

    1.1. QEMU Buildroot setup

    1.1.1. QEMU Buildroot setup getting started

    -

    This setup has been mostly tested on Ubuntu. For other host operating systems see: Supported hosts. For greater stability, consider using the latest release instead of master: https://github.com/cirosantilli/linux-kernel-module-cheat/releases

    +

    This setup has been mostly tested on Ubuntu. For other host operating systems see: Section 29.1, “Supported hosts”. For greater stability, consider using the latest release instead of master: https://github.com/cirosantilli/linux-kernel-module-cheat/releases

    Reserve 12Gb of disk and run:

    @@ -1808,7 +1821,7 @@ cd linux-kernel-module-cheat

    You don’t need to clone recursively even though we have .git submodules: download-dependencies fetches just the submodules that you need for this build to save time.

    The initial build will take a while (30 minutes to 2 hours) to clone and build, see Benchmark builds for more details.

    @@ -1876,7 +1889,7 @@ hello2 cleanup

    All available modules can be found in the kernel_modules directory.

    @@ -1891,7 +1904,7 @@ hello2 cleanup
    -

    To avoid typing --arch aarch64 many times, you can set the default arch as explained at: Default command line arguments

    +

    To avoid typing --arch aarch64 many times, you can set the default arch as explained at: Section 29.4, “Default command line arguments”

    I now urge you to read the following sections which contain widely applicable information:

    @@ -2040,7 +2053,7 @@ hello /root/.profile
    -

    When you reach difficulties, QEMU makes it possible to easily GDB step debug the Linux kernel source code, see: GDB step debug.

    +

    When you reach difficulties, QEMU makes it possible to easily GDB step debug the Linux kernel source code, see: Section 2, “GDB step debug”.

    @@ -2125,7 +2138,7 @@ hello /root/.profile

    All of this put together makes the safe procedure acceptably fast for regular development as well.

    -

    It is also easy to GDB step debug kernel modules with our setup, see: GDB step debug kernel module.

    +

    It is also easy to GDB step debug kernel modules with our setup, see: Section 2.4, “GDB step debug kernel module”.

    @@ -2195,10 +2208,10 @@ hello /root/.profile

    If you really want to develop semiconductors, your only choice is to join a university or a semiconductor company that has the EDA licenses.

    -

    See also: Should you waste your life with systems programming?.

    +

    See also: Section 29.19.2, “Should you waste your life with systems programming?”.

    -

    While hacking QEMU, you will likely want to GDB step its source. That is trivial since QEMU is just another userland program like any other, but our setup has a shortcut to make it even more convenient, see: Debug the emulator.

    +

    While hacking QEMU, you will likely want to GDB step its source. That is trivial since QEMU is just another userland program like any other, but our setup has a shortcut to make it even more convenient, see: Section 17.7, “Debug the emulator”.

    @@ -2531,7 +2544,7 @@ j = 0
    -

    and can therefore be used to estimate system performance, see: gem5 run benchmark for an example.

    +

    and can therefore be used to estimate system performance, see: Section 18.2, “gem5 run benchmark” for an example.

    The downside of gem5 is that it is much slower than QEMU because of the greater simulation detail.

    @@ -2585,7 +2598,7 @@ j = 0
    -

    See also: tmux gem5.

    +

    See also: Section 2.3.1, “tmux gem5”.

    At the end of boot, it might not be very clear that you have the shell since some printk messages may appear in front of the prompt like this:

    @@ -2608,7 +2621,7 @@ j = 0
    -

    More gem5 information is present at: gem5

    +

    More gem5 information is present at: Section 18, “gem5”

    Good next steps are:

    @@ -2634,7 +2647,7 @@ j = 0

    This repository has been tested inside clean Docker containers.

    -

    This is a good option if you are on a Linux host, but the native setup failed due to your weird host distribution, and you have better things to do with your life than to debug it. See also: Supported hosts.

    +

    This is a good option if you are on a Linux host, but the native setup failed due to your weird host distribution, and you have better things to do with your life than to debug it. See also: Section 29.1, “Supported hosts”.

    For example, to do a QEMU Buildroot setup inside Docker, run:

    @@ -2822,7 +2835,7 @@ j = 0
    -

    The situation is analogous for userland/test, test-executables-in-tree and test-executables, which are further documented at: User mode tests.

    +

    The situation is analogous for userland/test, test-executables-in-tree and test-executables, which are further documented at: Section 10.2, “User mode tests”.

    Do a more clean out-of-tree build instead and run the program:

    @@ -3307,7 +3320,7 @@ cd userland
    -

    as shown at: Debug the emulator, although direct GDB host usage works as well of course.

    +

    as shown at: Section 17.7, “Debug the emulator”, although direct GDB host usage works as well of course.

    @@ -3340,7 +3353,7 @@ cd userland
  • --gcc-which host: use the host toolchain.

    -

    We must pass this to ./run as well because QEMU must know which dynamic libraries to use. See also: User mode static executables.

    +

    We must pass this to ./run as well because QEMU must know which dynamic libraries to use. See also: Section 10.5, “User mode static executables”.

  • @@ -3349,7 +3362,7 @@ cd userland
  • -

    This present the usual trade-offs of using prebuilts as mentioned at: Prebuilt setup.

    +

    This presents the usual trade-offs of using prebuilts as mentioned at: Section 1.4, “Prebuilt setup”.

    Other functionality is analogous, e.g. testing:

    @@ -3390,7 +3403,7 @@ cd userland

    After doing that setup, you can already execute your userland programs from inside QEMU: the only missing step is how to rebuild executables and run them.

    -

    And the answer is exactly analogous to what is shown at: Your first kernel module hack

    +

    And the answer is exactly analogous to what is shown at: Section 1.1.2.2, “Your first kernel module hack”

    For example, if we modify userland/c/hello.c to print out something different, we can just rebuild it with:

    @@ -3596,7 +3609,7 @@ error: simulation error detected by parsing logs
    -

    TODO: the carriage returns are a bit different than in QEMU, see: gem5 baremetal carriage return.

    +

    TODO: the carriage returns are a bit different than in QEMU, see: Section 26.4, “gem5 baremetal carriage return”.

    Note that ./build-baremetal requires the --emulator gem5 option, and generates separate executable images for both, as can be seen from:

    @@ -3641,10 +3654,10 @@ echo "$(./getvar --arch aarch64 --baremetal userland/c/hello.c --emulator gem5 -

    But just stick to newer and better VExpress_GEM5_V1 unless you have a good reason to use RealViewPBX.

    -

    When doing baremetal programming, it is likely that you will want to learn userland assembly first, see: Userland assembly.

    +

    When doing baremetal programming, it is likely that you will want to learn userland assembly first, see: Section 21, “Userland assembly”.

    -

    For more information on baremetal, see the section: Baremetal.

    +

    For more information on baremetal, see the section: Section 26, “Baremetal”.

    The following subjects are particularly important:

    @@ -3661,6 +3674,57 @@ echo "$(./getvar --arch aarch64 --baremetal userland/c/hello.c --emulator gem5 -
    +
    +

    1.8. Build the documentation

    +
    +

    You don’t need to depend on GitHub.

    +
    +
    +

    For a quick and dirty build, install Asciidoctor however you like and build:

    +
    +
    +
    +
    asciidoctor README.adoc
    +xdg-open README.html
    +
    +
    +
    +

    For development, you will want to do a more controlled build with extra error checking as follows.

    +
    +
    +

    For the initial build do:

    +
    +
    +
    +
    ./build --download-dependencies docs
    +
    +
    +
    +

    which also downloads build dependencies.

    +
    +
    +

    Then, for subsequent builds, just run the faster:

    +
    +
    +
    +
    ./build-doc
    +
    +
    +
    +

    Source: build-doc

    +
    +
    +

    The HTML output is located at:

    +
    +
    +
    +
    xdg-open out/README.html
    +
    +
    +
    +

    More information about our documentation internals can be found at: Section 29.5, “Documentation”

    +
    +
    @@ -4444,7 +4508,7 @@ echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control

    gem5 tracing with --debug-flags=Exec does show the right symbols however! So in the worst case, we can just read their source. Amazing.

    -

    v4.19 also added a CONFIG_HAVE_KERNEL_UNCOMPRESSED=y option for having the kernel uncompressed which could make following the startup easier, but it is only available on s390. aarch64 however is already uncompressed by default, so might be the easiest one. See also: vmlinux vs bzImage vs zImage vs Image.

    +

    v4.19 also added a CONFIG_HAVE_KERNEL_UNCOMPRESSED=y option for having the kernel uncompressed which could make following the startup easier, but it is only available on s390. aarch64 however is already uncompressed by default, so might be the easiest one. See also: Section 15.21.1, “vmlinux vs bzImage vs zImage vs Image”.

    2.5.1. GDB step debug early boot by address

    @@ -4518,7 +4582,7 @@ echo 'file kernel/module.c +p' > /sys/kernel/debug/dynamic_debug/control
    -

    The number of cores is modified as explained at: Number of cores

    +

    The number of cores is modified as explained at: Section 18.2.2.1, “Number of cores”

    taskset from the util-linux package sets the initial core affinity of a program:

    @@ -5244,7 +5308,7 @@ Entering kdb (current=0x(____ptrval____), pid 1) on processor 0 due to Keyboard

    KGDB expects the connection at ttyS1, our second serial port after ttyS0 which contains the terminal.

    -

    The last line is the KDB prompt, and is covered at: KDB. Typing now shows nothing because that prompt is expecting input from ttyS1.

    +

    The last line is the KDB prompt, and is covered at: Section 3.3, “KDB”. Typing now shows nothing because that prompt is expecting input from ttyS1.

    Instead, we connect to the serial port ttyS1 with GDB:

    @@ -5793,7 +5857,7 @@ cr3 = 0xFFFFF0DCDC000

    The init program can be either an executable shell text file, or a compiled ELF file. It becomes easy to accept this once you see that the exec system call handles both cases equally: https://unix.stackexchange.com/questions/174062/can-the-init-process-be-a-shell-script-in-linux/395375#395375
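As a rough illustration of the two cases, the following sketch classifies a file by its first bytes: a `#!` prefix for an executable shell text file, or the `\x7fELF` magic for a compiled ELF file. The function name `init_kind` is hypothetical and not part of this repository, and the kernel does this dispatch inside exec itself rather than through a shell check like this one.

```shell
# Hypothetical helper, not part of this repository: guess how exec would
# treat a file by looking at its first bytes.
init_kind() {
  # Shell scripts start with the two bytes "#!".
  if [ "$(head -c 2 "$1")" = '#!' ]; then
    echo script
  # ELF files start with the four bytes 0x7f 'E' 'L' 'F'.
  elif [ "$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')" = 7f454c46 ]; then
    echo elf
  else
    echo unknown
  fi
}

# Example: classify a freshly written shell script.
tmp="$(mktemp)"
printf '#!/bin/sh\necho hello\n' > "$tmp"
init_kind "$tmp"
rm -f "$tmp"
```

Either kind of file could then be passed as the init, since both start with a header that exec recognizes.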

    -

    The init executable is searched for in a list of paths in the root filesystem, including /init, /sbin/init and a few others. For more details see: Path to init

    +

    The init executable is searched for in a list of paths in the root filesystem, including /init, /sbin/init and a few others. For more details see: Section 6.3, “Path to init”

    6.1. Replace init

    @@ -5812,7 +5876,7 @@ cr3 = 0xFFFFF0DCDC000

    This just counts every second forever and does not give you a shell.

    -

    This method is not very flexible however, as it is hard to reliably pass multiple commands and command line arguments to the init with it, as explained at: Init environment.

    +

    This method is not very flexible however, as it is hard to reliably pass multiple commands and command line arguments to the init with it, as explained at: Section 6.4, “Init environment”.

    For this reason, we have created a more robust helper method with the --eval option:

    @@ -5834,10 +5898,10 @@ cr3 = 0xFFFFF0DCDC000

    Source: rootfs_overlay/lkmc/eval_base64.sh.

    -

    This allows quoting and newlines by base64 encoding on host, and decoding on guest, see: Kernel command line parameters escaping.

    +

    This allows quoting and newlines by base64 encoding on host, and decoding on guest, see: Section 15.3.1, “Kernel command line parameters escaping”.
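The idea can be sketched on a single machine as follows. This is only an illustration of the encode and decode round trip, not the actual contents of rootfs_overlay/lkmc/eval_base64.sh; the variable names and the sample command are assumptions made for the example.

```shell
# Hypothetical guest command containing spaces, quotes and a newline.
cmd='echo "hello world"
echo second'

# Host side: base64 encode into one token; tr removes any line wrapping.
encoded="$(printf '%s' "$cmd" | base64 | tr -d '\n')"

# Guest side: decode and evaluate the original command.
decoded="$(printf '%s' "$encoded" | base64 -d)"
eval "$decoded"
```

The encoded string contains no spaces, quotes or newlines, so it can travel as a single kernel command line token without further escaping.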

    -

    It also automatically chooses between init= and rcinit= for you, see: Path to init

    +

    It also automatically chooses between init= and rcinit= for you, see: Section 6.3, “Path to init”

    --eval replaces BusyBox' init completely, which makes things more minimal, but also has the following consequences:

    @@ -5863,7 +5927,7 @@ cr3 = 0xFFFFF0DCDC000
    -

    The best way to overcome those limitations is to use: Run command at the end of BusyBox init

    +

    The best way to overcome those limitations is to use: Section 6.2, “Run command at the end of BusyBox init”

    If the script is large, you can add it to a gitignored file and pass that to --eval as in:

    @@ -6282,7 +6346,7 @@ cat f

    which can be good for automated tests, as it ensures that you are using a pristine unmodified system image every time.

    -

    Not however that we already disable disk persistency by default on ext2 filesystems even without --initrd: Disk persistency.

    +

    Note however that we already disable disk persistency by default on ext2 filesystems even without --initrd: Section 17.2, “Disk persistency”.

    One downside of this method is that it has to put the entire filesystem into memory, and could lead to a panic:

    @@ -6445,7 +6509,7 @@ cat f
    @@ -6480,7 +6544,7 @@ cat f
    -

    We think that this might be because gem5 boots directly vmlinux, and not from the final compressed images that contain the attached rootfs such as bzImage, which is what QEMU does, see also: vmlinux vs bzImage vs zImage vs Image.

    +

    We think that this might be because gem5 boots directly vmlinux, and not from the final compressed images that contain the attached rootfs such as bzImage, which is what QEMU does, see also: Section 15.21.1, “vmlinux vs bzImage vs zImage vs Image”.

    To do this failed test, we automatically pass a dummy disk image as of gem5 7fa4c946386e7207ad5859e8ade0bbfc14000d91 since the scripts don’t handle a missing --disk-image well, much like is currently done for Baremetal.

    @@ -6886,7 +6950,7 @@ sudo ./setup -y
  • emulator implementers have to keep up with libc changes, some of which break even a C hello world due to setup code executed before main.

  • @@ -6925,7 +6989,7 @@ qw er

    ./run --userland path resolution is analogous to that of ./run --baremetal.

    -

    ./build user-mode-qemu first builds Buildroot, and then runs ./build-userland, which is further documented at: Userland setup. It also builds QEMU. If you ahve already done a QEMU Buildroot setup previously, this will be very fast.

    +

    ./build user-mode-qemu first builds Buildroot, and then runs ./build-userland, which is further documented at: Section 1.6, “Userland setup”. It also builds QEMU. If you have already done a QEMU Buildroot setup previously, this will be very fast.

    If you modify the userland programs, rebuild simply with:

    @@ -6976,7 +7040,7 @@ qw er
    -

    To stop at the very first instruction of a freestanding program, just use --no-continue. A good example of this is shown at: Freestanding programs.

    +

    To stop at the very first instruction of a freestanding program, just use --no-continue. A good example of this is shown at: Section 21.5.1, “Freestanding programs”.

    @@ -7026,10 +7090,10 @@ qw er

    Tests under userland/libs/ depend on certain libraries being available on the target, e.g. BLAS for userland/libs/openblas. They are not run by default, but can be enabled with --package and --package-all.

    -

    The gem5 tests require building statically with build id static, see also: gem5 syscall emulation mode. TODO automate this better.

    +

    The gem5 tests require building statically with build id static, see also: Section 10.6, “gem5 syscall emulation mode”. TODO automate this better.

    -

    See: Test this repo for more useful testing tips.

    +

    See: Section 29.13, “Test this repo” for more useful testing tips.

    @@ -7070,7 +7134,7 @@ qw er
    -

    Here is an interesting examples of this: Linux Test Project

    +

    Here is an interesting example of this: Section 15.20.1, “Linux Test Project”

    @@ -7208,7 +7272,7 @@ qemu: uncaught target signal 6 (Aborted) - core dumped
    -

    First we build Dhrystone manually statically since dynamic linking is broken in gem5: gem5 syscall emulation mode.

    +

    First we build Dhrystone manually statically since dynamic linking is broken in gem5 as explained at: Section 10.6, “gem5 syscall emulation mode”.

    gem5 user mode:

    @@ -7477,11 +7541,11 @@ time \
    -
    ./run --userland userland/posix/count.c --userland-args 3
    +
    ./run --userland userland/posix/count_to.c --userland-args 3
    -

    it first waits for 3 seconds, and then dumps all the output at once, instead of counting once every second as expected.

    +

    it first waits for 3 seconds, then the program exits, and then it dumps all the stdout at once, instead of counting once every second as expected.

    The same can be reproduced by copying the raw QEMU command and piping it through tee, so I don’t think it is a bug in our setup:

    @@ -7626,10 +7690,10 @@ time \
    @@ -7645,7 +7709,7 @@ time \
  • -

    we would have to think how to not have to include the kernel modules twice in the root filesystem, but still have 9P working for fast development as described at: Your first kernel module hack

    +

    we would have to think how to not have to include the kernel modules twice in the root filesystem, but still have 9P working for fast development as described at: Section 1.1.2.2, “Your first kernel module hack”

  • @@ -7750,7 +7814,7 @@ time \

    no need to regenerate the root filesystem at all and reboot

  • -

    overcomes the check_bin_arch problem: Buildroot rebuild is slow when the root filesystem is large

    +

    overcomes the check_bin_arch problem as shown at: Section 19.8, “Buildroot rebuild is slow when the root filesystem is large”

  • @@ -7879,7 +7943,7 @@ a crash or deadlock.
    -

    For a more exciting GUI experience, see: X11 Buildroot

    +

    For a more exciting GUI experience, see: Section 13.4, “X11 Buildroot”

    Text mode is the default due to the following considerable advantages:

    @@ -8533,7 +8597,7 @@ xeyes

    14.1. Enable networking

    -

    We disable networking by default because it starts an userland process, and we want to keep the number of userland processes to a minimum to make the system more understandable: Resource tradeoff guidelines

    +

    We disable networking by default because it starts a userland process, and we want to keep the number of userland processes to a minimum to make the system more understandable as explained at: Section 29.18.3, “Resource tradeoff guidelines”

    To enable networking on Buildroot, simply run:

    @@ -8595,7 +8659,7 @@ cat index.html

    In this section we discuss how to interact between the guest and the host through networking.

    -

    First ensure that you can access the external network since that is easier to get working: Networking.

    +

    First ensure that you can access the external network since that is easier to get working, see: Section 14, “Networking”.

    14.3.1. Host to guest networking

    @@ -8879,7 +8943,7 @@ mount -t 9p -o trans=virtio,version=9p2000.L host0 /mnt/my9p

    9P is better with emulation, but let’s just get this working for fun.

    -

    First make sure that this works: Guest to host networking.

    +

    First make sure that this works: Section 14.3.2, “Guest to host networking”.

    Then, build the kernel with NFS support:

    @@ -9042,7 +9106,7 @@ cp "$(./getvar linux_build_dir)/defconfig" data/myconfig
    -

    You can also use other config generating targets such as defconfig with the same method as shown at: Linux kernel defconfig.

    +

    You can also use other config generating targets such as defconfig with the same method as shown at: Section 15.1.3.1.1, “Linux kernel defconfig”.

    @@ -9111,7 +9175,7 @@ CONFIG_IKCONFIG_PROC=y
    @@ -9748,7 +9822,7 @@ mount
    -

    The debug highest level is a bit more magic, see: pr_debug for more info.

    +

    The debug highest level is a bit more magic, see: Section 15.4.2, “pr_debug” for more info.

    15.4.1. ignore_loglevel

    @@ -10719,7 +10793,7 @@ Kernel Offset: disabled
    -

    The sleep is done with usleep_range, see: sleep.

    +

    The sleep is done with usleep_range, see: Section 15.10.2, “sleep”.

    Bibliography:

    @@ -13276,7 +13350,7 @@ sleep 4 & sleep 4 &
    -

    Results (boot not excluded): Boot instruction counts for various setups

    +

    Results (boot not excluded) are shown at: Table 1, “Boot instruction counts for various setups”

    @@ -13590,7 +13664,7 @@ detected buffer overflow in strlen
    -

    SELinux requires glibc: libc choice.

    +

    SELinux requires glibc as mentioned at: Section 19.10, “libc choice”.

    @@ -13817,7 +13891,7 @@ sendkey shift-pgdown
    -

    Linux tries to reboot, and QEMU shutdowns due to the -no-reboot option which we set by default for: Exit emulator on panic.

    +

    Linux tries to reboot, and QEMU shuts down due to the -no-reboot option which we set by default, see: Section 15.7.1.3, “Exit emulator on panic”.

    Under the hood, behaviour is controlled by the reboot syscall:

    @@ -14536,7 +14610,7 @@ failed to initialize legacy DRM

    Implements a console for DRM.

    -

    The Linux kernel has a built-in fbdev console: fbcon but not for DRM it seems.

    +

    The Linux kernel has a built-in fbdev console called Linux kernel console fun but not for DRM it seems.

    The upstream project seems dead with last commit in 2014: https://www.freedesktop.org/wiki/Software/kmscon/

    @@ -14640,7 +14714,7 @@ wget \
    -

    STRESS_NG is likely the best, but it requires glibc: libc choice.

    +

    STRESS_NG is likely the best, but it requires glibc, see: Section 19.10, “libc choice”.

    Websites:

    @@ -14755,7 +14829,7 @@ ps
    -

    so as long as we craft the correct DTB and feed it into Xen so that it can see the kernel, it should work. TODO does QEMU support patching the auto-generated DTB with pre-generated options? In the worst case we can just dump it hand hack it up though with -machine dumpdtb: Device tree emulator generation.

    +

    so as long as we craft the correct DTB and feed it into Xen so that it can see the kernel, it should work. TODO does QEMU support patching the auto-generated DTB with pre-generated options? In the worst case we can just dump it and hack it up by hand though with -machine dumpdtb, see: Section 8.4, “Device tree emulator generation”.

    Bibliography:

    @@ -16162,7 +16236,7 @@ IN:

    PANDA can list memory addresses, so I bet it can also decode the instructions: https://github.com/panda-re/panda/blob/883c85fa35f35e84a323ed3d464ff40030f06bd6/panda/docs/LINE_Censorship.md I wonder why they don’t just upstream those things to QEMU’s tracing: https://github.com/panda-re/panda/issues/290

    -

    gem5 can do it: gem5 tracing.

    +

    gem5 can do it as shown at: Section 17.8.6, “gem5 tracing”.

    @@ -16567,7 +16641,7 @@ root

    18. gem5

    -

    Getting started at: gem5 Buildroot setup.

    +

    Getting started at: Section 1.2, “gem5 Buildroot setup”.

    18.1. gem5 vs QEMU

    @@ -16641,13 +16715,13 @@ root

    runs are deterministic by default, unlike QEMU which has a special QEMU record and replay mode, that requires first playing the content once and then replaying

  • -

    gem5 ARM at least appears to implement more low level CPU functionality than QEMU, e.g. QEMU only added EL2 in 2018: https://stackoverflow.com/questions/42824706/qemu-system-aarch64-entering-el1-when-emulating-a53-power-up See also: ARM exception levels

    +

    gem5 ARM at least appears to implement more low level CPU functionality than QEMU, e.g. QEMU only added EL2 in 2018: https://stackoverflow.com/questions/42824706/qemu-system-aarch64-entering-el1-when-emulating-a53-power-up See also: Section 26.8.1, “ARM exception levels”

  • -

    disadvantage of gem5: slower than QEMU, see: Benchmark Linux kernel boot

    +

    disadvantage of gem5: slower than QEMU, see: Section 28.2.1, “Benchmark Linux kernel boot”

    This implies that the user base is much smaller, since no Android devs.

    @@ -16792,7 +16866,7 @@ cat out/gem5-bench-dhrystone.txt
  • -

    but the problem is that this method does not allow to easily run a different script without running the boot again. The ./gem5.sh script works around that by using m5 readfile as explained further at: gem5 checkpoint restore and run a different script.

    +

    but the problem is that this method does not allow to easily run a different script without running the boot again. The ./gem5.sh script works around that by using m5 readfile as explained further at: Section 18.5.2, “gem5 checkpoint restore and run a different script”.

    Now you can play a fun little game with your friends:

    @@ -16873,7 +16947,99 @@ getconf _NPROCESSORS_CONF
    -
    18.2.2.1.1. gem5 arm more than 8 cores
    +
    18.2.2.1.1. Number of cores in QEMU user mode
    +
    +

    TODO: why does QEMU always show the number of cores of the host in User mode simulation? E.g., both of the following output the same as nproc on the host:

    +
    +
    +
    +
    nproc
    +./run --userland userland/cpp/thread_hardware_concurrency.cpp
    +./run --cpus 2 --userland userland/cpp/thread_hardware_concurrency.cpp
    +
    +
    +
    +

    This random page suggests that QEMU spawns one host thread per guest thread, and thus presumably delegates context switching to the host kernel: https://qemu.weilnetz.de/w64/2012/2012-12-04/qemu-tech.html#User-emulation-specific-details

    +
    +
    +

    We can confirm that with:

    +
    +
    +
    +
    ./run --userland userland/posix/pthread_count.c --userland-args 4
    +ps Haux | grep qemu | wc
    +
    +
    +
    +

    Remember QEMU user mode does not show stdout immediately though.

    +
    +
    +

    At 369a47fc6e5c2f4a7f911c1c058b6088f8824463 + 1 QEMU appears to spawn 3 host threads plus one for every new guest thread created. Remember that userland/posix/pthread_count.c spawns N + 1 total threads if you count the main thread.
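As an alternative to grepping the ps output, the host thread count of a single process can also be read from /proc on Linux. The following is a minimal sketch; the helper name `count_threads` is hypothetical and not part of this repository.

```shell
# Hypothetical helper: count the threads of a process from /proc (Linux only).
# Each thread of process <pid> has one entry under /proc/<pid>/task.
count_threads() {
  ls "/proc/$1/task" | wc -l
}

# Example: count the threads of the current shell itself.
count_threads "$$"
```

To apply it to QEMU, pass the PID of the running QEMU process instead of `$$`, for example one found with pgrep.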

    +
    +
    +
    +
    18.2.2.1.2. Number of cores in gem5 user mode
    +
    +

    gem5 user mode multi core has been particularly flaky compared to QEMU’s.

    +
    +
    +

    You have the limitation that you must have at least one core per guest thread, otherwise pthread_create fails. For example:

    +
    +
    +
    +
    ./run --cpus 1 --emulator gem5 --static --userland userland/posix/pthread_self.c --userland-args 1
    +
    +
    +
    +

    fails because that process has a total of 2 threads: one for main and one extra thread spawned, see: userland/posix/pthread_self.c. The error message is:

    +
    +
    +
    +
    pthread_create: Resource temporarily unavailable
    +
    +
    +
    +

    It works however if we add one extra CPU:

    +
    +
    +
    +
    ./run --cpus 2 --emulator gem5 --static --userland userland/posix/pthread_self.c --userland-args 1
    +
    +
    +
    +

    This has to do with the fact that gem5 has a more simplistic thread implementation that does not spawn one host thread per guest thread. Maybe this is required to achieve reproducible runs? What is the task switch algorithm then?

    +
    +
    +

    gem5 threading does however show the expected number of cores, e.g.:

    +
    +
    +
    +
    ./run --cpus 1 --userland userland/cpp/thread_hardware_concurrency.cpp --emulator gem5 --static
    +./run --cpus 2 --userland userland/cpp/thread_hardware_concurrency.cpp --emulator gem5 --static
    +
    +
    +
    +

    outputs 1 and 2 respectively.

    +
    +
    +

    TODO: aarch64 seems to be failing to spawn more than 2 threads at 369a47fc6e5c2f4a7f911c1c058b6088f8824463 + 1:

    +
    +
    +
    +
    ./run --arch aarch64 --cpus 3 --emulator gem5 --static --userland userland/posix/pthread_self.c --userland-args 2
    +
    +
    +
    +

    fails with:

    +
    +
    +
    +
    Exiting @ tick 18446744073709551615 because simulate() limit reached
    +
    +
    +
    +
    +
    18.2.2.1.3. gem5 ARM full system with more than 8 cores

    https://stackoverflow.com/questions/50248067/how-to-run-a-gem5-arm-aarch64-full-system-simulation-with-fs-py-with-more-than-8

    @@ -16913,7 +17079,7 @@ getconf _NPROCESSORS_CONF
    -

    But keep in mind that it only affects benchmark performance of the most detailed CPU types: gem5 cache support in function of CPU type

    +

    But keep in mind that it only affects benchmark performance of the most detailed CPU types as shown at: Table 2, “gem5 cache support in function of CPU type”.

    Table 1. Boot instruction counts for various setups
    @@ -17184,7 +17350,7 @@ m5 dumpstats
    -

    There are not yet enabled, but it should be easy to so, see: Add new Buildroot packages

    +

    These are not yet enabled, but it should be easy to do so, see: Section 19.5, “Add new Buildroot packages”

    18.2.3.1. BST vs heap vs hashmap
    @@ -17277,10 +17443,10 @@ xdg-open bst_vs_heap_vs_hashmap_gem5.tmp.png

    TODO: the gem5 simulation blows up on a tcmalloc allocation somewhere near 25k elements as of 3fdd83c2c58327d9714fa2347c724b78d7c05e2b + 1, likely linked to the extreme inefficiency of the stats collection?

    -

    The cache sizes were chosen to match the host P51 to improve the comparison. Ideally we sould also use the same standard library.

    +

    The cache sizes were chosen to match the host P51 to improve the comparison. Ideally we should also use the same standard library.

    -

    Note that this will take a long time, and will produce a humongous ~40Gb stats file due to: gem5 only dump selected stats

    +

    Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 18.10.2.1, “gem5 only dump selected stats”

    Sources:

    @@ -17571,7 +17737,7 @@ parsecmgmt -a run -p splash2x.fmm -i test
    18.2.3.4.4. PARSEC uninstall
    -

    If you want to remove PARSEC later, Buildroot doesn’t provide an automated package removal mechanism: Remove Buildroot packages, but the following procedure should be satisfactory:

    +

    If you want to remove PARSEC later, Buildroot doesn’t provide an automated package removal mechanism as mentioned at: Section 19.6, “Remove Buildroot packages”, but the following procedure should be satisfactory:

    @@ -17706,13 +17872,13 @@ git clean -xdf .

    When you want to break, just do a Ctrl-C on GDB shell, and then continue.

    -

    And we now see the boot messages, and then get a shell. Now try the ./count.sh procedure described for QEMU: GDB step debug kernel post-boot.

    +

    And we now see the boot messages, and then get a shell. Now try the ./count.sh procedure described for QEMU at: Section 2.2, “GDB step debug kernel post-boot”.

    18.4.2. gem5 GDB step debug userland process

    -

    We are unable to use gdbserver because of networking: gem5 host to guest networking

    +

    We are unable to use gdbserver because of networking as mentioned at: Section 14.3.1.3, “gem5 host to guest networking”

    The alternative is to do as in GDB step debug userland processes.

    @@ -18526,7 +18692,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    because glibc was built to expect a newer Linux kernel: FATAL: kernel too old. Your choices to sole this are:

    +

    because glibc was built to expect a newer Linux kernel as shown at: Section 10.4.1, “FATAL: kernel too old”. Your choices to solve this are:

    -

    18.15. gem5 clang build

    +

    18.15. gem5 build options

    +
    +

    In order to use different build options, you might also want to use gem5 build variants to keep the build outputs separate from one another.

    +
    +
    +

    18.15.1. gem5 debug build

    +
    +

    The gem5.debug executable has optimizations turned off unlike the default gem5.opt, and provides a much better debug experience:

    +
    +
    +
    +
    ./build-gem5 --arch aarch64 --gem5-build-type debug
    +./run --arch aarch64 --debug-vm --emulator gem5 --gem5-build-type debug
    +
    +
    +
    +

    The build outputs are automatically stored in a different directory from other build types such as .opt build, which prevents .debug files from overwriting .opt ones.

    +
    +
    +

    Therefore, --gem5-build-id is not required.

    +
    +
    +

    The price to pay for debuggability is high however: a Linux kernel boot was about 14 times slower than opt at 71e927e63bda6507d5a528f22c78d65099bdf36f between the commands:

    +
    +
    +
    +
    ./run --arch aarch64 --eval 'm5 exit' --emulator gem5 --linux-build-id v4.16
    +./run --arch aarch64 --eval 'm5 exit' --emulator gem5 --linux-build-id v4.16 --gem5-build-type debug
    +
    +
    +
    +

    so you will likely only use this when it is unavoidable. This is also benchmarked at: Section 28.2.1, “Benchmark Linux kernel boot”

    +
    +
    +
    +

    18.15.2. gem5 clang build

    TODO test properly, benchmark vs GCC.

    @@ -18930,6 +19131,69 @@ clock=500
    +
    +

    18.15.3. gem5 sanitation build

    +
    +

    If gem5 appears to have a C++ undefined behaviour bug, which is often very difficult to track down, you can try to build it with the following extra SCons options:

    +
    +
    +
    +
    ./build-gem5 --gem5-build-id san --verbose -- --with-ubsan --without-tcmalloc
    +
    +
    +
    +

    This will make GCC do a lot of extra sanitation checks at compile and run time.

    +
    +
    +

    As a result, the build and runtime will be way slower than normal, but that still might be the fastest way to solve undefined behaviour problems.

    +
    +
    +

    Ideally, we should also be able to run it with asan with --with-asan, but if we try then the build fails at gem5 16eeee5356585441a49d05c78abc328ef09f7ace (with two ubsan trivial fixes I’ll push soon):

    +
    +
    +
    +
    =================================================================
    +==9621==ERROR: LeakSanitizer: detected memory leaks
    +
    +Direct leak of 371712 byte(s) in 107 object(s) allocated from:
    +    #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448)
    +    #1 0x7ff03950d065 in dictresize ../Objects/dictobject.c:643
    +
    +Direct leak of 23728 byte(s) in 26 object(s) allocated from:
    +    #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448)
    +    #1 0x7ff03945e40d in _PyObject_GC_Malloc ../Modules/gcmodule.c:1499
    +    #2 0x7ff03945e40d in _PyObject_GC_Malloc ../Modules/gcmodule.c:1493
    +
    +Direct leak of 2928 byte(s) in 43 object(s) allocated from:
    +    #0 0x7ff03980487e in __interceptor_realloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c87e)
    +    #1 0x7ff03951d763 in list_resize ../Objects/listobject.c:62
    +    #2 0x7ff03951d763 in app1 ../Objects/listobject.c:277
    +    #3 0x7ff03951d763 in PyList_Append ../Objects/listobject.c:289
    +
    +Direct leak of 2002 byte(s) in 3 object(s) allocated from:
    +    #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448)
    +    #1 0x7ff0394fd813 in PyString_FromStringAndSize ../Objects/stringobject.c:88
    +    #2 0x7ff0394fd813 in PyString_FromStringAndSize ../Objects/stringobject.c:57
    +
    +Direct leak of 40 byte(s) in 2 object(s) allocated from:
    +    #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448)
    +    #1 0x7ff03951ea4b in PyList_New ../Objects/listobject.c:152
    +
    +Indirect leak of 10384 byte(s) in 11 object(s) allocated from:
    +    #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448)
    +    #1 0x7ff03945e40d in _PyObject_GC_Malloc ../Modules/gcmodule.c:1499
    +    #2 0x7ff03945e40d in _PyObject_GC_Malloc ../Modules/gcmodule.c:1493
    +
    +Indirect leak of 4089 byte(s) in 6 object(s) allocated from:
    +    #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448)
    +    #1 0x7ff0394fd648 in PyString_FromString ../Objects/stringobject.c:143
    +
    +Indirect leak of 2090 byte(s) in 3 object(s) allocated from:
    +    #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448)
    +    #1 0x7ff0394eb36f in type_new ../Objects/typeobject.c:2421
    +    #2 0x7ff0394eb36f in type_new ../Objects/typeobject.c:2094
    +
    +Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    +    #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448)
    +    #1 0x7ff0394fd813 in PyString_FromStringAndSize ../Objects/stringobject.c:88
    +    #2 0x7ff0394fd813 in PyString_FromStringAndSize ../Objects/stringobject.c:57
    +
    +SUMMARY: AddressSanitizer: 418319 byte(s) leaked in 203 allocation(s).
    +
    +
    +
    +

    From the message, however, this appears to be a Python / pyenv11 bug and not a gem5-specific one. I think it worked when I tried it in the past in an older gem5 / Ubuntu.

    +
    +
    +
    @@ -18949,7 +19213,7 @@ clock=500

    Linux kernel

  • -

    C standard library: Buildroot supports several implementations, see: libc choice

    +

    C standard library: Buildroot supports several implementations, see: Section 19.10, “libc choice”

  • BusyBox: provides the shell and basic command line utilities

    @@ -19032,7 +19296,7 @@ qemu-system-aarch64 -M virt -cpu cortex-a57 -nographic -smp 1 -kernel output/ima

    The clean is necessary because the source files didn’t change, so make would just check the timestamps and not build anything.

  • -

    You will then likely want to make those more permanent with: Default command line arguments

    +

    You will then likely want to make those more permanent as explained at: Section 29.4, “Default command line arguments”.

    19.2.1. Enable Buildroot compiler optimizations

    @@ -19067,7 +19331,7 @@ qemu-system-aarch64 -M virt -cpu cortex-a57 -nographic -smp 1 -kernel output/ima
    @@ -19207,7 +19471,7 @@ make menuconfig

    If none of those methods are flexible enough for you, you can just fork or hack up buildroot_packages/sample_package the sample package to do what you want.

    -

    For how to use that package, see: buildroot_packages directory.

    +

    For how to use that package, see: Section 29.12.2, “buildroot_packages directory”.

    Then iterate trying to do what you want and reading the manual until it works: https://buildroot.org/downloads/manual/manual.html

    @@ -19225,7 +19489,7 @@ make menuconfig

    Also mentioned at: https://stackoverflow.com/questions/47320800/how-to-clean-only-target-in-buildroot

    -

    See this for a sample manual workaround: PARSEC uninstall.

    +

    See this for a sample manual workaround: Section 18.2.3.4.4, “PARSEC uninstall”.

    @@ -19273,7 +19537,7 @@ TODO benchmark: would gem5 suffer a considerable disk read performance hit due t

    libguestfs: https://serverfault.com/questions/246835/convert-directory-to-qemu-kvm-virtual-disk-image/916697#916697, in particular vfs-minimum-size

  • -

    use methods described at: gem5 checkpoint restore and run a different script instead of putting builds on the root filesystem

    +

    use methods described at: Section 18.5.2, “gem5 checkpoint restore and run a different script” instead of putting builds on the root filesystem

  • @@ -19392,7 +19656,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -

    Then, you will also want to do a Bisection to pinpoint the exact commit to blame, and CC that developer.

    -

    Finally, give the images you used save upstream developpers time: release-zip.

    +

    Finally, give the images you used to save upstream developers' time, as shown at: Section 29.17.2, “release-zip”.

    For Buildroot problems, you should either provide the config you have:

    @@ -19451,7 +19715,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    -

    One "downside" of glibc is that it exercises much more kernel functionality on its more bloated pre-main init, which breaks user mode C hello worlds more often, see: User mode simulation with glibc. I quote "downside" because glibc is actually exposing emulator bugs which we should actually go and fix.

    +

    One "downside" of glibc is that it exercises much more kernel functionality on its more bloated pre-main init, which breaks user mode C hello worlds more often, see: Section 10.4, “User mode simulation with glibc”. I quote "downside" because glibc is actually exposing emulator bugs which we should actually go and fix.

    @@ -19463,16 +19727,16 @@ git -C "$(./getvar qemu_source_dir)" checkout -

    This section contains userland content, such as C, C++ and POSIX examples.

    -

    Getting started at: Userland setup

    +

    Getting started at: Section 1.6, “Userland setup”

    -

    Userland assembly content is located at: Userland assembly. It was split from this section basically becase we were hitting the HTML h6 limit, stupid web :-)

    +

    Userland assembly content is located at: Section 21, “Userland assembly”. It was split from this section basically because we were hitting the HTML h6 limit, stupid web :-)

    This content makes up the bulk of the userland/ directory.

    -

    The quickest way to run the arch agnostic examples, which comprise the majority of the examples, is natively with: Userland setup getting started natively

    +

    The quickest way to run the arch agnostic examples, which comprise the majority of the examples, is natively as shown at: Section 1.6.2.1, “Userland setup getting started natively”

    This section was originally moved in here from: https://github.com/cirosantilli/cpp-cheat

    @@ -19654,7 +19918,20 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    -

    20.2.2. C++ standards

    +

    20.2.2. C++ standards

    Like for C, you have to pay for the standards…​ insane. So we just use the closest free drafts instead.

    @@ -19688,7 +19965,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -

    Programs under userland/posix/ are examples of POSIX C programming.

    -

    What is POSIX:

    +

    These links provide a clear overview of what POSIX is:

    -

    20.3.1. sysconf

    +

    20.3.1. unistd.h

    +
    + +
    +
    +
    +

    20.3.2. pthreads

    +
    +

    POSIX' multithreading API. This was for a looong time the only "portable" multithreading alternative, until C++11 finally added threads, thus also extending the portability to Windows.

    +
    + +
    +
    +

    20.3.3. sysconf

    @@ -19742,9 +20048,23 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    +
    @@ -19788,7 +20124,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    -

    Like other userland programs, these programs can be run as explained at: Userland setup.

    +

    Like other userland programs, these programs can be run as explained at: Section 1.6, “Userland setup”.

    As a quick reminder, the fastest setups to get started are:

    @@ -19804,7 +20140,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
    -

    However, as usual, it is saner to build your toolchain as explained at: QEMU user mode getting started.

    +

    However, as usual, it is saner to build your toolchain as explained at: Section 10.1, “QEMU user mode getting started”.

    The first examples you should look into are:

    @@ -19857,7 +20193,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
  • -

    registers: Assembly registers

    +

    registers, see: Section 21.1, “Assembly registers”

  • jumping:

    @@ -20007,7 +20343,7 @@ error: asm_main returned 1 at line 8
    -

    In particular, most userland assembly examples link to the C standard library: Userland assembly C standard library.

    +

    In particular, most userland assembly examples link to the C standard library, see: Section 21.5, “Userland assembly C standard library”.

    Userland assembly is generally simpler, and a pre-requisite for Baremetal setup.

    -

    System-land assembly cheats will be put under: Baremetal setup.

    +

    System-land assembly cheats will be put under: Section 1.7, “Baremetal setup”.

    @@ -20363,7 +20699,7 @@ When instructions do not interpret this operand encoding as the zero register, u

    Unlike most our other assembly examples, which use the C standard library for portability, examples under freestanding/ directories don’t link to the C standard library.

    -

    As a result, those examples cannot do IO portably, and so they make raw system calls and only be run on one given OS, e.g. Linux: Linux system calls.

    +

    As a result, those examples cannot do IO portably, and so they make raw system calls and can only be run on one given OS, e.g. Linux system calls.

    Such executables are called freestanding because they don’t execute the glibc initialization code, but rather start directly on our custom hand written assembly.

    @@ -20467,7 +20803,7 @@ When instructions do not interpret this operand encoding as the zero register, u

    In arm, it is the only way to achieve this effect: https://stackoverflow.com/questions/10831792/how-to-use-specific-register-in-arm-inline-assembler

    -

    This feature notably useful for making system calls from C, see: Linux system calls.

    +

    This feature is notably useful for making system calls from C, see: Section 21.7, “Linux system calls”.

    Documentation: https://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Explicit-Reg-Vars.html

    @@ -20780,7 +21116,7 @@ zmmintrin.h AVX512
  • Table 2. gem5 cache support in function of CPU type
    @@ -21004,7 +21340,7 @@ zmmintrin.h AVX512
    -

    Conclusion: Summary of GNU GAS assembler data sizes

    +

    The results are shown at: Table 4, “Summary of GNU GAS assembler data sizes”.

    Table 3. Summary of Linux calling conventions for several architectures
    @@ -21125,7 +21461,7 @@ zmmintrin.h AVX512
  • -

    cannot have implicit destination with shift, see: ARM shift suffixes

    +

    cannot have implicit destination with shift, see: Section 23.4.4.1, “ARM shift suffixes”

  • @@ -21173,7 +21509,7 @@ zmmintrin.h AVX512

    x86: NOP

  • -

    ARM: ARM NOP instruction

    +

    ARM: Section 23.5.1, “ARM NOP instruction”

  • @@ -21193,7 +21529,7 @@ zmmintrin.h AVX512

    22. x86 userland assembly

    -

    Arch agnostic infrastructure getting started at: Userland assembly.

    +

    Arch agnostic infrastructure getting started at: Section 21, “Userland assembly”.

    22.1. x86 registers

    @@ -21470,7 +21806,7 @@ add $8, %rsp
    -

    GNU GAS accepts both syntaxes: CQTO and CLTQ family Intel vs AT&T

    +

    GNU GAS accepts both syntaxes, see: Table 5, “CQTO and CLTQ family Intel vs AT&T”.

    Table 4. Summary of GNU GAS assembler data sizes
    @@ -21584,7 +21920,7 @@ add $8, %rsp

    This is partly why the ternary ? C operator exists: https://stackoverflow.com/questions/3565368/ternary-operator-vs-if-else

    -

    It is interesting to compare this with ARMv7 conditional executaion: which is available for all instructions: ARM conditional execution

    +

    It is interesting to compare this with ARMv7 conditional execution, which is available for all instructions, as shown at: Section 23.2.5, “ARM conditional execution”.

    @@ -22040,7 +22376,7 @@ pop %rbp

    Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.13 "Miscellaneous Instructions"

    -

    NOP: NOP instructions

    +

    NOP: Section 21.10, “NOP instructions”

    @@ -22272,7 +22608,7 @@ pop %rbp

    22.12. x86 SIMD

    -

    Parent section: SIMD assembly

    +

    Parent section: Section 21.3, “SIMD assembly”

    History:

    @@ -22337,7 +22673,7 @@ pop %rbp
    @@ -22367,7 +22703,7 @@ pop %rbp

    userland/arch/x86_64/paddq.S: PADDQ, PADDL, PADDW, PADDB

    -

    Good first instruction to learn SIMD: SIMD assembly

    +

    Good first instruction to learn SIMD assembly.

    @@ -22616,7 +22952,7 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1

    23. ARM userland assembly

    -

    Arch general getting started at: Userland assembly.

    +

    Arch general getting started at: Section 21, “Userland assembly”.

    Instructions here are loosely grouped based on the grouping of the ARMv7 architecture reference manual Chapter A4 "The Instruction Sets".

    @@ -22740,10 +23076,10 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1
    @@ -23136,7 +23472,7 @@ Bibliography:
    23.2.3.1. ARM BX instruction
    @@ -23206,7 +23542,17 @@ Bibliography:

    23.3. ARM load and store instructions

    -

    In ARM, there are only two instruction families that do memory access: ARM LDR instruction to load and ARM STR instruction to store.

    +

    In ARM, there are only two instruction families that do memory access:

    +
    +
    +

    Everything else works on registers and immediates.

    @@ -23647,7 +23993,7 @@ ldmia sp!, reglist

    Move an immediate to a register, or a register to another register.

    -

    Cannot load from or to memory, since only the LDR and STR instruction families can do that in ARM: ARM load and store instructions

    +

    Cannot load from or to memory, since only the LDR and STR instruction families can do that in ARM as mentioned at: Section 23.3, “ARM load and store instructions”.

    @@ -23827,7 +24173,7 @@ ldmia sp!, reglist

    23.5.1. ARM NOP instruction

    -

    Parent section: NOP instructions

    +

    Parent section: Section 21.10, “NOP instructions”

    There are a few different ways to encode NOP, notably MOV a register into itself, and a dedicated miscellaneous instruction.

    @@ -23870,7 +24216,7 @@ ldmia sp!, reglist

    23.6. ARM SIMD

    -

    Parent section: SIMD assembly

    +

    Parent section: Section 21.3, “SIMD assembly”

    23.6.1. ARM VFP

    @@ -23959,10 +24305,10 @@ ldmia sp!, reglist @@ -24043,7 +24389,7 @@ ldmia sp!, reglist

    The feature is often referred to simply as "SIMD&FP" throughout the manual.

    -

    The Linux kernel shows /proc/cpuinfo compatibility as neon, which is yet another intermediate name that came up at some point: ARM NEON

    +

    The Linux kernel shows /proc/cpuinfo compatibility as neon, which is yet another intermediate name that came up at some point, see: Section 23.6.2.2, “ARM NEON”.

    Vs ARM VFP: https://stackoverflow.com/questions/4097034/arm-cortex-a8-whats-the-difference-between-vfp-and-neon

    @@ -24155,7 +24501,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    userland/arch/aarch64/add_vector.S

    -

    Good first instruction to learn SIMD: SIMD assembly

    +

    Good first instruction to learn SIMD: SIMD assembly.

    @@ -24163,17 +24509,17 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    23.6.3.2.1. ARM FADD vs VADD
    -

    It is very confusing, but FADDS and FADDD in Aarch32 are pre-UAL for vadd.f32 and vadd.f64 which we use in this tutorial: ARM VADD instruction

    +

    It is very confusing, but FADDS and FADDD in Aarch32 are pre-UAL for vadd.f32 and vadd.f64 which we use in this tutorial, see: Section 23.6.1.2, “ARM VADD instruction”

    The same goes for most ARMv7 mnemonics: f* is old, and v* is the newer better syntax.

    @@ -24185,7 +24531,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    Also keep in mind that fused multiply add is FMADD.

    -

    Examples at: SIMD assembly

    +

    Examples at: Section 21.3, “SIMD assembly”

    @@ -24541,12 +24887,12 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    26. Baremetal

    -

    Getting started at: Baremetal setup

    +

    Getting started at: Section 1.7, “Baremetal setup”

    26.1. Baremetal GDB step debug

    -

    GDB step debug works on baremetal exactly as it does on the Linux kernel: GDB step debug.

    +

    GDB step debug works on baremetal exactly as it does on the Linux kernel, which is described at: Section 2, “GDB step debug”.

    Except that it is even cooler here since we can easily control and understand every single instruction that is being run!

    @@ -24645,7 +24991,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.

    the stack pointer

  • -

    NEON: aarch64 baremetal NEON setup

    +

    NEON: Section 26.9.2, “aarch64 baremetal NEON setup”

  • TODO: we don’t do this currently but maybe we should setup BSS

    @@ -25316,7 +25662,7 @@ IN:

    A good representation of the format of the vector table can also be found at Programmer’s Guide for ARMv8-A Table 10-2 "Vector table offsets from vector table base address".

  • -

    The first part of the table contains: Summary of ARMv8 vector handlers

    +

    The first part of the table contains: Table 6, “Summary of ARMv8 vector handlers”.

    Table 5. CQTO and CLTQ family Intel vs AT&T
    @@ -25414,7 +25760,7 @@ IN:

    Exception Syndrome Register.

    -

    See example at: ARM SVC instruction

    +

    See example at: Section 26.8.2, “ARM SVC instruction”

    Documentation: ARMv8 architecture reference manual db D12.2.36 "ESR_EL1, Exception Syndrome Register (EL1)".

    @@ -25426,7 +25772,7 @@ IN:

    Exception Link Register.

    -

    See example at: ARM SVC instruction

    +

    See the example at: Section 26.8.2, “ARM SVC instruction”

    @@ -25498,7 +25844,7 @@ IN:

    since gem5 is able to detect when nothing will ever happen, and exits.

    -

    When GDB step debugging, switch between cores with the usual thread commands, see also: GDB step debug multicore userland.

    +

    When GDB step debugging, switch between cores with the usual thread commands, see also: Section 2.9, “GDB step debug multicore userland”.

    Bibliography: https://stackoverflow.com/questions/980999/what-does-multicore-assembly-language-look-like/33651438#33651438

    @@ -25708,7 +26054,7 @@ IN:

    26.8.6. ARM baremetal bibliography

    -

    First, also consider the userland bibliography: ARM assembly bibliography.

    +

    First, also consider the userland bibliography: Section 23.8, “ARM assembly bibliography”.

    The most useful ARM baremetal example sets we’ve seen so far are:

    @@ -26012,7 +26358,7 @@ ISB

    In baremetal, we detect if tests failed by parsing logs for the Magic failure string.

    -

    See: Test this repo for more useful testing tips.

    +

    See: Section 29.13, “Test this repo” for more useful testing tips.

    @@ -26336,7 +26682,7 @@ date >/system/a

    27.3. Android init

    -

    For Linux in general, see: init.

    +

    For Linux in general, see: Section 6, “init”.

    The /init executable interprets the /init.rc files, which are written in a custom Android init system language: https://android.googlesource.com/platform/system/core/+/ee0e63f71d90537bb0570e77aa8a699cc222cfaf/init/README.md

    @@ -27009,51 +27355,9 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    -

    29.5. Build the documentation

    +

    29.5. Documentation

    -

    You don’t need to depend on GitHub.

    -
    -
    -

    For a quick and dirty build, install Asciidoctor however you like and build:

    -
    -
    -
    -
    asciidotor README.adoc
    -xdg-open README.html
    -
    -
    -
    -

    For development, you will want to do a more controlled build with extra error checking as follows.

    -
    -
    -

    For the initial build do:

    -
    -
    -
    -
    ./build --download-dependencies docs
    -
    -
    -
    -

    which also downloads build dependencies.

    -
    -
    -

    Then the following times just to the faster:

    -
    -
    -
    -
    ./build-doc
    -
    -
    -
    -

    Source: build-doc

    -
    -
    -

    The HTML output is located at:

    -
    -
    -
    -
    xdg-open out/README.html
    -
    +

    To learn how to build the documentation see: Section 1.8, “Build the documentation”.

    29.5.1. Documentation verification

    @@ -27460,7 +27764,7 @@ less "$(./getvar --arch aarch64 --emulator gem5 --run-id 1 termout_file)"
    -

    To run multiple gem5 checkouts, see: gem5 worktree.

    +

    To run multiple gem5 checkouts, see: Section 29.11.3.1, “gem5 worktree”.

    Implementation note: we create multiple namespaces for two things:

    @@ -27543,7 +27847,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    To run both kernels simultaneously, one on each QEMU instance, see: Simultaneous runs.

    +

    To run both kernels simultaneously, one on each QEMU instance, see: Section 29.10, “Simultaneous runs”.

    @@ -27689,36 +27993,6 @@ gem5_internal="$(pwd)/gem5-internal"

    With this setup, both your private gem5 source and build are safely kept outside of this public repository.

    -
    -
    29.11.3.3. gem5 debug build
    -
    -

    The gem5.debug executable has optimizations turned off unlike the default gem5.opt, and provides a much better debug experience:

    -
    -
    -
    -
    ./build-gem5 --arch aarch64 --gem5-build-type debug
    -./run --arch aarch64 --debug-vm --emulator gem5 --gem5-build-type debug
    -
    -
    -
    -

    The build outputs are automatically stored in a different directory from other build types such as .opt build, which prevents .debug files from overwriting .opt ones.

    -
    -
    -

    Therefore, --gem5-build-id is not required.

    -
    -
    -

    The price to pay for debuggability is high however: a Linux kernel boot was about 14 times slower than opt at 71e927e63bda6507d5a528f22c78d65099bdf36f between the commands:

    -
    -
    -
    -
    ./run --arch aarch64 --eval 'm5 exit' --emulator gem5 --linux-build-id v4.16
    -./run --arch aarch64 --eval 'm5 exit' --emulator gem5 --linux-build-id v4.16 --gem5-build-type debug
    -
    -
    -
    -

    so you will likely only use this when it is unavoidable.

    -
    -

    29.11.4. Buildroot build variants

    @@ -27968,7 +28242,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -

    Source: copy-overlay

    -

    Build Buildroot is required for the same reason as described at: Your first kernel module hack.

    +

    Build Buildroot is required for the same reason as described at: Section 1.1.2.2, “Your first kernel module hack”.

    However, since the rootfs_overlay directory does not require compilation, unlike say kernel modules, we also make it 9P available to the guest directly even without ./copy-overlay at:

    @@ -28204,7 +28478,7 @@ echo $?

    Failure is detected by looking for the Magic failure string

    -

    Most userland programs that don’t rely on kernel modules can also be tested in user mode simulation as explained at: User mode tests.

    +

    Most userland programs that don’t rely on kernel modules can also be tested in user mode simulation as explained at: Section 10.2, “User mode tests”.

    @@ -28297,7 +28571,7 @@ echo $?

    gem5: m5 fail works on all archs

  • -

    user mode: QEMU forwards exit status, gem5 we do some log parsing: gem5 syscall emulation exit status

    +

    user mode: QEMU forwards exit status, for gem5 we do some log parsing as described at: Section 10.6.1, “gem5 syscall emulation exit status”

  • @@ -28458,7 +28732,7 @@ echo $?

    When updating the Linux kernel, QEMU and gem5, things sometimes break.

    -

    However, for many types of crashes, it is trivial to bisect down to the offending commit, in particular because we can make QEMU and gem5 exit with status 1 on kernel panic: Exit emulator on panic.

    +

    However, for many types of crashes, it is trivial to bisect down to the offending commit, in particular because we can make QEMU and gem5 exit with status 1 on kernel panic as mentioned at: Section 15.7.1.3, “Exit emulator on panic”.

    For example, when updating from QEMU v2.12.0 to v3.0.0-rc3, the Linux kernel boot started to panic for arm.

    @@ -28595,7 +28869,7 @@ git commit -m "linux: update to ${next_mainline_revision}"
    -

    The ./build-test command builds a superset of what will be downloaded which also tests other things we would like to be working on the release. For the minimal build to generate the files to be uploaded, see: release-zip

    +

    The ./build-test command builds a superset of what will be downloaded which also tests other things we would like to be working on the release. For the minimal build to generate the files to be uploaded, see: Section 29.17.2, “release-zip”

    The clean build is necessary as it generates clean images since it is not possible to remove Buildroot packages

    @@ -28841,7 +29115,7 @@ git push --follow-tags

    compatibility: how likely is it that all the components will work well together: emulator, compiler, kernel, standard library, …​

  • -

    guest software availability: how wide is your choice of easily installed guest software packages? See also: Linux distro choice

    +

    guest software availability: how wide is your choice of easily installed guest software packages? See also: Section 29.18.4, “Linux distro choice”

  • @@ -28888,13 +29162,13 @@ git push --follow-tags
    -

    In order to learn how to measure some of those aspects, see: Benchmark this repo

    +

    In order to learn how to measure some of those aspects, see: Section 28, “Benchmark this repo”.

    29.18.4. Linux distro choice

    -

    We haven’t found the ultimate distro yet, here is a summary table of trade-offs that we care about: Comparison of Linux distros for usage in this repository

    +

    We haven’t found the ultimate distro yet, here is a summary table of trade-offs that we care about: Table 7, “Comparison of Linux distros for usage in this repository”.

    Table 6. Summary of ARMv8 vector handlers
    Table 7. Comparison of Linux distros for usage in this repository