diff --git a/index.html b/index.html
index 3204859..cf410a7 100644
--- a/index.html
+++ b/index.html
@@ -461,1333 +461,1339 @@ pre{ white-space:pre }
The perfect emulation setup to study and develop the Linux kernel v5.9.2, kernel modules, QEMU, gem5 and x86_64, ARMv7 and ARMv8 userland and baremetal assembly, ANSI C, C++ and POSIX. GDB step debug and KGDB just work. Powered by Buildroot and crosstool-NG. Highly automated. Thoroughly documented. Automated tests. "Tested" on an Ubuntu 20.04 host.
TL;DR: Section 2.2.1, “QEMU Buildroot setup getting started”
The source code for this page is located at: https://github.com/cirosantilli/linux-kernel-module-cheat. Due to a GitHub limitation, this README is too long and not fully rendered on github.com, so either use: https://cirosantilli.com/linux-kernel-module-cheat or build the docs yourself.
--china
The most important functionality of this repository is the --china option, sample usage:
./setup
./run --china > index.html
firefox index.html
See also: https://cirosantilli.com/china-dictatorship/#mirrors
The secondary systems programming functionality is described in the sections below, starting from Getting started.
Each child section describes a different possible setup for this repo.
If you don’t know which one to go for, start with QEMU Buildroot setup getting started.
Design goals of this project are documented at: Section 38.20.1, “Design goals”.
Being the hardcore person who fully understands an important complex system such as a computer has a nice ring to it, doesn’t it?
This setup has been mostly tested on Ubuntu. For other host operating systems see: Section 38.1, “Supported hosts”. For greater stability, consider using the latest release instead of master: https://github.com/cirosantilli/linux-kernel-module-cheat/releases
Reserve 12 GB of disk and run:
You don’t need to clone recursively even though we have Git submodules: download-dependencies fetches just the submodules that you need for this build, to save time.
If something goes wrong, see: Section 38.2, “Common build issues” and use our issue tracker: https://github.com/cirosantilli/linux-kernel-module-cheat/issues
The initial build will take a while (30 minutes to 2 hours) to clone and build, see Benchmark builds for more details.
See also: Section 14.1.1, “Quit QEMU from text mode”.
All available modules can be found in the kernel_modules directory.
To avoid typing --arch aarch64 many times, you can set the default arch as explained at: Section 38.4, “Default command line arguments”
I now urge you to read the following sections which contain widely applicable information:
Besides a seamless initial build, this project also aims to make it effortless to modify and rebuild several major components of the system, to serve as an awesome development setup.
Let’s hack up the Linux kernel entry point, which is an easy place to start.
See also: Dry run to get commands for your project.
When you reach difficulties, QEMU makes it possible to easily GDB step debug the Linux kernel source code, see: Section 3, “GDB step debug”.
Edit kernel_modules/hello.c to contain:
All of this put together makes the safe procedure acceptably fast for regular development as well.
It is also easy to GDB step debug kernel modules with our setup, see: Section 3.4, “GDB step debug kernel module”.
We use glibc as our default libc now, and it is tracked as an unmodified submodule at submodules/glibc, at the exact same version that Buildroot uses, which is specified in package/glibc/glibc.mk. Buildroot 2018.05 applies no patches.
Have you ever felt that a single inc instruction was not enough? Really? Me too!
OK, now time to hack GCC.
What QEMU and Buildroot are:
One of the major features of this repository is that we try to support the --dry-run option really well for all scripts.
This setup is like the QEMU Buildroot setup, but it uses gem5 instead of QEMU as a system simulator.
and can therefore be used to estimate system performance, see: Section 24.2, “gem5 run benchmark” for an example.
The downside of gem5 is that it is much slower than QEMU because of its greater simulation detail.
For the most part, if you just add the --emulator gem5 option or the *-gem5 suffix to all commands, everything should magically work.
See also: Section 3.3.1, “tmux gem5”.
At the end of boot, it might not be very clear that you have the shell since some printk messages may appear in front of the prompt like this:
More gem5 information is present at: Section 24, “gem5”
Good next steps are:
This repository has been tested inside clean Docker containers.
This is a good option if you are on a Linux host, but the native setup failed due to your weird host distribution, and you have better things to do with your life than to debug it. See also: Section 38.1, “Supported hosts”.
For example, to do a QEMU Buildroot setup inside Docker, run:
sudo apt-get install docker
./run-docker create && \
./run-docker sh -- ./build --download-dependencies qemu-buildroot
./run-docker
./run-docker: open a shell on the container.
If it has not been started previously, start it. This can also be done explicitly with:
./run-docker
This setup uses prebuilt binaries that we upload to GitHub from time to time.
can’t GDB step debug the kernel, since the source and cross toolchain with GDB are not available. Buildroot cannot easily use a host toolchain: Section 35.2.3.1.1, “Buildroot use prebuilt host toolchain”.
Maybe we could work around this by just downloading the kernel source somehow, and using a host prebuilt GDB, but we felt that it would be too messy and unreliable.
Check out the latest tag and use the Ubuntu packaged QEMU to boot Linux:
THIS IS DANGEROUS (AND FUN), YOU HAVE BEEN WARNED
Minimal host build system example:
In order to test the kernel and emulators, userland content in the form of executables and scripts is of course required, and we store it mostly under:
There are several ways to run our Userland content, notably:
natively on the host as shown at: Section 2.8.2.1, “Userland setup getting started natively”
Can only run examples compatible with your host CPU architecture and OS, but has the fastest setup and runtimes.
the host prebuilt toolchain: Section 2.8.2.2, “Userland setup getting started with prebuilt toolchain and QEMU user mode”
the Buildroot toolchain you built yourself: Section 11.1, “QEMU user mode getting started”
from full system simulation as shown at: Section 2.2.1, “QEMU Buildroot setup getting started”.
This is the most reproducible and controlled environment, and all examples work there, but it is also the slowest one to set up.
With this setup, we will use the host toolchain and execute executables directly on the host.
So you can use any option supported by build-userland script freely with build-userland-in-tree and build.
The situation is analogous for userland/test, test-executables-in-tree and test-executables, which are further documented at: Section 11.2, “User mode tests”.
Do a cleaner out-of-tree build instead and run the program:
as shown at: Section 23.8, “Debug the emulator”, although direct GDB host usage works as well of course.
If you are too lazy to build the Buildroot toolchain and QEMU, but want to run e.g. ARM Userland assembly in User mode simulation, you can get away on Ubuntu 18.04 with just:
--gcc-which host: use the host toolchain.
We must pass this to ./run as well because QEMU must know which dynamic libraries to use. See also: Section 11.5, “User mode static executables”.
This presents the usual trade-offs of using prebuilts as mentioned at: Section 2.6, “Prebuilt setup”.
Other functionality is analogous, e.g. testing:
First ensure that QEMU Buildroot setup is working.
After doing that setup, you can already execute your userland programs from inside QEMU: the only missing step is how to rebuild executables and run them.
And the answer is exactly analogous to what is shown at: Section 2.2.2.2, “Your first kernel module hack”
For example, if we modify userland/c/hello.c to print out something different, we can just rebuild it with:
This setup does not use the Linux kernel nor Buildroot at all: it just runs your very own minimal OS.
Every .c file inside baremetal/ and .S file inside baremetal/arch/<arch>/ generates a separate baremetal image.
TODO: the carriage returns are a bit different than in QEMU, see: Section 33.6, “gem5 baremetal carriage return”.
Note that ./build-baremetal requires the --emulator gem5 option, and generates separate executable images for both, as can be seen from:
See also: Section 24.18, “gem5 ARM platforms”.
This generates yet new separate images with new magic constants:
But just stick to newer and better VExpress_GEM5_V1 unless you have a good reason to use RealViewPBX.
When doing baremetal programming, it is likely that you will want to learn userland assembly first, see: Section 28, “Userland assembly”.
For more information on baremetal, see the section: Section 33, “Baremetal”.
The following subjects are particularly important:
You don’t need to depend on GitHub.
More information about our documentation internals can be found at: Section 38.5, “Documentation”
--gdb-wait makes QEMU and gem5 wait for a GDB connection, otherwise we could accidentally go past the point we want to break at:
Just don’t forget to pass --arch to ./run-gdb, e.g.:
Let’s observe the kernel write system call as it reacts to some userland actions.
tmux just makes things even more fun by allowing us to see both the terminal for:
If you are using gem5 instead of QEMU, --tmux has a different effect by default: it opens the gem5 terminal instead of the debugger:
TODO: why does break work_func for insmod kthread.ko not work very well? Sometimes it breaks, sometimes it does not.
TODO on arm 51e31cdc2933a774c2a0dc62664ad8acec1d2dbe it does not always work, and lx-symbols fails with the message:
TODO find a more convenient method. We have working methods, but they are not ideal.
This is the best method we’ve found so far.
This works, but is a bit annoying.
TODO not working. This could be potentially very convenient.
This is another possibility: we could modify the module source by adding a trap instruction of some kind.
Useless, but a good way to show how hardcore you are. Disable lx-symbols with:
TODO successfully debug the very first instruction that the Linux kernel runs, before start_kernel!
and no, I do have the symbols from arch/arm/boot/compressed/vmlinux, but the breakpoints still don’t work.
v4.19 also added a CONFIG_HAVE_KERNEL_UNCOMPRESSED=y option for having the kernel uncompressed which could make following the startup easier, but it is only available on s390. aarch64 however is already uncompressed by default, so might be the easiest one. See also: Section 17.20.1, “vmlinux vs bzImage vs zImage vs Image”.
You then need the associated KERNEL_UNCOMPRESSED to enable it if available:
In gem5 aarch64 Linux v4.18, experimentally the entry point of secondary CPUs seems to be secondary_holding_pen as shown at https://gist.github.com/cirosantilli2/34a7bc450fcb6c1c1a910369be1fdd90
start_kernel is basically the first C function to be executed: https://stackoverflow.com/questions/18266063/does-kernel-have-main-function/33422401#33422401
When booting Linux on a slow emulator like gem5, what you observe is that:
QEMU’s -gdb GDB breakpoints are set on virtual addresses, so you can in theory debug userland processes as well.
the emulator does not support host to guest networking. This seems to be the case for gem5 as explained at: Section 15.3.1.3, “gem5 host to guest networking”
cannot see the start of the init process easily
the kernel might switch context to another process or to the kernel itself e.g. on a system call, and then TODO confirm the PIC would go to weird places and source code would be missing.
Solutions to this are being researched at: Section 3.10.1, “lx-ps”.
This is the userland debug setup most likely to work, since at init time there is only one userland executable running.
BusyBox custom init process:
Non-init process:
TODO: if I try GDB step debug userland non-init without --gdb-wait and the break main that we do inside ./run-gdb says:
GDB can call functions as explained at: https://stackoverflow.com/questions/1354731/how-to-evaluate-functions-in-gdb
info all-registers shows some of them.
For a more minimal baremetal multicore setup, see: Section 33.10.3, “ARM baremetal multicore”.
We can set and get which cores the Linux kernel allows a program to run on with sched_getaffinity and sched_setaffinity:
The number of cores is modified as explained at: Section 24.3.1, “Number of cores”
taskset from the util-linux package sets the initial core affinity of a program:
We source the Linux kernel GDB scripts by default for lx-symbols, but they also contain some other goodies worth looking into.
https://stackoverflow.com/questions/54133479/accessing-logical-software-thread-id-in-gem5 on ARM the kernel can store an indication of PID in the CONTEXTIDR_EL1 register, making that much easier to observe from simulators.
For when it breaks again, or you want to add a new feature!
See also: https://stackoverflow.com/questions/13496389/gdb-remote-protocol-how-to-analyse-packets
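When reading such packet dumps, it helps to know the framing. Per the GDB Remote Serial Protocol documentation, a packet is $<data>#<checksum>, where the checksum is the modulo-256 sum of the data bytes written as two hex digits. For example, the g packet, which asks the stub for all general registers, goes over the wire as $g#67. The following is a minimal C sketch of framing and checking packets. The helper names are our own, and it deliberately ignores the protocol's escaping and run-length encoding:

```c
#include <stdio.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if `c` is not one. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Modulo-256 sum of the payload bytes. Escaping and run-length
 * encoding are NOT handled. */
unsigned rsp_checksum(const char *data, size_t len) {
    unsigned sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)data[i];
    return sum & 0xffu;
}

/* Frame `data` as "$<data>#<two hex digits>" into `out`. Returns the
 * framed length, or -1 if `out` is too small. */
int rsp_frame(const char *data, char *out, size_t out_size) {
    int n = snprintf(out, out_size, "$%s#%02x", data,
                     rsp_checksum(data, strlen(data)));
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}

/* Return 1 if `packet` is "$<data>#xx" with a matching checksum, else 0. */
int rsp_verify(const char *packet) {
    size_t len = strlen(packet);
    if (len < 4 || packet[0] != '$' || packet[len - 3] != '#')
        return 0;
    int hi = hex_value(packet[len - 2]);
    int lo = hex_value(packet[len - 1]);
    if (hi < 0 || lo < 0)
        return 0;
    return rsp_checksum(packet + 1, len - 4) == (unsigned)(hi * 16 + lo);
}
```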
This error means that the GDB server, e.g. in QEMU, sent more registers than the GDB client expected.
KGDB is kernel dark magic that allows you to GDB the kernel on real hardware without any extra hardware support.
KGDB expects the connection at ttyS1, our second serial port after ttyS0 which contains the terminal.
The last line is the KDB prompt, and is covered at: Section 4.3, “KDB”. Typing now shows nothing because that prompt is expecting input from ttyS1.
Instead, we connect to the serial port ttyS1 with GDB:
TODO: we would need a second serial for KGDB to work, but it is not currently supported on arm and aarch64 with -M virt that we use: https://unix.stackexchange.com/questions/479085/can-qemu-m-virt-on-arm-aarch64-have-multiple-serial-ttys-like-such-as-pl011-t/479340#479340
Just works as you would expect:
KDB is a way to use the kernel debugger directly in your main console, without GDB.
The other KDB commands allow you to step instructions, view memory, registers and some higher level kernel runtime data similar to the superior GDB Python scripts.
You can also use KDB directly from the graphic window with:
TODO: neither arm nor aarch64 is working as of 1cd1e58b023791606498ca509256cc48e95e4f5b + 1.
Step debug userland processes to understand how they are talking to the kernel.
Bibliography: https://reverseengineering.stackexchange.com/questions/8829/cross-debugging-for-arm-mips-elf-with-qemu-toolchain/16214#16214
Analogous to GDB step debug userland processes:
Our setup gives you the rare opportunity to step debug libc and other system libraries.
TODO: try to step debug the dynamic loader. Would be even easier if starti is available: https://stackoverflow.com/questions/10483544/stopping-at-the-first-machine-code-instruction-in-gdb
The portability of the kernel and toolchains is amazing: change an option and most things magically work on completely different hardware.
Known quirks of the supported architectures are documented in this section.
This example illustrates how reading from the x86 control registers with mov crX, rax can only be done from kernel land on ring0.
TODO Can you run arm executables in the aarch64 guest? https://stackoverflow.com/questions/22460589/armv8-running-legacy-32-bit-applications-on-64-bit-os/51466709#51466709
It should not be too hard to port this repository to any architecture that Buildroot supports. Pull requests are welcome.
When the Linux kernel finishes booting, it runs an executable as the first and only userland process. This executable is called the init program.
The init program can be either an executable shell text file, or a compiled ELF file. It becomes easy to accept this once you see that the exec system call handles both cases equally: https://unix.stackexchange.com/questions/174062/can-the-init-process-be-a-shell-script-in-linux/395375#395375
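In practice, the kernel decides how to execute a file by looking at its first bytes. A file starting with #! goes to the script handler (binfmt_script), which runs the named interpreter instead. A file starting with the ELF magic 0x7f 'E' 'L' 'F' goes to the ELF loader (binfmt_elf). The userland sketch below mimics only that first check. The enum and function names are made up for illustration, and the real kernel also tries any other registered handlers, such as binfmt_misc:

```c
#include <stddef.h>
#include <string.h>

enum exec_kind { EXEC_UNKNOWN, EXEC_SCRIPT, EXEC_ELF };

/* Classify the first `len` bytes of a file the way the two most common
 * Linux binfmt handlers would recognize them. */
enum exec_kind classify_exec(const unsigned char *head, size_t len) {
    static const unsigned char elf_magic[4] = {0x7f, 'E', 'L', 'F'};
    if (len >= 2 && head[0] == '#' && head[1] == '!')
        return EXEC_SCRIPT; /* e.g. "#!/bin/sh": the interpreter gets exec'd */
    if (len >= sizeof elf_magic && memcmp(head, elf_magic, sizeof elf_magic) == 0)
        return EXEC_ELF;
    return EXEC_UNKNOWN;
}
```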
The init executable is searched for in a list of paths in the root filesystem, including /init, /sbin/init and a few others. For more details see: Section 7.3, “Path to init”
To have more control over the system, you can replace BusyBox’s init with your own.
This just counts every second forever and does not give you a shell.
This method is not very flexible however, as it is hard to reliably pass multiple commands and command line arguments to the init with it, as explained at: Section 7.4, “Init environment”.
For this reason, we have created a more robust helper method with the --eval option:
Source: rootfs_overlay/lkmc/eval_base64.sh.
This allows quoting and newlines by base64 encoding on host, and decoding on guest, see: Section 17.3.1, “Kernel command line parameters escaping”.
It also automatically chooses between init= and rcinit= for you, see: Section 7.3, “Path to init”
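Base64 avoids the escaping problem because its output only ever contains A-Z, a-z, 0-9, +, / and =. No spaces, quotes or newlines can therefore reach the kernel command line. Below is a minimal C encoder for the standard RFC 4648 alphabet, for illustration only. base64_encode and base64_encode_str are hypothetical helpers, not the code this repository uses on the host:

```c
#include <stddef.h>
#include <string.h>

/* Encode `len` bytes of `in` as standard base64 (RFC 4648, '=' padding)
 * into `out`, which needs room for 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the length of the encoded string. */
size_t base64_encode(const unsigned char *in, size_t len, char *out) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t o = 0;
    for (size_t i = 0; i < len; i += 3) {
        /* Pack up to three input bytes into one 24-bit group. */
        unsigned long group = (unsigned long)in[i] << 16;
        if (i + 1 < len) group |= (unsigned long)in[i + 1] << 8;
        if (i + 2 < len) group |= in[i + 2];
        /* Emit four 6-bit digits, padding missing bytes with '='. */
        out[o++] = alphabet[(group >> 18) & 0x3f];
        out[o++] = alphabet[(group >> 12) & 0x3f];
        out[o++] = (i + 1 < len) ? alphabet[(group >> 6) & 0x3f] : '=';
        out[o++] = (i + 2 < len) ? alphabet[group & 0x3f] : '=';
    }
    out[o] = '\0';
    return o;
}

/* Convenience wrapper for NUL-terminated strings such as shell commands. */
size_t base64_encode_str(const char *s, char *out) {
    return base64_encode((const unsigned char *)s, strlen(s), out);
}
```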
--eval replaces BusyBox' init completely, which makes things more minimal, but also has the following consequences:
The best way to overcome those limitations is to use: Section 7.2, “Run command at the end of BusyBox init”
If the script is large, you can add it to a gitignored file and pass that to --eval as in:
Just using BusyBox' poweroff at the end of the init does not work and the kernel panics:
I dare you to guess what this does:
Get a reasonable answer to "how long does boot take in guest time?":
The --eval-after option is for when you rely on something that BusyBox' init sets up for you, like /etc/fstab:
The init is selected at:
The annoying dash - gets passed as a parameter to init, which makes it impossible to use this method for most non custom executables.
Wait, where do HOME and TERM come from? (greps the kernel). Ah, OK, the kernel sets those by default: https://github.com/torvalds/linux/blob/94710cac0ef4ee177a63b5227664b38c95bbf703/init/main.c#L173
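To check this from inside the guest, an init can simply inspect its own environment. The sketch below is a hypothetical lookup helper over an envp-style array, such as main's third argument. Given the two defaults that the linked init/main.c passes to init, HOME=/ and TERM=linux, it returns / for HOME and linux for TERM:

```c
#include <stddef.h>
#include <string.h>

/* Look `name` up in a NULL-terminated array of "NAME=value" strings, such
 * as main's envp argument. Returns a pointer to the value, or NULL if the
 * variable is not set. */
const char *env_lookup(char *const envp[], const char *name) {
    size_t name_len = strlen(name);
    for (size_t i = 0; envp[i] != NULL; i++) {
        /* Require an exact name match followed by '=', so "HOM" does not
         * match "HOME=/". */
        if (strncmp(envp[i], name, name_len) == 0 && envp[i][name_len] == '=')
            return envp[i] + name_len + 1;
    }
    return NULL;
}
```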
On top of the Linux kernel, the BusyBox /bin/sh shell will also define other variables.
Login shells source some default files, notably:
The kernel can boot from a CPIO file, which is a directory serialization format much like tar: https://superuser.com/questions/343915/tar-vs-cpio-what-is-the-difference
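Here is a feel for the format. The kernel's initramfs unpacker reads the "newc" flavour of CPIO. Each member starts with a 110-byte ASCII header: the magic 070701 followed by thirteen 8-digit hexadecimal fields. The header is followed by the file name and then the file data, each padded to a 4-byte boundary. The sketch below encodes and decodes just that header. The function and macro names are our own, and name and data handling are left out:

```c
#include <stdio.h>
#include <string.h>

#define NEWC_HEADER_SIZE 110 /* "070701" + 13 fields of 8 hex digits */
#define NEWC_FIELD_COUNT 13

/* Parse 8 hexadecimal characters into *out. Returns 0, or -1 on a bad digit. */
static int parse_hex8(const char *p, unsigned long *out) {
    unsigned long value = 0;
    for (int i = 0; i < 8; i++) {
        char c = p[i];
        int digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return -1;
        value = value * 16 + (unsigned long)digit;
    }
    *out = value;
    return 0;
}

/* Decode a newc header into its 13 fields, in on-disk order: ino, mode,
 * uid, gid, nlink, mtime, filesize, devmajor, devminor, rdevmajor,
 * rdevminor, namesize (including the trailing NUL), check.
 * Returns 0 on success, -1 on a short buffer, bad magic or bad digit. */
int newc_parse_header(const char *buf, size_t len,
                      unsigned long fields[NEWC_FIELD_COUNT]) {
    if (len < NEWC_HEADER_SIZE || memcmp(buf, "070701", 6) != 0)
        return -1;
    for (int i = 0; i < NEWC_FIELD_COUNT; i++)
        if (parse_hex8(buf + 6 + 8 * i, &fields[i]) != 0)
            return -1;
    return 0;
}

/* Inverse of newc_parse_header: `out` needs NEWC_HEADER_SIZE + 1 bytes and
 * every field must fit in 32 bits. Returns the number of characters written. */
int newc_format_header(const unsigned long fields[NEWC_FIELD_COUNT], char *out) {
    int n = sprintf(out, "070701");
    for (int i = 0; i < NEWC_FIELD_COUNT; i++)
        n += sprintf(out + n, "%08lX", fields[i]);
    return n;
}
```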
which can be good for automated tests, as it ensures that you are using a pristine unmodified system image every time.
Note however that we already disable disk persistency by default on ext2 filesystems even without --initrd: Section 23.3, “Disk persistency”.
One downside of this method is that it has to put the entire filesystem into memory, and could lead to a panic:
TODO: how does the bootloader inform the kernel where to find initrd? https://unix.stackexchange.com/questions/89923/how-does-linux-load-the-initrd-image
Most modern desktop distributions have an initrd in their root disk to do early setup.
initramfs is just like initrd, but you also glue the image directly to the kernel image itself using the kernel’s build system.
This is how /proc/mounts shows the root filesystem:
TODO we were not able to get it working yet: https://stackoverflow.com/questions/49261801/how-to-boot-the-linux-kernel-with-initrd-or-initramfs-with-gem5
This could in theory be easier to make work than initrd since the emulator does not have to do anything special.
We think that this might be because gem5 boots vmlinux directly, and not the final compressed images that contain the attached rootfs such as bzImage, which is what QEMU does, see also: Section 17.20.1, “vmlinux vs bzImage vs zImage vs Image”.
To do this failed test, we automatically pass a dummy disk image as of gem5 7fa4c946386e7207ad5859e8ade0bbfc14000d91 since the scripts don’t handle a missing --disk-image well, much like is currently done for Baremetal.
The device tree is a Linux kernel defined data structure that serves to inform the kernel how the hardware is setup.
The Linux kernel itself has several device trees under ./arch/<arch>/boot/dts, see also: https://stackoverflow.com/questions/21670967/how-to-compile-dts-linux-device-tree-source-files-to-dtb/42839737#42839737
Files that contain device trees have the .dtb extension when compiled, and .dts when in text form.
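A compiled .dtb, also called a flattened device tree, starts with a header of big-endian 32-bit words. As described in the Devicetree Specification, the first word is the magic 0xd00dfeed and the second is the total size of the blob in bytes. Below is a small sketch that checks only those two words of a blob already loaded in memory. fdt_total_size is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedu

/* Read a big-endian 32-bit word, independent of host byte order. */
static uint32_t be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return the blob's totalsize field if `buf` starts with the flattened
 * device tree magic and the blob fits in `len` bytes, or 0 otherwise.
 * Only the first two header words are examined. */
uint32_t fdt_total_size(const unsigned char *buf, size_t len) {
    if (len < 8 || be32(buf) != FDT_MAGIC)
        return 0;
    uint32_t total = be32(buf + 4);
    return total <= len ? total : 0;
}
```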
Good format descriptions:
Since emulators know everything about the hardware, they can automatically generate device trees for us, which is very convenient.
KVM is a Linux kernel interface that greatly speeds up execution of virtual machines.
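From userland, KVM is a set of ioctl calls on /dev/kvm, which is what QEMU opens when KVM acceleration is enabled. The smallest possible probe just asks for the API version. That version has been 12 (KVM_API_VERSION in linux/kvm.h) for as long as the API has been stable. This sketch assumes the Linux UAPI headers are installed, and kvm_api_version is a hypothetical helper:

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Return the KVM API version reported by /dev/kvm, or -1 if KVM cannot be
 * used: e.g. the module is not loaded, we lack permission on /dev/kvm, or
 * we are inside a VM without nested virtualization. */
int kvm_api_version(void) {
    int fd = open("/dev/kvm", O_RDWR);
    if (fd == -1)
        return -1;
    int version = ioctl(fd, KVM_GET_API_VERSION, 0);
    close(fd);
    return version;
}
```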
One important use case for KVM is to fast forward gem5 execution, often to skip boot, take a gem5 checkpoint, and then move on to a more detailed and slow simulation.
TODO: we haven’t gotten it to work yet, but it should be doable, and this is an outline of how to do it. Just don’t expect this to be tested very often for now.
While gem5 does have KVM, as of 2019 its support has not been very good, because debugging it is harder and people haven’t focused intensively on it.
Both QEMU and gem5 have a user mode simulation mode in addition to the full system simulation that we consider elsewhere in this project.
emulator implementers have to keep up with libc changes, some of which break even a C hello world due to setup code executed before main.
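You can observe code running before main yourself. In glibc, the _start entry point from crt1.o calls __libc_start_main, which initializes libc before eventually calling main. GCC and Clang also let a program hook into that window with the constructor function attribute. A minimal sketch with made-up names:

```c
/* Set to 1 by the constructor below. */
int ran_before_main = 0;

/* GCC/Clang extension: called by the start-up code (libc or the dynamic
 * loader) after libc is initialized but before main() runs. */
__attribute__((constructor))
static void before_main(void) {
    ran_before_main = 1;
}
```

Put a GDB breakpoint on before_main and run bt. The backtrace should show the start-up frames that precede main.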
Let’s run userland/c/command_line_arguments.c built with the Buildroot toolchain on QEMU user mode:
./run --userland path resolution is analogous to that of ./run --baremetal.
./build user-mode-qemu first builds Buildroot, and then runs ./build-userland, which is further documented at: Section 2.8, “Userland setup”. It also builds QEMU. If you have already done a QEMU Buildroot setup previously, this will be very fast.
If you modify the userland programs, rebuild simply with:
It’s nice when the obvious just works, right?
To stop at the very first instruction of a freestanding program, just use --no-continue. A good example of this is shown at: Section 28.5.1, “Freestanding programs”.
Automatically run all userland tests that can be run in user mode simulation, and check that they exit with status 0:
Tests under userland/libs/ are only run if --package or --package-all are given as described at userland/libs directory.
The gem5 tests require building statically with build id static, see also: Section 11.7, “gem5 syscall emulation mode”. TODO automate this better.
See: Section 38.16, “Test this repo” for more useful testing tips.
If you followed QEMU Buildroot setup, you can now run the executables created by Buildroot directly as:
Here is an interesting example of this: Section 17.19.1, “Linux Test Project”
At 125d14805f769104f93c510bedaa685a52ec025d we moved Buildroot from uClibc to glibc, and caused some user mode pain, which we document here.
glibc has a check for kernel version, likely obtained from the uname syscall, and if the kernel is not new enough, it quits.
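Here is the kind of comparison involved. A release string such as 5.9.2 can be packed with the same layout as the kernel's KERNEL_VERSION(a, b, c) macro, (a << 16) + (b << 8) + c, and then compared against a minimum. This is a hypothetical sketch of the idea, not glibc's actual check:

```c
#include <stdio.h>

/* Same packing as the kernel's KERNEL_VERSION(a, b, c) macro. */
#define PACK_VERSION(a, b, c) (((long)(a) << 16) + ((long)(b) << 8) + (long)(c))

/* Parse the leading "major.minor[.patch]" of a uname release string such as
 * "5.9.2" or "5.9.2-lkmc" into a packed version. Returns -1 if the string
 * does not start with "major.minor" or major is implausibly large. */
long parse_release(const char *release) {
    unsigned major = 0, minor = 0, patch = 0;
    if (sscanf(release, "%u.%u.%u", &major, &minor, &patch) < 2 || major > 0x7fff)
        return -1;
    /* Keep minor and patch inside their 8-bit slots. */
    if (minor > 255) minor = 255;
    if (patch > 255) patch = 255;
    return PACK_VERSION(major, minor, patch);
}

/* 1 if `release` is at least major.minor.patch (major below 32768),
 * 0 if it is older, -1 if it cannot be parsed. */
int kernel_at_least(const char *release, unsigned major, unsigned minor,
                    unsigned patch) {
    long have = parse_release(release);
    if (have < 0)
        return -1;
    return have >= PACK_VERSION(major, minor, patch);
}
```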
For some reason QEMU / glibc x86_64 picks up the host libc, which breaks things.
Example:
QEMU x86_64 guest on x86_64 host was failing with stack smashing detected when using glibc, but we found a workaround
gem5 user mode only supported static executables in the past, as mentioned at: Section 11.7, “gem5 syscall emulation mode”
One limitation of static executables is that Buildroot mostly only builds dynamic versions of libraries (the libc is an exception).
g++ and pthreads also causes issues:
The following work on both QEMU and gem5 as of LKMC 99d6bc6bc19d4c7f62b172643be95d9c43c26145 + 1. Interactive input:
Less robust than QEMU’s, but still usable:
Support for dynamic linking was added in November 2019:
Note that as shown at Section 35.2.2, “Benchmark emulators on userland executables”, the dynamic version runs 200x more instructions, which might have an impact on smaller simulations in detailed CPUs.
As of gem5 7fa4c946386e7207ad5859e8ade0bbfc14000d91, the crappy se.py script does not forward the exit status of syscall emulation mode. You can test it with:
Since gem5 has to implement syscalls itself in syscall emulation mode, it can of course clearly see which syscalls are being made, and we can log them for debug purposes with gem5 tracing, e.g.:
gem5 user mode multithreading has been particularly flaky compared to QEMU’s, but work is being put into improving it.
gem5 syscall emulation has the nice feature of allowing you to run multiple executables "at once".
and therefore shows one instruction running on each CPU for each process at the same time.
gem5 b1623cb2087873f64197e503ab8894b5e4d4c7b4 syscall emulation has an --smt option presumably for Hardware threads but it has been neglected forever it seems: https://github.com/cirosantilli/linux-kernel-module-cheat/issues/104
At 8d8307ac0710164701f6e14c99a69ee172ccbb70 + 1, I noticed that if you run userland/posix/count.c:
TODO: investigate further and then possibly post on QEMU mailing list.
Similarly to the problem described at QEMU user mode does not show stdout immediately, QEMU error messages do not show at all through pipes.
If you are feeling raw, you can insert and remove modules with our own minimal module inserter and remover!
Implemented as a BusyBox applet by default: https://git.busybox.net/busybox/tree/modutils/modprobe.c?h=1_29_stable
modules built with Buildroot, see: Section 38.15.2.1, “kernel_modules buildroot package”
modules built from the kernel tree itself, see: Section 17.11.2, “dummy-irq”
we would have to figure out how to avoid including the kernel modules twice in the root filesystem while still having 9P working for fast development, as described at: Section 2.2.2.2, “Your first kernel module hack”
The more "reference" kernel.org implementation of lsmod, insmod, rmmod, etc.: https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git
BusyBox also implements its own version of those executables, see e.g. modprobe. Here we will only describe features that differ between kmod and the BusyBox implementation.
kmod’s modprobe can also load modules under different names to avoid conflicts, e.g.:
OverlayFS is a filesystem merged into the Linux kernel in version 3.18.
no need to regenerate the root filesystem at all and reboot
overcomes the check_bin_arch problem as shown at: Section 26.8, “Buildroot rebuild is slow when the root filesystem is large”
A simpler and possibly lower-overhead alternative to 9P would be to generate a secondary disk image with the benchmark you want to rebuild.
Both QEMU and gem5 are capable of outputting graphics to the screen, and taking mouse and keyboard input.
Text mode is the default mode for QEMU.
scrolling up: Section 14.2.1, “Scroll up in graphic mode”
copy paste to and from the terminal
but there is no terminal on the VNC window, just the CONFIG_LOGO penguin.
Enable graphic mode with:
Outcome: you see a penguin due to CONFIG_LOGO.
For a more exciting GUI experience, see: Section 14.4, “X11 Buildroot”
Text mode is the default due to the following considerable advantages:
flooding the screen with colors. See also: https://superuser.com/questions/223094/how-do-i-know-if-i-have-kms-enabled
Scroll up in QEMU graphic mode:
TODO: on arm, we see the penguin and some boot messages, but don’t get a shell at the end:
arm and aarch64 rely on the QEMU CLI option:
TODO: how to use VGA on ARM? https://stackoverflow.com/questions/20811203/how-can-i-output-to-vga-through-qemu-arm Tried:
gem5 does not have a "text mode", since it cannot redirect the Linux terminal to the same host terminal where the executable is running: you are always forced to connect to the terminal with gem5-shell.
Tested on: 38fd6153d965ba20145f53dc1bb3ba34b336bde9
For aarch64 we also need to configure the kernel with linux_config/display:
TODO get working. There is an unmerged patchset at: https://gem5-review.googlesource.com/c/public/gem5/+/11036/1
We cannot use mainline Linux because the gem5 arm Linux kernel patches are required at least to provide the CONFIG_DRM_VIRT_ENCODER option.
Once you’ve seen the CONFIG_LOGO penguin as a sanity check, you can try to go for a cooler X11 Buildroot setup.
TODO 9076c1d9bcc13b6efdb8ef502274f846d8d4e6a1 I’m 100% sure that it was working before, but I didn’t run it for a long time, and it stopped working at some point. Needs bisection, starting from whatever commit last touched the X11 stuff.
On ARM, startx hangs at a message:
We disable networking by default because it starts a userland process, and we want to keep the number of userland processes to a minimum to make the system more understandable as explained at: Section 38.20.3, “Resource tradeoff guidelines”
To enable networking on Buildroot, simply run:
ping does not work within QEMU by default, e.g.:
In this section we discuss how to interact between the guest and the host through networking.
First ensure that you can access the external network since that is easier to get working, see: Section 15, “Networking”.
With nc we can create the most minimal example possible as a sanity check.
Not enabled by default due to the build / runtime overhead. To enable, build with:
Could not do port forwarding from host to guest, and therefore could not use gdbserver: https://stackoverflow.com/questions/48941494/how-to-do-port-forwarding-from-guest-to-host-in-gem5
First Enable networking.
The 9p protocol allows the guest to mount a host directory.
Both QEMU and gem5 support 9P.
All of 9P and NFS (and sshfs) allow sharing directories between guest and host.
As usual, we have already set everything up for you. On host:
It is possible on aarch64 as shown at: https://gem5-review.googlesource.com/c/public/gem5/+/22831, and it is just a matter of exposing it to X86 for those that want it.
TODO: get working.
9P is better with emulation, but let’s just get this working for fun.
First make sure that this works: Section 15.3.2, “Guest to host networking”.
Then, build the kernel with NFS support:
To modify a single option on top of our default kernel configs, do:
You can also use other config generating targets such as defconfig with the same method as shown at: Section 17.1.3.1.1, “Linux kernel defconfig”.
Get the build config in guest:
By default, build-linux generates a .config that is a mixture of:
a base config extracted from Buildroot’s minimal per machine .config, which has the minimal options needed to boot as explained at: Section 17.1.3.1, “About Buildroot’s kernel configs”.
small overlays put on top of that
linux_config/min: see: Section 17.1.3.1.2, “Linux kernel min config”
linux_config/default: other optional configs that we enable by default because they increase visibility, or expose some cool feature, and don’t significantly increase build time nor add significant runtime overhead
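The "base config plus overlays" mixture can be pictured as a last-writer-wins merge of option assignments. Below is a minimal Python sketch of that idea. It is not how build-linux is actually implemented, and it assumes fragments only contain `CONFIG_X=value` and `# CONFIG_X is not set` lines. Real Kconfig also resolves dependencies afterwards, e.g. with make olddefconfig, and that step can drop options whose dependencies are unmet.

```python
import re

SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_RE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_fragment(text):
    """Map each option to its value, or to None when explicitly disabled."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = SET_RE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = UNSET_RE.match(line)
        if match:
            options[match.group(1)] = None
    return options


def merge_fragments(*fragments):
    """Later fragments override earlier ones, option by option."""
    merged = {}
    for fragment in fragments:
        merged.update(parse_fragment(fragment))
    return merged


base = "CONFIG_A=y\n# CONFIG_B is not set\nCONFIG_C=m\n"
overlay = "CONFIG_B=y\n# CONFIG_C is not set\n"
print(merge_fragments(base, overlay))
# {'CONFIG_A': 'y', 'CONFIG_B': 'y', 'CONFIG_C': None}
```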
We have since observed that the kernel size itself is very bloated compared to defconfig as shown at: Section 17.1.3.1.1, “Linux kernel defconfig”.
To see Buildroot’s base configs, start from buildroot/configs/qemu_x86_64_defconfig.
arm, on the other hand, uses buildroot/configs/qemu_arm_vexpress_defconfig, which contains BR2_LINUX_KERNEL_DEFCONFIG="vexpress", and therefore just does a make vexpress_defconfig, and gets its config from the Linux kernel tree itself.
To boot defconfig from disk on Linux and see a shell, all we need is these missing virtio options:
linux_config/min contains minimal tweaks required to boot gem5 or for using our slightly different QEMU command line options than Buildroot on all archs.
It is one of the default config fragments we use, as explained at: Section 17.1.3, “About our Linux kernel configs”.
Having the same config working for both QEMU and gem5 (oh, the hours of bisection) means that you can deal with functional matters in QEMU, which runs much faster, and switch to gem5 only for performance issues.
Other configs which we had previously tested at 4e0d9af81fcce2ce4e777cb82a1990d7c2ca7c1e are:
arm and aarch64 configs present in the official ARM gem5 Linux kernel fork as described at: Section 24.9, “gem5 arm Linux kernel patches”. Some of the configs present there are added by the patches.
Jason’s magic x86_64 config: http://web.archive.org/web/20171229121642/http://www.lowepower.com/jason/files/config which is referenced at: http://web.archive.org/web/20171229121525/http://www.lowepower.com/jason/setting-up-gem5-full-system.html. QEMU boots with that by removing # CONFIG_VIRTIO_PCI is not set.
We try to use the latest possible kernel major release version.
During an update, all your kernel modules may break since the kernel API is not stable.
This also makes this repo the perfect setup to develop the Linux kernel.
In case something breaks while updating the Linux kernel, you can try to bisect it to understand the root cause, see: Section 38.17, “Bisection”.
First, use the branching procedure described at: Section 38.18, “Update a forked submodule”
Because the kernel is so central to this repository, almost all tests must be re-run, so basically just follow the full testing procedure described at: Section 38.16, “Test this repo”. The only tests that can be skipped are essentially the Baremetal tests.
Before committing, don’t forget to update:
The kernel is not forward compatible, however, so downgrading the Linux kernel requires downgrading the userland too to the latest Buildroot branch that supports it.
Bootloaders can pass a string as input to the Linux kernel when it is booting to control its behaviour, much like the execve system call does to userland processes.
Double quotes can be used to escape spaces as in opt="a b", but double quotes themselves cannot be escaped, e.g. opt"a\"b"
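To make the quoting rule concrete, here is a simplified Python model of how a command line gets split into parameters. It is loosely modeled on the kernel's next_arg() helper but is not a port: the kernel only strips the quotes surrounding a value, while this sketch drops every quote character. In both, a backslash has no special meaning, so a double quote cannot be escaped.

```python
def split_cmdline(cmdline):
    """Split a command line into (param, value) pairs.

    value is None for bare flags such as "quiet". Simplified model:
    every double quote toggles quoting, and quotes are never kept.
    """
    args, current, in_quote = [], [], False
    for char in cmdline:
        if char == '"':
            in_quote = not in_quote  # there is no escape character
            continue
        if char.isspace() and not in_quote:
            if current:
                args.append("".join(current))
                current = []
            continue
        current.append(char)
    if current:
        args.append("".join(current))
    pairs = []
    for arg in args:
        param, sep, value = arg.partition("=")
        pairs.append((param, value if sep else None))
    return pairs


print(split_cmdline('console=ttyS0 opt="a b" quiet'))
print(split_cmdline('opt="a\\"b" quiet'))
```

The second call shows what goes wrong when you try to escape a quote. The backslash is kept literally and the third quote reopens quoting, so the following quiet gets swallowed into the value of opt.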
There are two methods:
By default, the Linux kernel mounts the root filesystem as readonly. TODO rationale?
Disable userland address space randomization. Test it out by running rand_check.out twice:
printk is the simplest and most widely used way of getting information from the kernel, so you should familiarize yourself with its basic configuration.
The debug highest level is a bit more magic, see: Section 17.4.3, “pr_debug” for more info.
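/proc/sys/kernel/printk holds four numbers, in this order:

* console_loglevel
* default_message_loglevel
* minimum_console_loglevel
* default_console_loglevel

A message reaches the console only if its level is numerically lower than console_loglevel, so KERN_EMERG (0) is the most urgent and KERN_DEBUG (7) the least. The Python sketch below uses a hard-coded sample string in place of the real file contents.

```python
from typing import NamedTuple


class PrintkLevels(NamedTuple):
    console_loglevel: int
    default_message_loglevel: int
    minimum_console_loglevel: int
    default_console_loglevel: int


def parse_printk(text):
    """Parse the four integers found in /proc/sys/kernel/printk."""
    return PrintkLevels(*(int(field) for field in text.split()))


def reaches_console(message_level, levels):
    """Lower numbers are more urgent: KERN_EMERG is 0, KERN_DEBUG is 7."""
    return message_level < levels.console_loglevel


levels = parse_printk("7\t4\t1\t7\n")  # sample contents, not read from a real system
print(reaches_console(6, levels))  # KERN_INFO
print(reaches_console(7, levels))  # KERN_DEBUG
```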
The current printk level can be obtained with:
./run --kernel-cli 'ignore_loglevel'
Get ready for the noisiest boot ever, I think it overflows the printk buffer and funny things happen.
When CONFIG_DYNAMIC_DEBUG is set, printk(KERN_DEBUG is not the exact same as pr_debug( since printk(KERN_DEBUG messages are visible with:
The Linux kernel allows passing module parameters at insertion time through the init_module and finit_module system calls.
modprobe insertion can also set default parameters via the /etc/modprobe.conf file:
One module can depend on symbols of another module that are exported with EXPORT_SYMBOL:
TODO: what for, and at which point does Buildroot / BusyBox generate that file?
Unlike insmod, modprobe deals with kernel module dependencies for us.
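modprobe gets the dependency information from the modules.dep file generated by depmod. Each line of that file has the form `module.ko: dependency.ko ...`, and modprobe inserts the dependencies before the module that uses them. Below is a minimal Python sketch of that ordering. The paths extra/dep.ko and extra/dep2.ko are hypothetical examples.

```python
def parse_modules_dep(text):
    """Map each module path to the list of module paths it depends on."""
    deps = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        module, _, rest = line.partition(":")
        deps[module.strip()] = rest.split()
    return deps


def load_order(module, deps):
    """Depth-first traversal: every dependency comes before its user."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dependency in deps.get(name, []):
            visit(dependency)
        order.append(name)

    visit(module)
    return order


# Hypothetical modules.dep contents.
MODULES_DEP = """\
extra/dep2.ko: extra/dep.ko
extra/dep.ko:
"""
print(load_order("extra/dep2.ko", parse_modules_dep(MODULES_DEP)))
# ['extra/dep.ko', 'extra/dep2.ko']
```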
Module metadata is stored on module files at compile time. Some of the fields can be retrieved through the THIS_MODULE struct module:
init_module and cleanup_module are an older alternative to the module_init and module_exit macros:
It is generally hard / impossible to use floating point operations in the kernel. TODO understand details.
To test out kernel panics and oops in controlled circumstances, try out the modules:
On panic, the kernel dies, and so does our terminal.
The log shows which module each symbol belongs to if any, e.g.:
For testing purposes, it is very useful to quit the emulator automatically with exit status non zero in case of kernel panic, instead of just hanging forever.
Enabled by default with:
panic=-1 command line option which reboots the kernel immediately on panic, see: Section 17.6.1.4, “Reboot on panic”
QEMU -no-reboot, which makes QEMU exit when the guest tries to reboot
gem5 9048ef0ffbf21bedb803b785fb68f83e95c04db8 (January 2019) can detect panics automatically if the option system.panic_on_panic is on.
Make the kernel reboot after n seconds after panic:
If CONFIG_KALLSYMS=n, then addresses are shown on traces instead of symbol plus offset.
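Turning a raw address into "symbol plus offset" amounts to a sorted lookup in the kernel symbol table. From userland, that table can be read through /proc/kallsyms, whose lines have the form `address type name [module]`. Addresses may read as zero without privileges, depending on kptr_restrict. The Python sketch below works on made-up sample contents. Real oopses also append the symbol size, as in `name+0x10/0x40`, which this sketch omits.

```python
import bisect


def parse_kallsyms(text):
    """Return (address, name) pairs sorted by address."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    return symbols


def symbolize(address, symbols):
    """Format address as name+offset using the closest preceding symbol."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return hex(address)  # below every known symbol
    base, name = symbols[index]
    return f"{name}+{address - base:#x}"


# Made-up sample in /proc/kallsyms format.
SAMPLE = """\
ffffffff81000000 T _stext
ffffffff81000100 T foo
ffffffff81000200 t bar [mymod]
"""
symbols = parse_kallsyms(SAMPLE)
print(symbolize(0xFFFFFFFF81000110, symbols))  # foo+0x10
```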
On oops, the shell still lives after.
The dump_stack function produces a stack trace much like panic and oops, but causes no problems and we return to the normal control flow, and can cleanly remove the module afterwards:
The WARN_ON macro basically just calls dump_stack.
Let’s learn how to diagnose problems with the root filesystem not being found. TODO add a sample panic error message for each error type:
Pseudo filesystems are filesystems that don’t represent actual files in a hard disk, but rather allow us to do special operations on filesystem-related system calls.
Debugfs is the simplest pseudo filesystem to play around with:
Procfs is just another fops entry point:
Its data is shared with uname(), which is a POSIX C function and has a Linux syscall to back it up.
Sysfs is more restricted than procfs, as it does not take an arbitrary file_operations:
Character devices can have arbitrary File operations associated to them:
Bibliography: https://unix.stackexchange.com/questions/37829/understanding-character-device-or-character-special-files/371758#371758
And also destroy it on rmmod:
File operations are the main method of userland driver communication.
Writing trivial read File operations is repetitive and error prone. The seq_file API makes the process much easier for those trivial cases:
If you have the entire read output upfront, single_open is an even more convenient version of seq_file:
The poll system call allows a user process to do a non-busy wait on a kernel event.
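The userland side of poll looks the same for any file descriptor. The Python sketch below uses a pipe in place of the module's device file. A device file needs the module to implement the poll file operation, which is not shown here. The process sleeps in the kernel until data arrives or the timeout expires.

```python
import os
import select


def wait_readable(fd, timeout_ms):
    """Sleep in poll() until fd is readable or timeout_ms elapses."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)  # blocks without busy waiting
    return any(ready_fd == fd and mask & select.POLLIN for ready_fd, mask in events)


read_fd, write_fd = os.pipe()
print(wait_readable(read_fd, 0))     # nothing written yet
os.write(write_fd, b"x")
print(wait_readable(read_fd, 1000))  # data is now available
```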
The ioctl system call is the best way to pass an arbitrary number of parameters to the kernel in a single go:
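From userland, an ioctl is a single call that takes a request number plus a pointer to a buffer of any layout. The module's own request numbers are not reproduced here, so the Python sketch below uses the standard FIONREAD request on a pipe instead. A custom driver request would instead pack its parameter struct into the buffer, e.g. with the struct module.

```python
import array
import fcntl
import os
import termios


def bytes_available(fd):
    """Ask the kernel how many bytes can be read from fd, with one ioctl."""
    result = array.array("i", [0])  # buffer the kernel writes into
    fcntl.ioctl(fd, termios.FIONREAD, result, True)
    return result[0]


read_fd, write_fd = os.pipe()
os.write(write_fd, b"hello")
print(bytes_available(read_fd))  # 5
```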
The mmap system call allows us to share memory between user and kernel space without copying:
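From userland, mapping a device file looks the same as mapping a regular file. What differs is that the driver's mmap file operation decides which pages back the mapping. The Python sketch below maps an ordinary temporary file instead of the module's device file. It shows that writes through a MAP_SHARED mapping are visible to a plain read() without the program doing any explicit copy.

```python
import mmap
import os
import tempfile


def patch_via_mmap(path, offset, data):
    """Write data into a file through a shared memory mapping."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mapping:  # MAP_SHARED by default on Unix
            mapping[offset:offset + len(data)] = data
            mapping.flush()


fd, path = tempfile.mkstemp()
os.write(fd, b"hello world")
os.close(fd)
patch_via_mmap(path, 0, b"HELLO")
with open(path, "rb") as f:
    print(f.read())  # b'HELLO world'
os.remove(path)
```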
Anonymous inodes allow getting multiple file descriptors from a single filesystem entry, which reduces namespace pollution compared to creating multiple device files:
Netlink sockets offer a socket API for kernel / userland communication:
Kernel threads are managed exactly like userland threads; they also have a backing task_struct, and are scheduled with the same mechanism:
The sleep is done with usleep_range, see: Section 17.9.2, “sleep”.
Bibliography:
Let’s launch two threads and see if they actually run in parallel:
Count to dmesg every one second from 0 up to n - 1:
A more convenient front-end for kthread:
Bibliography: https://github.com/torvalds/linux/blob/v4.17/Documentation/core-api/workqueue.rst
Count from 0 to 9 every second infinitely many times by scheduling a new work item from a work item:
Let’s block the entire kernel! Yay:
Wait queues are a way to make a thread sleep until an event happens on the queue:
Count from 0 to 9 infinitely many times in 1 second intervals using timers:
Brute force monitor every shared interrupt that will accept us:
The Linux kernel v4.16 mainline also has a dummy-irq module at drivers/misc/dummy-irq.c for monitoring a single IRQ.
In the guest with QEMU graphic mode:
Convert a virtual address to physical:
Only tested in x86_64.
The xp QEMU monitor command reads memory at a given physical address.
/dev/mem exposes access to physical addresses, and we use it through the convenient devmem BusyBox utility.
Dump the physical address of all pages mapped to a given process using /proc/<pid>/maps and /proc/<pid>/pagemap.
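/proc/<pid>/pagemap has one 64-bit entry per virtual page:

* bits 0-54 hold the page frame number (PFN)
* bit 63 says whether the page is present in RAM

The physical address is then PFN times the page size, plus the offset inside the page. The Python sketch below implements that arithmetic, and the test values are synthetic. Since Linux 4.2 the PFN field reads as zero unless the reader has CAP_SYS_ADMIN.

```python
import struct

ENTRY_SIZE = 8              # bytes per pagemap entry
PFN_MASK = (1 << 55) - 1    # bits 0-54: page frame number
PRESENT = 1 << 63           # bit 63: page present in RAM


def pagemap_offset(vaddr, page_size=4096):
    """Byte offset of the entry for vaddr inside the pagemap file."""
    return (vaddr // page_size) * ENTRY_SIZE


def entry_to_phys(entry, vaddr, page_size=4096):
    """Physical address for vaddr, or None if the page is not present."""
    if not entry & PRESENT:
        return None
    return (entry & PFN_MASK) * page_size + vaddr % page_size


def read_pagemap_entry(pid, vaddr, page_size=4096):
    """Read the raw entry; the PFN is zeroed without CAP_SYS_ADMIN."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(pagemap_offset(vaddr, page_size))
        (entry,) = struct.unpack("=Q", f.read(ENTRY_SIZE))  # native endianness
    return entry
```

In the guest, the page size can come from os.sysconf("SC_PAGE_SIZE") and candidate virtual addresses from /proc/<pid>/maps.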
Good overviews:
I hope to have examples of all methods some day, since I’m obsessed with visibility.
Logs proc events such as process creation to a netlink socket.
0111ca406bdfa6fd65a2605d353583b4c4051781 was failing with:
Trace a single function:
TODO: can you get function arguments? https://stackoverflow.com/questions/27608752/does-ftrace-allow-capture-of-system-call-arguments-to-the-linux-kernel-or-only
TODO example:
kprobes is an instrumentation mechanism that injects arbitrary code at a given address in a trap instruction, much like GDB. Oh, the good old kernel. :-)
TODO: didn’t port during refactor after 3b0a343647bed577586989fb702b760bd280844a. Reimplementing should not be hard.
Make it harder to get hacked and easier to notice that you were, at the cost of some (small?) runtime overhead.
Detects buffer overflows for us:
TODO get a hello world permission control working:
SELinux requires glibc as mentioned at: Section 26.10, “libc choice”.
I once got UML running on a minimal Buildroot setup at: https://unix.stackexchange.com/questions/73203/how-to-create-rootfs-for-user-mode-linux-on-fedora-18/372207#372207
UIO is a kernel subsystem that allows certain types of driver operations to be done from userland.
Requires Graphics.
Requires Graphics.
If you run in QEMU graphic mode:
This leads Linux to try to reboot, and QEMU shuts down due to the -no-reboot option, which we set by default, see: Section 17.6.1.3, “Exit emulator on panic”.
Here is a minimal example of Ctrl Alt Del:
We cannot test these actual shortcuts on QEMU since the host captures them at a lower level, but from:
In order to play with TTYs, do this:
Take the command described at TTY and try adding the following:
If you run in Graphics, then you get a Penguin image for every core above the console! https://askubuntu.com/questions/80938/is-it-possible-to-get-the-tux-logo-on-the-text-based-boot
DRM / DRI is the new interface that supersedes fbdev:
Tested on: 93e383902ebcc03d8a7ac0d65961c0e62af9612b
./build-buildroot --config-fragment buildroot_config/kmscube
TODO get working.
POSIX userland stress. Two versions:
STRESS_NG is likely the best, but it requires glibc, see: Section 26.10, “libc choice”.
Websites:
Between all archs on QEMU and gem5 we touch all of those kernel built output files.
The following kernel modules and Baremetal executables dump and disassemble various registers which cannot be observed from userland (usually "system registers", "control registers"):
so as long as we craft the correct DTB and feed it into Xen so that it can see the kernel, it should work. TODO does QEMU support patching the auto-generated DTB with pre-generated options? In the worst case we can just dump it with -machine dumpdtb and hand hack it, see: Section 9.4, “Device tree emulator generation”.
Bibliography:
QEMU is a system simulator: it simulates a CPU and devices such as interrupt controllers, timers, UART, screen, keyboard, etc.
We disable disk persistency for both QEMU and gem5 by default, to prevent the emulator from putting the image in an unknown state.
Disk persistency is useful to re-run shell commands from the history of a previous session with Ctrl-R, but we felt that the loss of determinism was not worth it.
TODO how to make gem5 disk writes persistent?
qcow2 does not appear to be supported: there are no hits in the source tree, and there is a mention on Nate’s 2009 wishlist: http://gem5.org/Nate%27s_Wish_List
QEMU allows us to take snapshots at any time through the monitor.
Bibliography: https://stackoverflow.com/questions/40227651/does-qemu-emulator-have-checkpoint-function/48724371#48724371
Snapshots are stored inside the .qcow2 images themselves.
This section documents:
Only tested in x86.
Small upstream educational PCI device:
In this section we will try to interact with PCI devices directly from userland without kernel modules.
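One userland entry point is sysfs. On Linux, /sys/bus/pci/devices/<domain:bus:dev.fn>/config exposes a device's configuration space, and unprivileged reads are typically limited to the first 64 bytes. The vendor ID and device ID are the little-endian 16-bit fields at offsets 0 and 2. The Python sketch below lists them; the QEMU edu device should show up as 1234:11e8. The bytes in the test are synthetic.

```python
import glob
import os
import struct


def parse_ids(config):
    """Vendor and device IDs from the start of PCI config space."""
    vendor, device = struct.unpack_from("<HH", config, 0)
    return vendor, device


def list_pci_devices(root="/sys/bus/pci/devices"):
    """Yield (address, (vendor, device)) for every readable device."""
    for path in sorted(glob.glob(os.path.join(root, "*"))):
        try:
            with open(os.path.join(path, "config"), "rb") as f:
                config = f.read(4)
        except OSError:
            continue  # e.g. device removed or access denied
        if len(config) == 4:
            yield os.path.basename(path), parse_ids(config)


for address, (vendor, device) in list_pci_devices():
    print(f"{address} {vendor:04x}:{device:04x}")
```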
There are two versions of setpci and lspci:
The PCI standard is non-free, obviously like everything in low level: https://pcisig.com/specifications but Google gives several illegal PDF hits :-)
lspci -k shows something like:
TODO: broken. Was working before we moved arm from -M versatilepb to -M virt around af210a76711b7fa4554dcc2abd0ddacfc810dfd4. Either make it work on -M virt if that is possible, or document precisely how to make it work with versatilepb, or hopefully vexpress which is newer.
TODO: broken when arm moved to -M virt, same as GPIO.
TODO get some working!
The QEMU monitor is a magic terminal that allows you to send text commands to the QEMU VM itself: https://en.wikibooks.org/wiki/QEMU/Monitor
Peter Maydell said potentially not possible nicely as of August 2018: https://stackoverflow.com/questions/51747744/how-to-run-a-qemu-monitor-command-from-inside-the-guest/51764110#51764110
When doing GDB step debug it is possible to send QEMU monitor commands through the GDB monitor command, which saves you the trouble of opening yet another shell.
When you start hacking QEMU or gem5, it is useful to see what is going on inside the emulator themselves.
The build outputs are automatically stored in different directories for optimized and debug builds, which prevents debug files from overwriting opt ones. Therefore, --gem5-build-id is not required.
The price to pay for debuggability is high however: a Linux kernel boot was about 3x slower in QEMU and 14 times slower in gem5 debug compared to opt, see benchmarks at: Section 35.2.1, “Benchmark Linux kernel boot”.
Similar slowdowns can be observed at: Section 35.2.2, “Benchmark emulators on userland executables”.
When in QEMU text mode, using --debug-vm makes Ctrl-C not get passed to the QEMU guest anymore: it is instead captured by GDB itself to allow breaking. So e.g. you won’t be able to easily quit from a guest program like:
You can still send key presses to QEMU however even without the mouse capture, just either click on the title bar, or alt tab to give it focus.
While step debugging any complex program, you always end up feeling the need to step in reverse to reach the last call to some function that was called before the failure point, in order to trace back the problem to the actual bug source.
Start pdb at the first instruction:
QEMU can log several different events.
QEMU also has a second trace mechanism in addition to -trace, find out the events with:
TODO: is it possible to show the register values for each instruction?
PANDA can list memory addresses, so I bet it can also decode the instructions: https://github.com/panda-re/panda/blob/883c85fa35f35e84a323ed3d464ff40030f06bd6/panda/docs/LINE_Censorship.md I wonder why they don’t just upstream those things to QEMU’s tracing: https://github.com/panda-re/panda/issues/290
gem5 can do it as shown at: Section 23.9.8, “gem5 tracing”.
Not possible apparently, not even with the memory_region_ops_read and memory_region_ops_write trace events, Peter comments https://lists.gnu.org/archive/html/qemu-devel/2015-06/msg07482.html
We can further use Binutils' addr2line to get the line that corresponds to each address:
QEMU runs, unlike gem5, are not deterministic by default, however it does support a record and replay mechanism that allows you to replay a previous run deterministically.
Solved on unmerged c42634d8e3428cfa60672c3ba89cabefc720cde9 from https://github.com/ispras/qemu/tree/rr-180725
TODO get working.
TODO: is there any way to distinguish which instruction runs on each core? Doing:
gem5 also provides a tracing mechanism documented at: http://www.gem5.org/Trace_Based_Debugging:
TODO: 7452d399290c9c1fc6366cdad129ef442f323564 ./trace2line is too slow and takes hours. QEMU’s processing of 170k events takes 7 seconds. gem5’s processing is analogous, but there are 140M events, so it should take about 7 s * 140M / 170k ≈ 5800 seconds, roughly 1.6 hours, which seems consistent with what I observe, so maybe there is no way to speed this up… The workaround is to just use gem5’s ExecSymbol to get function granularity, and then GDB individually if line detail is needed?
gem5 traces are generated from DPRINTF(<trace-id> calls scattered throughout the code, except for ExecAll instruction traces, which use Debug::ExecEnable directly.
This debug flag traces all instructions.
25007500: time count in some unit. Note how the microops execute at further timestamps.
system.cpu: distinguishes between CPUs when there are more than one. For example, running Section 33.10.3, “ARM baremetal multicore” with two cores produces system.cpu0 and system.cpu1
T0: thread number. TODO: hyperthread? How to play with it?
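With these fields, a trace can for example be summarized per CPU. The Python sketch below counts trace lines per CPU object. It assumes ExecAll lines start with `<tick>: <cpu>: A<n> T<thread> :`, but the exact layout may differ between gem5 versions. The sample lines are made up in that shape, and microop lines are counted like any other line.

```python
import re
from collections import Counter

# Assumed prefix of an ExecAll line: "<tick>: <cpu>: A<n> T<thread> : ..."
EXEC_ALL_PREFIX = re.compile(
    r"^\s*(?P<tick>\d+):\s+(?P<cpu>[\w.]+):\s+A\d+\s+T(?P<thread>\d+)\s+:"
)


def parse_prefix(line):
    """Return (tick, cpu, thread), or None for lines not matching the layout."""
    match = EXEC_ALL_PREFIX.match(line)
    if match is None:
        return None
    return int(match["tick"]), match["cpu"], int(match["thread"])


def trace_lines_per_cpu(lines):
    """Count matching trace lines per CPU object, microops included."""
    counts = Counter()
    for line in lines:
        parsed = parse_prefix(line)
        if parsed is not None:
            counts[parsed[1]] += 1
    return counts


# Made-up sample lines in the assumed format.
TRACE = [
    "25007000: system.cpu0: A0 T0 : @_start    :   movz   x0, #1, #0   : IntAlu :  D=0x0000000000000001",
    "25007000: system.cpu1: A0 T0 : @_start    :   movz   x0, #1, #0   : IntAlu :  D=0x0000000000000001",
    "25007500: system.cpu0: A0 T0 : @_start+4    :   add   x0, x0, #1   : IntAlu :  D=0x0000000000000002",
]
print(trace_lines_per_cpu(TRACE))
```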
This flag shows a more detailed register usage than gem5 ExecAll trace format.
As of gem5 16eeee5356585441a49d05c78abc328ef09f7ace the default tracer is ExeTracer. It is set at:
Sometimes in Ubuntu 14.04, after the QEMU SDL GUI starts, it does not get updated after keyboard strokes, and there are artifacts like disappearing text.
Getting started at: Section 2.4, “gem5 Buildroot setup”.
gem5 has a bunch of crappiness, mostly described at: gem5 vs QEMU, but it does deserve some credit on the following points:
runs are deterministic by default, unlike QEMU, which has a special QEMU record and replay mode that requires first playing the content once and then replaying it
gem5 ARM at least appears to implement more low level CPU functionality than QEMU, e.g. QEMU only added EL2 in 2018: https://stackoverflow.com/questions/42824706/qemu-system-aarch64-entering-el1-when-emulating-a53-power-up See also: Section 33.10.1, “ARM exception levels”
gem5 offers more advanced logging, even for non micro architectural things which QEMU models in some way, e.g. QEMU trace memory accesses, because QEMU’s binary translation optimizations reduce visibility
slower than QEMU, see: Section 35.2.1, “Benchmark Linux kernel boot”
This implies that the user base is much smaller, since no Android devs.
OK, this is why we used gem5 in the first place, performance measurements!
but the problem is that this method does not allow us to easily run a different script without running the boot again. The ./gem5.sh script works around that by using m5 readfile as explained further at: Section 24.6.3, “gem5 checkpoint restore and run a different script”.
Now you can play a fun little game with your friends:
To find out why your program is slow, a good first step is to have a look at the gem5 m5out/stats.txt file.
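For scripted comparisons, the file can be split into one dictionary per statistics dump. The Python sketch below assumes the usual layout:

* each dump is wrapped in `---------- Begin Simulation Statistics ----------` / `End` marker lines
* each stat line has a name, a value, and optionally a `# description`

Only the first value of each line is kept, and the sample text uses made-up numbers.

```python
def parse_stats(text):
    """Split stats.txt into one {stat_name: first_value} dict per dump."""
    dumps = []
    current = None
    for line in text.splitlines():
        if line.startswith("---------- Begin Simulation Statistics"):
            current = {}
            dumps.append(current)
            continue
        fields = line.split()
        # Skip text before the first dump, blank lines and End markers.
        if current is None or len(fields) < 2 or line.startswith("-"):
            continue
        try:
            current[fields[0]] = float(fields[1])
        except ValueError:
            pass  # skip values that are not plain numbers


    return dumps


# Made-up sample in the assumed stats.txt layout.
SAMPLE_STATS = """
---------- Begin Simulation Statistics ----------
sim_ticks                 1000    # Number of ticks simulated
system.cpu.numCycles       500    # number of cpu cycles simulated

---------- End Simulation Statistics   ----------

---------- Begin Simulation Statistics ----------
sim_ticks                 3000    # Number of ticks simulated
"""
for index, dump in enumerate(parse_stats(SAMPLE_STATS)):
    print(index, dump)
```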
A few imperfections of our benchmarking method are:
Besides optimizing a program for a given CPU setup, chip developers can also do the inverse, and optimize the chip for a given benchmark!
The rabbit hole is likely deep, but let’s scratch a bit of the surface.
./run --arch arm --cpus 2 --emulator gem5
User mode simulation QEMU v4.0.0 always shows the number of cores of the host, presumably because the thread switching uses host threads directly which would make that harder to implement.
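In full system mode, the core count can also be checked from a program in the guest. The Python sketch below queries the same sysconf values that getconf reports. It assumes a Python interpreter is available in the guest, which may require enabling it in the Buildroot configuration. With --cpus 2 in full system mode one would expect both numbers to be 2.

```python
import os

configured = os.sysconf("SC_NPROCESSORS_CONF")  # cores known to the kernel
online = os.sysconf("SC_NPROCESSORS_ONLN")      # cores currently online
print(configured, online)
```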
Some info at: TimingSimpleCPU analysis #1 but highly TODO :-)
TODO These look promising:
we have no caches, each instruction is fetched from memory
each loop contains 11 instructions as shown at Section 36.2, “C busy loop”
and supposing that the loop dominates the executable’s pre/post main code, which we know is true since, as shown in Benchmark emulators on userland executables, an empty dynamically linked C program only has about 100k instructions, while our loop runs 1000000 * 11 = 11M.
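Taking the roughly $10^5$ instructions of an empty dynamically linked program as the non-loop part, the share of instructions spent outside the loop works out to:

$$
N_\text{loop} = 10^6 \times 11 = 1.1 \times 10^7,
\qquad
\frac{N_\text{other}}{N_\text{loop} + N_\text{other}} \approx \frac{10^5}{1.1 \times 10^7 + 10^5} \approx 0.9\%
$$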
Can be set across emulators with:
This can be explored pretty well from gem5 config.ini.
TODO These look promising:
As of gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1 defaults to 2GHz for fs.py:
Analogous to QEMU:
Analogous to QEMU, on the first shell:
When you want to break, just do a Ctrl-C on GDB shell, and then continue.
And we now see the boot messages, and then get a shell. Now try the ./count.sh procedure described for QEMU at: Section 3.2, “GDB step debug kernel post-boot”.
We are unable to use gdbserver because of networking as mentioned at: Section 15.3.1.3, “gem5 host to guest networking”
The alternative is to do as in GDB step debug userland processes.
@@ -20309,7 +20341,7 @@ hello
gem5’s secondary core GDB setup is a hack and spawns one gdbserver for each core in separate ports, e.g. 7000, 7001, etc.
Analogous to QEMU’s Snapshot, but better since it can be started from inside the guest, so we can easily checkpoint after a specific guest event, e.g. just before init is done.
since boot has already happened, and the parameters are already in the RAM of the snapshot.
In order to debug checkpoint restore bugs, this minimal setup using userland/freestanding/gem5_checkpoint.S can be handy:
A quick way to get a gem5 syscall emulation mode or full system checkpoint to observe is:
You want to automate running several tests from a single pristine post-boot state.
gem5 can switch to a different CPU model when restoring a checkpoint.
Besides switching CPUs after a checkpoint restore, fs.py also has the --fast-forward option to automatically run the script from the start on a less detailed CPU, and switch to a more detailed CPU at a given tick.
The in-tree util/cpt_upgrader.py is a tool to upgrade checkpoints taken from an older version of gem5 to be compatible with the newest version, so you can update gem5 without having to re-run the simulation that generated the checkpoints.
Remember that in the gem5 command line, we can either pass options to the script being run as in:
m5ops are magic instructions which lead gem5 to do magic things, like quitting or dumping stats.
m5 is a guest command line utility, installed and run on the guest, that serves as a CLI front-end for the m5ops.
This can be a good m5op to test, since it executes very quickly.
End the simulation.
Makes gem5 dump one more statistics entry to the gem5 m5out/stats.txt file.
End the simulation with a failure exit event:
Send a guest file to the host. 9P is a more advanced alternative.
Read a host file pointed to by the fs.py --script option to stdout.
Ermm, just another m5 readfile that only takes integers and only from CLI options? Is this software so redundant?
Trivial combination of m5 readfile + execute the script.
There are a few different possible instructions that can be used to implement identical m5ops:
These are magic addresses that when accessed lead to an m5op.
Let’s study how the gem5 m5 executable uses them:
include/gem5/asm/generic/m5ops.h also describes some annotation instructions.
https://gem5.googlesource.com/arm/linux/ contains ARM Linux kernel forks, created by ARM Holdings, that add a few gem5 specific patches on top of a few upstream kernel releases.
because glibc was built to expect a newer Linux kernel as shown at: Section 10.4.1, “FATAL: kernel too old failure in userland simulation”. Your choices to solve this are:
+because glibc was built to expect a newer Linux kernel as shown at: Section 11.4.1, “FATAL: kernel too old failure in userland simulation”. Your choices to solve this are:
drm: Add component-aware simple encoder allows you to see images through VNC, see: Section 13.3, “gem5 graphic mode”
drm: Add component-aware simple encoder allows you to see images through VNC, see: Section 14.3, “gem5 graphic mode”
gem5: Add support for gem5’s extended GIC mode adds support for more than 8 cores, see: Section 23.3.1.2, “gem5 ARM full system with more than 8 cores”
gem5: Add support for gem5’s extended GIC mode adds support for more than 8 cores, see: Section 24.3.1.2, “gem5 ARM full system with more than 8 cores”
Tested on 649d06d6758cefd080d04dc47fd6a5a26a620874 + 1.
We have observed that with the kernel patches, boot is 2x faster, falling from 1m40s to 50s.
When you run gem5, it generates an m5out directory at:
The files in that directory contain some very important information about the run, and you should become familiar with every one of them.
Contains UART output, either from the Linux kernel or from the baremetal system.
m5out/system.workload.dmesg file
This file used to be called just m5out/system.dmesg, but the name was changed after the workload refactorings of March 2020.
This file contains important statistics about the run:
and after that the file size went down to 21KB.
We can make gem5 dump statistics in the HDF5 format by adding the magic h5:// prefix to the file name as in:
Well, run minimal examples, and reverse engineer them up!
This describes the internals of the gem5 m5out/stats.txt file.
The m5out/config.ini file, contains a very good high level description of the system:
Modifying the config.ini file manually does nothing since it gets overwritten every time.
The m5out/config.dot file contains a graphviz .dot file that provides a simplified graphical view of a subset of the gem5 config.ini.
We use the m5term in-tree executable to connect to the terminal instead of a direct telnet.
We have made a crazy setup that allows you to just cd into submodules/gem5, and edit Python scripts directly there.
By default, we use configs/example/fs.py script.
But can the people from the project be convinced of that?
These are just very small GTest tests that test a single class in isolation; they don’t run any executables.
This section is about running the gem5 in-tree tests.
This error happens when the following instruction limits are reached:
In order to use different build options, you might also want to use gem5 build variants to keep the build outputs separate from one another.
How to use it in LKMC: Section 22.8, “Debug the emulator”.
+How to use it in LKMC: Section 23.8, “Debug the emulator”.
If you build gem5 with scons build/ARM/gem5.debug, then that is a .debug build.
It relates to the more common .opt build just as explained at Section 22.8, “Debug the emulator”: both .opt and .debug have -g, but .opt uses -O2 while .debug uses -O0.
It relates to the more common .opt build just as explained at Section 23.8, “Debug the emulator”: both .opt and .debug have -g, but .opt uses -O2 while .debug uses -O0.
./build-gem5 --gem5-build-type fast
@@ -22571,13 +22603,13 @@ Exiting @ tick 18446744073709551615 because simulate() limit reached
Profiling builds as of 3cea7d9ce49bda49c50e756339ff1287fd55df77 both use: -g -O3 and disable asserts and logging like the gem5 fast build and:
TODO test properly, benchmark vs GCC.
If gem5 appears to have a C++ undefined behaviour bug, which is often very difficult to track down, you can try to build it with the following extra SCons options:
gem5 has two types of memory system:
Tested in gem5 d7d9bc240615625141cd6feddbadd392457e49eb.
This is the simplest of all protocols, and therefore the first one you should study to learn how Ruby works.
Crossbar or XBar in the code, is the default CPU interconnect that gets used by fs.py if --ruby is not given.
Python 3 support was mostly added in 2019 Q3 at around a347a1a68b8a6e370334be3a1d2d66675891e0f1, but remained buggy for some time afterwards.
gem5 has a few in tree CPU models for different purposes.
From this we see that there are basically only 4 C++ CPU models in gem5: Atomic, Timing, Minor and O3. All others are basically parametrizations of those base types.
BaseSimpleCPU
Simple abstract CPU without a pipeline.
AtomicSimpleCPU
AtomicSimpleCPU: the default one. Memory accesses happen instantaneously. The fastest simulation except for KVM, but not realistic at all.
TimingSimpleCPU
TimingSimpleCPU: memory accesses are realistic, but the CPU has no pipeline. The simulation is faster than detailed models, but slower than AtomicSimpleCPU.
Generic in-order superscalar core.
DerivO3CPU
Generic out-of-order core. "O3" stands for "Out Of Order"!
DerivO3CPU pipeline stages
Mentioned at: http://www.m5sim.org/Visualization
The gem5 platform is selectable with the --machine option, which is named after the analogous QEMU -machine option, and which sets the --machine-type.
Certain ISAs like ARM have bootloaders that are automatically run before the main image to setup basic system state.
Parent section: gem5 internals.
The gem5 memory system is connected in a very flexible way through the port system.
A Packet is the basic information unit that gets sent across ports.
gem5 memory requests can be classified in the following broad categories:
Tested in gem5 b1623cb2087873f64197e503ab8894b5e4d4c7b4.
As seen at gem5 functional vs atomic vs timing memory requests, functional requests are not used in common simulation, since the core must always go through caches.
Packet vs Request
Packet
Packet is what goes through ports: a single packet is sent out to the memory system, gets modified when it hits valid data, and then returns with the reply.
MemCmd
Each gem5 Packet contains a MemCmd.
Request
One good way to think about Request vs Packet could be "it is what the instruction definitions see", a bit like ExecContext vs ThreadContext.
Request in AtomicSimpleCPU
In AtomicSimpleCPU, a single packet of each type is kept for the entire CPU, e.g.:
Request in TimingSimpleCPU
In TimingSimpleCPU, the request gets created per memory read:
MSHR
CommMonitor
SimpleMemory
SimpleMemory is a highly simplified memory system. It can replace a more complex DRAM model if you use it e.g. as:
Internals under other sections:
The interaction uses the Python C extension interface https://docs.python.org/2/extending/extending.html through the pybind11 helper library: https://github.com/pybind/pybind11
The main is at: src/sim/main.cc. It calls:
Tested at gem5 b4879ae5b0b6644e6836b0881e4da05c64a6550d.
m5.objects module
All SimObjects seem to be automatically added to the m5.objects namespace, and this is done in a very convoluted way. Let’s try to understand it a bit:
gem5 is an event based simulator, and as such the event queue is one of the crucial elements in the system.
Then, once we had that, the most perfect thing ever would be to make the full event graph containing which events schedule which events!
Let’s now analyze every single event on a minimal gem5 syscall emulation mode in the simplest CPU that we have:
Tested in gem5 12c917de54145d2d50260035ba7fa614e25317a3.
Let’s have a closer look at the initial magically scheduled events of the simulation.
Inside AtomicSimpleCPU::tick() we saw previously that the reschedule happens at:
It will be interesting to see how AtomicSimpleCPU makes memory access on GDB and to compare that with TimingSimpleCPU.
Happens on EmulationPageTable, and seems to happen atomically without making any extra memory requests.
Now, let’s move on to TimingSimpleCPU, which is just like AtomicSimpleCPU internally, but now the memory requests don’t actually finish immediately: gem5 CPU types!
Schedules TimingSimpleCPU::fetch through:
Backtrace:
This is just the startup of the second rank, see: TimingSimpleCPU analysis #1.
From the timing we know what that one is: the end of time exit event, like for AtomicSimpleCPU.
Executes TimingSimpleCPU::fetch().
Schedules DRAMCtrl::processNextReqEvent through:
Schedules BaseXBar::Layer::releaseLayer through:
Executes DRAMCtrl::processNextReqEvent.
Schedules DRAMCtrl::Rank::processActivateEvent through:
Schedules DRAMCtrl::processRespondEvent through:
Schedules DRAMCtrl::processNextReqEvent through:
Executes DRAMCtrl::Rank::processActivateEvent.
Schedules DRAMCtrl::Rank::processPowerEvent through:
Executes DRAMCtrl::Rank::processPowerEvent.
Executes BaseXBar::Layer<SrcType, DstType>::releaseLayer.
Executes DRAMCtrl::processNextReqEvent().
Executes DRAMCtrl::processRespondEvent().
Schedules PacketQueue::processSendEvent() through:
Executes PacketQueue::processSendEvent().
Schedules PacketQueue::processSendEvent through:
Schedules BaseXBar::Layer<SrcType, DstType>::releaseLayer through:
Executes BaseXBar::Layer<SrcType, DstType>::releaseLayer.
Executes PacketQueue::processSendEvent.
Schedules TimingSimpleCPU::IcachePort::ITickEvent::process() through:
Executes TimingSimpleCPU::IcachePort::ITickEvent::process().
Schedules DRAMCtrl::processNextReqEvent through:
Schedules BaseXBar::Layer<SrcType, DstType>::releaseLayer through:
Executes DRAMCtrl::processNextReqEvent.
Schedules DRAMCtrl::processRespondEvent().
One important thing we want to check now is how the memory reads are going to make the processor stall in the middle of an instruction.
Let’s just add --caches to gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis to see if things go any faster, and add Cache to --trace as in:
At 1000, the future event is executed, and so it reads the original packet from the MSHR, and uses that to create a new request [40:7f] which gets forwarded.
MOESI cache coherence protocol: https://github.com/gem5/gem5/blob/9fc9c67b4242c03f165951775be5cd0812f2a705/src/mem/cache/cache_blk.hh#L352
The actual representation is done via separate state bits: https://github.com/gem5/gem5/blob/9fc9c67b4242c03f165951775be5cd0812f2a705/src/mem/cache/cache_blk.hh#L66 and MOESI appears explicitly only on the pretty printing.
This pretty printing appears for example in the --trace Cache lines as shown at gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and with a few more transitions visible at Section 23.22.4.4, “gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs”.
This pretty printing appears for example in the --trace Cache lines as shown at gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and with a few more transitions visible at Section 24.22.4.4, “gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs”.
It would be amazing to analyze a simple example with interconnect packets possibly invalidating caches of other CPUs.
Now let’s do the exact same we did for gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs, but with Ruby rather than the classic system and TimingSimpleCPU (atomic does not work with Ruby)
The events for the Atomic CPU were pretty simple: basically just ticks.
TODO like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard but with the hazard.
Like gem5 event queue MinorCPU syscall emulation freestanding example analysis, but even more complex, since it is for the gem5 DerivO3CPU!
This section and children are tested at LKMC 144a552cf926ea630ef9eadbb22b79fe2468c456.
Let’s have a look at the arguably simplest example userland/arch/aarch64/freestanding/linux/hazardless.S.
Now let’s do the same as in gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazardless but with a hazard: userland/arch/aarch64/freestanding/linux/hazard.S.
Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: hazard but now with an LDR stall: userland/arch/aarch64/freestanding/linux/stall.S.
Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall but now with an LDR stall: userland/arch/aarch64/freestanding/linux/stall_gain.S.
Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall_gain but now with some dependencies after the LDR: userland/arch/aarch64/freestanding/linux/stall_hazard4.S.
Now let’s try to see some Speculative execution in action with userland/arch/aarch64/freestanding/linux/speculative.S.
This is one of the parts of gem5 that rely on semi-useless code generation inside the .isa sublanguage.
We also notice that the key argument passed to those instructions is of type ExecContext, which is discussed further at: Section 23.22.6.3, “gem5 ExecContext”.
We also notice that the key argument passed to those instructions is of type ExecContext, which is discussed further at: Section 24.22.6.3, “gem5 ExecContext”.
The file is an include so that compilation can be split up into chunks by the autogenerated includers
@@ -28514,7 +28546,7 @@ namespace ArmISAInst {
Tested in gem5 b1623cb2087873f64197e503ab8894b5e4d4c7b4.
execute vs initiateAcc vs completeAcc
These are the key methods defined in instruction definitions, so let’s see when each one gets called and, more or less, what they do.
This can be seen concretely in GDB from the analysis done at: TimingSimpleCPU analysis: LDR stall and for more memory details see gem5 functional vs atomic vs timing memory requests.
completeAcc
completeAcc is boring on most simple store memory instructions, e.g. a simple STR:
Some gem5 instructions break down into multiple microops.
ThreadContext vs ThreadState vs ExecContext vs Process
These classes get used everywhere, and they have a somewhat convoluted relation with one another, so let’s figure out this mess.
This section and all children tested at gem5 b1623cb2087873f64197e503ab8894b5e4d4c7b4.
ThreadContext
As we delve into more details below, we will reach the following conclusion: a ThreadContext represents one thread of a CPU with multiple Hardware threads.
Essentially all methods of the base ThreadContext are pure virtual.
SimpleThread
SimpleThread storage defined on BaseSimpleCPU for simple CPUs like AtomicSimpleCPU:
O3ThreadContext
Instantiation happens in the FullO3CPU constructor:
ThreadState
Owned one per ThreadContext.
ExecContext
ExecContext gets used in gem5 instruction definitions, e.g.:
This makes sense, since each ThreadContext represents one CPU register set, and therefore needs a separate ExecContext which allows instruction implementations to access those registers.
ExecContext::readIntRegOperand register resolution
Let’s have a look at how ExecContext::readIntRegOperand actually matches registers to decoded register IDs, since it is not obvious.
First, we guess that they must be related to the reading of x1 and x2, which are the inputs of the addition.
Next, we also guess that the 0 read must correspond to x2, since it later gets potentially shifted as mentioned at Section 29.4.4.1, “ARM shift suffixes”.
Next, we also guess that the 0 read must correspond to x2, since it later gets potentially shifted as mentioned at Section 30.4.4.1, “ARM shift suffixes”.
Let’s also have a look at the decoder code that builds the instruction instance in build/ARM/arch/arm/generated/decoder-ns.cc.inc:
Process
The Process class is used only for gem5 syscall emulation mode, and it represents a process like a Linux userland process, in addition to any further gem5 specific data needed to represent the process.
Each instruction is marked with a class, and each class can execute in a given functional unit.
MinorCPU default functional units
Which units are available is visible for example on the gem5 config.ini of a gem5 MinorCPU run. Functional units are not present in simple CPUs like gem5 TimingSimpleCPU.
On gem5 3ca404da175a66e0b958165ad75eb5f54cb5e772, after running:
gem5 uses a ton of code generation, which makes the project horrendous:
But it has been widely overused to insanity. It likely also exists partly because when the project started in 2003 C++ compilers weren’t that good, so you couldn’t rely on features like templates that much.
Generated code at: build/<ISA>/config/the_isa.hh which e.g. for ARM contains:
gem5 moves a bit slowly, and if your host compiler is very new, the gem5 build might be broken for it, e.g. this was the case for Ubuntu 19.10 with GCC 9 and gem5 62d75e7105fe172eb906d4f80f360ff8591d4178 from Dec 2019.
E.g. src/cpu/decode_cache.hh includes:
Upstream request: https://gem5.atlassian.net/browse/GEM5-469
Buildroot is a set of Make scripts that download and compile from source compatible versions of:
Linux kernel
C standard library: Buildroot supports several implementations, see: Section 25.10, “libc choice”
+C standard library: Buildroot supports several implementations, see: Section 26.10, “libc choice”
BusyBox: provides the shell and basic command line utilities
@@ -30155,7 +30187,7 @@ gensim/models/armv8/isa.ac
It therefore produces a pristine, blob-less, debuggable setup, where all moving parts are configured to work perfectly together.
Perhaps the awesomeness of Buildroot only sinks in once you notice that all it takes is 4 commands as explained at Section 25.11, “Buildroot hello world”.
+Perhaps the awesomeness of Buildroot only sinks in once you notice that all it takes is 4 commands as explained at Section 26.11, “Buildroot hello world”.
The downsides of Buildroot are:
@@ -30203,7 +30235,7 @@ gensim/models/armv8/isa.ac
We provide the following mechanisms:
The clean is necessary because the source files didn’t change, so make would just check the timestamps and not build anything.
You will then likely want to make those more permanent as explained at: Section 37.4, “Default command line arguments”.
+You will then likely want to make those more permanent as explained at: Section 38.4, “Default command line arguments”.
If you are benchmarking compiled programs instead of hand written assembly, remember that we configure Buildroot to disable optimizations by default with:
if you already have a full -O0 build, you can choose to rebuild just your package of interest to save some time as described at: Section 25.2, “Custom Buildroot configs”
if you already have a full -O0 build, you can choose to rebuild just your package of interest to save some time as described at: Section 26.2, “Custom Buildroot configs”
./build-buildroot \
@@ -30292,7 +30324,7 @@ gensim/models/armv8/isa.ac
Maybe you can get away with rebuilding libc, but I’m not sure that it will work properly.
Kernel-wise it should be fine though as mentioned at: Section 2.1.2, “Disable kernel compiler optimizations”
+Kernel-wise it should be fine though as mentioned at: Section 3.1.2, “Disable kernel compiler optimizations”
make menuconfig is a convenient way to find Buildroot configurations:
At startup, we login automatically as the root user.
Replace on inittab:
These are your options:
First, see if you can’t get away without actually adding a new package, for example:
if you have a standalone C file with no dependencies besides the C standard library to be compiled with GCC, just add a new file under buildroot_packages/sample_package and you are done
if you have a dependency on a library, first check if Buildroot doesn’t have a package for it already with ls buildroot/package. If yes, just enable that package as explained at: Section 25.2, “Custom Buildroot configs”
if you have a dependency on a library, first check if Buildroot doesn’t have a package for it already with ls buildroot/package. If yes, just enable that package as explained at: Section 26.2, “Custom Buildroot configs”
If none of those methods are flexible enough for you, you can just fork or hack up the sample package buildroot_packages/sample_package to do what you want.
For how to use that package, see: Section 37.15.2, “buildroot_packages directory”.
+For how to use that package, see: Section 38.15.2, “buildroot_packages directory”.
Then iterate trying to do what you want and reading the manual until it works: https://buildroot.org/downloads/manual/manual.html
@@ -30485,7 +30517,7 @@ make menuconfig
Once you’ve built a package into the image, there is no easy way to remove it.
When adding a new large package to the Buildroot root filesystem, it may fail with the message:
libguestfs: https://serverfault.com/questions/246835/convert-directory-to-qemu-kvm-virtual-disk-image/916697#916697, in particular vfs-minimum-size
use methods described at: Section 23.6.3, “gem5 checkpoint restore and run a different script” instead of putting builds on the root filesystem
+use methods described at: Section 24.6.3, “gem5 checkpoint restore and run a different script” instead of putting builds on the root filesystem
Bibliography: https://stackoverflow.com/questions/49211241/is-there-a-way-to-automatically-detect-the-minimum-required-br2-target-rootfs-ex
SquashFS creation with mksquashfs does not take fixed sizes, and I have successfully booted from it, but it is readonly, which is unacceptable.
Buildroot is not designed for large root filesystem images, and the rebuild becomes very slow when we add a large package to it.
When asking for help on upstream repositories outside of this repository, you will need to provide the commands that you are running in detail without referencing our scripts.
Then, you will also want to do a Bisection to pinpoint the exact commit to blame, and CC that developer.
Finally, provide the images you used to save upstream developers' time, as shown at: Section 37.19.2, “release-zip”.
+Finally, provide the images you used to save upstream developers' time, as shown at: Section 38.19.2, “release-zip”.
For Buildroot problems, you should either provide the config you have:
@@ -30675,7 +30707,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
Buildroot supports several libc implementations, including:
One "downside" of glibc is that it exercises much more kernel functionality on its more bloated pre-main init, which breaks user mode C hello worlds more often, see: Section 10.4, “User mode simulation with glibc”. I quote "downside" because glibc is actually exposing emulator bugs which we should actually go and fix.
+One "downside" of glibc is that it exercises much more kernel functionality on its more bloated pre-main init, which breaks user mode C hello worlds more often, see: Section 11.4, “User mode simulation with glibc”. I quote "downside" because glibc is actually exposing emulator bugs which we should actually go and fix.
This repo doesn’t do much more other than setting a bunch of Buildroot configurations and building it.
x86_64 X11 https://unix.stackexchange.com/questions/70931/how-to-install-x11-on-my-own-linux-buildroot-system/306116#306116 Also mentioned at: Section 13.4, “X11 Buildroot”.
+x86_64 X11 https://unix.stackexchange.com/questions/70931/how-to-install-x11-on-my-own-linux-buildroot-system/306116#306116 Also mentioned at: Section 14.4, “X11 Buildroot”.
Users of this repo will often want to update the compilation toolchain to the latest version to get fresh new features like new ISA instructions.
In this section we cover the most common cases.
This is of course the simplest case.
Now it gets fun, but well, guess what, we will try to do the same as Section 25.12.1, “Update GCC: GCC supported by Buildroot” but:
+Now it gets fun, but well, guess what, we will try to do the same as Section 26.12.1, “Update GCC: GCC supported by Buildroot” but:
By default, our build system uses build-linux, and the Buildroot kernel build is disabled: https://stackoverflow.com/questions/52231793/can-buildroot-build-the-root-filesystem-without-building-the-linux-kernel
This section documents our test and educational userland content, such as C, C++ and POSIX examples, present mostly under userland/.
Getting started at: Section 1.8, “Userland setup”
+Getting started at: Section 2.8, “Userland setup”
Userland assembly content is located at: Section 27, “Userland assembly”. It was split from this section basically because we were hitting the HTML h6 limit, stupid web :-)
Userland assembly content is located at: Section 28, “Userland assembly”. It was split from this section basically because we were hitting the HTML h6 limit, stupid web :-)
This content makes up the bulk of the userland/ directory.
The quickest way to run the arch agnostic examples, which comprise the majority of the examples, is natively as shown at: Section 1.8.2.1, “Userland setup getting started natively”
+The quickest way to run the arch agnostic examples, which comprise the majority of the examples, is natively as shown at: Section 2.8.2.1, “Userland setup getting started natively”
This section was originally moved in here from: https://github.com/cirosantilli/cpp-cheat
Programs under userland/c/ are examples of ANSI C programming:
Allocate memory! Vs using the stack: https://stackoverflow.com/questions/4584089/what-is-the-function-of-the-push-pop-instructions-used-on-registers-in-x86-ass/33583134#33583134
malloc leads to the infinite joys of Memory leaks.
TODO: the exact answer is going to be hard.
General overview at: https://stackoverflow.com/questions/2798330/maximum-memory-which-malloc-can-allocate
If we start using the pages, the OOM killer would sooner or later step in and kill our process: Linux out-of-memory killer.
We can observe the OOM in LKMC 1e969e832f66cb5a72d12d57c53fb09e9721d589 which defaults to 256MiB of memory with:
Added in C11!
Example: userland/gcc/empty_struct.c
GCC implements the OpenMP multithreading API: https://stackoverflow.com/questions/3949901/pthreads-vs-openmp
Programs under userland/cpp/ are examples of ISO C++ programming.
Like for C, you have to pay for the standards… insane. So we just use the closest free drafts instead.
OMG this is hell, understand when primitive variables are initialized or not:
The smallest data race we managed to come up with as of LKMC 7c01b29f1ee7da878c7cc9cb4565f3f3cf516a92 and gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1 was with userland/c/atomic.c (see also C multithreading):
decltype
Under: userland/libs directory.
Programs under userland/posix/ are examples of POSIX C programming.
POSIX C example that prints all environment variables: userland/posix/environ.c
POSIX' multiprocess API. Contrast with pthreads which are for threads.
Read the source comments and understand everything that is going on!
The minimal interesting example is to use fork and observe different PIDs.
POSIX' multithreading API. Contrast with fork which is for processes.
userland/posix/pthread_count.c exemplifies the functions:
The mmap system call allows advanced memory operations.
Basic mmap example, do the same as userland/c/malloc.c, but with mmap.
Memory mapped file example: userland/posix/mmap_file.c
A bit like read and write, but from / to the Internet!
The following sections are related to multithreading in userland:
Let’s group the hard-to-debug undefined-behaviour-like stuff found in C / C++ here, and how to tackle those problems.
Maybe some day someone will use this setup to study the performance of interpreters.
rootfs_overlay/lkmc/python/unittest_find/ contains examples to test how tests are found by unittest within directories. Related questions:
rootfs_overlay/lkmc/python/relative_import/ contains examples to test how to do relative imports in Python.
Buildroot has a Python package that can be added to the guest image:
At LKMC 50ac89b779363774325c81157ec8b9a6bdb50a2f gem5 390a74f59934b85d91489f8a563450d8321b602da:
Here we will add some better examples and explanations for: https://docs.python.org/3/extending/embedding.html#very-high-level-embedding
Overviews:
Illustrates how to add extra non-code data files to an NPM package, and then use those files at runtime.
These are good targets for performance analysis with gem5, and there is some overlap between this section and Benchmarks.
TODO: move benchmark graph from userland/cpp/bst_vs_heap_vs_hashmap.cpp to userland/algorithm/set.
The cache sizes were chosen to match the host 2017 Lenovo ThinkPad P51 to improve the comparison. Ideally we should also use the same standard library.
Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 23.10.3.2, “gem5 only dump selected stats”
+Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 24.10.3.2, “gem5 only dump selected stats”
Sources:
@@ -33382,7 +33414,7 @@ xdg-open bst_vs_heap_vs_hashmap_gem5.tmp.png
Buildroot supports it, which makes everything just trivial:
Header only linear algebra library with a mainline Buildroot package:
These are good targets for performance analysis with gem5.
It eventually has to come to that, hasn’t it?
Tests under userland/libs require certain optional libraries to be installed on the target, and are not built or tested by default: you must enable them with either:
The following basenames should always refer to programs that do the same thing, but in different languages:
Programs under userland/arch/<arch>/ are examples of userland assembly programming.
Like other userland programs, these programs can be run as explained at: Section 1.8, “Userland setup”.
+Like other userland programs, these programs can be run as explained at: Section 2.8, “Userland setup”.
As a quick reminder, the fastest setups to get started are:
@@ -33663,7 +33695,7 @@ cd userland/libs/eigenHowever, as usual, it is saner to build your toolchain as explained at: Section 10.1, “QEMU user mode getting started”.
+However, as usual, it is saner to build your toolchain as explained at: Section 11.1, “QEMU user mode getting started”.
The first examples you should look into are:
@@ -33716,7 +33748,7 @@ cd userland/libs/eigenregisters, see: Section 27.1, “Assembly registers”
+registers, see: Section 28.1, “Assembly registers”
jumping:
@@ -33859,14 +33891,14 @@ error: asm_main returned 1 at line 8After seeing an ADD hello world, you need to learn the general registers:
x86, see: Section 28.1, “x86 registers”
+x86, see: Section 29.1, “x86 registers”
arm
@@ -33897,7 +33929,7 @@ error: asm_main returned 1 at line 8Bibliography: ARMv7 architecture reference manual A2.3 "ARM core registers".
Example: userland/arch/aarch64/x31.S
Keep in mind that many ISAs started floating point as an optional thing, and it later got better integrated into the main CPU, side by side with SIMD.
Much like ADD for non-SIMD, start learning SIMD instructions by looking at the integer and floating point SIMD ADD instructions of each ISA:
Bibliography: https://stackoverflow.com/questions/1389712/getting-started-with-intel-x86-sse-simd-instructions/56409539#56409539
Fused multiply add:
By "userland assembly", we mean "the parts of the ISA which can be freely used from userland".
One big difference between the two is that we can run userland assembly on the Userland setup, which is easier to get running and to debug.
In particular, most userland assembly examples link to the C standard library, see: Section 27.5, “Userland assembly C standard library”.
+In particular, most userland assembly examples link to the C standard library, see: Section 28.5, “Userland assembly C standard library”.
Userland assembly is generally simpler, and a pre-requisite for Baremetal setup.
System-land assembly cheats will be put under: Section 1.9, “Baremetal setup”.
+System-land assembly cheats will be put under: Section 2.9, “Baremetal setup”.
All examples except the Freestanding programs link to the C standard library.
Unlike most our other assembly examples, which use the C standard library for portability, examples under freestanding/ directories don’t link to the C standard library:
Assembly examples under nostartfiles directories can use the standard library, but they don’t use the pre-main boilerplate and start directly at our explicitly given _start:
Examples under arch/<arch>/c/ directories show how to use inline assembly from higher level languages such as C:
Used notably in some of the Linux system calls setups:
In arm, it is the only way to achieve this effect: https://stackoverflow.com/questions/10831792/how-to-use-specific-register-in-arm-inline-assembler
This feature notably useful for making system calls from C, see: Section 27.7, “Linux system calls”.
+This feature is notably useful for making system calls from C, see: Section 28.7, “Linux system calls”.
How to use temporary registers in inline assembly:
An example of using the & early-clobber modifier: userland/arch/aarch64/earlyclobber.c
Not documented as of GCC 8.2, but possible: https://stackoverflow.com/questions/53960240/armv8-floating-point-output-inline-assembly
Pre-existing C wrappers around inline assembly, which is what production programs should use for SIMD instead of writing inline assembly themselves:
Good official cheatsheet with all intrinsics and what they expand to: https://software.intel.com/sites/landingpage/IntrinsicsGuide
The following Userland setup programs illustrate how to make system calls:
This is how threads either:
The best article to understand spinlocks is: https://eli.thegreenplace.net/2018/basics-of-futexes/
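A minimal sketch of the two basic futex operations, assuming Linux. glibc has traditionally not exposed a futex() wrapper, so this goes through syscall(2). The helper names `futex_wait` and `futex_wake` are hypothetical. The key property is that FUTEX_WAIT only sleeps if the word still holds the expected value, and the kernel checks this atomically. That is what prevents a wake from being lost between the userland check and the sleep.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: sleep while *addr == expected. If the value already
 * differs, the call fails immediately with errno EAGAIN. */
long futex_wait(uint32_t *addr, uint32_t expected) {
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected, NULL, NULL, 0);
}

/* Hypothetical helper: wake up to n threads sleeping on addr.
 * Returns the number of threads actually woken. */
long futex_wake(uint32_t *addr, int n) {
    return syscall(SYS_futex, addr, FUTEX_WAKE_PRIVATE, n, NULL, NULL, 0);
}
```

A mutex built on top of this would typically set the word, then only call futex_wait when the lock is contended, and futex_wake on unlock when there may be waiters.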
getcpu system call and the sched_getaffinity glibc wrapper
Examples:
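A minimal sketch, assuming glibc with _GNU_SOURCE. sched_getcpu() is the glibc function usually used to get the same information as the getcpu system call: the CPU the caller is running on right now. sched_getaffinity() fills a cpu_set_t with the CPUs the caller is allowed to run on. The wrapper names `current_cpu` and `allowed_cpu_count` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Hypothetical wrapper: CPU the calling thread is running on. The answer
 * may already be stale when it returns, since the scheduler can migrate us. */
int current_cpu(void) {
    return sched_getcpu();
}

/* Hypothetical wrapper: number of CPUs the calling thread may run on,
 * or -1 on error. pid 0 means "the calling thread". */
int allowed_cpu_count(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) == -1)
        return -1;
    return CPU_COUNT(&set);
}
```

Running such a program under e.g. `taskset -c 1` should restrict the affinity mask to a single CPU.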
perf_event_open system call
A summary of results is shown at: Table 3, “Summary of Linux calling conventions for several architectures”.
Examples:
Call C standard library functions from assembly and vice versa.
GNU GAS is the default assembler used by GCC, and therefore it completely dominates in Linux.
The Linux kernel in particular uses GNU GAS assembly extensively for the arch specific parts under arch/.
In this tutorial, we use exclusively C Preprocessor /**/ comments because:
Summary:
Let’s see how many bytes go into each data type:
There are two types of ARMv7 assemblies:
cannot have implicit destination with shift, see: Section 29.4.4.1, “ARM shift suffixes”
+cannot have implicit destination with shift, see: Section 30.4.4.1, “ARM shift suffixes”
When reading disassembly, many instructions have either a .n or .w suffix.
Arch agnostic infrastructure getting started at: Section 27, “Userland assembly”.
+Arch agnostic infrastructure getting started at: Section 28, “Userland assembly”.
userland/arch/x86_64/registers.S
Example: userland/arch/x86_64/address_modes.S
5.1.1 "Data Transfer Instructions"
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 7.3.1.2 "Exchange Instructions":
TODO: concrete multi-thread GCC inline assembly examples of how all those instructions are normally used as synchronization primitives.
Examples:
This is partly why the ternary ? C operator exists: https://stackoverflow.com/questions/3565368/ternary-operator-vs-if-else
It is interesting to compare this with ARMv7 conditional execution: which is available for all instructions, as shown at: Section 29.2.5, “ARM conditional execution”.
+It is interesting to compare this with ARMv7 conditional execution, which is available for most instructions, as shown at: Section 30.2.5, “ARM conditional execution”.
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.2 "Binary Arithmetic Instructions":
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.4 "Logical Instructions"
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.5 "Shift and Rotate Instructions"
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.6 "Bit and Byte Instructions"
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.7 "Control Transfer Instructions"
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.8 "String Instructions"
However, as computer architecture evolved, those instructions might not offer considerable speedups anymore, and modern glibc versions such as 2.29 just use x86 SIMD operations instead, see also: https://stackoverflow.com/questions/33480999/how-can-the-rep-stosb-instruction-execute-faster-than-the-equivalent-loop
Example: userland/arch/x86_64/rep.S
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.13 "Miscellaneous Instructions"
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.1.15 Random Number Generator Instructions
RDRAND sets the carry flag when data is ready so we must loop if the carry flag isn’t set.
Example: userland/arch/x86_64/cpuid.S
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.2 "X87 FPU INSTRUCTIONS"
Parent section: Section 27.3, “SIMD assembly”
+Parent section: Section 28.3, “SIMD assembly”
History:
@@ -36502,12 +36534,12 @@ pop %rbpIntel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.5 "SSE INSTRUCTIONS"
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.5.1.1 "SSE Data Transfer Instructions"
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.5.1.2 "SSE Packed Arithmetic Instructions"
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.5.1.6 "SSE Conversion Instructions"
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.6 "SSE2 INSTRUCTIONS"
userland/arch/x86_64/paddq.S: PADDQ, PADDD, PADDW, PADDB
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.15 "FUSED-MULTIPLY-ADD (FMA)"
Intel 64 and IA-32 Architectures Software Developer’s Manuals Volume 1 5.20 "SYSTEM INSTRUCTIONS"
Sources:
RDTSCP is like RDTSC, but it also stores the CPU ID into ECX: this is convenient because the value of RDTSC depends on which core we are currently on, so you often also want the core ID when you want the RDTSC.
Inline assembly example at: userland/cpp/atomic/x86_64_lock_inc.cpp, see also: atomic.cpp.
We are using the May 2019 version unless otherwise noted.
Also I can’t find older versions on the website easily, so I just web archive everything.
Arch general getting started at: Section 27, “Userland assembly”.
+Arch general getting started at: Section 28, “Userland assembly”.
Instructions here loosely grouped based on that of the ARMv7 architecture reference manual Chapter A4 "The Instruction Sets".
@@ -36812,7 +36844,7 @@ taskset -c 1 ./userland/arch/x86_64/rdtscp.out | tail -n 1We cover here mostly ARMv7, and then treat aarch64 differentially, since much of the ARMv7 userland is the same in aarch32.
The ARM architecture has been used on the vast majority of mobile phones in the 2010’s, and on a large fraction of microcontrollers.
ARM Holdings was bought by the Japanese giant SoftBank in 2016.
ARMv7 is the older architecture described at: ARMv7 architecture reference manual.
They are described at: ARMv8 architecture reference manual A1.7 "ARMv8 architecture extensions".
32-bit mode of operation of ARMv8.
A great summary of differences can be found at: https://en.wikipedia.org/wiki/ARM_architecture#AArch64_features
aarch32 has two encodings: Thumb and ARM: Section 29.1.3, “ARM instruction encodings”
+aarch32 has two encodings: Thumb and ARM: Section 30.1.3, “ARM instruction encodings”
in ARMv8, the stack can be enforced to 16-byte alignment: Section 29.3.2.2.1, “ARMV8 aarch64 stack alignment”
+in ARMv8, the stack can be enforced to 16-byte alignment: Section 30.3.2.2.1, “ARMV8 aarch64 stack alignment”
The ARM instruction set is itself protected by patents / copyright / whatever, and you have to pay ARM Holdings a licence to implement it, even if you are creating your own custom Verilog code.
Understanding the basics of instruction encodings is fundamental to help you to remember what instructions do and why some things are possible or not, notably the ARM LDR pseudo-instruction and the ADRP instruction.
Thumb examples are available at:
ARM can switch between big and little endian mode on the fly!
Unconditional branch.
Branch if equal based on the status registers.
Branch with link, i.e. branch and store the return address in the LR register.
Example: userland/arch/aarch64/ret.S
Compare and branch if zero.
Weirdly, ARM B instruction and family are not the only instructions that can execute conditionally on the flags: the same also applies to most instructions, e.g. ADD.
In ARM, there are only two instruction families that do memory access:
LDR can be either a regular instruction that loads stuff from memory into a register, or also a pseudo-instruction (assembler magic): http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0041c/Babbfdih.html
Example: userland/arch/arm/address_modes.S
As an application of the post-indexed addressing mode, let’s increment an array.
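For comparison, here is a C sketch of the same loop. The hypothetical `increment_array` uses `*p++ = ...`, which stores to the current element and then advances the pointer. That is what a post-indexed store like `str r1, [r0], #4` does in a single instruction. Whether a compiler actually emits post-indexed addressing for this code depends on the compiler and the optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: add 1 to each of the n elements of a. */
void increment_array(int *a, size_t n) {
    int *p = a;
    int *end = a + n;
    while (p < end) {
        int v = *p;   /* roughly: ldr r1, [r0]                           */
        *p++ = v + 1; /* roughly: add, then str r1, [r0], #4: store, then r0 += 4 */
    }
}
```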
There are LDR variants that load less than full 4 bytes:
Store from registers into memory.
PC-relative STR is not possible in aarch64.
Push a pair of registers to the stack.
In ARMv8, the stack can be enforced to 16-byte alignment.
Pop values from the stack into the registers and optionally update the address register.
Arithmetic:
Example: userland/arch/aarch64/cset.S
Bitwise Bit Clear: clear some bits.
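In C terms, BIC computes the first operand AND NOT the second one. A tiny model follows. The function name `bic` is only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of ARM BIC: every bit set in mask is cleared in x. */
uint32_t bic(uint32_t x, uint32_t mask) {
    return x & ~mask;
}
```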
Unsigned Bitfield Move.
TODO: explain full behaviour. Very complicated. Has several simpler to understand aliases.
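While the full UBFM semantics are still a TODO here, its common alias UBFX is easy to model in C: it extracts a bitfield and zero-extends it. This sketch only covers the alias, not UBFM in general, and `ubfx` is a hypothetical helper name. The LSR immediate alias, for instance, corresponds to `ubfx(x, n, 64 - n)`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the aarch64 UBFX alias: extract width bits of x
 * starting at bit lsb, zero extended. Assumes 1 <= width and
 * lsb + width <= 64, as the instruction encoding requires. */
uint64_t ubfx(uint64_t x, unsigned lsb, unsigned width) {
    assert(width >= 1 && lsb + width <= 64);
    /* Avoid the undefined shift by 64 when the field is the whole register. */
    uint64_t mask = width == 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
    return (x >> lsb) & mask;
}
```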
Alias for:
TODO: explain. Similar to UBFM but leave untouched bits unmodified.
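To see the difference with UBFM, here is a C model of the BFI alias of BFM. It inserts the field and keeps every other destination bit. `bfi` is a hypothetical helper name, and only the alias is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the aarch64 BFI alias: insert the low width bits of
 * src into dst at bit lsb, leaving all other bits of dst unchanged.
 * Assumes 1 <= width and lsb + width <= 64. */
uint64_t bfi(uint64_t dst, uint64_t src, unsigned lsb, unsigned width) {
    assert(width >= 1 && lsb + width <= 64);
    uint64_t mask = width == 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
    return (dst & ~(mask << lsb)) | ((src & mask) << lsb);
}
```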
Examples:
Move an immediate to a register, or a register to another register.
Cannot load from or to memory, since only the LDR and STR instruction families can do that in ARM as mentioned at: Section 29.3, “ARM load and store instructions”.
+Cannot load from or to memory, since only the LDR and STR instruction families can do that in ARM as mentioned at: Section 30.3, “ARM load and store instructions”.
Example: userland/arch/arm/mov.S
@@ -37915,7 +37947,7 @@ ldmia sp!, reglistAssemblers however support magic memory allocations which may hide what is truly going on: https://stackoverflow.com/questions/14046686/why-use-ldr-over-mov-or-vice-versa-in-arm-assembly Always ask your friendly disassembly for a good confirmation.
Set the higher or lower 16 bits of a register to an immediate in one go.
Fill a 64 bit register with 4 16-bit instructions one at a time.
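Here is a C model of the usual sequence. One MOVZ sets the first 16-bit chunk and zeroes the rest, then each MOVK replaces one more chunk while keeping the others. `movz` and `movk` are hypothetical helper names that model the instructions. The shift is restricted to 0, 16, 32 or 48, as in the 64-bit encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of MOVZ: Xd = imm16 << shift, all other bits zero. */
uint64_t movz(uint16_t imm16, unsigned shift) {
    assert(shift % 16 == 0 && shift <= 48);
    return (uint64_t)imm16 << shift;
}

/* Hypothetical model of MOVK: replace the 16-bit chunk at shift in Xd
 * by imm16, keeping the other 48 bits. */
uint64_t movk(uint64_t xd, uint16_t imm16, unsigned shift) {
    assert(shift % 16 == 0 && shift <= 48);
    return (xd & ~(UINT64_C(0xFFFF) << shift)) | ((uint64_t)imm16 << shift);
}
```

For example, `movz(0xCDEF, 0)` followed by MOVKs of 0x89AB, 0x4567 and 0x0123 at shifts 16, 32 and 48 builds 0x0123456789ABCDEF.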
Set 16-bits negated and the rest to 1.
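The same kind of C model for MOVN, using the hypothetical helper `movn`. Note how `movn(0, 0)` gives all ones, i.e. -1. This is why assemblers typically encode `mov x0, #-1` as a MOVN.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of MOVN: Xd = NOT(imm16 << shift), so the chosen
 * 16-bit chunk gets the negated immediate and every other bit becomes 1. */
uint64_t movn(uint16_t imm16, unsigned shift) {
    assert(shift % 16 == 0 && shift <= 48);
    return ~((uint64_t)imm16 << shift);
}
```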
Most data processing instructions can also optionally shift the second register operand.
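Here is a C model of what `add r0, r1, r2, lsl #2` computes. The helper name `add_lsl` is hypothetical. The second register goes through the shifter before the addition, which is handy for array indexing such as base + i * 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of ADD with an LSL-shifted second operand.
 * Assumes shift < 32. */
uint32_t add_lsl(uint32_t r1, uint32_t r2, unsigned shift) {
    return r1 + (r2 << shift);
}
```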
Example: userland/arch/arm/s_suffix.S
Similar rationale to the ARM LDR pseudo-instruction: it makes it easy to get a PC-relative reachable address into a register in one go, overcoming the limits of the fixed 4-byte instruction size.
Parent section: Section 27.10, “NOP instructions”
+Parent section: Section 28.10, “NOP instructions”
There are a few different ways to encode NOP, notably MOV a register into itself, and a dedicated miscellaneous instruction.
@@ -38055,7 +38087,7 @@ ldmia sp!, reglistGuaranteed undefined! Therefore raise illegal instruction signal. Used by GCC __builtin_trap apparently: https://stackoverflow.com/questions/16081618/programmatically-cause-undefined-instruction-exception
Examples of using them can be found at: dump_regs
Each aarch64 system register is specified in the encoding of ARM system register instructions by 5 integer numbers:
Parent section: Section 27.3, “SIMD assembly”
+Parent section: Section 28.3, “SIMD assembly”
The name for the ARMv7 and AArch32 floating point and SIMD instructions / registers.
TODO example
userland/arch/arm/vadd_scalar.S: see also: Section 27.2, “Floating point assembly”
+userland/arch/arm/vadd_scalar.S: see also: Section 28.2, “Floating point assembly”
userland/arch/arm/vadd_vector.S: see also: Section 27.3, “SIMD assembly”
+userland/arch/arm/vadd_vector.S: see also: Section 28.3, “SIMD assembly”
Example: userland/arch/arm/vcvt.S
Example: userland/arch/arm/vcvtr.S
Example: userland/arch/arm/vcvt.S
The ARMv8 architecture reference manual specifies floating point and SIMD support in the main architecture at A1.5 "Advanced SIMD and floating-point support".
The Linux kernel shows /proc/cpuinfo compatibility as neon, which is yet another intermediate name that came up at some point, see: Section 29.6.2.2, “ARM NEON”.
+The Linux kernel shows /proc/cpuinfo compatibility as neon, which is yet another intermediate name that came up at some point, see: Section 30.6.2.2, “ARM NEON”.
Vs ARM VFP: https://stackoverflow.com/questions/4097034/arm-cortex-a8-whats-the-difference-between-vfp-and-neon
Support is semi-mandatory. ARMv8 architecture reference manual A1.5 "Advanced SIMD and floating-point support":
Just an informal name for the "Advanced SIMD instructions"? Very confusing.
TODO example.
userland/arch/aarch64/fadd_vector.S: see also: Section 27.3, “SIMD assembly”
+userland/arch/aarch64/fadd_vector.S: see also: Section 28.3, “SIMD assembly”
userland/arch/aarch64/fadd_scalar.S: see also: Section 27.2, “Floating point assembly”
+userland/arch/aarch64/fadd_scalar.S: see also: Section 28.2, “Floating point assembly”
It is very confusing, but FADDS and FADDD in Aarch32 are pre-UAL for vadd.f32 and vadd.f64 which we use in this tutorial, see: Section 29.6.1.2, “ARM VADD instruction”
+It is very confusing, but FADDS and FADDD in Aarch32 are pre-UAL for vadd.f32 and vadd.f64 which we use in this tutorial, see: Section 30.6.1.2, “ARM VADD instruction”
The same goes for most ARMv7 mnemonics: f* is old, and v* is the newer better syntax.
Also keep in mind that fused multiply add is FMADD.
Examples at: Section 27.3, “SIMD assembly”
+Examples at: Section 28.3, “SIMD assembly”
Example: userland/arch/aarch64/ld2.S
Scalable Vector Extension.
Using SVE normally requires setting the CPACR_EL1.FPEN and ZEN bits, which as of LKMC 29fd625f3fda79f5e0ee6cac43517ba74340d513 + 1 we also enable in our Baremetal bootloaders, see also: aarch64 baremetal NEON setup.
Get the SVE vector length. The following programs do that and print it to stdout:
ARMv8 architecture reference manual A1.7 "ARMv8 architecture extensions" says:
Parent section: Userland multithreading.
Parent section: atomic.cpp
Set of atomic and synchronization primitives added in ARMv8.1 architecture extension.
ARMv8 architecture reference manual db A1.7.3 "The ARMv8.1 architecture extension"
The PMU (Performance Monitor Unit) is a unit in the ARM CPU that counts performance events of interest. These can be used to benchmark, and sometimes debug, code running on ARM CPUs.
TODO We didn’t manage to find a working ARM analogue to x86 RDTSC instruction: kernel_modules/pmccntr.c is oopsing, and even if it weren’t, it likely wouldn’t give the cycle count since boot, since it needs to be activated before it starts counting anything:
Good getting started tutorials:
The official manuals were stored in http://infocenter.arm.com but as of 2017 they started to slowly move to https://developer.arm.com.
Bibliography: https://www.quora.com/Where-can-I-find-the-official-documentation-of-ARM-instruction-set-architectures-ISAs
ARM also releases documentation specific to each given processor.
https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
@@ -39133,7 +39165,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.Getting started at: Section 1.9, “Baremetal setup”
+Getting started at: Section 2.9, “Baremetal setup”
GDB step debug works on baremetal exactly as it does on the Linux kernel, which is described at: Section 2, “GDB step debug”.
+GDB step debug works on baremetal exactly as it does on the Linux kernel, which is described at: Section 3, “GDB step debug”.
Except that it is even cooler here, since we can easily control and understand every single instruction that is being run!
@@ -39240,7 +39272,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.As can be seen from Baremetal GDB step debug, all examples under baremetal/, with the exception of baremetal/arch/<arch>/no_bootloader, start from our tiny bootloaders:
the stack pointer
TODO: we don’t do this currently but maybe we should setup BSS
@@ -39304,7 +39336,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.For things to work in baremetal, we often have to layout memory in specific ways.
QEMU and gem5 currently support baremetal CLI arguments!
It is worth noting that e.g. ARM has a Semihosting mechanism for loading CLI arguments through SYS_GET_CMDLINE, but our mechanism works in principle for any ISA.
Currently not supported, so we just hardcode argc 0 on the arm baremetal bootloader.
Semihosting is a publicly documented interface specified by ARM Holdings that allows us to do some magic operations very useful in development, such as writing to the terminal or reading and writing host files.
For gem5, you need patches/manual/gem5-semihost.patch:
TODO: our example is printing newlines without automatic carriage return \r as in:
For arm, some baremetal examples compile fine with:
Didn’t get it working, tracking at: https://github.com/cirosantilli/linux-kernel-module-cheat/issues/119
It is incredible, but GDB also has a CPU simulator inside of it as documented at: https://sourceware.org/gdb/onlinedocs/gdb/Target-Commands.html
Since I had this compiled, I also decided to try it out on userland.
In this section we will focus on learning ARM architecture concepts that can only be learnt on baremetal setups.
Userland information can be found at: https://github.com/cirosantilli/arm-assembly-cheat
ARM exception levels are analogous to x86 rings.
According to ARMv7 architecture reference manual, access to that register is controlled by other registers NSACR.{CP11, CP10} and HCPTR so those must be turned off, but I’m too lazy to investigate now, and even just trying to dump those registers in userland/arch/arm/dump_regs.c also leads to exceptions…
TODO. Create a minimal runnable example of going into EL0 and jumping to EL1.
See ARMv8 architecture reference manual db D1.6.2 "The stack pointer registers".
This is the most basic example of exception handling we have.
The vector table format is described on ARMv8 architecture reference manual Table D1-7 "Vector offsets from vector table base address".
Exception Syndrome Register.
See example at: Section 32.10.2, “ARM SVC instruction”
+See example at: Section 33.10.2, “ARM SVC instruction”
Documentation: ARMv8 architecture reference manual db D12.2.36 "ESR_EL1, Exception Syndrome Register (EL1)".
Exception Link Register.
See the example at: Section 32.10.2, “ARM SVC instruction”
+See the example at: Section 33.10.2, “ARM SVC instruction”
Examples:
since gem5 is able to detect when nothing will ever happen, and exits.
When GDB step debugging, switch between cores with the usual thread commands, see also: Section 2.9, “GDB step debug multicore userland”.
+When GDB step debugging, switch between cores with the usual thread commands, see also: Section 3.9, “GDB step debug multicore userland”.
Bibliography: https://stackoverflow.com/questions/980999/what-does-multicore-assembly-language-look-like/33651438#33651438
The WFE and SEV instructions are just hints: a compliant implementation can treat them as NOPs.
For how userland spinlocks and mutexes are implemented see Userland mutex implementation.
Examples:
WFE and SEV are usable from userland, and are part of an efficient spinlock implementation. However, userland should arguably stay away from spinlocks altogether and just stick to mutexes, which are built on the futex system call and therefore allow for non-busy sleep.
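For reference, here is a portable C11 sketch of a test-and-set spinlock. It is an illustration under stated assumptions, not this repo's implementation: it busy-waits and does not use WFE/SEV. An aarch64-specific version could instead wait with WFE inside the loop. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical type: minimal test-and-set spinlock. */
typedef struct { atomic_flag flag; } spinlock;

void spin_init(spinlock *l) { atomic_flag_clear(&l->flag); }

/* test_and_set returns the previous value: false means we just took it. */
bool spin_trylock(spinlock *l) {
    return !atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire);
}

void spin_lock(spinlock *l) {
    while (!spin_trylock(l))
        ; /* busy wait: this is where an aarch64 version might execute WFE */
}

/* Release ordering makes writes done inside the critical section visible
 * to the next thread that acquires the lock. */
void spin_unlock(spinlock *l) {
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```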
gem5 390a74f59934b85d91489f8a563450d8321b602d does not sleep on the first WFE on either syscall emulation or full system, because the code does:
Can be used to implement atomic variables, see also:
In QEMU, CPU 1 starts in a halted state. This can be observed from GDB, where:
TODO: create and study a minimal examples in gem5 where the DMB instruction leads to less cycles: https://stackoverflow.com/questions/15491751/real-life-use-cases-of-barriers-dsb-dmb-isb-in-arm
The ARM timer is the simplest way to generate hardware interrupts periodically, and therefore serves as the simplest example of ARM GIC usage.
Generic Interrupt Controller.
TODO create a minimal working aarch64 example analogous to the x86 one at: https://github.com/cirosantilli/x86-bare-metal-examples/blob/6dc9a73830fc05358d8d66128f740ef9906f7677/paging.S
First, also consider the userland bibliography: Section 29.10, “ARM assembly bibliography”.
+First, also consider the userland bibliography: Section 30.10, “ARM assembly bibliography”.
The most useful ARM baremetal example sets we’ve seen so far are:
@@ -41136,7 +41168,7 @@ cntvct_el0 0x3CF516FIt is nice when thing just work.
But you can also learn a thing or two from how I actually made them work in the first place.
Enter the QEMU console:
Inside baremetal/lib/aarch64.S there is a chunk of code that enables floating point operations:
Baremetal tests work exactly like User mode tests, except that you have to add the --mode baremetal option, for example:
In baremetal, we detect if tests failed by parsing logs for the Magic failure string.
See: Section 37.16, “Test this repo” for more useful testing tips.
+See: Section 38.16, “Test this repo” for more useful testing tips.
Remember: Android AOSP is a huge undocumented piece of bloatware. Its integration into this repo will likely never be super good. See also: https://cirosantilli.com#android
@@ -41472,7 +41504,7 @@ ISBTested on: 8.1.0_r60.
Tested on: 8.1.0_r60.
From mount, we can see that some of the mounted images are ro.
When I install an app like F-Droid, it goes under /data according to:
I don’t know how to download files from the web on vanilla Android: the default browser does not download anything, and there is no wget:
For Linux in general, see: Section 6, “init”.
+For Linux in general, see: Section 7, “init”.
The /init executable interprets the /init.rc files, which is in a custom Android init system language: https://android.googlesource.com/platform/system/core/+/ee0e63f71d90537bb0570e77aa8a699cc222cfaf/init/README.md
TODO: didn’t fully port during refactor after 3b0a343647bed577586989fb702b760bd280844a. Reimplementing should not be hard.
@@ -41818,7 +41850,7 @@ cd -We have explored a few Continuous integration solutions.
We haven’t setup any of them yet.
We tried to automate it on Travis with .travis.yml but it hits the current 50 minute job timeout: https://travis-ci.org/cirosantilli/linux-kernel-module-cheat/builds/296454523 And I bet it would likely hit a disk maxout either way if it went on.
This setup successfully built gem5 on every commit: .circleci/config.yml
Run all kernel boot benchmarks for one arch:
TODO: at 62f6870e4e0b384c4bd2d514116247e81b241251 this takes 33 minutes to finish:
Let’s see how fast our simulators are running some well known or easy to understand userland benchmarks!
so ~ 110 million instructions / 100 seconds makes ~ 1 MIPS (million instructions per second).
This experiment also suggests that each loop is about 11 instructions long (110M instructions / 10M loops), which we confirm at Section 35.2, “C busy loop”, bingo!
+This experiment also suggests that each loop is about 11 instructions long (110M instructions / 10M loops), which we confirm at Section 36.2, “C busy loop”, bingo!
Then for QEMU, we experimentally turn the number of loops up to 10^10 loops (100000 100000), which contains an expected 11 * 10^10 instructions, and the runtime is 00:01:08, so we have 1.1 * 10^11 instructions / 68 seconds ~ 1.6 * 10^9 instructions per second, i.e. roughly 1600 MIPS!
We can then repeat the experiment for other gem5 CPUs to see how they compare.
Let’s see if user mode runs considerably faster than full system or not, ignoring the kernel boot.
First we build dhrystonee manually statically since dynamic linking is broken in gem5 as explained at: Section 10.7, “gem5 syscall emulation mode”.
+First we build dhrystone manually and statically, since dynamic linking is broken in gem5 as explained at: Section 11.7, “gem5 syscall emulation mode”.
gem5 user mode:
@@ -42419,7 +42451,7 @@ time \The build times are calculated after doing ./configure and make source, which downloads the sources, and basically benchmarks the Internet.
./build-buildroot -- graph-build graph-size graph-depends
@@ -42455,14 +42487,14 @@ xdg-open graph-size.pdf
The biggest build time hog is always GCC, and it does not look like we can use a precompiled one: https://stackoverflow.com/questions/10833672/buildroot-environment-with-host-toolchain
This is the minimal build we could expect to get away with.
How long it takes to build gem5 itself.
A profiling of the build has been done at: https://gem5.atlassian.net/browse/GEM5-277 Analysis there showed that d7d9bc240615625141cd6feddbadd392457e49eb (2018-06-17) is also composed of 50% pybind11 and with no obvious time sinks.
This is the critical development parameter, and is dominated by the link time of huge binaries.
Serial number: TYPE 20HH-CTO1WW S/N PF-0V5V5N 17/11
Bought: 2017 for approximately 2400 pounds.
Nominal speed: 2400 Mbps
PCIe TLC OPAL2.
1TB.
2c12b21b304178a81c9912817b782ead0286d282:
gem5:
Argh, compilers are boring, let’s learn a bit about them.
In gem5, can be seen on:
As mentioned at: https://stackoverflow.com/questions/10074831/what-is-general-difference-between-superscalar-and-ooo-execution it is in theory possible for an out-of-order CPU to not be a superscalar processor, but the combination is so natural (since you can look ahead, you might as well run it!) that out-of-order non-superscalar CPUs are not very common.
Intel name: "Hyperthreading"
https://courses.cs.washington.edu/courses/cse378/09wi/lectures/lec15.pdf contains some of the first pictures you should see.
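For example, for a 2-way associative cache, compared to a direct mapped cache of the same total size and line size, the number of sets halves, so we remove one bit from the index and add it to the tag.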
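Here is a C sketch of the address split, assuming power-of-two sizes. The 32 KiB cache with 64-byte lines used in the test is an arbitrary example, and `cache_split` is a hypothetical helper. Going from 1 way to 2 ways halves the number of sets, so the index loses its top bit and the tag gains it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: the three fields of an address as seen by a cache. */
typedef struct { uint64_t tag, index, offset; } cache_addr;

/* floor(log2(x)) for x >= 1. */
static unsigned log2u(uint64_t x) {
    unsigned n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* Split addr for a cache of cache_bytes total, line_bytes per line and
 * ways-way associativity. All sizes are assumed to be powers of two. */
cache_addr cache_split(uint64_t addr, uint64_t cache_bytes,
                       uint64_t line_bytes, uint64_t ways) {
    uint64_t sets = cache_bytes / (line_bytes * ways);
    unsigned offset_bits = log2u(line_bytes);
    unsigned index_bits = log2u(sets);
    cache_addr r;
    r.offset = addr & (line_bytes - 1);
    r.index = (addr >> offset_bits) & (sets - 1);
    r.tag = addr >> (offset_bits + index_bits);
    return r;
}
```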
Even if caches are coherent, this is still not enough to avoid data race conditions, because this does not enforce atomicity of read modify write sequences. This is for example shown at: Detailed gem5 analysis of how data races happen.
According to http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture07-sc.pdf "memory consistency" is about ordering requirements of different memory addresses.
This is represented explicitly in C++, see for example: C++ std::memory_order.
According to http://www.inf.ed.ac.uk/teaching/courses/pa/Notes/lecture07-sc.pdf, the strongest possible consistency, everything nicely ordered as you’d expect.
Either they can snoop only control, or both control and data can be snooped.
Mentioned at:
TODO gem5 concrete example.
TODO understand well why those are needed.
https://en.wikipedia.org/wiki/MOSI_protocol The critical MSI vs MOSI section was a bit bogus though: https://en.wikipedia.org/w/index.php?title=MOSI_protocol&oldid=895443023 but I edited it :-)
MESI cache coherence protocol + MOSI cache coherence protocol, not much else to it!
In gem5 9fc9c67b4242c03f165951775be5cd0812f2a705, MOESI is the default cache coherency protocol of the classic memory system as shown at Section 23.22.4.3.1, “What is the coherency protocol implemented by the classic cache system in gem5?”.
+In gem5 9fc9c67b4242c03f165951775be5cd0812f2a705, MOESI is the default cache coherency protocol of the classic memory system as shown at Section 24.22.4.3.1, “What is the coherency protocol implemented by the classic cache system in gem5?”.
A good an simple example showing several MOESI transitions in the classic memory model can be seen at: Section 23.22.4.4, “gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs”.
+A good and simple example showing several MOESI transitions in the classic memory model can be seen at: Section 24.22.4.4, “gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs”.
gem5 12c917de54145d2d50260035ba7fa614e25317a3 has several Ruby MOESI models implemented: MOESI_AMD_Base, MOESI_CMP_directory, MOESI_CMP_token and MOESI_hammer.
The host requirements depend a lot on which examples you want to run.
If ./build --download-dependencies fails with:
It does not work if you just download the .zip with the sources for this repository from GitHub because we use Git submodules, you must clone this repo.
If you just want to run a command after boot ends without thinking much about it, just use the --eval-after option, e.g.:
It gets annoying to retype --arch aarch64 for every single command, or to remember --config setups.
To learn how to build the documentation see: Section 1.10, “Build the documentation”.
+To learn how to build the documentation see: Section 2.10, “Build the documentation”.
When running build-doc, we do the following checks:
The scripts prints what you have to fix and exits with an error status if there are any errors.
Documentation for asciidoctor/extract-link-targets
Documentation for asciidoctor/extract-header-ids
The Asciidoctor extension scripts:
As mentioned before the TOC, we have to push this README to GitHub pages due to: https://github.com/isaacs/github/issues/1610
You did something crazy, and nothing seems to work anymore?
For now there is no way to change the build directory from out/ (or out.docker/ for Docker) to something else.
ccache might save you a lot of rebuild time when you decide to Clean the build or create a new build variant.
The getvar helper script can print the values of internal LKMC variables.
For this reason, we use it particularly often in this README, to reduce the need for refactoring.
While you could just manually find/learn the path to toolchain tools, e.g. in LKMC b15a0e455d691afa49f3b813ad9b09394dfb02b7 they are:
Disassembling a single function of an LKMC executable with GDB is such a common use case for run-toolchain (see also: https://stackoverflow.com/questions/22769246/how-to-disassemble-one-single-function-using-objdump) that we have a shortcut for it.
It is not possible to rebuild the root filesystem while running QEMU because QEMU holds the qcow2 file:
When doing long simulations sweeping across multiple system parameters, it becomes fundamental to do multiple simulations in parallel.
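Below is a minimal sketch of such a sweep in Python. It assumes that ./run accepts --emulator and --run-id options, and that giving each simulation its own run id keeps the runs from overwriting each other's outputs. The swept system parameters are passed through as opaque extra arguments. The runner is injectable, so the scheduling logic can be tried out without launching any emulator.

```python
"""Sketch: run several simulations in parallel, each under its own run id."""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence


def build_command(run_id: int, extra_args: Sequence[str]) -> List[str]:
    # Assumption: ./run accepts --emulator and --run-id options;
    # extra_args stands for whatever system parameters are being swept.
    return ["./run", "--emulator", "gem5", "--run-id", str(run_id), *extra_args]


def default_runner(cmd: List[str]) -> int:
    return subprocess.run(cmd).returncode


def sweep(
    param_sets: Sequence[Sequence[str]],
    runner: Callable[[List[str]], int] = default_runner,
    max_workers: int = 4,
) -> Dict[int, int]:
    """Return {run_id: exit status}, with run ids starting at 1."""
    commands = [build_command(i, p) for i, p in enumerate(param_sets, start=1)]
    # Threads suffice here: each worker only waits on an external emulator process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = pool.map(runner, commands)
        return dict(zip(range(1, len(commands) + 1), statuses))
```

Threads are used rather than processes because the Python side does almost no work. Set max_workers with the host's core count and memory in mind, since each worker drives a full simulation.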
To run multiple gem5 checkouts, see: Section 38.13.3.1, “gem5 worktree”.
Implementation note: we create multiple namespaces for two things:
It often happens that you are comparing two versions of the build, a good and a bad one, and trying to figure out why the bad one is bad.
Our build variants system allows you to keep multiple built versions of all major components, so that you can easily switch between running one or the other.
If you want to keep two builds around, one for the latest Linux version, and the other for Linux v4.16:
To run both kernels simultaneously, one on each QEMU instance, see: Section 38.12, “Simultaneous runs”.
Analogous to the Linux kernel build variants but with the --qemu-build-id option instead:
Analogous to the Linux kernel build variants but with the --gem5-build-id option instead:
Therefore, you must not forget to check out the sources of the corresponding build before running, unless you explicitly tell gem5 to use a non-default source tree with gem5 worktree. This becomes inevitable when you want to launch multiple simultaneous runs at different checkouts.
--gem5-build-id goes a long way, but if you want to seamlessly switch between two gem5 trees without checking out multiple times, then --gem5-worktree is for you.
Suppose that you are working on a private fork of gem5, but you want to use this repository to develop it as well.
Allows you to have multiple versions of the GCC toolchain or root filesystem.
The --optimization-level option is available on all build scripts and sets the given GCC -O optimization level for guest binaries, where it has been implemented.
lkmc/ contains sources and headers that are shared across kernel modules, userland and baremetal examples.
Another option would have been to name it as includes/lkmc, but that would make paths longer, and we might want to store source code in that directory as well in the future.
When factoring out functionality across userland examples, there are two main options:
Source: buildroot_packages/.
A custom build script can give you more flexibility: e.g. the package can be made to work with other root filesystems more easily, have better 9P support, and rebuild faster, as it avoids some Buildroot boilerplate.
Has the following structure:
Patches in this directory are never applied automatically: it is up to users to manually apply them before usage following the instructions in this documentation.
Source: rootfs_overlay.
Source: copy-overlay
Build Buildroot is required for the same reason as described at: Section 2.2.2.2, “Your first kernel module hack”.
However, since the rootfs_overlay directory does not require compilation, unlike say kernel modules, we also make it available to the guest directly over 9P, even without ./copy-overlay, at:
This way you can just hack away at the scripts and try them out immediately without any further operations.
out_rootfs_overlay_dir
This path can be found with:
This does not include native image modification mechanisms such as Buildroot packages, which we let Buildroot itself manage.
disk_image_2
A squashfs of out_rootfs_overlay_dir that gets passed as the second argument.
The files:
lkmc_home refers to the target base directory in which we put all our custom built stuff, such as userland executables and kernel modules.
In order to build and run each userland and baremetal example properly, we need per-file metadata such as compiler flags and required number of cores.
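The repository keeps this metadata in its own format (path_properties.py is referenced in the TODO further down). The sketch below is a hypothetical illustration of the general idea of inherited per-path properties, not that file's actual structure. The table entries, the file name hypothetical_threads.c and the property names cc_flags and n_cpus are all made up for the example. Properties set on a directory apply to everything below it, and more specific entries win.

```python
"""Hypothetical per-file metadata lookup: deeper path components override shallower ones."""
from pathlib import PurePosixPath
from typing import Any, Dict

DEFAULTS: Dict[str, Any] = {"cc_flags": (), "n_cpus": 1}

# Hypothetical table keyed by repository-relative directories and files.
PROPERTIES: Dict[str, Dict[str, Any]] = {
    "userland": {"cc_flags": ("-std=c99",)},
    "userland/posix": {"cc_flags": ("-std=c99", "-D_POSIX_C_SOURCE=200809L")},
    "userland/posix/hypothetical_threads.c": {"n_cpus": 2},
}


def get_properties(path: str) -> Dict[str, Any]:
    props = dict(DEFAULTS)
    parts = PurePosixPath(path).parts
    # Walk from the top directory down to the file itself, so that more
    # specific entries are applied last and override inherited values.
    for depth in range(1, len(parts) + 1):
        props.update(PROPERTIES.get("/".join(parts[:depth]), {}))
    return props
```

Flags are stored as tuples so that callers cannot accidentally mutate the shared defaults.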
Print out several parameters that normally change randomly from boot to boot:
Run almost all tests:
test does not run all possible tests, because there are too many possible variations and that would take forever. The rationale is the same as for ./build all and is explained in ./build --help.
You can select multiple archs and emulators of interest, as for any other command, with:
By default, tests continue running even after the first failure, and a summary is shown at the end.
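Below is a generic sketch of this keep-going behaviour. It is written for illustration and not taken from the actual LKMC test scripts. The test names and callables are hypothetical.

```python
"""Generic sketch: run every test, keep going after failures, summarize at the end."""
from typing import Callable, Iterable, List, Tuple


def run_all(tests: Iterable[Tuple[str, Callable[[], bool]]]) -> int:
    failures: List[str] = []
    total = 0
    for name, test in tests:
        total += 1
        try:
            ok = bool(test())
        except Exception:
            ok = False  # a crashing test is recorded as a failure, it does not stop the run
        if not ok:
            failures.append(name)  # remember the failure and move on to the next test
    print(f"{total - len(failures)}/{total} tests passed")
    for name in failures:
        print(f"FAIL {name}")
    return 1 if failures else 0
```

Returning a nonzero status when any test failed keeps the end-of-run summary readable for humans. It also still lets calling scripts detect the failure.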
TODO: we really need a mechanism to generate the test list automatically, e.g. based on path_properties.py: currently many tests are missing, and we have to add everything manually, which is very annoying.
Failure is detected by looking for the Magic failure string.
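Scanning the output for a fixed string does not depend on the emulator forwarding an exit status, which is not always possible, as discussed below for QEMU arm full system. Below is a minimal sketch. MAGIC_FAILURE_STRING is a placeholder, not the actual string used by LKMC.

```python
"""Sketch: decide whether a run failed by scanning its output for a marker string."""
from typing import Iterable

# Placeholder value for illustration only, NOT the actual LKMC magic failure string.
MAGIC_FAILURE_STRING = "HYPOTHETICAL_MAGIC_FAILURE"


def output_failed(lines: Iterable[str], magic: str = MAGIC_FAILURE_STRING) -> bool:
    # any() consumes the iterable lazily and stops at the first matching line,
    # so an open log file can be passed in without reading it all into memory.
    return any(magic in line for line in lines)
```

For example, `with open(log_path) as f: failed = output_failed(f)`, where log_path is a hypothetical path to the emulator's terminal output.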
Most userland programs that don’t rely on kernel modules can also be tested in user mode simulation, as explained at: Section 11.2, “User mode tests”.
We have some pexpect automated tests for GDB for both userland and baremetal programs!
We do not know of any way to set the emulator exit status in QEMU arm full system.
gem5: m5 fail works on all archs
user mode: QEMU forwards the exit status; for gem5 we do some log parsing as described at: Section 11.7.2, “gem5 syscall emulation exit status”
For the Linux kernel, do the following manual tests for now.
You should also test that the Internet works:
build-userland and test-executables have a wide variety of target selection modes, and it was hard to keep them all working without some tests:
When updating the Linux kernel, QEMU and gem5, things sometimes break.
However, for many types of crashes, it is trivial to bisect down to the offending commit, in particular because we can make QEMU and gem5 exit with status 1 on kernel panic, as mentioned at: Section 17.6.1.3, “Exit emulator on panic”.
For example, when updating from QEMU v2.12.0 to v3.0.0-rc3, the Linux kernel boot started to panic for arm.
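Since the emulator can be made to exit with status 1 on kernel panic, the bisection can be automated with git bisect run. Its exit status convention is:

* 0 marks a commit as good.
* 125 skips it.
* Any other status from 1 to 127 marks it as bad.

Below is a minimal helper sketch. The build and run commands are placeholders passed in by the caller, to be adapted to the component being bisected.

```python
"""Sketch of a `git bisect run` helper that turns one build + boot into a verdict."""
import subprocess
import sys
from typing import List

GOOD, BAD, SKIP = 0, 1, 125  # git bisect run: 0 = good, 125 = skip, 1..127 (not 125) = bad


def verdict(build_ok: bool, run_status: int) -> int:
    if not build_ok:
        return SKIP  # this commit cannot be tested, so it is neither good nor bad
    # Collapse every failure to 1: statuses above 127 would abort the bisection.
    return GOOD if run_status == 0 else BAD


def main(build_cmd: List[str], run_cmd: List[str]) -> int:
    if subprocess.run(build_cmd).returncode != 0:
        return verdict(False, 0)
    return verdict(True, subprocess.run(run_cmd).returncode)
```

Commits that fail to build are skipped rather than marked bad. Any nonzero emulator status, including the negative value Python reports for a process killed by a signal, is collapsed to 1. To use it, wrap it in a script whose entry point calls `sys.exit(main(build_cmd, run_cmd))` with the appropriate commands. Then pass that script to git bisect run after `git bisect start <bad> <good>`.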
This is a template update procedure for submodules for which we have some patches on top of mainline.
Ensure that the Automated tests are passing on a clean build:
The ./build-test command builds a superset of what will be downloaded, which also tests other things that we would like to be working in the release. For the minimal build to generate the files to be uploaded, see: Section 38.19.2, “release-zip”.
The clean build is necessary to generate clean images, since it is not possible to remove Buildroot packages from an existing build.
Create a zip containing all files required for Prebuilt setup:
After:
This project was created to help me understand, modify and test low level system components by using system simulators.
The trade-offs between the different setups are basically a balance between:
compatibility: how likely it is that all the components will work well together: emulator, compiler, kernel, standard library, …
guest software availability: how wide is your choice of easily installed guest software packages? See also: Section 38.20.4, “Linux distro choice”
Choosing which features go into our default builds means making tradeoffs. Here are our guidelines:
In order to learn how to measure some of those aspects, see: Section 35, “Benchmark this repo”.
We haven’t found the ultimate distro yet, here is a summary table of trade-offs that we care about: Table 8, “Comparison of Linux distros for usage in this repository”.