diff --git a/index.html b/index.html
index bf68eb3..6c099a4 100644
--- a/index.html
+++ b/index.html
@@ -556,8 +556,9 @@ pre{ white-space:pre }
but you will soon find that they are simply not enough if you are anywhere near serious about systems programming.
After ./run, QEMU opens up leaving you in the /lkmc/ directory, and you can start playing with the kernel modules inside the simulated system:
After ./run, QEMU opens up leaving you in the /lkmc/ directory, and you can start playing with the kernel modules inside the simulated system:
build-userland-in-tre is in turn just a thin wrapper around build-userland:
build-userland-in-tree is in turn just a thin wrapper around build-userland:
TODO why can’t we break at early startup stuff such as:
+Note however that early boot parts appear to be relocated in memory somehow, and therefore:
+you won’t see the source location in GDB, only assembly
+you won’t be able to break by symbol in those early locations
+Further discussion at: Linux kernel entry point.
+As mentioned at: GDB step debug early boot, the very first kernel instructions executed appear to be placed into memory at a different location than that of the kernel ELF section.
+As a result, we are unable to break on early symbols such as:
Maybe it is because they are being copied around at specific locations instead of being run directly from inside the main image, which is where the debug information points to?
+The gem5 ExecAll trace format does show the right symbols however! This could be because gem5 boots vmlinux directly, while QEMU uses the compressed version, and as mentioned in the Stack Overflow answer, the entry point is actually a tiny decompressor routine.
gem5 tracing with --debug-flags=Exec does show the right symbols however! So in the worst case, we can just read their source. Amazing.
v4.19 also added a CONFIG_HAVE_KERNEL_UNCOMPRESSED=y option for having the kernel uncompressed which could make following the startup easier, but it is only available on s390. aarch64 however is already uncompressed by default, so might be the easiest one. See also: Section 15.21.1, “vmlinux vs bzImage vs zImage vs Image”.
One possibility is to run:
-./trace-boot --arch arm-
and then find the second address (the first one does not work, already too late maybe):
-less "$(./getvar --arch arm trace_txt_file)"-
and break there:
-./run --arch arm --gdb-wait -./run-gdb --arch arm '*0x1000'-
but TODO: it does not show the source assembly under arch/arm: https://stackoverflow.com/questions/11423784/qemu-arm-linux-kernel-boot-debug-no-source-code
In gem5 aarch64 Linux v4.18, experimentally the entry point of secondary CPUs seems to be secondary_holding_pen as shown at https://gist.github.com/cirosantilli2/34a7bc450fcb6c1c1a910369be1fdd90
I also tried to hack run-gdb with:
and now I do have the symbols from arch/arm/boot/compressed/vmlinux, but the breaks still don’t work.
v4.19 also added a CONFIG_HAVE_KERNEL_UNCOMPRESSED=y option for having the kernel uncompressed which could make following the startup easier, but it is only available on s390. aarch64 however is already uncompressed by default, so might be the easiest one. See also: Section 15.20.1, “vmlinux vs bzImage vs zImage vs Image”.
You then need the associated KERNEL_UNCOMPRESSED to enable it if available:
config KERNEL_UNCOMPRESSED
+ bool "None"
+ depends on HAVE_KERNEL_UNCOMPRESSED
+
start_kernel is the first C function to be executed basically: https://stackoverflow.com/questions/18266063/does-kernel-have-main-function/33422401#33422401
For the earlier arch-specific entry point, see: Linux kernel entry point.
+When booting Linux on a slow emulator like gem5, what you observe is that:
We think that this might be because gem5 boots directly vmlinux, and not from the final compressed images that contain the attached rootfs such as bzImage, which is what QEMU does, see also: Section 15.21.1, “vmlinux vs bzImage vs zImage vs Image”.
We think that this might be because gem5 boots directly vmlinux, and not from the final compressed images that contain the attached rootfs such as bzImage, which is what QEMU does, see also: Section 15.20.1, “vmlinux vs bzImage vs zImage vs Image”.
To do this failed test, we automatically pass a dummy disk image as of gem5 7fa4c946386e7207ad5859e8ade0bbfc14000d91 since the scripts don’t handle a missing --disk-image well, much like is currently done for Baremetal.
Here is an interesting example of this: Section 15.20.1, “Linux Test Project”
+Here is an interesting example of this: Section 15.19.1, “Linux Test Project”
modules built with Buildroot, see: Section 34.15.2.1, “kernel_modules buildroot package”
modules built from the kernel tree itself, see: Section 15.12.2, “dummy-irq”
+modules built from the kernel tree itself, see: Section 15.11.2, “dummy-irq”
Disable userland address space randomization. Test it out by running rand_check.out twice:
+Disable userland address space randomization. Test it out by running rand_check.out twice:
start_kernel is a good definition of it: https://stackoverflow.com/questions/18266063/does-kernel-have-main-function/33422401#33422401
In gem5 aarch64 Linux v4.18, experimentally the entry point of secondary CPUs seems to be secondary_holding_pen as shown at https://gist.github.com/cirosantilli2/34a7bc450fcb6c1c1a910369be1fdd90
The Linux kernel allows passing module parameters at insertion time through the init_module and finit_module system calls.
modprobe insertion can also set default parameters via the /etc/modprobe.conf file:
One module can depend on symbols of another module that are exported with EXPORT_SYMBOL:
TODO: what for, and at which point does Buildroot / BusyBox generate that file?
Unlike insmod, modprobe deals with kernel module dependencies for us.
Module metadata is stored on module files at compile time. Some of the fields can be retrieved through the THIS_MODULE struct module:
Vermagic is a magic string present in the kernel and on MODULE_INFO of kernel modules. It is used to verify that the kernel module was compiled against a compatible kernel version and relevant configuration:
init_module and cleanup_module are an older alternative to the module_init and module_exit macros:
It is generally hard / impossible to use floating point operations in the kernel. TODO understand details.
To test out kernel panics and oops in controlled circumstances, try out the modules:
On panic, the kernel dies, and so does our terminal.
The log shows which module each symbol belongs to if any, e.g.:
For testing purposes, it is very useful to quit the emulator automatically with a non-zero exit status in case of kernel panic, instead of just hanging forever.
Enabled by default with:
panic=-1 command line option which reboots the kernel immediately on panic, see: Section 15.7.1.4, “Reboot on panic”
panic=-1 command line option which reboots the kernel immediately on panic, see: Section 15.6.1.4, “Reboot on panic”
QEMU -no-reboot, which makes QEMU exit when the guest tries to reboot
gem5 9048ef0ffbf21bedb803b785fb68f83e95c04db8 (January 2019) can detect panics automatically if the option system.panic_on_panic is on.
Make the kernel reboot n seconds after a panic:
If CONFIG_KALLSYMS=n, then addresses are shown on traces instead of symbol plus offset.
On oops, the shell still lives after.
The dump_stack function produces a stack trace much like panic and oops, but causes no problems and we return to the normal control flow, and can cleanly remove the module afterwards:
The WARN_ON macro basically just calls dump_stack.
Let’s learn how to diagnose problems with the root filesystem not being found. TODO add a sample panic error message for each error type:
Pseudo filesystems are filesystems that don’t represent actual files in a hard disk, but rather allow us to do special operations on filesystem-related system calls.
Debugfs is the simplest pseudo filesystem to play around with:
Procfs is just another fops entry point:
Its data is shared with uname(), which is a POSIX C function and has a Linux syscall to back it up.
Sysfs is more restricted than procfs, as it does not take an arbitrary file_operations:
Character devices can have arbitrary File operations associated to them:
Bibliography: https://unix.stackexchange.com/questions/37829/understanding-character-device-or-character-special-files/371758#371758
And also destroy it on rmmod:
File operations are the main method of userland driver communication.
Writing trivial read File operations is repetitive and error prone. The seq_file API makes the process much easier for those trivial cases:
If you have the entire read output upfront, single_open is an even more convenient version of seq_file:
The poll system call allows a user process to do a non-busy wait on a kernel event.
The ioctl system call is the best way to pass an arbitrary number of parameters to the kernel in a single go:
The mmap system call allows us to share memory between user and kernel space without copying:
Anonymous inodes allow getting multiple file descriptors from a single filesystem entry, which reduces namespace pollution compared to creating multiple device files:
Netlink sockets offer a socket API for kernel / userland communication:
Kernel threads are managed exactly like userland threads; they also have a backing task_struct, and are scheduled with the same mechanism:
The sleep is done with usleep_range, see: Section 15.10.2, “sleep”.
The sleep is done with usleep_range, see: Section 15.9.2, “sleep”.
Bibliography:
@@ -13706,7 +13708,7 @@ for i in `seq 16`; do ./netlink.out & done
Let’s launch two threads and see if they actually run in parallel:
Count to dmesg every one second from 0 up to n - 1:
A more convenient front-end for kthread:
Bibliography: https://github.com/torvalds/linux/blob/v4.17/Documentation/core-api/workqueue.rst
Count from 0 to 9 every second infinitely many times by scheduling a new work item from a work item:
Let’s block the entire kernel! Yay:
Wait queues are a way to make a thread sleep until an event happens on the queue:
Count from 0 to 9 infinitely many times in 1 second intervals using timers:
Brute force monitor every shared interrupt that will accept us:
The Linux kernel v4.16 mainline also has a dummy-irq module at drivers/misc/dummy-irq.c for monitoring a single IRQ.
In the guest with QEMU graphic mode:
Convert a virtual address to physical:
Only tested in x86_64.
The xp QEMU monitor command reads memory at a given physical address.
/dev/mem exposes access to physical addresses, and we use it through the convenient devmem BusyBox utility.
Dump the physical address of all pages mapped to a given process using /proc/<pid>/maps and /proc/<pid>/pagemap.
Good overviews:
I hope to have examples of all methods some day, since I’m obsessed with visibility.
Logs proc events such as process creation to a netlink socket.
0111ca406bdfa6fd65a2605d353583b4c4051781 was failing with:
Trace a single function:
TODO: can you get function arguments? https://stackoverflow.com/questions/27608752/does-ftrace-allow-capture-of-system-call-arguments-to-the-linux-kernel-or-only
TODO example:
kprobes is an instrumentation mechanism that injects arbitrary code at a given address via a trap instruction, much like GDB. Oh, the good old kernel. :-)
TODO: didn’t port during refactor after 3b0a343647bed577586989fb702b760bd280844a. Reimplementing should not be hard.
Make it harder to get hacked and easier to notice that you were, at the cost of some (small?) runtime overhead.
Detects buffer overflows for us:
TODO get a hello world permission control working:
I once got UML running on a minimal Buildroot setup at: https://unix.stackexchange.com/questions/73203/how-to-create-rootfs-for-user-mode-linux-on-fedora-18/372207#372207
UIO is a kernel subsystem that allows doing certain types of driver operations from userland.
Requires Graphics.
Requires Graphics.
If you run in QEMU graphic mode:
This leads Linux to try to reboot, and QEMU shuts down due to the -no-reboot option which we set by default, see: Section 15.7.1.3, “Exit emulator on panic”.
This leads Linux to try to reboot, and QEMU shuts down due to the -no-reboot option which we set by default, see: Section 15.6.1.3, “Exit emulator on panic”.
Here is a minimal example of Ctrl Alt Del:
@@ -15811,7 +15813,7 @@ static void halt_reboot_pwoff(int sig)
We cannot test these actual shortcuts on QEMU since the host captures them at a lower level, but from:
In order to play with TTYs, do this:
Take the command described at TTY and try adding the following:
If you run in Graphics, then you get a Penguin image for every core above the console! https://askubuntu.com/questions/80938/is-it-possible-to-get-the-tux-logo-on-the-text-based-boot
DRM / DRI is the new interface that supersedes fbdev:
Tested on: 93e383902ebcc03d8a7ac0d65961c0e62af9612b
./build-buildroot --config-fragment buildroot_config/kmscube
@@ -16403,7 +16405,7 @@ failed to initialize legacy DRM
TODO get working.
Between all archs on QEMU and gem5 we touch all of those kernel built output files.
The following kernel modules and Baremetal executables dump and disassemble various registers which cannot be observed from userland (usually "system registers", "control registers"):
timestamps of dmesg output
rand_check.out output
+rand_check.out output
Tested in gem5 d7d9bc240615625141cd6feddbadd392457e49eb.
This is the simplest of all protocols, and therefore the first one you should study to learn how Ruby works.
There are basically two choices:
+These are your options:
create a Buildroot package: Add new Buildroot packages
+This is the most general option, but the most laborious. No big deal if you copy our template however as shown in that section.
+Handles any type of cross compilation, including multiple input sources.
+drop your files directly in rootfs_overlay and follow instructions from that section
-drop your files directly in rootfs_overlay and follow instructions from that section.
+Files in that directory are directly copied to the image, so this is the best option for files that don’t need to be compiled such as Interpreted languages.
If you need to cross compile input files such as C for the guest, then Buildroot packages are definitely the cleaner option as they make cross compilation easy.
+You could also use this method to inject compiled binaries into the image for quick-and-dirty testing.
However, for a quick initial prototype, it should be fine to just manually compile your files and drop them in rootfs_overlay.
+But it will be much more likely to work if you use our cross compiler with run-toolchain or getvar.
Ideally, you should still use the Buildroot cross compiler for this which ensures compatibility.
+If you can’t do that, at the very least compile it statically with -static to remove the possibility of binary mismatch with our dynamic glibc.
The best way to do that is to use either run-toolchain or getvar.
+But things can still break if your random glibc is configured to work with a newer Linux kernel than ours.
In case you can’t for some reason, e.g. if you need to use your own custom toolchain, you should:
+It often just works even if they are not perfectly matched however, partly because the Linux kernel is highly backwards compatible
make sure that you have built your toolchain to match our kernel version. It often just works even if they are not perfectly matched however, partly because the Linux kernel is highly backwards compatible
build statically with -static to avoid binary compatibility issues with our own glibc
fork this repo and add new files to userland/ or kernel_modules/
+To add a simple executable that compiles from a single source file, like the dozens of examples that we have, you could just go this route.
+This mechanism bypasses having to create/modify Buildroot packages, and is very simple when you have a single input single output executable.
+9P. OK, this is not really adding to the image, but it is the most convenient way to quickly modify a binary on the host, cross compile, and test it out without rebooting.
If none of those methods are flexible enough for you, you can just fork or hack up the sample package buildroot_packages/sample_package to do what you want.
For how to use that package, see: Section 34.15.2, “buildroot_packages directory”.
+For how to use that package, see: Section 34.15.2, “buildroot_packages directory”.
Then iterate trying to do what you want and reading the manual until it works: https://buildroot.org/downloads/manual/manual.html
@@ -29950,7 +29963,7 @@ make menuconfig
Also mentioned at: https://stackoverflow.com/questions/47320800/how-to-clean-only-target-in-buildroot
See this for a sample manual workaround: Section 22.8.1.4, “PARSEC uninstall”.
+See this for a sample manual workaround: Section 22.9.1.4, “PARSEC uninstall”.
This section was originally moved in here from: https://github.com/cirosantilli/cpp-cheat
Build userland programs.
+Build all with:
+./build-userland+
or build only those under e.g. userland/c with:
./build-userland userland/c+
The executables are not automatically added to the Buildroot image: you must follow the command with a ./build-buildroot command as in:
./build-userland
+./build-buildroot
+
Remember that certain executables have specific requirements, e.g.:
+userland/arch/ programs only build if the target arch matches
+userland/libs programs require the --package option, see: userland/libs directory
Default: build all examples that have their package dependencies met, e.g.:
+an OpenBLAS example can only be built if the target root filesystem has the OpenBLAS libraries and headers installed, which you must inform with --package
+Programs under userland/c/ are examples of ANSI C programming:
Allocate memory! Vs using the stack: https://stackoverflow.com/questions/4584089/what-is-the-function-of-the-push-pop-instructions-used-on-registers-in-x86-ass/33583134#33583134
malloc leads to the infinite joys of Memory leaks.
TODO: the exact answer is going to be hard.
General overview at: https://stackoverflow.com/questions/2798330/maximum-memory-which-malloc-can-allocate
If we start using the pages, the OOM killer would sooner or later step in and kill our process: Linux out-of-memory killer.
We can observe the OOM in LKMC 1e969e832f66cb5a72d12d57c53fb09e9721d589 which defaults to 256MiB of memory with:
Added in C11!
Example: userland/gcc/empty_struct.c
GCC implements the OpenMP threading API: https://stackoverflow.com/questions/3949901/pthreads-vs-openmp
strace shows that OpenMP makes clone() syscalls in Linux. TODO: does it actually call pthread_ functions, or does it make syscalls directly? Or in other words, can it work on Freestanding programs? A quick grep shows many references to pthreads.
Programs under userland/cpp/ are examples of ISO C++ programming.
OMG this is hell, understand when primitive variables are initialized or not:
The smallest data race we managed to come up with as of LKMC 7c01b29f1ee7da878c7cc9cb4565f3f3cf516a92 and gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1 was with userland/c/atomic.c (see also C multithreading):
Like for C, you have to pay for the standards… insane. So we just use the closest free drafts instead.
https://stackoverflow.com/questions/81656/where-do-i-find-the-current-c-or-c-standard-documents
Programs under userland/posix/ are examples of POSIX C programming.
POSIX C example that prints all environment variables: userland/posix/environ.c
POSIX' multiprocess API. Contrast with pthreads which are for threads.
Read the source comments and understand everything that is going on!
The minimal interesting example is to use fork and observe different PIDs.
POSIX' multithreading API. Contrast with fork which is for processes.
userland/posix/pthread_count.c exemplifies the functions:
The mmap system call allows advanced memory operations.
Basic mmap example, do the same as userland/c/malloc.c, but with mmap.
Memory mapped file example: userland/posix/mmap_file.c
A bit like read and write, but from / to the Internet!
The following sections are related to multithreading in userland:
Let’s group the hard-to-debug undefined-behaviour-like stuff found in C / C++ here and how to tackle those problems.
Maybe some day someone will use this setup to study the performance of interpreters.
Examples:
Buildroot has a Python package that can be added to the guest image:
At LKMC 50ac89b779363774325c81157ec8b9a6bdb50a2f gem5 390a74f59934b85d91489f8a563450d8321b602da:
Here we will add some better examples and explanations for: https://docs.python.org/3/extending/embedding.html#very-high-level-embedding
Host installation shown at: https://askubuntu.com/questions/594656/how-to-install-the-latest-versions-of-nodejs-and-npm/971612#971612
Illustrates how to add extra non-code data files to an NPM package, and then use those files at runtime.
No OpenJDK package as of 2018.08: https://stackoverflow.com/questions/28874150/buildroot-with-jamvm-2-0-for-java-8/59290927#59290927 partly because their build system is shit like the rest of the project’s setup.
These are good targets for performance analysis with gem5, and there is some overlap between this section and Benchmarks.
TODO: move benchmark graph from userland/cpp/bst_vs_heap_vs_hashmap.cpp to userland/algorithm/set.
Buildroot supports it, which makes everything just trivial:
Header only linear algebra library with a mainline Buildroot package:
These are good targets for performance analysis with gem5.
We have ported parts of the PARSEC benchmark for cross compilation at: https://github.com/cirosantilli/parsec-benchmark See the documentation on that repo to find out which benchmarks have been ported. Some of the benchmarks were segfaulting; they are documented in that repo.
./build --arch arm --download-dependencies gem5-buildroot parsec-benchmark
@@ -32764,7 +32834,7 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
Running a benchmark of a size different than test, e.g. simsmall, requires a rebuild with:
Most users won’t want to use this method because:
If you want to remove PARSEC later, Buildroot doesn’t provide an automated package removal mechanism as mentioned at: Section 21.6, “Remove Buildroot packages”, but the following procedure should be satisfactory:
If you end up going inside submodules/parsec-benchmark to hack up the benchmark (you will!), these tips will be helpful.
It eventually has to come to that, hasn’t it?
Tests under userland/libs require certain optional libraries to be installed on the target, and are not built or tested by default, you must enable them with either:
See for example BLAS.
+See for example BLAS. Since it is located under userland/libs/openblas, it will only build with either:
./build-userland --package openblas
+./build-userland --package-all
+
The following basenames should always refer to programs that do the same thing, but in different languages:
Examples:
First we build Dhrystone manually statically since dynamic linking is broken in gem5 as explained at: Section 10.7, “gem5 syscall emulation mode”.
TODO: move this section to our new custom dhrystone setup: Section 22.8.2.1, “Dhrystone”.
+TODO: move this section to our new custom dhrystone setup: Section 22.9.2.1, “Dhrystone”.
gem5 user mode:
@@ -45206,7 +45282,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
Source: buildroot_packages/.
Source: rootfs_overlay.
This way you can just hack away the scripts and try them out immediately without any further operations.
This path can be found with:
Userland content that needs to be compiled
rootfs_overlay content that gets put inside the image as is
+rootfs_overlay content that gets put inside the image as is
In Buildroot, this is done by pointing BR2_ROOTFS_OVERLAY to that directory, which is documented at: https://buildroot.org/downloads/manual/manual.html#rootfs-custom
This does not include native image modification mechanisms such as Buildroot packages, which we let Buildroot itself manage.
+This does not include native image modification mechanisms such as Buildroot packages, which we let Buildroot itself manage.
lkmc_home refers to the target base directory in which we put all our custom built stuff, such as userland executables and kernel modules.
Print out several parameters that normally change randomly from boot to boot:
When updating the Linux kernel, QEMU and gem5, things sometimes break.
However, for many types of crashes, it is trivial to bisect down to the offending commit, in particular because we can make QEMU and gem5 exit with status 1 on kernel panic as mentioned at: Section 15.7.1.3, “Exit emulator on panic”.
+However, for many types of crashes, it is trivial to bisect down to the offending commit, in particular because we can make QEMU and gem5 exit with status 1 on kernel panic as mentioned at: Section 15.6.1.3, “Exit emulator on panic”.
For example, when updating from QEMU v2.12.0 to v3.0.0-rc3, the Linux kernel boot started to panic for arm.