From c318775cc698a5f05e7e32e2459d4c53fb40ee31 Mon Sep 17 00:00:00 2001
From: Ciro Santilli
Date: Thu, 13 Sep 2018 07:23:49 +0100
Subject: [PATCH] reame: getting started is beautiful

---
 README.adoc | 650 ++++++++++++++++++++++++++--------------------------
 1 file changed, 331 insertions(+), 319 deletions(-)

diff --git a/README.adoc b/README.adoc
index 46663af..aba5db1 100644
--- a/README.adoc
+++ b/README.adoc
@@ -166,6 +166,10 @@ I now urge you to read the following sections which contain widely applicable in
 * <>
 * <>
 * <>
+* <>
+* Linux kernel
+** <>
+** <>
 
 Once you use <> and <>, your terminal will look a bit like this:
 
@@ -617,321 +621,6 @@ rmmod hello.ko
 dmesg
 ....
 
-=== Disk persistency
-
-We disable disk persistency for both QEMU and gem5 by default, to prevent the emulator from putting the image in an unknown state.
-
-For QEMU, this is done by passing the `snapshot` option to `-drive`, and for gem5 it is the default behaviour.
-
-If you hack up our link:run[] script to remove that option, then:
-
-....
-./run --eval-busybox 'date >f;poweroff'
-
-....
-
-followed by:
-
-....
-./run --eval-busybox 'cat f'
-....
-
-gives the date, because `poweroff` without `-n` syncs before shutdown.
-
-The `sync` command also saves the disk:
-
-....
-sync
-....
-
-When you do:
-
-....
-./build-buildroot
-....
-
-the disk image gets overwritten by a fresh filesystem and you lose all changes.
-
-Remember that if you forcibly turn QEMU off without `sync` or `poweroff` from inside the VM, e.g. by closing the QEMU window, disk changes may not be saved.
-
-Persistency is also turned off when booting from <> with a CPIO instead of with a disk.
-
-Disk persistency is useful to re-run shell commands from the history of a previous session with `Ctrl-R`, but we felt that the loss of determinism was not worth it.
-
-==== gem5 disk persistency
-
-TODO how to make gem5 disk writes persistent?
-
-As of cadb92f2df916dbb47f428fd1ec4932a2e1f0f48 there are some `read_only` entries in the `config.ini` under cow sections, but hacking them to true did not work:
-
-....
-diff --git a/configs/common/FSConfig.py b/configs/common/FSConfig.py
-index 17498c42b..76b8b351d 100644
---- a/configs/common/FSConfig.py
-+++ b/configs/common/FSConfig.py
-@@ -60,7 +60,7 @@ os_types = { 'alpha' : [ 'linux' ],
-            }
- 
- class CowIdeDisk(IdeDisk):
--    image = CowDiskImage(child=RawDiskImage(read_only=True),
-+    image = CowDiskImage(child=RawDiskImage(read_only=False),
-                          read_only=False)
- 
-     def childImage(self, ci):
-....
-
-The directory of interest is `src/dev/storage`.
-
-qcow2 does not appear supported, there are not hits in the source tree, and there is a mention on Nate's 2009 wishlist: http://gem5.org/Nate%27s_Wish_List
-
-=== Kernel command line parameters
-
-Bootloaders can pass a string as input to the Linux kernel when it is booting to control its behaviour, much like the `execve` system call does to userland processes.
-
-This allows us to control the behaviour of the kernel without rebuilding anything.
-
-With QEMU, QEMU itself acts as the bootloader, and provides the `-append` option and we expose it through `./run --kernel-cli`, e.g.:
-
-....
-./run --kernel-cli 'foo bar'
-....
-
-Then inside the host, you can check which options were given with:
-
-....
-cat /proc/cmdline
-....
-
-They are also printed at the beginning of the boot message:
-
-....
-dmesg | grep "Command line"
-....
-
-See also:
-
-* https://unix.stackexchange.com/questions/48601/how-to-display-the-linux-kernel-command-line-parameters-given-for-the-current-bo
-* https://askubuntu.com/questions/32654/how-do-i-find-the-boot-parameters-used-by-the-running-kernel
-
-The arguments are documented in the kernel documentation: https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html
-
-When dealing with real boards, extra command line options are provided on some magic bootloader configuration file, e.g.:
-
-* GRUB configuration files: https://askubuntu.com/questions/19486/how-do-i-add-a-kernel-boot-parameter
-* Raspberry pi `/boot/cmdline.txt` on a magic partition: https://raspberrypi.stackexchange.com/questions/14839/how-to-change-the-kernel-commandline-for-archlinuxarm-on-raspberry-pi-effectly
-
-==== Kernel command line parameters escaping
-
-Double quotes can be used to escape spaces as in `opt="a b"`, but double quotes themselves cannot be escaped, e.g. `opt"a\"b"`
-
-This even lead us to use base64 encoding with `--eval`!
-
-==== Kernel command line parameters definition points
-
-There are two methods:
-
-* `__setup` as in:
-+
-....
-__setup("console=", console_setup);
-....
-* `core_param` as in:
-+
-....
-core_param(panic, panic_timeout, int, 0644);
-....
-
-`core_param` suggests how they are different:
-
-....
-/**
- * core_param - define a historical core kernel parameter.
-
-...
-
- * core_param is just like module_param(), but cannot be modular and
- * doesn't add a prefix (such as "printk."). This is for compatibility
- * with __setup(), and it makes sense as truly core parameters aren't
- * tied to the particular file they're in.
- */
-....
-
-==== norandmaps
-
-Disable userland address space randomization. Test it out by running <> twice:
-
-....
-./run --eval-busybox '/rand_check.out;/poweroff.out'
-./run --eval-busybox '/rand_check.out;/poweroff.out'
-....
-
-If we remove it from our link:run[] script by hacking it up, the addresses shown by `rand_check.out` vary across boots.
-
-Equivalent to:
-
-....
-echo 0 > /proc/sys/kernel/randomize_va_space
-....
-
-=== insmod alternatives
-
-==== modprobe
-
-If you are feeling fancy, you can also insert modules with:
-
-....
-modprobe hello
-....
-
-which insmods link:packages/kernel_modules/hello.c[].
-
-`modprobe` searches for modules under:
-
-....
-ls /lib/modules/*/extra/
-....
-
-Kernel modules built from the Linux mainline tree with `CONFIG_SOME_MOD=m`, are automatically available with `modprobe`, e.g.:
-
-....
-modprobe dummy-irq irq=1
-....
-
-==== myinsmod
-
-If you are feeling raw, you can insert and remove modules with our own minimal module inserter and remover!
-
-....
-# init_module
-/myinsmod.out /hello.ko
-# finit_module
-/myinsmod.out /hello.ko "" 1
-/myrmmod.out hello
-....
-
-which teaches you how it is done from C code.
-
-Source:
-
-* link:packages/kernel_modules/user/myinsmod.c[]
-* link:packages/kernel_modules/user/myrmmod.c[]
-
-The Linux kernel offers two system calls for module insertion:
-
-* `init_module`
-* `finit_module`
-
-and:
-
-....
-man init_module
-....
-
-documents that:
-
-____
-The finit_module() system call is like init_module(), but reads the module to be loaded from the file descriptor fd. It is useful when the authenticity of a kernel module can be determined from its location in the filesystem; in cases where that is possible, the overhead of using cryptographically signed modules to determine the authenticity of a module can be avoided. The param_values argument is as for init_module().
-____
-
-`finit` is newer and was added only in v3.8. More rationale: https://lwn.net/Articles/519010/
-
-Bibliography: https://stackoverflow.com/questions/5947286/how-to-load-linux-kernel-modules-from-c-code
-
-=== Simultaneous runs
-
-When doing long simulations sweeping across multiple system parameters, it becomes fundamental to do multiple simulations in parallel.
-
-This is specially true for gem5, which runs much slower than QEMU, and cannot use multiple host cores to speed up the simulation: link:https://github.com/cirosantilli-work/gem5-issues/issues/15[], so the only way to parallelize is to run multiple instances in parallel.
-
-This also has a good synergy with <>.
-
-First shell:
-
-....
-./run
-....
-
-Another shell:
-
-....
-./run --run-id 1
-....
-
-and now you have two QEMU instances running in parallel.
-
-The default run id is `0`.
-
-Our scripts solve two difficulties with simultaneous runs:
-
-* port conflicts, e.g. GDB and link:gem5-shell[]
-* output directory conflicts, e.g. traces and gem5 stats overwriting one another
-
-Each run gets a separate output directory. For example:
-
-....
-./run --arch aarch64 --gem5 --run-id 0 &>/dev/null &
-./run --arch aarch64 --gem5 --run-id 1 &>/dev/null &
-....
-
-produces two separate `m5out` directories:
-
-....
-echo "$(./getvar --arch aarch64 --gem5 --run-id 0 m5out_dir)"
-echo "$(./getvar --arch aarch64 --gem5 --run-id 1 m5out_dir)"
-....
-
-and the gem5 host executable stdout and stderr can be found at:
-
-....
-less "$(./getvar --arch aarch64 --gem5 --run-id 0 termout_file)"
-less "$(./getvar --arch aarch64 --gem5 --run-id 1 termout_file)"
-....
-
-Each line is prepended with the timestamp in seconds since the start of the program when it appeared.
-
-To have more semantic output directories names for later inspection, you can use a non numeric string for the run ID, and indicate the port offset explicitly:
-
-....
-./run --arch aarch64 --gem5 --run-id some-experiment --port-offset 1
-....
-
-`--port-offset` defaults to the run ID when that is a number.
-
-Like <>, you will need to pass the `-n` option to anything that needs to know runtime information, e.g. <>:
-
-....
-./run --run-id 1
-./rungdb --run-id 1
-....
-
-To run multiple gem5 checkouts, see: <>.
-
-Implementation note: we create multiple namespaces for two things:
-
-* run output directory
-* ports
-** QEMU allows setting all ports explicitly.
-+
-If a port is not free, it just crashes.
-+
-We assign a contiguous port range for each run ID.
-** gem5 automatically increments ports until it finds a free one.
-+
-gem5 60600f09c25255b3c8f72da7fb49100e2682093a does not seem to expose a way to set the terminal and VNC ports from `fs.py`, so we just let gem5 assign the ports itself, and use `-n` only to match what it assigned. Those ports both appear on `config.ini`.
-+
-The GDB port can be assigned on `gem5.opt --remote-gdb-port`, but it does not appear on `config.ini`.
-
-=== Build the documentation
-
-You don't need to depend on GitHub:
-
-....
-./build-doc
-xdg-open out/README.html
-....
-
-Source: link:build-doc[]
-
 [[gdb]]
 == GDB step debug
 
@@ -2789,7 +2478,79 @@ The main use case for `-enable-kvm` in this repository is to test if something t
 
 For example, when porting a benchmark to Buildroot, you can first use QEMU's KVM to test that benchmarks is producing the correct results, before analysing them more deeply in gem5, which runs much slower.
 
-== kmod
+== Kernel module utilities
+
+=== insmod
+
+link:https://git.busybox.net/busybox/tree/modutils/insmod.c?h=1_29_3[Provided by BusyBox]:
+
+....
+./run --eval-busybox 'insmod /hello.ko'
+....
+
+=== modprobe
+
+If you are feeling fancy, you can also insert modules with:
+
+....
+modprobe hello
+....
+
+which insmods link:packages/kernel_modules/hello.c[].
+
+`modprobe` searches for modules under:
+
+....
+ls /lib/modules/*/extra/
+....
+
+Kernel modules built from the Linux mainline tree with `CONFIG_SOME_MOD=m`, are automatically available with `modprobe`, e.g.:
+
+....
+modprobe dummy-irq irq=1
+....
+
+=== myinsmod
+
+If you are feeling raw, you can insert and remove modules with our own minimal module inserter and remover!
+
+....
+# init_module
+/myinsmod.out /hello.ko
+# finit_module
+/myinsmod.out /hello.ko "" 1
+/myrmmod.out hello
+....
+
+which teaches you how it is done from C code.
+
+Source:
+
+* link:packages/kernel_modules/user/myinsmod.c[]
+* link:packages/kernel_modules/user/myrmmod.c[]
+
+The Linux kernel offers two system calls for module insertion:
+
+* `init_module`
+* `finit_module`
+
+and:
+
+....
+man init_module
+....
+
+documents that:
+
+____
+The finit_module() system call is like init_module(), but reads the module to be loaded from the file descriptor fd. It is useful when the authenticity of a kernel module can be determined from its location in the filesystem; in cases where that is possible, the overhead of using cryptographically signed modules to determine the authenticity of a module can be avoided. The param_values argument is as for init_module().
+____
+
+`finit` is newer and was added only in v3.8. More rationale: https://lwn.net/Articles/519010/
+
+Bibliography: https://stackoverflow.com/questions/5947286/how-to-load-linux-kernel-modules-from-c-code
+
+=== kmod
 
 https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git
 
@@ -2823,11 +2584,11 @@ Buildroot also has a kmod package, but we are not using it since BusyBox' versio
 
 This page will only describe features that differ from kmod to the BusyBox implementation.
 
-=== module-init-tools
+==== module-init-tools
 
 Name of a predecessor set of tools.
 
-=== kmod modprobe
+==== kmod modprobe
 
 kmod's `modprobe` can also load modules under different names to avoid conflicts, e.g.:
 
@@ -3416,6 +3177,95 @@ Those commits change `BR2_LINUX_KERNEL_LATEST_VERSION` in `/linux/Config.in`.
 
 You should then look up if there is a branch that supports that kernel. Staying on branches is a good idea as they will get backports, in particular ones that fix the build as newer host versions come out.
 
+=== Kernel command line parameters
+
+Bootloaders can pass a string as input to the Linux kernel when it is booting to control its behaviour, much like the `execve` system call does to userland processes.
+
+This allows us to control the behaviour of the kernel without rebuilding anything.
+
+With QEMU, QEMU itself acts as the bootloader, and provides the `-append` option and we expose it through `./run --kernel-cli`, e.g.:
+
+....
+./run --kernel-cli 'foo bar'
+....
+
+Then inside the host, you can check which options were given with:
+
+....
+cat /proc/cmdline
+....
+
+They are also printed at the beginning of the boot message:
+
+....
+dmesg | grep "Command line"
+....
+
+See also:
+
+* https://unix.stackexchange.com/questions/48601/how-to-display-the-linux-kernel-command-line-parameters-given-for-the-current-bo
+* https://askubuntu.com/questions/32654/how-do-i-find-the-boot-parameters-used-by-the-running-kernel
+
+The arguments are documented in the kernel documentation: https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html
+
+When dealing with real boards, extra command line options are provided on some magic bootloader configuration file, e.g.:
+
+* GRUB configuration files: https://askubuntu.com/questions/19486/how-do-i-add-a-kernel-boot-parameter
+* Raspberry pi `/boot/cmdline.txt` on a magic partition: https://raspberrypi.stackexchange.com/questions/14839/how-to-change-the-kernel-commandline-for-archlinuxarm-on-raspberry-pi-effectly
+
+==== Kernel command line parameters escaping
+
+Double quotes can be used to escape spaces as in `opt="a b"`, but double quotes themselves cannot be escaped, e.g. `opt"a\"b"`
+
+This even lead us to use base64 encoding with `--eval`!
+
+==== Kernel command line parameters definition points
+
+There are two methods:
+
+* `__setup` as in:
++
+....
+__setup("console=", console_setup);
+....
+* `core_param` as in:
++
+....
+core_param(panic, panic_timeout, int, 0644);
+....
+
+`core_param` suggests how they are different:
+
+....
+/**
+ * core_param - define a historical core kernel parameter.
+
+...
+
+ * core_param is just like module_param(), but cannot be modular and
+ * doesn't add a prefix (such as "printk."). This is for compatibility
+ * with __setup(), and it makes sense as truly core parameters aren't
+ * tied to the particular file they're in.
+ */
+....
+
+==== norandmaps
+
+Disable userland address space randomization. Test it out by running <> twice:
+
+....
+./run --eval-busybox '/rand_check.out;/poweroff.out'
+./run --eval-busybox '/rand_check.out;/poweroff.out'
+....
+
+If we remove it from our link:run[] script by hacking it up, the addresses shown by `rand_check.out` vary across boots.
+
+Equivalent to:
+
+....
+echo 0 > /proc/sys/kernel/randomize_va_space
+....
+
 === printk
 
 `printk` is the most simple and widely used way of getting information from the kernel, so you should familiarize yourself with its basic configuration.
@@ -6487,6 +6337,73 @@ kill %1
 
 Some QEMU specific features to play with and limitations to cry over.
 
+=== Disk persistency
+
+We disable disk persistency for both QEMU and gem5 by default, to prevent the emulator from putting the image in an unknown state.
+
+For QEMU, this is done by passing the `snapshot` option to `-drive`, and for gem5 it is the default behaviour.
+
+If you hack up our link:run[] script to remove that option, then:
+
+....
+./run --eval-busybox 'date >f;poweroff'
+
+....
+
+followed by:
+
+....
+./run --eval-busybox 'cat f'
+....
+
+gives the date, because `poweroff` without `-n` syncs before shutdown.
+
+The `sync` command also saves the disk:
+
+....
+sync
+....
+
+When you do:
+
+....
+./build-buildroot
+....
+
+the disk image gets overwritten by a fresh filesystem and you lose all changes.
+
+Remember that if you forcibly turn QEMU off without `sync` or `poweroff` from inside the VM, e.g. by closing the QEMU window, disk changes may not be saved.
+
+Persistency is also turned off when booting from <> with a CPIO instead of with a disk.
+
+Disk persistency is useful to re-run shell commands from the history of a previous session with `Ctrl-R`, but we felt that the loss of determinism was not worth it.
+
+==== gem5 disk persistency
+
+TODO how to make gem5 disk writes persistent?
+
+As of cadb92f2df916dbb47f428fd1ec4932a2e1f0f48 there are some `read_only` entries in the `config.ini` under cow sections, but hacking them to true did not work:
+
+....
+diff --git a/configs/common/FSConfig.py b/configs/common/FSConfig.py
+index 17498c42b..76b8b351d 100644
+--- a/configs/common/FSConfig.py
++++ b/configs/common/FSConfig.py
+@@ -60,7 +60,7 @@ os_types = { 'alpha' : [ 'linux' ],
+            }
+ 
+ class CowIdeDisk(IdeDisk):
+-    image = CowDiskImage(child=RawDiskImage(read_only=True),
++    image = CowDiskImage(child=RawDiskImage(read_only=False),
+                          read_only=False)
+ 
+     def childImage(self, ci):
+....
+
+The directory of interest is `src/dev/storage`.
+
+qcow2 does not appear supported, there are not hits in the source tree, and there is a mention on Nate's 2009 wishlist: http://gem5.org/Nate%27s_Wish_List
+
 === Snapshot
 
 QEMU allows us to take snapshots at any time through the monitor.
@@ -9877,7 +9794,7 @@ If you just want to run a command after boot ends without thinking much about it
 ./run --eval-busybox 'echo hello'
 ....
 
-This option passes the command to our init scripts, and uses a few clever tricks along the way to make it just work.
+This option passes the command to our init scripts through <>, and uses a few clever tricks along the way to make it just work.
 
 See <> for the gory details.
 
@@ -9941,6 +9858,101 @@ Verify with:
 ls "$(./getvar build_dir)"
 ....
 
+=== Build the documentation
+
+You don't need to depend on GitHub:
+
+....
+./build-doc
+xdg-open out/README.html
+....
+
+Source: link:build-doc[]
+
+=== Simultaneous runs
+
+When doing long simulations sweeping across multiple system parameters, it becomes fundamental to do multiple simulations in parallel.
+
+This is specially true for gem5, which runs much slower than QEMU, and cannot use multiple host cores to speed up the simulation: link:https://github.com/cirosantilli-work/gem5-issues/issues/15[], so the only way to parallelize is to run multiple instances in parallel.
+
+This also has a good synergy with <>.
+
+First shell:
+
+....
+./run
+....
+
+Another shell:
+
+....
+./run --run-id 1
+....
+
+and now you have two QEMU instances running in parallel.
+
+The default run id is `0`.
+
+Our scripts solve two difficulties with simultaneous runs:
+
+* port conflicts, e.g. GDB and link:gem5-shell[]
+* output directory conflicts, e.g. traces and gem5 stats overwriting one another
+
+Each run gets a separate output directory. For example:
+
+....
+./run --arch aarch64 --gem5 --run-id 0 &>/dev/null &
+./run --arch aarch64 --gem5 --run-id 1 &>/dev/null &
+....
+
+produces two separate `m5out` directories:
+
+....
+echo "$(./getvar --arch aarch64 --gem5 --run-id 0 m5out_dir)"
+echo "$(./getvar --arch aarch64 --gem5 --run-id 1 m5out_dir)"
+....
+
+and the gem5 host executable stdout and stderr can be found at:
+
+....
+less "$(./getvar --arch aarch64 --gem5 --run-id 0 termout_file)"
+less "$(./getvar --arch aarch64 --gem5 --run-id 1 termout_file)"
+....
+
+Each line is prepended with the timestamp in seconds since the start of the program when it appeared.
+
+To have more semantic output directories names for later inspection, you can use a non numeric string for the run ID, and indicate the port offset explicitly:
+
+....
+./run --arch aarch64 --gem5 --run-id some-experiment --port-offset 1
+....
+
+`--port-offset` defaults to the run ID when that is a number.
+
+Like <>, you will need to pass the `-n` option to anything that needs to know runtime information, e.g. <>:
+
+....
+./run --run-id 1
+./rungdb --run-id 1
+....
+
+To run multiple gem5 checkouts, see: <>.
+
+Implementation note: we create multiple namespaces for two things:
+
+* run output directory
+* ports
+** QEMU allows setting all ports explicitly.
++
+If a port is not free, it just crashes.
++
+We assign a contiguous port range for each run ID.
+** gem5 automatically increments ports until it finds a free one.
++
+gem5 60600f09c25255b3c8f72da7fb49100e2682093a does not seem to expose a way to set the terminal and VNC ports from `fs.py`, so we just let gem5 assign the ports itself, and use `-n` only to match what it assigned. Those ports both appear on `config.ini`.
++
+The GDB port can be assigned on `gem5.opt --remote-gdb-port`, but it does not appear on `config.ini`.
+
 === Directory structure
 
 * `data`: gitignored user created data. Deleting this might lead to loss of data. Of course, if something there becomes is important enough to you, git track it.