mirror of
https://github.com/cirosantilli/linux-kernel-module-cheat.git
synced 2026-01-22 17:55:57 +01:00
release: get failed extract-vmlinux automation back working
Only the command is back in business, but it does not work: https://github.com/cirosantilli/linux-kernel-module-cheat/issues/79
259
README.adoc
@@ -15,7 +15,7 @@
|
||||
|
||||
TL;DR: <<qemu-buildroot-setup-getting-started>>
|
||||
|
||||
The source code for this page is located at: https://github.com/cirosantilli/linux-kernel-module-cheat[]. Due to link:https://github.com/isaacs/github/issues/1610[a GitHub limitation], this README is too long and not fully rendered on github.com. Either use: https://cirosantilli.com/linux-kernel-module-cheat or <<build-the-documentation,build the docs yourself>>.
|
||||
The source code for this page is located at: https://github.com/cirosantilli/linux-kernel-module-cheat[]. Due to https://github.com/isaacs/github/issues/1610[a GitHub limitation], this README is too long and not fully rendered on github.com. Either use: https://cirosantilli.com/linux-kernel-module-cheat or <<build-the-documentation,build the docs yourself>>.
|
||||
|
||||
toc::[]
|
||||
|
||||
@@ -296,12 +296,12 @@ You have now gone from newb to hardware hacker in a mere 15 minutes, your rate o
|
||||
|
||||
Seriously though, if you want to be a real hardware hacker, it just can't be done with open source tools as of 2018. The root obstacle is that:
|
||||
|
||||
* link:https://en.wikipedia.org/wiki/Semiconductor_fabrication_plant[Silicon fabs] don't publish their link:https://en.wikipedia.org/wiki/Design_rule_checking[design rules]
* which implies that there are no decent link:https://en.wikipedia.org/wiki/Standard_cell[standard cell libraries]. See also: https://www.quora.com/Are-there-good-open-source-standard-cell-libraries-to-learn-IC-synthesis-with-EDA-tools/answer/Ciro-Santilli
* which implies that people can't develop open source link:https://en.wikipedia.org/wiki/Electronic_design_automation[EDA tools]
* which implies that you can't get decent link:https://community.cadence.com/cadence_blogs_8/b/di/posts/hls-ppa-is-it-all-you-need-to-know[power, performance and area] estimates
* https://en.wikipedia.org/wiki/Semiconductor_fabrication_plant[Silicon fabs] don't publish their https://en.wikipedia.org/wiki/Design_rule_checking[design rules]
* which implies that there are no decent https://en.wikipedia.org/wiki/Standard_cell[standard cell libraries]. See also: https://www.quora.com/Are-there-good-open-source-standard-cell-libraries-to-learn-IC-synthesis-with-EDA-tools/answer/Ciro-Santilli
* which implies that people can't develop open source https://en.wikipedia.org/wiki/Electronic_design_automation[EDA tools]
* which implies that you can't get decent https://community.cadence.com/cadence_blogs_8/b/di/posts/hls-ppa-is-it-all-you-need-to-know[power, performance and area] estimates
||||
|
||||
The only thing you can do with open source is purely functional designs with link:https://en.wikipedia.org/wiki/Verilator[Verilator], but you will never know if it can be actually produced and how efficient it can be.
|
||||
The only thing you can do with open source is purely functional designs with https://en.wikipedia.org/wiki/Verilator[Verilator], but you will never know if it can be actually produced and how efficient it can be.
|
||||
|
||||
If you really want to develop semiconductors, your only choice is to join a university or a semiconductor company that has the EDA licenses.
|
||||
|
||||
@@ -311,7 +311,7 @@ While hacking QEMU, you will likely want to GDB step its source. That is trivial
|
||||
|
||||
===== Your first glibc hack
|
||||
|
||||
We use <<libc-choice,glibc as our default libc now>>, and it is tracked as an unmodified submodule at link:submodules/glibc[], at the exact same version that Buildroot has it, which can be found at: link:https://github.com/buildroot/buildroot/blob/2018.05/package/glibc/glibc.mk#L13[package/glibc/glibc.mk]. Buildroot 2018.05 applies no patches.
|
||||
We use <<libc-choice,glibc as our default libc now>>, and it is tracked as an unmodified submodule at link:submodules/glibc[], at the exact same version that Buildroot has it, which can be found at: https://github.com/buildroot/buildroot/blob/2018.05/package/glibc/glibc.mk#L13[package/glibc/glibc.mk]. Buildroot 2018.05 applies no patches.
|
||||
|
||||
Let's hack up the `puts` function:
|
||||
|
||||
@@ -375,7 +375,7 @@ Tested on a30ed0f047523ff2368d421ee2cce0800682c44e + 1.
|
||||
|
||||
Have you ever felt that a single `inc` instruction was not enough? Really? Me too!
|
||||
|
||||
So let's hack the <<gnu-gas-assembler>>, which is part of link:https://en.wikipedia.org/wiki/GNU_Binutils[GNU Binutils], to add a new shiny version of `inc` called... `myinc`!
|
||||
So let's hack the <<gnu-gas-assembler>>, which is part of https://en.wikipedia.org/wiki/GNU_Binutils[GNU Binutils], to add a new shiny version of `inc` called... `myinc`!
|
||||
|
||||
GCC uses GNU GAS as its backend, so we will test our new mnemonic with an <<gcc-inline-assembly>> test program: link:userland/arch/x86_64/binutils_hack.c[], which is just a copy of link:userland/arch/x86_64/binutils_nohack.c[] but with `myinc` instead of `inc`.
|
||||
|
||||
@@ -532,7 +532,7 @@ Read the following sections for further introductory material:
|
||||
|
||||
==== About the gem5 Buildroot setup
|
||||
|
||||
This setup is like the <<qemu-buildroot-setup>>, but it uses link:http://gem5.org/[gem5] instead of QEMU as a system simulator.
|
||||
This setup is like the <<qemu-buildroot-setup>>, but it uses http://gem5.org/[gem5] instead of QEMU as a system simulator.
|
||||
|
||||
QEMU tries to run as fast as possible and give correct results at the end, but it does not tell us how many CPU cycles it takes to do something, just the number of instructions it ran. This kind of simulation is known as functional simulation.
|
||||
|
||||
@@ -611,7 +611,7 @@ Good next steps are:
|
||||
[[docker]]
|
||||
=== Docker host setup
|
||||
|
||||
This repository has been tested inside clean link:https://en.wikipedia.org/wiki/Docker_(software)[Docker] containers.
|
||||
This repository has been tested inside clean https://en.wikipedia.org/wiki/Docker_(software)[Docker] containers.
|
||||
|
||||
This is a good option if you are on a Linux host, but the native setup failed due to your weird host distribution, and you have better things to do with your life than to debug it. See also: <<supported-hosts>>.
|
||||
|
||||
@@ -630,7 +630,7 @@ You are now left inside a shell in the Docker! From there, just run as usual:
|
||||
./run
|
||||
....
|
||||
|
||||
The host git top level directory is mounted inside the guest with a link:https://stackoverflow.com/questions/23439126/how-to-mount-a-host-directory-in-a-docker-container[Docker volume], which means for example that you can use your host's GUI text editor directly on the files. Just don't forget that if you nuke that directory on the guest, then it gets nuked on the host as well!
|
||||
The host git top level directory is mounted inside the guest with a https://stackoverflow.com/questions/23439126/how-to-mount-a-host-directory-in-a-docker-container[Docker volume], which means for example that you can use your host's GUI text editor directly on the files. Just don't forget that if you nuke that directory on the guest, then it gets nuked on the host as well!
|
||||
|
||||
Command breakdown:
|
||||
|
||||
@@ -732,7 +732,9 @@ The limitations are severe however:
|
||||
+
|
||||
Maybe we could work around this by just downloading the kernel source somehow, and using a host prebuilt GDB, but we felt that it would be too messy and unreliable.
|
||||
* you won't get the latest version of this repository. Our <<travis>> attempt to automate builds failed, and storing a release for every commit would likely make GitHub mad at us anyways.
|
||||
* <<gem5>> is not currently supported. The major blocking point is how to avoid distributing the kernel images twice: once for gem5 which uses `vmlinux`, and once for QEMU which uses `arch/*` images, see also: <<vmlinux-vs-bzimage-vs-zimage-vs-image>>.
|
||||
* <<gem5>> is not currently supported. The major blocking point is how to avoid distributing the kernel images twice: once for gem5 which uses `vmlinux`, and once for QEMU which uses `arch/*` images, see also:
|
||||
** https://github.com/cirosantilli/linux-kernel-module-cheat/issues/79
|
||||
** <<vmlinux-vs-bzimage-vs-zimage-vs-image>>.
|
||||
|
||||
This setup might be good enough for those developing simulators, as that requires less image modification. But once again, if you are serious about this, why not just let your computer build the <<qemu-buildroot-setup,full featured setup>> while you take a coffee or a nap? :-)
|
||||
|
||||
@@ -839,7 +841,7 @@ Docker is used here just as an image download provider since it has a wide varie
|
||||
* that image is not ready to boot, but rather goes into an interactive installer: https://askubuntu.com/questions/884534/how-to-run-ubuntu-16-04-desktop-on-qemu/1046792#1046792
|
||||
* the default Ubuntu image has a large collection of software, and is large. The docker version is much more minimal.
|
||||
|
||||
One alternative would be to use link:https://wiki.ubuntu.com/Base[Ubuntu base] which can be downloaded from: http://cdimage.ubuntu.com/ubuntu-base That provides a `.tgz` and comes very close to what we obtain with Docker, but without the need for `sudo`.
|
||||
One alternative would be to use https://wiki.ubuntu.com/Base[Ubuntu base] which can be downloaded from: http://cdimage.ubuntu.com/ubuntu-base That provides a `.tgz` and comes very close to what we obtain with Docker, but without the need for `sudo`.
|
||||
|
||||
==== Ubuntu guest setup getting started
|
||||
|
||||
@@ -868,7 +870,7 @@ It has however severe limitations:
|
||||
** your disk could get erased. Yes, this can also happen with `sudo` from userland. But you should not use `sudo` when developing newbie programs. And for the kernel you don't have the choice not to use `sudo`.
|
||||
** even more subtle system corruption such as https://unix.stackexchange.com/questions/78858/cannot-remove-or-reinsert-kernel-module-after-error-while-inserting-it-without-r[not being able to rmmod]
|
||||
* can't control which hardware is used, notably the CPU architecture
|
||||
* can't step debug it with <<gdb,GDB>> easily. The alternatives are link:https://en.wikipedia.org/wiki/JTAG[JTAG] or <<kgdb>>, but those are less reliable, and require extra hardware.
|
||||
* can't step debug it with <<gdb,GDB>> easily. The alternatives are https://en.wikipedia.org/wiki/JTAG[JTAG] or <<kgdb>>, but those are less reliable, and require extra hardware.
|
||||
|
||||
Still interested?
|
||||
|
||||
@@ -1189,9 +1191,9 @@ The main reason this setup is included in this project, despite the word "Linux"
|
||||
|
||||
This setup allows you to make a tiny OS that runs just a few instructions, use it to fully control the CPU to better understand the simulators for example, or develop your own OS if you are into that.
|
||||
|
||||
You can also use C and a subset of the C standard library because we enable link:https://en.wikipedia.org/wiki/Newlib[Newlib] by default. See also: https://electronics.stackexchange.com/questions/223929/c-standard-libraries-on-bare-metal/400077#400077
|
||||
You can also use C and a subset of the C standard library because we enable https://en.wikipedia.org/wiki/Newlib[Newlib] by default. See also: https://electronics.stackexchange.com/questions/223929/c-standard-libraries-on-bare-metal/400077#400077
|
||||
|
||||
Our C bare-metal compiler is built with link:https://github.com/crosstool-ng/crosstool-ng[crosstool-NG]. If you have already built <<qemu-buildroot-setup,Buildroot>> previously, you will end up with two GCCs installed. Unfortunately I don't see a solution for this, since we need separate toolchains for Newlib on baremetal and glibc on Linux: https://stackoverflow.com/questions/38956680/difference-between-arm-none-eabi-and-arm-linux-gnueabi/38989869#38989869
|
||||
Our C bare-metal compiler is built with https://github.com/crosstool-ng/crosstool-ng[crosstool-NG]. If you have already built <<qemu-buildroot-setup,Buildroot>> previously, you will end up with two GCCs installed. Unfortunately I don't see a solution for this, since we need separate toolchains for Newlib on baremetal and glibc on Linux: https://stackoverflow.com/questions/38956680/difference-between-arm-none-eabi-and-arm-linux-gnueabi/38989869#38989869
|
||||
|
||||
==== Baremetal setup getting started
|
||||
|
||||
@@ -1437,7 +1439,7 @@ continue
|
||||
|
||||
And you now control the counting on the first shell from GDB!
|
||||
|
||||
Before v4.17, the symbol name was just `sys_write`, the change happened at link:https://github.com/torvalds/linux/commit/d5a00528b58cdb2c71206e18bd021e34c4eab878[d5a00528b58cdb2c71206e18bd021e34c4eab878]. As of Linux v4.19, the function is called `sys_write` in `arm`, and `__arm64_sys_write` in `aarch64`. One good way to find it if the name changes again is to try:
Before v4.17, the symbol name was just `sys_write`, the change happened at https://github.com/torvalds/linux/commit/d5a00528b58cdb2c71206e18bd021e34c4eab878[d5a00528b58cdb2c71206e18bd021e34c4eab878]. As of Linux v4.19, the function is called `sys_write` in `arm`, and `__arm64_sys_write` in `aarch64`. One good way to find it if the name changes again is to try:
|
||||
|
||||
....
|
||||
rbreak .*sys_write
|
||||
@@ -2651,7 +2653,7 @@ break sleep
|
||||
continue
|
||||
....
|
||||
|
||||
And you are now left inside the `sleep` function of our default libc implementation uclibc link:https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/unistd/sleep.c?h=v1.0.30#n91[`libc/unistd/sleep.c`]!
|
||||
And you are now left inside the `sleep` function of our default libc implementation uclibc https://cgit.uclibc-ng.org/cgi/cgit/uclibc-ng.git/tree/libc/unistd/sleep.c?h=v1.0.30#n91[`libc/unistd/sleep.c`]!
|
||||
|
||||
You can also step into the `sleep` call:
|
||||
|
||||
@@ -2984,7 +2986,7 @@ More details: https://unix.stackexchange.com/questions/30414/what-can-make-passi
|
||||
|
||||
=== Init environment
|
||||
|
||||
Documented at link:https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html[]:
|
||||
Documented at https://www.kernel.org/doc/html/v4.14/admin-guide/kernel-parameters.html[]:
|
||||
|
||||
____
|
||||
The kernel parses parameters from the kernel command line up to "--"; if it doesn't recognize a parameter and it doesn't contain a '.', the parameter gets passed to init: parameters with '=' go into init's environment, others are passed as command line arguments to init. Everything after "--" is passed as an argument to init.
|
||||
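The classification rule quoted above can be sketched in Python. This is only an illustration of the rule, not the kernel's actual parser: the function name `classify_cmdline`, the `known_params` set and the whitespace-only tokenization (no quoting support) are assumptions made for the example, and the separator is assumed to be the literal token `--` as in the kernel documentation.

```python
def classify_cmdline(cmdline, known_params):
    """Split a kernel command line into (kernel, init_env, init_args).

    Simplified sketch: tokens are split on whitespace only, and
    known_params is a hypothetical set of names the kernel recognizes.
    """
    kernel, init_env, init_args = [], [], []
    after_separator = False
    for token in cmdline.split():
        if after_separator:
            # Everything after "--" is passed as an argument to init.
            init_args.append(token)
        elif token == "--":
            after_separator = True
        else:
            name = token.split("=", 1)[0]
            if name in known_params or "." in name:
                # Recognized, or looks like a module parameter (mod.param).
                kernel.append(token)
            elif "=" in token:
                # Unrecognized with '=': goes into init's environment.
                init_env.append(token)
            else:
                # Unrecognized without '=': command line argument to init.
                init_args.append(token)
    return kernel, init_env, init_args


if __name__ == "__main__":
    print(classify_cmdline(
        "root=/dev/vda console=ttyS0 myvar=1 mymod.param=2 single -- -x",
        {"root", "console"},
    ))
```

In this sketch `myvar=1` would end up in init's environment, while `single` and `-x` would become arguments of init.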
@@ -3455,7 +3457,7 @@ gem5 can generate DTBs on ARM with `--generate-dtb`. The generated DTB is placed
|
||||
|
||||
== KVM
|
||||
|
||||
link:https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine[KVM] is a Linux kernel interface that <<benchmark-linux-kernel-boot,greatly speeds up>> execution of virtual machines.
https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine[KVM] is a Linux kernel interface that <<benchmark-linux-kernel-boot,greatly speeds up>> execution of virtual machines.
|
||||
|
||||
You can enable KVM for QEMU or gem5 with:
|
||||
|
||||
@@ -3471,7 +3473,7 @@ panic: KVM: Failed to enter virtualized mode (hw reason: 0x80000021)
|
||||
|
||||
KVM works by running userland instructions natively directly on the real hardware instead of running a software simulation of those instructions.
|
||||
|
||||
Therefore, KVM only works if the host architecture is the same as the guest architecture. This means that this will likely only work for x86 guests since almost all development machines are x86 nowadays. Unless you are link:https://www.youtube.com/watch?v=8ItXpmLsINs[running an ARM desktop for some weird reason] :-)
Therefore, KVM only works if the host architecture is the same as the guest architecture. This means that this will likely only work for x86 guests since almost all development machines are x86 nowadays. Unless you are https://www.youtube.com/watch?v=8ItXpmLsINs[running an ARM desktop for some weird reason] :-)
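
The architecture constraint can be expressed as a small check. The following is a minimal sketch under stated assumptions: the function name `kvm_usable` is hypothetical, guest names follow this repository's `--arch` values (`x86_64`, `arm`, `aarch64`), and it only compares architecture strings and tests access to `/dev/kvm`. It deliberately ignores special cases such as 32-bit guests on some 64-bit hosts.

```python
import os
import platform


def kvm_usable(guest_arch, host_arch=None, kvm_device="/dev/kvm"):
    """Return True if KVM could plausibly accelerate guest_arch on this host."""
    if host_arch is None:
        # For example "x86_64" or "aarch64" on Linux hosts.
        host_arch = platform.machine()
    if host_arch != guest_arch:
        # KVM runs guest instructions natively, so architectures must match.
        return False
    # The KVM interface is exposed as a device node that must be accessible.
    return os.access(kvm_device, os.R_OK | os.W_OK)


if __name__ == "__main__":
    for arch in ("x86_64", "arm", "aarch64"):
        print(arch, kvm_usable(arch))
```

On a typical x86 development machine, only the `x86_64` line could report `True`, and only if the current user can open `/dev/kvm`.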
|
||||
|
||||
We don't enable KVM by default because:
|
||||
|
||||
@@ -3985,7 +3987,7 @@ So we just check ourselves manually
|
||||
|
||||
=== insmod
|
||||
|
||||
link:https://git.busybox.net/busybox/tree/modutils/insmod.c?h=1_29_3[Provided by BusyBox]:
|
||||
https://git.busybox.net/busybox/tree/modutils/insmod.c?h=1_29_3[Provided by BusyBox]:
|
||||
|
||||
....
|
||||
./run --eval-after 'insmod hello.ko'
|
||||
@@ -4103,7 +4105,7 @@ sudo modprobe vmhgfs -o vm_hgfs
|
||||
|
||||
=== OverlayFS
|
||||
|
||||
link:https://en.wikipedia.org/wiki/OverlayFS[OverlayFS] is a filesystem merged in the Linux kernel in 3.18.
|
||||
https://en.wikipedia.org/wiki/OverlayFS[OverlayFS] is a filesystem merged in the Linux kernel in 3.18.
|
||||
|
||||
As the name suggests, OverlayFS allows you to merge multiple directories into one. The following minimal runnable examples should give you an intuition on how it works:
|
||||
|
||||
@@ -4140,7 +4142,7 @@ I was unable to mount directly to `/` avoid the `chroot`:
|
||||
|
||||
We already have a prototype of this running from `fstab` on guest at `/mnt/overlay`, but it has the following shortcomings:
|
||||
|
||||
* changes to underlying filesystems are not visible on the overlay unless you remount with `mount -r remount /mnt/overlay`, as mentioned link:https://github.com/torvalds/linux/blob/v4.18/Documentation/filesystems/overlayfs.txt#L332[on the kernel docs]:
|
||||
* changes to underlying filesystems are not visible on the overlay unless you remount with `mount -r remount /mnt/overlay`, as mentioned https://github.com/torvalds/linux/blob/v4.18/Documentation/filesystems/overlayfs.txt#L332[on the kernel docs]:
|
||||
+
|
||||
....
|
||||
Changes to the underlying filesystems while part of a mounted overlay
|
||||
@@ -4259,7 +4261,7 @@ Text mode has the following limitations over graphics mode:
|
||||
./qemu-monitor info qtree
|
||||
....
|
||||
|
||||
and the Linux kernel picks it up through the link:https://en.wikipedia.org/wiki/Linux_framebuffer[fbdev] graphics system as can be seen from:
|
||||
and the Linux kernel picks it up through the https://en.wikipedia.org/wiki/Linux_framebuffer[fbdev] graphics system as can be seen from:
|
||||
|
||||
....
|
||||
cat /dev/urandom > /dev/fb0
|
||||
@@ -4421,7 +4423,7 @@ TODO <<kmscube>> failed on `aarch64` with:
|
||||
kmscube[706]: unhandled level 2 translation fault (11) at 0x00000000, esr 0x92000006, in libgbm.so.1.0.0[7fbf6a6000+e000]
|
||||
....
|
||||
|
||||
Tested on: link:https://github.com/cirosantilli/linux-kernel-module-cheat/commit/38fd6153d965ba20145f53dc1bb3ba34b336bde9[38fd6153d965ba20145f53dc1bb3ba34b336bde9]
|
||||
Tested on: https://github.com/cirosantilli/linux-kernel-module-cheat/commit/38fd6153d965ba20145f53dc1bb3ba34b336bde9[38fd6153d965ba20145f53dc1bb3ba34b336bde9]
|
||||
|
||||
==== Graphic mode gem5 aarch64
|
||||
|
||||
@@ -4446,7 +4448,7 @@ This is because the gem5 `aarch64` defconfig does not enable HDLCD like the 32 b
|
||||
|
||||
TODO get working. There is an unmerged patchset at: https://gem5-review.googlesource.com/c/public/gem5/+/11036/1
|
||||
|
||||
The DP650 is a newer display hardware than HDLCD. TODO is its interface publicly documented anywhere? Since it has a gem5 model and link:https://github.com/torvalds/linux/blob/v4.19/drivers/gpu/drm/arm/Kconfig#L39[in-tree Linux kernel support], that information cannot be secret?
|
||||
The DP650 is a newer display hardware than HDLCD. TODO is its interface publicly documented anywhere? Since it has a gem5 model and https://github.com/torvalds/linux/blob/v4.19/drivers/gpu/drm/arm/Kconfig#L39[in-tree Linux kernel support], that information cannot be secret?
|
||||
|
||||
The key option to enable support in Linux is `DRM_MALI_DISPLAY=y` which we enable at link:linux_config/display[].
|
||||
|
||||
@@ -4460,7 +4462,7 @@ Build the kernel exactly as for <<graphic-mode-gem5-aarch64>> and then run with:
|
||||
|
||||
We cannot use mainline Linux because the <<gem5-arm-linux-kernel-patches>> are required at least to provide the `CONFIG_DRM_VIRT_ENCODER` option.
|
||||
|
||||
gem5 emulates the link:http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0541c/CHDBAIDI.html[HDLCD] ARM Holdings hardware for `arm` and `aarch64`.
|
||||
gem5 emulates the http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0541c/CHDBAIDI.html[HDLCD] ARM Holdings hardware for `arm` and `aarch64`.
|
||||
|
||||
The kernel uses HDLCD to implement the <<drm>> interface, the required kernel config options are present at: link:linux_config/display[].
|
||||
|
||||
@@ -4772,7 +4774,7 @@ Bibliography: https://serverfault.com/questions/769874/how-to-forward-a-port-fro
|
||||
|
||||
=== 9P
|
||||
|
||||
The link:https://en.wikipedia.org/wiki/9P_(protocol)[9p protocol] allows the guest to mount a host directory.
|
||||
The https://en.wikipedia.org/wiki/9P_(protocol)[9p protocol] allows the guest to mount a host directory.
|
||||
|
||||
Both QEMU and <<9p-gem5>> support 9P.
|
||||
|
||||
@@ -4788,7 +4790,7 @@ Advantages of 9P
|
||||
Furthermore, this would be inconvenient, since what we usually want to do is to share host cross built files with the guest, and to do that we would have to copy the files over after the guest starts the server.
|
||||
* QEMU implements 9P natively, which makes it very stable and convenient, and must mean it is a simpler protocol than NFS as one would expect.
|
||||
+
|
||||
This is not the case for gem5 7bfb7f3a43f382eb49853f47b140bfd6caad0fb8 unfortunately, which relies on the link:https://github.com/chaos/diod[diod] host daemon, although it is not unfeasible that future versions could implement it natively as well.
|
||||
This is not the case for gem5 7bfb7f3a43f382eb49853f47b140bfd6caad0fb8 unfortunately, which relies on the https://github.com/chaos/diod[diod] host daemon, although it is not unfeasible that future versions could implement it natively as well.
|
||||
|
||||
Advantages of NFS:
|
||||
|
||||
@@ -4898,7 +4900,7 @@ mount: mounting 10.0.2.2:/tmp on /mnt/nfs failed: No such device
|
||||
|
||||
And now the `/tmp` directory from host is not mounted on guest!
|
||||
|
||||
If you don't want the NFS server to start automatically after the next boot, to save resources, link:https://askubuntu.com/questions/19320/how-to-enable-or-disable-services[do]:
If you don't want the NFS server to start automatically after the next boot, to save resources, https://askubuntu.com/questions/19320/how-to-enable-or-disable-services[do]:
|
||||
|
||||
....
|
||||
systemctl disable nfs-kernel-server
|
||||
@@ -5009,7 +5011,7 @@ From host:
|
||||
cat "$(./getvar linux_config)"
|
||||
....
|
||||
|
||||
Just for fun link:https://stackoverflow.com/questions/14958192/how-to-get-the-config-from-a-linux-kernel-image/14958263#14958263[]:
|
||||
Just for fun https://stackoverflow.com/questions/14958192/how-to-get-the-config-from-a-linux-kernel-image/14958263#14958263[]:
|
||||
|
||||
....
|
||||
./linux/scripts/extract-ikconfig "$(./getvar vmlinux)"
|
||||
@@ -5055,15 +5057,15 @@ We have since observed that the kernel size itself is very bloated compared to `
|
||||
[[buildroot-kernel-config]]
|
||||
===== About Buildroot's kernel configs
|
||||
|
||||
To see Buildroot's base configs, start from link:https://github.com/buildroot/buildroot/blob/2018.05/configs/qemu_x86_64_defconfig[`buildroot/configs/qemu_x86_64_defconfig`].
|
||||
To see Buildroot's base configs, start from https://github.com/buildroot/buildroot/blob/2018.05/configs/qemu_x86_64_defconfig[`buildroot/configs/qemu_x86_64_defconfig`].
|
||||
|
||||
That file contains `BR2_LINUX_KERNEL_CUSTOM_CONFIG_FILE="board/qemu/x86_64/linux-4.15.config"`, which points to the base config file used: link:https://github.com/buildroot/buildroot/blob/2018.05/board/qemu/x86_64/linux-4.15.config[board/qemu/x86_64/linux-4.15.config].
|
||||
That file contains `BR2_LINUX_KERNEL_CUSTOM_CONFIG_FILE="board/qemu/x86_64/linux-4.15.config"`, which points to the base config file used: https://github.com/buildroot/buildroot/blob/2018.05/board/qemu/x86_64/linux-4.15.config[board/qemu/x86_64/linux-4.15.config].
|
||||
|
||||
`arm`, on the other hand, uses link:https://github.com/buildroot/buildroot/blob/2018.05/configs/qemu_arm_vexpress_defconfig[`buildroot/configs/qemu_arm_vexpress_defconfig`], which contains `BR2_LINUX_KERNEL_DEFCONFIG="vexpress"`, and therefore just does a `make vexpress_defconfig`, and gets its config from the Linux kernel tree itself.
|
||||
`arm`, on the other hand, uses https://github.com/buildroot/buildroot/blob/2018.05/configs/qemu_arm_vexpress_defconfig[`buildroot/configs/qemu_arm_vexpress_defconfig`], which contains `BR2_LINUX_KERNEL_DEFCONFIG="vexpress"`, and therefore just does a `make vexpress_defconfig`, and gets its config from the Linux kernel tree itself.
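
To make the two cases concrete, here is a minimal Python sketch that reads a Buildroot defconfig and reports where the kernel config comes from. The function name `kernel_config_source` and the simplified `KEY="value"` parsing are assumptions for illustration, and only the two `BR2_LINUX_KERNEL_*` variables mentioned above are handled.

```python
def kernel_config_source(defconfig_text):
    """Describe where a Buildroot defconfig takes its kernel config from."""
    options = {}
    for line in defconfig_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key] = value.strip('"')
    custom = options.get("BR2_LINUX_KERNEL_CUSTOM_CONFIG_FILE")
    if custom:
        # A config file shipped inside the Buildroot tree.
        return "buildroot file: " + custom
    defconfig = options.get("BR2_LINUX_KERNEL_DEFCONFIG")
    if defconfig:
        # Buildroot runs `make <name>_defconfig` in the kernel tree.
        return "kernel tree: make " + defconfig + "_defconfig"
    return "unknown"


if __name__ == "__main__":
    x86 = 'BR2_LINUX_KERNEL_CUSTOM_CONFIG_FILE="board/qemu/x86_64/linux-4.15.config"\n'
    arm = 'BR2_LINUX_KERNEL_DEFCONFIG="vexpress"\n'
    print(kernel_config_source(x86))
    print(kernel_config_source(arm))
```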
|
||||
|
||||
====== Linux kernel defconfig
|
||||
|
||||
To boot link:https://stackoverflow.com/questions/41885015/what-exactly-does-linux-kernels-make-defconfig-do[defconfig] from disk on Linux and see a shell, all we need is these missing virtio options:
|
||||
To boot https://stackoverflow.com/questions/41885015/what-exactly-does-linux-kernels-make-defconfig-do[defconfig] from disk on Linux and see a shell, all we need is these missing virtio options:
|
||||
|
||||
....
|
||||
./build-linux \
|
||||
@@ -5162,7 +5164,7 @@ Tested on 649d06d6758cefd080d04dc47fd6a5a26a620874 + 1.
|
||||
Other configs which we had previously tested at 4e0d9af81fcce2ce4e777cb82a1990d7c2ca7c1e are:
|
||||
|
||||
* `arm` and `aarch64` configs present in the official ARM gem5 Linux kernel fork: <<gem5-arm-linux-kernel-patches>>. Some of the configs present there are added by the patches.
|
||||
* Jason's magic `x86_64` config: http://web.archive.org/web/20171229121642/http://www.lowepower.com/jason/files/config which is referenced at: link:http://web.archive.org/web/20171229121525/http://www.lowepower.com/jason/setting-up-gem5-full-system.html[]. QEMU boots with that by removing `# CONFIG_VIRTIO_PCI is not set`.
|
||||
* Jason's magic `x86_64` config: http://web.archive.org/web/20171229121642/http://www.lowepower.com/jason/files/config which is referenced at: http://web.archive.org/web/20171229121525/http://www.lowepower.com/jason/setting-up-gem5-full-system.html[]. QEMU boots with that by removing `# CONFIG_VIRTIO_PCI is not set`.
|
||||
|
||||
=== Kernel version
|
||||
|
||||
@@ -5886,7 +5888,7 @@ vermagic_fail: version magic 'asdfqwer' should be '4.17.0 SMP mod_unload modvers
|
||||
|
||||
Source: link:kernel_modules/vermagic_fail.c[]
|
||||
|
||||
The kernel's vermagic is defined based on compile time configurations at link:https://github.com/torvalds/linux/blob/v4.17/include/linux/vermagic.h#L35[include/linux/vermagic.h]:
|
||||
The kernel's vermagic is defined based on compile time configurations at https://github.com/torvalds/linux/blob/v4.17/include/linux/vermagic.h#L35[include/linux/vermagic.h]:
|
||||
|
||||
....
|
||||
#define VERMAGIC_STRING \
|
||||
@@ -5954,7 +5956,7 @@ We have to call: `kernel_fpu_begin()` before starting FPU operations, and `kerne
|
||||
insmod float.ko myfloat=1 enable_fpu=0
|
||||
....
|
||||
|
||||
The v5.1 documentation under link:https://github.com/cirosantilli/linux/blob/v5.1/arch/x86/include/asm/fpu/api.h#L15[arch/x86/include/asm/fpu/api.h] reads:
|
||||
The v5.1 documentation under https://github.com/cirosantilli/linux/blob/v5.1/arch/x86/include/asm/fpu/api.h#L15[arch/x86/include/asm/fpu/api.h] reads:
|
||||
|
||||
....
|
||||
* Use kernel_fpu_begin/end() if you intend to use FPU in kernel context. It
|
||||
@@ -6611,7 +6613,7 @@ However, we only store a single integer in memory and calculate the file on the
|
||||
|
||||
Bibliography:
|
||||
|
||||
* link:https://github.com/torvalds/linux/blob/v4.17/Documentation/filesystems/seq_file.txt[Documentation/filesystems/seq_file.txt]
|
||||
* https://github.com/torvalds/linux/blob/v4.17/Documentation/filesystems/seq_file.txt[Documentation/filesystems/seq_file.txt]
|
||||
* https://stackoverflow.com/questions/25399112/how-to-use-a-seq-file-in-linux-modules
|
||||
|
||||
===== seq_file single_open
|
||||
@@ -6895,7 +6897,7 @@ insmod sleep.ko n=5
|
||||
|
||||
Source: link:kernel_modules/sleep.c[]
|
||||
|
||||
The sleep is done with a call to link:https://github.com/torvalds/linux/blob/v4.17/kernel/time/timer.c#L1984[`usleep_range`] directly inside `module_init` for simplicity.
|
||||
The sleep is done with a call to https://github.com/torvalds/linux/blob/v4.17/kernel/time/timer.c#L1984[`usleep_range`] directly inside `module_init` for simplicity.
|
||||
|
||||
Bibliography:
|
||||
|
||||
@@ -6940,7 +6942,7 @@ Stop:
|
||||
rmmod work_from_work
|
||||
....
|
||||
|
||||
The sleep is done indirectly through: link:https://github.com/torvalds/linux/blob/v4.17/include/linux/workqueue.h#L522[`queue_delayed_work`], which waits the specified time before scheduling the work.
|
||||
The sleep is done indirectly through: https://github.com/torvalds/linux/blob/v4.17/include/linux/workqueue.h#L522[`queue_delayed_work`], which waits the specified time before scheduling the work.
|
||||
|
||||
Source: link:kernel_modules/work_from_work.c[]
|
||||
|
||||
@@ -7014,8 +7016,8 @@ Source: link:kernel_modules/wait_queue.c[]
|
||||
|
||||
This example launches three threads:
|
||||
|
||||
* one thread generates events periodically with link:https://github.com/torvalds/linux/blob/v4.17/include/linux/wait.h#L195[`wake_up`]
* the other two threads wait for that with link:https://github.com/torvalds/linux/blob/v4.17/include/linux/wait.h#L286[`wait_event`], and print a dmesg when it happens.
* one thread generates events periodically with https://github.com/torvalds/linux/blob/v4.17/include/linux/wait.h#L195[`wake_up`]
* the other two threads wait for that with https://github.com/torvalds/linux/blob/v4.17/include/linux/wait.h#L286[`wait_event`], and print a dmesg when it happens.
|
||||
+
|
||||
The `wait_event` macro works a bit like:
|
||||
+
|
||||
@@ -7836,7 +7838,7 @@ It only appears once on every log I've seen so far, checked with `grep 0x1000000
|
||||
+
|
||||
Then when we count the instructions that run before the kernel entry point, there is only about 100k instructions, which is insignificant compared to the kernel boot itself.
|
||||
+
|
||||
TODO `--arch arm` and `--arch aarch64` do not count firmware instructions properly because the entry point address of the ELF file (`ffffff8008080000` for `aarch64`) does not show up on the trace at all. Tested on link:https://github.com/cirosantilli/linux-kernel-module-cheat/commit/f8c0502bb2680f2dbe7c1f3d7958f60265347005[f8c0502bb2680f2dbe7c1f3d7958f60265347005].
TODO `--arch arm` and `--arch aarch64` do not count firmware instructions properly because the entry point address of the ELF file (`ffffff8008080000` for `aarch64`) does not show up on the trace at all. Tested on https://github.com/cirosantilli/linux-kernel-module-cheat/commit/f8c0502bb2680f2dbe7c1f3d7958f60265347005[f8c0502bb2680f2dbe7c1f3d7958f60265347005].
|
||||
* We can also discount the instructions after `init` runs by using `readelf` to get the initial address of `init`. One easy way to do that now is to just run:
|
||||
+
|
||||
....
|
||||
@@ -7969,7 +7971,7 @@ SELinux requires glibc: <<libc-choice>>.
|
||||
|
||||
=== User mode Linux
|
||||
|
||||
I once got link:https://en.wikipedia.org/wiki/User-mode_Linux[UML] running on a minimal Buildroot setup at: https://unix.stackexchange.com/questions/73203/how-to-create-rootfs-for-user-mode-linux-on-fedora-18/372207#372207
|
||||
I once got https://en.wikipedia.org/wiki/User-mode_Linux[UML] running on a minimal Buildroot setup at: https://unix.stackexchange.com/questions/73203/how-to-create-rootfs-for-user-mode-linux-on-fedora-18/372207#372207
|
||||
|
||||
But in part because it is dying, I didn't spend much effort to integrate it into this repo, although it would be a good fit in principle, since it is essentially a virtualization method.
|
||||
|
||||
@@ -8318,7 +8320,7 @@ tty1::respawn:-/bin/sh
|
||||
....
|
||||
+
|
||||
it makes the terminal go crazy, as if multiple processes are randomly eating up the characters.
|
||||
* `/dev/ttyN` for the other graphic TTYs. Note that there are only 63 available ones, from `/dev/tty1` to `/dev/tty63` (`/dev/tty0` is the current one): link:https://superuser.com/questions/449781/why-is-there-so-many-linux-dev-tty[]. I think this is determined by:
|
||||
* `/dev/ttyN` for the other graphic TTYs. Note that there are only 63 available ones, from `/dev/tty1` to `/dev/tty63` (`/dev/tty0` is the current one): https://superuser.com/questions/449781/why-is-there-so-many-linux-dev-tty[]. I think this is determined by:
|
||||
+
|
||||
....
|
||||
#define MAX_NR_CONSOLES 63
|
||||
@@ -8424,7 +8426,7 @@ Instead, the shell appears on `/dev/tty7`.
|
||||
|
||||
If you run in <<graphics>>, then you get a Penguin image for <<number-of-cores,every core>> above the console! https://askubuntu.com/questions/80938/is-it-possible-to-get-the-tux-logo-on-the-text-based-boot
|
||||
|
||||
This is due to the link:https://github.com/torvalds/linux/blob/v4.17/drivers/video/logo/Kconfig#L5[`CONFIG_LOGO=y`] option which we enable by default.
|
||||
This is due to the https://github.com/torvalds/linux/blob/v4.17/drivers/video/logo/Kconfig#L5[`CONFIG_LOGO=y`] option which we enable by default.
|
||||
|
||||
`reset` on the terminal then kills the poor penguins.
|
||||
|
||||
@@ -8497,7 +8499,7 @@ Bibliography:
|
||||
* https://en.wikipedia.org/wiki/Direct_Rendering_Manager
|
||||
* https://en.wikipedia.org/wiki/Mode_setting KMS
|
||||
|
||||
Tested on: link:https://github.com/cirosantilli/linux-kernel-module-cheat/commit/93e383902ebcc03d8a7ac0d65961c0e62af9612b[93e383902ebcc03d8a7ac0d65961c0e62af9612b]
|
||||
Tested on: https://github.com/cirosantilli/linux-kernel-module-cheat/commit/93e383902ebcc03d8a7ac0d65961c0e62af9612b[93e383902ebcc03d8a7ac0d65961c0e62af9612b]
|
||||
|
||||
==== kmscube
|
||||
|
||||
@@ -8546,7 +8548,7 @@ failed to initialize legacy DRM
|
||||
|
||||
See also: https://github.com/robclark/kmscube/issues/12 and https://stackoverflow.com/questions/26920835/can-egl-application-run-in-console-mode/26921287#26921287
|
||||
|
||||
Tested on: link:https://github.com/cirosantilli/linux-kernel-module-cheat/commit/2903771275372ccfecc2b025edbb0d04c4016930[2903771275372ccfecc2b025edbb0d04c4016930]
|
||||
Tested on: https://github.com/cirosantilli/linux-kernel-module-cheat/commit/2903771275372ccfecc2b025edbb0d04c4016930[2903771275372ccfecc2b025edbb0d04c4016930]
|
||||
|
||||
==== kmscon
|
||||
|
||||
@@ -8663,18 +8665,9 @@ Between all archs on QEMU and gem5 we touch all of those kernel built output fil
|
||||
|
||||
We are trying to maintain a description of each at: https://unix.stackexchange.com/questions/5518/what-is-the-difference-between-the-following-kernel-makefile-terms-vmlinux-vml/482978#482978
|
||||
|
||||
QEMU does not seem able to boot ELF files like `vmlinux`, only `objdump` code: https://superuser.com/questions/1376944/can-qemu-boot-linux-from-vmlinux-instead-of-bzimage
|
||||
QEMU does not seem able to boot ELF files like `vmlinux`: https://superuser.com/questions/1376944/can-qemu-boot-linux-from-vmlinux-instead-of-bzimage
|
||||
|
||||
Converting `arch/*` images to `vmlinux` is possible in x86 with link:https://github.com/torvalds/linux/blob/master/scripts/extract-vmlinux[`extract-vmlinux`]. But for arm it fails with:
|
||||
|
||||
....
|
||||
run-detectors: unable to find an interpreter for
|
||||
....
|
||||
|
||||
as mentioned at:
|
||||
|
||||
* https://unix.stackexchange.com/questions/352215/how-do-i-extract-vmlinux-from-an-arm-image
|
||||
* https://raspberrypi.stackexchange.com/questions/88621/why-doesnt-extract-vmlinux-work-with-raspbians-boot-kernel-img
|
||||
Converting `arch/*` images to `vmlinux` is possible in theory on x86 with https://github.com/torvalds/linux/blob/v5.1/scripts/extract-vmlinux[`extract-vmlinux`], but for some reason we didn't get any gem5 boots working from images generated like that, see: https://github.com/cirosantilli/linux-kernel-module-cheat/issues/79
|
||||
|
||||
== Xen
|
||||
|
||||
@@ -8729,11 +8722,11 @@ Link on readme https://stackoverflow.com/questions/49348453/xen-on-qemu-with-arm
|
||||
|
||||
=== Introduction to QEMU
|
||||
|
||||
link:https://en.wikipedia.org/wiki/QEMU[QEMU] is a system simulator: it simulates a CPU and devices such as interrupt handlers, timers, UART, screen, keyboard, etc.
|
||||
https://en.wikipedia.org/wiki/QEMU[QEMU] is a system simulator: it simulates a CPU and devices such as interrupt handlers, timers, UART, screen, keyboard, etc.
|
||||
|
||||
If you are familiar with link:https://en.wikipedia.org/wiki/VirtualBox[VirtualBox], then QEMU basically does the same thing: it opens a "window" inside your desktop that can run an operating system inside your operating system.
|
||||
If you are familiar with https://en.wikipedia.org/wiki/VirtualBox[VirtualBox], then QEMU basically does the same thing: it opens a "window" inside your desktop that can run an operating system inside your operating system.
|
||||
|
||||
Also both can use very similar techniques: either link:https://en.wikipedia.org/wiki/Binary_translation[binary translation] or <<KVM>>. VirtualBox' binary translator is / was based on QEMU's it seems: https://en.wikipedia.org/wiki/VirtualBox#Software-based_virtualization
|
||||
Also both can use very similar techniques: either https://en.wikipedia.org/wiki/Binary_translation[binary translation] or <<KVM>>. VirtualBox' binary translator is / was based on QEMU's it seems: https://en.wikipedia.org/wiki/VirtualBox#Software-based_virtualization
|
||||
|
||||
The huge advantage of QEMU over VirtualBox is that it supports cross arch simulation, e.g. simulate an ARM guest on an x86 host.
|
||||
|
||||
@@ -8935,7 +8928,7 @@ For the more complex interfaces, we focus on simplified educational devices, eit
|
||||
|
||||
* present in the QEMU upstream:
|
||||
** <<qemu-edu>>
|
||||
* added in link:https://github.com/cirosantilli/qemu[our fork of QEMU]:
|
||||
* added in https://github.com/cirosantilli/qemu[our fork of QEMU]:
|
||||
** <<pci_min>>
|
||||
** <<platform_device>>
|
||||
|
||||
@@ -9013,7 +9006,7 @@ This example uses:
|
||||
* the QEMU `edu` educational device, which is a minimal educational in-tree PCI example
|
||||
* the `pci.ko` kernel module, which exercises the `edu` hardware.
|
||||
+
|
||||
I've contacted the awesome original author of `edu` link:https://github.com/jirislaby[Jiri Slaby], and he told me there is no official kernel module example because this was created for a kernel module university course that he gives, and he didn't want to give away answers. link:https://github.com/cirosantilli/how-to-teach-efficiently[I don't agree with that philosophy], so students, cheat away with this repo and go make startups instead.
|
||||
I've contacted the awesome original author of `edu` https://github.com/jirislaby[Jiri Slaby], and he told me there is no official kernel module example because this was created for a kernel module university course that he gives, and he didn't want to give away answers. https://github.com/cirosantilli/how-to-teach-efficiently[I don't agree with that philosophy], so students, cheat away with this repo and go make startups instead.
|
||||
|
||||
TODO exercise DMA on the kernel module. The `edu` hardware model has that feature:
|
||||
|
||||
@@ -9139,7 +9132,7 @@ handler irq = 11 dev = 251
|
||||
There are two versions of `setpci` and `lspci`:
|
||||
|
||||
* a simple one from BusyBox
|
||||
* a more complete one from link:https://github.com/pciutils/pciutils[pciutils] which Buildroot has a package for, and is the default on Ubuntu 18.04 host. This is the one we enable by default.
|
||||
* a more complete one from https://github.com/pciutils/pciutils[pciutils] which Buildroot has a package for, and is the default on Ubuntu 18.04 host. This is the one we enable by default.
|
||||
|
||||
===== Introduction to PCI
|
||||
|
||||
@@ -9235,7 +9228,7 @@ The best you can do is to hack our link:build[] script to add:
|
||||
HOST_QEMU_OPTS='--extra-cflags=-DDEBUG_PL061=1'
|
||||
....
|
||||
|
||||
where link:http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0190b/index.html[PL061] is the dominating ARM Holdings hardware that handles GPIO.
|
||||
where http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0190b/index.html[PL061] is the dominating ARM Holdings hardware that handles GPIO.
|
||||
|
||||
Then compile with:
|
||||
|
||||
@@ -9739,7 +9732,7 @@ Solved on unmerged c42634d8e3428cfa60672c3ba89cabefc720cde9 from https://github.
|
||||
|
||||
TODO get working.
|
||||
|
||||
QEMU replays support checkpointing, and this allows for a simplistic "reverse debugging" implementation proposed at https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00478.html on the unmerged link:https://github.com/ispras/qemu/tree/rr-180725[]:
|
||||
QEMU replays support checkpointing, and this allows for a simplistic "reverse debugging" implementation proposed at https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg00478.html on the unmerged https://github.com/ispras/qemu/tree/rr-180725[]:
|
||||
|
||||
....
|
||||
./run --eval-after './linux/rand_check.out;./linux/poweroff.out;' --record
|
||||
@@ -9777,7 +9770,7 @@ just appears to output both cores intertwined without any clear differentiation.
|
||||
|
||||
==== gem5 tracing
|
||||
|
||||
gem5 also provides a tracing mechanism documented at: link:http://www.gem5.org/Trace_Based_Debugging[]:
|
||||
gem5 also provides a tracing mechanism documented at: http://www.gem5.org/Trace_Based_Debugging[]:
|
||||
|
||||
....
|
||||
./run --arch aarch64 --eval 'm5 exit' --emulator gem5 --trace Exec
|
||||
@@ -9845,13 +9838,13 @@ The output format is of type:
|
||||
There are two types of lines:
|
||||
|
||||
* full instructions, as the first line. Only shown if the `ExecMacro` flag is given.
|
||||
* micro ops that constitute the instruction, the lines that follow. Yes, `aarch64` also has microops: link:https://superuser.com/questions/934752/do-arm-processors-like-cortex-a9-use-microcode/934755#934755[]. Only shown if the `ExecMicro` flag is given.
|
||||
* micro ops that constitute the instruction, the lines that follow. Yes, `aarch64` also has microops: https://superuser.com/questions/934752/do-arm-processors-like-cortex-a9-use-microcode/934755#934755[]. Only shown if the `ExecMicro` flag is given.
|
||||
|
||||
Breakdown:
|
||||
|
||||
* `25007500`: time count in some unit. Note how the microops execute at further timestamps.
|
||||
* `system.cpu`: distinguishes between CPUs when there are more than one
|
||||
* `T0`: thread number. TODO: link:https://superuser.com/questions/133082/hyper-threading-and-dual-core-whats-the-difference/995858#995858[hyperthread]? How to play with it?
|
||||
* `T0`: thread number. TODO: https://superuser.com/questions/133082/hyper-threading-and-dual-core-whats-the-difference/995858#995858[hyperthread]? How to play with it?
|
||||
* `@start_kernel`: we are in the `start_kernel` function. Awesome feature! Implemented with libelf https://sourceforge.net/projects/elftoolchain/ copy pasted in-tree `ext/libelf`. To get raw addresses, remove the `ExecSymbol`, which is enabled by `Exec`. This can be done with `Exec,-ExecSymbol`.
|
||||
* `.1` as in `@start_kernel.1`: index of the microop
|
||||
* `stp`: instruction disassembly. Seems to use `.isa` files dispersed per arch, which is an in house format: http://gem5.org/ISA_description_system
|
||||
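As a rough illustration of the breakdown above, the fields of one trace line could be split apart with a small script. This is only a sketch and not part of the repository: the sample line, the `parse_trace_line` helper, and the exact column layout are assumptions based on the fields listed above, so adjust the regular expression to whatever your gem5 version actually prints.

```python
import re

# Hypothetical sample lines, modeled on the fields described in the breakdown.
# The exact spacing and trailing columns are assumptions, not verified gem5 output.
SAMPLE_MICRO = (
    "25007500: system.cpu T0 : @start_kernel.1  :   strxi_uop   x29, [ureg0]"
    " : MemWrite :  D=0x0000000000000000 A=0xffffff8008913f90"
)
SAMPLE_MACRO = "25007000: system.cpu T0 : @start_kernel    : stp"

TRACE_RE = re.compile(
    r"^\s*(?P<tick>\d+):\s+"          # time count
    r"(?P<cpu>\S+)\s+"                # CPU name, e.g. system.cpu
    r"T(?P<thread>\d+)\s*:\s*"        # thread number
    r"@(?P<symbol>\S+?)"              # symbol (needs ExecSymbol)
    r"(?:\.(?P<microop>\d+))?\s*:\s*" # optional microop index
    r"(?P<disasm>[^:]*)"              # disassembly, up to the next column
)


def parse_trace_line(line):
    """Return a dict of trace fields, or None if the line does not match."""
    match = TRACE_RE.match(line)
    if match is None:
        return None
    microop = match.group("microop")
    return {
        "tick": int(match.group("tick")),
        "cpu": match.group("cpu"),
        "thread": int(match.group("thread")),
        "symbol": match.group("symbol"),
        # None for full instruction lines, an int for microop lines.
        "microop": None if microop is None else int(microop),
        "disasm": match.group("disasm").strip(),
    }


if __name__ == "__main__":
    for sample in (SAMPLE_MACRO, SAMPLE_MICRO):
        print(parse_trace_line(sample))
```

Note that a symbol whose own name ends in `.<digits>` would be misread as a microop index by this sketch, and that lines produced with `Exec,-ExecSymbol` carry raw addresses instead of `@symbol`, so they would need a different pattern.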
@@ -9909,7 +9902,7 @@ Getting started at: <<gem5-buildroot-setup>>.
|
||||
It is not of course truly cycle accurate, as that:
|
||||
+
|
||||
--
|
||||
** would require exposing proprietary information of the CPU designs: link:https://stackoverflow.com/questions/17454955/can-you-check-performance-of-a-program-running-with-qemu-simulator/33580850#33580850[]
|
||||
** would require exposing proprietary information of the CPU designs: https://stackoverflow.com/questions/17454955/can-you-check-performance-of-a-program-running-with-qemu-simulator/33580850#33580850[]
|
||||
** would make the simulation even slower TODO confirm, by how much
|
||||
--
|
||||
+
|
||||
@@ -10135,7 +10128,7 @@ But keep in mind that it only affects benchmark performance of the most detailed
|
||||
|
||||
* https://stackoverflow.com/questions/49011096/how-to-switch-cpu-models-in-gem5-after-restoring-a-checkpoint-and-then-observe-t
|
||||
|
||||
Cache sizes can in theory be checked with the methods described at: link:https://superuser.com/questions/55776/finding-l2-cache-size-in-linux[]:
|
||||
Cache sizes can in theory be checked with the methods described at: https://superuser.com/questions/55776/finding-l2-cache-size-in-linux[]:
|
||||
|
||||
....
|
||||
getconf -a | grep CACHE
|
||||
@@ -10289,7 +10282,7 @@ There are not yet enabled, but it should be easy to so, see: <<add-new-buildroot
|
||||
|
||||
The following benchmark setup works both:
|
||||
|
||||
* on host through timers + link:https://stackoverflow.com/questions/51952471/why-do-i-get-a-constant-instead-of-logarithmic-curve-for-an-insert-time-benchmar/51953081#51953081[granule]
|
||||
* on host through timers + https://stackoverflow.com/questions/51952471/why-do-i-get-a-constant-instead-of-logarithmic-curve-for-an-insert-time-benchmar/51953081#51953081[granule]
|
||||
* gem5 with <<m5ops-instructions,dumpstats>>, which can get more precise results with `granule == 1`
|
||||
|
||||
It has been used to answer:
|
||||
@@ -10427,11 +10420,11 @@ Source: link:userland/libs/eigen/hello.cpp[]
|
||||
|
||||
This example just creates a matrix and prints it out.
|
||||
|
||||
Tested on: link:https://github.com/cirosantilli/linux-kernel-module-cheat/commit/a4bdcf102c068762bb1ef26c591fcf71e5907525[a4bdcf102c068762bb1ef26c591fcf71e5907525]
|
||||
Tested on: https://github.com/cirosantilli/linux-kernel-module-cheat/commit/a4bdcf102c068762bb1ef26c591fcf71e5907525[a4bdcf102c068762bb1ef26c591fcf71e5907525]
|
||||
|
||||
===== PARSEC benchmark
|
||||
|
||||
We have ported parts of the link:http://parsec.cs.princeton.edu[PARSEC benchmark] for cross compilation at: https://github.com/cirosantilli/parsec-benchmark See the documentation on that repo to find out which benchmarks have been ported. Some of the benchmarks are segfaulting; they are documented in that repo.
|
||||
We have ported parts of the http://parsec.cs.princeton.edu[PARSEC benchmark] for cross compilation at: https://github.com/cirosantilli/parsec-benchmark See the documentation on that repo to find out which benchmarks have been ported. Some of the benchmarks are segfaulting; they are documented in that repo.
|
||||
|
||||
There are two ways to run PARSEC with this repo:
|
||||
|
||||
@@ -10484,7 +10477,7 @@ Large input may also require tweaking:
|
||||
|
||||
`test.sh` only contains the run commands for the `test` size, and cannot be used for `simsmall`.
|
||||
|
||||
The easiest thing to do, is to link:https://superuser.com/questions/231002/how-can-i-search-within-the-output-buffer-of-a-tmux-shell/1253137#1253137[scroll up on the host shell] after the build, and look for a line of type:
|
||||
The easiest thing to do, is to https://superuser.com/questions/231002/how-can-i-search-within-the-output-buffer-of-a-tmux-shell/1253137#1253137[scroll up on the host shell] after the build, and look for a line of type:
|
||||
|
||||
....
|
||||
Running /root/linux-kernel-module-cheat/out/aarch64/buildroot/build/parsec-benchmark-custom/ext/splash2x/apps/ocean_ncp/inst/aarch64-linux.gcc/bin/ocean_ncp -n2050 -p1 -e1e-07 -r20000 -t28800
|
||||
@@ -10782,7 +10775,7 @@ printf 'sh' > "$(./getvar gem5_readfile_file)"
|
||||
|
||||
Since this is such a common setup, we provide the following helpers for this operation:
|
||||
|
||||
* link:rootfs_overlay/lkmc/gem5.sh[]. This script is analogous to gem5's in-tree link:https://github.com/gem5/gem5/blob/2b4b94d0556c2d03172ebff63f7fc502c3c26ff8/configs/boot/hack_back_ckpt.rcS[hack_back_ckpt.rcS], but with less noise.
|
||||
* link:rootfs_overlay/lkmc/gem5.sh[]. This script is analogous to gem5's in-tree https://github.com/gem5/gem5/blob/2b4b94d0556c2d03172ebff63f7fc502c3c26ff8/configs/boot/hack_back_ckpt.rcS[hack_back_ckpt.rcS], but with less noise.
|
||||
* `./run --gem5-readfile` is a convenient way to set the `m5 readfile`
|
||||
|
||||
Their usage is exemplified at <<gem5-run-benchmark>>.
|
||||
@@ -11065,7 +11058,7 @@ Then, from inside <<gem5-buildroot-setup>>, test it out with:
|
||||
|
||||
In theory, the cleanest way to add m5ops to your benchmarks would be to do exactly what the `m5` tool does:
|
||||
|
||||
* include link:https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/include/gem5/asm/generic/m5ops.h[`include/gem5/asm/generic/m5ops.h`]
|
||||
* include https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/include/gem5/asm/generic/m5ops.h[`include/gem5/asm/generic/m5ops.h`]
|
||||
* link with the `.o` file under `util/m5` for the correct arch, e.g. `m5op_arm_A64.o` for aarch64.
|
||||
|
||||
However, I think it is usually not worth the trouble of hacking up the build system of the benchmark to do this, and I recommend just hardcoding in a few raw instructions here and there, and managing it with version control + `sed`.
|
||||
@@ -11079,9 +11072,9 @@ Bibliography:
|
||||
|
||||
Let's study how <<m5>> uses them:
|
||||
|
||||
* link:https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/include/gem5/asm/generic/m5ops.h[`include/gem5/asm/generic/m5ops.h`]: defines the magic constants that represent the instructions
|
||||
* link:https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/util/m5/m5op_arm_A64.S[`util/m5/m5op_arm_A64.S`]: use the magic constants that represent the instructions using C preprocessor magic
|
||||
* link:https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/util/m5/m5.c[`util/m5/m5.c`]: the actual executable. Gets linked to `m5op_arm_A64.S` which defines a function for each m5op.
|
||||
* https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/include/gem5/asm/generic/m5ops.h[`include/gem5/asm/generic/m5ops.h`]: defines the magic constants that represent the instructions
|
||||
* https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/util/m5/m5op_arm_A64.S[`util/m5/m5op_arm_A64.S`]: use the magic constants that represent the instructions using C preprocessor magic
|
||||
* https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/util/m5/m5.c[`util/m5/m5.c`]: the actual executable. Gets linked to `m5op_arm_A64.S` which defines a function for each m5op.
|
||||
|
||||
We notice that there are two different implementations for each arch:
|
||||
|
||||
@@ -11119,7 +11112,7 @@ Finally, `m5.c` calls the defined functions as in:
|
||||
m5_exit(ints[0]);
|
||||
....
|
||||
|
||||
Therefore, the runtime "argument" that gets passed to the instruction, e.g. the delay in ticks until the exit for `m5 exit`, gets passed directly through the link:https://en.wikipedia.org/wiki/Calling_convention#ARM_(A64)[aarch64 calling convention].
|
||||
Therefore, the runtime "argument" that gets passed to the instruction, e.g. the delay in ticks until the exit for `m5 exit`, gets passed directly through the https://en.wikipedia.org/wiki/Calling_convention#ARM_(A64)[aarch64 calling convention].
|
||||
|
||||
Keep in mind that for all archs, `m5.c` does the calls with 64-bit integers:
|
||||
|
||||
@@ -11154,7 +11147,7 @@ The patches are optional: the vanilla kernel does boot. But they add some intere
|
||||
|
||||
The patches also <<notable-alternate-gem5-kernel-configs,add defconfigs>> that are known to work well with gem5.
|
||||
|
||||
E.g. for arm v4.9 there is: link:https://gem5.googlesource.com/arm/linux/+/917e007a4150d26a0aa95e4f5353ba72753669c7/arch/arm/configs/gem5_defconfig[].
|
||||
E.g. for arm v4.9 there is: https://gem5.googlesource.com/arm/linux/+/917e007a4150d26a0aa95e4f5353ba72753669c7/arch/arm/configs/gem5_defconfig[].
|
||||
|
||||
In order to use those patches and their associated configs, we recommend using <<linux-kernel-build-variants>> as:
|
||||
|
||||
@@ -11206,7 +11199,7 @@ Tested on 649d06d6758cefd080d04dc47fd6a5a26a620874 + 1.
|
||||
|
||||
We have observed that with the kernel patches, boot is 2x faster, falling from 1m40s to 50s.
|
||||
|
||||
With link:https://stackoverflow.com/questions/49797246/how-to-monitor-for-how-much-time-each-line-of-stdout-was-the-last-output-line-in/49797547#49797547[`ts`], we see that a large part of the difference is at the message:
|
||||
With https://stackoverflow.com/questions/49797246/how-to-monitor-for-how-much-time-each-line-of-stdout-was-the-last-output-line-in/49797547#49797547[`ts`], we see that a large part of the difference is at the message:
|
||||
|
||||
....
|
||||
clocksource: Switched to clocksource arch_sys_counter
|
||||
@@ -11304,7 +11297,7 @@ Each node has:
|
||||
* a list of parameters, e.g. `system.semihosting` is `Null`, which means that <<semihosting>> was turned off
|
||||
** the `type` parameter is present on every node, and it maps to a `Python` object that inherits from `SimObject`.
|
||||
+
|
||||
For example, `AtomicSimpleCPU` is defined at link:https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/src/cpu/simple/AtomicSimpleCPU.py#L45[src/cpu/simple/AtomicSimpleCPU.py].
|
||||
For example, `AtomicSimpleCPU` is defined at https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/src/cpu/simple/AtomicSimpleCPU.py#L45[src/cpu/simple/AtomicSimpleCPU.py].
|
||||
|
||||
You can also get a simplified graphical view of the tree with:
|
||||
|
||||
@@ -11342,7 +11335,7 @@ This is not normally possible with Buildroot, since normal Buildroot packages fi
|
||||
|
||||
So if you modified the Python scripts with this setup, you would still need to `./build` to copy the modified files over.
|
||||
|
||||
For gem5 specifically however, we have hacked up the build so that we `cd` into the `submodules/gem5` tree, and then do an link:https://stackoverflow.com/questions/54343515/how-to-build-gem5-out-of-tree/54343516#54343516[out of tree] build to `out/common/gem5`.
|
||||
For gem5 specifically however, we have hacked up the build so that we `cd` into the `submodules/gem5` tree, and then do an https://stackoverflow.com/questions/54343515/how-to-build-gem5-out-of-tree/54343516#54343516[out of tree] build to `out/common/gem5`.
|
||||
|
||||
Another advantage of this method is that we factor out the `arm` and `aarch64` gem5 builds which are identical and large, as well as the smaller arch generic pieces.
|
||||
|
||||
@@ -11385,7 +11378,7 @@ We setup 2 big and 2 small CPUs, but `cat /proc/cpuinfo` shows 4 identical CPUs
|
||||
|
||||
TODO: why is the `--dtb` required despite `fs_bigLITTLE.py` having a DTB generation capability? Without it, nothing shows on terminal, and the simulation terminates with `simulate() limit reached @ 18446744073709551615`. The magic `vmlinux.vexpress_gem5_v1.20170616` works however without a DTB.
|
||||
|
||||
Tested on: link:https://github.com/cirosantilli/linux-kernel-module-cheat/commit/18c1c823feda65f8b54cd38e261c282eee01ed9f[18c1c823feda65f8b54cd38e261c282eee01ed9f]
|
||||
Tested on: https://github.com/cirosantilli/linux-kernel-module-cheat/commit/18c1c823feda65f8b54cd38e261c282eee01ed9f[18c1c823feda65f8b54cd38e261c282eee01ed9f]
|
||||
|
||||
=== gem5 unit tests
|
||||
|
||||
@@ -11455,12 +11448,12 @@ sudo apt-get install clang
|
||||
|
||||
=== Introduction to Buildroot
|
||||
|
||||
link:https://en.wikipedia.org/wiki/Buildroot[Buildroot] is a set of Make scripts that download and compile from source compatible versions of:
|
||||
https://en.wikipedia.org/wiki/Buildroot[Buildroot] is a set of Make scripts that download and compile from source compatible versions of:
|
||||
|
||||
* GCC
|
||||
* Linux kernel
|
||||
* C standard library: Buildroot supports several implementations, see: <<libc-choice>>
|
||||
* link:https://en.wikipedia.org/wiki/BusyBox[BusyBox]: provides the shell and basic command line utilities
|
||||
* https://en.wikipedia.org/wiki/BusyBox[BusyBox]: provides the shell and basic command line utilities
|
||||
|
||||
It therefore produces a pristine, blob-less, debuggable setup, where all moving parts are configured to work perfectly together.
|
||||
|
||||
@@ -11638,7 +11631,7 @@ Then iterate trying to do what you want and reading the manual until it works: h
|
||||
|
||||
Once you've built a package into the image, there is no easy way to remove it.
|
||||
|
||||
Documented at: link:https://github.com/buildroot/buildroot/blob/2017.08/docs/manual/rebuilding-packages.txt#L90[]
|
||||
Documented at: https://github.com/buildroot/buildroot/blob/2017.08/docs/manual/rebuilding-packages.txt#L90[]
|
||||
|
||||
Also mentioned at: https://stackoverflow.com/questions/47320800/how-to-clean-only-target-in-buildroot
|
||||
|
||||
@@ -11672,16 +11665,16 @@ Some promising ways to overcome this problem include:
|
||||
|
||||
* <<squashfs>>
|
||||
TODO benchmark: would gem5 suffer a considerable disk read performance hit due to decompressing SquashFS?
|
||||
* libguestfs: link:https://serverfault.com/questions/246835/convert-directory-to-qemu-kvm-virtual-disk-image/916697#916697[], in particular link:http://libguestfs.org/guestfish.1.html#vfs-minimum-size[`vfs-minimum-size`]
|
||||
* libguestfs: https://serverfault.com/questions/246835/convert-directory-to-qemu-kvm-virtual-disk-image/916697#916697[], in particular http://libguestfs.org/guestfish.1.html#vfs-minimum-size[`vfs-minimum-size`]
|
||||
* use methods described at: <<gem5-restore-new-script>> instead of putting builds on the root filesystem
|
||||
|
||||
Bibliography: https://stackoverflow.com/questions/49211241/is-there-a-way-to-automatically-detect-the-minimum-required-br2-target-rootfs-ex
|
||||
|
||||
==== SquashFS
|
||||
|
||||
link:https://en.wikipedia.org/wiki/SquashFS[SquashFS] creation with `mksquashfs` does not take fixed sizes, and I have successfully booted from it, but it is readonly, which is unacceptable.
|
||||
https://en.wikipedia.org/wiki/SquashFS[SquashFS] creation with `mksquashfs` does not take fixed sizes, and I have successfully booted from it, but it is readonly, which is unacceptable.
|
||||
|
||||
But then we could mount link:https://wiki.debian.org/ramfs[ramfs] on top of it with <<overlayfs>> to make it writable, but my attempts failed exactly as mentioned at <<overlayfs>>.
|
||||
But then we could mount https://wiki.debian.org/ramfs[ramfs] on top of it with <<overlayfs>> to make it writable, but my attempts failed exactly as mentioned at <<overlayfs>>.
|
||||
|
||||
This is the exact unanswered question: https://unix.stackexchange.com/questions/343484/mounting-squashfs-image-with-read-write-overlay-for-rootfs
|
||||
|
||||
@@ -11692,7 +11685,7 @@ Buildroot is not designed for large root filesystem images, and the rebuild beco
|
||||
|
||||
This is due mainly to the `pkg-generic` `GLOBAL_INSTRUMENTATION_HOOKS` sanitation which go over the entire tree doing complex operations... I no like, in particular `check_bin_arch` and `check_host_rpath`
|
||||
|
||||
We have applied link:https://github.com/cirosantilli/buildroot/commit/983fe7910a73923a4331e7d576a1e93841d53812[983fe7910a73923a4331e7d576a1e93841d53812] to our Buildroot fork which removes part of the pain by not running:
|
||||
We have applied https://github.com/cirosantilli/buildroot/commit/983fe7910a73923a4331e7d576a1e93841d53812[983fe7910a73923a4331e7d576a1e93841d53812] to our Buildroot fork which removes part of the pain by not running:
|
||||
|
||||
....
|
||||
>>> Sanitizing RPATH in target tree
|
||||
@@ -11774,8 +11767,8 @@ or try to reproduce with a minimal config, see: https://github.com/cirosantilli/
|
||||
|
||||
Buildroot supports several libc implementations, including:
|
||||
|
||||
* link:https://en.wikipedia.org/wiki/GNU_C_Library[glibc]
|
||||
* link:https://en.wikipedia.org/wiki/UClibc[uClibc]
|
||||
* https://en.wikipedia.org/wiki/GNU_C_Library[glibc]
|
||||
* https://en.wikipedia.org/wiki/UClibc[uClibc]
|
||||
|
||||
We currently use glibc, which is selected by:
|
||||
|
||||
@@ -11812,7 +11805,7 @@ This section was originally moved in here from: https://github.com/cirosantilli/
|
||||
|
||||
=== C
|
||||
|
||||
Programs under link:userland/c/[] are examples of link:https://en.wikipedia.org/wiki/ANSI_C[ANSI C] programming:
|
||||
Programs under link:userland/c/[] are examples of https://en.wikipedia.org/wiki/ANSI_C[ANSI C] programming:
|
||||
|
||||
* link:userland/c/hello.c[]
|
||||
* `main` and environment
|
||||
@@ -11868,7 +11861,7 @@ The implementation lives under `libgomp` in the GCC tree, and is documented at:
|
||||
[[cpp]]
|
||||
=== C++
|
||||
|
||||
Programs under link:userland/cpp/[] are examples of link:https://en.wikipedia.org/wiki/C%2B%2B#Standardization[ISO C++] programming.
|
||||
Programs under link:userland/cpp/[] are examples of https://en.wikipedia.org/wiki/C%2B%2B#Standardization[ISO C++] programming.
|
||||
|
||||
* link:userland/cpp/empty.cpp[]
|
||||
* link:userland/cpp/hello.cpp[]
|
||||
@@ -12394,7 +12387,7 @@ Bibliography:
|
||||
|
||||
=== GNU GAS assembler
|
||||
|
||||
link:https://en.wikipedia.org/wiki/GNU_Assembler[GNU GAS] is the default assembler used by GCC, and therefore it completely dominates in Linux.
|
||||
https://en.wikipedia.org/wiki/GNU_Assembler[GNU GAS] is the default assembler used by GCC, and therefore it completely dominates in Linux.
|
||||
|
||||
The Linux kernel in particular uses GNU GAS assembly extensively for the arch specific parts under `arch/`.
|
||||
|
||||
@@ -13083,12 +13076,12 @@ Parent section: <<simd-assembly>>
|
||||
|
||||
History:
|
||||
|
||||
* link:https://en.wikipedia.org/wiki/MMX_(instruction_set)[MMX]: MultiMedia eXtension (unofficial name). 1997. MM0-MM7 64-bit registers.
|
||||
* link:https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions[SSE]: Streaming SIMD Extensions. 1999. XMM0-XMM7 128-bit registers, XMM0-XMM15 for AMD in 64-bit mode.
|
||||
* link:https://en.wikipedia.org/wiki/SSE2[SSE2]: 2000
|
||||
* link:https://en.wikipedia.org/wiki/SSE3[SSE3]: 2004
|
||||
* link:https://en.wikipedia.org/wiki/SSE4[SSE4]: 2006
|
||||
* link:https://en.wikipedia.org/wiki/Advanced_Vector_Extensions[AVX]: Advanced Vector Extensions. 2011. YMM0–YMM15 256-bit registers in 64-bit mode. Extension of XMM.
|
||||
* https://en.wikipedia.org/wiki/MMX_(instruction_set)[MMX]: MultiMedia eXtension (unofficial name). 1997. MM0-MM7 64-bit registers.
|
||||
* https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions[SSE]: Streaming SIMD Extensions. 1999. XMM0-XMM7 128-bit registers, XMM0-XMM15 for AMD in 64-bit mode.
|
||||
* https://en.wikipedia.org/wiki/SSE2[SSE2]: 2000
|
||||
* https://en.wikipedia.org/wiki/SSE3[SSE3]: 2004
|
||||
* https://en.wikipedia.org/wiki/SSE4[SSE4]: 2006
|
||||
* https://en.wikipedia.org/wiki/Advanced_Vector_Extensions[AVX]: Advanced Vector Extensions. 2011. YMM0–YMM15 256-bit registers in 64-bit mode. Extension of XMM.
|
||||
* AVX2: 2013
|
||||
* AVX-512: 2016. 512-bit ZMM registers. Extension of YMM.
|
||||
|
||||
@@ -13158,7 +13151,7 @@ RDTSC stores its output to EDX:EAX, even in 64-bit mode, top bits are zeroed out
|
||||
|
||||
TODO: review this section, make a more controlled userland experiment with <<m5ops>> instrumentation.
|
||||
|
||||
Let's have some fun and try to correlate the gem5 <<gem5-stats-txt>> `system.cpu.numCycles` cycle count with the link:https://en.wikipedia.org/wiki/Time_Stamp_Counter[x86 RDTSC instruction] that is supposed to do the same thing:
|
||||
Let's have some fun and try to correlate the gem5 <<gem5-stats-txt>> `system.cpu.numCycles` cycle count with the https://en.wikipedia.org/wiki/Time_Stamp_Counter[x86 RDTSC instruction] that is supposed to do the same thing:
|
||||
|
||||
....
|
||||
./build-userland --static userland/arch/x86_64/inline_asm/rdtsc.S
|
||||
@@ -13277,7 +13270,7 @@ We cover here mostly ARMv7, and then treat aarch64 differentially, since much of
|
||||
|
||||
=== Introduction to the ARM architecture
|
||||
|
||||
The link:https://en.wikipedia.org/wiki/ARM_architecture[ARM architecture] has been used on the vast majority of mobile phones in the 2010s, and on a large fraction of microcontrollers.
|
||||
The https://en.wikipedia.org/wiki/ARM_architecture[ARM architecture] has been used on the vast majority of mobile phones in the 2010s, and on a large fraction of microcontrollers.
|
||||
|
||||
It competes with <<x86-userland-assembly>> because its implementations are designed for low power consumption, which is a major requirement of the cell phone market.
|
||||
|
||||
@@ -13291,7 +13284,7 @@ ARM Holdings was bought by the Japanese giant SoftBank in 2016.
|
||||
|
||||
ARMv7 is the older architecture described at: <<armarm7>>.
|
||||
|
||||
ARMv8 is the newer architecture ISA link:https://developer.arm.com/docs/den0024/latest/preface[released in 2013] and described at: <<armarm8>>. It can be in either of two states:
|
||||
ARMv8 is the newer architecture ISA https://developer.arm.com/docs/den0024/latest/preface[released in 2013] and described at: <<armarm8>>. It can be in either of two states:
|
||||
|
||||
* <<aarch32>>
|
||||
* aarch64
|
||||
@@ -13301,7 +13294,7 @@ In the lose terminology of this repository:
|
||||
* `arm` means basically AArch32
|
||||
* `aarch64` means ARMv8 AArch64
|
||||
|
||||
ARMv8 has link:https://en.wikipedia.org/wiki/ARM_architecture#ARMv8-A[had several updates] since its release:
|
||||
ARMv8 has https://en.wikipedia.org/wiki/ARM_architecture#ARMv8-A[had several updates] since its release:
|
||||
|
||||
* v8.1: 2014
|
||||
* v8.2: 2016
|
||||
@@ -13352,7 +13345,7 @@ This licensing however does have the following fairness to it: ARM Holdings inve
|
||||
Patents for very old ISAs however have expired, Amber is one implementation of those: https://en.wikipedia.org/wiki/Amber_%28processor_core%29 TODO does it have any application?
|
||||
|
||||
|
||||
Generally, it is mostly large companies that implement the CPUs themselves. For example, the link:https://en.wikipedia.org/wiki/Apple_A12[Apple A12 chip], which is used in iPhones, has verilog designs:
|
||||
Generally, it is mostly large companies that implement the CPUs themselves. For example, the https://en.wikipedia.org/wiki/Apple_A12[Apple A12 chip], which is used in iPhones, has verilog designs:
|
||||
|
||||
____
|
||||
The A12 features an Apple-designed 64-bit ARMv8.3-A six-core CPU, with two high-performance cores running at 2.49 GHz called Vortex and four energy-efficient cores called Tempest.
|
||||
@@ -13528,7 +13521,7 @@ Everything else works on register and immediates.
|
||||
|
||||
This is part of the RISC-y beauty of the ARM instruction set, unlike x86 in which several operations can read from memory, and helps to predict how to optimize for a given CPU pipeline.
|
||||
|
||||
This kind of architecture is called a link:https://en.wikipedia.org/wiki/Load/store_architecture[Load/store architecture].
|
||||
This kind of architecture is called a https://en.wikipedia.org/wiki/Load/store_architecture[Load/store architecture].
|
||||
|
||||
==== ARM LDR instruction
|
||||
|
||||
@@ -13984,7 +13977,7 @@ Like <<arm-vcvt-instruction>>, but the rounding mode is selected by the FPSCR.RM
|
||||
|
||||
Selecting rounding mode explicitly per instruction was apparently not possible in ARMv7, but was made possible in <<aarch32>> e.g. with <<armv8-aarch32-vcvta-instruction>>.
|
||||
|
||||
Rounding mode selection is exposed in the C standard since C99 through link:https://en.cppreference.com/w/c/numeric/fenv/feround[`fesetround`].
|
||||
Rounding mode selection is exposed in the C standard since C99 through https://en.cppreference.com/w/c/numeric/fenv/feround[`fesetround`].
|
||||
|
||||
TODO: is the initial rounding mode specified by the ELF standard? Could not find a reference.
|
||||
|
||||
@@ -14099,7 +14092,7 @@ There are analogous LD3 and LD4 instruction.
|
||||
|
||||
==== ARM SIMD bibliography
|
||||
|
||||
* GNU GAS tests under link:https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=tree;f=gas/testsuite/gas/aarch64;hb=00f223631fa9803b783515a2f667f86997e2cdbe[`gas/testsuite/gas/aarch64`]
|
||||
* GNU GAS tests under https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=tree;f=gas/testsuite/gas/aarch64;hb=00f223631fa9803b783515a2f667f86997e2cdbe[`gas/testsuite/gas/aarch64`]
|
||||
* https://stackoverflow.com/questions/2851421/is-there-a-good-reference-for-arm-neon-intrinsics
|
||||
* assembly optimized libraries:
|
||||
** https://github.com/projectNe10/Ne10
|
||||
@@ -14176,7 +14169,7 @@ Good getting started tutorials:
|
||||
|
||||
==== ARM official bibliography
|
||||
|
||||
The official manuals were stored in http://infocenter.arm.com but as of 2017 they started to slowly move to link:https://developer.arm.com[].
|
||||
The official manuals were stored in http://infocenter.arm.com but as of 2017 they started to slowly move to https://developer.arm.com[].
|
||||
|
||||
Each revision of a document has a "ARM DDI" unique document identifier.
|
||||
|
||||
@@ -14937,9 +14930,9 @@ shows something like:
|
||||
2 Thread 2 (CPU#1 [halted ]) lkmc_start
|
||||
....
|
||||
|
||||
To wake up CPU 1 on QEMU, we must use the Power State Coordination Interface (PSCI) which is documented at: link:https://developer.arm.com/docs/den0022/latest/arm-power-state-coordination-interface-platform-design-document[].
|
||||
To wake up CPU 1 on QEMU, we must use the Power State Coordination Interface (PSCI) which is documented at: https://developer.arm.com/docs/den0022/latest/arm-power-state-coordination-interface-platform-design-document[].
|
||||
|
||||
This interface uses HVC calls, and the calling convention is documented at "SMC CALLING CONVENTION" link:https://developer.arm.com/docs/den0028/latest[].
|
||||
This interface uses HVC calls, and the calling convention is documented at "SMC CALLING CONVENTION" https://developer.arm.com/docs/den0028/latest[].
|
||||
|
||||
If we boot the Linux kernel on QEMU and <<get-device-tree-from-a-running-kernel,dump the auto-generated device tree>>, we observe that it contains the address of the PSCI CPU_ON call:
|
||||
|
||||
@@ -15028,7 +15021,7 @@ reg = <0x0 0x8000000 0x0 0x10000 0x0 0x8010000 0x0 0x10000>;
|
||||
|
||||
which further confirms that the exception is correct: v2 has a register range at 0x8010000 while in v3 it moved to 0x80a0000 and 0x8010000 is empty.
|
||||
|
||||
The original source does not mention GICv3 anywhere, only link:https://github.com/takeharukato/sample-tsk-sw/blob/c7bbc9dce6b14660bcce8d20735f8c6ebb09396b/hal/aarch64/gic-pl390.c[pl390], which is a specific GIC model that predates the GICv2 spec I believe.
|
||||
The original source does not mention GICv3 anywhere, only https://github.com/takeharukato/sample-tsk-sw/blob/c7bbc9dce6b14660bcce8d20735f8c6ebb09396b/hal/aarch64/gic-pl390.c[pl390], which is a specific GIC model that predates the GICv2 spec I believe.
|
||||
|
||||
TODO if I hack `#define GIC_GICC_BASE (GIC_BASE + 0xa0000)`, then it goes a bit further, but the next loop never ends.
|
||||
|
||||
@@ -15390,7 +15383,7 @@ I don't know how to download files from the web on Vanilla android, the default
|
||||
|
||||
Installing with `adb install` does however work: https://stackoverflow.com/questions/7076240/install-an-apk-file-from-command-prompt
|
||||
|
||||
link:https://f-droid.org[F-Droid] installed fine like that, however it does not have permission to install apps: https://www.maketecheasier.com/install-apps-from-unknown-sources-android/
|
||||
https://f-droid.org[F-Droid] installed fine like that, however it does not have permission to install apps: https://www.maketecheasier.com/install-apps-from-unknown-sources-android/
|
||||
|
||||
And the `Settings` app crashes so I can't change it, logcat contains:
|
||||
|
||||
@@ -15582,7 +15575,7 @@ Kernel panic - not syncing: Attempted to kill the idle task!
|
||||
|
||||
==== Benchmark builds
|
||||
|
||||
The build times are calculated after doing `./configure` and link:https://buildroot.org/downloads/manual/manual.html#_offline_builds[`make source`], which downloads the sources, and basically benchmarks the <<benchmark-internets,Internet>>.
|
||||
The build times are calculated after doing `./configure` and https://buildroot.org/downloads/manual/manual.html#_offline_builds[`make source`], which downloads the sources, and basically benchmarks the <<benchmark-internets,Internet>>.
|
||||
|
||||
Sample build time at 2c12b21b304178a81c9912817b782ead0286d282: 28 minutes, 15 with full ccache hits. Breakdown: 19% GCC, 13% Linux kernel, 7% uclibc, 6% host-python, 5% host-qemu, 5% host-gdb, 2% host-binutils
|
||||
|
||||
@@ -15701,7 +15694,7 @@ Tested at: d4b3e064adeeace3c3e7d106801f95c14637c12f + 1.
|
||||
|
||||
==== P51
|
||||
|
||||
Lenovo ThinkPad link:https://www3.lenovo.com/gb/en/laptops/thinkpad/p-series/P51/p/22TP2WPWP51[P51 laptop]:
|
||||
Lenovo ThinkPad https://www3.lenovo.com/gb/en/laptops/thinkpad/p-series/P51/p/22TP2WPWP51[P51 laptop]:
|
||||
|
||||
* 2500 USD in 2018 (high end)
|
||||
* Intel Core i7-7820HQ Processor (8MB Cache, up to 3.90GHz) (4 cores 8 threads)
|
||||
@@ -15725,7 +15718,7 @@ Google M-lab speed test: 36.4Mbps
|
||||
|
||||
gem5:
|
||||
|
||||
* link:https://www.mail-archive.com/gem5-users@gem5.org/msg15262.html[] which parts of the gem5 code make it slow
|
||||
* https://www.mail-archive.com/gem5-users@gem5.org/msg15262.html[] which parts of the gem5 code make it slow
|
||||
* what are the minimum system requirements:
|
||||
** https://stackoverflow.com/questions/47997565/gem5-system-requirements-for-decent-performance/48941793#48941793
|
||||
** https://github.com/gem5/gem5/issues/25
|
||||
@@ -15740,7 +15733,7 @@ Some setups of this repository are very portable, notably setups under <<userlan
|
||||
|
||||
The least portable setups are those that require Buildroot and crosstool-NG.
|
||||
|
||||
We tend to test this repo the most on the latest Ubuntu and on the latest link:https://askubuntu.com/questions/16366/whats-the-difference-between-a-long-term-support-release-and-a-normal-release[Ubuntu LTS].
|
||||
We tend to test this repo the most on the latest Ubuntu and on the latest https://askubuntu.com/questions/16366/whats-the-difference-between-a-long-term-support-release-and-a-normal-release[Ubuntu LTS].
|
||||
|
||||
For other Linux distros, everything will likely also just work if you install the analogous required packages for your distro.
|
||||
|
||||
@@ -15819,7 +15812,7 @@ You can also choose a different configuration file explicitly with:
|
||||
|
||||
Almost all options names are automatically deduced from their command line `--help` name: just replace `-` with `_`.
|
||||
|
||||
More precisely, we use the `dest=` value of Python's link:https://docs.python.org/3/library/argparse.html[argparse module].
|
||||
More precisely, we use the `dest=` value of Python's https://docs.python.org/3/library/argparse.html[argparse module].
|
||||
|
||||
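As a rough illustration of the `dest=` rule above, here is a minimal sketch of how argparse derives the name. `--gem5-build-id` is an option mentioned elsewhere in this README, while `--example-flag` and its `example_dest` are hypothetical names used only to show an explicit `dest=`:

```python
import argparse

parser = argparse.ArgumentParser()
# Without an explicit dest=, argparse strips the leading dashes
# and replaces the remaining '-' with '_'.
parser.add_argument('--gem5-build-id', default='default')
# Hypothetical option: an explicit dest= overrides the deduced name.
parser.add_argument('--example-flag', dest='example_dest', action='store_true')

args = parser.parse_args(['--gem5-build-id', 'my-new-feature'])
# These are the names a configuration file would have to use.
config_keys = sorted(vars(args))
```

In this sketch `config_keys` contains `gem5_build_id` for the first option and `example_dest` for the second, so the `--help` name is only a reliable guide when no explicit `dest=` was given.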
To get a list of all global options that you can use, try:
|
||||
|
||||
@@ -15941,7 +15934,7 @@ Source: link:publish-gh-pages[]
|
||||
I'm going this way for now because:
|
||||
|
||||
* the Jekyll Asciidoctor plugin is not enabled by default on GitHub: https://webapps.stackexchange.com/questions/114606/can-github-pages-render-asciidoc
|
||||
* link:https://stackoverflow.com/questions/1797074/local-executing-hook-after-a-git-push[post-push hooks don't exist]
|
||||
* https://stackoverflow.com/questions/1797074/local-executing-hook-after-a-git-push[post-push hooks don't exist]
|
||||
* I'm too lazy to set up a proper Travis CI push
|
||||
* I'm the only contributor essentially, so no problems with pull requests
|
||||
|
||||
@@ -15980,7 +15973,7 @@ ls "$(./getvar buildroot_build_dir)"
|
||||
|
||||
Note that host tools like QEMU and gem5 store all archs in a single directory to factor out build objects, so cleaning one arch will clean all of them.
|
||||
|
||||
To nuke only one Buildroot package, we can use the link:https://buildroot.org/downloads/manual/manual.html#pkg-build-steps[`-dirclean`] Buildroot target:
|
||||
To nuke only one Buildroot package, we can use the https://buildroot.org/downloads/manual/manual.html#pkg-build-steps[`-dirclean`] Buildroot target:
|
||||
|
||||
....
|
||||
./build-buildroot --no-all -- <package-name>-dirclean
|
||||
@@ -16000,7 +15993,7 @@ ls "$(./getvar buildroot_build_build_dir)"
|
||||
|
||||
=== ccache
|
||||
|
||||
link:https://en.wikipedia.org/wiki/Ccache[ccache] <<benchmark-builds,might>> save you a lot of re-build when you decide to <<clean-the-build>> or create a new <<build-variants,build variant>>.
|
||||
https://en.wikipedia.org/wiki/Ccache[ccache] <<benchmark-builds,might>> save you a lot of re-build when you decide to <<clean-the-build>> or create a new <<build-variants,build variant>>.
|
||||
|
||||
We have ccache enabled for everything we build by default.
|
||||
|
||||
@@ -16032,7 +16025,7 @@ watch -n1 'ccache -s'
|
||||
|
||||
and then watch the miss or hit counts go up.
|
||||
|
||||
We have link:https://buildroot.org/downloads/manual/manual.html#ccache[enabled ccached] builds by default.
|
||||
We have https://buildroot.org/downloads/manual/manual.html#ccache[enabled ccached] builds by default.
|
||||
|
||||
`BR2_CCACHE_USE_BASEDIR=n` is used for Buildroot, which means that:
|
||||
|
||||
@@ -16051,7 +16044,7 @@ error while converting qcow2: Failed to get "write" lock
|
||||
|
||||
When doing long simulations sweeping across multiple system parameters, it becomes fundamental to do multiple simulations in parallel.
|
||||
|
||||
This is especially true for gem5, which runs much slower than QEMU, and cannot use multiple host cores to speed up the simulation: link:https://github.com/cirosantilli-work/gem5-issues/issues/15[], so the only way to parallelize is to run multiple instances in parallel.
|
||||
This is especially true for gem5, which runs much slower than QEMU, and cannot use multiple host cores to speed up the simulation: https://github.com/cirosantilli-work/gem5-issues/issues/15[], so the only way to parallelize is to run multiple instances in parallel.
|
||||
|
||||
This also has a good synergy with <<build-variants>>.
|
||||
|
||||
@@ -16237,7 +16230,7 @@ cd -
|
||||
|
||||
`--gem5-worktree <worktree-id>` automatically creates:
|
||||
|
||||
* a link:https://git-scm.com/docs/git-worktree[Git worktree] of gem5 if one didn't exist yet for `<worktree-id>`
|
||||
* a https://git-scm.com/docs/git-worktree[Git worktree] of gem5 if one didn't exist yet for `<worktree-id>`
|
||||
* a separate build directory, exactly like `--gem5-build-id my-new-feature` would
|
||||
|
||||
We promise that the scripts will never touch that worktree again once it has been created: it is now up to you to manage the code manually.
|
||||
@@ -16612,7 +16605,7 @@ Most userland programs that don't rely on kernel modules can also be tested in u
|
||||
|
||||
===== GDB tests
|
||||
|
||||
We have some link:https://github.com/pexpect/pexpect[pexpect] automated tests for GDB for both userland and baremetal programs!
|
||||
We have some https://github.com/pexpect/pexpect[pexpect] automated tests for GDB for both userland and baremetal programs!
|
||||
|
||||
Run the userland tests:
|
||||
|
||||
@@ -16695,7 +16688,7 @@ This magic output string is notably generated by:
|
||||
* link:rootfs_overlay/lkmc/test_fail.sh[], which is used by <<test-userland-in-full-system>>
|
||||
* the `exit()` baremetal function when `status != 1`.
|
||||
+
|
||||
Unfortunately the only way we found to set this up was with `on_exit`: link:https://github.com/cirosantilli/linux-kernel-module-cheat/issues/59[].
|
||||
Unfortunately the only way we found to set this up was with `on_exit`: https://github.com/cirosantilli/linux-kernel-module-cheat/issues/59[].
|
||||
+
|
||||
Trying to patch `_exit` directly fails since at that point some de-initialization has already happened which prevents the print.
|
||||
+
|
||||
|
||||
2
build
@@ -553,7 +553,7 @@ Which components to build. Default: qemu-buildroot
|
||||
else:
|
||||
git_version_tuple = tuple(
|
||||
int(x) for x in self.sh.check_output(['git', '--version']) \
|
||||
.split(' ')[-1].split('.')
|
||||
.decode().split(' ')[-1].split('.')
|
||||
)
|
||||
if git_version_tuple >= (2, 9, 0):
|
||||
# https://stackoverflow.com/questions/26957237/how-to-make-git-clone-faster-with-multiple-threads/52327638#52327638
|
||||
|
||||
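The version check in the hunk above depends on turning the byte output of `git --version` into a tuple of integers, which Python then compares element by element. A minimal sketch of that idea follows, assuming a hypothetical helper `parse_git_version` and a hard-coded sample output instead of a real `git` invocation. It uses a regular expression rather than splitting on spaces, so that outputs with a trailing vendor suffix are also handled:

```python
import re


def parse_git_version(version_output: bytes) -> tuple:
    """Parse the byte output of `git --version` into a tuple of ints."""
    # Command output arrives as bytes, so decode before string handling.
    text = version_output.decode()
    # Take the first dotted run of digits, e.g. "2.17.1".
    match = re.search(r'(\d+(?:\.\d+)+)', text)
    if match is None:
        raise ValueError('unrecognized git version output: ' + repr(text))
    return tuple(int(x) for x in match.group(1).split('.'))


# Hypothetical sample output, not captured from a real run.
sample_output = b'git version 2.17.1\n'
git_version_tuple = parse_git_version(sample_output)
# Tuples compare lexicographically, so (2, 17, 1) >= (2, 9, 0) holds,
# whereas a plain string comparison of '2.17.1' and '2.9.0' would not.
supports_feature = git_version_tuple >= (2, 9, 0)
```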
@@ -50,7 +50,7 @@ from the README to example sources to GitHub rather than locally.
|
||||
for link in self.sh.check_output([
|
||||
os.path.join(asciidoctor_dir, 'extract-link-targets'),
|
||||
self.env['readme']
|
||||
]).splitlines():
|
||||
]).decode().splitlines():
|
||||
if not external_link_re.match(link):
|
||||
if not os.path.lexists(os.path.join(self.env['root_dir'], link)):
|
||||
self.log_error('broken link to local file: ' + link)
|
||||
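The loop above decodes the extractor output, skips external URLs and reports local targets that do not exist. A self-contained sketch of the same check is given below. The `external_link_re` pattern is an assumption, since the real definition is not part of this diff, and `find_broken_local_links` is a hypothetical name:

```python
import os
import re
import tempfile

# Assumed pattern: the real external_link_re is not shown in this diff.
external_link_re = re.compile(r'^https?://')


def find_broken_local_links(links, root_dir):
    """Return the links that are local paths missing under root_dir."""
    broken = []
    for link in links:
        if not external_link_re.match(link):
            # lexists() also accepts dangling symlinks as present.
            if not os.path.lexists(os.path.join(root_dir, link)):
                broken.append(link)
    return broken


with tempfile.TemporaryDirectory() as root_dir:
    # Create one file so that exactly one local link resolves.
    open(os.path.join(root_dir, 'README.adoc'), 'w').close()
    # Hypothetical extractor output, as bytes like check_output returns.
    extracted = b'README.adoc\nhttps://example.com\nmissing.c\n'
    broken_links = find_broken_local_links(
        extracted.decode().splitlines(), root_dir)
```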
@@ -63,7 +63,7 @@ from the README to example sources to GitHub rather than locally.
|
||||
for header_id in self.sh.check_output([
|
||||
os.path.join(asciidoctor_dir, 'extract-header-ids'),
|
||||
self.env['readme']
|
||||
]).splitlines():
|
||||
]).decode().splitlines():
|
||||
header_ids.add(header_id)
|
||||
for grep_line in self.sh.check_output(
|
||||
[
|
||||
@@ -74,7 +74,7 @@ from the README to example sources to GitHub rather than locally.
|
||||
LF
|
||||
],
|
||||
cwd=self.env['root_dir']
|
||||
).splitlines():
|
||||
).decode().splitlines():
|
||||
url_index = grep_line.index(self.env['homepage_url'])
|
||||
hash_start_index = url_index + len(self.env['homepage_url'])
|
||||
if len(grep_line) > hash_start_index:
|
||||
|
||||
@@ -1090,7 +1090,7 @@ lunch aosp_{}-eng
|
||||
self.get_toolchain_tool('readelf'),
|
||||
'-h',
|
||||
elf_file_path
|
||||
])
|
||||
]).decode()
|
||||
for line in readelf_header.split('\n'):
|
||||
split = line.split()
|
||||
if line.startswith(' Entry point address:'):
|
||||
|
||||
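For reference, the entry point lookup in the hunk above can be sketched on its own as follows. The helper name `parse_entry_point` and the sample header are hypothetical, and the sample only imitates the general shape of `readelf -h` output. The input is decoded exactly once, because calling `.decode()` on an already decoded `str` raises `AttributeError` in Python 3:

```python
def parse_entry_point(readelf_header: bytes) -> int:
    """Return the ELF entry point found in `readelf -h` style output."""
    # Decode once: the result is a str, which has no .decode() method.
    for line in readelf_header.decode().split('\n'):
        fields = line.split()
        if line.strip().startswith('Entry point address:'):
            # Base 0 lets int() honour the 0x prefix.
            return int(fields[-1], 0)
    raise ValueError('no entry point line found')


# Hypothetical sample, not the output of a real readelf run.
sample_header = (
    b'ELF Header:\n'
    b'  Class:                             ELF64\n'
    b'  Entry point address:               0x400078\n'
)
entry_address = parse_entry_point(sample_header)
```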
@@ -23,7 +23,7 @@ https://cirosantilli.com/linux-kernel-module-cheat#release-upload
|
||||
'describe',
|
||||
'--exact-match',
|
||||
'--tags'
|
||||
]).rstrip()
|
||||
]).decode().rstrip()
|
||||
upload_path = self.env['release_zip_file']
|
||||
|
||||
# Check the release already exists.
|
||||
|
||||
14
run
@@ -415,8 +415,18 @@ Extra options to append at the end of the emulator command line.
|
||||
self.env['baremetal'] is None and
|
||||
self.env['userland'] is None
|
||||
):
|
||||
# This is to run gem5 from a prebuilt download.
|
||||
self.sh.run_cmd([self.env['extract_vmlinux'], self.env['linux_image']])
|
||||
# This is an attempt to run gem5 from a prebuilt download
|
||||
# but it is not working:
|
||||
# https://github.com/cirosantilli/linux-kernel-module-cheat/issues/79
|
||||
self.sh.check_output(
|
||||
[
|
||||
self.env['extract_vmlinux'],
|
||||
self.env['linux_image']
|
||||
],
|
||||
out_file=self.env['image'],
|
||||
show_cmd=True,
|
||||
show_stdout=False
|
||||
)
|
||||
else:
|
||||
raise_image_not_found()
|
||||
else:
|
||||
|
||||
@@ -78,13 +78,20 @@ class ShellHelpers:
|
||||
return base64.b64decode(string.encode()).decode()
|
||||
|
||||
def check_output(self, *args, **kwargs):
|
||||
'''
|
||||
Analogous to subprocess.check_output: get the stdout / stderr
|
||||
of a program back as a byte array.
|
||||
'''
|
||||
out_str = []
|
||||
actual_kwargs = {
|
||||
'show_stdout': False,
|
||||
'show_cmd': False
|
||||
}
|
||||
actual_kwargs.update(kwargs)
|
||||
self.run_cmd(
|
||||
*args,
|
||||
out_str=out_str,
|
||||
show_stdout=False,
|
||||
show_cmd=False,
|
||||
**kwargs
|
||||
**actual_kwargs
|
||||
)
|
||||
return out_str[0]
|
||||
|
||||
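The `actual_kwargs` change above turns hard-coded flags into overridable defaults. The following sketch, with a hypothetical `run_cmd` stand-in that only reports the flags it received, shows why this matters: when the flags are passed explicitly next to `**kwargs`, a caller that sets the same flag gets a `TypeError`.

```python
def run_cmd(cmd, show_stdout=True, show_cmd=True):
    """Hypothetical stand-in for the real runner: reports its flags."""
    return {'cmd': cmd, 'show_stdout': show_stdout, 'show_cmd': show_cmd}


def check_output_fixed_flags(cmd, **kwargs):
    # Old shape: flags are hard-coded, so a caller that also passes
    # show_cmd=... supplies the same keyword twice.
    return run_cmd(cmd, show_stdout=False, show_cmd=False, **kwargs)


def check_output_default_flags(cmd, **kwargs):
    # New shape: start from quiet defaults, then let the caller override.
    actual_kwargs = {'show_stdout': False, 'show_cmd': False}
    actual_kwargs.update(kwargs)
    return run_cmd(cmd, **actual_kwargs)


quiet = check_output_default_flags(['true'])
verbose = check_output_default_flags(['true'], show_cmd=True)

try:
    check_output_fixed_flags(['true'], show_cmd=True)
    fixed_flags_error = None
except TypeError as error:
    fixed_flags_error = str(error)
```

This is what allows the `run` script hunk earlier in this commit to call `check_output` with `show_cmd=True`.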
@@ -380,7 +387,7 @@ class ShellHelpers:
|
||||
if out_file is not None:
|
||||
logfile.close()
|
||||
if out_str is not None:
|
||||
out_str.append((b''.join(logfile_str)).decode())
|
||||
out_str.append((b''.join(logfile_str)))
|
||||
if threading.current_thread() == threading.main_thread():
|
||||
signal.signal(signal.SIGINT, sigint_old)
|
||||
#signal.signal(signal.SIGPIPE, sigpipe_old)
|
||||
@@ -392,7 +399,7 @@ class ShellHelpers:
|
||||
return returncode
|
||||
else:
|
||||
if not out_str is None:
|
||||
out_str.append('')
|
||||
out_str.append(b'')
|
||||
return 0
|
||||
|
||||
def shlex_split(self, string):
|
||||
|
||||