git clone https://github.com/cirosantilli/linux-kernel-module-cheat
cd linux-kernel-module-cheat
+./setup
./build --download-dependencies qemu-buildroot
./run
diff --git a/index.html b/index.html
index 4a7111c..3204859 100644
--- a/index.html
+++ b/index.html
@@ -4,7 +4,7 @@
-
+
getcpu system call and the sched_getaffinity glibc wrapper
perf_event_open system call
git clone https://github.com/cirosantilli/linux-kernel-module-cheat
cd linux-kernel-module-cheat
+./setup
./build --download-dependencies qemu-buildroot
./run
./build --arch aarch64 --download-dependencies qemu-buildroot
+./setup
+./build --arch aarch64 --download-dependencies qemu-buildroot
./run --arch aarch64

./build --download-dependencies gem5-buildroot
+./setup
+./build --download-dependencies gem5-buildroot
./run --emulator gem5

./build --download-dependencies qemu
+./setup
+./build --download-dependencies qemu
./run

cd linux-kernel-module-cheat
+./setup
./build --download-dependencies userland-host

./build --arch aarch64 --download-dependencies qemu-baremetal
+./setup
+./build --arch aarch64 --download-dependencies qemu-baremetal
./run --arch aarch64 --baremetal baremetal/arch/aarch64/dump_regs.c

./build --download-dependencies gem5-baremetal
+./setup
+./build --download-dependencies gem5-baremetal
./run --arch aarch64 --baremetal userland/c/hello.c --emulator gem5
./build --download-dependencies docs
+./setup
+./build --download-dependencies docs

./gdbserver.sh ./c/print_argv.out asdf qwer
+./gdbserver.sh ./c/command_line_arguments.out asdf qwer

./run --gdbserver --userland "$(./getvar userland_build_dir)/c/print_argv.out"
+./run --gdbserver --userland "$(./getvar userland_build_dir)/c/command_line_arguments.out"
The final init that actually got selected is shown on Linux v5.9.2 by a line of the form:
+<6>[ 0.309984] Run /sbin/init as init process
+at the very end of the boot logs.
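As a rough illustration of how that line could be picked out of a captured boot log, here is a minimal Python sketch. The function name, the regular expression and the first line of the sample log are assumptions made for this example; only the `Run /sbin/init as init process` line comes from the text above.

```python
import re

# Matches kernel log lines such as:
#   <6>[    0.309984] Run /sbin/init as init process
INIT_LINE = re.compile(r"^<\d>\[\s*\d+\.\d+\] Run (\S+) as init process$")


def find_selected_init(boot_log):
    """Return the path of the last "Run ... as init process" line, or None.

    Hypothetical helper: the last match is used because, if an earlier
    candidate fails to execute, the kernel logs another attempt after it.
    """
    selected = None
    for line in boot_log.splitlines():
        match = INIT_LINE.match(line.strip())
        if match:
            selected = match.group(1)
    return selected


# The first line is a made-up filler line, the second is the one quoted above.
sample_log = """<6>[    0.300000] some earlier boot message
<6>[    0.309984] Run /sbin/init as init process"""

print(find_selected_init(sample_log))
```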
+sudo apt-get install git
git clone https://github.com/cirosantilli/linux-kernel-module-cheat
cd linux-kernel-module-cheat
-sudo ./setup -y
+./setup -y
/path/to/linux-kernel-module-cheat/out/userland/default/x86_64/c/print_argv.out +/path/to/linux-kernel-module-cheat/out/userland/default/x86_64/c/command_line_arguments.out asdf qw er
./build-userland
To rebuild just QEMU userland if you hack it, use:
+./build-qemu --mode userland
The:
+--mode userland
is needed because QEMU has two separate executables:
+qemu-x86_64 for userland
qemu-system-x86_64 for full system
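The naming scheme of the two executables can be sketched as follows. This is only an illustration: `qemu_executable` and its `mode` values are hypothetical and are not taken from the LKMC scripts.

```python
def qemu_executable(arch, mode):
    """Return the QEMU binary name for an architecture and a mode.

    Hypothetical helper: "userland" selects the user mode emulator and
    "system" selects the full system emulator.
    """
    if mode == "userland":
        return "qemu-" + arch
    if mode == "system":
        return "qemu-system-" + arch
    raise ValueError("unknown mode: " + mode)


print(qemu_executable("x86_64", "userland"))
print(qemu_executable("x86_64", "system"))
```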
Then you can umount and re-mount on guest without reboot.
We don’t support this yet, but it should not be too hard to hack it up, maybe by hooking into rootfs-post-build-script.
+To build the secondary disk image run build-disk2:
+./build-disk2
This was not possible from gem5 fs.py as of 60600f09c25255b3c8f72da7fb49100e2682093a: https://stackoverflow.com/questions/50862906/how-to-attach-multiple-disk-images-in-a-simulation-with-gem5-fs-py/51037661#51037661
This will put the entire out_rootfs_overlay_dir into a squashfs filesystem.
Then, if that filesystem is present, ./run will automatically pass it as the second disk on the command line.
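The "pass it only if present" behaviour could look roughly like the following sketch. It is not the actual `./run` code: the function name and the exact `-drive` option string are assumptions, chosen so that the image would show up as a second virtio disk in the guest.

```python
import os


def second_disk_args(squashfs_path):
    """Return extra QEMU CLI arguments for the secondary disk, if it exists.

    Hypothetical helper: returns an empty list when the image has not been
    built, so the caller can append the result unconditionally.
    """
    if os.path.exists(squashfs_path):
        # A squashfs image is passed as a plain raw disk image.
        return ["-drive", "file={},format=raw,if=virtio".format(squashfs_path)]
    return []


# Made-up path for illustration only.
print(second_disk_args("/nonexistent/lkmc-disk2.squashfs"))
```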
For example, from inside QEMU, you can mount that disk with:
+mkdir /mnt/vdb
+mount /dev/vdb /mnt/vdb
+/mnt/vdb/lkmc/c/hello.out
To update the secondary disk while a simulation is running to avoid rebooting, first unmount in the guest:
+umount /mnt/vdb
and then on the host:
+# Edit the file.
+vim userland/c/hello.c
+./build-userland
+./build-disk2
and now you can re-run the updated version of the executable on the guest after remounting it.
+gem5 fs.py support for multiple disks is discussed at: https://stackoverflow.com/questions/50862906/how-to-attach-multiple-disk-images-in-a-simulation-with-gem5-fs-py/51037661#51037661
The 9p protocol allows the guest to mount a host directory.
This is possible on aarch64 as shown at: https://gem5-review.googlesource.com/c/public/gem5/+/22831, and it is just a matter of exposing it to X86 for those that want it.
Vermagic is a magic string present in the kernel and on MODULE_INFO of kernel modules. It is used to verify that the kernel module was compiled against a compatible kernel version and relevant configuration:
+As of kernel v5.8, modules can no longer use VERMAGIC_STRING, as per: https://github.com/cirosantilli/linux/commit/51161bfc66a68d21f13d15a689b3ea7980457790. So instead we just showcase init_utsname.
Sample insmod output as of LKMC fa8c2ee521ea83a74a2300e7a3be9f9ab86e2cb6 + 1 aarch64:
+<6>[ 25.180697] sysname = Linux
+<6>[ 25.180697] nodename = buildroot
+<6>[ 25.180697] release = 5.9.2
+<6>[ 25.180697] version = #1 SMP Thu Jan 1 00:00:00 UTC 1970
+<6>[ 25.180697] machine = aarch64
+<6>[ 25.180697] domainname = (none)
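For comparison, most of the same fields can be read from userland without a kernel module. The sketch below is not part of LKMC; it uses Python's `os.uname()`, which wraps the `uname` system call and does not expose `domainname`.

```python
import os


def utsname_fields():
    """Return the userland view of the utsname fields as an ordered dict."""
    info = os.uname()
    return {
        "sysname": info.sysname,
        "nodename": info.nodename,
        "release": info.release,
        "version": info.version,
        "machine": info.machine,
    }


# Print in the same "name = value" layout as the insmod output.
for name, value in utsname_fields().items():
    print("{} = {}".format(name, value))
```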
Vermagic is a magic string present in the kernel and previously visible in MODULE_INFO on kernel modules. It is used to verify that the kernel module was compiled against a compatible kernel version and relevant configuration:
Source: kernel_modules/vermagic.c
-If we artificially create a mismatch with MODULE_INFO(vermagic, the insmod fails with:
See also: https://github.com/robclark/kmscube/issues/12 and https://stackoverflow.com/questions/26920835/can-egl-application-run-in-console-mode/26921287#26921287
+See also:
+Tested on: 2903771275372ccfecc2b025edbb0d04c4016930
@@ -17435,6 +17593,9 @@ Format specific information:
https://stackoverflow.com/questions/64539528/qemu-pci-dma-read-and-pci-dma-write-does-not-work
https://stackoverflow.com/questions/64842929/general-protection-error-while-tring-to-perform-ioctl
+and then in the guest, take a checkpoint and exit:
+and then in the guest, take a checkpoint and exit with:
so you can just copy paste the command.
Building individual tests is possible with:
+Building individual tests is possible with --unit-test (singular, no 's'):
Uses the same data source as util/o3-pipeview.py.
gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-gain shows how the text-based visualization can get problematic due to stalls requiring wraparounds.
+gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall_gain shows how the text-based visualization can get problematic due to stalls requiring wraparounds.
Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall but now with an LDR stall: userland/arch/aarch64/freestanding/linux/stall-gain.S.
+Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall but now with an LDR stall: userland/arch/aarch64/freestanding/linux/stall_gain.S.
So in this case we see that there were actual potential gains, since the movz x11 started running immediately. We just stopped at movz x20 because a new ifetch was needed.
Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall-gain but now with some dependencies after the LDR: userland/arch/aarch64/freestanding/linux/stall-hazard4.S.
+Like gem5 event queue DerivO3CPU syscall emulation freestanding example analysis: stall_gain but now with some dependencies after the LDR: userland/arch/aarch64/freestanding/linux/stall_hazard4.S.
So in this case the ic of dependencies like add x6, x5, #1 have to wait until the LDR is finished:
https://bitbucket.org/gensim/gensim
+Source at: https://github.com/gensim-project/gensim previously at: https://bitbucket.org/gensim/gensim
MIT licensed Binary translation simulator, so a bit like an MIT QEMU.
@@ -31362,46 +31523,6 @@ echo 1 > /proc/sys/vm/overcommit_memory
classes
-constructor
-userland/cpp/initializer_list_constructor.cpp: documents stuff like std::vector<int> v{0, 1}; and std::initializer_list
userland/cpp/most_vexing_parse.cpp: the most vexing parse is a famous constructor vs function declaration syntax gotcha!
- -virtual and polymorphism
iostream
userland/cpp/initializer_list_constructor.cpp: documents stuff like std::vector<int> v{0, 1}; and std::initializer_list
userland/cpp/most_vexing_parse.cpp: the most vexing parse is a famous constructor vs function declaration syntax gotcha!
+ +virtual and polymorphism
Output Ubuntu 20.04 GCC 9.3:
+constructor?
+constructor
+
+copy?
+constructor
+copy
+
+copy assignment?
+constructor
+copy assignment
+constructor
+copy
+move assignment
+destructor
+
+move?
+constructor
+
+move?
+constructor
+constructor
+move assignment
+destructor
+
+a bunch of destructors?
+destructor
+destructor
+destructor
+destructor
+destructor
Like for C, you have to pay for the standards… insane. So we just use the closest free drafts instead.
+OMG this is hell, understand when primitive variables are initialized or not:
The smallest data race we managed to come up with as of LKMC 7c01b29f1ee7da878c7cc9cb4565f3f3cf516a92 and gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1 was with userland/c/atomic.c (see also C multithreading):
Like for C, you have to pay for the standards… insane. So we just use the closest free drafts instead.
-decltypedecltypehttps://stackoverflow.com/questions/37031805/preparation-for-stditerator-being-deprecated/38103394
+userland/cpp/custom_iterator.cpp: there is no way to easily define a nice custom iterator, you just have to wrap existing iterators and add a gazillion wrapper methods:
+Under: userland/libs directory.
+On Ubuntu 20.04, the package:
+sudo apt install googletest
does not contain prebuilts, and this is intentional, which is incomprehensible:
+so you might as well just git clone and build the damned thing yourself:
git submodule update --init submodules/googletest
+cd submodules/googletest
+mkdir build
+cd build
+cmake ..
+make -j`nproc`
+cd ../../../userland/libs/googletest
+./build
userland/libs/googletest/main.cpp
+Binary format to store data. TODO vs databases, notably SQLite: https://datascience.stackexchange.com/questions/262/hierarchical-data-format-what-are-the-advantages-compared-to-alternative-format
+Examples:
+gem5 can dump statistics as HDF5: gem5 HDF5 statistics
+Examples:
Host installation shown at: https://askubuntu.com/questions/594656/how-to-install-the-latest-versions-of-nodejs-and-npm/971612#971612
Overviews:
+Skip breaking on the first line every time: https://stackoverflow.com/questions/41153179/why-is-the-node-debugger-break-on-first-line-a-thing
+Illustrates how to add extra non-code data files to an NPM package, and then use those files at runtime.
No OpenJDK package as of 2018.08: https://stackoverflow.com/questions/28874150/buildroot-with-jamvm-2-0-for-java-8/59290927#59290927 partly because their build system is shit like the rest of the project’s setup.
https://en.wikipedia.org/wiki/Boost_(C%2B%2B_libraries)
+As an exception, if you first cd directly into one of the directories and do a native host build, e.g.:
sudo apt install libeigen3-dev
+cd userland/libs/eigen
+./build
then that library will be automatically enabled.
+See also:
Binary format to store data. TODO vs databases, notably SQLite: https://datascience.stackexchange.com/questions/262/hierarchical-data-format-what-are-the-advantages-compared-to-alternative-format
-Examples:
-gem5 can dump statistics as HDF5: gem5 HDF5 statistics
-Due to the way that gem5 syscall emulation multithreading works, however, the output is more deterministic in that case, see that section for further details.
perf_event_open system call
On ARM, perf_event_open uses the ARM PMU. The mapping between kernel events and ARM PMU events can be found at: https://github.com/cirosantilli/linux/blob/v5.9/arch/arm64/kernel/perf_event.c
Bibliography:
+man perf_event_open
instruction counts: https://stackoverflow.com/questions/13313510/quick-way-to-count-number-of-instructions-executed-in-a-c-program/64863392#64863392
+cycle counts:
+There is also the RDPID instruction that reads just the processor ID, but it appears to be too new for QEMU 4.0.0 or the 2017 Lenovo ThinkPad P51, as it fails with SIGILL on both.
TODO We didn’t manage to find a working ARM analogue to x86 RDTSC instruction: kernel_modules/pmccntr.c is oopsing, and even if it weren’t, it likely won’t give the cycle count since boot since it needs to be activated before it starts counting anything:
+Bibliography:
The PMU (Performance Monitor Unit) is a unit in the ARM CPU that counts performance events of interest. These can be used to benchmark, and sometimes debug, code running on ARM CPUs.
+It is documented at ARMv8 architecture reference manual db Chapter D7 "The Performance Monitors Extension".
+The Linux kernel exposes some (all?) of those events through the arch-agnostic perf_event_open system call.
Exposing the PMU to Linux v5.9.2 requires a DTB entry of type:
+pmu {
+ compatible = "arm,armv8-pmuv3";
+ interrupts = <0x01 0x04 0xf04>;
+};
+and if successful, a boot message shows:
+<6>[ 0.044391] hw perfevents: enabled with armv8_pmuv3 PMU driver, 32 counters available
The PMU is exposed through ARM system register instructions, with registers that start with the prefix PM*.
<6>[ 0.044391] hw perfevents: enabled with armv8_pmuv3 PMU driver, 32 counters available
+ARMv8 architecture reference manual db D7.11.3 "Common event numbers" gives the available standardized events. Address space is also reserved for vendor extensions. For example, from it we see that the instruction count is documented at:
+0x0008, INST_RETIRED, Instruction architecturally executed
+The counter increments for every architecturally executed instruction.
+
where "architecturally executed" is a reference to the possibility of Out-of-order execution in the implementation, which leads to some instructions being executed speculatively without having any side effects in the end.
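To relate such event numbers to the perf tooling, the sketch below formats an event number in the `rNNN` hexadecimal raw event syntax accepted by `perf stat -e`. The table and the helper are hypothetical illustrations: only the INST_RETIRED entry comes from the text above, and the CPU_CYCLES number is recalled from the same manual table, so check it against the manual before relying on it.

```python
# Hypothetical lookup table of ARMv8 PMU common event numbers.
ARM_PMU_COMMON_EVENTS = {
    "INST_RETIRED": 0x0008,  # quoted in the text above
    "CPU_CYCLES": 0x0011,    # assumption: recalled from the manual, verify
}


def perf_raw_event(event_number):
    """Format an event number as a perf raw event descriptor (rNNN, hex)."""
    return "r{:x}".format(event_number)


for name, number in ARM_PMU_COMMON_EVENTS.items():
    print(name, perf_raw_event(number))
```

A descriptor produced this way would be used as in `perf stat -e r8 <command>`, assuming the guest has the perf tool and a PMU exposed as described above.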
+TODO We didn’t manage to find a working ARM analogue to x86 RDTSC instruction: kernel_modules/pmccntr.c is oopsing, and even if it weren’t, it likely won’t give the cycle count since boot since it needs to be activated before it starts counting anything:
+Good getting started tutorials:
The official manuals were stored in http://infocenter.arm.com but as of 2017 they started to slowly move to https://developer.arm.com.
Bibliography: https://www.quora.com/Where-can-I-find-the-official-documentation-of-ARM-instruction-set-architectures-ISAs
ARM also releases documentation specific to each given processor.
A general introduction to paging with x86 examples can be found at: https://cirosantilli.com/x86-paging.
Then, this article is amazing: https://www.starlab.io/blog/deep-dive-mmu-virtualization-with-xen-on-arm
+ARM paging is documented at ARMv8 architecture reference manual db Chapter D5 and is mostly called VMSAv8 in the ARMv8 manual (Virtual Memory System Architecture).
First, also consider the userland bibliography: Section 29.9, “ARM assembly bibliography”.
+First, also consider the userland bibliography: Section 29.10, “ARM assembly bibliography”.
The most useful ARM baremetal example sets we’ve seen so far are:
@@ -43779,7 +44241,8 @@ CACHE2 S nyy
./build --download-dependencies --dry-run <some-target> | less
+cat ./setup
+./build --download-dependencies --dry-run <some-target> | less
This way you can just hack away the scripts and try them out immediately without any further operations.
out_rootfs_overlay_dir
This path can be found with:
This does not include native image modification mechanisms such as Buildroot packages, which we let Buildroot itself manage.
disk_image_2
A squashfs of out_rootfs_overlay_dir that gets passed as the second argument.
Especially useful with gem5 as a way to gem5 checkpoint restore and run a different script via Secondary disk since setting up gem5 9P is slightly laborious.
+We try to keep as much as possible in those files. It bloats builds a little, but just makes everything simpler to understand.
Link with lkmc.o is enabled with the path_properties.py
+'extra_objs_lkmc_common': False,