cat out/run.sh+
diff --git a/index.html b/index.html
index d34c89f..f546834 100644
--- a/index.html
+++ b/index.html
@@ -477,7 +477,7 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
m5out/system.workload.dmesg file
See also: Dry run to get commands for your project.
+When you reach difficulties, QEMU makes it possible to easily GDB step debug the Linux kernel source code, see: Section 2, “GDB step debug”.
See also: Dry run to get commands for your project.
+--eval-after is optional: you could just type insmod hello.ko in the terminal, but this makes it run automatically at the end of boot, and then drops you into a shell.
One of the major features of this repository is that we try to support the --dry-run option really well for all scripts.
Since we need this so often, the last run command is also stored for convenience at:
+cat out/run.sh+
although this won’t of course work well for Simultaneous runs.
+Furthermore, --dry-run also automatically specifies, in valid Bash shell syntax:
If you get a failure before that, it will be hard to see the print messages.
One possible solution is to parse the dmesg buffer, gem5 actually implements that: gem5 m5out/system.dmesg file.
+One possible solution is to parse the dmesg buffer, gem5 actually implements that: gem5 m5out/system.workload.dmesg file.
QEMU by default copies the host uname value, but we always override it in our scripts.
Determining the right number to use for the kernel version is of course highly non-trivial and would require an extensive userland test suite, which most emulator don’t have.
+Determining the right number to use for the kernel version is of course highly non-trivial and would require an extensive userland test suite, which most emulators don’t have.
where m5 is a guest utility present inside the gem5 tree which we cross-compiled and installed into the guest.
+where the gem5 m5 executable is a guest utility present inside the gem5 tree which we cross-compiled and installed into the guest.
To restore the checkpoint, kill the VM and run:
@@ -19449,7 +19469,7 @@ FullO3CPU: Ticking main, FullO3CPU.
m5 is a guest command line utility that is installed and run on the guest, that serves as a CLI front-end for the m5ops
It is possible to guess what most tools do from the corresponding m5ops, but let’s at least document the less obvious ones here.
In LKMC we build m5 with:
./build-m5 --arch aarch64+
The m5 executable can be run on User mode simulation as normal with:
./run --arch aarch64 --emulator gem5 --userland "$(./getvar --arch aarch64 out_rootfs_overlay_bin_dir)/m5" --userland-args dumpstats+
This can be a good test of m5ops since it executes very quickly.
+Makes gem5 dump one more statistics entry to the gem5 m5out/stats.txt file.
+End the simulation with a failure exit event:
Send a guest file to the host. 9P is a more advanced alternative.
Read a host file pointed to by the fs.py --script option to stdout.
Ermm, just another m5 readfile that only takes integers and only from CLI options? Is this software so redundant?
Trivial combination of m5 readfile + execute the script.
gem5 allocates some magic instructions on unused instruction encodings for convenient guest instrumentation.
Those instructions are exposed through the m5 in tree executable.
+Those instructions are exposed through the in-tree gem5 m5 executable.
To make things simpler to understand, you can play around with our own minimized educational m5 subset:
./build-userland \
  --arch aarch64 \
-  --ccflags='-DLKMC_M5OPS_ENABLE=1' \
  --force-rebuild \
  userland/c/m5ops.c \
;
@@ -19745,7 +19789,7 @@ m5 execfile
Let’s study how m5 uses them:
+Let’s study how the gem5 m5 executable uses them:
m5out/system.workload.dmesg file
TODO confirm and create minimal example.
+This file used to be called just m5out/system.dmesg, but the name was changed after the workload refactorings of March 2020.
I think this file is capable of showing terminal messages before they reach the terminal by parsing the dmesg buffer from memory.
+This file is capable of showing terminal messages that are printed with printk before the serial is enabled, as described at: Linux kernel early boot messages.
This could be used to debug the Linux kernel boot if problems happen before the serial is enabled: Linux kernel early boot messages.
+The file is dumped only on kernel panics which gem5 can detect by the PC address: Exit gem5 on panic.
The file appears to get dumped only on kernel panic which gem5 can detect by the PC address: Exit gem5 on panic.
+This mechanism can be very useful to debug the Linux kernel boot if problems happen before the serial is enabled.
+This magic mechanism works by activating an event when the PC reaches the printk address, much like gem5 can detect panic by PC, and then parsing the printk function arguments and buffers!
The relevant source is at src/kern/linux/printk.c.
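The PC-triggered event idea can be illustrated with a toy model. This is only a sketch of the concept, not gem5 code: the `ToyCpu` class, the `schedule_pc_event` method and both addresses are hypothetical names made up for illustration.

```python
class ToyCpu:
    """Hypothetical CPU model that only tracks a program counter."""

    def __init__(self):
        # Maps an address to the callbacks to fire when the PC reaches it.
        self.pc_events = {}

    def schedule_pc_event(self, address, callback):
        """Register a callback that fires whenever the PC equals address."""
        self.pc_events.setdefault(address, []).append(callback)

    def run(self, pc_trace):
        """Walk a list of PC values and fire any matching events."""
        for pc in pc_trace:
            for callback in self.pc_events.get(pc, ()):
                callback(pc)


# Made-up addresses: a real simulator would look these up in the kernel symbols.
PRINTK_ADDR = 0x1000
PANIC_ADDR = 0x2000

log = []
cpu = ToyCpu()
cpu.schedule_pc_event(PRINTK_ADDR, lambda pc: log.append("printk"))
cpu.schedule_pc_event(PANIC_ADDR, lambda pc: log.append("panic"))
cpu.run([0x0FF0, PRINTK_ADDR, 0x1004, PANIC_ADDR])
print(log)
```

In this toy the callbacks only record that the address was reached. As described above, the real mechanism additionally parses the printk function arguments and buffers from guest memory at that point.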
We can test this mechanism in a controlled way by hacking a panic() into the kernel next to a printk that shows up before the serial is enabled, e.g. on Linux v5.4.3 we could do:
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
+index f296d89be757..3e79916322c2 100644
+--- a/kernel/trace/ftrace.c
++++ b/kernel/trace/ftrace.c
+@@ -6207,6 +6207,7 @@ void __init ftrace_init(void)
+
+ pr_info("ftrace: allocating %ld entries in %ld pages\n",
+ count, count / ENTRIES_PER_PAGE + 1);
++ panic("foobar");
+
+ last_ftrace_enabled = ftrace_enabled = 1;
+With this, after the panic, system.workload.dmesg contains on LKMC d09a0d97b81582cc88381c4112db631da61a048d aarch64:
[0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd070]
+[0.000000] Linux version 5.4.3-dirty (lkmc@f7688b48ac46e9a669e279f1bc167722d5141eda) (gcc version 8.3.0 (Buildroot 2019.11-00002-g157ac499cf)) #1 SMP Thu Jan 1 00:00:00 UTC 1970
+[0.000000] Machine model: V2P-CA15
+[0.000000] Memory limited to 256MB
+[0.000000] efi: Getting EFI parameters from FDT:
+[0.000000] efi: UEFI not found.
+[0.000000] On node 0 totalpages: 65536
+[0.000000] DMA32 zone: 1024 pages used for memmap
+[0.000000] DMA32 zone: 0 pages reserved
+[0.000000] DMA32 zone: 65536 pages, LIFO batch:15
+[0.000000] percpu: Embedded 29 pages/cpu s79960 r8192 d30632 u118784
+[0.000000] pcpu-alloc: s79960 r8192 d30632 u118784 alloc=29*4096
+[0.000000] pcpu-alloc: [0] 0
+[0.000000] Detected PIPT I-cache on CPU0
+[0.000000] CPU features: detected: ARM erratum 832075
+[0.000000] CPU features: detected: EL2 vector hardening
+[0.000000] ARM_SMCCC_ARCH_WORKAROUND_1 missing from firmware
+[0.000000] Built 1 zonelists, mobility grouping on. Total pages: 64512
+[0.000000] Kernel command line: earlyprintk=pl011,0x1c090000 lpj=19988480 rw loglevel=8 mem=256MB root=/dev/sda console_msg_format=syslog nokaslr norandmaps panic=-1 printk.devkmsg=on printk.time=y rw console=ttyAMA0 - lkmc_home=/lkmc
+[0.000000] Dentry cache hash table entries: 32768 (order: 6, 262144 bytes, linear)
+[0.000000] Inode-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
+[0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
+[0.000000] Memory: 233432K/262144K available (6652K kernel code, 792K rwdata, 2176K rodata, 896K init, 659K bss, 28712K reserved, 0K cma-reserved)
+[0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
+[0.000000] ftrace: allocating 22067 entries in 87 pages
So we see that messages up to the ftrace do show up!
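To check which message was the last one captured before the panic, a small host-side helper can parse the dumped file. The following is a minimal sketch that assumes every line has the `[timestamp] message` shape seen in the dump above; `parse_dmesg` and `last_message` are hypothetical helper names, not part of gem5 or LKMC.

```python
import re

# Matches lines such as "[0.000000] Machine model: V2P-CA15".
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse_dmesg(text):
    """Return a list of (timestamp, message) tuples for well-formed lines."""
    entries = []
    for line in text.splitlines():
        match = DMESG_LINE.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def last_message(text):
    """Return the last message in the dump, or None if there is none."""
    entries = parse_dmesg(text)
    return entries[-1][1] if entries else None


# Three lines copied from the dump above.
sample = """\
[0.000000] Machine model: V2P-CA15
[0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[0.000000] ftrace: allocating 22067 entries in 87 pages
"""
print(last_message(sample))
```

On a real run, the text would be read from the m5out/system.workload.dmesg file instead of the inline sample.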
Whenever we run m5 dumpstats or m5 exit, a section with the following format is added to that file:
Whenever we run m5 dumpstats or when fs.py and se.py are exiting (TODO other scripts?), a section with the following format is added to that file:
For x86, it is interesting to try and correlate numCycles with:
In LKMC f42c525d7973d70f4c836d2169cc2bd2893b4197 gem5 5af26353b532d7b5988cf0f6f3d0fbc5087dd1df, the stat file for a C hello world:
+./run --arch aarch64 --emulator gem5 --userland userland/c/hello.c+
which has a single dump done at the exit, has size 59KB and stat lines of form:
+final_tick 91432000 # Number of ticks from beginning of simulation (restored from checkpoints and never reset)+
We can reduce the file size by adding the ?desc=False magic suffix to the stat file name:
--stats-file stats.txt?desc=false+
as explained in:
+gem5.opt --stats-help+
and this reduces the file size to 39KB by removing those excessive comments:
+final_tick 91432000+
although trailing spaces are still present
+We can further reduce this size by removing spaces from the dumps with this hack:
+ ccprintf(stream, " |%12s %10s %10s",
+ ValueToString(value, precision), pdfstr.str(), cdfstr.str());
+ } else {
+- ccprintf(stream, "%-40s %12s %10s %10s", name,
+- ValueToString(value, precision), pdfstr.str(), cdfstr.str());
++ ccprintf(stream, "%s %s", name, ValueToString(value, precision));
++ if (pdfstr.rdbuf()->in_avail())
++ stream << " " << pdfstr.str();
++ if (cdfstr.rdbuf()->in_avail())
++ stream << " " << cdfstr.str();
+
+ if (descriptions) {
+ if (!desc.empty())
+and after that the file size went down to 21KB.
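If patching gem5 is not an option, a similar size reduction can be approximated by post-processing the text file on the host. This is only a sketch and a different technique from the patch above: it assumes that each stat line is `name value ... # description` and that `#` never appears inside a stat name. `compact_stat_line` is a hypothetical helper name.

```python
def compact_stat_line(line):
    """Drop the trailing '# description' and collapse runs of whitespace."""
    body = line.split("#", 1)[0]
    return " ".join(body.split())


def compact_stats(text):
    """Compact every line of a stats dump, keeping one line per input line."""
    return "\n".join(compact_stat_line(line) for line in text.splitlines())


# Padded sample line in the style of the stats.txt excerpt above.
original = "final_tick      91432000      # Number of ticks from beginning of simulation"
compacted = compact_stat_line(original)
print(compacted)
print(len(original), "->", len(compacted))
```

Unlike the gem5 patch, this does not speed up the dump itself, since the full-size file is still written first.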
+We can make gem5 dump statistics in the HDF5 format by adding the magic h5:// prefix to the file name as in:
gem5.opt --stats-file h5://stats.h5+
as explained in:
+gem5.opt --stats-help+
This is not exposed in LKMC f42c525d7973d70f4c836d2169cc2bd2893b4197 however, you just have to hack the gem5 CLI for now.
+TODO what is the advantage? The generated file for --stats-file h5://stats.h5?desc=False in LKMC f42c525d7973d70f4c836d2169cc2bd2893b4197 gem5 5af26353b532d7b5988cf0f6f3d0fbc5087dd1df for a single dump was 946K, so much larger than the text version seen at gem5 m5out/stats.txt file which was only 59KB max!
We then try to see if it is any better when you have a bunch of dump events:
+./run --arch aarch64 --emulator gem5 --userland userland/c/m5ops.c --userland-args 'd 1000'+
and there, yes, we see that the file size fell from 39MB on stats.txt to 3.2MB on stats.h5, so the increase observed previously was just due to some initial size overhead (considering the patched gem5 with no spaces in the text file).
We also note, however, that the stat dump made such a simulation, which just loops and dumps, considerably slower: from 3s to 15s on P51. Fascinating, we are definitely not disk bound there.
+TODO
To prevent the stats file from becoming humongous.
This describes the internals of the gem5 m5out/stats.txt file.
+GDB call stack to dumpstats:
Stats::pythonDump () at build/ARM/python/pybind11/stats.cc:58
+Stats::StatEvent::process() ()
+GlobalEvent::BarrierEvent::process (this=0x555559fa6a80) at build/ARM/sim/global_event.cc:131
+EventQueue::serviceOne (this=this@entry=0x555558c36080) at build/ARM/sim/eventq.cc:228
+doSimLoop (eventq=0x555558c36080) at build/ARM/sim/simulate.cc:219
+simulate (num_cycles=<optimized out>) at build/ARM/sim/simulate.cc:132
Stats::pythonDump does:
void
+pythonDump()
+{
+ py::module m = py::module::import("m5.stats");
+ m.attr("dump")();
+}
+This calls def dump in src/python/m5/stats/__init__.py, which does the main dumping.
That function does notably:
+for output in outputList:
+    if output.valid():
+        output.begin()
+        for stat in stats_list:
+            stat.visit(output)
+        output.end()
begin and end are defined in C++ and output the header and tail respectively
void
+Text::begin()
+{
+ ccprintf(*stream, "\n---------- Begin Simulation Statistics ----------\n");
+}
+
+void
+Text::end()
+{
+ ccprintf(*stream, "\n---------- End Simulation Statistics ----------\n");
+ stream->flush();
+}
+stats_list contains the stats, and stat.visit prints them, outputList contains by default just the text output. I don’t see any other types of output in gem5, but likely JSON / binary formats could be envisioned.
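The begin / visit / end structure described above can be mimicked in a self-contained toy. This is a sketch of the pattern only: `TextOutput`, `ScalarStat` and `visit_scalar` are hypothetical stand-ins, not the gem5 classes, and only the loop in `dump` mirrors the excerpt quoted earlier.

```python
class TextOutput:
    """Hypothetical text backend that collects lines in memory."""

    def __init__(self):
        self.lines = []

    def valid(self):
        return True

    def begin(self):
        self.lines.append("---------- Begin Simulation Statistics ----------")

    def visit_scalar(self, name, value):
        self.lines.append(f"{name} {value}")

    def end(self):
        self.lines.append("---------- End Simulation Statistics ----------")


class ScalarStat:
    """Hypothetical stat holding a single value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def visit(self, output):
        # The stat decides which output method matches its own type.
        output.visit_scalar(self.name, self.value)


def dump(output_list, stats_list):
    """Same loop shape as the m5.stats excerpt quoted above."""
    for output in output_list:
        if output.valid():
            output.begin()
            for stat in stats_list:
                stat.visit(output)
            output.end()


text = TextOutput()
dump([text], [ScalarStat("final_tick", 91432000)])
print("\n".join(text.lines))
```

With this shape, supporting another format amounts to adding another output class with the same methods to the output list.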
Tested in gem5 b4879ae5b0b6644e6836b0881e4da05c64a6550d.
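Since every dump is wrapped between the header and tail printed by Text::begin and Text::end, a stats.txt with several dumps can be split on those markers. The following is a minimal sketch that assumes the marker strings shown above and one `name value` pair per stat line; `split_dumps` is a hypothetical helper, and only the first value column is kept.

```python
BEGIN = "---------- Begin Simulation Statistics ----------"
END = "---------- End Simulation Statistics ----------"


def split_dumps(text):
    """Return one {stat name: value string} dict per dump section."""
    dumps = []
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == BEGIN:
            current = {}
        elif stripped == END:
            if current is not None:
                dumps.append(current)
            current = None
        elif current is not None and stripped:
            # Ignore the optional '# description' part of the line.
            fields = stripped.split("#", 1)[0].split()
            if len(fields) >= 2:
                current[fields[0]] = fields[1]
    return dumps


# Two made-up dumps; the second tick value is illustrative only.
sample = "\n".join([
    "",
    BEGIN,
    "final_tick 91432000 # Number of ticks from beginning of simulation",
    "",
    END,
    "",
    BEGIN,
    "final_tick 182864000",
    "",
    END,
])
dumps = split_dumps(sample)
print(len(dumps), dumps[0]["final_tick"], dumps[1]["final_tick"])
```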
+See also: Profiling userland programs.
+Profiling techniques are discussed in more detail at: Profiling userland programs.
+For the prof build, you can get the gmon.out file with:
./run --arch aarch64 --emulator gem5 --userland userland/c/hello.c --gem5-build-type prof
+gprof "$(./getvar --arch aarch64 gem5_executable)" > tmp.gprof
This describes the internals of the gem5 m5out/stats.txt file.
-GDB call stack to dumpstats:
Stats::pythonDump () at build/ARM/python/pybind11/stats.cc:58
-Stats::StatEvent::process() ()
-GlobalEvent::BarrierEvent::process (this=0x555559fa6a80) at build/ARM/sim/global_event.cc:131
-EventQueue::serviceOne (this=this@entry=0x555558c36080) at build/ARM/sim/eventq.cc:228
-doSimLoop (eventq=0x555558c36080) at build/ARM/sim/simulate.cc:219
-simulate (num_cycles=<optimized out>) at build/ARM/sim/simulate.cc:132
Stats::pythonDump does:
void
-pythonDump()
-{
- py::module m = py::module::import("m5.stats");
- m.attr("dump")();
-}
-This calls src/python/m5/stats/init.py in def dump does the main dumping
That function does notably:
-for output in outputList:
-    if output.valid():
-        output.begin()
-        for stat in stats_list:
-            stat.visit(output)
-        output.end()
begin and end are defined in C++ and output the header and tail respectively
void
-Text::begin()
-{
- ccprintf(*stream, "\n---------- Begin Simulation Statistics ----------\n");
-}
-
-void
-Text::end()
-{
- ccprintf(*stream, "\n---------- End Simulation Statistics ----------\n");
- stream->flush();
-}
-stats_list contains the stats, and stat.visit prints them, outputList contains by default just the text output. I don’t see any other types of output in gem5, but likely JSON / binary formats could be envisioned.
Tested in gem5 b4879ae5b0b6644e6836b0881e4da05c64a6550d.
-gem5 uses a ton of code generation, which makes the project horrendous:
But it has been widely overused to insanity. It likely also exists partly because when the project started in 2003 C++ compilers weren’t that good, so you couldn’t rely on features like templates that much.
Generated code at: build/<ISA>/config/the_isa.hh which contains amongst other lines:
gem5 moves a bit slowly, and if your host compiler is very new, the gem5 build might be broken for it, e.g. this was the case for Ubuntu 19.10 with GCC 9 and gem5 62d75e7105fe172eb906d4f80f360ff8591d4178 from Dec 2019.
E.g. src/cpu/decode_cache.hh includes:
Some scons madness.
userland/cpp/template_class_with_static_member.cpp: https://stackoverflow.com/questions/3229883/static-member-initialization-in-a-class-template
userland/cpp/if_constexpr.cpp: C++17 if constexpr
userland/cpp/if_constexpr.cpp: C++17 if constexpr: https://stackoverflow.com/questions/12160765/if-else-at-compile-time-in-c/54647315#54647315
OMG this is hell, understand when primitive variables are initialized or not:
+Intuition:
+direct initialization: a constructor called explicitly with at least one argument: https://en.cppreference.com/w/cpp/language/direct_initialization
+default initialization: does not initialize primitive types: https://en.cppreference.com/w/cpp/language/default_initialization
+value initialization: maybe initializes primitive types: https://en.cppreference.com/w/cpp/language/value_initialization
+zero initialization: initializes primitive types
+Good rule:
+initialize every single variable explicitly to prevent the risk of having uninitialized variables due to programmer error (which is easy to get wrong due to insane rules)
+if you don’t define your own default constructor, always = delete it instead. This prevents the possibility that variables will be assigned twice due to zero initialization
Like for C, you have to pay for the standards… insane. So we just use the closest free drafts instead.
https://stackoverflow.com/questions/81656/where-do-i-find-the-current-c-or-c-standard-documents
The cache sizes were chosen to match the host P51 to improve the comparison. Ideally we should also use the same standard library.
Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 19.9.3.1, “gem5 only dump selected stats”
+Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 19.9.3.2, “gem5 only dump selected stats”
Sources:
@@ -26713,7 +26978,17 @@ git clean -xdf .
Binary format to store data. TODO vs databases, notably SQLite: https://datascience.stackexchange.com/questions/262/hierarchical-data-format-what-are-the-advantages-compared-to-alternative-format
Examples: userland/libs/hdf5
+Examples:
+gem5 can dump statistics as HDF5: gem5 HDF5 statistics
+Same but with Buildroot vanilla kernel (kernel v4.19): 44s to blow up at "Please append a correct "root=" boot option; here are the available partitions" because missing some filesystem mount option. But likely wouldn’t be much more until after boot since we are almost already done by then! Therefore this vanilla kernel is much much faster! TODO find which config or kernel commit added so much time! Also that kernel is tiny at 8.5MB.
Same but hacking BR2_LINUX_KERNEL_LATEST_VERSION=y and BR2_PACKAGE_HOST_LINUX_HEADERS_CUSTOM_5_3=y which reaches kernel 5.3.14, which is closer to the LKMC one 5.4.3: 40s, which is very similar to the older kernel. Therefore it does not look like it is a problem of kernel code changes, but rather of configs.
Same but with: gem5 arm Linux kernel patches at v4.15: 73s, kernel size: 132M.