diff --git a/index.html b/index.html
index 61617e0..0084a5d 100644
--- a/index.html
+++ b/index.html
@@ -1126,60 +1126,63 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
arm and aarch64 configs present in the official ARM gem5 Linux kernel fork as described at: Section 18.9, “gem5 arm Linux kernel patches”. Some of the configs present there are added by the patches.
arm and aarch64 configs present in the official ARM gem5 Linux kernel fork as described at: Section 18.8, “gem5 arm Linux kernel patches”. Some of the configs present there are added by the patches.
Jason’s magic x86_64 config: http://web.archive.org/web/20171229121642/http://www.lowepower.com/jason/files/config which is referenced at: http://web.archive.org/web/20171229121525/http://www.lowepower.com/jason/setting-up-gem5-full-system.html. QEMU boots with that by removing # CONFIG_VIRTIO_PCI is not set.
TODO how to make gem5 disk writes persistent?
As of cadb92f2df916dbb47f428fd1ec4932a2e1f0f48 there are some read_only entries in the config.ini under cow sections, but hacking them to true did not work:
As of cadb92f2df916dbb47f428fd1ec4932a2e1f0f48 there are some read_only entries in the gem5 config.ini under cow sections, but hacking them to true did not work:
disadvantage of gem5: slower than QEMU, see: Section 28.2.1, “Benchmark Linux kernel boot”
+disadvantages of gem5:
+slower than QEMU, see: Section 28.2.1, “Benchmark Linux kernel boot”
This implies that the user base is much smaller, since no Android devs.
On the other hand, the chip makers tend to upstream less, and the project becomes more crappy on average :-)
gem5 is way more complex and harder to modify and maintain
+The only hairy thing in QEMU is the binary code generation.
+gem5 however has tended towards intensive code generation in order to support all its different hardware types:
+lots of magic happens on top of pybind11, which is already magic, to glue the C++ and Python worlds more automatically
+.isa code which describes most of the instructions
+Ruby for memory systems
+To find out why your program is slow, a good first step is to have a look at gem5 stats.txt file.
+To find out why your program is slow, a good first step is to have a look at the gem5 m5out/stats.txt file.
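As a starting point for that kind of inspection, here is a minimal sketch of reading one stats dump into a dictionary. It assumes the usual layout of name, value and `#` comment per line between "Begin" and "End" marker lines. The function name `parse_first_stats_dump` and the sample values are made up for illustration; only `system.cpu.numCycles` is taken from this document.

```python
def parse_first_stats_dump(text):
    """Return {stat_name: float} for the first stats dump found in text."""
    stats = {}
    for raw in text.splitlines():
        # Drop the trailing "# description" comment, if any.
        line = raw.split("#", 1)[0].strip()
        if line.startswith("---------- End"):
            break  # only the first dump is read
        fields = line.split()
        if len(fields) < 2 or line.startswith("-"):
            continue  # blank lines and the "Begin" marker
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            continue  # non-numeric value, skipped in this sketch
    return stats


# Hypothetical sample: the values are invented, not taken from a real run.
SAMPLE = """
---------- Begin Simulation Statistics ----------
sim_insts                                        1000                       # Number of instructions simulated
system.cpu.numCycles                             2000                       # number of cpu cycles simulated

---------- End Simulation Statistics   ----------

---------- Begin Simulation Statistics ----------
sim_insts                                        5000                       # Number of instructions simulated
---------- End Simulation Statistics   ----------
"""

stats = parse_first_stats_dump(SAMPLE)
print(stats["system.cpu.numCycles"])
```

On a real run you would pass the contents of m5out/stats.txt instead of `SAMPLE`. Stat names vary between gem5 versions and CPU models, so check the file itself before relying on any of them.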
The cache sizes were chosen to match the host P51 to improve the comparison. Ideally we should also use the same standard library.
Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 18.10.2.1, “gem5 only dump selected stats”
+Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 18.9.2.1, “gem5 only dump selected stats”
Sources:
@@ -18645,7 +18677,23 @@ expect eof
Pass options to the fs.py script:
Remember that in the gem5 command line, we can either pass options to the script being run as in:
+build/X86/gem5.opt configs/example/fs.py --some-option+
or to the gem5 executable itself:
+build/X86/gem5.opt --some-option configs/example/fs.py+
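The ordering rule above can be sketched in a few lines of Python: whatever comes before the script path goes to the gem5 executable, whatever comes after goes to the script. The helper `gem5_command` is hypothetical and exists only for illustration, and `--some-option` is the same placeholder used above, not a real flag.

```python
import shlex


def gem5_command(gem5_exe, script, exe_args=(), script_args=()):
    """Build an argv list: exe options first, then the script, then script options."""
    return [gem5_exe, *exe_args, script, *script_args]


# Option consumed by the script:
print(shlex.join(gem5_command(
    "build/X86/gem5.opt", "configs/example/fs.py",
    script_args=["--some-option"])))

# Option consumed by the gem5 executable itself:
print(shlex.join(gem5_command(
    "build/X86/gem5.opt", "configs/example/fs.py",
    exe_args=["--some-option"])))
```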
To pass options to the script in our setup, use:
Pass options to the gem5 executable itself:
To pass options to the gem5 executable we expose the --gem5-exe-args option:
Quit the simulation after 1024 instructions:
./run --emulator gem5 -- -I 1024-
Can be nicely checked with gem5 tracing.
-Cycles instead of instructions:
-./run --emulator gem5 -- --memory 1024-
Otherwise the simulation runs forever by default.
-m5ops are magic instructions which lead gem5 to do magic things, like quitting or dumping stats.
m5 is a command line utility that is installed and run on the guest, and serves as a CLI front-end for the m5ops.
It is possible to guess what most tools do from the corresponding m5ops, but let’s at least document the less obvious ones here.
End the simulation with a failure exit event:
Send a guest file to the host. 9P is a more advanced alternative.
Read a host file pointed to by the fs.py --script option to stdout.
Ermm, just another m5 readfile that only takes integers and only from CLI options? Is this software so redundant?
Trivial combination of m5 readfile + execute the script.
gem5 allocates some magic instructions on unused instruction encodings for convenient guest instrumentation.
Let’s study how m5 uses them:
include/gem5/asm/generic/m5ops.h also describes some annotation instructions.
https://gem5.googlesource.com/arm/linux/ contains ARM Linux kernel forks created by ARM Holdings, each with a few gem5-specific patches on top of an upstream kernel release.
Tested on 649d06d6758cefd080d04dc47fd6a5a26a620874 + 1.
We have observed that with the kernel patches, boot is 2x faster, falling from 1m40s to 50s.
When you run gem5, it generates an m5out directory at:
The files in that directory contain some very important information about the run, and you should become familiar with every one of them.
Contains UART output, either from the Linux kernel or from the baremetal system.
This file contains important statistics about the run:
For x86, it is interesting to try and correlate numCycles with:
TODO
The config.ini file contains a very good high level description of the system:
The m5out/config.ini file contains a very good high level description of the system:
We use the m5term in-tree executable to connect to the terminal instead of a direct telnet.
We have made a crazy setup that allows you to just cd into submodules/gem5, and edit Python scripts directly there.
By default, we use the configs/example/fs.py script.
We set up 2 big and 2 small CPUs, but cat /proc/cpuinfo shows 4 identical CPUs instead of 2 of two different types, likely because gem5 does not expose some informational register much like the caches: https://www.mail-archive.com/gem5-users@gem5.org/msg15426.html config.ini does show that the two big ones are DerivO3CPU and the small ones are MinorCPU.
We set up 2 big and 2 small CPUs, but cat /proc/cpuinfo shows 4 identical CPUs instead of 2 of two different types, likely because gem5 does not expose some informational register much like the caches: https://www.mail-archive.com/gem5-users@gem5.org/msg15426.html gem5 config.ini does show that the two big ones are DerivO3CPU and the small ones are MinorCPU.
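One way to check the CPU types without reading config.ini by hand is the standard library `configparser`, as in the sketch below. The section names in `SAMPLE` are hypothetical and real runs will use different ones. The sketch also assumes that each object section carries a `type=` key, and `cpu_types` is an illustrative helper, not part of gem5.

```python
import configparser

# Hypothetical excerpt: section names are invented for illustration.
SAMPLE = """
[system.big0]
type=DerivO3CPU
cpu_id=0

[system.little0]
type=MinorCPU
cpu_id=1

[system.little0.dcache]
type=Cache
"""


def cpu_types(ini_text):
    """Map section name -> type for every section whose type ends in 'CPU'."""
    # interpolation=None so that any '%' in values is taken literally.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    return {
        name: parser[name]["type"]
        for name in parser.sections()
        if parser[name].get("type", "").endswith("CPU")
    }


for name, cpu_type in cpu_types(SAMPLE).items():
    print(name, cpu_type)
```

For a real run, read m5out/config.ini into a string and pass it to `cpu_types` instead of `SAMPLE`.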
TODO: why is the --dtb required despite fs_bigLITTLE.py having a DTB generation capability? Without it, nothing shows on terminal, and the simulation terminates with simulate() limit reached @ 18446744073709551615. The magic vmlinux.vexpress_gem5_v1.20170616 works however without a DTB.
This error happens when the following instruction limits are reached:
The exact same can be achieved with the older hardcoded --maxinsts mechanism present in se.py and fs.py:
+./run \
+  --emulator gem5 \
+  --static \
+  --userland userland/arch/x86_64/freestanding/linux/hello.S \
+  --trace-insts-stdout \
+  -- \
+  --maxinsts 3 \
+;
The message also shows on User mode simulation deadlocks, for example in userland/posix/pthread_deadlock.c:
In order to use different build options, you might also want to use gem5 build variants to keep the build outputs separate from one another.
The gem5.debug executable has optimizations turned off unlike the default gem5.opt, and provides a much better debug experience:
TODO test properly, benchmark vs GCC.
If gem5 appears to have a C++ undefined behaviour bug, which is often very difficult to track down, you can try to build it with the following extra SCons options:
Ruby is a system that includes the SLICC domain specific language to describe memory systems: http://gem5.org/Ruby
+It seems to have usage outside of gem5, but the naming overload with the Ruby programming language, which also has domain specific languages as a concept, makes it impossible to google anything about it!
+Ruby is activated at compile time with the PROTOCOL flag, which specifies the desired memory system type.
For example, to use a two level MESI cache coherence protocol, we can do:
+./build-gem5 --arch aarch64 --gem5-build-id ruby -- PROTOCOL=MESI_Two_Level+
and during build we see a humongous line of type:
+[ SLICC] src/mem/protocol/MESI_Two_Level.slicc -> ARM/mem/protocol/AccessPermission.cc, ARM/mem/protocol/AccessPermission.hh, ...+
which shows that dozens of C++ files are being generated from Ruby SLICC.
+TODO observe it doing something during a run.
+The relevant source files live in the source tree under:
+src/mem/protocol/MESI_Two_Level*+
We already pass the SLICC_HTML flag by default to the build, which generates an HTML summary of each memory protocol under:
xdg-open "$(./getvar --arch aarch64 --gem5-build-id ruby gem5_build_build_dir)/ARM/mem/protocol/html/index.html"+
TODO: review this section, make a more controlled userland experiment with m5ops instrumentation.
Let’s have some fun and try to correlate the gem5 stats.txt system.cpu.numCycles cycle count with the x86 RDTSC instruction that is supposed to do the same thing:
Let’s have some fun and try to correlate the gem5 m5out/stats.txt file system.cpu.numCycles cycle count with the x86 RDTSC instruction that is supposed to do the same thing:
gem5 automatically increments ports until it finds a free one.
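The general idea of probing upwards until a port can be bound looks roughly like the sketch below. This is not gem5's actual code: `find_free_port` and its `limit` parameter are hypothetical, and the starting value 3456 in the usage line is only an example.

```python
import socket


def find_free_port(start, limit=100):
    """Return the first port >= start that can be bound on localhost."""
    for port in range(start, min(start + limit, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("127.0.0.1", port))
            except OSError:
                continue  # port is taken, try the next one
            return port
    raise RuntimeError("no free port found in the probed range")


print(find_free_port(3456))
```

Note that the port is released again when the probing socket closes, so a real server would keep the socket it managed to bind instead of probing and rebinding.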
gem5 60600f09c25255b3c8f72da7fb49100e2682093a does not seem to expose a way to set the terminal and VNC ports from fs.py, so we just let gem5 assign the ports itself, and use -n only to match what it assigned. Those ports both appear on config.ini.
gem5 60600f09c25255b3c8f72da7fb49100e2682093a does not seem to expose a way to set the terminal and VNC ports from fs.py, so we just let gem5 assign the ports itself, and use -n only to match what it assigned. Those ports both appear on gem5 config.ini.
The GDB port can be assigned on gem5.opt --remote-gdb-port, but it does not appear on config.ini.