gem5: ruby build example

Document complexity downside vs QEMU.

Merge two separate limit number of tick sections.
Ciro Santilli 六四事件 法轮功
2019-08-02 00:00:00 +00:00
parent 75e2582970
commit 71735a3a15


@@ -9075,7 +9075,7 @@ Disk persistency is useful to re-run shell commands from the history of a previo
TODO how to make gem5 disk writes persistent?
As of cadb92f2df916dbb47f428fd1ec4932a2e1f0f48 there are some `read_only` entries in the <<gem5-config-ini>> under cow sections, but hacking them to true did not work:
....
diff --git a/configs/common/FSConfig.py b/configs/common/FSConfig.py
@@ -10314,23 +10314,33 @@ but the approximation is reasonable.
It is used mostly for microarchitecture research purposes: when you are making a new chip technology, you don't really need to specialize enormously to an existing microarchitecture, but rather develop something that will work with a wide range of future architectures.
** runs are deterministic by default, unlike QEMU, which has a special <<qemu-record-and-replay>> mode that requires first playing the content once and then replaying it
** gem5 ARM at least appears to implement more low level CPU functionality than QEMU, e.g. QEMU only added EL2 in 2018: https://stackoverflow.com/questions/42824706/qemu-system-aarch64-entering-el1-when-emulating-a53-power-up. See also: xref:arm-exception-levels[xrefstyle=full]
* disadvantages of gem5:
** slower than QEMU, see: xref:benchmark-linux-kernel-boot[xrefstyle=full]
+
This implies that the user base is much smaller, since there are no Android devs using it.
+
Instead, we have only chip makers, who keep everything that really works closed, and researchers, who can't version track or document code properly >:-) And this implies that:
+
--
*** the documentation is more scarce
*** it takes longer to support new hardware features
--
+
Well, not that AOSP is that much better anyways.
** not sure: gem5 has BSD license while QEMU has GPL
+
This suits chip makers that want to distribute forks with secret IP to their customers.
+
On the other hand, the chip makers tend to upstream less, and the project becomes more crappy on average :-)
** gem5 is way more complex and harder to modify and maintain
+
The only hairy thing in QEMU is the binary code generation.
+
gem5 however has tended towards intensive code generation in order to support all its different hardware types:
+
*** lots of magic happens on top of pybind11, which is already magic, to glue the C++ and Python worlds more automatically
*** .isa code which describes most of the instructions
*** <<gem5-ruby-build,Ruby>> for memory systems
=== gem5 run benchmark
@@ -10432,7 +10442,7 @@ Now you can play a fun little game with your friends:
* make a program that solves the computation problem and writes its output to stdout
* write the code that runs the correct computation in the smallest number of cycles possible
To find out why your program is slow, a good first step is to have a look at the <<gem5-m5out-stats-txt-file>>.
==== Skip extra benchmark instructions
@@ -11303,7 +11313,19 @@ And then restore the checkpoint with a different CPU:
=== Pass extra options to gem5
Remember that in the gem5 command line, we can either pass options to the script being run as in:
....
build/X86/gem5.opt configs/examples/fs.py --some-option
....
or to the gem5 executable itself:
....
build/X86/gem5.opt --some-option configs/examples/fs.py
....
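The argument ordering rule above can be expressed as a small Python helper. This is a hypothetical sketch and not part of gem5 or of this repository's scripts. Everything placed before the config script goes to the gem5 executable, and everything after it goes to the script. The `m5out-caches` output directory name is made up for illustration.

```python
import shlex


def gem5_command(gem5_exe, config_script, exe_args=(), script_args=()):
    """Return a gem5 command line as an argv list (hypothetical helper).

    Options placed before the config script are parsed by the gem5 executable
    itself; options placed after it are passed on to the script (e.g. fs.py).
    """
    return [gem5_exe, *exe_args, config_script, *script_args]


# `-d` (output directory) is for gem5.opt, `--caches` is for fs.py.
argv = gem5_command(
    "build/X86/gem5.opt",
    "configs/examples/fs.py",
    exe_args=["-d", "m5out-caches"],  # made-up directory name
    script_args=["--caches"],
)
print(shlex.join(argv))
```

Keeping the command as a list rather than a string makes it directly usable with `subprocess.run(argv)`. `shlex.join` is only used to show the equivalent shell command.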
To pass options to the script in our setup, use:
* get help:
+
@@ -11316,7 +11338,7 @@ Pass options to the `fs.py` script:
./run --arch arm --emulator gem5 -- --caches --cpu-type=HPI
....
To pass options to the `gem5` executable we expose the `--gem5-exe-args` option:
* get help:
+
@@ -11324,24 +11346,6 @@ Pass options to the `gem5` executable itself:
./run --gem5-exe-args='-h' --emulator gem5
....
=== gem5 exit after a number of instructions
Quit the simulation after `1024` instructions:
....
./run --emulator gem5 -- -I 1024
....
Can be nicely checked with <<gem5-tracing>>.
Cycles instead of instructions:
....
./run --emulator gem5 -- --memory 1024
....
Otherwise the simulation runs forever by default.
=== m5ops
m5ops are magic instructions which lead gem5 to do magic things, like quitting or dumping stats.
@@ -11696,13 +11700,15 @@ The location of that directory can be set with `./gem5.opt -d`, and defaults to
The files in that directory contain some very important information about the run, and you should become familiar with every one of them.
[[gem5-m5out-system-terminal-file]]
==== gem5 m5out/system.terminal file
Contains UART output, either from the Linux kernel or from the baremetal system.
Can also be seen live on <<m5term>>.
[[gem5-m5out-stats-txt-file]]
==== gem5 m5out/stats.txt file
This file contains important statistics about the run:
@@ -11736,9 +11742,9 @@ https://stackoverflow.com/questions/52014953/how-to-dump-only-a-single-or-certai
To prevent the stats file from becoming humongous.
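For scripting comparisons between runs, such as the cycle-count game above, a minimal Python sketch can pull values like `system.cpu.numCycles` out of `stats.txt`. It assumes the usual layout: each dump sits between "Begin Simulation Statistics" and "End Simulation Statistics" marker lines, and each stat line starts with `<name> <value>`. Any extra columns and the trailing `# description` are ignored. The `SAMPLE` excerpt and its numbers are made up.

```python
def parse_stats(text):
    """Split stats.txt content into one {stat_name: value} dict per dump."""
    dumps = []
    current = None
    for line in text.splitlines():
        if "Begin Simulation Statistics" in line:
            current = {}
            dumps.append(current)
        elif "End Simulation Statistics" in line:
            current = None
        elif current is not None:
            fields = line.split()
            if len(fields) < 2 or fields[0].startswith("#"):
                continue  # blank or comment-only line
            name, value = fields[0], fields[1]
            try:
                current[name] = float(value)
            except ValueError:
                current[name] = value  # keep non-numeric values as text


    return dumps


# Made-up excerpt, for illustration only.
SAMPLE = """
---------- Begin Simulation Statistics ----------
sim_insts                                     1024                       # Number of instructions simulated
system.cpu.numCycles                          5000                       # number of cpu cycles simulated
---------- End Simulation Statistics   ----------
"""

last_dump = parse_stats(SAMPLE)[-1]  # m5ops can produce several dumps
print(last_dump["system.cpu.numCycles"])
```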
==== gem5 config.ini
The `m5out/config.ini` file contains a very good high level description of the system:
....
less "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.ini"
@@ -11851,7 +11857,7 @@ Disadvantages over `fs.py`:
* only works for ARM, not other archs
* not as many configuration options as `fs.py`, many things are hardcoded
We set up 2 big and 2 small CPUs, but `cat /proc/cpuinfo` shows 4 identical CPUs instead of two pairs of different types. This is likely because gem5 does not expose some informational register, much like for the caches: https://www.mail-archive.com/gem5-users@gem5.org/msg15426.html. <<gem5-config-ini>> does show that the two big ones are `DerivO3CPU` and the small ones are `MinorCPU`.
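Since `config.ini` is plain INI, the big/little split can also be checked with Python's standard `configparser` instead of reading the file by hand. This is a minimal sketch. It assumes each SimObject section carries a `type=` key with the model name. The `CPU_MODELS` set and the section names in `SAMPLE` are assumptions made up for illustration, since real section names depend on the config script.

```python
import configparser

# A few gem5 CPU model names (assumption: extend as needed).
CPU_MODELS = {"AtomicSimpleCPU", "TimingSimpleCPU", "MinorCPU", "DerivO3CPU"}


def load_config_ini(text):
    # gem5 writes one "key=value" per line; values may contain ':' or '%',
    # so only '=' is a delimiter and interpolation is disabled.
    parser = configparser.ConfigParser(
        delimiters=("=",), interpolation=None, strict=False
    )
    parser.optionxform = str  # keep key names exactly as written
    parser.read_string(text)
    return parser


def cpu_sections(parser):
    """Map each section whose `type` is a known CPU model to that model."""
    return {
        name: parser[name]["type"]
        for name in parser.sections()
        if parser[name].get("type") in CPU_MODELS
    }


# Hypothetical, heavily trimmed excerpt.
SAMPLE = """
[system.bigCluster.cpus0]
type=DerivO3CPU
[system.bigCluster.cpus1]
type=DerivO3CPU
[system.littleCluster.cpus0]
type=MinorCPU
[system.littleCluster.cpus1]
type=MinorCPU
[system.membus]
type=CoherentXBar
"""

for section, model in cpu_sections(load_config_ini(SAMPLE)).items():
    print(section, model)
```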
TODO: why is the `--dtb` required despite `fs_bigLITTLE.py` having a DTB generation capability? Without it, nothing shows on terminal, and the simulation terminates with `simulate() limit reached @ 18446744073709551615`. The magic `vmlinux.vexpress_gem5_v1.20170616` works however without a DTB.
@@ -11953,6 +11959,19 @@ info: Entering event queue @ 0. Starting simulation...
Exiting @ tick 3000 because all threads reached the max instruction count
....
The exact same can be achieved with the older hardcoded `--maxinsts` mechanism present in `se.py` and `fs.py`:
....
./run \
--emulator gem5 \
--static \
--userland userland/arch/x86_64/freestanding/linux/hello.S \
--trace-insts-stdout \
-- \
--maxinsts 3
;
....
The message also shows on <<user-mode-simulation>> deadlocks, for example in link:userland/posix/pthread_deadlock.c[]:
....
@@ -12124,6 +12143,42 @@ In theory, any software can be packaged, and the Buildroot side is easy.
+
The hard part is dealing with crappy third party build systems and huge dependency chains.
==== gem5 Ruby build
Ruby is a system that includes the SLICC domain specific language to describe memory systems: http://gem5.org/Ruby
It seems to have usage outside of gem5, but the naming overload with the link:https://en.wikipedia.org/wiki/Ruby_(programming_language)[Ruby programming language], which also has link:https://thoughtbot.com/blog/writing-a-domain-specific-language-in-ruby[domain specific languages] as a concept, makes it impossible to google anything about it!
Ruby is activated at compile time with the `PROTOCOL` flag, which specifies the desired memory system type.
For example, to use a two level https://en.wikipedia.org/wiki/MESI_protocol[MESI] https://en.wikipedia.org/wiki/Cache_coherence[cache coherence protocol], we can do:
....
./build-gem5 --arch aarch64 --gem5-build-id ruby -- PROTOCOL=MESI_Two_Level
....
and during the build we see a humongous line of the form:
....
[ SLICC] src/mem/protocol/MESI_Two_Level.slicc -> ARM/mem/protocol/AccessPermission.cc, ARM/mem/protocol/AccessPermission.hh, ...
....
which shows that dozens of C++ files are being generated from Ruby SLICC.
TODO observe it doing something during a run.
The relevant source files live in the source tree under:
....
src/mem/protocol/MESI_Two_Level*
....
We already pass the `SLICC_HTML` flag by default to the build, which generates an HTML summary of each memory protocol under:
....
xdg-open "$(./getvar --arch aarch64 --gem5-build-id ruby gem5_build_build_dir)/ARM/mem/protocol/html/index.html"
....
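The SLICC files describe per-controller state machines: states, events and transitions. To give an idea of the kind of logic they encode, here is a textbook, single-level, snooping MESI transition function in Python. It is only an illustration and not a translation of `MESI_Two_Level`, which has separate L1 and L2 controllers and many transient states. The state and event names below are made up for this sketch.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def next_state(state, event, others_have_copy=False):
    """Next state of one cache's copy of a line in textbook snooping MESI.

    event is one of:
      "local_read" / "local_write":   this core accesses the line
      "remote_read" / "remote_write": another core's access is snooped
    others_have_copy only matters for a read miss.
    """
    if event == "local_read":
        if state is State.INVALID:
            # Read miss: Exclusive if no other cache holds the line, else Shared.
            return State.SHARED if others_have_copy else State.EXCLUSIVE
        return state  # M, E and S all hit locally
    if event == "local_write":
        # In the protocol, copies in other caches are invalidated first
        # when coming from S or I; E -> M needs no bus traffic.
        return State.MODIFIED
    if event == "remote_read":
        # A dirty M line is written back / forwarded before being shared.
        if state in (State.MODIFIED, State.EXCLUSIVE):
            return State.SHARED
        return state
    if event == "remote_write":
        return State.INVALID
    raise ValueError(f"unknown event: {event!r}")


# One line seen from core 0's cache while core 1 also touches it.
s = next_state(State.INVALID, "local_read")  # E: only copy
s = next_state(s, "remote_read")             # S: core 1 read it too
s = next_state(s, "local_write")             # M: core 1's copy invalidated
print(s)
```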
=== Custom Buildroot configs
We provide the following mechanisms:
@@ -13849,7 +13904,7 @@ RDTSC stores its output to EDX:EAX, even in 64-bit mode, top bits are zeroed out
TODO: review this section, make a more controlled userland experiment with <<m5ops>> instrumentation.
Let's have some fun and try to correlate the <<gem5-m5out-stats-txt-file>> `system.cpu.numCycles` cycle count with the https://en.wikipedia.org/wiki/Time_Stamp_Counter[x86 RDTSC instruction] that is supposed to do the same thing:
....
./build-userland --static userland/arch/x86_64/inline_asm/rdtsc.S
@@ -17179,7 +17234,7 @@ If a port is not free, it just crashes.
We assign a contiguous port range for each run ID.
** gem5 automatically increments ports until it finds a free one.
+
gem5 60600f09c25255b3c8f72da7fb49100e2682093a does not seem to expose a way to set the terminal and VNC ports from `fs.py`, so we just let gem5 assign the ports itself, and use `-n` only to match what it assigned. Both of those ports appear in <<gem5-config-ini>>.
+
The GDB port can be set with `gem5.opt --remote-gdb-port`, but it does not appear in `config.ini`.