From 71735a3a15515a56e0fe50393f69871e20aad3e4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ciro=20Santilli=20=E5=85=AD=E5=9B=9B=E4=BA=8B=E4=BB=B6=20?= =?UTF-8?q?=E6=B3=95=E8=BD=AE=E5=8A=9F?=
Date: Fri, 2 Aug 2019 00:00:00 +0000
Subject: [PATCH] gem5: ruby build example

Document complexity downside vs QEMU.

Merge two separate limit number of tick sections.
---
 README.adoc | 121 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 88 insertions(+), 33 deletions(-)

diff --git a/README.adoc b/README.adoc
index c937edc..999a8bb 100644
--- a/README.adoc
+++ b/README.adoc
@@ -9075,7 +9075,7 @@ Disk persistency is useful to re-run shell commands from the history of a previo

 TODO how to make gem5 disk writes persistent?

-As of cadb92f2df916dbb47f428fd1ec4932a2e1f0f48 there are some `read_only` entries in the <> under cow sections, but hacking them to true did not work:
+As of cadb92f2df916dbb47f428fd1ec4932a2e1f0f48 there are some `read_only` entries in the <> under cow sections, but hacking them to true did not work:

 ....
 diff --git a/configs/common/FSConfig.py b/configs/common/FSConfig.py
@@ -10314,23 +10314,33 @@ but the approximation is reasonable.
 It is used mostly for microarchitecture research purposes: when you are making a new chip technology, you don't really need to specialize enormously to an existing microarchitecture, but rather develop something that will work with a wide range of future architectures.
 ** runs are deterministic by default, unlike QEMU which has a special <> mode, that requires first playing the content once and then replaying
 ** gem5 ARM at least appears to implement more low level CPU functionality than QEMU, e.g. QEMU only added EL2 in 2018: https://stackoverflow.com/questions/42824706/qemu-system-aarch64-entering-el1-when-emulating-a53-power-up See also: xref:arm-exception-levels[xrefstyle=full]
-* disadvantage of gem5: slower than QEMU, see: xref:benchmark-linux-kernel-boot[xrefstyle=full]
+* disadvantages of gem5:
+** slower than QEMU, see: xref:benchmark-linux-kernel-boot[xrefstyle=full]
 +
 This implies that the user base is much smaller, since no Android devs.
 +
 Instead, we have only chip makers, who keep everything that really works closed, and researchers, who can't version track or document code properly >:-) And this implies that:
 +
 --
-** the documentation is more scarce
-** it takes longer to support new hardware features
+*** the documentation is more scarce
+*** it takes longer to support new hardware features
 --
 +
 Well, not that AOSP is that much better anyways.
-* not sure: gem5 has BSD license while QEMU has GPL
+** not sure: gem5 has BSD license while QEMU has GPL
 +
 This suits chip makers that want to distribute forks with secret IP to their customers.
 +
 On the other hand, the chip makers tend to upstream less, and the project becomes more crappy in average :-)
+** gem5 is way more complex and harder to modify and maintain
++
+The only hairy thing in QEMU is the binary code generation.
++
+gem5 however has tended towards intensive code generation in order to support all its different hardware types:
++
+*** lots of magic happens on top of pybind11, which is already magic, to more automatically glue the C++ and Python worlds
+*** .isa code which describes most of the instructions
+*** <> for memory systems

 === gem5 run benchmark

@@ -10432,7 +10442,7 @@ Now you can play a fun little game with your friends:
 * make a program that solves the computation problem, and outputs output to stdout
 * write the code that runs the correct computation in the smallest number of cycles possible

-To find out why your program is slow, a good first step is to have a look at <> file.
+To find out why your program is slow, a good first step is to have a look at the <>.

 ==== Skip extra benchmark instructions

@@ -11303,7 +11313,19 @@ And then restore the checkpoint with a different CPU:

 === Pass extra options to gem5

-Pass options to the `fs.py` script:
+Remember that in the gem5 command line, we can either pass options to the script being run as in:
+
+....
+build/X86/gem5.opt configs/example/fs.py --some-option
+....
+
+or to the gem5 executable itself:
+
+....
+build/X86/gem5.opt --some-option configs/example/fs.py
+....
+
+To pass options to the script in our setup, use:

 * get help:
 +
@@ -11316,7 +11338,7 @@ Pass options to the `fs.py` script:
 ./run --arch arm --emulator gem5 -- --caches --cpu-type=HPI
 ....

-Pass options to the `gem5` executable itself:
+To pass options to the `gem5` executable we expose the `--gem5-exe-args` option:

 * get help:
 +
@@ -11324,24 +11346,6 @@ Pass options to the `gem5` executable itself:
 ./run --gem5-exe-args='-h' --emulator gem5
 ....

-=== gem5 exit after a number of instructions
-
-Quit the simulation after `1024` instructions:
-
-....
-./run --emulator gem5 -- -I 1024
-....
-
-Can be nicely checked with <>.
-
-Cycles instead of instructions:
-
-....
-./run --emulator gem5 -- --memory 1024
-....
-
-Otherwise the simulation runs forever by default.
-
 === m5ops

 m5ops are magic instructions which lead gem5 to do magic things, like quitting or dumping stats.
@@ -11696,13 +11700,15 @@ The location of that directory can be set with `./gem5.opt -d`, and defaults to

 The files in that directory contains some very important information about the run, and you should become familiar with every one of them.

-==== system.terminal
+[[gem5-m5out-system-terminal-file]]
+==== gem5 m5out/system.terminal file

 Contains UART output, both from the Linux kernel or from the baremetal system.

 Can also be seen live on <>.

-==== gem5 stats.txt
+[[gem5-m5out-stats-txt-file]]
+==== gem5 m5out/stats.txt file

 This file contains important statistics about the run:

@@ -11736,9 +11742,9 @@ https://stackoverflow.com/questions/52014953/how-to-dump-only-a-single-or-certai

 To prevent the stats file from becoming humongous.

-==== config.ini
+==== gem5 config.ini

-The `config.ini` file, contains a very good high level description of the system:
+The `m5out/config.ini` file contains a very good high level description of the system:

 ....
 less $(./getvar --arch arm --emulator gem5 m5out_dir)"
@@ -11851,7 +11857,7 @@ Disadvantages over `fs.py`:
 * only works for ARM, not other archs
 * not as many configuration options as `fs.py`, many things are hardcoded

-We setup 2 big and 2 small CPUs, but `cat /proc/cpuinfo` shows 4 identical CPUs instead of 2 of two different types, likely because gem5 does not expose some informational register much like the caches: https://www.mail-archive.com/gem5-users@gem5.org/msg15426.html <> does show that the two big ones are `DerivO3CPU` and the small ones are `MinorCPU`.
+We setup 2 big and 2 small CPUs, but `cat /proc/cpuinfo` shows 4 identical CPUs instead of 2 of two different types, likely because gem5 does not expose some informational register much like the caches: https://www.mail-archive.com/gem5-users@gem5.org/msg15426.html <> does show that the two big ones are `DerivO3CPU` and the small ones are `MinorCPU`.

 TODO: why is the `--dtb` required despite `fs_bigLITTLE.py` having a DTB generation capability? Without it, nothing shows on terminal, and the simulation terminates with `simulate() limit reached @ 18446744073709551615`. The magic `vmlinux.vexpress_gem5_v1.20170616` works however without a DTB.

@@ -11953,6 +11959,19 @@ info: Entering event queue @ 0. Starting simulation...
 Exiting @ tick 3000 because all threads reached the max instruction count
 ....

+The exact same can be achieved with the older hardcoded `--maxinsts` mechanism present in `se.py` and `fs.py`:
+
+....
+./run \
+  --emulator gem5 \
+  --static \
+  --userland userland/arch/x86_64/freestanding/linux/hello.S \
+  --trace-insts-stdout \
+  -- \
+  --maxinsts 3
+;
+....
+
 The message also shows on <> deadlocks, for example in link:userland/posix/pthread_deadlock.c[]:

 ....
@@ -12124,6 +12143,42 @@ In theory, any software can be packaged, and the Buildroot side is easy.
 +
 The hard part is dealing with crappy third party build systems and huge dependency chains.

+==== gem5 Ruby build
+
+Ruby is a system that includes the SLICC domain specific language to describe memory systems: http://gem5.org/Ruby
+
+It seems to have usage outside of gem5, but the naming overload with the link:https://en.wikipedia.org/wiki/Ruby_(programming_language)[Ruby programming language], which also has link:https://thoughtbot.com/blog/writing-a-domain-specific-language-in-ruby[domain specific languages] as a concept, makes it impossible to google anything about it!
+
+Ruby is activated at compile time with the `PROTOCOL` flag, which specifies the desired memory system type.
+
+For example, to use a two level https://en.wikipedia.org/wiki/MESI_protocol[MESI] https://en.wikipedia.org/wiki/Cache_coherence[cache coherence protocol], we can do:
+
+....
+./build-gem5 --arch aarch64 --gem5-build-id ruby -- PROTOCOL=MESI_Two_Level
+....
+
+and during build we see a humongous line of type:
+
+....
+[ SLICC] src/mem/protocol/MESI_Two_Level.slicc -> ARM/mem/protocol/AccessPermission.cc, ARM/mem/protocol/AccessPermission.hh, ...
+....
+
+which shows that dozens of C++ files are being generated from Ruby SLICC.
+
+TODO observe it doing something during a run.
+
+The relevant source files live in the source tree under:
+
+....
+src/mem/protocol/MESI_Two_Level*
+....
+
+We already pass the `SLICC_HTML` flag by default to the build, which generates an HTML summary of each memory protocol under:
+
+....
+xdg-open "$(./getvar --arch aarch64 --gem5-build-id ruby gem5_build_build_dir)/ARM/mem/protocol/html/index.html"
+....
+
 === Custom Buildroot configs

 We provide the following mechanisms:
@@ -13849,7 +13904,7 @@ RDTSC stores its output to EDX:EAX, even in 64-bit mode, top bits are zeroed out

 TODO: review this section, make a more controlled userland experiment with <> instrumentation.

-Let's have some fun and try to correlate the gem5 <> `system.cpu.numCycles` cycle count with the https://en.wikipedia.org/wiki/Time_Stamp_Counter[x86 RDTSC instruction] that is supposed to do the same thing:
+Let's have some fun and try to correlate the <> `system.cpu.numCycles` cycle count with the https://en.wikipedia.org/wiki/Time_Stamp_Counter[x86 RDTSC instruction] that is supposed to do the same thing:

 ....
 ./build-userland --static userland/arch/x86_64/inline_asm/rdtsc.S
@@ -17179,7 +17234,7 @@ If a port is not free, it just crashes.
 We assign a contiguous port range for each run ID.
 ** gem5 automatically increments ports until it finds a free one.
 +
-gem5 60600f09c25255b3c8f72da7fb49100e2682093a does not seem to expose a way to set the terminal and VNC ports from `fs.py`, so we just let gem5 assign the ports itself, and use `-n` only to match what it assigned. Those ports both appear on <>.
+gem5 60600f09c25255b3c8f72da7fb49100e2682093a does not seem to expose a way to set the terminal and VNC ports from `fs.py`, so we just let gem5 assign the ports itself, and use `-n` only to match what it assigned. Those ports both appear on <>.
 +
 The GDB port can be assigned on `gem5.opt --remote-gdb-port`, but it does not appear on `config.ini`.
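
The option ordering described in the `Pass extra options to gem5` hunk (options placed before the config script go to the gem5 executable, options placed after it go to the script) can be sketched in Python. This is only a minimal illustration under stated assumptions, not code from this repository or from gem5: the `gem5_command` helper is hypothetical, and `-d` and `--maxinsts` are used only because they are the executable and script options already mentioned in this patch.

```python
import shlex


def gem5_command(gem5_exe, script, exe_args=(), script_args=()):
    """Return the gem5 argv as a list (hypothetical helper).

    Options placed before the config script are read by the gem5
    executable; options placed after it are passed on to the script.
    """
    return [gem5_exe, *exe_args, script, *script_args]


# Assumed example values: -d selects the output directory (executable
# option), --maxinsts is parsed by the fs.py config script.
cmd = gem5_command(
    "build/X86/gem5.opt",
    "configs/example/fs.py",
    exe_args=["-d", "m5out"],
    script_args=["--maxinsts", "3"],
)

# Prints: build/X86/gem5.opt -d m5out configs/example/fs.py --maxinsts 3
print(shlex.join(cmd))
```

Keeping the two option groups in separate lists makes the position of the script path the only thing that decides who receives an option, which mirrors what `--gem5-exe-args` and the arguments after `--` do in `./run`.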