diff --git a/.gitignore b/.gitignore
index ec4bc4e..04677c6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -25,8 +25,11 @@ __pycache__
 # Accidents.
 /core
 /m5out
+
+# In-tree userland builds.
 *.o
 *.out
+*.so
 
 # Kernel modules.
 *.ko
@@ -40,3 +43,7 @@ modules.order
 
 # node.js
 node_modules
+
+# Performance profiling stuff.
+perf.data
+callgrind.out.*
diff --git a/index.html b/index.html
index 21df1d2..3a7e67f 100644
--- a/index.html
+++ b/index.html
@@ -673,8 +673,8 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 10.3. User mode Buildroot executables
  • 10.4. User mode simulation with glibc
  • 10.5. User mode static executables
@@ -1121,9 +1121,14 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 19.5. gem5 checkpoint
  • 19.6. Pass extra options to gem5
  • @@ -1166,60 +1171,66 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 19.10. m5term
  • 19.11. gem5 Python scripts without rebuild
  • 19.12. gem5 fs_bigLITTLE
  • -
  • 19.13. gem5 unit tests
  • -
  • 19.14. gem5 regression tests
  • -
  • 19.15. gem5 simulate() limit reached
  • -
  • 19.16. gem5 build options +
  • 19.13. gem5 in-tree tests
  • -
  • 19.17. gem5 CPU types +
  • 19.14. gem5 simulate() limit reached
  • +
  • 19.15. gem5 build options -
  • -
  • 19.18. gem5 ARM platforms
  • -
  • 19.19. gem5 upstream images
  • -
  • 19.20. gem5 internals +
  • 19.16. gem5 CPU types +
  • +
  • 19.17. gem5 ARM platforms
  • +
  • 19.18. gem5 upstream images
  • +
  • 19.19. gem5 internals +
  • -
  • 19.21. gem5 bootloaders
  • +
  • 19.20. gem5 bootloaders
  • 20. Buildroot
@@ -1296,6 +1307,7 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 21.2.2.1. C++17 N4659 standards draft
  • +
  • 21.2.3. C++ type casting
  • 21.3. POSIX
@@ -1327,11 +1339,19 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 21.6. Interpreted languages +
  • +
  • 21.9. Userland content filename conventions
  • +
  • 21.10. Userland content bibliography
  • 22. Userland assembly
@@ -3937,7 +3962,7 @@ echo "$(./getvar --arch aarch64 --emulator gem5 image)"
    -

    see also: Section 19.18, “gem5 ARM platforms”.

    +

    see also: Section 19.17, “gem5 ARM platforms”.

    This generates yet new separate images with new magic constants:

    @@ -7457,7 +7482,7 @@ qw er

    At 125d14805f769104f93c510bedaa685a52ec025d we moved Buildroot from uClibc to glibc, and caused some user mode pain, which we document here.

    -

    10.4.1. FATAL: kernel too old

    +

    10.4.1. FATAL: kernel too old failure in userland simulation

    glibc has a check for kernel version, likely obtained from the uname syscall, and if the kernel is not new enough, it quits.

    @@ -7502,7 +7527,7 @@ qw er
    -

    10.4.2. stack smashing detected

    +

    10.4.2. stack smashing detected when using glibc

    For some reason QEMU / glibc x86_64 picks up the host libc, which breaks things.

    @@ -7586,7 +7611,7 @@ qemu: uncaught target signal 6 (Aborted) - core dumped
    • -

      QEMU x86_64 guest on x86_64 host was failing with stack smashing detected, but we found a workaround

      +

      QEMU x86_64 guest on x86_64 host was failing with stack smashing detected when using glibc, but we found a workaround

    • gem5 user only supported static executables in the past, as mentioned at: Section 10.7, “gem5 syscall emulation mode”

      @@ -17890,6 +17915,12 @@ root

      gem5 however has tended towards horrendously intensive code generation in order to support all its different hardware types.

      +
      +

      gem5 also has a complex Python interface which is also largely auto-generated, which greatly increases the maintenance complexity of the project: Embedding Python in another application.

      +
      +
      +

      This is done so that reconfiguring platforms can be done quickly without recompiling, and it is amazing when it works, but the maintenance costs are also very high.

      +
    @@ -18005,7 +18036,7 @@ cat out/gem5-bench-dhrystone.txt
    -

    but the problem is that this method does not allow to easily run a different script without running the boot again. The ./gem5.sh script works around that by using m5 readfile as explained further at: Section 19.5.2, “gem5 checkpoint restore and run a different script”.

    +

    but the problem is that this method does not make it easy to run a different script without running the boot again. The ./gem5.sh script works around that by using m5 readfile as explained further at: Section 19.5.3, “gem5 checkpoint restore and run a different script”.

    Now you can play a fun little game with your friends:

    @@ -18147,10 +18178,13 @@ ps Haux | grep qemu | wc
    19.2.2.1.2. gem5 syscall emulation multithreading
    -

    gem5 user mode multithreading has been particularly flaky compared to QEMU’s.

    +

    gem5 user mode multithreading has been particularly flaky compared to QEMU’s, but work is being put into improving it.

    -

    You have the limitation that you must have at least one core per guest thread, otherwise pthread_create fails. For example:

    +

    In gem5 syscall emulation, the fork syscall checks if there is a free CPU, and if there is one, the new thread runs on that CPU. Otherwise, the fork call fails, and so do higher-level interfaces built on top of it such as pthread_create, which then return a failure status in the guest.

    +
    +
    +

    For example, if we use just one CPU for userland/posix/pthread_self.c which spawns one thread besides main:

    @@ -18158,7 +18192,7 @@ ps Haux | grep qemu | wc
    -

    fails because that process has a total of 2 threads: one for main and one extra thread spawned: userland/posix/pthread_self.c The error message is:

    +

    fails with this error message coming from the guest stderr:

    @@ -18174,10 +18208,18 @@ ps Haux | grep qemu | wc
    -

    This has to do with the fact that gem5 has a more simplistic thread implementation that does not spawn one host thread per guest thread CPU. Maybe this is required to achieve reproducible runs? What is the task switch algorithm then?

    +

    Once a thread exits, its CPU is freed and becomes available for new fork calls. For example, the following run spawns a thread, joins it, and then spawns another one, and 2 CPUs are enough:

    +
    +
    +
    +
    ./run --cpus 2 --emulator gem5 --userland userland/posix/pthread_self.c --userland-args '1 2'
    +
    -

    gem5 threading does however show the expected number of cores, e.g.:

    +

    because at each point in time, only up to two threads are running.
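    To make this CPU accounting concrete, here is a toy Python model of it. This is not gem5 code: CpuPool and its methods are hypothetical names, and the model only encodes what is stated above, namely that each live guest thread occupies one CPU, that spawning fails when no CPU is free, and that a thread exit frees its CPU.

```python
class CpuPool:
    """Hypothetical model of gem5 syscall emulation thread placement:
    one guest thread per CPU, and no time sharing between threads."""

    def __init__(self, ncpus):
        # The main thread starts out occupying CPU 0.
        self.busy = [False] * ncpus
        self.busy[0] = True

    def spawn(self):
        """Return the CPU index given to a new thread, or None on failure,
        mirroring pthread_create returning an error in the guest."""
        for cpu, busy in enumerate(self.busy):
            if not busy:
                self.busy[cpu] = True
                return cpu
        return None

    def exit_thread(self, cpu):
        """A thread exited (e.g. it was joined): its CPU is free again."""
        self.busy[cpu] = False


# One CPU: main already occupies it, so spawning a second thread fails.
one = CpuPool(1)
print(one.spawn())  # prints None

# Two CPUs: spawn, join, then spawn again succeeds both times on CPU 1.
two = CpuPool(2)
first = two.spawn()
two.exit_thread(first)
second = two.spawn()
print(first, second)  # prints 1 1
```

    This mirrors the two runs above: with a single CPU the extra thread cannot be placed, while with 2 CPUs a spawn, join, spawn sequence keeps reusing the same free CPU.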

    +
    +
    +

    gem5 syscall emulation does show the expected number of cores when queried, e.g.:

    @@ -18188,22 +18230,6 @@ ps Haux | grep qemu | wc

    outputs 1 and 2 respectively.

    -
    -

    TODO: aarch64 seems to failing to spawn more than 2 threads at 369a47fc6e5c2f4a7f911c1c058b6088f8824463 + 1:

    -
    -
    -
    -
    ./run --arch aarch64 --cpus 3 --emulator gem5 --userland userland/posix/pthread_self.c --userland-args 2
    -
    -
    -
    -

    fails with:

    -
    -
    -
    -
    Exiting @ tick 18446744073709551615 because simulate() limit reached
    -
    -
    19.2.2.1.3. gem5 se.py user mode with 2 or more pthreads fails with because simulate() limit reached
    @@ -18690,9 +18716,12 @@ m5 dumpstats +
    +

    To see it in action try:

    +
    -
    ./run --arch arm --emulator gem5
    +
    ./run --arch aarch64 --emulator gem5
    @@ -18768,7 +18797,61 @@ m5 checkpoint

    since boot has already happened, and the parameters are already in the RAM of the snapshot.

    -

    19.5.1. gem5 checkpoint internals

    +

    19.5.1. gem5 checkpoint userland minimal example

    +
    +

    In order to debug checkpoint restore bugs, this minimal setup using userland/freestanding/gem5_checkpoint_restore.S can be handy:

    +
    +
    +
    +
    ./build-userland --arch aarch64 --static
    +./run --arch aarch64 --emulator gem5 --static --userland userland/freestanding/gem5_checkpoint_restore.S --trace-insts-stdout
    +./run --arch aarch64 --emulator gem5 --static --userland userland/freestanding/gem5_checkpoint_restore.S --trace-insts-stdout --gem5-restore 1
    +./run --arch aarch64 --emulator gem5 --static --userland userland/freestanding/gem5_checkpoint_restore.S --trace-insts-stdout --gem5-restore 1 -- --cpu-type=DerivO3CPU --restore-with-cpu=DerivO3CPU --caches
    +
    +
    +
    +

    On the initial run, we see that all instructions are executed and the checkpoint is taken:

    +
    +
    +
    +
          0: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    +    500: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   movz   x1, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    +   1000: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   m5checkpoint             : IntAlu :   flags=(IsInteger|IsNonSpeculative|IsUnverifiable)
    +Writing checkpoint
    +warn: Checkpoints for file descriptors currently do not work.
    +info: Entering event queue @ 1000.  Starting simulation...
    +   1500: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    +   2000: system.cpu: A0 T0 : @asm_main_after_prologue+16    :   m5exit                   : No_OpClass :   flags=(IsInteger|IsNonSpeculative)
    +Exiting @ tick 2000 because m5_exit instruction encountered
    +
    +
    +
    +

    Then, on the first restore run, the checkpoint is restored, and only instructions after the checkpoint are executed:

    +
    +
    +
    +
    info: Entering event queue @ 1000.  Starting simulation...
    +   1500: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    +   2000: system.cpu: A0 T0 : @asm_main_after_prologue+16    :   m5exit                   : No_OpClass :   flags=(IsInteger|IsNonSpeculative)
    +Exiting @ tick 2000 because m5_exit instruction encountered
    +
    +
    +
    +

    and a similar thing happens for the restore with a different CPU type:

    +
    +
    +
    +
    info: Entering event queue @ 1000.  Starting simulation...
    +  79000: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  FetchSeq=1  CPSeq=1  flags=(IsInteger)
    +Exiting @ tick 84500 because m5_exit instruction encountered
    +
    +
    +
    +

    Here we don’t see the final m5exit instruction in the log, but that is presumably just a quirk of the O3 logging.
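    The restore runs above can also be checked mechanically by comparing instruction lists. The following Python sketch assumes the --trace-insts-stdout line format shown above (tick, CPU object, A0 T0, @symbol+offset, then the disassembly), plus an optional ExecEnable: prefix when FmtFlag is also traced. The regex and the function names are our own and are not part of gem5 or of this repo.

```python
import re

# Instruction trace lines look like:
#   1500: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0        : IntAlu : ...
# Assumption: with FmtFlag enabled, an "ExecEnable: " prefix follows the tick.
INST_RE = re.compile(
    r"^\s*(?P<tick>\d+): (?:ExecEnable: )?(?P<cpu>[\w.]+): A\d+ T\d+ : "
    r"@(?P<pc>\S+)\s+:\s+(?P<inst>.+?)\s+: "
)


def instructions(trace):
    """Return the (pc, disassembly) pairs of all executed instructions."""
    result = []
    for line in trace.splitlines():
        match = INST_RE.match(line)
        if match:
            result.append((match["pc"], match["inst"]))
    return result


def after_checkpoint(trace):
    """Return the instructions executed after the first m5checkpoint."""
    insts = instructions(trace)
    names = [inst for _, inst in insts]
    # list.index raises ValueError if no checkpoint instruction was traced.
    return insts[names.index("m5checkpoint") + 1:]


def restore_matches(initial_trace, restored_trace):
    """True if the restored run executed exactly the post-checkpoint suffix."""
    return after_checkpoint(initial_trace) == instructions(restored_trace)
```

    On the first restore log above this comparison succeeds, while for the DerivO3CPU restore it reports a mismatch, since the m5exit line does not show up in that log.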

    +
    +
    +
    +

    19.5.2. gem5 checkpoint internals

    Checkpoints are stored inside the m5out directory at:

    @@ -18794,7 +18877,7 @@ m5 checkpoint
    -

    19.5.2. gem5 checkpoint restore and run a different script

    +

    19.5.3. gem5 checkpoint restore and run a different script

    You want to automate running several tests from a single pristine post-boot state.

    @@ -18942,7 +19025,7 @@ expect eof
    -

    19.5.3. gem5 restore checkpoint with a different CPU

    +

    19.5.4. gem5 restore checkpoint with a different CPU

    gem5 can switch to a different CPU model when restoring a checkpoint.

    @@ -18950,27 +19033,232 @@ expect eof

    A common combo is to boot Linux with a fast CPU, make a checkpoint and then replay the benchmark of interest with a slower CPU.

    -

    An illustrative interactive run:

    +

    This can be observed interactively in full system with:

    -
    ./run --arch arm --emulator gem5
    +
    ./run --arch aarch64 --emulator gem5
    -

    In guest:

    +

    Then in the guest terminal after boot ends:

    -
    m5 checkpoint
    +
    sh -c 'm5 checkpoint;sh'
    +m5 exit
    -

    And then restore the checkpoint with a different CPU:

    +

    And then restore the checkpoint with a different slower CPU:

    -
    ./run --arch arm --emulator gem5 --gem5-restore 1 -- --caches --restore-with-cpu=HPI
    +
    ./run --arch aarch64 --emulator gem5 --gem5-restore 1 -- --caches --cpu-type=DerivO3CPU
    +
    +
    +
    +

    And now you will notice that everything happens much slower in the guest terminal!

    +
    +
    +

    One even more direct and minimal way to observe this is with userland/freestanding/gem5_checkpoint_restore.S, which was mentioned at gem5 checkpoint userland minimal example, plus some logging:

    +
    +
    +
    +
    ./run \
    +  --arch aarch64 \
    +  --emulator gem5 \
    +  --static \
    +  --trace ExecAll,FmtFlag,O3CPU,SimpleCPU \
    +  --userland userland/freestanding/gem5_checkpoint_restore.S \
    +;
    +cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"
    +./run \
    +  --arch aarch64 \
    +  --emulator gem5 \
    +  --gem5-restore 1 \
    +  --static \
    +  --trace ExecAll,FmtFlag,O3CPU,SimpleCPU \
    +  --userland userland/freestanding/gem5_checkpoint_restore.S \
    +  -- \
    +  --caches \
    +  --cpu-type DerivO3CPU \
    +  --restore-with-cpu DerivO3CPU \
    +;
    +cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"
    +
    +
    +
    +

    At gem5 2235168b72537535d74c645a70a85479801e0651, the first run does everything in AtomicSimpleCPU:

    +
    +
    +
    +
    ...
    +      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1f92 WriteReq
    +      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1e40 WriteReq
    +      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1e30 WriteReq
    +      0: SimpleCPU: system.cpu: Tick
    +      0: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    +    500: SimpleCPU: system.cpu: Tick
    +    500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   movz   x1, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    +   1000: SimpleCPU: system.cpu: Tick
    +   1000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   m5checkpoint             : IntAlu :   flags=(IsInteger|IsNonSpeculative|IsUnverifiable)
    +   1000: SimpleCPU: system.cpu: Resume
    +   1500: SimpleCPU: system.cpu: Tick
    +   1500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    +   2000: SimpleCPU: system.cpu: Tick
    +   2000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+16    :   m5exit                   : No_OpClass :   flags=(IsInteger|IsNonSpeculative)
    +
    +
    +
    +

    and after restore we see as expected a single ExecEnable instruction executed amidst O3CPU noise:

    +
    +
    +
    +
    FullO3CPU: Ticking main, FullO3CPU.
    +  79000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  FetchSeq=1  CPSeq=1  flags=(IsInteger)
    +  82500: O3CPU: system.cpu: Removing committed instruction [tid:0] PC (0x400084=>0x400088).(0=>1) [sn:1]
    +  82500: O3CPU: system.cpu: Removing instruction, [tid:0] [sn:1] PC (0x400084=>0x400088).(0=>1)
    +  82500: O3CPU: system.cpu: Scheduling next tick!
    +  83000: O3CPU: system.cpu:
    +
    +
    +
    +

    which is the movz after the checkpoint. The final m5exit does not appear due to DerivO3CPU logging insanity.

    +
    +
    +

    Bibliography:

    +
    + +
    +
    19.5.4.1. gem5 fast forward
    +
    +

    Besides switching CPUs after a checkpoint restore, fs.py also has the --fast-forward option to automatically run the script from the start on a less detailed CPU, and switch to a more detailed CPU at a given tick.

    +
    +
    +

    This is generally useless compared to checkpoint restoring because:

    +
    +
    +
      +
    • +

      checkpoint restore allows running multiple different contents after the restore, as well as restoring to multiple different system states, which you almost always want to do

      +
    • +
    • +

      we generally don’t know the exact tick at which the region of interest will start, especially as the binaries change. It is much easier to just instrument the content with a checkpoint m5op

      +
    • +
    +
    +
    +

    But let’s give it a try anyway with userland/freestanding/gem5_checkpoint_restore.S, which was mentioned at gem5 checkpoint userland minimal example:

    +
    +
    +
    +
    ./run \
    +  --arch aarch64 \
    +  --emulator gem5 \
    +  --static \
    +  --trace ExecAll,FmtFlag,O3CPU,SimpleCPU \
    +  --userland userland/freestanding/gem5_checkpoint_restore.S \
    +  -- \
    +  --caches \
    +  --cpu-type DerivO3CPU \
    +  --fast-forward 1000 \
    +;
    +cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"
    +
    +
    +
    +

    At gem5 2235168b72537535d74c645a70a85479801e0651 we see something like:

    +
    +
    +
    +
          0: O3CPU: system.switch_cpus: Creating O3CPU object.
    +      0: O3CPU: system.switch_cpus: Workload[0] process is 0      0: SimpleCPU: system.cpu: ActivateContext 0
    +      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0 WriteReq
    +      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x40 WriteReq
    +...
    +
    +      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1f92 WriteReq
    +      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1e40 WriteReq
    +      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1e30 WriteReq
    +      0: SimpleCPU: system.cpu: Tick
    +      0: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    +    500: SimpleCPU: system.cpu: Tick
    +    500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   movz   x1, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    +   1000: SimpleCPU: system.cpu: Tick
    +   1000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   m5checkpoint             : IntAlu :   flags=(IsInteger|IsNonSpeculative|IsUnverifiable)
    +   1000: O3CPU: system.switch_cpus: [tid:0] Calling activate thread.
    +   1000: O3CPU: system.switch_cpus: [tid:0] Adding to active threads list
    +   1500: O3CPU: system.switch_cpus:
    +
    +FullO3CPU: Ticking main, FullO3CPU.
    +   1500: O3CPU: system.switch_cpus: Scheduling next tick!
    +   2000: O3CPU: system.switch_cpus:
    +
    +FullO3CPU: Ticking main, FullO3CPU.
    +   2000: O3CPU: system.switch_cpus: Scheduling next tick!
    +   2500: O3CPU: system.switch_cpus:
    +
    +...
    +
    +FullO3CPU: Ticking main, FullO3CPU.
    +  44500: ExecEnable: system.switch_cpus: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0        : IntAlu :  D=0x00000000000
    +  48000: O3CPU: system.switch_cpus: Removing committed instruction [tid:0] PC (0x400084=>0x400088).(0=>1) [sn:1]
    +  48000: O3CPU: system.switch_cpus: Removing instruction, [tid:0] [sn:1] PC (0x400084=>0x400088).(0=>1)
    +  48000: O3CPU: system.switch_cpus: Scheduling next tick!
    +  48500: O3CPU: system.switch_cpus:
    +
    +...
    +
    +
    +
    +

    We can also compare that to the same log but without --fast-forward and other CPU switch options:

    +
    +
    +
    +
          0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1e40 WriteReq
    +      0: SimpleCPU: system.cpu.dcache_port: received snoop pkt for addr:0x1e30 WriteReq
    +      0: SimpleCPU: system.cpu: Tick
    +      0: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    +    500: SimpleCPU: system.cpu: Tick
    +    500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4    :   movz   x1, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    +   1000: SimpleCPU: system.cpu: Tick
    +   1000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8    :   m5checkpoint             : IntAlu :   flags=(IsInteger|IsNonSpeculative|IsUnverifiable)
    +   1000: SimpleCPU: system.cpu: Resume
    +   1500: SimpleCPU: system.cpu: Tick
    +   1500: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+12    :   movz   x0, #0, #0        : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    +   2000: SimpleCPU: system.cpu: Tick
    +   2000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+16    :   m5exit                   : No_OpClass :   flags=(IsInteger|IsNonSpeculative)
    +
    +
    +
    +

    Therefore, it is clear that what we wanted happened:

    +
    +
    +
      +
    • +

      up until tick 1000, SimpleCPU was ticking

      +
    • +
    • +

      after tick 1000, O3CPU started ticking

      +
    • +
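    The same conclusion can be reached programmatically by recording, for each CPU object, the first and last tick at which it executed an instruction. A minimal Python sketch, assuming the ExecEnable line format from the logs above; cpu_spans is a hypothetical helper name:

```python
import re

# Matches the start of instruction lines such as:
#   44500: ExecEnable: system.switch_cpus: A0 T0 : @asm_main_after_prologue+12 ...
# Non-instruction lines like "1000: O3CPU: system.switch_cpus: ..." do not match.
INST_RE = re.compile(r"^\s*(\d+): (?:ExecEnable: )?([\w.]+): A\d+ T\d+ : @")


def cpu_spans(trace):
    """Map each CPU object name to the (first, last) tick it executed at."""
    spans = {}
    for line in trace.splitlines():
        match = INST_RE.match(line)
        if match:
            tick, cpu = int(match.group(1)), match.group(2)
            first, _ = spans.get(cpu, (tick, tick))
            spans[cpu] = (first, tick)
    return spans
```

    On the --fast-forward log above, this should give system.cpu spanning ticks 0 to 1000 and system.switch_cpus starting at tick 44500, which matches the two points just listed.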
    +
    +
    +

    Bibliography:

    +
    +
    @@ -19521,7 +19809,7 @@ git -C "$(./getvar linux_source_dir)" checkout -
    -

    because glibc was built to expect a newer Linux kernel as shown at: Section 10.4.1, “FATAL: kernel too old”. Your choices to sole this are:

    +

    because glibc was built to expect a newer Linux kernel as shown at: Section 10.4.1, “FATAL: kernel too old failure in userland simulation”. Your choices to solve this are:

    -

    It is obviously not possible to understand what they actually do from their commit message, so let’s explain them one by one here as we understand them:

    +

    It is obviously not possible to understand what the Linux kernel fork commits actually do from their commit message, so let’s explain them one by one here as we understand them:

    -

    19.13. gem5 unit tests

    +

    19.13. gem5 in-tree tests

    +

    All those tests could in theory be added to this repo instead of to gem5, and this would actually be the superior setup, as it is cross-emulator.

    +
    +
    +

    But can the people from the project be convinced of that?

    +
    +
    +

    19.13.1. gem5 unit tests

    +

    These are just very small GTest tests that test a single class in isolation; they don’t run any executables.

    @@ -19890,8 +20186,11 @@ clock=500

    Note that each command and its corresponding results don’t necessarily show up consecutively on stdout, because tests are run in parallel. You have to match them by class name instead, e.g. the class CircleBufTest corresponds to the file circlebuf.test.cpp.

    -
    -

    19.14. gem5 regression tests

    +
    +

    19.13.2. gem5 regression tests

    +
    +

    This section is about running the gem5 in-tree tests.

    +
    @@ -19905,7 +20204,7 @@ clock=500
    -

    After the first run has downloaded the test binaries for you, you can speed up the process a little bit by skipping an useless scons call:

    +

    After the first run has downloaded the test binaries for you, you can speed up the process a little bit by skipping a useless SCons call:

    @@ -19913,11 +20212,28 @@ clock=500
    -

    Note however that --skip-build is required at least once per branch to download the test binaries, because the test interface is bad.

    +

    Note however that running without --skip-build is required at least once to download the test binaries, because the test interface is bad.

    +
    +
    +

    List the available tests instead of running them:

    +
    +
    +
    +
    ./gem5-regression --gem5-worktree master --arch aarch64 --cmd list
    +
    +
    +
    +

    You can then pick one suite (it has to be a suite, not an "individual test") from the list and run only that suite, e.g. with:

    +
    +
    +
    +
    ./gem5-regression --arch aarch64 -- --uid SuiteUID:tests/gem5/cpu_tests/test.py:cpu_test_AtomicSimpleCPU_Bubblesort-ARM-opt
    +
    +
    -

    19.15. gem5 simulate() limit reached

    +

    19.14. gem5 simulate() limit reached

    This error happens when the following instruction limits are reached:

    @@ -20053,18 +20369,58 @@ Exiting @ tick 18446744073709551615 because simulate() limit reached
    -

    19.16. gem5 build options

    +

    19.15. gem5 build options

    In order to use different build options, you might also want to use gem5 build variants to keep the build outputs separate from one another.

    +
    +

    19.15.3. gem5 prof and perf builds

    +
    +

    Profiling builds as of 3cea7d9ce49bda49c50e756339ff1287fd55df77 both use -g -O3 and disable asserts and logging like the gem5 fast build, and additionally:

    +
    +
    +
      +
    • +

      prof uses -pg for gprof

      +
    • +
    • +

      perf uses -lprofile for google-pprof

      +
    • +
    +
    + +
    +
    +

    19.15.4. gem5 clang build

    TODO test properly, benchmark vs GCC.

    @@ -20077,7 +20433,7 @@ Exiting @ tick 18446744073709551615 because simulate() limit reached
    -

    19.16.3. gem5 sanitation build

    +

    19.15.5. gem5 sanitation build

    If gem5 appears to have a C++ undefined behaviour bug, which is often very difficult to track down, you can try to build it with the following extra SCons options:

    @@ -20118,20 +20474,29 @@ Direct leak of 2928 byte(s) in 43 object(s) allocated from: Direct leak of 2002 byte(s) in 3 object(s) allocated from: #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448) #1 0x7ff0394fd813 in PyString_FromStringAndSize ../Objects/stringobject.c:88 - #2 0x7ff0394fd813 in PyString_FromStringAndSize ../Objects/stringobject.c:57 Direct leak of 40 byte(s) in 2 object(s) allocated from: #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448) + #2 0x7ff0394fd813 in PyString_FromStringAndSize ../Objects/stringobject.c: + Direct leak of 40 byte(s) in 2 object(s) allocated from + #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448) #1 0x7ff03951ea4b in PyList_New ../Objects/listobject.c:152 -Indirect leak of 10384 byte(s) in 11 object(s) allocated from: #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448) #1 0x7ff03945e40d in _PyObject_GC_Malloc ../Modules/gcmodule.c:1499 #2 0x7ff03945e40d in _PyObject_GC_Malloc ../Modules/gcmodule.c:1493 +Indirect leak of 10384 byte(s) in 11 object(s) allocated from + #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448 + #1 0x7ff03945e40d in _PyObject_GC_Malloc ../Modules/gcmodule.c: + #2 0x7ff03945e40d in _PyObject_GC_Malloc ../Modules/gcmodule.c:1493 Indirect leak of 4089 byte(s) in 6 object(s) allocated from: #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448) #1 0x7ff0394fd648 in PyString_FromString ../Objects/stringobject.c:143 Indirect leak of 2090 byte(s) in 3 object(s) allocated from: - #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448) #1 0x7ff0394eb36f in type_new ../Objects/typeobject.c:2421 #2 0x7ff0394eb36f in type_new ../Objects/typeobject.c:2094 + #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448 + #1 0x7ff0394eb36f in type_new ../Objects/typeobject.c: + #2 0x7ff0394eb36f in type_new 
../Objects/typeobject.c:2094 Indirect leak of 1346 byte(s) in 2 object(s) allocated from: #0 0x7ff039804448 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x10c448) - #1 0x7ff0394fd813 in PyString_FromStringAndSize ../Objects/stringobject.c:88 #2 0x7ff0394fd813 in PyString_FromStringAndSize ../Objects/stringobject.c:57 SUMMARY: AddressSanitizer: 418319 byte(s) leaked in 203 allocation(s). + #1 0x7ff0394fd813 in PyString_FromStringAndSize ../Objects/stringobject.c: + #2 0x7ff0394fd813 in PyString_FromStringAndSize ../Objects/stringobject.c: + SUMMARY: AddressSanitizer: 418319 byte(s) leaked in 203 allocation(s).
    @@ -20142,15 +20507,55 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    19.16.4. gem5 Ruby build

    +

    19.15.6. gem5 Ruby build

    -

    Ruby is a system that includes the SLICC domain specific language to describe memory systems: http://gem5.org/Ruby

    +

    gem5 has two types of memory system:

    +
    +
    +
      +
    • +

      the classic memory system, which is used by default

      +
    • +
    • +

      the Ruby memory system

      +
    • +
    +
    +
    +

    The Ruby memory system includes the SLICC domain specific language to describe memory systems: http://gem5.org/Ruby

    It seems to have usage outside of gem5, but the naming overload with the Ruby programming language, which also has domain specific languages as a concept, makes it impossible to google anything about it!

    -

    Ruby is activated at compile time with the PROTOCOL flag, which specifies the desired memory system time.

    +

    Since it is not the default, Ruby is generally less stable than the classic memory model. However, because it allows describing a wide variety of important coherency protocols, while the classic system only describes a single protocol, Ruby is a very important feature of gem5.

    +
    +
    +

    Ruby support must be enabled at compile time with the SCons PROTOCOL= flag, which compiles support for the desired memory system type.

    +
    +
    +

    Note however that most ISAs already implicitly set PROTOCOL via the build_opts/ directory, e.g. build_opts/ARM contains:

    +
    +
    +
    +
    PROTOCOL = 'MOESI_CMP_directory'
    +
    +
    +
    +

    and therefore ARM already compiles MOESI_CMP_directory by default.
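    Because build_opts files are written as plain Python-style assignments, one way to check which protocol an ISA selects by default is to parse the file with Python's ast module. This is a sketch under the assumption that the relevant statements are simple NAME = literal assignments; read_build_opts is a hypothetical helper, not something gem5 provides.

```python
import ast


def read_build_opts(source):
    """Return a dict of the NAME = literal assignments in a build_opts file.

    Statements that are not single-name assignments are skipped, and a
    non-literal value makes ast.literal_eval raise ValueError.
    """
    opts = {}
    for node in ast.parse(source).body:
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            opts[node.targets[0].id] = ast.literal_eval(node.value)
    return opts
```

    For example, from the gem5 source root, read_build_opts(open("build_opts/ARM").read())["PROTOCOL"] should give 'MOESI_CMP_directory' for the file contents quoted above.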

    +
    +
    +

    Then, with fs.py and se.py, you can choose to use either the classic or the built-in Ruby system at runtime with the --ruby option:

    +
    +
    +
      +
    • +

      if --ruby is given, use the Ruby memory system

      +
    • +
    • +

      otherwise, use the classic memory system

      +
    • +

    For example, to use a two level MESI cache coherence protocol, we can do:

    @@ -20173,10 +20578,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:

    which shows that dozens of C++ files are being generated from Ruby SLICC.

    -

    TODO observe it doing something during a run.

    -
    -
    -

    The relevant source files live in the source tree under:

    +

    The relevant Ruby source files live in the source tree under:

    @@ -20184,7 +20586,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    We already pass the SLICC_HTML flag by default to the build, which generates an HTML summary of each memory protocol under:

    +

    We already pass the SLICC_HTML flag by default to the build, which generates an HTML summary of each memory protocol under (TODO broken: https://gem5.atlassian.net/browse/GEM5-357):

    @@ -20194,9 +20596,49 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:

    A minimized Ruby config which was not merged upstream can be found for study at: https://gem5-review.googlesource.com/c/public/gem5/+/13599/1

    +
    +

    One easy way to see that Ruby is being used without understanding it in detail is to enable some logging:

    +
    +
    +
    +
    ./run \
    +  --arch aarch64 \
    +  --emulator gem5 \
    +  --gem5-worktree master \
    +  --userland userland/arch/aarch64/freestanding/linux/hello.S \
    +  --static \
    +  --trace ExecAll,FmtFlag,Ruby,XBar \
    +  -- \
    +  --ruby \
    +;
    +cat "$(./getvar --arch aarch64 --emulator gem5 trace_txt_file)"
    +
    +
    +
    +

    Then:

    +
    +
    +
      +
    • +

      when the --ruby flag is given, we see a gazillion Ruby related messages prefixed e.g. by RubyPort:.

      +
      +

      We also observe from ExecEnable lines that instruction timing is not simple anymore, so the memory system must have latencies.

      +
      +
    • +
    • +

      without --ruby, we instead see XBar (Coherent Crossbar) related messages such as CoherentXBar:, which I believe is the more precise name for the memory model that the classic memory system uses

      +
    • +
    +
    +
    +

    Certain features may not work in Ruby. For example, gem5 checkpoint creation is only possible in Ruby protocols that support flush, which is the case for PROTOCOL=MOESI_hammer but not PROTOCOL=MESI_Three_Level: https://www.mail-archive.com/gem5-users@gem5.org/msg17418.html

    +
    +
    +

    Tested in gem5 d7d9bc240615625141cd6feddbadd392457e49eb.

    +
    -

    19.16.5. gem5 Python 3 build

    +

    19.15.7. gem5 Python 3 build

    Python 3 support was mostly added in 2019 Q3 at around a347a1a68b8a6e370334be3a1d2d66675891e0f1, but remained buggy for some time afterwards.

    @@ -20214,7 +20656,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    19.17. gem5 CPU types

    +

    19.16. gem5 CPU types

    gem5 has a few in tree CPU models for different purposes.

    @@ -20244,9 +20686,9 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:

    Both of those can be checked with git log and git blame.

    -

    19.17.1. List gem5 CPU types

    +

    19.16.1. List gem5 CPU types

    -
    19.17.1.1. gem5 BaseSimpleCPU
    +
    19.16.1.1. gem5 BaseSimpleCPU

    Simple abstract CPU without a pipeline.

    @@ -20283,7 +20725,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -
    19.17.1.2. gem5 MinorCPU
    +
    19.16.1.2. gem5 MinorCPU

    Generic in-order core that does not model any specific CPU.

    @@ -20352,7 +20794,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -
    19.17.1.3. gem5 DeriveO3CPU
    +
    19.16.1.3. gem5 DeriveO3CPU

    Generic out-of-order core. "O3" stands for "Out Of Order"!

    @@ -20379,7 +20821,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    19.17.2. gem5 ARM RSK

    +

    19.16.2. gem5 ARM RSK

    @@ -20389,7 +20831,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    19.18. gem5 ARM platforms

    +

    19.17. gem5 ARM platforms

    The gem5 platform is selectable with the --machine option, which is named after the analogous QEMU -machine option, and which sets the --machine-type.

    @@ -20417,7 +20859,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
    -

    19.19. gem5 upstream images

    +

    19.18. gem5 upstream images

    Present at:

    @@ -20471,7 +20913,7 @@ cd ..
    -

    19.20. gem5 internals

    +

    19.19. gem5 internals

    Internals under other sections:

    @@ -20489,7 +20931,7 @@ cd ..
    -

    19.20.1. gem5 Eclipse configuration

    +

    19.19.1. gem5 Eclipse configuration

    In order to develop complex C++ software such as gem5, a good IDE setup is fundamental.

    @@ -20527,9 +20969,9 @@ cd ..
    -

    19.20.2. gem5 Python C++ interaction

    +

    19.19.2. gem5 Python C++ interaction

    -

    The interaction uses the Python C extension interface https://docs.python.org/2/extending/extending.html interface through the pybind11 helper library: https://github.com/pybind/pybind11

    +

    The interaction uses the Python C extension interface https://docs.python.org/2/extending/extending.html through the pybind11 helper library: https://github.com/pybind/pybind11

    The C++ executable both:

    @@ -20558,7 +21000,7 @@ cd ..
    -

    then gem5 magic simobject class adds some crazy stuff on top of it further…​ is is a mess. in particular, it auto generates params/ headers. TODO: why is this mess needed at all? pybind11 seems to handle constructor arguments just fine:

    +

    then the gem5 magic SimObject class adds some crazy stuff further on top of it, and it is a mess. In particular, it auto-generates params/ headers. TODO: why is this mess needed at all? pybind11 seems to handle constructor arguments just fine:

      @@ -20593,7 +21035,7 @@ cd ..
    -

    Since BadDevice has no __init__ method, and neither BasicPioDevice, it all just falls through until the SimObject.init constructor.

    +

    Since BadDevice has no __init__ method, and neither does BasicPioDevice, it all just falls through to the SimObject.__init__ constructor.

    This constructor will loop through the inheritance chain and give the Python parameters to the C++ BadDeviceParams class as follows.

    @@ -20689,11 +21131,17 @@ static EmbeddedPyBind embed_obj("BadDevice", module_init, "BasicPioDevice");
    +

    It has been found that this usage of pybind11 across hundreds of SimObject files accounted for 50% of the gem5 build time at one point: https://gem5.atlassian.net/browse/GEM5-366
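    To make the Params flow described above more concrete, here is a heavily simplified C++ sketch of the pattern, assuming (as the text describes) one params struct per SimObject class that mirrors the class inheritance chain, plus a create() factory on the most derived struct. The fields pio_addr and devicename and all accessor names are hypothetical, and the sketch uses const references and std::unique_ptr for clarity, which may differ from what gem5 actually generates:

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical sketch of the gem5 Params pattern; not generated gem5 code.
struct SimObjectParams {
    virtual ~SimObjectParams() = default;
    std::string name;
};

class SimObject {
  public:
    explicit SimObject(const SimObjectParams &p) : name_(p.name) {
        assert(!name_.empty() && "every SimObject needs a name");
    }
    virtual ~SimObject() = default;
    const std::string &name() const { return name_; }

  private:
    std::string name_;
};

// Each params struct derives from its parent class's params struct,
// mirroring the SimObject inheritance chain.
struct BasicPioDeviceParams : SimObjectParams {
    unsigned long pio_addr = 0;  // hypothetical field
};

class BasicPioDevice : public SimObject {
  public:
    explicit BasicPioDevice(const BasicPioDeviceParams &p)
        : SimObject(p), pioAddr_(p.pio_addr) {}
    unsigned long pioAddr() const { return pioAddr_; }

  private:
    unsigned long pioAddr_;
};

class BadDevice;

struct BadDeviceParams : BasicPioDeviceParams {
    std::string devicename;  // hypothetical field
    // Factory, called once the Python side has filled in all fields.
    std::unique_ptr<BadDevice> create() const;
};

class BadDevice : public BasicPioDevice {
  public:
    explicit BadDevice(const BadDeviceParams &p)
        : BasicPioDevice(p), devicename_(p.devicename) {}
    const std::string &devicename() const { return devicename_; }

  private:
    std::string devicename_;
};

std::unique_ptr<BadDevice> BadDeviceParams::create() const {
    return std::make_unique<BadDevice>(*this);
}
```

    Usage: fill in a BadDeviceParams, call create(), and each constructor level in the chain consumes the fields of its own params level.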

    +
    +
    +

    To get a feel for how SimObject objects are run, see: gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis.

    +
    +

    Tested on gem5 08c79a194d1a3430801c04f37d13216cc9ec1da3.

    -

    19.20.3. gem5 entry point

    +

    19.19.3. gem5 entry point

    The main is at: src/sim/main.cc. It calls:

    @@ -20775,14 +21223,14 @@ exec filecode in scope
    -

    and that is where doSimLoop the main event loop, doSimLoop gets called and starts kicking off the gem5 event queue.

    +

    and that is where the main event loop, doSimLoop, gets called and starts kicking off the gem5 event queue.

    Tested at gem5 b4879ae5b0b6644e6836b0881e4da05c64a6550d.
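    As a rough mental model of what a doSimLoop-style main loop does, the following C++ sketch services events in tick order until one of them requests an exit, the queue drains, or the next event lies past a tick limit. SimLoop, requestExit and the return strings other than "simulate() limit reached" are hypothetical, and this is not gem5's implementation:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

using Tick = std::uint64_t;

// Hypothetical sketch of a doSimLoop-style main loop; not gem5 code.
struct SimLoop {
    // A multimap keeps events sorted by tick; equal ticks stay in insertion order.
    std::multimap<Tick, std::function<void()>> events;
    Tick curTick = 0;
    bool exitRequested = false;
    std::string exitCause;

    void schedule(Tick when, std::function<void()> callback) {
        assert(when >= curTick);  // no scheduling into the past
        events.emplace(when, std::move(callback));
    }

    // Plays the role of an exit event: stop the loop and remember why.
    void requestExit(const std::string &cause) {
        exitRequested = true;
        exitCause = cause;
    }

    // Service events in tick order until an exit is requested, the queue
    // drains, or the next event would run after maxTick.
    std::string run(Tick maxTick) {
        while (!exitRequested) {
            if (events.empty())
                return "event queue empty";
            auto it = events.begin();
            if (it->first > maxTick)
                return "simulate() limit reached";
            curTick = it->first;
            auto callback = std::move(it->second);
            events.erase(it);
            callback();  // may schedule more events or request an exit
        }
        return exitCause;
    }
};
```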

    -

    19.20.4. gem5 event queue

    +

    19.19.4. gem5 event queue

    gem5 is an event-based simulator, and as such the event queue is one of the crucial elements in the system.

    @@ -20842,7 +21290,7 @@ exec filecode in scope

    This calls the Event::process method of the event.
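    The following C++ sketch models that mechanism in miniature: an abstract Event with a virtual process(), an EventFunctionWrapper-like subclass holding a std::function callback, and a queue whose serviceOne() pops the earliest event, advances the current tick and calls process(). It is a simplification, not gem5's actual code: here equal-tick events run in insertion order, while gem5 additionally orders them by an event priority:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Tick = std::uint64_t;

// Simplified model of a discrete event queue; names mirror gem5 concepts
// (Event::process, EventFunctionWrapper) but this is not gem5 code.
class Event {
  public:
    virtual ~Event() = default;
    // Called by the queue when simulated time reaches the scheduled tick.
    virtual void process() = 0;
};

// Wraps an arbitrary callback so that any callable can be scheduled.
class EventFunctionWrapper : public Event {
  public:
    explicit EventFunctionWrapper(std::function<void()> callback)
        : callback_(std::move(callback)) {}
    void process() override { callback_(); }

  private:
    std::function<void()> callback_;
};

class EventQueue {
  public:
    Tick curTick() const { return curTick_; }
    bool empty() const { return queue_.empty(); }

    // Events may only be scheduled in the present or the future.
    void schedule(Event *event, Tick when) {
        assert(when >= curTick_);
        queue_.push(Entry{when, nextSeq_++, event});
    }

    // Pop the earliest event, advance time to it, and run its process().
    void serviceOne() {
        Entry e = queue_.top();
        queue_.pop();
        curTick_ = e.when;
        e.event->process();
    }

  private:
    struct Entry {
        Tick when;
        std::uint64_t seq;  // insertion order, breaks ties at equal ticks
        Event *event;
    };
    // Makes std::priority_queue behave as a min-heap on (when, seq).
    struct Later {
        bool operator()(const Entry &a, const Entry &b) const {
            if (a.when != b.when) return a.when > b.when;
            return a.seq > b.seq;
        }
    };
    std::priority_queue<Entry, std::vector<Entry>, Later> queue_;
    Tick curTick_ = 0;
    std::uint64_t nextSeq_ = 0;
};

// Demo: schedule three events out of order and record when each one runs.
std::vector<std::pair<Tick, std::string>> runDemo() {
    EventQueue eq;
    std::vector<std::pair<Tick, std::string>> log;
    EventFunctionWrapper a([&] { log.emplace_back(eq.curTick(), "a"); });
    EventFunctionWrapper b([&] { log.emplace_back(eq.curTick(), "b"); });
    EventFunctionWrapper c([&] { log.emplace_back(eq.curTick(), "c"); });
    eq.schedule(&b, 500);
    eq.schedule(&a, 100);
    eq.schedule(&c, 500);
    while (!eq.empty())
        eq.serviceOne();
    return log;
}
```

    runDemo() records that a runs at tick 100, then b and c both run at tick 500, in the order they were scheduled.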

    -
    19.20.4.1. gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis
    +
    19.19.4.1. gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis

    Let’s now analyze every single event of a minimal gem5 syscall emulation mode run on the simplest CPU that we have:

    @@ -20953,7 +21401,7 @@ ArmLinuxProcess64::initState() at process.cc:1,777 0x5555572d5e5e

    which calls EventManager::schedule.

    -

    AtomicSimpleCPU is an EventManager because SimObject inherits from it.

    +

    AtomicSimpleCPU is an EventManager because SimObject inherits from it.

    tickEvent is an EventFunctionWrapper, which contains a callback member of type std::function<void(void)>, and is initialized in the constructor as:

    @@ -20965,6 +21413,110 @@ ArmLinuxProcess64::initState() at process.cc:1,777 0x5555572d5e5e
    +

    The call stack above ArmLinuxProcess64::initState is pybind11 fuzziness, but if we grep a bit we find the Python call point:

    +
    +
    +

    src/python/m5/simulate.py

    +
    +
    +
    +
    def instantiate(ckpt_dir=None):
    +
    +    ...
    +
    +    # Create the C++ sim objects and connect ports
    +    for obj in root.descendants(): obj.createCCObject()
    +    for obj in root.descendants(): obj.connectPorts()
    +
    +    # Do a second pass to finish initializing the sim objects
    +    for obj in root.descendants(): obj.init()
    +
    +    ...
    +
    +    # Restore checkpoint (if any)
    +    if ckpt_dir:
    +        ...
    +    else:
    +        for obj in root.descendants(): obj.initState()
    +
    +
    +
    +

    As we can see, initState is just one stage of generic SimObject initialization. root.descendants() goes over the entire SimObject tree calling initState().
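    The point of having separate passes is that each one visits the whole tree before the next one starts, so for example init() on one object can rely on every other object already existing and having its ports connected. A minimal C++ sketch of that ordering, with a hypothetical Node class standing in for SimObject and the object creation pass (createCCObject) left out:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for SimObject that records which hooks run, in order.
class Node {
  public:
    Node(std::string name, std::vector<std::string> *log)
        : name_(std::move(name)), log_(log) {}
    virtual ~Node() = default;

    Node *addChild(std::unique_ptr<Node> child) {
        assert(child != nullptr);
        children_.push_back(std::move(child));
        return children_.back().get();
    }

    // Pre-order walk including this node, playing the role of root.descendants().
    void descendants(std::vector<Node *> &out) {
        out.push_back(this);
        for (auto &child : children_)
            child->descendants(out);
    }

    virtual void connectPorts() { record("connectPorts"); }
    virtual void init() { record("init"); }
    // Cold start only: called when not restoring from a checkpoint.
    virtual void initState() { record("initState"); }
    // Checkpoint restore path, used instead of initState().
    virtual void loadState() { record("loadState"); }

  private:
    void record(const std::string &hook) { log_->push_back(hook + " " + name_); }

    std::string name_;
    std::vector<std::string> *log_;
    std::vector<std::unique_ptr<Node>> children_;
};

// Each pass finishes on the whole tree before the next pass starts.
void instantiate(Node &root, bool restoringCheckpoint) {
    std::vector<Node *> all;
    root.descendants(all);
    for (Node *n : all) n->connectPorts();
    for (Node *n : all) n->init();
    for (Node *n : all) {
        if (restoringCheckpoint)
            n->loadState();
        else
            n->initState();
    }
}
```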

    +
    +
    +

    Next, we see that initState is part of the SimObject C++ API:

    +
    +
    +

    src/sim/sim_object.hh

    +
    +
    +
    +
    class SimObject : public EventManager, public Serializable, public Drainable,
    +                  public Stats::Group
    +{
    +
    +    ...
    +
    +    /**
    +     * initState() is called on each SimObject when *not* restoring
    +     * from a checkpoint.  This provides a hook for state
    +     * initializations that are only required for a "cold start".
    +     */
    +    virtual void initState();
    +
    +
    +
    +

    Finally, we see that initState is exposed to the Python API at:

    +
    +
    +

    build/ARM/python/_m5/param_SimObject.cc

    +
    +
    +
    +
    module_init(py::module &m_internal)
    +{
    +    py::module m = m_internal.def_submodule("param_SimObject");
    +    py::class_<SimObjectParams, std::unique_ptr<SimObjectParams, py::nodelete>>(m, "SimObjectParams")
    +        .def_readwrite("name", &SimObjectParams::name)
    +        .def_readwrite("eventq_index", &SimObjectParams::eventq_index)
    +        ;
    +
    +    py::class_<SimObject, Drainable, Serializable, Stats::Group, std::unique_ptr<SimObject, py::nodelete>>(m, "SimObject")
    +        .def("init", &SimObject::init)
    +        .def("initState", &SimObject::initState)
    +        .def("memInvalidate", &SimObject::memInvalidate)
    +        .def("memWriteback", &SimObject::memWriteback)
    +        .def("regProbePoints", &SimObject::regProbePoints)
    +        .def("regProbeListeners", &SimObject::regProbeListeners)
    +        .def("startup", &SimObject::startup)
    +        .def("loadState", &SimObject::loadState, py::arg("cp"))
    +        .def("getPort", &SimObject::getPort, pybind11::return_value_policy::reference, py::arg("if_name"), py::arg("idx"))
    +        ;
    +
    +}
    +
    +
    +
    +

    which is more magical than the other param classes, since the SimObject py::class_ binding has non-trivial methods. Those are auto-generated by the cxx_exports code generation mechanism:

    +
    +
    +
    +
    class SimObject(object):
    +
    +    ...
    +
    +    cxx_exports = [
    +        PyBindMethod("init"),
    +        PyBindMethod("initState"),
    +        PyBindMethod("memInvalidate"),
    +        PyBindMethod("memWriteback"),
    +        PyBindMethod("regProbePoints"),
    +        PyBindMethod("regProbeListeners"),
    +        PyBindMethod("startup"),
    +    ]
    +
    +
    +
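    Conceptually, the cxx_exports list is input to a code generator that writes binding source like the param_SimObject.cc excerpt shown earlier. The C++ function below is a hypothetical sketch of that idea only: it builds the binding text from a class name, its bases and a list of method names, and neither calls pybind11 nor reproduces gem5's actual generator (which, judging from the loadState and getPort lines above, also emits py::arg entries for methods that take arguments):

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical cxx_exports-style generator: emits the text of a pybind11
// class binding. It only builds a string; it does not use pybind11 itself.
std::string generateBinding(const std::string &cls,
                            const std::vector<std::string> &bases,
                            const std::vector<std::string> &methods) {
    assert(!cls.empty());
    std::ostringstream out;
    out << "py::class_<" << cls;
    for (const auto &base : bases)
        out << ", " << base;
    out << ", std::unique_ptr<" << cls << ", py::nodelete>>(m, \"" << cls << "\")\n";
    // One .def line per exported method name.
    for (const auto &method : methods)
        out << "    .def(\"" << method << "\", &" << cls << "::" << method << ")\n";
    out << "    ;\n";
    return out.str();
}
```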

    So that’s how the main atomic tick loop works, fully understood!
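    The core of that loop is a self-rescheduling event. Here is a minimal C++ sketch of the pattern, assuming the simplified behavior that each tick executes exactly one instruction and then schedules the next tick a fixed latency later until the program ends. ToyQueue and ToyAtomicCpu are hypothetical names, the latency value is arbitrary, and the real AtomicSimpleCPU::tick() does considerably more than this:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using Tick = std::uint64_t;

// Minimal event queue: events sorted by tick, run in order.
class ToyQueue {
  public:
    Tick curTick() const { return curTick_; }
    void schedule(Tick when, std::function<void()> f) {
        assert(when >= curTick_);
        events_.emplace(when, std::move(f));
    }
    void runAll() {
        while (!events_.empty()) {
            auto it = events_.begin();
            curTick_ = it->first;
            auto f = std::move(it->second);
            events_.erase(it);
            f();
        }
    }

  private:
    std::multimap<Tick, std::function<void()>> events_;
    Tick curTick_ = 0;
};

// Hypothetical atomic-style CPU: one tick event that reschedules itself.
class ToyAtomicCpu {
  public:
    ToyAtomicCpu(ToyQueue &eq, unsigned numInsts, Tick latency)
        : eq_(eq), remaining_(numInsts), latency_(latency) {
        assert(numInsts > 0);
    }

    void start() { eq_.schedule(eq_.curTick(), [this] { tick(); }); }

    // Tick at which each instruction was "executed".
    const std::vector<Tick> &trace() const { return trace_; }

  private:
    void tick() {
        trace_.push_back(eq_.curTick());  // execute one instruction
        if (--remaining_ > 0)
            eq_.schedule(eq_.curTick() + latency_, [this] { tick(); });
    }

    ToyQueue &eq_;
    unsigned remaining_;
    Tick latency_;
    std::vector<Tick> trace_;
};
```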

    @@ -21033,7 +21585,7 @@ AtomicSimpleCPU::tick() at atomic.cc:757 0x55555907834c
    -
    19.20.4.2. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis
    +
    19.19.4.2. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis

    TODO: analyze better what each of the memory events means. For now, we have just collected a bunch of data there, but it still needs interpreting. The CPU specifics in this section are already insightful however.

    @@ -21440,7 +21992,7 @@ TimingSimpleCPU::IcachePort::ITickEvent::process()
    -
    19.20.4.3. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches
    +
    19.19.4.3. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches

    Let’s just add --caches to see if things go any faster:

    @@ -21479,7 +22031,7 @@ info: Entering event queue @ 0. Starting simulation...
    -
    19.20.4.4. gem5 event queue MinorCPU syscall emulation freestanding example analysis
    +
    19.19.4.4. gem5 event queue MinorCPU syscall emulation freestanding example analysis

    The events for the Atomic CPU were pretty simple: basically just ticks.

    @@ -21506,7 +22058,7 @@ info: Entering event queue @ 0. Starting simulation...
    -

    19.20.5. gem5 stats internals

    +

    19.19.5. gem5 stats internals

    This describes the internals of the gem5 m5out/stats.txt file.

    @@ -21579,7 +22131,7 @@ Text::end()
    -

    19.20.6. gem5 code generation

    +

    19.19.6. gem5 code generation

    gem5 uses a ton of code generation, which makes the project horrendous:

    @@ -21624,7 +22176,7 @@ Text::end()

    But it has been widely overused to insanity. It likely also exists partly because when the project started in 2003, C++ compilers weren’t that good, so you couldn’t rely on features like templates that much.

    -
    19.20.6.1. gem5 THE_ISA
    +
    19.19.6.1. gem5 THE_ISA

    Generated code at: build/<ISA>/config/the_isa.hh which contains amongst other lines:

    @@ -21651,9 +22203,9 @@ enum class Arch {
    -

    19.20.7. gem5 build system

    +

    19.19.7. gem5 build system

    -
    19.20.7.1. gem5 polymorphic ISA includes
    +
    19.19.7.1. gem5 polymorphic ISA includes

    E.g. src/cpu/decode_cache.hh includes:

    @@ -21732,7 +22284,7 @@ build/ARM/config/the_isa.hh
    -
    19.20.7.2. Why are all C++ symlinked into the gem5 build dir?
    +
    19.19.7.2. Why are all C++ symlinked into the gem5 build dir?

    Some scons madness.

    @@ -21751,6 +22303,9 @@ build/ARM/config/the_isa.hh
    -

    19.21. gem5 bootloaders

    +

    19.20. gem5 bootloaders

    Certain ISAs like ARM have bootloaders that are automatically run before the main image to setup basic system state.

    @@ -22174,7 +22729,7 @@ TODO benchmark: would gem5 suffer a considerable disk read performance hit due t

    libguestfs: https://serverfault.com/questions/246835/convert-directory-to-qemu-kvm-virtual-disk-image/916697#916697, in particular vfs-minimum-size

  • -

    use methods described at: Section 19.5.2, “gem5 checkpoint restore and run a different script” instead of putting builds on the root filesystem

    +

    use methods described at: Section 19.5.3, “gem5 checkpoint restore and run a different script” instead of putting builds on the root filesystem

  • @@ -23295,6 +23850,15 @@ time ./mutex.out 4 100000000 +
    +

    21.2.3. C++ type casting

    +
    +

    userland/cpp/static_dynamic_reinterpret_cast.cpp

    +
    +
    +

    https://stackoverflow.com/questions/332030/when-should-static-cast-dynamic-cast-const-cast-and-reinterpret-cast-be-used/60414256#60414256
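    The example program itself is not reproduced here, but a small independent C++ sketch of the named casts discussed in that answer may help. The Base/Derived/Other types and the helper functions are hypothetical:

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

struct Base {
    virtual ~Base() = default;  // polymorphic, so dynamic_cast works
};
struct Derived : Base {
    int value = 42;
};
struct Other : Base {};

// static_cast: compile-time conversion without any runtime check.
// Converting an out-of-range double to int is undefined behavior, hence the assert.
int toIntTruncating(double d) {
    assert(d > static_cast<double>(INT_MIN) - 1.0 &&
           d < static_cast<double>(INT_MAX) + 1.0);
    return static_cast<int>(d);
}

// dynamic_cast: runtime-checked downcast, yields nullptr on failure for pointers.
int valueIfDerived(Base *b) {
    if (auto *d = dynamic_cast<Derived *>(b))
        return d->value;
    return -1;
}

// const_cast: removes const; only safe if the referenced object is not itself const.
void increment(const int &x) { ++const_cast<int &>(x); }

// reinterpret_cast: view the bytes of an object through unsigned char*,
// which the aliasing rules allow.
bool isLittleEndian() {
    std::uint32_t one = 1;
    return *reinterpret_cast<unsigned char *>(&one) == 1;
}
```

    Reading an object through unsigned char* is one of the few reinterpretations the language permits; casting between unrelated pointer types and dereferencing the result generally violates strict aliasing.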

    +
    +

    21.3. POSIX

    @@ -23632,6 +24196,9 @@ There are no non-locking atomic types or atomic primitives in POSIX:

    Leads to the dreadful "Stack smashing detected" message, which is infinitely better than a silent break in any case.

    +
    +

    We had also seen this error in our repository at: stack smashing detected when using glibc.

    +

    21.5.2. Memory leaks

    @@ -23642,6 +24209,25 @@ There are no non-locking atomic types or atomic primitives in POSIX: userland/c/memory_leak.c

    +
    +

    21.5.3. Profiling userland programs

    +
    +

    https://stackoverflow.com/questions/375913/how-can-i-profile-c-code-running-on-linux/60265409#60265409

    +
    +
    +

    OK, we have to learn this stuff.

    +
    +
    +

    Examples:

    +
    +
    + +
    +

    21.6. Interpreted languages

    @@ -23651,7 +24237,29 @@ There are no non-locking atomic types or atomic primitives in POSIX:

    21.6.1. Python

    -

    Build and install the interpreter on the target:

    +

    Examples:

    +
    +
    + +
    +
    +
    21.6.1.1. Build and install the interpreter
    +
    +

    Buildroot has a Python package that can be added to the guest image:

    @@ -23706,8 +24314,11 @@ There are no non-locking atomic types or atomic primitives in POSIX:
    +
    +
    +
    21.6.1.2. Python gem5 user mode simulation
    -

    LKMC 50ac89b779363774325c81157ec8b9a6bdb50a2f gem5 390a74f59934b85d91489f8a563450d8321b602da:

    +

    At LKMC 50ac89b779363774325c81157ec8b9a6bdb50a2f gem5 390a74f59934b85d91489f8a563450d8321b602da:

    @@ -23753,16 +24364,64 @@ There are no non-locking atomic types or atomic primitives in POSIX:

    which corresponds to the glorious getrandom syscall: https://github.com/torvalds/linux/blob/v4.17/include/uapi/asm-generic/unistd.h#L707

    +
    +
    +
    21.6.1.3. Embedding Python in another application
    -

    Examples:

    +

    Here we will add some better examples and explanations for: https://docs.python.org/3/extending/embedding.html#very-high-level-embedding

    +
    +
    +

    "Embedding Python" basically means calling the Python interpreter from C, and possibly passing values between the two.

    +
    +
    +

    These examples show how to embed the Python interpreter into a C/C++ application and interface between the two:

    • -

      rootfs_overlay/lkmc/python/hello.py: hello world

      +

      userland/libs/python_embed/eval.c: this example simply evaluates a Python string from C, and does not communicate any values between the two.

      +
      +

      It could be used to call external commands that have external side effects, but it is not very exciting.

      +
      +
    • +
    • +

      userland/libs/python_embed/pure.c: this example actually defines some Python classes and functions from C, implementing those entirely in C.

      +
      +

      The C program that defines those classes then instantiates the interpreter and calls some regular Python code from it: userland/libs/python_embed/pure.py

      +
      +
      +

      The regular Python code can then use the native C classes as if they were defined in Python.

      +
      +
      +

      Finally, the Python returns values back to the C code that called the interpreter.

      +
      +
    • +
    • +

      userland/libs/python_embed/pure_cpp.cpp: C++ version of the above; the main goal of this example is to show how to interface with C++ classes.

    +
    +

    One notable user of Python embedding is the gem5 simulator, see also: gem5 vs QEMU. gem5 embeds the Python interpreter in order to interpret scripts as seen from the CLI:

    +
    +
    +
    +
    build/ARM/gem5.opt configs/example/fs.py
    +
    +
    +
    +

    gem5 then runs that Python script, which instantiates C++ classes exposed to Python, and then finally hands control back to the C++ runtime to run the actual simulation faster.

    +
    +
    +

    21.6.2. Node.js

    @@ -24715,10 +25374,51 @@ git clean -xdf .

    See for example BLAS.

    +
    -

    21.9. Userland content bibliography

    +

    21.9. Userland content filename conventions

    +
    +

    The following basenames should always refer to programs that do the same thing, but in different languages:

    +
    +
    + +
    +
    +
    +

    21.10. Userland content bibliography

    +

    All benchmarks done on P51.

    +
    +

    Sample results at gem5 2a9573f5942b5416fb0570cf5cb6cdecba733392: 10 to 12 minutes.

    @@ -33141,6 +33900,9 @@ xdg-open graph-size.pdf tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt
    +
    +

    Ubuntu 19.10, GCC 9.2.1, LKMC 7c6bb29bc89ec3f1056c0680c3f08bd64018a7bc, gem5 d7d9bc240615625141cd6feddbadd392457e49eb (18-02-2020), ./build --arch aarch64 --gem5-worktree master --no-cache: 19:33 TODO must investigate why it got so much worse.

    +
    29.2.3.3.1. Benchmark gem5 single file change rebuild time
    @@ -33223,29 +33985,10 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt

    29.3.1. P51

    -

    Lenovo ThinkPad P51 laptop:

    +

    Lenovo ThinkPad P51 laptop with the latest stable Ubuntu.

    -
    -
      -
    • -

      2500 USD in 2018 (high end)

      -
    • -
    • -

      Intel Core i7-7820HQ Processor (8MB Cache, up to 3.90GHz) (4 cores 8 threads)

      -
    • -
    • -

      32GB(16+16) DDR4 2400MHz SODIMM

      -
    • -
    • -

      512GB SSD PCIe TLC OPAL2

      -
    • -
    • -

      NVIDIA Quadro M1200 Mobile, latest Ubuntu supported proprietary driver

      -
    • -
    • -

      Latest Ubuntu

      -
    • -
    +
    +

    Full specs and benchmark scores will be maintained at the latest version of: https://github.com/cirosantilli/notes/blob/0c038b0e430d0017f12d028c6a0e7c0b99ec957f/my-hardware.adoc#thinkpad-p51

    @@ -34190,6 +34933,17 @@ export CCACHE_MAXSIZE="20G"
    +
    +

    ccache can be disabled with the --no-ccache option as in:

    +
    +
    +
    +
    ./build-gem5 --no-ccache
    +
    +
    +
    +

    This can be useful to benchmark builds.

    +

    33.10. getvar