From 37627d90aaf9068798d1e67212df0f6c12c8a3b5 Mon Sep 17 00:00:00 2001
From: Ciro Santilli 六四事件 法轮功
Date: Thu, 29 Oct 2020 00:00:00 +0000
Subject: [PATCH] 1cf5222769851522454e241f3270d34c5ee55951
---
 index.html | 1419 ++++++++++++----------------------------------------
 1 file changed, 320 insertions(+), 1099 deletions(-)

diff --git a/index.html b/index.html
index 64d5a65..8e158c1 100644
--- a/index.html
+++ b/index.html
@@ -477,10 +477,9 @@ pre{ white-space:pre }
  • 1.2.3. About the QEMU Buildroot setup
@@ -1069,19 +1068,17 @@ pre{ white-space:pre }
  • 22.7. QEMU monitor
@@ -1188,8 +1185,9 @@ pre{ white-space:pre }
  • 23.8.2. m5ops instructions
@@ -1492,11 +1490,7 @@ pre{ white-space:pre }
  • 26.2.3. GCC C extensions
@@ -1520,7 +1514,17 @@ pre{ white-space:pre }
  • 26.3.3.1. C++17 N4659 standards draft
-  • 26.3.4. C++ type casting
+  • 26.3.4. C++ templates
+  • 26.3.5. C++ type casting
+  • 26.3.6. C++ compile time magic
  • 26.4. POSIX
@@ -1561,10 +1565,16 @@ pre{ white-space:pre }
-  • 26.9.2. Microbenchmarks
+  • 26.9.2. Microbenchmarks
  • 26.10. userland/libs directory
@@ -2105,11 +2108,7 @@ pre{ white-space:pre }

    added in our fork of QEMU:

@@ -17428,77 +17352,7 @@ Format specific information:

    Only tested in x86.

22.6.1.1. pci_min

    PCI driver for our minimal pci_min.c QEMU fork device:

./run -- -device lkmc_pci_min

then:

insmod pci_min.ko

    Sources:


    Outcome:

<4>[   10.608241] pci_min: loading out-of-tree module taints kernel.
-<6>[   10.609935] probe
-<6>[   10.651881] dev->irq = 11
-lkmc_pci_min mmio_write addr = 0 val = 12345678 size = 4
-<6>[   10.668515] irq_handler irq = 11 dev = 251
-lkmc_pci_min mmio_write addr = 4 val = 0 size = 4

    What happened:

  • right at probe time, we write to a register

  • our hardware model is coded such that it generates an interrupt when written to

  • the Linux kernel interrupt handler writes to another register, which tells the hardware to stop sending interrupts

Kernel messages and printks from inside QEMU are shown all together; to see that more clearly, run in QEMU graphic mode instead.


    We don’t enable the device by default because it does not work for vanilla QEMU, which we often want to test with this repository.


Probe already does an MMIO write, which generates an IRQ and tests everything.
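The write-IRQ-acknowledge handshake above can be sketched as a toy model; the class and function names are ours for illustration, not from the actual C module or the QEMU device model, but the register offsets (0 = trigger, 4 = acknowledge) match the mmio_write log lines:

```python
class PciMinDevice:
    """Toy model: a write to register 0 raises the IRQ line, a write to
    register 4 acknowledges it and lowers the line again."""
    def __init__(self):
        self.irq_raised = False
        self.log = []

    def mmio_write(self, addr, val):
        self.log.append((addr, val))
        if addr == 0:
            self.irq_raised = True   # hardware fires an interrupt on write
        elif addr == 4:
            self.irq_raised = False  # driver acknowledged: stop interrupting

def irq_handler(dev):
    # Like the module's handler: write another register to silence the IRQ.
    dev.mmio_write(4, 0)

def probe(dev):
    # Like the module's probe: a single MMIO write exercises everything.
    dev.mmio_write(0, 0x12345678)
    if dev.irq_raised:
        irq_handler(dev)

dev = PciMinDevice()
probe(dev)
print(dev.log)         # [(0, 305419896), (4, 0)]
print(dev.irq_raised)  # False
```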

-22.6.1.2. QEMU edu PCI device
+22.6.1.1. QEMU edu PCI device

    Small upstream educational PCI device:

    @@ -17564,11 +17418,14 @@ lkmc_pci_min mmio_write addr = 4 val = 0 size = 4
  • https://stackoverflow.com/questions/62831327/add-memory-device-to-qemu


    https://stackoverflow.com/questions/64539528/qemu-pci-dma-read-and-pci-dma-write-does-not-work

-22.6.1.3. Manipulate PCI registers directly
+22.6.1.2. Manipulate PCI registers directly

    In this section we will try to interact with PCI devices directly from userland without kernel modules.

    @@ -17590,7 +17447,7 @@ lkmc_pci_min mmio_write addr = 4 val = 0 size = 4
-which we identify as being edu and pci_min respectively by the magic numbers: 1234:11e?
+which we identify as being the QEMU edu PCI device by the magic number: 1234:11e8.

Alternatively, we can also use the QEMU monitor:

    @@ -17605,17 +17462,7 @@ lkmc_pci_min mmio_write addr = 4 val = 0 size = 4
    -
          dev: lkmc_pci_min, id ""
    -        addr = 07.0
    -        romfile = ""
    -        rombar = 1 (0x1)
    -        multifunction = false
    -        command_serr_enable = true
    -        x-pcie-lnksta-dllla = true
    -        x-pcie-extcap-init = true
    -        class Class 00ff, addr 00:07.0, pci id 1234:11e9 (sub 1af4:1100)
    -        bar 0: mem at 0xfeb54000 [0xfeb54007]
    -      dev: edu, id ""
    +
          dev: edu, id ""
             addr = 06.0
             romfile = ""
             rombar = 1 (0x1)
    @@ -17652,7 +17499,7 @@ lkmc_pci_min mmio_write addr = 4 val = 0 size = 4
    setpci -s 0000:00:06.0 BASE_ADDRESS_0
    -setpci -d 1234:11e9 BASE_ADDRESS_0
    +setpci -d 1234:11e8 BASE_ADDRESS_0
    @@ -17680,16 +17527,14 @@ setpci -d 1234:11e9 BASE_ADDRESS_0
-which writes to the first register of our pci_min device.
+which writes to the first register of the edu device.

-The device then fires an interrupt at irq 11, which is unhandled, which leads the kernel to say you are a bad boy:
+The device then fires an interrupt at irq 11, which is unhandled, which leads the kernel to say you are a bad person:

    -
    lkmc_pci_min mmio_write addr = 0 val = 12345678 size = 4
    -<5>[ 1064.042435] random: crng init done
    -<3>[ 1065.567742] irq 11: nobody cared (try booting with the "irqpoll" option)
    +
    <3>[ 1065.567742] irq 11: nobody cared (try booting with the "irqpoll" option)
    @@ -17705,7 +17550,7 @@ devmem 0xfeb54000 w 0x12345678
-Our kernel module handles the interrupt, but does not acknowledge it like our proper pci_min kernel module, and so it keeps firing, which leads to infinitely many messages being printed:
+Our kernel module handles the interrupt, but does not acknowledge it like our proper edu kernel module, and so it keeps firing, which leads to infinitely many messages being printed:

    @@ -17714,7 +17559,7 @@ devmem 0xfeb54000 w 0x12345678
-22.6.1.4. pciutils
+22.6.1.3. pciutils

    There are two versions of setpci and lspci:

    @@ -17730,7 +17575,7 @@ devmem 0xfeb54000 w 0x12345678
-22.6.1.5. Introduction to PCI
+22.6.1.4. Introduction to PCI

    The PCI standard is non-free, obviously like everything in low level: https://pcisig.com/specifications but Google gives several illegal PDF hits :-)

    @@ -17790,7 +17635,7 @@ devmem 0xfeb54000 w 0x12345678
-22.6.1.6. PCI BFD
+22.6.1.5. PCI BFD

    lspci -k shows something like:

    @@ -17844,7 +17689,7 @@ devmem 0xfeb54000 w 0x12345678
-22.6.1.7. PCI BAR
+22.6.1.6. PCI BAR
    @@ -18001,85 +17846,7 @@ echo 255 >brightness
22.6.4. platform_device

    Minimal platform device example coded into the -M versatilepb SoC of our QEMU fork.


Using this device now requires checking out the branch:

git checkout platform-device
-git submodule sync

    before building, it does not work on master.


    Rationale: we found out that the kernels that build for qemu -M versatilepb don’t work on gem5 because versatilepb is an old pre-v7 platform, and gem5 requires armv7. So we migrated over to -M virt to have a single kernel for both gem5 and QEMU, and broke this since the single kernel was more important. TODO port to -M virt.


    Uses:


    Expected outcome after insmod:

  • QEMU reports MMIO with printfs

  • IRQs are generated and handled by this module, which logs to dmesg

    Without insmoding this module, try writing to the register with /dev/mem:

devmem 0x101e9000 w 0x12345678

    We can also observe the interrupt with dummy-irq:

modprobe dummy-irq irq=34
-insmod platform_device.ko

The IRQ number 34 was found on the dmesg after:

    insmod platform_device.ko
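A sketch of fishing the IRQ number out of dmesg. The exact log line format is an assumption modeled on the "dev->irq = 11" line from the pci_min example; adapt the regex to whatever the module actually prints:

```python
import re

def find_irq(dmesg):
    # Grab the first "irq = N" occurrence; returns None when absent.
    m = re.search(r'irq = (\d+)', dmesg)
    return int(m.group(1)) if m else None

sample = '<6>[   12.345678] dev->irq = 34'  # hypothetical dmesg line
print(find_irq(sample))         # 34
print(find_irq('nothing here')) # None
```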

-22.6.5. gem5 educational hardware models
+22.6.4. gem5 educational hardware models

    TODO get some working!

    @@ -19434,7 +19201,7 @@ root

    OK, this is why we used gem5 in the first place, performance measurements!

-Let’s see how many cycles Dhrystone, which Buildroot provides, takes for a few different input parameters.
+Let’s see how many cycles dhrystone, which Buildroot provides, takes for a few different input parameters.

We will do that for various input parameters on full system by taking a checkpoint after a fast atomic CPU boot finishes, and then restoring in a more detailed mode and running the benchmark:

    @@ -21309,10 +21076,33 @@ m5 execfile

    23.8.2. m5ops instructions

-gem5 allocates some magic instructions on unused instruction encodings for convenient guest instrumentation.
+There are a few different possible instructions that can be used to implement identical m5ops:

-Those instructions are exposed through the gem5 m5 executable in tree executable.
+All of those methods are exposed through the in-tree gem5 m5 executable. You can select which method to use when calling the executable, e.g.:

    m5 exit
    +# Same as the above.
    +m5 --inst exit
    +# The address is mandatory if not configured at build time.
    +m5 --addr 0x10010000 exit
    +m5 --semi exit
    +

    To make things simpler to understand, you can play around with our own minimized educational m5 subset:

    @@ -21400,7 +21190,45 @@ m5 execfile
-23.8.2.1. m5ops instructions interface
+23.8.2.1. m5ops magic addresses

    These are magic addresses that when accessed lead to an m5op.


The base address is given by system.m5ops_base, and then each m5op happens at a different address offset from that base.


    If system.m5ops_base is 0, then the memory m5ops are disabled.
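The addressing scheme can be sketched as follows; the helper name is ours, and 0x2100 is just an illustrative offset chosen to reproduce the 0x10012100 access seen in the failing trace in this section, not a documented m5op offset:

```python
def m5op_addr(m5ops_base, op_offset):
    # A base of 0 means the memory m5ops are disabled.
    if m5ops_base == 0:
        return None
    # Each m5op lives at a fixed offset from system.m5ops_base.
    return m5ops_base + op_offset

print(hex(m5op_addr(0x10010000, 0x2100)))  # 0x10012100
print(m5op_addr(0, 0x2100))                # None
```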


    Note that the address is physical, and therefore when running in full system on top of the Linux kernel, you must first map a virtual to physical address with /dev/mem as mentioned at: Userland physical address experiments.
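Since mmap on /dev/mem can only map at page granularity, the physical address has to be split into a page-aligned base plus an in-page offset; a minimal sketch of that arithmetic (the helper name is ours):

```python
def split_phys(phys, page_size=4096):
    # mmap needs a page-aligned offset into /dev/mem, so split the
    # physical address into an aligned base plus an in-page offset.
    base = phys & ~(page_size - 1)
    offset = phys & (page_size - 1)
    return base, offset

base, offset = split_phys(0x10012100)
print(hex(base), hex(offset))  # 0x10012000 0x100
```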


    One advantage of this method is that it can work with gem5 KVM, whereas the magic instructions don’t, since the host cannot handle them and it is hard to hook into that.


    A Baremetal example of that can be found at: baremetal/arch/aarch64/no_bootloader/m5_exit_addr.S.


As of gem5 0d5a80cb469f515b95e03f23ddaf70c9fd2ecbf2, fs.py --baremetal however disables the memory m5ops for some reason, so you should run that program as:

    ./run --arch aarch64 --baremetal baremetal/arch/aarch64/no_bootloader/m5_exit_addr.S --emulator gem5 --trace-insts-stdout -- --param 'system.m5ops_base=0x10010000'

    TODO failing with:

    info: Entering event queue @ 0.  Starting simulation...
    +fatal: Unable to find destination for [0x10012100:0x10012108] on system.iobus
    23.8.2.2. m5ops instructions interface

    Let’s study how the gem5 m5 executable uses them:

    @@ -21426,14 +21254,11 @@ m5 execfile

    magic instructions, which don’t exist in the corresponding arch

-magic memory addresses on a given page
+magic memory addresses on a given page: m5ops magic addresses

TODO: what is the advantage of magic memory addresses, given that you have to do more setup work by telling the kernel never to touch the magic page? For the magic instructions, the only thing that could go wrong is if you run some crazy kind of fuzzing workload that generates random instructions.

    -
    -

    Then, in aarch64 magic instructions for example, the lines:

    @@ -21514,7 +21339,7 @@ m5_fail(ints[1], ints[0]);
-23.8.2.2. m5op annotations
+23.8.2.3. m5op annotations

    include/gem5/asm/generic/m5ops.h also describes some annotation instructions.

    @@ -24208,6 +24033,9 @@ type=SimpleMemory

    and configure it into Eclipse as usual.

    +
    +

One downside of this setup is that if you want to nuke your build directory to get a clean build, then the Eclipse configuration files present in it might get deleted. Maybe it is possible to store configuration files outside of the directory, but for now we mitigate that by making a backup copy of those configuration files before removing the directory, and restoring them when you do ./build-gem5 --clean.
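The backup-and-restore mitigation can be sketched as follows; the function and the Eclipse file names (.project, .cproject) are hypothetical examples, not what the build script necessarily uses:

```python
import os
import shutil
import tempfile

ECLIPSE_FILES = ['.project', '.cproject']  # hypothetical config file names

def clean_build_dir(build_dir, backup_dir):
    # Back up the config files, nuke the build directory, recreate it,
    # then restore the saved configuration.
    os.makedirs(backup_dir, exist_ok=True)
    for name in ECLIPSE_FILES:
        src = os.path.join(build_dir, name)
        if os.path.exists(src):
            shutil.copy(src, os.path.join(backup_dir, name))
    shutil.rmtree(build_dir)
    os.makedirs(build_dir)
    for name in ECLIPSE_FILES:
        src = os.path.join(backup_dir, name)
        if os.path.exists(src):
            shutil.copy(src, os.path.join(build_dir, name))

with tempfile.TemporaryDirectory() as tmp:
    build = os.path.join(tmp, 'build')
    os.makedirs(build)
    with open(os.path.join(build, '.project'), 'w') as f:
        f.write('eclipse config')
    with open(os.path.join(build, 'stale.o'), 'w') as f:
        f.write('build junk')
    clean_build_dir(build, os.path.join(tmp, 'backup'))
    survivors = sorted(os.listdir(build))

print(survivors)  # ['.project']
```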

    +

    23.22.2. gem5 Python C++ interaction

    @@ -28879,6 +28707,9 @@ class O3ThreadContext : public ThreadContext
    +

    see also: https://stackoverflow.com/questions/64420547/in-gem5-how-do-i-know-the-specific-location-of-the-class/64423633#64423633

    +
    +

    Unlike in SimpleThread however, O3ThreadContext does not contain the register data itself, e.g. O3ThreadContext::readIntRegFlat instead forwards to cpu:

    @@ -30043,9 +29874,9 @@ build/ARM/config/the_isa.hh
-git submodule update --init submodules/gensim-simulator
+git submodule update --init submodules/gensim
     sudo apt install libantlr3c-dev
    -cd submodule/gensim-simulator
    +cd submodule/gensim
     make
    @@ -30091,12 +29922,12 @@ Aborted (core dumped)
    -
    cd /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim-simulator/models/armv8 && \
    -  /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim-simulator/build/dist/bin/gensim \
    -  -a /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim-simulator/models/armv8/aarch64.ac \
    +
    cd /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/models/armv8 && \
    +  /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/build/dist/bin/gensim \
    +  -a /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/models/armv8/aarch64.ac \
       -s module,arch,decode,disasm,ee_interp,ee_blockjit,jumpinfo,function,makefile \
    -  -o decode.GenerateDotGraph=1,makefile.libtrace_path=/home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim-simulator/support/libtrace/inc,makefile.archsim_path=/home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim-simulator/archsim/inc,makefile.llvm_path=,makefile.Optimise=2,makefile.Debug=1 \
    -  -t /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim-simulator/build/models/armv8/output-aarch64/
+  -o decode.GenerateDotGraph=1,makefile.libtrace_path=/home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/support/libtrace/inc,makefile.archsim_path=/home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/archsim/inc,makefile.llvm_path=,makefile.Optimise=2,makefile.Debug=1 \
+  -t /home/ciro/bak/git/linux-kernel-module-cheat/submodules/gensim/build/models/armv8/output-aarch64/
    @@ -31500,101 +31331,6 @@ echo 1 > /proc/sys/vm/overcommit_memory

    strace shows that OpenMP makes clone() syscalls in Linux. TODO: does it actually call pthread_ functions, or does it make syscalls directly? Or in other words, can it work on Freestanding programs? A quick grep shows many references to pthreads.

26.2.3.2.1. OpenMP validation

    Host build on Ubuntu 20.04:

git submodule update --init submodules/omp-validation
-cd submodules/omp-validation
-PERL5LIB="${PERL5LIB}:." make -j `nproc` ctest

This both builds and runs, took about 5 minutes on a 2017 Lenovo ThinkPad P51, but had a few test failures for some reason:

    Summary:
    -S Number of tested Open MP constructs: 62
    -S Number of used tests:                123
    -S Number of failed tests:              4
    -S Number of successful tests:          119
    -S + from this were verified:           115
    -
    -Normal tests:
    -N Number of failed tests:              2
    -N + from this fail compilation:        0
    -N + from this timed out                0
    -N Number of successful tests:          60
    -N + from this were verified:           58
    -
    -Orphaned tests:
    -O Number of failed tests:              2
    -O + from this fail compilation:        0
    -O + from this timed out                0
    -O Number of successful tests:          59
    -O + from this were verified:           57
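The counts in the summary above are internally consistent; a quick check of the arithmetic, with the numbers copied from the report:

```python
# Numbers copied from the OpenMP validation summary above.
normal_failed, normal_ok = 2, 60
orphaned_failed, orphaned_ok = 2, 59

failed = normal_failed + orphaned_failed
successful = normal_ok + orphaned_ok
used = failed + successful
print(failed, successful, used)  # 4 119 123
```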

The tests and run results are placed under bin/c/, e.g.:

test_omp_threadprivate
-test_omp_threadprivate.c
-test_omp_threadprivate.log
-test_omp_threadprivate.out
-test_omp_threadprivate_compile.log

C files are also present as some kind of code generation is used.


Build only, then run one of them manually:

make -j`nproc` omp_my_sleep omp_testsuite
-PERL5LIB="${PERL5LIB}:." ./runtest.pl --lang=c --norun testlist-c.txt
-./bin/c/test_omp_barrier

    The bin/c directory is hardcoded in the executable, so to run it you must ensure that it exists relative to CWD, e.g.:

cd bin/c
-mkdir -p bin/c
-./test_omp_barrier

    Manually cross compile all tests and optionally add some extra options, e.g. -static to more conveniently run in gem5:

PERL5LIB="${PERL5LIB}:." ./runtest.pl --lang=c --makeopts 'CC=aarch64-linux-gnu-gcc CFLAGS_EXTRA=-static' --norun testlist-c.txt
-./../../run --arch aarch64 --emulator gem5 --userland submodules/omp-validation/bin/c/test_omp_parallel_reduction --cpus 8 --memory 8G

    Build a single test:

make bin/c/test_omp_sections_reduction
    @@ -31652,22 +31388,6 @@ mkdir -p bin/c
  • -

    templates

    - -
  • -
  • iostream

      @@ -32153,7 +31873,29 @@ non-atomic 19
26.3.6. C++ compile time magic

26.3.6.1. C++ decltype

    C++11 keyword.


The compiler replaces decltype(expression) with the type of that expression at compile time.


    More powerful than auto as you can use it in more places.

    26.7.1.2. Build and install the interpreter

    Buildroot has a Python package that can be added to the guest image:

@@ -32655,7 +32469,7 @@ There are no non-locking atomic types or atomic primitives in POSIX:
-26.7.1.2. Python gem5 user mode simulation
+26.7.1.3. Python gem5 user mode simulation

    At LKMC 50ac89b779363774325c81157ec8b9a6bdb50a2f gem5 390a74f59934b85d91489f8a563450d8321b602da:

@@ -32715,7 +32529,7 @@ There are no non-locking atomic types or atomic primitives in POSIX:
-26.7.1.3. Embedding Python in another application
+26.7.1.4. Embedding Python in another application

    Here we will add some better examples and explanations for: https://docs.python.org/3/extending/embedding.html#very-high-level-embedding

@@ -32766,7 +32580,7 @@ There are no non-locking atomic types or atomic primitives in POSIX:
-26.7.1.4. pybind11
+26.7.1.5. pybind11
@@ -32859,26 +32673,9 @@ There are no non-locking atomic types or atomic primitives in POSIX:
rootfs_overlay/lkmc/nodejs/object_to_string.js: util.inspect.custom and toString override experiment: https://stackoverflow.com/questions/24902061/is-there-an-repr-equivalent-for-javascript/26698403#26698403

    -
    -

    Output:

    util.inspect
    -my type is MyClassUtilInspectCustom and a is 1 and b is 2
    -console.log
    -my type is MyClassUtilInspectCustom and a is 1 and b is 2
    -toString
    -[object Object]
    -
    -util.inspect
    -MyClassToString { a: 1, b: 2 }
    -console.log
    -MyClassToString { a: 1, b: 2 }
    -toString
    -my type is MyClassToString and a is 1 and b is 2

    rootfs_overlay/lkmc/nodejs/object_to_json.js: toJSON examples

@@ -33570,218 +33367,6 @@ git clean -xdf .

26.9.1.6. Coremark

    Part of EEMBC.


    They have two versions:


    Both have a custom license, so yeah, no patience to read this stuff.


    Coremark-pro build and run on Ubuntu 20.04:

git submodule update --init submodules/coremark-pro
-cd submodules/coremark-pro
-make TARGET=linux64 build
-make TARGET=linux64 XCMD='-c4' certify-all

This uses 4 contexts. TODO: what are contexts? Are they the same as threads? You likely want to use -c$(nproc) in practice instead.


    Finishes in a few seconds, 2017 Lenovo ThinkPad P51 results:

    Workload Name                                     (iter/s)   (iter/s)    Scaling
    ------------------------------------------------ ---------- ---------- ----------
    -cjpeg-rose7-preset                                  526.32     178.57       2.95
    -core                                                  7.39       2.16       3.42
    -linear_alg-mid-100x100-sp                           684.93     238.10       2.88
    -loops-all-mid-10k-sp                                 27.65       7.80       3.54
    -nnet_test                                            32.79      10.57       3.10
    -parser-125k                                          71.43      25.00       2.86
    -radix2-big-64k                                     2320.19     623.44       3.72
    -sha-test                                            555.56     227.27       2.44
    -zip-test                                            363.64     166.67       2.18
    -
    -MARK RESULTS TABLE
    -
    -Mark Name                                        MultiCore SingleCore    Scaling
    ------------------------------------------------ ---------- ---------- ----------
    -CoreMark-PRO                                      18743.79    6306.76       2.97

    More sample results: P51 CoreMark-Pro.


And scaling appears to be the ratio between multicore (4 due to -c4) and single core performance; each benchmark gets run twice, once multicore and once single core.


    The tester script also outputs test commands, some of which are:

    builds/linux64/gcc64/bin/zip-test.exe -c1 -w1 -c4 -v1
    -builds/linux64/gcc64/bin/zip-test.exe -c1 -w1 -c4 -v0
    -builds/linux64/gcc64/bin/zip-test.exe -c4 -v1
    -builds/linux64/gcc64/bin/zip-test.exe -c4 -v0

-v1 appears to be a fast verification run, and both -c1 and -c4 get run for the single vs multicore performance.


    Sample -c4 -v0 output:

    -  Info: Starting Run...
    --- Workload:zip-test=946108807
    --- zip-test:time(ns)=11
    --- zip-test:contexts=4
    --- zip-test:iterations=4
    --- zip-test:time(secs)=   0.011
    --- zip-test:secs/workload= 0.00275
    --- zip-test:workloads/sec= 363.636
    --- Done:zip-test=946108807

and so we see that the zip-test:workloads/sec= 363.636 output is the key value, which matches the zip-test 363.64 entry in the earlier full summarized result.
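A quick check of the arithmetic behind those fields, with the numbers copied from the -c4 -v0 output above: secs/workload is time(secs) divided by iterations, and workloads/sec is its inverse:

```python
# Numbers copied from the zip-test -c4 -v0 output above.
time_secs = 0.011
iterations = 4

secs_per_workload = time_secs / iterations
workloads_per_sec = 1 / secs_per_workload
print(secs_per_workload)            # 0.00275
print(round(workloads_per_sec, 3))  # 363.636
```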


    Cross compile statically for aarch64. From LKMC toplevel:

    make \
    -  -C submodules/coremark-pro \
    -  LINKER_FLAGS='-static' \
    -  LINKER_LAST='-lm -lpthread -lrt' \
    -  TARGET=gcc-cross-linux \
    -  TOOLCHAIN=gcc-cross-linux \
    -  TOOLS="$(./getvar --arch aarch64 buildroot_host_usr_dir)" \
    -  TPREF="$(./getvar --arch aarch64 buildroot_toolchain_prefix)-" \
    -  build \
    -;
    -

    Run a single executable on QEMU:

./run --arch aarch64 --userland submodules/coremark-pro/builds/gcc-cross-linux/bin/zip-test.exe --cli-args='-c4 -v0'

Finishes in about 1 second, and gives zip-test:workloads/sec= 74.0741, so we see that it ran about 5x slower than the native host.


    Run a single executable on gem5 in a verification run:

    ./run \
    -  --arch aarch64 \
    -  --cli-args='-c1 -v1' \
    -  --emulator gem5 \
    -  --userland submodules/coremark-pro/builds/gcc-cross-linux/bin/zip-test.exe \
    -;

    TODO: hangs for at least 15 minutes, there must be something wrong. Stuck on an evolving strlen loop:

    7837834500: system.cpu: A0 T0 : @__strlen_generic+112    : ldp
    -7837834500: system.cpu: A0 T0 : @__strlen_generic+112. 0 :   addxi_uop   ureg0, x1, #16 : IntAlu :  D=0x0000003ffff07170  flags=(IsInteger|IsMicroop|IsDelayedCommit|IsFirstMicroop)
    -7837835000: system.cpu: A0 T0 : @__strlen_generic+112. 1 :   ldp_uop   x2, x3, [ureg0] : MemRead :  D=0x20703c0a3e702f3c A=0x3ffff07170  flags=(IsInteger|IsMemRef|IsLoad|IsMicroop|IsLastMicroop)
    -7837835500: system.cpu: A0 T0 : @__strlen_generic+116    :   sub   x4, x2, x8         : IntAlu :  D=0x3d607360632e3b34  flags=(IsInteger)
    -7837836000: system.cpu: A0 T0 : @__strlen_generic+120    :   sub   x6, x3, x8         : IntAlu :  D=0x1f6f3b093d6f2e3b  flags=(IsInteger)
    -7837836500: system.cpu: A0 T0 : @__strlen_generic+124    :   orr   x5, x4, x6         : IntAlu :  D=0x3f6f7b697f6f3f3f  flags=(IsInteger)
    -7837837000: system.cpu: A0 T0 : @__strlen_generic+128    :   ands   x5, x8, LSL #7    : IntAlu :  D=0x0000000000000000  flags=(IsInteger)
    -7837837500: system.cpu: A0 T0 : @__strlen_generic+132    :   b.eq   <__strlen_generic+88> : IntAlu :   flags=(IsControl|IsDirectControl|IsCondControl)
    -7837838000: system.cpu: A0 T0 : @__strlen_generic+88    : ldp
    -7837838000: system.cpu: A0 T0 : @__strlen_generic+88. 0 :   addxi_uop   ureg0, x1, #32 : IntAlu :  D=0x0000003ffff07180  flags=(IsInteger|IsMicroop|IsDelayedCommit|IsFirstMicroop)
    -7837838500: system.cpu: A0 T0 : @__strlen_generic+88. 1 :   ldp_uop   x2, x3, [ureg0] : MemRead :  D=0x6565686b636f4c27 A=0x3ffff07180  flags=(IsInteger|IsMemRef|IsLoad|IsMicroop|IsDelayedCommit)
    -7837839000: system.cpu: A0 T0 : @__strlen_generic+88. 2 :   addxi_uop   x1, ureg0, #0 : IntAlu :  D=0x0000003ffff07180  flags=(IsInteger|IsMicroop|IsLastMicroop)
    -7837839500: system.cpu: A0 T0 : @__strlen_generic+92    :   sub   x4, x2, x8         : IntAlu :  D=0x3c786d606f6c6e62  flags=(IsInteger)
    -7837840000: system.cpu: A0 T0 : @__strlen_generic+96    :   sub   x6, x3, x8         : IntAlu :  D=0x6464676a626e4b26  flags=(IsInteger)
    -7837840500: system.cpu: A0 T0 : @__strlen_generic+100    :   orr   x5, x4, x6         : IntAlu :  D=0x7c7c6f6a6f6e6f66  flags=(IsInteger)
    -7837841000: system.cpu: A0 T0 : @__strlen_generic+104    :   ands   x5, x8, LSL #7    : IntAlu :  D=0x0000000000000000  flags=(IsInteger)

    Instructions before __strlen_generic starts:

    7831019000: system.cpu: A0 T0 : @define_params_zip+664    :   add   x1, sp, #168       : IntAlu :  D=0x0000007ffffef988  flags=(IsInteger)
    -7831019500: system.cpu: A0 T0 : @define_params_zip+668    :   orr   x0, xzr, x24       : IntAlu :  D=0x0000003ffff00010  flags=(IsInteger)
    -7831020000: system.cpu: A0 T0 : @define_params_zip+672    :   bl   <th_strcat>         : IntAlu :  D=0x000000000040a4c4  flags=(IsInteger|IsControl|IsDirectControl|IsUncondControl|IsCall)
    -7831020500: system.cpu: A0 T0 : @th_strcat    :   b   <strcat>             : IntAlu :   flags=(IsControl|IsDirectControl|IsUncondControl)
    -7831021000: system.cpu: A0 T0 : @strcat    : stp
    -7831021000: system.cpu: A0 T0 : @strcat. 0 :   addxi_uop   ureg0, sp, #-48 : IntAlu :  D=0x0000007ffffef8b0  flags=(IsInteger|IsMicroop|IsDelayedCommit|IsFirstMicroop)
    -7831021500: system.cpu: A0 T0 : @strcat. 1 :   strxi_uop   x29, [ureg0] : MemWrite :  D=0x0000007ffffef8e0 A=0x7ffffef8b0  flags=(IsInteger|IsMemRef|IsStore|IsMicroop|IsDelayedCommit)
    -7831022000: system.cpu: A0 T0 : @strcat. 2 :   strxi_uop   x30, [ureg0, #8] : MemWrite :  D=0x000000000040a4c4 A=0x7ffffef8b8  flags=(IsInteger|IsMemRef|IsStore|IsMicroop|IsDelayedCommit)
    -7831022500: system.cpu: A0 T0 : @strcat. 3 :   addxi_uop   sp, ureg0, #0 : IntAlu :  D=0x0000007ffffef8b0  flags=(IsInteger|IsMicroop|IsLastMicroop)
    -7831023000: system.cpu: A0 T0 : @strcat+4    :   add   x29, sp, #0        : IntAlu :  D=0x0000007ffffef8b0  flags=(IsInteger)
    -7831023500: system.cpu: A0 T0 : @strcat+8    :   str   x19, [sp, #16]     : MemWrite :  D=0x00000000004d6560 A=0x7ffffef8c0  flags=(IsInteger|IsMemRef|IsStore)
    -7831024000: system.cpu: A0 T0 : @strcat+12    :   orr   x19, xzr, x0       : IntAlu :  D=0x0000003ffff00010  flags=(IsInteger)
    -7831024500: system.cpu: A0 T0 : @strcat+16    :   str   x1, [sp, #40]      : MemWrite :  D=0x0000007ffffef988 A=0x7ffffef8d8  flags=(IsInteger|IsMemRef|IsStore)
    -7831025000: system.cpu: A0 T0 : @strcat+20    :   bl   <_init+120>         : IntAlu :  D=0x00000000004464c8  flags=(IsInteger|IsControl|IsDirectControl|IsUncondControl|IsCall)
    -7831025500: system.cpu: A0 T0 : @_init+120    :   adrp   x16, #835584      : IntAlu :  D=0x00000000004cc000  flags=(IsInteger)
    -7831026000: system.cpu: A0 T0 : @_init+124    :   ldr   x17, [x16, #48]    : MemRead :  D=0x0000000000449680 A=0x4cc030  flags=(IsInteger|IsMemRef|IsLoad)
    -7831026500: system.cpu: A0 T0 : @_init+128    :   add   x16, x16, #48      : IntAlu :  D=0x00000000004cc030  flags=(IsInteger)
    -7831027000: system.cpu: A0 T0 : @_init+132    :   br   x17                 : IntAlu :   flags=(IsInteger|IsControl|IsIndirectControl|IsUncondControl)

Their build/run system is nice, it even supports user mode simulators out-of-the-box! TODO give it a shot. See:

    RUN =
    -RUN_FLAGS =

    under util/make/linux64.mak.


    Tested on a7ae8e6a8e29ef46d79eb9178d8599d1faeea0e5 + 1.


    26.9.2. Microbenchmarks

    @@ -33820,365 +33405,6 @@ RUN_FLAGS =
26.9.2.1. Dhrystone

    Created in the 80’s, it is not a representative measure of performance in modern computers anymore. It has mostly been replaced by SPEC, which is…​ closed source! Unbelievable.


    Dhrystone is very simple:

  • there is one loop in the dhry_1.c main function that gets executed N times

  • that loop calls 9 short functions called Proc_0 to Proc_9, most of which are defined in dhry_1.c, and a few others in dhry_2.c

    The benchmark is single-threaded.


After a quick look at it, Dhrystone in -O3 is very likely completely CPU bound, as there are no loops over variable sized arrays, except for some dummy ones that only run once. It just does a bunch of operations on local and global C variables, which are very likely to be inlined and treated fully in registers until the final write back, or to fit entirely in cache. TODO confirm with some kind of measurement. The benchmark also makes no syscalls except for measuring time and reporting results.


    Buildroot has a dhrystone package, but because it is so interesting to us, we decided to also build it ourselves, which allows things like static and baremetal compilation more easily.


    Build and run on QEMU User mode simulation:

git submodule update --init submodules/dhrystone
-./build-dhrystone --optimization-level 3
-./run --userland "$(./getvar userland_build_dir)/submodules/dhrystone/dhrystone"

    TODO automate the run more nicely to dispense with getvar.


    Increase the number of loops to try and reach more meaningful results:

    ./run --userland "$(./getvar userland_build_dir)/submodules/dhrystone/dhrystone" --cli-args 100000000

    Build and run on gem5 user mode:

    ./build-dhrystone --optimization-level 3
    ./run --emulator gem5 --userland "$(./getvar userland_build_dir)/submodules/dhrystone/dhrystone"

    Run natively on the host:

    ./build-dhrystone --host
    "$(./getvar --host userland_build_dir)/submodules/dhrystone/dhrystone" 1000000000

    Sample output for 2017 Lenovo ThinkPad P51 Ubuntu 20.04:

    Microseconds for one run through Dhrystone:    0.1
    Dhrystones per Second:                      16152479.0

    Build Dhrystone for baremetal and run it on QEMU:

    # Build our Newlib stubs.
    ./build-baremetal --arch aarch64
    ./build-dhrystone --arch aarch64 --mode baremetal
    ./run --arch aarch64 --baremetal "$(./getvar --arch aarch64 baremetal_build_dir)/submodules/dhrystone/dhrystone" --cli-args 10000

    or with gem5:

    # Build our Newlib stubs.
    ./build-baremetal --arch aarch64
    ./build-dhrystone --arch aarch64 --emulator gem5 --mode baremetal
    ./run --arch aarch64 --baremetal "$(./getvar --arch aarch64 --emulator gem5 baremetal_build_dir)/submodules/dhrystone/dhrystone" --cli-args 10000 --emulator gem5

    If you really want the Buildroot package for some reason, build it with:

    ./build-buildroot --config 'BR2_PACKAGE_DHRYSTONE=y'

    and run inside the guest from PATH with:

    dhrystone
    26.9.2.2. LMbench

    Canonical source at https://sourceforge.net/projects/lmbench/, but Intel has a fork at https://github.com/intel/lmbench which has more recent build updates, so I think that’s the one I’d put my money on as of 2020.


    Feels old; I’m guessing it is not representative anymore, like Dhrystone. But hey, history!


    Ubuntu 20.04 AMD64 native build and run:

    git submodule update --init submodules/lmbench
    cd submodules/lmbench
    cd src
    make results

    TODO it hangs for a long time at:

    Hang on, we are calculating your cache line size.

    If I kill it, the configuration process continues:

    Killed
    OK, it looks like your cache line is  bytes.

    and continues with a few more interactive questions until finally:

    Confguration done, thanks.

    where it again hangs for at least 2 hours, so I lost patience and killed it.


    TODO: how to do a non-interactive config? After the above procedure, bin/x86_64-linux-gnu/CONFIG.ciro-p51 contains:

    DISKS=""
    DISK_DESC=""
    OUTPUT=/dev/null
    ENOUGH=50000
    FASTMEM="NO"
    FILE=/var/tmp/XXX
    FSDIR=/var/tmp
    INFO=INFO.ciro-p51
    LINE_SIZE=
    LOOP_O=0.00000000
    MAIL=no
    TOTAL_MEM=31903
    MB=22332
    MHZ="-1 System too busy"
    MOTHERBOARD=""
    NETWORKS=""
    OS="x86_64-linux-gnu"
    PROCESSORS="8"
    REMOTE=""
    SLOWFS="NO"
    SYNC_MAX="1"
    LMBENCH_SCHED="DEFAULT"
    TIMING_O=0
    RSH=rsh
    RCP=rcp
    VERSION=lmbench-3alpha4
    BENCHMARK_HARDWARE=YES
    BENCHMARK_OS=YES
    BENCHMARK_SYSCALL=
    BENCHMARK_SELECT=
    BENCHMARK_PROC=
    BENCHMARK_CTX=
    BENCHMARK_PAGEFAULT=
    BENCHMARK_FILE=
    BENCHMARK_MMAP=
    BENCHMARK_PIPE=
    BENCHMARK_UNIX=
    BENCHMARK_UDP=
    BENCHMARK_TCP=
    BENCHMARK_CONNECT=
    BENCHMARK_RPC=
    BENCHMARK_HTTP=
    BENCHMARK_BCOPY=
    BENCHMARK_MEM=
    BENCHMARK_OPS=

    Native build only without running tests:

    cd src
    make

    Interestingly, one of the creators of LMbench, Larry McVoy (https://www.linkedin.com/in/larrymcvoy/, https://en.wikipedia.org/wiki/Larry_McVoy), is also a co-founder of BitKeeper. Their SCM must be blazingly fast!!! Also his LinkedIn says Intel uses it. But they will forever be remembered as "the closed source Git precursor that died N years ago", RIP.

    26.9.2.3. STREAM benchmark

    Very simple memory bandwidth benchmark with one C and one Fortran version, originally published in 1991; the latest version at the time of writing is from 2013.


    Its operation is very simple: fork one thread for each CPU in the system (using OpenMP) and do the following four array operations (4 separate loops of individual operations):

    /* Copy. */
    times[0 * ntimes + k] = mysecond();
    #pragma omp parallel for
    for (j=0; j<stream_array_size; j++)
        c[j] = a[j];
    times[0 * ntimes + k] = mysecond() - times[0 * ntimes + k];

    /* Scale. */
    times[1 * ntimes + k] = mysecond();
    #pragma omp parallel for
    for (j=0; j<stream_array_size; j++)
        b[j] = scalar*c[j];
    times[1 * ntimes + k] = mysecond() - times[1 * ntimes + k];

    /* Add. */
    times[2 * ntimes + k] = mysecond();
    #pragma omp parallel for
    for (j=0; j<stream_array_size; j++)
        c[j] = a[j]+b[j];
    times[2 * ntimes + k] = mysecond() - times[2 * ntimes + k];

    /* Triad. */
    times[3 * ntimes + k] = mysecond();
    #pragma omp parallel for
    for (j=0; j<stream_array_size; j++)
        a[j] = b[j]+scalar*c[j];
    times[3 * ntimes + k] = mysecond() - times[3 * ntimes + k];
    }

    Ubuntu 20.04 native build and run:

    git submodule update --init submodules/stream-benchmark
    cd submodules/stream-benchmark
    make
    ./stream_c.exe

    Sample output:

    -------------------------------------------------------------
    STREAM version $Revision: 5.10 $
    -------------------------------------------------------------
    This system uses 8 bytes per array element.
    -------------------------------------------------------------
    Array size = 10000000 (elements), Offset = 0 (elements)
    Memory per array = 76.3 MiB (= 0.1 GiB).
    Total memory required = 228.9 MiB (= 0.2 GiB).
    Each kernel will be executed 10 times.
     The *best* time for each kernel (excluding the first iteration)
     will be used to compute the reported bandwidth.
    -------------------------------------------------------------
    Number of Threads requested = 8
    Number of Threads counted = 8
    -------------------------------------------------------------
    Your clock granularity/precision appears to be 1 microseconds.
    Each test below will take on the order of 7027 microseconds.
       (= 7027 clock ticks)
    Increase the size of the arrays if this shows that
    you are not getting at least 20 clock ticks per test.
    -------------------------------------------------------------
    WARNING -- The above is only a rough guideline.
    For best results, please be sure you know the
    precision of your system timer.
    -------------------------------------------------------------
    Function    Best Rate MB/s  Avg time     Min time     Max time
    Copy:           20123.2     0.008055     0.007951     0.008267
    Scale:          20130.4     0.008032     0.007948     0.008177
    Add:            22528.8     0.010728     0.010653     0.010867
    Triad:          22448.4     0.010826     0.010691     0.011352
    -------------------------------------------------------------
    Solution Validates: avg error less than 1.000000e-13 on all three arrays
    -------------------------------------------------------------

    The LKMC usage of STREAM is analogous to that of Dhrystone. Build and run on QEMU User mode simulation:

    ./build-stream --optimization-level 3
    ./run --userland "$(./getvar userland_build_dir)/submodules/stream-benchmark/stream_c.exe"

    Decrease the benchmark size and the retry count to finish the simulation faster, at the cost of a possibly less representative result:

    ./run --userland "$(./getvar userland_build_dir)/submodules/stream-benchmark/stream_c.exe" --cli-args '100 2'

    Build and run on gem5 user mode:

    ./build-stream --optimization-level 3
    ./run --emulator gem5 --userland "$(./getvar userland_build_dir)/submodules/stream-benchmark/stream_c.exe" --cli-args '1000 2'

    This is analogous to step debugging baremetal examples.


    Related:

    27.5.1.1. nostartfiles programs

    605448f07e6380634b1aa7e9732d111759f69fd

    Dhrystone -O3

    gem5 --arch aarch64

    4 * 10^5

    68

    9.2034139 * 10^7

    1.6

    5d233f2664a78789f9907d27e2a40e86cefad595

    STREAM benchmark -O3

    ./run --arch aarch64 --emulator gem5 --userland userland/gcc/busy_loop.c --cli-args 1000000 --trace ExecAll

    3 * 10^5 * 2

    64

    9.9674773 * 10^7

    1.6

    glibc C pre-main effects

    ab6f7331406b22f8ab6e2df5f8b8e464fb35b611

    userland/c/m5ops.c -O0


    Let’s see if user mode runs considerably faster than full system or not, ignoring the kernel boot.

    TODO: move this section to our new custom dhrystone setup: Section 26.9.2.1, “Dhrystone”.

    First we build Dhrystone manually statically, since dynamic linking is broken in gem5 as explained in Section 10.7, “gem5 syscall emulation mode”.

    gem5 user mode:


    and then copy the link command to a separate Bash file. Then you can time and modify it easily.

    Some approximate reference values on a 2017 Lenovo ThinkPad P51, LKMC d4b3e064adeeace3c3e7d106801f95c14637c12f + 1 (doing multiple runs to warm up disk caches):


    Tested at: d4b3e064adeeace3c3e7d106801f95c14637c12f + 1.


    On LKMC 220c3a434499e4713664d4a47c246cb81ee0a06a gem5 63e96992568d8a8a0dccac477b8b7f1370ac7e98 (Sep 2020):

    • opt

      • default link: 18.32user 3.99system 0:22.33elapsed 99%CPU (0avgtext+0avgdata 4622908maxresident)k

      • LDFLAGS_EXTRA=-fuse-ld=lld (after a build with default linker): 6.74user 1.81system 0:03.85elapsed 222%CPU (0avgtext+0avgdata 7025292maxresident)k

      • LDFLAGS_EXTRA=-fuse-ld=gold: 7.70user 1.36system 0:09.44elapsed 95%CPU (0avgtext+0avgdata 5959152maxresident)k
    34.3.1.1. P51 benchmarks

    Dhrystone on Ubuntu 20.04 results at Dhrystone.


    STREAM benchmark on Ubuntu 20.04 results at STREAM benchmark.

    34.3.1.1.1. P51 CoreMark-Pro

    CoreMark-Pro d5b4f2ba7ba31e37a5aa93423831e7d5eb933868 on Ubuntu 20.04 with XCMD="-c$(nproc)":

                                                    MultiCore SingleCore
    Workload Name                                     (iter/s)   (iter/s)    Scaling
    ----------------------------------------------- ---------- ---------- ----------
    cjpeg-rose7-preset                                  769.23     175.44       4.38
    core                                                  7.98       2.11       3.78
    linear_alg-mid-100x100-sp                           892.86     233.64       3.82
    loops-all-mid-10k-sp                                 35.84       7.58       4.73
    nnet_test                                            35.09      10.05       3.49
    parser-125k                                         125.00      20.41       6.12
    radix2-big-64k                                     3278.69     630.91       5.20
    sha-test                                            625.00     227.27       2.75
    zip-test                                            615.38     166.67       3.69

    MARK RESULTS TABLE

    Mark Name                                        MultiCore SingleCore    Scaling
    ----------------------------------------------- ---------- ---------- ----------
    CoreMark-PRO                                      25016.00    6079.70       4.11
    34.3.1.2. P51 maintenance history