diff --git a/index.html b/index.html index 6ea8ed0..44bfc51 100644 --- a/index.html +++ b/index.html @@ -1326,13 +1326,14 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 21.8. Benchmarks @@ -3223,7 +3224,7 @@ unzip lkmc-*.zip
    -
    ./build-modules --gcc-which host --host
    +
    ./build-modules --host
    @@ -3234,7 +3235,7 @@ unzip lkmc-*.zip
    -
    ./build-modules --gcc-which host --host -- hello hello2
    +
    ./build-modules --host -- hello hello2
    @@ -19918,6 +19919,9 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:

    From the message, however, this appears to be a Python / pyenv11 bug and not a gem5-specific one. I think it worked when I tried it in the past on an older gem5 / Ubuntu.

    +
    +

    --without-tcmalloc is needed / a good idea when using --with-asan: https://stackoverflow.com/questions/42712555/address-sanitizer-fsanitize-address-works-with-tcmalloc since both do more or less similar jobs, see also Memory leaks.

    +

    19.16.4. gem5 Ruby build

    @@ -20728,6 +20732,9 @@ AtomicSimpleCPU::tick() at atomic.cc:757 0x55555907834c
    19.19.4.2. gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis
    +

    TODO: better analyze what each of the memory events means. For now, we have just collected a bunch of data there, but it still needs interpreting. The CPU specifics in this section are already insightful, however.

    +
    +

    TimingSimpleCPU should be the second simplest CPU to analyze, so let’s give it a try:

    @@ -21735,7 +21742,7 @@ make menuconfig

    Also mentioned at: https://stackoverflow.com/questions/47320800/how-to-clean-only-target-in-buildroot

    -

    See this for a sample manual workaround: Section 21.8.2.4, “PARSEC uninstall”.

    +

    See this for a sample manual workaround: Section 21.8.3.4, “PARSEC uninstall”.

    @@ -23675,24 +23682,41 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    git submodule update --init submodules/dhrystone
    -./build-dhrystone --mode userland
    +./build-dhrystone --optimization-level 3
     ./run --userland "$(./getvar userland_build_dir)/submodules/dhrystone/dhrystone"
    +

    TODO automate run more nicely to dispense with getvar.

    +
    +
    +

    Increase the number of loops to try and reach more meaningful results:

    +
    +
    +
    +
    ./run --userland "$(./getvar userland_build_dir)/submodules/dhrystone/dhrystone" --userland-args 100000000
    +
    +
    +

    Build and run on gem5 user mode:

    -
    ./build-dhrystone --mode userland --static --force-rebuild
    -./run --emulator gem5 --userland "$(./getvar userland_build_dir)/submodules/dhrystone/dhrystone"
    +
    ./build-dhrystone --optimization-level 3 --static
    +./run --emulator gem5 --userland "$(./getvar --static userland_build_dir)/submodules/dhrystone/dhrystone"
    -

    TODO automate run more nicely.

    +

    Run natively on the host:

    +
    +
    +
    +
    ./build-dhrystone --host
    +"$(./getvar --host userland_build_dir)/submodules/dhrystone/dhrystone"
    +
    -

    Build for Baremetal execution and run it in baremetal QEMU:

    +

    Build for Baremetal execution and run it in baremetal QEMU. TODO: fix the build, just need to factor out all run arguments from build-baremetal into common.py and it should just work, no missing syscalls.

    @@ -23703,9 +23727,6 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    -

    TODO: fix the build, just need to factor out all run arguments from build-baremetal into common.py and it should just work, no missing syscalls.

    -
    -

    If you really want the Buildroot package for some reason, build it with:

    @@ -23723,7 +23744,81 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    -

    21.8.2. PARSEC benchmark

    +

    21.8.2. STREAM benchmark

    + +
    +

    Very simple memory bandwidth benchmark with one C and one Fortran version. It was originally published in 1991, and the latest version at the time of writing is from 2013.

    +
    +
    +

    Its operation is very simple: fork one thread for each CPU in the system (using OpenMP) and do the following four array operations (4 separate loops of individual operations):

    +
    +
    +
    +
    /* Copy. */
    +times[0 * ntimes + k] = mysecond();
    +#pragma omp parallel for
    +for (j=0; j<stream_array_size; j++)
    +    c[j] = a[j];
    +times[0 * ntimes + k] = mysecond() - times[0 * ntimes + k];
    +
    +/* Scale. */
    +times[1 * ntimes + k] = mysecond();
    +#pragma omp parallel for
    +for (j=0; j<stream_array_size; j++)
    +    b[j] = scalar*c[j];
    +times[1 * ntimes + k] = mysecond() - times[1 * ntimes + k];
    +
    +/* Add. */
    +times[2 * ntimes + k] = mysecond();
    +#pragma omp parallel for
    +for (j=0; j<stream_array_size; j++)
    +    c[j] = a[j]+b[j];
    +times[2 * ntimes + k] = mysecond() - times[2 * ntimes + k];
    +
    +/* Triad. */
    +times[3 * ntimes + k] = mysecond();
    +#pragma omp parallel for
    +for (j=0; j<stream_array_size; j++)
    +    a[j] = b[j]+scalar*c[j];
    +times[3 * ntimes + k] = mysecond() - times[3 * ntimes + k];
    +}
    +
    +
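    The timings collected by the four loops above are typically turned into a bandwidth figure by dividing the bytes each kernel moves by the elapsed time. The following is a minimal sketch of that arithmetic, assuming double elements and the usual STREAM counting convention (Copy and Scale touch two arrays, Add and Triad touch three). The names stream_kernel, stream_bytes_per_element and stream_bandwidth_mbs are hypothetical helpers for illustration and are not part of the STREAM source:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration, not taken from stream.c. */
enum stream_kernel { STREAM_COPY, STREAM_SCALE, STREAM_ADD, STREAM_TRIAD };

/* Bytes moved per array element, assuming double elements:
 * Copy and Scale read one array and write one,
 * Add and Triad read two arrays and write one. */
static size_t stream_bytes_per_element(enum stream_kernel kernel) {
    switch (kernel) {
        case STREAM_COPY:
        case STREAM_SCALE:
            return 2 * sizeof(double);
        case STREAM_ADD:
        case STREAM_TRIAD:
            return 3 * sizeof(double);
    }
    return 0;
}

/* Bandwidth in MB/s (1 MB = 10^6 bytes) for one timed pass of a
 * kernel over array_size elements that took the given seconds. */
static double stream_bandwidth_mbs(enum stream_kernel kernel,
                                   size_t array_size, double seconds) {
    assert(seconds > 0.0);
    return 1.0e-6 * (double)stream_bytes_per_element(kernel)
                  * (double)array_size / seconds;
}
```

    For example, under these assumptions a Triad pass over 1000000 elements that takes one second corresponds to roughly 24 MB/s on a platform with 8-byte doubles.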
    + +
    +

    The LKMC usage of STREAM is analogous to that of Dhrystone. Build and run on QEMU User mode simulation:

    +
    +
    +
    +
    git submodule update --init submodules/stream-benchmark
    +./build-stream --optimization-level 3
    +./run --userland "$(./getvar userland_build_dir)/submodules/stream-benchmark/stream_c.exe"
    +
    +
    +
    +

    Decrease the benchmark size and the retry count to finish simulation faster, but possibly have a less representative result:

    +
    +
    +
    +
    ./run --userland "$(./getvar userland_build_dir)/submodules/stream-benchmark/stream_c.exe" --userland-args '100 2'
    +
    +
    +
    +

    Build and run on gem5 user mode:

    +
    +
    +
    +
    ./build-stream --optimization-level 3 --static
    +./run --emulator gem5 --userland "$(./getvar --static userland_build_dir)/submodules/stream-benchmark/stream_c.exe" --userland-args '1000 2'
    +
    +
    +
    +
    +

    21.8.3. PARSEC benchmark

    We have ported parts of the PARSEC benchmark for cross compilation at: https://github.com/cirosantilli/parsec-benchmark See the documentation on that repo to find out which benchmarks have been ported. Some of the benchmarks were segfaulting; they are documented in that repo.

    @@ -23741,7 +23836,7 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    -
    21.8.2.1. PARSEC benchmark without parsecmgmt
    +
    21.8.3.1. PARSEC benchmark without parsecmgmt
    ./build --arch arm --download-dependencies gem5-buildroot parsec-benchmark
    @@ -23775,7 +23870,7 @@ cblas_dgemm(      CblasColMajor, CblasNoTrans, CblasTrans,3,3,2  ,1,    A,3,  B,
     
    -
    21.8.2.2. PARSEC change the input size
    +
    21.8.3.2. PARSEC change the input size

    Running a benchmark of a size different than test, e.g. simsmall, requires a rebuild with:

    @@ -23839,7 +23934,7 @@ cblas_dgemm( CblasColMajor, CblasNoTrans, CblasTrans,3,3,2 ,1, A,3, B,
    -
    21.8.2.3. PARSEC benchmark with parsecmgmt
    +
    21.8.3.3. PARSEC benchmark with parsecmgmt

    Most users won’t want to use this method because:

    @@ -23902,7 +23997,7 @@ parsecmgmt -a run -p splash2x.fmm -i test
    -
    21.8.2.4. PARSEC uninstall
    +
    21.8.3.4. PARSEC uninstall

    If you want to remove PARSEC later, Buildroot doesn’t provide an automated package removal mechanism as mentioned at: Section 20.6, “Remove Buildroot packages”, but the following procedure should be satisfactory:

    @@ -23920,7 +24015,7 @@ parsecmgmt -a run -p splash2x.fmm -i test
    -
    21.8.2.5. PARSEC benchmark hacking
    +
    21.8.3.5. PARSEC benchmark hacking

    If you end up going inside submodules/parsec-benchmark to hack up the benchmark (you will!), these tips will be helpful.