From 42d86576cd45878d5db448cfc1c0f111924de2f9 Mon Sep 17 00:00:00 2001
From: Ciro Santilli
Date: Sat, 24 Feb 2018 05:27:03 +0000
Subject: [PATCH] Move to the more automated gem5-bench benchmarking script.

Enable everything in the toolchain in preparation for future benchmarking
to prevent future rebuilds, notably C++, Fortran and LTO support.

Document compiler optimizations for benchmarking.

Document graph-build for monitoring build times.
---
 README.adoc               | 53 +++++++++++++++++++++++++++++++--------
 buildroot_config_fragment |  6 +++++
 gem5-bench                | 17 +++++++++++++
 gem5-cycles               |  2 --
 4 files changed, 66 insertions(+), 12 deletions(-)
 create mode 100755 gem5-bench
 delete mode 100755 gem5-cycles

diff --git a/README.adoc b/README.adoc
index b8eeba7..daf834b 100644
--- a/README.adoc
+++ b/README.adoc
@@ -1533,24 +1533,24 @@ https://stackoverflow.com/questions/48944587/how-to-count-the-number-of-cpu-cloc
 Let's benchmark https://en.wikipedia.org/wiki/Dhrystone[Dhrystone] which Buildroot provides:
 
 ....
-./run -a arm -e 'init=/eval.sh - lkmc_eval="m5 checkpoint;m5 dumpstats;dhrystone 1000;m5 exit"' -g
-./gem5-cycles
+./gem5-bench dhrystone 1000
 ....
 
-`./gem5-cycles` outputs the approximate number of CPU cycles it took Dhrystone to run. A few possible problems are:
+This initial run generates a <> after the kernel boots and before running the benchmark.
+
+Then we can speed up further benchmark runs by skipping the Linux kernel boot:
+
+....
+./gem5-bench -r dhrystone 1000
+....
+
+These commands output the approximate number of CPU cycles it took Dhrystone to run. A few possible problems are:
 
 * when we do `m5 dumpstats`, there is some time passed before the `exec` system call returns and the actual benchmark starts
 * the benchmark outputs to stdout, which means so extra cycles in addition to the actual computation. But TODO: how to get the output to check that it is correct without such IO cycles?
 
 Those problems should be insignificant if the benchmark runs for long enough however.
 
-We can then speed up further benchmark runs by skipping the Linux kernel boot:
-
-....
-./run -a arm -e 'init=/eval.sh - lkmc_eval="m5 dumpstats;dhrystone 1000;m5 exit"' -g -- -r 1
-./gem5-cycles
-....
-
 TODO: the cycle counts on the original run and the one with checkpoint restore differ slightly. Why? Multiple checkpoint restores give the same results however.
 
 Now you can play a fun little game with your friends:
@@ -1575,6 +1575,26 @@ Each time we run `m5 dumpstats`, a section with the following format is added to
 
 TODO: diff out all the stats, not just `system.cpu.numCycles`.
 
+====== Enable compiler optimizations
+
+If you are benchmarking compiled programs instead of hand written assembly, remember that we configure Buildroot to disable optimizations by default with:
+
+....
+BR2_OPTIMIZE_0=y
+....
+
+to improve the debugging experience.
+
+You will likely want to change that to:
+
+....
+BR2_OPTIMIZE_3=y
+....
+
+and do a full rebuild.
+
+TODO is it possible to compile a single package with optimizations enabled? In any case, this wouldn't be very representative, since calls to an unoptimized libc will also have an impact on performance. Kernel-wise it should be fine though, since the kernel requires `O=2`.
+
 ===== GEM5 kernel boot command line arguments
 
 Analogous <>:
@@ -2231,6 +2251,19 @@ diff .config.olg .config
 
 Copy and paste the diff additions to `buildroot_config_fragment`.
 
+==== What is making my build so slow?
+
+....
+cd buildroot/output.x86_64~
+make graph-build
+xdg-open graphs/build.pie-packages.pdf
+....
+
+Our philosophy is:
+
+* if something adds little to the build time, build it in by default
+* otherwise, make it optional
+
 === About
 
 This project is for people who want to learn and modify low level system components:
diff --git a/buildroot_config_fragment b/buildroot_config_fragment
index d5ba558..55af557 100644
--- a/buildroot_config_fragment
+++ b/buildroot_config_fragment
@@ -1,3 +1,7 @@
+BR2_ENABLE_LOCALE=y
+BR2_GCC_ENABLE_GRAPHITE=y
+BR2_GCC_ENABLE_LTO=y
+BR2_GCC_ENABLE_OPENMP=y
 BR2_GLOBAL_PATCH_DIR="../global_patch_dir"
 BR2_PACKAGE_BUSYBOX_CONFIG_FRAGMENT_FILES="../busybox_config_fragment"
 BR2_PACKAGE_DHRYSTONE=y
@@ -12,6 +16,8 @@ BR2_ROOTFS_POST_IMAGE_SCRIPT="../rootfs_post_image_script"
 BR2_ROOTFS_USERS_TABLES="../user_table"
 BR2_TARGET_ROOTFS_CPIO=y
 BR2_TARGET_ROOTFS_EXT2=y
+BR2_TOOLCHAIN_BUILDROOT_CXX=y
+BR2_TOOLCHAIN_BUILDROOT_FORTRAN=y
 BR2_TOOLCHAIN_BUILDROOT_WCHAR=y
 
 # Host GDB
diff --git a/gem5-bench b/gem5-bench
new file mode 100755
index 0000000..2d7d065
--- /dev/null
+++ b/gem5-bench
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+replay=false
+while getopts r OPT; do
+  case "$OPT" in
+    r)
+      replay=true
+      ;;
+  esac
+done
+shift "$(($OPTIND - 1))"
+bench="$@"
+if "$replay"; then
+  ./run -a arm -e 'init=/eval.sh - lkmc_eval="m5 resetstats;'"$bench"';m5 exit"' -g -- -r 1
+else
+  ./run -a arm -e 'init=/eval.sh - lkmc_eval="m5 checkpoint;m5 resetstats;'"$bench"';m5 exit"' -g
+fi
+awk '/^system.cpu.numCycles /{ print $2 }' m5out/stats.txt
diff --git a/gem5-cycles b/gem5-cycles
deleted file mode 100755
index f16fd04..0000000
--- a/gem5-cycles
+++ /dev/null
@@ -1,2 +0,0 @@
-#!/usr/bin/env bash
-grep numCycles m5out/stats.txt | awk '{t0 = $2; getline; print $2 - t0; exit;}'
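
Note on the final `awk` line of `gem5-bench`: as far as this patch shows, the script now runs `m5 resetstats` right before the benchmark, so it reads the `system.cpu.numCycles` value directly instead of subtracting two dumps as the deleted `gem5-cycles` did. The sketch below illustrates that extraction on its own, assuming a `stats.txt` in the usual gem5 layout. The `num_cycles` function name and the sample numbers are hypothetical and do not come from a real run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the patch): print the value column of every
# system.cpu.numCycles line in a gem5 stats file, like the awk call in gem5-bench.
num_cycles() {
  awk '/^system.cpu.numCycles /{ print $2 }' "$1"
}

# Made-up excerpt in the stats.txt layout; the numbers are illustrative only.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
---------- Begin Simulation Statistics ----------
sim_seconds                                  0.000617
system.cpu.numCycles                          1234567                       # number of cpu cycles simulated
---------- End Simulation Statistics   ----------
EOF

# Prints the second whitespace-separated field of the matching line: 1234567
num_cycles "$sample"
rm -f "$sample"
```

If `stats.txt` holds more than one statistics section, this prints one line per section, so a caller would still have to pick the section it cares about.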