diff --git a/README.adoc b/README.adoc
index 467ac54..208ff41 100644
--- a/README.adoc
+++ b/README.adoc
@@ -5530,8 +5530,8 @@ A few imperfections of our benchmarking method are:
 
 Solutions to these problems include:
 
-* modify benchmark code with instrumentation directly, as PARSEC and ARM employees have been doing: https://github.com/arm-university/arm-gem5-rsk/blob/aa3b51b175a0f3b6e75c9c856092ae0c8f2a7cdc/parsec_patches/xcompile-patch.diff#L230
-* monitor known addresses
+* modify benchmark code with instrumentation directly; see <> for an example.
+* monitor known addresses. TODO: is this possible? Create an example.
 
 Discussion at: https://stackoverflow.com/questions/48944587/how-to-count-the-number-of-cpu-clock-cycles-between-the-start-and-end-of-a-bench/48944588#48944588
 
@@ -6243,23 +6243,39 @@ Cycles instead of instructions:
 
 Otherwise the simulation runs forever by default.
 
-=== m5
+=== m5ops
 
-`m5` is a guest command line utility that is installed and run on the guest.
+m5ops are magic instructions which lead gem5 to do magic things, like quitting or dumping stats.
 
-Its source is present under the gem5 main tree.
+Documentation: http://gem5.org/M5ops
 
-It generates magic instructions, which lead gem5 to do magic things, like `dumpstats` or `exit`.
+There are two main ways to use m5ops:
 
-It is however under-documented, so let's document some of its capabilities here.
+* <>
+* <>
 
-Part of those explanations could be deduced from the documentation of the magic instructions themselves: http://gem5.org/M5ops
+`m5` is convenient if you only want to take snapshots before or after the benchmark, without altering its source code. It uses the <> as its backend.
 
-==== m5 exit
+`m5` cannot or should not be used, however:
+
+* in bare metal setups
+* when you want to call the instructions from inside points of interest of your benchmark. Otherwise you add the syscall overhead to the benchmark, which is more intrusive and might affect results.
++
+Why not just hardcode some <> as in our example instead, since you are going to modify the source of the benchmark anyway?
+
+==== m5
+
+`m5` is a guest command line utility that is installed and run on the guest, and serves as a CLI front-end for the <>.
+
+Its source is present in the gem5 tree: https://github.com/gem5/gem5/blob/6925bf55005c118dc2580ba83e0fa10b31839ef9/util/m5/m5.c
+
+It is possible to guess what most subcommands do from the corresponding <>, but let's at least document the less obvious ones here.
+
+===== m5 exit
 
 Quit gem5 with exit status 0.
 
-==== m5 fail
+===== m5 fail
 
 Quit gem5 with the given exit status.
 
@@ -6267,7 +6283,7 @@ Quit gem5 with the given exit status.
 m5 fail 1
 ....
 
-==== m5 writefile
+===== m5 writefile
 
 Send a guest file to the host. <<9p>> is a more advanced alternative.
 
@@ -6290,7 +6306,7 @@ Does not work for subdirectories, gem5 crashes:
 m5 writefile myfileguest mydirhost/myfilehost
 ....
 
-==== m5 readfile
+===== m5 readfile
 
 https://stackoverflow.com/questions/49516399/how-to-use-m5-readfile-and-m5-execfile-in-gem5/49538051#49538051
 
@@ -6306,7 +6322,7 @@ Guest:
 m5 readfile
 ....
 
-==== m5 execfile
+===== m5 execfile
 
 Host:
 
@@ -6323,6 +6339,92 @@ chmod +x /tmp/execfile
 m5 execfile
 ....
 
+==== m5ops instructions
+
+The executable `/m5ops.out` illustrates how to hardcode, with inline assembly, the m5ops that you are most likely to hack into the benchmark you are analysing:
+
+....
+# checkpoint
+/m5ops.out c
+# dumpstats
+/m5ops.out d
+# exit
+/m5ops.out e
+# resetstats
+/m5ops.out r
+....
+
+Source: link:kernel_module/user/m5ops.c[]
+
+That executable is of course a subset of <> and useless by itself: its goal is only to illustrate how to hardcode some <> yourself as one-liners.
+
+In theory, the cleanest way to add m5ops to your benchmarks would be to do exactly what the `m5` tool does:
+
+* include link:https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/include/gem5/asm/generic/m5ops.h[`include/gem5/asm/generic/m5ops.h`]
+* link with the `.o` file under `util/m5` for the correct arch, e.g. `m5op_arm_A64.o` for aarch64.
+
+However, I think it is usually not worth the trouble of hacking up the build system of the benchmark to do this, and I recommend just hardcoding in a few raw instructions here and there, and managing it with version control + `sed`.
+
+Related: https://www.mail-archive.com/gem5-users@gem5.org/msg15418.html
+
+===== m5ops instructions interface
+
+Let's study how <> uses them:
+
+* link:https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/include/gem5/asm/generic/m5ops.h[`include/gem5/asm/generic/m5ops.h`]: defines the magic constants that represent the instructions
+* link:https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/util/m5/m5op_arm_A64.S[`util/m5/m5op_arm_A64.S`]: uses the magic constants that represent the instructions, via C preprocessor magic
+* link:https://github.com/gem5/gem5/blob/05c4c2b566ce351ab217b2bd7035562aa7a76570/util/m5/m5.c[`util/m5/m5.c`]: the actual executable. It gets linked to `m5op_arm_A64.S`, which defines a function for each m5op.
+
+We notice that there are two different implementations for each arch:
+
+* magic instructions, which don't exist in the corresponding arch
+* magic memory addresses on a given page
+
+TODO: what is the advantage of magic memory addresses? They require more setup work, since you have to tell the kernel never to touch the magic page. For the magic instructions, the only thing that could go wrong is if you run some crazy kind of fuzzing workload that generates random instructions.
+
+Then, taking the aarch64 magic instructions as an example, the lines:
+
+....
+.macro m5op_func, name, func, subfunc
+    .globl \name
+    \name:
+    .long 0xff000110 | (\func << 16) | (\subfunc << 12)
+    ret
+....
+
+define a simple function for each m5op. Here we see that:
+
+* `0xff000110` is the base encoding of the magic, otherwise non-existent instruction
+* `\func` and `\subfunc` are OR-applied on top of the base encoding, and define which m5op this is.
++
+Those values are filled in by looping over the magic constants defined in `m5ops.h` with the deferred preprocessor idiom.
++
+For example, `exit` is `0x21` due to:
++
+....
+#define M5OP_EXIT 0x21
+....
+
+Finally, `m5.c` calls the defined functions as in:
+
+....
+m5_exit(ints[0]);
+....
+
+Therefore, the runtime "argument" that gets passed to the instruction, e.g. the desired exit status in the case of `exit`, gets passed directly through the link:https://en.wikipedia.org/wiki/Calling_convention#ARM_(A64)[aarch64 calling convention].
+
+That convention specifies that `x0` to `x7` contain the function arguments, so `x0` contains the first argument, and `x1` the second.
+
+In our `m5ops` example, we just hardcode everything in the assembly one-liners we are producing.
+
+We ignore the `\subfunc` since it is always 0 on the ops that interest us.
+
+===== m5op annotations
+
+`include/gem5/asm/generic/m5ops.h` also describes some annotation instructions.
+
+What they mean: https://stackoverflow.com/questions/50583962/what-are-the-gem5-annotations-mops-magic-instructions-and-how-to-use-them
+
 === gem5 arm Linux kernel patches
 
 https://gem5.googlesource.com/arm/linux/ contains an ARM Linux kernel fork with a few gem5 specific Linux kernel patches on top of mainline created by ARM Holdings.
diff --git a/kernel_module/user/m5ops.c b/kernel_module/user/m5ops.c
new file mode 100644
index 0000000..490d6ab
--- /dev/null
+++ b/kernel_module/user/m5ops.c
@@ -0,0 +1,59 @@
+#include <stdlib.h>
+
+#define ENABLED 1
+#if defined(__aarch64__)
+/* Each helper zeroes the m5op arguments (x0, x1), then emits the magic instruction. */
+static void m5_checkpoint(void)
+{
+    __asm__ __volatile__ ("mov x0, 0; mov x1, 0; .inst 0xff000110 | (0x43 << 16);" : : : "x0", "x1");
+}
+static void m5_dump_stats(void)
+{
+    __asm__ __volatile__ ("mov x0, 0; mov x1, 0; .inst 0xff000110 | (0x41 << 16);" : : : "x0", "x1");
+}
+static void m5_exit(void)
+{
+    __asm__ __volatile__ ("mov x0, 0; .inst 0xff000110 | (0x21 << 16);" : : : "x0");
+}
+static void m5_reset_stats(void)
+{
+    __asm__ __volatile__ ("mov x0, 0; mov x1, 0; .inst 0xff000110 | (0x40 << 16);" : : : "x0", "x1");
+}
+#else
+#undef ENABLED
+#define ENABLED 0
+#endif
+
+int main(
+#if ENABLED
+int argc, char **argv
+#else
+void
+#endif
+)
+{
+#if defined(__aarch64__)
+    char action;
+    if (argc > 1) {
+        action = argv[1][0];
+    } else {
+        action = 'e';
+    }
+    switch (action)
+    {
+        case 'c':
+            m5_checkpoint();
+            break;
+        case 'd':
+            m5_dump_stats();
+            break;
+        case 'e':
+            m5_exit();
+            break;
+        case 'r':
+            m5_reset_stats();
+            break;
+    }
+#endif
+    return EXIT_SUCCESS;
+}
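
The README hunk describes the aarch64 magic instruction word as `0xff000110 | (\func << 16) | (\subfunc << 12)`. The following is a minimal sketch of that arithmetic which can be checked at compile time on any host, without gem5 or an aarch64 toolchain. It is not code from gem5 or from this patch: `M5OP_BASE`, `M5OP_ENCODE` and every constant name except `M5OP_EXIT` are hypothetical names chosen for illustration, and the function numbers are simply the ones hardcoded in `m5ops.c`.

```c
#include <stdint.h>

/* Base word of the aarch64 magic instruction, as quoted in the README text. */
#define M5OP_BASE 0xff000110u

/* Function numbers taken from the inline assembly in m5ops.c.
 * Only M5OP_EXIT is named in the README; the other names are hypothetical. */
#define M5OP_EXIT        0x21u
#define M5OP_RESET_STATS 0x40u
#define M5OP_DUMP_STATS  0x41u
#define M5OP_CHECKPOINT  0x43u

/* OR func and subfunc on top of the base word, mirroring the .long expression. */
#define M5OP_ENCODE(func, subfunc) \
    ((uint32_t)(M5OP_BASE | ((uint32_t)(func) << 16) | ((uint32_t)(subfunc) << 12)))

/* Compile-time checks of the resulting instruction words (subfunc is 0 for all four). */
_Static_assert(M5OP_ENCODE(M5OP_EXIT, 0) == 0xff210110u, "exit");
_Static_assert(M5OP_ENCODE(M5OP_RESET_STATS, 0) == 0xff400110u, "resetstats");
_Static_assert(M5OP_ENCODE(M5OP_DUMP_STATS, 0) == 0xff410110u, "dumpstats");
_Static_assert(M5OP_ENCODE(M5OP_CHECKPOINT, 0) == 0xff430110u, "checkpoint");
```

For instance, `M5OP_ENCODE(M5OP_EXIT, 0)` is the same value as the `.inst 0xff000110 | (0x21 << 16)` operand in `m5_exit`. A macro is used instead of a function so that the values stay usable in constant expressions.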