diff --git a/README.adoc b/README.adoc
index f54f95b..2524cfe 100644
--- a/README.adoc
+++ b/README.adoc
@@ -12643,6 +12643,56 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"
 
 An example of such file can be seen at: <>.
 
+On Ubuntu 20.04, you can also see the dot file "directly" with xdot:
+
+....
+xdot "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot"
+....
+
+which is really cool because it allows you to follow arrows with clicks.
+
+It is worth noting that if you are running a bunch of short simulations, dot/SVG/PDF generation can have a significant impact on simulation startup time, so it is something to watch out for. As per https://gem5-review.googlesource.com/c/public/gem5/+/29232 it can be turned off with:
+
+....
+gem5.opt --dot-config=''
+....
+
+or in LKMC:
+
+....
+./run --gem5-exe-args='--dot-config= --json-config= --dump-config='
+....
+
+The time difference can be readily observed on minimal examples by running gem5 with `time`.
+
+By looking into gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1 `src/python/m5/util/dot_writer.py` we can try to remove the SVG/PDF conversion to see if those dominate the runtime:
+
+....
+def do_dot(root, outdir, dotFilename):
+    if not pydot:
+        warn("No dot file generated. " +
+             "Please install pydot to generate the dot file and pdf.")
+        return
+    # * use ranksep > 1.0 for vertical separation between nodes
+    # especially useful if you need to annotate edges using e.g. visio
+    # which accepts svg format
+    # * no need for horizontal separation as nothing moves horizontally
+    callgraph = pydot.Dot(graph_type='digraph', ranksep='1.3')
+    dot_create_nodes(root, callgraph)
+    dot_create_edges(root, callgraph)
+    dot_filename = os.path.join(outdir, dotFilename)
+    callgraph.write(dot_filename)
+    try:
+        # dot crashes if the figure is extremely wide.
+        # So avoid terminating simulation unnecessarily
+        callgraph.write_svg(dot_filename + ".svg")
+        callgraph.write_pdf(dot_filename + ".pdf")
+    except:
+        warn("failed to generate dot output from %s", dot_filename)
+....
+
+but nope, they don't: `dot_create_nodes` and `dot_create_edges` are the culprits, so the only way to gain speed is to remove `.dot` generation altogether. It is tempting to do this by default on LKMC and add an option to enable dot generation when desired, so that we are a bit faster by default... but I'm too lazy to document the option right now. Maybe when it annoys me further :-)
+
 === m5term
 
 We use the `m5term` in-tree executable to connect to the terminal instead of a direct `telnet`.
@@ -13217,7 +13267,7 @@ Implementations:
 
 Useful to <>.
 
-====== gem5 `TiminSimpleCPU`
+====== gem5 `TimingSimpleCPU`
 
 `TimingSimpleCPU`: memory accesses are realistic, but the CPU has no pipeline. The simulation is faster than detailed models, but slower than `AtomicSimpleCPU`.
 
@@ -13241,6 +13291,8 @@ The weird name "Minor" stands for "M (TODO what is M) IN ONder".
 
 Its 4 stage pipeline is described at the "MinorCPU" section of <>.
 
+A commented execution example can be seen at: <>.
+
 There is also an in-tree doxygen at: https://github.com/gem5/gem5/blob/9fc9c67b4242c03f165951775be5cd0812f2a705/src/doc/inside-minor.doxygen[`src/doc/inside-minor.doxygen`] and rendered at: http://pages.cs.wisc.edu/~swilson/gem5-docs/minor.html
 
 As of 2019, in-order cores are mostly present in low power/cost contexts, for example little cores of https://en.wikipedia.org/wiki/ARM_big.LITTLE[ARM bigLITTLE].
@@ -13267,10 +13319,12 @@ Implemented by Pierre-Yves Péneau from LIRMM, which is a research lab in Montpe
 
 ===== gem5 DerivO3CPU
 
-Generic out-of-order core. "O3" Stands for "Out Of Order"!
+Generic <>. "O3" stands for "Out Of Order"!
 
 Analogous to <>, but modelling an out of order core instead of in order.
 
+A commented execution example can be seen at: <>.
+
 Existing parametrizations:
 
 * `ex5_big`: big corresponding to `ex5_LITTLE`, by same author at same time. It description reads:
@@ -15789,6 +15843,16 @@ Fault STXRX64::completeAcc(PacketPtr pkt, ExecContext *xc,
 
 From GDB on <> we see that `completeAcc` gets called from `TimingSimpleCPU::completeDataAccess`.
 
+===== gem5 microops
+
+TODO
+
+Some gem5 instructions break down into multiple microops.
+
+Microops are very similar to regular instructions, and show up in the <> since that flag implies `ExecMicro`.
+
+On aarch64 for example, one of the simplest microoped instructions is <>, which does the relatively complex operation of storing two values to memory at once, and is therefore a good candidate for being broken down into microops.
+
 ==== gem5 port system
 
 The gem5 memory system is connected in a very flexible way through the port system.
@@ -16652,6 +16716,36 @@ BaseSimpleCPU::BaseSimpleCPU(BaseSimpleCPUParams *p)
 }
 ....
 
+==== gem5 functional units
+
+TODO
+
+Each instruction is marked with a class, and each class can only execute in the functional units that support it.
+
+Which units are available can be seen for example in the <> of a <> run. Functional units are not present in simple CPUs like <>.
+
+For example, on gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1, the `config.ini` of the following MinorCPU run:
+
+....
+./run \
+  --arch aarch64 \
+  --emulator gem5 \
+  --userland userland/arch/aarch64/freestanding/linux/hello.S \
+  --trace-insts-stdout \
+  -N1 \
+  -- \
+  --cpu-type MinorCPU \
+  --caches
+....
+
+contains:
+
+....
+[system.cpu]
+type=MinorCPU
+children=branchPred dcache dtb executeFuncUnits icache interrupts isa itb power_state tracer workload
+....
+
 ==== gem5 code generation
 
 gem5 uses a ton of code generation, which makes the project horrendous:
@@ -24159,6 +24253,35 @@ Oh my God, unoptimized code is so horrendously inefficient, even I can't stand a
 
 == Computer architecture
 
+=== Instruction pipelining
+
+In gem5, it can be seen in:
+
+* <>
+* <>
+
+==== Classic RISC pipeline
+
+https://en.wikipedia.org/wiki/Classic_RISC_pipeline
+
+gem5's <> implements a similar but 4-stage pipeline. TODO why didn't they go with the classic 5-stage RISC pipeline instead?
+
+=== Superscalar processor
+
+https://en.wikipedia.org/wiki/Superscalar_processor
+
+http://www.lighterra.com/papers/modernmicroprocessors/ explains it well.
+
+You basically decode
+
+TODO in gem5? gem5 definitely has functional units explicitly modelled: <>, so do <> or <> have it?
+
+=== Out-of-order execution
+
+https://en.wikipedia.org/wiki/Out-of-order_execution
+
+gem5's model is <>.
+
 === Hardware threads
 
 Intel name: "Hyperthreading"
diff --git a/userland/cpp/atomic/main.hpp b/userland/cpp/atomic/main.hpp
index 1fed060..d8ea55f 100644
--- a/userland/cpp/atomic/main.hpp
+++ b/userland/cpp/atomic/main.hpp
@@ -28,7 +28,7 @@ void threadMain(size_t niters) {
             "incq %0;"
             : "+g" (global),
               "+g" (i) // to prevent loop unrolling, and make results more comparable across methods,
-                       // see also: https://cirosantilli.com/linux-kernel-module-cheat#infinite-busy-loop
+                       // see also: https://cirosantilli.com/linux-kernel-module-cheat#c-busy-loop
             :
             :
         );
diff --git a/userland/gcc/busy_loop.c b/userland/gcc/busy_loop.c
index 7a3e296..d715b6e 100644
--- a/userland/gcc/busy_loop.c
+++ b/userland/gcc/busy_loop.c
@@ -1,5 +1,5 @@
 /* https://cirosantilli.com/linux-kernel-module-cheat#micro-benchmarks
- * https://cirosantilli.com/linux-kernel-module-cheat#infinite-busy-loop
+ * https://cirosantilli.com/linux-kernel-module-cheat#c-busy-loop
  * https://cirosantilli.com/linux-kernel-module-cheat#benchmark-emulators-on-userland-executables
  */
 #include
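The instruction overlap behind the "Classic RISC pipeline" link in the README hunk above can be illustrated with a minimal Python sketch. It assumes an idealized 5-stage pipeline (IF, ID, EX, MEM, WB) with no hazards, stalls or forwarding, and one new instruction entering per cycle. `pipeline_table` and `format_table` are hypothetical helpers, not part of gem5 or LKMC, and real models such as MinorCPU are far more detailed than this.

```python
"""Idealized classic RISC pipeline: which stage each instruction occupies per cycle.

Assumptions (hypothetical model, not gem5): 5 stages, one instruction enters
per cycle, no hazards, no stalls, no forwarding.
"""

STAGES = ("IF", "ID", "EX", "MEM", "WB")


def pipeline_table(n_insts, stages=STAGES):
    """Return rows where rows[i][c] is the stage of instruction i at cycle c.

    Instruction i enters the first stage at cycle i and advances one stage
    per cycle, so it occupies stage s at cycle i + s. An empty string means
    the instruction is not in the pipeline at that cycle.
    """
    n_cycles = n_insts + len(stages) - 1
    rows = []
    for i in range(n_insts):
        row = [""] * n_cycles
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


def format_table(rows):
    """Render one line per instruction, with cells padded to a fixed width."""
    width = max((len(cell) for row in rows for cell in row), default=0)
    lines = []
    for i, row in enumerate(rows):
        cells = " ".join(cell.ljust(width) for cell in row)
        lines.append(f"i{i}: {cells}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(pipeline_table(3)))
```

Under these assumptions, `n` instructions finish in `n + 5 - 1` cycles. For example, 3 instructions take 7 cycles instead of the 15 an unpipelined machine spending 5 cycles per instruction would need.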