functional units stub

2026-01-23 02:05:57 +01:00 · 2020-06-10 02:00:01 +00:00
parent 6a5b9673c7
commit 0a3ce2f41f
3 changed files with 127 additions and 4 deletions
--- a/README.adoc
+++ b/README.adoc
@@ -12643,6 +12643,56 @@ xdg-open "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot.svg"

 An example of such file can be seen at: <<config-dot-svg-timingsimplecpu>>.

+On Ubuntu 20.04, you can also see the dot file "directly" with xdot:
+
+....
+xdot "$(./getvar --arch arm --emulator gem5 m5out_dir)/config.dot"
+....
+
+which is really cool because it lets you follow the graph's arrows with clicks.
+
+It is worth noting that if you are running a bunch of short simulations, dot/SVG/PDF generation could have a significant impact on simulation startup time, so it is something to watch out for. As per https://gem5-review.googlesource.com/c/public/gem5/+/29232 it can be turned off with:
+
+....
+gem5.opt --dot-config=''
+....
+
+or in LKMC:
+
+....
+./run --gem5-exe-args='--dot-config= --json-config= --dump-config='
+....
+
+The time difference can be readily observed on minimal examples by running gem5 with `time`.
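A small Python helper can make the comparison less noisy than a single `time` run. This is a minimal sketch. It assumes it is run from the LKMC root with gem5 already built. It reuses the `./run` options shown in this section. The helper name `best_time` is hypothetical, not part of LKMC. It takes the fastest of a few runs to reduce interference from other processes.

```python
import subprocess
import time

# Hypothetical comparison reusing command lines from this section: the second
# one only adds the gem5 options that disable config.dot, config.json and
# config.ini generation. Assumes we are in the LKMC root and gem5 is built.
WITH_DOT = [
    "./run",
    "--arch", "aarch64",
    "--emulator", "gem5",
    "--userland", "userland/arch/aarch64/freestanding/linux/hello.S",
]
WITHOUT_DOT = WITH_DOT + [
    "--gem5-exe-args=--dot-config= --json-config= --dump-config=",
]


def best_time(argv, repeat=3):
    """Run argv `repeat` times and return the fastest wall-clock time in seconds.

    Taking the minimum reduces noise from other processes on the machine.
    """
    times = []
    for _ in range(repeat):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return min(times)
```

Usage: `print(best_time(WITH_DOT), best_time(WITHOUT_DOT))`.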
+
+Looking at `src/python/m5/util/dot_writer.py` in gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1, we can try to remove the SVG/PDF conversion to see if it dominates the runtime:
+
+....
+def do_dot(root, outdir, dotFilename):
+    if not pydot:
+        warn("No dot file generated. " +
+             "Please install pydot to generate the dot file and pdf.")
+        return
+    # * use ranksep > 1.0 for for vertical separation between nodes
+    # especially useful if you need to annotate edges using e.g. visio
+    # which accepts svg format
+    # * no need for hoizontal separation as nothing moves horizonally
+    callgraph = pydot.Dot(graph_type='digraph', ranksep='1.3')
+    dot_create_nodes(root, callgraph)
+    dot_create_edges(root, callgraph)
+    dot_filename = os.path.join(outdir, dotFilename)
+    callgraph.write(dot_filename)
+    try:
+        # dot crashes if the figure is extremely wide.
+        # So avoid terminating simulation unnecessarily
+        callgraph.write_svg(dot_filename + ".svg")
+        callgraph.write_pdf(dot_filename + ".pdf")
+    except:
+        warn("failed to generate dot output from %s", dot_filename)
+....
+
+But nope, it doesn't: `dot_create_nodes` and `dot_create_edges` are the culprits, so the only way to gain speed is to remove `.dot` generation altogether. It is tempting to disable it by default in LKMC and add an option to enable dot generation when desired, so that we are a bit faster by default... but I'm too lazy to document the option right now. Maybe when it annoys me further :-)
+
 === m5term

 We use the `m5term` in-tree executable to connect to the terminal instead of a direct `telnet`.
@@ -13217,7 +13267,7 @@ Implementations:

 Useful to <<gem5-restore-checkpoint-with-a-different-cpu,boot Linux fast and then checkpoint and switch to a more detailed CPU>>.

-====== gem5 `TiminSimpleCPU`
+====== gem5 `TimingSimpleCPU`

 `TimingSimpleCPU`: memory accesses are realistic, but the CPU has no pipeline. The simulation is faster than detailed models, but slower than `AtomicSimpleCPU`.

@@ -13241,6 +13291,8 @@ The weird name "Minor" stands for "M (TODO what is M) IN ONder".

 Its 4 stage pipeline is described at the "MinorCPU" section of <<gem5-arm-rsk>>.

+A commented execution example can be seen at: <<gem5-event-queue-minorcpu-syscall-emulation-freestanding-example-analysis>>.
+
 There is also an in-tree doxygen at: https://github.com/gem5/gem5/blob/9fc9c67b4242c03f165951775be5cd0812f2a705/src/doc/inside-minor.doxygen[`src/doc/inside-minor.doxygen`] and rendered at: http://pages.cs.wisc.edu/~swilson/gem5-docs/minor.html

 As of 2019, in-order cores are mostly present in low power/cost contexts, for example little cores of https://en.wikipedia.org/wiki/ARM_big.LITTLE[ARM bigLITTLE].
@@ -13267,10 +13319,12 @@ Implemented by Pierre-Yves Péneau from LIRMM, which is a research lab in Montpe

 ===== gem5 DerivO3CPU

-Generic out-of-order core. "O3" Stands for "Out Of Order"!
+Generic <<out-of-order-execution,out-of-order core>>. "O3" stands for "Out Of Order"!

 Analogous to <<gem5-minorcpu,MinorCPU>>, but modelling an out of order core instead of in order.

+A commented execution example can be seen at: <<gem5-event-queue-derivo3cpu-syscall-emulation-freestanding-example-analysis>>.
+
 Existing parametrizations:

 * `ex5_big`: big corresponding to `ex5_LITTLE`, by same author at same time. It description reads:
@@ -15789,6 +15843,16 @@ Fault STXRX64::completeAcc(PacketPtr pkt, ExecContext *xc,

 From GDB on <<timingsimplecpu-analysis-ldr-stall>> we see that `completeAcc` gets called from `TimingSimpleCPU::completeDataAccess`.

+===== gem5 microops
+
+TODO
+
+Some gem5 instructions break down into multiple microops.
+
+Microops are very similar to regular instructions. They show up on the <<gem5-execall-trace-format>> because that flag implies `ExecMicro`.
+
+On aarch64 for example, one of the simplest microoped instructions is <<armv8-aarch64-ldp-and-stp-instructions,STP>>, which does the relatively complex operation of storing two values to memory at once, and is therefore a good candidate for being broken down into microops.
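To make the idea concrete, here is a toy Python model of splitting a signed-offset `STP Xt1, Xt2, [Xn, #imm]` into two single-register stores. Architecturally, STP writes `Xt1` to `Xn + imm` and `Xt2` to the next 8 bytes. This is only an illustration. `StoreMicroop` and `split_stp` are hypothetical names, and gem5's real decoder may split the instruction differently. For example, the pre/post-indexed forms would also need a microop that writes the new address back to the base register.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreMicroop:
    """Hypothetical microop: store one 64-bit register to base + offset."""
    src: str
    base: str
    offset: int


def split_stp(rt1, rt2, base, imm):
    """Split a signed-offset `STP rt1, rt2, [base, #imm]` on X registers into
    two stores to consecutive 8-byte slots.

    Toy model, not gem5's decoder: writeback forms are not handled.
    """
    return [
        StoreMicroop(rt1, base, imm),      # first register at base + imm
        StoreMicroop(rt2, base, imm + 8),  # second register right after it
    ]


for uop in split_stp("x0", "x1", "sp", 16):
    print(uop)
```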
+
 ==== gem5 port system

 The gem5 memory system is connected in a very flexible way through the port system.
@@ -16652,6 +16716,36 @@ BaseSimpleCPU::BaseSimpleCPU(BaseSimpleCPUParams *p)
    }
 ....

+==== gem5 functional units
+
+TODO
+
+Each instruction is marked with a class, and each class can execute in a given functional unit.
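The class to unit relationship can be sketched as a lookup table. The class names below mimic gem5 `OpClass` values such as `IntAlu`, `IntMult` and `MemRead`. However, the pool layout and the helper functions `units_for`/`pick_unit` are made up for illustration and are not gem5's actual defaults. If every unit that accepts an instruction's class is busy, the instruction has to wait.

```python
# Hypothetical, simplified functional unit pool: each unit lists the
# instruction classes it accepts. Names mimic gem5 OpClass values, but the
# layout itself is invented for this example.
FU_POOL = {
    "IntFU0": {"IntAlu"},
    "IntFU1": {"IntAlu"},
    "IntMulFU": {"IntMult"},
    "IntDivFU": {"IntDiv"},
    "MemFU": {"MemRead", "MemWrite"},
}


def units_for(op_class):
    """Return the sorted names of the units that can execute op_class."""
    return sorted(name for name, classes in FU_POOL.items() if op_class in classes)


def pick_unit(op_class, busy):
    """Return the first free unit for op_class, or None if it must stall."""
    for name in units_for(op_class):
        if name not in busy:
            return name
    return None
```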
+
+The available units can be seen for example in the <<gem5-config-ini>> of a <<gem5-minorcpu>> run. Functional units are not present in simple CPUs like <<gem5-timingsimplecpu>>.
+
+For example, on gem5 872cb227fdc0b4d60acc7840889d567a6936b6e1, the `config.ini` of a minor run:
+
+....
+./run \
+  --arch aarch64 \
+  --emulator gem5 \
+  --userland userland/arch/aarch64/freestanding/linux/hello.S \
+  --trace-insts-stdout \
+  -N1 \
+  -- \
+  --cpu-type MinorCPU \
+  --caches
+....
+
+contains:
+
+....
+[system.cpu]
+type=MinorCPU
+children=branchPred dcache dtb executeFuncUnits icache interrupts isa itb power_state tracer workload
+....
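The check can also be done from a script with Python's standard `configparser`. This is a minimal sketch that assumes `config.ini` keeps the plain `key=value` layout shown above. The function name `cpu_children` is hypothetical. On a real run, you would pass it the contents of `$(./getvar --arch aarch64 --emulator gem5 m5out_dir)/config.ini` instead of the inline excerpt.

```python
import configparser


def cpu_children(config_ini_text, section="system.cpu"):
    """Return the child SimObject names listed in one section of a gem5 config.ini."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # the excerpt uses key=value lines
        interpolation=None,  # treat values as plain strings, no %(name)s expansion
    )
    parser.optionxform = str  # keep key case as written instead of lowercasing
    parser.read_string(config_ini_text)
    return parser[section]["children"].split()


# Excerpt of the config.ini shown above.
EXAMPLE = """\
[system.cpu]
type=MinorCPU
children=branchPred dcache dtb executeFuncUnits icache interrupts isa itb power_state tracer workload
"""

children = cpu_children(EXAMPLE)
print("executeFuncUnits" in children)  # prints True
```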
+
 ==== gem5 code generation

 gem5 uses a ton of code generation, which makes the project horrendous:
@@ -24159,6 +24253,35 @@ Oh my God, unoptimized code is so horrendously inefficient, even I can't stand a

 == Computer architecture

+=== Instruction pipelining
+
+In gem5, pipelining can be seen on:
+
+* <<gem5-minorcpu>>
+* <<gem5-derivo3cpu>>
+
+==== Classic RISC pipeline
+
+https://en.wikipedia.org/wiki/Classic_RISC_pipeline
+
+gem5's <<gem5-minorcpu>> implements a similar pipeline, but with 4 stages instead of the classic 5. TODO why didn't they go with the classic RISC pipeline instead?
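The classic pipeline has the 5 stages IF (fetch), ID (decode), EX (execute), MEM (memory access) and WB (register writeback). In the ideal case, instruction `i` enters IF at cycle `i`, so `N` instructions finish in `N + 4` cycles. The following Python sketch prints that textbook timing diagram. It assumes no stalls, hazards or forwarding, and it is not a model of any gem5 CPU.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions):
    """Return one row per instruction with its stage in each cycle ("" when idle),
    for an ideal 5-stage pipeline with no stalls."""
    n_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = []
        for cycle in range(n_cycles):
            stage = cycle - i  # instruction i enters IF at cycle i
            row.append(STAGES[stage] if 0 <= stage < len(STAGES) else "")
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(3)):
    print(f"i{i}: " + " ".join(f"{s:>3}" for s in row))
```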
+
+=== Superscalar processor
+
+https://en.wikipedia.org/wiki/Superscalar_processor
+
+http://www.lighterra.com/papers/modernmicroprocessors/ explains it well.
+
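+You basically fetch and decode several instructions per cycle and dispatch them in parallel to multiple functional units, as long as they don't depend on each other.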
+
+TODO: how is this modelled in gem5? gem5 definitely models functional units explicitly (<<gem5-functional-units>>), so are <<gem5-minorcpu>> and <<gem5-derivo3cpu>> superscalar?
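Here is a toy Python model of the idea. It uses an in-order core that issues up to `width` instructions per cycle. A dependent instruction cannot issue in the same cycle as its producer. It assumes a 1 cycle latency for every instruction, unlimited functional units, and only read-after-write dependencies. The function name is hypothetical, and this is not how gem5 models it. With independent instructions, doubling the width halves the cycle count. A dependency chain gets no benefit at all.

```python
def inorder_superscalar_cycles(program, width):
    """Count issue cycles for an in-order core issuing up to `width`
    instructions per cycle.

    program: list of (dest, srcs) tuples in program order. Assumes every result
    is usable the cycle after it issues, so an instruction cannot issue in the
    same cycle as the producer of one of its sources.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    cycles = 0
    i = 0
    while i < len(program):
        cycles += 1
        written_this_cycle = set()
        issued = 0
        while i < len(program) and issued < width:
            dest, srcs = program[i]
            if written_this_cycle.intersection(srcs):
                break  # in order: a dependent instruction blocks the rest of the group
            written_this_cycle.add(dest)
            issued += 1
            i += 1
    return cycles


INDEPENDENT = [("x0", ()), ("x1", ()), ("x2", ()), ("x3", ())]
CHAIN = [("x0", ()), ("x1", ("x0",)), ("x2", ("x1",)), ("x3", ("x2",))]
print(inorder_superscalar_cycles(INDEPENDENT, 2), inorder_superscalar_cycles(CHAIN, 2))  # prints 2 4
```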
+
+=== Out-of-order execution
+
+https://en.wikipedia.org/wiki/Out-of-order_execution
+
+gem5's model is <<gem5-derivo3cpu>>.
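The benefit shows up when a long-latency instruction, such as a load, is followed by independent work. Below is a minimal single-issue Python sketch contrasting strict in-order issue with "issue the oldest ready instruction". It assumes registers are already renamed, so each destination is written once. It also assumes every source is either a program input or written by an earlier instruction, and that functional units are unlimited. It is a toy model, not DerivO3CPU's actual algorithm. In the example, the out-of-order version hides part of the 3 cycle load latency behind the two independent instructions.

```python
import math

# Each instruction is (dest, srcs, latency in cycles).
PROGRAM = [
    ("x1", ("x0",), 3),  # long-latency load
    ("x2", ("x1",), 1),  # depends on the load
    ("x3", (), 1),       # independent
    ("x4", (), 1),       # independent
]


def inorder_cycles(program):
    """Issue strictly in program order, stalling until operands are ready."""
    ready_at = {}
    cycle = 0
    for dest, srcs, latency in program:
        start = max([cycle] + [ready_at.get(s, 0) for s in srcs])
        ready_at[dest] = start + latency
        cycle = start + 1  # at most one issue per cycle
    return max(ready_at.values(), default=0)


def out_of_order_cycles(program):
    """Each cycle, issue the oldest instruction whose operands are ready."""
    ready_at = {dest: math.inf for dest, _, _ in program}  # not produced yet
    written = set()
    for dest, srcs, _ in program:
        # Reject programs that would deadlock this toy model.
        if any(s in ready_at and s not in written for s in srcs):
            raise ValueError("sources must be written by earlier instructions")
        written.add(dest)
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        for idx in pending:  # oldest first
            dest, srcs, latency = program[idx]
            if all(ready_at.get(s, 0) <= cycle for s in srcs):
                ready_at[dest] = cycle + latency
                pending.remove(idx)
                break  # single issue: one instruction per cycle
        cycle += 1
    return max(ready_at.values(), default=0)


print(inorder_cycles(PROGRAM), out_of_order_cycles(PROGRAM))  # prints 6 4
```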
+
 === Hardware threads

 Intel name: "Hyperthreading"