diff --git a/index.html b/index.html
index e867f2c..f0a0c49 100644
--- a/index.html
+++ b/index.html
@@ -1056,7 +1056,14 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
./run --arch aarch64 --eval 'm5 exit' --emulator gem5 --trace Exec
+./run --arch aarch64 --eval 'm5 exit' --emulator gem5 --trace ExecAll
less "$(./getvar --arch aarch64 run_dir)/trace.txt"
Keep in mind however that the disassembly is very broken in several places as of 2019q2, so you can’t always trust it.
+Output the trace to stdout instead of a file:
The most important trace flags to know about are:
+Faults: CPU exceptions / interrupts, see an example at: ARM SVC instruction
The traces are generated from DPRINTF(<trace-id>, ...) calls scattered throughout the code.
Enabling tracing made the runtime about 4x slower on the P51, with or without .gz compression.
Trace the source lines just like for QEMU with:
+./trace-boot --arch aarch64 --emulator gem5
+./trace2line --arch aarch64 --emulator gem5
+less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt"
TODO: 7452d399290c9c1fc6366cdad129ef442f323564 ./trace2line is too slow and takes hours. QEMU’s processing of 170k events takes 7 seconds. gem5’s processing is analogous, but there are 140M events, so it should take about 5800 seconds, roughly 1.6 hours, which seems consistent with what I observe, so maybe there is no way to speed this up… The workaround is to just use gem5’s ExecSymbol to get function granularity, and then use GDB individually if line detail is needed?
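The runtime estimate above is plain proportional scaling, and it can be checked with a few lines of Python. This is only a sketch: it assumes that processing time grows linearly with the number of trace events, and the variable names are illustrative and not part of LKMC.

```python
# Back-of-the-envelope estimate of the ./trace2line runtime on a gem5 trace.
# Assumption: processing time scales linearly with the number of events.
qemu_events = 170_000        # events in the QEMU trace, from the note above
qemu_seconds = 7             # observed QEMU processing time
gem5_events = 140_000_000    # events in the gem5 trace, from the note above

seconds_per_event = qemu_seconds / qemu_events
gem5_seconds = gem5_events * seconds_per_event
gem5_hours = gem5_seconds / 3600

print(f"{gem5_seconds:.0f} s ~ {gem5_hours:.1f} h")
```

The result is on the order of one to two hours, which matches the observed "takes hours" behaviour.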
This debug flag traces all instructions.
+The output format is of type:
25007500: time count in simulation ticks (one tick is 1 picosecond with gem5’s default settings). Note how the microops execute at later timestamps.
system.cpu: distinguishes between CPUs when there are more than one
system.cpu: distinguishes between CPUs when there is more than one. For example, running Section 26.8.3, “ARM multicore” with two cores produces system.cpu0 and system.cpu1.
T0: thread number. TODO: hyperthread? How to play with it?
.1 as in @start_kernel.1: index of the microop
stp: instruction disassembly. Seems to use .isa files dispersed per arch, which is an in house format: http://gem5.org/ISA_description_system
stp: instruction disassembly. Note however that the disassembly of many instructions is very broken as of 2019q2, and you can’t just trust it blindly.
strxi_uop x29, [ureg0]: microop disassembly.
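To experiment with the fields listed above, the location field (for example @start_kernel.1) can be split into its parts with a small script. The following is a minimal sketch, not part of gem5 or LKMC: the helper name `parse_location` is hypothetical, and it assumes that the symbol name contains no `+` or `.` characters and that the offset is printed in decimal.

```python
import re

# Hypothetical helper: split the location field of an Exec trace line into
# symbol, offset from the symbol, and microop index.
# Assumptions: the symbol has no '+' or '.' in it, and the offset is decimal.
LOCATION_RE = re.compile(
    r"^@(?P<symbol>[^+.\s]+)"      # symbol name after the '@'
    r"(?:\+(?P<offset>\d+))?"      # optional '+<offset>' from the symbol
    r"(?:\.(?P<microop>\d+))?$"    # optional '.<index>' of the microop
)


def parse_location(field):
    """Return a dict for fields like '@start_kernel.1', or None if no match."""
    m = LOCATION_RE.match(field.strip())
    if m is None:
        return None
    microop = m.group("microop")
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset") or 0),
        "microop": None if microop is None else int(microop),
    }


print(parse_location("@start_kernel.1"))
print(parse_location("@main_after_prologue+4"))
```

A `microop` of `None` means the line describes a whole instruction rather than one of its microops.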
The best way to verify all of this is to write some baremetal code
Trace the source lines just like for QEMU with:
+This flag shows more detailed register usage than the gem5 ExecAll trace format.
+For example, if we run in LKMC 0323e81bff1d55b978a4b36b9701570b59b981eb:
./trace-boot --arch aarch64 --emulator gem5
-./trace2line --arch aarch64 --emulator gem5
-less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt"
./run --arch aarch64 --baremetal userland/arch/aarch64/add.S --emulator gem5 --trace ExecAll,Registers --trace-stdout
TODO: 7452d399290c9c1fc6366cdad129ef442f323564 ./trace2line is too slow and takes hours. QEMU’s processing of 170k events takes 7 seconds. gem5’s processing is analogous, but there are 140M events, so it should take about 5800 seconds, roughly 1.6 hours, which seems consistent with what I observe, so maybe there is no way to speed this up… The workaround is to just use gem5’s ExecSymbol to get function granularity, and then use GDB individually if line detail is needed?
then the stdout contains:
+31000: system.cpu A0 T0 : @main_after_prologue : movz x0, #1, #0 : IntAlu : D=0x0000000000000001 flags=(IsInteger)
+31500: system.cpu.[tid:0]: Setting int reg 34 (34) to 0.
+31500: system.cpu.[tid:0]: Reading int reg 0 (0) as 0x1.
+31500: system.cpu.[tid:0]: Setting int reg 1 (1) to 0x3.
+31500: system.cpu A0 T0 : @main_after_prologue+4 : add x1, x0, #2 : IntAlu : D=0x0000000000000003 flags=(IsInteger)
+32000: system.cpu.[tid:0]: Setting int reg 34 (34) to 0.
+32000: system.cpu.[tid:0]: Reading int reg 1 (1) as 0x3.
+32000: system.cpu.[tid:0]: Reading int reg 31 (34) as 0.
+32000: system.cpu.[tid:0]: Setting int reg 0 (0) to 0x3.
which corresponds to the two following instructions:
+mov x0, 1
+add x1, x0, 2
TODO that format is either buggy or is very difficult to understand:
+what is 34? Presumably some flags register?
what do the numbers in parentheses mean at 31 (34)? Presumably some flags register?
why is the first instruction setting reg 1 and the second one reg 0, given that the first sets x0 and the second x1?
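One way to investigate these questions is to parse the Registers lines and look at them next to the ExecAll lines that share the same timestamp. The following is a minimal sketch, not part of gem5 or LKMC: the function name `parse_register_line` is hypothetical, the line format is inferred only from the output shown above, and the field names `reg` and `paren` are placeholders, since the meaning of the number in parentheses is exactly what is unclear.

```python
import re

# Hypothetical parser for the Registers debug lines shown above.
# "reg" is the first register number, "paren" the one in parentheses;
# these names are placeholders and do not claim any particular meaning.
REG_RE = re.compile(
    r"^\s*(?P<tick>\d+): (?P<cpu>\S+?)\.\[tid:(?P<tid>\d+)\]: "
    r"(?P<op>Setting|Reading) int reg (?P<reg>\d+) \((?P<paren>\d+)\) "
    r"(?:to|as) (?P<value>0x[0-9a-fA-F]+|\d+)\.$"
)


def parse_register_line(line):
    """Return a dict for a Registers line, or None for any other line."""
    m = REG_RE.match(line)
    if m is None:
        return None  # e.g. an ExecAll line
    text = m.group("value")
    value = int(text, 16) if text.startswith("0x") else int(text)
    return {
        "tick": int(m.group("tick")),
        "cpu": m.group("cpu"),
        "tid": int(m.group("tid")),
        "op": m.group("op"),
        "reg": int(m.group("reg")),
        "paren": int(m.group("paren")),
        "value": value,
    }


sample = [
    "31500: system.cpu.[tid:0]: Setting int reg 34 (34) to 0.",
    "31500: system.cpu.[tid:0]: Reading int reg 0 (0) as 0x1.",
    "31500: system.cpu.[tid:0]: Setting int reg 1 (1) to 0x3.",
    "32000: system.cpu.[tid:0]: Reading int reg 31 (34) as 0.",
]
for line in sample:
    print(parse_register_line(line))
```

Collecting the parsed entries per tick makes it easier to see which register accesses surround each ExecAll line.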
As of gem5 16eeee5356585441a49d05c78abc328ef09f7ace the default tracer is ExeTracer. It is set at:
src/cpu/BaseCPU.py:63:default_tracer = ExeTracer()
which then gets used at:
+class BaseCPU(ClockedObject):
+    [...]
+    tracer = Param.InstTracer(default_tracer, "Instruction tracer")
All tracers derive from the common InstTracer base class:
git grep ': InstTracer'
gives:
+src/arch/arm/tracers/tarmac_parser.hh:218: TarmacParser(const Params *p) : InstTracer(p), startPc(p->start_pc),
+src/arch/arm/tracers/tarmac_tracer.cc:57: : InstTracer(p),
+src/cpu/exetrace.hh:67: ExeTracer(const Params *params) : InstTracer(params)
+src/cpu/inst_pb_trace.cc:72: : InstTracer(p), buf(nullptr), bufSize(0), curMsg(nullptr)
+src/cpu/inteltrace.hh:63: IntelTrace(const IntelTraceParams *p) : InstTracer(p)
As mentioned at gem5 TARMAC traces, there appears to be no way to select those currently without hacking the config scripts.
+TARMAC is described at: gem5 TARMAC traces.
+TODO: are IntelTrace and TarmacParser useful for anything or just relics?
Then there is also the NativeTrace class:
src/cpu/nativetrace.hh:68:class NativeTrace : public ExeTracer
which gets implemented in a few different ISAs, but not all:
+src/arch/arm/nativetrace.hh:40:class ArmNativeTrace : public NativeTrace
+src/arch/sparc/nativetrace.hh:41:class SparcNativeTrace : public NativeTrace
+src/arch/x86/nativetrace.hh:41:class X86NativeTrace : public NativeTrace
TODO: I can’t find any usages of those classes from in-tree configs.
+See bug report at: https://github.com/cirosantilli/linux-kernel-module-cheat/issues/81
+drm: Add component-aware simple encoder allows you to see images through VNC, see: Section 13.3, “gem5 graphic mode”
gem5: Add support for gem5’s extended GIC mode adds support for more than 8 cores, see: Section 18.2.2.1.3, “gem5 ARM full system with more than 8 cores”
gem5: Add support for gem5’s extended GIC mode adds support for more than 8 cores, see: Section 18.2.2.1.4, “gem5 ARM full system with more than 8 cores”
dest = `left & ~right`+
dest = left & ~right