diff --git a/index.html b/index.html index e867f2c..f0a0c49 100644 --- a/index.html +++ b/index.html @@ -1056,7 +1056,14 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 17.8.5. QEMU trace multicore
  • -
  • 17.8.6. gem5 tracing
  • +
  • 17.8.6. gem5 tracing + +
  • 17.9. QEMU GUI is unresponsive
  • @@ -1074,7 +1081,8 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 18.2.2.2. gem5 cache size
  • @@ -16497,11 +16505,14 @@ reverse-continue
    -
    ./run --arch aarch64 --eval 'm5 exit' --emulator gem5 --trace Exec
    +
    ./run --arch aarch64 --eval 'm5 exit' --emulator gem5 --trace ExecAll
     less "$(./getvar --arch aarch64 run_dir)/trace.txt"
    +

    Keep in mind however that the disassembly is very broken in several places as of 2019q2, so you can’t always trust it.

    +
    +

    Output the trace to stdout instead of a file:

    @@ -16549,6 +16560,22 @@ less "$(./getvar gem5_source_dir)/src/cpu/exetrace.cc"
    +

    The most important trace flags to know about are:

    +
    +
    + +
    +

    The traces are generated from DPRINTF(<trace-id>, ...) calls scattered throughout the code.
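    The gating idea can be modeled in a few lines. The sketch below is a hypothetical Python toy, not gem5's C++ DPRINTF. Each call names a flag and does nothing unless that flag was enabled for the run, in the way ExecAll and Registers are enabled with --trace in the commands of this section. The class name, method name and format string are made up. The output prefix only mimics the sample Registers trace shown in this section.

```python
from typing import Callable, List, Set


class ToyDebugPrinter:
    """Toy model of flag-gated debug output in the spirit of gem5's DPRINTF.

    Hypothetical illustration only: the class, method and format string are
    made up here and are not gem5 code.
    """

    def __init__(self, enabled_flags: Set[str], now: Callable[[], int]) -> None:
        self.enabled_flags = set(enabled_flags)  # flags selected for this run
        self.now = now                           # returns the current tick
        self.lines: List[str] = []               # collected output lines

    def dprintf(self, flag: str, obj_name: str, fmt: str, *args: object) -> None:
        # Disabled flags return immediately, before any string formatting.
        if flag not in self.enabled_flags:
            return
        # Prefix mimics the sample traces: right-aligned tick, then object name.
        self.lines.append(f"{self.now():7d}: {obj_name}: " + fmt % args)


tick = 31500
printer = ToyDebugPrinter({"ExecAll", "Registers"}, lambda: tick)
printer.dprintf("Registers", "system.cpu.[tid:0]", "Reading int reg %d (%d) as %#x.", 0, 0, 1)
printer.dprintf("SomeOtherFlag", "system.cpu", "never printed: %d", 42)
print("\n".join(printer.lines))
```

    Because the check happens before any formatting, this toy shows why disabled flags are cheap. The cost only appears once a flag is turned on.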

    @@ -16564,6 +16591,24 @@ less "$(./getvar gem5_source_dir)/src/cpu/exetrace.cc"

    Enabling tracing made the runtime about 4x slower on the P51, with or without .gz compression.

    +

    Trace the source lines just like for QEMU with:

    +
    +
    +
    +
    ./trace-boot --arch aarch64 --emulator gem5
    +./trace2line --arch aarch64 --emulator gem5
    +less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt"
    +
    +
    +
    +

    TODO: 7452d399290c9c1fc6366cdad129ef442f323564 ./trace2line is too slow and takes hours. QEMU’s processing of 170k events takes 7 seconds. gem5’s processing is analogous, but there are 140M events, so it should take roughly 7 s × 140M / 170k ≈ 5800 seconds, i.e. about 1.6 hours, which seems consistent with what I observe, so maybe there is no way to speed this up… The workaround is to just use gem5’s ExecSymbol to get function granularity, and then GDB individually if line detail is needed?
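    This assumes trace2line's cost per event is the same for QEMU and gem5. Under that assumption, the linear scaling from QEMU's measured throughput to gem5's event count works out to:

$$
t_{\text{gem5}} \approx 7\,\text{s} \times \frac{140 \times 10^{6}\ \text{events}}{170 \times 10^{3}\ \text{events}} \approx 7\,\text{s} \times 824 \approx 5.8 \times 10^{3}\,\text{s} \approx 1.6\,\text{h}
$$

    That is in the same range as the multi-hour runtime observed.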

    +
    +
    +
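    ExecAll lines already carry the @symbol field, as the sample traces in this section show. That suggests one cheap way to get function granularity without a per-line source lookup: collapse runs of consecutive lines that fall in the same symbol. The helper below is a hypothetical Python sketch. It is not part of LKMC, and it is not what trace2line or ExecSymbol do. It counts trace lines, microop lines included, rather than instructions, and it assumes symbol names contain no '+', '.' or ':'.

```python
import re
from itertools import groupby
from typing import Iterable, List, Tuple

# "@symbol" in an ExecAll line, stopping before any "+offset" or ".microop" suffix.
# Lines without " @" (e.g. Registers debug lines) are ignored.
SYMBOL = re.compile(r"\s@([^\s+.:]+)")


def function_runs(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse consecutive trace lines in the same symbol into (symbol, line_count)."""
    symbols = (m[1] for m in map(SYMBOL.search, lines) if m)
    return [(sym, sum(1 for _ in run)) for sym, run in groupby(symbols)]
```

    For example, function_runs(open(path)) on an ExecAll trace.txt yields the sequence of functions in execution order, with how many trace lines each visit produced.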
    17.8.6.1. gem5 ExecAll trace format
    +
    +

    This debug flag traces all instructions.

    +
    +

    The output format is of type:

    @@ -16597,7 +16642,7 @@ less "$(./getvar gem5_source_dir)/src/cpu/exetrace.cc"

    25007500: time count in some unit. Note how the microops execute at further timestamps.

  • -

    system.cpu: distinguishes between CPUs when there are more than one

    +

    system.cpu: distinguishes between CPUs when there is more than one. For example, running Section 26.8.3, “ARM multicore” with two cores produces system.cpu0 and system.cpu1.

  • T0: thread number. TODO: hyperthread? How to play with it?

    @@ -16609,7 +16654,7 @@ less "$(./getvar gem5_source_dir)/src/cpu/exetrace.cc"

    .1 as in @start_kernel.1: index of the microop

  • -

    stp: instruction disassembly. Seems to use .isa files dispersed per arch, which is an in house format: http://gem5.org/ISA_description_system

    +

    stp: instruction disassembly. Note however that the disassembly of many instructions is very broken as of 2019q2, so you can’t just trust it blindly.

  • strxi_uop x29, [ureg0]: microop disassembly.

    @@ -16632,18 +16677,138 @@ less "$(./getvar gem5_source_dir)/src/cpu/exetrace.cc"

    The best way to verify all of this is to write some baremetal code.
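    When checking a trace against baremetal code, it helps to split each ExecAll line into the fields listed above. The sketch below is a hypothetical Python parser, not part of LKMC or gem5, and its regex was written only against the sample lines in this document. Several details are assumptions:

    - The "A0" field seen in the samples is optional.
    - The +offset is printed in decimal.
    - Symbol names contain no '+', '.' or ':'.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Regex written against the ExecAll sample lines in this document; a sketch,
# not an official grammar of gem5's exetrace output.
EXEC_LINE = re.compile(
    r"^\s*(?P<tick>\d+):\s+"          # timestamp, e.g. 31500
    r"(?P<cpu>\S+)\s+"                # system.cpu, system.cpu0, system.cpu1, ...
    r"(?:A(?P<a_field>\d+)\s+)?"      # optional "A0" field seen in the samples
    r"T(?P<thread>\d+)\s*:\s*"        # thread number, e.g. T0
    r"@(?P<symbol>[^\s+.:]+)"         # symbol; assumed to contain no '+', '.' or ':'
    r"(?:\+(?P<offset>\d+))?"         # optional offset into the symbol
    r"(?:\.(?P<micro>\d+))?"          # optional microop index, as in @start_kernel.1
    r"\s*:\s*(?P<disasm>.*?)\s*:\s*"  # instruction or microop disassembly
    r"(?P<op_class>\w+)\s*:"          # op class, e.g. IntAlu
    r"(?P<rest>.*)$"                  # D=..., flags=..., etc.
)


@dataclass
class ExecRecord:
    tick: int
    cpu: str
    thread: int
    symbol: str
    offset: int            # assumed to be printed in decimal
    micro: Optional[int]   # None for the macro-instruction line itself
    disasm: str
    op_class: str
    data: Optional[int]    # value of the D= field, if present


def parse_exec_line(line: str) -> Optional[ExecRecord]:
    """Parse one ExecAll line; return None for anything else (e.g. Registers lines)."""
    m = EXEC_LINE.match(line)
    if m is None:
        return None
    d = re.search(r"\bD=(0x[0-9a-fA-F]+)", m["rest"])
    return ExecRecord(
        tick=int(m["tick"]),
        cpu=m["cpu"],
        thread=int(m["thread"]),
        symbol=m["symbol"],
        offset=int(m["offset"]) if m["offset"] is not None else 0,
        micro=int(m["micro"]) if m["micro"] is not None else None,
        disasm=" ".join(m["disasm"].split()),  # collapse alignment padding
        op_class=m["op_class"],
        data=int(d[1], 16) if d else None,
    )


sample = ("  31500: system.cpu A0 T0 : @main_after_prologue+4    :   add   x1, x0, #2"
          "         : IntAlu :  D=0x0000000000000003  flags=(IsInteger)")
print(parse_exec_line(sample))
```

    With records in hand, checks such as "the D= value of each add matches what the baremetal code computes" become simple comparisons. Given the 2019q2 disassembly issues noted above, the data values are likely a safer thing to check than the disasm text.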

    +
  • +
    +
    17.8.6.2. gem5 Registers trace format
    -

    Trace the source lines just like for QEMU with:

    +

    This flag shows more detailed register usage than the gem5 ExecAll trace format.

    +
    +
    +

    For example, if we run in LKMC 0323e81bff1d55b978a4b36b9701570b59b981eb:

    -
    ./trace-boot --arch aarch64 --emulator gem5
    -./trace2line --arch aarch64 --emulator gem5
    -less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt"
    +
    ./run --arch aarch64 --baremetal userland/arch/aarch64/add.S --emulator gem5 --trace ExecAll,Registers --trace-stdout
    -

    TODO: 7452d399290c9c1fc6366cdad129ef442f323564 ./trace2line this is too slow and takes hours. QEMU’s processing of 170k events takes 7 seconds. gem5’s processing is analogous, but there are 140M events, so it should take 7000 seconds ~ 2 hours which seems consistent with what I observe, so maybe there is no way to speed this up…​ The workaround is to just use gem5’s ExecSymbol to get function granularity, and then GDB individually if line detail is needed?

    +

    then the stdout contains:

    +
    +
    +
    +
      31000: system.cpu A0 T0 : @main_after_prologue    :   movz   x0, #1, #0        : IntAlu :  D=0x0000000000000001  flags=(IsInteger)
    +  31500: system.cpu.[tid:0]: Setting int reg 34 (34) to 0.
    +  31500: system.cpu.[tid:0]: Reading int reg 0 (0) as 0x1.
    +  31500: system.cpu.[tid:0]: Setting int reg 1 (1) to 0x3.
    +  31500: system.cpu A0 T0 : @main_after_prologue+4    :   add   x1, x0, #2         : IntAlu :  D=0x0000000000000003  flags=(IsInteger)
    +  32000: system.cpu.[tid:0]: Setting int reg 34 (34) to 0.
    +  32000: system.cpu.[tid:0]: Reading int reg 1 (1) as 0x3.
    +  32000: system.cpu.[tid:0]: Reading int reg 31 (34) as 0.
    +  32000: system.cpu.[tid:0]: Setting int reg 0 (0) to 0x3.
    +
    +
    +
    +

    which corresponds to the following two instructions:

    +
    +
    +
    +
    mov x0, 1
    +add x1, x0, 2
    +
    +
    +
    +

    TODO: that format is either buggy or very difficult to understand:

    +
    +
    +
      +
    • +

      what is 34? Presumably some flags register?

      +
    • +
    • +

      what do the numbers in parentheses mean, as in 31 (34)? Presumably some flags register?

      +
    • +
    • +

      why is the first instruction setting reg 1 and the second one reg 0, given that the first sets x0 and the second x1?

      +
    • +
    +
    +
    +
    +
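    One way to line the Registers lines up with the ExecAll lines is to group the register accesses by tick. The helper below is a hypothetical Python sketch, not part of LKMC, and it handles only the "int reg" lines shown in the sample above. In that sample, the accesses at tick 31500 are:

    - read reg 0 as 0x1
    - set reg 1 to 0x3

    These match add x1, x0, #2, whose ExecAll line is printed after them at the same tick. So the sample is at least consistent with each instruction's register accesses being printed before its own ExecAll line. That is an inference from this one sample, not something confirmed in the gem5 source.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Matches lines like "  31500: system.cpu.[tid:0]: Reading int reg 0 (0) as 0x1."
# Written against the sample above only; other register classes are not handled.
REG_LINE = re.compile(
    r"^\s*(?P<tick>\d+):\s+(?P<cpu>\S+?)\.\[tid:(?P<tid>\d+)\]:\s+"
    r"(?P<action>Setting|Reading) int reg (?P<reg>\d+) "
    r"\((?P<paren>\d+)\) "               # number in parentheses; not interpreted here
    r"(?:to|as) (?P<value>0x[0-9a-fA-F]+|\d+)\.$"
)


def group_reg_accesses(lines: Iterable[str]) -> Dict[int, List[Tuple[str, int, int]]]:
    """Map tick -> [(action, reg, value), ...] in trace order; other lines are skipped."""
    groups: Dict[int, List[Tuple[str, int, int]]] = defaultdict(list)
    for line in lines:
        m = REG_LINE.match(line)
        if m:
            groups[int(m["tick"])].append((m["action"], int(m["reg"]), int(m["value"], 0)))
    return dict(groups)


trace = """\
  31000: system.cpu A0 T0 : @main_after_prologue    :   movz   x0, #1, #0        : IntAlu :  D=0x0000000000000001  flags=(IsInteger)
  31500: system.cpu.[tid:0]: Setting int reg 34 (34) to 0.
  31500: system.cpu.[tid:0]: Reading int reg 0 (0) as 0x1.
  31500: system.cpu.[tid:0]: Setting int reg 1 (1) to 0x3.
  31500: system.cpu A0 T0 : @main_after_prologue+4    :   add   x1, x0, #2         : IntAlu :  D=0x0000000000000003  flags=(IsInteger)
  32000: system.cpu.[tid:0]: Setting int reg 34 (34) to 0.
  32000: system.cpu.[tid:0]: Reading int reg 1 (1) as 0x3.
  32000: system.cpu.[tid:0]: Reading int reg 31 (34) as 0.
  32000: system.cpu.[tid:0]: Setting int reg 0 (0) to 0x3.
"""

for t, accesses in group_reg_accesses(trace.splitlines()).items():
    print(t, accesses)
```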
    17.8.6.3. gem5 TARMAC traces
    + +
    +
    +
    17.8.6.4. gem5 tracing internals
    +
    +

    As of gem5 16eeee5356585441a49d05c78abc328ef09f7ace the default tracer is ExeTracer. It is set at:

    +
    +
    +
    +
    src/cpu/BaseCPU.py:63:default_tracer = ExeTracer()
    +
    +
    +
    +

    which then gets used at:

    +
    +
    +
    +
    class BaseCPU(ClockedObject):
    +    [...]
    +    tracer = Param.InstTracer(default_tracer, "Instruction tracer")
    +
    +
    +
    +

    All tracers derive from the common InstTracer base class:

    +
    +
    +
    +
    git grep ': InstTracer'
    +
    +
    +
    +

    gives:

    +
    +
    +
    +
    src/arch/arm/tracers/tarmac_parser.hh:218:    TarmacParser(const Params *p) : InstTracer(p), startPc(p->start_pc),
    +src/arch/arm/tracers/tarmac_tracer.cc:57:  : InstTracer(p),
    +src/cpu/exetrace.hh:67:    ExeTracer(const Params *params) : InstTracer(params)
    +src/cpu/inst_pb_trace.cc:72:    : InstTracer(p), buf(nullptr), bufSize(0), curMsg(nullptr)
    +src/cpu/inteltrace.hh:63:    IntelTrace(const IntelTraceParams *p) : InstTracer(p)
    +
    +
    +
    +

    As mentioned in gem5 TARMAC traces, there currently appears to be no way to select those tracers without hacking the config scripts.

    +
    +
    +

    TARMAC is described at: gem5 TARMAC traces.

    +
    +
    +

    TODO: are IntelTrace and TarmacParser useful for anything or just relics?

    +
    +
    +

    Then there is also the NativeTrace class:

    +
    +
    +
    +
    src/cpu/nativetrace.hh:68:class NativeTrace : public ExeTracer
    +
    +
    +
    +

    which is implemented for a few different ISAs, but not all:

    +
    +
    +
    +
    src/arch/arm/nativetrace.hh:40:class ArmNativeTrace : public NativeTrace
    +src/arch/sparc/nativetrace.hh:41:class SparcNativeTrace : public NativeTrace
    +src/arch/x86/nativetrace.hh:41:class X86NativeTrace : public NativeTrace
    +
    +
    +
    +

    TODO: I can’t find any usages of those classes in in-tree configs.

    +
    @@ -17073,7 +17238,13 @@ ps Haux | grep qemu | wc
    -
    18.2.2.1.3. gem5 ARM full system with more than 8 cores
    +
    18.2.2.1.3. gem5 se.py user mode with 2 or more pthreads fails with “because simulate() limit reached”
    +
    +

    See bug report at: https://github.com/cirosantilli/linux-kernel-module-cheat/issues/81

    +
    +
    +
    +
    18.2.2.1.4. gem5 ARM full system with more than 8 cores

    https://stackoverflow.com/questions/50248067/how-to-run-a-gem5-arm-aarch64-full-system-simulation-with-fs-py-with-more-than-8

    @@ -18750,7 +18921,7 @@ git -C "$(./getvar linux_source_dir)" checkout -

    drm: Add component-aware simple encoder allows you to see images through VNC, see: Section 13.3, “gem5 graphic mode”

  • -

    gem5: Add support for gem5’s extended GIC mode adds support for more than 8 cores, see: Section 18.2.2.1.3, “gem5 ARM full system with more than 8 cores”

    +

    gem5: Add support for gem5’s extended GIC mode adds support for more than 8 cores, see: Section 18.2.2.1.4, “gem5 ARM full system with more than 8 cores”

  • @@ -23926,7 +24097,7 @@ ldmia sp!, reglist
    -
    dest = `left & ~right`
    +
    dest = left & ~right