gem5: atomic hello se.py event queue totally understood

This commit is contained in:
Ciro Santilli 六四事件 法轮功
2019-10-15 00:00:01 +00:00
parent 64ee7c4e70
commit 07f047f2d7


@@ -4567,7 +4567,7 @@ Build the kernel exactly as for <<graphic-mode-gem5-aarch64>> and then run with:
./run --arch aarch64 --dp650 --emulator gem5 --linux-build-id gem5-v4.15
....
==== gem5 graphic mode internals
We cannot use mainline Linux because the <<gem5-arm-linux-kernel-patches>> are required at least to provide the `CONFIG_DRM_VIRT_ENCODER` option.
@@ -10189,6 +10189,8 @@ gem5 also provides a tracing mechanism documented at: http://www.gem5.o
less "$(./getvar --arch aarch64 run_dir)/trace.txt"
....
Our wrapper just forwards the options to the `--debug-flags` gem5 option.
Keep in mind however that the disassembly is very broken in several places as of 2019q2, so you can't always trust it.
Output the trace to stdout instead of a file:
@@ -10214,6 +10216,8 @@ We also have a shortcut for `--trace ExecAll --trace-stdout` with `--trace-insts-
;
....
Be warned, the trace is humongous, at 16Gb.
This produces a lot of output however, so you will likely not want it when tracing the instructions of a full Linux kernel boot. But it can be very convenient for smaller traces such as <<baremetal>>.
List all available debug flags:
@@ -10236,12 +10240,10 @@ The most important trace flags to know about are:
* <<gem5-registers-trace-format,`Registers`>>
* <<gem5-syscall-emulation-mode-syscall-tracing,`SyscallBase`, `SyscallVerbose`>>
Trace internals are discussed at: <<gem5-trace-internals>>.
As can be seen in the `SConscript` files, `Exec` is just an alias that enables a set of flags.
We can make the trace smaller by naming the trace file `trace.txt.gz`, which enables GZIP compression. That is not currently exposed in our scripts however, since you usually just need something human readable to work on.
Enabling tracing made the runtime about 4x slower on the <<p51>>, with or without `.gz` compression.
@@ -10256,6 +10258,33 @@ less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt"
TODO: 7452d399290c9c1fc6366cdad129ef442f323564 `./trace2line` this is too slow and takes hours. QEMU's processing of 170k events takes 7 seconds. gem5's processing is analogous, but there are 140M events, so it should take 7000 seconds ~ 2 hours which seems consistent with what I observe, so maybe there is no way to speed this up... The workaround is to just use gem5's `ExecSymbol` to get function granularity, and then GDB individually if line detail is needed?
===== gem5 trace internals
gem5 traces are generated from `DPRINTF(<trace-id>` calls scattered throughout the code, except for `ExecAll` instruction traces, which use `Debug::ExecEnable` directly.
The trace IDs are themselves encoded in `SConscript` files, e.g.:
....
DebugFlag('Event')
....
in `src/cpu/SConscript`.
The build system then automatically adds those flags to the valid `--debug-flags` options.
For example, for the `ExecEnable` flag, the build system generates a file `build/ARM/debug/ExecEnable.hh`, which contains:
....
namespace Debug {
class SimpleFlag;
extern SimpleFlag ExecEnable;
}
....
and must be included from callers of `DPRINTF(` as `<debug/ExecEnable.hh>`.
Tested in b4879ae5b0b6644e6836b0881e4da05c64a6550d.
===== gem5 ExecAll trace format
This debug flag traces all instructions.
@@ -12410,6 +12439,12 @@ TODO: describe the main characteristics of each platform, as of gem5 5e83d703522
=== gem5 internals
Internals under other sections:
* <<gem5-trace-internals>>
* <<gem5-checkpoint-internals>>
* <<gem5-graphic-mode-internals>>
==== gem5 Eclipse configuration
In order to develop complex C++ software such as gem5, a good IDE setup is fundamental.
@@ -12424,6 +12459,8 @@ I recommend the following settings, tested in Eclipse 2019.09, Ubuntu 18.04:
** ./src/ in the source tree
** the ISA specific build directory which contains some self-generated stuff, e.g.: out/gem5/default/build/ARM
To run and GDB step debug the executable, just copy the full command line from the output of `./run`, and configure it in Eclipse.
==== gem5 Python C++ interaction
The interaction uses the Python C extension interface https://docs.python.org/2/extending/extending.html through the pybind11 helper library: https://github.com/pybind/pybind11
@@ -12608,6 +12645,211 @@ TODO: the file path name appears to be passed as a command line argument to the
Tested at gem5 b4879ae5b0b6644e6836b0881e4da05c64a6550d.
==== gem5 event queue
gem5 is an event-based simulator, and as such the event queue is one of the crucial elements in the system.
The gem5 event queue stores one callback event for each future point in time.
The event queue is implemented in the class `EventQueue` in the file `src/sim/eventq.hh`.
Not all times need to have an associated event: if a given time has no events, gem5 just skips it and jumps to the next event; the queue is basically a linked list of events sorted by time.
Important examples of events include:
* CPU ticks
* TODO peripherals and memory
At the beginning of simulation, gem5 sets up exactly two events:
* the first CPU cycle
* one exit event at the end of time which triggers <<gem5-simulate-limit-reached>>
Tick events then get triggered one by one as simulation progresses, in addition to any other system events.
The `EventQueue` class has an awesome `dump()` function that prints a human friendly representation of the queue, and it can easily be called from GDB. TODO example.
We can also observe what is going on in the event queue with the `Event` <<gem5-tracing,debug flag>>.
Event execution is done at `EventQueue::serviceOne()`:
....
Event *exit_event = eventq->serviceOne();
....
This calls the `Event::process` method of the event.
===== gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis
Let's now analyze every single event on a minimal <<gem5-syscall-emulation-mode>> in the <<gem5-cpu-types,simplest CPU that we have>>:
....
./run \
--arch aarch64 \
--emulator gem5 \
--userland userland/arch/aarch64/freestanding/linux/hello.S \
--trace Event \
--trace-stdout \
;
....
At LKMC a0ea29835b9bacc6aa1cceb24c79d895315991d4 + 1 this outputs:
....
0: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event scheduled @ 0
**** REAL SIMULATION ****
0: Event_70: generic event scheduled @ 0
info: Entering event queue @ 0. Starting simulation...
0: Event_70: generic event rescheduled @ 18446744073709551615
0: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 500
500: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 1000
1000: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 1500
1500: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 2000
hello
2000: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 2500
2500: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 3000
3000: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 3500
3500: Event_71: generic event scheduled @ 3500
....
which, comparing against the `ExecAll` trace, we immediately guess means that there is one event per tick:
....
0: system.cpu A0 T0 : @asm_main_after_prologue : movz x0, #1, #0 : IntAlu : D=0x0000000000000001 flags=(IsInteger)
500: system.cpu A0 T0 : @asm_main_after_prologue+4 : adr x1, #28 : IntAlu : D=0x0000000000400098 flags=(IsInteger)
1000: system.cpu A0 T0 : @asm_main_after_prologue+8 : ldr w2, #4194464 : MemRead : D=0x0000000000000006 A=0x4000a0 flags=(IsInteger|IsMemRef|IsLoad)
1500: system.cpu A0 T0 : @asm_main_after_prologue+12 : movz x8, #64, #0 : IntAlu : D=0x0000000000000040 flags=(IsInteger)
2000: system.cpu A0 T0 : @asm_main_after_prologue+16 : svc #0x0 : IntAlu : flags=(IsSerializeAfter|IsNonSpeculative|IsSyscall)
hello
2500: system.cpu A0 T0 : @asm_main_after_prologue+20 : movz x0, #0, #0 : IntAlu : D=0x0000000000000000 flags=(IsInteger)
3000: system.cpu A0 T0 : @asm_main_after_prologue+24 : movz x8, #93, #0 : IntAlu : D=0x000000000000005d flags=(IsInteger)
3500: system.cpu A0 T0 : @asm_main_after_prologue+28 : svc #0x0 : IntAlu : flags=(IsSerializeAfter|IsNonSpeculative|IsSyscall)
Exiting @ tick 3500 because exiting with last active thread context
....
On the event trace, we can see:
* `AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event scheduled @ 0` schedules a tick event for time `0`, and this leads to the first clock tick
* `0: Event_70: generic event scheduled @ 0`: schedules the end of time event for time `0`, which is later rescheduled to the actual end of time: `0: Event_70: generic event rescheduled @ 18446744073709551615`
* at `0: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 500` the first clock tick must have finished running, and so to represent the next one, it was simply rescheduled for the next time `500`! This is done at the end of `AtomicSimpleCPU::tick()`:
+
....
if (_status != Idle)
reschedule(tickEvent, curTick() + latency, true);
....
* at `3500: Event_71: generic event scheduled @ 3500` the exit system call is called and the simulation ends.
+
A new event is scheduled for the current time itself. TODO what is this event?
Let's study the first event. From GDB, its stack trace is:
....
Trace::OstreamLogger::logMessage() at trace.cc:149 0x5555593b3b1e
void Trace::Logger::dprintf_flag<char const*, char const*, unsigned long>() at 0x55555949e603
void Trace::Logger::dprintf<char const*, char const*, unsigned long>() at 0x55555949de58
Event::trace() at eventq.cc:395 0x55555946d109
EventQueue::schedule() at eventq_impl.hh:65 0x555557195441
EventManager::schedule() at eventq.hh:746 0x555557194aa2
AtomicSimpleCPU::activateContext() at atomic.cc:239 0x555559075531
SimpleThread::activate() at simple_thread.cc:177 0x555559545a63
Process::initState() at process.cc:283 0x555559484011
ArmProcess64::initState() at process.cc:126 0x55555730827a
ArmLinuxProcess64::initState() at process.cc:1,777 0x5555572d5e5e
....
The interesting call is at `AtomicSimpleCPU::activateContext`:
....
schedule(tickEvent, clockEdge(Cycles(0)));
....
which calls `EventManager::schedule`.
`AtomicSimpleCPU` is an `EventManager` because `SimObject` inherits from it.
`tickEvent` is an `EventFunctionWrapper` which contains a `std::function<void(void)> callback;`, and is initialized in the constructor as:
....
tickEvent([this]{ tick(); }, "AtomicSimpleCPU tick",
false, Event::CPU_Tick_Pri),
....
So that's how the main atomic tick loop works, fully understood!
The second event has backtrace:
....
Trace::OstreamLogger::logMessage() at trace.cc:149 0x5555593b3b1e
void Trace::Logger::dprintf_flag<char const*, char const*, unsigned long>() at 0x55555949e603
void Trace::Logger::dprintf<char const*, char const*, unsigned long>() at 0x55555949de58
Event::trace() at eventq.cc:395 0x55555946d109
EventQueue::schedule() at eventq_impl.hh:65 0x555557195441
BaseGlobalEvent::schedule() at global_event.cc:78 0x55555946d6f1
GlobalEvent::GlobalEvent() at 0x55555949d177
GlobalSimLoopExitEvent::GlobalSimLoopExitEvent() at sim_events.cc:61 0x555559474470
simulate() at simulate.cc:104 0x555559476d6f
....
so the event gets scheduled automatically at object creation in `simulate()`, through the `GlobalEvent` constructor:
....
simulate_limit_event =
new GlobalSimLoopExitEvent(mainEventQueue[0]->getCurTick(),
"simulate() limit reached", 0);
....
This event indicates that the simulation should finish by overriding `bool isExitEvent()`, which gets checked in the main simulation loop at `EventQueue::serviceOne`:
....
if (event->isExitEvent()) {
assert(!event->flags.isSet(Event::Managed) ||
!event->flags.isSet(Event::IsMainQueue)); // would be silly
return event;
....
And at long last, we can guess without reading the code that `Event_71` comes from the SE implementation of the exit syscall. Let's just confirm: the trace contains:
....
exitSimLoop() at sim_events.cc:97 0x5555594746e0
exitImpl() at syscall_emul.cc:215 0x55555948c046
exitFunc() at syscall_emul.cc:225 0x55555948c147
SyscallDesc::doSyscall() at syscall_desc.cc:72 0x5555594949b6
Process::syscall() at process.cc:401 0x555559484717
SimpleThread::syscall() at 0x555559558059
ArmISA::SupervisorCall::invoke() at faults.cc:856 0x5555572950d7
BaseSimpleCPU::advancePC() at base.cc:681 0x555559083133
AtomicSimpleCPU::tick() at atomic.cc:757 0x55555907834c
....
and `exitSimLoop()` does:
....
new GlobalSimLoopExitEvent(when + simQuantum, message, exit_code, repeat);
....
Tested at b4879ae5b0b6644e6836b0881e4da05c64a6550d.
===== gem5 event queue MinorCPU syscall emulation freestanding example analysis
The events <<gem5-event-queue-atomicsimplecpu-syscall-emulation-freestanding-example-analysis,for the Atomic CPU>> were pretty simple: basically just ticks.
But as we venture into <<gem5-cpu-types,more complex CPU models>> such as `MinorCPU`, the events get much more complex and interesting.
TODO: analyze the trace for:
....
./run \
--arch aarch64 \
--emulator gem5 \
--userland userland/arch/aarch64/freestanding/linux/hello.S \
--trace Event \
--trace-stdout \
-- \
--cpu-type MinorCPU \
--caches \
;
....
==== gem5 stats internals
This describes the internals of the <<gem5-m5out-stats-txt-file>>.