mirror of
https://github.com/cirosantilli/linux-kernel-module-cheat.git
synced 2026-01-23 02:05:57 +01:00
gem5: atomic hello se.py event queue totally understood
@@ -4567,7 +4567,7 @@ Build the kernel exactly as for <<graphic-mode-gem5-aarch64>> and then run with:

./run --arch aarch64 --dp650 --emulator gem5 --linux-build-id gem5-v4.15
....

==== gem5 graphic mode internals

We cannot use mainline Linux because the <<gem5-arm-linux-kernel-patches>> are required at least to provide the `CONFIG_DRM_VIRT_ENCODER` option.
@@ -10189,6 +10189,8 @@ gem5 also provides a tracing mechanism documented at: http://www.gem5.o

less "$(./getvar --arch aarch64 run_dir)/trace.txt"
....

Our wrapper just forwards the options to the `--debug-flags` gem5 option.

Keep in mind however that the disassembly is very broken in several places as of 2019q2, so you can't always trust it.

Output the trace to stdout instead of a file:
@@ -10214,6 +10216,8 @@ We also have a shortcut for `--trace ExecAll --trace-stdout` with `--trace-insts-

;
....

Be warned, the trace is humongous, at 16Gb.

This would produce a lot of output however, so you will likely not want it when tracing a Linux kernel boot. But it can be very convenient for smaller traces such as <<baremetal>>.

List all available debug flags:
@@ -10236,12 +10240,10 @@ The most important trace flags to know about are:

* <<gem5-registers-trace-format,`Registers`>>
* <<gem5-syscall-emulation-mode-syscall-tracing,`SyscallBase`, `SyscallVerbose`>>

Trace internals are discussed at: <<gem5-trace-internals>>.

As can be seen in the `SConstruct`, `Exec` is just an alias that enables a set of flags.

We can make the trace smaller by naming the trace file as `trace.txt.gz`, which enables GZIP compression, but that is not currently exposed in our scripts, since you usually just need something human readable to work on.

Enabling tracing made the runtime about 4x slower on the <<p51>>, with or without `.gz` compression.
@@ -10256,6 +10258,33 @@ less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt"

TODO: 7452d399290c9c1fc6366cdad129ef442f323564 `./trace2line` is too slow and takes hours. QEMU's processing of 170k events takes 7 seconds. gem5's processing is analogous, but there are 140M events, so it should take 7000 seconds ~ 2 hours, which seems consistent with what I observe, so maybe there is no way to speed this up... The workaround is to just use gem5's `ExecSymbol` to get function granularity, and then GDB individually if line detail is needed?
===== gem5 trace internals

gem5 traces are generated from `DPRINTF(<trace-id>` calls scattered throughout the code, except for `ExecAll` instruction traces, which use `Debug::ExecEnable` directly.

The trace IDs are themselves encoded in `SConscript` files, e.g.:

....
DebugFlag('Event')
....

in `src/cpu/SConscript`.

The build system then automatically adds those options to the `--debug-flags`.

For the `ExecEnable` flag, for example, the build system generates a file `build/ARM/debug/ExecEnable.hh`, which contains:
|
||||
....
|
||||
namespace Debug {
|
||||
class SimpleFlag;
|
||||
extern SimpleFlag ExecEnable;
|
||||
}
|
||||
....
|
||||
|
||||
and must be included in from callers of `DPRINTF(` as `<debug/ExecEnable.hh>`.
|
||||
|
||||
Tested in b4879ae5b0b6644e6836b0881e4da05c64a6550d.
|
||||
|
||||
===== gem5 ExecAll trace format
|
||||
|
||||
This debug flag traces all instructions.
|
||||
@@ -12410,6 +12439,12 @@ TODO: describe the main characteristics of each platform, as of gem5 5e83d703522

=== gem5 internals

Internals under other sections:

* <<gem5-trace-internals>>
* <<gem5-checkpoint-internals>>
* <<gem5-graphic-mode-internals>>

==== gem5 Eclipse configuration

In order to develop complex C++ software such as gem5, a good IDE setup is fundamental.

@@ -12424,6 +12459,8 @@ I recommend the following settings, tested in Eclipse 2019.09, Ubuntu 18.04:

** `./src/` in the source tree
** the ISA specific build directory, which contains some generated code, e.g.: `out/gem5/default/build/ARM`

To run and GDB step debug the executable, just copy the full command line from the output of `./run`, and configure it into Eclipse.

==== gem5 Python C++ interaction

The interaction uses the Python C extension interface https://docs.python.org/2/extending/extending.html through the pybind11 helper library: https://github.com/pybind/pybind11
||||
@@ -12608,6 +12645,211 @@ TODO: the file path name appears to be passed as a command line argument to the

Tested at gem5 b4879ae5b0b6644e6836b0881e4da05c64a6550d.

==== gem5 event queue

gem5 is an event based simulator, and as such the event queue is one of the crucial elements in the system.

The gem5 event queue stores one callback event for each future point in time.

The event queue is implemented in the class `EventQueue` in the file `src/sim/eventq.hh`.

Not all times need to have an associated event: if a given time has no events, gem5 just skips it and jumps to the next event: the queue is basically a linked list of events.

Important examples of events include:

* CPU ticks
* TODO peripherals and memory

At the beginning of simulation, gem5 sets up exactly two events:

* the first CPU cycle
* one exit event at the end of time, which triggers <<gem5-simulate-limit-reached>>

Tick events then get triggered one by one as simulation progresses, in addition to any other system events.

The `EventQueue` class has an awesome `dump()` function that prints a human friendly representation of the queue, and can easily be called from GDB. TODO example.

We can also observe what is going on in the event queue with the `Event` <<gem5-tracing,debug flag>>.

Event execution is done at `EventQueue::serviceOne()`:

....
Event *exit_event = eventq->serviceOne();
....

This calls the `Event::process` method of the event.
===== gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis

Let's now analyze every single event of a minimal <<gem5-syscall-emulation-mode>> run on the <<gem5-cpu-types,simplest CPU that we have>>:

....
./run \
  --arch aarch64 \
  --emulator gem5 \
  --userland userland/arch/aarch64/freestanding/linux/hello.S \
  --trace Event \
  --trace-stdout \
;
....

At LKMC a0ea29835b9bacc6aa1cceb24c79d895315991d4 + 1 this outputs:
|
||||
|
||||
....
|
||||
0: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event scheduled @ 0
|
||||
**** REAL SIMULATION ****
|
||||
0: Event_70: generic event scheduled @ 0
|
||||
info: Entering event queue @ 0. Starting simulation...
|
||||
0: Event_70: generic event rescheduled @ 18446744073709551615
|
||||
0: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 500
|
||||
500: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 1000
|
||||
1000: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 1500
|
||||
1500: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 2000
|
||||
hello
|
||||
2000: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 2500
|
||||
2500: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 3000
|
||||
3000: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 3500
|
||||
3500: Event_71: generic event scheduled @ 3500
|
||||
....
|
||||
|
||||
which we immediately guess means that there is one event per tick when comparing to the `ExecAll` trace:
....
0: system.cpu A0 T0 : @asm_main_after_prologue : movz x0, #1, #0 : IntAlu : D=0x0000000000000001 flags=(IsInteger)
500: system.cpu A0 T0 : @asm_main_after_prologue+4 : adr x1, #28 : IntAlu : D=0x0000000000400098 flags=(IsInteger)
1000: system.cpu A0 T0 : @asm_main_after_prologue+8 : ldr w2, #4194464 : MemRead : D=0x0000000000000006 A=0x4000a0 flags=(IsInteger|IsMemRef|IsLoad)
1500: system.cpu A0 T0 : @asm_main_after_prologue+12 : movz x8, #64, #0 : IntAlu : D=0x0000000000000040 flags=(IsInteger)
2000: system.cpu A0 T0 : @asm_main_after_prologue+16 : svc #0x0 : IntAlu : flags=(IsSerializeAfter|IsNonSpeculative|IsSyscall)
hello
2500: system.cpu A0 T0 : @asm_main_after_prologue+20 : movz x0, #0, #0 : IntAlu : D=0x0000000000000000 flags=(IsInteger)
3000: system.cpu A0 T0 : @asm_main_after_prologue+24 : movz x8, #93, #0 : IntAlu : D=0x000000000000005d flags=(IsInteger)
3500: system.cpu A0 T0 : @asm_main_after_prologue+28 : svc #0x0 : IntAlu : flags=(IsSerializeAfter|IsNonSpeculative|IsSyscall)
Exiting @ tick 3500 because exiting with last active thread context
....

On the event trace, we can see:
* `AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event scheduled @ 0` schedules a tick event for time `0`, and this leads to the first clock tick
* `0: Event_70: generic event scheduled @ 0` schedules the end of time event for time `0`, which is later rescheduled to the actual end of time: `0: Event_70: generic event rescheduled @ 18446744073709551615`
* at `0: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 500` the first clock tick must have finished running, and so, to represent the next one, it was simply rescheduled for the next time, `500`! This is done at the end of `AtomicSimpleCPU::tick()`:
+
....
if (_status != Idle)
    reschedule(tickEvent, curTick() + latency, true);
....
* at `3500: Event_71: generic event scheduled @ 3500` the exit system call is made and the simulation ends.
+
A new event is scheduled for the current time itself. TODO: what is this event?
Let's study the first event. From GDB, its stack trace is:

....
Trace::OstreamLogger::logMessage() at trace.cc:149 0x5555593b3b1e
void Trace::Logger::dprintf_flag<char const*, char const*, unsigned long>() at 0x55555949e603
void Trace::Logger::dprintf<char const*, char const*, unsigned long>() at 0x55555949de58
Event::trace() at eventq.cc:395 0x55555946d109
EventQueue::schedule() at eventq_impl.hh:65 0x555557195441
EventManager::schedule() at eventq.hh:746 0x555557194aa2
AtomicSimpleCPU::activateContext() at atomic.cc:239 0x555559075531
SimpleThread::activate() at simple_thread.cc:177 0x555559545a63
Process::initState() at process.cc:283 0x555559484011
ArmProcess64::initState() at process.cc:126 0x55555730827a
ArmLinuxProcess64::initState() at process.cc:1,777 0x5555572d5e5e
....
The interesting call is at `AtomicSimpleCPU::activateContext`:
|
||||
|
||||
....
|
||||
schedule(tickEvent, clockEdge(Cycles(0)));
|
||||
....
|
||||
|
||||
which calls `EventManager::schedule`.
|
||||
|
||||
`AtomicSimpleCPU` is an `EventManager` because `SimObject` inherits from it.
|
||||
|
||||
`tickEvent` is an `EventFunctionWrapper` which contains a `std::function<void(void)> callback;`, and is initialized in the constructor as:
|
||||
|
||||
....
|
||||
tickEvent([this]{ tick(); }, "AtomicSimpleCPU tick",
|
||||
false, Event::CPU_Tick_Pri),
|
||||
....
|
||||
|
||||
So that's how the main atomic tick loop works, fully understood!
The second event has backtrace:

....
Trace::OstreamLogger::logMessage() at trace.cc:149 0x5555593b3b1e
void Trace::Logger::dprintf_flag<char const*, char const*, unsigned long>() at 0x55555949e603
void Trace::Logger::dprintf<char const*, char const*, unsigned long>() at 0x55555949de58
Event::trace() at eventq.cc:395 0x55555946d109
EventQueue::schedule() at eventq_impl.hh:65 0x555557195441
BaseGlobalEvent::schedule() at global_event.cc:78 0x55555946d6f1
GlobalEvent::GlobalEvent() at 0x55555949d177
GlobalSimLoopExitEvent::GlobalSimLoopExitEvent() at sim_events.cc:61 0x555559474470
simulate() at simulate.cc:104 0x555559476d6f
....

so it gets scheduled automatically at object creation in `simulate()` through the `GlobalEvent` constructor:

....
simulate_limit_event =
    new GlobalSimLoopExitEvent(mainEventQueue[0]->getCurTick(),
                               "simulate() limit reached", 0);
....

This event indicates that the simulation should finish by overriding `bool isExitEvent()`, which gets checked in the main simulation loop at `EventQueue::serviceOne`:

....
if (event->isExitEvent()) {
    assert(!event->flags.isSet(Event::Managed) ||
           !event->flags.isSet(Event::IsMainQueue)); // would be silly
    return event;
....
And at last, we can guess without reading the code that `Event_71` comes from the SE implementation of the exit syscall, so let's just confirm, the backtrace contains:

....
exitSimLoop() at sim_events.cc:97 0x5555594746e0
exitImpl() at syscall_emul.cc:215 0x55555948c046
exitFunc() at syscall_emul.cc:225 0x55555948c147
SyscallDesc::doSyscall() at syscall_desc.cc:72 0x5555594949b6
Process::syscall() at process.cc:401 0x555559484717
SimpleThread::syscall() at 0x555559558059
ArmISA::SupervisorCall::invoke() at faults.cc:856 0x5555572950d7
BaseSimpleCPU::advancePC() at base.cc:681 0x555559083133
AtomicSimpleCPU::tick() at atomic.cc:757 0x55555907834c
....
and `exitSimLoop()` does:

....
new GlobalSimLoopExitEvent(when + simQuantum, message, exit_code, repeat);
....

Tested at b4879ae5b0b6644e6836b0881e4da05c64a6550d.
===== gem5 event queue MinorCPU syscall emulation freestanding example analysis

The events <<gem5-event-queue-atomicsimplecpu-syscall-emulation-freestanding-example-analysis,for the Atomic CPU>> were pretty simple: basically just ticks.

But as we venture into <<gem5-cpu-types,more complex CPU models>> such as `MinorCPU`, the events get much more complex and interesting.

TODO: analyze the trace for:

....
./run \
  --arch aarch64 \
  --emulator gem5 \
  --userland userland/arch/aarch64/freestanding/linux/hello.S \
  --trace Event \
  --trace-stdout \
  -- \
  --cpu-type MinorCPU \
  --caches \
;
....
==== gem5 stats internals

This describes the internals of the <<gem5-m5out-stats-txt-file>>.