mirror of
https://github.com/cirosantilli/linux-kernel-module-cheat.git
synced 2026-01-25 19:21:35 +01:00
gem5: atomic hello se.py event queue totally understood
This commit is contained in:
250
README.adoc
250
README.adoc
@@ -4567,7 +4567,7 @@ Build the kernel exactly as for <<graphic-mode-gem5-aarch64>> and then run with:
|
|||||||
./run --arch aarch64 --dp650 --emulator gem5 --linux-build-id gem5-v4.15
|
./run --arch aarch64 --dp650 --emulator gem5 --linux-build-id gem5-v4.15
|
||||||
....
|
....
|
||||||
|
|
||||||
==== Graphic mode gem5 internals
|
==== gem5 graphic mode internals
|
||||||
|
|
||||||
We cannot use mainline Linux because the <<gem5-arm-linux-kernel-patches>> are required at least to provide the `CONFIG_DRM_VIRT_ENCODER` option.
|
We cannot use mainline Linux because the <<gem5-arm-linux-kernel-patches>> are required at least to provide the `CONFIG_DRM_VIRT_ENCODER` option.
|
||||||
|
|
||||||
@@ -10189,6 +10189,8 @@ gem5 provides also provides a tracing mechanism documented at: http://www.gem5.o
|
|||||||
less "$(./getvar --arch aarch64 run_dir)/trace.txt"
|
less "$(./getvar --arch aarch64 run_dir)/trace.txt"
|
||||||
....
|
....
|
||||||
|
|
||||||
|
Our wrapper just forwards the options to the `--debug-flags` gem5 option.
|
||||||
|
|
||||||
Keep in mind however that the disassembly is very broken in several places as of 2019q2, so you can't always trust it.
|
Keep in mind however that the disassembly is very broken in several places as of 2019q2, so you can't always trust it.
|
||||||
|
|
||||||
Output the trace to stdout instead of a file:
|
Output the trace to stdout instead of a file:
|
||||||
@@ -10214,6 +10216,8 @@ We also have a shortcut for `--trace ExecAll -trace-stdout` with `--trace-insts-
|
|||||||
;
|
;
|
||||||
....
|
....
|
||||||
|
|
||||||
|
Be warned, the trace is humongous, at 16Gb.
|
||||||
|
|
||||||
This would produce a lot of output however, so you will likely not want that when tracing a Linux kernel boot instructions. But it can be very convenient for smaller traces such as <<baremetal>>.
|
This would produce a lot of output however, so you will likely not want that when tracing a Linux kernel boot instructions. But it can be very convenient for smaller traces such as <<baremetal>>.
|
||||||
|
|
||||||
List all available debug flags:
|
List all available debug flags:
|
||||||
@@ -10236,12 +10240,10 @@ The most important trace flags to know about are:
|
|||||||
* <<gem5-registers-trace-format,`Registers`>>
|
* <<gem5-registers-trace-format,`Registers`>>
|
||||||
* <<gem5-syscall-emulation-mode-syscall-tracing,`SyscallBase`, `SyscallVerbose`>>
|
* <<gem5-syscall-emulation-mode-syscall-tracing,`SyscallBase`, `SyscallVerbose`>>
|
||||||
|
|
||||||
The traces are generated from `DPRINTF(<trace-id>` calls scattered throughout the code.
|
Trace internals are discussed at: <<gem5-trace-internals>>.
|
||||||
|
|
||||||
As can be seen on the `Sconstruct`, `Exec` is just an alias that enables a set of flags.
|
As can be seen on the `Sconstruct`, `Exec` is just an alias that enables a set of flags.
|
||||||
|
|
||||||
Be warned, the trace is humongous, at 16Gb.
|
|
||||||
|
|
||||||
We can make the trace smaller by naming the trace file as `trace.txt.gz`, which enables GZIP compression, but that is not currently exposed on our scripts, since you usually just need something human readable to work on.
|
We can make the trace smaller by naming the trace file as `trace.txt.gz`, which enables GZIP compression, but that is not currently exposed on our scripts, since you usually just need something human readable to work on.
|
||||||
|
|
||||||
Enabling tracing made the runtime about 4x slower on the <<p51>>, with or without `.gz` compression.
|
Enabling tracing made the runtime about 4x slower on the <<p51>>, with or without `.gz` compression.
|
||||||
@@ -10256,6 +10258,33 @@ less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt"
|
|||||||
|
|
||||||
TODO: 7452d399290c9c1fc6366cdad129ef442f323564 `./trace2line` this is too slow and takes hours. QEMU's processing of 170k events takes 7 seconds. gem5's processing is analogous, but there are 140M events, so it should take 7000 seconds ~ 2 hours which seems consistent with what I observe, so maybe there is no way to speed this up... The workaround is to just use gem5's `ExecSymbol` to get function granularity, and then GDB individually if line detail is needed?
|
TODO: 7452d399290c9c1fc6366cdad129ef442f323564 `./trace2line` this is too slow and takes hours. QEMU's processing of 170k events takes 7 seconds. gem5's processing is analogous, but there are 140M events, so it should take 7000 seconds ~ 2 hours which seems consistent with what I observe, so maybe there is no way to speed this up... The workaround is to just use gem5's `ExecSymbol` to get function granularity, and then GDB individually if line detail is needed?
|
||||||
|
|
||||||
|
===== gem5 trace internals
|
||||||
|
|
||||||
|
gem5 traces are generated from `DPRINTF(<trace-id>` calls scattered throughout the code, except for `ExecAll` instruction traces, which uses `Debug::ExecEnable` directly..
|
||||||
|
|
||||||
|
The trace IDs are themselves encoded in `SConscript` files, e.g.:
|
||||||
|
|
||||||
|
....
|
||||||
|
DebugFlag('Event'
|
||||||
|
....
|
||||||
|
|
||||||
|
in `src/cpu/SConscript`.
|
||||||
|
|
||||||
|
The build system then automatically adds the options to the `--debug-flags`.
|
||||||
|
|
||||||
|
For this entry, the build system then generates a file `build/ARM/debug/ExecEnable.hh`, which contains:
|
||||||
|
|
||||||
|
....
|
||||||
|
namespace Debug {
|
||||||
|
class SimpleFlag;
|
||||||
|
extern SimpleFlag ExecEnable;
|
||||||
|
}
|
||||||
|
....
|
||||||
|
|
||||||
|
and must be included in from callers of `DPRINTF(` as `<debug/ExecEnable.hh>`.
|
||||||
|
|
||||||
|
Tested in b4879ae5b0b6644e6836b0881e4da05c64a6550d.
|
||||||
|
|
||||||
===== gem5 ExecAll trace format
|
===== gem5 ExecAll trace format
|
||||||
|
|
||||||
This debug flag traces all instructions.
|
This debug flag traces all instructions.
|
||||||
@@ -12410,6 +12439,12 @@ TODO: describe the main characteristics of each platform, as of gem5 5e83d703522
|
|||||||
|
|
||||||
=== gem5 internals
|
=== gem5 internals
|
||||||
|
|
||||||
|
Internals under other sections:
|
||||||
|
|
||||||
|
* <<gem5-trace-internals>>
|
||||||
|
* <<gem5-checkpoint-internals>>
|
||||||
|
* <<gem5-graphic-mode-internals>>
|
||||||
|
|
||||||
==== gem5 Eclipse configuration
|
==== gem5 Eclipse configuration
|
||||||
|
|
||||||
In order to develop complex C++ software such as gem5, a good IDE setup is fundamental.
|
In order to develop complex C++ software such as gem5, a good IDE setup is fundamental.
|
||||||
@@ -12424,6 +12459,8 @@ I recommend the following settings, tested in Eclipse 2019.09, Ubuntu 18.04:
|
|||||||
** ./src/ in the source tree
|
** ./src/ in the source tree
|
||||||
** the ISA specific build directory which contains some self-generated stuff, e.g.: out/gem5/default/build/ARM
|
** the ISA specific build directory which contains some self-generated stuff, e.g.: out/gem5/default/build/ARM
|
||||||
|
|
||||||
|
To run and GDB step debug the executable, just copy the full command line from the output `./run`, and configure it into Eclipse.
|
||||||
|
|
||||||
==== gem5 Python C++ interaction
|
==== gem5 Python C++ interaction
|
||||||
|
|
||||||
The interaction uses the Python C extension interface https://docs.python.org/2/extending/extending.html interface through the pybind11 helper library: https://github.com/pybind/pybind11
|
The interaction uses the Python C extension interface https://docs.python.org/2/extending/extending.html interface through the pybind11 helper library: https://github.com/pybind/pybind11
|
||||||
@@ -12608,6 +12645,211 @@ TODO: the file path name appears to be passed as a command line argument to the
|
|||||||
|
|
||||||
Tested at gem5 b4879ae5b0b6644e6836b0881e4da05c64a6550d.
|
Tested at gem5 b4879ae5b0b6644e6836b0881e4da05c64a6550d.
|
||||||
|
|
||||||
|
==== gem5 event queue
|
||||||
|
|
||||||
|
gem5 is an event based simulator, and as such the event queue is of of the crucial elements in the system.
|
||||||
|
|
||||||
|
The gem5 event queue stores one callback event for each future point in time.
|
||||||
|
|
||||||
|
The event queue is implemented in the class `EventQueue` in the file `src/sim/eventq.hh`.
|
||||||
|
|
||||||
|
Not all times need to have an associated event: if a given time has no events, gem5 just skips it and jumps to the next event: the queue is basically a linked list of events.
|
||||||
|
|
||||||
|
Important examples of events include:
|
||||||
|
|
||||||
|
* CPU ticks
|
||||||
|
* TODO peripherals and memory
|
||||||
|
|
||||||
|
At the beginning of simulation, gem5 sets up exactly two events:
|
||||||
|
|
||||||
|
* the first CPU cycle
|
||||||
|
* one exit event at the end of time which triggers <<gem5-simulate-limit-reached>>
|
||||||
|
|
||||||
|
Tick events then get triggered one by one as simulation progresses, in addition to any other system events.
|
||||||
|
|
||||||
|
The `EventQueue` class has one awesome `dump()` function that prints a human friendly representation of the queue, and can be easily called from GDB. TODO example.
|
||||||
|
|
||||||
|
We can also observe what is going on in the event queue with the `Event` <<gem5-tracing,debug flag>>.
|
||||||
|
|
||||||
|
Event execution is done at `EventQueue::serviceOne()`:
|
||||||
|
|
||||||
|
....
|
||||||
|
Event *exit_event = eventq->serviceOne();
|
||||||
|
....
|
||||||
|
|
||||||
|
This calls the `Event::process` method of the event.
|
||||||
|
|
||||||
|
===== gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis
|
||||||
|
|
||||||
|
Let's now analyze every single event on a minimal <<gem5-syscall-emulation-mode>> in the <<gem5-cpu-types,simplest CPU that we have>>:
|
||||||
|
|
||||||
|
....
|
||||||
|
./run \
|
||||||
|
--arch aarch64 \
|
||||||
|
--emulator gem5 \
|
||||||
|
--userland userland/arch/aarch64/freestanding/linux/hello.S \
|
||||||
|
--trace Event \
|
||||||
|
--trace-stdout \
|
||||||
|
;
|
||||||
|
....
|
||||||
|
|
||||||
|
At LKMC a0ea29835b9bacc6aa1cceb24c79d895315991d4 + 1 this outputs:
|
||||||
|
|
||||||
|
....
|
||||||
|
0: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event scheduled @ 0
|
||||||
|
**** REAL SIMULATION ****
|
||||||
|
0: Event_70: generic event scheduled @ 0
|
||||||
|
info: Entering event queue @ 0. Starting simulation...
|
||||||
|
0: Event_70: generic event rescheduled @ 18446744073709551615
|
||||||
|
0: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 500
|
||||||
|
500: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 1000
|
||||||
|
1000: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 1500
|
||||||
|
1500: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 2000
|
||||||
|
hello
|
||||||
|
2000: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 2500
|
||||||
|
2500: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 3000
|
||||||
|
3000: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 3500
|
||||||
|
3500: Event_71: generic event scheduled @ 3500
|
||||||
|
....
|
||||||
|
|
||||||
|
which we immediately guess means that there is one event per tick when comparing to the `ExecAll` trace:
|
||||||
|
|
||||||
|
....
|
||||||
|
0: system.cpu A0 T0 : @asm_main_after_prologue : movz x0, #1, #0 : IntAlu : D=0x0000000000000001 flags=(IsInteger)
|
||||||
|
500: system.cpu A0 T0 : @asm_main_after_prologue+4 : adr x1, #28 : IntAlu : D=0x0000000000400098 flags=(IsInteger)
|
||||||
|
1000: system.cpu A0 T0 : @asm_main_after_prologue+8 : ldr w2, #4194464 : MemRead : D=0x0000000000000006 A=0x4000a0 flags=(IsInteger|IsMemRef|IsLoad)
|
||||||
|
1500: system.cpu A0 T0 : @asm_main_after_prologue+12 : movz x8, #64, #0 : IntAlu : D=0x0000000000000040 flags=(IsInteger)
|
||||||
|
2000: system.cpu A0 T0 : @asm_main_after_prologue+16 : svc #0x0 : IntAlu : flags=(IsSerializeAfter|IsNonSpeculative|IsSyscall)
|
||||||
|
hello
|
||||||
|
2500: system.cpu A0 T0 : @asm_main_after_prologue+20 : movz x0, #0, #0 : IntAlu : D=0x0000000000000000 flags=(IsInteger)
|
||||||
|
3000: system.cpu A0 T0 : @asm_main_after_prologue+24 : movz x8, #93, #0 : IntAlu : D=0x000000000000005d flags=(IsInteger)
|
||||||
|
3500: system.cpu A0 T0 : @asm_main_after_prologue+28 : svc #0x0 : IntAlu : flags=(IsSerializeAfter|IsNonSpeculative|IsSyscall)
|
||||||
|
Exiting @ tick 3500 because exiting with last active thread context
|
||||||
|
....
|
||||||
|
|
||||||
|
On the event trace, we can see:
|
||||||
|
|
||||||
|
* `AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event scheduled @ 0` schedules a tick event for time `0`, and this leads to the first clock tick
|
||||||
|
* `0: Event_70: generic event scheduled @ 0`: schedules the end of time event for time `0`, which is later rescheduled to the actual end of time: `0: Event_70: generic event rescheduled @ 18446744073709551615`
|
||||||
|
* at `0: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 500` the first clock tick must have finished running, and so to represent the next one, it was simply rescheduled for the next time `500`! This is done at the end of `AtomicSimpleCPU::tick()`:
|
||||||
|
+
|
||||||
|
....
|
||||||
|
if (_status != Idle)
|
||||||
|
reschedule(tickEvent, curTick() + latency, true);
|
||||||
|
....
|
||||||
|
* at `3500: Event_71: generic event scheduled @ 3500` the exit system call is called and the simulation ends.
|
||||||
|
+
|
||||||
|
A new event is scheduled for the current time itself. TODO what is this event?
|
||||||
|
|
||||||
|
Let's study the first event. From GDB, it's stack trace is:
|
||||||
|
|
||||||
|
....
|
||||||
|
Trace::OstreamLogger::logMessage() at trace.cc:149 0x5555593b3b1e
|
||||||
|
void Trace::Logger::dprintf_flag<char const*, char const*, unsigned long>() at 0x55555949e603
|
||||||
|
void Trace::Logger::dprintf<char const*, char const*, unsigned long>() at 0x55555949de58
|
||||||
|
Event::trace() at eventq.cc:395 0x55555946d109
|
||||||
|
EventQueue::schedule() at eventq_impl.hh:65 0x555557195441
|
||||||
|
EventManager::schedule() at eventq.hh:746 0x555557194aa2
|
||||||
|
AtomicSimpleCPU::activateContext() at atomic.cc:239 0x555559075531
|
||||||
|
SimpleThread::activate() at simple_thread.cc:177 0x555559545a63
|
||||||
|
Process::initState() at process.cc:283 0x555559484011
|
||||||
|
ArmProcess64::initState() at process.cc:126 0x55555730827a
|
||||||
|
ArmLinuxProcess64::initState() at process.cc:1,777 0x5555572d5e5e
|
||||||
|
....
|
||||||
|
|
||||||
|
The interesting call is at `AtomicSimpleCPU::activateContext`:
|
||||||
|
|
||||||
|
....
|
||||||
|
schedule(tickEvent, clockEdge(Cycles(0)));
|
||||||
|
....
|
||||||
|
|
||||||
|
which calls `EventManager::schedule`.
|
||||||
|
|
||||||
|
`AtomicSimpleCPU` is an `EventManager` because `SimObject` inherits from it.
|
||||||
|
|
||||||
|
`tickEvent` is an `EventFunctionWrapper` which contains a `std::function<void(void)> callback;`, and is initialized in the constructor as:
|
||||||
|
|
||||||
|
....
|
||||||
|
tickEvent([this]{ tick(); }, "AtomicSimpleCPU tick",
|
||||||
|
false, Event::CPU_Tick_Pri),
|
||||||
|
....
|
||||||
|
|
||||||
|
So that's how the main atomic tick loop works, fully understood!
|
||||||
|
|
||||||
|
The second event has backtrace:
|
||||||
|
|
||||||
|
....
|
||||||
|
Trace::OstreamLogger::logMessage() at trace.cc:149 0x5555593b3b1e
|
||||||
|
void Trace::Logger::dprintf_flag<char const*, char const*, unsigned long>() at 0x55555949e603
|
||||||
|
void Trace::Logger::dprintf<char const*, char const*, unsigned long>() at 0x55555949de58
|
||||||
|
Event::trace() at eventq.cc:395 0x55555946d109
|
||||||
|
EventQueue::schedule() at eventq_impl.hh:65 0x555557195441
|
||||||
|
BaseGlobalEvent::schedule() at global_event.cc:78 0x55555946d6f1
|
||||||
|
GlobalEvent::GlobalEvent() at 0x55555949d177
|
||||||
|
GlobalSimLoopExitEvent::GlobalSimLoopExitEvent() at sim_events.cc:61 0x555559474470
|
||||||
|
simulate() at simulate.cc:104 0x555559476d6f
|
||||||
|
....
|
||||||
|
|
||||||
|
so gets scheduled automatically at object creation `simulate()` through the `GlobalEvent()` constructor:
|
||||||
|
|
||||||
|
....
|
||||||
|
simulate_limit_event =
|
||||||
|
new GlobalSimLoopExitEvent(mainEventQueue[0]->getCurTick(),
|
||||||
|
"simulate() limit reached", 0);
|
||||||
|
....
|
||||||
|
|
||||||
|
This event indicates that the simulation should finish by overriding `bool isExitEvent()` which gets checked in the main simulation at `EventQueue::serviceOne`:
|
||||||
|
|
||||||
|
....
|
||||||
|
if (event->isExitEvent()) {
|
||||||
|
assert(!event->flags.isSet(Event::Managed) ||
|
||||||
|
!event->flags.isSet(Event::IsMainQueue)); // would be silly
|
||||||
|
return event;
|
||||||
|
....
|
||||||
|
|
||||||
|
And at long, we can guess without reading the code that `Event_71` is comes from the SE implementation of the exit syscall, so let's just confirm, the trace contains:
|
||||||
|
|
||||||
|
....
|
||||||
|
exitSimLoop() at sim_events.cc:97 0x5555594746e0
|
||||||
|
exitImpl() at syscall_emul.cc:215 0x55555948c046
|
||||||
|
exitFunc() at syscall_emul.cc:225 0x55555948c147
|
||||||
|
SyscallDesc::doSyscall() at syscall_desc.cc:72 0x5555594949b6
|
||||||
|
Process::syscall() at process.cc:401 0x555559484717
|
||||||
|
SimpleThread::syscall() at 0x555559558059
|
||||||
|
ArmISA::SupervisorCall::invoke() at faults.cc:856 0x5555572950d7
|
||||||
|
BaseSimpleCPU::advancePC() at base.cc:681 0x555559083133
|
||||||
|
AtomicSimpleCPU::tick() at atomic.cc:757 0x55555907834c
|
||||||
|
....
|
||||||
|
|
||||||
|
and `exitSimLoop()` does:
|
||||||
|
|
||||||
|
....
|
||||||
|
new GlobalSimLoopExitEvent(when + simQuantum, message, exit_code, repeat);
|
||||||
|
....
|
||||||
|
|
||||||
|
Tested at b4879ae5b0b6644e6836b0881e4da05c64a6550d.
|
||||||
|
|
||||||
|
===== gem5 event queue MinorCPU syscall emulation freestanding example analysis
|
||||||
|
|
||||||
|
The events <<gem5-event-queue-atomicsimplecpu-syscall-emulation-freestanding-example-analysis,for the Atomic CPU>> were pretty simple: basically just ticks.
|
||||||
|
|
||||||
|
But as we venture into <<gem5-cpu-types,more complex CPU models>> such as `MinorCPU`, the events get much more complex and interesting.
|
||||||
|
|
||||||
|
TODO: analyze the trace for:
|
||||||
|
|
||||||
|
....
|
||||||
|
./run \
|
||||||
|
--arch aarch64 \
|
||||||
|
--emulator gem5 \
|
||||||
|
--userland userland/arch/aarch64/freestanding/linux/hello.S \
|
||||||
|
--trace Event \
|
||||||
|
--trace-stdout \
|
||||||
|
-- \
|
||||||
|
--cpu-type MinorCPU \
|
||||||
|
--caches \
|
||||||
|
;
|
||||||
|
....
|
||||||
|
|
||||||
==== gem5 stats internals
|
==== gem5 stats internals
|
||||||
|
|
||||||
This describes the internals of the <<gem5-m5out-stats-txt-file>>.
|
This describes the internals of the <<gem5-m5out-stats-txt-file>>.
|
||||||
|
|||||||
Reference in New Issue
Block a user