From 07f047f2d768a6f65308ac7c2d2c29b4e6b50655 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ciro=20Santilli=20=E5=85=AD=E5=9B=9B=E4=BA=8B=E4=BB=B6=20?= =?UTF-8?q?=E6=B3=95=E8=BD=AE=E5=8A=9F?= Date: Tue, 15 Oct 2019 00:00:01 +0000 Subject: [PATCH] gem5: atomic hello se.py event queue totally understood --- README.adoc | 250 +++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 246 insertions(+), 4 deletions(-) diff --git a/README.adoc b/README.adoc index 42f680d..718cdcf 100644 --- a/README.adoc +++ b/README.adoc @@ -4567,7 +4567,7 @@ Build the kernel exactly as for <> and then run with: ./run --arch aarch64 --dp650 --emulator gem5 --linux-build-id gem5-v4.15 .... -==== Graphic mode gem5 internals +==== gem5 graphic mode internals We cannot use mainline Linux because the <> are required at least to provide the `CONFIG_DRM_VIRT_ENCODER` option. @@ -10189,6 +10189,8 @@ gem5 provides also provides a tracing mechanism documented at: http://www.gem5.o less "$(./getvar --arch aarch64 run_dir)/trace.txt" .... +Our wrapper just forwards the options to the `--debug-flags` gem5 option. + Keep in mind however that the disassembly is very broken in several places as of 2019q2, so you can't always trust it. Output the trace to stdout instead of a file: @@ -10214,6 +10216,8 @@ We also have a shortcut for `--trace ExecAll -trace-stdout` with `--trace-insts- ; .... +Be warned, the trace is humongous, at 16Gb. + This would produce a lot of output however, so you will likely not want that when tracing a Linux kernel boot instructions. But it can be very convenient for smaller traces such as <>. List all available debug flags: @@ -10236,12 +10240,10 @@ The most important trace flags to know about are: * <> * <> -The traces are generated from `DPRINTF(` calls scattered throughout the code. +Trace internals are discussed at: <>. As can be seen on the `Sconstruct`, `Exec` is just an alias that enables a set of flags. -Be warned, the trace is humongous, at 16Gb. - We can make the trace smaller by naming the trace file as `trace.txt.gz`, which enables GZIP compression, but that is not currently exposed on our scripts, since you usually just need something human readable to work on. Enabling tracing made the runtime about 4x slower on the <>, with or without `.gz` compression. @@ -10256,6 +10258,33 @@ less "$(./getvar --arch aarch64 run_dir)/trace-lines.txt" TODO: 7452d399290c9c1fc6366cdad129ef442f323564 `./trace2line` this is too slow and takes hours. QEMU's processing of 170k events takes 7 seconds. gem5's processing is analogous, but there are 140M events, so it should take 7000 seconds ~ 2 hours which seems consistent with what I observe, so maybe there is no way to speed this up... The workaround is to just use gem5's `ExecSymbol` to get function granularity, and then GDB individually if line detail is needed? +===== gem5 trace internals + +gem5 traces are generated from `DPRINTF(` calls scattered throughout the code, except for `ExecAll` instruction traces, which uses `Debug::ExecEnable` directly.. + +The trace IDs are themselves encoded in `SConscript` files, e.g.: + +.... +DebugFlag('Event' +.... + +in `src/cpu/SConscript`. + +The build system then automatically adds the options to the `--debug-flags`. + +For this entry, the build system then generates a file `build/ARM/debug/ExecEnable.hh`, which contains: + +.... +namespace Debug { +class SimpleFlag; +extern SimpleFlag ExecEnable; +} +.... + +and must be included in from callers of `DPRINTF(` as ``. + +Tested in b4879ae5b0b6644e6836b0881e4da05c64a6550d. + ===== gem5 ExecAll trace format This debug flag traces all instructions. @@ -12410,6 +12439,12 @@ TODO: describe the main characteristics of each platform, as of gem5 5e83d703522 === gem5 internals +Internals under other sections: + +* <> +* <> +* <> + ==== gem5 Eclipse configuration In order to develop complex C++ software such as gem5, a good IDE setup is fundamental. @@ -12424,6 +12459,8 @@ I recommend the following settings, tested in Eclipse 2019.09, Ubuntu 18.04: ** ./src/ in the source tree ** the ISA specific build directory which contains some self-generated stuff, e.g.: out/gem5/default/build/ARM +To run and GDB step debug the executable, just copy the full command line from the output `./run`, and configure it into Eclipse. + ==== gem5 Python C++ interaction The interaction uses the Python C extension interface https://docs.python.org/2/extending/extending.html interface through the pybind11 helper library: https://github.com/pybind/pybind11 @@ -12608,6 +12645,211 @@ TODO: the file path name appears to be passed as a command line argument to the Tested at gem5 b4879ae5b0b6644e6836b0881e4da05c64a6550d. +==== gem5 event queue + +gem5 is an event based simulator, and as such the event queue is of of the crucial elements in the system. + +The gem5 event queue stores one callback event for each future point in time. + +The event queue is implemented in the class `EventQueue` in the file `src/sim/eventq.hh`. + +Not all times need to have an associated event: if a given time has no events, gem5 just skips it and jumps to the next event: the queue is basically a linked list of events. + +Important examples of events include: + +* CPU ticks +* TODO peripherals and memory + +At the beginning of simulation, gem5 sets up exactly two events: + +* the first CPU cycle +* one exit event at the end of time which triggers <> + +Tick events then get triggered one by one as simulation progresses, in addition to any other system events. + +The `EventQueue` class has one awesome `dump()` function that prints a human friendly representation of the queue, and can be easily called from GDB. TODO example. + +We can also observe what is going on in the event queue with the `Event` <>. + +Event execution is done at `EventQueue::serviceOne()`: + +.... +Event *exit_event = eventq->serviceOne(); +.... + +This calls the `Event::process` method of the event. + +===== gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis + +Let's now analyze every single event on a minimal <> in the <>: + +.... +./run \ + --arch aarch64 \ + --emulator gem5 \ + --userland userland/arch/aarch64/freestanding/linux/hello.S \ + --trace Event \ + --trace-stdout \ +; +.... + +At LKMC a0ea29835b9bacc6aa1cceb24c79d895315991d4 + 1 this outputs: + +.... + 0: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event scheduled @ 0 +**** REAL SIMULATION **** + 0: Event_70: generic event scheduled @ 0 +info: Entering event queue @ 0. Starting simulation... + 0: Event_70: generic event rescheduled @ 18446744073709551615 + 0: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 500 + 500: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 1000 + 1000: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 1500 + 1500: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 2000 +hello + 2000: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 2500 + 2500: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 3000 + 3000: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 3500 + 3500: Event_71: generic event scheduled @ 3500 +.... + +which we immediately guess means that there is one event per tick when comparing to the `ExecAll` trace: + +.... + 0: system.cpu A0 T0 : @asm_main_after_prologue : movz x0, #1, #0 : IntAlu : D=0x0000000000000001 flags=(IsInteger) + 500: system.cpu A0 T0 : @asm_main_after_prologue+4 : adr x1, #28 : IntAlu : D=0x0000000000400098 flags=(IsInteger) + 1000: system.cpu A0 T0 : @asm_main_after_prologue+8 : ldr w2, #4194464 : MemRead : D=0x0000000000000006 A=0x4000a0 flags=(IsInteger|IsMemRef|IsLoad) + 1500: system.cpu A0 T0 : @asm_main_after_prologue+12 : movz x8, #64, #0 : IntAlu : D=0x0000000000000040 flags=(IsInteger) + 2000: system.cpu A0 T0 : @asm_main_after_prologue+16 : svc #0x0 : IntAlu : flags=(IsSerializeAfter|IsNonSpeculative|IsSyscall) +hello + 2500: system.cpu A0 T0 : @asm_main_after_prologue+20 : movz x0, #0, #0 : IntAlu : D=0x0000000000000000 flags=(IsInteger) + 3000: system.cpu A0 T0 : @asm_main_after_prologue+24 : movz x8, #93, #0 : IntAlu : D=0x000000000000005d flags=(IsInteger) + 3500: system.cpu A0 T0 : @asm_main_after_prologue+28 : svc #0x0 : IntAlu : flags=(IsSerializeAfter|IsNonSpeculative|IsSyscall) +Exiting @ tick 3500 because exiting with last active thread context +.... + +On the event trace, we can see: + +* `AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event scheduled @ 0` schedules a tick event for time `0`, and this leads to the first clock tick +* `0: Event_70: generic event scheduled @ 0`: schedules the end of time event for time `0`, which is later rescheduled to the actual end of time: `0: Event_70: generic event rescheduled @ 18446744073709551615` +* at `0: AtomicSimpleCPU tick.wrapped_function_event: EventFunctionWrapped event rescheduled @ 500` the first clock tick must have finished running, and so to represent the next one, it was simply rescheduled for the next time `500`! This is done at the end of `AtomicSimpleCPU::tick()`: ++ +.... +if (_status != Idle) + reschedule(tickEvent, curTick() + latency, true); +.... +* at `3500: Event_71: generic event scheduled @ 3500` the exit system call is called and the simulation ends. ++ +A new event is scheduled for the current time itself. TODO what is this event? + +Let's study the first event. From GDB, it's stack trace is: + +.... +Trace::OstreamLogger::logMessage() at trace.cc:149 0x5555593b3b1e +void Trace::Logger::dprintf_flag() at 0x55555949e603 +void Trace::Logger::dprintf() at 0x55555949de58 +Event::trace() at eventq.cc:395 0x55555946d109 +EventQueue::schedule() at eventq_impl.hh:65 0x555557195441 +EventManager::schedule() at eventq.hh:746 0x555557194aa2 +AtomicSimpleCPU::activateContext() at atomic.cc:239 0x555559075531 +SimpleThread::activate() at simple_thread.cc:177 0x555559545a63 +Process::initState() at process.cc:283 0x555559484011 +ArmProcess64::initState() at process.cc:126 0x55555730827a +ArmLinuxProcess64::initState() at process.cc:1,777 0x5555572d5e5e +.... + +The interesting call is at `AtomicSimpleCPU::activateContext`: + +.... +schedule(tickEvent, clockEdge(Cycles(0))); +.... + +which calls `EventManager::schedule`. + +`AtomicSimpleCPU` is an `EventManager` because `SimObject` inherits from it. + +`tickEvent` is an `EventFunctionWrapper` which contains a `std::function callback;`, and is initialized in the constructor as: + +.... +tickEvent([this]{ tick(); }, "AtomicSimpleCPU tick", + false, Event::CPU_Tick_Pri), +.... + +So that's how the main atomic tick loop works, fully understood! + +The second event has backtrace: + +.... +Trace::OstreamLogger::logMessage() at trace.cc:149 0x5555593b3b1e +void Trace::Logger::dprintf_flag() at 0x55555949e603 +void Trace::Logger::dprintf() at 0x55555949de58 +Event::trace() at eventq.cc:395 0x55555946d109 +EventQueue::schedule() at eventq_impl.hh:65 0x555557195441 +BaseGlobalEvent::schedule() at global_event.cc:78 0x55555946d6f1 +GlobalEvent::GlobalEvent() at 0x55555949d177 +GlobalSimLoopExitEvent::GlobalSimLoopExitEvent() at sim_events.cc:61 0x555559474470 +simulate() at simulate.cc:104 0x555559476d6f +.... + +so gets scheduled automatically at object creation `simulate()` through the `GlobalEvent()` constructor: + +.... +simulate_limit_event = + new GlobalSimLoopExitEvent(mainEventQueue[0]->getCurTick(), + "simulate() limit reached", 0); +.... + +This event indicates that the simulation should finish by overriding `bool isExitEvent()` which gets checked in the main simulation at `EventQueue::serviceOne`: + +.... +if (event->isExitEvent()) { + assert(!event->flags.isSet(Event::Managed) || + !event->flags.isSet(Event::IsMainQueue)); // would be silly + return event; +.... + +And at long, we can guess without reading the code that `Event_71` is comes from the SE implementation of the exit syscall, so let's just confirm, the trace contains: + +.... +exitSimLoop() at sim_events.cc:97 0x5555594746e0 +exitImpl() at syscall_emul.cc:215 0x55555948c046 +exitFunc() at syscall_emul.cc:225 0x55555948c147 +SyscallDesc::doSyscall() at syscall_desc.cc:72 0x5555594949b6 +Process::syscall() at process.cc:401 0x555559484717 +SimpleThread::syscall() at 0x555559558059 +ArmISA::SupervisorCall::invoke() at faults.cc:856 0x5555572950d7 +BaseSimpleCPU::advancePC() at base.cc:681 0x555559083133 +AtomicSimpleCPU::tick() at atomic.cc:757 0x55555907834c +.... + +and `exitSimLoop()` does: + +.... +new GlobalSimLoopExitEvent(when + simQuantum, message, exit_code, repeat); +.... + +Tested at b4879ae5b0b6644e6836b0881e4da05c64a6550d. + +===== gem5 event queue MinorCPU syscall emulation freestanding example analysis + +The events <> were pretty simple: basically just ticks. + +But as we venture into <> such as `MinorCPU`, the events get much more complex and interesting. + +TODO: analyze the trace for: + +.... +./run \ + --arch aarch64 \ + --emulator gem5 \ + --userland userland/arch/aarch64/freestanding/linux/hello.S \ + --trace Event \ + --trace-stdout \ + -- \ + --cpu-type MinorCPU \ + --caches \ +; +.... + ==== gem5 stats internals This describes the internals of the <>.