gem5: make an amazing event tree for TimingSimpleCPU

Author: Ciro Santilli 六四事件 法轮功
Date: 2020-03-04 00:00:00 +00:00
parent 898c26b6a8
commit 8a32d9f4c3


The weird name "Minor" stands for "M (TODO what is M) IN ORder".
Its 4 stage pipeline is described at the "MinorCPU" section of <<gem5-arm-rsk>>.
There is also an in-tree doxygen at: https://github.com/gem5/gem5/blob/9fc9c67b4242c03f165951775be5cd0812f2a705/src/doc/inside-minor.doxygen[`src/doc/inside-minor.doxygen`] and rendered at: http://pages.cs.wisc.edu/~swilson/gem5-docs/minor.html
As of 2019, in-order cores are mostly present in low power / cost contexts, for example the little cores of https://en.wikipedia.org/wiki/ARM_big.LITTLE[ARM big.LITTLE].
Some models extend the `MinorCPU` class by parametrization to make it match existing CPUs more closely.
Here is an abridged version that keeps only the first events and the very end, because all that happens in between is exactly the same as the first two instructions and therefore boring.
We have also manually added:
* double newlines before each event execution
* line IDs to be able to refer to specific events more easily (`#0`, `#1`, etc.)
....
#0 0: Event: system.cpu.wrapped_function_event: EventFunctionWrapped 43 scheduled @ 0
**** REAL SIMULATION ****
#1 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 14 scheduled @ 7786250
#2 0: Event: system.mem_ctrls_1.wrapped_function_event: EventFunctionWrapped 20 scheduled @ 7786250
#3 0: Event: Event_74: generic 74 scheduled @ 0
info: Entering event queue @ 0. Starting simulation...
#4 0: Event: Event_74: generic 74 rescheduled @ 18446744073709551615
#5 0: Event: system.cpu.wrapped_function_event: EventFunctionWrapped 43 executed @ 0
#6 0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 0
#7 0: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 1000
#8 0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 0
#9 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 12 scheduled @ 0
#10 0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 46250
#11 0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 5000
#12 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 12 executed @ 0
#13 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 15 scheduled @ 0
#14 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 15 executed @ 0
#15 1000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 executed @ 1000
#16 5000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 5000
#17 46250: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 executed @ 46250
#18 46250: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 scheduled @ 74250
#19 74250: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 executed @ 74250
#20 74250: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 scheduled @ 77000
#21 74250: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 scheduled @ 77000
#22 77000: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 executed @ 77000
#23 77000: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 executed @ 77000
#24 77000: Event: Event_40: Timing CPU icache tick 40 scheduled @ 77000
#25 77000: Event: Event_40: Timing CPU icache tick 40 executed @ 77000
77000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue : movz x0, #1, #0 : IntAlu : D=0x0000000000000001 flags=(IsInteger)
#26 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 77000
#27 77000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 78000
#28 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 77000
#29 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 95750
#30 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 77000
#31 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 77000
#32 78000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 executed @ 78000
#33 95750: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 executed @ 95750
#34 95750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 scheduled @ 123750
#35 123750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 executed @ 123750
#36 123750: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 scheduled @ 126000
#37 123750: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 scheduled @ 126000
#38 126000: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 executed @ 126000
#39 126000: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 executed @ 126000
#40 126000: Event: Event_40: Timing CPU icache tick 40 scheduled @ 126000
#41 126000: Event: Event_40: Timing CPU icache tick 40 executed @ 126000
126000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4 : adr x1, #28 : IntAlu : D=0x0000000000400098 flags=(IsInteger)
#42 126000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 126000
#43 126000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 127000
126000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 126000
126000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 144750
[...]
126000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 126000
126000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 126000
[...]
469000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+28 : svc #0x0 : IntAlu : flags=(IsSerializeAfter|IsNonSpeculative|IsSyscall)
469000: Event: Event_75: generic 75 scheduled @ 469000
469000: Event: Event_75: generic 75 executed @ 469000
....
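The `#N` IDs were added by hand here, but they could also be added mechanically with a tiny filter along these lines (a throwaway sketch, not part of gem5):
....
// number_events.cpp: prefix "#N " to every gem5 "Event:" trace line.
// Build: g++ -o number_events number_events.cpp
// Usage: ./number_events < trace.txt
#include <iostream>
#include <string>

int main() {
    std::string line;
    unsigned long id = 0;
    while (std::getline(std::cin, line)) {
        if (line.find(": Event: ") != std::string::npos)
            std::cout << '#' << id++ << ' ' << line << '\n';
        else
            std::cout << line << '\n';
    }
}
....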
Looking at the generated <<gem5-config-ini,`config.dot.svg`>> gives a better intuition of the shape of the memory system, so it is good to keep it in mind: xref:config-dot-svg-timingsimplecpu[xrefstyle=full].
[[config-dot-svg-timingsimplecpu]]
.`config.dot.svg` for a TimingSimpleCPU without caches.
image::{cirosantilli-media-base}gem5_config_TimingSimpleCPU_12c917de54145d2d50260035ba7fa614e25317a3.svg?sanitize=true[height=600]
It is also helpful to see this as a tree of events where one execute event schedules other events:
....
| | | | |
0 1 2 3 4 0 TimingSimpleCPU::fetch
5
|
+---+
| |
6 7 6 DRAMCtrl::processNextReqEvent
8 15 7 BaseXBar::Layer::releaseLayer
|
+---+---+
| | |
9 10 11 9 DRAMCtrl::Rank::processActivateEvent
12 17 16 10 DRAMCtrl::processRespondEvent
| | 11 DRAMCtrl::processNextReqEvent
| |
13 18 13 DRAMCtrl::Rank::processPowerEvent
14 19 18 PacketQueue::processSendEvent
|
+---+
| |
20 21 20 PacketQueue::processSendEvent
23 22 21 BaseXBar::Layer<SrcType, DstType>::releaseLayer
|
24 24 TimingSimpleCPU::IcachePort::ITickEvent::process
25
|
+---+
| |
26 27 26 DRAMCtrl::processNextReqEvent
28 32 27 BaseXBar::Layer<SrcType, DstType>::releaseLayer
|
+---+
| |
29 30 29 DRAMCtrl::processRespondEvent
33 31 30 DRAMCtrl::processNextReqEvent
|
34 34 PacketQueue::processSendEvent
35
|
+---+
| |
36 37 36 PacketQueue::processSendEvent
39 38 37 BaseXBar::Layer<SrcType, DstType>::releaseLayer
|
40 40 TimingSimpleCPU::IcachePort::ITickEvent::process
41
|
+---+
| |
42 43 42 DRAMCtrl::processNextReqEvent
43 BaseXBar::Layer<SrcType, DstType>::releaseLayer
....
Note that every schedule is followed by an execution, so we put them together, for example:
....
| |
6 7 6 DRAMCtrl::processNextReqEvent
8 15 7 BaseXBar::Layer::releaseLayer
|
....
means:
* `6`: schedule `DRAMCtrl::processNextReqEvent`
* `8`: execute `DRAMCtrl::processNextReqEvent`
* `7`: schedule `BaseXBar::Layer::releaseLayer`
* `15`: execute `BaseXBar::Layer::releaseLayer`
With this, we can focus on going up the event tree from an event of interest until we see what originally caused it!
Notably, the above tree contains the execution of the first two instructions.
Observe how the events leading up to the second instruction are basically a copy of those of the first one: this is the basic `TimingSimpleCPU` event loop in action.
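The tree can also be approximated mechanically from the trace (a throwaway sketch, not part of gem5): since every event scheduled while event X is executing was scheduled by X, it is enough to remember the last executed line. Pairing each schedule with its later execution would additionally require matching the event name and tick, which we did by hand above.
....
// event_tree.cpp: print "parent -> child" schedule edges from a gem5
// "Event" trace, assuming each "scheduled" line was caused by the most
// recently "executed" line. Usage: ./event_tree < trace.txt
#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
    std::string line;
    int id = -1;
    int cur_parent = -1;  // -1 means: scheduled from initialization
    std::vector<std::pair<int, int>> edges;
    while (std::getline(std::cin, line)) {
        if (line.find(": Event: ") == std::string::npos)
            continue;
        ++id;
        if (line.find(" executed @") != std::string::npos)
            cur_parent = id;
        else  // "scheduled @" or "rescheduled @"
            edges.emplace_back(cur_parent, id);
    }
    for (const auto &e : edges)
        std::cout << e.first << " -> " << e.second << '\n';
}
....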
====== TimingSimpleCPU analysis #0
Schedules `TimingSimpleCPU::fetch` through:
....
EventManager::schedule
TimingSimpleCPU::activateContext
SimpleThread::activate
Process::initState
ArmProcess64::initState
ArmLinuxProcess64::initState
....
This schedules the initial event, much like for `AtomicSimpleCPU`. This time however, it is not a tick, but rather a fetch event, since in the timing CPU reading the instruction from DRAM takes time and must be modelled explicitly:
....
TimingSimpleCPU::activateContext(ThreadID thread_num)
schedule(fetchEvent, clockEdge(Cycles(0)));
....
By looking at the source, we see that `fetchEvent` runs `TimingSimpleCPU::fetch`.
Just like for `AtomicSimpleCPU`, this call comes from the `initState` call, which is exposed on `SimObject` and ultimately comes from Python.
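For reference, `fetchEvent` is an `EventFunctionWrapper` whose callback is a lambda that calls `fetch`, which is also why backtraces further down contain frames like `TimingSimpleCPU::TimingSimpleCPU(TimingSimpleCPUParams*)::<lambda()>`. Abridged sketch of the constructor (modulo gem5 version differences):
....
TimingSimpleCPU::TimingSimpleCPU(TimingSimpleCPUParams *p)
    : BaseSimpleCPU(p),
      ...
      fetchEvent([this] { fetch(); }, name())
{
    ...
}
....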
====== TimingSimpleCPU analysis #1
From GDB we see that it comes from `DRAMCtrl::startup` in `mem/dram_ctrl.cc`. Backtrace:
....
EventManager::schedule
DRAMCtrl::Rank::startup
DRAMCtrl::startup
....
Snippets:
....
void
DRAMCtrl::Rank::startup(Tick ref_tick)
{
    ...
}
....
`DRAMCtrl::startup` is itself a `SimObject` method exposed to Python and called from `simulate` in `src/python/m5/simulate.py`:
....
def simulate(*args, **kwargs):
    ...
....
So we realize that we are going into deep DRAM modelling, in more detail than a mere functional simulation would need:
`curTick() + tREFI - tRP = 0 + 7800000 - 13750 = 7786250`, which is when that `refreshEvent` was scheduled. Our simulation ends way before that point however, so we will never know what it did, thank God.
====== TimingSimpleCPU analysis #2
This is just the startup of the second rank, see: <<timingsimplecpu-analysis-1>>.
`se.py` allocates the memory controller at `configs/common/MemConfig.py`:
....
def config_mem(options, system):
...
mem_ctrls.append(mem_ctrl)
....
====== TimingSimpleCPU analysis #3 and #4
From the timing we know what these are: the end of time exit event, like for `AtomicSimpleCPU`.
====== TimingSimpleCPU analysis #5
Executes `TimingSimpleCPU::fetch()`.
The log shows that event ID `43` is now executing: we had previously seen event `43` get scheduled and had analyzed it to be the initial fetch.
In GDB, we can inspect the virtual address being fetched:
....
p/x ifetch_req->getVaddr()
....
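Putting the backtraces of the next few events together, `TimingSimpleCPU::fetch` does roughly the following (a simplified sketch reconstructed from those backtraces, not verbatim source):
....
void
TimingSimpleCPU::fetch()
{
    ...
    // Build an ifetch request for the current PC.
    RequestPtr ifetch_req = std::make_shared<Request>();
    ifetch_req->setVirt(..., thread->instAddr(), ...);
    // Hand it to the instruction TLB. This is the timing variant: the
    // translation calls FetchTranslation::finish when it is done, which
    // in turn calls sendFetch to push the packet out of the icache port.
    thread->itb->translateTiming(ifetch_req, thread->getTC(),
                                 &fetchTranslation, BaseTLB::Execute);
}
....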
Still during the execution of the `fetch`, execution then moves into the address translation `ArmISA::TLB::translateTiming`, and after a call to:
....
TLB::translateSe
....
the packet now contains the physical address:
....
_paddr = 0x78
....
====== TimingSimpleCPU analysis #6
Schedules `DRAMCtrl::processNextReqEvent` through:
....
Trace::OstreamLogger::logMessage
void Trace::Logger::dprintf
Event::trace
EventQueue::schedule
EventManager::schedule
DRAMCtrl::addToReadQueue
DRAMCtrl::recvTimingReq
DRAMCtrl::MemoryPort::recvTimingReq
TimingRequestProtocol::sendReq
MasterPort::sendTimingReq
CoherentXBar::recvTimingReq
CoherentXBar::CoherentXBarSlavePort::recvTimingReq
TimingRequestProtocol::sendReq
MasterPort::sendTimingReq
TimingSimpleCPU::sendFetch
TimingSimpleCPU::FetchTranslation::finish
ArmISA::TLB::translateComplete
ArmISA::TLB::translateTiming
ArmISA::TLB::translateTiming
TimingSimpleCPU::fetch
TimingSimpleCPU::<lambda()>::operator()(void)
std::_Function_handler<void(), TimingSimpleCPU::TimingSimpleCPU(TimingSimpleCPUParams*)::<lambda()> >
std::function<void()>::operator()() const
EventFunctionWrapper::process
EventQueue::serviceOne
doSimLoop
simulate
....
The event loop has started, and magic initialization schedulings are not happening anymore: now every event is being scheduled from another event.
From the trace, we see that we are already running from the event queue under `TimingSimpleCPU::fetch` as expected.
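The loop that drives all of this, `simulate` -> `doSimLoop` -> `EventQueue::serviceOne` in the backtrace above, can be modelled in a few lines of standalone C++ (a toy sketch, not gem5 code):
....
#include <cstdint>
#include <cstdio>
#include <functional>
#include <map>

using Tick = std::uint64_t;
static std::multimap<Tick, std::function<void()>> event_queue;
static Tick cur_tick = 0;

static void schedule(Tick when, std::function<void()> f) {
    std::printf("scheduled @ %llu\n", (unsigned long long)when);
    event_queue.emplace(when, std::move(f));
}

int main() {
    // Model the TimingSimpleCPU loop: a fetch goes to "memory", and
    // completing the fetch eventually starts the next fetch.
    std::function<void()> fetch = [&fetch]() {
        schedule(cur_tick + 49000, [&fetch]() {  // pretend DRAM latency
            std::printf("icache tick executed @ %llu\n",
                        (unsigned long long)cur_tick);
            if (cur_tick < 200000)
                fetch();
        });
    };
    schedule(0, fetch);
    // The event loop: pop the earliest event, advance time, run it.
    while (!event_queue.empty()) {
        auto it = event_queue.begin();
        cur_tick = it->first;
        auto f = std::move(it->second);
        event_queue.erase(it);
        f();  // executing an event may schedule further events
    }
}
....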
From the backtrace we see the tortuous path that the data request takes from the CPU through the crossbar and into the DRAM controller.
The scheduling happens at frame `DRAMCtrl::addToReadQueue`, in a snippet along these lines:
....
    if (!nextReqEvent.scheduled()) {
        schedule(nextReqEvent, curTick());
    }
....
and from a quick source grep we see that `nextReqEvent` is a `DRAMCtrl::processNextReqEvent`.
From this we deduce that the DRAM has a request queue of some sort, and that the fetch has added a read request to that queue.
The signature of the function is:
....
DRAMCtrl::addToReadQueue(PacketPtr pkt, unsigned int pktCount)
....
where `PacketPtr` points to `class Packet`, and so clearly the packet is coming from above.
From:
....
p/x *pkt
....
we see:
....
addr = 0x78
....
which from <<timingsimplecpu-analysis-5>> we know is the physical address of the ELF entry point.
Communication goes through certain components via the `class Port` interface, e.g. at `TimingSimpleCPU::sendFetch` a call is made to send the packet forward:
....
icachePort.sendTimingReq(ifetch_pkt)
....
which ends up calling:
....
peer->recvTimingReq(pkt);
....
to reach the receiving side:
....
CoherentXBar::CoherentXBarSlavePort::recvTimingReq
....
Ports are also used to connect the XBar and the DRAM.
We will then see that at <<timingsimplecpu-analysis-20>> a reply packet will come back through the port interface down to the icache port, and only then does the decoding and execution happen.
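The pattern can be modelled minimally as follows (a toy sketch of the idea, not the real `MasterPort`/`SlavePort` classes):
....
#include <cstdio>

struct Packet { unsigned long long addr; };

struct Port {
    Port *peer = nullptr;
    // "Sending" is nothing but calling recv on whatever peer was
    // connected at configuration time (in gem5, from the Python config).
    bool sendTimingReq(Packet *pkt) { return peer->recvTimingReq(pkt); }
    virtual bool recvTimingReq(Packet *pkt) { return false; }
    virtual ~Port() {}
};

struct XBarSlavePort : Port {
    bool recvTimingReq(Packet *pkt) override {
        std::printf("xbar received request for 0x%llx\n", pkt->addr);
        return true;  // the real xbar can return false to request a retry
    }
};

int main() {
    Port icache_port;
    XBarSlavePort xbar_port;
    icache_port.peer = &xbar_port;
    Packet ifetch_pkt{0x78};  // the physical address we saw above
    icache_port.sendTimingReq(&ifetch_pkt);
}
....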
====== TimingSimpleCPU analysis #7
Schedules `BaseXBar::Layer::releaseLayer` through:
....
EventManager::schedule
BaseXBar::Layer<SlavePort, MasterPort>::occupyLayer
BaseXBar::Layer<SlavePort, MasterPort>::succeededTiming
CoherentXBar::recvTimingReq
CoherentXBar::CoherentXBarSlavePort::recvTimingReq
TimingRequestProtocol::sendReq
MasterPort::sendTimingReq
TimingSimpleCPU::sendFetch
TimingSimpleCPU::FetchTranslation::finish
ArmISA::TLB::translateComplete
ArmISA::TLB::translateTiming
ArmISA::TLB::translateTiming
TimingSimpleCPU::fetch
....
====== TimingSimpleCPU analysis #8
Executes `DRAMCtrl::processNextReqEvent`.
====== TimingSimpleCPU analysis #9
Schedules `DRAMCtrl::Rank::processActivateEvent` through:
....
EventManager::schedule
DRAMCtrl::activateBank
DRAMCtrl::doDRAMAccess
DRAMCtrl::processNextReqEvent
....
====== TimingSimpleCPU analysis #10
Schedules `DRAMCtrl::processRespondEvent` through:
....
EventManager::schedule
DRAMCtrl::processNextReqEvent
....
====== TimingSimpleCPU analysis #11
Schedules `DRAMCtrl::processNextReqEvent` through:
....
EventManager::schedule
DRAMCtrl::processNextReqEvent
....
====== TimingSimpleCPU analysis #12
Executes `DRAMCtrl::Rank::processActivateEvent`.
====== TimingSimpleCPU analysis #13
Schedules `DRAMCtrl::Rank::processPowerEvent` through:
....
EventManager::schedule
DRAMCtrl::Rank::schedulePowerEvent
DRAMCtrl::Rank::processActivateEvent
....
====== TimingSimpleCPU analysis #14
Executes `DRAMCtrl::Rank::processPowerEvent`.
It must just be some power statistics stuff, as it does not schedule anything else.
====== TimingSimpleCPU analysis #15
Executes `BaseXBar::Layer<SrcType, DstType>::releaseLayer`.
====== TimingSimpleCPU analysis #16
Executes `DRAMCtrl::processNextReqEvent()`.
====== TimingSimpleCPU analysis #17
Executes `DRAMCtrl::processRespondEvent()`.
====== TimingSimpleCPU analysis #18
Schedules `PacketQueue::processSendEvent()` through:
....
PacketQueue::schedSendEvent
PacketQueue::schedSendTiming
QueuedSlavePort::schedTimingResp
DRAMCtrl::accessAndRespond
DRAMCtrl::processRespondEvent
....
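Reading these frames bottom to top gives a plausible reconstruction of the queued response path (an assumption-based sketch, `response_latency` is a made up name, not the actual source):
....
// DRAMCtrl::accessAndRespond computes when the response will be ready
// and hands the packet to its queued port:
port.schedTimingResp(pkt, curTick() + response_latency);

// QueuedSlavePort::schedTimingResp forwards into its PacketQueue, which
// stores the deferred packet and schedules processSendEvent at that
// future tick. Only when that event executes does sendDeferredPacket
// finally push the packet out of the port.
....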
====== TimingSimpleCPU analysis #19
Executes `PacketQueue::processSendEvent()`.
====== TimingSimpleCPU analysis #20
Schedules `PacketQueue::processSendEvent` through:
....
EventManager::schedule
PacketQueue::schedSendEvent
PacketQueue::schedSendTiming
QueuedSlavePort::schedTimingResp
CoherentXBar::recvTimingResp
CoherentXBar::CoherentXBarMasterPort::recvTimingResp
TimingResponseProtocol::sendResp
SlavePort::sendTimingResp
RespPacketQueue::sendTiming
PacketQueue::sendDeferredPacket
PacketQueue::processSendEvent
....
From this backtrace, we see that this event is happening as the fetch reply packet finally comes back from DRAM.
====== TimingSimpleCPU analysis #21
Schedules `BaseXBar::Layer<SrcType, DstType>::releaseLayer` through:
....
EventManager::schedule
BaseXBar::Layer<MasterPort, SlavePort>::occupyLayer
BaseXBar::Layer<MasterPort, SlavePort>::succeededTiming
CoherentXBar::recvTimingResp
CoherentXBar::CoherentXBarMasterPort::recvTimingResp
TimingResponseProtocol::sendResp
SlavePort::sendTimingResp
RespPacketQueue::sendTiming
PacketQueue::sendDeferredPacket
PacketQueue::processSendEvent
....
====== TimingSimpleCPU analysis #22
Executes `BaseXBar::Layer<SrcType, DstType>::releaseLayer`.
====== TimingSimpleCPU analysis #23
Executes `PacketQueue::processSendEvent`.
====== TimingSimpleCPU analysis #24
Schedules `TimingSimpleCPU::IcachePort::ITickEvent::process()` through:
....
EventManager::schedule
TimingSimpleCPU::TimingCPUPort::TickEvent::schedule
TimingSimpleCPU::IcachePort::recvTimingResp
TimingResponseProtocol::sendResp
SlavePort::sendTimingResp
RespPacketQueue::sendTiming
PacketQueue::sendDeferredPacket
PacketQueue::processSendEvent
....
====== TimingSimpleCPU analysis #25
Executes `TimingSimpleCPU::IcachePort::ITickEvent::process()`.
This custom `process` then calls `TimingSimpleCPU::completeIfetch(PacketPtr pkt)`, and that finally executes the very first instruction:
....
77000: Event: Event_40: Timing CPU icache tick 40 executed @ 77000
77000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue : movz x0, #1, #0 : IntAlu : D=0x0000000000000001 flags=(IsInteger)
....
The end of this instruction must set things up in a way that continues the PC walk loop, and by looking at the source and traces, this clearly happens in `TimingSimpleCPU::advanceInst`, which calls `TimingSimpleCPU::fetch`.
And `TimingSimpleCPU::fetch` is the very thing we did in this simulation at <<timingsimplecpu-analysis-0>>!!! OMG, that's the loop.
====== TimingSimpleCPU analysis #26
Schedules `DRAMCtrl::processNextReqEvent` through:
....
EventManager::schedule
DRAMCtrl::addToReadQueue
DRAMCtrl::recvTimingReq
DRAMCtrl::MemoryPort::recvTimingReq
TimingRequestProtocol::sendReq
MasterPort::sendTimingReq
CoherentXBar::recvTimingReq
CoherentXBar::CoherentXBarSlavePort::recvTimingReq
TimingRequestProtocol::sendReq
MasterPort::sendTimingReq
TimingSimpleCPU::sendFetch
TimingSimpleCPU::FetchTranslation::finish
ArmISA::TLB::translateComplete
ArmISA::TLB::translateTiming
ArmISA::TLB::translateTiming
TimingSimpleCPU::fetch
TimingSimpleCPU::advanceInst
TimingSimpleCPU::completeIfetch
TimingSimpleCPU::IcachePort::ITickEvent::process
....
====== TimingSimpleCPU analysis #27
Schedules `BaseXBar::Layer<SrcType, DstType>::releaseLayer` through:
....
EventManager::schedule
BaseXBar::Layer<SlavePort, MasterPort>::occupyLayer
BaseXBar::Layer<SlavePort, MasterPort>::succeededTiming
CoherentXBar::recvTimingReq
CoherentXBar::CoherentXBarSlavePort::recvTimingReq
TimingRequestProtocol::sendReq
MasterPort::sendTimingReq
TimingSimpleCPU::sendFetch
TimingSimpleCPU::FetchTranslation::finish
ArmISA::TLB::translateComplete
ArmISA::TLB::translateTiming
ArmISA::TLB::translateTiming
TimingSimpleCPU::fetch
TimingSimpleCPU::advanceInst
TimingSimpleCPU::completeIfetch
TimingSimpleCPU::IcachePort::ITickEvent::process
....
====== TimingSimpleCPU analysis #28
Executes `DRAMCtrl::processNextReqEvent`.
====== TimingSimpleCPU analysis #29
Schedules `DRAMCtrl::processRespondEvent()`.
====== TimingSimpleCPU analysis: LDR stall
One important thing we want to check now is how memory reads make the processor stall in the middle of an instruction.
Since we are using a simple CPU without a pipeline, a data memory access stalls everything: there is no further progress until the memory comes back.
For that, we can GDB to the `TimingSimpleCPU::completeIfetch` of the first LDR done in our test program.
So, where is the `execute` happening? Well, I'll satisfy myself with a quick source grep for now: the load completes at `TimingSimpleCPU::completeDataAccess`.
It also calls `+curStaticInst->completeAcc+`, which pairs up with the `initiateAcc` call.
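This pairing is what implements the stall: `initiateAcc` only fires off the memory request, and `completeAcc` only runs once the response packet comes back, writing the result register. Schematically, for a hypothetical load instruction (`MyLoad`, `EA`, `Mem` and `Dest` are made-up illustration names, not gem5's actual generated code):
....
Fault
MyLoad::initiateAcc(ExecContext *xc, Trace::InstRecord *traceData) const
{
    // Send the read request out of the dcache port; no data yet.
    return initiateMemRead(xc, traceData, EA, Mem, memAccessFlags);
}

Fault
MyLoad::completeAcc(PacketPtr pkt, ExecContext *xc,
                    Trace::InstRecord *traceData) const
{
    // Runs only when the response packet arrives: write the loaded
    // value to the destination register so that the CPU can advance.
    Dest = pkt->getLE<uint32_t>();
    return NoFault;
}
....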
The following is the region of interest of the event log:
....
175000: Event: Event_40: Timing CPU icache tick 40 executed @ 175000
175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 175000
175000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 176000
175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 175000
175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 193750
175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 175000
175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 175000
176000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 executed @ 176000
193750: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 executed @ 193750
193750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 scheduled @ 221750
221750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 executed @ 221750
221750: Event: system.membus.slave[2]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 66 scheduled @ 224000
221750: Event: system.membus.respLayer2.wrapped_function_event: EventFunctionWrapped 67 scheduled @ 224000
224000: Event: system.membus.respLayer2.wrapped_function_event: EventFunctionWrapped 67 executed @ 224000
224000: Event: system.membus.slave[2]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 66 executed @ 224000
224000: Event: Event_42: Timing CPU dcache tick 42 scheduled @ 224000
224000: Event: Event_42: Timing CPU dcache tick 42 executed @ 224000
175000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8 : ldr w2, #4194464 : MemRead : D=0x0000000000000006 A=0x4000a0 flags=(IsInteger|IsMemRef|IsLoad)
....
We first find it by looking for the `ExecEnable` of LDR.
Then, we go up to the previous `Timing CPU icache tick` event, which from the analysis of the previous instructions we know is where the instruction execution starts: the LDR instruction fetch is done by then!
Next, several events happen as the data request percolates through the memory system; it must be very similar to the instruction fetches. TODO: analyze the event function names.
Finally, at long last, we reach:
....
224000: Event: Event_42: Timing CPU dcache tick 42 executed @ 224000
175000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8 : ldr w2, #4194464 : MemRead : D=0x0000000000000006 A=0x4000a0 flags=(IsInteger|IsMemRef|IsLoad)
....
from which we guess:
* `224000`: this is the time at which the data request finally returned and at which `execute` gets called. The instruction therefore spent `224000 - 175000 = 49000` ticks stalled waiting for the data.
* `175000`: the log line only gets printed at the end of execution, but the time shown is not the time at which things finished, but rather the time at which the ifetch finished, which happened in the past
===== gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches
It is worth noting that the caches do not affect the `ArmITB` and `ArmDTB` <<arm-paging,address translation>>.
.`config.dot.svg` for a TimingSimpleCPU with caches.
image::{cirosantilli-media-base}gem5_config_TimingSimpleCPU_caches_12c917de54145d2d50260035ba7fa614e25317a3.svg?sanitize=true[height=600]
===== gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs
TODO is this the minimal setup that allows us to see the <<gem5-crossbar-interconnect>>? Can we see anything in `AtomicSimpleCPU`?
It would be amazing to analyze a simple example with interconnect packets possibly invalidating caches of other CPUs.
To observe it we could create one well controlled workload with instructions that flush memory, and run it on two CPUs.
If we don't use such memory-flushing instructions, we would only see the interconnect at work once the caches start running out of space.
[[config-dot-svg-timingsimplecpu-caches-2-cpus]]
.`config.dot.svg` for a system with two TimingSimpleCPUs with caches.
image::{cirosantilli-media-base}gem5_config_TimingSimpleCPU_caches_2_CPUs_12c917de54145d2d50260035ba7fa614e25317a3.svg?sanitize=true[height=600]
===== gem5 event queue MinorCPU syscall emulation freestanding example analysis
The events <<gem5-event-queue-atomicsimplecpu-syscall-emulation-freestanding-example-analysis,for the Atomic CPU>> were pretty simple: basically just ticks.