gem5: make an amazing event tree for TimingSimpleCPU

Author: Ciro Santilli 六四事件 法轮功
Date: 2020-03-04 00:00:00 +00:00
parent 898c26b6a8
commit 8a32d9f4c3


The weird name "Minor" stands for "M (TODO what is M) IN ORder".
Its 4 stage pipeline is described at the "MinorCPU" section of <<gem5-arm-rsk>>.
There is also an in-tree doxygen at: https://github.com/gem5/gem5/blob/9fc9c67b4242c03f165951775be5cd0812f2a705/src/doc/inside-minor.doxygen[`src/doc/inside-minor.doxygen`] and rendered at: http://pages.cs.wisc.edu/~swilson/gem5-docs/minor.html
As of 2019, in-order cores are mostly present in low power / cost contexts, for example the little cores of https://en.wikipedia.org/wiki/ARM_big.LITTLE[ARM big.LITTLE].
Some models extend the `MinorCPU` class by parametrization to make it match existing CPUs more closely.
Here is an abridged version that keeps only the first events and the very end, because all that happens in between is exactly the same as the first two instructions and therefore boring.
We have also manually added:
* double newlines before each event execution
* line IDs to be able to refer to specific events more easily (`#0`, `#1`, etc.)
....
#0 0: Event: system.cpu.wrapped_function_event: EventFunctionWrapped 43 scheduled @ 0
**** REAL SIMULATION ****
#1 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 14 scheduled @ 7786250
#2 0: Event: system.mem_ctrls_1.wrapped_function_event: EventFunctionWrapped 20 scheduled @ 7786250
#3 0: Event: Event_74: generic 74 scheduled @ 0
info: Entering event queue @ 0. Starting simulation...
#4 0: Event: Event_74: generic 74 rescheduled @ 18446744073709551615
#5 0: Event: system.cpu.wrapped_function_event: EventFunctionWrapped 43 executed @ 0
#6 0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 0
#7 0: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 1000
#8 0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 0
#9 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 12 scheduled @ 0
#10 0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 46250
#11 0: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 5000
#12 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 12 executed @ 0
#13 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 15 scheduled @ 0
#14 0: Event: system.mem_ctrls_0.wrapped_function_event: EventFunctionWrapped 15 executed @ 0
#15 1000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 executed @ 1000
#16 5000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 5000
#17 46250: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 executed @ 46250
#18 46250: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 scheduled @ 74250
#19 74250: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 executed @ 74250
#20 74250: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 scheduled @ 77000
#21 74250: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 scheduled @ 77000
#22 77000: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 executed @ 77000
#23 77000: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 executed @ 77000
#24 77000: Event: Event_40: Timing CPU icache tick 40 scheduled @ 77000
#25 77000: Event: Event_40: Timing CPU icache tick 40 executed @ 77000
77000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue : movz x0, #1, #0 : IntAlu : D=0x0000000000000001 flags=(IsInteger)
#26 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 77000
#27 77000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 78000
#28 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 77000
#29 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 95750
#30 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 77000
#31 77000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 77000
#32 78000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 executed @ 78000
#33 95750: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 executed @ 95750
#34 95750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 scheduled @ 123750
#35 123750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 executed @ 123750
#36 123750: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 scheduled @ 126000
#37 123750: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 scheduled @ 126000
#38 126000: Event: system.membus.respLayer1.wrapped_function_event: EventFunctionWrapped 65 executed @ 126000
#39 126000: Event: system.membus.slave[1]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 64 executed @ 126000
#40 126000: Event: Event_40: Timing CPU icache tick 40 scheduled @ 126000
#41 126000: Event: Event_40: Timing CPU icache tick 40 executed @ 126000
126000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+4 : adr x1, #28 : IntAlu : D=0x0000000000400098 flags=(IsInteger)
#42 126000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 126000
#43 126000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 127000
126000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 126000
126000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 144750
[...]
126000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 126000
126000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 126000
[...]
469000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+28 : svc #0x0 : IntAlu : flags=(IsSerializeAfter|IsNonSpeculative|IsSyscall)
469000: Event: Event_75: generic 75 scheduled @ 469000
469000: Event: Event_75: generic 75 executed @ 469000
....
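The `#N` IDs were added by hand here, but they could also be added mechanically with a tiny filter along these lines (a throwaway sketch, not part of gem5):
....
// number_events.cpp: prefix "#N " to every gem5 "Event:" trace line.
// Build: g++ -o number_events number_events.cpp
// Usage: ./number_events < trace.txt
#include <iostream>
#include <string>

int main() {
    std::string line;
    unsigned long id = 0;
    while (std::getline(std::cin, line)) {
        if (line.find(": Event: ") != std::string::npos)
            std::cout << '#' << id++ << ' ' << line << '\n';
        else
            std::cout << line << '\n';
    }
}
....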
Looking at the generated <<gem5-config-ini,`config.dot.svg`>> gives a better intuition of the shape of the memory system, so it is good to keep it in mind: xref:config-dot-svg-timingsimplecpu[xrefstyle=full].
[[config-dot-svg-timingsimplecpu]]
.`config.dot.svg` for a TimingSimpleCPU without caches.
image::{cirosantilli-media-base}gem5_config_TimingSimpleCPU_12c917de54145d2d50260035ba7fa614e25317a3.svg?sanitize=true[height=600]
It is also helpful to see this as a tree of events where one execute event schedules other events:
....
| | | | |
0 1 2 3 4 0 TimingSimpleCPU::fetch
5
|
+---+
| |
6 7 6 DRAMCtrl::processNextReqEvent
8 15 7 BaseXBar::Layer::releaseLayer
|
+---+---+
| | |
9 10 11 9 DRAMCtrl::Rank::processActivateEvent
12 17 16 10 DRAMCtrl::processRespondEvent
| | 11 DRAMCtrl::processNextReqEvent
| |
13 18 13 DRAMCtrl::Rank::processPowerEvent
14 19 18 PacketQueue::processSendEvent
|
+---+
| |
20 21 20 PacketQueue::processSendEvent
23 22 21 BaseXBar::Layer<SrcType, DstType>::releaseLayer
|
24 24 TimingSimpleCPU::IcachePort::ITickEvent::process
25
|
+---+
| |
26 27 26 DRAMCtrl::processNextReqEvent
28 32 27 BaseXBar::Layer<SrcType, DstType>::releaseLayer
|
+---+
| |
29 30 29 DRAMCtrl::processRespondEvent
33 31 30 DRAMCtrl::processNextReqEvent
|
34 34 PacketQueue::processSendEvent
35
|
+---+
| |
36 37 36 PacketQueue::processSendEvent
39 38 37 BaseXBar::Layer<SrcType, DstType>::releaseLayer
|
40 40 TimingSimpleCPU::IcachePort::ITickEvent::process
41
|
+---+
| |
42 43 42 DRAMCtrl::processNextReqEvent
43 BaseXBar::Layer<SrcType, DstType>::releaseLayer
....
Note that every schedule is followed by an execution, so we put them together, for example:
....
| |
6 7 6 DRAMCtrl::processNextReqEvent
8 15 7 BaseXBar::Layer::releaseLayer
|
....
means:
* `6`: schedule `DRAMCtrl::processNextReqEvent`
* `8`: execute `DRAMCtrl::processNextReqEvent`
* `7`: schedule `BaseXBar::Layer::releaseLayer`
* `15`: execute `BaseXBar::Layer::releaseLayer`
With this, we can focus on going up the event tree from an event of interest until we see what originally caused it!
Notably, the above tree contains the execution of the first two instructions.
Observe how the events leading up to the second instruction are basically a copy of those of the first one: this is the basic `TimingSimpleCPU` event loop in action.
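The tree can also be approximated mechanically from the trace (a throwaway sketch, not part of gem5): since every event scheduled while event X is executing was scheduled by X, it is enough to remember the last executed line. Pairing each schedule with its later execution would additionally require matching the event name and tick, which we did by hand above.
....
// event_tree.cpp: print "parent -> child" schedule edges from a gem5
// "Event" trace, assuming each "scheduled" line was caused by the most
// recently "executed" line. Usage: ./event_tree < trace.txt
#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
    std::string line;
    int id = -1;
    int cur_parent = -1;  // -1 means: scheduled from initialization
    std::vector<std::pair<int, int>> edges;
    while (std::getline(std::cin, line)) {
        if (line.find(": Event: ") == std::string::npos)
            continue;
        ++id;
        if (line.find(" executed @") != std::string::npos)
            cur_parent = id;
        else  // "scheduled @" or "rescheduled @"
            edges.emplace_back(cur_parent, id);
    }
    for (const auto &e : edges)
        std::cout << e.first << " -> " << e.second << '\n';
}
....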
====== TimingSimpleCPU analysis #0
Schedules `TimingSimpleCPU::fetch` through:
....
EventManager::schedule
TimingSimpleCPU::activateContext
SimpleThread::activate
Process::initState
ArmProcess64::initState
ArmLinuxProcess64::initState
....
This schedules the initial event, much like for `AtomicSimpleCPU`. This time however, it is not a tick, but rather a fetch event, since in the timing CPU reading the instruction from DRAM takes time and must be modelled explicitly:
....
TimingSimpleCPU::activateContext(ThreadID thread_num)
schedule(fetchEvent, clockEdge(Cycles(0)));
....
By looking at the source, we see that `fetchEvent` runs `TimingSimpleCPU::fetch`.
Just like for `AtomicSimpleCPU`, this call comes from the `initState` call, which is exposed on `SimObject` and ultimately comes from Python.
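For reference, `fetchEvent` is an `EventFunctionWrapper` whose callback is a lambda that calls `fetch`, which is also why backtraces further down contain frames like `TimingSimpleCPU::TimingSimpleCPU(TimingSimpleCPUParams*)::<lambda()>`. Abridged sketch of the constructor (modulo gem5 version differences):
....
TimingSimpleCPU::TimingSimpleCPU(TimingSimpleCPUParams *p)
    : BaseSimpleCPU(p),
      ...
      fetchEvent([this] { fetch(); }, name())
{
    ...
}
....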
====== TimingSimpleCPU analysis #1
From GDB we see that it comes from `DRAMCtrl::startup` in `mem/dram_ctrl.cc`. Backtrace:
....
EventManager::schedule
DRAMCtrl::Rank::startup
DRAMCtrl::startup
....
Snippets:
....
void
DRAMCtrl::Rank::startup(Tick ref_tick)
{
    ...
}
....
`DRAMCtrl::startup` is itself a `SimObject` method exposed to Python and called from `simulate` in `src/python/m5/simulate.py`:
....
def simulate(*args, **kwargs):
    ...
....
So we realize that we are going into deep DRAM modelling, in more detail than a mere functional simulation would need:
`curTick() + tREFI - tRP = 0 + 7800000 - 13750 = 7786250`, which is when that `refreshEvent` was scheduled. Our simulation ends way before that point however, so we will never know what it did, thank God.
====== TimingSimpleCPU analysis #2
This is just the startup of the second rank, see: <<timingsimplecpu-analysis-1>>.
`se.py` allocates the memory controller at `configs/common/MemConfig.py`:
....
def config_mem(options, system):
...
mem_ctrls.append(mem_ctrl)
....
====== TimingSimpleCPU analysis #3 and #4
From the timing we know what these are: the end of time exit event, like for `AtomicSimpleCPU`.
====== TimingSimpleCPU analysis #5
Executes `TimingSimpleCPU::fetch()`.
The log shows that event ID `43` is now executing: we had previously seen event `43` get scheduled and had analyzed it to be the initial fetch.
In GDB, we can inspect the virtual address being fetched:
....
p/x ifetch_req->getVaddr()
....
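Putting the backtraces of the next few events together, `TimingSimpleCPU::fetch` does roughly the following (a simplified sketch reconstructed from those backtraces, not verbatim source):
....
void
TimingSimpleCPU::fetch()
{
    ...
    // Build an ifetch request for the current PC.
    RequestPtr ifetch_req = std::make_shared<Request>();
    ifetch_req->setVirt(..., thread->instAddr(), ...);
    // Hand it to the instruction TLB. This is the timing variant: the
    // translation calls FetchTranslation::finish when it is done, which
    // in turn calls sendFetch to push the packet out of the icache port.
    thread->itb->translateTiming(ifetch_req, thread->getTC(),
                                 &fetchTranslation, BaseTLB::Execute);
}
....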
Still during the execution of the `fetch`, execution then moves into the address translation `ArmISA::TLB::translateTiming`, and after a call to:
....
TLB::translateSe
....
the packet now contains the physical address:
....
_paddr = 0x78
....
====== TimingSimpleCPU analysis #6
Schedules `DRAMCtrl::processNextReqEvent` through:
....
Trace::OstreamLogger::logMessage
void Trace::Logger::dprintf
Event::trace
EventQueue::schedule
EventManager::schedule
DRAMCtrl::addToReadQueue
DRAMCtrl::recvTimingReq
DRAMCtrl::MemoryPort::recvTimingReq
TimingRequestProtocol::sendReq
MasterPort::sendTimingReq
CoherentXBar::recvTimingReq
CoherentXBar::CoherentXBarSlavePort::recvTimingReq
TimingRequestProtocol::sendReq
MasterPort::sendTimingReq
TimingSimpleCPU::sendFetch
TimingSimpleCPU::FetchTranslation::finish
ArmISA::TLB::translateComplete
ArmISA::TLB::translateTiming
ArmISA::TLB::translateTiming
TimingSimpleCPU::fetch
TimingSimpleCPU::<lambda()>::operator()(void)
std::_Function_handler<void(), TimingSimpleCPU::TimingSimpleCPU(TimingSimpleCPUParams*)::<lambda()> >
std::function<void()>::operator()() const
EventFunctionWrapper::process
EventQueue::serviceOne
doSimLoop
simulate
....
The event loop has started, and magic initialization schedulings are not happening anymore: now every event is being scheduled from another event.
From the trace, we see that we are already running from the event queue under `TimingSimpleCPU::fetch` as expected.
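The loop that drives all of this, `simulate` -> `doSimLoop` -> `EventQueue::serviceOne` in the backtrace above, can be modelled in a few lines of standalone C++ (a toy sketch, not gem5 code):
....
#include <cstdint>
#include <cstdio>
#include <functional>
#include <map>

using Tick = std::uint64_t;
static std::multimap<Tick, std::function<void()>> event_queue;
static Tick cur_tick = 0;

static void schedule(Tick when, std::function<void()> f) {
    std::printf("scheduled @ %llu\n", (unsigned long long)when);
    event_queue.emplace(when, std::move(f));
}

int main() {
    // Model the TimingSimpleCPU loop: a fetch goes to "memory", and
    // completing the fetch eventually starts the next fetch.
    std::function<void()> fetch = [&fetch]() {
        schedule(cur_tick + 49000, [&fetch]() {  // pretend DRAM latency
            std::printf("icache tick executed @ %llu\n",
                        (unsigned long long)cur_tick);
            if (cur_tick < 200000)
                fetch();
        });
    };
    schedule(0, fetch);
    // The event loop: pop the earliest event, advance time, run it.
    while (!event_queue.empty()) {
        auto it = event_queue.begin();
        cur_tick = it->first;
        auto f = std::move(it->second);
        event_queue.erase(it);
        f();  // executing an event may schedule further events
    }
}
....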
From the backtrace we see the tortuous path that the data request takes from the CPU through the crossbar and into the DRAM controller.
The scheduling happens at frame `DRAMCtrl::addToReadQueue`, in a snippet along these lines:
....
    if (!nextReqEvent.scheduled()) {
        schedule(nextReqEvent, curTick());
    }
....
and from a quick source grep we see that `nextReqEvent` is a `DRAMCtrl::processNextReqEvent`.
From this we deduce that the DRAM has a request queue of some sort, and that the fetch has added a read request to that queue.
The signature of the function is:
....
DRAMCtrl::addToReadQueue(PacketPtr pkt, unsigned int pktCount)
....
where `PacketPtr` points to `class Packet`, and so clearly the packet is coming from above.
From:
....
p/x *pkt
....
we see:
....
addr = 0x78
....
which from <<timingsimplecpu-analysis-5>> we know is the physical address of the ELF entry point.
Communication goes through certain components via the `class Port` interface, e.g. at `TimingSimpleCPU::sendFetch` a call is made to send the packet forward:
....
icachePort.sendTimingReq(ifetch_pkt)
....
which ends up calling:
....
peer->recvTimingReq(pkt);
....
to reach the receiving side:
....
CoherentXBar::CoherentXBarSlavePort::recvTimingReq
....
Ports are also used to connect the XBar and the DRAM.
We will then see that at <<timingsimplecpu-analysis-20>> a reply packet will come back through the port interface down to the icache port, and only then does the decoding and execution happen.
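The pattern can be modelled minimally as follows (a toy sketch of the idea, not the real `MasterPort`/`SlavePort` classes):
....
#include <cstdio>

struct Packet { unsigned long long addr; };

struct Port {
    Port *peer = nullptr;
    // "Sending" is nothing but calling recv on whatever peer was
    // connected at configuration time (in gem5, from the Python config).
    bool sendTimingReq(Packet *pkt) { return peer->recvTimingReq(pkt); }
    virtual bool recvTimingReq(Packet *pkt) { return false; }
    virtual ~Port() {}
};

struct XBarSlavePort : Port {
    bool recvTimingReq(Packet *pkt) override {
        std::printf("xbar received request for 0x%llx\n", pkt->addr);
        return true;  // the real xbar can return false to request a retry
    }
};

int main() {
    Port icache_port;
    XBarSlavePort xbar_port;
    icache_port.peer = &xbar_port;
    Packet ifetch_pkt{0x78};  // the physical address we saw above
    icache_port.sendTimingReq(&ifetch_pkt);
}
....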
====== TimingSimpleCPU analysis #7
Schedules `BaseXBar::Layer::releaseLayer` through:
....
EventManager::schedule
BaseXBar::Layer<SlavePort, MasterPort>::occupyLayer
BaseXBar::Layer<SlavePort, MasterPort>::succeededTiming
CoherentXBar::recvTimingReq
CoherentXBar::CoherentXBarSlavePort::recvTimingReq
TimingRequestProtocol::sendReq
MasterPort::sendTimingReq
TimingSimpleCPU::sendFetch
TimingSimpleCPU::FetchTranslation::finish
ArmISA::TLB::translateComplete
ArmISA::TLB::translateTiming
ArmISA::TLB::translateTiming
TimingSimpleCPU::fetch
....
====== TimingSimpleCPU analysis #8
Executes `DRAMCtrl::processNextReqEvent`.
====== TimingSimpleCPU analysis #9
Schedules `DRAMCtrl::Rank::processActivateEvent` through:
....
EventManager::schedule
DRAMCtrl::activateBank
DRAMCtrl::doDRAMAccess
DRAMCtrl::processNextReqEvent
....
====== TimingSimpleCPU analysis #10
Schedules `DRAMCtrl::processRespondEvent` through:
....
EventManager::schedule
DRAMCtrl::processNextReqEvent
....
====== TimingSimpleCPU analysis #11
Schedules `DRAMCtrl::processNextReqEvent` through:
....
EventManager::schedule
DRAMCtrl::processNextReqEvent
....
====== TimingSimpleCPU analysis #12
Executes `DRAMCtrl::Rank::processActivateEvent`.
====== TimingSimpleCPU analysis #13
Schedules `DRAMCtrl::Rank::processPowerEvent` through:
....
EventManager::schedule
DRAMCtrl::Rank::schedulePowerEvent
DRAMCtrl::Rank::processActivateEvent
....
====== TimingSimpleCPU analysis #14
Executes `DRAMCtrl::Rank::processPowerEvent`.
It must just be some power statistics stuff, as it does not schedule anything else.
====== TimingSimpleCPU analysis #15
Executes `BaseXBar::Layer<SrcType, DstType>::releaseLayer`.
====== TimingSimpleCPU analysis #16
Executes `DRAMCtrl::processNextReqEvent()`.
====== TimingSimpleCPU analysis #17
Executes `DRAMCtrl::processRespondEvent()`.
====== TimingSimpleCPU analysis #18
Schedules `PacketQueue::processSendEvent()` through:
....
PacketQueue::schedSendEvent
PacketQueue::schedSendTiming
QueuedSlavePort::schedTimingResp
DRAMCtrl::accessAndRespond
DRAMCtrl::processRespondEvent
....
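Reading these frames bottom to top gives a plausible reconstruction of the queued response path (an assumption-based sketch, `response_latency` is a made up name, not the actual source):
....
// DRAMCtrl::accessAndRespond computes when the response will be ready
// and hands the packet to its queued port:
port.schedTimingResp(pkt, curTick() + response_latency);

// QueuedSlavePort::schedTimingResp forwards into its PacketQueue, which
// stores the deferred packet and schedules processSendEvent at that
// future tick. Only when that event executes does sendDeferredPacket
// finally push the packet out of the port.
....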
====== TimingSimpleCPU analysis #19
Executes `PacketQueue::processSendEvent()`.
====== TimingSimpleCPU analysis #20
Schedules `PacketQueue::processSendEvent` through:
....
EventManager::schedule
PacketQueue::schedSendEvent
PacketQueue::schedSendTiming
QueuedSlavePort::schedTimingResp
CoherentXBar::recvTimingResp
CoherentXBar::CoherentXBarMasterPort::recvTimingResp
TimingResponseProtocol::sendResp
SlavePort::sendTimingResp
RespPacketQueue::sendTiming
PacketQueue::sendDeferredPacket
PacketQueue::processSendEvent
....
From this backtrace, we see that this event is happening as the fetch reply packet finally comes back from DRAM.
====== TimingSimpleCPU analysis #21
Schedules `BaseXBar::Layer<SrcType, DstType>::releaseLayer` through:
....
EventManager::schedule
BaseXBar::Layer<MasterPort, SlavePort>::occupyLayer
BaseXBar::Layer<MasterPort, SlavePort>::succeededTiming
CoherentXBar::recvTimingResp
CoherentXBar::CoherentXBarMasterPort::recvTimingResp
TimingResponseProtocol::sendResp
SlavePort::sendTimingResp
RespPacketQueue::sendTiming
PacketQueue::sendDeferredPacket
PacketQueue::processSendEvent
....
====== TimingSimpleCPU analysis #22
Executes `BaseXBar::Layer<SrcType, DstType>::releaseLayer`.
====== TimingSimpleCPU analysis #23
Executes `PacketQueue::processSendEvent`.
====== TimingSimpleCPU analysis #24
Schedules `TimingSimpleCPU::IcachePort::ITickEvent::process()` through:
....
EventManager::schedule
TimingSimpleCPU::TimingCPUPort::TickEvent::schedule
TimingSimpleCPU::IcachePort::recvTimingResp
TimingResponseProtocol::sendResp
SlavePort::sendTimingResp
RespPacketQueue::sendTiming
PacketQueue::sendDeferredPacket
PacketQueue::processSendEvent
....
====== TimingSimpleCPU analysis #25
Executes `TimingSimpleCPU::IcachePort::ITickEvent::process()`.
This custom `process` then calls `TimingSimpleCPU::completeIfetch(PacketPtr pkt)`, and that finally executes the very first instruction:
....
77000: Event: Event_40: Timing CPU icache tick 40 executed @ 77000
77000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue : movz x0, #1, #0 : IntAlu : D=0x0000000000000001 flags=(IsInteger)
....
The end of this instruction must set things up in a way that continues the PC walk loop, and by looking at the source and traces, this clearly happens in `TimingSimpleCPU::advanceInst`, which calls `TimingSimpleCPU::fetch`.
And `TimingSimpleCPU::fetch` is the very thing we did in this simulation at <<timingsimplecpu-analysis-0>>!!! OMG, that's the loop.
====== TimingSimpleCPU analysis #26
Schedules `DRAMCtrl::processNextReqEvent` through:
....
EventManager::schedule
DRAMCtrl::addToReadQueue
DRAMCtrl::recvTimingReq
DRAMCtrl::MemoryPort::recvTimingReq
TimingRequestProtocol::sendReq
MasterPort::sendTimingReq
CoherentXBar::recvTimingReq
CoherentXBar::CoherentXBarSlavePort::recvTimingReq
TimingRequestProtocol::sendReq
MasterPort::sendTimingReq
TimingSimpleCPU::sendFetch
TimingSimpleCPU::FetchTranslation::finish
ArmISA::TLB::translateComplete
ArmISA::TLB::translateTiming
ArmISA::TLB::translateTiming
TimingSimpleCPU::fetch
TimingSimpleCPU::advanceInst
TimingSimpleCPU::completeIfetch
TimingSimpleCPU::IcachePort::ITickEvent::process
....
====== TimingSimpleCPU analysis #27
Schedules `BaseXBar::Layer<SrcType, DstType>::releaseLayer` through:
....
EventManager::schedule
BaseXBar::Layer<SlavePort, MasterPort>::occupyLayer
BaseXBar::Layer<SlavePort, MasterPort>::succeededTiming
CoherentXBar::recvTimingReq
CoherentXBar::CoherentXBarSlavePort::recvTimingReq
TimingRequestProtocol::sendReq
MasterPort::sendTimingReq
TimingSimpleCPU::sendFetch
TimingSimpleCPU::FetchTranslation::finish
ArmISA::TLB::translateComplete
ArmISA::TLB::translateTiming
ArmISA::TLB::translateTiming
TimingSimpleCPU::fetch
TimingSimpleCPU::advanceInst
TimingSimpleCPU::completeIfetch
TimingSimpleCPU::IcachePort::ITickEvent::process
....
====== TimingSimpleCPU analysis #28
Executes `DRAMCtrl::processNextReqEvent`.
====== TimingSimpleCPU analysis #29
Schedules `DRAMCtrl::processRespondEvent()`.
====== TimingSimpleCPU analysis: LDR stall
One important thing we want to check now is how memory reads make the processor stall in the middle of an instruction.
Since we are using a simple CPU without a pipeline, a data memory access stalls everything: there is no further progress until the memory comes back.
For that, we can GDB to the `TimingSimpleCPU::completeIfetch` of the first LDR done in our test program.
So, where is the `execute` happening? Well, I'll satisfy myself with a quick source grep for now: the load completes at `TimingSimpleCPU::completeDataAccess`.
It also calls `+curStaticInst->completeAcc+`, which pairs up with the `initiateAcc` call.
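This pairing is what implements the stall: `initiateAcc` only fires off the memory request, and `completeAcc` only runs once the response packet comes back, writing the result register. Schematically, for a hypothetical load instruction (`MyLoad`, `EA`, `Mem` and `Dest` are made-up illustration names, not gem5's actual generated code):
....
Fault
MyLoad::initiateAcc(ExecContext *xc, Trace::InstRecord *traceData) const
{
    // Send the read request out of the dcache port; no data yet.
    return initiateMemRead(xc, traceData, EA, Mem, memAccessFlags);
}

Fault
MyLoad::completeAcc(PacketPtr pkt, ExecContext *xc,
                    Trace::InstRecord *traceData) const
{
    // Runs only when the response packet arrives: write the loaded
    // value to the destination register so that the CPU can advance.
    Dest = pkt->getLE<uint32_t>();
    return NoFault;
}
....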
The following is the region of interest of the event log:
....
175000: Event: Event_40: Timing CPU icache tick 40 executed @ 175000
175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 175000
175000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 scheduled @ 176000
175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 175000
175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 scheduled @ 193750
175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 scheduled @ 175000
175000: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 9 executed @ 175000
176000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 60 executed @ 176000
193750: Event: system.mem_ctrls.wrapped_function_event: EventFunctionWrapped 10 executed @ 193750
193750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 scheduled @ 221750
221750: Event: system.mem_ctrls.port-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 8 executed @ 221750
221750: Event: system.membus.slave[2]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 66 scheduled @ 224000
221750: Event: system.membus.respLayer2.wrapped_function_event: EventFunctionWrapped 67 scheduled @ 224000
224000: Event: system.membus.respLayer2.wrapped_function_event: EventFunctionWrapped 67 executed @ 224000
224000: Event: system.membus.slave[2]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 66 executed @ 224000
224000: Event: Event_42: Timing CPU dcache tick 42 scheduled @ 224000
224000: Event: Event_42: Timing CPU dcache tick 42 executed @ 224000
175000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8 : ldr w2, #4194464 : MemRead : D=0x0000000000000006 A=0x4000a0 flags=(IsInteger|IsMemRef|IsLoad)
....
We first find it by looking for the `ExecEnable` of LDR.
Then, we go up to the previous `Timing CPU icache tick` event, which from the analysis of the previous instructions we know is where the instruction execution starts: the LDR instruction fetch is done by then!
Next, several events happen as the data request percolates through the memory system; it must be very similar to the instruction fetches. TODO: analyze the event function names.
Finally, at long last, we reach:
....
224000: Event: Event_42: Timing CPU dcache tick 42 executed @ 224000
175000: ExecEnable: system.cpu: A0 T0 : @asm_main_after_prologue+8 : ldr w2, #4194464 : MemRead : D=0x0000000000000006 A=0x4000a0 flags=(IsInteger|IsMemRef|IsLoad)
....
from which we guess:
* `224000`: this is the time at which the data request finally returned and at which `execute` gets called. The instruction therefore spent `224000 - 175000 = 49000` ticks stalled waiting for the data.
* `175000`: the log line only gets printed at the end of execution, but the time shown is not the time at which things finished, but rather the time at which the ifetch finished, which happened in the past
===== gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches
It is worth noting that the caches do not affect the `ArmITB` and `ArmDTB` <<arm-paging,address translation>>.
.`config.dot.svg` for a TimingSimpleCPU with caches.
image::{cirosantilli-media-base}gem5_config_TimingSimpleCPU_caches_12c917de54145d2d50260035ba7fa614e25317a3.svg?sanitize=true[height=600]
===== gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs
TODO is this the minimal setup that allows us to see the <<gem5-crossbar-interconnect>>? Can we see anything in `AtomicSimpleCPU`?
It would be amazing to analyze a simple example with interconnect packets possibly invalidating caches of other CPUs.
To observe it we could create one well controlled workload with instructions that flush memory, and run it on two CPUs.
If we don't use such memory-flushing instructions, we would only see the interconnect at work once the caches start running out of space.
[[config-dot-svg-timingsimplecpu-caches-2-cpus]]
.`config.dot.svg` for a system with two TimingSimpleCPUs with caches.
image::{cirosantilli-media-base}gem5_config_TimingSimpleCPU_caches_2_CPUs_12c917de54145d2d50260035ba7fa614e25317a3.svg?sanitize=true[height=600]
===== gem5 event queue MinorCPU syscall emulation freestanding example analysis
The events <<gem5-event-queue-atomicsimplecpu-syscall-emulation-freestanding-example-analysis,for the Atomic CPU>> were pretty simple: basically just ticks.