index.html (https://github.com/cirosantilli/linux-kernel-module-cheat.git)
@@ -1270,7 +1270,11 @@ pre{ white-space:pre }
|
||||
</li>
|
||||
<li><a href="#gem5-packet-vs-request">19.21.2. gem5 <code>Packet</code> vs <code>Request</code></a>
|
||||
<ul class="sectlevel4">
|
||||
<li><a href="#gem5-packet">19.21.2.1. gem5 <code>Packet</code></a></li>
|
||||
<li><a href="#gem5-packet">19.21.2.1. gem5 <code>Packet</code></a>
|
||||
<ul class="sectlevel5">
|
||||
<li><a href="#gem5-memcmd">19.21.2.1.1. gem5 <code>MemCmd</code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="#gem5-request">19.21.2.2. gem5 <code>Request</code></a>
|
||||
<ul class="sectlevel5">
|
||||
<li><a href="#gem5-request-in-atomicsimplecpu">19.21.2.2.1. gem5 <code>Request</code> in <code>AtomicSimpleCPU</code></a></li>
|
||||
@@ -2383,7 +2387,7 @@ pre{ white-space:pre }
|
||||
<li>
|
||||
<p>every problem that you face is caused by imperfections introduced by other humans.</p>
|
||||
<div class="paragraph">
|
||||
<p>It is much easier to accept limitations of physics, and even natural selection in biology, which is are produced by a sentient being (?).</p>
|
||||
<p>It is much easier to accept limitations of physics, and even natural selection in biology, which are not produced by a sentient being (?).</p>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
@@ -2391,7 +2395,7 @@ pre{ white-space:pre }
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>Physics-based engineering, just like low level hardware, is of course completely closed source however, since wrestling against the laws of physics is about the most expensive thing humans can do.</p>
|
||||
<p>Physics-based engineering, just like low level hardware, is of course completely closed source however, since wrestling against the laws of physics is about the most expensive thing humans can do, so there’s also a downside to it.</p>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
@@ -21057,6 +21061,12 @@ m5 execfile</pre>
|
||||
<div class="ulist">
|
||||
<ul>
|
||||
<li>
|
||||
<p><a href="https://stackoverflow.com/questions/63488050/what-are-pseudo-instructions-for-in-gem5/63489417#63489417" class="bare">https://stackoverflow.com/questions/63488050/what-are-pseudo-instructions-for-in-gem5/63489417#63489417</a></p>
|
||||
</li>
|
||||
<li>
|
||||
<p><a href="https://stackoverflow.com/questions/62757008/how-to-use-m5-in-gem5-20/62759204#62759204" class="bare">https://stackoverflow.com/questions/62757008/how-to-use-m5-in-gem5-20/62759204#62759204</a></p>
|
||||
</li>
|
||||
<li>
|
||||
<p><a href="https://stackoverflow.com/questions/56506154/how-to-analyze-only-interest-area-in-source-code-by-using-gem5/56506419#56506419" class="bare">https://stackoverflow.com/questions/56506154/how-to-analyze-only-interest-area-in-source-code-by-using-gem5/56506419#56506419</a></p>
|
||||
</li>
|
||||
<li>
|
||||
@@ -23406,7 +23416,18 @@ Addr addr;</pre>
|
||||
<p><code>Flags flags;</code>: flags describing properties of the <code>Packet</code></p>
|
||||
</li>
|
||||
<li>
|
||||
<p><code>MemCmd cmd;</code>: this is basically an enumeration of possible commands, stuff like:</p>
|
||||
<p><code>MemCmd cmd;</code>: see <a href="#gem5-memcmd">gem5 <code>MemCmd</code></a>; a reduced mental model of the whole struct follows this list</p>
</li>
|
||||
</ul>
|
||||
</div>
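<div class="paragraph">
<p>As a reduced mental model, the fields discussed above are enough to reason about most of the traffic in the following sections. Here is a tiny self-contained C++ sketch (written for this section, it is not the real class, which also carries the data buffer, the size, the originating <code>Request</code> and more):</p>
</div>
<div class="literalblock">
<div class="content">
<pre>// Toy mental model of a gem5 Packet, reduced to the fields discussed
// above. Not gem5 source.
#include <cstdint>

using Addr = std::uint64_t;

enum class MemCmd { ReadReq, ReadSharedReq, ReadResp, WriteReq, UpgradeReq };

struct ToyPacket {
    Addr addr;            // address the command operates on
    std::uint64_t flags;  // properties of the packet
    MemCmd cmd;           // what is being asked for, see gem5 MemCmd below
};

int main() {
    // The CPU-side read of the shared counter analyzed later on.
    ToyPacket pkt{0x2068, 0, MemCmd::ReadReq};
    return pkt.cmd == MemCmd::ReadReq ? 0 : 1;
}</pre>
</div>
</div>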
<div class="sect5">
|
||||
<h6 id="gem5-memcmd"><a class="anchor" href="#gem5-memcmd"></a><a class="link" href="#gem5-memcmd">19.21.2.1.1. gem5 <code>MemCmd</code></a></h6>
|
||||
<div class="paragraph">
|
||||
<p>Each <a href="#gem5-packet">gem5 <code>Packet</code></a> contains a <code>MemCmd</code></p>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>The <code>MemCmd</code> is basically an enumeration of possible commands, stuff like:</p>
|
||||
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>enum Command
|
||||
@@ -23461,10 +23482,30 @@ MemCmd::commandInfo[] =
|
||||
};</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>Some important commands include:</p>
|
||||
</div>
|
||||
<div class="ulist">
|
||||
<ul>
|
||||
<li>
|
||||
<p><code>ReadReq</code>: what the CPU sends out to its cache, see also: <a href="#gem5-event-queue-atomicsimplecpu-syscall-emulation-freestanding-example-analysis-with-caches-and-multiple-cpus">gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs</a></p>
|
||||
</li>
|
||||
<li>
|
||||
<p><code>ReadSharedReq</code>: what the dcache of the CPU sends forward to the <a href="#gem5-crossbar-interconnect">gem5 crossbar interconnect</a> after a <code>ReadReq</code>, see also: <a href="#gem5-event-queue-atomicsimplecpu-syscall-emulation-freestanding-example-analysis-with-caches-and-multiple-cpus">gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs</a></p>
</li>
|
||||
<li>
|
||||
<p><code>ReadResp</code>: response to a <code>ReadReq</code>. Can come from either DRAM or another cache that has the data. On <a href="#gem5-event-queue-atomicsimplecpu-syscall-emulation-freestanding-example-analysis-with-caches-and-multiple-cpus">gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs</a> we see that a new packet is created.</p>
|
||||
</li>
|
||||
<li>
|
||||
<p><code>WriteReq</code>: what the CPU sends out to its cache, see also: <a href="#gem5-event-queue-atomicsimplecpu-syscall-emulation-freestanding-example-analysis-with-caches-and-multiple-cpus">gem5 event queue AtomicSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs</a></p>
|
||||
</li>
|
||||
<li>
|
||||
<p><code>UpgradeReq</code>: what the dcache of the CPU sends forward after a <code>WriteReq</code>, as in the sketch after this list</p>
</li>
|
||||
</ul>
|
||||
</div>
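<div class="paragraph">
<p>As a mental model of how the commands above relate, here is a small self-contained C++ sketch written for this section (not gem5 source; in gem5 the conversion happens in the cache, e.g. in <code>createMissPacket</code>): on a miss, a CPU-side read goes out on the bus as a <code>ReadSharedReq</code>, and a write to a line the cache already holds goes out as an <code>UpgradeReq</code>, which is exactly what the logs analyzed further down show.</p>
</div>
<div class="literalblock">
<div class="content">
<pre>// Toy sketch, not gem5 source: the CPU-side to bus-side command mapping
// observed in the logs analyzed later in this section.
#include <cassert>

enum class MemCmd { ReadReq, WriteReq, ReadSharedReq, UpgradeReq };

// A read only needs some readable copy of the line, so it goes out as
// ReadSharedReq. A write needs ownership, so the other copies have to be
// invalidated first: it goes out as UpgradeReq.
MemCmd busCommandFor(MemCmd cpuCmd) {
    return cpuCmd == MemCmd::ReadReq ? MemCmd::ReadSharedReq
                                     : MemCmd::UpgradeReq;
}

int main() {
    assert(busCommandFor(MemCmd::ReadReq)  == MemCmd::ReadSharedReq);
    assert(busCommandFor(MemCmd::WriteReq) == MemCmd::UpgradeReq);
}</pre>
</div>
</div>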
</div>
|
||||
</div>
|
||||
<div class="sect4">
|
||||
<h5 id="gem5-request"><a class="anchor" href="#gem5-request"></a><a class="link" href="#gem5-request">19.21.2.2. gem5 <code>Request</code></a></h5>
|
||||
<div class="paragraph">
|
||||
@@ -23658,6 +23699,37 @@ TimingSimpleCPU::initiateMemRead(Addr addr, unsigned size,
|
||||
<div class="paragraph">
|
||||
<p>and Queue is also a gem5 class under <code>src/mem/cache/queue.hh</code>.</p>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>The MSHR basically keeps track of all information the cache receives, and helps it take appropriate action. I’m not sure why it is separate from the cache at all, as it is basically performing essential cache bookkeeping.</p>
</div>
|
||||
<div class="paragraph">
|
||||
<p>A clear example of MSHR in action can be seen at: <a href="#gem5-event-queue-timingsimplecpu-syscall-emulation-freestanding-example-analysis-with-caches-and-multiple-cpus">gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches and multiple CPUs</a>. In that example what happened was:</p>
|
||||
</div>
|
||||
<div class="ulist">
|
||||
<ul>
|
||||
<li>
|
||||
<p>CPU1 writes to an address and it completes</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>CPU2 sends read</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>CPU1 writes to the address again</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>CPU2 snoops the write, and notes it down in its MSHR</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>CPU2 receives a snoop reply for its read, also from CPU1, which has the data, and the line becomes valid</p>
</li>
|
||||
<li>
|
||||
<p>CPU2 gets its data. But the MSHR remembers that it had also received a write snoop, so it also immediately invalidates that line</p>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>From this we understand that the MSHR is the part of the cache that keeps track of pending snoops and ensures that lines get invalidated when required.</p>
</div>
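<div class="paragraph">
<p>To make that bookkeeping concrete, here is a minimal self-contained C++ model of the behaviour just described (a sketch written for this section, not the gem5 MSHR code): the miss is recorded as outstanding, a snooped invalidation seen while it is in flight is only noted down, and the deferred invalidation is applied as soon as the response delivers the data, which is what turns a plain <code>ReadResp</code> into a <code>ReadRespWithInvalidate</code>:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>// Toy model of the MSHR behaviour described above, not gem5 source.
#include <cassert>

struct ToyMshr {
    bool missOutstanding = false;
    bool pendingInvalidate = false;
};

struct ToyCacheLine {
    bool valid = false;
    long data = 0;
};

struct ToyCache {
    ToyMshr mshr;
    ToyCacheLine line;

    void startReadMiss() { mshr.missOutstanding = true; }

    // A snooped write (e.g. an UpgradeReq from another CPU) arrives while
    // our own read is still in flight: just note it down.
    void snoopInvalidate() {
        if (mshr.missOutstanding)
            mshr.pendingInvalidate = true;
        else
            line.valid = false;
    }

    // The read response arrives: fill the line, hand the data to the CPU,
    // then honour the deferred snoop by invalidating immediately.
    long receiveReadResponse(long data) {
        line = {true, data};
        long cpuData = line.data;
        if (mshr.pendingInvalidate)
            line.valid = false;
        mshr = {};
        return cpuData;
    }
};

int main() {
    ToyCache cpu2_dcache;
    cpu2_dcache.startReadMiss();      // CPU2 LDR misses
    cpu2_dcache.snoopInvalidate();    // CPU1 STR is snooped meanwhile
    long d = cpu2_dcache.receiveReadResponse(0x2f);
    assert(d == 0x2f);                // CPU2 still sees the old value
    assert(!cpu2_dcache.line.valid);  // but the line is gone right away
}</pre>
</div>
</div>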
</div>
|
||||
<div class="sect3">
|
||||
<h4 id="gem5-commmonitor"><a class="anchor" href="#gem5-commmonitor"></a><a class="link" href="#gem5-commmonitor">19.21.4. gem5 <code>CommMonitor</code></a></h4>
|
||||
@@ -26616,15 +26688,17 @@ global 147</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>Grepping the logs for my_thread_main+24 shows where the first non-atomic interleaves happen at:</p>
|
||||
<p>Grepping the logs with <code>grep '@my_thread_main\+24'</code> shows where the first non-atomic interleaves happen:</p>
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471039000: ExecEnable: system.cpu1: A0 T0 : @my_thread_main+24 : ldr x1, [x2] : MemRead : D=0x000000000000002f A=0x412068 flags=(IsInteger|IsMemRef|IsLoad)
|
||||
471034000: ExecEnable: system.cpu2: A0 T0 : @my_thread_main+24 : ldr x1, [x2] : MemRead : D=0x000000000000002f A=0x412068 flags=(IsInteger|IsMemRef|IsLoad)
|
||||
471059000: ExecEnable: system.cpu1: A0 T0 : @my_thread_main+44 : b.hi <my_thread_main+24> : IntAlu : flags=(IsControl|IsDirectControl|IsCondControl)
|
||||
471070000: ExecEnable: system.cpu2: A0 T0 : @my_thread_main+44 : b.hi <my_thread_main+24> : IntAlu : flags=(IsControl|IsDirectControl|IsCondControl)
|
||||
471071000: ExecEnable: system.cpu2: A0 T0 : @my_thread_main+24 : ldr x1, [x2] : MemRead : D=0x0000000000000030 A=0x412068 flags=(IsInteger|IsMemRef|IsLoad)</pre>
|
||||
<pre>[many other CPU1 hits]
|
||||
471199000: ExecEnable: system.cpu1: A0 T0 : @my_thread_main+24 : ldr x1, [x2] : MemRead : D=0x000000000000002e A=0x412068 flags=(IsInteger|IsMemRef|IsLoad)
|
||||
471207000: ExecEnable: system.cpu1: A0 T0 : @my_thread_main+24 : ldr x1, [x2] : MemRead : D=0x000000000000002f A=0x412068 flags=(IsInteger|IsMemRef|IsLoad)
|
||||
471202000: ExecEnable: system.cpu2: A0 T0 : @my_thread_main+24 : ldr x1, [x2] : MemRead : D=0x000000000000002f A=0x412068 flags=(IsInteger|IsMemRef|IsLoad)
|
||||
471239000: ExecEnable: system.cpu2: A0 T0 : @my_thread_main+24 : ldr x1, [x2] : MemRead : D=0x0000000000000030 A=0x412068 flags=(IsInteger|IsMemRef|IsLoad)
|
||||
471228000: ExecEnable: system.cpu1: A0 T0 : @my_thread_main+24 : ldr x1, [x2] : MemRead : D=0x0000000000000030 A=0x412068 flags=(IsInteger|IsMemRef|IsLoad)
|
||||
471269000: ExecEnable: system.cpu1: A0 T0 : @my_thread_main+24 : ldr x1, [x2] : MemRead : D=0x0000000000000031 A=0x412068 flags=(IsInteger|IsMemRef|IsLoad)</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
@@ -26639,10 +26713,10 @@ global 147</pre>
|
||||
<p>the shared address of interest is 0x412068</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>the physical address is 2068</p>
|
||||
<p>the physical address is 0x2068</p>
|
||||
</li>
|
||||
<li>
|
||||
<p>it fits into the cache line for 2040:207f</p>
|
||||
<p>it fits into the cache line for 0x2040:0x207f, as checked in the sketch after this list</p>
</li>
|
||||
</ul>
|
||||
</div>
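<div class="paragraph">
<p>These line boundaries are just power-of-two arithmetic; a quick self-contained check, assuming the 64 byte line size implied by the <code>2040:207f</code> ranges in the logs:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>// Quick check of the address arithmetic above: with 64-byte cache lines,
// physical address 0x2068 falls in the line 0x2040-0x207f.
#include <cassert>
#include <cstdint>

int main() {
    const std::uint64_t lineSize = 0x40;            // 64-byte lines
    const std::uint64_t paddr = 0x2068;
    std::uint64_t lineBase = paddr & ~(lineSize - 1);
    std::uint64_t lineEnd  = lineBase + lineSize - 1;
    assert(lineBase == 0x2040 && lineEnd == 0x207f);
    assert(paddr - lineBase == 0x28);               // offset within the line
}</pre>
</div>
</div>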
@@ -26651,126 +26725,228 @@ global 147</pre>
|
||||
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471033000: SimpleCPU: system.cpu2: Fetch
|
||||
471033000: SimpleCPU: system.cpu2: Translating address 0x400a88</pre>
|
||||
<pre>471201000: SimpleCPU: system.cpu2: Fetch
|
||||
471201000: SimpleCPU: system.cpu2: Translating address 0x400a88</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>Things get a bit interleaved with CPU1, but soon afterwards we see the miss forwarding via <a href="#gem5-mshr">gem5 <code>MSHR</code></a> as in <a href="#gem5-event-queue-timingsimplecpu-syscall-emulation-freestanding-example-analysis-with-caches">gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches</a>:</p>
|
||||
<p>Things get a bit interleaved with CPU1, but soon afterwards we see CPU2 make its memory request to the cache:</p>
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471034000: Cache: system.cpu2.dcache: access for ReadReq [2068:206f] D=b0d989c328560000 ptr=0x5628c3d26f00 miss
|
||||
471034000: CachePort: system.cpu2.dcache.mem_side: Scheduling send event at 471035000
|
||||
471034000: Event: system.cpu2.dcache.mem_side-MemSidePort.wrapped_function_event: EventFunctionWrapped 140 scheduled @ 471035000</pre>
|
||||
<pre>471202000: Event: Event_134: Timing CPU icache tick 134 executed @ 471202000
|
||||
471202000: SimpleCPU: system.cpu2: Complete ICache Fetch for addr 0xa88
|
||||
471202000: Cache: system.cpu2.dcache: access for ReadReq [2068:206f] D=c879334bb1550000 num=266073 miss
|
||||
471202000: CachePort: system.cpu2.dcache.mem_side: Scheduling send event at 471203000
|
||||
471202000: Event: system.cpu2.dcache.mem_side-MemSidePort.wrapped_function_event: EventFunctionWrapped 140 scheduled @ 471203000</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>Before the request moves on, some CPU1 action happens: a CPU1 STR finished! It hit the cache, and now we know the cache state: M:</p>
|
||||
<p>Before the request moves on, some CPU1 action happens: CPU1 is sending its data out! It hit the cache, and now we confirm that the cache line is in <a href="#moesi">state M</a> as expected, since CPU1 had already been previously writing repeatedly to that address:</p>
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471034000: Cache: system.cpu1.dcache: access for WriteReq [2068:206f] D=2f00000000000000 ptr=0x5628c3d26c80 hit state: f (M) valid: 1 writable: 1 readable: 1 dirty: 1 | tag: 0 set: 0x81 way: 0
|
||||
471034000: ExecEnable: system.cpu1: A0 T0 : @my_thread_main+32 : str x1, [x2] : MemWrite : D=0x000000000000002f A=0x412068 flags=(IsInteger|IsMemRef|IsStore)</pre>
|
||||
<pre>471202000: Event: Event_87: Timing CPU icache tick 87 executed @ 471202000
|
||||
471202000: SimpleCPU: system.cpu1: Complete ICache Fetch for addr 0xa90
|
||||
471202000: Cache: system.cpu1.dcache: access for WriteReq [2068:206f] D=2f00000000000000 num=266074 hit state: f (M) valid: 1 writable: 1 readable: 1 dirty: 1 | tag: 0 set: 0x81 way: 0
|
||||
471202000: CacheVerbose: system.cpu1.dcache: satisfyRequest for WriteReq [2068:206f] D=2f00000000000000 num=266074 (write)
|
||||
471202000: Event: system.cpu1.dcache.cpu_side-CpuSidePort.wrapped_function_event: EventFunctionWrapped 91 scheduled @ 471203000</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>After this is done, CPU2 dcache finally decides that it is time to forward its request, and <em>now</em> we see the crux of this experiment happen.</p>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>First <code>createMissPacket</code> creates a new packet for the cache request, and then it sends that packet into <code>CoherentXBar</code>.</p>
|
||||
<p>Immediately afterwards, CPU1 gets its reply from the cache, which is fast as that was a hit, and its STR finishes:</p>
|
||||
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471035000: Event: system.cpu2.dcache.mem_side-MemSidePort.wrapped_function_event: EventFunctionWrapped 140 executed @ 471035000
|
||||
471035000: Cache: system.cpu2.dcache: sendMSHRQueuePacket: MSHR ReadReq [2068:206f] D=b0d989c328560000 ptr=0x5628c3d26f00
|
||||
471035000: Cache: system.cpu2.dcache: createMissPacket: created ReadSharedReq [2040:207f] D=c0ae37c4285600005b323036383a323036665d20443d62306439383963333238353630303030207074723d307835363238633364323666303000000000000000 ptr=0x5628c3d26e80 from ReadReq [2068:206f] D=b0d989c328560000 ptr=0x5628c3d26f00
|
||||
471035000: CoherentXBar: system.membus: recvTimingReq: src system.membus.slave[10] packet ReadSharedReq [2040:207f] D=c0ae37c4285600005b323036383a323036665d20443d62306439383963333238353630303030207074723d307835363238633364323666303000000000000000 ptr=0x5628c3d26e80</pre>
|
||||
<pre>471203000: Event: system.cpu1.dcache.cpu_side-CpuSidePort.wrapped_function_event: EventFunctionWrapped 91 executed @ 471203000
|
||||
471203000: SimpleCPU: system.cpu1.dcache_port: Received load/store response 0x2068
|
||||
471203000: Event: Event_89: Timing CPU dcache tick 89 scheduled @ 471203000
|
||||
471203000: Event: Event_89: Timing CPU dcache tick 89 executed @ 471203000
|
||||
471202000: ExecEnable: system.cpu1: A0 T0 : @my_thread_main+32 : str x1, [x2] : MemWrite : D=0x000000000000002f A=0x412068 flags=(IsInteger|IsMemRef|IsStore)</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>Now, the <code>SnoopFilte</code> which lives inside the crossbar decides if any other CPUs care aout hat address:</p>
|
||||
<p>Now we approach the crux of this example: the CPU2 dcache decides to forward its read request:</p>
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471035000: SnoopFilter: system.membus.snoop_filter: lookupRequest: src system.membus.slave[10] packet ReadSharedReq [2040:207f] D=c0ae37c4285600005b323036383a323036665d20443d62306439383963333238353630303030207074723d307835363238633364323666303000000000000000 ptr=0x5628c3d26e80
|
||||
471035000: SnoopFilter: system.membus.snoop_filter: lookupRequest: SF value 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000
|
||||
471035000: SnoopFilter: system.membus.snoop_filter: lookupRequest: new SF value 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000
|
||||
471035000: CoherentXBar: system.membus: recvTimingReq: src system.membus.slave[10] packet ReadSharedReq [2040:207f] D=c0ae37c4285600005b323036383a323036665d20443d62306439383963333238353630303030207074723d307835363238633364323666303000000000000000 ptr=0x5628c3d26e80 SF size: 1 lat: 1</pre>
|
||||
<pre>471203000: Event: system.cpu2.dcache.mem_side-MemSidePort.wrapped_function_event: EventFunctionWrapped 140 executed @ 471203000
|
||||
471203000: Cache: system.cpu2.dcache: sendMSHRQueuePacket: MSHR ReadReq [2068:206f] D=c879334bb1550000 num=266073
|
||||
471203000: Cache: system.cpu2.dcache: createMissPacket: created ReadSharedReq [2040:207f] D=40cfe14bb15500005b323036383a323036665d20443d63383739333334626231353530303030206e756d3d32363630373300000000016d6973730a0000000000 num=266076 from ReadReq [2068:206f] D=c879334bb1550000 num=266073</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>and the answer is yes: CPU1 does care about that address obviously! So the packet is forwarded as is to CPU1:</p>
|
||||
<p>Here, the CPU2 dcache finally forwards its request to the XBar via the <a href="#gem5-mshr">gem5 <code>MSHR</code></a> mechanism, as in <a href="#gem5-event-queue-timingsimplecpu-syscall-emulation-freestanding-example-analysis-with-caches">gem5 event queue TimingSimpleCPU syscall emulation freestanding example analysis with caches</a>.</p>
</div>
|
||||
<div class="paragraph">
|
||||
<p><code>createMissPacket</code> creates a new packet for the cache request with a different type: <code>ReadSharedReq</code> instead of the original <code>ReadReq</code>, and then it sends that packet into <a href="#gem5-crossbar-interconnect"><code>CoherentXBar</code></a>:</p>
|
||||
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471035000: CoherentXBar: system.membus: forwardTiming for ReadSharedReq [2040:207f] D=c0ae37c4285600005b323036383a323036665d20443d62306439383963333238353630303030207074723d307835363238633364323666303000000000000000 ptr=0x5628c3d26e80
|
||||
471035000: CacheVerbose: system.cpu1.dcache: recvTimingSnoopReq: for ReadSharedReq [2040:207f] D=c0ae37c4285600005b323036383a323036665d20443d62306439383963333238353630303030207074723d307835363238633364323666303000000000000000 ptr=0x5628c3d26e80
|
||||
471035000: CacheVerbose: system.cpu1.dcache: handleSnoop: for ReadSharedReq [2040:207f] D=c0ae37c4285600005b323036383a323036665d20443d62306439383963333238353630303030207074723d307835363238633364323666303000000000000000 ptr=0x5628c3d26e80
|
||||
471035000: Cache: system.cpu1.dcache: handleSnoop: snoop hit for ReadSharedReq [2040:207f] D=c0ae37c4285600005b323036383a323036665d20443d62306439383963333238353630303030207074723d307835363238633364323666303000000000000000 ptr=0x5628c3d26e80, old state is state: f (M) valid: 1 writable: 1 readable: 1 dirty: 1 | tag: 0 set: 0x81 way: 0
|
||||
471035000: Cache: system.cpu1.dcache: new state is state: d (O) valid: 1 writable: 0 readable: 1 dirty: 1 | tag: 0 set: 0x81 way: 0
|
||||
471035000: Cache: system.cpu1.dcache: doTimingSupplyResponse: for ReadSharedReq [2040:207f] D=c0ae37c4285600005b323036383a323036665d20443d62306439383963333238353630303030207074723d307835363238633364323666303000000000000000 ptr=0x5628c3d26e80
|
||||
471035000: CacheVerbose: system.cpu1.dcache: doTimingSupplyResponse: created response: ReadResp [2040:207f] D=700640000000000070064000000000000000000000000000000000000000000000000000000000002f0000000000000000000000000000000000000000000000 ptr=0x5628c3d27000 tick: 471044000
|
||||
471035000: Event: system.cpu1.dcache.mem_side-MemSidePort.wrapped_function_event: EventFunctionWrapped 94 scheduled @ 471044000
|
||||
471035000: CoherentXBar: system.membus: recvTimingReq: Not forwarding ReadSharedReq [2040:207f] D=c0ae37c4285600005b323036383a323036665d20443d62306439383963333238353630303030207074723d307835363238633364323666303000000000000000 ptr=0x5628c3d26e80
|
||||
471035000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 164 scheduled @ 471036000
|
||||
471035000: BaseXBar: system.membus.reqLayer0: The crossbar layer is now busy from tick 471035000 to 471036000</pre>
|
||||
<pre>471203000: CoherentXBar: system.membus: recvTimingReq: src system.membus.slave[10] packet ReadSharedReq [2040:207f] D=40cfe14bb15500005b323036383a323036665d20443d63383739333334626231353530303030206e756d3d32363630373300000000016d6973730a0000000000 num=266076
|
||||
471203000: SnoopFilter: system.membus.snoop_filter: lookupRequest: src system.membus.slave[10] packet ReadSharedReq [2040:207f] D=40cfe14bb15500005b323036383a323036665d20443d63383739333334626231353530303030206e756d3d32363630373300000000016d6973730a0000000000 num=266076
|
||||
471203000: SnoopFilter: system.membus.snoop_filter: lookupRequest: SF value 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000
|
||||
471203000: SnoopFilter: system.membus.snoop_filter: lookupRequest: new SF value 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000
|
||||
471203000: CoherentXBar: system.membus: recvTimingReq: src system.membus.slave[10] packet ReadSharedReq [2040:207f] D=40cfe14bb15500005b323036383a323036665d20443d63383739333334626231353530303030206e756d3d32363630373300000000016d6973730a0000000000 num=266076 SF size: 1 lat: 1
|
||||
471203000: CoherentXBar: system.membus: forwardTiming for ReadSharedReq [2040:207f] D=40cfe14bb15500005b323036383a323036665d20443d63383739333334626231353530303030206e756d3d32363630373300000000016d6973730a0000000000 num=266076</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>and from this we see that this read request from CPU2 made a cache from CPU1 go from M to O!</p>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>Then, the CPU1 dcache actually goes ahead, and creates a response or CPU2, since it has the data. This response is sent back to the crossbar which will forward it back to CPU1.</p>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>This also makes the crossbar not forward the original request to DRAM as mentioned at <code>Not forwarding</code>.</p>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>This reply from CPU1 reaches the crossbar at:</p>
|
||||
<p>The XBar receives the request, and notices that CPU1 cares about it, because obviously it has that line from previous writes, so the XBar forwards the exact same request to the CPU1 dcache:</p>
|
||||
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471044000: Event: system.cpu1.dcache.mem_side-MemSidePort.wrapped_function_event: EventFunctionWrapped 94 executed @ 471044000
|
||||
471044000: CoherentXBar: system.membus: recvTimingSnoopResp: src system.membus.slave[6] packet ReadResp [2040:207f] D=700640000000000070064000000000000000000000000000000000000000000000000000000000002f0000000000000000000000000000000000000000000000 ptr=0x5628c3d27000
|
||||
471044000: SnoopFilter: system.membus.snoop_filter: updateSnoopResponse: rsp system.membus.slave[6] req system.membus.slave[10] packet ReadResp [2040:207f] D=700640000000000070064000000000000000000000000000000000000000000000000000000000002f0000000000000000000000000000000000000000000000 ptr=0x5628c3d27000
|
||||
471044000: SnoopFilter: system.membus.snoop_filter: updateSnoopResponse: old SF value 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000
|
||||
471044000: SnoopFilter: system.membus.snoop_filter: updateSnoopResponse: new SF value 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000101000
|
||||
471044000: CoherentXBar: system.membus: recvTimingSnoopResp: src system.membus.slave[6] packet ReadResp [2040:207f] D=700640000000000070064000000000000000000000000000000000000000000000000000000000002f0000000000000000000000000000000000000000000000 ptr=0x5628c3d27000 FWD RESP
|
||||
471044000: Event: system.membus.slave[10]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 186 scheduled @ 471046000
|
||||
471044000: Event: system.membus.respLayer10.wrapped_function_event: EventFunctionWrapped 187 scheduled @ 471049000
|
||||
471044000: BaseXBar: system.membus.respLayer10: The crossbar layer is now busy from tick 471044000 to 471049000</pre>
|
||||
<pre>471203000: CacheVerbose: system.cpu1.dcache: recvTimingSnoopReq: for ReadSharedReq [2040:207f] D=40cfe14bb15500005b323036383a323036665d20443d63383739333334626231353530303030206e756d3d32363630373300000000016d6973730a0000000000 num=266076
|
||||
471203000: CacheVerbose: system.cpu1.dcache: handleSnoop: for ReadSharedReq [2040:207f] D=40cfe14bb15500005b323036383a323036665d20443d63383739333334626231353530303030206e756d3d32363630373300000000016d6973730a0000000000 num=266076
|
||||
471203000: Cache: system.cpu1.dcache: handleSnoop: snoop hit for ReadSharedReq [2040:207f] D=40cfe14bb15500005b323036383a323036665d20443d63383739333334626231353530303030206e756d3d32363630373300000000016d6973730a0000000000 num=266076, old state is state: f (M) valid: 1 writable: 1 readable: 1 dirty: 1 | tag: 0 set: 0x81 way: 0
|
||||
471203000: Cache: system.cpu1.dcache: new state is state: d (O) valid: 1 writable: 0 readable: 1 dirty: 1 | tag: 0 set: 0x81 way: 0
|
||||
471203000: Cache: system.cpu1.dcache: doTimingSupplyResponse: for ReadSharedReq [2040:207f] D=40cfe14bb15500005b323036383a323036665d20443d63383739333334626231353530303030206e756d3d32363630373300000000016d6973730a0000000000 num=266076
|
||||
471203000: CacheVerbose: system.cpu1.dcache: doTimingSupplyResponse: created response: ReadResp [2040:207f] D=700640000000000070064000000000000000000000000000000000000000000000000000000000002f0000000000000000000000000000000000000000000000 num=266078 tick: 471212000
|
||||
471203000: Event: system.cpu1.dcache.mem_side-MemSidePort.wrapped_function_event: EventFunctionWrapped 94 scheduled @ 471212000
|
||||
471203000: CoherentXBar: system.membus: recvTimingReq: Not forwarding ReadSharedReq [2040:207f] D=40cfe14bb15500005b323036383a323036665d20443d63383739333334626231353530303030206e756d3d32363630373300000000016d6973730a0000000000 num=266076</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>and finally, at long last, CPU2 receives the snoop reply that was created in CPU1 and sent back through the crossbar, and the LDR completes:</p>
|
||||
<p>and from this we see that this read request from CPU2 made a cache line of CPU1 go <a href="#moesi">from M to O</a>! <a href="#cache-coherence">Cache coherence</a> is being maintained!</p>
</div>
|
||||
<div class="paragraph">
|
||||
<p>Furthermore, it also suggests that now CPU1 is going to supply the response to CPU2 directly from its cache, and the memory request will be suppressed! As mentioned in lecture notes from <a href="#cache-coherence">Cache coherence</a>, we know that this is one of the ways that cache coherence may be maintained in MOESI-like protocols.</p>
|
||||
</div>
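<div class="paragraph">
<p>The snoop-side transitions seen in this walkthrough can be summarized with a tiny self-contained C++ sketch (a toy model written for this section, not the gem5 coherence code, and it simplifies the E state): a snooped <code>ReadSharedReq</code> demotes a dirty copy from M to O and makes that cache supply the data itself, while a snooped <code>UpgradeReq</code> invalidates the copy:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>// Toy model, not gem5 source: the snoop-side MOESI transitions seen in
// this walkthrough.
#include <cassert>

enum class State { M, O, E, S, I };
enum class Snoop { ReadShared, Upgrade };

struct SnoopResult {
    State newState;
    bool supplyData;   // true: this cache answers instead of DRAM
};

SnoopResult handleSnoop(State s, Snoop snoop) {
    if (s == State::I)
        return {State::I, false};   // snoop miss, nothing to do
    if (snoop == Snoop::Upgrade)
        return {State::I, false};   // another CPU wants to write: drop our copy
    // Snooped read: a dirty copy (M or O) supplies the data and remains
    // responsible for it (Owned); clean copies just end up Shared.
    if (s == State::M || s == State::O)
        return {State::O, true};
    return {State::S, false};
}

int main() {
    // CPU1 had the line Modified; CPU2's ReadSharedReq demotes it to O and
    // CPU1 supplies the data, so DRAM is never involved.
    SnoopResult r = handleSnoop(State::M, Snoop::ReadShared);
    assert(r.newState == State::O && r.supplyData);
    // Later, CPU1's UpgradeReq invalidates CPU2's Shared copy.
    assert(handleSnoop(State::S, Snoop::Upgrade).newState == State::I);
}</pre>
</div>
</div>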
<div class="paragraph">
|
||||
<p>After this point, CPU1 continues to go around the loop. After a few instructions we don’t care about, we once again reach the LDR:</p>
|
||||
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471046000: Event: system.membus.slave[10]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 186 executed @ 471046000
|
||||
471046000: Cache: system.cpu2.dcache: recvTimingResp: Handling response ReadResp [2040:207f] D=700640000000000070064000000000000000000000000000000000000000000000000000000000002f0000000000000000000000000000000000000000000000 ptr=0x5628c3d27000
|
||||
471046000: Cache: system.cpu2.dcache: Block for addr 0x2040 being updated in Cache
|
||||
471046000: CacheRepl: system.cpu2.dcache: Replacement victim: state: 0 (I) valid: 0 writable: 0 readable: 0 dirty: 0 | tag: 0xffffffffffffffff set: 0x81 way: 0
|
||||
471046000: Cache: system.cpu2.dcache: Block addr 0x2040 (ns) moving from state 0 (I) to state: 5 (S) valid: 1 writable: 0 readable: 1 dirty: 0 | tag: 0 set: 0x81 way: 0
|
||||
471046000: Cache: system.cpu2.dcache: serviceMSHRTargets: updated cmd to ReadRespWithInvalidate [2068:206f] D=2f00000000000000 ptr=0x5628c3d26f00
|
||||
471046000: Event: system.cpu2.dcache.cpu_side-CpuSidePort.wrapped_function_event: EventFunctionWrapped 138 scheduled @ 471047000
|
||||
471046000: Cache: system.cpu2.dcache: processing deferred snoop...
|
||||
471046000: CacheVerbose: system.cpu2.dcache: handleSnoop: for UpgradeReq [2040:207f] D= ptr=0x5628c2d37b80
|
||||
471046000: Cache: system.cpu2.dcache: handleSnoop: snoop hit for UpgradeReq [2040:207f] D= ptr=0x5628c2d37b80, old state is state: 5 (S) valid: 1 writable: 0 readable: 1 dirty: 0 | tag: 0 set: 0x81 way: 0
|
||||
471046000: Cache: system.cpu2.dcache: new state is state: 0 (I) valid: 0 writable: 0 readable: 0 dirty: 0 | tag: 0xffffffffffffffff set: 0x81 way: 0
|
||||
471046000: CacheVerbose: system.cpu2.dcache: recvTimingResp: Leaving with ReadResp [2040:207f] D=700640000000000070064000000000000000000000000000000000000000000000000000000000002f0000000000000000000000000000000000000000000000 ptr=0x5628c3d27000
|
||||
471047000: Event: system.cpu2.dcache.cpu_side-CpuSidePort.wrapped_function_event: EventFunctionWrapped 138 executed @ 471047000
|
||||
471047000: SimpleCPU: system.cpu2.dcache_port: Received load/store response 0x2068
|
||||
471047000: Event: Event_136: Timing CPU dcache tick 136 scheduled @ 471047000
|
||||
471047000: Event: Event_136: Timing CPU dcache tick 136 executed @ 471047000
|
||||
471034000: ExecEnable: system.cpu2: A0 T0 : @my_thread_main+24 : ldr x1, [x2] : MemRead : D=0x000000000000002f A=0x412068 flags=(IsInteger|IsMemRef|IsLoad)</pre>
|
||||
<pre>471207000: SimpleCPU: system.cpu1: Complete ICache Fetch for addr 0xa88
|
||||
471207000: Cache: system.cpu1.dcache: access for ReadReq [2068:206f] D=c879334bb1550000 num=266082 hit state: d (O) valid: 1 writable: 0 readable: 1 dirty: 1 | tag: 0 set: 0x81 way: 0
|
||||
471207000: Event: system.cpu1.dcache.cpu_side-CpuSidePort.wrapped_function_event: EventFunctionWrapped 91 scheduled @ 471208000
|
||||
471208000: Event: system.cpu1.dcache.cpu_side-CpuSidePort.wrapped_function_event: EventFunctionWrapped 91 executed @ 471208000
|
||||
471208000: SimpleCPU: system.cpu1.dcache_port: Received load/store response 0x2068
|
||||
471208000: Event: Event_89: Timing CPU dcache tick 89 scheduled @ 471208000
|
||||
471208000: Event: Event_89: Timing CPU dcache tick 89 executed @ 471208000
|
||||
471207000: ExecEnable: system.cpu1: A0 T0 : @my_thread_main+24 : ldr x1, [x2] : MemRead : D=0x000000000000002f A=0x412068 flags=(IsInteger|IsMemRef|IsLoad)</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>but it is immediately satisfied since the line is already in <code>O</code>, and nothing needs to be sent out to the bus since it’s a read.</p>
|
||||
</div>
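<div class="paragraph">
<p>In other words, whether an access has to go out on the bus depends both on whether it hits and on whether it is a read or a write. A tiny self-contained sketch of that request-side decision (a toy model written for this section, not gem5 code):</p>
</div>
<div class="literalblock">
<div class="content">
<pre>// Toy model, not gem5 source: does a dcache access need the bus?
// A read hit is satisfied locally in any valid state, but a write hit in a
// non-writable state (S or O) still has to send an UpgradeReq so that the
// other copies get invalidated first.
#include <cassert>

enum class State { M, O, E, S, I };

bool needsBus(State s, bool isWrite) {
    if (s == State::I)
        return true;                           // miss: always go to the bus
    if (!isWrite)
        return false;                          // read hit: satisfied locally
    return !(s == State::M || s == State::E);  // write hit needs ownership
}

int main() {
    assert(!needsBus(State::O, /*isWrite=*/false));  // the CPU1 LDR hit in O above
    assert(needsBus(State::O, /*isWrite=*/true));    // the CPU1 STR in O below: UpgradeReq
    assert(!needsBus(State::M, /*isWrite=*/true));   // the earlier STR hit in M
}</pre>
</div>
</div>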
<div class="paragraph">
|
||||
<p>Then the add 1 runs entirely from cache of course, and then CPU1 starts its STR:</p>
|
||||
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471210000: Event: Event_87: Timing CPU icache tick 87 executed @ 471210000
|
||||
471210000: SimpleCPU: system.cpu1: Complete ICache Fetch for addr 0xa90
|
||||
471210000: Cache: system.cpu1.dcache: access for WriteReq [2068:206f] D=3000000000000000 num=266085 hit state: d (O) valid: 1 writable: 0 readable: 1 dirty: 1 | tag: 0 set: 0x81 way: 0
|
||||
471210000: CachePort: system.cpu1.dcache.mem_side: Scheduling send event at 471211000
|
||||
471210000: Event: system.cpu1.dcache.mem_side-MemSidePort.wrapped_function_event: EventFunctionWrapped 93 scheduled @ 471211000</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>In parallel, the CPU1 snoop response to the CPU2 LDR that had been previously sent reaches the XBar:</p>
|
||||
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471212000: Event: system.cpu1.dcache.mem_side-MemSidePort.wrapped_function_event: EventFunctionWrapped 94 executed @ 471212000
|
||||
471212000: CoherentXBar: system.membus: recvTimingSnoopResp: src system.membus.slave[6] packet ReadResp [2040:207f] D=700640000000000070064000000000000000000000000000000000000000000000000000000000002f0000000000000000000000000000000000000000000000 num=266078
|
||||
471212000: SnoopFilter: system.membus.snoop_filter: updateSnoopResponse: rsp system.membus.slave[6] req system.membus.slave[10] packet ReadResp [2040:207f] D=700640000000000070064000000000000000000000000000000000000000000000000000000000002f0000000000000000000000000000000000000000000000 num=266078
|
||||
471212000: SnoopFilter: system.membus.snoop_filter: updateSnoopResponse: old SF value 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000
|
||||
471212000: SnoopFilter: system.membus.snoop_filter: updateSnoopResponse: new SF value 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000101000
|
||||
471212000: CoherentXBar: system.membus: recvTimingSnoopResp: src system.membus.slave[6] packet ReadResp [2040:207f] D=700640000000000070064000000000000000000000000000000000000000000000000000000000002f0000000000000000000000000000000000000000000000 num=266078 FWD RESP
|
||||
471212000: Event: system.membus.slave[10]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 186 scheduled @ 471214000
|
||||
471212000: Event: system.membus.respLayer10.wrapped_function_event: EventFunctionWrapped 187 scheduled @ 471217000
|
||||
471212000: BaseXBar: system.membus.respLayer10: The crossbar layer is now busy from tick 471212000 to 471217000</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>We know that it is the same one based on the packet <code>num=</code> match.</p>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>And just after that, by coincidence, the CPU1 STR write request also starts going to the XBar:</p>
|
||||
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471212001: Event: system.cpu1.dcache.mem_side-MemSidePort.wrapped_function_event: EventFunctionWrapped 93 executed @ 471212001
|
||||
471212001: Cache: system.cpu1.dcache: sendMSHRQueuePacket: MSHR WriteReq [2068:206f] D=3000000000000000 num=266085
|
||||
471212001: Cache: system.cpu1.dcache: createMissPacket: created UpgradeReq [2040:207f] D= num=266086 from WriteReq [2068:206f] D=3000000000000000 num=266085
|
||||
471212001: CoherentXBar: system.membus: recvTimingReq: src system.membus.slave[6] packet UpgradeReq [2040:207f] D= num=266086
|
||||
471212001: SnoopFilter: system.membus.snoop_filter: lookupRequest: src system.membus.slave[6] packet UpgradeReq [2040:207f] D= num=266086
|
||||
471212001: SnoopFilter: system.membus.snoop_filter: lookupRequest: SF value 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000101000
|
||||
471212001: SnoopFilter: system.membus.snoop_filter: lookupRequest: new SF value 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000101000
|
||||
471212001: CoherentXBar: system.membus: recvTimingReq: src system.membus.slave[6] packet UpgradeReq [2040:207f] D= num=266086 SF size: 1 lat: 1
|
||||
471212001: CoherentXBar: system.membus: forwardTiming for UpgradeReq [2040:207f] D= num=266086
|
||||
471212001: CacheVerbose: system.cpu2.dcache: recvTimingSnoopReq: for UpgradeReq [2040:207f] D= num=266086
|
||||
471212001: Cache: global: handleSnoop for UpgradeReq [2040:207f] D= num=266086
|
||||
471212001: CacheVerbose: system.cpu2.dcache: handleSnoop: for UpgradeReq [2040:207f] D= num=266086
|
||||
471212001: CacheVerbose: system.cpu2.dcache: handleSnoop: snoop miss for UpgradeReq [2040:207f] D= num=266086</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>This time, we can see that the <code>WriteReq</code> gets turned into an <code>UpgradeReq</code> by the cache.</p>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>It does not however change the CPU2 cache line state, because the CPU2 line is not yet valid, since the LDR reply still hasn’t come back! We see in the source code:</p>
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>Cache::handleSnoop(PacketPtr pkt, CacheBlk *blk, bool is_timing,
|
||||
bool is_deferred, bool pending_inval)
|
||||
{
|
||||
} else if (!blk_valid) {
|
||||
DPRINTF(CacheVerbose, "%s: snoop miss for %s\n", __func__,
|
||||
pkt->print());</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>At last, the CPU1 snoop reply reaches the CPU2 dcache with the (now old <code>2f</code>) data:</p>
|
||||
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471214000: Event: system.membus.reqLayer0.wrapped_function_event: EventFunctionWrapped 164 executed @ 471214000
|
||||
471214000: Event: system.membus.slave[10]-RespPacketQueue.wrapped_function_event: EventFunctionWrapped 186 executed @ 471214000
|
||||
471214000: Cache: system.cpu2.dcache: recvTimingResp: Handling response ReadResp [2040:207f] D=700640000000000070064000000000000000000000000000000000000000000000000000000000002f0000000000000000000000000000000000000000000000 num=266078
|
||||
471214000: Cache: system.cpu2.dcache: Block for addr 0x2040 being updated in Cache
|
||||
471214000: CacheRepl: system.cpu2.dcache: Replacement victim: state: 0 (I) valid: 0 writable: 0 readable: 0 dirty: 0 | tag: 0xffffffffffffffff set: 0x81 way: 0
|
||||
471214000: Cache: system.cpu2.dcache: Block addr 0x2040 (ns) moving from state 0 to state: 5 (S) valid: 1 writable: 0 readable: 1 dirty: 0 | tag: 0 set: 0x81 way: 0</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>In the above, we see that this initially moves the cache line to the S state.</p>
</div>
|
||||
<div class="paragraph">
|
||||
<p>However, remember that after CPU2 started its LDR, CPU1 did an STR, and that STR was already snooped by CPU2 above? Well, the MSHR of the cache had noted that down, and now it proceeds to invalidate the line:</p>
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471214000: Cache: system.cpu2.dcache: serviceMSHRTargets: updated cmd to ReadRespWithInvalidate [2068:206f] D=2f00000000000000 num=266073
|
||||
471214000: Event: system.cpu2.dcache.cpu_side-CpuSidePort.wrapped_function_event: EventFunctionWrapped 138 scheduled @ 471215000
|
||||
471214000: Cache: system.cpu2.dcache: processing deferred snoop...
|
||||
471214000: CacheVerbose: system.cpu2.dcache: handleSnoop: for UpgradeReq [2040:207f] D= num=266087
|
||||
471214000: Cache: system.cpu2.dcache: handleSnoop: snoop hit for UpgradeReq [2040:207f] D= num=266087, old state is state: 5 (S) valid: 1 writable: 0 readable: 1 dirty: 0 | tag: 0 set: 0x81 way: 0
|
||||
471214000: Cache: system.cpu2.dcache: new state is state: 0 (I) valid: 0 writable: 0 readable: 0 dirty: 0 | tag: 0xffffffffffffffff set: 0x81 way: 0
|
||||
471214000: CacheVerbose: system.cpu2.dcache: recvTimingResp: Leaving with ReadResp [2040:207f] D=700640000000000070064000000000000000000000000000000000000000000000000000000000002f0000000000000000000000000000000000000000000000 num=266078</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>It is a bit funny, but we see that at the same time both the response arrives with the data and the cache line gets invalidated with a delay: the MSHR kept track of that for us. In the logs above, <code>Cache: global: handleSnoop</code> is the line in question.</p>
</div>
|
||||
<div class="paragraph">
|
||||
<p>And at long long last, the CPU2 LDR finishes:</p>
|
||||
</div>
|
||||
<div class="literalblock">
|
||||
<div class="content">
|
||||
<pre>471215000: Event: system.cpu2.dcache.cpu_side-CpuSidePort.wrapped_function_event: EventFunctionWrapped 138 executed @ 471215000
|
||||
471215000: SimpleCPU: system.cpu2.dcache_port: Received load/store response 0x2068
|
||||
471215000: Event: Event_136: Timing CPU dcache tick 136 scheduled @ 471215000
|
||||
471215000: Event: Event_136: Timing CPU dcache tick 136 executed @ 471215000
|
||||
471202000: ExecEnable: system.cpu2: A0 T0 : @my_thread_main+24 : ldr x1, [x2] : MemRead : D=0x000000000000002f A=0x412068 flags=(IsInteger|IsMemRef|IsLoad)</pre>
|
||||
</div>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>We note therefore that no DRAM access was involved: one cache serviced the other directly!</p>
</div>
|
||||
<div class="paragraph">
|
||||
<p>Tested on LKMC d429552cdeb0fc0a08cff8e627bf501eaffb068f + 1, gem5 3ca404da175a66e0b958165ad75eb5f54cb5e772.</p>
|
||||
<p>Tested on LKMC 4f82f79be7b0717c12924f4c9b7c4f46f8f18e2f + 1, gem5 3ca404da175a66e0b958165ad75eb5f54cb5e772 with a hack to add packet IDs and data to <code>Packet::print</code>.</p>
|
||||
</div>
|
||||
</div>
|
||||
<div class="sect4">
|
||||
@@ -40594,7 +40770,7 @@ IN: main
|
||||
<p>Correct outcome: <a href="#gem5-simulate-limit-reached">gem5 simulate() limit reached</a>.</p>
|
||||
</div>
|
||||
<div class="paragraph">
|
||||
<p>Incorrect behaviour due to: <a href="https://gem5.atlassian.net/browse/GEM5-537" class="bare">https://gem5.atlassian.net/browse/GEM5-537</a>: Exits successfully. */</p>
|
||||
<p>Incorrect behaviour due to: <a href="https://gem5.atlassian.net/browse/GEM5-537" class="bare">https://gem5.atlassian.net/browse/GEM5-537</a>: Exits successfully.</p>
|
||||
</div>
|
||||
</li>
|
||||
</ul>
|
||||
@@ -42802,7 +42978,7 @@ CoreMark-PRO 25016.00 6079.70 4.11
|
||||
<div class="ulist">
|
||||
<ul>
|
||||
<li>
|
||||
<p>for NVIDIA driver:</p>
|
||||
<p>for NVIDIA driver: <a href="https://askubuntu.com/questions/973605/ubuntu-17-10-boot-stuck-at-message-started-nvidia-persistence-daemon-after-ins/976578#976578" class="bare">https://askubuntu.com/questions/973605/ubuntu-17-10-boot-stuck-at-message-started-nvidia-persistence-daemon-after-ins/976578#976578</a></p>
|
||||
</li>
|
||||
<li>
|
||||
<p>for KVM, required by Android Emulator: enable virtualization extensions</p>