This commit is contained in:
Ciro Santilli 六四事件 法轮功
2020-03-13 00:00:00 +00:00
parent 1f21f33ba7
commit 45877a196f


@@ -548,6 +548,7 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
<li><a href="#gdb-step-debug-early-boot">2.5. GDB step debug early boot</a>
<ul class="sectlevel3">
<li><a href="#gdb-step-debug-early-boot-by-address">2.5.1. GDB step debug early boot by address</a></li>
<li><a href="#linux-kernel-early-boot-messages">2.5.2. Linux kernel early boot messages</a></li>
</ul>
</li>
<li><a href="#gdb-step-debug-userland-processes">2.6. GDB step debug userland processes</a>
@@ -1160,12 +1161,13 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
<li><a href="#m5out-directory">19.9. m5out directory</a>
<ul class="sectlevel3">
<li><a href="#gem5-m5out-system-terminal-file">19.9.1. gem5 m5out/system.terminal file</a></li>
<li><a href="#gem5-m5out-stats-txt-file">19.9.2. gem5 m5out/stats.txt file</a>
<li><a href="#gem5-m5out-system-dmesg-file">19.9.2. gem5 m5out/system.dmesg file</a></li>
<li><a href="#gem5-m5out-stats-txt-file">19.9.3. gem5 m5out/stats.txt file</a>
<ul class="sectlevel4">
<li><a href="#gem5-only-dump-selected-stats">19.9.2.1. gem5 only dump selected stats</a></li>
<li><a href="#gem5-only-dump-selected-stats">19.9.3.1. gem5 only dump selected stats</a></li>
</ul>
</li>
<li><a href="#gem5-config-ini">19.9.3. gem5 config.ini</a></li>
<li><a href="#gem5-config-ini">19.9.4. gem5 config.ini</a></li>
</ul>
</li>
<li><a href="#m5term">19.10. m5term</a></li>
@@ -1309,12 +1311,13 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
<li><a href="#report-upstream-bugs">20.9. Report upstream bugs</a></li>
<li><a href="#libc-choice">20.10. libc choice</a></li>
<li><a href="#buildroot-hello-world">20.11. Buildroot hello world</a></li>
<li><a href="#update-the-toolchain">20.12. Update the toolchain</a>
<li><a href="#update-the-buildroot-toolchain">20.12. Update the Buildroot toolchain</a>
<ul class="sectlevel3">
<li><a href="#update-gcc-gcc-supported-by-buildroot">20.12.1. Update GCC: GCC supported by Buildroot</a></li>
<li><a href="#update-gcc-gcc-not-supported-by-buildroot">20.12.2. Update GCC: GCC not supported by Buildroot</a></li>
</ul>
</li>
<li><a href="#buildroot-vanilla-kernel">20.13. Buildroot vanilla kernel</a></li>
</ul>
</li>
<li><a href="#userland-content">21. Userland content</a>
@@ -1432,15 +1435,16 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
<li><a href="#parsec-benchmark-hacking">21.8.4.5. PARSEC benchmark hacking</a></li>
</ul>
</li>
<li><a href="#userland-libs-directory">21.8.5. userland/libs directory</a>
<ul class="sectlevel4">
<li><a href="#hdf5">21.8.5.1. HDF5</a></li>
</ul>
</li>
<li><a href="#micro-benchmarks">21.9. Micro benchmarks</a></li>
<li><a href="#userland-libs-directory">21.10. userland/libs directory</a>
<ul class="sectlevel3">
<li><a href="#hdf5">21.10.1. HDF5</a></li>
</ul>
</li>
<li><a href="#userland-content-filename-conventions">21.9. Userland content filename conventions</a></li>
<li><a href="#userland-content-bibliography">21.10. Userland content bibliography</a></li>
<li><a href="#userland-content-filename-conventions">21.11. Userland content filename conventions</a></li>
<li><a href="#userland-content-bibliography">21.12. Userland content bibliography</a></li>
</ul>
</li>
<li><a href="#userland-assembly">22. Userland assembly</a>
@@ -1925,7 +1929,12 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
<li><a href="#benchmark-this-repo-bibliography">29.5. Benchmark this repo bibliography</a></li>
</ul>
</li>
<li><a href="#xephyr">30. Xephyr</a></li>
<li><a href="#rtos">30. RTOS</a>
<ul class="sectlevel2">
<li><a href="#zephyr">30.1. Zephyr</a></li>
<li><a href="#arm-mbed">30.2. ARM Mbed</a></li>
</ul>
</li>
<li><a href="#compilers">31. Compilers</a>
<ul class="sectlevel2">
<li><a href="#prevent-statement-reordering">31.1. Prevent statement reordering</a></li>
@@ -1936,11 +1945,16 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
<ul class="sectlevel2">
<li><a href="#cache-coherence">32.1. Cache coherence</a>
<ul class="sectlevel3">
<li><a href="#vi-protocol">32.1.1. VI protocol</a></li>
<li><a href="#msi-protocol">32.1.2. MSI protocol</a></li>
<li><a href="#mesi-protocol">32.1.3. MESI protocol</a></li>
<li><a href="#mosi-protocol">32.1.4. MOSI protocol</a></li>
<li><a href="#moesi-protocol">32.1.5. MOESI protocol</a></li>
<li><a href="#can-caches-snoop-data-from-other-caches">32.1.1. Can caches snoop data from other caches?</a></li>
<li><a href="#vi-cache-coherence-protocol">32.1.2. VI cache coherence protocol</a></li>
<li><a href="#msi-cache-coherence-protocol">32.1.3. MSI cache coherence protocol</a>
<ul class="sectlevel4">
<li><a href="#msi-cache-coherence-protocol-with-transient-states">32.1.3.1. MSI cache coherence protocol with transient states</a></li>
</ul>
</li>
<li><a href="#mesi-cache-coherence-protocol">32.1.4. MESI cache coherence protocol</a></li>
<li><a href="#mosi-cache-coherence-protocol">32.1.5. MOSI cache coherence protocol</a></li>
<li><a href="#moesi-cache-coherence-protocol">32.1.6. MOESI cache coherence protocol</a></li>
</ul>
</li>
</ul>
@@ -2003,7 +2017,7 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
</li>
<li><a href="#buildroot_packages-directory">33.14.2. buildroot_packages directory</a>
<ul class="sectlevel4">
<li><a href="#kernel_modules-buildroot-package">33.14.2.1. kernel_modules buildroot package</a></li>
<li><a href="#kernel-modules-buildroot-package">33.14.2.1. kernel_modules buildroot package</a></li>
</ul>
</li>
<li><a href="#patches-directory">33.14.3. patches directory</a>
@@ -4927,6 +4941,47 @@ echo 'file kernel/module.c +p' &gt; /sys/kernel/debug/dynamic_debug/control
<p>and no, I do have the symbols from <code>arch/arm/boot/compressed/vmlinux'</code>, but the breakpoints still don&#8217;t work.</p>
</div>
</div>
<div class="sect3">
<h4 id="linux-kernel-early-boot-messages"><a class="anchor" href="#linux-kernel-early-boot-messages"></a><a class="link" href="#linux-kernel-early-boot-messages">2.5.2. Linux kernel early boot messages</a></h4>
<div class="paragraph">
<p>When booting Linux on a slow emulator like <a href="#gem5">gem5</a>, what you observe is that:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>first nothing shows for a while</p>
</li>
<li>
<p>then a bunch of message lines show at once, followed on aarch64 Linux 5.4.3 by:</p>
<div class="literalblock">
<div class="content">
<pre>[ 0.081311] printk: console [ttyAMA0] enabled</pre>
</div>
</div>
</li>
</ul>
</div>
<div class="paragraph">
<p>This means of course that all the previous messages had been generated earlier and stored, but were only printed to the terminal once the terminal itself was enabled.</p>
</div>
<div class="paragraph">
<p>Notably for example the very first message:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd070]</pre>
</div>
</div>
<div class="paragraph">
<p>happens very early in the boot process.</p>
</div>
<div class="paragraph">
<p>If you get a failure before that, it will be hard to see the print messages.</p>
</div>
<div class="paragraph">
<p>One possible solution is to parse the dmesg buffer; gem5 actually implements that: <a href="#gem5-m5out-system-dmesg-file">gem5 m5out/system.dmesg file</a>.</p>
</div>
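<div class="paragraph">
<p>Another ad hoc possibility, if you can get a raw dump of guest memory somehow (e.g. with GDB&#8217;s <code>dump memory</code> command or an emulator memory dump), is to scan the dump for printk-style lines. The following standalone Python sketch is not part of this repo, and the dump file name is a placeholder; since the exact log buffer layout varies across kernel versions, it only recovers the human readable text:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>#!/usr/bin/env python3
# Sketch: grep printk-style "[    0.000000] " lines out of a raw memory dump.
# Produce the dump yourself first, e.g. with GDB "dump memory mem.dump START END".
import re
import sys

with open(sys.argv[1], 'rb') as f:
    data = f.read()

# printk lines start with a "[    0.081311] " style timestamp.
for match in re.finditer(rb'\[\s*\d+\.\d{6}\] [ -~]+', data):
    print(match.group().decode('ascii', errors='replace'))</pre>
</div>
</div>
<div class="paragraph">
<p>Usage would then be something like <code>python3 dmesg_grep.py mem.dump</code>, where both file names are hypothetical.</p>
</div>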
</div>
</div>
<div class="sect2">
<h3 id="gdb-step-debug-userland-processes"><a class="anchor" href="#gdb-step-debug-userland-processes"></a><a class="link" href="#gdb-step-debug-userland-processes">2.6. GDB step debug userland processes</a></h3>
@@ -8114,7 +8169,7 @@ hello
<div class="ulist">
<ul>
<li>
<p>modules built with Buildroot, see: <a href="#kernel_modules-buildroot-package">Section 33.14.2.1, &#8220;kernel_modules buildroot package&#8221;</a></p>
<p>modules built with Buildroot, see: <a href="#kernel-modules-buildroot-package">Section 33.14.2.1, &#8220;kernel_modules buildroot package&#8221;</a></p>
</li>
<li>
<p>modules built from the kernel tree itself, see: <a href="#dummy-irq">Section 15.12.2, &#8220;dummy-irq&#8221;</a></p>
@@ -10853,7 +10908,7 @@ extra/dep.ko:</pre>
<p>Unlike <code>insmod</code>, <a href="#modprobe">modprobe</a> deals with kernel module dependencies for us.</p>
</div>
<div class="paragraph">
<p>First get <a href="#kernel_modules-buildroot-package">kernel_modules buildroot package</a> working.</p>
<p>First get <a href="#kernel-modules-buildroot-package">kernel_modules buildroot package</a> working.</p>
</div>
<div class="paragraph">
<p>Then, for example:</p>
@@ -17832,6 +17887,19 @@ root</pre>
<div class="paragraph">
<p>Getting started at: <a href="#gem5-buildroot-setup">Section 1.3, &#8220;gem5 Buildroot setup&#8221;</a>.</p>
</div>
<div class="paragraph">
<p>gem5 has a bunch of crappiness, mostly described at: <a href="#gem5-vs-qemu">gem5 vs QEMU</a>, but it does deserve some credit on the following points:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>insanely configurable system topology from Python without recompiling, made possible in part due to a well defined memory packet structure that allows adding caches and buses transparently</p>
</li>
<li>
<p>each micro architectural model (<a href="#gem5-cpu-types">gem5 CPU types</a>) works with all ISAs</p>
</li>
</ul>
</div>
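<div class="paragraph">
<p>As an illustration of that Python-level configurability, inserting a cache between a CPU and the memory bus is just a matter of rewiring ports in the config script. The following fragment is only a sketch in the style of the upstream "learning gem5" example configs, not a script from this repo: port and parameter names vary across gem5 versions, and a real config still needs a memory controller, a workload and the usual <code>m5.instantiate()</code>/<code>m5.simulate()</code> boilerplate:</p>
</div>
<div class="literalblock">
<div class="content">
<pre># Sketch only, in the style of the upstream "learning gem5" configs; port and
# parameter names are assumptions that vary across gem5 versions.
import m5
from m5.objects import *

system = System(mem_mode='timing', mem_ranges=[AddrRange('512MB')])
system.clk_domain = SrcClockDomain(clock='1GHz', voltage_domain=VoltageDomain())
system.cpu = TimingSimpleCPU()
system.membus = SystemXBar()

# Direct connection, no cache:
#     system.cpu.icache_port = system.membus.slave
# Adding a cache is just rewiring the same packet-based ports, no C++ changes:
system.icache = Cache(size='32kB', assoc=2, tag_latency=2, data_latency=2,
                      response_latency=2, mshrs=4, tgts_per_mshr=20)
system.cpu.icache_port = system.icache.cpu_side
system.icache.mem_side = system.membus.slave</pre>
</div>
</div>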
<div class="sect2">
<h3 id="gem5-vs-qemu"><a class="anchor" href="#gem5-vs-qemu"></a><a class="link" href="#gem5-vs-qemu">19.1. gem5 vs QEMU</a></h3>
<div class="ulist">
@@ -19931,7 +19999,22 @@ git -C "$(./getvar linux_source_dir)" checkout -
</div>
</div>
<div class="sect3">
<h4 id="gem5-m5out-stats-txt-file"><a class="anchor" href="#gem5-m5out-stats-txt-file"></a><a class="link" href="#gem5-m5out-stats-txt-file">19.9.2. gem5 m5out/stats.txt file</a></h4>
<h4 id="gem5-m5out-system-dmesg-file"><a class="anchor" href="#gem5-m5out-system-dmesg-file"></a><a class="link" href="#gem5-m5out-system-dmesg-file">19.9.2. gem5 m5out/system.dmesg file</a></h4>
<div class="paragraph">
<p>TODO confirm and create minimal example.</p>
</div>
<div class="paragraph">
<p>I think this file is capable of showing terminal messages before they reach the terminal by parsing the dmesg buffer from memory.</p>
</div>
<div class="paragraph">
<p>This could be used to debug the Linux kernel boot if problems happen before the serial is enabled: <a href="#linux-kernel-early-boot-messages">Linux kernel early boot messages</a>.</p>
</div>
<div class="paragraph">
<p>The file appears to get dumped only on kernel panic which gem5 can detect by the PC address: <a href="#exit-gem5-on-panic">Exit gem5 on panic</a>.</p>
</div>
</div>
<div class="sect3">
<h4 id="gem5-m5out-stats-txt-file"><a class="anchor" href="#gem5-m5out-stats-txt-file"></a><a class="link" href="#gem5-m5out-stats-txt-file">19.9.3. gem5 m5out/stats.txt file</a></h4>
<div class="paragraph">
<p>This file contains important statistics about the run:</p>
</div>
@@ -19964,7 +20047,7 @@ system.cpu.dtb.inst_hits</pre>
<p>For x86, it is interesting to try and correlate <code>numCycles</code> with:</p>
</div>
<div class="sect4">
<h5 id="gem5-only-dump-selected-stats"><a class="anchor" href="#gem5-only-dump-selected-stats"></a><a class="link" href="#gem5-only-dump-selected-stats">19.9.2.1. gem5 only dump selected stats</a></h5>
<h5 id="gem5-only-dump-selected-stats"><a class="anchor" href="#gem5-only-dump-selected-stats"></a><a class="link" href="#gem5-only-dump-selected-stats">19.9.3.1. gem5 only dump selected stats</a></h5>
<div class="paragraph">
<p>TODO</p>
</div>
@@ -19977,7 +20060,7 @@ system.cpu.dtb.inst_hits</pre>
</div>
</div>
<div class="sect3">
<h4 id="gem5-config-ini"><a class="anchor" href="#gem5-config-ini"></a><a class="link" href="#gem5-config-ini">19.9.3. gem5 config.ini</a></h4>
<h4 id="gem5-config-ini"><a class="anchor" href="#gem5-config-ini"></a><a class="link" href="#gem5-config-ini">19.9.4. gem5 config.ini</a></h4>
<div class="paragraph">
<p>The <code>m5out/config.ini</code> file contains a very good high level description of the system:</p>
</div>
@@ -20266,7 +20349,7 @@ clock=500</pre>
</div>
<div class="literalblock">
<div class="content">
<pre>./gem5-regression --gem5-worktree master --arch aarch64 --cmd list</pre>
<pre>./gem5-regression --arch aarch64 --cmd list</pre>
</div>
</div>
<div class="paragraph">
@@ -20575,7 +20658,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
<p>Ruby seems to have usage outside of gem5, but the naming overload with the <a href="https://en.wikipedia.org/wiki/Ruby_(programming_language)">Ruby programming language</a>, which also has <a href="https://thoughtbot.com/blog/writing-a-domain-specific-language-in-ruby">domain specific languages</a> as a concept, makes it impossible to google anything about it!</p>
</div>
<div class="paragraph">
<p>Since it is not the default, Ruby is generally less stable that the classic memory model. However, because it allows describing a wide variety of important <a href="#cache-coherence">cache coherency protocols</a>, while the classic system only describes a single protocol, Ruby is very importanonly describes a single protocol, Ruby is a very important feature of gem5.</p>
<p>Since it is not the default, Ruby is generally less stable than the classic memory model. However, because it allows describing a wide variety of important <a href="#cache-coherence">cache coherence protocols</a>, while the classic system only describes a single protocol, Ruby is a very important feature of gem5.</p>
</div>
<div class="paragraph">
<p>Ruby support must be enabled at compile time with the <code>scons PROTOCOL=</code> flag, which compiles support for the desired memory system type.</p>
@@ -20605,7 +20688,7 @@ Indirect leak of 1346 byte(s) in 2 object(s) allocated from:
</ul>
</div>
<div class="paragraph">
<p>For example, to use a two level <a href="#mesi-protocol">MESI protocol</a> we can do:</p>
<p>For example, to use a two level <a href="#mesi-cache-coherence-protocol">MESI cache coherence protocol</a> we can do:</p>
</div>
<div class="literalblock">
<div class="content">
@@ -22085,24 +22168,24 @@ info: Entering event queue @ 0. Starting simulation...
|
+---+
| |
6 7 6 DRAMCtrl::processNextReqEvent
6 7 6 DRAMCtrl::processNextReqEvent (0)
8 15 7 BaseXBar::Layer::releaseLayer
|
+---+---+
| | |
9 10 11 9 DRAMCtrl::Rank::processActivateEvent
12 17 16 10 DRAMCtrl::processRespondEvent
| | 11 DRAMCtrl::processNextReqEvent
12 17 16 10 DRAMCtrl::processRespondEvent (46.25)
| | 11 DRAMCtrl::processNextReqEvent (5)
| |
13 18 13 DRAMCtrl::Rank::processPowerEvent
14 19 18 PacketQueue::processSendEvent
14 19 18 PacketQueue::processSendEvent (28)
|
+---+
| |
20 21 20 PacketQueue::processSendEvent
20 21 20 PacketQueue::processSendEvent (2.75)
23 22 21 BaseXBar::Layer&lt;SrcType, DstType&gt;::releaseLayer
|
24 24 TimingSimpleCPU::IcachePort::ITickEvent::process
24 24 TimingSimpleCPU::IcachePort::ITickEvent::process (0)
25
|
+---+
@@ -22138,8 +22221,8 @@ info: Entering event queue @ 0. Starting simulation...
<div class="literalblock">
<div class="content">
<pre> | |
6 7 6 DRAMCtrl::processNextReqEvent
8 15 7 BaseXBar::Layer::releaseLayer
6 7 6 DRAMCtrl::processNextReqEvent (0)
8 15 7 BaseXBar::Layer::releaseLayer (0)
|</pre>
</div>
</div>
@@ -22149,13 +22232,13 @@ info: Entering event queue @ 0. Starting simulation...
<div class="ulist">
<ul>
<li>
<p><code>6</code>: schedule <code>DRAMCtrl::processNextReqEvent</code></p>
<p><code>6</code>: schedule <code>DRAMCtrl::processNextReqEvent</code> to run in <code>0</code> ns after the execution that scheduled it</p>
</li>
<li>
<p><code>8</code>: execute <code>DRAMCtrl::processNextReqEvent</code></p>
</li>
<li>
<p><code>7</code>: schedule <code>BaseXBar::Layer::releaseLayer</code></p>
<p><code>7</code>: schedule <code>BaseXBar::Layer::releaseLayer</code> to run in <code>0</code> ns after the execution that scheduled it</p>
</li>
<li>
<p><code>15</code>: execute <code>BaseXBar::Layer::releaseLayer</code></p>
@@ -22171,6 +22254,45 @@ info: Entering event queue @ 0. Starting simulation...
<div class="paragraph">
<p>Observe how the events leading up to the second instruction are basically a copy of those of the first one: this is the basic <code>TimingSimpleCPU</code> event loop in action.</p>
</div>
<div class="paragraph">
<p>One line summary of events:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>#5: adds the request to the DRAM queue, and schedules a <code>DRAMCtrl::processNextReqEvent</code>, which sees that request as soon as it runs</p>
</li>
<li>
<p>#8: picks up the only request from the DRAM read queue (<code>readQueue</code>) and services that.</p>
<div class="paragraph">
<p>If there were multiple requests, priority arbitration under <code>DRAMCtrl::chooseNext</code> could choose a different one than the first based on packet priorities</p>
</div>
<div class="paragraph">
<p>This puts the request on the response queue <code>respQueue</code> and schedules another <code>DRAMCtrl::processNextReqEvent</code>, but the request queue is now empty, so that one does not schedule further events</p>
</div>
</li>
<li>
<p>#17: picks up the only request from the DRAM response queue and services that by placing it in yet another queue, and scheduling the <code>PacketQueue::processSendEvent</code> which will later pick up that packet</p>
</li>
<li>
<p>#19: picks up the request from the previous queue, and forwards it to another queue, and schedules yet another <code>PacketQueue::processSendEvent</code></p>
<div class="paragraph">
<p>The current one is the DRAM passing the message to the XBar, and the next <code>processSendEvent</code> is the XBar finally sending it back to the CPU</p>
</div>
</li>
<li>
<p>#23: the XBar port is actually sending the reply back.</p>
<div class="paragraph">
<p>It knows which CPU core to send the reply to because ports keep a map from request to source:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>const auto route_lookup = routeTo.find(pkt-&gt;req);</pre>
</div>
</div>
</li>
</ul>
</div>
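<div class="paragraph">
<p>To make the schedule/execute numbering and the <code>(delay)</code> annotations of the graphs above concrete, here is a toy discrete event queue. This is just an illustrative sketch, not gem5 code: IDs are handed out in order both when an event is scheduled and when it executes, and the number in parentheses is how far in the future the event was scheduled:</p>
</div>
<div class="literalblock">
<div class="content">
<pre># Toy sketch of the numbering convention, not gem5 code.
import heapq

class EventQueue:
    def __init__(self):
        self.now = 0
        self.next_id = 0
        self.queue = []

    def schedule(self, name, delay):
        # The schedule action itself gets an ID, e.g. "6" or "7" above.
        sched_id = self.next_id
        self.next_id += 1
        heapq.heappush(self.queue, (self.now + delay, sched_id, name, delay))

    def run(self):
        while self.queue:
            self.now, sched_id, name, delay = heapq.heappop(self.queue)
            # Execution gets its own, later ID, e.g. "8" or "15" above.
            exec_id = self.next_id
            self.next_id += 1
            print(f'{sched_id} {exec_id} {name} ({delay})')

eq = EventQueue()
eq.schedule('DRAMCtrl::processNextReqEvent', 0)
eq.schedule('BaseXBar::Layer::releaseLayer', 0)
eq.run()</pre>
</div>
</div>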
<div class="sect5">
<h6 id="timingsimplecpu-analysis-0"><a class="anchor" href="#timingsimplecpu-analysis-0"></a><a class="link" href="#timingsimplecpu-analysis-0">19.19.4.2.1. TimingSimpleCPU analysis #0</a></h6>
<div class="paragraph">
@@ -23485,9 +23607,6 @@ build/ARM/config/the_isa.hh
<p>Perhaps the awesomeness of Buildroot only sinks in once you notice that all it takes is 4 commands as explained at <a href="#buildroot-hello-world">Section 20.11, &#8220;Buildroot hello world&#8221;</a>.</p>
</div>
<div class="paragraph">
<p>This repo basically wraps around that, and tries to make everything even more awesome for kernel developers.</p>
</div>
<div class="paragraph">
<p>The downsides of Buildroot are:</p>
</div>
<div class="ulist">
@@ -23504,6 +23623,28 @@ build/ARM/config/the_isa.hh
<p>The hard part is dealing with crappy third party build systems and huge dependency chains.</p>
</div>
</li>
<li>
<p>it is written in Make and Bash rather than Python like LKMC</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>This repo basically wraps around that, and tries to make everything even more awesome for kernel developers by adding the capability of seamlessly running the stuff you&#8217;ve built on emulators, usually via <code>./run</code>.</p>
</div>
<div class="paragraph">
<p>As this repo develops however, we&#8217;ve started taking some of the build out of Buildroot, e.g. notably the <a href="#buildroot-vanilla-kernel">Linux kernel</a> to have more build flexibility and faster build startup times.</p>
</div>
<div class="paragraph">
<p>Therefore, more and more, this repo wants to take over everything that Buildroot does, and one day completely replace it to achieve emulation Nirvana, see e.g.:</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/issues/116" class="bare">https://github.com/cirosantilli/linux-kernel-module-cheat/issues/116</a></p>
</li>
<li>
<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/issues/117" class="bare">https://github.com/cirosantilli/linux-kernel-module-cheat/issues/117</a></p>
</li>
</ul>
</div>
</div>
@@ -24065,7 +24206,7 @@ git -C "$(./getvar qemu_source_dir)" checkout -
</div>
</div>
<div class="sect2">
<h3 id="update-the-toolchain"><a class="anchor" href="#update-the-toolchain"></a><a class="link" href="#update-the-toolchain">20.12. Update the toolchain</a></h3>
<h3 id="update-the-buildroot-toolchain"><a class="anchor" href="#update-the-buildroot-toolchain"></a><a class="link" href="#update-the-buildroot-toolchain">20.12. Update the Buildroot toolchain</a></h3>
<div class="paragraph">
<p>Users of this repo will often want to update the compilation toolchain to the latest version to get fresh new features like new ISA instructions.</p>
</div>
@@ -24256,6 +24397,36 @@ cd ../..
</div>
</div>
</div>
<div class="sect2">
<h3 id="buildroot-vanilla-kernel"><a class="anchor" href="#buildroot-vanilla-kernel"></a><a class="link" href="#buildroot-vanilla-kernel">20.13. Buildroot vanilla kernel</a></h3>
<div class="paragraph">
<p>By default, our build system uses <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/build-linux">build-linux</a>, and the Buildroot kernel build is disabled: <a href="https://stackoverflow.com/questions/52231793/can-buildroot-build-the-root-filesystem-without-building-the-linux-kernel" class="bare">https://stackoverflow.com/questions/52231793/can-buildroot-build-the-root-filesystem-without-building-the-linux-kernel</a></p>
</div>
<div class="paragraph">
<p>There are however some cases where we want that ability, e.g.: <a href="#kernel-modules-buildroot-package">kernel_modules buildroot package</a> and <a href="#benchmark-linux-kernel-boot">Benchmark Linux kernel boot</a>.</p>
</div>
<div class="paragraph">
<p>The build of the kernel can be enabled with the <code>--build-kernel</code> option of <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/build-buildroot">build-buildroot</a>.</p>
</div>
<div class="paragraph">
<p>For example, to build the kernel and then boot it you could do:</p>
</div>
<div class="literalblock">
<div class="content">
<pre>./build-buildroot --arch aarch64 --build-linux
./run --arch aarch64 --linux-exec "$(./getvar --arch aarch64 TODO)/vmlinux"</pre>
</div>
</div>
<div class="paragraph">
<p>TODO: fails on LKMC d53ffcff18aa26d24ea34b86fb80e4a5694378dch with "ERROR: No hash found for linux-4.19.16.tar.xz": <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/issues/115" class="bare">https://github.com/cirosantilli/linux-kernel-module-cheat/issues/115</a></p>
</div>
<div class="paragraph">
<p>Note that this kernel is not configured at all by LKMC, and there is no support to do that currently: the Buildroot default kernel configs for a target are used unchanged, e.g. <code>make qemu_aarch64_virt_defconfig</code>, see also: <a href="#buildroot-kernel-config">About Buildroot&#8217;s kernel configs</a>.</p>
</div>
<div class="paragraph">
<p>Therefore, this kernel might be missing certain key capabilities, e.g. filesystem support required to boot.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
@@ -25940,7 +26111,7 @@ xdg-open bst_vs_heap_vs_hashmap_gem5.tmp.png</pre>
<p>The cache sizes were chosen to match the host <a href="#p51">P51</a> to improve the comparison. Ideally we should also use the same standard library.</p>
</div>
<div class="paragraph">
<p>Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: <a href="#gem5-only-dump-selected-stats">Section 19.9.2.1, &#8220;gem5 only dump selected stats&#8221;</a></p>
<p>Note that this will take a long time, and will produce a humongous ~40GB stats file as explained at: <a href="#gem5-only-dump-selected-stats">Section 19.9.3.1, &#8220;gem5 only dump selected stats&#8221;</a></p>
</div>
<div class="paragraph">
<p>Sources:</p>
@@ -26505,8 +26676,22 @@ git clean -xdf .</pre>
</div>
</div>
</div>
<div class="sect3">
<h4 id="userland-libs-directory"><a class="anchor" href="#userland-libs-directory"></a><a class="link" href="#userland-libs-directory">21.8.5. userland/libs directory</a></h4>
</div>
<div class="sect2">
<h3 id="micro-benchmarks"><a class="anchor" href="#micro-benchmarks"></a><a class="link" href="#micro-benchmarks">21.9. Micro benchmarks</a></h3>
<div class="paragraph">
<p>It eventually has to come to that, doesn&#8217;t it?</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/gcc/busy_loop.c">userland/gcc/busy_loop.c</a> described at <a href="#infinite-busy-loop">Infinite busy loop</a></p>
</li>
</ul>
</div>
</div>
<div class="sect2">
<h3 id="userland-libs-directory"><a class="anchor" href="#userland-libs-directory"></a><a class="link" href="#userland-libs-directory">21.10. userland/libs directory</a></h3>
<div class="paragraph">
<p>Tests under <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/libs">userland/libs</a> require certain optional libraries to be installed on the target, and are not built or tested by default; you must enable them with either:</p>
</div>
@@ -26519,8 +26704,8 @@ git clean -xdf .</pre>
<div class="paragraph">
<p>See for example <a href="#blas">BLAS</a>.</p>
</div>
<div class="sect4">
<h5 id="hdf5"><a class="anchor" href="#hdf5"></a><a class="link" href="#hdf5">21.8.5.1. HDF5</a></h5>
<div class="sect3">
<h4 id="hdf5"><a class="anchor" href="#hdf5"></a><a class="link" href="#hdf5">21.10.1. HDF5</a></h4>
<div class="paragraph">
<p><a href="https://en.wikipedia.org/wiki/Hierarchical_Data_Format" class="bare">https://en.wikipedia.org/wiki/Hierarchical_Data_Format</a></p>
</div>
@@ -26532,9 +26717,8 @@ git clean -xdf .</pre>
</div>
</div>
</div>
</div>
<div class="sect2">
<h3 id="userland-content-filename-conventions"><a class="anchor" href="#userland-content-filename-conventions"></a><a class="link" href="#userland-content-filename-conventions">21.9. Userland content filename conventions</a></h3>
<h3 id="userland-content-filename-conventions"><a class="anchor" href="#userland-content-filename-conventions"></a><a class="link" href="#userland-content-filename-conventions">21.11. Userland content filename conventions</a></h3>
<div class="paragraph">
<p>The following basenames should always refer to programs that do the same thing, but in different languages:</p>
</div>
@@ -26563,7 +26747,7 @@ git clean -xdf .</pre>
</div>
</div>
<div class="sect2">
<h3 id="userland-content-bibliography"><a class="anchor" href="#userland-content-bibliography"></a><a class="link" href="#userland-content-bibliography">21.10. Userland content bibliography</a></h3>
<h3 id="userland-content-bibliography"><a class="anchor" href="#userland-content-bibliography"></a><a class="link" href="#userland-content-bibliography">21.12. Userland content bibliography</a></h3>
<div class="ulist">
<ul>
<li>
@@ -34381,7 +34565,7 @@ cat "$(./getvar test_boot_benchmark_file)"</pre>
</div>
</div>
<div class="paragraph">
<p>Sample results at 8fb9db39316d43a6dbd571e04dd46ae73915027f:</p>
<p>Sample results at LKMC 8fb9db39316d43a6dbd571e04dd46ae73915027f:</p>
</div>
<div class="literalblock">
<div class="content">
@@ -34455,6 +34639,18 @@ instructions 124346081</pre>
<div class="paragraph">
<p>TODO: aarch64 gem5 and QEMU use the same kernel, so why is the gem5 instruction count so much much higher?</p>
</div>
<div class="paragraph">
<p><a href="#p51">P51</a> Ubuntu 19.10 LKMC b11e3cd9fb5df0e3fe61de28e8264bbc95ea9005 gem5 e779c19dbb51ad2f7699bd58a5c7827708e12b55 aarch64: 143s. Why huge increases from 70s on above table? Kernel size is also huge BTW: 147MB.</p>
</div>
<div class="paragraph">
<p>Note that <a href="https://gem5.atlassian.net/browse/GEM5-337" class="bare">https://gem5.atlassian.net/browse/GEM5-337</a> "ARM PAuth patch slows down Linux boot 2x from 2 minutes to 4 minutes" was already semi fixed at that point.</p>
</div>
<div class="paragraph">
<p>Same but with <a href="#buildroot-vanilla-kernel">Buildroot vanilla kernel</a> (kernel v4.19): 44s to blow up at "Please append a correct "root=" boot option; here are the available partitions" because missing some filesystem mount option. But likely wouldn&#8217;t be much more until after boot since we are almost already done by then! Therefore this vanilla kernel is much much faster! TODO find which config or kernel commit added so much time! Also that kernel is tiny at 8.5MB.</p>
</div>
<div class="paragraph">
<p>Same but with: <a href="#gem5-arm-linux-kernel-patches">gem5 arm Linux kernel patches</a> at v4.15: 73s, kernel size: 132M.</p>
</div>
<div class="sect4">
<h5 id="gem5-arm-hpi-boot-takes-much-longer-than-aarch64"><a class="anchor" href="#gem5-arm-hpi-boot-takes-much-longer-than-aarch64"></a><a class="link" href="#gem5-arm-hpi-boot-takes-much-longer-than-aarch64">29.2.1.1. gem5 arm HPI boot takes much longer than aarch64</a></h5>
<div class="paragraph">
@@ -35052,7 +35248,7 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt</pre>
<p>Same but gem5 d7d9bc240615625141cd6feddbadd392457e49eb (2018-06-17) hacked with <code>-Wnoerror</code>: 11m 37s. So there was a huge regression in the last two years! We have to find it out.</p>
</div>
<div class="paragraph">
<p>A profiling of the build has been done at: <a href="https://gem5.atlassian.net/browse/GEM5-277" class="bare">https://gem5.atlassian.net/browse/GEM5-277</a></p>
<p>A profiling of the build has been done at: <a href="https://gem5.atlassian.net/browse/GEM5-277" class="bare">https://gem5.atlassian.net/browse/GEM5-277</a>. Analysis there showed that d7d9bc240615625141cd6feddbadd392457e49eb (2018-06-17) is also composed of 50% pybind11, with no other obvious time sinks.</p>
</div>
<div class="sect5">
<h6 id="pybind11-accounts-for-50-of-gem5-build-time"><a class="anchor" href="#pybind11-accounts-for-50-of-gem5-build-time"></a><a class="link" href="#pybind11-accounts-for-50-of-gem5-build-time">29.2.3.3.1. pybind11 accounts for 50% of gem5 build time</a></h6>
@@ -35206,10 +35402,15 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt</pre>
</div>
</div>
<div class="sect1">
<h2 id="xephyr"><a class="anchor" href="#xephyr"></a><a class="link" href="#xephyr">30. Xephyr</a></h2>
<h2 id="rtos"><a class="anchor" href="#rtos"></a><a class="link" href="#rtos">30. RTOS</a></h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="zephyr"><a class="anchor" href="#zephyr"></a><a class="link" href="#zephyr">30.1. Zephyr</a></h3>
<div class="paragraph">
<p>Xephyr is an RTOS that has <a href="#posix">POSIX</a> support. I think it works much like our <a href="#baremetal-setup">Baremetal setup</a> which uses Newlib and generates individual ELF files that contain both our C program&#8217;s code, and the Xephyr libraries.</p>
<p><a href="https://en.wikipedia.org/wiki/Zephyr_(operating_system" class="bare">https://en.wikipedia.org/wiki/Zephyr_(operating_system</a>)</p>
</div>
<div class="paragraph">
<p>Zephyr is an RTOS that has <a href="#posix">POSIX</a> support. I think it works much like our <a href="#baremetal-setup">Baremetal setup</a> which uses Newlib and generates individual ELF files that contain both our C program&#8217;s code, and the Zephyr libraries.</p>
</div>
<div class="paragraph">
<p>TODO get a hello world working, and then consider further integration in this repo, e.g. being able to run all C userland content on it.</p>
@@ -35218,7 +35419,7 @@ tail -n+1 ../linux-kernel-module-cheat-regression/*/gem5-bench-build-*.txt</pre>
<p>TODO: Cortex-A CPUs are not currently supported, there are some <code>qemu_cortex_m0</code> boards, but can&#8217;t find a QEMU Cortex-A. There is an x86_64 qemu board, but we don&#8217;t currently have an <a href="#about-the-baremetal-setup">x86 baremetal toolchain</a>. For this reason, we won&#8217;t touch this further for now.</p>
</div>
<div class="paragraph">
<p>However, unlike Newlib, Xephyr must be setting up a simple pre-main runtime to be able to handle threads.</p>
<p>However, unlike Newlib, Zephyr must be setting up a simple pre-main runtime to be able to handle threads.</p>
</div>
<div class="paragraph">
<p>Failed attempt:</p>
@@ -35246,6 +35447,16 @@ west build -b qemu_aarch64 samples/hello_world</pre>
<p>The build system of that project is a bit excessive / wonky. You need an edge CMake not present in Ubuntu 18.04, which I don&#8217;t want to install right now, and it uses the weird custom <code>west</code> build tool frontend.</p>
</div>
</div>
<div class="sect2">
<h3 id="arm-mbed"><a class="anchor" href="#arm-mbed"></a><a class="link" href="#arm-mbed">30.2. ARM Mbed</a></h3>
<div class="paragraph">
<p><a href="https://en.wikipedia.org/wiki/Mbed" class="bare">https://en.wikipedia.org/wiki/Mbed</a></p>
</div>
<div class="paragraph">
<p>TODO minimal setup to run it on QEMU? Possible?</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="compilers"><a class="anchor" href="#compilers"></a><a class="link" href="#compilers">31. Compilers</a></h2>
@@ -35285,19 +35496,37 @@ west build -b qemu_aarch64 samples/hello_world</pre>
<p><a href="https://en.wikipedia.org/wiki/Cache_coherence" class="bare">https://en.wikipedia.org/wiki/Cache_coherence</a></p>
</div>
<div class="paragraph">
<p>Algorithms to keep the caches of different cores of a system coherent.</p>
<p>Algorithms to keep the caches of different cores of a system coherent. Only matters for multicore systems.</p>
</div>
<div class="paragraph">
<p>The main goal of such systems is to reduce the number of messages that have to be sent on the coherency bus, and most importantly, to memory (which passes first through the coherency bus).</p>
<p>The main goal of such systems is to reduce the number of messages that have to be sent on the coherency bus, and even more importantly, to memory (which passes first through the coherency bus).</p>
</div>
<div class="paragraph">
<p>E.g.: if one processors writes to the cache, other processors have to know about it before they read from that address.</p>
<p>The main software use case example to have in mind is that of multiple threads incrementing an atomic counter as in <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/cpp/atomic/std_atomic.cpp">userland/cpp/atomic/std_atomic.cpp</a>, see also: <a href="#atomic-cpp">atomic.cpp</a>. Then, if one processor writes to the cache, other processors have to know about it before they read from that address.</p>
</div>
<div class="paragraph">
<p>The main software use case example to have in mind is that of multiple threads incrementing an atomic counter as in <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/cpp/atomic/std_atomic.cpp">userland/cpp/atomic/std_atomic.cpp</a>, see also: <a href="#atomic-cpp">atomic.cpp</a>.</p>
<p>Note that cache coherency only applies to memory read/write instructions that explicitly make coherency requirements.</p>
</div>
<div class="paragraph">
<p>In most ISAs, this tends to be the minority of instructions, and is only used when something is going to modify memory that is known to be shared across threads. For example, an <a href="#x86-thread-synchronization-primitives">x86 LOCK</a> prefix would be used to increment atomic counters that get incremented across several threads. Outside of those cases, cache coherency is not guaranteed, and behaviour is undefined.</p>
</div>
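<div class="paragraph">
<p>For a quick feel of what goes wrong when a shared counter is incremented without atomicity, here is a standalone Python analogue of that C++ example. It is only an illustrative sketch, not a file from this repo: two processes do an unsynchronized read-modify-write on a shared counter, so updates get lost:</p>
</div>
<div class="literalblock">
<div class="content">
<pre># Sketch: Python analogue of userland/cpp/atomic/std_atomic.cpp, not a file
# from this repo. Two processes increment a shared counter non-atomically.
import multiprocessing

N = 100000

def worker(counter):
    for _ in range(N):
        # Read then write: each access is individually protected, but the
        # read-modify-write as a whole is not atomic, so increments get lost.
        counter.value += 1

if __name__ == '__main__':
    counter = multiprocessing.Value('i', 0)
    procs = [multiprocessing.Process(target=worker, args=(counter,))
             for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Typically prints less than 200000.
    print(counter.value)</pre>
</div>
</div>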
<div class="sect3">
<h4 id="vi-protocol"><a class="anchor" href="#vi-protocol"></a><a class="link" href="#vi-protocol">32.1.1. VI protocol</a></h4>
<h4 id="can-caches-snoop-data-from-other-caches"><a class="anchor" href="#can-caches-snoop-data-from-other-caches"></a><a class="link" href="#can-caches-snoop-data-from-other-caches">32.1.1. Can caches snoop data from other caches?</a></h4>
<div class="paragraph">
<p>Either they can snoop only control, or both control and data can be snooped.</p>
</div>
<div class="paragraph">
<p>The answer to this determines if some of the following design decisions make sense.</p>
</div>
<div class="paragraph">
<p>This is the central point in question at: <a href="https://electronics.stackexchange.com/questions/484830/why-is-a-flush-needed-in-the-msi-cache-coherency-protocol-when-moving-from-modif" class="bare">https://electronics.stackexchange.com/questions/484830/why-is-a-flush-needed-in-the-msi-cache-coherency-protocol-when-moving-from-modif</a></p>
</div>
<div class="paragraph">
<p>If data snoops are not possible, then data must always go to DRAM first.</p>
</div>
</div>
<div class="sect3">
<h4 id="vi-cache-coherence-protocol"><a class="anchor" href="#vi-cache-coherence-protocol"></a><a class="link" href="#vi-cache-coherence-protocol">32.1.2. VI cache coherence protocol</a></h4>
<div class="paragraph">
<p>Mentioned at:</p>
</div>
@@ -35338,7 +35567,7 @@ west build -b qemu_aarch64 samples/hello_world</pre>
<div class="ulist">
<ul>
<li>
<p>that read is marked as exclusive, and all caches that had it snoop and become invalid.</p>
<p>that read is marked as exclusive, and all caches that had it snoop it and become invalid.</p>
<div class="paragraph">
<p>Upside: no need to send the new data to the bus.</p>
</div>
@@ -35374,7 +35603,7 @@ west build -b qemu_aarch64 samples/hello_world</pre>
<li>
<p>when the cache is full, eviction leads to a write to memory.</p>
<div class="paragraph">
<p>If multiple valid holders may exist, then this may lead to multiple</p>
<p>If multiple valid holders may exist, then this may lead to multiple write through evictions of the same thing.</p>
</div>
</li>
</ul>
@@ -35544,7 +35773,7 @@ west build -b qemu_aarch64 samples/hello_world</pre>
</div>
</div>
<div class="sect3">
<h4 id="msi-protocol"><a class="anchor" href="#msi-protocol"></a><a class="link" href="#msi-protocol">32.1.2. MSI protocol</a></h4>
<h4 id="msi-cache-coherence-protocol"><a class="anchor" href="#msi-cache-coherence-protocol"></a><a class="link" href="#msi-cache-coherence-protocol">32.1.3. MSI cache coherence protocol</a></h4>
<div class="paragraph">
<p><a href="https://en.wikipedia.org/wiki/MSI_protocol" class="bare">https://en.wikipedia.org/wiki/MSI_protocol</a></p>
</div>
@@ -35552,6 +35781,22 @@ west build -b qemu_aarch64 samples/hello_world</pre>
<p>This is the most basic non-trivial coherency protocol, and therefore the first one you should learn.</p>
</div>
<div class="paragraph">
<p>Compared to the <a href="#vi-cache-coherence-protocol">VI cache coherence protocol</a>, MSI:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>adds one bit of knowledge per cache line (shared)</p>
</li>
<li>
<p>splits Valid into Modified and Shared depending on the shared bit</p>
</li>
<li>
<p>this allows us to not send BusUpgr messages on the bus when writing to Modified, since now we know that the data is not present in any other cache!</p>
</li>
</ul>
</div>
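<div class="paragraph">
<p>The stable-state transitions are few enough to write down in full. The following executable table is my own summary for illustration, not gem5 code; transient states and the actual data movement are ignored:</p>
</div>
<div class="literalblock">
<div class="content">
<pre># Sketch of the stable-state MSI transitions for one line in one cache.
# Own summary for illustration, not gem5 code; transient states omitted.
# Key: (current state, observed event) -&gt; (next state, message put on the bus).
MSI = {
    ('I', 'PrRd'):    ('S', 'BusRd'),    # read miss: ask the bus for the line
    ('I', 'PrWr'):    ('M', 'BusRdX'),   # write miss: read for ownership
    ('S', 'PrRd'):    ('S', None),       # read hit: no bus traffic
    ('S', 'PrWr'):    ('M', 'BusUpgr'),  # upgrade: invalidate other sharers
    ('S', 'BusRd'):   ('S', None),       # another cache reads: nothing to do
    ('S', 'BusRdX'):  ('I', None),       # another cache wants to write
    ('S', 'BusUpgr'): ('I', None),
    ('M', 'PrRd'):    ('M', None),
    ('M', 'PrWr'):    ('M', None),       # we hold the only copy: no bus message
    ('M', 'BusRd'):   ('S', 'Flush'),    # another cache reads: supply the data
    ('M', 'BusRdX'):  ('I', 'Flush'),    # another cache writes: supply, invalidate
}

state = 'I'
for event in ['PrRd', 'PrWr', 'BusRd']:
    state, bus_msg = MSI[(state, event)]
    print(f'{event}: now {state}, sent {bus_msg}')</pre>
</div>
</div>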
<div class="paragraph">
<p>Helpful video: <a href="https://www.youtube.com/watch?v=gAUVAel-2Fg" class="bare">https://www.youtube.com/watch?v=gAUVAel-2Fg</a> "MSI Coherence - Georgia Tech - HPCA: Part 5" by Udacity.</p>
</div>
<div class="paragraph">
@@ -35677,6 +35922,9 @@ CACHE2 S nyy
<div class="paragraph">
<p>Therefore, it does not need to fetch the data, which saves bus traffic compared to "Bus write" since the data itself does not need to be sent.</p>
</div>
<div class="paragraph">
<p>This is also called a Bus Upgrade message or BusUpgr, as it informs others that our copy of the line is about to be upgraded for writing.</p>
</div>
</li>
<li>
<p>"Write back": send the data on the bus and tell someone to pick it up: either DRAM or another cache</p>
@@ -35746,7 +35994,7 @@ CACHE2 S nyy
<p>Since we know what the latest data is, we can move to "Shared" rather than "Invalid" to possibly save time on future reads.</p>
</div>
<div class="paragraph">
<p>But to do that, we need to write the data back to DRAM to maintain the shared state consistent. The <a href="#mesi-protocol">MESI protocol</a> prevents that extra read in some cases.</p>
<p>But to do that, we need to write the data back to DRAM to maintain the shared state consistent. The <a href="#mesi-cache-coherence-protocol">MESI cache coherence protocol</a> prevents that extra read in some cases.</p>
</div>
<div class="paragraph">
<p>And it has to be either: before the other cache gets its data from DRAM, or better, the other cache can get its data from our write back itself just like the DRAM.</p>
@@ -35836,14 +36084,33 @@ CACHE2 S nyy
<div class="paragraph">
<p>TODO gem5 concrete example.</p>
</div>
<div class="sect4">
<h5 id="msi-cache-coherence-protocol-with-transient-states"><a class="anchor" href="#msi-cache-coherence-protocol-with-transient-states"></a><a class="link" href="#msi-cache-coherence-protocol-with-transient-states">32.1.3.1. MSI cache coherence protocol with transient states</a></h5>
<div class="paragraph">
<p>TODO understand well why these are needed.</p>
</div>
<div class="ulist">
<ul>
<li>
<p><a href="http://learning.gem5.org/book/part3/MSI/directory.html" class="bare">http://learning.gem5.org/book/part3/MSI/directory.html</a></p>
</li>
<li>
<p><a href="https://www.researchgate.net/figure/MSI-Protocol-with-Transient-States-Adapted-from-30_fig3_2531432" class="bare">https://www.researchgate.net/figure/MSI-Protocol-with-Transient-States-Adapted-from-30_fig3_2531432</a></p>
</li>
<li>
<p><a href="http://csg.csail.mit.edu/6.823S16/lectures/L15.pdf" class="bare">http://csg.csail.mit.edu/6.823S16/lectures/L15.pdf</a> page 28</p>
</li>
</ul>
</div>
</div>
</div>
<div class="sect3">
<h4 id="mesi-protocol"><a class="anchor" href="#mesi-protocol"></a><a class="link" href="#mesi-protocol">32.1.3. MESI protocol</a></h4>
<h4 id="mesi-cache-coherence-protocol"><a class="anchor" href="#mesi-cache-coherence-protocol"></a><a class="link" href="#mesi-cache-coherence-protocol">32.1.4. MESI cache coherence protocol</a></h4>
<div class="paragraph">
<p><a href="https://en.wikipedia.org/wiki/MESI_protocol" class="bare">https://en.wikipedia.org/wiki/MESI_protocol</a></p>
</div>
<div class="paragraph">
<p>Splits the Shared of <a href="#msi-protocol">MSI protocol</a> into a new Exclusive state:</p>
<p>Splits the Shared of <a href="#msi-cache-coherence-protocol">MSI cache coherence protocol</a> into a new Exclusive state:</p>
</div>
<div class="ulist">
<ul>
@@ -35851,43 +36118,96 @@ CACHE2 S nyy
<p>MESI Exclusive: clean but only present in one cache</p>
</li>
<li>
<p>MESI Shared: clean but may be present in more that one cache</p>
<p>MESI Shared: clean but present in more than one cache</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>TODO advantage: I think the advantages over MSI are:</p>
<p>Exclusive is entered from Invalid after a PrRd, but only if the reply came from DRAM (<a href="#can-caches-snoop-data-from-other-caches">or if we snooped that no one sent the reply to DRAM for us to read it</a>)! If the reply came from another cache, we go directly to Shared instead. It is this extra information that allows for the split of S.</p>
</div>
<div class="paragraph">
<p>The advantage of this over MSI is that when we move from Exclusive to Modified, no invalidate message is required, reducing bus traffic: <a href="https://en.wikipedia.org/wiki/MESI_protocol#Advantages_of_MESI_over_MSI" class="bare">https://en.wikipedia.org/wiki/MESI_protocol#Advantages_of_MESI_over_MSI</a></p>
</div>
<div class="paragraph">
<p>This is a common case in read-modify-write loops. On MSI, it would first do PrRd, send BusRd (to move any M to S), get the data, and go to Shared; then the PrWr must send a BusUpgr to invalidate other Shared copies and move to M.</p>
</div>
<div class="paragraph">
<p>With MESI, the PrRd could go to E instead of S depending on who services it. If it does go to E, then the PrWr only moves it to M; there is no need to send a BusUpgr because we know that no one else is in S.</p>
</div>
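<div class="paragraph">
<p>As a toy way to see that saving (again just my own sketch, not gem5 code), count the bus messages needed for a single read-modify-write of a line that no other cache currently holds:</p>
</div>
<div class="literalblock">
<div class="content">
<pre># Toy sketch, not gem5 code: bus messages for one read-modify-write
# (PrRd then PrWr) of a line that no other cache currently holds.
def bus_messages(protocol):
    msgs = ['BusRd']  # the PrRd misses in both protocols
    if protocol == 'MSI':
        # MSI cannot tell that nobody else has the line, so it lands in S,
        # and the PrWr must broadcast a BusUpgr before moving to M.
        msgs.append('BusUpgr')
    else:  # MESI
        # Nobody answered the BusRd, so we entered E, and the PrWr silently
        # upgrades E to M with no further bus traffic.
        pass
    return msgs

print('MSI :', bus_messages('MSI'))   # ['BusRd', 'BusUpgr']
print('MESI:', bus_messages('MESI'))  # ['BusRd']</pre>
</div>
</div>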
<div class="paragraph">
<p>gem5 12c917de54145d2d50260035ba7fa614e25317a3 has two <a href="#gem5-ruby-build">Ruby</a> MESI models implemented: <code>MESI_Two_Level</code> and <code>MESI_Three_Level</code>.</p>
</div>
</div>
<div class="sect3">
<h4 id="mosi-cache-coherence-protocol"><a class="anchor" href="#mosi-cache-coherence-protocol"></a><a class="link" href="#mosi-cache-coherence-protocol">32.1.5. MOSI cache coherence protocol</a></h4>
<div class="paragraph">
<p><a href="https://en.wikipedia.org/wiki/MOSI_protocol" class="bare">https://en.wikipedia.org/wiki/MOSI_protocol</a> The critical MSI vs MOSI section was a bit bogus though: <a href="https://en.wikipedia.org/w/index.php?title=MOSI_protocol&amp;oldid=895443023" class="bare">https://en.wikipedia.org/w/index.php?title=MOSI_protocol&amp;oldid=895443023</a> we have to edit it.</p>
</div>
<div class="paragraph">
<p>In MSI, it feels wasteful that an M to S transition needs to flush to memory: why do we need to flush right now, since even more caches now have that data? Why not wait until later and try to gain something from this deferral?</p>
</div>
<div class="paragraph">
<p>The problem with doing that in MSI is that not flushing on an M to S transition would force us to flush on every S eviction. So we would end up flushing even after reads!</p>
</div>
<div class="paragraph">
<p>MOSI solves that by making M move to O instead of S on BusRd. Now, O is the only state responsible for the flush back on eviction.</p>
</div>
<div class="paragraph">
<p>So, in case we had:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>when we move from Exclusive to Shared, no DRAM write back is needed, because we know that the cache is clean</p>
<p>processor 1: M</p>
</li>
<li>
<p>when we move from Exclusive to Modified, no invalidate message is required, reducing bus traffic</p>
<p>processor 2: I then read</p>
</li>
<li>
<p>processor 1: write</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>Exclusive is entered from Invalid after a "Local read", but only if the reply came from DRAM! If the reply came from another cache, we go directly to shared instead.</p>
<p>An MSI cache 1 would do:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>write to main memory, go to S</p>
</li>
<li>
<p>BusUpgr, go back to M, 2 back to I</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>and MOSI would do:</p>
</div>
<div class="ulist">
<ul>
<li>
<p>go to O (no bus traffic)</p>
</li>
<li>
<p>BusUpgr, go back to M</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>This therefore saves one memory write through and its bus traffic.</p>
</div>
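<div class="paragraph">
<p>A toy tally of that saving (own sketch, not gem5 code), counting DRAM write backs for the scenario above:</p>
</div>
<div class="literalblock">
<div class="content">
<pre># Toy sketch, not gem5 code: DRAM write backs for the scenario above
# (cache 1 holds the line Modified, cache 2 reads it, cache 1 writes it again).
def dram_writes(protocol):
    writes = 0
    # Cache 2's BusRd is snooped by cache 1, which supplies the data.
    if protocol == 'MSI':
        writes += 1  # MSI: cache 1 must also write back to DRAM, moving M to S
    # MOSI: cache 1 moves M to O instead and defers the write back to eviction.
    # The next write by cache 1 just sends BusUpgr and returns to M either way.
    return writes

print('MSI :', dram_writes('MSI'))   # 1
print('MOSI:', dram_writes('MOSI'))  # 0</pre>
</div>
</div>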
</div>
<div class="sect3">
<h4 id="mosi-protocol"><a class="anchor" href="#mosi-protocol"></a><a class="link" href="#mosi-protocol">32.1.4. MOSI protocol</a></h4>
<div class="paragraph">
<p><a href="https://en.wikipedia.org/wiki/MOSI_protocol" class="bare">https://en.wikipedia.org/wiki/MOSI_protocol</a></p>
</div>
<div class="paragraph">
<p>TODO compare to MSI and understand advantages. From Wikipedia it seems that MOSI can get data from the Owned cache while MSI cannot get data from Shared caches and must go to memory, but why not? Why do we need that Owned? Is it because there are multiple Shared caches and them all replying at the same time would lead to problems?</p>
</div>
</div>
<div class="sect3">
<h4 id="moesi-protocol"><a class="anchor" href="#moesi-protocol"></a><a class="link" href="#moesi-protocol">32.1.5. MOESI protocol</a></h4>
<h4 id="moesi-cache-coherence-protocol"><a class="anchor" href="#moesi-cache-coherence-protocol"></a><a class="link" href="#moesi-cache-coherence-protocol">32.1.6. MOESI cache coherence protocol</a></h4>
<div class="paragraph">
<p><a href="https://en.wikipedia.org/wiki/MOESI_protocol" class="bare">https://en.wikipedia.org/wiki/MOESI_protocol</a></p>
</div>
<div class="paragraph">
<p><a href="#mesi-protocol">MESI protocol</a> + <a href="#mosi-protocol">MOSI protocol</a>, not much else to it!</p>
<p><a href="#mesi-cache-coherence-protocol">MESI cache coherence protocol</a> + <a href="#mosi-cache-coherence-protocol">MOSI cache coherence protocol</a>, not much else to it!</p>
</div>
<div class="paragraph">
<p>gem5 12c917de54145d2d50260035ba7fa614e25317a3 has several <a href="#gem5-ruby-build">Ruby</a> MOESI models implemented: <code>MOESI_AMD_Base</code>, <code>MOESI_CMP_directory</code>, <code>MOESI_CMP_token</code> and <code>MOESI_hammer</code>.</p>
</div>
</div>
</div>
@@ -36920,7 +37240,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
<p>A custom build script can give you more flexibility: e.g. the package can be made to work with other root filesystems more easily, have better <a href="#9p">9P</a> support, and rebuild faster as it evades some Buildroot boilerplate.</p>
</div>
<div class="sect4">
<h5 id="kernel_modules-buildroot-package"><a class="anchor" href="#kernel_modules-buildroot-package"></a><a class="link" href="#kernel_modules-buildroot-package">33.14.2.1. kernel_modules buildroot package</a></h5>
<h5 id="kernel-modules-buildroot-package"><a class="anchor" href="#kernel-modules-buildroot-package"></a><a class="link" href="#kernel-modules-buildroot-package">33.14.2.1. kernel_modules buildroot package</a></h5>
<div class="paragraph">
<p>Source: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/buildroot_packages/kernel_modules/">buildroot_packages/kernel_modules/</a></p>
</div>
@@ -36956,7 +37276,7 @@ git -C "$(./getvar buildroot_source_dir)" checkout -
<p>As you have just seen, this sets up everything so that <a href="#modprobe">modprobe</a> can correctly find the module.</p>
</div>
<div class="paragraph">
<p><code>./build-buildroot --build-linux</code> and <code>./run --buildroot-linux</code> are needed because the Buildroot kernel modules must use the Buildroot Linux kernel at build and run time.</p>
<p><code>./build-buildroot --build-linux</code> and <code>./run --buildroot-linux</code> are needed because the Buildroot kernel modules must use the Buildroot Linux kernel at build and run time, see also: <a href="#buildroot-vanilla-kernel">Buildroot vanilla kernel</a>.</p>
</div>
<div class="paragraph">
<p>The <code>--no-overlay</code> is required otherwise our <code>modules.order</code> generated by <code>./build-linux</code> and installed with <code>BR2_ROOTFS_OVERLAY</code> overwrites the Buildroot generated one.</p>