diff --git a/index.html b/index.html
index d31cc33..21df1d2 100644
--- a/index.html
+++ b/index.html
@@ -1353,17 +1353,19 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 </li>
 <li><a href="#benchmarks">21.8. Benchmarks</a>
 <ul class="sectlevel3">
-<li><a href="#dhrystone">21.8.1. Dhrystone</a></li>
-<li><a href="#stream-benchmark">21.8.2. STREAM benchmark</a></li>
-<li><a href="#parsec-benchmark">21.8.3. PARSEC benchmark</a>
+<li><a href="#boost">21.8.1. Boost</a></li>
+<li><a href="#dhrystone">21.8.2. Dhrystone</a></li>
+<li><a href="#stream-benchmark">21.8.3. STREAM benchmark</a></li>
+<li><a href="#parsec-benchmark">21.8.4. PARSEC benchmark</a>
 <ul class="sectlevel4">
-<li><a href="#parsec-benchmark-without-parsecmgmt">21.8.3.1. PARSEC benchmark without parsecmgmt</a></li>
-<li><a href="#parsec-change-the-input-size">21.8.3.2. PARSEC change the input size</a></li>
-<li><a href="#parsec-benchmark-with-parsecmgmt">21.8.3.3. PARSEC benchmark with parsecmgmt</a></li>
-<li><a href="#parsec-uninstall">21.8.3.4. PARSEC uninstall</a></li>
-<li><a href="#parsec-benchmark-hacking">21.8.3.5. PARSEC benchmark hacking</a></li>
+<li><a href="#parsec-benchmark-without-parsecmgmt">21.8.4.1. PARSEC benchmark without parsecmgmt</a></li>
+<li><a href="#parsec-change-the-input-size">21.8.4.2. PARSEC change the input size</a></li>
+<li><a href="#parsec-benchmark-with-parsecmgmt">21.8.4.3. PARSEC benchmark with parsecmgmt</a></li>
+<li><a href="#parsec-uninstall">21.8.4.4. PARSEC uninstall</a></li>
+<li><a href="#parsec-benchmark-hacking">21.8.4.5. PARSEC benchmark hacking</a></li>
 </ul>
 </li>
+<li><a href="#userland-libs-directory">21.8.5. userland/libs directory</a></li>
 </ul>
 </li>
 <li><a href="#userland-content-bibliography">21.9. Userland content bibliography</a></li>
@@ -1668,7 +1670,7 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <li><a href="#arm-fadd-vs-vadd">24.6.3.2.1. ARM FADD vs VADD</a></li>
 </ul>
 </li>
-<li><a href="#armv8-aarch64-ld2-instruction">24.6.3.3. ARMv8 aarch64 ld2 instruction</a></li>
+<li><a href="#armv8-aarch64-ld2-instruction">24.6.3.3. ARMv8 aarch64 LD2 instruction</a></li>
 </ul>
 </li>
 <li><a href="#arm-simd-bibliography">24.6.4. ARM SIMD bibliography</a></li>
@@ -1755,9 +1757,11 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 <ul class="sectlevel4">
 <li><a href="#arm-wfe-and-sev-instructions">27.8.3.1. ARM WFE and SEV instructions</a>
 <ul class="sectlevel5">
-<li><a href="#wfe-from-userland">27.8.3.1.1. WFE from userland</a></li>
-<li><a href="#gem5-arm-wfe">27.8.3.1.2. gem5 ARM WFE</a></li>
-<li><a href="#arm-yield-instruction">27.8.3.1.3. ARM YIELD instruction</a></li>
+<li><a href="#arm-wfe-global-monitor-events">27.8.3.1.1. ARM WFE global monitor events</a></li>
+<li><a href="#wfe-from-userland">27.8.3.1.2. WFE from userland</a></li>
+<li><a href="#armv8-spinlock-pattern">27.8.3.1.3. ARMv8 spinlock pattern</a></li>
+<li><a href="#gem5-arm-wfe">27.8.3.1.4. gem5 ARM WFE</a></li>
+<li><a href="#arm-yield-instruction">27.8.3.1.5. ARM YIELD instruction</a></li>
 </ul>
 </li>
 <li><a href="#arm-ldaxr-and-stlxr-instructions">27.8.3.2. ARM LDAXR and STLXR instructions</a></li>
@@ -1799,7 +1803,7 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
 </li>
 <li><a href="#benchmark-this-repo">29. Benchmark this repo</a>
 <ul class="sectlevel2">
-<li><a href="#continuous-integraion">29.1. Continuous integraion</a>
+<li><a href="#continuous-integration">29.1. Continuous integration</a>
 <ul class="sectlevel3">
 <li><a href="#travis">29.1.1. Travis</a></li>
 <li><a href="#circleci">29.1.2. CircleCI</a></li>
@@ -3514,7 +3518,7 @@ cd userland
 </div>
 </div>
 <div class="paragraph">
-<p>As mentioned at <a href="#user-mode-tests">User mode tests</a>, tests under <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/libs">userland/libs</a> require certain optional libraries to be installed, and are not built or tested by default.</p>
+<p>As mentioned at <a href="#userland-libs-directory">userland/libs directory</a>, tests under <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/libs">userland/libs</a> require certain optional libraries to be installed, and are not built or tested by default.</p>
 </div>
 <div class="paragraph">
 <p>You can install those libraries with:</p>
@@ -7389,7 +7393,7 @@ qw er</pre>
 <p>tests that require user interaction</p>
 </li>
 <li>
-<p>tests that take perceptible ammounts of time</p>
+<p>tests that take perceptible amounts of time</p>
 </li>
 <li>
 <p>known bugs we didn&#8217;t have time to fix ;-)</p>
@@ -7397,7 +7401,7 @@ qw er</pre>
 </ul>
 </div>
 <div class="paragraph">
-<p>Tests under <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/libs/">userland/libs/</a> depend on certain libraries being available on the target, e.g. <a href="#blas">BLAS</a> for <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/libs/openblas">userland/libs/openblas</a>. They are not run by default, but can be enabled with <code>--package</code> and <code>--package-all</code>.</p>
+<p>Tests under <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/libs/">userland/libs/</a> are only run if <code>--package</code> or <code>--package-all</code> is given, as described at <a href="#userland-libs-directory">userland/libs directory</a>.</p>
 </div>
 <div class="paragraph">
 <p>The gem5 tests require building statically with build id <code>static</code>, see also: <a href="#gem5-syscall-emulation-mode">Section 10.7, &#8220;gem5 syscall emulation mode&#8221;</a>. TODO automate this better.</p>
@@ -16814,7 +16818,10 @@ run
 <p>The build outputs are automatically stored in a different directories for optimized and debug builds, which prevents <code>debug</code> files from overwriting <code>opt</code> ones. Therefore, <code>--gem5-build-id</code> is not required.</p>
 </div>
 <div class="paragraph">
-<p>The price to pay for debuggability is high however: a Linux kernel boot was about 3x slower in QEMU and 14 times slower in gem5 debug compared to opt, see benchmarks at: <a href="#benchmark-linux-kernel-boot">Section 29.2.1, &#8220;Benchmark Linux kernel boot&#8221;</a></p>
+<p>The price to pay for debuggability is high, however: a Linux kernel boot was about 3 times slower in QEMU and 14 times slower in gem5 debug compared to opt, see benchmarks at: <a href="#benchmark-linux-kernel-boot">Section 29.2.1, &#8220;Benchmark Linux kernel boot&#8221;</a>.</p>
+</div>
+<div class="paragraph">
+<p>Similar slowdowns can be observed at: <a href="#benchmark-emulators-on-userland-executables">Section 29.2.2, &#8220;Benchmark emulators on userland executables&#8221;</a>.</p>
 </div>
 <div class="paragraph">
 <p>When in <a href="#qemu-text-mode">QEMU text mode</a>, using <code>--debug-vm</code> makes Ctrl-C not get passed to the QEMU guest anymore: it is instead captured by GDB itself, so allow breaking. So e.g. you won&#8217;t be able to easily quit from a guest program like:</p>
@@ -16839,7 +16846,7 @@ run
 <p>While GDB "has" this feature, it is just too broken to be usable, and so we expose the amazing Mozilla RR tool conveniently in this repo: <a href="https://stackoverflow.com/questions/1470434/how-does-reverse-debugging-work/53063242#53063242" class="bare">https://stackoverflow.com/questions/1470434/how-does-reverse-debugging-work/53063242#53063242</a></p>
 </div>
 <div class="paragraph">
-<p>Before the first usage:</p>
+<p>Before the first usage, set up rr with:</p>
 </div>
 <div class="literalblock">
 <div class="content">
@@ -16856,7 +16863,17 @@ sudo sysctl -p</pre>
 </div>
 </div>
 <div class="paragraph">
-<p>This will first run the program once until completion, and then restart the program at the very first instruction at <code>_start</code> and leave you in a GDB shell.</p>
+<p>This will:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>first run the program once until completion or crash</p>
+</li>
+<li>
+<p>then restart the program at the very first instruction at <code>_start</code> and leave you in a GDB shell</p>
+</li>
+</ul>
 </div>
 <div class="paragraph">
 <p>From there, run the program until your point of interest, e.g.:</p>
@@ -16879,6 +16896,14 @@ continue</pre>
 </div>
 </div>
 <div class="paragraph">
+<p>A common use case of <code>rr</code> is to go to the final crash and then walk back from there, so you often want to automate running until the end after recording, with <code>--debug-vm-args</code> as in:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>./run --debug-vm-args='-ex continue' --debug-vm-rr --userland userland/c/hello.c</pre>
+</div>
+</div>
+<div class="paragraph">
 <p>Programs often tend to blow up in very low frames that use values passed in from higher frames. In those cases, remember that just like with forward debugging, you can&#8217;t just go:</p>
 </div>
 <div class="literalblock">
@@ -22101,7 +22126,7 @@ make menuconfig</pre>
 <p>Also mentioned at: <a href="https://stackoverflow.com/questions/47320800/how-to-clean-only-target-in-buildroot" class="bare">https://stackoverflow.com/questions/47320800/how-to-clean-only-target-in-buildroot</a></p>
 </div>
 <div class="paragraph">
-<p>See this for a sample manual workaround: <a href="#parsec-uninstall">Section 21.8.3.4, &#8220;PARSEC uninstall&#8221;</a>.</p>
+<p>See this for a sample manual workaround: <a href="#parsec-uninstall">Section 21.8.4.4, &#8220;PARSEC uninstall&#8221;</a>.</p>
 </div>
 </div>
 <div class="sect2">
@@ -22982,6 +23007,26 @@ echo 1 &gt; /proc/sys/vm/overcommit_memory
 </ul>
 </div>
 </li>
+<li>
+<p>containers</p>
+<div class="ulist">
+<ul>
+<li>
+<p>associative</p>
+<div class="ulist">
+<ul>
+<li>
+<p><a href="#algorithms">Algorithms</a> contains a benchmark comparison of different C++ containers</p>
+</li>
+<li>
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/cpp/set.cpp">userland/cpp/set.cpp</a>: <code>std::set</code> contains unique keys</p>
+</li>
+</ul>
+</div>
+</li>
+</ul>
+</div>
+</li>
 </ul>
 </div>
 <div class="sect3">
@@ -24220,7 +24265,23 @@ cblas_dgemm(      CblasColMajor, CblasNoTrans, CblasTrans,3,3,2  ,1,    A,3,  B,
 </ul>
 </div>
 <div class="sect3">
-<h4 id="dhrystone"><a class="anchor" href="#dhrystone"></a><a class="link" href="#dhrystone">21.8.1. Dhrystone</a></h4>
+<h4 id="boost"><a class="anchor" href="#boost"></a><a class="link" href="#boost">21.8.1. Boost</a></h4>
+<div class="paragraph">
+<p><a href="https://en.wikipedia.org/wiki/Boost_(C%2B%2B_libraries)" class="bare">https://en.wikipedia.org/wiki/Boost_(C%2B%2B_libraries)</a></p>
+</div>
+<div class="paragraph">
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/libs/boost">userland/libs/boost</a></p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/libs/boost/bimap.cpp">userland/libs/boost/bimap.cpp</a></p>
+</li>
+</ul>
+</div>
+</div>
+<div class="sect3">
+<h4 id="dhrystone"><a class="anchor" href="#dhrystone"></a><a class="link" href="#dhrystone">21.8.2. Dhrystone</a></h4>
 <div class="paragraph">
 <p><a href="https://en.wikipedia.org/wiki/Dhrystone" class="bare">https://en.wikipedia.org/wiki/Dhrystone</a></p>
 </div>
@@ -24317,7 +24378,7 @@ cblas_dgemm(      CblasColMajor, CblasNoTrans, CblasTrans,3,3,2  ,1,    A,3,  B,
 </div>
 </div>
 <div class="sect3">
-<h4 id="stream-benchmark"><a class="anchor" href="#stream-benchmark"></a><a class="link" href="#stream-benchmark">21.8.2. STREAM benchmark</a></h4>
+<h4 id="stream-benchmark"><a class="anchor" href="#stream-benchmark"></a><a class="link" href="#stream-benchmark">21.8.3. STREAM benchmark</a></h4>
 <div class="paragraph">
 <p><a href="http://www.cs.virginia.edu/stream/ref.html" class="bare">http://www.cs.virginia.edu/stream/ref.html</a></p>
 </div>
@@ -24391,7 +24452,7 @@ times[3 * ntimes + k] = mysecond() - times[3 * ntimes + k];
 </div>
 </div>
 <div class="sect3">
-<h4 id="parsec-benchmark"><a class="anchor" href="#parsec-benchmark"></a><a class="link" href="#parsec-benchmark">21.8.3. PARSEC benchmark</a></h4>
+<h4 id="parsec-benchmark"><a class="anchor" href="#parsec-benchmark"></a><a class="link" href="#parsec-benchmark">21.8.4. PARSEC benchmark</a></h4>
 <div class="paragraph">
 <p>We have ported parts of the <a href="http://parsec.cs.princeton.edu">PARSEC benchmark</a> for cross compilation at: <a href="https://github.com/cirosantilli/parsec-benchmark" class="bare">https://github.com/cirosantilli/parsec-benchmark</a> See the documentation on that repo to find out which benchmarks have been ported. Some of the benchmarks were are segfaulting, they are documented in that repo.</p>
 </div>
@@ -24409,7 +24470,7 @@ times[3 * ntimes + k] = mysecond() - times[3 * ntimes + k];
 </ul>
 </div>
 <div class="sect4">
-<h5 id="parsec-benchmark-without-parsecmgmt"><a class="anchor" href="#parsec-benchmark-without-parsecmgmt"></a><a class="link" href="#parsec-benchmark-without-parsecmgmt">21.8.3.1. PARSEC benchmark without parsecmgmt</a></h5>
+<h5 id="parsec-benchmark-without-parsecmgmt"><a class="anchor" href="#parsec-benchmark-without-parsecmgmt"></a><a class="link" href="#parsec-benchmark-without-parsecmgmt">21.8.4.1. PARSEC benchmark without parsecmgmt</a></h5>
 <div class="literalblock">
 <div class="content">
 <pre>./build --arch arm --download-dependencies gem5-buildroot parsec-benchmark
@@ -24443,7 +24504,7 @@ times[3 * ntimes + k] = mysecond() - times[3 * ntimes + k];
 </div>
 </div>
 <div class="sect4">
-<h5 id="parsec-change-the-input-size"><a class="anchor" href="#parsec-change-the-input-size"></a><a class="link" href="#parsec-change-the-input-size">21.8.3.2. PARSEC change the input size</a></h5>
+<h5 id="parsec-change-the-input-size"><a class="anchor" href="#parsec-change-the-input-size"></a><a class="link" href="#parsec-change-the-input-size">21.8.4.2. PARSEC change the input size</a></h5>
 <div class="paragraph">
 <p>Running a benchmark of a size different than <code>test</code>, e.g. <code>simsmall</code>, requires a rebuild with:</p>
 </div>
@@ -24507,7 +24568,7 @@ times[3 * ntimes + k] = mysecond() - times[3 * ntimes + k];
 </div>
 </div>
 <div class="sect4">
-<h5 id="parsec-benchmark-with-parsecmgmt"><a class="anchor" href="#parsec-benchmark-with-parsecmgmt"></a><a class="link" href="#parsec-benchmark-with-parsecmgmt">21.8.3.3. PARSEC benchmark with parsecmgmt</a></h5>
+<h5 id="parsec-benchmark-with-parsecmgmt"><a class="anchor" href="#parsec-benchmark-with-parsecmgmt"></a><a class="link" href="#parsec-benchmark-with-parsecmgmt">21.8.4.3. PARSEC benchmark with parsecmgmt</a></h5>
 <div class="paragraph">
 <p>Most users won&#8217;t want to use this method because:</p>
 </div>
@@ -24570,7 +24631,7 @@ parsecmgmt -a run -p splash2x.fmm -i test</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="parsec-uninstall"><a class="anchor" href="#parsec-uninstall"></a><a class="link" href="#parsec-uninstall">21.8.3.4. PARSEC uninstall</a></h5>
+<h5 id="parsec-uninstall"><a class="anchor" href="#parsec-uninstall"></a><a class="link" href="#parsec-uninstall">21.8.4.4. PARSEC uninstall</a></h5>
 <div class="paragraph">
 <p>If you want to remove PARSEC later, Buildroot doesn&#8217;t provide an automated package removal mechanism as mentioned at: <a href="#remove-buildroot-packages">Section 20.6, &#8220;Remove Buildroot packages&#8221;</a>, but the following procedure should be satisfactory:</p>
 </div>
@@ -24588,7 +24649,7 @@ parsecmgmt -a run -p splash2x.fmm -i test</pre>
 </div>
 </div>
 <div class="sect4">
-<h5 id="parsec-benchmark-hacking"><a class="anchor" href="#parsec-benchmark-hacking"></a><a class="link" href="#parsec-benchmark-hacking">21.8.3.5. PARSEC benchmark hacking</a></h5>
+<h5 id="parsec-benchmark-hacking"><a class="anchor" href="#parsec-benchmark-hacking"></a><a class="link" href="#parsec-benchmark-hacking">21.8.4.5. PARSEC benchmark hacking</a></h5>
 <div class="paragraph">
 <p>If you end up going inside <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/submodules/parsec-benchmark">submodules/parsec-benchmark</a> to hack up the benchmark (you will!), these tips will be helpful.</p>
 </div>
@@ -24640,6 +24701,21 @@ git clean -xdf .</pre>
 </div>
 </div>
 </div>
+<div class="sect3">
+<h4 id="userland-libs-directory"><a class="anchor" href="#userland-libs-directory"></a><a class="link" href="#userland-libs-directory">21.8.5. userland/libs directory</a></h4>
+<div class="paragraph">
+<p>Tests under <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/libs">userland/libs</a> require certain optional libraries to be installed on the target, and are not built or tested by default. You must enable them with either:</p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>--package &lt;package&gt;
+--package-all</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>See for example <a href="#blas">BLAS</a>.</p>
+</div>
+</div>
 </div>
 <div class="sect2">
 <h3 id="userland-content-bibliography"><a class="anchor" href="#userland-content-bibliography"></a><a class="link" href="#userland-content-bibliography">21.9. Userland content bibliography</a></h3>
@@ -29308,7 +29384,7 @@ AArch64, see Procedure Call Standard for the ARM 64-bit Architecture.</p>
 </div>
 </div>
 <div class="sect4">
-<h5 id="armv8-aarch64-ld2-instruction"><a class="anchor" href="#armv8-aarch64-ld2-instruction"></a><a class="link" href="#armv8-aarch64-ld2-instruction">24.6.3.3. ARMv8 aarch64 ld2 instruction</a></h5>
+<h5 id="armv8-aarch64-ld2-instruction"><a class="anchor" href="#armv8-aarch64-ld2-instruction"></a><a class="link" href="#armv8-aarch64-ld2-instruction">24.6.3.3. ARMv8 aarch64 LD2 instruction</a></h5>
 <div class="paragraph">
 <p>Example: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/ld2.S">userland/arch/aarch64/ld2.S</a></p>
 </div>
@@ -31048,6 +31124,9 @@ IN: main
 <p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/wfe.S">userland/arch/aarch64/freestanding/linux/wfe.S</a></p>
 </li>
 <li>
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/sevl_wfe.S">userland/arch/aarch64/freestanding/linux/sevl_wfe.S</a></p>
+</li>
+<li>
 <p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/wfe_wfe.S">userland/arch/aarch64/freestanding/linux/wfe_wfe.S</a>: run WFE twice, because gem5 390a74f59934b85d91489f8a563450d8321b602d does not sleep on the first, see also: <a href="#gem5-arm-wfe">gem5 ARM WFE</a></p>
 </li>
 <li>
@@ -31091,9 +31170,6 @@ IN: main
 <p>and power consumption is key in ARM applications.</p>
 </div>
 <div class="paragraph">
-<p>SEV is not the only thing that can wake up a WFE, it is only an explicit software way to do it. Notably, global monitor operations on memory accesses of regions marked by LDAXR and STLXR instructions can also wake up a WFE sleeping core. This is done to allow spinlocks opens to automatically wake up WFE sleeping cores at free time without the need for a explicit SEV.</p>
-</div>
-<div class="paragraph">
 <p>Quotes for the above <a href="#armarm8-db">ARMv8 architecture reference manual db</a> G1.18.1 "Wait For Event and Send Event":</p>
 </div>
 <div class="quoteblock">
@@ -31185,7 +31261,38 @@ IN: main
 <p>For how userland spinlocks and mutexes are implemented see <a href="#userland-mutex-implementation">Userland mutex implementation</a>.</p>
 </div>
 <div class="sect5">
-<h6 id="wfe-from-userland"><a class="anchor" href="#wfe-from-userland"></a><a class="link" href="#wfe-from-userland">27.8.3.1.1. WFE from userland</a></h6>
+<h6 id="arm-wfe-global-monitor-events"><a class="anchor" href="#arm-wfe-global-monitor-events"></a><a class="link" href="#arm-wfe-global-monitor-events">27.8.3.1.1. ARM WFE global monitor events</a></h6>
+<div class="paragraph">
+<p>Examples:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/inline_asm/wfe_ldxr_stxr.cpp">userland/arch/aarch64/inline_asm/wfe_ldxr_stxr.cpp</a></p>
+</li>
+<li>
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/inline_asm/wfe_ldxr_str.cpp">userland/arch/aarch64/inline_asm/wfe_ldxr_str.cpp</a></p>
+</li>
+</ul>
+</div>
+<div class="paragraph">
+<p>SEV is not the only thing that can wake up a WFE: it is only an explicit software way to do it.</p>
+</div>
+<div class="paragraph">
+<p>Notably, global monitor operations on memory accesses of regions marked by <a href="#arm-ldxr-and-stxr-instructions">LDAXR and STLXR instructions</a> can also wake up a WFE sleeping core.</p>
+</div>
+<div class="paragraph">
+<p>This allows a spinlock unlock to automatically wake up cores sleeping in WFE when the lock is freed, without the need for an explicit SEV.</p>
+</div>
+<div class="paragraph">
+<p>This is shown in the <code>wfe_ldxr_stxr.cpp</code> example, which can only terminate in gem5 user mode simulation due to this event.</p>
+</div>
+<div class="paragraph">
+<p>Note that that program still terminates when running on top of the Linux kernel as explained at: <a href="#wfe-from-userland">WFE from userland</a>.</p>
+</div>
+</div>
+<div class="sect5">
+<h6 id="wfe-from-userland"><a class="anchor" href="#wfe-from-userland"></a><a class="link" href="#wfe-from-userland">27.8.3.1.2. WFE from userland</a></h6>
 <div class="paragraph">
 <p>WFE and SEV are usable from userland, and are part of an efficient spinlock implementation (which userland should arguably stay away from and rather use the <a href="#futex-system-call">futex system call</a> which allow for non busy sleep instead), which maybe is not something that userland should ever tho and just stick to mutexes?</p>
 </div>
@@ -31272,14 +31379,46 @@ IN: main
 <li>
 <p>after a few interrupt handler instructions, the first <a href="#arm-svc-instruction">ERET</a> instruction exits the handler and comes back directly to the instruction after the WFE at PC 0x400080 == 0x40007c + 4</p>
 </li>
+<li>
+<p>the execution of the interrupt handler woke up the core that was in WFE, and it now continues normal execution past the WFE</p>
+</li>
 </ul>
 </div>
 <div class="paragraph">
 <p>Therefore, a WFE in userland is treated much like a busy loop by the Linux kernel: the kernel does not seem to try and explicitly make up room for other processes as would happen on a futex.</p>
 </div>
+<div class="paragraph">
+<p>The following test checks that SEV events don&#8217;t wake up futexes, running forever in case of success. In <a href="#gem5-syscall-emulation-multithreading">gem5 syscall emulation multithreading</a>, this is crucial to prevent deadlocks:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/inline_asm/futex_sev.cpp">userland/arch/aarch64/inline_asm/futex_sev.cpp</a></p>
+</li>
+</ul>
+</div>
 </div>
 <div class="sect5">
-<h6 id="gem5-arm-wfe"><a class="anchor" href="#gem5-arm-wfe"></a><a class="link" href="#gem5-arm-wfe">27.8.3.1.2. gem5 ARM WFE</a></h6>
+<h6 id="armv8-spinlock-pattern"><a class="anchor" href="#armv8-spinlock-pattern"></a><a class="link" href="#armv8-spinlock-pattern">27.8.3.1.3. ARMv8 spinlock pattern</a></h6>
+<div class="paragraph">
+<p><a href="http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka16277.html" class="bare">http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka16277.html</a></p>
+</div>
+<div class="literalblock">
+<div class="content">
+<pre>       sev
+   1:  wfe
+   2:  ldaxr  w1, [x0]
+       cbnz   w1, 1b
+       stxr   w1, w2, [x0]
+       cbnz   w1, 2b</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>It is the <a href="#arm-ldxr-and-stxr-instructions">STXR</a> from the unlock on another core that automatically wakes up the spinlock afterwards: <a href="https://stackoverflow.com/questions/32276313/how-is-a-spin-lock-woken-up-in-linux-arm64" class="bare">https://stackoverflow.com/questions/32276313/how-is-a-spin-lock-woken-up-in-linux-arm64</a></p>
+</div>
+</div>
+<div class="sect5">
+<h6 id="gem5-arm-wfe"><a class="anchor" href="#gem5-arm-wfe"></a><a class="link" href="#gem5-arm-wfe">27.8.3.1.4. gem5 ARM WFE</a></h6>
 <div class="paragraph">
 <p>gem5 390a74f59934b85d91489f8a563450d8321b602d does not sleep on the first WFE on either syscall emulation or full system, because the code does:</p>
 </div>
@@ -31321,7 +31460,7 @@ IN: main
 </div>
 </div>
 <div class="sect5">
-<h6 id="arm-yield-instruction"><a class="anchor" href="#arm-yield-instruction"></a><a class="link" href="#arm-yield-instruction">27.8.3.1.3. ARM YIELD instruction</a></h6>
+<h6 id="arm-yield-instruction"><a class="anchor" href="#arm-yield-instruction"></a><a class="link" href="#arm-yield-instruction">27.8.3.1.5. ARM YIELD instruction</a></h6>
 <div class="paragraph">
 <p><a href="https://stackoverflow.com/questions/59311066/how-does-the-arm-yield-instruction-inform-other-threads-that-they-could-start-a" class="bare">https://stackoverflow.com/questions/59311066/how-does-the-arm-yield-instruction-inform-other-threads-that-they-could-start-a</a></p>
 </div>
@@ -32338,9 +32477,9 @@ cd -
 </div>
 </div>
 <div class="sect2">
-<h3 id="continuous-integraion"><a class="anchor" href="#continuous-integraion"></a><a class="link" href="#continuous-integraion">29.1. Continuous integraion</a></h3>
+<h3 id="continuous-integration"><a class="anchor" href="#continuous-integration"></a><a class="link" href="#continuous-integration">29.1. Continuous integration</a></h3>
 <div class="paragraph">
-<p>We have exploreed a few Continuous integration solutions.</p>
+<p>We have explored a few continuous integration solutions.</p>
 </div>
 <div class="paragraph">
 <p>We haven&#8217;t setup any of them yet.</p>
@@ -32354,7 +32493,7 @@ cd -
 <div class="sect3">
 <h4 id="circleci"><a class="anchor" href="#circleci"></a><a class="link" href="#circleci">29.1.2. CircleCI</a></h4>
 <div class="paragraph">
-<p>This setup sucessfully built gem5 on every commit: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/.circleci/config.yml">.circleci/config.yml</a></p>
+<p>This setup successfully built gem5 on every commit: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/.circleci/config.yml">.circleci/config.yml</a></p>
 </div>
 <div class="paragraph">
 <p>Enabling it is however blocked on: <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/issues/79" class="bare">https://github.com/cirosantilli/linux-kernel-module-cheat/issues/79</a> so we disabled the builds on the web UI.</p>
@@ -32570,6 +32709,15 @@ instructions 124346081</pre>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">a18f28e263c91362519ef550150b5c9d75fa3679 + 1</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/gcc/busy_loop.c">userland/gcc/busy_loop.c</a> <code>-O0</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>gem5 --arch aarch64 --gem5-build-id debug</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">10^5</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">32</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">2.528728 * 10^6</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">0.08</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">a18f28e263c91362519ef550150b5c9d75fa3679 + 1</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/gcc/busy_loop.c">userland/gcc/busy_loop.c</a> <code>-O0</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>gem5 --arch aarch64 -- --cpu-type MinorCPU --caches</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">10^6</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">31</p></td>
@@ -32614,7 +32762,7 @@ instructions 124346081</pre>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">ab6f7331406b22f8ab6e2df5f8b8e464fb35b611</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/c/m5ops.c">userland/c/m5ops.c</a> <code>-O0</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">glibc C pre-main <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/c/m5ops.c">userland/c/m5ops.c</a> <code>-O0</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>gem5 --arch aarch64 --userland-args e</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">2</p></td>
@@ -32623,13 +32771,49 @@ instructions 124346081</pre>
 </tr>
 <tr>
 <td class="tableblock halign-left valign-top"><p class="tableblock">ab6f7331406b22f8ab6e2df5f8b8e464fb35b611</p></td>
-<td class="tableblock halign-left valign-top"><p class="tableblock"><a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/cpp/m5ops.cpp">userland/cpp/m5ops.cpp</a> <code>-O0</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">glibc C pre-main <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/c/m5ops.c">userland/c/m5ops.c</a> <code>-O0</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>gem5 --arch aarch64 --userland-args e --gem5-build-type debug</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">2</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1.26479 * 10^5</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">0.05</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">ab6f7331406b22f8ab6e2df5f8b8e464fb35b611</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">glibc C++ pre-main <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/cpp/m5ops.cpp">userland/cpp/m5ops.cpp</a> <code>-O0</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock"><code>gem5 --arch aarch64 --userland-args e</code></p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">2</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">2.385012 * 10^6</p></td>
 <td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
 </tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">ab6f7331406b22f8ab6e2df5f8b8e464fb35b611</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">glibc C++ pre-main <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/cpp/m5ops.cpp">userland/cpp/m5ops.cpp</a> <code>-O0</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>gem5 --arch aarch64 --userland-args e --gem5-build-type debug</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">25</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">2.385012 * 10^6</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">0.1</p></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">ab6f7331406b22f8ab6e2df5f8b8e464fb35b611</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">immediate exit <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/gem5_exit.S">userland/arch/aarch64/freestanding/linux/gem5_exit.S</a> <code>-O0</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>gem5 --arch aarch64</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
+<td class="tableblock halign-left valign-top"></td>
+<td class="tableblock halign-left valign-top"></td>
+</tr>
+<tr>
+<td class="tableblock halign-left valign-top"><p class="tableblock">ab6f7331406b22f8ab6e2df5f8b8e464fb35b611</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">immediate exit <a href="https://github.com/cirosantilli/linux-kernel-module-cheat/blob/master/userland/arch/aarch64/freestanding/linux/gem5_exit.S">userland/arch/aarch64/freestanding/linux/gem5_exit.S</a> <code>-O0</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock"><code>gem5 --arch aarch64 --gem5-build-type debug</code></p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
+<td class="tableblock halign-left valign-top"><p class="tableblock">1</p></td>
+<td class="tableblock halign-left valign-top"></td>
+<td class="tableblock halign-left valign-top"></td>
+</tr>
 </tbody>
 </table>
 <div class="paragraph">
@@ -32748,7 +32932,7 @@ instructions 124346081</pre>
 <p>First we build <a href="#dhrystone">Dhrystone</a> manually statically since dynamic linking is broken in gem5 as explained at: <a href="#gem5-syscall-emulation-mode">Section 10.7, &#8220;gem5 syscall emulation mode&#8221;</a>.</p>
 </div>
 <div class="paragraph">
-<p>TODO: move this section to our new custom dhrystone setup: <a href="#dhrystone">Section 21.8.1, &#8220;Dhrystone&#8221;</a>.</p>
+<p>TODO: move this section to our new custom dhrystone setup: <a href="#dhrystone">Section 21.8.2, &#8220;Dhrystone&#8221;</a>.</p>
 </div>
 <div class="paragraph">
 <p>gem5 user mode:</p>