From 3b4929ec44d9ce318baf589cea3ecf9f6eadb620 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ciro=20Santilli=20=E5=85=AD=E5=9B=9B=E4=BA=8B=E4=BB=B6=20?= =?UTF-8?q?=E6=B3=95=E8=BD=AE=E5=8A=9F?=
Date: Fri, 20 Mar 2020 00:00:00 +0000
Subject: [PATCH] 831d9d8372c63a9dd6e9e45d1738c81445761cea

---
 index.html | 546 ++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 412 insertions(+), 134 deletions(-)

diff --git a/index.html b/index.html
index d34c89f..f546834 100644
--- a/index.html
+++ b/index.html
@@ -477,7 +477,7 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
  • 1.1.3. About the QEMU Buildroot setup
  • -
  • 1.2. Dry run to get commands for your project
  • +
  • 1.2. Dry run to get commands for your project
  • 1.3. gem5 Buildroot setup
    -

    See also: Profiling userland programs.

    +

    Profiling techniques are discussed in more detail at: Profiling userland programs.

    +
    +
    +

    For the prof build, you can get the gmon.out file with:

    +
    +
    +
    +
    ./run --arch aarch64 --emulator gem5 --userland userland/c/hello.c --gem5-build-type prof
    +gprof "$(./getvar --arch aarch64 gem5_executable)" > tmp.gprof
    +
    @@ -23264,80 +23554,7 @@ info: Entering event queue @ 0. Starting simulation...
    -

    19.19.5. gem5 stats internals

    -
    -

    This describes the internals of the gem5 m5out/stats.txt file.

    -
    -
    -

    GDB call stack to dumpstats:

    -
    -
    -
    -
    Stats::pythonDump () at build/ARM/python/pybind11/stats.cc:58
    -Stats::StatEvent::process() ()
    -GlobalEvent::BarrierEvent::process (this=0x555559fa6a80) at build/ARM/sim/global_event.cc:131
    -EventQueue::serviceOne (this=this@entry=0x555558c36080) at build/ARM/sim/eventq.cc:228
    -doSimLoop (eventq=0x555558c36080) at build/ARM/sim/simulate.cc:219
    -simulate (num_cycles=<optimized out>) at build/ARM/sim/simulate.cc:132
    -
    -
    -
    -

    Stats::pythonDump does:

    -
    -
    -
    -
    void
    -pythonDump()
    -{
    -    py::module m = py::module::import("m5.stats");
    -    m.attr("dump")();
    -}
    -
    -
    -
    -

    This calls src/python/m5/stats/__init__.py, where def dump does the main dumping.

    -
    -
    -

    That function does notably:

    -
    -
    -
    -
        for output in outputList:
    -        if output.valid():
    -            output.begin()
    -            for stat in stats_list:
    -                stat.visit(output)
    -            output.end()
    -
    -
    -
    -

    begin and end are defined in C++ and output the header and tail respectively

    -
    -
    -
    -
    void
    -Text::begin()
    -{
    -    ccprintf(*stream, "\n---------- Begin Simulation Statistics ----------\n");
    -}
    -
    -void
    -Text::end()
    -{
    -    ccprintf(*stream, "\n---------- End Simulation Statistics   ----------\n");
    -    stream->flush();
    -}
    -
    -
    -
    -

    stats_list contains the stats, and stat.visit prints them, outputList contains by default just the text output. I don’t see any other types of output in gem5, but likely JSON / binary formats could be envisioned.

    -
    -
    -

    Tested in gem5 b4879ae5b0b6644e6836b0881e4da05c64a6550d.

    -
    -
    -
    -

    19.19.6. gem5 code generation

    +

    19.19.5. gem5 code generation

    gem5 uses a ton of code generation, which makes the project horrendous:

    @@ -23382,7 +23599,7 @@ Text::end()

    But it has been widely overused to insanity. It likely also exists partly because when the project started in 2003 C++ compilers weren’t that good, so you couldn’t rely on features like templates that much.

    -
    19.19.6.1. gem5 THE_ISA
    +
    19.19.5.1. gem5 THE_ISA

    Generated code at: build/<ISA>/config/the_isa.hh which contains amongst other lines:

    @@ -23409,9 +23626,9 @@ enum class Arch {
    -

    19.19.7. gem5 build system

    +

    19.19.6. gem5 build system

    -
    19.19.7.1. gem5 build broken on recent compiler version
    +
    19.19.6.1. gem5 build broken on recent compiler version

    gem5 moves a bit slowly, and if your host compiler is very new, the gem5 build might be broken for it, e.g. this was the case for Ubuntu 19.10 with GCC 9 and gem5 62d75e7105fe172eb906d4f80f360ff8591d4178 from Dec 2019.

    @@ -23436,7 +23653,7 @@ enum class Arch {
    -
    19.19.7.2. gem5 polymorphic ISA includes
    +
    19.19.6.2. gem5 polymorphic ISA includes

    E.g. src/cpu/decode_cache.hh includes:

    @@ -23515,7 +23732,7 @@ build/ARM/config/the_isa.hh
  • -

    userland/cpp/if_constexpr.cpp: C++17 if constexpr

    +

    userland/cpp/if_constexpr.cpp: C++17 if constexpr: https://stackoverflow.com/questions/12160765/if-else-at-compile-time-in-c/54647315#54647315

  • @@ -24895,7 +25112,55 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -

    21.2.1. C++ multithreading

    +

    21.2.1. C++ initialization types

    +
    +

    OMG this is hell, understand when primitive variables are initialized or not:

    +
    + +
    +

    Intuition:

    +
    +
    + +
    +
    +

    Good rule:

    +
    +
    +
      +
    • +

      initialize every single variable explicitly to prevent the risk of having uninitialized variables due to programmer error (which is easy to get wrong due to insane rules)

      +
    • +
    • +

      if you don’t define your own default constructor, always = delete it instead. This prevents the possibility that variables will be assigned twice due to zero initialization

      +
    • +
    +
    +
    +
    +
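The two rules above can be sketched as follows. This is a minimal illustration, not code from the repository: the type names `Explicit` and `NoDefault` and the helper `value_initialized_int` are hypothetical, and it assumes a C++11 or later compiler.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical example types, not taken from the repository.

// Rule 1: every member has an explicit initializer, so neither
// `Explicit a;` nor `Explicit a{};` can leave a member indeterminate.
struct Explicit {
    int i = 0;
    double d = 0.0;
};

// Rule 2: there is no meaningful default here, so the default
// constructor is deleted and callers must supply the value.
struct NoDefault {
    NoDefault() = delete;
    explicit NoDefault(int i) : i(i) {}
    int i;
};

// The deleted default constructor is checked at compile time.
static_assert(!std::is_default_constructible<NoDefault>::value,
              "NoDefault must not be default constructible");

// Value-initialization of a primitive: `int{}` is guaranteed to be 0,
// unlike a plain local `int x;`, whose value is indeterminate.
inline int value_initialized_int() { return int{}; }
```

With these definitions, `Explicit a;` gives `a.i == 0`, `NoDefault c(42);` gives `c.i == 42`, and `NoDefault c;` fails to compile.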

    21.2.2. C++ multithreading

    • @@ -24923,7 +25188,7 @@ echo 1 > /proc/sys/vm/overcommit_memory
    -
    21.2.1.1. atomic.cpp
    +
    21.2.2.1. atomic.cpp
    @@ -25127,7 +25392,7 @@ time ./mutex.out 4 100000000
    -
    21.2.1.2. C++ std::memory_order
    +
    21.2.2.2. C++ std::memory_order
    @@ -25136,7 +25401,7 @@ time ./mutex.out 4 100000000
    -
    21.2.1.3. C++ parallel algorithms
    +
    21.2.2.3. C++ parallel algorithms
    @@ -25146,7 +25411,7 @@ time ./mutex.out 4 100000000
    -

    21.2.2. C++ standards

    +

    21.2.3. C++ standards

    Like for C, you have to pay for the standards…​ insane. So we just use the closest free drafts instead.

    @@ -25154,14 +25419,14 @@ time ./mutex.out 4 100000000

    https://stackoverflow.com/questions/81656/where-do-i-find-the-current-c-or-c-standard-documents

    -
    21.2.2.1. C++17 N4659 standards draft
    +
    21.2.3.1. C++17 N4659 standards draft
    -

    21.2.3. C++ type casting

    +

    21.2.4. C++ type casting

    @@ -26111,7 +26376,7 @@ xdg-open bst_vs_heap_vs_hashmap_gem5.tmp.png

    The cache sizes were chosen to match the host P51 to improve the comparison. Ideally we should also use the same standard library.

    -

    Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 19.9.3.1, “gem5 only dump selected stats”

    +

    Note that this will take a long time, and will produce a humongous ~40Gb stats file as explained at: Section 19.9.3.2, “gem5 only dump selected stats”

    Sources:

    @@ -26713,7 +26978,17 @@ git clean -xdf .

    Binary format to store data. TODO vs databases, notably SQLite: https://datascience.stackexchange.com/questions/262/hierarchical-data-format-what-are-the-advantages-compared-to-alternative-format

    -

    Examples: userland/libs/hdf5

    +

    Examples:

    +
    +
    +
    @@ -34649,6 +34924,9 @@ instructions 124346081

    Same but with Buildroot vanilla kernel (kernel v4.19): 44s to blow up at "Please append a correct "root=" boot option; here are the available partitions" because missing some filesystem mount option. But likely wouldn’t be much more until after boot since we are almost already done by then! Therefore this vanilla kernel is much much faster! TODO find which config or kernel commit added so much time! Also that kernel is tiny at 8.5MB.

    +

    Same but hacking BR2_LINUX_KERNEL_LATEST_VERSION=y and BR2_PACKAGE_HOST_LINUX_HEADERS_CUSTOM_5_3=y, which reaches kernel 5.3.14, which is closer to the LKMC one 5.4.3: 40s, which is very similar to the older kernel. Therefore it does not look like it is a problem of kernel code changes, but rather of configs.

    +
    +

    Same but with: gem5 arm Linux kernel patches at v4.15: 73s, kernel size: 132M.