diff --git a/index.html b/index.html index baa3d59..4abb7ee 100644 --- a/index.html +++ b/index.html @@ -1044,7 +1044,8 @@ body.book #toc,body.book #preamble,body.book h1.sect0,body.book .sect1>h2{page-b
This can be solved by increasing the memory with:
+This can be solved by increasing the memory as explained at Memory size:
Then you could:
-break edu_mmio_read -run-
And in QEMU:
-./qemu_edu.sh-
Or for a faster development loop:
./run --debug-vm-args '-ex "break edu_mmio_read" -ex "run"'+
./run --debug-vm-args '-ex "break qemu_add_opts" -ex "run"'
Our default emulator builds are optimized with gcc -O2 -g. To use -O0 instead, build and run with:
./build-qemu --qemu-build-type debug --verbose +./run --debug-vm +./build-gem5 --gem5-build-type debug --verbose +./run --debug-vm --emulator-gem5+
The --verbose is optional, but shows clearly each GCC build command so that you can confirm what --*-build-type is doing.
The build outputs are automatically stored in different directories for optimized and debug builds, which prevents debug files from overwriting opt ones. Therefore, --gem5-build-id is not required:
The price to pay for debuggability is high however: a Linux kernel boot was about 3 times slower in QEMU debug and 14 times slower in gem5 debug compared to opt, see benchmarks at: Section 28.2.1, “Benchmark Linux kernel boot”
+When in QEMU text mode, using --debug-vm makes Ctrl-C not get passed to the QEMU guest anymore: it is instead captured by GDB itself, so as to allow breaking. So e.g. you won’t be able to easily quit from a guest program like:
You can still send key presses to QEMU however even without the mouse capture, just either click on the title bar, or alt tab to give it focus.
While step debugging any complex program, you always end up feeling the need to step in reverse to reach the last call to some function before the failure point.
+While GDB "has" this feature, it is just too broken to be usable, and so we expose the amazing Mozilla RR tool conveniently in this repo: https://stackoverflow.com/questions/1470434/how-does-reverse-debugging-work/53063242#53063242
+Before the first usage:
+echo 'kernel.perf_event_paranoid=1' | sudo tee -a /etc/sysctl.conf +sudo sysctl -p+
Then use it with your content of interest, for example:
+./run --debug-vm-rr --userland userland/c/hello.c+
This will first run the program once until completion, and then restart the program at the very first instruction at _start and leave you in a GDB shell.
From there, run the program until your point of interest, e.g.:
+break qemu_add_opts +continue+
and you can now reliably use reverse debugging commands such as reverse-continue, reverse-finish and reverse-next!
To restart debugging again after quitting rr, simply run on your host terminal:
rr replay+
Start pdb at the first instruction:
./run --arch arm --memory 512M+
./run --memory 512M
and verify inside the guest with:
+We can verify this on the guest directly from the kernel with:
free -m+
cat /proc/meminfo
as of LKMC 1e969e832f66cb5a72d12d57c53fb09e9721d589 this output contains:
+MemTotal: 498472 kB+
which we expand with:
+printf '0x%X\n' $((498472 * 1024))+
to:
+0x1E6CA000+
TODO: why is this value a bit smaller than 512M?
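To put a number on that gap, here is a minimal sketch that uses only the MemTotal value quoted above and the 512M passed to --memory. It assumes that 512M means 512 MiB, and the variable names are made up for illustration:

```shell
# MemTotal reported by /proc/meminfo in the guest, in kB (from the output above).
mem_total_kb=498472
# Assumption: --memory 512M means 512 MiB.
requested_bytes=$((512 * 1024 * 1024))
reported_bytes=$((mem_total_kb * 1024))
missing_bytes=$((requested_bytes - reported_bytes))
printf 'reported: 0x%X\n' "$reported_bytes"
printf 'missing:  0x%X bytes (%d kB)\n' "$missing_bytes" "$((missing_bytes / 1024))"
```

Under that assumption about 25 MiB of the requested memory is not counted in MemTotal. This only sizes the gap, it does not explain it.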
+free also gives the same result:
free -b+
contains:
+total used free shared buffers cached +Mem: 510435328 20385792 490049536 0 503808 2760704 +-/+ buffers/cache: 17121280 493314048 +Swap: 0 0 0+
which we expand with:
+printf '0x%X\n' 510435328+
man free from Ubuntu’s procps 3.3.15 tells us that free obtains this information from /proc/meminfo as well.
From C, we can get this information with sysconf(_SC_PHYS_PAGES) or get_phys_pages():
./linux/total_memory.out+
Source: userland/linux/total_memory.c
+Output:
+sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE) = 0x1E6CA000 +sysconf(_SC_AVPHYS_PAGES) * sysconf(_SC_PAGESIZE) = 0x1D178000 +get_phys_pages() * sysconf(_SC_PAGESIZE) = 0x1E6CA000 +get_avphys_pages() * sysconf(_SC_PAGESIZE) = 0x1D178000+
This is mentioned at: https://stackoverflow.com/questions/22670257/getting-ram-size-in-c-linux-non-precise-result/22670407#22670407
+AV means available and gives the free memory: https://stackoverflow.com/questions/14386856/c-check-available-ram/57659190#57659190
+The gem5.debug executable has optimizations turned off unlike the default gem5.opt, and provides a much better debug experience:
./build-gem5 --arch aarch64 --gem5-build-type debug -./run --arch aarch64 --debug-vm --emulator gem5 --gem5-build-type debug-
The build outputs are automatically stored in a different directory from other build types such as .opt build, which prevents .debug files from overwriting .opt ones.
Therefore, --gem5-build-id is not required.
The price to pay for debuggability is high however: a Linux kernel boot was about 14 times slower than opt at 71e927e63bda6507d5a528f22c78d65099bdf36f between the commands:
-./run --arch aarch64 --eval 'm5 exit' --emulator gem5 --linux-build-id v4.16 -./run --arch aarch64 --eval 'm5 exit' --emulator gem5 --linux-build-id v4.16 --gem5-build-type debug-
so you will likely only use this when it is unavoidable. This is also benchmarked at: Section 28.2.1, “Benchmark Linux kernel boot”
+Explained at: Section 17.7, “Debug the emulator”.
File IO
@@ -20825,50 +20951,136 @@ git -C "$(./getvar qemu_source_dir)" checkout -LInux 5.1 / glibc 2.29 implements it with the mmap system call.
Test how much memory Linux lets us allocate by doubling a buffer with realloc until it fails:
TODO: the exact answer is going to be hard.
+But at least let’s verify that large malloc calls use the mmap syscall with:
./run --userland userland/c/malloc_max.c+
strace -x ./c/malloc_size.out 0x100000 2>&1 | grep mmap | tail -n 1 +strace -x ./c/malloc_size.out 0x200000 2>&1 | grep mmap | tail -n 1 +strace -x ./c/malloc_size.out 0x400000 2>&1 | grep mmap | tail -n 1
Source: userland/c/malloc_max.c
+Source: userland/c/malloc_size.c.
Outcome at c03d5d18ea971ae85d008101528d84c2ff25eb27 on Ubuntu 19.04 P51 host (16GiB RAM): prints up to 0x1000000000 (64GiB).
TODO dive into source code.
-TODO: if we do direct malloc allocations with userland/c/malloc.c or mmap with userland/linux/mmap_anonymous.c, then the limit was smaller than 64GiB!
-These work:
+From this we see that the last mmap calls are:
./userland/c/malloc.out 0x100000000 -./userland/linux/mmap_anonymous.out 0x100000000+
mmap(NULL, 1052672, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffff7ef2000 +mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffff7271000 +mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ffff7071000
which is 4Gib * sizeof(int) == 16GiB, but these fail at 32GiB:
which in hex are:
./userland/c/malloc.out 0x200000000 -./userland/linux/mmap_anonymous.out 0x200000000+
printf '%x\n' 1052672 +# 101000 +printf '%x\n' 2101248 +# 201000 +printf '%x\n' 4198400 +# 401000
malloc returns NULL, and mmap goes a bit further and segfaults on the first assignment array[0] = 1.
so we figured out the pattern: those 1, 2, and 4 MiB mallocs are mmaping N + 0x1000 bytes.
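The pattern can be checked with plain shell arithmetic. This is only a sketch of the observation above: mmap_len is a hypothetical helper name, and the reason for the extra 0x1000 bytes (4 KiB) has not been verified against the glibc source here.

```shell
# Hypothetical helper: mmap length observed for a malloc of $1 bytes,
# following the N + 0x1000 pattern seen in the strace output above.
mmap_len() {
  printf '%x\n' "$(($1 + 0x1000))"
}

# The three request sizes passed to malloc_size.out above.
for n in 0x100000 0x200000 0x400000; do
  mmap_len "$n"
done
```

The three printed values match the hex lengths computed from the strace output.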
+General overview at: https://stackoverflow.com/questions/2798330/maximum-memory-which-malloc-can-allocate
Bibliography: https://stackoverflow.com/questions/2798330/maximum-memory-which-malloc-can-allocate
+See also:
+From Memory size and ./run --help, we see that we set the emulator memory by default to 256MB. Let’s see how much Linux allows us to malloc.
Then from malloc implementation we see that malloc is implemented with mmap. Therefore, let’s simplify the problem and try to understand what is the largest mmap we can do first. This way we can ignore how glibc implements malloc for now.
In Linux, the maximum mmap value is controlled by:
cat /proc/sys/vm/overcommit_memory+
which is documented in man proc.
The default value is 0, which I can’t find a precise documentation for. 2 is precisely documented but I’m too lazy to do all the calculations. So let’s just verify 0 vs 1 by trying to mmap 1GiB of memory:
echo 0 > /proc/sys/vm/overcommit_memory +./linux/mmap_anonymous.out 0x40000000 +echo 1 > /proc/sys/vm/overcommit_memory +./linux/mmap_anonymous.out 0x40000000+
Source: userland/linux/mmap_anonymous.c
+With 0, we get a failure:
mmap: Cannot allocate memory+
but with 1 the allocation works.
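As a quick reminder of what the three settings mean, here is a small sketch that maps each value to the short description given in man proc. describe_overcommit is a hypothetical helper name, not something provided by this repository:

```shell
# Hypothetical helper: short description of a /proc/sys/vm/overcommit_memory
# value, using the wording of man proc.
describe_overcommit() {
  case "$1" in
    0) echo 'heuristic overcommit (default)' ;;
    1) echo 'always overcommit, never check' ;;
    2) echo 'always check, never overcommit' ;;
    *) echo 'unknown' ;;
  esac
}

for mode in 0 1 2; do
  printf '%s: %s\n' "$mode" "$(describe_overcommit "$mode")"
done
```

On the guest, this could be called as describe_overcommit "$(cat /proc/sys/vm/overcommit_memory)" to label the current setting.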
We are allowed to allocate more than the actual memory + swap because the memory is only virtual, as explained at: https://stackoverflow.com/questions/7880784/what-is-rss-and-vsz-in-linux-memory-management/57453334#57453334
+If we start using the pages, the OOM killer would sooner or later step in and kill our process: Linux out-of-memory killer.
+We can observe the OOM in LKMC 1e969e832f66cb5a72d12d57c53fb09e9721d589 which defaults to 256MiB of memory with:
+echo 1 > /proc/sys/vm/overcommit_memory +./linux/mmap_anonymous_touch.out 0x40000000 0x8000000+
This first allows memory overcommit so that the program can mmap 1GiB, 4x more than total RAM without failing as mentioned at malloc maximum size.
+It then walks over every page and writes a value in it to ensure that it is used.
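The sizes involved can be worked out with shell arithmetic. This is a rough sketch that assumes 4KiB pages, which is not stated above. The second argument of mmap_anonymous_touch.out is left out because its meaning is not described here.

```shell
map_bytes=$((0x40000000))         # size passed to mmap_anonymous_touch.out
ram_bytes=$((256 * 1024 * 1024))  # default guest memory of 256MiB
page_bytes=$((0x1000))            # assumption: 4KiB pages

ratio=$((map_bytes / ram_bytes))
map_pages=$((map_bytes / page_bytes))
ram_pages=$((ram_bytes / page_bytes))

echo "mapping is ${ratio}x the guest RAM"
echo "pages in mapping: $map_pages"
echo "pages that fit in RAM at most: $ram_pages"
```

Since the mapping has more pages than RAM can hold, touching every page must run out of memory at some point before the walk finishes.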
+Algorithm used by the OOM: https://unix.stackexchange.com/questions/153585/how-does-the-oom-killer-decide-which-process-to-kill-first
+Unlike most of our other assembly examples, which use the C standard library for portability, examples under freestanding/ directories don’t link to the C standard library.
Unlike most of our other assembly examples, which use the C standard library for portability, examples under freestanding/ directories don’t link to the C standard library:
As a result, those examples cannot do IO portably, and so they make raw system calls and can only be run on one given OS, e.g. Linux system calls.
@@ -21797,6 +22022,22 @@ When instructions do not interpret this operand encoding as the zero register, uYou are now left on the very first instruction of our tiny executable!
Assembly examples under nostartfiles directories can use the standard library, but they don’t use the pre-main boilerplate and start directly at our explicitly given _start:
I’m not sure how much stdlib functionality is supposed to work without the pre-main stuff, but I guess we’ll just have to find out!
+These also have signed and unsigned versions to either sign or zero extend the result:
+userland/arch/aarch64/ldrsw.S: load word and sign extend
+The specific models have names of type GIC-600, GIC-500, etc.
In QEMU v4.0.0, the GICv3 can be selected with an extra -machine gic_version=3 option.
In gem5 3126e84db773f64e46b1d02a9a27892bf6612d30, the GIC is determined by selecting the platform as explained at: gem5 ARM platforms.
+./build-buildroot -- graph-build graph-size graph-depends