tracing: create trace2line

This lead to a re-factoring of count boot instructions, since trace2line
relies on the same tracing mechanism.

Also, now that we have common factored, I've decided to put the traces on
the out dir, to concentrate all outputs in a single place, and allow
easy switching between archs.
This commit is contained in:
Ciro Santilli
2018-03-28 13:19:04 +01:00
parent 8f64af7303
commit f5065183a8
7 changed files with 229 additions and 153 deletions

1
.gitignore vendored
View File

@@ -9,7 +9,6 @@ gitignore*
# Specific files. # Specific files.
/9p /9p
/README.html /README.html
/trace*
# GEM5 # GEM5
/m5out /m5out

View File

@@ -964,20 +964,31 @@ TODOs:
* only managed to run in the terminal interface (but weirdly a blank QEMU window is still opened) * only managed to run in the terminal interface (but weirdly a blank QEMU window is still opened)
* GDB not connecting to KGDB. Possibly linked to `-serial stdio`. See also: https://stackoverflow.com/questions/14155577/how-to-use-kgdb-on-arm * GDB not connecting to KGDB. Possibly linked to `-serial stdio`. See also: https://stackoverflow.com/questions/14155577/how-to-use-kgdb-on-arm
* `/poweroff.out` does not exit QEMU nor gem5, the terminal just hangs: https://stackoverflow.com/questions/31990487/how-to-cleanly-exit-qemu-after-executing-bare-metal-program-without-user-interve
+ ==== arm shutdown
A blunt resolution for QEMU is on host:
+ https://stackoverflow.com/questions/31990487/how-to-cleanly-exit-qemu-after-executing-bare-metal-program-without-user-interve
`/poweroff.out` does not exit QEMU nor gem5, the terminal just hangs after the message:
....
reboot: System halted
....
A blunt resolution for QEMU is to do a `Ctrl + c` on host, or run on a nother shell:
.... ....
pkill qemu pkill qemu
.... ....
+
On gem5, it is possible to use the `m5` instrumentation from guest as a good workaround: On gem5, it is possible to use the `m5` instrumentation from guest as a good workaround:
+
.... ....
m5 exit m5 exit
.... ....
It does work on `aarch64` however, presumably because of magic virtio functionality.
=== aarch64 === aarch64
As usual, we use Buildroot's recommended QEMU setup QEMU `aarch64` setup: As usual, we use Buildroot's recommended QEMU setup QEMU `aarch64` setup:
@@ -1257,132 +1268,6 @@ Haven't tried it, doubt it will work out of the box! :-)
Maybe: https://stackoverflow.com/questions/47857023/booting-a-graphical-mips-qemu-machine Maybe: https://stackoverflow.com/questions/47857023/booting-a-graphical-mips-qemu-machine
== Count boot instructions
* https://www.quora.com/How-many-instructions-does-a-typical-Linux-kernel-boot-take
* https://github.com/cirosantilli/chat/issues/31
* https://rwmj.wordpress.com/2016/03/17/tracing-qemu-guest-execution/
* `qemu/docs/tracing.txt` and `qemu/docs/replay.txt`
* https://stackoverflow.com/questions/39149446/how-to-use-qemus-simple-trace-backend/46497873#46497873
Results:
[options="header"]
|===
|Commit |Arch |Simulator |Instruction count
|7228f75ac74c896417fb8c5ba3d375a14ed4d36b |arm |QEMU |680k
|7228f75ac74c896417fb8c5ba3d375a14ed4d36b |arm |gem5 AtomicSimpleCPU |160M
|7228f75ac74c896417fb8c5ba3d375a14ed4d36b |arm |gem5 HPI |155M
|7228f75ac74c896417fb8c5ba3d375a14ed4d36b |x86_64 |QEMU |3M
|7228f75ac74c896417fb8c5ba3d375a14ed4d36b |x86_64 |gem5 AtomicSimpleCPU |528M
|===
QEMU:
....
./count-boot-instructions
....
sample output:
....
1778927 trace.txt
26264 trace-boot.txt
....
where:
* 1778927: is the total number of boot instructions
* 26264: is the number of bootloader instructions
gem5:
....
./run -a arm -g -E 'm5 exit'
# Or:
# ./run -a arm -g -E 'm5 exit' -- --cpu-type=HPI --caches
grep sim_insts m5out/stats.txt
....
Notes:
* `-n` is a good idea to reduce the chances that you send unwanted non-deterministic mouse or keyboard clicks to the VM.
* `-e 'init=/poweroff.out'` is crucial as it reduces the instruction count from 40 million to 20 million, so half of the instructions were due to userland programs instead of the boot sequence.
+
Without it, the bulk of the time seems to be spent in setting up the network with `ifup` that gets called from `/etc/init.d/S40network` from the default Buildroot BusyBox setup.
+
And it becomes even worse if you try to `-net none` as recommended in the 2.7 `replay.txt` docs, because then `ifup` waits for 15 seconds before giving up as per `/etc/network/interfaces` line `wait-delay 15`.
* `0x1000000` is the address where QEMU puts the Linux kernel at with `-kernel` in x86.
+
It can be found from:
+
....
readelf -e buildroot/output.x86_64~/build/linux-*/vmlinux | grep Entry
....
+
TODO confirm further. If I try to break there with:
+
....
./rungdb *0x1000000
....
+
but I have no corresponding source line. Also note that this line is not actually the first line, since the kernel messages such as `early console in extract_kernel` have already shown on screen at that point. This does not break at all:
+
....
./rungdb extract_kernel
....
+
It only appears once on every log I've seen so far, checked with `grep 0x1000000 trace.txt`
+
Then when we count the instructions that run before the kernel entry point, there is only about 100k instructions, which is insignificant compared to the kernel boot itself.
* We can also discount the instructions after `init` runs by using `readelf` to get the initial address of `init`. One easy way to do that now is to just run:
+
....
./rungdb-user kernel_module-1.0/user/poweroff.out main
....
+
And get that from the traces, e.g. if the address is `4003a0`, then we search:
+
....
grep -n 4003a0 trace.txt
....
+
I have observed a single match for that instruction, so it must be the init, and there were only 20k instructions after it, so the impact is negligible.
* gem5 simulates memory latencies. So I think that the CPU loops idle while waiting for memory, and counts will be higher.
This works because we have already done the following with QEMU:
* `./configure --enable-trace-backends=simple`. This logs in a binary format to the trace file.
+
It makes 3x execution faster than the default trace backend which logs human readable data to stdout.
+
This also alters the actual execution, and reduces the instruction count by 10M TODO understand exactly why, possibly due to the `All QSes seen` thing.
* patch QEMU source to remove the `disable` from `exec_tb` in the `trace-events`. See also: https://rwmj.wordpress.com/2016/03/17/tracing-qemu-guest-execution/
Possible improvements:
* to disable networking. Is replacing `init` enough?
** https://superuser.com/questions/181254/how-do-you-boot-linux-with-networking-disabled
** https://superuser.com/questions/684005/how-does-one-permanently-disable-gnu-linux-networking/1255015#1255015
+
`CONFIG_NET=n` did not significantly reduce instruction, so maybe replacing `init` is enough.
* logging with the default backend `log` greatly slows down the CPU, and in particular leads to this during kernel boot:
+
....
All QSes seen, last rcu_sched kthread activity 5252 (4294901421-4294896169), jiffies_till_next_fqs=1, root ->qsmask 0x0
swapper/0 R running task 0 1 0 0x00000008
ffff880007c03ef8 ffffffff8107aa5d ffff880007c16b40 ffffffff81a3b100
ffff880007c03f60 ffffffff810a41d1 0000000000000000 0000000007c03f20
fffffffffffffedc 0000000000000004 fffffffffffffedc ffffffff00000000
Call Trace:
<IRQ> [<ffffffff8107aa5d>] sched_show_task+0xcd/0x130
[<ffffffff810a41d1>] rcu_check_callbacks+0x871/0x880
[<ffffffff810a799f>] update_process_times+0x2f/0x60
....
+
in which the boot appears to hang for a considerable time.
* Confirm that the kernel enters at `0x1000000`, or where it enters. Once we have this, we can exclude what comes before in the BIOS.
== initrd == initrd
The kernel can boot from an CPIO file, which is a directory serialization format much like tar: https://superuser.com/questions/343915/tar-vs-cpio-what-is-the-difference The kernel can boot from an CPIO file, which is a directory serialization format much like tar: https://superuser.com/questions/343915/tar-vs-cpio-what-is-the-difference
@@ -1692,6 +1577,100 @@ TODO: what do `+` and `!` mean?
Each `enable` under the `events/` tree enables a certain set of functions, the higher the `enable` more functions are enabled. Each `enable` under the `events/` tree enables a certain set of functions, the higher the `enable` more functions are enabled.
=== Count boot instructions
* https://www.quora.com/How-many-instructions-does-a-typical-Linux-kernel-boot-take
* https://github.com/cirosantilli/chat/issues/31
* https://rwmj.wordpress.com/2016/03/17/tracing-qemu-guest-execution/
* `qemu/docs/tracing.txt` and `qemu/docs/replay.txt`
* https://stackoverflow.com/questions/39149446/how-to-use-qemus-simple-trace-backend/46497873#46497873
Results (boot not excluded):
[options="header"]
|===
|Commit |Arch |Simulator |Instruction count
|7228f75ac74c896417fb8c5ba3d375a14ed4d36b |arm |QEMU |680k
|7228f75ac74c896417fb8c5ba3d375a14ed4d36b |arm |gem5 AtomicSimpleCPU |160M
|7228f75ac74c896417fb8c5ba3d375a14ed4d36b |arm |gem5 HPI |155M
|7228f75ac74c896417fb8c5ba3d375a14ed4d36b |x86_64 |QEMU |3M
|7228f75ac74c896417fb8c5ba3d375a14ed4d36b |x86_64 |gem5 AtomicSimpleCPU |528M
|===
QEMU:
....
./trace-boot -a x86_64
....
sample output:
....
instruction count all: 1833863
entry address: 0x1000000
instruction count firmware: 20708
....
gem5:
....
./run -a arm -g -E 'm5 exit'
# Or:
# ./run -a arm -g -E 'm5 exit' -- --cpu-type=HPI --caches
grep sim_insts m5out/stats.txt
....
Notes:
* `0x1000000` is the address where QEMU puts the Linux kernel at with `-kernel` in x86.
+
It can be found from:
+
....
readelf -e buildroot/output.x86_64~/build/linux-*/vmlinux | grep Entry
....
+
TODO confirm further. If I try to break there with:
+
....
./rungdb *0x1000000
....
+
but I have no corresponding source line. Also note that this line is not actually the first line, since the kernel messages such as `early console in extract_kernel` have already shown on screen at that point. This does not break at all:
+
....
./rungdb extract_kernel
....
+
It only appears once on every log I've seen so far, checked with `grep 0x1000000 trace.txt`
+
Then when we count the instructions that run before the kernel entry point, there is only about 100k instructions, which is insignificant compared to the kernel boot itself.
+
TODO `-a arm` and `-a aarch64` does not count firmware instructions properly because the entry point address of the ELF file does not show up on the trace at all.
* We can also discount the instructions after `init` runs by using `readelf` to get the initial address of `init`. One easy way to do that now is to just run:
+
....
./rungdb-user kernel_module-1.0/user/poweroff.out main
....
+
And get that from the traces, e.g. if the address is `4003a0`, then we search:
+
....
grep -n 4003a0 trace.txt
....
+
I have observed a single match for that instruction, so it must be the init, and there were only 20k instructions after it, so the impact is negligible.
* on arm, you need to hit `Ctrl + C` once after seeing the message `reboot: System halted` due to <<arm-shutdown>>
* to disable networking. Is replacing `init` enough?
+
--
** https://superuser.com/questions/181254/how-do-you-boot-linux-with-networking-disabled
** https://superuser.com/questions/684005/how-does-one-permanently-disable-gnu-linux-networking/1255015#1255015
--
+
`CONFIG_NET=n` did not significantly reduce instruction counts, so maybe replacing `init` is enough.
* gem5 simulates memory latencies. So I think that the CPU loops idle while waiting for memory, and counts will be higher.
=== User mode Linux === User mode Linux
I once got link:https://en.wikipedia.org/wiki/User-mode_Linux[UML] running on a minimal Buildroot setup at: https://unix.stackexchange.com/questions/73203/how-to-create-rootfs-for-user-mode-linux-on-fedora-18/372207#372207 I once got link:https://en.wikipedia.org/wiki/User-mode_Linux[UML] running on a minimal Buildroot setup at: https://unix.stackexchange.com/questions/73203/how-to-create-rootfs-for-user-mode-linux-on-fedora-18/372207#372207
@@ -2096,10 +2075,80 @@ Just make sure that you never click inside the QEMU window when doing that, othe
You can still send key presses to QEMU however even without the mouse capture, just either click on the title bar, or alt tab to give it focus. You can still send key presses to QEMU however even without the mouse capture, just either click on the title bar, or alt tab to give it focus.
=== QEMU record and replay === Tracing
QEMU has a mechanism to log all instructions executed to a file.
To do it for the Linux kernel boot we have a helper:
....
./trace-boot -a x86_64
....
You can then inspect the instructions with:
....
less ./buildroot/output.x86_64~/lkmc/trace.txt
....
This functionality relies on the following setup:
* `./configure --enable-trace-backends=simple`. This logs in a binary format to the trace file.
+
It makes 3x execution faster than the default trace backend which logs human readable data to stdout.
+
Logging with the default backend `log` greatly slows down the CPU, and in particular leads to this boot message:
+
....
All QSes seen, last rcu_sched kthread activity 5252 (4294901421-4294896169), jiffies_till_next_fqs=1, root ->qsmask 0x0
swapper/0 R running task 0 1 0 0x00000008
ffff880007c03ef8 ffffffff8107aa5d ffff880007c16b40 ffffffff81a3b100
ffff880007c03f60 ffffffff810a41d1 0000000000000000 0000000007c03f20
fffffffffffffedc 0000000000000004 fffffffffffffedc ffffffff00000000
Call Trace:
<IRQ> [<ffffffff8107aa5d>] sched_show_task+0xcd/0x130
[<ffffffff810a41d1>] rcu_check_callbacks+0x871/0x880
[<ffffffff810a799f>] update_process_times+0x2f/0x60
....
+
in which the boot appears to hang for a considerable time.
* patch QEMU source to remove the `disable` from `exec_tb` in the `trace-events` file. See also: https://rwmj.wordpress.com/2016/03/17/tracing-qemu-guest-execution/
==== Trace source lines
We can further use Binutils' `addr2line` to get the line that corresponds to each address:
....
./trace-boot -a x86_64
./trace2line -a x86_64
less ./buildroot/output.x86_64~/lkmc/trace-lines.txt
....
The format is as follows:
....
39368 _static_cpu_has arch/x86/include/asm/cpufeature.h:148
....
Where:
* `39368`: number of consecutive times that a line ran. Makes the output much shorter and more meaningful
* `_static_cpu_has`: name of the function that contains the line
* `arch/x86/include/asm/cpufeature.h:148`: file and line
This could of course all be done with GDB, but it would likely be too slow to be practical.
TODO do even more awesome offline post-mortem analysis things, such as:
* detect if we are in userspace or kernelspace. Should be a simple matter of reading the
* read kernel data structures, and determine the current thread. Maybe we can reuse / extend the kernel's GDB Python scripts??
==== QEMU record and replay
QEMU supports deterministic record and replay by saving external inputs, which would be awesome to understand the kernel, as you would be able to examine a single run as many times as you would like. QEMU supports deterministic record and replay by saving external inputs, which would be awesome to understand the kernel, as you would be able to examine a single run as many times as you would like.
This mechanism first requires a trace to be generated on an initial record run. The trace is then used on the replay runs to make them deterministic.
Unfortunately it is not working in the current QEMU: https://stackoverflow.com/questions/46970215/how-to-use-qemus-deterministic-record-and-replay-feature-for-a-linux-kernel-boo Unfortunately it is not working in the current QEMU: https://stackoverflow.com/questions/46970215/how-to-use-qemus-deterministic-record-and-replay-feature-for-a-linux-kernel-boo
Alternatively, https://github.com/mozilla/rr[`mozilla/rr`] claims it is able to run QEMU: but using it would require you to step through QEMU code itself. Likely doable, but do you really want to? Alternatively, https://github.com/mozilla/rr[`mozilla/rr`] claims it is able to run QEMU: but using it would require you to step through QEMU code itself. Likely doable, but do you really want to?

7
build
View File

@@ -155,7 +155,10 @@ BR2_ROOTFS_POST_SCRIPT_ARGS=\"$post_script_args\"
fi fi
# Build. # Build.
mkdir -p "${root_dir}/9p" mkdir -p \
"${root_dir}/9p" \
"${lkmc_out_dir}" \
;
cd "${buildroot_dir}" cd "${buildroot_dir}"
# HOST_QEMU_OPTS is a hack that happens to work because the QEMU package luckly uses += at all times. # HOST_QEMU_OPTS is a hack that happens to work because the QEMU package luckly uses += at all times.
# It shouldn't be necessary in the first place: https://bugs.busybox.net/show_bug.cgi?id=9936 # It shouldn't be necessary in the first place: https://bugs.busybox.net/show_bug.cgi?id=9936
@@ -169,7 +172,7 @@ cmd="time \
env \ env \
-u LD_LIBRARY_PATH \ -u LD_LIBRARY_PATH \
make \ make \
O='$out_dir' \ O='${out_dir}' \
HOST_QEMU_OPTS='--enable-debug --extra-cflags=-DDEBUG_PL061=1 --enable-trace-backends=simple $qemu_sdl' \ HOST_QEMU_OPTS='--enable-debug --extra-cflags=-DDEBUG_PL061=1 --enable-trace-backends=simple $qemu_sdl' \
V='$v' \ V='$v' \
$extra_make_args \ $extra_make_args \

1
common
View File

@@ -10,5 +10,6 @@ set_common_vars() {
fi fi
out_dir="${buildroot_dir}/output.${arch_dir}~" out_dir="${buildroot_dir}/output.${arch_dir}~"
config_file="${out_dir}/.config" config_file="${out_dir}/.config"
lkmc_out_dir="${out_dir}/lkmc"
images_dir="${out_dir}/images" images_dir="${out_dir}/images"
} }

View File

@@ -1,17 +0,0 @@
#!/usr/bin/env bash
set -eu
. common
arch=x86_64
while getopts a: OPT; do
case "$OPT" in
a)
arch="$OPTARG"
;;
esac
done
set_common_vars "$arch" false
time ./run -a "$arch" -e 'init=/poweroff.out' -- -trace exec_tb,file=trace
time ./qemu/scripts/simpletrace.py "${out_dir}/build/host-qemu-custom/trace-events-all" trace >trace.txt
wc -l trace.txt
sed '/0x1000000/q' trace.txt >trace-boot.txt
wc -l trace-boot.txt

23
trace-boot Executable file
View File

@@ -0,0 +1,23 @@
#!/usr/bin/env bash
set -eu
. common
arch=x86_64
while getopts a: OPT; do
case "$OPT" in
a)
arch="$OPTARG"
;;
esac
done
set_common_vars "$arch" false
time ./run -a "$arch" -e 'init=/poweroff.out' -- -trace exec_tb,file="${lkmc_out_dir}/trace.bin"
time ./qemu/scripts/simpletrace.py "${out_dir}/build/host-qemu-custom/trace-events-all" "${lkmc_out_dir}/trace.bin" >"${lkmc_out_dir}/trace.txt"
# Instruction count.
# We could put this on a separate script, but it just adds more arch boilerplate to a new script.
# So let's just leave it here for now since it did not add a significant processing time.
echo "instruction count all: $(wc -l "${lkmc_out_dir}/trace.txt" | cut -d' ' -f1)"
entry_addr=$("${out_dir}"/host/bin/*-buildroot-*-readelf -h "${out_dir}/build/linux-custom/vmlinux" | grep 'Entry point address' | sed -E 's/.*: *//')
echo "entry address: ${entry_addr}"
sed "/${entry_addr}/q" "${lkmc_out_dir}/trace.txt" >"${lkmc_out_dir}/trace-boot.txt"
echo "instruction count firmware: $(wc -l "${lkmc_out_dir}/trace-boot.txt" | cut -d' ' -f1)"

18
trace2line Executable file
View File

@@ -0,0 +1,18 @@
#!/usr/bin/env bash
set -eu
. common
arch=x86_64
while getopts a: OPT; do
case "$OPT" in
a)
arch="$OPTARG"
;;
esac
done
set_common_vars "$arch" false
kernel_dir="${out_dir}/build/linux-custom"
sed -E 's/.*pc=//' "${lkmc_out_dir}/trace.txt" | \
xargs "${out_dir}/host/bin/${arch}-linux-addr2line" -e "${kernel_dir}/vmlinux" -fp | \
sed -E "s|at ${kernel_dir}/(\./\|)||" | \
uniq -c \
> "${lkmc_out_dir}/trace-lines.txt"