diff --git a/README.adoc b/README.adoc index 0fd3f8e..a89a8e9 100644 --- a/README.adoc +++ b/README.adoc @@ -4799,9 +4799,42 @@ A simpler and possibly less overhead alternative to <<9P>> would be to generate Then you can `umount` and re-mount on guest without reboot. -We don't support this yet, but it should not be too hard to hack it up, maybe by hooking into link:rootfs-post-build-script[]. +To build the secondary disk image run link:build-disk2[]: -This was not possible from gem5 `fs.py` as of 60600f09c25255b3c8f72da7fb49100e2682093a: https://stackoverflow.com/questions/50862906/how-to-attach-multiple-disk-images-in-a-simulation-with-gem5-fs-py/51037661#51037661 +.... +./build-disk2 +.... + +This will put the entire <> into a squashfs filesystem. + +Then, if that filesystem is present, `./run` will automatically pass it as the second disk on the command line. + +For example, from inside QEMU, you can mount that disk with: + +.... +mkdir /mnt/vdb +mount /dev/vdb /mnt/vdb +/mnt/vdb/lkmc/c/hello.out +.... + +To update the secondary disk while a simulation is running to avoid rebooting, first unmount in the guest: + +.... +umount /mnt/vdb +.... + +and then on the host: + +.... +# Edit the file. +vim userland/c/hello.c +./build-userland +./build-disk2 +.... + +and now you can re-run the updated version of the executable on the guest after remounting it. + +gem5 fs.py support for multiple disks is discussed at: https://stackoverflow.com/questions/50862906/how-to-attach-multiple-disk-images-in-a-simulation-with-gem5-fs-py/51037661#51037661 == Graphics @@ -5414,7 +5447,7 @@ Bibliography: The https://en.wikipedia.org/wiki/9P_(protocol)[9p protocol] allows the guest to mount a host directory. -Both QEMU and <<9p-gem5>> support 9P. +Both QEMU and <> support 9P. 
==== 9P vs NFS @@ -5486,7 +5519,7 @@ Bibliography: * https://superuser.com/questions/628169/how-to-share-a-directory-with-the-host-without-networking-in-qemu * https://wiki.qemu.org/Documentation/9psetup -==== 9P gem5 +==== gem5 9P Is possible on aarch64 as shown at: https://gem5-review.googlesource.com/c/public/gem5/+/22831[], and it is just a matter of exposing to X86 for those that want it. @@ -12249,7 +12282,7 @@ First we reset the readfile to something that runs quickly: printf 'echo "first benchmark"' > "$(./getvar gem5_readfile_file)" .... -and then in the guest, take a checkpoint and exit: +and then in the guest, take a checkpoint and exit with: .... ./gem5.sh @@ -21885,12 +21918,22 @@ Due to the way that <> however, the outpu [[perf-event-open]] ==== `perf_event_open` system call -link:userland/linux/perf_event_open.c[] counts instructions of a given loop: https://stackoverflow.com/questions/13313510/quick-way-to-count-number-of-instructions-executed-in-a-c-program/64863392#64863392 +link:userland/linux/perf_event_open.c[] + +On ARM, `perf_event_open` uses the <>. 
The mapping between kernel events and ARM PMU events can be found at: https://github.com/cirosantilli/linux/blob/v5.9/arch/arm64/kernel/perf_event.c Bibliography: * `man perf_event_open` * https://community.arm.com/developer/ip-products/system/b/embedded-blog/posts/using-the-arm-performance-monitor-unit-pmu-linux-driver +* instruction counts: https://stackoverflow.com/questions/13313510/quick-way-to-count-number-of-instructions-executed-in-a-c-program/64863392#64863392 +* cycle counts: +** https://stackoverflow.com/questions/13772567/how-to-get-the-cpu-cycle-count-in-x86-64-from-c/64898073#64898073 +** https://stackoverflow.com/questions/3830883/cpu-cycle-count-based-profiling-in-c-c-linux-x86-64/64898121#64898121 +** https://stackoverflow.com/questions/35923834/what-is-the-most-reliable-way-to-measure-the-number-of-cycles-of-my-program-in-c/64898206#64898206 +** https://unix.stackexchange.com/questions/352166/measure-exact-clock-cycles-for-a-c-assembly-program/620317#620317 +** https://stackoverflow.com/questions/8522140/linux-alternative-to-windows-high-resolution-performance-counter-api/64898303#64898303 +* cache misses: https://stackoverflow.com/questions/10082517/simplest-tool-to-measure-c-program-cache-hit-miss-and-cpu-time-in-linux/64899613#64899613 === Linux calling conventions @@ -23975,10 +24018,40 @@ Bibliography: The PMU (Performance Monitor Unit) is an unit in the ARM CPU that counts performance events of interest. These can be used to benchmark, and sometimes debug, code running on ARM CPUs. +It is documented at <> Chapter D7 "The Performance Monitors Extension"> + The <> exposes some (all?) of those events through the arch-agnostic <> system call. +Exposing the PMU to Linux v5.9.2 requires a <> entry of type: + +.... +pmu { + compatible = "arm,armv8-pmuv3"; + interrupts = <0x01 0x04 0xf04>; +}; +.... + +and if successful, a boot message shows: + +.... +<6>[ 0.044391] hw perfevents: enabled with armv8_pmuv3 PMU driver, 32 counters available +.... 
+ The PMU is exposed through <>, with registers that start with the prefix `PM*`. + +<6>[ 0.044391] hw perfevents: enabled with armv8_pmuv3 PMU driver, 32 counters available + +<> D7.11.3 "Common event numbers" gives the available standardized events. Address space is also reserved for vendor extensions. For example, from it we see that the instruction count is documented at: + +____ +0x0008, INST_RETIRED, Instruction architecturally executed + +The counter increments for every architecturally executed instruction. +____ + +where "architecturally executed" is a reference to the possibility of <> in the implementation, which leads to some instructions being executed speculatively, but not having any side effects in the end. + Bibliography: https://community.arm.com/developer/ip-products/system/b/embedded-blog/posts/using-the-arm-performance-monitor-unit-pmu-linux-driver ==== ARM PMCCNTR register @@ -28159,7 +28232,7 @@ ls /mnt/9p/rootfs_overlay This way you can just hack away the scripts and try them out immediately without any further operations. [[out-rootfs-overlay-dir]] -===== out_rootfs_overlay_dir +===== `out_rootfs_overlay_dir` This path can be found with: @@ -28178,6 +28251,13 @@ In Buildroot, this is done by pointing `BR2_ROOTFS_OVERLAY` to that directory, w This does not include native image modification mechanisms such as <>, which we let Buildroot itself manage. +[[disk-image-2]] +====== `disk_image_2` + +A squashfs of <> that gets passed to the emulator as the second disk. + +Especially useful with <> as a way to <> via <> since setting up <> is slightly laborious. + ==== lkmc.c The files: diff --git a/build b/build index 4ded0d3..7a1f638 100755 --- a/build +++ b/build @@ -497,6 +497,7 @@ Which components to build. 
Default: qemu-buildroot 'moreutils', # ts 'python3-pip', 'rr', + 'squashfs-tools', 'tmux', 'vinagre', 'wget', diff --git a/build-disk2 b/build-disk2 new file mode 100755 index 0000000..a158536 --- /dev/null +++ b/build-disk2 @@ -0,0 +1,27 @@ +#!/usr/bin/env python3 + +import common + +class Main(common.BuildCliFunction): + def __init__(self): + super().__init__( + description='''\ +https://cirosantilli.com/linux-kernel-module-cheat#secondary-disk +''' + ) + + def build(self): + # We must clean first because mksquashfs tries to avoid overwrites + # by renaming images on the target. + self.clean() + self.sh.run_cmd([ + 'mksquashfs', + self.env['out_rootfs_overlay_dir'], + self.env['disk_image_2'] + ]) + + def clean(self): + self.sh.rmrf(self.env['disk_image_2']) + +if __name__ == '__main__': + Main().cli() diff --git a/build-m5 b/build-m5 index 569dfbb..a59be89 100755 --- a/build-m5 +++ b/build-m5 @@ -1,7 +1,5 @@ #!/usr/bin/env python3 -import os - import common from shell_helpers import LF @@ -25,14 +23,14 @@ See: https://cirosantilli.com/linux-kernel-module-cheat#gem5-m5-executable ] def build(self): - os.makedirs(self.env['gem5_m5_build_dir'], exist_ok=True) + self.sh.mkdir_p(self.env['gem5_m5_build_dir']) # We must clean first or else the build outputs of one arch can conflict with the other. # I should stop being lazy and go actually patch gem5 to support out of tree m5 build... 
self.clean() self.sh.run_cmd( self._get_make_cmd(), ) - os.makedirs(self.env['out_rootfs_overlay_bin_dir'], exist_ok=True) + self.sh.mkdir_p(self.env['out_rootfs_overlay_bin_dir']) self.sh.cp( self.env['gem5_m5_source_dir_build'], self.env['out_rootfs_overlay_bin_dir'] @@ -42,7 +40,6 @@ See: https://cirosantilli.com/linux-kernel-module-cheat#gem5-m5-executable self.sh.run_cmd( self._get_make_cmd() + ['--clean', LF], ) - return None if __name__ == '__main__': Main().cli() diff --git a/buildroot_config/default b/buildroot_config/default index c39980b..1693c6c 100644 --- a/buildroot_config/default +++ b/buildroot_config/default @@ -22,7 +22,7 @@ BR2_TOOLCHAIN_BUILDROOT_WCHAR=y # Rootfs BR2_TARGET_ROOTFS_CPIO=n BR2_TARGET_ROOTFS_EXT2=y -BR2_TARGET_ROOTFS_EXT2_SIZE="512M" +BR2_TARGET_ROOTFS_EXT2_SIZE="1G" BR2_TARGET_ROOTFS_SQUASHFS=n BR2_TARGET_ROOTFS_INITRAMFS=n # TODO can you boot with those as root filesystem? diff --git a/common.py b/common.py index 471590b..d24586b 100644 --- a/common.py +++ b/common.py @@ -968,7 +968,9 @@ Incompatible archs are skipped. except ValueError: env['port_offset'] = 0 if env['emulator'] == 'gem5': - env['gem5_telnet_port'] = 3456 + env['port_offset'] + # Times 4 because gem5 now has 3 UARTs that take up the previous ports: + # https://github.com/cirosantilli/linux-kernel-module-cheat/issues/131 + env['gem5_telnet_port'] = 3456 + env['port_offset'] * 4 env['gdb_port'] = 7000 + env['port_offset'] else: env['qemu_base_port'] = 45454 + 10 * env['port_offset'] @@ -1227,6 +1229,8 @@ Incompatible archs are skipped. env['disk_image'] = env['rootfs_raw_file'] else: env['disk_image'] = env['qcow2_file'] + # A squashfs of 'out_rootfs_overlay_dir'. 
+ env['disk_image_2'] = env['out_rootfs_overlay_dir'] + '.squashfs' # Android if not env['_args_given']['android_base_dir']: diff --git a/rootfs_overlay/lkmc/gem5.sh b/rootfs_overlay/lkmc/gem5.sh index 8966156..33c5f13 100755 --- a/rootfs_overlay/lkmc/gem5.sh +++ b/rootfs_overlay/lkmc/gem5.sh @@ -3,6 +3,8 @@ m5 checkpoint tmp="$(mktemp)" m5 readfile > "$tmp" -m5 resetstats -sh "$tmp" -m5 exit +if [ -s "$tmp" ]; then + m5 resetstats + sh "$tmp" + m5 exit +fi diff --git a/run b/run index 4e34f5f..cc9bc21 100755 --- a/run +++ b/run @@ -600,6 +600,8 @@ Extra options to append at the end of the emulator command line. ]) if use_disk_image: cmd.extend(['--disk-image', self.env['disk_image'], LF]) + if os.path.exists(self.env['disk_image_2']): + cmd.extend(['--disk-image', self.env['disk_image_2'], LF]) if self.env['baremetal']: cmd.extend([ '--param', 'system.workload.extras = "{}"'.format(self.python_escape_double_quotes(baremetal_cli_path)), LF, @@ -677,6 +679,8 @@ Extra options to append at the end of the emulator command line. ]) if use_disk_image: cmd.extend(['--disk', self.env['disk_image'], LF]) + if os.path.exists(self.env['disk_image_2']): + cmd.extend(['--disk', self.env['disk_image_2'], LF]) if self.env['dtb']: cmd.extend([ '--dtb', @@ -844,6 +848,18 @@ Extra options to append at the end of the emulator command line. ), LF, ]) + if os.path.exists(self.env['disk_image_2']): + extra_emulator_args.extend([ + '-drive', + 'file={},format={},if={}{}{}'.format( + self.env['disk_image_2'], + 'raw', + driveif, + snapshot, + rrid + ), + LF, + ]) if rr: extra_emulator_args.extend([ '-object', 'filter-replay,id=replay,netdev=net0', LF, diff --git a/userland/linux/perf_event_open.c b/userland/linux/perf_event_open.c index c3deac3..1612f61 100644 --- a/userland/linux/perf_event_open.c +++ b/userland/linux/perf_event_open.c @@ -1,12 +1,14 @@ /* https://cirosantilli.com/linux-kernel-module-cheat#perf-event-open * Adapted from `man perf_event_open` in manpages 5.05-1. 
*/ +#define _GNU_SOURCE #include #include #include #include #include #include +#include #include #define LKMC_M5OPS_ENABLE 1 @@ -28,10 +30,6 @@ static long perf_event_open(struct perf_event_attr *hw_event, hw_event->exclude_hv = 1; ret = syscall(__NR_perf_event_open, hw_event, pid, cpu, group_fd, flags); - if (ret == -1) { - fprintf(stderr, "Error opening leader %llx\n", hw_event->config); - exit(EXIT_FAILURE); - } return ret; } @@ -50,9 +48,17 @@ main(int argc, char **argv) { long long count; uint64_t n; Desc descs[] = { + /* ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED = 0x0C */ DESC(PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_INSTRUCTIONS), + + /* ARMV8_PMUV3_PERFCTR_BR_MIS_PRED = 0x10 */ DESC(PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES), + + /* ARMV8_PMUV3_PERFCTR_CPU_CYCLES = 0x11 */ DESC(PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES), + DESC(PERF_TYPE_HARDWARE, PERF_COUNT_HW_REF_CPU_CYCLES), + + /* ARMV8_PMUV3_PERFCTR_INST_RETIRED = 0x08 */ DESC(PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS), DESC(PERF_TYPE_HW_CACHE, PERF_COUNT_HW_CACHE_L1D | PERF_COUNT_HW_CACHE_OP_READ << 8 | PERF_COUNT_HW_CACHE_RESULT_MISS << 16), }; @@ -70,9 +76,17 @@ main(int argc, char **argv) { gem5 = 0; } - for (i = 0; i < LKMC_ARRAY_SIZE(descs); i++) + for (i = 0; i < LKMC_ARRAY_SIZE(descs); i++) { fds[i] = perf_event_open(&pes[i], descs[i].type, descs[i].config, 0, -1, -1, 0); + if (fds[i] == -1) { + fprintf( + stderr, "perf_event_open error name=%s type=%jx config=%jx\n", + descs[i].name, (uintmax_t)pes[i].type, (uintmax_t)pes[i].config + ); + /*exit(EXIT_FAILURE);*/ + } + } /* Start the counts. */ for (i = 0; i < LKMC_ARRAY_SIZE(descs); i++)