gem5: expose syscall emulation multiple executables

Ciro Santilli 六四事件 法轮功
2020-04-29 03:00:02 +00:00
parent 939ce5668c
commit f5d4998ff5
3 changed files with 117 additions and 105 deletions


@@ -4130,29 +4130,93 @@ hello
 so we see that two syscall lines were added for each syscall, showing the syscall inputs and exit status, just like a mini `strace`!
+==== gem5 syscall emulation multithreading
+gem5 user mode multithreading has been particularly flaky compared <<qemu-user-mode-multithreading,to QEMU's>>, but work is being put into improving it.
+In gem5 syscall emulation, the `fork` syscall checks if there is a free CPU, and if there is one, the new thread runs on that CPU.
+Otherwise, the `fork` call fails, and so do higher-level interfaces to `fork` such as `pthread_create`, which return an error status in the guest.
+For example, if we use just one CPU for link:userland/posix/pthread_self.c[], which spawns one thread besides `main`:
+....
+./run --cpus 1 --emulator gem5 --userland userland/posix/pthread_self.c --cli-args 1
+....
+fails with this error message coming from the guest stderr:
+....
+pthread_create: Resource temporarily unavailable
+....
+It works, however, if we add one extra CPU:
+....
+./run --cpus 2 --emulator gem5 --userland userland/posix/pthread_self.c --cli-args 1
+....
+Once threads exit, their CPU is freed and becomes available for new `fork` calls. For example, the following run spawns a thread, joins it, and then spawns again, and 2 CPUs are enough:
+....
+./run --cpus 2 --emulator gem5 --userland userland/posix/pthread_self.c --cli-args '1 2'
+....
+because at each point in time, at most two threads are running.
+gem5 syscall emulation does show the expected number of cores when queried, e.g.:
+....
+./run --cpus 1 --userland userland/cpp/thread_hardware_concurrency.cpp --emulator gem5
+./run --cpus 2 --userland userland/cpp/thread_hardware_concurrency.cpp --emulator gem5
+....
+outputs `1` and `2` respectively.
+This can also be clearly seen by running `sched_getcpu`:
+....
+./run \
+--arch aarch64 \
+--cli-args 4 \
+--cpus 8 \
+--emulator gem5 \
+--userland userland/linux/sched_getcpu.c \
+;
+....
+which necessarily produces an output containing the CPU numbers from 1 to 4 and no higher:
+....
+1
+3
+4
+2
+....
+TODO why does the `2` come at the end here? Would be good to do a detailed assembly run analysis.
 ==== gem5 syscall emulation multiple executables
-This is not currently nicely exposed in LKMC, but gem5 syscall emulation does allow you to run multiple executables "at once".
-`--cmd` takes a semicolon separated list, so we could do:
+gem5 syscall emulation has the nice feature of allowing you to run multiple executables "at once".
+Each executable starts running on the next free core much as if it had been forked right at the start of simulation: <<gem5-syscall-emulation-multithreading>>.
+This can be useful to quickly create deterministic multi-CPU workloads.
+`se.py --cmd` takes a semicolon-separated list, which LKMC exposes by accepting `--userland` multiple times, as in:
 ....
-./run --arch aarch64 --emulator gem5 --userland userland/posix/getpid.c --cpus 2
+./run \
+--arch aarch64 \
+--cpus 2 \
+--emulator gem5 \
+--userland userland/posix/getpid.c \
+--userland userland/posix/getpid.c \
+;
 ....
-and then <<dry-run,hack the produced command>> by replacing:
-....
---cmd /home/ciro/bak/git/linux-kernel-module-cheat/out/userland/default/aarch64/posix/getpid.out \
---param 'system.cpu[0].workload[:].release = "5.4.3"' \
-....
-with:
-....
---cmd '/path/to/linux-kernel-module-cheat/out/userland/default/aarch64/posix/getpid.out;/path/to/linux-kernel-module-cheat/out/userland/default/aarch64/posix/getpid.out' \
---param 'system.cpu[:].workload[:].release = "5.4.3"' \
-....
+We need at least one CPU per executable, just like when forking new processes.
 The outcome of this is that we see two different `pid` messages printed to stdout:
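The CPU allocation behavior described in this hunk can be summarized with a small Python model. This is a hypothetical sketch, not gem5 code. `CpuPool`, `spawn_and_join` and `start_executables` are illustrative names. The model only assumes what the text states: every running thread or executable occupies one CPU, `fork` fails with `EAGAIN` when no CPU is free, and a joined thread frees its CPU.

```python
import errno


class CpuPool:
    """Hypothetical model, not gem5 code: every running thread or executable
    occupies one CPU, and starting a new one fails with EAGAIN if none is free."""

    def __init__(self, ncpus):
        self.free = list(range(ncpus))  # idle CPU ids, lowest first

    def start(self):
        """Claim the lowest free CPU, as fork() or a new executable would."""
        if not self.free:
            # EAGAIN is the errno behind "Resource temporarily unavailable".
            raise OSError(errno.EAGAIN, "Resource temporarily unavailable")
        return self.free.pop(0)

    def release(self, cpu):
        """An exited thread gives its CPU back for later forks."""
        self.free.append(cpu)
        self.free.sort()


def spawn_and_join(ncpus, batches):
    """Start main, then for each batch spawn that many threads and join them.
    Return True if every spawn found a free CPU."""
    pool = CpuPool(ncpus)
    pool.start()  # the main thread keeps its CPU for the whole run
    try:
        for nthreads in batches:
            cpus = [pool.start() for _ in range(nthreads)]
            for cpu in cpus:
                pool.release(cpu)  # join
    except OSError:
        return False
    return True


def start_executables(ncpus, nexecutables):
    """Give each executable the next free CPU at the start of simulation,
    or return None if there are not enough CPUs."""
    pool = CpuPool(ncpus)
    try:
        return [pool.start() for _ in range(nexecutables)]
    except OSError:
        return None


print(spawn_and_join(1, [1]))     # False: main already holds the only CPU
print(spawn_and_join(2, [1]))     # True
print(spawn_and_join(2, [1, 1]))  # True: the joined thread freed its CPU
print(start_executables(2, 2))    # [0, 1]
```

The model hands out the lowest free CPU id first. It is an assumption that gem5 picks CPUs in that order. Only the count of free CPUs matters for predicting when `pthread_create` fails or when there are too few CPUs for the executables.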
@@ -4161,7 +4225,7 @@ pid=101
 pid=100
 ....
-since from <<gem5-process>> we can see that se.py sets up one different PID per executable starting at `100:
+since from <<gem5-process>> we can see that se.py sets up one different PID per executable starting at 100:
 ....
 workloads = options.cmd.split(';')
@@ -4170,8 +4234,6 @@ since from <<gem5-process>> we can see that se.py sets up one different PID per
 process = Process(pid = 100 + idx)
 ....
-This basically starts running one process per CPU, much as if it had been forked.
 We can also see that these processes are running concurrently with <<gem5-tracing>> by hacking:
 ....
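The PID numbering quoted from se.py in these hunks can be checked in isolation. `assign_pids` is a hypothetical helper. It reproduces only the two quoted lines: it splits `--cmd` on `;` and numbers the processes from 100. It does not create real gem5 `Process` objects.

```python
def assign_pids(cmd):
    """Hypothetical helper reproducing only the two se.py lines quoted above:
    split --cmd on ';' and number the processes from PID 100."""
    workloads = cmd.split(';')
    return [(path, 100 + idx) for idx, path in enumerate(workloads)]


print(assign_pids('getpid.out;getpid.out'))
# [('getpid.out', 100), ('getpid.out', 101)]
```

Running the same executable twice still yields two distinct PIDs. This is why the two `getpid.c` instances print different `pid` values.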
@@ -10952,78 +11014,6 @@ Remember <<qemu-user-mode-does-not-show-stdout-immediately>> though.
 At 369a47fc6e5c2f4a7f911c1c058b6088f8824463 + 1 QEMU appears to spawn 3 host threads plus one for every new guest thread created. Remember that link:userland/posix/pthread_count.c[] spawns N + 1 total threads if you count the `main` thread.
-====== gem5 syscall emulation multithreading
-gem5 user mode multithreading has been particularly flaky compared <<qemu-user-mode-multithreading,to QEMU's>>, but work is being put into improving it.
-In gem5 syscall emulation, the `fork` syscall checks if there is a free CPU, and if there is one, the new thread runs on that CPU.
-Otherwise, the `fork` call fails, and so do higher-level interfaces to `fork` such as `pthread_create`, which return an error status in the guest.
-For example, if we use just one CPU for link:userland/posix/pthread_self.c[], which spawns one thread besides `main`:
-....
-./run --cpus 1 --emulator gem5 --userland userland/posix/pthread_self.c --cli-args 1
-....
-fails with this error message coming from the guest stderr:
-....
-pthread_create: Resource temporarily unavailable
-....
-It works, however, if we add one extra CPU:
-....
-./run --cpus 2 --emulator gem5 --userland userland/posix/pthread_self.c --cli-args 1
-....
-Once threads exit, their CPU is freed and becomes available for new `fork` calls. For example, the following run spawns a thread, joins it, and then spawns again, and 2 CPUs are enough:
-....
-./run --cpus 2 --emulator gem5 --userland userland/posix/pthread_self.c --cli-args '1 2'
-....
-because at each point in time, at most two threads are running.
-gem5 syscall emulation does show the expected number of cores when queried, e.g.:
-....
-./run --cpus 1 --userland userland/cpp/thread_hardware_concurrency.cpp --emulator gem5
-./run --cpus 2 --userland userland/cpp/thread_hardware_concurrency.cpp --emulator gem5
-....
-outputs `1` and `2` respectively.
-This can also be clearly seen by running `sched_getcpu`:
-....
-./run \
---arch aarch64 \
---cli-args 4 \
---cpus 8 \
---emulator gem5 \
---userland userland/linux/sched_getcpu.c \
-;
-....
-which necessarily produces an output containing the CPU numbers from 1 to 4 and no higher:
-....
-1
-3
-4
-2
-....
-TODO why does the `2` come at the end here? Would be good to do a detailed assembly run analysis.
-====== gem5 se.py user mode with 2 or more pthreads fails with because simulate() limit reached
-See bug report at: https://github.com/cirosantilli/linux-kernel-module-cheat/issues/81
-Related: <<gem5-simulate-limit-reached>>.
 ====== gem5 ARM full system with more than 8 cores
 https://stackoverflow.com/questions/50248067/how-to-run-a-gem5-arm-aarch64-full-system-simulation-with-fs-py-with-more-than-8
@@ -12874,11 +12864,13 @@ PROTOCOL = 'MOESI_CMP_directory'
 and therefore ARM already compiles `MOESI_CMP_directory` by default.
-Then, with `fs.py` and `se.py`, you can choose to use either the classic or built-in ruby system at runtime with the `--ruby` option:
+Then, with `fs.py` and `se.py`, you can choose at runtime, by passing the `--ruby` option, between the classic memory system and the Ruby system type selected at build time with `PROTOCOL=`:
 * if `--ruby` is given, use the ruby memory system that was compiled into gem5. Caches are always present when Ruby is used, since the main goal of Ruby is to specify the cache coherence protocol, and it therefore hardcodes cache hierarchies.
 * otherwise, use the classic memory system. Caches may be optional for certain CPU types and are enabled with `--caches`.
+It is not possible to build more than one Ruby system into a single build, and this is a major pain point for testing Ruby: https://gem5.atlassian.net/browse/GEM5-467
 For example, to use a two level <<mesi-cache-coherence-protocol>> we can do:
 ....
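The runtime choice described in this hunk can be summarized as a tiny decision function. This is a hypothetical sketch, not fs.py or se.py code. Only the `--ruby` and `--caches` flag names come from the text. The sketch also ignores the note that caches may be optional only for certain CPU types.

```python
import argparse


def memory_system(argv):
    """Hypothetical summary of the rule above, not fs.py/se.py code.
    Returns (memory system, whether caches are present)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--ruby', action='store_true')
    parser.add_argument('--caches', action='store_true')
    args = parser.parse_args(argv)
    if args.ruby:
        # The Ruby protocol was fixed at build time with PROTOCOL=,
        # and Ruby always models caches.
        return ('ruby', True)
    # Classic memory system: caches only if requested.
    return ('classic', args.caches)


print(memory_system([]))            # ('classic', False)
print(memory_system(['--caches']))  # ('classic', True)
print(memory_system(['--ruby']))    # ('ruby', True)
```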
@@ -12935,6 +12927,10 @@ Certain features may not work in Ruby. For example, <<gem5-checkpoint>> creation
 Tested in gem5 d7d9bc240615625141cd6feddbadd392457e49eb.
+===== gem5 Ruby MI_example protocol
+This is the simplest of all protocols, and therefore the first one you should study to learn how Ruby works.
 ===== gem5 crossbar interconnect
 Crossbar, or `XBar` in the code, is the default <<cache-coherence,CPU interconnect>> that gets used by `fs.py` if <<gem5-ruby-build,`--ruby`>> is not given.
@@ -14976,6 +14972,8 @@ If we don't use such instructions that flush memory, we would only see the inter
 .`config.dot.svg` for a system with two TimingSimpleCPU with caches.
 image::{cirosantilli-media-base}gem5_config_TimingSimpleCPU_caches_2_CPUs_12c917de54145d2d50260035ba7fa614e25317a3.svg?sanitize=true[height=600]
+The simplest setup to understand will be to use <<gem5-syscall-emulation-multiple-executables>>.
 ===== gem5 event queue MinorCPU syscall emulation freestanding example analysis
 The events <<gem5-event-queue-atomicsimplecpu-syscall-emulation-freestanding-example-analysis,for the Atomic CPU>> were pretty simple: basically just ticks.


@@ -586,10 +586,15 @@ https://cirosantilli.com/linux-kernel-module-cheat#user-mode-static-executables
 self.add_argument(
 '-u',
 '--userland',
+action='append',
 help='''\
 Run the given userland executable in user mode instead of booting the Linux kernel
 in full system mode. In gem5, user mode is called Syscall Emulation (SE) mode and
 uses se.py. Path resolution is similar to --baremetal.
+* https://cirosantilli.com/linux-kernel-module-cheat#userland-setup-getting-started
+* https://cirosantilli.com/linux-kernel-module-cheat#gem5-syscall-emulation-mode
+This option may be given multiple times only in gem5 syscall emulation:
+https://cirosantilli.com/linux-kernel-module-cheat#gem5-syscall-emulation-multiple-executables
 '''
 )
 self.add_argument(
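With plain Python `argparse`, `action='append'` leaves the value as `None` when the option is absent and builds a list when it is repeated. The sketch assumes LKMC's `self.add_argument` forwards `action` to `argparse` unchanged. Under that assumption, it shows why the `is None` checks in the following hunks become `not env['userland']`, which is true for both `None` and an empty list. It also shows why single-executable paths index `[0]`. `qemu_basename` is a hypothetical helper that mirrors the `qemu_executable_basename` choice.

```python
import argparse

parser = argparse.ArgumentParser()
# Same shape as the new --userland option: repeatable, collected in a list.
parser.add_argument('-u', '--userland', action='append')


def qemu_basename(arch, userland):
    """Hypothetical helper mirroring the qemu_executable_basename choice:
    `not userland` is true for both None and an empty list."""
    if not userland:
        return 'qemu-system-{}'.format(arch)
    return 'qemu-{}'.format(arch)


absent = parser.parse_args([]).userland
repeated = parser.parse_args(['-u', 'a.c', '--userland', 'b.c']).userland
print(absent)                              # None
print(repeated)                            # ['a.c', 'b.c']
print(qemu_basename('aarch64', absent))    # qemu-system-aarch64
print(qemu_basename('aarch64', repeated))  # qemu-aarch64
```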
@@ -848,14 +853,14 @@ Incompatible archs are skipped.
 )
 env['qemu_img_basename'] = 'qemu-img'
 env['qemu_img_executable'] = join(env['qemu_build_dir'], env['qemu_img_basename'])
-if env['userland'] is None:
+if not env['userland']:
 env['qemu_executable_basename'] = 'qemu-system-{}'.format(env['arch'])
 else:
 env['qemu_executable_basename'] = 'qemu-{}'.format(env['arch'])
 if env['qemu_which'] == 'host':
 env['qemu_executable'] = env['qemu_executable_basename']
 else:
-if env['userland'] is None:
+if not env['userland']:
 env['qemu_executable'] = join(
 env['qemu_build_dir'],
 '{}-softmmu'.format(env['arch']),
@@ -1160,8 +1165,8 @@ Incompatible archs are skipped.
 if os.path.exists(source_path):
 env['source_path'] = source_path
 break
-elif env['userland'] is not None:
-env['image'] = self.resolve_userland_executable(env['userland'])
+elif env['userland']:
+env['image'] = self.resolve_userland_executable(env['userland'][0])
 source_path_noext = os.path.splitext(join(
 env['userland_source_dir'],
 env['image'][len(env['userland_build_dir']) + 1:]
@@ -1458,7 +1463,7 @@ lunch aosp_{}-eng
 continue
 else:
 raise Exception('native emulator only supported in if target arch == host arch')
-if env['userland'] is None and not env['mode'] == 'userland':
+if not env['userland'] and not env['mode'] == 'userland':
 if real_all_emulators:
 continue
 else:

run (25 changed lines)

@@ -367,7 +367,7 @@ Extra options to append at the end of the emulator command line.
 if not self.env['_args_given']['gdb_wait']:
 self.env['gdb_wait'] = True
 if not self.env['_args_given']['tmux_args']:
-if self.env['userland'] is None and self.env['baremetal'] is None:
+if not self.env['userland'] and self.env['baremetal'] is None:
 self.env['tmux_args'] = 'start_kernel'
 else:
 self.env['tmux_args'] = 'main'
@@ -453,7 +453,7 @@ Extra options to append at the end of the emulator command line.
 if self.env['emulator'] == 'gem5':
 if (
 self.env['baremetal'] is None and
-self.env['userland'] is None
+not self.env['userland']
 ):
 # This is an attempt to run gem5 from a prebuilt download
 # but it is not working:
@@ -507,7 +507,7 @@ Extra options to append at the end of the emulator command line.
 if self.env['emulator'] == 'gem5':
 if self.env['quiet']:
 show_stdout = False
-if self.env['baremetal'] is None and self.env['userland'] is None:
+if self.env['baremetal'] is None and not self.env['userland']:
 if not os.path.exists(self.env['rootfs_raw_file']):
 if not os.path.exists(self.env['qcow2_file']):
 raise_rootfs_not_found()
@@ -539,17 +539,24 @@ Extra options to append at the end of the emulator command line.
 cpt_dir = cpt_dirs[-self.env['gem5_restore']]
 cpt_dirs_sorted_by_tick = sorted(cpt_dirs, key=lambda x: int(x.split('.')[1]))
 extra_emulator_args.extend(['-r', str(cpt_dirs_sorted_by_tick.index(cpt_dir) + 1)])
-if self.env['userland'] is not None:
+if self.env['userland']:
+cmd_opt = self.env['image']
+for u in self.env['userland'][1:]:
+cmd_opt += ';' + self.resolve_userland_executable(u)
+if len(self.env['userland']) > 1:
+workload_cpus = ':'
+else:
+workload_cpus = '0'
 cmd.extend([
 self.env['gem5_se_file'], LF,
-'--cmd', self.env['image'], LF,
+'--cmd', cmd_opt, LF,
 '--num-cpus', str(self.env['cpus']), LF,
 # We have to use cpu[0] here because on multi-cpu workloads,
 # cpu[1] and higher use workload as a proxy to cpu[0].workload.
 # as can be seen from the config.ini.
 # If system.cpu[:].workload[:] were used instead, we would get the error:
 # "KeyError: 'workload'"
-'--param', 'system.cpu[0].workload[:].release = "{}"'.format(self.env['kernel_version']), LF,
+'--param', 'system.cpu[{}].workload[:].release = "{}"'.format(workload_cpus, self.env['kernel_version']), LF,
 ])
 if self.env['cli_args'] is not None:
 cmd.extend(['--options', self.env['cli_args'], LF])
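The `--cmd` and `--param` construction added to `run` in this hunk can be sketched as a pure function. `se_py_args` is a hypothetical name. Unlike `run`, it takes already resolved executable paths instead of calling `resolve_userland_executable`. It also omits the `LF` separators and `--num-cpus`.

```python
def se_py_args(images, kernel_version):
    """Hypothetical pure-function version of the logic added to run.
    `images` are already resolved executable paths."""
    # se.py --cmd takes a semicolon-separated list of executables.
    cmd_opt = ';'.join(images)
    # One executable: only cpu[0] owns the workload, the other CPUs proxy it.
    # Several executables: set the release on every CPU's workload.
    workload_cpus = ':' if len(images) > 1 else '0'
    return [
        '--cmd', cmd_opt,
        '--param',
        'system.cpu[{}].workload[:].release = "{}"'.format(workload_cpus, kernel_version),
    ]


print(se_py_args(['a.out'], '5.4.3'))
print(se_py_args(['a.out', 'b.out'], '5.4.3'))
```

Keeping `cpu[0]` for the single-executable case preserves the previous behavior. According to the existing comment in this hunk, that behavior avoids the `KeyError: 'workload'` that `cpu[:]` triggers when the other CPUs only proxy `cpu[0].workload`.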
@@ -672,7 +679,7 @@ Extra options to append at the end of the emulator command line.
 qemu_user_and_system_options = [
 '-trace', 'enable={},file={}'.format(trace_type, self.env['qemu_trace_file']), LF,
 ]
-if self.env['userland'] is not None:
+if self.env['userland']:
 if self.env['gdb_wait']:
 debug_args = ['-g', str(self.env['gdb_port']), LF]
 else:
@@ -863,7 +870,7 @@ Extra options to append at the end of the emulator command line.
 if self.env['baremetal']:
 tmux_args += " --baremetal '{}'".format(self.env['baremetal'])
 if self.env['userland']:
-tmux_args += " --userland '{}'".format(self.env['userland'])
+tmux_args += " --userland '{}'".format(self.env['userland'][0])
 if self.env['in_tree']:
 tmux_args += ' --in-tree'
 if self.env['tmux_args'] is not None:
@@ -877,6 +884,8 @@ Extra options to append at the end of the emulator command line.
 cmd.extend(extra_emulator_args)
 cmd.extend(self.env['extra_emulator_args'])
 if self.env['userland'] and self.env['emulator'] in ('qemu', 'native'):
+if len(self.env['userland']) > 1:
+raise Exception('qemu and native machines only support a single executable')
 # The program and arguments must come at the very end of the CLI.
 cmd.extend([self.env['image'], LF])
 if self.env['cli_args'] is not None: