gem5: arm more than 8 cpus with gicv3

Ciro Santilli 六四事件 法轮功
2020-07-22 01:00:00 +00:00
parent 224fae82e1
commit d0ada7f58c

@@ -11162,7 +11162,21 @@ At 369a47fc6e5c2f4a7f911c1c058b6088f8824463 + 1 QEMU appears to spawn 3 host thr
https://stackoverflow.com/questions/50248067/how-to-run-a-gem5-arm-aarch64-full-system-simulation-with-fs-py-with-more-than-8
With <<arm-gic,GICv3>>, tested at LKMC 224fae82e1a79d9551b941b19196c7e337663f22, gem5 3ca404da175a66e0b958165ad75eb5f54cb5e772, on a vanilla kernel:
....
./run \
  --arch aarch64 \
  --emulator gem5 \
  --cpus 16 \
  -- \
  --machine-type VExpress_GEM5_V2 \
;
....
boots to a shell and `nproc` shows `16`.
For the GICv2 extension method, build the kernel with the <<gem5-arm-linux-kernel-patches>>, and then run:
....
./run \
@@ -15429,6 +15443,7 @@ The resulting trace is:
191000: Event: Event_85: generic 85 executed @ 191000
....
So yes, `--caches` does work here, leading to a runtime of 191000 rather than 469000 without caches!
Notably, we now see that very little time passed between the first and second instructions which are marked with `ExecEnable` in #39 and #47, presumably because rather than going out all the way to the DRAM system the event chain stops right at the `icache.cpu_side` when a hit happens, which must have been the case for the second instruction, which is just adjacent to the first one.
@@ -15544,6 +15559,24 @@ We can confirm this with `--trace DRAM` which shows:
Contrast this with the non `--cache` version seen at <<timingsimplecpu-analysis-5>> in which DRAM only actually reads the 4 required bytes.
The only cryptic thing about the messages is the `IF` flag, but good computer architects will have guessed it correctly, and https://github.com/gem5/gem5/blob/fa70478413e4650d0058cbfe81fd5ce362101994/src/mem/packet.cc#L372[src/mem/packet.cc] confirms that it marks an instruction fetch:
....
void
Packet::print(std::ostream &o, const int verbosity,
              const std::string &prefix) const
{
    ccprintf(o, "%s%s [%x:%x]%s%s%s%s%s%s", prefix, cmdString(),
             getAddr(), getAddr() + getSize() - 1,
             req->isSecure() ? " (s)" : "",
             req->isInstFetch() ? " IF" : "",
             req->isUncacheable() ? " UC" : "",
             isExpressSnoop() ? " ES" : "",
             req->isToPOC() ? " PoC" : "",
             req->isToPOU() ? " PoU" : "");
}
....
....
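To make the flag suffixes easier to read in traces, here is a minimal standalone sketch that mimics the formatting of `Packet::print` outside of gem5. `FakePacket` and `formatPacket` are hypothetical names made up for this illustration and are not part of gem5, and the addresses used below are made up rather than taken from a real trace:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for the packet/request state that gem5 queries
// in Packet::print. This is NOT a gem5 type.
struct FakePacket {
    std::string cmd = "ReadReq";
    std::uint64_t addr = 0;
    unsigned size = 0;
    bool secure = false;
    bool instFetch = false;
    bool uncacheable = false;
    bool expressSnoop = false;
    bool toPOC = false;
    bool toPOU = false;
};

// Build "<prefix><cmd> [<first>:<last>]<flags>" with hex addresses,
// appending one short suffix per flag that is set, in the same order
// as the ccprintf call quoted above.
std::string
formatPacket(const FakePacket &pkt, const std::string &prefix = "")
{
    std::ostringstream o;
    o << prefix << pkt.cmd << " [" << std::hex << pkt.addr << ':'
      << (pkt.addr + pkt.size - 1) << ']'
      << (pkt.secure ? " (s)" : "")
      << (pkt.instFetch ? " IF" : "")
      << (pkt.uncacheable ? " UC" : "")
      << (pkt.expressSnoop ? " ES" : "")
      << (pkt.toPOC ? " PoC" : "")
      << (pkt.toPOU ? " PoU" : "");
    return o.str();
}
```

With `addr = 0x40`, `size = 64` and `instFetch = true`, `formatPacket` returns `ReadReq [40:7f] IF`: the range is inclusive, so a 64 byte access ends at `addr + 63`, and the `IF` suffix only shows up for instruction fetches.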
Another interesting observation of running with `--trace Cache,DRAM,XBar` is that between the execution of both instructions, there is a `Cache` event, but no `DRAM` or `XBar` events:
....