diff --git a/README.adoc b/README.adoc
index a32e63b..1d6f469 100644
--- a/README.adoc
+++ b/README.adoc
@@ -4456,6 +4456,233 @@ Bibliography:
 * https://stackoverflow.com/questions/39134990/mmap-of-dev-mem-fails-with-invalid-argument-for-virt-to-phys-address-but-addre/45127582#45127582
 * https://stackoverflow.com/questions/43325205/can-we-use-virt-to-phys-for-user-space-memory-in-kernel-module
+===== Userland physical address experiments
+
+Only tested on x86_64.
+
+The Linux kernel exposes physical addresses to userland through:
+
+* `/proc/<pid>/maps`
+* `/proc/<pid>/pagemap`
+* `/dev/mem`
+
+In this section we will play with them.
+
+First get a virtual address to play with:
+
+....
+/virt_to_phys_test.out &
+....
+
+Source: link:kernel_module/user/virt_to_phys_test.c[]
+
+Sample output:
+
+....
+vaddr 0x600800
+pid 110
+....
+
+The program:
+
+* allocates a `volatile` variable and sets its value to `0x12345678`
+* prints the virtual address of the variable, and the program PID
+* runs a `while` loop until the value of the variable gets mysteriously changed somehow, e.g. by nasty tinkerers like us
+
+Then, translate the virtual address to physical using `/proc/<pid>/maps` and `/proc/<pid>/pagemap`:
+
+....
+/virt_to_phys_user.out 110 0x600800
+....
+
+Sample output physical address:
+
+....
+0x7c7b800
+....
+
+Source: link:kernel_module/user/virt_to_phys_user.c[]
+
+Now we can verify that `virt_to_phys_user.out` gave the correct physical address in the following ways:
+
+* <>
+* <>
+
+Bibliography:
+
+* https://stackoverflow.com/questions/17021214/decode-proc-pid-pagemap-entry/45126141#45126141
+* https://stackoverflow.com/questions/6284810/proc-pid-pagemaps-and-proc-pid-maps-linux/45500208#45500208
+
+====== QEMU xp
+
+The `xp` <> command reads memory at a given physical address.
+
+First launch `virt_to_phys_user.out` as described at <>.
+
+On a second terminal, use QEMU to read the physical address:
+
+....
+./qemumonitor 'xp 0x7c7b800'
+....
+
+Output:
+
+....
+0000000007c7b800: 0x12345678
+....
+
+Yes!!! We read the correct value from the physical address.
+
+However, we could not find a way to write to memory from the QEMU monitor, boring.
+
+====== /dev/mem
+
+`/dev/mem` exposes access to physical addresses, and we use it through the convenient `devmem` BusyBox utility.
+
+First launch `virt_to_phys_user.out` as described at <>.
+
+Next, read from the physical address:
+
+....
+devmem 0x7c7b800
+....
+
+Possible output:
+
+....
+Memory mapped at address 0x7ff7dbe01000.
+Value at address 0X7C7B800 (0x7ff7dbe01800): 0x12345678
+....
+
+which shows that the physical memory contains the expected value `0x12345678`.
+
+`0x7ff7dbe01000` is the start of a new virtual page that `devmem` maps onto the physical page containing our address, so that it can read from it.
+
+Modify the physical memory:
+
+....
+devmem 0x7c7b800 w 0x9abcdef0
+....
+
+Within about one second, we see on the screen:
+
+....
+i 9abcdef0
+[1]+ Done /virt_to_phys_test.out
+....
+
+so the value changed, and the `while` loop exited!
+
+This example requires:
+
+* `CONFIG_STRICT_DEVMEM=n`, otherwise `devmem` fails with:
++
+....
+devmem: mmap: Operation not permitted
+....
+* `nopat` kernel parameter
+
+which we set by default.
+
+Bibliography: https://stackoverflow.com/questions/11891979/how-to-access-mmaped-dev-mem-without-crashing-the-linux-kernel
+
+====== pagemap_dump.out
+
+Dump the physical address of all pages mapped to a given process using `/proc/<pid>/maps` and `/proc/<pid>/pagemap`.
+
+First launch `virt_to_phys_user.out` as described at <>. Suppose that the output was:
+
+....
+# /virt_to_phys_test.out &
+vaddr 0x601048
+pid 63
+# /virt_to_phys_user.out 63 0x601048
+0x1a61048
+....
+
+Now obtain the page map for the process:
+
+....
+/pagemap_dump.out 63
+....
+
+Sample output excerpt:
+
+....
+vaddr pfn soft-dirty file/shared swapped present library
+400000 1ede 0 1 0 1 /virt_to_phys_test.out
+600000 1a6f 0 0 0 1 /virt_to_phys_test.out
+601000 1a61 0 0 0 1 /virt_to_phys_test.out
+602000 2208 0 0 0 1 [heap]
+603000 220b 0 0 0 1 [heap]
+7ffff78ec000 1fd4 0 1 0 1 /lib/libuClibc-1.0.30.so
+....
+
+Source: link:kernel_module/user/pagemap_dump.c[]
+
+Adapted from: https://github.com/dwks/pagemap/blob/8a25747bc79d6080c8b94eac80807a4dceeda57a/pagemap2.c
+
+Meaning of the columns:
+
+* `vaddr`: first virtual address of a page that belongs to the process. Notably:
++
+....
+./runtc readelf -l out/x86_64/buildroot/build/kernel_module-1.0/user/virt_to_phys_test.out
+....
++
+contains:
++
+....
+  Type           Offset             VirtAddr           PhysAddr
+                 FileSiz            MemSiz              Flags  Align
+...
+  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
+                 0x000000000000075c 0x000000000000075c  R E    0x200000
+  LOAD           0x0000000000000e98 0x0000000000600e98 0x0000000000600e98
+                 0x00000000000001b4 0x0000000000000218  RW     0x200000
+
+ Section to Segment mapping:
+  Segment Sections...
+...
+   02     .interp .hash .dynsym .dynstr .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
+   03     .ctors .dtors .jcr .dynamic .got.plt .data .bss
+....
++
+from which we deduce that:
++
+** `400000` is the text segment
+** `600000` is the data segment
+* `pfn`: append three hexadecimal zeroes to it, and you have the physical address of the page.
++
+Three hexadecimal zeroes are 12 bits, which corresponds to 4 KiB, the size of a page.
++
+For example, the virtual address `0x601000` has `pfn` of `0x1a61`, which means that its physical address is `0x1a61000`.
++
+This is consistent with what `virt_to_phys_user.out` told us: the virtual address `0x601048` has physical address `0x1a61048`.
++
+`048` is the offset within the page: it takes the place of the three appended zeroes.
++
+Also, this address falls inside the page that starts at `0x601000`, which as previously analyzed belongs to the data segment, the normal location for global variables such as ours.
+* `soft-dirty`: TODO
+* `file/shared`: TODO. `1` seems to indicate that the page can be shared across processes, possibly for read-only pages? E.g. the text segment has `1`, but the data has `0`.
+* `swapped`: TODO swapped to disk?
+* `present`: TODO vs swapped?
+* `library`: which executable owns that page
+
+This program works in two steps:
+
+* parse the human readable lines from `/proc/<pid>/maps`. This file contains lines of the form:
++
+....
+7ffff7b6d000-7ffff7bdd000 r-xp 00000000 fe:00 658 /lib/libuClibc-1.0.22.so
+....
++
+which tells us that:
++
+** `7ffff7b6d000-7ffff7bdd000` is a virtual address range that belongs to the process, possibly containing multiple pages.
+** `/lib/libuClibc-1.0.22.so` is the name of the library that owns that memory
+* loop over each page of each address range, and ask `/proc/<pid>/pagemap` for more information about that page, including the physical address
+
 === Linux kernel tracing
 Good overviews:
@@ -4880,6 +5107,12 @@ This example should handle interrupts from userland and print a message to stdou
 TODO: what is the expected behaviour? I should have documented this when I wrote this stuff, and I'm that lazy right now that I'm in the middle of a refactor :-)
+UIO interface in a nutshell:
+
+* blocking `read` / `poll`: wait until an interrupt arrives
+* `write`: calls the `irqcontrol` callback. Default: `1` enables and `0` disables interrupts.
+* `mmap`: access device memory
+
 Sources:
 * link:kernel_module/user/uio_read.c[]
@@ -5805,7 +6038,7 @@ as:
 Memory at feb54000
 ....
-Then you can try messing with that address with:
+Then you can try messing with that address with <>:
 ....
 devmem 0xfeb54000 w 0x12345678
@@ -6029,14 +6262,12 @@ Expected outcome after insmod:
 * QEMU reports MMIO with printfs
 * IRQs are generated and handled by this module, which logs to dmesg
-Also without insmoding this module, try:
+Without insmoding this module, try writing to the register with <>:
 ....
 devmem 0x101e9000 w 0x12345678
 ....
-which touches the register from userland through `/dev/mem`.
-
 We can also observe the interrupt with <>:
 ....
diff --git a/kernel_config_fragment/default b/kernel_config_fragment/default
index 1e1b5be..a91c8b0 100644
--- a/kernel_config_fragment/default
+++ b/kernel_config_fragment/default
@@ -1,4 +1,5 @@
 CONFIG_BLK_DEV_INITRD=y
+CONFIG_STRICT_DEVMEM=n
 CONFIG_DYNAMIC_DEBUG=y
 CONFIG_MODULE_SRCVERSION_ALL=y
 CONFIG_OVERLAY_FS=y
@@ -101,19 +102,6 @@ CONFIG_X86_PTDUMP=y
 ## UIO
-# Userspace drivers: allow you to handle IRQs and do memory IO from userland through a /dev file.
-#
-# Superseded by the more featureful VFIO.
-#
-# Documentation/DocBook/uio-howto.tmpl contains actual userland examples
-# for the generic examples under drivers/uio
-#
-# UIO interface in a nutshell:
-#
-# - blocking read / poll: waits until interrupts
-# - write: call irqcontrol callback. Default: 0 or 1 to enable / disable interrupts.
-# - mmap: access device memory
-
 # All other UIO depend on this module.
 CONFIG_UIO=m
diff --git a/kernel_module/user/README.adoc b/kernel_module/user/README.adoc
index 5df750c..be7eb08 100644
--- a/kernel_module/user/README.adoc
+++ b/kernel_module/user/README.adoc
@@ -1,6 +1,3 @@
 https://github.com/cirosantilli/linux-kernel-module-cheat#rootfs_overlay
 . link:sched_getaffinity.c[]
-. link:usermem.c[]
-.. link:pagemap_dump.c[]
-. link:uio_read.c[]
diff --git a/kernel_module/user/pagemap_dump.c b/kernel_module/user/pagemap_dump.c
index bfa7cb1..f4ca4f9 100644
--- a/kernel_module/user/pagemap_dump.c
+++ b/kernel_module/user/pagemap_dump.c
@@ -1,29 +1,4 @@
-/*
-Only tested in x86_64.
-
-Adapted from: https://github.com/dwks/pagemap/blob/8a25747bc79d6080c8b94eac80807a4dceeda57a/pagemap2.c
-
-- https://stackoverflow.com/questions/17021214/how-to-decode-proc-pid-pagemap-entries-in-linux/45126141#45126141
-- https://stackoverflow.com/questions/5748492/is-there-any-api-for-determining-the-physical-address-from-virtual-address-in-li
-- https://stackoverflow.com/questions/6284810/proc-pid-pagemaps-and-proc-pid-maps-linux/45500208#45500208
-
-Dump the page map of a given process PID.
-
-Data sources: /proc/PIC/{map,pagemap}
-
-This program works in two steps:
-
-- parse the human readable lines lines from `/proc/<pid>/maps`. This files contains lines of form:
-
-    7ffff7b6d000-7ffff7bdd000 r-xp 00000000 fe:00 658 /lib/libuClibc-1.0.22.so
-
-  which gives us:
-
-  - `7f8af99f8000-7f8af99ff000`: a virtual address range that belong to the process, possibly containing multiple pages.
-  - `/lib/libuClibc-1.0.22.so` the name of the library that owns that memory.
-
-- loop over each page of each address range, and ask `/proc/<pid>/pagemap` for more information about that page, including the physical address.
-*/
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#pagemap_dump-out */
 #define _XOPEN_SOURCE 700
 #include
@@ -63,7 +38,7 @@ int main(int argc, char **argv)
         perror("open pagemap");
         return EXIT_FAILURE;
     }
-    printf("addr pfn soft-dirty file/shared swapped present library\n");
+    printf("vaddr pfn soft-dirty file/shared swapped present library\n");
     for (;;) {
         ssize_t length = read(maps_fd, buffer + offset, sizeof buffer - offset);
         if (length <= 0) break;
@@ -116,11 +91,11 @@ int main(int argc, char **argv)
             /* Get info about all pages in this page range with pagemap. */
             {
                 PagemapEntry entry;
-                for (uintptr_t addr = low; addr < high; addr += sysconf(_SC_PAGE_SIZE)) {
+                for (uintptr_t vaddr = low; vaddr < high; vaddr += sysconf(_SC_PAGE_SIZE)) {
                     /* TODO always fails for the last page (vsyscall), why? pread returns 0. */
-                    if (!pagemap_get_entry(&entry, pagemap_fd, addr)) {
+                    if (!pagemap_get_entry(&entry, pagemap_fd, vaddr)) {
                         printf("%jx %jx %u %u %u %u %s\n",
-                            (uintmax_t)addr,
+                            (uintmax_t)vaddr,
                             (uintmax_t)entry.pfn,
                             entry.soft_dirty,
                             entry.file_page,
diff --git a/kernel_module/user/usermem.c b/kernel_module/user/usermem.c
deleted file mode 100644
index 091f3b1..0000000
--- a/kernel_module/user/usermem.c
+++ /dev/null
@@ -1,91 +0,0 @@
-/*
-Only tested in x86_64.
-
-Provide an allocated userland memory for us to test out kernel memory APIs, including:
-
-- /proc/pid/maps
-- /proc/pid/pagemap. See also: https://stackoverflow.com/questions/17021214/decode-proc-pid-pagemap-entry/45126141#45126141
-- /dev/mem
-
-Usage:
-
-    /usermem.out &
-
-Outputs the virtual address and pid, e.g.:
-
-    vaddr 0x600800
-    pid 110
-
-Translate the virtual address to physical for the given PID:
-
-    /virt_to_phys_user.out 110 0x600800
-
-Sample output physical address:
-
-    0x7c7b800
-
-## QEMU monitor xp
-
-Examine the physical memory from the QEMU monitor: on host:
-
-    ./qemumonitor
-    xp 0x7c7b800
-
-Output:
-
-    0000000007c7b800: 0x12345678
-
-Yes!!! We read the correct value from the physical address.
-
-## /dev/mem
-
-Firt up, this requires:
-
-- CONFIG_STRICT_DEVMEM is not set.
-- nopat on kernel parameters
-
-see: https://stackoverflow.com/questions/11891979/how-to-access-mmaped-dev-mem-without-crashing-the-linux-kernel
-
-Then:
-
-    devmem 0x7c7b800
-
-Possible output:
-
-    Memory mapped at address 0x7ff7dbe01000.
-    Value at address 0X7C7B800 (0x7ff7dbe01800): 0x12345678
-
-where 0x7ff7dbe01000 is a new virtual address that was mapped
-to our physical address and given to the process that mapped /dev/mem.
-
-And finally, let's change the value!
-
-    devmem 0x7c7b800 w 0x9abcdef0
-
-After one second, we see on the screen:
-
-    i 9abcdef0
-    [1]+ Done /usermem.out
-
-so the while loop was exited!
-*/
-
-#define _XOPEN_SOURCE 700
-#include <stdint.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <unistd.h>
-
-enum { I0 = 0x12345678 };
-
-static volatile uint32_t i = I0;
-
-int main(void) {
-    printf("vaddr %p\n", (void *)&i);
-    printf("pid %ju\n", (uintmax_t)getpid());
-    while (i == I0) {
-        sleep(1);
-    }
-    printf("i %jx\n", (uintmax_t)i);
-    return EXIT_SUCCESS;
-}
diff --git a/kernel_module/user/virt_to_phys_test.c b/kernel_module/user/virt_to_phys_test.c
new file mode 100644
index 0000000..830b8f6
--- /dev/null
+++ b/kernel_module/user/virt_to_phys_test.c
@@ -0,0 +1,21 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#userland-physical-address-experiments */
+
+#define _XOPEN_SOURCE 700
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+enum { I0 = 0x12345678 };
+
+static volatile uint32_t i = I0;
+
+int main(void) {
+    printf("vaddr %p\n", (void *)&i);
+    printf("pid %ju\n", (uintmax_t)getpid());
+    while (i == I0) {
+        sleep(1);
+    }
+    printf("i %jx\n", (uintmax_t)i);
+    return EXIT_SUCCESS;
+}
diff --git a/kernel_module/user/virt_to_phys_user.c b/kernel_module/user/virt_to_phys_user.c
index 4ded48a..4d96966 100644
--- a/kernel_module/user/virt_to_phys_user.c
+++ b/kernel_module/user/virt_to_phys_user.c
@@ -1,10 +1,4 @@
-/*
-Convert a virtual address to physical for a given process PID using /proc/PID/pagemap.
-
-https://stackoverflow.com/questions/5748492/is-there-any-api-for-determining-the-physical-address-from-virtual-address-in-li/45128487#45128487
-
-Test this out with usermem.c.
-*/
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#userland-physical-address-experiments */
 #define _XOPEN_SOURCE 700
 #include <stdio.h> /* printf */
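
The `pfn` arithmetic described in the README hunk of this diff can be sketched in C. This is a minimal illustration under stated assumptions, not the code of `virt_to_phys_user.c` or `pagemap_dump.c`. The names `PagemapEntrySketch`, `pagemap_decode`, `pagemap_offset` and `phys_addr` are hypothetical, and the bit positions come from the kernel's pagemap documentation rather than from this diff. On Linux 4.2 and later the PFN field reads as zero without `CAP_SYS_ADMIN`, so a real reader of `/proc/<pid>/pagemap` would likely need to run as root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one 64-bit /proc/<pid>/pagemap entry.
 * Hypothetical helper type: the field names mirror the columns
 * printed by pagemap_dump.out, the layout is an assumption taken
 * from the kernel pagemap documentation. */
typedef struct {
    uint64_t pfn;    /* bits 0-54: page frame number, valid if present */
    bool soft_dirty; /* bit 55 */
    bool file_page;  /* bit 61: file-mapped or shared anonymous page */
    bool swapped;    /* bit 62 */
    bool present;    /* bit 63 */
} PagemapEntrySketch;

/* Split a raw pagemap entry into its fields. */
PagemapEntrySketch pagemap_decode(uint64_t raw) {
    PagemapEntrySketch e;
    e.pfn = raw & ((UINT64_C(1) << 55) - 1);
    e.soft_dirty = (raw >> 55) & 1;
    e.file_page = (raw >> 61) & 1;
    e.swapped = (raw >> 62) & 1;
    e.present = (raw >> 63) & 1;
    return e;
}

/* Byte offset inside the pagemap file of the entry that describes vaddr:
 * the file holds one 8-byte entry per virtual page. */
uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size) {
    return (vaddr / page_size) * sizeof(uint64_t);
}

/* Combine the page frame number with the offset of vaddr within its page.
 * With 4 KiB pages this is the "append three hex zeroes" rule. */
uint64_t phys_addr(uint64_t pfn, uint64_t vaddr, uint64_t page_size) {
    /* page_size must be a non-zero power of two for the mask to be valid */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return pfn * page_size + (vaddr & (page_size - 1));
}
```

With the sample values from the README, `phys_addr(0x1a61, 0x601048, 4096)` evaluates to `0x1a61048`, and the entry for `0x601048` sits at byte offset `0x601 * 8` in the pagemap file.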