kernel modules: add a quick scull port from LDD3

Also:

* fix fops.c on both kernels:
  * 5.9: the out of space error code was 1 not 8
  * 6.6: for whatever reason we can't read the user buffer as before on the
         diagnostic print, it leads to segfault and oops
* create memfile.c which is like fops.c but of unlimited size
Ciro Santilli
2025-04-28 15:23:44 +01:00
parent 3d84eccc43
commit e4847e4b40
16 changed files with 2133 additions and 50 deletions


@@ -338,6 +338,14 @@ insmod /mnt/9p/out_rootfs_overlay/lkmc/hello.ko
and the new `pr_info` message should now show on the terminal at the end of the boot.
If you are developing a test script and its kernel module at the same time, some test scripts accept the path to the kernel module as their first argument, so you can directly run:
....
/mnt/9p/rootfs_overlay/lkmc/scull.sh /mnt/9p/out_rootfs_overlay/lkmc/scull.ko
....
and it will pick up both the test script and the kernel module from the host.
This works because a <<9p>> mount is set up there by default, which mounts the host directory that contains the build outputs on the guest:
....
@@ -7682,11 +7690,19 @@ Bibliography: https://stackoverflow.com/questions/5970595/how-to-create-a-device
==== File operations
File operations are the main method of userland driver communication.
File operations are the main method of userland driver communication that uses common file system calls such as read and write.
`struct file_operations` determines what the kernel will do on filesystem system calls of <<pseudo-filesystems>>.
Through `struct file_operations` drivers tell the kernel what it should do on filesystem system calls of <<pseudo-filesystems>>.
This example illustrates the most basic system calls: `open`, `read`, `write`, `close` and `lseek`:
[[fops]]
===== fops.c
This example illustrates the most basic system calls: `open`, `read`, `write`, `close` and `lseek`.
* link:kernel_modules/fops.c[]
* link:rootfs_overlay/lkmc/fops.sh[]
In it we create a debugfs special file that behaves like a regular file, except that it is stored in memory for as long as the kernel module is loaded, and it has a fixed length of 4 bytes. A `write` that would go past the end is rejected with `ENOSPC` and leaves the contents unchanged:
....
./fops.sh
@@ -7699,11 +7715,6 @@ Outcome: the test passes:
0
....
Sources:
* link:kernel_modules/fops.c[]
* link:rootfs_overlay/lkmc/fops.sh[]
Then give this a try:
....
@@ -7714,6 +7725,14 @@ We have put printks on each fop, so this allows you to see which system calls ar
No, there is no official documentation: https://stackoverflow.com/questions/15213932/what-are-the-struct-file-operations-arguments
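As a rough userland sketch of the bounds handling in the `read` fop (this is not the module code itself: `model_read` is a hypothetical name, and `memcpy` stands in for `copy_to_user`):

```c
#include <stddef.h>
#include <string.h>

#define BUFLEN 4

/* Stand-in for the module's 4-byte in-memory file. */
static char data[BUFLEN] = {'a', 'b', 'c', 'd'};

/* Copy at most len bytes starting at *off, advance *off by the amount
 * copied, and return 0 (end of file) once *off reaches BUFLEN. */
long model_read(char *buf, size_t len, long long *off)
{
	size_t n;

	if (*off < 0 || (size_t)*off >= BUFLEN)
		return 0;
	n = BUFLEN - (size_t)*off;
	if (len < n)
		n = len;
	memcpy(buf, data + *off, n);
	*off += (long long)n;
	return (long)n;
}
```

Calling it repeatedly with the same offset variable mimics what `cat` does: it keeps reading until a call returns 0.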
[[memfile]]
====== memfile.c
This example behaves the same as <<fops>>, except that the in-memory virtual file has unlimited size. The kernel module therefore has to do a bit of memory management and grow the buffer as needed.
* link:kernel_modules/memfile.c[]
* link:rootfs_overlay/lkmc/memfile.sh[]
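The growth strategy can be sketched in userland as follows. This is only an approximation under stated assumptions: `realloc` stands in for the kernel allocator, and `struct dyn_arr` is an illustrative name rather than the exact type used in the module.

```c
#include <stdlib.h>
#include <string.h>

struct dyn_arr {
	char *buf;
	size_t used; /* logical file length */
	size_t size; /* allocated capacity */
};

/* Make room for len bytes at offset off. Returns 0 on success, or -1 on
 * allocation failure, in which case the old buffer is left untouched. */
int dyn_arr_reserve(struct dyn_arr *a, size_t off, size_t len)
{
	size_t new_used = off + len;

	if (new_used > a->size) {
		/* Double the requirement to amortize future growth. */
		size_t new_size = new_used * 2;
		char *new_buf = realloc(a->buf, new_size);

		if (!new_buf)
			return -1;
		a->buf = new_buf;
		a->size = new_size;
	}
	/* Writing past the current end leaves a hole: fill it with zeros. */
	if (off > a->used)
		memset(a->buf + a->used, 0, off - a->used);
	if (new_used > a->used)
		a->used = new_used;
	return 0;
}
```

The zero-filled hole is what makes a seek past the end followed by a write read back as NUL bytes, like a sparse regular file.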
[[seq-file]]
==== seq_file
@@ -9994,6 +10013,89 @@ See also:
* https://stackoverflow.com/questions/5429137/how-to-print-register-values-in-gdb/31340294#31340294
* https://stackoverflow.com/questions/24169614/how-to-show-all-x86-control-registers-when-debugging-the-linux-kernel-in-gdb-thr/59311764#59311764
[[scull]]
==== scull
This kernel module is a port of the scull example from LDD3. It was tested on LKMC e1834763088b8a7532b5fae800039de880471f2d + 1 with Linux kernel 6.8.12.
"Scull" is an acronym for "Simple Character Utility for Loading Localities". This expansion is mostly meaningless, but there you are.
Source code:
* link:kernel_modules/scull.c[]
* link:rootfs_overlay/lkmc/scull.sh[]
Create the devices and test them:
....
scull.sh
....
scull creates several character devices.
The most "basic" one is `/dev/scull0`, which acts a bit like an in-memory file, except that its storage is quantized in a way that prevents normal appends, so it doesn't really look like a regular file. It is actually more like an object pool.
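To get a feel for the quantization, here is a small userland sketch of how a file position maps to a location inside the device. It assumes the LDD3 default sizes (a 4000-byte quantum and 1000 quanta per list item), and `scull_locate` and `scull_clamp` are hypothetical helper names, not functions from the module:

```c
#include <stddef.h>

/* Assumed defaults, as in the LDD3 scull sources. */
#define SCULL_QUANTUM 4000 /* bytes per quantum */
#define SCULL_QSET 1000    /* quanta per list item */

struct scull_pos {
	long item;  /* index of the list item */
	long s_pos; /* quantum index inside the item */
	long q_pos; /* byte offset inside the quantum */
};

/* Split a file position into list item, quantum and byte offset. */
struct scull_pos scull_locate(long f_pos)
{
	long itemsize = (long)SCULL_QUANTUM * SCULL_QSET;
	long rest = f_pos % itemsize;
	struct scull_pos p;

	p.item = f_pos / itemsize;
	p.s_pos = rest / SCULL_QUANTUM;
	p.q_pos = rest % SCULL_QUANTUM;
	return p;
}

/* A single read or write never crosses a quantum boundary. */
size_t scull_clamp(long f_pos, size_t count)
{
	size_t left = (size_t)(SCULL_QUANTUM - scull_locate(f_pos).q_pos);

	return count < left ? count : left;
}
```

Under these assumptions, a 100-byte write at position 3990 is cut down to 10 bytes, and userland has to issue another `write` for the rest.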
The original scull interface is odd: it erases all data when the device is opened write-only (`O_WRONLY`), but not when it is opened read/write (`O_RDWR`), which doesn't make much sense:
....
int scull_open(struct inode *inode, struct file *filp) {
if ( (filp->f_flags & O_ACCMODE) == O_WRONLY)
scull_trim(dev); /* ignore errors */
....
We have modified that to the much more reasonable:
....
if ((filp->f_flags & O_TRUNC)) {
....
The old truncation condition makes the code hard to test from the shell: every write-only open clears the device, so there is no way to write to two different blocks and keep both in memory, unless you find a CLI tool that opens with `O_RDWR` or you write a C program to test things.
With our new interface, the shell can choose between clearing and not clearing in the usual manner, e.g. this clears:
....
echo asdf > /dev/scull0
....
but this doesn't:
....
echo asdf >> /dev/scull0
....
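The difference between the two conditions can be checked in userland with the flags that a POSIX shell typically passes to `open` for each redirection. This is only a sketch: `old_should_trim` and `new_should_trim` are hypothetical helpers that isolate the two tests, and the `REDIRECT_*` macros are illustrative names.

```c
#include <fcntl.h>

/* Flags a POSIX shell typically passes to open() for each redirection. */
#define REDIRECT_OVERWRITE (O_WRONLY | O_CREAT | O_TRUNC)  /* cmd > file */
#define REDIRECT_APPEND    (O_WRONLY | O_CREAT | O_APPEND) /* cmd >> file */

/* Original LDD3 condition: trim on any write-only open. */
int old_should_trim(int f_flags)
{
	return (f_flags & O_ACCMODE) == O_WRONLY;
}

/* Modified condition: trim only when truncation was requested. */
int new_should_trim(int f_flags)
{
	return (f_flags & O_TRUNC) != 0;
}
```

With the old condition both `>` and `>>` wipe the device, whereas with the new one only `>` does.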
The examples from our test should make its weird behavior clearer e.g.:
....
# Append starts writing from the start of the 4k block, not like the usual semantic.
printf asdf > "$f"
printf qw >> "$f"
[ "$(cat "$f")" = qwdf ]
# Overwrite first clears everything, then writes to start of 4k block.
printf asdf > "$f"
printf qw > "$f"
[ "$(cat "$f")" = qw ]
# Read from the middle
printf asdf > "$f"
[ "$(dd if="$f" bs=1 count=2 skip=2 status=none)" = df ]
# Write to the middle
printf asdf > "$f"
printf we | dd of="$f" bs=1 seek=1 conv=notrunc status=none
[ "$(cat "$f")" = awef ]
....
It is also worth noting that the implementation of scull is meant to be "readable" but not optimal:
____
kmalloc is not the most efficient way to allocate large areas of memory (see Chapter 8), so the implementation chosen for scull is not a particularly smart one. The source code for a smart implementation would be more difficult to read, and the aim of this section is to show read and write, not memory management. That's why the code just uses kmalloc and kfree without resorting to allocation of whole pages, although that approach would be more efficient.
____
Another shortcoming of the example is that it uses mutexes where an rwsem would be the better choice, since an rwsem lets multiple readers proceed concurrently.
This module was derived from https://github.com/martinezjavier/ldd3/tree/30f801cd0157e8dfb41193f471dc00d8ca10239f/scull which had already ported it to much more recent kernel versions for us. Ideally we should just use that repo as a submodule, but we were too lazy to set up Buildroot properly for now, and decided to dump it all into a single file to start with.
== FreeBSD
https://en.wikipedia.org/wiki/FreeBSD
@@ -28112,6 +28214,14 @@ The `--linux-build-id` option should be passed to all scripts that support it, m
To run both kernels simultaneously, one on each QEMU instance, see: xref:simultaneous-runs[xrefstyle=full].
You can also build <<kernel-modules>> against a specific prebuilt kernel with:
....
./build-modules --linux-build-id v4.16
....
This will then allow you to insmod the kernel modules on your newly built kernel.
==== QEMU build variants
Analogous to the <<linux-kernel-build-variants>> but with the `--qemu-build-id` option instead:


@@ -0,0 +1,37 @@
// This workspace exists to work on C files formatted like the Linux kernel,
// notably using tabs instead of spaces. This is unlike the C files in our userland
// programs, and we couldn't find a better way to make this distinction:
// https://stackoverflow.com/questions/47405315/visual-studio-code-and-subfolder-specific-settings
{
"folders": [
{
"path": "."
},
{
"path": "submodules/linux"
}
],
"settings": {
"files.watcherExclude": {
"data/**": true,
".git/**": true,
"out.docker/**": true,
"out/**": true,
"submodules/**": true,
},
"search.exclude": {
"data/**": true,
".git/**": true,
"out.docker/**": true,
"out/**": true,
"submodules/**": true,
},
"[c]": {
"editor.tabSize": 8,
"editor.insertSpaces": false
},
"files.associations": {
"rwsem.h": "c"
}
}
}


@@ -1,6 +0,0 @@
{
"[c]": {
"editor.tabSize": 8,
"editor.insertSpaces": false
}
}


@@ -1,4 +1,4 @@
/* https://cirosantilli.com/linux-kernel-module-cheat#file-operations */
/* https://cirosantilli.com/linux-kernel-module-cheat#fops */
#include <linux/debugfs.h>
#include <linux/errno.h> /* EFAULT */
@@ -10,7 +10,15 @@
#include <uapi/linux/stat.h> /* S_IRUSR */
static struct dentry *debugfs_file;
// The buffer can be stored in two ways: static module data or kmalloc.
#define STATIC 1
#if STATIC
static char data[] = {'a', 'b', 'c', 'd'};
#define BUFLEN sizeof(data)
#else
static char *data;
#define BUFLEN 4
#endif
static int open(struct inode *inode, struct file *filp)
{
@@ -19,7 +27,7 @@ static int open(struct inode *inode, struct file *filp)
}
/* @param[in,out] off: gives the initial position into the buffer.
* We must increment this by the ammount of bytes read.
* We must increment this by the amount of bytes read.
* Then when userland reads the same file descriptor again,
* we start from that point instead.
*/
@@ -27,20 +35,17 @@ static ssize_t read(struct file *filp, char __user *buf, size_t len, loff_t *off
{
ssize_t ret;
pr_info("read\n");
pr_info("len = %zu\n", len);
pr_info("off = %lld\n", (long long)*off);
if (sizeof(data) <= *off) {
pr_info("read len=%zu off=%lld\n", len, (long long)*off);
if (BUFLEN <= *off) {
ret = 0;
} else {
ret = min(len, sizeof(data) - (size_t)*off);
ret = min(len, BUFLEN - (size_t)*off);
if (copy_to_user(buf, data + *off, ret)) {
ret = -EFAULT;
} else {
*off += ret;
}
}
pr_info("buf = %.*s\n", (int)len, buf);
pr_info("ret=%lld\n", (long long)ret);
return ret;
}
@@ -54,13 +59,11 @@ static ssize_t write(struct file *filp, const char __user *buf, size_t len, loff
{
ssize_t ret;
pr_info("write\n");
pr_info("len = %zu\n", len);
pr_info("off = %lld\n", (long long)*off);
if (sizeof(data) <= *off) {
pr_info("write len=%zu off=%lld\n", len, (long long)*off);
if (BUFLEN <= *off) {
ret = 0;
} else {
if (sizeof(data) - (size_t)*off < len) {
if (BUFLEN - (size_t)*off < len) {
ret = -ENOSPC;
} else {
if (copy_from_user(data + *off, buf, len)) {
@@ -89,9 +92,7 @@ static loff_t llseek(struct file *filp, loff_t off, int whence)
{
loff_t newpos;
pr_info("llseek\n");
pr_info("off = %lld\n", (long long)off);
pr_info("whence = %lld\n", (long long)whence);
pr_info("llseek off=%lld whence=%lld\n", (long long)off, (long long)whence);
switch(whence) {
case SEEK_SET:
newpos = off;
@@ -100,7 +101,7 @@ static loff_t llseek(struct file *filp, loff_t off, int whence)
newpos = filp->f_pos + off;
break;
case SEEK_END:
newpos = sizeof(data) + off;
newpos = BUFLEN + off;
break;
default:
return -EINVAL;
@@ -124,12 +125,24 @@ static const struct file_operations fops = {
static int myinit(void)
{
#if STATIC == 0
data = kmalloc(BUFLEN, GFP_KERNEL);
if (!data)
return -ENOMEM;
data[0] = 'a';
data[1] = 'b';
data[2] = 'c';
data[3] = 'd';
#endif
debugfs_file = debugfs_create_file("lkmc_fops", S_IRUSR | S_IWUSR, NULL, NULL, &fops);
return 0;
}
static void myexit(void)
{
#if STATIC == 0
kfree(data);
#endif
debugfs_remove_recursive(debugfs_file);
}


@@ -1,7 +0,0 @@
{
"folders": [
{
"path": "."
}
],
}

189
kernel_modules/memfile.c Normal file

@@ -0,0 +1,189 @@
/* https://cirosantilli.com/linux-kernel-module-cheat#memfile */
#include <linux/debugfs.h>
#include <linux/errno.h> /* EFAULT */
#include <linux/fs.h> /* file_operations */
#include <linux/kernel.h> /* min */
#include <linux/module.h>
#include <linux/printk.h> /* printk */
#include <linux/string.h> /* strcpy */
#include <linux/uaccess.h> /* copy_from_user, copy_to_user */
#include <linux/rwsem.h>
#include <linux/slab.h> /* kvzalloc, kvrealloc, kvfree */
#include <uapi/linux/stat.h> /* S_IRUSR */
/* Params */
static int log = 0;
module_param(log, int, S_IRUSR | S_IWUSR);
MODULE_PARM_DESC(log, "enable logging");
/* Dynamic array: https://stackoverflow.com/questions/3536153/c-dynamically-growing-array */
typedef struct {
char *buf;
size_t used;
size_t _size;
} dyn_arr_t;
int dyn_arr_init(dyn_arr_t *a, size_t size);
int dyn_arr_init(dyn_arr_t *a, size_t size)
{
a->buf = kvzalloc(size, GFP_KERNEL);
if (!a->buf)
return -ENOMEM;
a->used = 0;
a->_size = size;
return 0;
}
/* Reserve the required space for a future data insertion of size len at offset off.
* We don't do the actual insertion here as there are multiple possible insertion methods
* e.g. copy_from_user or strcpy.
*/
int dyn_arr_reserve(dyn_arr_t *a, size_t off, size_t len);
int dyn_arr_reserve(dyn_arr_t *a, size_t off, size_t len)
{
size_t new_used, new_size;
new_used = off + len;
if (new_used > a->_size) {
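char *new_buf;
new_size = new_used * 2;
/* Keep the old buffer if reallocation fails, so that it is not leaked. */
new_buf = kvrealloc(a->buf, a->_size, new_size, GFP_KERNEL);
if (!new_buf)
return -ENOMEM;
a->buf = new_buf;
a->_size = new_size;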
}
if (off > a->used)
memset(a->buf + a->used, '\0', off - a->used);
if (new_used > a->used)
a->used = new_used;
if (log) pr_info("dyn_arr_reserve _size:=%zu used:=%zu\n", a->_size, a->used);
return 0;
}
void dyn_arr_free(dyn_arr_t *a);
void dyn_arr_free(dyn_arr_t *a)
{
kvfree(a->buf);
a->buf = NULL;
a->used = 0;
a->_size = 0;
}
/* Globals. */
static dyn_arr_t data;
static struct dentry *debugfs_file;
static struct rw_semaphore rwsem;
static int open(struct inode *inode, struct file *filp)
{
if (log) pr_info("open\n");
if ((filp->f_flags & O_TRUNC)) {
if (log) pr_info("open O_TRUNC\n");
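/* Take the write lock so that truncation cannot race with read or write. */
down_write(&rwsem);
data.used = 0;
up_write(&rwsem);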
} else if ((filp->f_flags & O_APPEND)) {
if (log) pr_info("open O_APPEND\n");
filp->f_pos = data.used;
}
return 0;
}
static ssize_t read(struct file *filp, char __user *buf, size_t len, loff_t *off)
{
ssize_t ret;
if (log) pr_info("read len=%zu off=%lld\n", len, (long long)*off);
down_read(&rwsem);
if (data.used <= *off) {
ret = 0;
} else {
ret = min(len, data.used - (size_t)*off);
if (copy_to_user(buf, data.buf + *off, ret)) {
ret = -EFAULT;
} else {
*off += ret;
}
}
up_read(&rwsem);
if (log) pr_info("read ret:=%lld\n", (long long)ret);
return ret;
}
static ssize_t write(struct file *filp, const char __user *buf, size_t len, loff_t *off)
{
ssize_t ret;
if (log) pr_info("write len=%zu off=%lld\n", len, (long long)*off);
down_write(&rwsem);
ret = dyn_arr_reserve(&data, *off, len);
if (ret) {
/* Could not grow the buffer: return the error without copying. */
} else if (copy_from_user(data.buf + *off, buf, len)) {
ret = -EFAULT;
} else {
ret = len;
*off += ret;
}
up_write(&rwsem);
if (log) pr_info("write ret:=%lld\n", (long long)ret);
return ret;
}
static int release(struct inode *inode, struct file *filp)
{
if (log) pr_info("release\n");
return 0;
}
static loff_t llseek(struct file *filp, loff_t off, int whence)
{
loff_t newpos;
if (log) pr_info("llseek off=%lld whence=%lld\n", (long long)off, (long long)whence);
switch(whence) {
case SEEK_SET:
newpos = off;
break;
case SEEK_CUR:
newpos = filp->f_pos + off;
break;
case SEEK_END:
newpos = data.used + off;
break;
default:
return -EINVAL;
}
if (newpos < 0) return -EINVAL;
filp->f_pos = newpos;
if (log) pr_info("llseek newpos:=%lld\n", (long long)newpos);
return newpos;
}
static const struct file_operations fops = {
.owner = THIS_MODULE,
.llseek = llseek,
.open = open,
.read = read,
.release = release,
.write = write,
};
static int myinit(void)
{
int ret;
ret = dyn_arr_init(&data, 1);
if (ret)
return ret;
init_rwsem(&rwsem);
debugfs_file = debugfs_create_file("lkmc_memfile",
S_IRUSR | S_IWUSR, NULL, NULL, &fops);
return 0;
}
static void myexit(void)
{
/* Remove the file first so that no fop can touch the buffer after it is freed. */
debugfs_remove_recursive(debugfs_file);
dyn_arr_free(&data);
}
module_init(myinit)
module_exit(myexit)
MODULE_LICENSE("GPL");

1617
kernel_modules/scull.c Normal file

File diff suppressed because it is too large


@@ -17,7 +17,7 @@ static struct dentry *debugfs_file;
/* Called at the beginning of every read.
*
* The return value is passsed to the first show.
* The return value is passed to the first show.
* It normally represents the current position of the iterator.
* It could be any struct, but we use just a single integer here.
*


@@ -11,6 +11,13 @@
"out.docker/**": true,
"out/**": true,
"submodules/**": true,
}
},
"search.exclude": {
"data/**": true,
".git/**": true,
"out.docker/**": true,
"out/**": true,
"submodules/**": true,
},
}
}


@@ -1,7 +1,7 @@
#!/bin/sh
set -e
insmod character_device.ko
/mknoddev.sh lkmc_character_device
./mknoddev.sh lkmc_character_device
[ "$(cat /dev/lkmc_character_device)" = 'abcd' ]
rm /dev/lkmc_character_device
rmmod character_device


@@ -3,7 +3,9 @@ set -e
# Setup
f=/sys/kernel/debug/lkmc_fops
insmod fops.ko
mod="${1:-fops.ko}"
# Drop the module path, if given, so that "$@" holds only module parameters.
[ "$#" -eq 0 ] || shift
insmod "$mod" "$@"
# read
[ "$(cat "$f")" = abcd ]
@@ -18,7 +20,7 @@ set +e
printf 12345 > "$f"
exit_status="$?"
set -e
[ "$exit_status" -eq 8 ]
[ "$exit_status" -eq 1 ]
[ "$(cat "$f")" = abcd ]
# seek
@@ -26,5 +28,12 @@ printf 1234 > "$f"
printf z | dd bs=1 of="$f" seek=2
[ "$(cat "$f")" = 12z4 ]
# seek past the end
printf 1234 > "$f"
printf xy | dd bs=1 of="$f" seek=6
[ "$(cat "$f")" = 1234 ]
# Teardown
rmmod fops
echo passed

51
rootfs_overlay/lkmc/memfile.sh Executable file

@@ -0,0 +1,51 @@
#!/bin/sh
set -e
# Helpers
odraw() (
od -A n -t x1 -v "$@" | tr -d '\n' | cut -c 2-
)
# Setup
f=/sys/kernel/debug/lkmc_memfile
mod="${1:-memfile.ko}"
# Drop the module path, if given, so that "$@" holds only module parameters.
[ "$#" -eq 0 ] || shift
insmod "$mod" "$@"
# Starts off empty
[ -z "$(cat "$f")" ]
# write and check it is there
printf 12 > "$f"
[ 12 = "$(cat "$f")" ]
# Append and check that it is there
printf 34 >> "$f"
[ 1234 = "$(cat "$f")" ]
# Restart
printf 56 > "$f"
[ 56 = "$(cat "$f")" ]
# skip
printf 1234 > "$f"
[ 23 = "$(dd if="$f" bs=1 count=2 skip=1)" ]
# seek
printf 1234 > "$f"
printf xy | dd bs=1 of="$f" seek=1 conv=notrunc
[ 1xy4 = "$(cat "$f")" ]
# seek past the end
printf 1234 > "$f"
printf xy | dd bs=1 of="$f" seek=6 conv=notrunc
[ '31 32 33 34 00 00 78 79' = "$(odraw "$f")" ]
# Allocate 1 GB for fun.
dd if=/dev/zero of="$f" bs=1k count=1M
[ '00 00' = "$(dd if="$f" bs=1 count=2 skip=500M | odraw)" ]
# Teardown
rmmod memfile
echo passed

59
rootfs_overlay/lkmc/scull.sh Executable file

@@ -0,0 +1,59 @@
#!/bin/sh
set -eux
name=scull
mod="${1:-$name.ko}"
# Drop the module path, if given, so that "$@" holds only module parameters.
[ "$#" -eq 0 ] || shift
insmod "$mod" "$@"
major="$(awk "\$2==\"$name\" {print \$1}" /proc/devices)"
rm -f /dev/${name}[0-3]
mknod /dev/${name}0 c $major 0
mknod /dev/${name}1 c $major 1
mknod /dev/${name}2 c $major 2
mknod /dev/${name}3 c $major 3
rm -f /dev/${name}pipe[0-3]
mknod /dev/${name}pipe0 c $major 4
mknod /dev/${name}pipe1 c $major 5
mknod /dev/${name}pipe2 c $major 6
mknod /dev/${name}pipe3 c $major 7
rm -f /dev/${name}single
mknod /dev/${name}single c $major 8
rm -f /dev/${name}uid
mknod /dev/${name}uid c $major 9
rm -f /dev/${name}wuid
mknod /dev/${name}wuid c $major 10
rm -f /dev/${name}priv
mknod /dev/${name}priv c $major 11
## scull
f="/dev/${name}0"
[ -z "$(cat "$f")" ]
# Append starts writing from the start of the 4k block, not like the usual semantic.
printf asdf > "$f"
printf qw >> "$f"
[ qwdf = "$(cat "$f")" ]
# Overwrite first clears everything, then writes to start of 4k block.
printf asdf > /dev/${name}0
printf qw > /dev/${name}0
[ qw = "$(cat "$f")" ]
# Read from the middle
printf asdf > /dev/${name}0
[ df = "$(dd if="$f" bs=1 count=2 skip=2 status=none)" ]
# Write to the middle
printf asdf > "$f"
printf we | dd of="$f" bs=1 seek=1 conv=notrunc status=none
[ awef = "$(cat "$f")" ]
echo passed


@@ -1,7 +1,9 @@
#!/bin/sh
set -e
f=/sys/kernel/debug/lkmc_seq_file
insmod seq_file.ko
mod="${1:-seq_file.ko}"
# Drop the module path, if given, so that "$@" holds only module parameters.
[ "$#" -eq 0 ] || shift
insmod "$mod" "$@"
[ "$(cat "$f")" = "$(printf '0\n1\n2\n')" ]
[ "$(cat "$f")" = "$(printf '0\n1\n2\n')" ]
[ "$(dd if="$f" bs=1 count=2 skip=0 status=none)" = "$(printf '0\n')" ]


@@ -1,7 +1,9 @@
#!/bin/sh
set -e
f=/sys/kernel/debug/lkmc_seq_file_single_open
insmod seq_file_single_open.ko
mod="${1:-seq_file_single_open.ko}"
# Drop the module path, if given, so that "$@" holds only module parameters.
[ "$#" -eq 0 ] || shift
insmod "$mod" "$@"
[ "$(cat "$f")" = "$(printf 'ab\ncd\n')" ]
[ "$(dd if="$f" bs=1 count=3 skip=1)" = "$(printf "b\nc\n")" ]
rmmod seq_file_single_open