diff --git a/4.9.11/LKMPG-4.9.11.html b/4.9.11/LKMPG-4.9.11.html index 681cbdc..9bfa216 100644 --- a/4.9.11/LKMPG-4.9.11.html +++ b/4.9.11/LKMPG-4.9.11.html @@ -3,7 +3,7 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- +The Linux Kernel Module Programming Guide is a free book; you may reproduce and/or modify it under the terms of the Open Software License, version 3.0.
@@ -401,18 +401,18 @@ If you publish or distribute this book commercially, donations, royalties, and/oThe Linux Kernel Module Programming Guide was originally written for the 2.2 kernels by Ori Pomerantz. Eventually, Ori no longer had time to maintain the document. After all, the Linux kernel is a fast moving target. Peter Jay Salzman took over maintenance and updated it for the 2.4 kernels. Eventually, Peter no longer had time to follow developments with the 2.6 kernel, so Michael Burian became a co-maintainer to update the document for the 2.6 kernels. Bob Mottram updated the examples for 3.8 and later kernels, added the sysfs chapter and modified or updated other chapters.
The Linux kernel is a moving target. There has always been a question whether the LKMPG should remove deprecated information or keep it around for historical sake. Michael Burian and I decided to create a new branch of the LKMPG for each new stable kernel version. So version LKMPG 4.9.x will address Linux kernel 4.9.x and LKMPG 2.6.x will address Linux kernel 2.6. No attempt will be made to archive historical information; a person wishing this information should read the appropriately versioned LKMPG.
@@ -423,18 +423,18 @@ The source code and discussions should apply to most architectures, but I can'tThe following people have contributed corrections or good suggestions: Ignacio Martin, David Porter, Daniele Paolo Scarpazza, Dimo Velev, Francois Audeon, Horst Schirmeier, Bob Mottram and Roman Lakeev.
So, you want to write a kernel module. You know C, you've written a few normal programs to run as processes, and now you want to get to where the real action is, to where a single wild pointer can wipe out your file system and a core dump means a reboot.
@@ -445,9 +445,9 @@ What exactly is a kernel module? Modules are pieces of code that can be loaded aLinux distros provide the commands modprobe, insmod and depmod within a package.
@@ -472,9 +472,9 @@ On Parabola:To discover what modules are already loaded within your current kernel use the command lsmod.
@@ -504,33 +504,33 @@ This can be a long list, and you might prefer to search for something particularFor the purposes of following this guide you don't necessarily need to do that. However, it would be wise to run the examples within a test distro running on a virtual machine in order to avoid any possibility of messing up your system.
Before we delve into code, there are a few issues we need to cover. Everyone's system is different and everyone has their own groove. Getting your first "hello world" program to compile and load correctly can sometimes be a trick. Rest assured, after you get over the initial hurdle of doing it for the first time, it will be smooth sailing thereafter.
A module compiled for one kernel won't load if you boot a different kernel unless you enable CONFIG_MODVERSIONS in the kernel. We won't go into module versioning until later in this guide. Until we cover modversions, the examples in the guide may not work if you're running a kernel with modversioning turned on. However, most stock Linux distro kernels come with it turned on. If you're having trouble loading the modules because of versioning errors, compile a kernel with modversioning turned off.
It is highly recommended that you type in, compile and load all the examples this guide discusses. It's also highly recommended you do this from a console. You should not be working on this stuff in X.
@@ -544,9 +544,9 @@ Modules can't print to the screen like printf() can, but they can log informatioBefore you can build anything you'll need to install the header files for your kernel. On Parabola GNU/Linux:
@@ -576,9 +576,9 @@ This will tell you what kernel header files are available. Then for example:All the examples from this document are available within the examples subdirectory. To test that they compile:
@@ -594,13 +594,13 @@ If there are any compile errors then you might have a more recent kernel versionMost people learning programming start out with some sort of "hello world" example. I don't know what happens to people who break with this tradition, but I think it's safer not to find out. We'll start with a series of hello world programs that demonstrate the different aspects of the basics of writing a kernel module.
@@ -744,15 +744,15 @@ Lastly, every kernel module needs to include linux/module.h. We needed to includAnother thing which may not be immediately obvious to anyone getting started with kernel programming is that indentation within your code should be using tabs and not spaces. It's one of the coding conventions of the kernel. You may not like it, but you'll need to get used to it if you ever submit a patch upstream.
Despite what you might think, printk() was not meant to communicate information to the user, even though we used it for exactly this purpose in hello-1! It happens to be a logging mechanism for the kernel, and is used to log information or give warnings. Therefore, each printk() statement comes with a priority, which is the <1> and KERN_ALERT you see. There are 8 priorities and the kernel has macros for them, so you don't have to use cryptic numbers, and you can view them (and their meanings) in linux/kernel.h. If you don't specify a priority level, the default priority, DEFAULT_MESSAGE_LOGLEVEL, will be used.
@@ -767,8 +767,8 @@ If the priority is less than int console_loglevel, the message is printed on youKernel modules need to be compiled a bit differently from regular userspace apps. Former kernel versions required us to care much about these settings, which are usually stored in Makefiles. Although hierarchically organized, many redundant settings accumulated in sublevel Makefiles and made them large and rather difficult to maintain. Fortunately, there is a new way of doing these things, called kbuild, and the build process for external loadable modules is now fully integrated into the standard kernel build mechanism. To learn more on how to compile modules which are not part of the official kernel (such as all the examples you'll find in this guide), see file linux/Documentation/kbuild/modules.txt.
@@ -787,9 +787,9 @@ Here's another exercise for the reader. See that comment above the return statemIn early kernel versions you had to use the init_module and cleanup_module functions, as in the first hello world example, but these days you can name those anything you want by using the module_init and module_exit macros. These macros are defined in linux/init.h. The only requirement is that your init and cleanup functions must be defined before calling the those macros, otherwise you'll get compilation errors. Here's an example of this technique:
@@ -841,9 +841,9 @@ Now have a look at linux/drivers/char/Makefile for a real world example. As youThis demonstrates a feature of kernel 2.2 and later. Notice the change in the definitions of the init and cleanup functions. The __init macro causes the init function to be discarded and its memory freed once the init function finishes for built-in drivers, but not loadable modules. If you think about when the init function is invoked, this makes perfect sense.
@@ -888,9 +888,9 @@ module_exit(hello_3_exit);Honestly, who loads or even cares about proprietary modules? If you do then you might have seen something like this:
@@ -942,9 +942,9 @@ module_exit(cleanup_hello_4);Modules can take command line arguments, but not with the argc/argv you might be used to.
@@ -1094,9 +1094,9 @@ hello-5.o: invalid argument syntax for mylong: 'h'Sometimes it makes sense to divide a kernel module between several source files.
@@ -1167,9 +1167,9 @@ This is the complete makefile for all the examples we've seen so far. The firstObviously, we strongly suggest you to recompile your kernel, so that you can enable a number of useful debugging features, such as forced module unloading (MODULE_FORCE_UNLOAD): when this option is enabled, you can force the kernel to unload a module even when it believes it is unsafe, via a sudo rmmod -f module command. This option can save you a lot of time and a number of reboots during the development of a module. If you don't want to recompile your kernel then you should consider running the examples within a test distro on a virtual machine. If you mess anything up then you can easily reboot or restore the VM.
@@ -1261,13 +1261,13 @@ If you do not desire to actually compile the kernel, you can interrupt the buildA program usually begins with a main() function, executes a bunch of instructions and terminates upon completion of those instructions. Kernel modules work a bit differently. A module always begin with either the init_module or the function you specify with module_init call. This is the entry function for modules; it tells the kernel what functionality the module provides and sets up the kernel to run the module's functions when they're needed. Once it does this, entry function returns and the module does nothing until the kernel wants to do something with the code that the module provides.
@@ -1282,9 +1282,9 @@ Every module must have an entry function and an exit function. Since there's morProgrammers use functions they don't define all the time. A prime example of this is printf(). You use these library functions which are provided by the standard C library, libc. The definitions for these functions don't actually enter your program until the linking stage, which insures that the code (for printf() for example) is available, and fixes the call instruction to point to that code.
@@ -1313,7 +1313,7 @@ Would you like to see what system calls are made by printf()? It's easy! Compile-with gcc -Wall -o hello hello.c. Run the exectable with strace ./hello. Are you impressed? Every line you see corresponds to a system call. strace1 is a handy program that gives you details about what system calls a program is making, including which call is made, what its arguments are and what it returns. It's an invaluable tool for figuring out things like what files a program is trying to access. Towards the end, you'll see a line which looks like write (1, "hello", 5hello). There it is. The face behind the printf() mask. You may not be familiar with write, since most people use library functions for file I/O (like fopen, fputs, fclose). If that's the case, try looking at man 2 write. The 2nd man section is devoted to system calls (like kill() and read()). The 3rd man section is devoted to library calls, which you would probably be more familiar with (like cosh() and random()). +with gcc -Wall -o hello hello.c. Run the exectable with strace ./hello. Are you impressed? Every line you see corresponds to a system call. strace is a handy program that gives you details about what system calls a program is making, including which call is made, what its arguments are and what it returns. It's an invaluable tool for figuring out things like what files a program is trying to access. Towards the end, you'll see a line which looks like write (1, "hello", 5hello). There it is. The face behind the printf() mask. You may not be familiar with write, since most people use library functions for file I/O (like fopen, fputs, fclose). If that's the case, try looking at man 2 write. The 2nd man section is devoted to system calls (like kill() and read()). The 3rd man section is devoted to library calls, which you would probably be more familiar with (like cosh() and random()).
@@ -1322,9 +1322,9 @@ You can even write modules to replace the kernel's system calls, which we'll do
A kernel is all about access to resources, whether the resource in question happens to be a video card, a hard drive or even memory. Programs often compete for the same resource. As I just saved this document, updatedb started updating the locate database. My vim session and updatedb are both using the hard drive concurrently. The kernel needs to keep things orderly, and not give users access to resources whenever they feel like it. To this end, a CPU can run in different modes. Each mode gives a different level of freedom to do what you want on the system. The Intel 80386 architecture had 4 of these modes, which were called rings. Unix uses only two rings; the highest ring (ring 0, also known as `supervisor mode' where everything is allowed to happen) and the lowest ring, which is called `user mode'.
@@ -1335,9 +1335,9 @@ Recall the discussion about library functions vs system calls. Typically, you usWhen you write a small C program, you use variables which are convenient and make sense to the reader. If, on the other hand, you're writing routines which will be part of a bigger problem, any global variables you have are part of a community of other peoples' global variables; some of the variable names can clash. When a program has lots of global variables which aren't meaningful enough to be distinguished, you get namespace pollution. In large projects, effort must be made to remember reserved names, and to find ways to develop a scheme for naming unique variable names and symbols.
@@ -1352,15 +1352,15 @@ The file /proc/kallsyms holds all the symbols that the kernel knows aboutMemory management is a very complicated subject and the majority of O'Reilly's "Understanding The Linux Kernel" exclusively covers memory management! We're not setting out to be experts on memory managements, but we do need to know a couple of facts to even begin worrying about writing real modules.
-If you haven't thought about what a segfault really means, you may be surprised to hear that pointers don't actually point to memory locations. Not real ones, anyway. When a process is created, the kernel sets aside a portion of real physical memory and hands it to the process to use for its executing code, variables, stack, heap and other things which a computer scientist would know about2. This memory begins with 0x00000000 and extends up to whatever it needs to be. Since the memory space for any two processes don't overlap, every process that can access a memory address, say 0xbffff978, would be accessing a different location in real physical memory! The processes would be accessing an index named 0xbffff978 which points to some kind of offset into the region of memory set aside for that particular process. For the most part, a process like our Hello, World program can't access the space of another process, although there are ways which we'll talk about later. +If you haven't thought about what a segfault really means, you may be surprised to hear that pointers don't actually point to memory locations. Not real ones, anyway. When a process is created, the kernel sets aside a portion of real physical memory and hands it to the process to use for its executing code, variables, stack, heap and other things which a computer scientist would know about. This memory begins with 0x00000000 and extends up to whatever it needs to be. Since the memory space for any two processes don't overlap, every process that can access a memory address, say 0xbffff978, would be accessing a different location in real physical memory! The processes would be accessing an index named 0xbffff978 which points to some kind of offset into the region of memory set aside for that particular process. For the most part, a process like our Hello, World program can't access the space of another process, although there are ways which we'll talk about later.
@@ -1368,22 +1368,22 @@ The kernel has its own space of memory as well. Since a module is code which can
-By the way, I would like to point out that the above discussion is true for any operating system which uses a monolithic kernel3. There are things called microkernels which have modules which get their own codespace. The GNU Hurd and QNX Neutrino are two examples of a microkernel. +By the way, I would like to point out that the above discussion is true for any operating system which uses a monolithic kernel. This isn't quite the same thing as "building all your modules into the kernel", although the idea is the same. There are things called microkernels which have modules which get their own codespace. The GNU Hurd and the Magenta kernel of Google Fuchsia are two examples of a microkernel.
One class of module is the device driver, which provides functionality for hardware like a serial port. On unix, each piece of hardware is represented by a file located in /dev named a device file which provides the means to communicate with the hardware. The device driver provides the communication on behalf of a user program. So the es1370.o sound card device driver might connect the /dev/sound device file to the Ensoniq IS1370 sound card. A userspace program like mp3blaster can use /dev/sound without ever knowing what kind of sound card is installed.
Let's look at some device files. Here are device files which represent the first three partitions on the primary master IDE hard drive:
@@ -1448,13 +1448,13 @@ By now you can look at these two device files and know instantly that they are bThe file_operations structure is defined in /usr/include/linux/fs.h, and holds pointers to functions defined by the driver that perform various operations on the device. Each field of the structure corresponds to the address of some function defined by the driver to handle a requested operation.
@@ -1539,9 +1539,9 @@ An instance of struct file_operations containing pointers to functions that areEach device is represented in the kernel by a file structure, which is defined in linux/fs.h. Be aware that a file is a kernel level structure and never appears in a user space program. It's not the same thing as a FILE, which is defined by glibc and would never appear in a kernel space function. Also, its name is a bit misleading; it represents an abstract open `file', not a file on a disk, which is represented by a structure named inode.
@@ -1556,11 +1556,11 @@ Go ahead and look at the definition of file. Most of the entries you see, like s-As discussed earlier, char devices are accessed through device files, usually located in /dev4. The major number tells you which driver handles which device file. The minor number is used only by the driver itself to differentiate which device it's operating on, just in case the driver handles more than one device. +As discussed earlier, char devices are accessed through device files, usually located in /dev. This is by convention. When writing a driver, it's OK to put the device file in your current directory. Just make sure you place it in /dev for a production driver. The major number tells you which driver handles which device file. The minor number is used only by the driver itself to differentiate which device it's operating on, just in case the driver handles more than one device.
@@ -1586,9 +1586,9 @@ If you pass a major number of 0 to register_chrdev, the return value will be the
We can't allow the kernel module to be rmmod'ed whenever root feels like it. If the device file is opened by a process and then we remove the kernel module, using the file would cause a call to the memory location where the appropriate function (read/write) used to be. If we're lucky, no other code was loaded there, and we'll get an ugly error message. If we're unlucky, another kernel module was loaded into the same location, which means a jump into the middle of another function within the kernel. The results of this would be impossible to predict, but they can't be very positive.
@@ -1608,9 +1608,9 @@ It's important to keep the counter accurate; if you ever do lose track of the coThe next code sample creates a char driver named chardev. You can cat its device file.
@@ -1798,9 +1798,9 @@ The next code sample creates a char driver named chardev. You can cat its deviceThe system calls, which are the major interface the kernel shows to the processes, generally stay the same across versions. A new system call may be added, but usually the old ones will behave exactly like they used to. This is necessary for backward compatibility – a new kernel version is not supposed to break regular processes. In most cases, the device files will also remain the same. On the other hand, the internal interfaces within the kernel can and do change between versions.
@@ -1824,9 +1824,9 @@ You might already have noticed that recent kernels look different. In case you hIn Linux, there is an additional mechanism for the kernel and kernel modules to send information to processes — the /proc file system. Originally designed to allow easy access to information about processes (hence the name), it is now used by every bit of the kernel which has something interesting to report, such as /proc/modules which provides the list of modules and /proc/meminfo which stats memory usage statistics.
@@ -1836,7 +1836,7 @@ The method to use the proc file system is very similar to the one used with devi-The reason we use proc_register_dynamic5 is because we don't want to determine the inode number used for our file in advance, but to allow the kernel to determine it to prevent clashes. Normal file systems are located on a disk, rather than just in memory (which is where /proc is), and in that case the inode number is a pointer to a disk location where the file's index-node (inode for short) is located. The inode contains information about the file, for example the file's permissions, together with a pointer to the disk location or locations where the file's data can be found. +The reason we use proc_register_dynamic1 is because we don't want to determine the inode number used for our file in advance, but to allow the kernel to determine it to prevent clashes. Normal file systems are located on a disk, rather than just in memory (which is where /proc is), and in that case the inode number is a pointer to a disk location where the file's index-node (inode for short) is located. The inode contains information about the file, for example the file's permissions, together with a pointer to the disk location or locations where the file's data can be found.
@@ -1916,9 +1916,9 @@ HelloWorld!
We have seen a very simple example for a /proc file where we only read the file /proc/helloworld. It's also possible to write in a /proc file. It works the same way as read, a function is called when the /proc file is written. But there is a little difference with read, data comes from user, so you have to import data from user space to kernel space (with copy_from_user or get_user)
@@ -2035,15 +2035,15 @@ The only memory segment accessible to a process is its own, so when writing reguWe have seen how to read and write a /proc file with the /proc interface. But it's also possible to manage /proc file with inodes. The main concern is to use advanced functions, like permissions.
-In Linux, there is a standard mechanism for file system registration. Since every file system has to have its own functions to handle inode and file operations6, there is a special structure to hold pointers to all those functions, struct inode_operations, which includes a pointer to struct file_operations. In /proc, whenever we register a new file, we're allowed to specify which struct inode_operations will be used to access to it. This is the mechanism we use, a struct inode_operations which includes a pointer to a struct file_operations which includes pointers to our procfs_read and procfs_write functions. +In Linux, there is a standard mechanism for file system registration. Since every file system has to have its own functions to handle inode and file operations2, there is a special structure to hold pointers to all those functions, struct inode_operations, which includes a pointer to struct file_operations. In /proc, whenever we register a new file, we're allowed to specify which struct inode_operations will be used to access to it. This is the mechanism we use, a struct inode_operations which includes a pointer to a struct file_operations which includes pointers to our procfs_read and procfs_write functions.
@@ -2147,9 +2147,9 @@ Still hungry for procfs examples? Well, first of all keep in mind, there are rum
As we have seen, writing a /proc file may be quite "complex". So to help people writting /proc file, there is an API named seq_file that helps @@ -2339,9 +2339,9 @@ You can also read the code of fs/seq_file.c in the linux kernel.
sysfs allows you to interact with the running kernel from userspace by reading or setting variables inside of modules. This can be useful for debugging purposes, or just as an interface for applications or scripts. You can find sysfs directories and files under the sys directory on your system.
@@ -2476,9 +2476,9 @@ Finally, remove the test module:Device files are supposed to represent physical devices. Most physical devices are used for output as well as input, so there has to be some mechanism for device drivers in the kernel to get the output to send to the device from processes. This is done by opening the device file for output and writing to it, just like writing to a file. In the following example, this is implemented by device_write.
@@ -2488,7 +2488,7 @@ This is not always enough. Imagine you had a serial port connected to a modem (e-The answer in Unix is to use a special function called ioctl (short for Input Output ConTroL). Every device can have its own ioctl commands, which can be read ioctl's (to send information from a process to the kernel), write ioctl's (to return information to a process), 7 both or neither. The ioctl function is called with three parameters: the file descriptor of the appropriate device file, the ioctl number, and a parameter, which is of type long so you can use a cast to use it to pass anything. 8 +The answer in Unix is to use a special function called ioctl (short for Input Output ConTroL). Every device can have its own ioctl commands, which can be read ioctl's (to send information from a process to the kernel), write ioctl's (to return information to a process), 3 both or neither. The ioctl function is called with three parameters: the file descriptor of the appropriate device file, the ioctl number, and a parameter, which is of type long so you can use a cast to use it to pass anything. 4
@@ -2959,9 +2959,9 @@ If you want to use ioctls in your own kernel modules, it is best to receive an o
So far, the only thing we've done was to use well defined kernel mechanisms to register /proc files and device handlers. This is fine if you want to do something the kernel programmers thought you'd want, such as write a device driver. But what if you want to do something unusual, to change the behavior of the system in some way? Then, you're mostly on your own.
@@ -3167,24 +3167,24 @@ MODULE_LICENSE("GPL");What do you do when somebody asks you for something you can't do right away? If you're a human being and you're bothered by a human being, the only thing you can say is: "Not right now, I'm busy. Go away!". But if you're a kernel module and you're bothered by a process, you have another possibility. You can put the process to sleep until you can service it. After all, processes are being put to sleep by the kernel and woken up all the time (that's the way multiple processes appear to run on the same time on a single CPU).
-This kernel module is an example of this. The file (called /proc/sleep) can only be opened by a single process at a time. If the file is already open, the kernel module calls wait_event_interruptible9. This function changes +This kernel module is an example of this. The file (called /proc/sleep) can only be opened by a single process at a time. If the file is already open, the kernel module calls wait_event_interruptible5. This function changes the status of the task (a task is the kernel data structure which holds information about a process and the system call it's in, if any) to TASK_INTERRUPTIBLE, which means that the task will not run until it is woken up somehow, and adds it to WaitQ, the queue of tasks waiting to access the file. Then, the function calls the scheduler to context switch to a different process, one which has some use for the CPU.
-When a process is done with the file, it closes it, and module_close is called. That function wakes up all the processes in the queue (there's no mechanism to only wake up one of them). It then returns and the process which just closed the file can continue to run. In time, the scheduler decides that that process has had enough and gives control of the CPU to another process. Eventually, one of the processes which was in the queue will be given control of the CPU by the scheduler. It starts at the point right after the call to module_interruptible_sleep_on10. It can then proceed to set a global variable to tell all the other processes that the file is still open and go on with its life. When the other processes get a piece of the CPU, they'll see that global variable and go back to sleep. +When a process is done with the file, it closes it, and module_close is called. That function wakes up all the processes in the queue (there's no mechanism to only wake up one of them). It then returns and the process which just closed the file can continue to run. In time, the scheduler decides that that process has had enough and gives control of the CPU to another process. Eventually, one of the processes which was in the queue will be given control of the CPU by the scheduler. It starts at the point right after the call to module_interruptible_sleep_on6. It can then proceed to set a global variable to tell all the other processes that the file is still open and go on with its life. When the other processes get a piece of the CPU, they'll see that global variable and go back to sleep.
@@ -3192,7 +3192,7 @@ So we'll use tail -f to keep the file open in the background, while trying to ac
-To make our life more interesting, module_close doesn't have a monopoly on waking up the processes which wait to access the file. A signal, such as Ctrl +c (SIGINT) can also wake up a process. 11 In that case, we want to return with -EINTR immediately. This is important so users can, for example, kill the process before it receives the file. +To make our life more interesting, module_close doesn't have a monopoly on waking up the processes which wait to access the file. A signal, such as Ctrl +c (SIGINT) can also wake up a process. 7 In that case, we want to return with -EINTR immediately. This is important so users can, for example, kill the process before it receives the file.
@@ -3563,9 +3563,9 @@ DECLARE_WAIT_QUEUE_HEAD(WaitQ);
Sometimes one thing should happen before another within a module having multiple threads. Rather than using /proc/sleep commands the kernel has another way to do this which allows timeouts or interrupts to also happen.
@@ -3670,16 +3670,16 @@ There are other variations upon the wait_for_completion function, which iIf processes running on different CPUs or in different threads try to access the same memory then it's possible that strange things can happen or your system can lock up. To avoid this various types of mutual exclusion kernel functions are available. These indicate if a section of code is "locked" or "unlocked" so that simultaneous attempts to run it can't happen.
You can use kernel mutexes (mutual exclusions) in much the same manner that you might deploy them in userland. This may be all that's needed to avoid collisions in most cases.
@@ -3729,9 +3729,9 @@ MODULE_LICENSE("GPL");As the name suggests, spinlocks lock up the CPU that the code is running on, taking 100% of its resources. Because of this you should only use the spinlock mechanism around code which is likely to take no more than a few milliseconds to run and so won't noticably slow anything down from the user's point of view.
@@ -3809,9 +3809,9 @@ MODULE_LICENSE("GPL");Read and write locks are specialised kinds of spinlocks so that you can exclusively read from something or write to something. Like the earlier spinlocks example the one below shows an "irq safe" situation in which if other functions were triggered from irqs which might also read and write to whatever you are concerned with then they wouldn't disrupt the logic. As before it's a good idea to keep anything done within the lock as short as possible so that it doesn't hang up the system and cause users to start revolting against the tyranny of your module.
@@ -3878,9 +3878,9 @@ Of course if you know for sure that there are no functions triggered by irqs whiIf you're doing simple arithmetic: adding, subtracting or bitwise operations then there's another way in the multi-CPU and multi-hyperthreaded world to stop other parts of the system from messing with your mojo. By using atomic operations you can be confident that your addition, subtraction or bit flip did actually happen and wasn't overwritten by some other shenanigans. An example is shown below.
@@ -3967,15 +3967,15 @@ MODULE_LICENSE("GPL");-In Section 1.2.1.2, I said that X and kernel module programming don't mix. That's true for developing kernel modules, but in actual use, you want to be able to send messages to whichever tty12 the command to load the module came from. +In Section 1.2.1.2, I said that X and kernel module programming don't mix. That's true for developing kernel modules, but in actual use, you want to be able to send messages to whichever tty8 the command to load the module came from.
@@ -4096,9 +4096,9 @@ module_exit(print_string_exit);
In certain conditions, you may desire a simpler and more direct way to communicate to the external world. Flashing keyboard LEDs can be such a solution: It is an immediate way to attract attention or to display a status condition. Keyboard LEDs are present on every hardware, they are always visible, they do not need any setup, and their use is rather simple and non-intrusive, compared to writing to a tty or a file.
@@ -4217,17 +4217,17 @@ While you have seen lots of stuff that can be used to aid debugging here, thereThere are two main ways of running tasks: tasklets and work queues. Tasklets are a quick and easy way of scheduling a single function to be run, for example when triggered from an interrupt, whereas work queues are more complicated but also better suited to running multiple things in a sequence.
Here's an example tasklet module. The tasklet_fn function runs for a few seconds and in the mean time execution of the example_tasklet_init function continues to the exit point.
@@ -4284,9 +4284,9 @@ Example tasklet endsVery often, we have "housekeeping" tasks which have to be done at a certain time, or every so often. If the task is to be done by a process, we do it by putting it in the crontab file. If the task is to be done by a kernel module, we have two possibilities. The first is to put a process in the crontab file which will wake up the module by a system call when necessary, for example by opening a file. This is terribly inefficient, however – we run a new process off of crontab, read a new executable to memory, and all this just to wake up a kernel module which is in memory anyway.
@@ -4470,13 +4470,13 @@ MODULE_LICENSE("GPL");Except for the last chapter, everything we did in the kernel so far we've done as a response to a process asking for it, either by dealing with a special file, sending an ioctl(), or issuing a system call. But the job of the kernel isn't just to respond to process requests. Another job, which is every bit as important, is to speak to the hardware connected to the machine.
@@ -4486,7 +4486,7 @@ There are two types of interaction between the CPU and the rest of the computer'-Under Linux, hardware interrupts are called IRQ's (InterruptRe quests)13. There are two types of IRQ's, short and long. A short IRQ is one which is expected to take a very short period of time, during which the rest of the machine will be blocked and no other interrupts will be handled. A long IRQ is one which can take longer, and during which other interrupts may occur (but not interrupts from the same device). If at all possible, it's better to declare an interrupt handler to be long. +Under Linux, hardware interrupts are called IRQ's (InterruptRe quests)9. There are two types of IRQ's, short and long. A short IRQ is one which is expected to take a very short period of time, during which the rest of the machine will be blocked and no other interrupts will be handled. A long IRQ is one which can take longer, and during which other interrupts may occur (but not interrupts from the same device). If at all possible, it's better to declare an interrupt handler to be long.
@@ -4494,14 +4494,14 @@ When the CPU receives an interrupt, it stops whatever it's doing (unless it's pr
-The way to implement this is to call request_irq() to get your interrupt handler called when the relevant IRQ is received. 14This function receives the IRQ number, the name of the function, flags, a name for /proc/interrupts and a parameter to pass to the interrupt handler. Usually there is a certain number of IRQs available. How many IRQs there are is hardware-dependent. The flags can include SA_SHIRQ to indicate you're willing to share the IRQ with other interrupt handlers (usually because a number of hardware devices sit on the same IRQ) and SA_INTERRUPT to indicate this is a fast interrupt. This function will only succeed if there isn't already a handler on this IRQ, or if you're both willing to share. +The way to implement this is to call request_irq() to get your interrupt handler called when the relevant IRQ is received. 10This function receives the IRQ number, the name of the function, flags, a name for /proc/interrupts and a parameter to pass to the interrupt handler. Usually there is a certain number of IRQs available. How many IRQs there are is hardware-dependent. The flags can include SA_SHIRQ to indicate you're willing to share the IRQ with other interrupt handlers (usually because a number of hardware devices sit on the same IRQ) and SA_INTERRUPT to indicate this is a fast interrupt. This function will only succeed if there isn't already a handler on this IRQ, or if you're both willing to share.
Many popular single board computers, such as Raspberry Pis or Beagleboards, have a bunch of GPIO pins. Attaching buttons to those and then having a button press do something is a classic case in which you might need to use interrupts so that instead of having the CPU waste time and battery power polling for a change in input state it's better for the input to trigger the CPU to then run a particular handling function.
@@ -4667,9 +4667,9 @@ MODULE_DESCRIPTION("Handle some GPIO interrupts"Suppose you want to do a bunch of stuff inside of an interrupt routine. A common way to do that without rendering the interrupt unavailable for a significant duration is to combine it with a tasklet. This pushes the bulk of the work off into the scheduler.
@@ -4849,17 +4849,17 @@ MODULE_DESCRIPTION("Interrupt with top and bottom half"At the dawn of the internet everybody trusted everybody completely…but that didn't work out so well. When this guide was originally written it was a more innocent era in which almost nobody actually gave a damn about crypto - least of all kernel developers. That's certainly no longer the case now. To handle crypto stuff the kernel has its own API enabling common methods of encryption, decryption and your favourite hash functions.
Calculating and checking the hashes of things is a common operation. Here is a demonstration of how to calculate a sha256 hash within a kernel module.
@@ -4957,9 +4957,9 @@ Finally, remove the test module:Here is an example of symmetrically encrypting a string using the AES algorithm and a password.
@@ -5150,9 +5150,9 @@ MODULE_LICENSE("GPL");Up to this point we've seen all kinds of modules doing all kinds of things, but there was no consistency in their interfaces with the rest of the kernel. To impose some consistency such that there is at minimum a standardised way to start, suspend and resume a device a device model was added. An example is show below, and you can use this as a template to add your own suspend, resume or other interface functions.
@@ -5258,13 +5258,13 @@ module_exit(devicemodel_exit);Sometimes you might want your code to run as quickly as possible, especially if it's handling an interrupt or doing something which might cause noticible latency. If your code contains boolean conditions and if you know that the conditions are almost always likely to evaluate as either true or false, then you can allow the compiler to optimise for this using the likely and unlikely macros.
@@ -5289,35 +5289,35 @@ When the unlikely macro is used the compiler alters its machine instructiBefore I send you on your way to go out into the world and write kernel modules, there are a few things I need to warn you about. If I fail to warn you and something bad happens, please report the problem to me for a full refund of the amount I was paid for your copy of the book.
You can't do that. In a kernel module you can only use kernel functions, which are the functions you can see in /proc/kallsyms.
You might need to do this for a short time and that is OK, but if you don't enable them afterwards, your system will be stuck and you'll have to power it off.
I probably don't have to warn you about this, but I figured I will anyway, just in case.
@@ -5325,9 +5325,9 @@ I probably don't have to warn you about this, but I figured I will anyway, justI could easily have squeezed a few more chapters into this book. I could have added a chapter about creating new file systems, or about adding new protocol stacks (as if there's a need for that – you'd have to dig underground to find a protocol stack not supported by Linux). I could have added explanations of the kernel mechanisms we haven't touched upon, such as bootstrapping or the disk interface.
@@ -5354,54 +5354,33 @@ If you'd like to contribute to this guide, notice anything glaringly wrong, or j-It's an invaluable tool for figuring out things like what files a -program is trying to access. Ever have a program bail silently because -it couldn't find a file? It's a PITA! -
-I'm a physicist, not a computer scientist, Jim! -
-This isn't quite the same thing as `building all your modules into the -kernel', although the idea is the same. -
-This is by convention. When writing a driver, it's OK to put the device -file in your current directory. Just make sure you place it in /dev for -a production driver -
In version 2.0, in version 2.2 this is done automatically if we set the inode to zero.
+
The difference between the two is that file operations deal with the file itself, and inode operations deal with ways of referencing the file, such as creating links to it.
+
Notice that here the roles of read and write are reversed again, so in ioctl's read is to send information to the kernel and write is to receive information from the kernel.
+
This isn't exact. You won't be able to pass a structure, for example, through an ioctl — but you will be able to pass a pointer to the structure.
+
The easiest way to keep a file open is to open it with tail -f.
+
This means that the process is still in kernel mode – as far as the process is concerned, it issued the open system call and the system call hasn't returned yet. The process doesn't know somebody else used the CPU @@ -5409,25 +5388,25 @@ for most of the time between the moment it issued the call and the moment it returned.
+
This is because we used module_interruptible_sleep_on. We could have used module_sleep_on instead, but that would have resulted is extremely angry users whose Ctrl+cs are ignored.
+
Teletype, originally a combination keyboard-printer used to communicate with a Unix system, and today an abstraction for the text stream used for a Unix program, whether it's a physical terminal, an xterm on an X display, a network connection used with telnet, etc.
+
This is standard nomencalture on the Intel architecture where Linux originated.
+
In practice IRQ handling can be a bit more complex. Hardware is often designed in a way that chains two interrupt controllers, so that all the IRQs from interrupt controller B are cascaded to a certain IRQ from diff --git a/4.9.11/LKMPG-4.9.11.org b/4.9.11/LKMPG-4.9.11.org index a99f1d0..8c2d6be 100644 --- a/4.9.11/LKMPG-4.9.11.org +++ b/4.9.11/LKMPG-4.9.11.org @@ -628,7 +628,7 @@ int main(void) } #+END_SRC -with *gcc -Wall -o hello hello.c*. Run the exectable with *strace ./hello*. Are you impressed? Every line you see corresponds to a system call. strace[fn:4] is a handy program that gives you details about what system calls a program is making, including which call is made, what its arguments are and what it returns. It's an invaluable tool for figuring out things like what files a program is trying to access. Towards the end, you'll see a line which looks like write (1, "hello", 5hello). There it is. The face behind the printf() mask. You may not be familiar with write, since most people use library functions for file I/O (like fopen, fputs, fclose). If that's the case, try looking at man 2 write. The 2nd man section is devoted to system calls (like kill() and read()). The 3rd man section is devoted to library calls, which you would probably be more familiar with (like cosh() and random()). +with *gcc -Wall -o hello hello.c*. Run the exectable with *strace ./hello*. Are you impressed? Every line you see corresponds to a system call. [[https://strace.io/][strace]] is a handy program that gives you details about what system calls a program is making, including which call is made, what its arguments are and what it returns. It's an invaluable tool for figuring out things like what files a program is trying to access. Towards the end, you'll see a line which looks like write (1, "hello", 5hello). There it is. The face behind the printf() mask. You may not be familiar with write, since most people use library functions for file I/O (like fopen, fputs, fclose). If that's the case, try looking at man 2 write. The 2nd man section is devoted to system calls (like kill() and read()). The 3rd man section is devoted to library calls, which you would probably be more familiar with (like cosh() and random()). You can even write modules to replace the kernel's system calls, which we'll do shortly. Crackers often make use of this sort of thing for backdoors or trojans, but you can write your own modules to do more benign things, like have the kernel write Tee hee, that tickles! everytime someone tries to delete a file on your system. @@ -647,11 +647,11 @@ The file */proc/kallsyms* holds all the symbols that the kernel knows about and ** Code space Memory management is a very complicated subject and the majority of O'Reilly's "/Understanding The Linux Kernel/" exclusively covers memory management! We're not setting out to be experts on memory managements, but we do need to know a couple of facts to even begin worrying about writing real modules. -If you haven't thought about what a segfault really means, you may be surprised to hear that pointers don't actually point to memory locations. Not real ones, anyway. When a process is created, the kernel sets aside a portion of real physical memory and hands it to the process to use for its executing code, variables, stack, heap and other things which a computer scientist would know about[fn:5]. This memory begins with 0x00000000 and extends up to whatever it needs to be. Since the memory space for any two processes don't overlap, every process that can access a memory address, say 0xbffff978, would be accessing a different location in real physical memory! The processes would be accessing an index named 0xbffff978 which points to some kind of offset into the region of memory set aside for that particular process. For the most part, a process like our Hello, World program can't access the space of another process, although there are ways which we'll talk about later. +If you haven't thought about what a segfault really means, you may be surprised to hear that pointers don't actually point to memory locations. Not real ones, anyway. When a process is created, the kernel sets aside a portion of real physical memory and hands it to the process to use for its executing code, variables, stack, heap and other things which a computer scientist would know about. This memory begins with 0x00000000 and extends up to whatever it needs to be. Since the memory space for any two processes don't overlap, every process that can access a memory address, say 0xbffff978, would be accessing a different location in real physical memory! The processes would be accessing an index named 0xbffff978 which points to some kind of offset into the region of memory set aside for that particular process. For the most part, a process like our Hello, World program can't access the space of another process, although there are ways which we'll talk about later. The kernel has its own space of memory as well. Since a module is code which can be dynamically inserted and removed in the kernel (as opposed to a semi-autonomous object), it shares the kernel's codespace rather than having its own. Therefore, if your module segfaults, the kernel segfaults. And if you start writing over data because of an off-by-one error, then you're trampling on kernel data (or code). This is even worse than it sounds, so try your best to be careful. -By the way, I would like to point out that the above discussion is true for any operating system which uses a monolithic kernel[fn:6]. There are things called microkernels which have modules which get their own codespace. The GNU Hurd and QNX Neutrino are two examples of a microkernel. +By the way, I would like to point out that the above discussion is true for any operating system which uses a monolithic kernel. This isn't quite the same thing as /"building all your modules into the kernel"/, although the idea is the same. There are things called microkernels which have modules which get their own codespace. The GNU Hurd and the Magenta kernel of Google Fuchsia are two examples of a microkernel. ** Device Drivers One class of module is the device driver, which provides functionality for hardware like a serial port. On unix, each piece of hardware is represented by a file located in /dev named a device file which provides the means to communicate with the hardware. The device driver provides the communication on behalf of a user program. So the es1370.o sound card device driver might connect the /dev/sound device file to the Ensoniq IS1370 sound card. A userspace program like mp3blaster can use /dev/sound without ever knowing what kind of sound card is installed. @@ -772,7 +772,7 @@ An instance of struct file is commonly named filp. You'll also see it refered to Go ahead and look at the definition of file. Most of the entries you see, like struct dentry aren't used by device drivers, and you can ignore them. This is because drivers don't fill file directly; they only use structures contained in file which are created elsewhere. ** Registering A Device -As discussed earlier, char devices are accessed through device files, usually located in /dev[fn:7]. The major number tells you which driver handles which device file. The minor number is used only by the driver itself to differentiate which device it's operating on, just in case the driver handles more than one device. +As discussed earlier, char devices are accessed through device files, usually located in /dev. This is by convention. When writing a driver, it's OK to put the device file in your current directory. Just make sure you place it in /dev for a production driver. The major number tells you which driver handles which device file. The minor number is used only by the driver itself to differentiate which device it's operating on, just in case the driver handles more than one device. Adding a driver to your system means registering it with the kernel. This is synonymous with assigning it a major number during the module's initialization. You do this by using the register_chrdev function, defined by linux/fs.h. @@ -4139,27 +4139,7 @@ I hope I have helped you in your quest to become a better programmer, or at leas If you'd like to contribute to this guide, notice anything glaringly wrong, or just want to add extra sarcastic remarks perhaps involving monkeys or some other kind of animal then please file an issue or even better submit a pull request at https://github.com/bashrc/LKMPG. -[fn:1] In earlier versions of linux, this was known as kerneld. -[fn:2] If such a file exists. Note that the acual behavoir might be - distribution-dependent. If you're interested in the details,read the man - pages that came with module-init-tools, and see for yourself what's - really going on. You could use something like strace modprobe dummy to - find out how dummy.ko gets loaded. FYI: The dummy.ko I'm talking about - here is part of the mainline kernel and can be found in the networking - section. It needs to be compiled as a module (and installed, of course) - for this to work. -[fn:3] If you are modifying the kernel, to avoid overwriting your existing - modules you may want to use the EXTRAVERSION variable in the kernel - Makefile to create a seperate directory. -[fn:4] It's an invaluable tool for figuring out things like what files a - program is trying to access. Ever have a program bail silently because - it couldn't find a file? It's a PITA! -[fn:5] I'm a physicist, not a computer scientist, Jim! -[fn:6] This isn't quite the same thing as `building all your modules into the - kernel', although the idea is the same. -[fn:7] This is by convention. When writing a driver, it's OK to put the device - file in your current directory. Just make sure you place it in /dev for - a production driver + [fn:8] In version 2.0, in version 2.2 this is done automatically if we set the inode to zero. [fn:9] The difference between the two is that file operations deal with the