mirror of https://github.com/bashrc/LKMPG.git synced 2018-06-11 03:06:54 +02:00

Minor wording and formatting changes

Bob Mottram
2016-03-09 17:30:15 +00:00
parent 711052fe50
commit fa7d4ce334
2 changed files with 67 additions and 61 deletions


@@ -3,7 +3,7 @@
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head> <head>
<!-- 2016-03-09 Wed 16:38 --> <!-- 2016-03-09 Wed 17:30 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" /> <meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" /> <meta name="viewport" content="width=device-width, initial-scale=1" />
<title>The Linux Kernel Module Programming Guide</title> <title>The Linux Kernel Module Programming Guide</title>
@@ -878,7 +878,7 @@ This demonstrates a feature of kernel 2.2 and later. Notice the change in the de
</p> </p>
<p> <p>
There is also an <b>__initdata</b> which works similarly to __init but for init variables rather than functions. There is also an <b>__initdata</b> which works similarly to <b>__init</b> but for init variables rather than functions.
</p> </p>
<p> <p>
@@ -1007,7 +1007,7 @@ module_param(myint, <span class="org-type">int</span>, 0);
</div> </div>
<p> <p>
Arrays are supported too, but things are a bit different now than they were in the 2.4. days. To keep track of the number of parameters you need to pass a pointer to a count variable as third parameter. At your option, you could also ignore the count and pass NULL instead. We show both possibilities here: Arrays are supported too, but things are a bit different now than they were in the olden days. To keep track of the number of parameters you need to pass a pointer to a count variable as third parameter. At your option, you could also ignore the count and pass NULL instead. We show both possibilities here:
</p> </p>
<div class="org-src-container"> <div class="org-src-container">
@@ -1026,7 +1026,7 @@ A good use for this is to have the module variable's default values set, like an
</p> </p>
<p> <p>
Lastly, there's a macro function, MODULE_PARM_DESC(), that is used to document arguments that the module can take. It takes two parameters: a variable name and a free form string describing that variable. Lastly, there's a macro function, <b>MODULE_PARM_DESC()</b>, that is used to document arguments that the module can take. It takes two parameters: a variable name and a free form string describing that variable.
</p> </p>
</div> </div>
@@ -1243,11 +1243,11 @@ This is the complete makefile for all the examples we've seen so far. The first
<h3 id="orgheadline29">Building modules for a precompiled kernel</h3> <h3 id="orgheadline29">Building modules for a precompiled kernel</h3>
<div class="outline-text-3" id="text-orgheadline29"> <div class="outline-text-3" id="text-orgheadline29">
<p> <p>
Obviously, we strongly suggest you to recompile your kernel, so that you can enable a number of useful debugging features, such as forced module unloading (MODULE_FORCE_UNLOAD): when this option is enabled, you can force the kernel to unload a module even when it believes it is unsafe, via a rmmod -f module command. This option can save you a lot of time and a number of reboots during the development of a module. Obviously, we strongly suggest you to recompile your kernel, so that you can enable a number of useful debugging features, such as forced module unloading (<b>MODULE_FORCE_UNLOAD</b>): when this option is enabled, you can force the kernel to unload a module even when it believes it is unsafe, via a <b>sudo rmmod -f module</b> command. This option can save you a lot of time and a number of reboots during the development of a module. If you don't want to recompile your kernel then you should consider running the examples within a test distro on a virtual machine. If you mess anything up then you can easily reboot or restore the VM.
</p> </p>
<p> <p>
Nevertheless, there is a number of cases in which you may want to load your module into a precompiled running kernel, such as the ones shipped with common Linux distributions, or a kernel you have compiled in the past. In certain circumstances you could require to compile and insert a module into a running kernel which you are not allowed to recompile, or on a machine that you prefer not to reboot. If you can't think of a case that will force you to use modules for a precompiled kernel you might want to skip this and treat the rest of this chapter as a big footnote. There are a number of cases in which you may want to load your module into a precompiled running kernel, such as the ones shipped with common Linux distributions, or a kernel you have compiled in the past. In certain circumstances you could require to compile and insert a module into a running kernel which you are not allowed to recompile, or on a machine that you prefer not to reboot. If you can't think of a case that will force you to use modules for a precompiled kernel you might want to skip this and treat the rest of this chapter as a big footnote.
</p> </p>
<p> <p>
@@ -1291,24 +1291,24 @@ To overcome this problem we could resort to the <b>&#x2013;force-vermagic</b> op
</p> </p>
<p> <p>
First of all, make sure that a kernel source tree is available, having exactly the same version as your current kernel. Then, find the configuration file which was used to compile your precompiled kernel. Usually, this is available in your current <i>boot directory, under a name like config-2.6.x. You may just want to copy it to your kernel source tree: cp /boot/config-`uname -r` /usr/src/linux-`uname -r`</i>.config. First of all, make sure that a kernel source tree is available, having exactly the same version as your current kernel. Then, find the configuration file which was used to compile your precompiled kernel. Usually, this is available in your current <i>boot directory, under a name like config-2.6.x. You may just want to copy it to your kernel source tree: *cp /boot/config-`uname -r` /usr/src/linux-`uname -r`</i>.config*.
</p> </p>
<p> <p>
Let's focus again on the previous error message: a closer look at the version magic strings suggests that, even with two configuration files which are exactly the same, a slight difference in the version magic could be possible, and it is sufficient to prevent insertion of the module into the kernel. That slight difference, namely the custom string which appears in the module's version magic and not in the kernel's one, is due to a modification with respect to the original, in the makefile that some distributions include. Then, examine your /usr/src/linux/Makefile, and make sure that the specified version information matches exactly the one used for your current kernel. For example, your makefile could start as follows: Let's focus again on the previous error message: a closer look at the version magic strings suggests that, even with two configuration files which are exactly the same, a slight difference in the version magic could be possible, and it is sufficient to prevent insertion of the module into the kernel. That slight difference, namely the custom string which appears in the module's version magic and not in the kernel's one, is due to a modification with respect to the original, in the makefile that some distributions include. Then, examine your <b>/usr/src/linux/Makefile</b>, and make sure that the specified version information matches exactly the one used for your current kernel. For example, your makefile could start as follows:
</p> </p>
<div class="org-src-container"> <div class="org-src-container">
<pre class="src src-makefile"><span class="org-variable-name">VERSION</span> = 2 <pre class="src src-makefile"><span class="org-variable-name">VERSION</span> = 3
<span class="org-variable-name">PATCHLEVEL</span> = 6 <span class="org-variable-name">PATCHLEVEL</span> = 16
<span class="org-variable-name">SUBLEVEL</span> = 5 <span class="org-variable-name">SUBLEVEL</span> = 7
<span class="org-variable-name">EXTRAVERSION</span> = -1.358custom <span class="org-variable-name">EXTRAVERSION</span> = -1.358custom
</pre> </pre>
</div> </div>
<p> <p>
In this case, you need to restore the value of symbol EXTRAVERSION to -1.358. We suggest keeping a backup copy of the makefile used to compile your kernel available in <b>/lib/modules/3.16.7-1.358/build</b>. A simple <b>cp /lib/modules/`uname -r`/build/Makefile /usr/src/linux-`uname -r`</b> should suffice. Additionally, if you already started a kernel build with the previous (wrong) Makefile, you should also rerun make, or directly modify symbol UTS_RELEASE in file <b>/usr/src/linux-3.16.x/include/linux/version.h</b> according to contents of file <b>/lib/modules/3.16.x/build/include/linux/version.h</b>, or overwrite the former with the latter. In this case, you need to restore the value of symbol <b>EXTRAVERSION</b> to -1.358. We suggest keeping a backup copy of the makefile used to compile your kernel available in <b>/lib/modules/3.16.7-1.358/build</b>. A simple <b>cp /lib/modules/`uname -r`/build/Makefile /usr/src/linux-`uname -r`</b> should suffice. Additionally, if you already started a kernel build with the previous (wrong) Makefile, you should also rerun make, or directly modify symbol UTS_RELEASE in file <b>/usr/src/linux-3.16.x/include/linux/version.h</b> according to contents of file <b>/lib/modules/3.16.x/build/include/linux/version.h</b>, or overwrite the former with the latter.
</p> </p>
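The effect of a stray EXTRAVERSION suffix can be sketched in user-space C. This is only an illustration under stated assumptions: the helper names format_release and release_matches are hypothetical, and a real kernel release string may carry further parts (such as a local version suffix) that this sketch ignores.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: join the four Makefile variables the same way the
 * example above reads, i.e. VERSION.PATCHLEVEL.SUBLEVEL followed directly
 * by EXTRAVERSION. Returns what snprintf returns.
 */
int format_release(char *out, size_t size, int version, int patchlevel,
                   int sublevel, const char *extraversion)
{
    return snprintf(out, size, "%d.%d.%d%s",
                    version, patchlevel, sublevel, extraversion);
}

/*
 * Hypothetical helper: the strings must be identical; a trailing
 * "custom" is enough to make them differ.
 */
int release_matches(const char *built, const char *running)
{
    return strcmp(built, running) == 0;
}
```

With the values from the makefile above, the formatted string is 3.16.7-1.358custom, which does not match a running kernel that reports 3.16.7-1.358; with EXTRAVERSION restored to -1.358 the two strings are equal.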
<p> <p>
@@ -1345,11 +1345,11 @@ If you do not desire to actually compile the kernel, you can interrupt the build
<h3 id="orgheadline31">How modules begin and end</h3> <h3 id="orgheadline31">How modules begin and end</h3>
<div class="outline-text-3" id="text-orgheadline31"> <div class="outline-text-3" id="text-orgheadline31">
<p> <p>
A program usually begins with a main() function, executes a bunch of instructions and terminates upon completion of those instructions. Kernel modules work a bit differently. A module always begins with either the init_module function or the function you specify with the module_init call. This is the entry function for modules; it tells the kernel what functionality the module provides and sets up the kernel to run the module's functions when they're needed. Once it does this, the entry function returns and the module does nothing until the kernel wants to do something with the code that the module provides. A program usually begins with a <b>main()</b> function, executes a bunch of instructions and terminates upon completion of those instructions. Kernel modules work a bit differently. A module always begins with either the init_module function or the function you specify with the module_init call. This is the entry function for modules; it tells the kernel what functionality the module provides and sets up the kernel to run the module's functions when they're needed. Once it does this, the entry function returns and the module does nothing until the kernel wants to do something with the code that the module provides.
</p> </p>
<p> <p>
All modules end by calling either cleanup_module or the function you specify with the module_exit call. This is the exit function for modules; it undoes whatever the entry function did. It unregisters the functionality that the entry function registered. All modules end by calling either <b>cleanup_module</b> or the function you specify with the <b>module_exit</b> call. This is the exit function for modules; it undoes whatever the entry function did. It unregisters the functionality that the entry function registered.
</p> </p>
<p> <p>
@@ -1362,11 +1362,11 @@ Every module must have an entry function and an exit function. Since there's mor
<h3 id="orgheadline32">Functions available to modules</h3> <h3 id="orgheadline32">Functions available to modules</h3>
<div class="outline-text-3" id="text-orgheadline32"> <div class="outline-text-3" id="text-orgheadline32">
<p> <p>
Programmers use functions they don't define all the time. A prime example of this is printf(). You use these library functions which are provided by the standard C library, libc. The definitions for these functions don't actually enter your program until the linking stage, which ensures that the code (for printf() for example) is available, and fixes the call instruction to point to that code. Programmers use functions they don't define all the time. A prime example of this is <b>printf()</b>. You use these library functions which are provided by the standard C library, libc. The definitions for these functions don't actually enter your program until the linking stage, which ensures that the code (for printf() for example) is available, and fixes the call instruction to point to that code.
</p> </p>
<p> <p>
Kernel modules are different here, too. In the hello world example, you might have noticed that we used a function, printk() but didn't include a standard I/O library. That's because modules are object files whose symbols get resolved upon insmod'ing. The definition for the symbols comes from the kernel itself; the only external functions you can use are the ones provided by the kernel. If you're curious about what symbols have been exported by your kernel, take a look at <b>/proc/kallsyms</b>. Kernel modules are different here, too. In the hello world example, you might have noticed that we used a function, <b>printk()</b> but didn't include a standard I/O library. That's because modules are object files whose symbols get resolved upon insmod'ing. The definition for the symbols comes from the kernel itself; the only external functions you can use are the ones provided by the kernel. If you're curious about what symbols have been exported by your kernel, take a look at <b>/proc/kallsyms</b>.
</p> </p>
<p> <p>
@@ -2709,7 +2709,7 @@ This is not always enough. Imagine you had a serial port connected to a modem (e
</p> </p>
<p> <p>
The answer in Unix is to use a special function called ioctl (short for Input Output ConTroL). Every device can have its own ioctl commands, which can be read ioctl's (to send information from a process to the kernel), write ioctl's (to return information to a process), [10] both or neither. The ioctl function is called with three parameters: the file descriptor of the appropriate device file, the ioctl number, and a parameter, which is of type long so you can use a cast to use it to pass anything. [11] The answer in Unix is to use a special function called <b>ioctl</b> (short for Input Output ConTroL). Every device can have its own ioctl commands, which can be read ioctl's (to send information from a process to the kernel), write ioctl's (to return information to a process), [10] both or neither. The ioctl function is called with three parameters: the file descriptor of the appropriate device file, the ioctl number, and a parameter, which is of type long so you can use a cast to use it to pass anything. [11]
</p> </p>
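As a user-space illustration of the three-parameter call, the sketch below uses the standard FIONREAD request, which asks how many bytes are waiting to be read on a descriptor. The helper name bytes_pending is hypothetical and is not part of the guide's examples; the guide's own device defines its own ioctl numbers.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel how many bytes are waiting to be
 * read on fd. Returns the count, or -1 if the ioctl fails.
 */
int bytes_pending(int fd)
{
    int count = 0;

    /*
     * The three parameters: the file descriptor, the ioctl number, and a
     * parameter. Here the parameter is a pointer through which the kernel
     * returns information to the process.
     */
    if (ioctl(fd, FIONREAD, &count) == -1)
        return -1;
    return count;
}
```

For instance, after writing five bytes into a pipe, calling bytes_pending on the read end should report 5 on Linux.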
<p> <p>
@@ -3197,15 +3197,15 @@ If you want to use ioctls in your own kernel modules, it is best to receive an o
<h2 id="orgheadline62">System Calls</h2> <h2 id="orgheadline62">System Calls</h2>
<div class="outline-text-2" id="text-orgheadline62"> <div class="outline-text-2" id="text-orgheadline62">
<p> <p>
So far, the only thing we've done was to use well defined kernel mechanisms to register /proc files and device handlers. This is fine if you want to do something the kernel programmers thought you'd want, such as write a device driver. But what if you want to do something unusual, to change the behavior of the system in some way? Then, you're mostly on your own. So far, the only thing we've done was to use well defined kernel mechanisms to register <b>/proc</b> files and device handlers. This is fine if you want to do something the kernel programmers thought you'd want, such as write a device driver. But what if you want to do something unusual, to change the behavior of the system in some way? Then, you're mostly on your own.
</p> </p>
<p> <p>
This is where kernel programming gets dangerous. While writing the example below, I killed the open() system call. This meant I couldn't open any files, I couldn't run any programs, and I couldn't shut down the computer. I had to pull the power switch. Luckily, no files died. To ensure you won't lose any files either, please run sync right before you do the insmod and the rmmod. If you're not being sensible and using a virtual machine then this is where kernel programming can become hazardous. While writing the example below, I killed the <b>open()</b> system call. This meant I couldn't open any files, I couldn't run any programs, and I couldn't shut down the system. I had to restart the virtual machine. No important files got annihilated, but if I had been doing this on some live mission-critical system then that could have been a possible outcome. To ensure you don't lose any files, even within a test environment, please run <b>sync</b> right before you do the <b>insmod</b> and the <b>rmmod</b>.
</p> </p>
<p> <p>
Forget about /proc files, forget about device files. They're just minor details. The real process to kernel communication mechanism, the one used by all processes, is system calls. When a process requests a service from the kernel (such as opening a file, forking to a new process, or requesting more memory), this is the mechanism used. If you want to change the behaviour of the kernel in interesting ways, this is the place to do it. By the way, if you want to see which system calls a program uses, run strace &lt;arguments&gt;. Forget about <b>/proc</b> files, forget about device files. They're just minor details. Minutiae in the vast expanse of the universe. The real process to kernel communication mechanism, the one used by all processes, is <i>system calls</i>. When a process requests a service from the kernel (such as opening a file, forking to a new process, or requesting more memory), this is the mechanism used. If you want to change the behaviour of the kernel in interesting ways, this is the place to do it. By the way, if you want to see which system calls a program uses, run <b>strace &lt;arguments&gt;</b>.
</p> </p>
<p> <p>
@@ -3225,19 +3225,19 @@ So, if we want to change the way a certain system call works, what we need to do
</p> </p>
<p> <p>
The source code here is an example of such a kernel module. We want to `spy' on a certain user, and to printk() a message whenever that user opens a file. Towards this end, we replace the system call to open a file with our own function, called our_sys_open. This function checks the uid (user's id) of the current process, and if it's equal to the uid we spy on, it calls printk() to display the name of the file to be opened. Then, either way, it calls the original open() function with the same parameters, to actually open the file. The source code here is an example of such a kernel module. We want to "spy" on a certain user, and to printk() a message whenever that user opens a file. Towards this end, we replace the system call to open a file with our own function, called <b>our_sys_open</b>. This function checks the uid (user's id) of the current process, and if it's equal to the uid we spy on, it calls printk() to display the name of the file to be opened. Then, either way, it calls the original open() function with the same parameters, to actually open the file.
</p> </p>
<p> <p>
The init_module function replaces the appropriate location in sys_call_table and keeps the original pointer in a variable. The cleanup_module function uses that variable to restore everything back to normal. This approach is dangerous, because of the possibility of two kernel modules changing the same system call. Imagine we have two kernel modules, A and B. A's open system call will be A_open and B's will be B_open. Now, when A is inserted into the kernel, the system call is replaced with A_open, which will call the original sys_open when it's done. Next, B is inserted into the kernel, which replaces the system call with B_open, which will call what it thinks is the original system call, A_open, when it's done. The <b>init_module</b> function replaces the appropriate location in <b>sys_call_table</b> and keeps the original pointer in a variable. The cleanup_module function uses that variable to restore everything back to normal. This approach is dangerous, because of the possibility of two kernel modules changing the same system call. Imagine we have two kernel modules, A and B. A's open system call will be A_open and B's will be B_open. Now, when A is inserted into the kernel, the system call is replaced with A_open, which will call the original sys_open when it's done. Next, B is inserted into the kernel, which replaces the system call with B_open, which will call what it thinks is the original system call, A_open, when it's done.
</p> </p>
<p> <p>
Now, if B is removed first, everything will be well&#x2014;it will simply restore the system call to A_open, which calls the original. However, if A is removed and then B is removed, the system will crash. A's removal will restore the system call to the original, sys_open, cutting B out of the loop. Then, when B is removed, it will restore the system call to what it thinks is the original, A_open, which is no longer in memory. At first glance, it appears we could solve this particular problem by checking if the system call is equal to our open function and if so not changing it at all (so that B won't change the system call when it's removed), but that will cause an even worse problem. When A is removed, it sees that the system call was changed to B_open so that it is no longer pointing to A_open, so it won't restore it to sys_open before it is removed from memory. Unfortunately, B_open will still try to call A_open which is no longer there, so that even without removing B the system would crash. Now, if B is removed first, everything will be well&#x2014;it will simply restore the system call to A_open, which calls the original. However, if A is removed and then B is removed, the system will crash. A's removal will restore the system call to the original, sys_open, cutting B out of the loop. Then, when B is removed, it will restore the system call to what it thinks is the original, <b>A_open</b>, which is no longer in memory. At first glance, it appears we could solve this particular problem by checking if the system call is equal to our open function and if so not changing it at all (so that B won't change the system call when it's removed), but that will cause an even worse problem. When A is removed, it sees that the system call was changed to <b>B_open</b> so that it is no longer pointing to <b>A_open</b>, so it won't restore it to <b>sys_open</b> before it is removed from memory. 
Unfortunately, <b>B_open</b> will still try to call <b>A_open</b> which is no longer there, so that even without removing B the system would crash.
</p> </p>
<p> <p>
Note that all the related problems make syscall stealing unfeasible for production use. In order to keep people from doing potentially harmful things sys_call_table is no longer exported. This means, if you want to do something more than a mere dry run of this example, you will have to patch your current kernel in order to have sys_call_table exported. In the example directory you will find a README and the patch. As you can imagine, such modifications are not to be taken lightly. Do not try this on valuable systems (i.e. systems that you do not own - or cannot restore easily). You'll need to get the complete source code of this guide as a tarball in order to get the patch and the README. Depending on your kernel version, you might even need to hand apply the patch. Still here? Well, so is this chapter. If Wile E. Coyote was a kernel hacker, this would be the first thing he'd try. ;) Note that all the related problems make syscall stealing unfeasible for production use. In order to keep people from doing potentially harmful things <b>sys_call_table</b> is no longer exported. This means, if you want to do something more than a mere dry run of this example, you will have to patch your current kernel in order to have sys_call_table exported. In the example directory you will find a README and the patch. As you can imagine, such modifications are not to be taken lightly. Do not try this on valuable systems (i.e. systems that you do not own - or cannot restore easily). You'll need to get the complete source code of this guide as a tarball in order to get the patch and the README. Depending on your kernel version, you might even need to hand apply the patch. Still here? Well, so is this chapter. If Wile E. Coyote was a kernel hacker, this would be the first thing he'd try. ;)
</p> </p>
</div> </div>
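The removal-order problem with modules A and B can be modelled in plain user-space C with a single function pointer standing in for the sys_call_table entry. Everything here is a hypothetical model: sys_open_model, table_slot, the hook structure and MODEL_CRASH are invented for illustration, and a call into an unloaded module returns MODEL_CRASH where a real kernel would jump into freed memory.

```c
#include <stddef.h>

#define MODEL_CRASH (-1) /* stands in for a jump into unloaded code */

typedef int (*open_fn)(const char *path);

struct hook {
    open_fn saved;   /* what the slot held when the hook was installed */
    open_fn handler; /* this module's replacement function */
    int loaded;      /* 1 while the module's code is "in memory" */
};

/* Stand-in for the original sys_open; always succeeds with 0. */
int sys_open_model(const char *path) { (void)path; return 0; }

/* One-entry stand-in for sys_call_table. */
open_fn table_slot = sys_open_model;

int A_open(const char *path);
int B_open(const char *path);

struct hook mod_a = { NULL, A_open, 0 };
struct hook mod_b = { NULL, B_open, 0 };

int A_open(const char *path)
{
    if (!mod_a.loaded)
        return MODEL_CRASH;
    return mod_a.saved(path); /* chain to what A believes is the original */
}

int B_open(const char *path)
{
    if (!mod_b.loaded)
        return MODEL_CRASH;
    return mod_b.saved(path); /* chain to what B believes is the original */
}

/* Models init_module: remember the old pointer, then replace it. */
void hook_install(struct hook *h)
{
    h->saved = table_slot;
    table_slot = h->handler;
    h->loaded = 1;
}

/* Models cleanup_module: blindly restore the remembered pointer. */
void hook_remove(struct hook *h)
{
    table_slot = h->saved;
    h->loaded = 0;
}

/* The "fix" discussed above: restore only if the slot is still ours. */
void hook_remove_checked(struct hook *h)
{
    if (table_slot == h->handler)
        table_slot = h->saved;
    h->loaded = 0;
}

/* Models a process making the open system call. */
int model_open(const char *path) { return table_slot(path); }

/* Return the model to its pristine state between experiments. */
void model_reset(void)
{
    table_slot = sys_open_model;
    mod_a.loaded = 0;
    mod_b.loaded = 0;
}
```

Installing A then B and removing B then A leaves model_open returning 0 throughout. Removing A first and then B leaves the slot pointing at A_open after A is gone, so model_open returns MODEL_CRASH. Using hook_remove_checked for A produces MODEL_CRASH immediately, because B_open still chains to A_open.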
@@ -3413,12 +3413,12 @@ asmlinkage <span class="org-type">int</span> <span class="org-function-name">our
<h2 id="orgheadline64">Blocking Processes</h2> <h2 id="orgheadline64">Blocking Processes</h2>
<div class="outline-text-2" id="text-orgheadline64"> <div class="outline-text-2" id="text-orgheadline64">
<p> <p>
What do you do when somebody asks you for something you can't do right away? If you're a human being and you're bothered by a human being, the only thing you can say is: "Not right now, I'm busy. Go away!". But if you're a kernel module and you're bothered by a process, you have another possibility. You can put the process to sleep until you can service it. After all, processes are being put to sleep by the kernel and woken up all the time (that's the way multiple processes appear to run at the same time on a single CPU). What do you do when somebody asks you for something you can't do right away? If you're a human being and you're bothered by a human being, the only thing you can say is: <i>"Not right now, I'm busy. Go away!"</i>. But if you're a kernel module and you're bothered by a process, you have another possibility. You can put the process to sleep until you can service it. After all, processes are being put to sleep by the kernel and woken up all the time (that's the way multiple processes appear to run at the same time on a single CPU).
</p> </p>
<p> <p>
This kernel module is an example of this. The file (called /proc/sleep) can only be opened by a single process at a time. If the file is already open, the kernel module calls wait_event_interruptible[12]. This function changes This kernel module is an example of this. The file (called <b>/proc/sleep</b>) can only be opened by a single process at a time. If the file is already open, the kernel module calls wait_event_interruptible[12]. This function changes
the status of the task (a task is the kernel data structure which holds information about a process and the system call it's in, if any) to TASK_INTERRUPTIBLE, which means that the task will not run until it is woken up somehow, and adds it to WaitQ, the queue of tasks waiting to access the file. Then, the function calls the scheduler to context switch to a different process, one which has some use for the CPU. the status of the task (a task is the kernel data structure which holds information about a process and the system call it's in, if any) to <b>TASK_INTERRUPTIBLE</b>, which means that the task will not run until it is woken up somehow, and adds it to WaitQ, the queue of tasks waiting to access the file. Then, the function calls the scheduler to context switch to a different process, one which has some use for the CPU.
</p> </p>
<p> <p>
@@ -3430,11 +3430,11 @@ So we'll use tail -f to keep the file open in the background, while trying to ac
</p> </p>
<p> <p>
To make our life more interesting, module_close doesn't have a monopoly on waking up the processes which wait to access the file. A signal, such as Ctrl +c (SIGINT) can also wake up a process. [14] In that case, we want to return with -EINTR immediately. This is important so users can, for example, kill the process before it receives the file. To make our life more interesting, module_close doesn't have a monopoly on waking up the processes which wait to access the file. A signal, such as <i>Ctrl +c</i> (<b>SIGINT</b>) can also wake up a process. [14] In that case, we want to return with <b>-EINTR</b> immediately. This is important so users can, for example, kill the process before it receives the file.
</p> </p>
<p> <p>
There is one more point to remember. Sometimes processes don't want to sleep; they want either to get what they want immediately, or to be told it cannot be done. Such processes use the O_NONBLOCK flag when opening the file. The kernel is supposed to respond by returning with the error code -EAGAIN from operations which would otherwise block, such as opening the file in this example. The program cat_noblock, available in the source directory for this chapter, can be used to open a file with O_NONBLOCK. There is one more point to remember. Sometimes processes don't want to sleep; they want either to get what they want immediately, or to be told it cannot be done. Such processes use the <b>O_NONBLOCK</b> flag when opening the file. The kernel is supposed to respond by returning with the error code <b>-EAGAIN</b> from operations which would otherwise block, such as opening the file in this example. The program cat_noblock, available in the source directory for this chapter, can be used to open a file with <b>O_NONBLOCK</b>.
</p> </p>
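From the process side, a non-blocking open might look like the following minimal sketch. The helper name open_nonblocking is hypothetical and this is not the cat_noblock program from the source directory; it only shows how the flag is passed and how an error such as EAGAIN reaches the caller through errno.

```c
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical helper: open path read-only without blocking.
 * Returns a file descriptor (>= 0) on success, or a negative errno
 * value on failure, for example -EAGAIN when the driver would
 * otherwise have put the caller to sleep.
 */
int open_nonblocking(const char *path)
{
    int fd = open(path, O_RDONLY | O_NONBLOCK);

    if (fd == -1)
        return -errno; /* user space sees the kernel's -EAGAIN as errno */
    return fd;
}
```

A caller can compare the result with -EAGAIN to decide whether to retry later instead of waiting.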
<div class="org-src-container"> <div class="org-src-container">
@@ -3466,6 +3466,10 @@ hostname:~/lkmpg-examples/09-BlockingProcesses#
<div id="outline-container-orgheadline65" class="outline-3"> <div id="outline-container-orgheadline65" class="outline-3">
<h3 id="orgheadline65">Example: sleep.c</h3> <h3 id="orgheadline65">Example: sleep.c</h3>
<div class="outline-text-3" id="text-orgheadline65"> <div class="outline-text-3" id="text-orgheadline65">
<p>
TODO: this will be out of date
</p>
<div class="org-src-container"> <div class="org-src-container">
<pre class="src src-c"><span class="org-comment-delimiter">/*</span> <pre class="src src-c"><span class="org-comment-delimiter">/*</span>
@@ -3537,8 +3541,8 @@ hostname:~/lkmpg-examples/09-BlockingProcesses#
<span class="org-keyword">static</span> <span class="org-type">ssize_t</span> <span class="org-function-name">module_input</span>(<span class="org-keyword">struct</span> <span class="org-type">file</span> *<span class="org-variable-name">file</span>, <span class="org-comment-delimiter">/* </span><span class="org-comment">The file itself </span><span class="org-comment-delimiter">*/</span> <span class="org-keyword">static</span> <span class="org-type">ssize_t</span> <span class="org-function-name">module_input</span>(<span class="org-keyword">struct</span> <span class="org-type">file</span> *<span class="org-variable-name">file</span>, <span class="org-comment-delimiter">/* </span><span class="org-comment">The file itself </span><span class="org-comment-delimiter">*/</span>
<span class="org-keyword">const</span> <span class="org-type">char</span> *<span class="org-variable-name">buf</span>, <span class="org-comment-delimiter">/* </span><span class="org-comment">The buffer with input </span><span class="org-comment-delimiter">*/</span> <span class="org-keyword">const</span> <span class="org-type">char</span> *<span class="org-variable-name">buf</span>, <span class="org-comment-delimiter">/* </span><span class="org-comment">The buffer with input </span><span class="org-comment-delimiter">*/</span>
<span class="org-type">size_t</span> <span class="org-variable-name">length</span>, <span class="org-comment-delimiter">/* </span><span class="org-comment">The buffer's length </span><span class="org-comment-delimiter">*/</span> <span class="org-type">size_t</span> <span class="org-variable-name">length</span>, <span class="org-comment-delimiter">/* </span><span class="org-comment">The buffer's length </span><span class="org-comment-delimiter">*/</span>
<span class="org-type">loff_t</span> * <span class="org-variable-name">offset</span>) <span class="org-type">loff_t</span> * <span class="org-variable-name">offset</span>) <span class="org-comment-delimiter">/* </span><span class="org-comment">offset to file - ignore </span><span class="org-comment-delimiter">*/</span>
{ <span class="org-comment-delimiter">/* </span><span class="org-comment">offset to file - ignore </span><span class="org-comment-delimiter">*/</span> {
<span class="org-type">int</span> <span class="org-variable-name">i</span>; <span class="org-type">int</span> <span class="org-variable-name">i</span>;
<span class="org-comment-delimiter">/*</span> <span class="org-comment-delimiter">/*</span>


@@ -315,7 +315,7 @@ you can see, some things get hardwired into the kernel (obj-y) but where are all
This demonstrates a feature of kernel 2.2 and later. Notice the change in the definitions of the init and cleanup functions. The *__init* macro causes the init function to be discarded and its memory freed once the init function finishes for built-in drivers, but not loadable modules. If you think about when the init function is invoked, this makes perfect sense. This demonstrates a feature of kernel 2.2 and later. Notice the change in the definitions of the init and cleanup functions. The *__init* macro causes the init function to be discarded and its memory freed once the init function finishes for built-in drivers, but not loadable modules. If you think about when the init function is invoked, this makes perfect sense.
There is also an *__initdata* which works similarly to __init but for init variables rather than functions. There is also an *__initdata* which works similarly to *__init* but for init variables rather than functions.
The *__exit* macro causes the omission of the function when the module is built into the kernel, and like __init, has no effect for loadable modules. Again, if you consider when the cleanup function runs, this makes complete sense; built-in drivers don't need a cleanup function, while loadable modules do. The *__exit* macro causes the omission of the function when the module is built into the kernel, and like __init, has no effect for loadable modules. Again, if you consider when the cleanup function runs, this makes complete sense; built-in drivers don't need a cleanup function, while loadable modules do.
@@ -404,7 +404,7 @@ int myint = 3;
module_param(myint, int, 0); module_param(myint, int, 0);
#+END_SRC #+END_SRC
Arrays are supported too, but things are a bit different now than they were in the 2.4. days. To keep track of the number of parameters you need to pass a pointer to a count variable as the third parameter. At your option, you could also ignore the count and pass NULL instead. We show both possibilities here: Arrays are supported too, but things are a bit different now than they were in the olden days. To keep track of the number of parameters you need to pass a pointer to a count variable as the third parameter. At your option, you could also ignore the count and pass NULL instead. We show both possibilities here:
#+BEGIN_SRC c #+BEGIN_SRC c
int myintarray[2]; int myintarray[2];
@@ -417,7 +417,7 @@ module_parm_array(myshortarray, short, , 0); /* put count into "count" variable
A good use for this is to have the module variable's default values set, like a port or IO address. If the variables contain the default values, then perform autodetection (explained elsewhere). Otherwise, keep the current value. This will be made clear later on. A good use for this is to have the module variable's default values set, like a port or IO address. If the variables contain the default values, then perform autodetection (explained elsewhere). Otherwise, keep the current value. This will be made clear later on.
Lastly, there's a macro function, MODULE_PARM_DESC(), that is used to document arguments that the module can take. It takes two parameters: a variable name and a free form string describing that variable. Lastly, there's a macro function, *MODULE_PARM_DESC()*, that is used to document arguments that the module can take. It takes two parameters: a variable name and a free form string describing that variable.
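The "default value means autodetect" idea can be sketched in plain user-space C. This is only an illustration of the decision logic, not kernel code: the sentinel IO_BASE_UNSET, the probe function autodetect_io_base() and the address 0x300 are all assumptions made up for the example.

```c
#include <stdio.h>

/* Hypothetical sentinel: the value the parameter keeps if the user passes nothing. */
#define IO_BASE_UNSET 0

/* Stand-in for a real hardware probe; here it always reports the same address. */
static int autodetect_io_base(void)
{
    return 0x300;
}

/*
 * Decide which I/O base to use: keep the value the user supplied, and
 * fall back to autodetection only when the parameter still holds its default.
 */
static int resolve_io_base(int io_base_param)
{
    if (io_base_param == IO_BASE_UNSET)
        return autodetect_io_base();
    return io_base_param;
}

/* Print the decision for one parameter value. */
static void report_io_base(int io_base_param)
{
    printf("param=0x%x -> using 0x%x\n",
           (unsigned int)io_base_param,
           (unsigned int)resolve_io_base(io_base_param));
}
```

In a real module the argument would be the variable registered with module_param(), and the check would normally run in the entry function.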
*** Example: hello-5.c *** Example: hello-5.c
@@ -588,9 +588,9 @@ This is the complete makefile for all the examples we've seen so far. The first
** Building modules for a precompiled kernel ** Building modules for a precompiled kernel
Obviously, we strongly suggest that you recompile your kernel, so that you can enable a number of useful debugging features, such as forced module unloading (MODULE_FORCE_UNLOAD): when this option is enabled, you can force the kernel to unload a module even when it believes it is unsafe, via a rmmod -f module command. This option can save you a lot of time and a number of reboots during the development of a module. Obviously, we strongly suggest that you recompile your kernel, so that you can enable a number of useful debugging features, such as forced module unloading (*MODULE_FORCE_UNLOAD*): when this option is enabled, you can force the kernel to unload a module even when it believes it is unsafe, via a *sudo rmmod -f module* command. This option can save you a lot of time and a number of reboots during the development of a module. If you don't want to recompile your kernel then you should consider running the examples within a test distro on a virtual machine. If you mess anything up then you can easily reboot or restore the VM.
Nevertheless, there is a number of cases in which you may want to load your module into a precompiled running kernel, such as the ones shipped with common Linux distributions, or a kernel you have compiled in the past. In certain circumstances you could require to compile and insert a module into a running kernel which you are not allowed to recompile, or on a machine that you prefer not to reboot. If you can't think of a case that will force you to use modules for a precompiled kernel you might want to skip this and treat the rest of this chapter as a big footnote. There are a number of cases in which you may want to load your module into a precompiled running kernel, such as the ones shipped with common Linux distributions, or a kernel you have compiled in the past. In certain circumstances you could require to compile and insert a module into a running kernel which you are not allowed to recompile, or on a machine that you prefer not to reboot. If you can't think of a case that will force you to use modules for a precompiled kernel you might want to skip this and treat the rest of this chapter as a big footnote.
Now, if you just install a kernel source tree, use it to compile your kernel module and you try to insert your module into the kernel, in most cases you would obtain an error as follows: Now, if you just install a kernel source tree, use it to compile your kernel module and you try to insert your module into the kernel, in most cases you would obtain an error as follows:
@@ -618,18 +618,18 @@ depends:
To overcome this problem we could resort to the *--force-vermagic* option, but this solution is potentially unsafe, and unquestionably unacceptable in production modules. Consequently, we want to compile our module in an environment which is identical to the one in which our precompiled kernel was built. How to do this is the subject of the remainder of this chapter. To overcome this problem we could resort to the *--force-vermagic* option, but this solution is potentially unsafe, and unquestionably unacceptable in production modules. Consequently, we want to compile our module in an environment which is identical to the one in which our precompiled kernel was built. How to do this is the subject of the remainder of this chapter.
First of all, make sure that a kernel source tree is available, having exactly the same version as your current kernel. Then, find the configuration file which was used to compile your precompiled kernel. Usually, this is available in your current /boot directory, under a name like config-2.6.x. You may just want to copy it to your kernel source tree: cp /boot/config-`uname -r` /usr/src/linux-`uname -r`/.config. First of all, make sure that a kernel source tree is available, having exactly the same version as your current kernel. Then, find the configuration file which was used to compile your precompiled kernel. Usually, this is available in your current /boot directory, under a name like config-2.6.x. You may just want to copy it to your kernel source tree: *cp /boot/config-`uname -r` /usr/src/linux-`uname -r`/.config*.
Let's focus again on the previous error message: a closer look at the version magic strings suggests that, even with two configuration files which are exactly the same, a slight difference in the version magic could be possible, and it is sufficient to prevent insertion of the module into the kernel. That slight difference, namely the custom string which appears in the module's version magic and not in the kernel's one, is due to a modification with respect to the original, in the makefile that some distributions include. Then, examine your /usr/src/linux/Makefile, and make sure that the specified version information matches exactly the one used for your current kernel. For example, your makefile could start as follows: Let's focus again on the previous error message: a closer look at the version magic strings suggests that, even with two configuration files which are exactly the same, a slight difference in the version magic could be possible, and it is sufficient to prevent insertion of the module into the kernel. That slight difference, namely the custom string which appears in the module's version magic and not in the kernel's one, is due to a modification with respect to the original, in the makefile that some distributions include. Then, examine your */usr/src/linux/Makefile*, and make sure that the specified version information matches exactly the one used for your current kernel. For example, your makefile could start as follows:
#+BEGIN_SRC makefile #+BEGIN_SRC makefile
VERSION = 2 VERSION = 3
PATCHLEVEL = 6 PATCHLEVEL = 16
SUBLEVEL = 5 SUBLEVEL = 7
EXTRAVERSION = -1.358custom EXTRAVERSION = -1.358custom
#+END_SRC #+END_SRC
In this case, you need to restore the value of symbol EXTRAVERSION to -1.358. We suggest keeping a backup copy of the makefile used to compile your kernel available in */lib/modules/3.16.7-1.358/build*. A simple *cp /lib/modules/`uname -r`/build/Makefile /usr/src/linux-`uname -r`* should suffice. Additionally, if you already started a kernel build with the previous (wrong) Makefile, you should also rerun make, or directly modify symbol UTS_RELEASE in file */usr/src/linux-3.16.x/include/linux/version.h* according to contents of file */lib/modules/3.16.x/build/include/linux/version.h*, or overwrite the latter with the first. In this case, you need to restore the value of symbol *EXTRAVERSION* to -1.358. We suggest keeping a backup copy of the makefile used to compile your kernel available in */lib/modules/3.16.7-1.358/build*. A simple *cp /lib/modules/`uname -r`/build/Makefile /usr/src/linux-`uname -r`* should suffice. Additionally, if you already started a kernel build with the previous (wrong) Makefile, you should also rerun make, or directly modify symbol UTS_RELEASE in file */usr/src/linux-3.16.x/include/linux/version.h* according to contents of file */lib/modules/3.16.x/build/include/linux/version.h*, or overwrite the latter with the first.
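To see why a stray suffix in EXTRAVERSION is enough to cause a mismatch, here is a small user-space sketch that assembles a release string from the four makefile symbols and compares it with the running kernel's release. It models only the release part of the version magic (the real string carries further fields), and the helper names format_release() and release_matches() are invented for this illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "VERSION.PATCHLEVEL.SUBLEVEL" followed directly by EXTRAVERSION.
 * Returns what snprintf() returns: the length the full string needs.
 */
static int format_release(char *out, size_t size, int version,
                          int patchlevel, int sublevel,
                          const char *extraversion)
{
    return snprintf(out, size, "%d.%d.%d%s",
                    version, patchlevel, sublevel, extraversion);
}

/*
 * Return 1 if a tree with these makefile values would produce exactly the
 * release string of the running kernel (as printed by uname -r), else 0.
 */
static int release_matches(const char *running, int version, int patchlevel,
                           int sublevel, const char *extraversion)
{
    char built[64];
    int n = format_release(built, sizeof built, version, patchlevel,
                           sublevel, extraversion);

    if (n < 0 || (size_t)n >= sizeof built)
        return 0;   /* formatting failed or the string was truncated */

    return strcmp(built, running) == 0;
}
```

With the values shown above, release_matches("3.16.7-1.358", 3, 16, 7, "-1.358custom") yields 0, while passing "-1.358" as the last argument yields 1.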
Now, please run make to update configuration and version headers and objects: Now, please run make to update configuration and version headers and objects:
@@ -652,17 +652,17 @@ If you do not desire to actually compile the kernel, you can interrupt the build
* Preliminaries * Preliminaries
** How modules begin and end ** How modules begin and end
A program usually begins with a main() function, executes a bunch of instructions and terminates upon completion of those instructions. Kernel modules work a bit differently. A module always begins with either the init_module or the function you specify with module_init call. This is the entry function for modules; it tells the kernel what functionality the module provides and sets up the kernel to run the module's functions when they're needed. Once it does this, entry function returns and the module does nothing until the kernel wants to do something with the code that the module provides. A program usually begins with a *main()* function, executes a bunch of instructions and terminates upon completion of those instructions. Kernel modules work a bit differently. A module always begins with either the init_module or the function you specify with module_init call. This is the entry function for modules; it tells the kernel what functionality the module provides and sets up the kernel to run the module's functions when they're needed. Once it does this, entry function returns and the module does nothing until the kernel wants to do something with the code that the module provides.
All modules end by calling either cleanup_module or the function you specify with the module_exit call. This is the exit function for modules; it undoes whatever entry function did. It unregisters the functionality that the entry function registered. All modules end by calling either *cleanup_module* or the function you specify with the *module_exit* call. This is the exit function for modules; it undoes whatever entry function did. It unregisters the functionality that the entry function registered.
Every module must have an entry function and an exit function. Since there's more than one way to specify entry and exit functions, I'll try my best to use the terms `entry function' and `exit function', but if I slip and simply refer to them as init_module and cleanup_module, I think you'll know what I mean. Every module must have an entry function and an exit function. Since there's more than one way to specify entry and exit functions, I'll try my best to use the terms `entry function' and `exit function', but if I slip and simply refer to them as init_module and cleanup_module, I think you'll know what I mean.
** Functions available to modules ** Functions available to modules
Programmers use functions they don't define all the time. A prime example of this is printf(). You use these library functions which are provided by the standard C library, libc. The definitions for these functions don't actually enter your program until the linking stage, which ensures that the code (for printf() for example) is available, and fixes the call instruction to point to that code. Programmers use functions they don't define all the time. A prime example of this is *printf()*. You use these library functions which are provided by the standard C library, libc. The definitions for these functions don't actually enter your program until the linking stage, which ensures that the code (for printf() for example) is available, and fixes the call instruction to point to that code.
Kernel modules are different here, too. In the hello world example, you might have noticed that we used a function, printk() but didn't include a standard I/O library. That's because modules are object files whose symbols get resolved upon insmod'ing. The definition for the symbols comes from the kernel itself; the only external functions you can use are the ones provided by the kernel. If you're curious about what symbols have been exported by your kernel, take a look at */proc/kallsyms*. Kernel modules are different here, too. In the hello world example, you might have noticed that we used a function, *printk()* but didn't include a standard I/O library. That's because modules are object files whose symbols get resolved upon insmod'ing. The definition for the symbols comes from the kernel itself; the only external functions you can use are the ones provided by the kernel. If you're curious about what symbols have been exported by your kernel, take a look at */proc/kallsyms*.
One point to keep in mind is the difference between library functions and system calls. Library functions are higher level, run completely in user space and provide a more convenient interface for the programmer to the functions that do the real work---system calls. System calls run in kernel mode on the user's behalf and are provided by the kernel itself. The library function printf() may look like a very general printing function, but all it really does is format the data into strings and write the string data using the low-level system call write(), which then sends the data to standard output. One point to keep in mind is the difference between library functions and system calls. Library functions are higher level, run completely in user space and provide a more convenient interface for the programmer to the functions that do the real work---system calls. System calls run in kernel mode on the user's behalf and are provided by the kernel itself. The library function printf() may look like a very general printing function, but all it really does is format the data into strings and write the string data using the low-level system call write(), which then sends the data to standard output.
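The split between a library function and a system call can be made visible with a few lines of user-space C. The sketch below does by hand roughly what the text describes for printf(): it formats in user space with snprintf() and then passes the bytes to write(). The function name print_via_write() is made up for this example, and real printf() also buffers its output, which is left out here.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Format "<label>: <value>\n" in user space, then hand the finished bytes
 * to the write() system call. Returns the number of bytes written, or -1
 * on error.
 */
static ssize_t print_via_write(int fd, const char *label, int value)
{
    char buf[128];
    int len = snprintf(buf, sizeof buf, "%s: %d\n", label, value);

    if (len < 0 || (size_t)len >= sizeof buf)
        return -1;      /* formatting failed or the output was truncated */

    /* Only this call crosses into the kernel. */
    return write(fd, buf, (size_t)len);
}
```

Running a program that calls print_via_write(STDOUT_FILENO, "answer", 42) under strace should show a single write() call carrying the already formatted text.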
@@ -1733,7 +1733,7 @@ Device files are supposed to represent physical devices. Most physical devices a
This is not always enough. Imagine you had a serial port connected to a modem (even if you have an internal modem, it is still implemented from the CPU's perspective as a serial port connected to a modem, so you don't have to tax your imagination too hard). The natural thing to do would be to use the device file to write things to the modem (either modem commands or data to be sent through the phone line) and read things from the modem (either responses for commands or the data received through the phone line). However, this leaves open the question of what to do when you need to talk to the serial port itself, for example to send the rate at which data is sent and received. This is not always enough. Imagine you had a serial port connected to a modem (even if you have an internal modem, it is still implemented from the CPU's perspective as a serial port connected to a modem, so you don't have to tax your imagination too hard). The natural thing to do would be to use the device file to write things to the modem (either modem commands or data to be sent through the phone line) and read things from the modem (either responses for commands or the data received through the phone line). However, this leaves open the question of what to do when you need to talk to the serial port itself, for example to send the rate at which data is sent and received.
The answer in Unix is to use a special function called ioctl (short for Input Output ConTroL). Every device can have its own ioctl commands, which can be read ioctl's (to send information from a process to the kernel), write ioctl's (to return information to a process), [10] both or neither. The ioctl function is called with three parameters: the file descriptor of the appropriate device file, the ioctl number, and a parameter, which is of type long so you can use a cast to use it to pass anything. [11] The answer in Unix is to use a special function called *ioctl* (short for Input Output ConTroL). Every device can have its own ioctl commands, which can be read ioctl's (to send information from a process to the kernel), write ioctl's (to return information to a process), [10] both or neither. The ioctl function is called with three parameters: the file descriptor of the appropriate device file, the ioctl number, and a parameter, which is of type long so you can use a cast to use it to pass anything. [11]
The ioctl number encodes the major device number, the type of the ioctl, the command, and the type of the parameter. This ioctl number is usually created by a macro call (_IO, _IOR, _IOW or _IOWR --- depending on the type) in a header file. This header file should then be included both by the programs which will use ioctl (so they can generate the appropriate ioctl's) and by the kernel module (so it can understand it). In the example below, the header file is chardev.h and the program which uses it is ioctl.c. The ioctl number encodes the major device number, the type of the ioctl, the command, and the type of the parameter. This ioctl number is usually created by a macro call (_IO, _IOR, _IOW or _IOWR --- depending on the type) in a header file. This header file should then be included both by the programs which will use ioctl (so they can generate the appropriate ioctl's) and by the kernel module (so it can understand it). In the example below, the header file is chardev.h and the program which uses it is ioctl.c.
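As a rough illustration of what these macros pack into one number, the following user-space sketch builds two ioctl numbers and decodes them again with the _IOC_* helpers from the Linux header linux/ioctl.h. The names DEMO_MAGIC, DEMO_IOCTL_SET_MSG and DEMO_IOCTL_GET_MSG, and the value 100, are assumptions chosen for this example; they are not taken from chardev.h.

```c
#include <stdio.h>
#include <linux/ioctl.h>   /* _IO, _IOR, _IOW, _IOWR and the _IOC_* decoders */

/* Hypothetical identifying number, playing the role of the major device number. */
#define DEMO_MAGIC 100

/* Two hypothetical commands, each carrying a char * sized parameter. */
#define DEMO_IOCTL_SET_MSG _IOW(DEMO_MAGIC, 0, char *)
#define DEMO_IOCTL_GET_MSG _IOR(DEMO_MAGIC, 1, char *)

/* Print the four fields that are packed into a single ioctl number. */
static void describe_ioctl(const char *name, unsigned long cmd)
{
    printf("%s: type=%u nr=%u size=%u dir=%u\n", name,
           (unsigned int)_IOC_TYPE(cmd),   /* identifying number           */
           (unsigned int)_IOC_NR(cmd),     /* command sequence number      */
           (unsigned int)_IOC_SIZE(cmd),   /* size of the parameter type   */
           (unsigned int)_IOC_DIR(cmd));   /* direction bits from the header */
}
```

Because both the user program and the module include the same header, both sides compute identical numbers without any of these fields being written out by hand.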
@@ -2194,11 +2194,11 @@ main()
* System Calls * System Calls
So far, the only thing we've done was to use well defined kernel mechanisms to register /proc files and device handlers. This is fine if you want to do something the kernel programmers thought you'd want, such as write a device driver. But what if you want to do something unusual, to change the behavior of the system in some way? Then, you're mostly on your own. So far, the only thing we've done was to use well defined kernel mechanisms to register */proc* files and device handlers. This is fine if you want to do something the kernel programmers thought you'd want, such as write a device driver. But what if you want to do something unusual, to change the behavior of the system in some way? Then, you're mostly on your own.
This is where kernel programming gets dangerous. While writing the example below, I killed the open() system call. This meant I couldn't open any files, I couldn't run any programs, and I couldn't shut down the computer. I had to pull the power switch. Luckily, no files died. To ensure you won't lose any files either, please run sync right before you do the insmod and the rmmod. If you're not being sensible and using a virtual machine then this is where kernel programming can become hazardous. While writing the example below, I killed the *open()* system call. This meant I couldn't open any files, I couldn't run any programs, and I couldn't shut down the system. I had to restart the virtual machine. No important files got annihilated, but if I was doing this on some live mission critical system then that could have been a possible outcome. To ensure you don't lose any files, even within a test environment, please run *sync* right before you do the *insmod* and the *rmmod*.
Forget about /proc files, forget about device files. They're just minor details. The real process to kernel communication mechanism, the one used by all processes, is system calls. When a process requests a service from the kernel (such as opening a file, forking to a new process, or requesting more memory), this is the mechanism used. If you want to change the behaviour of the kernel in interesting ways, this is the place to do it. By the way, if you want to see which system calls a program uses, run strace <arguments>. Forget about */proc* files, forget about device files. They're just minor details. Minutiae in the vast expanse of the universe. The real process to kernel communication mechanism, the one used by all processes, is /system calls/. When a process requests a service from the kernel (such as opening a file, forking to a new process, or requesting more memory), this is the mechanism used. If you want to change the behaviour of the kernel in interesting ways, this is the place to do it. By the way, if you want to see which system calls a program uses, run *strace <arguments>*.
In general, a process is not supposed to be able to access the kernel. It can't access kernel memory and it can't call kernel functions. The hardware of the CPU enforces this (that's the reason why it's called `protected mode'). In general, a process is not supposed to be able to access the kernel. It can't access kernel memory and it can't call kernel functions. The hardware of the CPU enforces this (that's the reason why it's called `protected mode').
@@ -2208,13 +2208,13 @@ The location in the kernel a process can jump to is called system_call. The proc
So, if we want to change the way a certain system call works, what we need to do is to write our own function to implement it (usually by adding a bit of our own code, and then calling the original function) and then change the pointer at sys_call_table to point to our function. Because we might be removed later and we don't want to leave the system in an unstable state, it's important for cleanup_module to restore the table to its original state. So, if we want to change the way a certain system call works, what we need to do is to write our own function to implement it (usually by adding a bit of our own code, and then calling the original function) and then change the pointer at sys_call_table to point to our function. Because we might be removed later and we don't want to leave the system in an unstable state, it's important for cleanup_module to restore the table to its original state.
The source code here is an example of such a kernel module. We want to `spy' on a certain user, and to printk() a message whenever that user opens a file. Towards this end, we replace the system call to open a file with our own function, called our_sys_open. This function checks the uid (user's id) of the current process, and if it's equal to the uid we spy on, it calls printk() to display the name of the file to be opened. Then, either way, it calls the original open() function with the same parameters, to actually open the file. The source code here is an example of such a kernel module. We want to "spy" on a certain user, and to printk() a message whenever that user opens a file. Towards this end, we replace the system call to open a file with our own function, called *our_sys_open*. This function checks the uid (user's id) of the current process, and if it's equal to the uid we spy on, it calls printk() to display the name of the file to be opened. Then, either way, it calls the original open() function with the same parameters, to actually open the file.
The init_module function replaces the appropriate location in sys_call_table and keeps the original pointer in a variable. The cleanup_module function uses that variable to restore everything back to normal. This approach is dangerous, because of the possibility of two kernel modules changing the same system call. Imagine we have two kernel modules, A and B. A's open system call will be A_open and B's will be B_open. Now, when A is inserted into the kernel, the system call is replaced with A_open, which will call the original sys_open when it's done. Next, B is inserted into the kernel, which replaces the system call with B_open, which will call what it thinks is the original system call, A_open, when it's done. The *init_module* function replaces the appropriate location in *sys_call_table* and keeps the original pointer in a variable. The cleanup_module function uses that variable to restore everything back to normal. This approach is dangerous, because of the possibility of two kernel modules changing the same system call. Imagine we have two kernel modules, A and B. A's open system call will be A_open and B's will be B_open. Now, when A is inserted into the kernel, the system call is replaced with A_open, which will call the original sys_open when it's done. Next, B is inserted into the kernel, which replaces the system call with B_open, which will call what it thinks is the original system call, A_open, when it's done.
Now, if B is removed first, everything will be well---it will simply restore the system call to A_open, which calls the original. However, if A is removed and then B is removed, the system will crash. A's removal will restore the system call to the original, sys_open, cutting B out of the loop. Then, when B is removed, it will restore the system call to what it thinks is the original, A_open, which is no longer in memory. At first glance, it appears we could solve this particular problem by checking if the system call is equal to our open function and if so not changing it at all (so that B won't change the system call when it's removed), but that will cause an even worse problem. When A is removed, it sees that the system call was changed to B_open so that it is no longer pointing to A_open, so it won't restore it to sys_open before it is removed from memory. Unfortunately, B_open will still try to call A_open which is no longer there, so that even without removing B the system would crash. Now, if B is removed first, everything will be well---it will simply restore the system call to A_open, which calls the original. However, if A is removed and then B is removed, the system will crash. A's removal will restore the system call to the original, sys_open, cutting B out of the loop. Then, when B is removed, it will restore the system call to what it thinks is the original, *A_open*, which is no longer in memory. At first glance, it appears we could solve this particular problem by checking if the system call is equal to our open function and if so not changing it at all (so that B won't change the system call when it's removed), but that will cause an even worse problem. When A is removed, it sees that the system call was changed to *B_open* so that it is no longer pointing to *A_open*, so it won't restore it to *sys_open* before it is removed from memory. 
Unfortunately, *B_open* will still try to call *A_open* which is no longer there, so that even without removing B the system would crash.
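The removal-order problem can be replayed safely in user space. The sketch below is a simulation only: a single function pointer stands in for the open entry of sys_call_table, and "loading" a module just means saving and overwriting that pointer. All names except A_open and B_open are invented for the illustration, and nothing here touches a real kernel.

```c
/* Simulation of two modules hooking the same system call slot. */
typedef int (*open_fn)(const char *path);

/* Stand-in for the original sys_open. */
static int sys_open_stub(const char *path)
{
    (void)path;
    return 3;
}

static int A_open(const char *path);
static int B_open(const char *path);

static open_fn table_slot = sys_open_stub;  /* plays sys_call_table[__NR_open] */
static open_fn A_saved, B_saved;            /* what each module thinks is "original" */
static int A_loaded, B_loaded;

static int A_open(const char *path) { return A_saved(path); }
static int B_open(const char *path) { return B_saved(path); }

static void load_A(void)   { A_saved = table_slot; table_slot = A_open; A_loaded = 1; }
static void unload_A(void) { table_slot = A_saved; A_loaded = 0; }
static void load_B(void)   { B_saved = table_slot; table_slot = B_open; B_loaded = 1; }
static void unload_B(void) { table_slot = B_saved; B_loaded = 0; }

/* Return 1 if the slot points at a handler whose module is no longer loaded. */
static int slot_is_dangling(void)
{
    return (table_slot == A_open && !A_loaded) ||
           (table_slot == B_open && !B_loaded);
}

/* Load A then B, unload in the requested order, and report the outcome. */
static int scenario(int remove_a_first)
{
    table_slot = sys_open_stub;
    A_loaded = B_loaded = 0;

    load_A();
    load_B();
    if (remove_a_first) {
        unload_A();     /* slot back to the stub, B cut out of the loop   */
        unload_B();     /* slot "restored" to A_open, which is now gone   */
    } else {
        unload_B();     /* slot back to A_open, still loaded              */
        unload_A();     /* slot back to the stub                          */
    }
    return slot_is_dangling();
}
```

In the simulation a dangling slot is merely reported; in a real kernel the next open() would jump into freed module memory.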
Note that all the related problems make syscall stealing infeasible for production use. In order to keep people from doing potentially harmful things sys_call_table is no longer exported. This means, if you want to do something more than a mere dry run of this example, you will have to patch your current kernel in order to have sys_call_table exported. In the example directory you will find a README and the patch. As you can imagine, such modifications are not to be taken lightly. Do not try this on valuable systems (i.e. systems that you do not own - or cannot restore easily). You'll need to get the complete source code of this guide as a tarball in order to get the patch and the README. Depending on your kernel version, you might even need to hand apply the patch. Still here? Well, so is this chapter. If Wile E. Coyote was a kernel hacker, this would be the first thing he'd try. ;) Note that all the related problems make syscall stealing infeasible for production use. In order to keep people from doing potentially harmful things *sys_call_table* is no longer exported. This means, if you want to do something more than a mere dry run of this example, you will have to patch your current kernel in order to have sys_call_table exported. In the example directory you will find a README and the patch. As you can imagine, such modifications are not to be taken lightly. Do not try this on valuable systems (i.e. systems that you do not own - or cannot restore easily). You'll need to get the complete source code of this guide as a tarball in order to get the patch and the README. Depending on your kernel version, you might even need to hand apply the patch. Still here? Well, so is this chapter. If Wile E. Coyote was a kernel hacker, this would be the first thing he'd try. ;)
** Example: syscall.c ** Example: syscall.c
#+BEGIN_SRC c #+BEGIN_SRC c
@@ -2379,18 +2379,18 @@ void cleanup_module()
* Blocking Processes * Blocking Processes
What do you do when somebody asks you for something you can't do right away? If you're a human being and you're bothered by a human being, the only thing you can say is: "Not right now, I'm busy. Go away!". But if you're a kernel module and you're bothered by a process, you have another possibility. You can put the process to sleep until you can service it. After all, processes are being put to sleep by the kernel and woken up all the time (that's the way multiple processes appear to run at the same time on a single CPU). What do you do when somebody asks you for something you can't do right away? If you're a human being and you're bothered by a human being, the only thing you can say is: /"Not right now, I'm busy. Go away!"/. But if you're a kernel module and you're bothered by a process, you have another possibility. You can put the process to sleep until you can service it. After all, processes are being put to sleep by the kernel and woken up all the time (that's the way multiple processes appear to run at the same time on a single CPU).
This kernel module is an example of this. The file (called /proc/sleep) can only be opened by a single process at a time. If the file is already open, the kernel module calls wait_event_interruptible[12]. This function changes This kernel module is an example of this. The file (called */proc/sleep*) can only be opened by a single process at a time. If the file is already open, the kernel module calls wait_event_interruptible[12]. This function changes
the status of the task (a task is the kernel data structure which holds information about a process and the system call it's in, if any) to TASK_INTERRUPTIBLE, which means that the task will not run until it is woken up somehow, and adds it to WaitQ, the queue of tasks waiting to access the file. Then, the function calls the scheduler to context switch to a different process, one which has some use for the CPU. the status of the task (a task is the kernel data structure which holds information about a process and the system call it's in, if any) to *TASK_INTERRUPTIBLE*, which means that the task will not run until it is woken up somehow, and adds it to WaitQ, the queue of tasks waiting to access the file. Then, the function calls the scheduler to context switch to a different process, one which has some use for the CPU.
When a process is done with the file, it closes it, and module_close is called. That function wakes up all the processes in the queue (there's no mechanism to only wake up one of them). It then returns and the process which just closed the file can continue to run. In time, the scheduler decides that that process has had enough and gives control of the CPU to another process. Eventually, one of the processes which was in the queue will be given control of the CPU by the scheduler. It starts at the point right after the call to wait_event_interruptible[13]. It can then proceed to set a global variable to tell all the other processes that the file is still open and go on with its life. When the other processes get a piece of the CPU, they'll see that global variable and go back to sleep. When a process is done with the file, it closes it, and module_close is called. That function wakes up all the processes in the queue (there's no mechanism to only wake up one of them). It then returns and the process which just closed the file can continue to run. In time, the scheduler decides that that process has had enough and gives control of the CPU to another process. Eventually, one of the processes which was in the queue will be given control of the CPU by the scheduler. It starts at the point right after the call to wait_event_interruptible[13]. It can then proceed to set a global variable to tell all the other processes that the file is still open and go on with its life. When the other processes get a piece of the CPU, they'll see that global variable and go back to sleep.
So we'll use tail -f to keep the file open in the background, while trying to access it with another process (again in the background, so that we need not switch to a different vt). As soon as the first background process is killed with kill %1, the second is woken up, is able to access the file and finally terminates. So we'll use tail -f to keep the file open in the background, while trying to access it with another process (again in the background, so that we need not switch to a different vt). As soon as the first background process is killed with kill %1, the second is woken up, is able to access the file and finally terminates.
To make our life more interesting, module_close doesn't have a monopoly on waking up the processes which wait to access the file. A signal, such as Ctrl+c (SIGINT), can also wake up a process. [14] In that case, we want to return with -EINTR immediately. This is important so users can, for example, kill the process before it receives the file. To make our life more interesting, module_close doesn't have a monopoly on waking up the processes which wait to access the file. A signal, such as /Ctrl+c/ (*SIGINT*), can also wake up a process. [14] In that case, we want to return with *-EINTR* immediately. This is important so users can, for example, kill the process before it receives the file.
There is one more point to remember. Sometimes processes don't want to sleep; they want either to get what they want immediately, or to be told it cannot be done. Such processes use the O_NONBLOCK flag when opening the file. The kernel is supposed to respond by returning with the error code -EAGAIN from operations which would otherwise block, such as opening the file in this example. The program cat_noblock, available in the source directory for this chapter, can be used to open a file with O_NONBLOCK. There is one more point to remember. Sometimes processes don't want to sleep; they want either to get what they want immediately, or to be told it cannot be done. Such processes use the *O_NONBLOCK* flag when opening the file. The kernel is supposed to respond by returning with the error code *-EAGAIN* from operations which would otherwise block, such as opening the file in this example. The program cat_noblock, available in the source directory for this chapter, can be used to open a file with *O_NONBLOCK*.
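To make the user-space side of this concrete, here is a minimal sketch of how a program might request a non-blocking open and report the result. It is not the guide's cat_noblock program; the helper name open_noblock is hypothetical and exists only for illustration. It assumes nothing beyond the standard open(2) call and the O_NONBLOCK flag.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the guide's sources): open a file for
 * reading without blocking. Returns a file descriptor on success, or a
 * negative errno value on failure. A driver that honours O_NONBLOCK would
 * be expected to make this return -EAGAIN instead of putting the caller
 * to sleep.
 */
int open_noblock(const char *path)
{
    int fd;

    assert(path != NULL);

    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -errno;  /* errno is positive; negate it kernel-style */

    return fd;
}
```

If the module behaves as described above, calling open_noblock on /proc/sleep while another process holds the file open should yield -EAGAIN rather than blocking. The caller remains responsible for calling close() on any descriptor that is returned.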
#+BEGIN_SRC sh #+BEGIN_SRC sh
hostname:~/lkmpg-examples/09-BlockingProcesses# insmod sleep.ko hostname:~/lkmpg-examples/09-BlockingProcesses# insmod sleep.ko
@@ -2416,6 +2416,8 @@ hostname:~/lkmpg-examples/09-BlockingProcesses#
#+END_SRC #+END_SRC
** Example: sleep.c ** Example: sleep.c
TODO: this will be out of date
#+BEGIN_SRC c #+BEGIN_SRC c
/* /*
* sleep.c - create a /proc file, and if several processes try to open it at * sleep.c - create a /proc file, and if several processes try to open it at
@@ -2486,8 +2488,8 @@ static ssize_t module_output(struct file *file, /* see include/linux/fs.h */
static ssize_t module_input(struct file *file, /* The file itself */ static ssize_t module_input(struct file *file, /* The file itself */
const char *buf, /* The buffer with input */ const char *buf, /* The buffer with input */
size_t length, /* The buffer's length */ size_t length, /* The buffer's length */
loff_t * offset) loff_t * offset) /* offset to file - ignore */
{ /* offset to file - ignore */ {
int i; int i;
/* /*