Lines Matching +full:a +full:- +full:c
14 requirements for kernel code: its goal is to serve as a primer for Linux
15 kernel development for experienced C programmers. I avoid implementation
20 this document, being grossly under-qualified, but I always wanted to
21 read it, and this was the only way. I hope it will grow into a
28 At any time each of the CPUs in a system can be:
30 - not associated with any process, serving a hardware interrupt;
32 - not associated with any process, serving a softirq or tasklet;
34 - running in kernel space, associated with a process (user context);
36 - running a process in user space.
39 other, but above that is a strict hierarchy: each can only be preempted
40 by the ones above it. For example, while a softirq is running on a CPU,
41 no other softirq will preempt it, but a hardware interrupt can. However,
44 We'll see a number of ways that the user context can block interrupts,
45 to become truly non-preemptable.
48 ------------
50 User context is when you are coming in from a system call or other trap:
52 interrupts. You can sleep, by calling :c:func:`schedule()`.
60 currently executing) is valid, and :c:func:`in_interrupt()`
66 :c:func:`in_interrupt()` will return a false positive.
69 -------------------------------
74 handler is never re-entered: if the same interrupt arrives, it is queued
76 fast: frequently it simply acknowledges the interrupt, marks a 'software
79 You can tell you are in a hardware interrupt, because in_hardirq() returns
84 Beware that this will return a false positive if interrupts are
88 -------------------------------------------------
90 Whenever a system call is about to return to userspace, or a hardware
92 pending (usually by hardware interrupts) are run (``kernel/softirq.c``).
96 take advantage of multiple CPUs. Shortly after we switched from wind-up
97 computers made of match-sticks and snot, we abandoned this limitation
100 ``include/linux/interrupt.h`` lists the different softirqs. A very
102 can register to have it call functions for you in a given length of
105 Softirqs are often a pain to deal with, since the same softirq will run
108 dynamically-registrable (meaning you can have as many as you want), and
117 You can tell you are in a softirq (or tasklet) using the
118 :c:func:`in_softirq()` macro (``include/linux/preempt.h``).
122 Beware that this will return a false positive if a
138 avoid context switches). It is generally a bad idea; use fixed point
141 A rigid stack limit
143 6K for most 32-bit architectures: it's about 14K on most 64-bit
149 Let's keep it that way. Your code should be 64-bit clean, and
150 endian-independent. You should also minimize CPU specific stuff,
153 architecture-dependent part of the kernel tree.
155 ioctls: Not writing a new system call
158 A system call generally looks like this::
166 First, in most cases you don't want to create a new system call. You
167 create a character device and implement an appropriate ioctl for it.
174 implementing a :c:func:`sysfs()` interface instead.
176 Inside the ioctl you're in user context to a process. When an error
177 occurs you return a negated errno (see
178 ``include/uapi/asm-generic/errno-base.h``,
179 ``include/uapi/asm-generic/errno.h`` and ``include/linux/errno.h``),
182 After you have slept you should check if a signal occurred: the Unix/Linux
184 ``-ERESTARTSYS`` error. The system call entry code will switch back to
193 return -ERESTARTSYS;
204 A short note on interface design: the UNIX system call motto is "Provide
212 - You are in user context.
214 - You do not own any spinlocks.
216 - You have interrupts enabled (actually, Andi Kleen says that the
233 :c:func:`printk()`
234 ------------------
238 :c:func:`printk()` feeds kernel messages to the console, dmesg, and
240 can be used inside interrupt context, but use with caution: a machine
242 a format string mostly compatible with ANSI C printf, and C string
243 concatenation to give it a first "priority" argument::
256 :c:func:`printk()` internally uses a 1K buffer and does not catch
261 You will know when you are a real kernel hacker when you start
266 Another sidenote: the original Unix Version 6 sources had a comment
268 chit-chat". You should follow that advice.
270 :c:func:`copy_to_user()` / :c:func:`copy_from_user()` / :c:func:`get_user()` / :c:func:`put_user()`
271 ---------------------------------------------------------------------------------------------------
277 :c:func:`put_user()` and :c:func:`get_user()` are used to get
279 userspace. A pointer into userspace should never be simply dereferenced:
280 data should be copied using these routines. Both return ``-EFAULT`` or
283 :c:func:`copy_to_user()` and :c:func:`copy_from_user()` are
289 Unlike :c:func:`put_user()` and :c:func:`get_user()`, they
293 up every year or so. --RR.]
296 user context (it makes no sense), with interrupts disabled, or a
299 :c:func:`kmalloc()`/:c:func:`kfree()`
300 -------------------------------------
306 These routines are used to dynamically request pointer-aligned chunks of
308 :c:func:`kmalloc()` takes an extra flag word. Important values:
316 from interrupt context. You should **really** have a good
317 out-of-memory error-handling strategy.
323 If you see a sleeping function called from invalid context warning
324 message, then maybe you called a sleeping allocation function from
329 ``asm/page_types.h``) bytes, consider using :c:func:`__get_free_pages()`
334 If you are allocating more than a page worth of bytes you can use
335 :c:func:`vmalloc()`. It'll allocate virtual memory in the kernel
339 contiguous memory for some weird device, you have a problem: it is
341 in a running kernel makes it hard. The best way is to allocate the block
342 early in the boot process via the :c:func:`alloc_bootmem()`
345 Before inventing your own cache of often-used objects consider using a
348 :c:macro:`current`
349 ------------------
353 This global variable (really a macro) contains a pointer to the current
354 task structure, so is only valid in user context. For example, when a
355 process makes a system call, this will point to the task structure of
358 :c:func:`mdelay()`/:c:func:`udelay()`
359 -------------------------------------
363 The :c:func:`udelay()` and :c:func:`ndelay()` functions can be
365 overflow - the helper function :c:func:`mdelay()` is useful here, or
366 consider :c:func:`msleep()`.
368 :c:func:`cpu_to_be32()`/:c:func:`be32_to_cpu()`/:c:func:`cpu_to_le32()`/:c:func:`le32_to_cpu()`
369 -----------------------------------------------------------------------------------------------
373 The :c:func:`cpu_to_be32()` family (where the "32" can be replaced
377 :c:func:`be32_to_cpu()`, etc.
380 variation, such as :c:func:`cpu_to_be32p()`, which take a pointer
382 is the "in-situ" family, such as :c:func:`cpu_to_be32s()`, which
385 :c:func:`local_irq_save()`/:c:func:`local_irq_restore()`
386 --------------------------------------------------------
393 enabled, you can simply use :c:func:`local_irq_disable()` and
394 :c:func:`local_irq_enable()`.
398 :c:func:`local_bh_disable()`/:c:func:`local_bh_enable()`
399 --------------------------------------------------------
409 :c:func:`smp_processor_id()`
410 ----------------------------
414 :c:func:`get_cpu()` disables preemption (so you won't suddenly get
417 continuous. You return it again with :c:func:`put_cpu()` when you
425 ------------------------------------
429 After boot, the kernel frees up a special section; functions marked with
432 initialization. ``__exit`` is used to declare a function which is only
434 compiled as a module. See the header file for use. Note that it makes no
435 sense for a function marked with ``__init`` to be exported to modules
436 with :c:func:`EXPORT_SYMBOL()` or :c:func:`EXPORT_SYMBOL_GPL()` - this
439 :c:func:`__initcall()`/:c:func:`module_init()`
440 ----------------------------------------------
444 Many parts of the kernel are well served as a module
445 (dynamically-loadable parts of the kernel). Using the
446 :c:func:`module_init()` and :c:func:`module_exit()` macros it
447 is easy to write code without #ifdefs which can operate both as a module
450 The :c:func:`module_init()` macro defines which function is to be
451 called at module insertion time (if the file is compiled as a module),
452 or at boot time: if the file is not compiled as a module the
453 :c:func:`module_init()` macro becomes equivalent to
454 :c:func:`__initcall()`, which through linker magic ensures that
457 The function can return a negative error number to cause module loading
462 :c:func:`module_exit()`
463 -----------------------
475 not be removable (except for 'rmmod -f').
477 :c:func:`try_module_get()`/:c:func:`module_put()`
478 -------------------------------------------------
482 These manipulate the module usage count, to protect against removal (a
485 :c:func:`try_module_get()` on that module: if it fails, then the
488 :c:func:`module_put()` when you're finished.
491 :c:type:`struct file_operations <file_operations>` structure.
499 A wait queue is used to wait for someone to wake you up when a certain
501 race condition. You declare a :c:type:`wait_queue_head_t`, and then processes
502 which want to wait for that condition declare a :c:type:`wait_queue_entry_t`
506 ---------
508 You declare a ``wait_queue_head_t`` using the
509 :c:func:`DECLARE_WAIT_QUEUE_HEAD()` macro, or using the
510 :c:func:`init_waitqueue_head()` routine in your initialization
514 -------
517 put yourself in the queue before checking the condition. There is a
518 macro to do this: :c:func:`wait_event_interruptible()`
521 this expression is true, or ``-ERESTARTSYS`` if a signal is received. The
522 :c:func:`wait_event()` version ignores signals.
525 ----------------------
527 Call :c:func:`wake_up()` (``include/linux/wait.h``), which will wake
537 class of operations work on :c:type:`atomic_t` (``include/asm/atomic.h``);
538 this contains a signed integer (at least 32 bits long), and you must use
539 these functions to manipulate or read :c:type:`atomic_t` variables.
540 :c:func:`atomic_read()` and :c:func:`atomic_set()` get and set
541 the counter, :c:func:`atomic_add()`, :c:func:`atomic_sub()`,
542 :c:func:`atomic_inc()`, :c:func:`atomic_dec()`, and
543 :c:func:`atomic_dec_and_test()` (returns true if it was
553 operations generally take a pointer to the bit pattern, and a bit
554 number: 0 is the least significant bit. :c:func:`set_bit()`,
555 :c:func:`clear_bit()` and :c:func:`change_bit()` set, clear,
556 and flip the given bit. :c:func:`test_and_set_bit()`,
557 :c:func:`test_and_clear_bit()` and
558 :c:func:`test_and_change_bit()` do the same thing, except return
563 ``BITS_PER_LONG``. The resulting behavior is strange on big-endian
564 platforms though so it is a good idea not to do this.
569 Within the kernel proper, the normal linking rules apply (i.e. unless a
571 be used anywhere in the kernel). However, for modules, a special
575 :c:func:`EXPORT_SYMBOL()`
576 -------------------------
580 This is the classic method of exporting a symbol: dynamically loaded
583 :c:func:`EXPORT_SYMBOL_GPL()`
584 -----------------------------
588 Similar to :c:func:`EXPORT_SYMBOL()` except that the symbols
589 exported by :c:func:`EXPORT_SYMBOL_GPL()` can only be seen by
590 modules with a :c:func:`MODULE_LICENSE()` that specifies a GPL
596 :c:func:`EXPORT_SYMBOL_NS()`
597 ----------------------------
601 This is the variant of :c:func:`EXPORT_SYMBOL()` that allows specifying a symbol
603 Documentation/core-api/symbol-namespaces.rst
605 :c:func:`EXPORT_SYMBOL_NS_GPL()`
606 --------------------------------
610 This is the variant of :c:func:`EXPORT_SYMBOL_GPL()` that allows specifying a symbol
612 Documentation/core-api/symbol-namespaces.rst
617 Double-linked lists ``include/linux/list.h``
618 --------------------------------------------
620 There used to be three sets of linked-list routines in the kernel
622 pressing need for a single list, it's a good choice.
624 In particular, :c:func:`list_for_each_entry()` is useful.
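
As a rough illustration of why :c:func:`list_for_each_entry()` is handy, here is a minimal userspace sketch. It re-implements simplified stand-ins for ``struct list_head``, ``container_of()``, ``list_add_tail()`` and ``list_for_each_entry()`` so that it compiles outside the kernel; the real definitions live in ``include/linux/list.h`` and differ in detail. ``struct item``, ``sum_items()``, ``sum_first_n()`` and ``MAX_ITEMS`` are hypothetical names invented for this example, and ``__typeof__`` is a GNU C extension.

```c
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's list primitives. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Walk every entry of the list; pos is a pointer to the enclosing type. */
#define list_for_each_entry(pos, head, member)				\
	for (pos = container_of((head)->next, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = container_of(pos->member.next, __typeof__(*pos), member))

/* Insert entry just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Hypothetical payload type: the list node is embedded in the object. */
struct item {
	int value;
	struct list_head node;
};

/* Sum the value field of every item on the list. */
static int sum_items(struct list_head *head)
{
	struct item *it;
	int sum = 0;

	list_for_each_entry(it, head, node)
		sum += it->value;
	return sum;
}

#define MAX_ITEMS 8

/* Build a list holding 1..n and return its sum, or -1 if n is out of range. */
static int sum_first_n(int n)
{
	struct list_head head = LIST_HEAD_INIT(head);
	struct item items[MAX_ITEMS];
	int i;

	if (n < 0 || n > MAX_ITEMS)
		return -1;
	for (i = 0; i < n; i++) {
		items[i].value = i + 1;
		list_add_tail(&items[i].node, &head);
	}
	return sum_items(&head);
}
```

The loop body only ever sees ``struct item`` pointers; the ``list_head`` bookkeeping stays hidden inside the macro, which is the main convenience over walking ``next`` pointers by hand.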
627 ------------------
629 For code called in user context, it's very common to defy C convention,
630 and return 0 for success, and a negative error number (e.g. ``-EFAULT``) for
634 Use :c:func:`ERR_PTR()` (``include/linux/err.h``) to encode a
635 negative error number into a pointer, and :c:func:`IS_ERR()` and
636 :c:func:`PTR_ERR()` to get it back out again: this avoids a separate
637 pointer parameter for the error number. Icky, but in a good way.
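
The error-pointer convention can be sketched in userspace as follows. The three helpers below are simplified stand-ins written for this example, not the kernel's own definitions (those are in ``include/linux/err.h``); ``MAX_ERRNO`` is assumed to be 4095 as in current kernels, and the sketch assumes that ``long`` and pointers have the same width. ``make_buffer()`` and ``use_buffer()`` are hypothetical functions.

```c
#include <errno.h>
#include <stdlib.h>

/* Assumed to mirror the kernel's limit on encodable error numbers. */
#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Decode the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True if ptr falls in the top MAX_ERRNO addresses, i.e. encodes an error. */
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical allocator: returns a buffer, or an encoded error. */
static char *make_buffer(long size)
{
	char *buf;

	if (size <= 0)
		return ERR_PTR(-EINVAL);
	buf = malloc((size_t)size);
	if (!buf)
		return ERR_PTR(-ENOMEM);
	return buf;
}

/* Returns 0 on success or a negative errno, in the usual kernel style. */
static int use_buffer(long size)
{
	char *buf = make_buffer(size);

	if (IS_ERR(buf))
		return (int)PTR_ERR(buf);
	buf[0] = '\0';
	free(buf);
	return 0;
}
```

The caller gets both the object and the reason for failure through a single return value, so ``make_buffer()`` needs no extra output parameter.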
640 --------------------
644 their toes: it reflects a fundamental change (e.g. can no longer be
646 which were caught before). Usually this is accompanied by a fairly
648 the archives. Simply doing a global replace on the file usually makes
652 ------------------------------
669 --------------
674 info page section "C Extensions" for more details - Yes, really the info
675 page, the man page is only a short summary of the stuff in info).
677 - Inline functions
679 - Statement expressions (i.e. the ({ and }) constructs).
681 - Declaring attributes of a function / variable / type
684 - typeof
686 - Zero length arrays
688 - Macro varargs
690 - Arithmetic on void pointers
692 - Non-Constant initializers
694 - Assembler Instructions (not outside arch/ and include/asm/)
696 - Function names as strings (__func__).
698 - __builtin_constant_p()
705 C++
706 ---
708 Using C++ in the kernel is usually a bad idea, because the kernel does
714 ---
717 the top of .c files) to abstract away functions rather than using ``#if``
718 pre-processor statements throughout the source code.
724 make a neat patch, there's administrative work to be done:
726 - Figure out who are the owners of the code you've been modifying. Look
734 will look when they find a bug, or when **they** want to make a change.
736 - Usually you want a configuration option for your kernel hack. Edit
739 ``Documentation/kbuild/kconfig-language.rst``.
747 - Edit the ``Makefile``: the CONFIG variables are exported here so you
748 can usually just add an "obj-$(CONFIG_xxx) += xxx.o" line. The syntax
751 - Put yourself in ``CREDITS`` if you consider what you've done
752 noteworthy, usually beyond a single file (your name should be at the
754 consulted when changes are made to a subsystem, and hear about bugs;
755 it implies a more-than-passing commitment to some part of the code.
757 - Finally, don't forget to read
758 ``Documentation/process/submitting-patches.rst``
775 * Kernel pointers have redundant information, so we can use a
776 * scheme where we can return either an error code or a dentry
779 * This should be a per-architecture thing, to allow different
784 #define IS_ERR(ptr) ((unsigned long)(ptr) > (unsigned long)(-1000))
798 * At least we *know* we can't spell, and use a spell-checker.
807 /* Tested on SS-5, SS-10. Probably someone at Sun applied a spell-checker. */
826 clarity fixes, and some excellent non-obvious points. Werner Almesberger
827 for giving me a great summary of :c:func:`disable_irq()`, and Jes