Lines Matching +full:high +full:- +full:side
16 ------------
18 Read-copy update (RCU) is a synchronization mechanism that is often used
19 as a replacement for reader-writer locking. RCU is unusual in that
20 updaters do not block readers, which means that RCU's read-side
28 thought of as an informal, high-level specification for RCU. It is
40 #. `Fundamental Non-Requirements`_
42 #. `Quality-of-Implementation Requirements`_
44 #. `Software-Engineering Requirements`_
53 ------------------------
58 #. `Grace-Period Guarantee`_
60 #. `Memory-Barrier Guarantees`_
62 #. `Guaranteed Read-to-Write Upgrade`_
64 Grace-Period Guarantee
67 RCU's grace-period guarantee is unusual in being premeditated: Jack
73 RCU's grace-period guarantee allows updaters to wait for the completion
74 of all pre-existing RCU read-side critical sections. An RCU read-side
77 RCU treats a nested set as one big RCU read-side critical section.
78 Production-quality implementations of rcu_read_lock() and
105 Because the synchronize_rcu() on line 14 waits for all pre-existing
119 +-----------------------------------------------------------------------+
121 +-----------------------------------------------------------------------+
123 | progress concurrently with readers, but pre-existing readers will |
126 +-----------------------------------------------------------------------+
128 +-----------------------------------------------------------------------+
131 | Second, even when using synchronize_rcu(), the other update-side |
132 | code does run concurrently with readers, whether pre-existing or not. |
133 +-----------------------------------------------------------------------+
173 The RCU read-side critical section in do_something_dlm() works with
178 +-----------------------------------------------------------------------+
180 +-----------------------------------------------------------------------+
182 +-----------------------------------------------------------------------+
184 +-----------------------------------------------------------------------+
188 +-----------------------------------------------------------------------+
190 In order to avoid fatal problems such as deadlocks, an RCU read-side
192 Similarly, an RCU read-side critical section must not contain anything
196 Although RCU's grace-period guarantee is useful in and of itself, with
198 be good to be able to use RCU to coordinate read-side access to linked
199 data structures. For this, the grace-period guarantee is not sufficient,
203 non-\ ``NULL``, locklessly accessing the ``->a`` and ``->b`` fields.
211 5 return -ENOMEM;
217 11 p->a = a;
218 12 p->b = a;
233 5 return -ENOMEM;
240 12 p->a = a;
241 13 p->b = a;
247 executes line 11, it will see garbage in the ``->a`` and ``->b`` fields.
251 which brings us to the publish-subscribe guarantee discussed in the next
257 RCU's publish-subscribe guarantee allows data to be inserted into a
269 5 return -ENOMEM;
275 11 p->a = a;
276 12 p->b = a;
289 +-----------------------------------------------------------------------+
291 +-----------------------------------------------------------------------+
293 | assignments to ``p->a`` and ``p->b`` from being reordered. Can't that |
295 +-----------------------------------------------------------------------+
297 +-----------------------------------------------------------------------+
300 | initialized. So reordering the assignments to ``p->a`` and ``p->b`` |
302 +-----------------------------------------------------------------------+
305 control its accesses to the RCU-protected data, as shown in
315 6 do_something(p->a, p->b);
335 5 do_something(gp->a, gp->b);
344 the current structure with a new one, the fetches of ``gp->a`` and
345 ``gp->b`` might well come from two different structures, which could
356 6 do_something(p->a, p->b);
365 barriers in the Linux kernel. Should a |high-quality implementation of
370 outermost RCU read-side critical section containing that
376 .. |high-quality implementation of C11 memory_order_consume [PDF]| replace:: high-quality implement…
377 .. _high-quality implementation of C11 memory_order_consume [PDF]: http://www.rdrop.com/users/paulm…
383 Of course, it is also necessary to remove elements from RCU-protected
387 #. Wait for all pre-existing RCU read-side critical sections to complete
388 (because only pre-existing readers can possibly have a reference to
427 read-side critical section or in a code segment where the pointer
429 update-side lock.
431 +-----------------------------------------------------------------------+
433 +-----------------------------------------------------------------------+
436 +-----------------------------------------------------------------------+
438 +-----------------------------------------------------------------------+
442 | in a byte-at-a-time manner, resulting in *load tearing*, in turn |
443 | resulting a bytewise mash-up of two distinct pointer values. It might |
444 | even use value-speculation optimizations, where it makes a wrong |
447 | about any dereferences that returned pre-initialization garbage in |
454 +-----------------------------------------------------------------------+
456 In short, RCU's publish-subscribe guarantee is provided by the
458 guarantee allows data elements to be safely added to RCU-protected
460 can be used in combination with the grace-period guarantee to also allow
461 data elements to be removed from RCU-protected linked data structures,
467 resembling the dependency-ordering barrier that was later subsumed
470 late-1990s meeting with the DEC Alpha architects, back in the days when
471 DEC was still a free-standing company. It took the Alpha architects a
480 Memory-Barrier Guarantees
483 The previous section's simple linked-data-structure scenario clearly
484 demonstrates the need for RCU's stringent memory-ordering guarantees on
487 #. Each CPU that has an RCU read-side critical section that begins
489 memory barrier between the time that the RCU read-side critical
491 this guarantee, a pre-existing RCU read-side critical section might
494 #. Each CPU that has an RCU read-side critical section that ends after
497 time that the RCU read-side critical section begins. Without this
498 guarantee, a later RCU read-side critical section running after the
514 +-----------------------------------------------------------------------+
516 +-----------------------------------------------------------------------+
517 | Given that multiple CPUs can start RCU read-side critical sections at |
519 | whether or not a given RCU read-side critical section starts before a |
521 +-----------------------------------------------------------------------+
523 +-----------------------------------------------------------------------+
524 | If RCU cannot tell whether or not a given RCU read-side critical |
526 | it must assume that the RCU read-side critical section started first. |
528 | waiting on a given RCU read-side critical section only if it can |
534 | within the enclosed RCU read-side critical section to the code |
536 | then a given RCU read-side critical section begins before a given |
547 +-----------------------------------------------------------------------+
549 +-----------------------------------------------------------------------+
551 +-----------------------------------------------------------------------+
554 +-----------------------------------------------------------------------+
556 +-----------------------------------------------------------------------+
564 | #. CPU 1: ``do_something_with(q->a);`` |
571 | end of the RCU read-side critical section and the end of the grace |
584 | #. CPU 1: ``do_something_with(q->a); /* Boom!!! */`` |
588 | grace period and the beginning of the RCU read-side critical section, |
594 | believing that you have adhered to the as-if rule than it is to |
596 +-----------------------------------------------------------------------+
598 +-----------------------------------------------------------------------+
600 +-----------------------------------------------------------------------+
603 | compiler might arbitrarily rearrange consecutive RCU read-side |
604 | critical sections. Given such rearrangement, if a given RCU read-side |
606 | read-side critical sections are done? Won't the compiler |
608 +-----------------------------------------------------------------------+
610 +-----------------------------------------------------------------------+
614 | schedule() had better prevent calling-code accesses to shared |
616 | RCU detects the end of a given RCU read-side critical section, it |
617 | will necessarily detect the end of all prior RCU read-side critical |
621 | loop, into user-mode code, and so on. But if your kernel build allows |
623 +-----------------------------------------------------------------------+
625 Note that these memory-barrier requirements do not replace the
627 pre-existing readers. On the contrary, the memory barriers called out in
635 The common-case RCU primitives are unconditional. They are invoked, they
642 guarantee was reverse-engineered, not premeditated. The unconditional
650 Guaranteed Read-to-Write Upgrade
654 within an RCU read-side critical section. For example, that RCU
655 read-side critical section might search for a given data element, and
656 then might acquire the update-side spinlock in order to update that
657 element, all while remaining in that RCU read-side critical section. Of
658 course, it is necessary to exit the RCU read-side critical section
663 +-----------------------------------------------------------------------+
665 +-----------------------------------------------------------------------+
666 | But how does the upgrade-to-write operation exclude other readers? |
667 +-----------------------------------------------------------------------+
669 +-----------------------------------------------------------------------+
672 +-----------------------------------------------------------------------+
674 This guarantee allows lookup code to be shared between read-side and
675 update-side code, and was premeditated, appearing in the earliest
678 Fundamental Non-Requirements
679 ----------------------------
681 RCU provides extremely lightweight readers, and its read-side
686 non-guarantees that have caused confusion. Except where otherwise noted,
687 these non-guarantees were premeditated.
692 #. `Grace Periods Don't Partition Read-Side Critical Sections`_
693 #. `Read-Side Critical Sections Don't Partition Grace Periods`_
698 Reader-side markers such as rcu_read_lock() and
700 through their interaction with the grace-period APIs such as
737 significant ordering constraints would slow down these fast-path APIs.
739 +-----------------------------------------------------------------------+
741 +-----------------------------------------------------------------------+
743 +-----------------------------------------------------------------------+
745 +-----------------------------------------------------------------------+
748 +-----------------------------------------------------------------------+
794 +-----------------------------------------------------------------------+
796 +-----------------------------------------------------------------------+
798 | completed instead of waiting only on pre-existing readers. For how |
800 +-----------------------------------------------------------------------+
802 +-----------------------------------------------------------------------+
807 +-----------------------------------------------------------------------+
809 Grace Periods Don't Partition Read-Side Critical Sections
812 It is tempting to assume that if any part of one RCU read-side critical
814 read-side critical section follows that same grace period, then all of
815 the first RCU read-side critical section must precede all of the second.
817 partition the set of RCU read-side critical sections. An example of this
855 that the thread cannot be in the midst of an RCU read-side critical
858 .. kernel-figure:: GPpartitionReaders1.svg
860 If it is necessary to partition RCU read-side critical sections in this
902 the two RCU read-side critical sections cannot overlap, guaranteeing
911 This non-requirement was also non-premeditated, but became apparent when
914 Read-Side Critical Sections Don't Partition Grace Periods
917 It is also tempting to assume that if an RCU read-side critical section
970 .. kernel-figure:: ReadersPartitionGP1.svg
972 Again, an RCU read-side critical section can overlap almost all of a
974 period. As a result, an RCU read-side critical section cannot partition
977 +-----------------------------------------------------------------------+
979 +-----------------------------------------------------------------------+
981 | read-side critical section, would be required to partition the RCU |
982 | read-side critical sections at the beginning and end of the chain? |
983 +-----------------------------------------------------------------------+
985 +-----------------------------------------------------------------------+
990 +-----------------------------------------------------------------------+
993 -------------------------
1000 completely futile. This is most obvious in preemptible user-level
1003 hypervisor), but can also happen in bare-metal environments due to
1007 but where “extremely long” is not long enough to allow wrap-around
1008 when incrementing a 64-bit counter.
1010 matters, RCU must use compiler directives and memory-barrier
1014 writes and more-frequent concurrent writes will result in more
1021 #. Counters are finite, especially on 32-bit systems. RCU's use of
1026 dyntick-idle nesting counter allows 54 bits for interrupt nesting
1027 level (this counter is 64 bits even on a 32-bit system). Overflowing
1028 this counter requires 2\ :sup:`54` half-interrupts on a given CPU
1029 without that CPU ever going idle. If a half-interrupt happened every
1033 kernel in a single shared-memory environment. RCU must therefore pay
1034 close attention to high-end scalability.
1042 Quality-of-Implementation Requirements
1043 --------------------------------------
1045 These sections list quality-of-implementation requirements. Although an
1048 inappropriate for industrial-strength production use. Classes of
1049 quality-of-implementation requirements are as follows:
1062 RCU is and always has been intended primarily for read-mostly
1063 situations, which means that RCU's read-side primitives are optimized,
1064 often at the expense of its update-side primitives. Experience thus far
1067 #. Read-mostly data, where stale and inconsistent data is not a problem:
1069 #. Read-mostly data, where data must be consistent: RCU works well.
1070 #. Read-write data, where data must be consistent: RCU *might* work OK.
1072 #. Write-mostly data, where data must be consistent: RCU is very
1076 a. Existence guarantees for update-friendly mechanisms.
1077 b. Wait-free read-side primitives for real-time use.
1079 This focus on read-mostly situations means that RCU must interoperate
1084 primitives be legal within RCU read-side critical sections, including
1088 +-----------------------------------------------------------------------+
1090 +-----------------------------------------------------------------------+
1092 +-----------------------------------------------------------------------+
1094 +-----------------------------------------------------------------------+
1095 | These are forbidden within Linux-kernel RCU read-side critical |
1097 | case, voluntary context switch) within an RCU read-side critical |
1099 | read-side critical sections, and also within Linux-kernel sleepable |
1100 | RCU `(SRCU) <Sleepable RCU_>`__ read-side critical sections. In |
1101 | addition, the -rt patchset turns spinlocks into a sleeping locks so |
1104 | locks!) may be acquire within -rt-Linux-kernel RCU read-side critical |
1106 | Note that it *is* legal for a normal RCU read-side critical section |
1114 +-----------------------------------------------------------------------+
1126 inconsistency must be tolerated due to speed-of-light delays if nothing
1152 For example, the translation between a user-level SystemV semaphore ID
1153 to the corresponding in-kernel data structure is protected by RCU, but
1156 by acquiring spinlocks located in the in-kernel data structure from
1157 within the RCU read-side critical section, and this is indicated by the
1171 Linux-kernel RCU implementations must therefore avoid unnecessarily
1175 energy efficiency in battery-powered systems and on specific
1176 energy-efficiency shortcomings of the Linux-kernel RCU implementation.
1177 In my experience, the battery-powered embedded community will consider
1179 mere Linux-kernel-mailing-list posts are insufficient to vent their ire.
1184 `bloatwatch <http://elinux.org/Linux_Tiny-FAQ>`__ efforts, memory
1185 footprint is critically important on single-CPU systems with
1186 non-preemptible (``CONFIG_PREEMPTION=n``) kernels, and thus `tiny
1188 was born. Josh Triplett has since taken over the small-memory banner
1194 unsurprising. For example, in keeping with RCU's read-side
1197 Similarly, in non-preemptible environments, rcu_read_lock() and
1200 In preemptible environments, in the case where the RCU read-side
1202 highest-priority real-time process), rcu_read_lock() and
1204 should not contain atomic read-modify-write operations, memory-barrier
1206 branches. However, in the case where the RCU read-side critical section
1208 interrupts. This is why it is better to nest an RCU read-side critical
1209 section within a preempt-disable region than vice versa, at least in
1211 degrading real-time latencies.
1213 The synchronize_rcu() grace-period-wait primitive is optimized for
1215 addition to the duration of the longest RCU read-side critical section.
1218 they can be satisfied by a single underlying grace-period-wait
1220 single grace-period-wait operation to serve more than `1,000 separate
1221 … <https://www.usenix.org/conference/2004-usenix-annual-technical-conference/making-rcu-safe-deep-s…
1222 of synchronize_rcu(), thus amortizing the per-invocation overhead
1223 down to nearly zero. However, the grace-period optimization is also
1224 required to avoid measurable degradation of real-time scheduling and
1227 In some cases, the multi-millisecond synchronize_rcu() latencies are
1229 used instead, reducing the grace-period latency down to a few tens of
1230 microseconds on small systems, at least in cases where the RCU read-side
1238 to impose modest degradation of real-time latency on non-idle online
1240 scheduling-clock interrupt.
1243 synchronize_rcu_expedited()'s reduced grace-period latency is
1273 25 call_rcu(&p->rh, remove_gp_cb);
1279 lines 1-5. The function remove_gp_cb() is passed to call_rcu()
1285 be legal, including within preempt-disable code, local_bh_disable()
1286 code, interrupt-disable code, and interrupt handlers. However, even
1293 takes too long. Long-running operations should be relegated to separate
1296 +-----------------------------------------------------------------------+
1298 +-----------------------------------------------------------------------+
1303 +-----------------------------------------------------------------------+
1305 +-----------------------------------------------------------------------+
1306 | Presumably the ``->gp_lock`` acquired on line 18 excludes any |
1309 | after ``->gp_lock`` is released on line 25, which in turn means that |
1311 +-----------------------------------------------------------------------+
1350 open-coded it.
1352 +-----------------------------------------------------------------------+
1354 +-----------------------------------------------------------------------+
1360 +-----------------------------------------------------------------------+
1362 +-----------------------------------------------------------------------+
1364 | definition would say that updates in garbage-collected languages |
1371 +-----------------------------------------------------------------------+
1375 be carried out in the meantime? The polling-style
1414 In theory, delaying grace-period completion and callback invocation is
1421 example, an infinite loop in an RCU read-side critical section must by
1423 involved example, consider a 64-CPU system built with
1424 ``CONFIG_RCU_NOCB_CPU=y`` and booted with ``rcu_nocbs=1-63``, where
1442 corresponding CPU's next scheduling-clock.
1444 indefinitely in the kernel without scheduling-clock interrupts, which
1449 has been preempted within an RCU read-side critical section is
1460 caution when changing them. Note that these forward-progress measures
1465 invocation of callbacks when any given non-\ ``rcu_nocbs`` CPU has
1476 #. Lifts callback-execution batch limits, which speeds up callback
1480 overridden. Again, these forward-progress measures are provided only for
1482 RCU`_. Even for RCU, callback-invocation forward
1483 progress for ``rcu_nocbs`` CPUs is much less well-developed, in part
1486 both ``rcu_nocbs`` CPUs and high call_rcu() invocation rates, then
1487 additional forward-progress work will be required.
1493 part due to the collision of multicore hardware with object-oriented
1494 techniques designed in single-threaded environments for single-threaded
1495 use. And in theory, RCU read-side critical sections may be composed, and
1497 real-world implementations of composable constructs, there are
1501 rcu_read_unlock() generate no code, such as Linux-kernel RCU when
1516 the nesting-depth counter. For example, the Linux kernel's preemptible
1518 practical purposes. That said, a consecutive pair of RCU read-side
1520 grace period cannot be enclosed in another RCU read-side critical
1522 within an RCU read-side critical section: To do so would result either
1523 in deadlock or in RCU implicitly splitting the enclosing RCU read-side
1524 critical section, neither of which is conducive to a long-lived and
1528 example, many transactional-memory implementations prohibit composing a
1530 a network receive operation). For another example, lock-based critical
1534 In short, although RCU read-side critical sections are highly
1542 read-side critical sections, perhaps even so intense that there was
1544 read-side critical section in flight. RCU cannot allow this situation to
1545 block grace periods: As long as all the RCU read-side critical sections
1549 RCU read-side critical sections being preempted for long durations,
1550 which has the effect of creating a long-duration RCU read-side critical
1552 systems using real-time priorities are of course more vulnerable.
1557 Other workloads might have very high update rates. Although one can
1562 RCU callbacks in the call_rcu() code path. Finally, high update
1563 rates should not delay RCU read-side critical sections, although some
1564 small read-side delays can occur when using
1569 1990s, a simple user-level test consisting of ``close(open(path))`` in a
1571 appreciation of the high-update-rate corner case. This test also
1572 motivated addition of some RCU code to react to high update rates, for
1576 grace-period processing. This evasive action causes the grace period to
1581 Software-Engineering Requirements
1582 ---------------------------------
1589 splat if rcu_dereference() is used outside of an RCU read-side
1590 critical section. Update-side code can use
1602 that it has been invoked within an RCU read-side critical section. I
1605 #. A given function might wish to check for RCU-related preconditions
1611 assignment. To catch this sort of error, a given RCU-protected
1613 complain about simple-assignment accesses to that pointer. Arnd
1624 non-stack ``rcu_head`` structures must be initialized with
1629 #. An infinite loop in an RCU read-side critical section will eventually
1635 waiting on that particular RCU read-side critical section.
1641 stall warnings are counter-productive during sysrq dumps and during
1654 read-side critical sections, there is currently no good way of doing
1658 #. In kernels built with ``CONFIG_RCU_TRACE=y``, RCU-related information
1660 #. Open-coded use of rcu_assign_pointer() and rcu_dereference()
1662 error-prone. Therefore, RCU-protected `linked
1664 more recently, RCU-protected `hash
1666 other special-purpose RCU-protected data structures are available in
1675 This not a hard-and-fast list: RCU's diagnostic capabilities will
1677 real-world RCU usage.
1680 --------------------------
1696 #. `Scheduling-Clock Interrupts and RCU`_
1701 notable Linux-kernel complications. Each of the following sections
1719 `remind <https://lore.kernel.org/r/CA+55aFy4wcCwaL4okTs8wXhGZ5h-ibecy_Meg9C4MNQrUnwMcg@mail.gmail.c…
1731 used to do, it would create too many per-CPU kthreads. Although the
1748 ``task_struct`` is available and the boot CPU's per-CPU variables are
1749 set up. The read-side primitives (rcu_read_lock(),
1769 itself is a quiescent state and thus a grace period, so the early-boot
1770 implementation can be a no-op.
1775 reason is that an RCU read-side critical section might be preempted,
1785 +-----------------------------------------------------------------------+
1787 +-----------------------------------------------------------------------+
1790 +-----------------------------------------------------------------------+
1792 +-----------------------------------------------------------------------+
1797 | grace-period mechanism. At runtime, this expedited mechanism relies |
1799 | drives the desired expedited grace period. Because dead-zone |
1813 +-----------------------------------------------------------------------+
1815 I learned of these boot-time requirements as a result of a series of
1821 The Linux kernel has interrupts, and RCU read-side critical sections are
1822 legal within interrupt handlers and within interrupt-disabled regions of
1825 Some Linux-kernel architectures can enter an interrupt handler from
1826 non-idle process context, and then just never leave it, instead
1829 “half-interrupts” mean that RCU has to be very careful about how it
1831 way during a rewrite of RCU's dyntick-idle code.
1833 The Linux kernel has non-maskable interrupts (NMIs), and RCU read-side
1835 update-side primitives, including call_rcu(), are prohibited within
1838 The name notwithstanding, some Linux-kernel architectures can have
1840 me <https://lore.kernel.org/r/CALCETrXLq1y7e_dKFPgou-FKHB6Pu-r8+t-6Ds+8=va7anBWDA@mail.gmail.com>`__
1858 one of its functions results in a segmentation fault. The module-unload
1859 functions must therefore cancel any delayed calls to loadable-module
1867 module unload request, we need some other way to deal with in-flight RCU
1871 in-flight RCU callbacks have been invoked. If a module uses
1874 the underlying module-unload code could invoke rcu_barrier()
1879 filesystem-unmount situation, and Dipankar Sarma incorporated
1893 +-----------------------------------------------------------------------+
1895 +-----------------------------------------------------------------------+
1897 | complete, and rcu_barrier() must wait for each pre-existing |
1901 +-----------------------------------------------------------------------+
1903 +-----------------------------------------------------------------------+
1913 | pre-existing callbacks, you will need to invoke both |
1916 +-----------------------------------------------------------------------+
1923 offline CPU, with the exception of `SRCU <Sleepable RCU_>`__ read-side
1925 DYNIX/ptx, but on the other hand, the Linux kernel's CPU-hotplug
1928 The Linux-kernel CPU-hotplug implementation has notifiers that are used
1930 appropriately to a given CPU-hotplug operation. Most RCU operations may
1931 be invoked from CPU-hotplug notifiers, including even synchronous
1932 grace-period operations such as (synchronize_rcu() and
1938 In addition, all-callback-wait operations such as rcu_barrier() may
1939 not be invoked from any CPU-hotplug notifier. This restriction is due
1940 to the fact that there are phases of CPU-hotplug operations where the
1941 outgoing CPU's callbacks will not be invoked until after the CPU-hotplug
1943 rcu_barrier() blocks CPU-hotplug operations during its execution,
1944 which results in another type of deadlock when invoked from a CPU-hotplug
1952 for the force-quiescent-state loop (FQS) to report quiescent states for
1963 The CPU-online path (rcu_cpu_starting()) should never need to report
1976 RCU makes use of kthreads, and it is necessary to avoid excessive CPU-time
1978 RCU's violation of it when running context-switch-heavy workloads when
1982 context-switch-heavy ``CONFIG_NO_HZ_FULL=y`` workloads, but there is
1986 scheduler's runqueue or priority-inheritance spinlocks across an
1988 somewhere within the corresponding RCU read-side critical section.
1994 nesting. The fact that interrupt-disabled regions of code act as RCU
1995 read-side critical sections implicitly avoids earlier issues that used
2012 The kernel needs to access user-space memory, for example, to access data
2013 referenced by system-call parameters. The get_user() macro does this job.
2015 However, user-space memory might well be paged out, which means that
2016 get_user() might well page-fault and thus block while waiting for the
2018 reorder a get_user() invocation into an RCU read-side critical section.
2026 3 v = p->value;
2039 4 v = p->value;
2045 state in the middle of an RCU read-side critical section. This misplaced
2046 quiescent state could result in line 4 being a use-after-free access,
2054 ``p->value`` is not volatile, so the compiler would not have any reason to keep
2057 Therefore, the Linux-kernel definitions of rcu_read_lock() and
2060 of RCU read-side critical sections.
2066 by people with battery-powered embedded systems. RCU therefore conserves
2069 energy-efficiency requirement, so I learned of this via an irate phone
2073 RCU read-side critical section on an idle CPU. (Kernels built with
2077 whether or not it is currently legal to run RCU read-side critical
2079 hand and RCU_NONIDLE() on the other while inspecting idle-loop code.
2089 order of a million deep, even on 32-bit systems, this should not be a
2116 These energy-efficiency requirements have proven quite difficult to
2118 clean-sheet rewrites of RCU's energy-efficiency code, the last of which
2123 phone calls: Flaming me on the Linux-kernel mailing list was apparently
2124 not sufficient to fully vent their ire at RCU's energy-efficiency bugs!
2126 Scheduling-Clock Interrupts and RCU
2129 The kernel transitions between in-kernel non-idle execution, userspace
2133 +-----------------+------------------+------------------+-----------------+
2134 | ``HZ`` Kconfig | In-Kernel | Usermode | Idle |
2137 | | scheduling-clock | scheduling-clock | RCU's |
2138 | | interrupt. | interrupt and | dyntick-idle |
2142 +-----------------+------------------+------------------+-----------------+
2144 | | scheduling-clock | scheduling-clock | RCU's |
2145 | | interrupt. | interrupt and | dyntick-idle |
2149 +-----------------+------------------+------------------+-----------------+
2152 | | on | dyntick-idle | dyntick-idle |
2153 | | scheduling-clock | detection. | detection. |
2161 +-----------------+------------------+------------------+-----------------+
2163 +-----------------------------------------------------------------------+
2165 +-----------------------------------------------------------------------+
2166 | Why can't ``NO_HZ_FULL`` in-kernel execution rely on the |
2167 | scheduling-clock interrupt, just like ``HZ_PERIODIC`` and |
2169 +-----------------------------------------------------------------------+
2171 +-----------------------------------------------------------------------+
2173 | necessarily re-enable the scheduling-clock interrupt on entry to each |
2175 +-----------------------------------------------------------------------+
2181 scheduling-clock interrupt be enabled when RCU needs it to be:
2184 is non-idle, the scheduling-clock tick had better be running.
2186 (11-second) grace periods, with a pointless IPI waking the CPU from
2188 #. If a CPU is in a portion of the kernel that executes RCU read-side
2194 no-joking guaranteed to never execute any RCU read-side critical
2196 sort of thing is used by some architectures for light-weight
2203 fact joking about not doing RCU read-side critical sections.
2204 #. If a CPU is executing in the kernel with the scheduling-clock
2205 interrupt disabled and RCU believes this CPU to be non-idle, and if
2215 is usually OK) and the scheduling-clock interrupt is enabled, of
2220 +-----------------------------------------------------------------------+
2222 +-----------------------------------------------------------------------+
2226 +-----------------------------------------------------------------------+
2228 +-----------------------------------------------------------------------+
2230 | often. But given that long-running interrupt handlers can cause other |
2233 +-----------------------------------------------------------------------+
2236 between in-kernel execution, usermode execution, and idle, and as long
2237 as the scheduling-clock interrupt is enabled when RCU needs it to be,
2244 Although small-memory non-realtime systems can simply use Tiny RCU, code
2248 pair of pointers, it does appear in many RCU-protected data structures,
2253 This need for memory efficiency is one reason that RCU uses hand-crafted
2258 posted them. Although this information might appear in debug-only kernel
2259 builds at some point, in the meantime, the ``->func`` field will often
2267 conditions <https://lore.kernel.org/r/1439976106-137226-1-git-send-email-kirill.shutemov@linux.inte…
2268 the Linux kernel's memory-management subsystem needs a particular bit to
2269 remain zero during all phases of grace-period processing, and that bit
2271 ``->next`` field. RCU makes this guarantee as long as call_rcu() is
2274 energy-efficiency purposes.
2277 structure be aligned to a two-byte boundary, and passing a misaligned
2281 a four-byte or even eight-byte alignment requirement? Because the m68k
2282 architecture provides only two-byte alignment, and thus acts as
2288 potentially have energy-efficiency benefits, but only if the rate of
2289 non-lazy callbacks decreases significantly for some important workload.
2298 hot code paths in performance-critical portions of the Linux kernel's
2301 read-side primitives. To that end, it would be good if preemptible RCU's
2313 minimal per-operation overhead. In fact, in many cases, increasing load
2314 must *decrease* the per-operation overhead, witness the batching
2320 The Linux kernel is used for real-time workloads, especially in
2321 conjunction with the `-rt
2323 real-time-latency response requirements are such that the traditional
2324 approach of disabling preemption across RCU read-side critical sections
2326 an RCU implementation that allows RCU read-side critical sections to be
2328 clear that an earlier `real-time
2332 encountered by a very early version of the -rt patchset.
2334 In addition, RCU must make do with a sub-100-microsecond real-time
2335 latency budget. In fact, on smaller systems with the -rt patchset, the
2336 Linux kernel provides sub-20-microsecond real-time latencies for the
2339 surprise, the sub-100-microsecond real-time latency budget `applies to
2342 up to and including systems with 4096 CPUs. This real-time requirement
2343 motivated the grace-period kthread, which also simplified handling of a
2346 RCU must avoid degrading real-time response for CPU-bound threads,
2348 ``CONFIG_NO_HZ_FULL=y``) or in the kernel. That said, CPU-bound loops in
2356 stress-test suite. This stress-test suite is called ``rcutorture``.
2362 today, given Android smartphones, Linux-powered televisions, and
2372 jurisdictions, a successful multi-year test of a given mechanism, which
2374 safety-critical certifications. In fact, rumor has it that the Linux
2375 kernel is already being used in production for safety-critical
2381 -----------------
2386 implementations, non-preemptible and preemptible. The other four flavors
2390 #. `Bottom-Half Flavor (Historical)`_
2395 Bottom-Half Flavor (Historical)
2398 The RCU-bh flavor of RCU has since been expressed in terms of the other
2400 single flavor. The read-side API remains, and continues to disable
2404 The softirq-disable (AKA “bottom-half”, hence the “_bh” abbreviations)
2405 flavor of RCU, or *RCU-bh*, was developed by Dipankar Sarma to provide a
2406 flavor of RCU that could withstand the network-based denial-of-service
2411 grace periods from ever ending. The result was an out-of-memory
2414 The solution was the creation of RCU-bh, which does
2415 local_bh_disable() across its read-side critical sections, and which
2418 offline. This means that RCU-bh grace periods can complete even when
2420 algorithms based on RCU-bh to withstand network-based denial-of-service
2424 re-enable softirq handlers, any attempt to start a softirq handlers
2425 during the RCU-bh read-side critical section will be deferred. In this
2428 overhead should be associated with the code following the RCU-bh
2429 read-side critical section rather than rcu_read_unlock_bh(), but the
2431 of fine distinction. For example, suppose that a three-millisecond-long
2432 RCU-bh read-side critical section executes during a time of heavy
2439 The `RCU-bh
2440 API <https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
2443 old RCU-bh update-side APIs are now gone, replaced by synchronize_rcu(),
2445 anything that disables bottom halves also marks an RCU-bh read-side
2452 The RCU-sched flavor of RCU has since been expressed in terms of the
2454 single flavor. The read-side API remains, and continues to disable
2458 Before preemptible RCU, waiting for an RCU grace period had the side
2459 effect of also waiting for all pre-existing interrupt and NMI handlers.
2460 However, there are legitimate preemptible-RCU implementations that do
2462 RCU read-side critical section can be a quiescent state. Therefore,
2463 *RCU-sched* was created, which follows “classic” RCU in that an
2464 RCU-sched grace period waits for pre-existing interrupt and NMI
2466 RCU-sched APIs have identical implementations, while kernels built with
2471 re-enable preemption, respectively. This means that if there was a
2472 preemption attempt during the RCU-sched read-side critical section,
2476 very slowly. However, the highest-priority task won't be preempted, so
2477 that task will enjoy low-overhead rcu_read_unlock_sched()
2480 The `RCU-sched
2481 API <https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
2485 rcu_read_lock_sched_held(). However, the old RCU-sched update-side APIs
2488 preemption also marks an RCU-sched read-side critical section,
2496 read-side critical section” was a reliable indication that this someone
2498 read-side critical section, you can probably afford to use a
2499 higher-overhead synchronization mechanism. However, that changed with
2500 the advent of the Linux kernel's notifiers, whose RCU read-side critical
2511 That said, one consequence of these domains is that read-side code must
2523 As noted above, it is legal to block within SRCU read-side critical
2525 block forever in one of a given domain's SRCU read-side critical
2528 happen if any operation in a given domain's SRCU read-side critical
2530 period to elapse. For example, this results in a self-deadlock:
2545 ``ss1``-domain SRCU read-side critical section acquired another mutex
2546 that was held across as ``ss``-domain synchronize_srcu(), deadlock
2551 Unlike the other RCU flavors, SRCU read-side critical sections can run
2560 invoked from CPU-hotplug notifiers, due to the fact that SRCU grace
2564 the CPU-hotplug process. The problem is that if a notifier is waiting on
2569 CPU-hotplug notifiers.
2572 non-expedited grace periods are implemented by the same mechanism. This
2574 period has the side effect of expediting all prior grace periods that
2582 As of v4.12, SRCU's callbacks are maintained per-CPU, eliminating a
2594 API <https://lwn.net/Articles/609973/#RCU%20Per-Flavor%20API%20Table>`__
2609 specified cookie corresponds to an already-completed
2617 certain types of buffer-cache algorithms having multi-stage age-out
2628 anywhere in the code, it is not possible to use read-side markers such
2638 trampoline would be pre-ordained a surprisingly long time before execution
2642 RCU <https://lwn.net/Articles/607117/>`__, is to have implicit read-side
2646 userspace execution also delimit tasks-RCU read-side critical sections.
2648 The tasks-RCU API is quite compact, consisting only of
2660 Some forms of tracing need to wait for all preemption-disabled regions
2665 moniker. And this operation is considered to be quite rude by real-time
2667 by battery-powered systems that don't want their idle CPUs to be awakened.
2669 The tasks-rude-RCU API is also reader-marking-free and thus quite compact,
2677 SRCU's read-side overhead, which includes a full memory barrier in both
2680 readers. Real-time systems that cannot tolerate IPIs may build their
2682 the expense of adding full memory barriers to the read-side primitives.
2684 The tasks-trace-RCU API is also reasonably compact,
2690 -----------------------
2692 One of the tricks that RCU uses to attain update-side scalability is to
2693 increase grace-period latency with increasing numbers of CPUs. If this
2695 grace-period state machine so as to avoid the need for the additional
2700 rcu_barrier() in CPU-hotplug notifiers, it will be necessary to
2704 The tradeoff between grace-period latency on the one hand and
2706 re-examined. The desire is of course for zero grace-period latency as
2716 be unnecessary because the hotpath read-side primitives do not access
2727 carefully run and realistic system-level workload.
2741 Additional work may be required to provide reasonable forward-progress
2746 -------
2754 ---------------