Lines Matching +full:max +full:- +full:outbound +full:- +full:regions
19 documentation at tools/memory-model/. Nevertheless, even this memory
37 Note also that it is possible that a barrier may be a no-op for an
48 - Device operations.
49 - Guarantees.
53 - Varieties of memory barrier.
54 - What may not be assumed about memory barriers?
55 - Address-dependency barriers (historical).
56 - Control dependencies.
57 - SMP barrier pairing.
58 - Examples of memory barrier sequences.
59 - Read memory barriers vs load speculation.
60 - Multicopy atomicity.
64 - Compiler barrier.
65 - CPU memory barriers.
69 - Lock acquisition functions.
70 - Interrupt disabling functions.
71 - Sleep and wake-up functions.
72 - Miscellaneous functions.
74 (*) Inter-CPU acquiring barrier effects.
76 - Acquires vs memory accesses.
80 - Interprocessor interaction.
81 - Atomic operations.
82 - Accessing devices.
83 - Interrupts.
91 - Cache coherency.
92 - Cache coherency vs DMA.
93 - Cache coherency vs MMIO.
97 - And then there's the Alpha.
98 - Virtual Machine Guests.
102 - Circular buffers.
116 +-------+ : +--------+ : +-------+
119 | CPU 1 |<----->| Memory |<----->| CPU 2 |
122 +-------+ : +--------+ : +-------+
127 | : +--------+ : |
130 +---------->| Device |<----------+
133 : +--------+ :
159 STORE A=3, STORE B=4, y=LOAD A->3, x=LOAD B->4
160 STORE A=3, STORE B=4, x=LOAD B->4, y=LOAD A->3
161 STORE A=3, y=LOAD A->3, STORE B=4, x=LOAD B->4
162 STORE A=3, y=LOAD A->3, x=LOAD B->2, STORE B=4
163 STORE A=3, x=LOAD B->2, STORE B=4, y=LOAD A->3
164 STORE A=3, x=LOAD B->2, y=LOAD A->3, STORE B=4
165 STORE B=4, STORE A=3, y=LOAD A->3, x=LOAD B->4
203 -----------------
225 ----------
239 emits a memory-barrier instruction, so that a DEC Alpha CPU will
310 And there are anti-guarantees:
313 generate code to modify these using non-atomic read-modify-write
320 non-atomic read-modify-write sequences can cause an update to one
327 "char", two-byte alignment for "short", four-byte alignment for
328 "int", and either four-byte or eight-byte alignment for "long",
329 on 32-bit and 64-bit systems, respectively. Note that these
331 using older pre-C11 compilers (for example, gcc 4.6). The portion
337 of adjacent bit-fields all having nonzero width
343 NOTE 2: A bit-field and an adjacent non-bit-field member
345 to two bit-fields, if one is declared inside a nested
347 are separated by a zero-length bit-field declaration,
348 or if they are separated by a non-bit-field member
350 bit-fields in the same structure if all members declared
351 between them are also bit-fields, no matter what the
352 sizes of those intervening bit-fields happen to be.
360 in random order, but this can be a problem for CPU-CPU interaction and for I/O.
376 ---------------------------
395 address-dependency barriers; see the "SMP barrier pairing" subsection.
398 (2) Address-dependency barriers (historical).
400 An address-dependency barrier is a weaker form of read barrier. In the
403 the second load will be directed), an address-dependency barrier would
407 An address-dependency barrier is a partial ordering on interdependent
413 considered can then perceive. An address-dependency barrier issued by
418 the address-dependency barrier.
430 [!] Note that address-dependency barriers should normally be paired with
433 [!] Kernel release v5.9 removed kernel APIs for explicit address-
436 address-dependency barriers.
440 A read barrier is an address-dependency barrier plus a guarantee that all
448 Read memory barriers imply address-dependency barriers, and so can
472 This acts as a one-way permeable barrier. It guarantees that all memory
487 This also acts as a one-way permeable barrier. It guarantees that all
498 -not- guaranteed to act as a full memory barrier. However, after an
509 RELEASE variants in addition to fully-ordered and relaxed (no barrier
526 ----------------------------------------------
545 (*) There is no guarantee that some intervening piece of off-the-CPU
552 Documentation/driver-api/pci/pci.rst
553 Documentation/core-api/dma-api-howto.rst
554 Documentation/core-api/dma-api.rst
557 ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
558 ----------------------------------------
562 to this section are those working on DEC Alpha architecture-specific code
565 address-dependency barriers.
567 [!] While address dependencies are observed in both load-to-load and
568 load-to-store relations, address-dependency barriers are not necessary
569 for load-to-store situations.
571 The requirement of address-dependency barriers is a little subtle, and
584 [!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernel, which
585 doesn't imply an address-dependency barrier.
602 To deal with this, READ_ONCE() provides an implicit address-dependency barrier
612 <implicit address-dependency barrier>
621 even-numbered cache lines and the other bank processes odd-numbered cache
622 lines. The pointer P might be stored in an odd-numbered cache line, and the
623 variable B might be stored in an even-numbered cache line. Then, if the
624 even-numbered bank of the reading CPU's cache is extremely busy while the
625 odd-numbered bank is idle, one can see the new value of the pointer P (&B),
629 An address-dependency barrier is not required to order dependent writes
646 Therefore, no address-dependency barrier is required to order the read into
648 even without an implicit address-dependency barrier of modern READ_ONCE():
653 of dependency ordering is to -prevent- writes to the data structure, along
664 The address-dependency barrier is very important to the RCU system,
674 --------------------
680 A load-load control dependency requires a full read memory barrier, not
681 simply an (implicit) address-dependency barrier to make it work correctly.
685 <implicit address-dependency barrier>
692 dependency, but rather a control dependency that the CPU may short-circuit
703 However, stores are not speculated. This means that ordering -is- provided
704 for load-store control dependencies, as in the following example:
719 variable 'a' is always non-zero, it would be well within its rights
749 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
752 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
772 In contrast, without explicit memory barriers, two-legged-if control
792 if (q % MAX) {
800 If MAX is defined to be 1, then the compiler knows that (q % MAX) is
812 relying on this ordering, you should make sure that MAX is greater than
816 BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
817 if (q % MAX) {
829 You must also be careful not to rely too much on boolean short-circuit
844 out-guess your code. More generally, although READ_ONCE() does force
848 In addition, control dependencies apply only to the then-clause and
849 else-clause of the if-statement in question. In particular, it does
850 not necessarily apply to code following the if-statement:
864 conditional-move instructions, as in this fanciful pseudo-assembly
877 In short, control dependencies apply only to the stores in the then-clause
878 and else-clause of the if-statement in question (including functions
879 invoked by those two clauses), not to code following that if-statement.
890 However, they do -not- guarantee any other sort of ordering:
899 to carry out the stores. Please note that it is -not- sufficient
905 (*) Control dependencies require at least one run-time conditional
917 (*) Control dependencies apply only to the then-clause and else-clause
918 of the if-statement containing the control dependency, including
920 do -not- apply to code following the if-statement containing the
925 (*) Control dependencies do -not- provide multicopy atomicity. If you
933 -------------------
935 When dealing with CPU-CPU interactions, certain types of memory barrier should
942 with an address-dependency barrier, a control dependency, an acquire barrier,
944 read barrier, control dependency, or an address-dependency barrier pairs
963 <implicit address-dependency barrier>
983 match the loads after the read barrier or the address-dependency barrier, and
988 WRITE_ONCE(a, 1); }---- --->{ v = READ_ONCE(c);
992 WRITE_ONCE(d, 4); }---- --->{ y = READ_ONCE(b);
996 ------------------------------------
1015 +-------+ : :
1016 | | +------+
1017 | |------>| C=3 | } /\
1018 | | : +------+ }----- \ -----> Events perceptible to
1020 | | : +------+ }
1022 | | +------+ }
1023 | | wwwwwwwwwwwwwwww } <--- At this point the write barrier
1024 | | +------+ } requires all stores prior to the
1026 | | : +------+ } further stores may take place
1027 | |------>| D=4 | }
1028 | | +------+
1029 +-------+ : :
1036 Secondly, address-dependency barriers act as partial orderings on address-
1052 +-------+ : : : :
1053 | | +------+ +-------+ | Sequence of update
1054 | |------>| B=2 |----- --->| Y->8 | | of perception on
1055 | | : +------+ \ +-------+ | CPU 2
1056 | CPU 1 | : | A=1 | \ --->| C->&Y | V
1057 | | +------+ | +-------+
1059 | | +------+ | : :
1060 | | : | C=&B |--- | : : +-------+
1061 | | : +------+ \ | +-------+ | |
1062 | |------>| D=4 | ----------->| C->&B |------>| |
1063 | | +------+ | +-------+ | |
1064 +-------+ : : | : : | |
1067 | +-------+ | |
1068 Apparently incorrect ---> | | B->7 |------>| |
1069 perception of B (!) | +-------+ | |
1071 | +-------+ | |
1072 The load of X holds ---> \ | X->9 |------>| |
1073 up the maintenance \ +-------+ | |
1074 of coherence of B ----->| B->2 | +-------+
1075 +-------+
1082 If, however, an address-dependency barrier were to be placed between the load
1093 <address-dependency barrier>
1098 +-------+ : : : :
1099 | | +------+ +-------+
1100 | |------>| B=2 |----- --->| Y->8 |
1101 | | : +------+ \ +-------+
1102 | CPU 1 | : | A=1 | \ --->| C->&Y |
1103 | | +------+ | +-------+
1105 | | +------+ | : :
1106 | | : | C=&B |--- | : : +-------+
1107 | | : +------+ \ | +-------+ | |
1108 | |------>| D=4 | ----------->| C->&B |------>| |
1109 | | +------+ | +-------+ | |
1110 +-------+ : : | : : | |
1113 | +-------+ | |
1114 | | X->9 |------>| |
1115 | +-------+ | |
1116 Makes sure all effects ---> \ aaaaaaaaaaaaaaaaa | |
1117 prior to the store of C \ +-------+ | |
1118 are perceptible to ----->| B->2 |------>| |
1119 subsequent loads +-------+ | |
1120 : : +-------+
1138 +-------+ : : : :
1139 | | +------+ +-------+
1140 | |------>| A=1 |------ --->| A->0 |
1141 | | +------+ \ +-------+
1142 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1143 | | +------+ | +-------+
1144 | |------>| B=2 |--- | : :
1145 | | +------+ \ | : : +-------+
1146 +-------+ : : \ | +-------+ | |
1147 ---------->| B->2 |------>| |
1148 | +-------+ | CPU 2 |
1149 | | A->0 |------>| |
1150 | +-------+ | |
1151 | : : +-------+
1153 \ +-------+
1154 ---->| A->1 |
1155 +-------+
1175 +-------+ : : : :
1176 | | +------+ +-------+
1177 | |------>| A=1 |------ --->| A->0 |
1178 | | +------+ \ +-------+
1179 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1180 | | +------+ | +-------+
1181 | |------>| B=2 |--- | : :
1182 | | +------+ \ | : : +-------+
1183 +-------+ : : \ | +-------+ | |
1184 ---------->| B->2 |------>| |
1185 | +-------+ | CPU 2 |
1188 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
1189 barrier causes all effects \ +-------+ | |
1190 prior to the storage of B ---->| A->1 |------>| |
1191 to be perceptible to CPU 2 +-------+ | |
1192 : : +-------+
1212 +-------+ : : : :
1213 | | +------+ +-------+
1214 | |------>| A=1 |------ --->| A->0 |
1215 | | +------+ \ +-------+
1216 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1217 | | +------+ | +-------+
1218 | |------>| B=2 |--- | : :
1219 | | +------+ \ | : : +-------+
1220 +-------+ : : \ | +-------+ | |
1221 ---------->| B->2 |------>| |
1222 | +-------+ | CPU 2 |
1225 | +-------+ | |
1226 | | A->0 |------>| 1st |
1227 | +-------+ | |
1228 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
1229 barrier causes all effects \ +-------+ | |
1230 prior to the storage of B ---->| A->1 |------>| 2nd |
1231 to be perceptible to CPU 2 +-------+ | |
1232 : : +-------+
1238 +-------+ : : : :
1239 | | +------+ +-------+
1240 | |------>| A=1 |------ --->| A->0 |
1241 | | +------+ \ +-------+
1242 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1243 | | +------+ | +-------+
1244 | |------>| B=2 |--- | : :
1245 | | +------+ \ | : : +-------+
1246 +-------+ : : \ | +-------+ | |
1247 ---------->| B->2 |------>| |
1248 | +-------+ | CPU 2 |
1251 \ +-------+ | |
1252 ---->| A->1 |------>| 1st |
1253 +-------+ | |
1255 +-------+ | |
1256 | A->1 |------>| 2nd |
1257 +-------+ | |
1258 : : +-------+
1267 ----------------------------------------
1271 other loads, and so do the load in advance - even though they haven't actually
1276 It may turn out that the CPU didn't actually need the value - perhaps because a
1277 branch circumvented the load - in which case it can discard the value or just
1291 : : +-------+
1292 +-------+ | |
1293 --->| B->2 |------>| |
1294 +-------+ | CPU 2 |
1296 +-------+ | |
1297 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1298 division speculates on the +-------+ ~ | |
1302 Once the divisions are complete --> : : ~-->| |
1304 LOAD with immediate effect : : +-------+
1307 Placing a read barrier or an address-dependency barrier just before the second
1322 : : +-------+
1323 +-------+ | |
1324 --->| B->2 |------>| |
1325 +-------+ | CPU 2 |
1327 +-------+ | |
1328 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1329 division speculates on the +-------+ ~ | |
1336 : : ~-->| |
1338 : : +-------+
1344 : : +-------+
1345 +-------+ | |
1346 --->| B->2 |------>| |
1347 +-------+ | CPU 2 |
1349 +-------+ | |
1350 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1351 division speculates on the +-------+ ~ | |
1357 +-------+ | |
1358 The speculation is discarded ---> --->| A->1 |------>| |
1359 and an updated value is +-------+ | |
1360 retrieved : : +-------+
1364 --------------------
1373 time to all -other- CPUs. The remainder of this document discusses this
1397 multicopy-atomic systems, CPU B's load must return either the same value
1407 able to compensate for non-multicopy atomicity. For example, suppose
1418 This substitution allows non-multicopy atomicity to run rampant: in
1424 example runs on a non-multicopy-atomic system where CPUs 1 and 2 share a
1429 General barriers can compensate not only for non-multicopy atomicity,
1430 but can also generate additional ordering that can ensure that -all-
1431 CPUs will perceive the same order of -all- operations. In contrast, a
1432 chain of release-acquire pairs do not provide this additional ordering,
1473 Furthermore, because of the release-acquire relationship between cpu0()
1479 However, the ordering provided by a release-acquire chain is local
1490 writes in order, CPUs not involved in the release-acquire chain might
1492 the weak memory-barrier instructions used to implement smp_load_acquire()
1495 store to u as happening -after- cpu1()'s load from v, even though
1501 -not- ensure that any particular value will be read. Therefore, the
1526 ----------------
1533 This is a general barrier -- there are no read-read or write-write
1543 interrupt-handler code and the code that was interrupted.
1549 optimizations that, while perfectly safe in single-threaded code, can
1578 for single-threaded code, is almost certainly not what the developer
1599 single-threaded code, but can be fatal in concurrent code:
1617 single-threaded code, so you need to tell the compiler about cases
1631 This transformation is a win for single-threaded code because it
1643 do the following and MAX is a preprocessor macro with the value 1:
1645 while ((tmp = READ_ONCE(a)) % MAX)
1649 to MAX will always be zero, again allowing the compiler to optimize
1650 the code into near-nonexistence. (It will still load from the
1678 between process-level code and an interrupt handler:
1694 win for single-threaded code:
1755 In single-threaded code, this is not only safe, but also saves
1757 could cause some other CPU to see a spurious value of 42 -- even
1758 if variable 'a' was never zero -- when loading variable 'b'.
1767 damaging, but they can result in cache-line bouncing and thus in
1772 with a single memory-reference instruction, prevents "load tearing"
1775 16-bit store instructions with 7-bit immediate fields, the compiler
1776 might be tempted to use two 16-bit store-immediate instructions to
1777 implement the following 32-bit store:
1784 This optimization can therefore be a win in single-threaded code.
1808 implement these three assignment statements as a pair of 32-bit
1809 loads followed by a pair of 32-bit stores. This would result in
1829 -------------------
1841 All memory barriers except the address-dependency barriers imply a compiler
1855 systems because it is assumed that a CPU will appear to be self-consistent,
1866 windows. These barriers are required even on non-SMP systems as they affect
1897 obj->dead = 1;
1899 atomic_dec(&obj->ref_count);
1913 DMA capable device. See Documentation/core-api/dma-api.rst file for more
1921 if (desc->status != DEVICE_OWN) {
1926 read_data = desc->data;
1927 desc->data = write_data;
1933 desc->status = DEVICE_OWN;
1948 accesses to MMIO regions. See the later "KERNEL I/O BARRIER EFFECTS"
1957 For example, after a non-temporal write to pmem region, we use pmem_wmb()
1968 For memory accesses with write-combining attributes (e.g. those returned
1971 write-combining memory accesses before this macro with those after it when
1987 --------------------------
2034 one-way barriers is that the effects of instructions outside of a critical
2055 RELEASE may -not- be assumed to be a full memory barrier.
2080 -could- occur.
2095 a sleep-unlock race, but the locking primitive needs to resolve
2100 anything at all - especially with respect to I/O accesses - unless combined
2103 See also the section on "Inter-CPU acquiring barrier effects".
2133 -----------------------------
2141 SLEEP AND WAKE-UP FUNCTIONS
2142 ---------------------------
2167 STORE current->state
2210 STORE current->state ...
2212 LOAD event_indicated if ((LOAD task->state) & TASK_NORMAL)
2213 STORE task->state
2258 order multiple stores before the wake-up with respect to loads of those stored
2294 -----------------------
2302 INTER-CPU ACQUIRING BARRIER EFFECTS
2311 ---------------------------
2344 be a problem as a single-threaded linear piece of code will still appear to
2358 --------------------------
2398 LOAD waiter->list.next;
2399 LOAD waiter->task;
2400 STORE waiter->task;
2422 LOAD waiter->task;
2423 STORE waiter->task;
2431 LOAD waiter->list.next;
2432 --- OOPS ---
2439 LOAD waiter->list.next;
2440 LOAD waiter->task;
2442 STORE waiter->task;
2452 On a UP system - where this wouldn't be a problem - the smp_mb() is just a
2459 -----------------
2470 -----------------
2479 efficient to reorder, combine or merge accesses - something that would cause
2483 routines - such as inb() or writel() - which know how to make such accesses
2489 See Documentation/driver-api/device-io.rst for more information.
2493 ----------
2499 This may be alleviated - at least in part - by disabling local interrupts (a
2501 the interrupt-disabled section in the driver. While the driver's interrupt
2508 under interrupt-disablement and then the driver's interrupt handler is invoked:
2527 accesses performed in an interrupt - and vice versa - unless implicit or
2537 likely, then interrupt-disabling locks should be used to guarantee ordering.
2545 specific. Therefore, drivers which are inherently non-portable may rely on
2574 to an outbound DMA buffer allocated by dma_alloc_coherent() will be
2597 The ordering properties of __iomem pointers obtained with non-default
2607 bullets 2-5 above) but they are still guaranteed to be ordered with
2615 register-based, memory-mapped FIFOs residing on peripherals that are not
2621 The inX() and outX() accessors are intended to access legacy port-mapped
2632 Device drivers may expect outX() to emit a non-posted write transaction
2650 little-endian and will therefore perform byte-swapping operations on big-endian
2658 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2662 of arch-specific code.
2665 stream in any order it feels like - or even in parallel - provided that if an
2671 [*] Some instructions have more than one effect - such as changing the
2672 condition codes, changing registers or changing memory - and different
2698 <--- CPU ---> : <----------- Memory ----------->
2700 +--------+ +--------+ : +--------+ +-----------+
2701 | | | | : | | | | +--------+
2703 | Core |--->| Access |----->| Cache |<-->| | | |
2704 | | | Queue | : | | | |--->| Memory |
2706 +--------+ +--------+ : +--------+ | | | |
2707 : | Cache | +--------+
2709 : | Mechanism | +--------+
2710 +--------+ +--------+ : +--------+ | | | |
2712 | CPU | | Memory | : | CPU | | |--->| Device |
2713 | Core |--->| Access |----->| Cache |<-->| | | |
2715 | | | | : | | | | +--------+
2716 +--------+ +--------+ : +--------+ +-----------+
2747 ----------------------
2764 See Documentation/core-api/cachetlb.rst for more information on cache
2769 -----------------------
2825 (*) the CPU's data cache may affect the ordering, and while cache-coherency
2826 mechanisms may alleviate this - once the store has actually hit the cache
2827 - there's no guarantee that the coherency management will be propagated in
2838 However, it is guaranteed that a CPU will be self-consistent: it will see its
2865 are -not- optional in the above example, as there are architectures
2900 --------------------------
2904 two semantically-related cache lines updated at separate times. This is where
2905 the address-dependency barrier really becomes necessary as this synchronises
2915 ----------------------
2920 barriers for this use-case would be possible but is often suboptimal.
2922 To handle this case optimally, low-level virt_mb() etc macros are available.
2924 identical code for SMP and non-SMP systems. For example, virtual machine guests
2938 ----------------
2943 Documentation/core-api/circular-buffers.rst
2960 Chapter 7.1: Memory-Access Ordering
2963 ARM Architecture Reference Manual (ARMv8, for ARMv8-A architecture profile)
2966 IA-32 Intel Architecture Software Developer's Manual, Volume 3:
2981 Chapter 15: Sparc-V9 Memory Models
2997 Solaris Internals, Core Kernel Architecture, p63-68: