Lines Matching +full:foo +full:- +full:queue

19 documentation at tools/memory-model/.  Nevertheless, even this memory
37 Note also that it is possible that a barrier may be a no-op for an
48 - Device operations.
49 - Guarantees.
53 - Varieties of memory barrier.
54 - What may not be assumed about memory barriers?
55 - Address-dependency barriers (historical).
56 - Control dependencies.
57 - SMP barrier pairing.
58 - Examples of memory barrier sequences.
59 - Read memory barriers vs load speculation.
60 - Multicopy atomicity.
64 - Compiler barrier.
65 - CPU memory barriers.
69 - Lock acquisition functions.
70 - Interrupt disabling functions.
71 - Sleep and wake-up functions.
72 - Miscellaneous functions.
74 (*) Inter-CPU acquiring barrier effects.
76 - Acquires vs memory accesses.
80 - Interprocessor interaction.
81 - Atomic operations.
82 - Accessing devices.
83 - Interrupts.
91 - Cache coherency.
92 - Cache coherency vs DMA.
93 - Cache coherency vs MMIO.
97 - And then there's the Alpha.
98 - Virtual Machine Guests.
102 - Circular buffers.
116 +-------+ : +--------+ : +-------+
119 | CPU 1 |<----->| Memory |<----->| CPU 2 |
122 +-------+ : +--------+ : +-------+
127 | : +--------+ : |
130 +---------->| Device |<----------+
133 : +--------+ :
159 STORE A=3, STORE B=4, y=LOAD A->3, x=LOAD B->4
160 STORE A=3, STORE B=4, x=LOAD B->4, y=LOAD A->3
161 STORE A=3, y=LOAD A->3, STORE B=4, x=LOAD B->4
162 STORE A=3, y=LOAD A->3, x=LOAD B->2, STORE B=4
163 STORE A=3, x=LOAD B->2, STORE B=4, y=LOAD A->3
164 STORE A=3, x=LOAD B->2, y=LOAD A->3, STORE B=4
165 STORE B=4, STORE A=3, y=LOAD A->3, x=LOAD B->4
203 -----------------
225 ----------
239 emits a memory-barrier instruction, so that a DEC Alpha CPU will
310 And there are anti-guarantees:
313 generate code to modify these using non-atomic read-modify-write
320 non-atomic read-modify-write sequences can cause an update to one
327 "char", two-byte alignment for "short", four-byte alignment for
328 "int", and either four-byte or eight-byte alignment for "long",
329 on 32-bit and 64-bit systems, respectively. Note that these
331 using older pre-C11 compilers (for example, gcc 4.6). The portion
337 of adjacent bit-fields all having nonzero width
343 NOTE 2: A bit-field and an adjacent non-bit-field member
345 to two bit-fields, if one is declared inside a nested
347 are separated by a zero-length bit-field declaration,
348 or if they are separated by a non-bit-field member
350 bit-fields in the same structure if all members declared
351 between them are also bit-fields, no matter what the
352 sizes of those intervening bit-fields happen to be.
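The bit-field rules quoted above can be made concrete with a small userspace C sketch. The struct layouts and sizes below are what typical GCC/Clang targets produce (exact packing is implementation-defined, so treat the numbers as illustrative, not guaranteed):

```c
#include <assert.h>

/* Adjacent nonzero-width bit-fields share one "memory location":
 * updating 'b' typically reads the whole storage unit, merges in
 * the new bits, and writes the unit back -- a non-atomic
 * read-modify-write that also rewrites 'a'. */
struct flags {
	unsigned int a : 4;
	unsigned int b : 4;
};

/* A zero-length bit-field closes the current storage unit, so
 * here 'a' and 'b' are separate memory locations and concurrent
 * updates to them cannot corrupt each other. */
struct split_flags {
	unsigned int a : 4;
	unsigned int   : 0;
	unsigned int b : 4;
};

static int shared_unit_size(void) { return (int)sizeof(struct flags); }
static int split_unit_size(void)  { return (int)sizeof(struct split_flags); }
```

On a typical LP64 Linux target both fields of struct flags fit in one 4-byte unit, while struct split_flags needs two.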
360 in random order, but this can be a problem for CPU-CPU interaction and for I/O.
376 ---------------------------
395 address-dependency barriers; see the "SMP barrier pairing" subsection.
398 (2) Address-dependency barriers (historical).
400 An address-dependency barrier is a weaker form of read barrier. In the
403 the second load will be directed), an address-dependency barrier would
407 An address-dependency barrier is a partial ordering on interdependent
413 considered can then perceive. An address-dependency barrier issued by
418 the address-dependency barrier.
430 [!] Note that address-dependency barriers should normally be paired with
433 [!] Kernel release v5.9 removed kernel APIs for explicit address-
436 address-dependency barriers.
440 A read barrier is an address-dependency barrier plus a guarantee that all
448 Read memory barriers imply address-dependency barriers, and so can
472 This acts as a one-way permeable barrier. It guarantees that all memory
487 This also acts as a one-way permeable barrier. It guarantees that all
498 -not- guaranteed to act as a full memory barrier. However, after an
509 RELEASE variants in addition to fully-ordered and relaxed (no barrier
526 ----------------------------------------------
533 access queue that accesses of the appropriate type may not cross.
545 (*) There is no guarantee that some intervening piece of off-the-CPU
552 Documentation/driver-api/pci/pci.rst
553 Documentation/core-api/dma-api-howto.rst
554 Documentation/core-api/dma-api.rst
557 ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
558 ----------------------------------------
562 to this section are those working on DEC Alpha architecture-specific code
565 address-dependency barriers.
567 [!] While address dependencies are observed in both load-to-load and
568 load-to-store relations, address-dependency barriers are not necessary
569 for load-to-store situations.
571 The requirement of address-dependency barriers is a little subtle, and
584 [!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernels, which
585 doesn't imply an address-dependency barrier.
602 To deal with this, READ_ONCE() provides an implicit address-dependency barrier
612 <implicit address-dependency barrier>
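The publish/consume pattern that the implicit address-dependency barrier supports can be sketched in userspace C11, where a release store stands in for smp_store_release() and an acquire load stands in for READ_ONCE() plus its implicit barrier (the macro names are the kernel's; the C11 mapping is an approximation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdatomic.h>

static int B;
static _Atomic(int *) P;

/* Writer: initialise the payload, then publish a pointer to it
 * with a release store so the initialisation is ordered before
 * the publication. */
static void publish(void)
{
	B = 4;					/* initialise payload */
	atomic_store_explicit(&P, &B, memory_order_release);
}

/* Reader: the acquire load orders the dereference of the
 * published pointer after the writer's B = 4, so a non-NULL
 * pointer always leads to the initialised value. */
static int consume_value(void)
{
	int *q = atomic_load_explicit(&P, memory_order_acquire);

	if (q == NULL)
		return -1;			/* not yet published */
	return *q;				/* ordered after B = 4 */
}
```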
621 even-numbered cache lines and the other bank processes odd-numbered cache
622 lines. The pointer P might be stored in an odd-numbered cache line, and the
623 variable B might be stored in an even-numbered cache line. Then, if the
624 even-numbered bank of the reading CPU's cache is extremely busy while the
625 odd-numbered bank is idle, one can see the new value of the pointer P (&B),
629 An address-dependency barrier is not required to order dependent writes
646 Therefore, no address-dependency barrier is required to order the read into
648 even without an implicit address-dependency barrier of modern READ_ONCE():
653 of dependency ordering is to -prevent- writes to the data structure, along
664 The address-dependency barrier is very important to the RCU system,
674 --------------------
680 A load-load control dependency requires a full read memory barrier, not
681 simply an (implicit) address-dependency barrier to make it work correctly.
685 <implicit address-dependency barrier>
692 dependency, but rather a control dependency that the CPU may short-circuit
703 However, stores are not speculated. This means that ordering -is- provided
704 for load-store control dependencies, as in the following example:
719 variable 'a' is always non-zero, it would be well within its rights
749 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
752 /* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
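The "moved up, BUG" comments refer to the compiler hoisting the store above its guarding load, which destroys the control dependency. READ_ONCE() and WRITE_ONCE() prevent that by making both accesses volatile; a simplified userspace approximation of the macros (an assumption -- the kernel's versions handle additional cases) looks like this:

```c
#include <assert.h>

/* Simplified userspace stand-ins for the kernel macros: volatile
 * accesses may not be reordered against each other or elided by
 * the compiler. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int a, b;

/* Load-store control dependency: the store to 'b' is conditional
 * on the volatile load from 'a', so the compiler may not perform
 * the "moved up" transformation, and CPUs do not commit stores
 * speculatively, so the store is ordered after the load. */
static void guarded_store(void)
{
	if (READ_ONCE(a))
		WRITE_ONCE(b, 1);
}

static int run_guarded(int aval)
{
	a = aval;
	b = 0;
	guarded_store();
	return b;
}
```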
772 In contrast, without explicit memory barriers, two-legged-if control
829 You must also be careful not to rely too much on boolean short-circuit
844 out-guess your code. More generally, although READ_ONCE() does force
848 In addition, control dependencies apply only to the then-clause and
849 else-clause of the if-statement in question. In particular, it does
850 not necessarily apply to code following the if-statement:
864 conditional-move instructions, as in this fanciful pseudo-assembly
877 In short, control dependencies apply only to the stores in the then-clause
878 and else-clause of the if-statement in question (including functions
879 invoked by those two clauses), not to code following that if-statement.
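One pitfall alluded to above is worth a sketch: if the then-clause and else-clause store the same value, the document's point is that the compiler is entitled to pull the store out of the conditional, destroying the control dependency even though the source still looks ordered (READ_ONCE()/WRITE_ONCE() here are simplified userspace approximations):

```c
#include <assert.h>

#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int a, b;

/* Both legs store the same value, so an optimizing compiler may
 * transform this into:
 *
 *	WRITE_ONCE(b, 1);	// store no longer ordered after
 *	if (READ_ONCE(a)) ...	// the load from 'a'
 *
 * losing the load-store ordering that a genuine control
 * dependency would have provided. */
static void both_legs(void)
{
	if (READ_ONCE(a))
		WRITE_ONCE(b, 1);
	else
		WRITE_ONCE(b, 1);
}

static int run_both_legs(int aval)
{
	a = aval;
	b = 0;
	both_legs();
	return b;
}
```

The functional result is the same either way (b ends up 1); what is lost is only the ordering, which no single-threaded test can observe.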
890 However, they do -not- guarantee any other sort of ordering:
899 to carry out the stores. Please note that it is -not- sufficient
905 (*) Control dependencies require at least one run-time conditional
917 (*) Control dependencies apply only to the then-clause and else-clause
918 of the if-statement containing the control dependency, including
920 do -not- apply to code following the if-statement containing the
925 (*) Control dependencies do -not- provide multicopy atomicity. If you
933 -------------------
935 When dealing with CPU-CPU interactions, certain types of memory barrier should
942 with an address-dependency barrier, a control dependency, an acquire barrier,
944 read barrier, control dependency, or an address-dependency barrier pairs
963 <implicit address-dependency barrier>
983 match the loads after the read barrier or the address-dependency barrier, and
988 WRITE_ONCE(a, 1); }---- --->{ v = READ_ONCE(c);
992 WRITE_ONCE(d, 4); }---- --->{ y = READ_ONCE(b);
996 ------------------------------------
1015 +-------+ : :
1016 | | +------+
1017 | |------>| C=3 | } /\
1018 | | : +------+ }----- \ -----> Events perceptible to
1020 | | : +------+ }
1022 | | +------+ }
1023 | | wwwwwwwwwwwwwwww } <--- At this point the write barrier
1024 | | +------+ } requires all stores prior to the
1026 | | : +------+ } further stores may take place
1027 | |------>| D=4 | }
1028 | | +------+
1029 +-------+ : :
1036 Secondly, address-dependency barriers act as partial orderings on address-
1052 +-------+ : : : :
1053 | | +------+ +-------+ | Sequence of update
1054 | |------>| B=2 |----- --->| Y->8 | | of perception on
1055 | | : +------+ \ +-------+ | CPU 2
1056 | CPU 1 | : | A=1 | \ --->| C->&Y | V
1057 | | +------+ | +-------+
1059 | | +------+ | : :
1060 | | : | C=&B |--- | : : +-------+
1061 | | : +------+ \ | +-------+ | |
1062 | |------>| D=4 | ----------->| C->&B |------>| |
1063 | | +------+ | +-------+ | |
1064 +-------+ : : | : : | |
1067 | +-------+ | |
1068 Apparently incorrect ---> | | B->7 |------>| |
1069 perception of B (!) | +-------+ | |
1071 | +-------+ | |
1072 The load of X holds ---> \ | X->9 |------>| |
1073 up the maintenance \ +-------+ | |
1074 of coherence of B ----->| B->2 | +-------+
1075 +-------+
1082 If, however, an address-dependency barrier were to be placed between the load
1093 <address-dependency barrier>
1098 +-------+ : : : :
1099 | | +------+ +-------+
1100 | |------>| B=2 |----- --->| Y->8 |
1101 | | : +------+ \ +-------+
1102 | CPU 1 | : | A=1 | \ --->| C->&Y |
1103 | | +------+ | +-------+
1105 | | +------+ | : :
1106 | | : | C=&B |--- | : : +-------+
1107 | | : +------+ \ | +-------+ | |
1108 | |------>| D=4 | ----------->| C->&B |------>| |
1109 | | +------+ | +-------+ | |
1110 +-------+ : : | : : | |
1113 | +-------+ | |
1114 | | X->9 |------>| |
1115 | +-------+ | |
1116 Makes sure all effects ---> \ aaaaaaaaaaaaaaaaa | |
1117 prior to the store of C \ +-------+ | |
1118 are perceptible to ----->| B->2 |------>| |
1119 subsequent loads +-------+ | |
1120 : : +-------+
1138 +-------+ : : : :
1139 | | +------+ +-------+
1140 | |------>| A=1 |------ --->| A->0 |
1141 | | +------+ \ +-------+
1142 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1143 | | +------+ | +-------+
1144 | |------>| B=2 |--- | : :
1145 | | +------+ \ | : : +-------+
1146 +-------+ : : \ | +-------+ | |
1147 ---------->| B->2 |------>| |
1148 | +-------+ | CPU 2 |
1149 | | A->0 |------>| |
1150 | +-------+ | |
1151 | : : +-------+
1153 \ +-------+
1154 ---->| A->1 |
1155 +-------+
1175 +-------+ : : : :
1176 | | +------+ +-------+
1177 | |------>| A=1 |------ --->| A->0 |
1178 | | +------+ \ +-------+
1179 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1180 | | +------+ | +-------+
1181 | |------>| B=2 |--- | : :
1182 | | +------+ \ | : : +-------+
1183 +-------+ : : \ | +-------+ | |
1184 ---------->| B->2 |------>| |
1185 | +-------+ | CPU 2 |
1188 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
1189 barrier causes all effects \ +-------+ | |
1190 prior to the storage of B ---->| A->1 |------>| |
1191 to be perceptible to CPU 2 +-------+ | |
1192 : : +-------+
1212 +-------+ : : : :
1213 | | +------+ +-------+
1214 | |------>| A=1 |------ --->| A->0 |
1215 | | +------+ \ +-------+
1216 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1217 | | +------+ | +-------+
1218 | |------>| B=2 |--- | : :
1219 | | +------+ \ | : : +-------+
1220 +-------+ : : \ | +-------+ | |
1221 ---------->| B->2 |------>| |
1222 | +-------+ | CPU 2 |
1225 | +-------+ | |
1226 | | A->0 |------>| 1st |
1227 | +-------+ | |
1228 At this point the read ----> \ rrrrrrrrrrrrrrrrr | |
1229 barrier causes all effects \ +-------+ | |
1230 prior to the storage of B ---->| A->1 |------>| 2nd |
1231 to be perceptible to CPU 2 +-------+ | |
1232 : : +-------+
1238 +-------+ : : : :
1239 | | +------+ +-------+
1240 | |------>| A=1 |------ --->| A->0 |
1241 | | +------+ \ +-------+
1242 | CPU 1 | wwwwwwwwwwwwwwww \ --->| B->9 |
1243 | | +------+ | +-------+
1244 | |------>| B=2 |--- | : :
1245 | | +------+ \ | : : +-------+
1246 +-------+ : : \ | +-------+ | |
1247 ---------->| B->2 |------>| |
1248 | +-------+ | CPU 2 |
1251 \ +-------+ | |
1252 ---->| A->1 |------>| 1st |
1253 +-------+ | |
1255 +-------+ | |
1256 | A->1 |------>| 2nd |
1257 +-------+ | |
1258 : : +-------+
1267 ----------------------------------------
1271 other loads, and so do the load in advance - even though they haven't actually
1276 It may turn out that the CPU didn't actually need the value - perhaps because a
1277 branch circumvented the load - in which case it can discard the value or just
1291 : : +-------+
1292 +-------+ | |
1293 --->| B->2 |------>| |
1294 +-------+ | CPU 2 |
1296 +-------+ | |
1297 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1298 division speculates on the +-------+ ~ | |
1302 Once the divisions are complete --> : : ~-->| |
1304 LOAD with immediate effect : : +-------+
1307 Placing a read barrier or an address-dependency barrier just before the second
1322 : : +-------+
1323 +-------+ | |
1324 --->| B->2 |------>| |
1325 +-------+ | CPU 2 |
1327 +-------+ | |
1328 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1329 division speculates on the +-------+ ~ | |
1336 : : ~-->| |
1338 : : +-------+
1344 : : +-------+
1345 +-------+ | |
1346 --->| B->2 |------>| |
1347 +-------+ | CPU 2 |
1349 +-------+ | |
1350 The CPU being busy doing a ---> --->| A->0 |~~~~ | |
1351 division speculates on the +-------+ ~ | |
1357 +-------+ | |
1358 The speculation is discarded ---> --->| A->1 |------>| |
1359 and an updated value is +-------+ | |
1360 retrieved : : +-------+
1364 --------------------
1373 time to all -other- CPUs. The remainder of this document discusses this
1397 multicopy-atomic systems, CPU B's load must return either the same value
1407 able to compensate for non-multicopy atomicity. For example, suppose
1418 This substitution allows non-multicopy atomicity to run rampant: in
1424 example runs on a non-multicopy-atomic system where CPUs 1 and 2 share a
1429 General barriers can compensate not only for non-multicopy atomicity,
1430 but can also generate additional ordering that can ensure that -all-
1431 CPUs will perceive the same order of -all- operations. In contrast, a
1432 chain of release-acquire pairs does not provide this additional ordering,
1473 Furthermore, because of the release-acquire relationship between cpu0()
1479 However, the ordering provided by a release-acquire chain is local
1490 writes in order, CPUs not involved in the release-acquire chain might
1492 the weak memory-barrier instructions used to implement smp_load_acquire()
1495 store to u as happening -after- cpu1()'s load from v, even though
1501 -not- ensure that any particular value will be read. Therefore, the
1526 ----------------
1533 This is a general barrier -- there are no read-read or write-write
1543 interrupt-handler code and the code that was interrupted.
1549 optimizations that, while perfectly safe in single-threaded code, can
1578 for single-threaded code, is almost certainly not what the developer
1599 single-threaded code, but can be fatal in concurrent code:
1617 single-threaded code, so you need to tell the compiler about cases
1631 This transformation is a win for single-threaded code because it
1650 the code into near-nonexistence. (It will still load from the
1678 between process-level code and an interrupt handler:
1694 win for single-threaded code:
1755 In single-threaded code, this is not only safe, but also saves
1757 could cause some other CPU to see a spurious value of 42 -- even
1758 if variable 'a' was never zero -- when loading variable 'b'.
1767 damaging, but they can result in cache-line bouncing and thus in
1772 with a single memory-reference instruction, prevents "load tearing"
1775 16-bit store instructions with 7-bit immediate fields, the compiler
1776 might be tempted to use two 16-bit store-immediate instructions to
1777 implement the following 32-bit store:
1784 This optimization can therefore be a win in single-threaded code.
1794 struct __attribute__((__packed__)) foo {
1799 struct foo foo1, foo2;
1808 implement these three assignment statements as a pair of 32-bit
1809 loads followed by a pair of 32-bit stores. This would result in
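The fix the document prescribes for this kind of load/store tearing is to copy each field with READ_ONCE()/WRITE_ONCE(), forcing one access per field. A userspace sketch using the packed struct above and simplified macro definitions (an assumption; the kernel's macros are more elaborate):

```c
#include <assert.h>

#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct __attribute__((__packed__)) foo {
	short a;
	int   b;
	short c;
};

static struct foo foo1, foo2;

/* Without the volatile accesses, the compiler could implement
 * these three assignments as a pair of 32-bit loads followed by
 * a pair of 32-bit stores, momentarily mixing old and new bytes
 * of foo2.b.  Volatile accesses force one access per field. */
static void copy_foo(void)
{
	WRITE_ONCE(foo2.a, READ_ONCE(foo1.a));
	WRITE_ONCE(foo2.b, READ_ONCE(foo1.b));
	WRITE_ONCE(foo2.c, READ_ONCE(foo1.c));
}

static int check_copy(void)
{
	foo1.a = 1;
	foo1.b = 2;
	foo1.c = 3;
	copy_foo();
	return foo2.a == 1 && foo2.b == 2 && foo2.c == 3;
}
```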
1829 -------------------
1841 All memory barriers except the address-dependency barriers imply a compiler
1855 systems because it is assumed that a CPU will appear to be self-consistent,
1866 windows. These barriers are required even on non-SMP systems as they affect
1897 obj->dead = 1;
1899 atomic_dec(&obj->ref_count);
1920 if (desc->status != DEVICE_OWN) {
1925 read_data = desc->data;
1926 desc->data = write_data;
1932 desc->status = DEVICE_OWN;
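The descriptor-ownership handshake above relies on dma_rmb() ordering the status check before the data reads, and dma_wmb() ordering the data writes before handing ownership back. A shared-memory analogue using C11 fences as stand-ins (an approximation: real DMA descriptor rings need the kernel's dma_*() barriers and I/O accessors):

```c
#include <assert.h>
#include <stdatomic.h>

#define DRIVER_OWN 0
#define DEVICE_OWN 1

struct desc {
	_Atomic int status;
	int data;
};

/* Driver side of the ownership handshake: only touch the
 * descriptor while we own it, and order the data accesses
 * against the status accesses on both sides. */
static int process_desc(struct desc *d, int write_data, int *read_data)
{
	if (atomic_load_explicit(&d->status, memory_order_relaxed) != DEVICE_OWN) {
		/* stands in for dma_rmb(): don't read data until
		 * ownership has been observed */
		atomic_thread_fence(memory_order_acquire);
		*read_data = d->data;
		d->data = write_data;
		/* stands in for dma_wmb(): publish the data before
		 * giving the descriptor back to the device */
		atomic_thread_fence(memory_order_release);
		atomic_store_explicit(&d->status, DEVICE_OWN,
				      memory_order_relaxed);
		return 1;
	}
	return 0;
}
```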
1948 relaxed I/O accessors and the Documentation/core-api/dma-api.rst file for
1957 For example, after a non-temporal write to pmem region, we use pmem_wmb()
1968 For memory accesses with write-combining attributes (e.g. those returned
1971 write-combining memory accesses before this macro with those after it when
1987 --------------------------
2034 one-way barriers is that the effects of instructions outside of a critical
2055 RELEASE may -not- be assumed to be a full memory barrier.
2080 -could- occur.
2095 a sleep-unlock race, but the locking primitive needs to resolve
2100 anything at all - especially with respect to I/O accesses - unless combined
2103 See also the section on "Inter-CPU acquiring barrier effects".
2133 -----------------------------
2141 SLEEP AND WAKE-UP FUNCTIONS
2142 ---------------------------
2167 STORE current->state
2210 STORE current->state ...
2212 LOAD event_indicated if ((LOAD task->state) & TASK_NORMAL)
2213 STORE task->state
2258 order multiple stores before the wake-up with respect to loads of those stored
2294 -----------------------
2302 INTER-CPU ACQUIRING BARRIER EFFECTS
2311 ---------------------------
2344 be a problem as a single-threaded linear piece of code will still appear to
2358 --------------------------
2398 LOAD waiter->list.next;
2399 LOAD waiter->task;
2400 STORE waiter->task;
2419 Queue waiter
2422 LOAD waiter->task;
2423 STORE waiter->task;
2428 call foo()
2429 foo() clobbers *waiter
2431 LOAD waiter->list.next;
2432 --- OOPS ---
2439 LOAD waiter->list.next;
2440 LOAD waiter->task;
2442 STORE waiter->task;
2452 On a UP system - where this wouldn't be a problem - the smp_mb() is just a
2459 -----------------
2470 -----------------
2479 efficient to reorder, combine or merge accesses - something that would cause
2483 routines - such as inb() or writel() - which know how to make such accesses
2489 See Documentation/driver-api/device-io.rst for more information.
2493 ----------
2499 This may be alleviated - at least in part - by disabling local interrupts (a
2501 the interrupt-disabled section in the driver. While the driver's interrupt
2508 under interrupt-disablement and then the driver's interrupt handler is invoked:
2527 accesses performed in an interrupt - and vice versa - unless implicit or
2537 likely, then interrupt-disabling locks should be used to guarantee ordering.
2545 specific. Therefore, drivers which are inherently non-portable may rely on
2597 The ordering properties of __iomem pointers obtained with non-default
2607 bullets 2-5 above) but they are still guaranteed to be ordered with
2615 register-based, memory-mapped FIFOs residing on peripherals that are not
2621 The inX() and outX() accessors are intended to access legacy port-mapped
2632 Device drivers may expect outX() to emit a non-posted write transaction
2650 little-endian and will therefore perform byte-swapping operations on big-endian
2658 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2662 of arch-specific code.
2665 stream in any order it feels like - or even in parallel - provided that if an
2671 [*] Some instructions have more than one effect - such as changing the
2672 condition codes, changing registers or changing memory - and different
2698 <--- CPU ---> : <----------- Memory ----------->
2700 +--------+ +--------+ : +--------+ +-----------+
2701 | | | | : | | | | +--------+
2703 | Core |--->| Access |----->| Cache |<-->| | | |
2704 | | | Queue | : | | | |--->| Memory |
2706 +--------+ +--------+ : +--------+ | | | |
2707 : | Cache | +--------+
2709 : | Mechanism | +--------+
2710 +--------+ +--------+ : +--------+ | | | |
2712 | CPU | | Memory | : | CPU | | |--->| Device |
2713 | Core |--->| Access |----->| Cache |<-->| | | |
2714 | | | Queue | : | | | | | |
2715 | | | | : | | | | +--------+
2716 +--------+ +--------+ : +--------+ +-----------+
2728 generate load and store operations which then go into the queue of memory
2729 accesses to be performed. The core may place these in the queue in any order
2747 ----------------------
2764 See Documentation/core-api/cachetlb.rst for more information on cache
2769 -----------------------
2825 (*) the CPU's data cache may affect the ordering, and while cache-coherency
2826 mechanisms may alleviate this - once the store has actually hit the cache
2827 - there's no guarantee that the coherency management will be propagated in
2838 However, it is guaranteed that a CPU will be self-consistent: it will see its
2865 are -not- optional in the above example, as there are architectures
2900 --------------------------
2904 two semantically-related cache lines updated at separate times. This is where
2905 the address-dependency barrier really becomes necessary as this synchronises
2915 ----------------------
2920 barriers for this use-case would be possible but often suboptimal.
2922 To handle this case optimally, low-level virt_mb() etc macros are available.
2924 identical code for SMP and non-SMP systems. For example, virtual machine guests
2938 ----------------
2943 Documentation/core-api/circular-buffers.rst
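The circular-buffers document referenced above describes a single-producer/single-consumer ring where the producer publishes the head index with smp_store_release() after writing the item, and the consumer pairs that with smp_load_acquire(). A userspace C11 sketch of the same idea (kernel code would use the smp_*() helpers rather than raw C11 atomics):

```c
#include <assert.h>
#include <stdatomic.h>

#define BUF_SIZE 8			/* must be a power of two */
#define MASK (BUF_SIZE - 1)

static int buf[BUF_SIZE];
static _Atomic unsigned int head, tail;	/* free-running indices */

/* Producer: write the item first, then publish it by advancing
 * 'head' with a release store, so a consumer that observes the
 * new head is guaranteed to see the item. */
static int produce(int item)
{
	unsigned int h = atomic_load_explicit(&head, memory_order_relaxed);
	unsigned int t = atomic_load_explicit(&tail, memory_order_acquire);

	if (h - t == BUF_SIZE)
		return 0;				/* full */
	buf[h & MASK] = item;				/* write item... */
	atomic_store_explicit(&head, h + 1, memory_order_release);
	return 1;					/* ...then publish */
}

/* Consumer: the acquire load of 'head' pairs with the producer's
 * release store; advancing 'tail' with a release store frees the
 * slot for reuse only after the item has been read out. */
static int consume(int *item)
{
	unsigned int t = atomic_load_explicit(&tail, memory_order_relaxed);
	unsigned int h = atomic_load_explicit(&head, memory_order_acquire);

	if (t == h)
		return 0;				/* empty */
	*item = buf[t & MASK];				/* read item... */
	atomic_store_explicit(&tail, t + 1, memory_order_release);
	return 1;					/* ...then free slot */
}
```

Keeping the indices free-running and masking only on array access distinguishes "full" (head - tail == BUF_SIZE) from "empty" (head == tail) without wasting a slot.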
2960 Chapter 7.1: Memory-Access Ordering
2963 ARM Architecture Reference Manual (ARMv8, for ARMv8-A architecture profile)
2966 IA-32 Intel Architecture Software Developer's Manual, Volume 3:
2981 Chapter 15: Sparc-V9 Memory Models
2997 Solaris Internals, Core Kernel Architecture, p63-68: