Lines Matching +refs:is +refs:pre +refs:merge

14 This document is not a specification; it is intentionally (for the sake of
15 brevity) and unintentionally (due to being human) incomplete. This document is
23 To repeat, this document is not a specification of what Linux expects from
26 The purpose of this document is twofold:
35 that, that architecture is incorrect.
37 Note also that it is possible that a barrier may be a no-op for an
137 abstract CPU, memory operation ordering is very relaxed, and a CPU may actually
190 There is an obvious address dependency here, as the value loaded into D depends
206 locations, but the order in which the control registers are accessed is very
271 WRITE_ONCE(). Without them, the compiler is within its rights to
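As a sketch of the kind of transformation meant here (variable names invented for illustration), the compiler may fuse the two plain loads below into one, or reuse a previously loaded value; the marked accesses pin it down:

	/* Plain accesses: the compiler may fuse the two loads of 'x'. */
	a = x;
	b = x;

	/* Marked accesses: exactly two loads, in program order. */
	a = READ_ONCE(x);
	b = READ_ONCE(x);

Note that READ_ONCE() and WRITE_ONCE() constrain only the compiler; the
CPU-level ordering rules discussed throughout this document still apply.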
331 using older pre-C11 compilers (for example, gcc 4.6). The portion
332 of the standard containing this guarantee is Section 3.14, which
345 to two bit-fields, if one is declared inside a nested
346 structure declaration and the other is not, or if the two
349 declaration. It is not safe to concurrently update two
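To make the bit-field rules concrete, a hypothetical layout (names invented):

	struct foo {
		int a : 4;	/* 'a' and 'b' occupy one memory location */
		int b : 4;
		int c;		/* a separate memory location */
	};

Two CPUs may update foo.a and foo.c concurrently, since they are distinct
memory locations; but concurrently updating foo.a and foo.b is unsafe,
because the compiler implements each bit-field update as a non-atomic
read-modify-write of the word the two fields share.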
361 What is required is some way of intervening to instruct the compiler and the
367 Such enforcement is important because the CPUs and other devices in a system
387 A write barrier is a partial ordering on stores only; it is not required
400 An address-dependency barrier is a weaker form of read barrier. In the
404 be required to make sure that the target of the second load is updated
405 before the address obtained by the first load is accessed.
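A sketch of the pattern, in the two-column style used throughout this
document (assuming CPU 1 initializes B and then publishes its address
through P):

	CPU 1				CPU 2
	===============			===============
	WRITE_ONCE(B, 4);
	smp_wmb();
	WRITE_ONCE(P, &B);
					Q = READ_ONCE(P);
					<address-dependency barrier>
					D = READ_ONCE(*Q);

If Q turns out to equal &B, the address-dependency barrier (implied by
modern READ_ONCE()) guarantees that D == 4.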
407 An address-dependency barrier is a partial ordering on interdependent
408 loads only; it is not required to have any effect on stores, independent
424 not a control dependency. If the address for the second load is dependent
425 on the first load, but the dependency is through a conditional rather than
427 a full read barrier or better is required. See the "Control dependencies"
440 A read barrier is an address-dependency barrier plus a guarantee that all
445 A read barrier is a partial ordering on loads only; it is not required to
462 A general memory barrier is a partial ordering over both loads and stores.
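For instance (a minimal sketch, X and Y assumed initially zero), only a
general barrier on each side forbids the store-buffering outcome in which
both CPUs read zero:

	CPU 1				CPU 2
	===============			===============
	WRITE_ONCE(X, 1);		WRITE_ONCE(Y, 1);
	smp_mb();			smp_mb();
	r1 = READ_ONCE(Y);		r2 = READ_ONCE(X);

Afterwards (r1 == 0 && r2 == 0) is impossible. smp_wmb()/smp_rmb() would
not help here, because each CPU needs a store ordered against a later load.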
497 for other sorts of memory barrier. In addition, a RELEASE+ACQUIRE pair is
530 (*) There is no guarantee that any of the memory accesses specified before a
535 (*) There is no guarantee that issuing a memory barrier on one CPU will have
540 (*) There is no guarantee that a CPU will see the correct order of effects
545 (*) There is no guarantee that some intervening piece of off-the-CPU
564 those who are interested in the history, here is the story of
571 The requirement of address-dependency barriers is a little subtle, and
584 [!] READ_ONCE_OLD() corresponds to the READ_ONCE() of pre-4.15 kernels, which
624 even-numbered bank of the reading CPU's cache is extremely busy while the
625 odd-numbered bank is idle, one can see the new value of the pointer P (&B),
629 An address-dependency barrier is not required to order dependent writes
646 Therefore, no address-dependency barrier is required to order the read into
647 Q with the store into *Q. In other words, this outcome is prohibited,
653 of dependency ordering is to -prevent- writes to the data structure, along
659 Note well that the ordering provided by an address dependency is local to
664 The address-dependency barrier is very important to the RCU system,
677 not understand them. The purpose of this section is to help you prevent
691 This will not have the desired effect because there is no actual address
695 what's actually required is:
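That is, per the surrounding text, something along these lines:

	q = READ_ONCE(a);
	if (q) {
		<read barrier>
		p = READ_ONCE(b);
	}

The read barrier (or stronger) forces the load of 'b' to be ordered after
the load of 'a' even though the dependency between them runs through a
conditional.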
703 However, stores are not speculated. This means that ordering -is- provided
718 Worse yet, if the compiler is able to prove (say) that the value of
719 variable 'a' is always non-zero, it would be well within its rights
728 It is tempting to try to enforce ordering on identical stores on both
756 Now there is no conditional between the load from 'a' and the store to
757 'b', which means that the CPU is within its rights to reorder them:
758 The conditional is absolutely required, and must be present in the
773 ordering is guaranteed only when the stores differ, for example:
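Reconstructed from the surrounding sentences (do_something() and
do_something_else() are the document's usual placeholders):

	q = READ_ONCE(a);
	if (q) {
		WRITE_ONCE(b, 1);
		do_something();
	} else {
		WRITE_ONCE(b, 2);
		do_something_else();
	}

Because the two stores write different values, the compiler cannot hoist a
single store to 'b' out of the conditional, so the load-to-store control
dependency survives.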
784 The initial READ_ONCE() is still required to prevent the compiler from
800 If MAX is defined to be 1, then the compiler knows that (q % MAX) is
801 equal to zero, in which case the compiler is within its rights to
808 Given this transformation, the CPU is not required to respect the ordering
809 between the load from variable 'a' and the store to variable 'b'. It is
811 is gone, and the barrier won't bring it back. Therefore, if you are
812 relying on this ordering, you should make sure that MAX is greater than
836 Because the first condition cannot fault and the second condition is
860 It is tempting to argue that there in fact is ordering because the
882 Note well that the ordering provided by a control dependency is local
899 to carry out the stores. Please note that it is -not- sufficient
907 conditional must involve the prior load. If the compiler is able
928 (*) Compilers do not understand control dependencies. It is therefore
936 always be paired. A lack of appropriate pairing is almost certainly an error.
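The canonical pairing looks like this (a sketch, 'a' and 'b' assumed
initially zero):

	CPU 1				CPU 2
	===============			===============
	WRITE_ONCE(a, 1);
	smp_wmb();
	WRITE_ONCE(b, 2);
					x = READ_ONCE(b);
					smp_rmb();
					y = READ_ONCE(a);

If x == 2, then y must be 1. Remove either barrier, on either side, and
that guarantee evaporates; hence the pairing requirement.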
1010 This sequence of events is committed to the memory coherence system in an order
1079 In the above example, CPU 2 perceives that B is 7, despite the load of *C
1261 The guarantee is that the second load will always come up with A == 1 if the
1269 Many CPUs speculate with loads: that is, they see that they will need to load an
1358 The speculation is discarded ---> --->| A->1 |------>| |
1359 and an updated value is +-------+ | |
1366 Multicopy atomicity is a deeply intuitive notion about ordering that is
1390 it loads from X. The question is then "Can CPU 3's load from X return 0?"
1393 is natural to expect that CPU 3's load from X must therefore return 1.
1408 that CPU 2's general barrier is removed from the above example, leaving
1419 this example, it is perfectly legal for CPU 2's load from X to return 1,
1422 The key point is that although CPU 2's data dependency orders its load
1469 is prohibited:
1475 outcome is prohibited:
1479 However, the ordering provided by a release-acquire chain is local
1481 at least aside from stores. Therefore, the following outcome is possible:
1485 As an aside, the following outcome is also possible:
1499 However, please keep in mind that smp_load_acquire() is not magic.
1502 following outcome is possible:
1507 consistent system where nothing is ever reordered.
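For instance (a sketch, with assumed variables 'msg' and 'flag'):

	/* CPU 1 */				/* CPU 2 */
	WRITE_ONCE(msg, 42);			if (smp_load_acquire(&flag))
	smp_store_release(&flag, 1);			r = READ_ONCE(msg);

If CPU 2 observes flag == 1, it is guaranteed to observe msg == 42; but
this is a local, pairwise guarantee, not a promise that all CPUs agree on
one global order of every access.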
1533 This is a general barrier -- there are no read-read or write-write
1542 One example use for this property is to ease communication between
1553 (*) The compiler is within its rights to reorder loads and stores
1554 to the same variable, and in some cases, the CPU is within its
1570 (*) The compiler is within its rights to merge successive loads from
1578 for single-threaded code, is almost certainly not what the developer
1590 (*) The compiler is within its rights to reload a variable, for example,
1598 This could result in the following code, which is perfectly safe in
1616 is why compilers reload variables. Doing so is perfectly safe for
1618 where it is not safe.
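Again, the defence is READ_ONCE(), which forces exactly one load per
source-level reference ('tmp' and do_something_with() as in the document's
running example):

	while (tmp = READ_ONCE(a))
		do_something_with(tmp);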
1620 (*) The compiler is within its rights to omit a load entirely if it knows
1622 the value of variable 'a' is always zero, it can optimize this code:
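The elided snippet is along these lines (reconstructed; 'tmp' as in the
running example):

	while (tmp = a)
		do_something_with(tmp);

into this:

	do { } while (0);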
1631 This transformation is a win for single-threaded code because it
1632 gets rid of a load and a branch. The problem is that the compiler
1633 will carry out its proof assuming that the current CPU is the only
1634 one updating variable 'a'. If variable 'a' is shared, then the
1641 But please note that the compiler is also closely watching what you
1643 do the following and MAX is a preprocessor macro with the value 1:
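That is, something like:

	q = READ_ONCE(a) % MAX;

The compiler still performs the volatile load from 'a', but it knows that
the "%" result is always zero and may optimize away everything that
depends on q.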
1653 (*) Similarly, the compiler is within its rights to omit a store entirely
1655 Again, the compiler assumes that the current CPU is the only one
1664 The compiler sees that the value of variable 'a' is already zero, so
1676 (*) The compiler is within its rights to reorder memory accesses unless
1692 There is nothing to prevent the compiler from transforming
1741 (*) The compiler is within its rights to invent stores to a variable,
1755 In single-threaded code, this is not only safe, but also saves
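The elided example is of this shape (reconstructed):

	if (a)
		b = a;
	else
		b = 42;

which the compiler may transform, saving a branch, into:

	b = 42;
	if (a)
		b = a;

A concurrent reader can then observe a spurious 42 in 'b' even when 'a' is
non-zero; using WRITE_ONCE(b, ...) forbids such invented stores.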
1773 and "store tearing," in which a single large access is replaced by
1782 which is not surprising given that it would likely take more
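The elided example is a 32-bit store that such a compiler may split into
two 16-bit store-immediate instructions:

	p = 0x00010002;

so a concurrent reader can observe a half-updated value. Writing it as
WRITE_ONCE(p, 0x00010002) obliges the compiler to emit a single access for
a properly aligned, machine-word-sized 'p'.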
1817 All that aside, it is never necessary to use READ_ONCE() and
1819 because 'jiffies' is marked volatile, it is never necessary to
1820 say READ_ONCE(jiffies). The reason for this is that READ_ONCE() and
1822 its argument is already marked volatile.
1846 the value of b before loading a[b]), however there is no guarantee in
1848 (eg. is equal to 1) and load a[b] before b (eg. tmp = a[1]; if (b != 1)
1849 tmp = a[b]; ). There is also the problem of a compiler reloading b after
1852 macro is a good place to start looking.
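A defensive rendering using READ_ONCE() (local variable 'bv' invented for
illustration):

	bv  = READ_ONCE(b);
	tmp = READ_ONCE(a[bv]);

Loading 'b' exactly once removes at least the compiler's freedom to
speculate or reload it, though, as noted above, consensus on the full
problem is still lacking.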
1855 systems because it is assumed that a CPU will appear to be self-consistent,
1861 is sufficient.
1888 barrier may be required is when atomic ops are used for reference
1901 This makes sure that the death mark on the object is perceived to be set
1902 *before* the reference counter is decremented.
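The snippet being described here is:

	obj->dead = 1;
	smp_mb__before_atomic();
	atomic_dec(&obj->ref_count);

The barrier upgrades the otherwise unordered atomic_dec() so that the
store of the death mark cannot be reordered past the decrement.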
1943 us to guarantee the data is written to the descriptor before the device
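The driver pattern this refers to looks like the following (a
reconstruction; desc, doorbell, DEVICE_OWN and DESC_NOTIFY are the assumed
names of the usual descriptor-ring example):

	if (desc->status != DEVICE_OWN) {
		/* do not read data until we own descriptor */
		dma_rmb();

		/* read/modify data */
		read_data = desc->data;
		desc->data = write_data;

		/* flush modifications before status update */
		dma_wmb();

		/* assign ownership */
		desc->status = DEVICE_OWN;

		/* notify device of new descriptors */
		writel(DESC_NOTIFY, doorbell);
	}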
1953 This is for use with persistent memory to ensure that stores for which
1960 data transfer caused by subsequent instructions is initiated. This is
1981 This specification is a _minimum_ guarantee; any particular architecture may
2034 one-way barriers is that the effects of instructions outside of a critical
2038 because it is possible for an access preceding the ACQUIRE to happen after the
2077 One key point is that we are only talking about the CPU doing
2085 If there is a deadlock, this lock operation will simply spin (or
2091 But what if the lock is a sleeplock? In that case, the code will
2117 The following sequence of events is acceptable:
2160 A general memory barrier is interpolated automatically by set_current_state()
2177 The whole sequence above is available in various canned forms, all of which
2200 A general memory barrier is executed by wake_up() if it wakes something up.
2203 is accessed; in particular, it sits between the STORE to indicate the event
2215 where "task" is the thread being woken up and it equals CPU 1's "current".
2217 To repeat, a general memory barrier is guaranteed to be executed by wake_up()
2218 if something is actually awakened, but otherwise there is no such guarantee.
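To see this, consider the following sequence (a sketch, X and Y assumed
initially zero):

	CPU 1 (sleeper)			CPU 2 (waker)
	===============			===============
	X = 1;				Y = 1;
	smp_mb();			wake_up();
	r1 = READ_ONCE(Y);		r2 = READ_ONCE(X);

If the wake_up() actually wakes CPU 1's task, at least one of r1 and r2
must be 1; if no wakeup occurs, both may be 0, because the waker's general
barrier is then not guaranteed.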
2232 occurs before the task state is accessed. In particular, if the wake_up() in
2325 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
2343 Under normal operation, memory operation reordering is generally not going to
2362 synchronisation problems, and the usual way of dealing with them is to use
2368 Consider, for example, the R/W semaphore slow path. Here a waiting process is
2386 next waiter record is;
2409 before proceeding. Since the record is on the waiter's stack, this means that
2410 if the task pointer is cleared _before_ the next pointer in the list is read,
2437 The way to deal with this is to insert a general SMP memory barrier:
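That is, the wake-up path becomes something like (pseudo-ops in the
document's style):

	LOAD waiter->list.next;
	LOAD waiter->task;
	smp_mb();
	STORE waiter->task;
	CALL wakeup
	RESUME task

The general barrier forces both loads from the waiter's on-stack record to
complete before the task pointer is cleared, so the woken task cannot
unwind its stack while those reads are still outstanding.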
2450 instruction itself is complete.
2452 On a UP system - where this wouldn't be a problem - the smp_mb() is just a
2478 device in the requisite order if the CPU or the compiler thinks it is more
2479 efficient to reorder, combine or merge accesses - something that would cause
2502 routine is executing, the driver's core may not run on the same CPU, and its
2503 interrupt is not permitted to happen again until the current interrupt has been
2508 under interrupt-disablement and then the driver's interrupt handler is invoked:
2536 running on separate CPUs that communicate with each other. If such a case is
2544 Interfacing with peripherals via I/O accesses is deeply architecture and device
2564 2. A writeX() issued by a CPU thread holding a spinlock is ordered
2588 will arrive at least 1us apart if the first write is immediately read
2589 back with readX() and udelay(1) is called prior to the second
2624 accessed is passed as an argument.
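The 1us-spacing property mentioned above looks like this in code (per the
surrounding discussion):

	writel(42, DEVICE_REGISTER_0); /* Arrives at the device ... */
	readl(DEVICE_REGISTER_0);
	udelay(1);
	writel(42, DEVICE_REGISTER_1); /* ... at least 1us before this. */

The readl() forces the first write to complete at the device before the
udelay() can begin, so the two register writes arrive at least 1us apart.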
2634 returning. This is not guaranteed by all architectures and is therefore
2649 writesX()), all of the above assume that the underlying peripheral is
2658 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2669 causality is maintained.
2681 stream in any way it sees fit, again provided the appearance of causality is
2689 The way cached memory operations are perceived across the system is affected to
2730 it wishes, and continue execution until it is forced to wait for an instruction
2733 What memory barriers are concerned with is controlling the order in which
2760 is discarded from the CPU's cache and reloaded. To deal with this, the
2775 Amongst these properties is usually the fact that such accesses bypass the
2788 operations in exactly the order specified, so that if the CPU is, for example,
2804 Reality is, of course, much messier. With many CPUs and compilers, the above
2831 is:
2835 (Where "LOAD {*C,*D}" is a combined load)
2838 However, it is guaranteed that a CPU will be self-consistent: it will see its
2867 On such architectures, READ_ONCE() and WRITE_ONCE() do whatever is
2885 assumed that the effect of the storage of V to *A is lost. Similarly:
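The "Similarly:" introduces a reduction along these lines (reconstructed
from context):

	*A = Y;
	Z = *A;

may, without a memory barrier or READ_ONCE() and WRITE_ONCE(), be reduced
to:

	*A = Y;
	Z = Y;

with no second access to *A ever being issued.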
2902 The DEC Alpha CPU is one of the most relaxed CPUs there is. Not only that,
2904 two semantically-related cache lines updated at separate times. This is where
2918 the guest itself is compiled without SMP support. This is an artifact of
2920 barriers for this use-case would be possible but often suboptimal.
2923 These have the same effect as smp_mb() etc when SMP is enabled, but generate
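A sketch of how a guest might use these when sharing a ring with a
hypervisor (ring, desc, avail_idx and req are assumed names):

	/* Guest publishes a request to the hypervisor. */
	ring->desc[idx] = req;			/* fill the payload */
	virt_wmb();				/* order payload before index */
	WRITE_ONCE(ring->avail_idx, idx + 1);	/* then expose it */

Unlike smp_wmb(), virt_wmb() emits a real barrier even in a guest built
without CONFIG_SMP, because the hypervisor side may be running on another
physical CPU.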