memory-barriers.txt - OpenGrok cross reference for /Linux-v4.19/Documentation/memory-barriers.txt

Lines Matching refs:CPU
65      - CPU memory barriers.
75  (*) Inter-CPU acquiring barrier effects.
121 		| CPU 1 |<----->| Memory |<----->| CPU 2 |
138 Each CPU executes a program that generates memory access operations.  In the
139 abstract CPU, memory operation ordering is very relaxed, and a CPU may actually
146 CPU are perceived by the rest of the system as the operations cross the
147 interface between the CPU and rest of the system (the dotted lines).
152 	CPU 1		CPU 2
179 Furthermore, the stores committed by a CPU to the memory system may not be
180 perceived by the loads made by another CPU in the same order as the stores were
186 	CPU 1		CPU 2
193 the address retrieved from P by CPU 2.  At the end of the sequence, any of the
200 Note that CPU 2 will never try and load C into D because the CPU will load P
229 There are some minimal guarantees that may be expected of a CPU:
231  (*) On any given CPU, dependent memory accesses will be issued in order, with
236      the CPU will issue the following memory operations:
241      emits a memory-barrier instruction, so that a DEC Alpha CPU will
249  (*) Overlapping loads and stores within a particular CPU will appear to be
250      ordered within that CPU.  This means that for:
254      the CPU will only issue the following sequence of memory operations:
262      the CPU will only issue:
362 in random order, but this can be a problem for CPU-CPU interaction and for I/O.
364 CPU to restrict the order.
392      A CPU can be viewed as committing a sequence of store operations to the
414      committing sequences of stores to the memory system that the CPU being
415      considered can then perceive.  A data dependency barrier issued by the CPU
417      load touches one of a sequence of stores from another CPU, then by the
515 between two CPUs or between a CPU and a device.  If it can be guaranteed that
532      instruction; the barrier can be considered to draw a line in that CPU's
535  (*) There is no guarantee that issuing a memory barrier on one CPU will have
536      any direct effect on another CPU or any other hardware in the system.  The
537      indirect effect will be the order in which the second CPU sees the effects
538      of the first CPU's accesses occur, but see the next point:
540  (*) There is no guarantee that a CPU will see the correct order of effects
541      from a second CPU's accesses, even _if_ the second CPU uses a memory
542      barrier, unless the first CPU _also_ uses a matching memory barrier (see
545  (*) There is no guarantee that some intervening piece of off-the-CPU
546      hardware[*] will not reorder the memory accesses.  CPU cache coherency
571 	CPU 1		      CPU 2
586 But!  CPU 2's perception of P may be updated _before_ its perception of B, thus
598 	CPU 1		      CPU 2
617 even-numbered bank of the reading CPU's cache is extremely busy while the
630 	CPU 1		      CPU 2
653 the CPU containing it.  See the section on "Multicopy atomicity" for
684 dependency, but rather a control dependency that the CPU may short-circuit
716 	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */
749 'b', which means that the CPU is within its rights to reorder them:
800 Given this transformation, the CPU is not required to respect the ordering
866 A weakly ordered CPU would have no dependency of any sort between the load
875 to the CPU containing it.  See the section on "Multicopy atomicity"
927 When dealing with CPU-CPU interactions, certain types of memory barrier should
940 	CPU 1		      CPU 2
950 	CPU 1		      CPU 2
960 	CPU 1		      CPU 2
978 	CPU 1                               CPU 2
993 	CPU 1
1013 	| CPU 1 |  :    | B=2  |     }
1024 	                   | memory system by CPU 1
1031 	CPU 1			CPU 2
1041 Without intervention, CPU 2 may perceive the events on CPU 1 in some
1042 effectively random order, despite the write barrier issued by CPU 1:
1047 	|       |  :    +------+     \          +-------+  | CPU 2
1048 	| CPU 1 |  :    | A=1  |      \     --->| C->&Y |  V
1058 	                               |        :       :       | CPU 2 |
1071 In the above example, CPU 2 perceives that B is 7, despite the load of *C
1075 and the load of *C (ie: B) on CPU 2:
1077 	CPU 1			CPU 2
1094 	| CPU 1 |  :    | A=1  |      \     --->| C->&Y |
1104 	                               |        :       :       | CPU 2 |
1118 	CPU 1			CPU 2
1127 Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in
1128 some effectively random order, despite the write barrier issued by CPU 1:
1134 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1140 	                                |       +-------+       | CPU 2 |
1152 load of A on CPU 2:
1154 	CPU 1			CPU 2
1164 then the partial ordering imposed by CPU 1 will be perceived correctly by CPU
1171 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1177 	                                |       +-------+       | CPU 2 |
1183 	  to be perceptible to CPU 2            +-------+       |       |
1190 	CPU 1			CPU 2
1208 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1214 	                                |       +-------+       | CPU 2 |
1223 	  to be perceptible to CPU 2            +-------+       |       |
1227 But it may be that the update to A from CPU 1 becomes perceptible to CPU 2
1234 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1240 	                                |       +-------+       | CPU 2 |
1265 actual load instruction to potentially complete immediately because the CPU
1268 It may turn out that the CPU didn't actually need the value - perhaps because a
1274 	CPU 1			CPU 2
1286 	                                        +-------+       | CPU 2 |
1289 	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
1295 	the CPU can then perform the            :       :       |       |
1302 	CPU 1			CPU 2
1317 	                                        +-------+       | CPU 2 |
1320 	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
1333 but if there was an update or an invalidation from another CPU pending, then
1339 	                                        +-------+       | CPU 2 |
1342 	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
1370 	CPU 1			CPU 2			CPU 3
1377 Suppose that CPU 2's load from X returns 1, which it then stores to Y,
1378 and CPU 3's load from Y returns 1.  This indicates that CPU 1's store
1379 to X precedes CPU 2's load from X and that CPU 2's store to Y precedes
1380 CPU 3's load from Y.  In addition, the memory barriers guarantee that
1381 CPU 2 executes its load before its store, and CPU 3 loads from Y before
1382 it loads from X.  The question is then "Can CPU 3's load from X return 0?"
1384 Because CPU 3's load from X in some sense comes after CPU 2's load, it
1385 is natural to expect that CPU 3's load from X must therefore return 1.
1387 on CPU B follows a load from the same variable executing on CPU A (and
1388 CPU A did not originally store the value which it read), then on
1389 multicopy-atomic systems, CPU B's load must return either the same value
1390 that CPU A's load did or some later value.  However, the Linux kernel
1394 for any lack of multicopy atomicity.  In the example, if CPU 2's load
1395 from X returns 1 and CPU 3's load from Y returns 1, then CPU 3's load
1400 that CPU 2's general barrier is removed from the above example, leaving
1403 	CPU 1			CPU 2			CPU 3
1411 this example, it is perfectly legal for CPU 2's load from X to return 1,
1412 CPU 3's load from Y to return 1, and its load from X to return 0.
1414 The key point is that although CPU 2's data dependency orders its load
1415 and store, it does not guarantee to order CPU 1's store.  Thus, if this
1417 store buffer or a level of cache, CPU 2 might have early access to CPU 1's
1514   (*) CPU memory barriers.
1548      to the same variable, and in some cases, the CPU is within its
1556      Prevent both the compiler and the CPU from doing this as follows:
1600      a was modified by some other CPU between the "while" statement and
1627      will carry out its proof assuming that the current CPU is the only
1649      Again, the compiler assumes that the current CPU is the only one
1660      surprise if some other CPU might have stored to variable 'a' in the
1733      though the CPU of course need not do so.
1751      could cause some other CPU to see a spurious value of 42 -- even
1818 Please note that these compiler barriers have no direct effect on the CPU,
1822 CPU MEMORY BARRIERS
1825 The Linux kernel has eight basic CPU memory barriers:
1849 systems because it is assumed that a CPU will appear to be self-consistent,
1862 compiler and the CPU from reordering them.
1901      of writes or reads of shared memory accessible to both the CPU and a
1906      to the device or the CPU, and a doorbell to notify it when new
1951 CPU->Hardware interface and actually affect the hardware at some level.
2036 another CPU not holding that lock.  In short, an ACQUIRE followed by an
2040 not imply a full memory barrier.  Therefore, the CPU's execution of the
2059 	One key point is that we are only talking about the CPU doing
2064 	But suppose the CPU reordered the operations.  In this case,
2065 	the unlock precedes the lock in the assembly code.  The CPU
2068 	try to sleep, but more on that later).  The CPU will eventually
2085 See also the section on "Inter-CPU acquiring barrier effects".
2145 	CPU 1
2188 	CPU 1 (Sleeper)			CPU 2 (Waker)
2197 where "task" is the thread being woken up and it equals CPU 1's "current".
2204 	CPU 1				CPU 2
2284 INTER-CPU ACQUIRING BARRIER EFFECTS
2298 	CPU 1				CPU 2
2307 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
2333 	CPU 1				CPU 2
2354 	CPU 1				CPU 2
2367 this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
2368 before either of the stores issued on CPU 2.
2375 	CPU 1				CPU 2
2411 When there's a system with more than one processor, more than one CPU in the
2462 another CPU might start processing the waiter and might clobber the waiter's
2467 	CPU 1				CPU 2
2505 right order without actually intervening in the CPU.  Since there's only one
2506 CPU, that CPU's dependency ordering logic will take care of everything else.
2523 Many devices can be memory mapped, and so appear to the CPU as if they're just
2527 However, having a clever CPU or a clever compiler creates a potential problem
2529 device in the requisite order if the CPU or the compiler thinks it is more
2560 routine is executing, the driver's core may not run on the same CPU, and its
2609      that's primarily a CPU-specific concept.  The i386 and x86_64 processors
2614      CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
2615      space.  However, it may also be mapped as a virtual I/O space in the CPU's
2631      respect to each other on the issuing CPU depends on the characteristics
2674 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2680 This means that it must be considered that the CPU will execute its instruction
2691 A CPU may also discard any instruction sequence that winds up having no
2702 THE EFFECTS OF THE CPU CACHE
2709 As far as the way a CPU interacts with another part of the system through the
2710 caches goes, the memory system has to include the CPU's caches, and memory
2711 barriers for the most part act at the interface between the CPU and its cache
2714 	    <--- CPU --->         :       <----------- Memory ----------->
2718 	|  CPU   |    | Memory |  :   | CPU    |    |           |    |        |
2728 	|  CPU   |    | Memory |  :   | CPU    |    |           |--->| Device |
2737 CPU that issued it since it may have been satisfied within the CPU's own cache,
2740 cacheline over to the accessing CPU and propagate the effects upon conflict.
2742 The CPU core may execute instructions in any order it deems fit, provided the
2750 accesses cross from the CPU side of things to the memory side of things, and
2754 [!] Memory barriers are _not_ needed within a given CPU, as CPUs always see
2759 the use of any special device communication instructions the CPU may have.
2767 will be ordered.  This means that whilst changes made on one CPU will
2773 has a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D):
2780 	|  CPU 1 |<---+                        |        |
2788 	|  CPU 2 |<---+                        |        |
2803  (*) whilst the CPU core is interrogating one cache, the other cache may be
2814 Imagine, then, that two writes are made on the first CPU, with a write barrier
2815 between them to guarantee that they will appear to reach that CPU's caches in
2818 	CPU 1		CPU 2		COMMENT
2829 the local CPU's caches have apparently been updated in the correct order.  But
2830 now imagine that the second CPU wants to read those values:
2832 	CPU 1		CPU 2		COMMENT
2839 cacheline holding p may get updated in one of the second CPU's caches whilst
2841 CPU's caches by some other cache event:
2843 	CPU 1		CPU 2		COMMENT
2859 Basically, whilst both cachelines will be updated on CPU 2 eventually, there's
2861 as that committed on CPU 1.
2869 	CPU 1		CPU 2		COMMENT
2895 permitted Alpha to sport higher CPU clock rates back in the day.  However,
2907 the kernel must flush the overlapping bits of cache on each CPU (and maybe
2911 cache lines being written back to RAM from a CPU's cache after the device has
2912 installed its own data, or cache lines present in the CPU's cache may simply
2914 is discarded from the CPU's cache and reloaded.  To deal with this, the
2916 cache on each CPU.
2925 a window in the CPU's memory space that has different properties assigned than
2940 A programmer might take it for granted that the CPU will perform memory
2941 operations in exactly the order specified, so that if the CPU is, for example,
2950 they would then expect that the CPU will complete the memory operation for each
2971      of the CPU buses and caches;
2978  (*) the CPU's data cache may affect the ordering, and whilst cache-coherency
2983 So what another CPU, say, might actually observe from the above piece of code
2991 However, it is guaranteed that a CPU will be self-consistent: it will see its
3010 The code above may cause the CPU to generate the full sequence of memory
3019 where a given CPU might reorder successive loads to the same location.
3026 the CPU even sees them.
3049 and the LOAD operation never appear outside of the CPU.
3055 The DEC Alpha CPU is one of the most relaxed CPUs there is.  Not only that,
3056 some versions of the Alpha CPU have a split data cache, permitting them to have