memory-barriers.txt - OpenGrok cross reference for /Linux-v5.10/Documentation/memory-barriers.txt

Lines Matching refs:CPU
65      - CPU memory barriers.
74  (*) Inter-CPU acquiring barrier effects.
119 		| CPU 1 |<----->| Memory |<----->| CPU 2 |
136 Each CPU executes a program that generates memory access operations.  In the
137 abstract CPU, memory operation ordering is very relaxed, and a CPU may actually
144 CPU are perceived by the rest of the system as the operations cross the
145 interface between the CPU and rest of the system (the dotted lines).
150 	CPU 1		CPU 2
177 Furthermore, the stores committed by a CPU to the memory system may not be
178 perceived by the loads made by another CPU in the same order as the stores were
184 	CPU 1		CPU 2
191 the address retrieved from P by CPU 2.  At the end of the sequence, any of the
198 Note that CPU 2 will never try and load C into D because the CPU will load P
227 There are some minimal guarantees that may be expected of a CPU:
229  (*) On any given CPU, dependent memory accesses will be issued in order, with
234      the CPU will issue the following memory operations:
239      emits a memory-barrier instruction, so that a DEC Alpha CPU will
247  (*) Overlapping loads and stores within a particular CPU will appear to be
248      ordered within that CPU.  This means that for:
252      the CPU will only issue the following sequence of memory operations:
260      the CPU will only issue:
360 in random order, but this can be a problem for CPU-CPU interaction and for I/O.
362 CPU to restrict the order.
390      A CPU can be viewed as committing a sequence of store operations to the
412      committing sequences of stores to the memory system that the CPU being
413      considered can then perceive.  A data dependency barrier issued by the CPU
415      load touches one of a sequence of stores from another CPU, then by the
511 between two CPUs or between a CPU and a device.  If it can be guaranteed that
528      instruction; the barrier can be considered to draw a line in that CPU's
531  (*) There is no guarantee that issuing a memory barrier on one CPU will have
532      any direct effect on another CPU or any other hardware in the system.  The
533      indirect effect will be the order in which the second CPU sees the effects
534      of the first CPU's accesses occur, but see the next point:
536  (*) There is no guarantee that a CPU will see the correct order of effects
537      from a second CPU's accesses, even _if_ the second CPU uses a memory
538      barrier, unless the first CPU _also_ uses a matching memory barrier (see
541  (*) There is no guarantee that some intervening piece of off-the-CPU
542      hardware[*] will not reorder the memory accesses.  CPU cache coherency
567 	CPU 1		      CPU 2
582 But!  CPU 2's perception of P may be updated _before_ its perception of B, thus
594 	CPU 1		      CPU 2
613 even-numbered bank of the reading CPU's cache is extremely busy while the
626 	CPU 1		      CPU 2
649 the CPU containing it.  See the section on "Multicopy atomicity" for
680 dependency, but rather a control dependency that the CPU may short-circuit
712 	b = 1;  /* BUG: Compiler and CPU can both reorder!!! */
745 'b', which means that the CPU is within its rights to reorder them:
796 Given this transformation, the CPU is not required to respect the ordering
862 A weakly ordered CPU would have no dependency of any sort between the load
871 to the CPU containing it.  See the section on "Multicopy atomicity"
923 When dealing with CPU-CPU interactions, certain types of memory barrier should
936 	CPU 1		      CPU 2
946 	CPU 1		      CPU 2
956 	CPU 1		      CPU 2
974 	CPU 1                               CPU 2
989 	CPU 1
1009 	| CPU 1 |  :    | B=2  |     }
1020 	                   | memory system by CPU 1
1027 	CPU 1			CPU 2
1037 Without intervention, CPU 2 may perceive the events on CPU 1 in some
1038 effectively random order, despite the write barrier issued by CPU 1:
1043 	|       |  :    +------+     \          +-------+  | CPU 2
1044 	| CPU 1 |  :    | A=1  |      \     --->| C->&Y |  V
1054 	                               |        :       :       | CPU 2 |
1067 In the above example, CPU 2 perceives that B is 7, despite the load of *C
1071 and the load of *C (ie: B) on CPU 2:
1073 	CPU 1			CPU 2
1090 	| CPU 1 |  :    | A=1  |      \     --->| C->&Y |
1100 	                               |        :       :       | CPU 2 |
1114 	CPU 1			CPU 2
1123 Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in
1124 some effectively random order, despite the write barrier issued by CPU 1:
1130 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1136 	                                |       +-------+       | CPU 2 |
1148 load of A on CPU 2:
1150 	CPU 1			CPU 2
1160 then the partial ordering imposed by CPU 1 will be perceived correctly by CPU
1167 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1173 	                                |       +-------+       | CPU 2 |
1179 	  to be perceptible to CPU 2            +-------+       |       |
1186 	CPU 1			CPU 2
1204 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1210 	                                |       +-------+       | CPU 2 |
1219 	  to be perceptible to CPU 2            +-------+       |       |
1223 But it may be that the update to A from CPU 1 becomes perceptible to CPU 2
1230 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1236 	                                |       +-------+       | CPU 2 |
1261 actual load instruction to potentially complete immediately because the CPU
1264 It may turn out that the CPU didn't actually need the value - perhaps because a
1270 	CPU 1			CPU 2
1282 	                                        +-------+       | CPU 2 |
1285 	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
1291 	the CPU can then perform the            :       :       |       |
1298 	CPU 1			CPU 2
1313 	                                        +-------+       | CPU 2 |
1316 	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
1329 but if there was an update or an invalidation from another CPU pending, then
1335 	                                        +-------+       | CPU 2 |
1338 	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
1366 	CPU 1			CPU 2			CPU 3
1373 Suppose that CPU 2's load from X returns 1, which it then stores to Y,
1374 and CPU 3's load from Y returns 1.  This indicates that CPU 1's store
1375 to X precedes CPU 2's load from X and that CPU 2's store to Y precedes
1376 CPU 3's load from Y.  In addition, the memory barriers guarantee that
1377 CPU 2 executes its load before its store, and CPU 3 loads from Y before
1378 it loads from X.  The question is then "Can CPU 3's load from X return 0?"
1380 Because CPU 3's load from X in some sense comes after CPU 2's load, it
1381 is natural to expect that CPU 3's load from X must therefore return 1.
1383 on CPU B follows a load from the same variable executing on CPU A (and
1384 CPU A did not originally store the value which it read), then on
1385 multicopy-atomic systems, CPU B's load must return either the same value
1386 that CPU A's load did or some later value.  However, the Linux kernel
1390 for any lack of multicopy atomicity.  In the example, if CPU 2's load
1391 from X returns 1 and CPU 3's load from Y returns 1, then CPU 3's load
1396 that CPU 2's general barrier is removed from the above example, leaving
1399 	CPU 1			CPU 2			CPU 3
1407 this example, it is perfectly legal for CPU 2's load from X to return 1,
1408 CPU 3's load from Y to return 1, and its load from X to return 0.
1410 The key point is that although CPU 2's data dependency orders its load
1411 and store, it does not guarantee to order CPU 1's store.  Thus, if this
1413 store buffer or a level of cache, CPU 2 might have early access to CPU 1's
1510   (*) CPU memory barriers.
1542      to the same variable, and in some cases, the CPU is within its
1550      Prevent both the compiler and the CPU from doing this as follows:
1594      a was modified by some other CPU between the "while" statement and
1621      will carry out its proof assuming that the current CPU is the only
1643      Again, the compiler assumes that the current CPU is the only one
1654      surprise if some other CPU might have stored to variable 'a' in the
1727      though the CPU of course need not do so.
1745      could cause some other CPU to see a spurious value of 42 -- even
1812 Please note that these compiler barriers have no direct effect on the CPU,
1816 CPU MEMORY BARRIERS
1819 The Linux kernel has eight basic CPU memory barriers:
1843 systems because it is assumed that a CPU will appear to be self-consistent,
1856 compiler and the CPU from reordering them.
1899      of writes or reads of shared memory accessible to both the CPU and a
1904      to the device or the CPU, and a doorbell to notify it when new
2033 another CPU not holding that lock.  In short, an ACQUIRE followed by an
2037 not imply a full memory barrier.  Therefore, the CPU's execution of the
2056 	One key point is that we are only talking about the CPU doing
2061 	But suppose the CPU reordered the operations.  In this case,
2062 	the unlock precedes the lock in the assembly code.  The CPU
2065 	try to sleep, but more on that later).  The CPU will eventually
2082 See also the section on "Inter-CPU acquiring barrier effects".
2142 	CPU 1
2185 	CPU 1 (Sleeper)			CPU 2 (Waker)
2194 where "task" is the thread being woken up and it equals CPU 1's "current".
2201 	CPU 1				CPU 2
2281 INTER-CPU ACQUIRING BARRIER EFFECTS
2295 	CPU 1				CPU 2
2304 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
2339 When there's a system with more than one processor, more than one CPU in the
2390 another CPU might start processing the waiter and might clobber the waiter's
2395 	CPU 1				CPU 2
2433 right order without actually intervening in the CPU.  Since there's only one
2434 CPU, that CPU's dependency ordering logic will take care of everything else.
2451 Many devices can be memory mapped, and so appear to the CPU as if they're just
2455 However, having a clever CPU or a clever compiler creates a potential problem
2457 device in the requisite order if the CPU or the compiler thinks it is more
2481 routine is executing, the driver's core may not run on the same CPU, and its
2540 	   by the same CPU thread to a particular device will arrive in program
2543 	2. A writeX() issued by a CPU thread holding a spinlock is ordered
2544 	   before a writeX() to the same peripheral from another CPU thread
2550 	3. A writeX() by a CPU thread to the peripheral will first wait for the
2552 	   propagated to, the same thread. This ensures that writes by the CPU
2554 	   visible to a DMA engine when the CPU writes to its MMIO control
2557 	4. A readX() by a CPU thread from the peripheral will complete before
2559 	   ensures that reads by the CPU from an incoming DMA buffer allocated
2564 	5. A readX() by a CPU thread from the peripheral will complete before
2566 	   This ensures that two MMIO register writes by the CPU to a peripheral
2587 	respect to other accesses from the same CPU thread to the same
2605 	Since many CPU architectures ultimately access these peripherals via an
2637 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2643 This means that it must be considered that the CPU will execute its instruction
2654 A CPU may also discard any instruction sequence that winds up having no
2665 THE EFFECTS OF THE CPU CACHE
2672 As far as the way a CPU interacts with another part of the system through the
2673 caches goes, the memory system has to include the CPU's caches, and memory
2674 barriers for the most part act at the interface between the CPU and its cache
2677 	    <--- CPU --->         :       <----------- Memory ----------->
2681 	|  CPU   |    | Memory |  :   | CPU    |    |           |    |        |
2691 	|  CPU   |    | Memory |  :   | CPU    |    |           |--->| Device |
2700 CPU that issued it since it may have been satisfied within the CPU's own cache,
2703 cacheline over to the accessing CPU and propagate the effects upon conflict.
2705 The CPU core may execute instructions in any order it deems fit, provided the
2713 accesses cross from the CPU side of things to the memory side of things, and
2717 [!] Memory barriers are _not_ needed within a given CPU, as CPUs always see
2722 the use of any special device communication instructions the CPU may have.
2732 the kernel must flush the overlapping bits of cache on each CPU (and maybe
2736 cache lines being written back to RAM from a CPU's cache after the device has
2737 installed its own data, or cache lines present in the CPU's cache may simply
2739 is discarded from the CPU's cache and reloaded.  To deal with this, the
2741 cache on each CPU.
2750 a window in the CPU's memory space that has different properties assigned than
2765 A programmer might take it for granted that the CPU will perform memory
2766 operations in exactly the order specified, so that if the CPU is, for example,
2775 they would then expect that the CPU will complete the memory operation for each
2796      of the CPU buses and caches;
2803  (*) the CPU's data cache may affect the ordering, and while cache-coherency
2808 So what another CPU, say, might actually observe from the above piece of code
2816 However, it is guaranteed that a CPU will be self-consistent: it will see its
2835 The code above may cause the CPU to generate the full sequence of memory
2844 where a given CPU might reorder successive loads to the same location.
2851 the CPU even sees them.
2874 and the LOAD operation never appear outside of the CPU.
2880 The DEC Alpha CPU is one of the most relaxed CPUs there is.  Not only that,
2881 some versions of the Alpha CPU have a split data cache, permitting them to have