
Some doubts may be resolved by referring to the formal memory consistency
model and related documentation at tools/memory-model/.  Nevertheless, even
this memory model should be viewed as the collective opinion of its
maintainers rather than as an infallible oracle.

Note also that it is possible that a barrier may be a no-op for an
architecture because the way that architecture works renders an explicit
barrier unnecessary in that case.
CONTENTS
========

 (*) Abstract memory access model.

     - Device operations.
     - Guarantees.

 (*) What are memory barriers?

     - Varieties of memory barrier.
     - What may not be assumed about memory barriers?
     - Address-dependency barriers (historical).
     - Control dependencies.
     - SMP barrier pairing.
     - Examples of memory barrier sequences.
     - Read memory barriers vs load speculation.
     - Multicopy atomicity.

 (*) Explicit kernel barriers.

     - Compiler barrier.
     - CPU memory barriers.

 (*) Implicit kernel memory barriers.

     - Lock acquisition functions.
     - Interrupt disabling functions.
     - Sleep and wake-up functions.
     - Miscellaneous functions.

 (*) Inter-CPU acquiring barrier effects.

     - Acquires vs memory accesses.

 (*) Where are memory barriers needed?

     - Interprocessor interaction.
     - Atomic operations.
     - Accessing devices.
     - Interrupts.

 (*) Kernel I/O barrier effects.

 (*) Assumed minimum execution ordering model.

 (*) The effects of the CPU cache.

     - Cache coherency.
     - Cache coherency vs DMA.
     - Cache coherency vs MMIO.

 (*) The things CPUs get up to.

     - And then there's the Alpha.
     - Virtual Machine Guests.

 (*) Example uses.

     - Circular buffers.

 (*) References.
	            :                :
	+-------+   :   +--------+   :   +-------+
	|       |   :   |        |   :   |       |
	| CPU 1 |<----->| Memory |<----->| CPU 2 |
	|       |   :   |        |   :   |       |
	+-------+   :   +--------+   :   +-------+
	    |       :       ^        :       |
	    |       :       v        :       |
	    |       :   +--------+   :       |
	    +---------->| Device |<----------+
	            :   +--------+   :
	            :                :
	STORE A=3,	STORE B=4,	y=LOAD A->3,	x=LOAD B->4
	STORE A=3,	STORE B=4,	x=LOAD B->4,	y=LOAD A->3
	STORE A=3,	y=LOAD A->3,	STORE B=4,	x=LOAD B->4
	STORE A=3,	y=LOAD A->3,	x=LOAD B->2,	STORE B=4
	STORE A=3,	x=LOAD B->2,	STORE B=4,	y=LOAD A->3
	STORE A=3,	x=LOAD B->2,	y=LOAD A->3,	STORE B=4
	STORE B=4,	STORE A=3,	y=LOAD A->3,	x=LOAD B->4
DEVICE OPERATIONS
-----------------
GUARANTEES
----------
However, on DEC Alpha, READ_ONCE() also emits a memory-barrier instruction,
so that a DEC Alpha CPU will respect the ordering of dependent loads without
further help.
And there are anti-guarantees:

 (*) These guarantees do not apply to bitfields, because compilers often
     generate code to modify these using non-atomic read-modify-write
     sequences.  Do not attempt to use bitfields to synchronize parallel
     algorithms.

 (*) Even in cases where bitfields are protected by locks, all fields
     in a given bitfield must be protected by one lock.  If two fields
     in a given bitfield are protected by different locks, the compiler's
     non-atomic read-modify-write sequences can cause an update to one
     field to corrupt the value of an adjacent field.  (A sketch of this
     constraint follows this list.)

 (*) These guarantees apply only to properly aligned and sized scalar
     variables.  "Properly sized" currently means variables that are the
     same size as "char", "short", "int" and "long".  "Properly aligned"
     means the natural alignment, thus no constraints for "char", two-byte
     alignment for "short", four-byte alignment for "int", and either
     four-byte or eight-byte alignment for "long", on 32-bit and 64-bit
     systems, respectively.  Note that these guarantees were introduced
     into the C11 standard, so beware when using older pre-C11 compilers
     (for example, gcc 4.6).  The portion of the standard containing this
     guarantee defines "memory location" as follows:

	memory location
		either an object of scalar type, or a maximal sequence
		of adjacent bit-fields all having nonzero width

		NOTE 1: Two threads of execution can update and access
		separate memory locations without interfering with
		each other.

		NOTE 2: A bit-field and an adjacent non-bit-field member
		are in separate memory locations.  The same applies
		to two bit-fields, if one is declared inside a nested
		structure declaration and the other is not, or if the two
		are separated by a zero-length bit-field declaration,
		or if they are separated by a non-bit-field member
		declaration.  It is not safe to concurrently update two
		bit-fields in the same structure if all members declared
		between them are also bit-fields, no matter what the
		sizes of those intervening bit-fields happen to be.
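As an illustration of the bitfield anti-guarantee, here is a minimal sketch;
the structure and the lock names in the comments are hypothetical, not taken
from this document:

	/*
	 * 'a' and 'b' are adjacent bit-fields and therefore share one
	 * memory location: both must be protected by the same lock
	 * (say, flags_lock).  'c' is separated from them by a
	 * non-bit-field member, so it lives in its own memory location
	 * and may be protected independently.
	 */
	struct flags {
		unsigned int a : 1;	/* protected by flags_lock */
		unsigned int b : 1;	/* must also use flags_lock */
		int separator;		/* non-bit-field member */
		unsigned int c : 1;	/* separate memory location */
	};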
As can be seen above, independent memory operations are effectively performed
in random order, but this can be a problem for CPU-CPU interaction and for
I/O.  What is required is some way of intervening to instruct the compiler
and the CPU to restrict the order.
VARIETIES OF MEMORY BARRIER
---------------------------
     [!] Note that write barriers should normally be paired with read or
     address-dependency barriers; see the "SMP barrier pairing" subsection.

 (2) Address-dependency barriers (historical).

     An address-dependency barrier is a weaker form of read barrier.  In the
     case where two loads are performed such that the second depends on the
     result of the first (eg: the first load retrieves the address to which
     the second load will be directed), an address-dependency barrier would
     be required to make sure that the target of the second load is updated
     after the address obtained by the first load is accessed.

     An address-dependency barrier is a partial ordering on interdependent
     loads only; it is not required to have any effect on stores, independent
     loads or overlapping loads.  An address-dependency barrier issued by
     the CPU under consideration guarantees that, for any load preceding it
     that touches one of a sequence of stores from another CPU, the effects
     of all the stores prior to the one touched by that load will be
     perceptible to any loads issued after the address-dependency barrier.

     [!] Note that address-dependency barriers should normally be paired with
     write barriers; see the "SMP barrier pairing" subsection.

     [!] Kernel release v5.9 removed kernel APIs for explicit address-
     dependency barriers.  Nowadays, APIs for marking loads from shared
     variables such as READ_ONCE() and rcu_dereference() provide implicit
     address-dependency barriers.

 (3) Read (or load) memory barriers.

     A read barrier is an address-dependency barrier plus a guarantee that all
     the LOAD operations specified before the barrier will appear to happen
     before all the LOAD operations specified after the barrier with respect
     to the other components of the system.

     Read memory barriers imply address-dependency barriers, and so can
     substitute for them.

 (4) General memory barriers.

     A general memory barrier gives a guarantee that all the LOAD and STORE
     operations specified before the barrier will appear to happen before
     all the LOAD and STORE operations specified after the barrier with
     respect to the other components of the system.  General memory barriers
     imply both read and write memory barriers, and so can substitute for
     either.
And a couple of implicit varieties:

 (5) ACQUIRE operations.

     This acts as a one-way permeable barrier.  It guarantees that all memory
     operations after the ACQUIRE operation will appear to happen after the
     ACQUIRE operation with respect to the other components of the system.

 (6) RELEASE operations.

     This also acts as a one-way permeable barrier.  It guarantees that all
     memory operations before the RELEASE operation will appear to happen
     before the RELEASE operation with respect to the other components of
     the system.

     The use of ACQUIRE and RELEASE operations generally precludes the need
     for other sorts of memory barrier.  In addition, a RELEASE+ACQUIRE pair
     is -not- guaranteed to act as a full memory barrier.  However, after an
     ACQUIRE on a given variable, all memory accesses preceding any prior
     RELEASE on that same variable are guaranteed to be visible.

     A subset of the atomic operations have ACQUIRE and RELEASE variants in
     addition to fully-ordered and relaxed (no barrier semantics) definitions.
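A minimal sketch of the ACQUIRE/RELEASE pairing described above, using
smp_store_release() and smp_load_acquire() on assumed shared variables
'data' and 'ready':

	/* CPU 1: initialise the data, then publish it (RELEASE) */
	WRITE_ONCE(data, 42);
	smp_store_release(&ready, 1);

	/* CPU 2: observe the flag (ACQUIRE), then read the data */
	if (smp_load_acquire(&ready))
		do_something_with(READ_ONCE(data));	/* sees data == 42 */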
WHAT MAY NOT BE ASSUMED ABOUT MEMORY BARRIERS?
----------------------------------------------
 (*) There is no guarantee that some intervening piece of off-the-CPU
     machinery will not reorder the memory accesses.  CPU cache coherency
     mechanisms should propagate the indirect effects of a memory barrier
     between CPUs, but might not do so in order.

	[!] For information on bus mastering DMA and coherency please read:

	    Documentation/driver-api/pci/pci.rst
	    Documentation/core-api/dma-api-howto.rst
	    Documentation/core-api/dma-api.rst
ADDRESS-DEPENDENCY BARRIERS (HISTORICAL)
----------------------------------------
[!] This section is marked as HISTORICAL: its semantics are now implicit in
all marked accesses.  The only people who need to pay attention to this
section are those working on DEC Alpha architecture-specific code and those
working on READ_ONCE() itself.  For those who need it, and for those who are
interested in the history, here is the story of address-dependency barriers.
[!] While address dependencies are observed in both load-to-load and
load-to-store relations, address-dependency barriers are not necessary
for load-to-store situations.

The requirement of address-dependency barriers is a little subtle, and
it's not always obvious that they're needed.  To illustrate, consider the
following sequence of events:

	CPU 1		      CPU 2
	===============	      ===============
	{ A == 1, B == 2, C == 3, P == &A, Q == &C }
	B = 4;
	<write barrier>
	WRITE_ONCE(P, &B);
			      Q = READ_ONCE_OLD(P);
			      D = *Q;
[!] READ_ONCE_OLD() corresponds to READ_ONCE() of pre-4.15 kernel, which
doesn't imply an address-dependency barrier.

There's a clear address dependency here, and it would seem that by the end of
the sequence, Q must be either &A or &B, and that:

	(Q == &A) implies (D == 1)
	(Q == &B) implies (D == 4)

But!  CPU 2's perception of P may be updated -before- its perception of B,
thus leading to a situation in which Q == &B but D == 2.  While this may seem
like a failure of coherency or causality, it isn't, and this behaviour can be
observed on certain real CPUs (such as the DEC Alpha).

To deal with this, READ_ONCE() provides an implicit address-dependency barrier
since kernel release v4.15:

	CPU 1		      CPU 2
	===============	      ===============
	{ A == 1, B == 2, C == 3, P == &A, Q == &C }
	B = 4;
	<write barrier>
	WRITE_ONCE(P, &B);
			      Q = READ_ONCE(P);
			      <implicit address-dependency barrier>
			      D = *Q;
[!] Note that this extremely counterintuitive situation arises most easily on
machines with split caches, so that, for example, one cache bank processes
even-numbered cache lines and the other bank processes odd-numbered cache
lines.  The pointer P might be stored in an odd-numbered cache line, and the
variable B might be stored in an even-numbered cache line.  Then, if the
even-numbered bank of the reading CPU's cache is extremely busy while the
odd-numbered bank is idle, one can see the new value of the pointer P (&B),
but the old value of the variable B (2).
An address-dependency barrier is not required to order dependent writes
because the CPUs that the Linux kernel supports don't do writes until they
are certain (1) that the write will actually happen, (2) of the location of
the write, and (3) of the value to be written.

Therefore, no address-dependency barrier is required to order the read into
Q versus the store into *Q in the write-side counterpart of this example,
even without an implicit address-dependency barrier of modern READ_ONCE().
Note that a major use of dependency ordering is to -prevent- writes to the
data structure, along with the expensive cache misses associated with such
writes.

The address-dependency barrier is very important to the RCU system, for
example: see rcu_assign_pointer() and rcu_dereference() in
include/linux/rcupdate.h.  This permits the current target of an RCU'd
pointer to be replaced with a new modified target, without the replacement
target appearing to be incompletely initialised.
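A sketch of the pointer-publication pattern this section is about, using
rcu_assign_pointer()/rcu_dereference(); the variable names 'gp', 'p' and 'q'
are assumptions for illustration:

	/* writer: initialise the new object before publishing the pointer */
	p->data = 1;
	rcu_assign_pointer(gp, p);

	/* reader: the implicit address-dependency barrier in
	 * rcu_dereference() (i.e. in READ_ONCE()) orders the load of the
	 * pointer against the later load through that pointer */
	q = rcu_dereference(gp);
	if (q)
		do_something_with(q->data);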
CONTROL DEPENDENCIES
--------------------
A load-load control dependency requires a full read memory barrier, not
simply an (implicit) address-dependency barrier to make it work correctly.
Consider the following bit of code:
	q = READ_ONCE(a);
	<implicit address-dependency barrier>
	if (q) {
		/* BUG: No address dependency!!! */
		p = READ_ONCE(b);
	}
This will not have the desired effect because there is no actual address
dependency, but rather a control dependency that the CPU may short-circuit
by attempting to predict the outcome in advance, so that other CPUs see the
load from b as having happened before the load from a.  In such a case
what's actually required is a read barrier between the two loads.
However, stores are not speculated.  This means that ordering -is- provided
for load-store control dependencies, as in the sketch below.
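A minimal sketch of such a load-store control dependency, using the same
'a' and 'b' as in the surrounding examples:

	q = READ_ONCE(a);
	if (q) {
		/*
		 * The store cannot be executed until the load and the
		 * conditional have resolved, so it is ordered after the
		 * load from 'a'.
		 */
		WRITE_ONCE(b, 1);
	}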
Worse yet, if the compiler is able to prove (say) that the value of
variable 'a' is always non-zero, it would be well within its rights to
optimize the original example by eliminating the "if" statement, turning the
dependent store into an unconditional one that both the compiler and the CPU
are then free to reorder before the load.
Unfortunately, when both legs of an "if" statement begin with identical
stores, current compilers will transform the code as follows at high
optimization levels:

	q = READ_ONCE(a);
	barrier();
	WRITE_ONCE(b, 1);  /* BUG: No ordering vs. load from a!!! */
	if (q) {
		/* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
		do_something();
	} else {
		/* WRITE_ONCE(b, 1); -- moved up, BUG!!! */
		do_something_else();
	}

Now there is no conditional between the load from 'a' and the store to 'b',
which means that the CPU is within its rights to reorder them.  If ordering
is needed in this case, explicit memory barriers (for example,
smp_store_release()) are required.
In contrast, without explicit memory barriers, two-legged-if control
ordering is guaranteed only when the stores differ, for example:
	q = READ_ONCE(a);
	if (q % MAX) {
		WRITE_ONCE(b, 1);
		do_something();
	} else {
		WRITE_ONCE(b, 2);
		do_something_else();
	}

If MAX is defined to be 1, then the compiler knows that (q % MAX) is
equal to zero, in which case the compiler is within its rights to
transform the above code into the following:

	q = READ_ONCE(a);
	WRITE_ONCE(b, 2);
	do_something_else();

Given this transformation, the CPU is not required to respect the ordering
between the load from variable 'a' and the store to variable 'b'.  If you are
relying on this ordering, you should make sure that MAX is greater than
one, perhaps as follows:

	q = READ_ONCE(a);
	BUILD_BUG_ON(MAX <= 1); /* Order load from a with store to b. */
	if (q % MAX) {
		WRITE_ONCE(b, 1);
		do_something();
	} else {
		WRITE_ONCE(b, 2);
		do_something_else();
	}
You must also be careful not to rely too much on boolean short-circuit
evaluation: the compiler may be able to out-guess your code.  More generally,
although READ_ONCE() does force the compiler to actually emit code for a
given load, it does not force the compiler to use the results.
In addition, control dependencies apply only to the then-clause and
else-clause of the if-statement in question.  In particular, they do
not necessarily apply to code following the if-statement.
It is tempting to argue that there in fact is ordering in such cases because
the compiler cannot reorder volatile accesses and also cannot reorder the
writes with the condition.  Unfortunately for this line of reasoning, the
compiler might compile the two writes as conditional-move instructions, as
in this fanciful pseudo-assembly language, leaving a weakly ordered CPU with
no dependency of any sort between the load and the later store:

	ld r1,a
	cmp r1,$0
	cmov,ne r4,$1
	cmov,eq r4,$2
	st r4,b
	st $1,c
In short, control dependencies apply only to the stores in the then-clause
and else-clause of the if-statement in question (including functions
invoked by those two clauses), not to code following that if-statement.
In summary:

 (*) Control dependencies can order prior loads against later stores.
     However, they do -not- guarantee any other sort of ordering:
     not prior loads against later loads, nor prior stores against
     later anything.

 (*) If both legs of the "if" statement begin with identical stores to
     the same variable, then those stores must be ordered, either by
     preceding both of them with smp_mb() or by using smp_store_release()
     to carry out the stores.  Please note that it is -not- sufficient
     to use barrier() at the beginning of each leg of the "if" statement
     because, as shown above, optimizing compilers can destroy the control
     dependency while respecting the letter of the barrier() law.

 (*) Control dependencies require at least one run-time conditional
     between the prior load and the subsequent store, and this conditional
     must involve the prior load.  If the compiler is able to optimize the
     conditional away, it will have also optimized away the ordering.

 (*) Control dependencies apply only to the then-clause and else-clause
     of the if-statement containing the control dependency, including
     any functions that these two clauses call.  Control dependencies
     do -not- apply to code following the if-statement containing the
     control dependency.

 (*) Control dependencies do -not- provide multicopy atomicity.  If you
     need all the CPUs to see a given store at the same time, use smp_mb().
SMP BARRIER PAIRING
-------------------
When dealing with CPU-CPU interactions, certain types of memory barrier should
always be paired.  A lack of appropriate pairing is almost certainly an error.

A write barrier pairs with an address-dependency barrier, a control
dependency, an acquire barrier, a release barrier, a read barrier, or a
general barrier.  Similarly a read barrier, control dependency, or an
address-dependency barrier pairs with a write barrier, an acquire barrier, a
release barrier, or a general barrier:
	CPU 1		      CPU 2
	===============	      ===============
	WRITE_ONCE(a, 1);
	<write barrier>
	WRITE_ONCE(b, 2);     x = READ_ONCE(b);
			      <read barrier>
			      y = READ_ONCE(a);

Or:

	CPU 1		      CPU 2
	===============	      ===============
	a = 1;
	<write barrier>
	WRITE_ONCE(b, &a);    x = READ_ONCE(b);
			      <implicit address-dependency barrier>
			      y = *x;

Basically, the read barrier always has to be there, even though it can be of
the "weaker" type.
[!] Note that the stores before the write barrier would normally be expected
to match the loads after the read barrier or the address-dependency barrier,
and vice versa:
	CPU 1                               CPU 2
	===================                 ===================
	WRITE_ONCE(a, 1);    }----   --->{  v = READ_ONCE(c);
	WRITE_ONCE(b, 2);    }    \ /    {  w = READ_ONCE(d);
	<write barrier>            \        <read barrier>
	WRITE_ONCE(c, 3);    }    / \    {  x = READ_ONCE(a);
	WRITE_ONCE(d, 4);    }----   --->{  y = READ_ONCE(b);
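In kernel code this pairing typically looks like the following sketch, with
'msg' and 'flag' as assumed shared variables:

	/* CPU 1 (producer) */
	WRITE_ONCE(msg, 0xdead);
	smp_wmb();			/* pairs with the smp_rmb() below */
	WRITE_ONCE(flag, 1);

	/* CPU 2 (consumer) */
	while (!READ_ONCE(flag))
		cpu_relax();
	smp_rmb();			/* pairs with the smp_wmb() above */
	val = READ_ONCE(msg);		/* guaranteed to observe 0xdead */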
EXAMPLES OF MEMORY BARRIER SEQUENCES
------------------------------------
	[ Diagram omitted: CPU 1 issues a sequence of stores, then a write
	  barrier, then further stores.  The stores before the barrier may
	  become perceptible to the rest of the system in any order, but the
	  write barrier requires all of them to be committed before any of
	  the further stores may take place. ]
Secondly, address-dependency barriers act as partial orderings on address-
dependent loads.  Consider the following sequence of events:
	[ Diagram omitted: CPU 1 stores B=2 and then, after a write barrier,
	  C=&B, while CPU 2 loads C and then *C.  Without an address-dependency
	  barrier on CPU 2, the load of X can hold up the maintenance of
	  coherence of B, so CPU 2 may see the new pointer C==&B together with
	  an apparently incorrect, stale value of B. ]
If, however, an address-dependency barrier were to be placed between the load
of C and the load of *C (ie: B) on CPU 2:

	CPU 1			CPU 2
	=======================	=======================
		{ B = 7; X = 9; Y = 8; C = &Y }
	STORE A = 1
	STORE B = 2
	<write barrier>
	STORE C = &B		LOAD X
	STORE D = 4		LOAD C (gets &B)
				<address-dependency barrier>
				LOAD *C (reads B)

then the partial ordering imposed by CPU 1's write barrier will be perceived
correctly by CPU 2.
	[ Diagram omitted: with the address-dependency barrier in place, the
	  barrier makes sure that all effects prior to the store of C are
	  perceptible to subsequent loads, so the load of *C now yields the
	  new value B==2. ]
	[ Diagram omitted: CPU 1 stores A=1, issues a write barrier, then
	  stores B=2.  Without a read barrier, CPU 2 may observe B==2 while
	  still observing the old value A==0; the update of A only becomes
	  visible to CPU 2 some time afterwards. ]
	[ Diagram omitted: if CPU 2 issues a read barrier between its load of
	  B and its load of A, the read barrier causes all effects prior to
	  the store of B to be perceptible to CPU 2, so once B==2 has been
	  observed the subsequent load of A is guaranteed to return 1. ]
	[ Diagram omitted: the same sequence, but with CPU 2 loading A both
	  before and after the read barrier.  The first load may still return
	  the old value A==0; only the load after the barrier (the 2nd) is
	  guaranteed to return A==1 once B==2 has been observed. ]
	[ Diagram omitted: alternatively, the first load of A may already
	  happen to return the new value A==1.  The guarantee is only that
	  the load after the read barrier (the 2nd) returns A==1 if the load
	  of B returned 2; no such guarantee exists for the earlier load. ]
READ MEMORY BARRIERS VS LOAD SPECULATION
----------------------------------------
Many CPUs speculate with loads: that is, they see that they will need to load
an item from memory, and they find a time where they're not using the bus for
any other loads, and so do the load in advance - even though they haven't
actually got to that point in the instruction execution flow yet.  This
permits the actual load instruction to potentially complete immediately
because the CPU already has the value to hand.

It may turn out that the CPU didn't actually need the value - perhaps because a
branch circumvented the load - in which case it can discard the value or just
cache it for later use.  Consider the following:
	[ Diagram omitted: while CPU 2 is busy doing a division, it speculates
	  on the LOAD of A; once the divisions are complete it can then
	  perform the LOAD with immediate effect, using the speculated
	  value. ]
Placing a read barrier or an address-dependency barrier just before the second
load:
	[ Diagram omitted: with the barrier in place, if the speculated value
	  has not been changed then it is simply used and the load still
	  completes immediately. ]
	[ Diagram omitted: but if there was an update or an invalidation from
	  another CPU pending, then the speculation is discarded and an
	  updated value of A is retrieved. ]
MULTICOPY ATOMICITY
-------------------
Multicopy atomicity is a deeply intuitive notion about ordering that is not
always provided by real computer systems, namely that a given store becomes
visible at the same time to all CPUs, or, alternatively, that all CPUs agree
on the order in which all stores become visible.  However, full multicopy
atomicity would rule out valuable hardware optimizations, so a weaker form
called ``other multicopy atomicity'' instead guarantees only that a given
store becomes visible at the same time to all -other- CPUs.  The remainder of
this document discusses this weaker form, but for brevity will call it simply
``multicopy atomicity''.

If a load executing on CPU B follows a load from the same variable executing
on CPU A (and CPU A did not originally store the value which it read), then on
multicopy-atomic systems, CPU B's load must return either the same value that
CPU A's load did or some later value.  However, the Linux kernel does not
require systems to be multicopy atomic.
The use of a general memory barrier compensates for any lack of multicopy
atomicity.  However, dependencies, read barriers, and write barriers are not
always able to compensate for non-multicopy atomicity.  For example, suppose
that a CPU's general barrier is replaced by a mere data dependency between
its load and its subsequent store.

This substitution allows non-multicopy atomicity to run rampant: in that
case, a CPU downstream of the chain may legitimately fail to observe the
original store even though the intermediate CPU did observe it.  The key
point is that although a data dependency orders that CPU's own load and
store, it does not guarantee to order the first CPU's store.  Thus, if this
example runs on a non-multicopy-atomic system where CPUs 1 and 2 share a
store buffer or a level of cache, CPU 2 might have early access to CPU 1's
writes.  General barriers are therefore required to ensure that all CPUs
agree on the combined order of multiple accesses.
General barriers can compensate not only for non-multicopy atomicity,
but can also generate additional ordering that can ensure that -all-
CPUs will perceive the same order of -all- operations.  In contrast, a
chain of release-acquire pairs does not provide this additional ordering,
which means that only those CPUs on the chain are guaranteed to agree
on the combined order of the accesses.
Furthermore, because of the release-acquire relationship between cpu0()
and cpu1() in such a chain, cpu1() must see cpu0()'s writes.

However, the ordering provided by a release-acquire chain is local
to the CPUs participating in that chain.  Although the CPUs on the chain
will see their respective reads and writes in order, CPUs not involved in
the release-acquire chain might well disagree on the order.  This
disagreement stems from the fact that the weak memory-barrier instructions
used to implement smp_load_acquire() and smp_store_release() are not required
to order prior stores against subsequent loads in all cases.  This means that
a CPU outside the chain can see cpu0()'s store to u as happening -after-
cpu1()'s load from v, even though both cpu0() and cpu1() agree that these two
operations occurred in the intended order.

However, please keep in mind that smp_load_acquire() is not magic.  In
particular, it simply reads from its argument with ordering.  It does
-not- ensure that any particular value will be read, so an outcome in which
the acquire load simply observes the old value remains entirely possible.
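A sketch of the kind of release-acquire chain discussed above; the CPU
numbering and the variable names u, x and y are illustrative, not the
document's exact litmus test:

	/* cpu0() */
	WRITE_ONCE(u, 1);
	smp_store_release(&x, 1);

	/* cpu1() */
	r1 = smp_load_acquire(&x);	/* if r1 == 1, cpu1 sees u == 1 */
	smp_store_release(&y, 1);

	/* cpu2() */
	r2 = smp_load_acquire(&y);	/* if r2 == 1, cpu2 sees u == 1 ... */
	r3 = READ_ONCE(u);		/* ... so r3 must then be 1 */

	/*
	 * A CPU outside this chain, however, might disagree about the
	 * order of cpu0()'s store to u relative to the other accesses.
	 */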
COMPILER BARRIER
----------------
The Linux kernel has an explicit compiler barrier function that prevents the
compiler from moving the memory accesses either side of it to the other side:

	barrier();

This is a general barrier -- there are no read-read or write-write
variants of barrier().  However, READ_ONCE() and WRITE_ONCE() can be thought
of as weak forms of barrier() that affect only the specific accesses flagged
by the READ_ONCE() or WRITE_ONCE().  One use for this is to order accesses
that are shared between interrupt-handler code and the code that was
interrupted.

The READ_ONCE() and WRITE_ONCE() functions can prevent any number of
optimizations that, while perfectly safe in single-threaded code, can
be fatal in concurrent code.  Here are some examples of these sorts of
optimizations:
 (*) The compiler is within its rights to merge successive loads from the
     same variable.  Such merging can cause the compiler to "optimize" a
     polling loop into code which, although in some sense legitimate
     for single-threaded code, is almost certainly not what the developer
     intended (for example, hoisting the load out of the loop so that the
     variable is read only once).  Use READ_ONCE() to prevent the compiler
     from doing this to you.

 (*) The compiler is within its rights to reload a variable, for example,
     when high register pressure prevents it from keeping all data of
     interest in registers.  The resulting repeated loads are perfectly
     safe in single-threaded code, but can be fatal in concurrent code:
     the two loads might return different values.

 (*) The compiler is within its rights to omit a load entirely if it knows
     what the value will be.  The compiler carries out this proof assuming
     single-threaded code, so you need to tell the compiler about cases in
     which a variable may be changed by another CPU or by an interrupt
     handler.  This transformation is a win for single-threaded code because
     it gets rid of a load and a branch, but in concurrent code the proof
     may be erroneous.  Use READ_ONCE() to tell the compiler that it doesn't
     know as much as it thinks it does.
     The compiler can also defeat the intended ordering if it can prove the
     value of the controlling expression.  For example, if you do the
     following and MAX is a preprocessor macro with the value 1:

	while ((tmp = READ_ONCE(a)) % MAX)
		do_something_with(tmp);

     then the compiler knows that the result of the "%" operator applied
     to MAX will always be zero, again allowing the compiler to optimize
     the code into near-nonexistence.  (It will still load from the
     variable 'a'.)
 (*) The compiler is within its rights to reorder memory accesses unless
     you tell it not to.  For example, consider an interaction through
     shared variables between process-level code and an interrupt handler:
     there is nothing to prevent the compiler from transforming the
     process-level code in a way that may in fact be a win for
     single-threaded code, yet breaks the assumptions made by the
     interrupt handler.

 (*) The compiler is within its rights to invent stores to a variable,
     as when turning

	if (a)
		b = a;
	else
		b = 42;

     into

	b = 42;
	if (a)
		b = a;

     In single-threaded code, this is not only safe, but also saves
     a branch.  Unfortunately, in concurrent code, this transformation
     could cause some other CPU to see a spurious value of 42 -- even
     if variable 'a' was never zero -- when loading variable 'b'.  Use
     WRITE_ONCE() to prevent this.  Other invented stores might be less
     damaging, but they can result in cache-line bouncing and thus in
     poor performance and scalability.
 (*) For aligned memory locations whose size allows them to be accessed
     with a single memory-reference instruction, READ_ONCE() and
     WRITE_ONCE() prevent "load tearing" and "store tearing", in which a
     single large access is replaced by multiple smaller accesses.  For
     example, given an architecture having 16-bit store instructions with
     7-bit immediate fields, the compiler might be tempted to use two
     16-bit store-immediate instructions to implement the following 32-bit
     store:

	p = 0x00010002;

     This optimization can be a win in single-threaded code, which is
     exactly why the compiler is tempted to apply it; use of WRITE_ONCE()
     prevents such store tearing.

     Tearing can also happen with packed structures: if the compiler decides
     to implement a multi-field structure copy as a pair of 32-bit loads
     followed by a pair of 32-bit stores, this would result in load tearing
     on one field of the source and store tearing on the corresponding field
     of the destination.  READ_ONCE() and WRITE_ONCE() again prevent tearing
     in such cases (see the sketch below).
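A sketch of using READ_ONCE()/WRITE_ONCE() to prevent the tearing described
above; the packed structure layout and the names foo1/foo2 are assumed for
illustration:

	struct __attribute__((__packed__)) foo {
		short a;
		int b;
		short c;
	};
	struct foo foo1, foo2;

	/*
	 * Plain accesses to 'b' might be torn into multiple smaller
	 * accesses by the compiler; READ_ONCE()/WRITE_ONCE() force single
	 * whole-sized accesses where the architecture allows it.
	 */
	foo2.a = foo1.a;
	WRITE_ONCE(foo2.b, READ_ONCE(foo1.b));
	foo2.c = foo1.c;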
CPU MEMORY BARRIERS
-------------------
All memory barriers except the address-dependency barriers imply a compiler
barrier.  Address dependencies do not impose any additional compiler ordering.

SMP memory barriers are reduced to compiler barriers on uniprocessor compiled
systems because it is assumed that a CPU will appear to be self-consistent,
and will order overlapping accesses correctly with respect to itself.
However, see the subsection on "Virtual Machine Guests" below.

Mandatory barriers should not be used to control SMP effects, since mandatory
barriers impose unnecessary overhead on both SMP and UP systems.  They may,
however, be used to control MMIO effects on accesses through relaxed memory
I/O windows.  These barriers are required even on non-SMP systems as they affect
the order in which memory operations appear to a device by prohibiting both
the compiler and the CPU from reordering them.
For example, smp_mb__before_atomic() can be used to order a preceding store
against an atomic operation that does not itself imply a barrier:

	obj->dead = 1;
	smp_mb__before_atomic();
	atomic_dec(&obj->ref_count);

This makes sure that the death mark on the object is perceived to be set
*before* the reference counter is decremented.
 (*) dma_wmb(), dma_rmb() and dma_mb(): these are for use with consistent
     memory to guarantee the ordering of writes or reads of shared memory
     accessible to both the CPU and a DMA capable device.  See the
     Documentation/core-api/dma-api.rst file for more information about
     consistent memory.

     For example, consider a device driver that shares memory with a device
     and uses a descriptor status value to indicate if the descriptor belongs
     to the device or the CPU:

	if (desc->status != DEVICE_OWN) {
		/* do not read data until we own descriptor */
		dma_rmb();

		/* read/modify data */
		read_data = desc->data;
		desc->data = write_data;

		/* flush modifications before status update */
		dma_wmb();

		/* assign ownership */
		desc->status = DEVICE_OWN;
	}

     The dma_rmb() guarantees that the device has released ownership before
     we read the data from the descriptor, and the dma_wmb() guarantees that
     the data is written to the descriptor before the device can see that it
     now has ownership.  Note that the dma_*() barriers do not provide any
     ordering guarantees for accesses to MMIO regions.  See the later
     "KERNEL I/O BARRIER EFFECTS" subsection for more information about I/O
     accessors and MMIO ordering.
 (*) pmem_wmb(): this is for use with persistent memory to ensure that stores
     have reached a platform durability domain.  For example, after a
     non-temporal write to a pmem region, we use pmem_wmb() to ensure that
     stores have reached that durability domain, so that persistent storage
     is updated before any data access or data transfer caused by subsequent
     instructions is initiated.  This is in addition to the ordering done by
     wmb().  (See the sketch below.)
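A minimal sketch of the persistent-memory case just described; the mapping
'pmem_dst' and the flag 'pmem_dst_valid' are assumptions obtained elsewhere:

	/* copy data to persistent memory, bypassing the CPU cache */
	memcpy_flushcache(pmem_dst, src, len);

	/*
	 * Ensure the stores have reached the platform durability domain
	 * before, for example, a validity flag is updated.
	 */
	pmem_wmb();
	WRITE_ONCE(pmem_dst_valid, 1);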
 (*) io_stop_wc(): for memory accesses with write-combining attributes (e.g.
     those returned by ioremap_wc()), the CPU may wait for prior accesses to
     be merged with subsequent ones.  io_stop_wc() can be used to prevent the
     merging of write-combining memory accesses before this macro with those
     after it when such wait has performance implications.
LOCK ACQUISITION FUNCTIONS
--------------------------
Memory operations issued before an ACQUIRE may be completed after the ACQUIRE
operation has completed, and memory operations issued after a RELEASE may be
completed before the RELEASE operation has completed.  The key consequence of
these one-way barriers is that the effects of instructions outside of a
critical section may seep into the inside of the critical section (see the
sketch below).

An ACQUIRE followed by a RELEASE may -not- be assumed to be a full memory
barrier, because it is possible for an access preceding the ACQUIRE to happen
after the ACQUIRE, and for an access following the RELEASE to happen before
the RELEASE, and for the two accesses to then cross each other; such
reordering -could- occur.  At worst this may look like a sleep-unlock race,
but the locking primitive needs to resolve such races properly in any case.

Locks and semaphores may not provide any guarantee of ordering on UP compiled
systems, and so cannot be counted on in such a situation to actually achieve
anything at all - especially with respect to I/O accesses - unless combined
with interrupt disabling operations.

See also the section on "Inter-CPU acquiring barrier effects".
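The kind of seepage described above can be pictured with this sketch, where
M is an assumed lock and A, B and C are ordinary shared variables:

	*A = a;
	ACQUIRE M
	*B = b;
	RELEASE M
	*C = c;

	/*
	 * Because the ACQUIRE and RELEASE are only one-way barriers, another
	 * CPU might, for example, perceive the sequence as:
	 *
	 *	ACQUIRE M, STORE *B, STORE *A, STORE *C, RELEASE M
	 *
	 * i.e. the accesses to A and C may seep into the critical section,
	 * whereas the access to B cannot leak out of it.
	 */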
INTERRUPT DISABLING FUNCTIONS
-----------------------------

Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
(RELEASE equivalent) will act as compiler barriers only.  So if memory or I/O
barriers are required in such a situation, they must be provided from some
other means.
SLEEP AND WAKE-UP FUNCTIONS
---------------------------
A general memory barrier is interpolated automatically by set_current_state()
after it has altered the task state:

	CPU 1
	===============================
	set_current_state();
	  smp_store_mb();
	    STORE current->state
	    <general barrier>
	LOAD event_indicated

A general memory barrier is executed by wake_up() if it wakes something up,
and this barrier occurs before the woken task's state is accessed; in
particular, it sits between the STORE to indicate the event and the STORE to
set TASK_RUNNING:

	CPU 1 (Sleeper)			CPU 2 (Waker)
	===============================	===============================
	set_current_state();		STORE event_indicated
	  smp_store_mb();		wake_up();
	    STORE current->state	  <general barrier>
	    <general barrier>		  if ((LOAD task->state) & TASK_NORMAL)
	LOAD event_indicated		    STORE task->state
In terms of memory ordering, the various wake_up() variants all provide the
same guarantees of a wake_up() (or stronger).  [!] Note that the memory
barriers implied by the sleeper and the waker do -not- order multiple stores
before the wake-up with respect to loads of those stored values after the
sleeper has called set_current_state().
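A sketch of the usual sleeper/waker pattern that these ordering rules are
about; 'event_indicated' and 'sleeping_task' are assumed shared state:

	/* sleeper */
	for (;;) {
		set_current_state(TASK_UNINTERRUPTIBLE);
		if (READ_ONCE(event_indicated))
			break;
		schedule();
	}
	__set_current_state(TASK_RUNNING);

	/* waker */
	WRITE_ONCE(event_indicated, 1);
	wake_up_process(sleeping_task);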
MISCELLANEOUS FUNCTIONS
-----------------------

Other functions that imply barriers:

 (*) schedule() and similar imply full memory barriers.
INTER-CPU ACQUIRING BARRIER EFFECTS
===================================

On SMP systems locking primitives give a more substantial form of barrier:
one that does affect memory access ordering on other CPUs, within the context
of conflict on any particular lock.

ACQUIRES VS MEMORY ACCESSES
---------------------------
Under normal operation, memory operation reordering is generally not going to
be a problem as a single-threaded linear piece of code will still appear to
work correctly, even if it's in an SMP kernel.  There are, however, four
circumstances in which reordering definitely -could- be a problem:
interprocessor interaction, atomic operations, accessing devices, and
interrupts.
INTERPROCESSOR INTERACTION
--------------------------
In other words, the wake-up path has to perform this sequence of events:

	LOAD waiter->list.next;
	LOAD waiter->task;
	STORE waiter->task;
	CALL wakeup
	RELEASE task

and if any of these steps occur out of order, then the whole thing may
malfunction.  If the store that clears the task pointer becomes visible
before the next pointer is read, the woken-up process may free the stack on
which its waiter record lives before the next pointer has been loaded:

	LOAD waiter->task;
	STORE waiter->task;
	CALL wakeup
	<other CPU wakes, frees waiter's stack>
	LOAD waiter->list.next;
	--- OOPS ---

This may be dealt with using a general SMP memory barrier:

	LOAD waiter->list.next;
	LOAD waiter->task;
	smp_mb();
	STORE waiter->task;
	CALL wakeup
	RELEASE task
On a UP system - where this wouldn't be a problem - the smp_mb() is just a
compiler barrier, thus making sure the compiler emits the instructions in the
right order without actually intervening in the CPU.  As there's only one
CPU, that CPU's dependency ordering logic will take care of everything else.
ATOMIC OPERATIONS
-----------------

While they are technically interprocessor interaction considerations, atomic
operations are noted specially as some of them imply full memory barriers and
some don't, but they're very heavily relied on as a group throughout the
kernel.  See Documentation/atomic_t.txt for more information.
ACCESSING DEVICES
-----------------
Many devices can be memory mapped, and so appear to the CPU as if they're just
a set of memory locations.  To control such a device, the driver usually has
to make the right memory accesses in exactly the right order.  However, having
a clever CPU or a clever compiler creates a potential problem in that the
carefully sequenced accesses in the driver code won't reach the device in the
requisite order if the CPU or the compiler thinks it is more
efficient to reorder, combine or merge accesses - something that would cause
the device to malfunction.

Inside of the Linux kernel, I/O should be done through the appropriate accessor
routines - such as inb() or writel() - which know how to make such accesses
appropriately sequential.  While this, for the most part, renders the explicit
use of memory barriers unnecessary, if the accessor functions are used to refer
to an I/O memory window with relaxed memory access properties, then -mandatory-
memory barriers are required to enforce ordering.

See Documentation/driver-api/device-io.rst for more information.
INTERRUPTS
----------
A driver may be interrupted by its own interrupt service routine, and thus the
two parts of the driver may interfere with each other's attempts to control or
access the device.

This may be alleviated - at least in part - by disabling local interrupts (a
form of locking), such that the critical operations are all contained within
the interrupt-disabled section in the driver.  While the driver's interrupt
routine is being handled, the driver's core may not run on the same CPU, and
its interrupt is not permitted to happen again until the current interrupt has
been handled, thus the interrupt handler does not need to lock against that.

However, consider a driver talking to an ethernet card that sports an address
register and a data register.  If that driver's core talks to the card
under interrupt-disablement and then the driver's interrupt handler is invoked,
the accesses made by the core and by the handler can become interleaved if the
store to the address register has not actually reached the hardware before the
handler runs.  Normal memory accesses performed inside such a section may be
interleaved with accesses performed in an interrupt - and vice versa - unless
implicit or explicit barriers are used.

Normally this won't be a problem because the I/O accesses done inside such
sections will include synchronous load operations on strictly ordered I/O
registers that form implicit I/O barriers.  If this isn't sufficient, or if
ordering problems are likely, then interrupt-disabling locks should be used to
guarantee ordering.
KERNEL I/O BARRIER EFFECTS
==========================

Interfacing with peripherals via I/O accesses is deeply architecture and device
specific.  Therefore, drivers which are inherently non-portable may rely on
specific behaviours of their target systems in order to achieve synchronization
in the most lightweight manner possible.  For drivers intending to be portable
between multiple architectures and bus implementations, the kernel offers a
series of accessor functions that provide various degrees of ordering
guarantees.

For example, the ordering guarantee provided by writeX() ensures that a prior
sequence of writes to an outbound DMA buffer allocated by dma_alloc_coherent()
will be visible to the device before a subsequent writeX() to the device's MMIO
control register kicks off the DMA transfer (see the sketch below).
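A sketch of a driver following that pattern; the descriptor layout, the
RING_DOORBELL value and the DOORBELL register offset are hypothetical:

	/* fill in the coherent DMA descriptor first... */
	desc->addr = cpu_to_le64(buf_dma);
	desc->len  = cpu_to_le32(len);

	/*
	 * writel() guarantees that the descriptor writes above are visible
	 * to the device before the doorbell write reaches it.
	 */
	writel(RING_DOORBELL, dev->regs + DOORBELL);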
The ordering properties of __iomem pointers obtained with non-default
attributes (e.g. those returned by ioremap_wc()) are specific to the
underlying architecture and therefore the guarantees listed above cannot
generally be relied upon for accesses to these types of mappings.

readX_relaxed() and writeX_relaxed() are similar to readX() and writeX(), but
provide weaker ordering guarantees.  Specifically, they do not guarantee
ordering with respect to locking, normal memory accesses or delay() loops
(i.e. bullets 2-5 above) but they are still guaranteed to be ordered with
respect to other accesses from the same CPU thread to the same peripheral
when operating on __iomem pointers mapped with the default I/O attributes.

readsX() and writesX() are intended to access register-based, memory-mapped
FIFOs residing on peripherals that are not capable of performing DMA.
Consequently, they provide only the ordering guarantees of readX_relaxed()
and writeX_relaxed().

The inX() and outX() accessors are intended to access legacy port-mapped I/O
peripherals, which may require special instructions on some architectures
(notably x86).  Device drivers may expect outX() to emit a non-posted write
transaction that waits for a completion response from the I/O peripheral
before returning, but this is not guaranteed by all architectures and is
therefore not part of the portable ordering semantics.

All of these accessors assume that the underlying peripheral is
little-endian and will therefore perform byte-swapping operations on big-endian
architectures.
ASSUMED MINIMUM EXECUTION ORDERING MODEL
========================================

It has to be assumed that the conceptual CPU is weakly-ordered but that it will
maintain the appearance of program causality with respect to itself.  Some CPUs
(such as i386 or x86_64) are more constrained than others (such as powerpc or
frv), and so the most relaxed case (namely DEC Alpha) must be assumed outside
of arch-specific code.

This means that it must be considered that the CPU will execute its instruction
stream in any order it feels like - or even in parallel - provided that if an
instruction in the stream depends on an earlier instruction, then that earlier
instruction must be sufficiently complete[*] before the later instruction may
proceed; in other words: provided that the appearance of causality is
maintained.

 [*] Some instructions have more than one effect - such as changing the
     condition codes, changing registers or changing memory - and different
     instructions may depend on different effects.
THE EFFECTS OF THE CPU CACHE
============================

The way cached memory operations are perceived across the system is affected
to a certain extent by the caches that lie between CPUs and memory, and by the
memory coherence system that maintains the consistency of state in the system.
Memory barriers for the most part act at the interface between the CPU and its
cache (memory barriers logically act on the dotted line in the following
diagram):

	    <--- CPU --->         :       <----------- Memory ----------->
	                          :
	+--------+    +--------+  :   +--------+    +-----------+
	|        |    |        |  :   |        |    |           |
	|  CPU   |    | Memory |  :   |  CPU   |    |           |    +--------+
	|  Core  |--->| Access |----->| Cache  |<-->|           |    |        |
	|        |    | Queue  |  :   |        |    |           |--->| Memory |
	|        |    |        |  :   |        |    |           |    |        |
	+--------+    +--------+  :   +--------+    |           |    +--------+
	                          :                 |   Cache   |
	                          :                 | Coherency |
	                          :                 | Mechanism |    +--------+
	+--------+    +--------+  :   +--------+    |           |    |        |
	|        |    |        |  :   |        |    |           |    |        |
	|  CPU   |    | Memory |  :   |  CPU   |    |           |--->| Device |
	|  Core  |--->| Access |----->| Cache  |<-->|           |    |        |
	|        |    | Queue  |  :   |        |    |           |    |        |
	+--------+    +--------+  :   +--------+    +-----------+    +--------+
CACHE COHERENCY VS DMA
----------------------
See Documentation/core-api/cachetlb.rst for more information on cache
management.
CACHE COHERENCY VS MMIO
-----------------------
 (*) the CPU's data cache may affect the ordering, and while cache-coherency
     mechanisms may alleviate this - once the store has actually hit the cache
     - there's no guarantee that the coherency management will be propagated in
     order to other CPUs.

However, it is guaranteed that a CPU will be self-consistent: it will see its
own accesses appear to be correctly ordered, without the need for a memory
barrier.
Note that READ_ONCE() and WRITE_ONCE() are -not- optional in the above example,
as there are architectures where a given CPU might reorder successive loads to
the same location.  On such architectures, READ_ONCE() and WRITE_ONCE() do
whatever is necessary to prevent this.
AND THEN THERE'S THE ALPHA
--------------------------
Some versions of the DEC Alpha CPU have a split data cache, permitting them to
have two semantically-related cache lines updated at separate times.  This is
where the address-dependency barrier really becomes necessary, as this
synchronises both caches with the memory coherence system, thus making it seem
like pointer changes vs new data occur in the right order.
VIRTUAL MACHINE GUESTS
----------------------
Guests running within virtual machines might be affected by SMP effects even if
the guest itself is compiled without SMP support.  This is an artifact of
interfacing with an SMP host while running an UP kernel.  Using mandatory
barriers for this use-case would be possible but is often suboptimal.

To handle this case optimally, low-level virt_mb() etc macros are available.
These have the same effect as smp_mb() etc when SMP is enabled, but generate
identical code for SMP and non-SMP systems.  For example, virtual machine
guests should use virt_mb() rather than smp_mb() when synchronizing against a
(possibly SMP) host.
EXAMPLE USES
============

CIRCULAR BUFFERS
----------------
Memory barriers can be used to implement circular buffering without the need
of a lock to serialise the producer with the consumer.  See
Documentation/core-api/circular-buffers.rst for details.
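A minimal sketch of the lock-free producer/consumer pattern that document
describes, using an acquire/release pairing; the buffer layout, BUF_SIZE and
the helper consume() are assumptions for illustration:

	/* producer (one thread only) */
	unsigned long head = buf->head;
	unsigned long tail = READ_ONCE(buf->tail);

	if (CIRC_SPACE(head, tail, BUF_SIZE) >= 1) {
		buf->ring[head & (BUF_SIZE - 1)] = item;
		smp_store_release(&buf->head, head + 1);   /* publish item */
	}

	/* consumer (one thread only) */
	unsigned long head = smp_load_acquire(&buf->head); /* see items */
	unsigned long tail = buf->tail;

	if (CIRC_CNT(head, tail, BUF_SIZE) >= 1) {
		consume(buf->ring[tail & (BUF_SIZE - 1)]);
		smp_store_release(&buf->tail, tail + 1);   /* free the slot */
	}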
REFERENCES
==========

AMD64 Architecture Programmer's Manual Volume 2: System Programming
	Chapter 7.1: Memory-Access Ordering

ARM Architecture Reference Manual (ARMv8, for ARMv8-A architecture profile)

IA-32 Intel Architecture Software Developer's Manual, Volume 3:
	System Programming Guide

UltraSPARC Programmer Reference Manual
	Chapter 15: Sparc-V9 Memory Models

Solaris Internals, Core Kernel Architecture, p63-68:
	Chapter 3.3: Hardware Considerations for Locks and Synchronization