4 NUMA Memory Policy
7 What is NUMA Memory Policy?
10 In the Linux kernel, "memory policy" determines from which node the kernel will
11 allocate memory in a NUMA system or in an emulated NUMA system. Linux has
12 supported platforms with Non-Uniform Memory Access architectures since 2.4.?.
13 The current memory policy support was added to Linux 2.6 around May 2004. This
14 document attempts to describe the concepts and APIs of the 2.6 memory policy
17 Memory policies should not be confused with cpusets
18 (``Documentation/admin-guide/cgroup-v1/cpusets.rst``)
20 memory may be allocated by a set of processes. Memory policies are a
21 programming interface that a NUMA-aware application can take advantage of. When
23 takes priority. See :ref:`Memory Policies and cpusets <mem_pol_and_cpusets>`
26 Memory Policy Concepts
29 Scope of Memory Policies
30 ------------------------
32 The Linux kernel supports _scopes_ of memory policy, described here from
42 allocations across all nodes with "sufficient" memory, so as
43 not to overload the initial boot node with boot-time
47 this is an optional, per-task policy. When defined for a
58 executable image that has no awareness of memory policy. See the
59 :ref:`Memory Policy APIs <memory_policy_apis>` section,
63 In a multi-threaded task, task policies apply only to the thread
70 installed. Any pages already faulted in by the task when the task
77 A "VMA" or "Virtual Memory Area" refers to a range of a task's
80 :ref:`Memory Policy APIs <memory_policy_apis>` section,
100 mapping-- i.e., at Copy-On-Write.
103 virtual address space--a.k.a. threads--independent of when
108 are NOT inheritable across exec(). Thus, only NUMA-aware
111 * A task may install a new VMA policy on a sub-range of a
113 the existing virtual memory area into 2 or 3 VMAs, each with
125 Conceptually, shared policies apply to "memory objects" mapped
128 policies--using the mbind() system call specifying a range of
136 As of 2.6.22, only shared memory segments, created by shmget() or
140 support allocation at fault time--a.k.a. lazy allocation--so hugetlbfs
145 As mentioned above in :ref:`VMA policies <vma_policy>` section,
156 Thus, different tasks that attach to a shared memory segment can have
159 a shared memory region, when one task has installed shared policy on
162 Components of Memory Policies
163 -----------------------------
165 A NUMA memory policy consists of a "mode", optional mode flags, and
171 Internally, memory policies are implemented by a reference counted
173 discussed in context, below, as required to explain the behavior.
175 NUMA memory policy supports the following 4 behavioral modes:
177 Default Mode--MPOL_DEFAULT
178 This mode is only used in the memory policy APIs. Internally,
179 MPOL_DEFAULT is converted to the NULL memory policy in all
180 policy scopes. Any existing non-default policy will simply be
189 When specified in one of the memory policy APIs, the Default mode
193 be non-empty.
196 This mode specifies that memory must come from the set of
197 nodes specified by the policy. Memory will be allocated from
198 the node in the set with sufficient free memory that is
203 from the single node specified in the policy. If that
204 allocation fails, the kernel will search other nodes, in order
208 Internally, the Preferred policy uses a single node--the
224 page granularity, across the nodes specified in the policy.
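
The page-granularity interleaving described here can be sketched with a small user-space model. This is an illustration only, not kernel code: it assumes a nodemask that fits in 64 bits and assumes the selection rule stated in the full kernel document, where the page offset of the faulting address is taken modulo the number of nodes in the policy. The helper names ``count_policy_nodes``, ``nth_policy_node`` and ``interleave_node`` are hypothetical.

```c
#include <stdint.h>

/* Count the nodes set in a 64-bit nodemask (bit i stands for node i). */
int count_policy_nodes(uint64_t mask)
{
    int n = 0;

    for (; mask; mask &= mask - 1)   /* clear the lowest set bit each pass */
        n++;
    return n;
}

/* Return the node number of the idx-th set bit (0-based), or -1 if none. */
int nth_policy_node(uint64_t mask, int idx)
{
    for (int node = 0; node < 64; node++) {
        if (mask & (UINT64_C(1) << node)) {
            if (idx-- == 0)
                return node;
        }
    }
    return -1;
}

/*
 * Hypothetical model of interleave node selection: index the policy's
 * node set by the page offset modulo the number of nodes in the set.
 * Returns -1 for an empty nodemask. The real kernel may still fall back
 * to other nodes if the selected node has no free memory; that is not
 * modeled here.
 */
int interleave_node(uint64_t policy_nodes, unsigned long page_offset)
{
    int n = count_policy_nodes(policy_nodes);

    if (n == 0)
        return -1;
    return nth_policy_node(policy_nodes,
                           (int)(page_offset % (unsigned long)n));
}
```

With a policy over nodes 1, 3 and 5 (mask ``0x2A``), successive page offsets 0, 1, 2, 3 select nodes 1, 3, 5 and then 1 again under this model.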
228 For allocation of anonymous pages and shared memory pages,
243 specified by the policy based on the order in which they are
246 interleaved system default policy works in this mode.
250 satisfied from the nodemask specified in the policy. If there is
251 memory pressure on all nodes in the nodemask, the allocation
255 NUMA memory policy supports the following optional mode flags:
260 nodes changes after the memory policy has been defined.
263 change in the set of allowed nodes, the preferred nodemask (Preferred
265 remapped to the new set of allowed nodes. This may result in nodes
268 With this flag, if the user-specified nodes overlap with the
269 nodes allowed by the task's cpuset, then the memory policy is
274 mems 1-3 that sets an Interleave policy over the same set. If
275 the cpuset's mems change to 3-5, the Interleave will now occur
289 set of allowed nodes. The kernel stores the user-passed nodemask,
294 mempolicy is rebound because of a change in the set of allowed
299 1,3,5 may be remapped to 7-9 and then to 1-3 if the set of
304 nodes. In other words, if nodes 0, 2, and 4 are set in the user's
305 nodemask, the policy will be effected over the first (and in the
306 Bind or Interleave case, the third and fifth) nodes in the set of
311 of the new set of allowed nodes (for example, node 5 is set in
312 the user's nodemask when the set of allowed nodes is only 0-3),
314 if not already set, sets the node in the mempolicy nodemask.
317 mems 2-5 that sets an Interleave policy over the same set with
318 MPOL_F_RELATIVE_NODES. If the cpuset's mems change to 3-7, the
319 interleave now occurs over nodes 3,5-7. If the cpuset's mems
320 then change to 0,2-3,5, then the interleave occurs over nodes
321 0,2-3,5.
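
The MPOL_F_RELATIVE_NODES example above can be reproduced with a short user-space sketch. It is a model under stated assumptions rather than the kernel's implementation: nodemasks are limited to 64 bits, and the remap is assumed to be a "fold then map" step, where each user bit is reduced modulo the number of allowed nodes and the resulting relative index selects the matching node in the allowed set. The function names ``allowed_weight`` and ``relative_nodes`` are hypothetical.

```c
#include <stdint.h>

/* Number of nodes set in a 64-bit nodemask (bit i stands for node i). */
int allowed_weight(uint64_t mask)
{
    int n = 0;

    for (; mask; mask &= mask - 1)
        n++;
    return n;
}

/*
 * Hypothetical model of a relative-nodes remap.
 *
 * user_mask: nodemask as passed by the user, read as relative indices.
 * allowed:   nodes currently allowed (for example, the cpuset's mems).
 *
 * Step 1 (fold): relative index i becomes i modulo the number of
 *                allowed nodes, so out-of-range indices wrap around.
 * Step 2 (map):  folded index k selects the k-th allowed node.
 */
uint64_t relative_nodes(uint64_t user_mask, uint64_t allowed)
{
    int n = allowed_weight(allowed);
    uint64_t folded = 0, result = 0;
    int idx = 0;

    if (n == 0)
        return 0;

    for (int bit = 0; bit < 64; bit++) {
        if (user_mask & (UINT64_C(1) << bit))
            folded |= UINT64_C(1) << (bit % n);
    }

    for (int node = 0; node < 64; node++) {
        if (!(allowed & (UINT64_C(1) << node)))
            continue;
        if (folded & (UINT64_C(1) << idx))
            result |= UINT64_C(1) << node;
        idx++;
    }
    return result;
}
```

For a user nodemask of 2-5 (``0x3C``), this model yields 2-5 when the allowed set is 2-5, nodes 3,5-7 (``0xE8``) when the allowed set is 3-7, and 0,2-3,5 (``0x2D``) when the allowed set is 0,2-3,5, matching the sequence in the example. A user nodemask of 0, 2 and 4 over allowed nodes 3-7 selects the first, third and fifth allowed nodes: 3, 5 and 7.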
324 nodemasks to specify memory policies using this flag should
325 disregard their current, actual cpuset imposed memory placement
327 memory nodes 0 to N-1, where N is the number of memory nodes the
329 set of memory nodes allowed by the task's cpuset, as that may
337 Memory Policy Reference Counting
346 When a new memory policy is allocated, its reference count is initialized
348 new policy. When a pointer to a memory policy structure is stored in another
352 During run-time "usage" of the policy, we attempt to minimize atomic operations
380 3) Page allocation usage of task or vma policy occurs in the fault path where
386 shared memory policy while another task, with a distinct mmap_lock, is
392 extra reference on shared policies in the same query/allocation paths
393 used for non-shared policies. For this reason, shared policies are marked
394 as such, and the extra reference is dropped "conditionally"--i.e., only
398 shared policies in a tree structure under spinlock, shared policies are
399 more expensive to use in the page allocation path. This is especially
400 true for shared policies on shared memory regions shared by tasks running
402 falling back to task or system default policy for shared memory regions,
403 or by prefaulting the entire shared memory region into memory and locking
408 Memory Policy APIs
411 Linux supports 4 system calls for controlling memory policy. These APIs
417 user space applications reside in a package that is not part of the
419 prefix, are defined in <linux/syscalls.h>; the mode and flag
420 definitions are in <linux/mempolicy.h>.
422 Set [Task] Memory Policy::
427 Sets the calling task's "task/process memory policy" to mode
437 Get [Task] Memory Policy or Related Information::
443 Queries the "task/process memory policy" of the calling task, or the
469 sys_set_mempolicy_home_node sets the home node for a VMA policy present in the
473 the default allocation policy to allocate memory close to the local node for an
477 Memory Policy Command Line Interface
480 Although not strictly part of the Linux implementation of memory policy,
486 + set the shared policy for a shared memory segment via mbind(2)
488 The numactl(8) tool is packaged with the run-time version of the library
489 containing the memory policy system call wrappers. Some distributions
490 package the headers and compile-time libraries in a separate development
495 Memory Policies and cpusets
498 Memory policies work within cpusets as described above. For memory policies
503 specified for the policy and the set of nodes with memory is used. If the
508 The interaction of memory policies and cpusets can be problematic when tasks
509 in two cpusets share access to a memory region, such as shared memory segments
512 memories are allowed in both cpusets may be used in the policies. Obtaining
513 this information requires "stepping outside" the memory policy APIs to use the
514 cpuset information and requires that one know in what cpusets other tasks might
516 memory sets are disjoint, "local" allocation is the only valid policy.
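
The rule for tasks in two cpusets sharing a memory region reduces to a set intersection, sketched below. This is an illustration under stated assumptions: each cpuset's allowed memory nodes are represented as a 64-bit mask that the caller has already obtained by some means outside the memory policy APIs, and the helper names ``shared_policy_nodes`` and ``only_local_allocation_valid`` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Nodes that may appear in a policy on a region shared by tasks in two
 * cpusets: only nodes allowed in both cpusets (bit i stands for node i).
 */
uint64_t shared_policy_nodes(uint64_t mems_a, uint64_t mems_b)
{
    return mems_a & mems_b;
}

/*
 * True when the two cpusets' memory sets are disjoint, in which case
 * "local" allocation is the only valid policy for the shared region.
 */
bool only_local_allocation_valid(uint64_t mems_a, uint64_t mems_b)
{
    return shared_policy_nodes(mems_a, mems_b) == 0;
}
```

For cpusets with mems 0-3 (``0x0F``) and 2-5 (``0x3C``), only nodes 2 and 3 (``0x0C``) are usable in a policy on the shared region. For mems 0-1 and 2-3 the intersection is empty.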