Lines Matching +full:memory +full:- +full:controller
1 .. _cgroup-v2:
11 conventions of cgroup v2. It describes all userland-visible aspects
12 of cgroup including core and specific controller behaviors. All
14 v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgroup-v1>`.
19 1-1. Terminology
20 1-2. What is cgroup?
22 2-1. Mounting
23 2-2. Organizing Processes and Threads
24 2-2-1. Processes
25 2-2-2. Threads
26 2-3. [Un]populated Notification
27 2-4. Controlling Controllers
28 2-4-1. Enabling and Disabling
29 2-4-2. Top-down Constraint
30 2-4-3. No Internal Process Constraint
31 2-5. Delegation
32 2-5-1. Model of Delegation
33 2-5-2. Delegation Containment
34 2-6. Guidelines
35 2-6-1. Organize Once and Control
36 2-6-2. Avoid Name Collisions
38 3-1. Weights
39 3-2. Limits
40 3-3. Protections
41 3-4. Allocations
43 4-1. Format
44 4-2. Conventions
45 4-3. Core Interface Files
47 5-1. CPU
48 5-1-1. CPU Interface Files
49 5-2. Memory
50 5-2-1. Memory Interface Files
51 5-2-2. Usage Guidelines
52 5-2-3. Memory Ownership
53 5-3. IO
54 5-3-1. IO Interface Files
55 5-3-2. Writeback
56 5-3-3. IO Latency
57 5-3-3-1. How IO Latency Throttling Works
58 5-3-3-2. IO Latency Interface Files
59 5-3-4. IO Priority
60 5-4. PID
61 5-4-1. PID Interface Files
62 5-5. Cpuset
63 5-5-1. Cpuset Interface Files
64 5-6. Device
65 5-7. RDMA
66 5-7-1. RDMA Interface Files
67 5-8. HugeTLB
68 5-8-1. HugeTLB Interface Files
69 5-9. Misc
70 5-9-1. Miscellaneous cgroup Interface Files
71 5-9-2. Migration and Ownership
72 5-10. Others
73 5-10-1. perf_event
74 5-N. Non-normative information
75 5-N-1. CPU controller root cgroup process behaviour
76 5-N-2. IO controller root cgroup process behaviour
78 6-1. Basics
79 6-2. The Root and Views
80 6-3. Migration and setns(2)
81 6-4. Interaction with Other Namespaces
83 P-1. Filesystem Support for Writeback
86 R-1. Multiple Hierarchies
87 R-2. Thread Granularity
88 R-3. Competition Between Inner Nodes and Threads
89 R-4. Other Interface Issues
90 R-5. Controller Issues and Remedies
91 R-5-1. Memory
98 -----------
107 ---------------
113 cgroup is largely composed of two parts - the core and controllers.
115 processes. A cgroup controller is usually responsible for
128 disabled selectively on a cgroup. All controller behaviors are
129 hierarchical - if a controller is enabled on a cgroup, it affects all
131 sub-hierarchy of the cgroup. When a controller is enabled on a nested
141 --------
146 # mount -t cgroup2 none $MOUNT_POINT
155 A controller can be moved across hierarchies only after the controller
156 is no longer referenced in its current hierarchy. Because per-cgroup
157 controller states are destroyed asynchronously and controllers may
158 have lingering references, a controller may not show up immediately on
160 Similarly, a controller should be fully disabled to be moved out of
162 controller to become available for other hierarchies; furthermore, due
163 to inter-controller dependencies, other controllers may need to be
169 the hierarchies and controller associations before starting to use the
184 ignored on non-init namespace mounts. Please refer to the
188 Only populate memory.events with data for the current cgroup,
193 option is ignored on non-init namespace mounts.
196 Recursively apply memory.min and memory.low protection to
201 behavior but is a mount option to avoid regressing setups
207 --------------------------------
213 A child cgroup can be created by creating a sub-directory::
218 structure. Each cgroup has a read-writable interface file
220 belong to the cgroup one-per-line. The PIDs are not ordered and the
251 0::/test-cgroup/test-cgroup-nested
258 0::/test-cgroup/test-cgroup-nested (deleted)
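A minimal bash sketch of reading such a line, assuming only the "0::PATH" form and the trailing " (deleted)" marker shown above. The helper name `cgroup_of_line` is hypothetical and not part of any kernel or util-linux tool:

```shell
#!/usr/bin/env bash
# Split one line of /proc/$PID/cgroup in its cgroup v2 form "0::PATH".
# Prints PATH without the " (deleted)" suffix.
# Returns 0 if the cgroup is live, 1 if it has been removed,
# and 2 if the line is not a v2 entry (v1 lines look like "4:memory:/x").
cgroup_of_line() {
  local line=$1 path
  [[ $line == 0::* ]] || return 2
  path=${line#0::}
  if [[ $path == *" (deleted)" ]]; then
    printf '%s\n' "${path% (deleted)}"
    return 1
  fi
  printf '%s\n' "$path"
}
```

For example, `while read -r l; do cgroup_of_line "$l"; done < /proc/self/cgroup` prints the current process's cgroup. The match is purely textual, so a directory whose own name ends in " (deleted)" cannot be told apart from a removed cgroup.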
284 constraint - threaded controllers can be enabled on non-leaf cgroups
308 - As the cgroup will join the parent's resource domain. The parent
311 - When the parent is an unthreaded domain, it must not have any domain
315 Topology-wise, a cgroup can be in an invalid state. Please consider
318 A (threaded domain) - B (threaded) - C (domain, just created)
333 threads in the cgroup. Except that the operations are per-thread
334 instead of per-process, "cgroup.threads" has the same format and
349 a threaded controller is enabled inside a threaded subtree, it only
355 constraint, a threaded controller must be able to handle competition
356 between threads in a non-leaf cgroup and its child cgroups. Each
357 threaded controller defines how such competitions are handled.
361 --------------------------
363 Each non-root cgroup has a "cgroup.events" file which contains
364 "populated" field indicating whether the cgroup's sub-hierarchy has
368 example, to start a clean-up operation after all processes of a given
369 sub-hierarchy have exited. The populated state updates and
370 notifications are recursive. Consider the following sub-hierarchy
374 A(4) - B(0) - C(1)
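The recursive "populated" rule can be illustrated with a small bash sketch. The kernel computes this state itself and userspace only reads the "populated" field of "cgroup.events"; the "PATH NR_PROCS" input format and the helper `is_populated` are assumptions made purely for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical input on stdin: one "PATH NR_PROCS" pair per line, where
# NR_PROCS counts live processes directly in that cgroup.
# Prints 1 if TARGET or any descendant has a live process, else 0.
is_populated() {
  local target=$1 path count
  while read -r path count; do
    # TARGET itself, or a descendant whose path starts with "TARGET/".
    if [[ $path == "$target" || $path == "$target"/* ]] && (( count > 0 )); then
      echo 1
      return 0
    fi
  done
  echo 0
}
```

With the chain above plus a hypothetical empty leaf D under B, `printf 'A 4\nA/B 0\nA/B/C 1\nA/B/D 0\n' | is_populated A/B` reports B as populated because of C, while the same input with `A/B/D` as the target reports 0.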
384 -----------------------
393 cpu io memory
395 No controller is enabled by default. Controllers can be enabled and
398 # echo "+cpu +memory -io" > cgroup.subtree_control
402 all succeed or fail. If multiple operations on the same controller
405 Enabling a controller in a cgroup indicates that the distribution of
407 Consider the following sub-hierarchy. The enabled controllers are
410 A(cpu,memory) - B(memory) - C()
413 As A has "cpu" and "memory" enabled, A will control the distribution
414 of CPU cycles and memory to its children, in this case, B. As B has
415 "memory" enabled but not "cpu", C and D will compete freely on CPU
416 cycles but their division of memory available to B will be controlled.
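The write semantics of "cgroup.subtree_control" can be modeled offline with a bash sketch (bash 4+ for associative arrays). This is a simulation under stated assumptions, not kernel code: the helper `apply_subtree_control` is hypothetical, tokens are applied left to right so the last operation on a controller wins, a "+" is rejected unless the controller is listed in the cgroup's "cgroup.controllers" (the top-down constraint), and on any failure nothing is printed, mirroring the all-or-nothing rule:

```shell
#!/usr/bin/env bash
# apply_subtree_control CURRENT AVAILABLE WRITE
#   CURRENT   - space-separated contents of cgroup.subtree_control
#   AVAILABLE - space-separated contents of cgroup.controllers
#   WRITE     - the string that would be echoed into subtree_control
# Prints the resulting sorted controller set, or fails without output.
apply_subtree_control() {
  local -A on=() avail=()
  local c tok
  for c in $1; do on[$c]=1; done
  for c in $2; do avail[$c]=1; done
  for tok in $3; do
    c=${tok#[+-]}
    case $tok in
      +?*)
        if [[ -z ${avail[$c]+x} ]]; then
          echo "cannot enable $c: not in cgroup.controllers" >&2
          return 1
        fi
        on[$c]=1 ;;
      -?*) unset "on[$c]" ;;
      *) echo "bad token: $tok" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "${!on[@]}" | sort | paste -sd ' ' -
}
```

For instance, `apply_subtree_control "cpu io" "cpu io memory" "+memory -io"` yields "cpu memory", matching the effect of the echo command shown earlier on a cgroup that had "cpu" and "io" enabled.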
418 As a controller regulates the distribution of the target resource to
419 the cgroup's children, enabling it creates the controller's interface
421 would create the "cpu." prefixed controller interface files in C and
422 D. Likewise, disabling "memory" from B would remove the "memory."
423 prefixed controller interface files from C and D. This means that the
424 controller interface files - anything which doesn't start with
428 Top-down Constraint
431 Resources are distributed top-down and a cgroup can further distribute
433 parent. This means that all non-root "cgroup.subtree_control" files
435 "cgroup.subtree_control" file. A controller can be enabled only if
436 the parent has the controller enabled and a controller can't be
443 Non-root cgroups can distribute domain resources to their children
448 This guarantees that, when a domain controller is looking at the part
457 is up to each controller (for more information on this topic please
458 refer to the Non-normative information section in the Controllers
462 enabled controller in the cgroup's "cgroup.subtree_control". This is
471 ----------
491 delegated, the user can build a sub-hierarchy under the directory,
495 happens in the delegated sub-hierarchy, nothing can escape the
499 cgroups in or nesting depth of a delegated sub-hierarchy; however,
506 A delegated sub-hierarchy is contained in the sense that processes
507 can't be moved into or out of the sub-hierarchy by the delegatee.
510 requiring the following conditions for a process with a non-root euid
514 - The writer must have write access to the "cgroup.procs" file.
516 - The writer must have write access to the "cgroup.procs" file of the
520 processes around freely in the delegated sub-hierarchy, it can't pull
521 in from or push out to outside the sub-hierarchy.
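The second condition depends on the common ancestor of the source and destination cgroups. A minimal bash sketch of finding it from two cgroup paths, assuming paths are written relative to the cgroup root as in "/proc/$PID/cgroup"; the helper name `common_ancestor` is hypothetical:

```shell
#!/usr/bin/env bash
# Print the deepest cgroup path shared by two paths such as
# /C/C0/C00 and /C/C1/C10. Prints "/" when only the root is shared.
common_ancestor() {
  local IFS=/
  local -a a b
  local out="" i
  read -ra a <<< "${1#/}"
  read -ra b <<< "${2#/}"
  for (( i = 0; i < ${#a[@]} && i < ${#b[@]}; i++ )); do
    [[ ${a[i]} == "${b[i]}" ]] || break
    out+="/${a[i]}"
  done
  printf '%s\n' "${out:-/}"
}
```

Moving a process from C00 to C10 in the layout shown next therefore needs write access to "cgroup.procs" of C, the shared ancestor.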
527 ~~~~~~~~~~~~~ - C0 - C00
530 ~~~~~~~~~~~~~ - C1 - C10
537 will be denied with -EACCES.
542 is not reachable, the migration is rejected with -ENOENT.
546 ----------
552 and stateful resources such as memory are not moved together with the
554 inherent trade-offs between migration and various hot paths in terms
560 resource structure once on start-up. Dynamic adjustments to resource
561 distribution can be made by changing controller configuration through
573 controller's interface files are prefixed with the controller name and
574 a dot. A controller's name is composed of lower-case letters and
593 -------
599 work-conserving. Due to the dynamic nature, this model is usually
615 ------
618 Limits can be over-committed - the sum of the limits of children can
623 As limits can be over-committed, all configuration combinations are
632 -----------
637 soft boundaries. Protections can also be over-committed in which case
644 As protections can be over-committed, all configuration combinations
648 "memory.low" implements best-effort memory protection and is an
653 -----------
656 resource. Allocations can't be over-committed - the sum of the
663 As allocations can't be over-committed, some configuration
668 "cpu.rt.max" hard-allocates realtime slices and is an example of this
676 ------
681 New-line separated values
689 (when read-only or multiple values can be written at once)
715 -----------
717 - Settings for a single feature should be contained in a single file.
719 - The root cgroup should be exempt from resource control and thus
722 - The default time unit is microseconds. If a different unit is ever
725 - A parts-per quantity should use a percentage decimal with at least
726 two-digit fractional part - e.g. 13.40.
728 - If a controller implements weight-based resource distribution, its
734 - If a controller implements an absolute resource guarantee and/or
736 respectively. If a controller implements best-effort resource
743 - If a setting has a configurable default value and keyed specific
757 # cat cgroup-example-interface-file
763 # echo 125 > cgroup-example-interface-file
767 # echo "default 125" > cgroup-example-interface-file
771 # echo "8:16 170" > cgroup-example-interface-file
775 # echo "8:0 default" > cgroup-example-interface-file
776 # cat cgroup-example-interface-file
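After the writes above, the file holds a "default 125" line plus the remaining "8:16 170" override. How a reader of such a file might resolve the weight for one device can be sketched in bash; the helper `weight_for_device` is hypothetical and only mirrors the default-plus-override convention described here:

```shell
#!/usr/bin/env bash
# Read a weight-style interface file on stdin ("default WEIGHT" plus
# optional "MAJ:MIN WEIGHT" override lines) and print the weight that
# applies to device DEV.
weight_for_device() {
  local dev=$1 key val def=""
  # "|| [[ -n $key ]]" also handles a last line without a newline.
  while read -r key val || [[ -n $key ]]; do
    if [[ $key == default ]]; then
      def=$val
    elif [[ $key == "$dev" ]]; then
      echo "$val"            # a per-device override wins
      return 0
    fi
  done
  [[ -n $def ]] || return 1  # neither an override nor a default line
  echo "$def"
}
```

With that content, device 8:16 resolves to 170 and device 8:0, whose override was reset with "8:0 default", falls back to 125.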
780 - For events which are not very high frequency, an interface file
787 --------------------
792 A read-write single value file which exists on non-root
798 - "domain" : A normal valid domain cgroup.
800 - "domain threaded" : A threaded domain cgroup which is
803 - "domain invalid" : A cgroup which is in an invalid state.
807 - "threaded" : A threaded cgroup which is a member of a
814 A read-write new-line separated values file which exists on
818 the cgroup one-per-line. The PIDs are not ordered and the
827 - It must have write access to the "cgroup.procs" file.
829 - It must have write access to the "cgroup.procs" file of the
832 When delegating a sub-hierarchy, write access to this file
840 A read-write new-line separated values file which exists on
844 the cgroup one-per-line. The TIDs are not ordered and the
853 - It must have write access to the "cgroup.threads" file.
855 - The cgroup that the thread is currently in must be in the
858 - It must have write access to the "cgroup.procs" file of the
861 When delegating a sub-hierarchy, write access to this file
865 A read-only space separated values file which exists on all
872 A read-write space separated values file which exists on all
879 Space separated list of controllers prefixed with '+' or '-'
880 can be written to enable or disable controllers. A controller
881 name prefixed with '+' enables the controller and '-'
882 disables. If a controller appears more than once on the list,
887 A read-only flat-keyed file which exists on non-root cgroups.
899 A read-write single value file. The default is "max".
906 A read-write single value file. The default is "max".
913 A read-only flat-keyed file with the following entries:
931 A read-write single value file which exists on non-root cgroups.
954 create new sub-cgroups.
957 A write-only single value file which exists in non-root cgroups.
969 the whole thread-group.
974 .. _cgroup-v2-cpu:
977 ---
980 controller implements weight and absolute bandwidth limit models for
992 the cpu controller can only be enabled when all RT processes are in
996 before the cpu controller can be enabled.
1005 A read-only flat-keyed file.
1006 This file exists whether the controller is enabled or not.
1010 - usage_usec
1011 - user_usec
1012 - system_usec
1014 and the following three when the controller is enabled:
1016 - nr_periods
1017 - nr_throttled
1018 - throttled_usec
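As an illustration of consuming these flat-keyed counters, here is a bash/awk sketch that reports the share of enforcement periods in which the cgroup was throttled, formatted as a percentage with two fractional digits per the interface conventions. The helper name `throttled_pct` is hypothetical, and reading the file from "$MOUNT_POINT/$CGROUP/cpu.stat" is an assumed layout:

```shell
#!/usr/bin/env bash
# Read cpu.stat contents on stdin and print nr_throttled / nr_periods
# as a percentage with two fractional digits. When the controller is
# not enabled the period counters are absent and "0.00" is printed.
throttled_pct() {
  awk '$1 == "nr_periods"   { p = $2 }
       $1 == "nr_throttled" { t = $2 }
       END { if (p > 0) printf "%.2f\n", 100 * t / p; else print "0.00" }'
}
```

Usage: `throttled_pct < "$MOUNT_POINT/$CGROUP/cpu.stat"`. With 34 throttled periods out of 250 it prints 13.60.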
1021 A read-write single value file which exists on non-root
1027 A read-write single value file which exists on non-root
1030 The nice value is in the range [-20, 19].
1039 A read-write two value file which exists on non-root cgroups.
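The two values of "cpu.max" are "$MAX $PERIOD" in microseconds, with a default of "max 100000", and "max" for $MAX means no limit. A bash sketch converting that pair into a share of one CPU (the helper `cpu_max_pct` is hypothetical; integer division truncates):

```shell
#!/usr/bin/env bash
# Interpret the "$MAX $PERIOD" pair read from cpu.max and print the
# bandwidth limit as an integer percentage of one CPU, or "max" when
# the group is unlimited. 150 means one and a half CPUs' worth.
cpu_max_pct() {
  local max period
  read -r max period <<< "$1"
  if [[ $max == max ]]; then
    echo max
    return 0
  fi
  echo $(( max * 100 / period ))
}
```

For example, `cpu_max_pct "$(cat "$MOUNT_POINT/$CGROUP/cpu.max")"` prints 50 for "50000 100000".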
1051 A read-write nested-keyed file.
1057 A read-write single value file which exists on non-root cgroups.
1072 A read-write single value file which exists on non-root cgroups.
1084 Memory
1085 ------
1087 The "memory" controller regulates distribution of memory. Memory is
1089 intertwining between memory usage and reclaim pressure and the
1090 stateful nature of memory, the distribution model is relatively
1093 While not completely water-tight, all major memory usages by a given
1094 cgroup are tracked so that the total memory consumption can be
1096 following types of memory usages are tracked.
1098 - Userland memory - page cache and anonymous memory.
1100 - Kernel data structures such as dentries and inodes.
1102 - TCP socket buffers.
1107 Memory Interface Files
1110 All memory amounts are in bytes. If a value which is not aligned to
1114 memory.current
1115 A read-only single value file which exists on non-root
1118 The total amount of memory currently being used by the cgroup
1121 memory.min
1122 A read-write single value file which exists on non-root
1125 Hard memory protection. If the memory usage of a cgroup
1126 is within its effective min boundary, the cgroup's memory
1128 unprotected reclaimable memory available, OOM killer
1134 Effective min boundary is limited by memory.min values of
1135 all ancestor cgroups. If there is memory.min overcommitment
1136 (child cgroup or cgroups require more protected memory
1139 actual memory usage below memory.min.
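The proportional split described above can be checked with simple integer arithmetic. This bash sketch is a simplified model of the stated rule, not the kernel's reclaim code (which also handles recursive protection and works in pages); the helper `effective_min` and its argument order are assumptions:

```shell
#!/usr/bin/env bash
# effective_min PARENT_EFFECTIVE MIN USAGE SIBLINGS_PROTECTED
#   MIN, USAGE         - this child's memory.min and memory usage
#   SIBLINGS_PROTECTED - sum of min(usage, memory.min) over all children
# A child's claim is its usage below memory.min. If the children's
# claims exceed the parent's effective protection, each child receives
# a share proportional to its claim.
effective_min() {
  local parent_eff=$1 min=$2 usage=$3 siblings=$4 protected
  protected=$(( usage < min ? usage : min ))
  if (( siblings > parent_eff )); then
    echo $(( protected * parent_eff / siblings ))
  else
    echo "$protected"
  fi
}
```

With a parent effective boundary of 100 and two children whose claims are 80 and 30 (sum 110), the children receive 72 and 27; if the parent's boundary were 200, they would keep their full claims.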
1141 Putting more memory than generally available under this
1144 If a memory cgroup is not populated with processes,
1145 its memory.min is ignored.
1147 memory.low
1148 A read-write single value file which exists on non-root
1151 Best-effort memory protection. If the memory usage of a
1153 memory won't be reclaimed unless there is no reclaimable
1154 memory available in unprotected cgroups.
1160 Effective low boundary is limited by memory.low values of
1161 all ancestor cgroups. If there is memory.low overcommitment
1162 (child cgroup or cgroups require more protected memory
1165 actual memory usage below memory.low.
1167 Putting more memory than generally available under this
1170 memory.high
1171 A read-write single value file which exists on non-root
1174 Memory usage throttle limit. This is the main mechanism to
1175 control memory usage of a cgroup. If a cgroup's usage goes
1182 memory.max
1183 A read-write single value file which exists on non-root
1186 Memory usage hard limit. This is the final protection
1187 mechanism. If a cgroup's memory usage reaches this limit and
1192 In the default configuration, regular 0-order allocations always
1197 as -ENOMEM or silently ignore in cases like disk readahead.
1203 memory.oom.group
1204 A read-write single value file which exists on non-root
1210 (if the memory cgroup is not a leaf cgroup) are killed
1214 Tasks with the OOM protection (oom_score_adj set to -1000)
1219 memory.oom.group values of ancestor cgroups.
1221 memory.events
1222 A read-only flat-keyed file which exists on non-root cgroups.
1230 memory.events.local.
1234 high memory pressure even though its usage is under
1236 boundary is over-committed.
1240 throttled and routed to perform direct memory reclaim
1241 because the high memory boundary was exceeded. For a
1242 cgroup whose memory usage is capped by the high limit
1243 rather than global memory pressure, this event's
1247 The number of times the cgroup's memory usage was
1252 The number of times the cgroup's memory usage was
1256 considered as an option, e.g. for failed high-order
1263 memory.events.local
1264 Similar to memory.events but the fields in the file are local
1268 memory.stat
1269 A read-only flat-keyed file which exists on non-root cgroups.
1271 This breaks down the cgroup's memory footprint into different
1272 types of memory, type-specific details, and other information
1273 on the state and past events of the memory management system.
1275 All memory amounts are in bytes.
1281 If the entry has no per-node counter (i.e. it is not shown in
1282 memory.numa_stat), it is tagged 'npn' (non-per-node)
1283 to indicate that it will not show in memory.numa_stat.
1286 Amount of memory used in anonymous mappings such as
1290 Amount of memory used to cache filesystem data,
1291 including tmpfs and shared memory.
1294 Amount of memory allocated to kernel stacks.
1297 Amount of memory allocated for page tables.
1300 Amount of memory used for storing per-cpu kernel
1304 Amount of memory used in network transmission buffers
1307 Amount of cached filesystem data that is swap-backed,
1322 Amount of swap cached in memory. The swapcache is accounted
1323 against both memory and swap usage.
1326 Amount of memory used in anonymous mappings backed by
1338 Amount of memory, swap-backed and filesystem-backed,
1339 on the internal memory management lists used by the
1343 memory management lists), inactive_foo + active_foo may not be equal to
1344 the value for the foo counter, since the foo counter is type-based, not
1345 list-based.
1352 Part of "slab" that cannot be reclaimed on memory
1356 Amount of memory used for storing in-kernel data
1405 Number of pages postponed to be freed under memory pressure
1420 memory.numa_stat
1421 A read-only nested-keyed file which exists on non-root cgroups.
1423 This breaks down the cgroup's memory footprint into different
1424 types of memory, type-specific details, and other information
1425 per node on the state of the memory management system.
1433 All memory amounts are in bytes.
1435 The output format of memory.numa_stat is::
1443 For the meaning of the entries, refer to memory.stat.
1445 memory.swap.current
1446 A read-only single value file which exists on non-root
1452 memory.swap.high
1453 A read-write single value file which exists on non-root
1458 allow userspace to implement custom out-of-memory procedures.
1462 during regular operation. Compare to memory.swap.max, which
1464 continue unimpeded as long as other memory can be reclaimed.
1468 memory.swap.max
1469 A read-write single value file which exists on non-root
1473 limit, anonymous memory of the cgroup will not be swapped out.
1475 memory.swap.events
1476 A read-only flat-keyed file which exists on non-root cgroups.
1492 because of running out of swap system-wide or max
1498 reduces the impact on the workload and memory management.
1500 memory.pressure
1501 A read-only nested-keyed file.
1503 Shows pressure stall information for memory. See
1510 "memory.high" is the main mechanism to control memory usage.
1511 Over-committing on high limit (sum of high limits > available memory)
1512 and letting global memory pressure distribute memory according to
1518 more memory or terminating the workload.
1520 Determining whether a cgroup has enough memory is not trivial as
1521 memory usage doesn't indicate whether the workload can benefit from
1522 more memory. For example, a workload which writes data received from
1523 network to a file can use all available memory but can also perform
1524 just as well with a small amount of memory. A measure of memory
1525 pressure - how much the workload is being impacted due to lack of
1526 memory - is necessary to determine whether a workload needs more
1527 memory; unfortunately, the memory pressure monitoring mechanism isn't
1531 Memory Ownership
1534 A memory area is charged to the cgroup which instantiated it and stays
1536 to a different cgroup doesn't move the memory usages that it
1539 A memory area may be used by processes belonging to different cgroups.
1540 To which cgroup the area will be charged is non-deterministic; however,
1541 over time, the memory area is likely to end up in a cgroup which has
1542 enough memory allowance to avoid high reclaim pressure.
1544 If a cgroup sweeps a considerable amount of memory which is expected
1546 POSIX_FADV_DONTNEED to relinquish the ownership of memory areas
1547 belonging to the affected files to ensure correct memory ownership.
1551 --
1553 The "io" controller regulates the distribution of IO resources. This
1554 controller implements both weight based and absolute bandwidth or IOPS
1556 only if cfq-iosched is in use and neither scheme is available for
1557 blk-mq devices.
1564 A read-only nested-keyed file.
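Each line of a nested-keyed file such as "io.stat" starts with a device "MAJ:MIN" key followed by "subkey=value" pairs. A bash/awk sketch of pulling one value out; the helper `nested_get` is hypothetical and the numbers in the usage line are illustrative only:

```shell
#!/usr/bin/env bash
# nested_get DEV SUBKEY: read a nested-keyed file on stdin and print
# the value of SUBKEY on the line whose first field is DEV.
nested_get() {
  awk -v dev="$1" -v key="$2" '
    $1 == dev {
      for (i = 2; i <= NF; i++) {
        n = index($i, "=")        # split "subkey=value" at the first "="
        if (n > 0 && substr($i, 1, n - 1) == key) { print substr($i, n + 1); exit }
      }
    }'
}
```

Usage: `printf '8:16 rbytes=1459200 wbytes=314773504 rios=192 wios=353 dbytes=0 dios=0\n' | nested_get 8:16 wios` prints 353; on a live system the input would come from "$MOUNT_POINT/$CGROUP/io.stat".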
1584 A read-write nested-keyed file which exists only on the root
1588 model based controller (CONFIG_BLK_CGROUP_IOCOST) which
1596 enable Weight-based control enable
1606 The controller is disabled by default and can be enabled by
1608 to zero and the controller uses internal device saturation
1616 shows that on sdb, the controller is enabled, will consider
1628 devices which show wide temporary behavior changes - e.g. a
1639 A read-write nested-keyed file which exists only on the root
1643 controller (CONFIG_BLK_CGROUP_IOCOST) which currently
1652 model The cost model in use - "linear"
1678 generate device-specific coefficients.
1681 A read-write flat-keyed file which exists on non-root cgroups.
1701 A read-write nested-keyed file which exists on non-root
1715 When writing, any number of nested key-value pairs can be
1740 A read-only nested-keyed file.
1751 mechanism. Writeback sits between the memory and IO domains and
1752 regulates the proportion of dirty memory by balancing dirtying and
1755 The io controller, in conjunction with the memory controller,
1756 implements control of page cache writeback IOs. The memory controller
1757 defines the memory domain that dirty memory ratio is calculated and
1758 maintained for and the io controller defines the io domain which
1759 writes out dirty pages for the memory domain. Both system-wide and
1760 per-cgroup dirty memory states are examined and the more restrictive
1768 There are inherent differences in memory and writeback management
1769 which affect how cgroup ownership is tracked. Memory is tracked per
1774 As cgroup ownership for memory is tracked per page, there can be pages
1786 As the memory controller assigns page ownership on the first use and
1797 amount of available memory capped by limits imposed by the
1798 memory controller and system-wide clean memory.
1802 total available memory and applied the same way as
1809 This is a cgroup v2 controller for IO workload protection. You provide a group
1811 controller will throttle any peers that have a lower latency target than the
1831 your real setting, e.g. by setting the target 10-15% higher than the value in io.stat.
1837 target the controller doesn't do anything. Once a group starts missing its
1841 - Queue depth throttling. This is the number of outstanding IOs a group is
1845 - Artificial delay induction. There are certain types of IO that cannot be
1868 If the controller is enabled, you will see extra stats in io.stat in
1892 no-change
1895 none-to-rt
1900 restrict-to-be
1911 +-------------+---+
1912 | no-change | 0 |
1913 +-------------+---+
1914 | none-to-rt | 1 |
1915 +-------------+---+
1916 | rt-to-be | 2 |
1917 +-------------+---+
1918 | all-to-idle | 3 |
1919 +-------------+---+
1923 +-------------------------------+---+
1925 +-------------------------------+---+
1926 | IOPRIO_CLASS_RT (real-time) | 1 |
1927 +-------------------------------+---+
1929 +-------------------------------+---+
1931 +-------------------------------+---+
1935 - Translate the I/O priority class policy into a number.
1936 - Change the request I/O priority class into the maximum of the I/O priority
1940 ---
1942 The process number controller is used to allow a cgroup to stop any
1947 controllers cannot prevent, thus warranting its own controller. For
1949 hitting memory restrictions.
1951 Note that PIDs used in this controller refer to TIDs, process IDs as
1959 A read-write single value file which exists on non-root
1965 A read-only single value file which exists on all cgroups.
1975 through fork() or clone(). These will return -EAGAIN if the creation
1980 ------
1982 The "cpuset" controller provides a mechanism for constraining
1983 the CPU and memory node placement of tasks to only the resources
1987 memory placement to reduce cross-node memory access and contention
1990 The "cpuset" controller is hierarchical. That means the controller
1991 cannot use CPUs or memory nodes not allowed in its parent.
1998 A read-write multiple values file which exists on non-root
1999 cpuset-enabled cgroups.
2006 The CPU numbers are comma-separated numbers or ranges.
2010 0-4,6,8-10
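The list format above, which "cpuset.mems" shares, expands mechanically. A bash sketch, assuming well-formed input with ascending ranges; the helper `expand_cpulist` is hypothetical:

```shell
#!/usr/bin/env bash
# Expand a cpuset list such as "0-4,6,8-10" into space-separated
# numbers: "0 1 2 3 4 6 8 9 10".
expand_cpulist() {
  local IFS=, part lo hi i
  local -a out=()
  for part in $1; do              # split on commas
    if [[ $part == *-* ]]; then
      lo=${part%-*} hi=${part#*-}
      for (( i = lo; i <= hi; i++ )); do out+=("$i"); done
    else
      out+=("$part")
    fi
  done
  IFS=' '
  echo "${out[*]}"                # join with spaces
}
```

For example, `expand_cpulist "$(cat "$MOUNT_POINT/$CGROUP/cpuset.cpus.effective")"` lists the CPUs granted to a cgroup, and the memory node list "0-1,3" expands to "0 1 3".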
2013 setting as the nearest cgroup ancestor with a non-empty
2020 A read-only multiple values file which exists on all
2021 cpuset-enabled cgroups.
2037 A read-write multiple values file which exists on non-root
2038 cpuset-enabled cgroups.
2040 It lists the requested memory nodes to be used by tasks within
2041 this cgroup. The actual list of memory nodes granted, however,
2043 from the requested memory nodes.
2045 The memory node numbers are comma-separated numbers or ranges.
2049 0-1,3
2052 setting as the nearest cgroup ancestor with a non-empty
2053 "cpuset.mems" or all the available memory nodes if none
2057 and won't be affected by any memory nodes hotplug events.
2059 Setting a non-empty value to "cpuset.mems" causes memory of
2061 they are currently using memory outside of the designated nodes.
2063 There is a cost for this memory migration. The migration
2064 may not be complete and some memory pages may be left behind.
2071 A read-only multiple values file which exists on all
2072 cpuset-enabled cgroups.
2074 It lists the onlined memory nodes that are actually granted to
2075 this cgroup by its parent. These memory nodes are allowed to
2078 If "cpuset.mems" is empty, it shows all the memory nodes from the
2081 the memory nodes listed in "cpuset.mems" can be granted. In this
2084 Its value will be affected by memory nodes hotplug events.
2087 A read-write single value file which exists on non-root
2088 cpuset-enabled cgroups. This flag is owned by the parent cgroup
2095 "member" a non-root member of a partition
2138 "member" Non-root member of a partition
2164 Device controller
2165 -----------------
2167 Device controller manages access to device files. It includes both
2171 Cgroup v2 device controller has no interface files and is implemented
2176 on the return value the attempt will succeed or fail with -EPERM.
2181 If the program returns 0, the attempt fails with -EPERM, otherwise it
2189 ----
2191 The "rdma" controller regulates the distribution and accounting of
2198 A read-write nested-keyed file that exists for all the cgroups
2219 A read-only file that describes current resource usage.
2228 -------
2230 The HugeTLB controller allows limiting HugeTLB usage per control group and
2231 enforces the controller limit during page faults.
2245 A read-only flat-keyed file which exists on non-root cgroups.
2256 ----
2260 cgroup resources. The controller is enabled by the CONFIG_CGROUP_MISC config
2263 A resource can be added to the controller via enum misc_res_type{} in the
2269 uncharge APIs. All of the APIs to interact with misc controller are in
2275 Miscellaneous controller provides 3 interface files. If two misc resources (res_a and res_b) are re…
2278 A read-only flat-keyed file shown only in the root cgroup. It shows
2287 A read-only flat-keyed file shown in the non-root cgroups. It shows
2295 A read-write flat-keyed file shown in the non-root cgroups. Allowed
2322 ------
2327 perf_event controller, if not mounted on a legacy hierarchy, is
2329 always be filtered by cgroup v2 path. The controller can still be
2333 Non-normative information
2334 -------------------------
2340 CPU controller root cgroup process behaviour
2350 appropriately so the neutral - nice 0 - value is 100 instead of 1024).
2353 IO controller root cgroup process behaviour
2366 ------
2385 The path '/batchjobs/container_id1' can be considered as system-data
2390 # ls -l /proc/self/ns/cgroup
2391 lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
2397 # ls -l /proc/self/ns/cgroup
2398 lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
2402 When some thread from a multi-threaded process unshares its cgroup
2414 ------------------
2425 # ~/unshare -c # unshare cgroupns in some cgroup
2433 Each process gets its namespace-specific view of "/proc/$PID/cgroup"
2464 ----------------------
2493 ---------------------------------
2496 running inside a non-init cgroup namespace::
2498 # mount -t cgroup2 none $MOUNT_POINT
2505 the view of cgroup hierarchy by namespace-private cgroupfs mount
2518 --------------------------------
2521 address_space_operations->writepage[s]() to annotate bios using the
2538 super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for
2555 - Multiple hierarchies including named ones are not supported.
2557 - All v1 mount options are not supported.
2559 - The "tasks" file is removed and "cgroup.procs" is not sorted.
2561 - "cgroup.clone_children" is removed.
2563 - /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
2571 --------------------
2577 For example, as there is only one instance of each controller, utility
2584 the specific controller.
2588 each controller on its own hierarchy. Only closely related ones, such
2607 Also, as a controller couldn't have any expectation regarding the
2609 controller had to assume that all other controllers were attached to
2616 depending on the specific controller. In other words, hierarchy may
2619 how memory is distributed beyond a certain level while still wanting
2624 ------------------
2632 Generally, in-process knowledge is available only to the process
2633 itself; thus, unlike service-level organization of processes,
2640 sub-hierarchies and control resource distributions along them. This
2641 effectively raised cgroup to the status of a syscall-like API exposed
2651 that the process would actually be operating on its own sub-hierarchy.
2655 system-management pseudo filesystem. cgroup ended up with interface
2658 individual applications through the ill-defined delegation mechanism
2668 -------------------------------------------
2676 The cpu controller considered threads and cgroups as equivalents and
2679 cycles and the number of internal threads fluctuated - the ratios
2685 The io controller implicitly created a hidden leaf node for each
2693 The memory controller didn't have a way to control what happened
2695 clearly defined. There were attempts to add ad-hoc behaviors and
2709 ----------------------
2713 was how an empty cgroup was notified - a userland helper binary was
2716 to the in-kernel event delivery filtering mechanism, further complicating
2719 Controller interfaces were problematic too. An extreme example is
2731 formats and units even in the same controller.
2737 Controller Issues and Remedies
2738 ------------------------------
2740 Memory
2745 global reclaim prefers is opt-in, rather than opt-out. The costs for
2755 becomes self-defeating.
2757 The memory.low boundary on the other hand is a top-down allocated
2766 available memory. The memory consumption of workloads varies during
2774 The memory.high boundary on the other hand can be set much more
2780 and make corrections until the minimal memory footprint that still
2787 system than killing the group. Otherwise, memory.max is there to
2791 Setting the original memory.limit_in_bytes below the current usage was
2793 limit setting to fail. memory.max on the other hand will first set the
2795 new limit is met - or the task writing to memory.max is killed.
2797 The combined memory+swap accounting and limiting is replaced by real
2800 The main argument for a combined memory+swap facility in the original
2802 able to swap all anonymous memory of a child group, regardless of the
2804 groups can sabotage swapping by other means - such as referencing its
2805 anonymous memory in a tight loop - and an admin cannot assume full