Lines Matching +full:memory +full:- +full:controllers

1 .. _cgroup-v2:
11 conventions of cgroup v2. It describes all userland-visible aspects
14 v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgroup-v1>`.
19 1-1. Terminology
20 1-2. What is cgroup?
22 2-1. Mounting
23 2-2. Organizing Processes and Threads
24 2-2-1. Processes
25 2-2-2. Threads
26 2-3. [Un]populated Notification
27 2-4. Controlling Controllers
28 2-4-1. Enabling and Disabling
29 2-4-2. Top-down Constraint
30 2-4-3. No Internal Process Constraint
31 2-5. Delegation
32 2-5-1. Model of Delegation
33 2-5-2. Delegation Containment
34 2-6. Guidelines
35 2-6-1. Organize Once and Control
36 2-6-2. Avoid Name Collisions
38 3-1. Weights
39 3-2. Limits
40 3-3. Protections
41 3-4. Allocations
43 4-1. Format
44 4-2. Conventions
45 4-3. Core Interface Files
46 5. Controllers
47 5-1. CPU
48 5-1-1. CPU Interface Files
49 5-2. Memory
50 5-2-1. Memory Interface Files
51 5-2-2. Usage Guidelines
52 5-2-3. Memory Ownership
53 5-3. IO
54 5-3-1. IO Interface Files
55 5-3-2. Writeback
56 5-3-3. IO Latency
57 5-3-3-1. How IO Latency Throttling Works
58 5-3-3-2. IO Latency Interface Files
59 5-3-4. IO Priority
60 5-4. PID
61 5-4-1. PID Interface Files
62 5-5. Cpuset
63 5-5-1. Cpuset Interface Files
64 5-6. Device
65 5-7. RDMA
66 5-7-1. RDMA Interface Files
67 5-8. HugeTLB
68 5-8-1. HugeTLB Interface Files
69 5-9. Misc
70 5-9-1. Miscellaneous cgroup Interface Files
71 5-9-2. Migration and Ownership
72 5-10. Others
73 5-10-1. perf_event
74 5-N. Non-normative information
75 5-N-1. CPU controller root cgroup process behaviour
76 5-N-2. IO controller root cgroup process behaviour
78 6-1. Basics
79 6-2. The Root and Views
80 6-3. Migration and setns(2)
81 6-4. Interaction with Other Namespaces
83 P-1. Filesystem Support for Writeback
86 R-1. Multiple Hierarchies
87 R-2. Thread Granularity
88 R-3. Competition Between Inner Nodes and Threads
89 R-4. Other Interface Issues
90 R-5. Controller Issues and Remedies
91 R-5-1. Memory
98 -----------
102 qualifier as in "cgroup controllers". When explicitly referring to
107 ---------------
113 cgroup is largely composed of two parts - the core and controllers.
117 although there are utility controllers which serve purposes other than
127 Following certain structural constraints, controllers may be enabled or
129 hierarchical - if a controller is enabled on a cgroup, it affects all
131 sub-hierarchy of the cgroup. When a controller is enabled on a nested
141 --------
146 # mount -t cgroup2 none $MOUNT_POINT
149 controllers which support v2 and are not bound to a v1 hierarchy are
151 Controllers which are not in active use in the v2 hierarchy can be
156 is no longer referenced in its current hierarchy. Because per-cgroup
157 controller states are destroyed asynchronously and controllers may
163 to inter-controller dependencies, other controllers may need to be
167 controllers dynamically between the v2 and other hierarchies is
170 controllers after system boot.
173 automount the v1 cgroup filesystem and so hijack all controllers
176 disabling controllers in v1 and make them always available in v2.
184 ignored on non-init namespace mounts. Please refer to the
192 controllers, and then seeding it with CLONE_INTO_CGROUP is
196 Only populate memory.events with data for the current cgroup,
201 option is ignored on non-init namespace mounts.
204 Recursively apply memory.min and memory.low protection to
209 behavior but is a mount-option to avoid regressing setups
215 --------------------------------
221 A child cgroup can be created by creating a sub-directory::
226 structure. Each cgroup has a read-writable interface file
228 belong to the cgroup one-per-line. The PIDs are not ordered and the
259 0::/test-cgroup/test-cgroup-nested
266 0::/test-cgroup/test-cgroup-nested (deleted)
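The entries above can be read back with a small helper. This is a minimal sketch, assuming the "0::<path>" form shown in the listing; the function names ``cgroup2_path`` and ``cgroup2_is_deleted`` are hypothetical and not part of any kernel tooling.

```shell
# cgroup2_path [FILE]
# Print the cgroup v2 path from a /proc/$PID/cgroup style FILE
# (default: /proc/self/cgroup).  The v2 entry has the form "0::<path>".
cgroup2_path() {
    sed -n 's/^0:://p' "${1:-/proc/self/cgroup}"
}

# cgroup2_is_deleted [FILE]
# Succeed if the v2 entry carries the " (deleted)" suffix.
cgroup2_is_deleted() {
    case $(cgroup2_path "$1") in
        *" (deleted)") return 0 ;;
        *) return 1 ;;
    esac
}
```

Called without an argument, both helpers inspect the calling shell's own entry.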
272 cgroup v2 supports thread granularity for a subset of controllers to
280 Controllers which support thread mode are called threaded controllers.
281 The ones which don't are called domain controllers.
292 constraint - threaded controllers can be enabled on non-leaf cgroups
316 - As the cgroup will join the parent's resource domain. The parent
319 - When the parent is an unthreaded domain, it must not have any domain
320 controllers enabled or populated domain children. The root is
323 Topology-wise, a cgroup can be in an invalid state. Please consider
326 A (threaded domain) - B (threaded) - C (domain, just created)
335 cgroup becomes threaded or threaded controllers are enabled in the
341 threads in the cgroup. Except that the operations are per-thread
342 instead of per-process, "cgroup.threads" has the same format and
356 Only threaded controllers can be enabled in a threaded subtree. When
364 between threads in a non-leaf cgroup and its child cgroups. Each
369 --------------------------
371 Each non-root cgroup has a "cgroup.events" file which contains a
372 "populated" field indicating whether the cgroup's sub-hierarchy has
376 example, to start a clean-up operation after all processes of a given
377 sub-hierarchy have exited. The populated state updates and
378 notifications are recursive. Consider the following sub-hierarchy
382 A(4) - B(0) - C(1)
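A clean-up script might poll the "populated" field along these lines. This is a rough sketch, assuming "cgroup.events" is a flat-keyed file with one "KEY VALUE" pair per line and that "populated" reads 1 while the sub-hierarchy holds live processes; ``flat_key`` and ``is_populated`` are illustrative names only.

```shell
# flat_key FILE KEY
# Print the value stored under KEY in a flat-keyed interface file,
# where each line has the form "KEY VALUE".
flat_key() {
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# is_populated CGROUP_DIR
# Succeed if "populated" reads 1 in CGROUP_DIR/cgroup.events.
is_populated() {
    [ "$(flat_key "$1/cgroup.events" populated)" = "1" ]
}
```

Because the populated state is recursive, checking the top of a sub-hierarchy covers all of its descendants.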
391 Controlling Controllers
392 -----------------------
397 Each cgroup has a "cgroup.controllers" file which lists all
398 controllers available for the cgroup to enable::
400 # cat cgroup.controllers
401 cpu io memory
403 No controller is enabled by default. Controllers can be enabled and
406 # echo "+cpu +memory -io" > cgroup.subtree_control
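The two files can be combined so that only available controllers are ever requested. The helper below is a sketch under that assumption; ``enable_controllers`` is a hypothetical name, and the cgroup directory passed to it is whatever path the v2 hierarchy is mounted at on the system in question.

```shell
# enable_controllers CGROUP_DIR CTRL...
# Enable each named controller for the children of CGROUP_DIR, but only
# if it is listed in CGROUP_DIR/cgroup.controllers.
enable_controllers() {
    cg=$1
    shift
    # Pad with spaces so each name can be matched as a whole word.
    avail=" $(cat "$cg/cgroup.controllers") "
    req=""
    for c in "$@"; do
        case $avail in
            *" $c "*) req="${req:+$req }+$c" ;;
            *)
                echo "controller '$c' is not available in $cg" >&2
                return 1
                ;;
        esac
    done
    # Nothing requested: leave the file untouched.
    [ -n "$req" ] || return 0
    # All requests go in a single write, e.g. "+cpu +memory".
    printf '%s\n' "$req" > "$cg/cgroup.subtree_control"
}
```

Checking availability first gives a readable error message instead of a failed write.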
408 Only controllers which are listed in "cgroup.controllers" can be
415 Consider the following sub-hierarchy. The enabled controllers are
418 A(cpu,memory) - B(memory) - C()
421 As A has "cpu" and "memory" enabled, A will control the distribution
422 of CPU cycles and memory to its children, in this case, B. As B has
423 "memory" enabled but not "cpu", C and D will compete freely on CPU
424 cycles but their division of memory available to B will be controlled.
430 D. Likewise, disabling "memory" from B would remove the "memory."
432 controller interface files - anything which doesn't start with
436 Top-down Constraint
439 Resources are distributed top-down and a cgroup can further distribute
441 parent. This means that all non-root "cgroup.subtree_control" files
442 can only contain controllers which are enabled in the parent's
451 Non-root cgroups can distribute domain resources to their children
454 controllers enabled in their "cgroup.subtree_control" files.
464 controllers. How resource consumption in the root cgroup is governed
466 refer to the Non-normative information section in the Controllers
474 children before enabling controllers in its "cgroup.subtree_control"
479 ----------
499 delegated, the user can build a sub-hierarchy under the directory,
502 of all resource controllers are hierarchical and regardless of what
503 happens in the delegated sub-hierarchy, nothing can escape the
507 cgroups in or nesting depth of a delegated sub-hierarchy; however,
514 A delegated sub-hierarchy is contained in the sense that processes
515 can't be moved into or out of the sub-hierarchy by the delegatee.
518 requiring the following conditions for a process with a non-root euid
522 - The writer must have write access to the "cgroup.procs" file.
524 - The writer must have write access to the "cgroup.procs" file of the
528 processes around freely in the delegated sub-hierarchy it can't pull
529 in from or push out to outside the sub-hierarchy.
535 ~~~~~~~~~~~~~ - C0 - C00
538 ~~~~~~~~~~~~~ - C1 - C10
545 will be denied with -EACCES.
550 is not reachable, the migration is rejected with -ENOENT.
554 ----------
560 and stateful resources such as memory are not moved together with the
562 inherent trade-offs between migration and various hot paths in terms
568 resource structure once on start-up. Dynamic adjustments to resource
595 cgroup controllers implement several resource distribution schemes
601 -------
607 work-conserving. Due to the dynamic nature, this model is usually
623 ------
626 Limits can be over-committed - the sum of the limits of children can
631 As limits can be over-committed, all configuration combinations are
640 -----------
645 soft boundaries. Protections can also be over-committed in which case
652 As protections can be over-committed, all configuration combinations
656 "memory.low" implements best-effort memory protection and is an
661 -----------
664 resource. Allocations can't be over-committed - the sum of the
671 As allocations can't be over-committed, some configuration
676 "cpu.rt.max" hard-allocates realtime slices and is an example of this
684 ------
689 New-line separated values
697 (when read-only or multiple values can be written at once)
714 reading; however, controllers may allow omitting later fields or
723 -----------
725 - Settings for a single feature should be contained in a single file.
727 - The root cgroup should be exempt from resource control and thus
730 - The default time unit is microseconds. If a different unit is ever
733 - A parts-per quantity should use a percentage decimal with at least
734 two digit fractional part - e.g. 13.40.
736 - If a controller implements weight based resource distribution, its
742 - If a controller implements an absolute resource guarantee and/or
751 - If a setting has a configurable default value and keyed specific
765 # cat cgroup-example-interface-file
771 # echo 125 > cgroup-example-interface-file
775 # echo "default 125" > cgroup-example-interface-file
779 # echo "8:16 170" > cgroup-example-interface-file
783 # echo "8:0 default" > cgroup-example-interface-file
784 # cat cgroup-example-interface-file
788 - For events which are not very high frequency, an interface file
795 --------------------
800 A read-write single value file which exists on non-root
806 - "domain" : A normal valid domain cgroup.
808 - "domain threaded" : A threaded domain cgroup which is
811 - "domain invalid" : A cgroup which is in an invalid state.
812 It can't be populated or have controllers enabled. It may
815 - "threaded" : A threaded cgroup which is a member of a
822 A read-write new-line separated values file which exists on
826 the cgroup one-per-line. The PIDs are not ordered and the
835 - It must have write access to the "cgroup.procs" file.
837 - It must have write access to the "cgroup.procs" file of the
840 When delegating a sub-hierarchy, write access to this file
848 A read-write new-line separated values file which exists on
852 the cgroup one-per-line. The TIDs are not ordered and the
861 - It must have write access to the "cgroup.threads" file.
863 - The cgroup that the thread is currently in must be in the
866 - It must have write access to the "cgroup.procs" file of the
869 When delegating a sub-hierarchy, write access to this file
872 cgroup.controllers
873 A read-only space separated values file which exists on all
876 It shows a space separated list of all controllers available to
877 the cgroup. The controllers are not ordered.
880 A read-write space separated values file which exists on all
883 When read, it shows a space separated list of the controllers
887 Space separated list of controllers prefixed with '+' or '-'
888 can be written to enable or disable controllers. A controller
889 name prefixed with '+' enables the controller and '-'
895 A read-only flat-keyed file which exists on non-root cgroups.
907 A read-write single value file. The default is "max".
914 A read-write single value file. The default is "max".
921 A read-only flat-keyed file with the following entries:
939 A read-write single value file which exists on non-root cgroups.
962 create new sub-cgroups.
965 A write-only single value file which exists in non-root cgroups.
977 the whole thread-group.
980 A read-write single value file whose allowed values are "0" and "1".
984 Writing "1" to the file will re-enable the cgroup PSI accounting.
992 This may cause non-negligible overhead for some workloads when under
994 be used to disable PSI accounting in the non-leaf cgroups.
997 A read-write nested-keyed file.
1002 Controllers
1005 .. _cgroup-v2-cpu:
1008 ---
1010 The "cpu" controller regulates distribution of CPU cycles. This
1036 A read-only flat-keyed file.
1041 - usage_usec
1042 - user_usec
1043 - system_usec
1047 - nr_periods
1048 - nr_throttled
1049 - throttled_usec
1050 - nr_bursts
1051 - burst_usec
1054 A read-write single value file which exists on non-root
1060 A read-write single value file which exists on non-root
1063 The nice value is in the range [-20, 19].
1072 A read-write two value file which exists on non-root cgroups.
1084 A read-write single value file which exists on non-root
1090 A read-write nested-keyed file.
1096 A read-write single value file which exists on non-root cgroups.
1111 A read-write single value file which exists on non-root cgroups.
1123 Memory
1124 ------
1126 The "memory" controller regulates distribution of memory. Memory is
1128 intertwining between memory usage and reclaim pressure and the
1129 stateful nature of memory, the distribution model is relatively
1132 While not completely water-tight, all major memory usages by a given
1133 cgroup are tracked so that the total memory consumption can be
1135 following types of memory usages are tracked.
1137 - Userland memory - page cache and anonymous memory.
1139 - Kernel data structures such as dentries and inodes.
1141 - TCP socket buffers.
1146 Memory Interface Files
1149 All memory amounts are in bytes. If a value which is not aligned to
1153 memory.current
1154 A read-only single value file which exists on non-root
1157 The total amount of memory currently being used by the cgroup
1160 memory.min
1161 A read-write single value file which exists on non-root
1164 Hard memory protection. If the memory usage of a cgroup
1165 is within its effective min boundary, the cgroup's memory
1167 unprotected reclaimable memory available, OOM killer
1173 Effective min boundary is limited by memory.min values of
1174 all ancestor cgroups. If there is memory.min overcommitment
1175 (child cgroup or cgroups are requiring more protected memory
1178 actual memory usage below memory.min.
1180 Putting more memory than generally available under this
1183 If a memory cgroup is not populated with processes,
1184 its memory.min is ignored.
1186 memory.low
1187 A read-write single value file which exists on non-root
1190 Best-effort memory protection. If the memory usage of a
1192 memory won't be reclaimed unless there is no reclaimable
1193 memory available in unprotected cgroups.
1199 Effective low boundary is limited by memory.low values of
1200 all ancestor cgroups. If there is memory.low overcommitment
1201 (child cgroup or cgroups are requiring more protected memory
1204 actual memory usage below memory.low.
1206 Putting more memory than generally available under this
1209 memory.high
1210 A read-write single value file which exists on non-root
1213 Memory usage throttle limit. This is the main mechanism to
1214 control memory usage of a cgroup. If a cgroup's usage goes
1221 memory.max
1222 A read-write single value file which exists on non-root
1225 Memory usage hard limit. This is the final protection
1226 mechanism. If a cgroup's memory usage reaches this limit and
1231 In default configuration regular 0-order allocations always
1236 as -ENOMEM or silently ignore in cases like disk readahead.
1242 memory.reclaim
1243 A write-only nested-keyed file which exists for all cgroups.
1245 This is a simple interface to trigger memory reclaim in the
1253 echo "1G" > memory.reclaim
1257 type of memory to reclaim from (anon, file, ..).
1261 specified amount, -EAGAIN is returned.
1264 interface) is not meant to indicate memory pressure on the
1265 memory cgroup. Therefore socket memory balancing triggered by
1266 the memory reclaim normally is not exercised in this case.
1268 reclaim induced by memory.reclaim.
1270 memory.peak
1271 A read-only single value file which exists on non-root
1274 The max memory usage recorded for the cgroup and its
1277 memory.oom.group
1278 A read-write single value file which exists on non-root
1284 (if the memory cgroup is not a leaf cgroup) are killed
1288 Tasks with the OOM protection (oom_score_adj set to -1000)
1293 memory.oom.group values of ancestor cgroups.
1295 memory.events
1296 A read-only flat-keyed file which exists on non-root cgroups.
1304 memory.events.local.
1308 high memory pressure even though its usage is under
1310 boundary is over-committed.
1314 throttled and routed to perform direct memory reclaim
1315 because the high memory boundary was exceeded. For a
1316 cgroup whose memory usage is capped by the high limit
1317 rather than global memory pressure, this event's
1321 The number of times the cgroup's memory usage was
1326 The number of times the cgroup's memory usage was
1330 considered as an option, e.g. for failed high-order
1340 memory.events.local
1341 Similar to memory.events but the fields in the file are local
1345 memory.stat
1346 A read-only flat-keyed file which exists on non-root cgroups.
1348 This breaks down the cgroup's memory footprint into different
1349 types of memory, type-specific details, and other information
1350 on the state and past events of the memory management system.
1352 All memory amounts are in bytes.
1358 If the entry has no per-node counter (or is not shown in
1359 memory.numa_stat), we use 'npn' (non-per-node) as the tag
1360 to indicate that it will not be shown in memory.numa_stat.
1363 Amount of memory used in anonymous mappings such as
1367 Amount of memory used to cache filesystem data,
1368 including tmpfs and shared memory.
1371 Amount of total kernel memory, including
1373 addition to other kernel memory use cases.
1376 Amount of memory allocated to kernel stacks.
1379 Amount of memory allocated for page tables.
1382 Amount of memory allocated for secondary page tables,
1387 Amount of memory used for storing per-cpu kernel
1391 Amount of memory used in network transmission buffers
1394 Amount of memory used for vmap backed memory.
1397 Amount of cached filesystem data that is swap-backed,
1401 Amount of memory consumed by the zswap compression backend.
1404 Amount of application memory swapped out to zswap.
1418 Amount of swap cached in memory. The swapcache is accounted
1419 against both memory and swap usage.
1422 Amount of memory used in anonymous mappings backed by
1434 Amount of memory, swap-backed and filesystem-backed,
1435 on the internal memory management lists used by the
1439 memory management lists), inactive_foo + active_foo may not be equal to
1440 the value for the foo counter, since the foo counter is type-based, not
1441 list-based.
1448 Part of "slab" that cannot be reclaimed on memory
1452 Amount of memory used for storing in-kernel data
1513 Amount of pages postponed to be freed under memory pressure
1528 memory.numa_stat
1529 A read-only nested-keyed file which exists on non-root cgroups.
1531 This breaks down the cgroup's memory footprint into different
1532 types of memory, type-specific details, and other information
1533 per node on the state of the memory management system.
1541 All memory amounts are in bytes.
1543 The output format of memory.numa_stat is::
1551 The entries can refer to the memory.stat.
1553 memory.swap.current
1554 A read-only single value file which exists on non-root
1560 memory.swap.high
1561 A read-write single value file which exists on non-root
1566 allow userspace to implement custom out-of-memory procedures.
1570 during regular operation. Compare to memory.swap.max, which
1572 continue unimpeded as long as other memory can be reclaimed.
1576 memory.swap.max
1577 A read-write single value file which exists on non-root
1581 limit, anonymous memory of the cgroup will not be swapped out.
1583 memory.swap.events
1584 A read-only flat-keyed file which exists on non-root cgroups.
1600 because of running out of swap system-wide or max
1606 reduces the impact on the workload and memory management.
1608 memory.zswap.current
1609 A read-only single value file which exists on non-root
1612 The total amount of memory consumed by the zswap compression
1615 memory.zswap.max
1616 A read-write single value file which exists on non-root
1623 memory.pressure
1624 A read-only nested-keyed file.
1626 Shows pressure stall information for memory. See
1633 "memory.high" is the main mechanism to control memory usage.
1634 Over-committing on high limit (sum of high limits > available memory)
1635 and letting global memory pressure distribute memory according to
1641 more memory or terminating the workload.
1643 Determining whether a cgroup has enough memory is not trivial as
1644 memory usage doesn't indicate whether the workload can benefit from
1645 more memory. For example, a workload which writes data received from
1646 network to a file can use all available memory but can also perform
1647 just as well with a small amount of memory. A measure of memory
1648 pressure - how much the workload is being impacted due to lack of
1649 memory - is necessary to determine whether a workload needs more
1650 memory; unfortunately, memory pressure monitoring mechanism isn't
1654 Memory Ownership
1657 A memory area is charged to the cgroup which instantiated it and stays
1659 to a different cgroup doesn't move the memory usages that it
1662 A memory area may be used by processes belonging to different cgroups.
1663 To which cgroup the area will be charged is non-deterministic; however,
1664 over time, the memory area is likely to end up in a cgroup which has
1665 enough memory allowance to avoid high reclaim pressure.
1667 If a cgroup sweeps a considerable amount of memory which is expected
1669 POSIX_FADV_DONTNEED to relinquish the ownership of memory areas
1670 belonging to the affected files to ensure correct memory ownership.
1674 --
1679 only if cfq-iosched is in use and neither scheme is available for
1680 blk-mq devices.
1687 A read-only nested-keyed file.
1707 A read-write nested-keyed file which exists only on the root
1719 enable Weight-based control enable
1751 devices which show wide temporary behavior changes - e.g. a
1762 A read-write nested-keyed file which exists only on the root
1775 model The cost model in use - "linear"
1801 generate device-specific coefficients.
1804 A read-write flat-keyed file which exists on non-root cgroups.
1824 A read-write nested-keyed file which exists on non-root
1838 When writing, any number of nested key-value pairs can be
1863 A read-only nested-keyed file.
1874 mechanism. Writeback sits between the memory and IO domains and
1875 regulates the proportion of dirty memory by balancing dirtying and
1878 The io controller, in conjunction with the memory controller,
1879 implements control of page cache writeback IOs. The memory controller
1880 defines the memory domain that dirty memory ratio is calculated and
1882 writes out dirty pages for the memory domain. Both system-wide and
1883 per-cgroup dirty memory states are examined and the more restrictive
1891 There are inherent differences in memory and writeback management
1892 which affect how cgroup ownership is tracked. Memory is tracked per
1897 As cgroup ownership for memory is tracked per page, there can be pages
1909 As the memory controller assigns page ownership on the first use and
1920 amount of available memory capped by limits imposed by the
1921 memory controller and system-wide clean memory.
1925 total available memory and applied the same way as
1954 your real setting, setting at 10-15% higher than the value in io.stat.
1964 - Queue depth throttling. This is the number of outstanding IO's a group is
1968 - Artificial delay induction. There are certain types of IO that cannot be
1986 This takes a similar format as the other controllers.
2015 no-change
2018 none-to-rt
2023 restrict-to-be
2034 +-------------+---+
2035 | no-change | 0 |
2036 +-------------+---+
2037 | none-to-rt | 1 |
2038 +-------------+---+
2039 | rt-to-be | 2 |
2040 +-------------+---+
2041 | all-to-idle | 3 |
2042 +-------------+---+
2046 +-------------------------------+---+
2048 +-------------------------------+---+
2049 | IOPRIO_CLASS_RT (real-time) | 1 |
2050 +-------------------------------+---+
2052 +-------------------------------+---+
2054 +-------------------------------+---+
2058 - Translate the I/O priority class policy into a number.
2059 - Change the request I/O priority class into the maximum of the I/O priority
2063 ---
2070 controllers cannot prevent, thus warranting its own controller. For
2072 hitting memory restrictions.
2082 A read-write single value file which exists on non-root
2088 A read-only single value file which exists on all cgroups.
2098 through fork() or clone(). These will return -EAGAIN if the creation
2103 ------
2106 the CPU and memory node placement of tasks to only the resources
2110 memory placement to reduce cross-node memory access and contention
2114 cannot use CPUs or memory nodes not allowed in its parent.
2121 A read-write multiple values file which exists on non-root
2122 cpuset-enabled cgroups.
2129 The CPU numbers are comma-separated numbers or ranges.
2133 0-4,6,8-10
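A list in this format can be expanded into individual CPU numbers with plain shell arithmetic. The function below is an illustrative sketch, not kernel tooling; ``expand_cpu_list`` is a hypothetical name and the input is assumed to be well formed (non-negative integers, ascending ranges).

```shell
# expand_cpu_list LIST
# Expand a comma-separated list of numbers and ranges, such as
# "0-4,6,8-10", into space-separated individual numbers.
expand_cpu_list() {
    old_ifs=$IFS
    IFS=,
    # Split the list on commas into positional parameters.
    set -- $1
    IFS=$old_ifs
    out=""
    for part in "$@"; do
        case $part in
            *-*) lo=${part%-*}; hi=${part#*-} ;;   # range "lo-hi"
            *)   lo=$part; hi=$part ;;             # single number
        esac
        i=$lo
        while [ "$i" -le "$hi" ]; do
            out="${out:+$out }$i"
            i=$((i + 1))
        done
    done
    printf '%s\n' "$out"
}
```

The same expansion applies to memory node lists, which use the identical comma-and-range syntax.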
2136 setting as the nearest cgroup ancestor with a non-empty
2143 A read-only multiple values file which exists on all
2144 cpuset-enabled cgroups.
2160 A read-write multiple values file which exists on non-root
2161 cpuset-enabled cgroups.
2163 It lists the requested memory nodes to be used by tasks within
2164 this cgroup. The actual list of memory nodes granted, however,
2166 from the requested memory nodes.
2168 The memory node numbers are comma-separated numbers or ranges.
2172 0-1,3
2175 setting as the nearest cgroup ancestor with a non-empty
2176 "cpuset.mems" or all the available memory nodes if none
2180 and won't be affected by any memory nodes hotplug events.
2182 Setting a non-empty value to "cpuset.mems" causes memory of
2184 they are currently using memory outside of the designated nodes.
2186 There is a cost for this memory migration. The migration
2187 may not be complete and some memory pages may be left behind.
2194 A read-only multiple values file which exists on all
2195 cpuset-enabled cgroups.
2197 It lists the onlined memory nodes that are actually granted to
2198 this cgroup by its parent. These memory nodes are allowed to
2201 If "cpuset.mems" is empty, it shows all the memory nodes from the
2204 the memory nodes listed in "cpuset.mems" can be granted. In this
2207 Its value will be affected by memory nodes hotplug events.
2210 A read-write single value file which exists on non-root
2211 cpuset-enabled cgroups. This flag is owned by the parent cgroup
2217 "member" Non-root member of a partition
2223 cannot be changed. All other non-root cgroups start out as
2243 two possible states - valid or invalid. An invalid partition
2254 "member" Non-root member of a partition
2286 A valid non-root parent partition may distribute out all its CPUs
2306 -----------------
2317 on the return value the attempt will succeed or fail with -EPERM.
2322 If the program returns 0, the attempt fails with -EPERM, otherwise it
2330 ----
2339 A read-write nested-keyed file that exists for all the cgroups
2360 A read-only file that describes current resource usage.
2369 -------
2386 A read-only flat-keyed file which exists on non-root cgroups.
2397 Similar to memory.numa_stat, it shows the numa information of the
2399 use hugetlb pages are included. The per-node values are in bytes.
2402 ----
2424 A read-only flat-keyed file shown only in the root cgroup. It shows
2433 A read-only flat-keyed file shown in the non-root cgroups. It shows
2441 A read-write flat-keyed file shown in the non-root cgroups. Allowed
2460 A read-only flat-keyed file which exists on non-root cgroups. The
2478 ------
2489 Non-normative information
2490 -------------------------
2506 appropriately so the neutral - nice 0 - value is 100 instead of 1024).
2522 ------
2541 The path '/batchjobs/container_id1' can be considered as system-data
2546 # ls -l /proc/self/ns/cgroup
2547 lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
2553 # ls -l /proc/self/ns/cgroup
2554 lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
2558 When some thread from a multi-threaded process unshares its cgroup
2570 ------------------
2581 # ~/unshare -c # unshare cgroupns in some cgroup
2589 Each process gets its namespace-specific view of "/proc/$PID/cgroup"
2620 ----------------------
2649 ---------------------------------
2652 running inside a non-init cgroup namespace::
2654 # mount -t cgroup2 none $MOUNT_POINT
2661 the view of cgroup hierarchy by namespace-private cgroupfs mount
2670 controllers are not covered.
2674 --------------------------------
2677 address_space_operations->writepage[s]() to annotate bio's using the
2694 super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for
2711 - Multiple hierarchies including named ones are not supported.
2713 - All v1 mount options are not supported.
2715 - The "tasks" file is removed and "cgroup.procs" is not sorted.
2717 - "cgroup.clone_children" is removed.
2719 - /proc/cgroups is meaningless for v2. Use "cgroup.controllers" file
2727 --------------------
2730 hierarchy could host any number of controllers. While this seemed to
2734 type controllers such as freezer which can be useful in all
2736 the fact that controllers couldn't be moved to another hierarchy once
2737 hierarchies were populated. Another issue was that all controllers
2742 In practice, these issues heavily limited which controllers could be
2745 as the cpu and cpuacct controllers, made sense to be put on the same
2753 used in general and what controllers were able to do.
2759 addition of controllers which existed only to identify membership,
2764 topologies of hierarchies other controllers might be on, each
2765 controller had to assume that all other controllers were attached to
2767 least very cumbersome, for controllers to cooperate with each other.
2769 In most use cases, putting controllers on hierarchies which are
2774 controllers. For example, a given configuration might not care about
2775 how memory is distributed beyond a certain level while still wanting
2780 ------------------
2783 This didn't make sense for some controllers and those controllers
2788 Generally, in-process knowledge is available only to the process
2789 itself; thus, unlike service-level organization of processes,
2796 sub-hierarchies and control resource distributions along them. This
2797 effectively raised cgroup to the status of a syscall-like API exposed
2807 that the process would actually be operating on its own sub-hierarchy.
2809 cgroup controllers implemented a number of knobs which would never be
2811 system-management pseudo filesystem. cgroup ended up with interface
2814 individual applications through the ill-defined delegation mechanism
2824 -------------------------------------------
2830 settle it. Different controllers did different things.
2835 cycles and the number of internal threads fluctuated - the ratios
2849 The memory controller didn't have a way to control what happened
2851 clearly defined. There were attempts to add ad-hoc behaviors and
2855 Multiple controllers struggled with internal tasks and came up with
2865 ----------------------
2869 was how an empty cgroup was notified - a userland helper binary was
2872 to in-kernel event delivery filtering mechanism further complicating
2876 controllers completely ignoring hierarchical organization and treating
2878 cgroup. Some controllers exposed a large amount of inconsistent
2881 There also was no consistency across controllers. When a new cgroup
2882 was created, some controllers defaulted to not imposing extra
2890 controllers so that they expose minimal and consistent interfaces.
2894 ------------------------------
2896 Memory
2901 global reclaim prefers is opt-in, rather than opt-out. The costs for
2911 becomes self-defeating.
2913 The memory.low boundary on the other hand is a top-down allocated
2922 available memory. The memory consumption of workloads varies during
2930 The memory.high boundary on the other hand can be set much more
2936 and make corrections until the minimal memory footprint that still
2943 system than killing the group. Otherwise, memory.max is there to
2947 Setting the original memory.limit_in_bytes below the current usage was
2949 limit setting to fail. memory.max on the other hand will first set the
2951 new limit is met - or the task writing to memory.max is killed.
2953 The combined memory+swap accounting and limiting is replaced by real
2956 The main argument for a combined memory+swap facility in the original
2958 able to swap all anonymous memory of a child group, regardless of the
2960 groups can sabotage swapping by other means - such as referencing its
2961 anonymous memory in a tight loop - and an admin cannot assume full
2966 that cgroup controllers should account and limit specific physical