
This is the authoritative documentation on the design, interface and
conventions of cgroup v2.  It describes all userland-visible aspects
of cgroup including core and specific controller behaviors.  All
future changes must be reflected in this document.  Documentation for
v1 is available under Documentation/admin-guide/cgroup-v1/.

1-2. What is cgroup?
5-N-1. CPU controller root cgroup process behaviour
5-N-2. IO controller root cgroup process behaviour

"cgroup" stands for "control group" and is never capitalized.  The
singular form is used to designate the whole feature and also as a
qualifier as in "cgroup controllers".  When explicitly referring to
multiple individual control groups, the plural form "cgroups" is used.

What is cgroup?
---------------

cgroup is a mechanism to organize processes hierarchically and
distribute system resources along the hierarchy in a controlled and
configurable manner.

cgroup is largely composed of two parts - the core and controllers.
cgroup core is primarily responsible for hierarchically organizing
processes.  A cgroup controller is usually responsible for
distributing a specific type of system resource along the hierarchy.

Every process in the system belongs
to one and only one cgroup.  All threads of a process belong to the
same cgroup.  On creation, all processes are put in the cgroup that
the parent process belongs to at the time.  A process can be migrated
to another cgroup.  Migration of a process doesn't affect already
existing descendant processes.

Following certain structural constraints, controllers can be enabled or
disabled selectively on a cgroup.  All controller behaviors are
hierarchical - if a controller is enabled on a cgroup, it affects all
processes which belong to the cgroups comprising the inclusive
sub-hierarchy of the cgroup.  When a controller is enabled on a nested
cgroup, it always restricts the resource distribution further.  The
restrictions set closer to the root in the hierarchy can not be
overridden from further away.

Unlike v1, cgroup v2 has only a single hierarchy.  The cgroup v2
hierarchy can be mounted with the following mount command::

  # mount -t cgroup2 none $MOUNT_POINT

A controller can be moved across hierarchies only after the controller
is no longer referenced in its current hierarchy.  Because per-cgroup
controller states are destroyed asynchronously and controllers may
have lingering references, a controller may not show up immediately
after the final umount of the previous hierarchy.

During the transition period, a system management software may still
automount the v1 cgroup filesystem and so hijack all controllers
during boot.

cgroup v2 currently supports the following mount options.

  nsdelegate
    Consider cgroup namespaces as delegation boundaries.  This
    option is system wide and can only be set on mount or modified
    through remount from the init namespace.

  memory_localevents
    Only populate memory.events with data for the current cgroup,
    and not any subtrees.
Initially, only the root cgroup exists to which all processes belong.
A child cgroup can be created by creating a sub-directory::

  # mkdir $CGROUP_NAME

A given cgroup may have multiple child cgroups forming a tree
structure.  Each cgroup has a read-writable interface file
"cgroup.procs".  When read, it lists the PIDs of all processes which
belong to the cgroup one-per-line.  The PIDs are not ordered and the
same PID may show up more than once if the process got moved to
another cgroup and then back or the PID got recycled while reading.

A process can be migrated into a cgroup by writing its PID to the
target cgroup's "cgroup.procs" file.  Only one process can be migrated
on a single write(2) call.  If a process is composed of multiple
threads, writing the PID of any thread migrates all threads of the
process.

When a process forks a child process, the new process is born into the
cgroup that the forking process belongs to at the time of the
operation.  After exit, a process stays associated with the cgroup
that it belonged to at the time of exit until it's reaped; however, a
zombie process does not appear in "cgroup.procs" and thus can't be
moved to another cgroup.

A cgroup which doesn't have any children or live processes can be
destroyed by removing the directory.  Note that a cgroup which doesn't
have any children and is associated only with zombie processes is
considered empty and can be removed::

  # rmdir $CGROUP_NAME
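Putting the above together, a minimal cgroup lifecycle can be sketched as
follows.  This is illustrative only and assumes cgroup2 is mounted at
/sys/fs/cgroup; the cgroup name "test" is arbitrary:

```shell
# Create a child cgroup, move the current shell into it,
# move the shell back out, and destroy the now-empty cgroup.
mkdir /sys/fs/cgroup/test
echo $$ > /sys/fs/cgroup/test/cgroup.procs
cat /proc/self/cgroup                    # membership now under /test
echo $$ > /sys/fs/cgroup/cgroup.procs
rmdir /sys/fs/cgroup/test
```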
"/proc/$PID/cgroup" lists a process's cgroup membership.  If legacy
cgroup is in use in the system, this file may contain multiple lines,
one for each hierarchy.  The entry for cgroup v2 is always in the
format "0::$PATH"::

  # cat /proc/842/cgroup
  ...
  0::/test-cgroup/test-cgroup-nested

If the process becomes a zombie and the cgroup it was associated with
is removed subsequently, " (deleted)" is appended to the path::

  # cat /proc/842/cgroup
  ...
  0::/test-cgroup/test-cgroup-nested (deleted)

cgroup v2 supports thread granularity for a subset of controllers to
support use cases requiring hierarchical resource distribution across
the threads of a group of processes.  By default, all threads of a
process belong to the same cgroup, which also serves as the resource
domain to host resource consumptions which are not specific to a
process or thread.

Marking a cgroup threaded makes it join the resource domain of its
parent as a threaded cgroup.  The parent may be another threaded
cgroup whose resource domain is further up in the hierarchy.  The root
of a threaded subtree, that is, the nearest ancestor which is not
threaded, is called the threaded domain and serves as the resource
domain for the entire subtree.

As the threaded domain cgroup hosts all the domain resource
consumptions of the subtree, it is considered to have internal
resource consumptions whether there are processes in it or not.
Because the
root cgroup is not subject to the no internal process constraint, it can
serve both as a threaded domain and a parent to domain cgroups.
The current operation mode or type of the cgroup is shown in the
"cgroup.type" file which indicates whether the cgroup is a normal
domain, a domain which is serving as the domain of a threaded subtree,
or a threaded cgroup.

On creation, a cgroup is always a domain cgroup and can be made
threaded by writing "threaded" to the "cgroup.type" file.  The
operation is single direction::

  # echo threaded > cgroup.type

Once threaded, the cgroup can't be made a domain again.  To enable the
thread mode, the following condition must be met.

- As the cgroup will join the parent's resource domain, the parent
  must either be a valid (threaded) domain or a threaded cgroup.

Topology-wise, a cgroup can be in an invalid state.  For example, a
domain cgroup created under a threaded cgroup isn't connected to a
parent which can host child domains and can't be used until it is
turned into a
threaded cgroup.  The "cgroup.type" file will report "domain (invalid)" in
such cases.

A domain cgroup is turned into a threaded domain when one of its child
cgroups becomes threaded or threaded controllers are enabled in the
"cgroup.subtree_control" file while there are processes in the cgroup.
The threaded domain reverts to a normal domain when the conditions
clear.
When read, "cgroup.threads" contains the list of the thread IDs of all
threads in the cgroup.  Except that the operations are per-thread
instead of per-process, "cgroup.threads" has the same format and
behaves the same way as "cgroup.procs".  While "cgroup.threads" can be
written to in any cgroup, as it can only move threads inside the same
threaded domain, its operations are confined inside each threaded
subtree.

The threaded domain cgroup serves as the resource domain for the whole
subtree, and, while the threads can be scattered across the subtree,
all the processes are considered to be in the threaded domain cgroup.
"cgroup.procs" in a threaded domain cgroup contains the PIDs of all
processes in the subtree.
However, "cgroup.procs" can be written to from anywhere in the subtree
to migrate all threads of the matching process to the cgroup.

Only threaded controllers can be enabled in a threaded subtree.  A
threaded controller only accounts for and controls consumptions
associated with the
threads in the cgroup and its descendants.  All consumptions which
aren't tied to a specific thread belong to the threaded domain cgroup.

Because a threaded subtree is exempt from the no internal process
constraint, a threaded controller must be able to handle competition
between threads in a non-leaf cgroup and its child cgroups.  Each
threaded controller defines how such competitions are handled.

Each non-root cgroup has a "cgroup.events" file which contains a
"populated" field indicating whether the cgroup's sub-hierarchy has
live processes in it.  Its value is 0 if there is no live process in
the cgroup and its descendants; otherwise, 1.  poll and [id]notify
events are triggered when the value changes.

Consider the following sub-hierarchy where the numbers in parentheses
represent the numbers of processes
in each cgroup::

  A(4) - B(0) - C(1)
              \ D(0)

A, B and C's "populated" fields would be 1 while D's would be 0.  After
the one process in C exits, B and C's "populated" fields will flip to
"0" and
file modified events will be generated on the "cgroup.events" files of
both cgroups.
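Since "populated" transitions generate file modified events, an agent can
wait for a sub-hierarchy to empty without polling.  A sketch using
inotifywait from the inotify-tools package (the cgroup path is
hypothetical):

```shell
# Block on cgroup.events modifications until the sub-hierarchy
# reports no live processes, then clean up.
events=/sys/fs/cgroup/test/cgroup.events
while grep -q 'populated 1' "$events"; do
    inotifywait -e modify "$events"
done
rmdir /sys/fs/cgroup/test
```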
Each cgroup has a "cgroup.controllers" file which lists all
controllers available for the cgroup to enable::

  # cat cgroup.controllers
  cpu io memory

No controller is enabled by default.  Controllers can be enabled and
disabled by writing to the "cgroup.subtree_control" file::

  # echo "+cpu +memory -io" > cgroup.subtree_control

Only controllers which are listed in "cgroup.controllers" can be
enabled.  When multiple operations are specified as above, either they
all succeed or all fail.

Enabling a controller in a cgroup indicates that the distribution of
its target resource across its immediate children will be controlled.
As a controller regulates the distribution of the target resource to
the cgroup's children, enabling it creates the controller's interface
files in the child cgroups.  Such controller interface files - anything
which doesn't start with
"cgroup." - are owned by the parent rather than the cgroup itself.

Resources are distributed top-down and a cgroup can further distribute
a resource only if the resource has been distributed to it from the
parent.  This means that all non-root "cgroup.subtree_control" files
can only contain controllers which are enabled in the parent's
"cgroup.subtree_control" file.  A controller can be enabled only if
the parent has the controller enabled and a controller can't be
disabled if one or more children have it enabled.

Non-root cgroups can distribute domain resources to their children
only when they don't have any processes of their own.  In other words,
only domain cgroups which don't contain any processes can have domain
controllers enabled in their "cgroup.subtree_control" files.

The root cgroup is exempt from this restriction.  Root contains
processes and anonymous resource consumptions which can't be
associated with any other cgroups and requires special treatment from
most controllers.  How resource consumption in the root cgroup is governed
is up to each controller.

Note that the restriction doesn't get in the way if there is no
enabled controller in the cgroup's "cgroup.subtree_control".  This is
important as otherwise it wouldn't be possible to create children of a
populated cgroup.  To control resource distribution of a cgroup, the
cgroup must create children and transfer all its processes to the
children before enabling controllers in its "cgroup.subtree_control"
file.
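The transfer described above can be sketched as follows, assuming cgroup2
is mounted at /sys/fs/cgroup and an already-populated cgroup named "job"
(both names are illustrative):

```shell
# A populated domain cgroup can't enable domain controllers
# directly.  Create a leaf child, move every process into it,
# then enable controllers for distribution among children.
mkdir /sys/fs/cgroup/job/leaf
for pid in $(cat /sys/fs/cgroup/job/cgroup.procs); do
    echo "$pid" > /sys/fs/cgroup/job/leaf/cgroup.procs
done
echo "+cpu +memory" > /sys/fs/cgroup/job/cgroup.subtree_control
```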
A cgroup can be delegated in two ways.  First, to a less privileged
user by granting write access of the directory and its "cgroup.procs",
"cgroup.threads" and "cgroup.subtree_control" files to the user.
Second, if the "nsdelegate" mount option is set, automatically to a
cgroup namespace on namespace creation.

For the second delegation method, the
kernel rejects writes to all files other than "cgroup.procs" and
"cgroup.subtree_control" on a namespace root from inside the
namespace.

Currently, cgroup doesn't impose any restrictions on the number of
cgroups in or nesting depth of a delegated sub-hierarchy; however,
this may be limited explicitly in the future.

Delegation doesn't grant an unrestricted ability
to migrate a target process into a cgroup by writing its PID to the
"cgroup.procs" file.

- The writer must have write access to the "cgroup.procs" file.

- The writer must have write access to the "cgroup.procs" file of the
  common ancestor of the source and destination cgroups.

For example, consider cgroups C0 and C1 delegated to user U0, with C00
and C01 created under C0 and C10 created under C1, and all processes
under C0 and C1 belonging to U0.  Let's say U0 wants to write the PID
of a process
currently in C10 into "C00/cgroup.procs".  U0 has write access to the
file; however, the common ancestor of the source cgroup C10 and the
destination cgroup C00 is above the points of delegation and U0 would
not have write access to its "cgroup.procs" files and thus the write
will be denied.

A process should be assigned
to a cgroup according to the system's logical and
resource structures once on start-up; frequent migrations across
cgroups should be avoided.

Interface files for a cgroup and its children cgroups occupy the same
directory and it is possible to create children cgroups which collide
with interface files.

All cgroup core interface files are prefixed with "cgroup." and each
controller's interface files are prefixed with the controller name and
a dot.

cgroup doesn't do anything to prevent name collisions and it's the
user's responsibility to avoid them.
cgroup controllers implement several resource distribution schemes
depending on the resource type and expected use cases.

"io.max" limits the maximum BPS and/or IOPS that a cgroup can consume
on an IO device and is an example of the limit model.

Under the protection model,
a cgroup is protected up to the configured amount of the resource
as long as the usages of all its ancestors are under their protected
levels.

Under the allocation model,
a cgroup is exclusively allocated a certain amount of a finite
resource.

- The root cgroup should be exempt from resource control and thus
  shouldn't have resource control interface files.  Also,
  informational files on the root cgroup which end up showing global
  information available elsewhere shouldn't exist.

For example, a setting which is keyed by major:minor device numbers
with a per-device override may look like the following::

  # cat cgroup-example-interface-file
  default 150
  8:0 300

The default value can be updated by::

  # echo 125 > cgroup-example-interface-file

or::

  # echo "default 125" > cgroup-example-interface-file

A per-device override can be set by::

  # echo "8:16 170" > cgroup-example-interface-file

and cleared by::

  # echo "8:0 default" > cgroup-example-interface-file
  # cat cgroup-example-interface-file
  default 125
  8:16 170
All cgroup core files are prefixed with "cgroup."

  cgroup.type
    A read-write single value file which exists on non-root cgroups.

    When read, it indicates the current type of the cgroup, which
    can be one of the following values.

    - "domain" : A normal valid domain cgroup.

    - "domain threaded" : A threaded domain cgroup which is
      serving as the root of a threaded subtree.

    - "domain invalid" : A cgroup which is in an invalid state.
      It can't be populated or have controllers enabled.  It may
      be allowed to become a threaded cgroup.

    - "threaded" : A threaded cgroup which is a member of a
      threaded subtree.

    A cgroup can be turned into a threaded cgroup by writing
    "threaded" to this file.

  cgroup.procs
    A read-write new-line separated values file which exists on
    all cgroups.

    When read, it lists the PIDs of all processes which belong to
    the cgroup one-per-line.  The PIDs are not ordered and the
    same PID may show up more than once if the process got moved
    to another cgroup and then back or the PID got recycled while
    reading.

    A PID can be written to migrate the process associated with
    the PID to the cgroup.  The writer should match all of the
    following conditions.

    - It must have write access to the "cgroup.procs" file.

    - It must have write access to the "cgroup.procs" file of the
      common ancestor of the source and destination cgroups.

    In a threaded cgroup, reading this file fails with EOPNOTSUPP
    as all the processes belong to the threaded domain cgroup.
    Writing is
    supported and moves every thread of the process to the cgroup.

  cgroup.threads
    A read-write new-line separated values file which exists on
    all cgroups.

    When read, it lists the TIDs of all threads which belong to
    the cgroup one-per-line.  The TIDs are not ordered and the
    same TID may show up more than once if the thread got moved to
    another cgroup and then back or the TID got recycled while
    reading.

    A TID can be written to migrate the thread associated with the
    TID to the cgroup.  The writer should match all of the
    following conditions.

    - It must have write access to the "cgroup.threads" file.

    - The cgroup that the thread is currently in must be in the
      same resource domain as the destination cgroup.

    - It must have write access to the "cgroup.procs" file of the
      common ancestor of the source and destination cgroups.

  cgroup.controllers
    A read-only space separated values file which exists on all
    cgroups.

    It shows a space separated list of all controllers available to
    the cgroup.  The controllers are not ordered.

  cgroup.subtree_control
    A read-write space separated values file which exists on all
    cgroups.  Starts out empty.

    When read, it shows a space separated list of the controllers
    which are enabled to control resource distribution from the
    cgroup to its children.

  cgroup.events
    A read-only flat-keyed file which exists on non-root cgroups.

      populated
        1 if the cgroup or its descendants contains any live
        processes; otherwise, 0.

      frozen
        1 if the cgroup is frozen; otherwise, 0.

  cgroup.max.descendants
    A read-write single value file.  The default is "max".

    Maximum allowed number of descendant cgroups.  If the actual
    number of descendants is equal or larger,
    an attempt to create a new cgroup in the hierarchy will fail.

  cgroup.max.depth
    A read-write single value file.  The default is "max".

    Maximum allowed descent depth below the current cgroup.
    If the actual descent depth is equal or larger,
    an attempt to create a new child cgroup will fail.
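For example, the two limits might be combined to cap the size of a
delegated sub-hierarchy (the path and values are illustrative):

```shell
# Allow at most 100 descendant cgroups, nested at most
# 3 levels deep, under the delegated cgroup.
echo 100 > /sys/fs/cgroup/delegated/cgroup.max.descendants
echo 3   > /sys/fs/cgroup/delegated/cgroup.max.depth
```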
  cgroup.stat
    A read-only flat-keyed file with the following entries:

      nr_descendants
        Total number of visible descendant cgroups.

      nr_dying_descendants
        Total number of dying descendant cgroups.  A cgroup becomes
        dying after being deleted by a user.  The cgroup will remain
        in the dying state for some undefined time (which can depend
        on system load) before being completely destroyed.

        A process can't enter a dying cgroup under any circumstances,
        and a dying cgroup can't revive.

        A dying cgroup can consume system resources not exceeding
        limits, which were active at the moment of cgroup deletion.

  cgroup.freeze
    A read-write single value file which exists on non-root cgroups.
    Allowed values are "0" and "1".  The default is "0".

    Writing "1" to the file causes freezing of the cgroup and all
    descendant cgroups.  This means that all belonging processes will
    be stopped and will not run until the cgroup is explicitly
    unfrozen.  Freezing of the cgroup may take some time; when this action
    is completed, the "frozen" value in the cgroup.events control file
    will be updated to "1" and the corresponding notification will be
    issued.

    A cgroup can be frozen either by its own settings, or by settings
    of any ancestor cgroups.  If any ancestor cgroup is frozen, the
    cgroup will remain frozen.

    Processes in the frozen cgroup can be killed by a fatal signal.
    They also can enter and leave a frozen cgroup: either by an explicit
    move by a user, or if freezing of the cgroup races with fork().
    If a process is moved to a frozen cgroup, it stops.  If a process is
    moved out of a frozen cgroup, it becomes running.

    Frozen status of a cgroup doesn't affect any cgroup tree operations:
    it's possible to delete a frozen (and empty) cgroup, as well as
    create new sub-cgroups.
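A freeze/thaw cycle can be sketched as follows (the cgroup path is
hypothetical):

```shell
# Freeze the cgroup, wait for the transition to complete by
# watching cgroup.events, then thaw it again.
echo 1 > /sys/fs/cgroup/test/cgroup.freeze
until grep -q 'frozen 1' /sys/fs/cgroup/test/cgroup.events; do
    sleep 0.1
done
echo 0 > /sys/fs/cgroup/test/cgroup.freeze
```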
cgroup2 doesn't yet support control of realtime processes and the cpu
controller can only be enabled when all RT processes are in
the root cgroup.  Be aware that system management software may already
have placed RT processes into non-root cgroups during the boot
process, and these processes may need to be moved to the root cgroup
before the cpu controller can be enabled.

While not completely water-tight, all major memory usages by a given
cgroup are tracked so that the total memory consumption can be
accounted and controlled to a reasonable extent.

  memory.current
    The total amount of memory currently being used by the cgroup
    and its descendants.

  memory.min
    Hard memory protection.  If the memory usage of a cgroup
    is within its effective min boundary, the cgroup's memory
    won't be reclaimed under any conditions.  If there is no
    unprotected reclaimable memory available, the OOM killer
    is invoked.

    Effective min boundary is limited by memory.min values of
    all ancestor cgroups.  If there is memory.min overcommitment
    (child cgroup or cgroups are requiring more protected memory
    than parent will allow), then each child cgroup will get
    the part of parent's protection proportional to its
    actual memory usage below memory.min.

    If a memory cgroup is not populated with processes,
    its memory.min is ignored.

  memory.low
    Best-effort memory protection.  If the memory usage of a
    cgroup is within its effective low boundary, the cgroup's
    memory won't be reclaimed unless memory can be reclaimed
    from unprotected cgroups.

    Effective low boundary is limited by memory.low values of
    all ancestor cgroups.  If there is memory.low overcommitment
    (child cgroup or cgroups are requiring more protected memory
    than parent will allow), then each child cgroup will get
    the part of parent's protection proportional to its
    actual memory usage below memory.low.

  memory.high
    Memory usage throttle limit.  This is the main mechanism to
    control memory usage of a cgroup.  If a cgroup's usage goes
    over the high boundary, the processes of the cgroup are
    throttled and put under heavy reclaim pressure.

  memory.max
    Memory usage hard limit.  This is the final protection
    mechanism.  If a cgroup's memory usage reaches this limit and
    can't be reduced, the OOM killer is invoked in the cgroup.
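As a sketch, a workload might be given a throttling target below a hard
cap (the path and values are illustrative):

```shell
# Apply reclaim pressure past 1G of usage; invoke the OOM
# killer only if usage can't be kept under 2G.
echo 1G > /sys/fs/cgroup/job/memory.high
echo 2G > /sys/fs/cgroup/job/memory.max
cat /sys/fs/cgroup/job/memory.current
```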
  memory.oom.group
    Determines whether the cgroup should be treated as
    an indivisible workload by the OOM killer.  If set,
    all tasks belonging to the cgroup or to its descendants
    (if the memory cgroup is not a leaf cgroup) are killed
    together or not at all.

    If the OOM killer is invoked in a cgroup, it's not going
    to kill any tasks outside of this cgroup, regardless of
    the memory.oom.group values of ancestor cgroups.

  memory.events
    The entries in this file are hierarchical and include events for
    the whole sub-
    hierarchy.  For the local events at the cgroup level see
    memory.events.local.

      low
        The number of times the cgroup is reclaimed due to
        high memory pressure even though its usage is under
        the low boundary.

      high
        The number of times processes of the cgroup are
        throttled and routed to perform direct memory reclaim
        because the high boundary was exceeded.  For a
        cgroup whose memory usage is capped by the high limit
        rather than global memory pressure, this event's
        occurrences are expected.

      max
        The number of times the cgroup's memory usage was
        about to go over the max boundary.  If direct reclaim
        fails to bring it down, the cgroup goes to OOM state.

      oom
        The number of times the cgroup's memory usage
        reached the limit and allocation was about to fail.
        A failed allocation could be returned to userspace
        as -ENOMEM or silently ignored in cases like
        disk readahead.  For now OOM in memory cgroup kills
        tasks iff shortage has happened inside page fault.

      oom_kill
        The number of processes belonging to this cgroup
        killed by any kind of OOM killer.

  memory.events.local
    Similar to memory.events but the fields in the file are local
    to the cgroup i.e. not hierarchical.  The file modified event
    generated on this file reflects only the local events.

  memory.stat
    This breaks down the cgroup's memory footprint into different
    types of memory, type-specific details, and other information
    on the state and past events of the memory management system.

  memory.swap.current
    The total amount of swap currently being used by the cgroup
    and its descendants.

  memory.swap.max
    Swap usage hard limit.  If a cgroup's swap usage reaches this
    limit, anonymous memory of the cgroup will not be swapped out.

  memory.swap.events
      max
        The number of times the cgroup's swap usage was about
        to go over the max boundary and swap allocation failed.
Because "memory.high"
throttles the offending cgroup, a management agent has ample
opportunities to monitor and take appropriate actions such as granting
more memory or terminating the workload.

Determining whether a cgroup has enough memory is not trivial as
memory usage alone doesn't indicate whether the workload can benefit
from more memory.

A memory area is charged to the cgroup which instantiated it and stays
charged to the cgroup until the area is released.  Migrating a process
to a different cgroup doesn't move the memory usages that it
instantiated while in the previous cgroup to the new cgroup.

A memory area may be used by processes belonging to different cgroups.
To which cgroup the area will be charged is indeterminate; however,
over time, the memory area is likely to end up in a cgroup which has
enough memory allowance to avoid high reclaim pressure.

If a cgroup sweeps a considerable amount of memory which is expected
to be accessed repeatedly by other cgroups, it may make sense to use
POSIX_FADV_DONTNEED to relinquish the ownership of the affected memory
areas.

If needed, tools/cgroup/iocost_coef_gen.py can be used to
generate device-specific coefficients.

The weight specifies the relative amount of IO time
the cgroup can use in relation to its siblings.

When dirtying pages, both system-wide and
per-cgroup dirty memory states are examined and the more restrictive
of the two is enforced.
cgroup writeback requires explicit support from the underlying
filesystem.  Currently, cgroup writeback is implemented on ext2, ext4
and btrfs.  On other filesystems, all writeback IOs are attributed to
the root cgroup.

There is an inherent mismatch between memory and writeback management
which affects how cgroup ownership is tracked.  Memory is tracked per
page while writeback per inode.  For the purpose of writeback, an
inode is assigned to a cgroup and all IO requests to write dirty pages
from the inode are attributed to that cgroup.

As cgroup ownership for memory is tracked per page, there can be pages
which are associated with different cgroups than the one the inode is
associated with.  These are called foreign pages.  The writeback
constantly keeps track of foreign conditions and, if a particular foreign
cgroup becomes the majority over a certain period of time, switches
the ownership of the inode to that cgroup.

While this model is enough for most use cases where a given inode is
mostly dirtied by a single cgroup even when the main writing cgroup
changes over time, use cases where multiple cgroups write to a single
inode simultaneously are not supported well.

The sysctl knobs which affect writeback behavior are applied to cgroup
writeback as follows.

  vm.dirty_background_ratio, vm.dirty_ratio
    These ratios apply the same to cgroup writeback with the
    amount of available memory capped by limits imposed by the
    memory controller and system-wide clean memory.

  vm.dirty_background_bytes, vm.dirty_bytes
    For cgroup writeback, this is calculated into ratio against
    total available memory and applied the same way as
    vm.dirty[_background]_ratio.
This is a cgroup v2 controller for IO workload protection.  You provide a group
with a latency target, and if the average latency exceeds that target,
the controller throttles peers with lower latency targets than the
protected group.

The process number controller is used to allow a cgroup to stop any
new tasks from being fork()'d or clone()'d after a specified limit is
reached.

The number of tasks in a cgroup can be exhausted in ways which other
controllers cannot prevent, thus warranting its own controller.  For
example, a fork bomb is likely to exhaust the number of tasks before
hitting memory restrictions.

  pids.current
    The number of processes currently in the cgroup and its
    descendants.

Organisational operations are not blocked by cgroup policies, so it is
possible to have pids.current > pids.max.  This can be done by either
setting the limit to be smaller than pids.current, or attaching enough
processes to the cgroup such that pids.current is larger than
pids.max.  However, it is not possible to violate a cgroup PID policy
through fork() or clone(); these will return -EAGAIN if the creation
of a new process would cause a cgroup policy to be violated.
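For example (the path and limit value are illustrative):

```shell
# Cap the cgroup at 50 tasks; once pids.current reaches the
# limit, further fork()/clone() inside it fails with -EAGAIN.
echo 50 > /sys/fs/cgroup/job/pids.max
cat /sys/fs/cgroup/job/pids.current
```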
The "cpuset" controller constrains the CPU and memory node placement
of tasks to only the resources
specified in the cpuset interface files in a task's current cgroup.

  cpuset.cpus
    It lists the requested CPUs to be used by tasks within this
    cgroup.  The actual list of CPUs to be granted, however, is
    subjected to constraints imposed by its parent and can differ
    from the requested CPUs.

    An empty value indicates that the cgroup is using the same
    setting as the nearest cgroup ancestor with a non-empty
    "cpuset.cpus" or all the available CPUs if none is found.

  cpuset.cpus.effective
    It lists the onlined CPUs that are actually granted to this
    cgroup by its parent.  These CPUs are allowed to be used by
    tasks within the current cgroup.

    If "cpuset.cpus" is empty, this file shows
    all the CPUs from the parent cgroup that can be available to
    be used by this cgroup.  Otherwise, it should be a subset of
    "cpuset.cpus".

  cpuset.mems
    It lists the requested memory nodes to be used by tasks within
    this cgroup.  The actual list of memory nodes granted, however,
    is subjected to constraints imposed by its parent and can
    differ from the requested memory nodes.

    An empty value indicates that the cgroup is using the same
    setting as the nearest cgroup ancestor with a non-empty
    "cpuset.mems" or all the available memory nodes if none is
    found.

  cpuset.mems.effective
    It lists the memory nodes that are actually granted to
    this cgroup by its parent.  These memory nodes are allowed to
    be used by tasks within the current cgroup.

    If "cpuset.mems" is empty, it shows all the memory nodes from the
    parent cgroup that will be available to be used by this cgroup.
    Otherwise, it should be a subset of "cpuset.mems".
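A sketch of constraining placement (CPU and node numbers are
illustrative):

```shell
# Restrict tasks in the cgroup to CPUs 0-3 and memory node 0,
# then verify what the parent actually granted.
echo "0-3" > /sys/fs/cgroup/job/cpuset.cpus
echo "0"   > /sys/fs/cgroup/job/cpuset.mems
cat /sys/fs/cgroup/job/cpuset.cpus.effective
```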
  cpuset.cpus.partition
    A read-write single value file which exists on non-root
    cpuset-enabled cgroups.  This flag is owned by the parent cgroup
    and is not delegatable.

    When set to be a partition root, the current cgroup is the
    root of a new partition or scheduling domain that comprises
    itself and all its descendants except those that are separate
    partition roots themselves and their descendants.  The root
    cgroup is always a partition root.

    It can only be set in a cgroup if all the following conditions
    are true.

    1) The "cpuset.cpus" is not empty and the listed CPUs are
       exclusive, i.e. not shared by any sibling.

    2) The parent cgroup is a partition root.

    3) The "cpuset.cpus" is a subset of the parent's
       "cpuset.cpus.effective".

    Setting it to partition root will take the CPUs away from the
    effective CPUs of the parent cgroup.  Once it is set, this
    file cannot be reverted back to "member" if there are any child
    cgroups with cpuset enabled.

    A partition root becomes invalid if not all the CPUs requested
    in "cpuset.cpus" can be granted by the parent cgroup or the
    parent cgroup is no longer a partition root itself.  In this
    case, it is not a real partition even though it may still look
    like one.  The cpu affinity of all the tasks in the cgroup will
    then be associated with CPUs in the nearest ancestor partition.

The cgroup v2 device controller has no interface files and is implemented
on top of cgroup BPF.  To control access to device files, a user may
create bpf programs of the BPF_CGROUP_DEVICE type and attach them to
cgroups.

It exists for all cgroups except the root.

perf events can
always be filtered by cgroup v2 path.  The controller can still be
used on a legacy hierarchy.
CPU controller root cgroup process behaviour
--------------------------------------------

When distributing CPU cycles in the root cgroup each thread in this
cgroup is treated as if it was hosted in a separate child cgroup of the
root cgroup.  This child cgroup weight is dependent on its thread nice
level.

IO controller root cgroup process behaviour
-------------------------------------------

Root cgroup processes are hosted in an implicit leaf child node.
When distributing IO resources, this implicit child node is taken into
account as if it was a normal child cgroup of the root cgroup with a
weight value of 200.

cgroup namespace provides a mechanism to virtualize the view of the
"/proc/$PID/cgroup" file and cgroup mounts.  The CLONE_NEWCGROUP clone
flag can be used with clone(2) and unshare(2) to create a new cgroup
namespace.  The process running inside the cgroup namespace will have
its "/proc/$PID/cgroup" output restricted to cgroupns root.  The
cgroupns root is the cgroup of the process at the time of creation of
the cgroup namespace.
Without cgroup namespace, the "/proc/$PID/cgroup" file shows the
complete path of the cgroup of a process.  In a container setup where
a set of cgroups and namespaces are intended to isolate processes, the
"/proc/$PID/cgroup" file may leak potential system level information
to the isolated processes::

  # cat /proc/self/cgroup
  0::/batchjobs/container_id1

The path '/batchjobs/container_id1' can be considered as system-data
and undesirable to expose to the isolated processes.  cgroup namespace
can be used to restrict visibility of this path.  For example, before
creating a cgroup namespace, one would see::

  # ls -l /proc/self/ns/cgroup
  lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
  # cat /proc/self/cgroup
  0::/batchjobs/container_id1

After unsharing a new namespace, the view changes::

  # ls -l /proc/self/ns/cgroup
  lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
  # cat /proc/self/cgroup
  0::/
When some thread from a multi-threaded process unshares its cgroup
namespace, the new cgroupns gets applied to the entire process (all
the threads).

A cgroup namespace is alive as long as there are processes inside or
mounts pinning it.  When the last usage goes away, the cgroup
namespace is destroyed; the cgroupns root and the actual cgroups
remain.

The 'cgroupns root' for a cgroup namespace is the cgroup in which the
process calling unshare(2) is running.  For example, if a process in
/batchjobs/container_id1 cgroup calls unshare, cgroup
/batchjobs/container_id1 becomes the cgroupns root.  For the
init_cgroup_ns, this is the real root ('/') cgroup.

The cgroupns root cgroup does not change even if the namespace creator
process later moves to a different cgroup::

  # ~/unshare -c # unshare cgroupns in some cgroup
  # cat /proc/self/cgroup
  0::/
  # mkdir sub_cgrp_1
  # echo 0 > sub_cgrp_1/cgroup.procs
  # cat /proc/self/cgroup
  0::/sub_cgrp_1

Each process gets its namespace-specific view of "/proc/$PID/cgroup".

Processes running inside the cgroup namespace will be able to see
cgroup paths (in /proc/self/cgroup) only inside their root cgroup.
From within an unshared cgroupns::

  # sleep 100000 &
  [1] 7353
  # echo 7353 > sub_cgrp_1/cgroup.procs
  # cat /proc/7353/cgroup
  0::/sub_cgrp_1

From the initial cgroup namespace, the real cgroup path will be
visible::

  $ cat /proc/7353/cgroup
  0::/batchjobs/container_id1/sub_cgrp_1

From a sibling cgroup namespace (that is, a namespace rooted at a
different cgroup), the cgroup path relative to its own cgroup
namespace root will be shown.  For instance, if PID 7353's cgroup
namespace root is at '/batchjobs/container_id2', then it will see::

  # cat /proc/7353/cgroup
  0::/../container_id1/sub_cgrp_1

Note that the relative path always starts with '/' to indicate that
it's relative to the cgroup namespace root of the caller.

Processes inside a cgroup namespace can move into and out of the
namespace root if they have proper access to external cgroups.  For
example, from inside a namespace with cgroupns root at
/batchjobs/container_id1, and assuming that the global hierarchy is
still accessible inside the cgroupns::

  # cat /proc/7353/cgroup
  0::/sub_cgrp_1
  # echo 7353 > batchjobs/container_id2/cgroup.procs
  # cat /proc/7353/cgroup
  0::/../container_id2

Note that this kind of setup is not encouraged.  A task inside cgroup
namespace should only be exposed to its own cgroupns hierarchy.

setns(2) to another cgroup namespace is allowed when:

(a) the process has CAP_SYS_ADMIN against its current user namespace
(b) the process has CAP_SYS_ADMIN against the target cgroup
    namespace's userns

No implicit cgroup changes happen with attaching to another cgroup
namespace.  It is expected that someone moves the attaching
process under the target cgroup namespace root.

A namespace specific cgroup hierarchy can be mounted by a process
running inside a non-init cgroup namespace::

  # mount -t cgroup2 none $MOUNT_POINT

This will mount the unified cgroup hierarchy with cgroupns root as the
filesystem root.  The process needs CAP_SYS_ADMIN against its user and
mount namespaces.

The virtualization of the /proc/self/cgroup file combined with restricting
the view of cgroup hierarchy by namespace-private cgroupfs mount
provides a properly isolated cgroup view inside the container.
This section contains kernel programming information in the areas
where interacting with cgroup is necessary.  cgroup core and
controllers are not covered.

A filesystem can support cgroup writeback by updating
address_space_operations->writepage[s]() to annotate bio's:

  wbc_init_bio(@wbc, @bio)
    Should be called for each bio carrying writeback data and
    associates the bio with the inode's owner cgroup and the
    corresponding request queue.

With writeback bio's annotated, cgroup support can be enabled per
super_block by setting SB_I_CGROUPWB in ->s_iflags.  This allows for
selective disabling of cgroup writeback support which is helpful when
certain filesystem features, e.g. journaled data mode, are
incompatible.

wbc_init_bio() binds the specified bio to its cgroup.  Depending on
the configuration, the bio may be executed at a lower priority and if
the writeback session is holding shared resources, e.g. a journal
entry, may lead to priority inversion.

- The "tasks" file is removed and "cgroup.procs" is not sorted.

- "cgroup.clone_children" is removed.

- /proc/cgroups is meaningless for v2.  Use the "cgroup.controllers" file
  at the root instead.
cgroup v1 allowed an arbitrary number of hierarchies and each
hierarchy could host any number of controllers.  While this seemed to
provide a high level of flexibility, it wasn't useful in practice.

It greatly complicated cgroup core implementation but more importantly
the support for multiple hierarchies restricted how cgroup could be
used in general and what controllers were able to do.  There was no
limit on how many hierarchies there might be, which meant
that a thread's cgroup membership couldn't be described in finite
length.

cgroup v1 allowed threads of a process to belong to different cgroups.
This didn't make sense for some controllers and those controllers
ended up implementing different ways to ignore such situations, but
much more importantly it blurred the line between the API exposed to
individual applications and the system management interface.

cgroup v1 had an ambiguously defined delegation model which got abused
in combination with thread granularity.  cgroups were delegated to
individual applications, which
effectively raised cgroup to the status of a syscall-like API exposed
to lay programs.

First of all, cgroup has a fundamentally inadequate interface to be
exposed this way.  For a process to access its own knobs, it has to
extract the path on the target hierarchy from /proc/self/cgroup,
construct the path by appending the name of the knob, open it and then
read and/or write to it.

cgroup controllers implemented a number of knobs which would never be
accepted as public APIs because they were just adding control knobs to
the
system-management pseudo filesystem.  cgroup ended up with interface
knobs which were not properly abstracted or refined and directly
revealed kernel internal details.  These knobs got exposed through the
ill-defined delegation mechanism,
effectively abusing cgroup as a shortcut to implementing public APIs
without going through the required scrutiny.

cgroup v1 allowed threads to be in any cgroups which created an
interesting problem where threads belonging to a parent cgroup and its
children cgroups competed for resources.  This was nasty as two
different types of entities competed and there was no obvious way to
settle it.

The cpu controller considered threads and cgroups as equivalents and
mapped nice levels to cgroup weights.  This worked for some cases but
fell flat when children wanted to be allocated specific ratios of CPU
cycles and the number of internal threads fluctuated.

The io controller implicitly created a hidden leaf node for each
cgroup to host the threads.  The hidden leaf had its own copies of all
the knobs with "leaf_" prefixed.

These ad-hoc approaches differed from controller to controller and
made cgroup as a whole highly inconsistent.

This clearly is a problem which needs to be addressed from cgroup core
in a uniform way.

cgroup v1 grew without oversight and developed a large number of
idiosyncrasies and inconsistencies.  One issue on the cgroup core side
was how an empty cgroup was notified - a userland helper binary was
forked and executed for each event.  The event delivery wasn't
recursive or delegatable.

An extreme example on the controller side is controllers ignoring
hierarchical organization and treating all cgroups as if they were
directly under the root
cgroup.  Some controllers exposed a large amount of inconsistent
implementation details to userland.

There also was no consistency across controllers.  When a new cgroup
was created, some controllers defaulted to not imposing extra
restrictions while others disallowed any resource usage until
explicitly configured.

cgroup v2 establishes common conventions where appropriate and updates
controllers so that they expose minimal and consistent interfaces.
A cgroup enjoys reclaim protection when it's within its
protected amount, which makes delegation of subtrees possible.

The assumption behind the original
cgroup design was that global or parental pressure would always be
enough to make use of available swap space; cgroup v2 instead follows
the principle
that cgroup controllers should account and limit specific physical
resources, with swap space controlled in its own right.