cgroup-v2.rst - OpenGrok cross reference for /Linux-v5.10/Documentation/admin-guide/cgroup-v2.rst

Lines Matching refs:cgroup
9 conventions of cgroup v2.  It describes all userland-visible aspects
10 of cgroup including core and specific controller behaviors.  All
12 v1 is available under :ref:`Documentation/admin-guide/cgroup-v1/index.rst <cgroup-v1>`.
18      1-2. What is cgroup?
69        5-N-1. CPU controller root cgroup process behaviour
70        5-N-2. IO controller root cgroup process behaviour
94 "cgroup" stands for "control group" and is never capitalized.  The
96 qualifier as in "cgroup controllers".  When explicitly referring to
100 What is cgroup?
103 cgroup is a mechanism to organize processes hierarchically and
107 cgroup is largely composed of two parts - the core and controllers.
108 cgroup core is primarily responsible for hierarchically organizing
109 processes.  A cgroup controller is usually responsible for
115 to one and only one cgroup.  All threads of a process belong to the
116 same cgroup.  On creation, all processes are put in the cgroup that
118 to another cgroup.  Migration of a process doesn't affect already
122 disabled selectively on a cgroup.  All controller behaviors are
123 hierarchical - if a controller is enabled on a cgroup, it affects all
125 sub-hierarchy of the cgroup.  When a controller is enabled on a nested
126 cgroup, it always restricts the resource distribution further.  The
137 Unlike v1, cgroup v2 has only a single hierarchy.  The cgroup v2
150 is no longer referenced in its current hierarchy.  Because per-cgroup
167 automount the v1 cgroup filesystem and so hijack all controllers
172 cgroup v2 currently supports the following mount options.
176 	Consider cgroup namespaces as delegation boundaries.  This
184         Only populate memory.events with data for the current cgroup,
209 Initially, only the root cgroup exists to which all processes belong.
210 A child cgroup can be created by creating a sub-directory::
214 A given cgroup may have multiple child cgroups forming a tree
215 structure.  Each cgroup has a read-writable interface file
216 "cgroup.procs".  When read, it lists the PIDs of all processes which
217 belong to the cgroup one-per-line.  The PIDs are not ordered and the
219 another cgroup and then back or the PID got recycled while reading.
221 A process can be migrated into a cgroup by writing its PID to the
222 target cgroup's "cgroup.procs" file.  Only one process can be migrated
228 cgroup that the forking process belongs to at the time of the
229 operation.  After exit, a process stays associated with the cgroup
231 zombie process does not appear in "cgroup.procs" and thus can't be
232 moved to another cgroup.
234 A cgroup which doesn't have any children or live processes can be
235 destroyed by removing the directory.  Note that a cgroup which doesn't
241 "/proc/$PID/cgroup" lists a process's cgroup membership.  If legacy
242 cgroup is in use in the system, this file may contain multiple lines,
243 one for each hierarchy.  The entry for cgroup v2 is always in the
246   # cat /proc/842/cgroup
248   0::/test-cgroup/test-cgroup-nested
250 If the process becomes a zombie and the cgroup it was associated with
253   # cat /proc/842/cgroup
255   0::/test-cgroup/test-cgroup-nested (deleted)
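The "/proc/$PID/cgroup" entries above can be read programmatically. The following is a minimal sketch, not a kernel-provided API: the names `CgroupEntry` and `parse_v2_entry` are hypothetical, and the sketch assumes the v2 entry always has the form "0::<path>", optionally followed by " (deleted)". A cgroup whose name really ends in " (deleted)" would be misreported by this simple check.

```python
from typing import NamedTuple, Optional


class CgroupEntry(NamedTuple):
    """One parsed line of /proc/<pid>/cgroup (hypothetical helper type)."""
    hierarchy_id: int
    controllers: str
    path: str
    deleted: bool


def parse_v2_entry(text: str) -> Optional[CgroupEntry]:
    """Return the cgroup v2 entry ("0::<path>") found in the given text.

    Lines belonging to legacy hierarchies ("<id>:<controllers>:<path>")
    are skipped.  Returns None when no v2 entry is present.
    """
    suffix = " (deleted)"
    for line in text.splitlines():
        # Split on the first two colons only; the path may contain colons.
        hier, sep1, rest = line.partition(":")
        ctrls, sep2, path = rest.partition(":")
        if not (sep1 and sep2) or hier != "0" or ctrls != "":
            continue
        deleted = path.endswith(suffix)
        if deleted:
            path = path[: -len(suffix)]
        return CgroupEntry(0, "", path, deleted)
    return None


if __name__ == "__main__":
    print(parse_v2_entry("0::/test-cgroup/test-cgroup-nested (deleted)\n"))
```

On a live system the input would typically come from reading "/proc/self/cgroup"; the sketch takes the text as an argument so that it can be tried without a cgroup v2 mount.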
261 cgroup v2 supports thread granularity for a subset of controllers to
264 process belong to the same cgroup, which also serves as the resource
272 Marking a cgroup threaded makes it join the resource domain of its
273 parent as a threaded cgroup.  The parent may be another threaded
274 cgroup whose resource domain is further up in the hierarchy.  The root
284 As the threaded domain cgroup hosts all the domain resource
288 root cgroup is not subject to the "no internal process" constraint, it can
291 The current operation mode or type of the cgroup is shown in the
292 "cgroup.type" file which indicates whether the cgroup is a normal
294 or a threaded cgroup.
296 On creation, a cgroup is always a domain cgroup and can be made
297 threaded by writing "threaded" to the "cgroup.type" file.  The
300   # echo threaded > cgroup.type
302 Once threaded, the cgroup can't be made a domain again.  To enable the
305 - As the cgroup will join the parent's resource domain, the parent
306   must either be a valid (threaded) domain or a threaded cgroup.
312 Topology-wise, a cgroup can be in an invalid state.  Please consider
319 threaded cgroup.  "cgroup.type" file will report "domain invalid" in
323 A domain cgroup is turned into a threaded domain when one of its child
324 cgroups becomes threaded or threaded controllers are enabled in the
325 "cgroup.subtree_control" file while there are processes in the cgroup.
329 When read, "cgroup.threads" contains the list of the thread IDs of all
330 threads in the cgroup.  Except that the operations are per-thread
331 instead of per-process, "cgroup.threads" has the same format and
332 behaves the same way as "cgroup.procs".  While "cgroup.threads" can be
333 written to in any cgroup, as it can only move threads inside the same
337 The threaded domain cgroup serves as the resource domain for the whole
339 all the processes are considered to be in the threaded domain cgroup.
340 "cgroup.procs" in a threaded domain cgroup contains the PIDs of all
342 However, "cgroup.procs" can be written to from anywhere in the subtree
343 to migrate all threads of the matching process to the cgroup.
348 threads in the cgroup and its descendants.  All consumptions which
349 aren't tied to a specific thread belong to the threaded domain cgroup.
353 between threads in a non-leaf cgroup and its child cgroups.  Each
360 Each non-root cgroup has a "cgroup.events" file which contains
361 "populated" field indicating whether the cgroup's sub-hierarchy has
363 the cgroup and its descendants; otherwise, 1.  poll and [id]notify
369 in each cgroup::
376 file modified events will be generated on the "cgroup.events" files of
386 Each cgroup has a "cgroup.controllers" file which lists all
387 controllers available for the cgroup to enable::
389   # cat cgroup.controllers
393 disabled by writing to the "cgroup.subtree_control" file::
395   # echo "+cpu +memory -io" > cgroup.subtree_control
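The string written to "cgroup.subtree_control" in the command above can be assembled from lists of controller names. This is a small illustrative sketch: the function name `subtree_control_request` and its parameters are hypothetical, and rejecting a controller that appears in both lists is a choice made here for clarity rather than kernel behaviour.

```python
def subtree_control_request(enable=(), disable=(), available=None):
    """Build the text to write to "cgroup.subtree_control".

    enable / disable: iterables of controller names.
    available: optional iterable of names read from "cgroup.controllers";
    when given, enabling a controller outside that set is rejected.
    """
    enable, disable = list(enable), list(disable)
    conflict = set(enable) & set(disable)
    if conflict:
        raise ValueError(f"both enabled and disabled: {sorted(conflict)}")
    if available is not None:
        missing = [c for c in enable if c not in set(available)]
        if missing:
            raise ValueError(f"not listed in cgroup.controllers: {missing}")
    # "+name" enables a controller, "-name" disables it.
    return " ".join([f"+{c}" for c in enable] + [f"-{c}" for c in disable])


if __name__ == "__main__":
    print(subtree_control_request(["cpu", "memory"], ["io"]))
```

The returned string is what a caller would then write to the file in a single write.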
397 Only controllers which are listed in "cgroup.controllers" can be
402 Enabling a controller in a cgroup indicates that the distribution of
416 the cgroup's children, enabling it creates the controller's interface
422 "cgroup." are owned by the parent rather than the cgroup itself.
428 Resources are distributed top-down and a cgroup can further distribute
430 parent.  This means that all non-root "cgroup.subtree_control" files
432 "cgroup.subtree_control" file.  A controller can be enabled only if
443 controllers enabled in their "cgroup.subtree_control" files.
450 The root cgroup is exempt from this restriction.  Root contains
453 controllers.  How resource consumption in the root cgroup is governed
459 enabled controller in the cgroup's "cgroup.subtree_control".  This is
461 populated cgroup.  To control resource distribution of a cgroup, the
462 cgroup must create children and transfer all its processes to the
463 children before enabling controllers in its "cgroup.subtree_control"
473 A cgroup can be delegated in two ways.  First, to a less privileged
474 user by granting write access of the directory and its "cgroup.procs",
475 "cgroup.threads" and "cgroup.subtree_control" files to the user.
477 cgroup namespace on namespace creation.
483 kernel rejects writes to all files other than "cgroup.procs" and
484 "cgroup.subtree_control" on a namespace root from inside the
495 Currently, cgroup doesn't impose any restrictions on the number of
508 to migrate a target process into a cgroup by writing its PID to the
509 "cgroup.procs" file.
511 - The writer must have write access to the "cgroup.procs" file.
513 - The writer must have write access to the "cgroup.procs" file of the
525   ~ cgroup    ~      \ C01
530 currently in C10 into "C00/cgroup.procs".  U0 has write access to the
531 file; however, the common ancestor of the source cgroup C10 and the
532 destination cgroup C00 is above the points of delegation and U0 would
533 not have write access to its "cgroup.procs" files and thus the write
556 should be assigned to a cgroup according to the system's logical and
565 Interface files for a cgroup and its children cgroups occupy the same
569 All cgroup core interface files are prefixed with "cgroup." and each
577 cgroup doesn't do anything to prevent name collisions and it's the
584 cgroup controllers implement several resource distribution schemes
624 "io.max" limits the maximum BPS and/or IOPS that a cgroup can consume
631 A cgroup is protected up to the configured amount of the resource
652 A cgroup is exclusively allocated a certain amount of a finite
716 - The root cgroup should be exempt from resource control and thus
754     # cat cgroup-example-interface-file
760     # echo 125 > cgroup-example-interface-file
764     # echo "default 125" > cgroup-example-interface-file
768     # echo "8:16 170" > cgroup-example-interface-file
772     # echo "8:0 default" > cgroup-example-interface-file
773     # cat cgroup-example-interface-file
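The sequence of writes above can be modelled as updates to a table holding one default value plus per-key overrides. The sketch below is a toy model of that convention, not kernel code: `apply_write` and `render` are hypothetical names, and the starting contents used in the example (a default of 150 and an override of 300 for "8:0") are an assumption, since the listing does not show the file's initial contents.

```python
def apply_write(state, written):
    """Return a new state after writing `written` to the example file.

    state: dict mapping "default" and any override keys to string values.
    A bare value or "default <value>" updates the default; "<key> <value>"
    sets an override; "<key> default" clears that override.
    """
    parts = written.split()
    if len(parts) == 1:
        key, value = "default", parts[0]
    elif len(parts) == 2:
        key, value = parts
    else:
        raise ValueError(f"unexpected write: {written!r}")
    new_state = dict(state)
    if value == "default":
        if key == "default":
            raise ValueError("the default itself cannot be reset to default")
        new_state.pop(key, None)   # drop the override, fall back to default
    else:
        new_state[key] = value
    return new_state


def render(state):
    """Format the state the way the example file is read back."""
    lines = [f"default {state['default']}"]
    lines += [f"{k} {v}" for k, v in state.items() if k != "default"]
    return "\n".join(lines)


if __name__ == "__main__":
    state = {"default": "150", "8:0": "300"}   # assumed starting contents
    for w in ["125", "default 125", "8:16 170", "8:0 default"]:
        state = apply_write(state, w)
    print(render(state))
```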
786 All cgroup core files are prefixed with "cgroup."
788   cgroup.type
793 	When read, it indicates the current type of the cgroup, which
796 	- "domain" : A normal valid domain cgroup.
798 	- "domain threaded" : A threaded domain cgroup which is
801 	- "domain invalid" : A cgroup which is in an invalid state.
803 	  be allowed to become a threaded cgroup.
805 	- "threaded" : A threaded cgroup which is a member of a
808 	A cgroup can be turned into a threaded cgroup by writing
811   cgroup.procs
816 	the cgroup one-per-line.  The PIDs are not ordered and the
818 	to another cgroup and then back or the PID got recycled while
822 	the PID to the cgroup.  The writer should match all of the
825 	- It must have write access to the "cgroup.procs" file.
827 	- It must have write access to the "cgroup.procs" file of the
833 	In a threaded cgroup, reading this file fails with EOPNOTSUPP
835 	supported and moves every thread of the process to the cgroup.
837   cgroup.threads
842 	the cgroup one-per-line.  The TIDs are not ordered and the
844 	another cgroup and then back or the TID got recycled while
848 	TID to the cgroup.  The writer should match all of the
851 	- It must have write access to the "cgroup.threads" file.
853 	- The cgroup that the thread is currently in must be in the
854           same resource domain as the destination cgroup.
856 	- It must have write access to the "cgroup.procs" file of the
862   cgroup.controllers
867 	the cgroup.  The controllers are not ordered.
869   cgroup.subtree_control
875 	cgroup to its children.
884   cgroup.events
891 		1 if the cgroup or its descendants contains any live
894 		1 if the cgroup is frozen; otherwise, 0.
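Files such as "cgroup.events" hold one "key value" pair per line, which makes them easy to read into a dictionary. A minimal sketch, assuming integer values as in the "populated" and "frozen" fields described above; the function name `parse_flat_keyed` is hypothetical.

```python
def parse_flat_keyed(text):
    """Parse "key value" lines (one pair per line) into a dict of ints."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue                      # tolerate blank lines
        key, value = line.split(None, 1)  # split on the first whitespace run
        result[key] = int(value)
    return result


if __name__ == "__main__":
    events = parse_flat_keyed("populated 1\nfrozen 0\n")
    print(events["populated"], events["frozen"])
```

A monitoring tool would usually re-read and re-parse the file after each file modified notification rather than polling in a loop.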
896   cgroup.max.descendants
901 	an attempt to create a new cgroup in the hierarchy will fail.
903   cgroup.max.depth
906 	Maximum allowed descent depth below the current cgroup.
908 	an attempt to create a new child cgroup will fail.
910   cgroup.stat
917 		Total number of dying descendant cgroups. A cgroup becomes
918 		dying after being deleted by a user. The cgroup will remain
922 		A process can't enter a dying cgroup under any circumstances,
923 		and a dying cgroup can't revive.
925 		A dying cgroup can consume system resources not exceeding
926 		limits, which were active at the moment of cgroup deletion.
928   cgroup.freeze
932 	Writing "1" to the file causes freezing of the cgroup and all
934 	be stopped and will not run until the cgroup is explicitly
935 	unfrozen. Freezing of the cgroup may take some time; when this action
936 	is completed, the "frozen" value in the cgroup.events control file
940 	A cgroup can be frozen either by its own settings, or by settings
942 	cgroup will remain frozen.
944 	Processes in the frozen cgroup can be killed by a fatal signal.
945 	They also can enter and leave a frozen cgroup: either by an explicit
946 	move by a user, or if freezing of the cgroup races with fork().
947 	If a process is moved to a frozen cgroup, it stops. If a process is
948 	moved out of a frozen cgroup, it becomes running.
950 	Frozen status of a cgroup doesn't affect any cgroup tree operations:
951 	it's possible to delete a frozen (and empty) cgroup, as well as
974 the root cgroup.  Be aware that system management software may already
976 process, and these processes may need to be moved to the root cgroup
1075 cgroup are tracked so that the total memory consumption can be
1099 	The total amount of memory currently being used by the cgroup
1106 	Hard memory protection.  If the memory usage of a cgroup
1107 	is within its effective min boundary, the cgroup's memory
1117 	(child cgroup or cgroups are requiring more protected memory
1118 	than parent will allow), then each child cgroup will get
1125 	If a memory cgroup is not populated with processes,
1133 	cgroup is within its effective low boundary, the cgroup's
1143 	(child cgroup or cgroups are requiring more protected memory
1144 	than parent will allow), then each child cgroup will get
1156 	control memory usage of a cgroup.  If a cgroup's usage goes
1157 	over the high boundary, the processes of the cgroup are
1168 	mechanism.  If a cgroup's memory usage reaches this limit and
1169 	can't be reduced, the OOM killer is invoked in the cgroup.
1188 	Determines whether the cgroup should be treated as
1190 	all tasks belonging to the cgroup or to its descendants
1191 	(if the memory cgroup is not a leaf cgroup) are killed
1198 	If the OOM killer is invoked in a cgroup, it's not going
1199 	to kill any tasks outside of this cgroup, regardless
1210 	hierarchy. For the local events at the cgroup level see
1214 		The number of times the cgroup is reclaimed due to
1220 		The number of times processes of the cgroup are
1223 		cgroup whose memory usage is capped by the high limit
1228 		The number of times the cgroup's memory usage was
1230 		fails to bring it down, the cgroup goes to OOM state.
1233 		The number of times the cgroup's memory usage was
1241 		The number of processes belonging to this cgroup
1246 	to the cgroup i.e. not hierarchical. The file modified event
1252 	This breaks down the cgroup's memory footprint into different
1389 	This breaks down the cgroup's memory footprint into different
1415 	The total amount of swap currently being used by the cgroup
1422 	Swap usage throttle limit.  If a cgroup's swap usage exceeds
1426 	This limit marks a point of no return for the cgroup. It is NOT
1429 	prohibits swapping past a set amount, but lets the cgroup
1438 	Swap usage hard limit.  If a cgroup's swap usage reaches this
1439 	limit, anonymous memory of the cgroup will not be swapped out.
1448 		The number of times the cgroup's swap usage was over
1452 		The number of times the cgroup's swap usage was about
1482 throttles the offending cgroup, a management agent has ample
1486 Determining whether a cgroup has enough memory is not trivial as
1500 A memory area is charged to the cgroup which instantiated it and stays
1501 charged to the cgroup until the area is released.  Migrating a process
1502 to a different cgroup doesn't move the memory usages that it
1503 instantiated while in the previous cgroup to the new cgroup.
1506 To which cgroup the area will be charged is non-deterministic; however,
1507 over time, the memory area is likely to end up in a cgroup which has
1510 If a cgroup sweeps a considerable amount of memory which is expected
1551 	cgroup.
1606 	cgroup.
1643 	If needed, tools/cgroup/iocost_coef_gen.py can be used to
1654 	the cgroup can use in relation to its siblings.
1726 per-cgroup dirty memory states are examined and the more restrictive
1729 cgroup writeback requires explicit support from the underlying
1730 filesystem.  Currently, cgroup writeback is implemented on ext2, ext4,
1732 attributed to the root cgroup.
1735 which affects how cgroup ownership is tracked.  Memory is tracked per
1737 inode is assigned to a cgroup and all IO requests to write dirty pages
1738 from the inode are attributed to that cgroup.
1740 As cgroup ownership for memory is tracked per page, there can be pages
1744 cgroup becomes the majority over a certain period of time, switches
1745 the ownership of the inode to that cgroup.
1748 mostly dirtied by a single cgroup even when the main writing cgroup
1758 The sysctl knobs which affect writeback behavior are applied to cgroup
1762 	These ratios apply the same to cgroup writeback with the
1767 	For cgroup writeback, this is calculated into ratio against
1775 This is a cgroup v2 controller for IO workload protection.  You provide a group
1854 The process number controller is used to allow a cgroup to stop any
1858 The number of tasks in a cgroup can be exhausted in ways which other
1879 	The number of processes currently in the cgroup and its
1882 Organisational operations are not blocked by cgroup policies, so it is
1885 processes to the cgroup such that pids.current is larger than
1886 pids.max.  However, it is not possible to violate a cgroup PID policy
1888 of a new process would cause a cgroup policy to be violated.
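The asymmetry described above, where migration may push pids.current past pids.max but process creation may not, can be illustrated with a toy model. This is only a sketch of the stated rule for a single cgroup: the class `PidsCgroup` and its methods are hypothetical, and hierarchical limits in ancestor cgroups are deliberately left out.

```python
class PidsCgroup:
    """Toy model of one cgroup's pids.current / pids.max accounting."""

    def __init__(self, pids_max=None):
        self.pids_max = pids_max   # None models the value "max" (no limit)
        self.current = 0

    def attach(self, count=1):
        """Migrate processes in; organisational operations are not blocked."""
        self.current += count

    def fork(self):
        """Try to create a process; refuse if the limit would be exceeded."""
        if self.pids_max is not None and self.current + 1 > self.pids_max:
            return False           # stands in for fork()/clone() failing
        self.current += 1
        return True


if __name__ == "__main__":
    group = PidsCgroup(pids_max=2)
    group.attach(3)                # allowed: current (3) now exceeds max (2)
    print(group.current, group.fork())
```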
1896 specified in the cpuset interface files in a task's current cgroup.
1914 	cgroup.  The actual list of CPUs to be granted, however, is
1924 	An empty value indicates that the cgroup is using the same
1925 	setting as the nearest cgroup ancestor with a non-empty
1936 	cgroup by its parent.  These CPUs are allowed to be used by
1937 	tasks within the current cgroup.
1940 	all the CPUs from the parent cgroup that can be available to
1941 	be used by this cgroup.  Otherwise, it should be a subset of
1953 	this cgroup.  The actual list of memory nodes granted, however,
1963 	An empty value indicates that the cgroup is using the same
1964 	setting as the nearest cgroup ancestor with a non-empty
1976 	this cgroup by its parent. These memory nodes are allowed to
1977 	be used by tasks within the current cgroup.
1980 	parent cgroup that will be available to be used by this cgroup.
1989 	cpuset-enabled cgroups.  This flag is owned by the parent cgroup
1997 	When set to be a partition root, the current cgroup is the
2001 	cgroup is always a partition root.
2004 	It can only be set in a cgroup if all the following conditions
2009 	2) The parent cgroup is a partition root.
2017 	effective CPUs of the parent cgroup.  Once it is set, this
2042 	granted by the parent cgroup.
2045 	in "cpuset.cpus" can be granted by the parent cgroup or the
2046 	parent cgroup is no longer a partition root itself.  In this
2049 	The cpu affinity of all the tasks in the cgroup will then be
2069 on top of cgroup BPF. To control access to device files, a user may
2117 	It exists for all cgroups except root.
2135 	the cgroups except root.
2139 	The default value is "max".  It exists for all cgroups except root.
2149 	are local to the cgroup i.e. not hierarchical. The file modified event
2160 always be filtered by cgroup v2 path.  The controller can still be
2171 CPU controller root cgroup process behaviour
2174 When distributing CPU cycles in the root cgroup each thread in this
2175 cgroup is treated as if it was hosted in a separate child cgroup of the
2176 root cgroup. This child cgroup weight is dependent on its thread nice
2184 IO controller root cgroup process behaviour
2187 Root cgroup processes are hosted in an implicit leaf child node.
2189 account as if it was a normal child cgroup of the root cgroup with a
2199 cgroup namespace provides a mechanism to virtualize the view of the
2200 "/proc/$PID/cgroup" file and cgroup mounts.  The CLONE_NEWCGROUP clone
2201 flag can be used with clone(2) and unshare(2) to create a new cgroup
2202 namespace.  The process running inside the cgroup namespace will have
2203 its "/proc/$PID/cgroup" output restricted to cgroupns root.  The
2204 cgroupns root is the cgroup of the process at the time of creation of
2205 the cgroup namespace.
2207 Without cgroup namespace, the "/proc/$PID/cgroup" file shows the
2208 complete path of the cgroup of a process.  In a container setup where
2210 "/proc/$PID/cgroup" file may leak potential system level information
2213   # cat /proc/self/cgroup
2217 and undesirable to expose to the isolated processes.  cgroup namespace
2219 creating a cgroup namespace, one would see::
2221   # ls -l /proc/self/ns/cgroup
2222   lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
2223   # cat /proc/self/cgroup
2228   # ls -l /proc/self/ns/cgroup
2229   lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
2230   # cat /proc/self/cgroup
2233 When some thread from a multi-threaded process unshares its cgroup
2238 A cgroup namespace is alive as long as there are processes inside or
2239 mounts pinning it.  When the last usage goes away, the cgroup
2247 The 'cgroupns root' for a cgroup namespace is the cgroup in which the
2249 /batchjobs/container_id1 cgroup calls unshare, cgroup
2251 init_cgroup_ns, this is the real root ('/') cgroup.
2253 The cgroupns root cgroup does not change even if the namespace creator
2254 process later moves to a different cgroup::
2256   # ~/unshare -c # unshare cgroupns in some cgroup
2257   # cat /proc/self/cgroup
2260   # echo 0 > sub_cgrp_1/cgroup.procs
2261   # cat /proc/self/cgroup
2264 Each process gets its namespace-specific view of "/proc/$PID/cgroup"
2266 Processes running inside the cgroup namespace will be able to see
2267 cgroup paths (in /proc/self/cgroup) only inside their root cgroup.
2272   # echo 7353 > sub_cgrp_1/cgroup.procs
2273   # cat /proc/7353/cgroup
2276 From the initial cgroup namespace, the real cgroup path will be
2279   $ cat /proc/7353/cgroup
2282 From a sibling cgroup namespace (that is, a namespace rooted at a
2283 different cgroup), the cgroup path relative to its own cgroup
2284 namespace root will be shown.  For instance, if PID 7353's cgroup
2287   # cat /proc/7353/cgroup
2291 it's relative to the cgroup namespace root of the caller.
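How a cgroup path appears from a given cgroup namespace can be approximated with ordinary relative-path arithmetic. The sketch below is an illustration of the idea only and does not reproduce the kernel's implementation: `path_seen_from` is a hypothetical name, and the example paths reuse the "batchjobs/container_id1", "batchjobs/container_id2" and "sub_cgrp_1" names that appear in this document.

```python
import posixpath


def path_seen_from(cgroup_path, viewer_ns_root):
    """Return `cgroup_path` expressed relative to the viewer's cgroupns root.

    Both arguments are absolute paths in the initial cgroup namespace.
    A target outside the viewer's root yields leading ".." components.
    """
    rel = posixpath.relpath(cgroup_path, viewer_ns_root)
    return "/" if rel == "." else "/" + rel


if __name__ == "__main__":
    target = "/batchjobs/container_id1/sub_cgrp_1"
    print(path_seen_from(target, "/"))                         # initial namespace
    print(path_seen_from(target, "/batchjobs/container_id1"))  # own namespace
    print(path_seen_from(target, "/batchjobs/container_id2"))  # sibling namespace
```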
2297 Processes inside a cgroup namespace can move into and out of the
2303   # cat /proc/7353/cgroup
2305   # echo 7353 > batchjobs/container_id2/cgroup.procs
2306   # cat /proc/7353/cgroup
2309 Note that this kind of setup is not encouraged.  A task inside cgroup
2312 setns(2) to another cgroup namespace is allowed when:
2315 (b) the process has CAP_SYS_ADMIN against the target cgroup
2318 No implicit cgroup changes happen with attaching to another cgroup
2320 process under the target cgroup namespace root.
2326 Namespace specific cgroup hierarchy can be mounted by a process
2327 running inside a non-init cgroup namespace::
2331 This will mount the unified cgroup hierarchy with cgroupns root as the
2335 The virtualization of /proc/self/cgroup file combined with restricting
2336 the view of cgroup hierarchy by namespace-private cgroupfs mount
2337 provides a properly isolated cgroup view inside the container.
2344 where interacting with cgroup is necessary.  cgroup core and
2351 A filesystem can support cgroup writeback by updating
2357 	associates the bio with the inode's owner cgroup and the
2368 With writeback bio's annotated, cgroup support can be enabled per
2370 selective disabling of cgroup writeback support which is helpful when
2374 wbc_init_bio() binds the specified bio to its cgroup.  Depending on
2390 - The "tasks" file is removed and "cgroup.procs" is not sorted.
2392 - "cgroup.clone_children" is removed.
2394 - /proc/cgroups is meaningless for v2.  Use "cgroup.controllers" file
2404 cgroup v1 allowed an arbitrary number of hierarchies and each
2426 It greatly complicated cgroup core implementation but more importantly
2427 the support for multiple hierarchies restricted how cgroup could be
2431 that a thread's cgroup membership couldn't be described in finite
2457 cgroup v1 allowed threads of a process to belong to different cgroups.
2468 cgroup v1 had an ambiguously defined delegation model which got abused
2472 effectively raised cgroup to the status of a syscall-like API exposed
2475 First of all, cgroup has a fundamentally inadequate interface to be
2477 extract the path on the target hierarchy from /proc/self/cgroup,
2484 cgroup controllers implemented a number of knobs which would never be
2486 system-management pseudo filesystem.  cgroup ended up with interface
2490 effectively abusing cgroup as a shortcut to implementing public APIs
2501 cgroup v1 allowed threads to be in any cgroups which created an
2502 interesting problem where threads belonging to a parent cgroup and its
2508 mapped nice levels to cgroup weights.  This worked for some cases but
2517 cgroup to host the threads.  The hidden leaf had its own copies of all
2533 made cgroup as a whole highly inconsistent.
2535 This clearly is a problem which needs to be addressed from cgroup core
2542 cgroup v1 grew without oversight and developed a large number of
2543 idiosyncrasies and inconsistencies.  One issue on the cgroup core side
2544 was how an empty cgroup was notified - a userland helper binary was
2553 cgroup.  Some controllers exposed a large amount of inconsistent
2556 There also was no consistency across controllers.  When a new cgroup
2564 cgroup v2 establishes common conventions where appropriate and updates
2589 reserve.  A cgroup enjoys reclaim protection when it's within its
2632 cgroup design was that global or parental pressure would always be
2641 that cgroup controllers should account and limit specific physical