This document describes the design, interface and conventions of cgroup v2. It describes all userland-visible aspects of cgroup including core and specific controller behaviors. Documentation for v1 is available under Documentation/cgroup-v1/.
1-2. What is cgroup?
5-N-1. CPU controller root cgroup process behaviour
5-N-2. IO controller root cgroup process behaviour
"cgroup" stands for "control group" and is never capitalized. The singular form is used to designate the whole feature and also as a qualifier as in "cgroup controllers". When explicitly referring to multiple individual control groups, the plural form "cgroups" is used.

What is cgroup?
cgroup is a mechanism to organize processes hierarchically and distribute system resources along the hierarchy in a controlled and configurable manner.

cgroup is largely composed of two parts - the core and controllers. cgroup core is primarily responsible for hierarchically organizing processes. A cgroup controller is usually responsible for distributing a specific type of system resource along the hierarchy.

cgroups form a tree structure and every process in the system belongs to one and only one cgroup. All threads of a process belong to the same cgroup. On creation, all processes are put in the cgroup that the parent process belongs to at the time. A process can be migrated to another cgroup. Migration of a process doesn't affect already existing descendant processes.

Following certain structural constraints, controllers may be enabled or disabled selectively on a cgroup. All controller behaviors are hierarchical - if a controller is enabled on a cgroup, it affects all processes which belong to the cgroups that make up the inclusive sub-hierarchy of the cgroup. When a controller is enabled on a nested cgroup, it always restricts the resource distribution further.
Unlike v1, cgroup v2 has only a single hierarchy. The cgroup v2 hierarchy can be mounted with the following mount command::

  # mount -t cgroup2 none $MOUNT_POINT

A controller becomes available on the v2 hierarchy only after it is no longer referenced in its current (v1) hierarchy. Because per-cgroup controller states are destroyed asynchronously and controllers may have lingering references, a controller may not show up on the v2 hierarchy immediately after the final unmount of the previous hierarchy. Also note that system management software may automount the v1 cgroup filesystem and so hijack all controllers during boot.

cgroup v2 currently supports the following mount options.

  nsdelegate
    Consider cgroup namespaces as delegation boundaries. This option is system wide and can only be set on mount or modified through remount from the init namespace.
Initially, only the root cgroup exists to which all processes belong. A child cgroup can be created by creating a sub-directory::

  # mkdir $CGROUP_NAME

A given cgroup may have multiple child cgroups forming a tree structure. Each cgroup has a read-writable interface file "cgroup.procs". When read, it lists the PIDs of all processes which belong to the cgroup one-per-line. The PIDs are not ordered and the same PID may show up more than once if the process got moved to another cgroup and then back or the PID got recycled while reading.

A process can be migrated into a cgroup by writing its PID to the target cgroup's "cgroup.procs" file. Only one process can be migrated on a single write(2) call.

On fork, a new process is put in the cgroup that the forking process belongs to at the time of the operation. After exit, a process stays associated with the cgroup that it belonged to at the time of exit until it's reaped; however, a zombie process does not appear in "cgroup.procs" and thus can't be moved to another cgroup.

A cgroup which doesn't have any children or live processes can be destroyed by removing the directory. Note that a cgroup which doesn't have any children and is associated only with zombie processes is considered empty and can be removed::

  # rmdir $CGROUP_NAME
"/proc/$PID/cgroup" lists a process's cgroup membership. If legacy cgroup is in use in the system, this file may contain multiple lines, one for each hierarchy. The entry for cgroup v2 is always in the format "0::$PATH"::

  # cat /proc/842/cgroup
  ...
  0::/test-cgroup/test-cgroup-nested

If the process becomes a zombie and the cgroup it was associated with is removed subsequently, " (deleted)" is appended to the path::

  # cat /proc/842/cgroup
  ...
  0::/test-cgroup/test-cgroup-nested (deleted)
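The v2 entry can be picked out of "/proc/$PID/cgroup" with standard tools. A minimal sketch (the helper name is ours; input is the file content on stdin):

```shell
# The cgroup v2 entry is the line whose hierarchy ID is 0 and whose
# controller list (the second field) is empty.
v2_path() {
    awk -F: '$1 == "0" && $2 == "" { print $3 }'
}

# On a live system: v2_path < /proc/self/cgroup
printf '0::/test-cgroup/test-cgroup-nested\n' | v2_path
# -> /test-cgroup/test-cgroup-nested
```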
cgroup v2 supports thread granularity for a subset of controllers to support use cases requiring hierarchical resource distribution across the threads of a group of processes. By default, all threads of a process belong to the same cgroup, which also serves as the resource domain to host resource consumptions which are not specific to a process or thread.

Marking a cgroup threaded makes it join the resource domain of its parent as a threaded cgroup. The parent may be another threaded cgroup whose resource domain is further up in the hierarchy. The root of a threaded subtree, that is, the nearest ancestor which is not threaded, is called the threaded domain and serves as the resource domain for the entire subtree.

As the threaded domain cgroup hosts all the domain resource consumptions of the subtree, it is considered to have internal resource consumptions whether there are processes in it or not. Because the root cgroup is not subject to the no-internal-process constraint, it can serve both as a threaded domain and a parent to normal cgroups.

The current operation mode or type of the cgroup is shown in the "cgroup.type" file which indicates whether the cgroup is a normal domain, a domain which serves as the domain of a threaded subtree, or a threaded cgroup.
On creation, a cgroup is always a domain cgroup and can be made threaded by writing "threaded" to the "cgroup.type" file::

  # echo threaded > cgroup.type

Once threaded, the cgroup can't be made a domain again. To enable the thread mode, the following condition must be met.

- As the cgroup will join the parent's resource domain, the parent must either be a valid (threaded) domain or a threaded cgroup.

Topology-wise, a cgroup can be in an invalid state - for example, a domain cgroup whose parent can't host child domains. Such a cgroup can't be populated or have controllers enabled until it is turned into a threaded cgroup; its "cgroup.type" file will report "domain (invalid)" in these cases.

A domain cgroup is turned into a threaded domain when one of its child cgroups becomes threaded or when threaded controllers are enabled in the "cgroup.subtree_control" file while there are processes in the cgroup. The threaded domain reverts to a normal domain when the conditions clear.
When read, "cgroup.threads" contains the list of the thread IDs of all threads in the cgroup. Except that the operations are per-thread instead of per-process, "cgroup.threads" has the same format and behaves the same way as "cgroup.procs". While "cgroup.threads" can be written to in any cgroup, as it can only move threads inside the same threaded domain, its operations are confined inside each threaded subtree.

The threaded domain cgroup serves as the resource domain for the whole subtree, and, while the threads can be scattered across the subtree, all the processes are considered to be in the threaded domain cgroup. "cgroup.procs" in a threaded domain cgroup contains the PIDs of all processes in the subtree and is not readable in the subtree proper. However, "cgroup.procs" can be written to from anywhere in the subtree to migrate all threads of the matching process to the cgroup.

Only threaded controllers can be enabled in a threaded subtree. When a threaded controller is enabled inside a threaded subtree, it only accounts for and controls resource consumptions associated with the threads in the cgroup and its descendants. All consumptions which aren't tied to a specific thread belong to the threaded domain cgroup.

Because a threaded subtree is exempt from the no-internal-process constraint, a threaded controller must be able to handle competition between threads in a non-leaf cgroup and its child cgroups. Each threaded controller defines how such competitions are handled.
Each non-root cgroup has a "cgroup.events" file which contains a "populated" field indicating whether the cgroup's sub-hierarchy has live processes in it. Its value is 0 if there is no live process in the cgroup and its descendants; otherwise, 1. poll and [id]notify events are triggered when the value changes. This can be used, for example, to start a clean-up operation after all processes of a given sub-hierarchy have exited. When the populated state of a cgroup changes, it also changes in all its ancestors whose state changes as a result, and file modified events will be generated on the "cgroup.events" files of all the affected cgroups.
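A monitor reacting to these events only needs to parse the "key value" lines of "cgroup.events". A sketch of the parsing step (the helper name is ours):

```shell
# cgroup.events is a flat keyed file: one "key value" pair per line.
event_field() {   # $1 = key; file content on stdin
    awk -v k="$1" '$1 == k { print $2 }'
}

# On a live system: event_field populated < $CGROUP_PATH/cgroup.events
printf 'populated 1\n' | event_field populated
# -> 1
```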
Each cgroup has a "cgroup.controllers" file which lists all controllers available for the cgroup to enable::

  # cat cgroup.controllers
  cpu io memory

No controller is enabled by default. Controllers can be enabled and disabled by writing to the "cgroup.subtree_control" file::

  # echo "+cpu +memory -io" > cgroup.subtree_control

Only controllers which are listed in "cgroup.controllers" can be enabled. When multiple operations are specified as above, either they all succeed or all fail. If multiple operations on the same controller are specified, the last one takes effect.

Enabling a controller in a cgroup indicates that the distribution of the target resource across its immediate children will be controlled. As a controller regulates the distribution of the target resource to the cgroup's children, enabling it creates the controller's interface files in the child cgroups. In other words, the controller interface files - anything which doesn't start with "cgroup." - are owned by the parent rather than the cgroup itself.
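A write to "cgroup.subtree_control" composes "+ctrl"/"-ctrl" tokens with the currently enabled set. A plain-shell sketch of the resulting-set computation (the function name and in-shell set representation are ours; the kernel does this internally):

```shell
# Apply "+ctrl"/"-ctrl" tokens to a space separated controller set,
# mimicking how a write to cgroup.subtree_control composes.
apply_tokens() {    # $1 = currently enabled set, $2 = tokens
    enabled=" $1 "
    for tok in $2; do
        name=${tok#?}                    # strip leading + or -
        case $tok in
        +*) case $enabled in
            *" $name "*) ;;              # already enabled, no-op
            *) enabled="$enabled$name " ;;
            esac ;;
        -*) enabled=$(printf '%s' "$enabled" | sed "s/ $name / /") ;;
        esac
    done
    set -- $enabled                      # normalize whitespace
    printf '%s\n' "$*"
}

apply_tokens "io" "+cpu +memory -io"
# -> cpu memory
```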
Resources are distributed top-down and a cgroup can further distribute a resource only if the resource has been distributed to it from the parent. This means that all non-root "cgroup.subtree_control" files can only contain controllers which are enabled in the parent's "cgroup.subtree_control" file. A controller can be enabled only if the parent has the controller enabled and a controller can't be disabled if one or more children have it enabled.

Non-root cgroups can distribute domain resources to their children only when they don't have any processes of their own. In other words, only domain cgroups which don't contain any processes can have domain controllers enabled in their "cgroup.subtree_control" files.

The root cgroup is exempt from this restriction. Root contains processes and anonymous resource consumption which can't be associated with any other cgroups and requires special treatment from most controllers. How resource consumption in the root cgroup is governed is up to each controller.

Note that the restriction doesn't get in the way if there is no enabled controller in the cgroup's "cgroup.subtree_control". This is important as otherwise it wouldn't be possible to create children of a populated cgroup. To control resource distribution of a cgroup, the cgroup must create children and transfer all its processes to the children before enabling controllers in its "cgroup.subtree_control" file.
A cgroup can be delegated in two ways. First, to a less privileged user by granting write access of the directory and its "cgroup.procs", "cgroup.threads" and "cgroup.subtree_control" files to the user. Second, if the "nsdelegate" mount option is set, automatically to a cgroup namespace on namespace creation.

For the namespace delegation case, the kernel rejects writes to all files other than "cgroup.procs" and "cgroup.subtree_control" on a namespace root from inside the namespace.

Currently, cgroup doesn't impose any restrictions on the number of cgroups in or nesting depth of a delegated sub-hierarchy; however, this may be limited explicitly in the future.
For delegations to be properly contained, when a process with a non-root euid attempts to migrate a target process into a cgroup by writing its PID to the "cgroup.procs" file, the following conditions must be met.

- The writer must have write access to the "cgroup.procs" file.

- The writer must have write access to the "cgroup.procs" file of the common ancestor of the source and destination cgroups.

Consider the following hierarchy where cgroups C0 and C1 have been delegated to user U0::

  ~~~~~~~~~~~~~ - C0 - C00
  ~ cgroup    ~      \ C01
  ~ hierarchy ~
  ~~~~~~~~~~~~~ - C1 - C10

Let's say U0 wants to write the PID of a process which is currently in C10 into "C00/cgroup.procs". U0 has write access to the file; however, the common ancestor of the source cgroup C10 and the destination cgroup C00 is above the points of delegation and U0 would not have write access to its "cgroup.procs" files and thus the write will be denied with -EACCES.
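The "common ancestor" in the containment rule is simply the longest component-wise shared prefix of the two paths. A sketch (the function name is ours):

```shell
# Nearest common ancestor of two absolute cgroup paths.
common_ancestor() {    # $1 = source path, $2 = destination path
    a=$1
    while [ "$a" != "/" ]; do
        case $2 in
        "$a" | "$a"/*) printf '%s\n' "$a"; return ;;
        esac
        a=${a%/*}            # step up one component
        [ -n "$a" ] || a=/
    done
    printf '/\n'
}

common_ancestor /C1/C10 /C0/C00
# -> /
```

For the delegated C10 -> C00 move, the ancestor is the root, which is above the delegation points.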
A process should be assigned to a cgroup according to the system's logical and resource structures.

Interface files for a cgroup and its children cgroups occupy the same directory and it is possible to create children cgroups which collide with interface files. All cgroup core interface files are prefixed with "cgroup." and each controller's interface files are prefixed with the controller name and a dot. cgroup doesn't do anything to prevent name collisions and it's the user's responsibility to avoid them.
cgroup controllers implement several resource distribution schemes depending on the resource type and expected use cases.

Limits are in the range [0, max] and default to "max", which is noop. For example, "io.max" limits the maximum BPS and/or IOPS that a cgroup can consume on an IO device.

Protections mean that a cgroup is protected to be allocated up to the configured amount of a resource as long as the usages of all its ancestors are under their protected levels. For example, "memory.low" implements best-effort memory protection.

Allocations mean that a cgroup is exclusively allocated a certain amount of a finite resource.
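"io.max" is a nested keyed file with entries like "8:16 rbps=2097152 wiops=120", where an unset key means "max" (no limit). A parsing sketch (the helper name is ours):

```shell
# Print the value of one io.max key from a line read on stdin;
# keys that aren't listed default to "max" (no limit).
io_max_key() {    # $1 = key (rbps, wbps, riops or wiops)
    awk -v k="$1" '{
        v = "max"
        for (i = 2; i <= NF; i++) {   # skip the MAJ:MIN field
            split($i, kv, "=")
            if (kv[1] == k) v = kv[2]
        }
        print v
    }'
}

printf '8:16 rbps=2097152 wiops=120\n' | io_max_key rbps
# -> 2097152
```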
- The root cgroup should be exempt from resource control and thus shouldn't have resource control interface files. Also, informational files on the root cgroup which end up showing global information available elsewhere shouldn't exist.
For example, a nested keyed file which supports a "default" entry behaves as follows. Reading shows the default value and any per-key overrides::

  # cat cgroup-example-interface-file
  ...

The default value can be updated by writing either of the following::

  # echo 125 > cgroup-example-interface-file
  # echo "default 125" > cgroup-example-interface-file

An override for a specific key can be set by writing the key and value, and cleared by writing "default" for that key::

  # echo "8:16 170" > cgroup-example-interface-file
  # echo "8:0 default" > cgroup-example-interface-file
  # cat cgroup-example-interface-file
  ...
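Under this convention, the effective value for a given key is its override if one is set, else the "default" entry. A resolution sketch (the helper name is ours):

```shell
# Resolve the effective per-device value from a nested keyed file
# that supports a "default" entry (file content on stdin).
effective_value() {    # $1 = MAJ:MIN
    awk -v dev="$1" '
        $1 == "default" { def = $2 }
        $1 == dev       { val = $2 }
        END {
            # No override, or an explicit "default", falls back
            if (val == "" || val == "default") val = def
            print val
        }'
}

printf 'default 125\n8:16 170\n' | effective_value 8:0
# -> 125
```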
All cgroup core files are prefixed with "cgroup."

  cgroup.type
    A read-write single value file which exists on non-root cgroups.

    When read, it indicates the current type of the cgroup, which can be one of the following values.

    - "domain" : A normal valid domain cgroup.

    - "domain threaded" : A threaded domain cgroup which is serving as the root of a threaded subtree.

    - "domain invalid" : A cgroup which is in an invalid state. It can't be populated or have controllers enabled. It may be allowed to become a threaded cgroup.

    - "threaded" : A threaded cgroup which is a member of a threaded subtree.

    A cgroup can be turned into a threaded cgroup by writing "threaded" to this file.
  cgroup.procs
    A read-write new-line separated values file which exists on all cgroups.

    When read, it lists the PIDs of all processes which belong to the cgroup one-per-line. The PIDs are not ordered and the same PID may show up more than once if the process got moved to another cgroup and then back or the PID got recycled while reading.

    A PID can be written to migrate the process associated with the PID to the cgroup. The writer should match all of the following conditions.

    - It must have write access to the "cgroup.procs" file.

    - It must have write access to the "cgroup.procs" file of the common ancestor of the source and destination cgroups.

    In a threaded cgroup, reading this file fails with EOPNOTSUPP as all the processes belong to the threaded root. Writing is supported and moves every thread of the process to the cgroup.
  cgroup.threads
    A read-write new-line separated values file which exists on all cgroups.

    When read, it lists the TIDs of all threads which belong to the cgroup one-per-line. The TIDs are not ordered and the same TID may show up more than once if the thread got moved to another cgroup and then back or the TID got recycled while reading.

    A TID can be written to migrate the thread associated with the TID to the cgroup. The writer should match all of the following conditions.

    - It must have write access to the "cgroup.threads" file.

    - The cgroup that the thread is currently in must be in the same resource domain as the destination cgroup.

    - It must have write access to the "cgroup.procs" file of the common ancestor of the source and destination cgroups.
  cgroup.controllers
    A read-only space separated values file which exists on all cgroups.

    It shows a space separated list of all controllers available to the cgroup. The controllers are not ordered.

  cgroup.subtree_control
    A read-write space separated values file which exists on all cgroups. Starts out empty.

    When read, it shows a space separated list of the controllers which are enabled to control resource distribution from the cgroup to its children. A space separated list of controllers prefixed with '+' or '-' can be written to enable or disable controllers.
  cgroup.events
    A read-only flat-keyed file which exists on non-root cgroups. The following entry is defined.

      populated
        1 if the cgroup or its descendants contain any live processes; otherwise, 0.

  cgroup.max.descendants
    A read-write single value file. The default is "max".

    Maximum allowed number of descendant cgroups. If the actual number of descendants is equal or larger, an attempt to create a new cgroup in the hierarchy will fail.

  cgroup.max.depth
    A read-write single value file. The default is "max".

    Maximum allowed descent depth below the current cgroup. If the actual descent depth is equal or larger, an attempt to create a new child cgroup will fail.
  cgroup.stat
    A read-only flat-keyed file with the following entries.

      nr_descendants
        Total number of visible descendant cgroups.

      nr_dying_descendants
        Total number of dying descendant cgroups. A cgroup becomes dying after being deleted by a user. The cgroup will remain in dying state for some undefined amount of time (which can depend on system load) before being completely destroyed.

        A process can't enter a dying cgroup under any circumstances, and a dying cgroup can't revive.

        A dying cgroup can consume system resources not exceeding limits which were active at the moment of cgroup deletion.
cgroup v2 doesn't yet support control of realtime processes and the cpu controller can only be enabled when all RT processes are in the root cgroup. Be aware that system management software may already have placed RT processes into non-root cgroups during the system boot process, and these processes may need to be moved to the root cgroup before the cpu controller can be enabled.
All major memory usages by a cgroup are tracked so that the total memory consumption can be accounted and controlled to a reasonable extent.

  memory.current
    The total amount of memory currently being used by the cgroup and its descendants.
  memory.min
    Hard memory protection. If the memory usage of a cgroup is within its effective min boundary, the cgroup's memory won't be reclaimed under any conditions. If there is no unprotected reclaimable memory available, the OOM killer is invoked.

    If the memory usages of all children exceed the parent's protection (child cgroup or cgroups are requiring more protected memory than the parent will allow), then each child cgroup will get the part of the parent's protection proportional to its actual memory usage below memory.min.

    If a memory cgroup is not populated with processes, its memory.min is ignored.

  memory.low
    Best-effort memory protection. If the memory usage of a cgroup is within its effective low boundary, the cgroup's memory won't be reclaimed unless memory can't be reclaimed from unprotected cgroups.

    If the memory usages of all children exceed the parent's protection (child cgroup or cgroups are requiring more protected memory than the parent will allow), then each child cgroup will get the part of the parent's protection proportional to its actual memory usage below memory.low.
  memory.high
    Memory usage throttle limit. This is the main mechanism to control memory usage of a cgroup. If a cgroup's usage goes over the high boundary, the processes of the cgroup are throttled and put under heavy reclaim pressure.

  memory.max
    Memory usage hard limit. This is the final protection mechanism. If a cgroup's memory usage reaches this limit and can't be reduced, the OOM killer is invoked in the cgroup.
  memory.oom.group
    Determines whether the cgroup should be treated as an indivisible workload by the OOM killer. If set, all tasks belonging to the cgroup or to its descendants (if the memory cgroup is not a leaf cgroup) are killed together or not at all.

    If the OOM killer is invoked in a cgroup, it's not going to kill any tasks outside of this cgroup, regardless of the memory.oom.group values of ancestor cgroups.
  memory.events
    A read-only flat-keyed file with the following entries.

      low
        The number of times the cgroup is reclaimed due to high memory pressure even though its usage is under the low boundary.

      high
        The number of times processes of the cgroup are throttled and routed to perform direct memory reclaim because the high memory boundary was exceeded. For a cgroup whose memory usage is capped by the high limit rather than global memory pressure, this event's occurrences are expected.

      max
        The number of times the cgroup's memory usage was about to go over the max boundary. If direct reclaim fails to bring it down, the cgroup goes to OOM state.

      oom
        The number of times the cgroup's memory usage reached the limit and allocation was about to fail. A failed allocation may be returned into userspace in some cases, e.g. for disk readahead. For now, OOM in a memory cgroup kills tasks only if the shortage happened inside a page fault.

      oom_kill
        The number of processes belonging to this cgroup killed by any kind of OOM killer.

  memory.stat
    This breaks down the cgroup's memory footprint into different types of memory, type-specific details, and other information on the state and past events of the memory management system.
  memory.swap.current
    The total amount of swap currently being used by the cgroup and its descendants.

  memory.swap.max
    Swap usage hard limit. If a cgroup's swap usage reaches this limit, anonymous memory of the cgroup will not be swapped out.

  memory.swap.events
      max
        The number of times the cgroup's swap usage was about to go over the max boundary and swap allocation failed.
Because breach of the high limit doesn't trigger the OOM killer but throttles the offending cgroup, a management agent has ample opportunities to monitor and take appropriate actions such as granting more memory or terminating the workload.

Determining whether a cgroup has enough memory is not trivial as memory usage doesn't indicate whether the workload can benefit from more memory.
A memory area is charged to the cgroup which instantiated it and stays charged to the cgroup until the area is released. Migrating a process to a different cgroup doesn't move the memory usages that it instantiated while in the previous cgroup to the new cgroup.

A memory area may be used by processes belonging to different cgroups. To which cgroup the area will be charged is indeterministic; however, over time, the memory area is likely to end up in a cgroup which has enough memory allowance to avoid high reclaim pressure.

If a cgroup sweeps a considerable amount of memory which is expected to be accessed repeatedly by other cgroups, it may make sense to use POSIX_FADV_DONTNEED to relinquish the ownership of the affected memory areas.
"io.weight" specifies the relative amount of IO time the cgroup can use in relation to its siblings.

When distributing dirty memory for writeback, both the global and per-cgroup dirty memory states are examined and the more restrictive of the two is enforced.
cgroup writeback requires explicit support from the underlying filesystem. Currently, cgroup writeback is implemented on ext2, ext4 and btrfs. On other filesystems, all writeback IOs are attributed to the root cgroup.

There are inherent differences in memory and writeback management which affect how cgroup ownership is tracked. Memory is tracked per page while writeback is tracked per inode. For the purpose of writeback, an inode is assigned to a cgroup and all IO requests to write dirty pages from the inode are attributed to that cgroup.

As cgroup ownership for memory is tracked per page, there can be pages which are associated with different cgroups than the one the inode is associated with. cgroup writeback tracks such foreign conditions and, if a particular foreign cgroup becomes the majority over a certain period of time, switches the ownership of the inode to that cgroup.

While this model is enough for most use cases where a given inode is mostly dirtied by a single cgroup even when the main writing cgroup changes over time, use cases where multiple cgroups write to a single inode simultaneously are not supported well.

The sysctl knobs which affect writeback behavior are applied to cgroup writeback as follows.

  vm.dirty_background_ratio, vm.dirty_ratio
    These ratios apply the same to cgroup writeback with the amount of available memory capped by limits imposed by the memory controller and system-wide clean memory.

  vm.dirty_background_bytes, vm.dirty_bytes
    For cgroup writeback, this is calculated into a ratio against total available memory and applied the same way as vm.dirty[_background]_ratio.
This is a cgroup v2 controller for IO workload protection. You provide a group with a latency target, and if the average latency exceeds that target, the controller throttles any peers that have a lower latency target than the protected workload.
The process number controller is used to allow a cgroup to stop any new tasks from being fork()'d or clone()'d after a specified limit is reached.

The number of tasks in a cgroup can be exhausted in ways which other controllers cannot prevent, thus warranting its own controller. For example, a fork bomb is likely to exhaust the number of tasks before hitting memory restrictions.

  pids.current
    The number of processes currently in the cgroup and its descendants.

Organisational operations are not blocked by cgroup policies, so it is possible to have pids.current > pids.max. This can be done by either setting the limit to be smaller than pids.current, or attaching enough processes to the cgroup such that pids.current is larger than pids.max. However, it is not possible to violate a cgroup PID policy through fork() or clone(). These will return -EAGAIN if the creation of a new process would cause a cgroup policy to be violated.
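The fork()/clone() admission check reduces to comparing pids.current against pids.max, where "max" means unlimited (the real check is applied at every ancestor that sets a limit). A sketch of the comparison (the function name is ours):

```shell
# Would a fork()/clone() be admitted, given pids.current and pids.max?
# Succeeds (exit 0) when allowed; fails when it would return -EAGAIN.
fork_allowed() {    # $1 = pids.current, $2 = pids.max
    [ "$2" = "max" ] || [ "$1" -lt "$2" ]
}

fork_allowed 2 3 && echo allowed
# -> allowed
```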
The cgroup v2 device controller has no interface files and is implemented on top of cgroup BPF. To control access to device files, a user may create BPF_CGROUP_DEVICE programs and attach them to cgroups.

  rdma.current
    A read-only file that describes current resource usage. It exists for all cgroups except root.

The perf_event controller, if not mounted on a legacy hierarchy, is automatically enabled on the v2 hierarchy so that perf events can always be filtered by cgroup v2 path. The controller can still be moved to a legacy hierarchy after the v2 hierarchy is populated.
CPU controller root cgroup process behaviour

When distributing CPU cycles in the root cgroup, each thread in this cgroup is treated as if it was hosted in a separate child cgroup of the root cgroup. The weight of this child cgroup is dependent on its thread's nice level.

IO controller root cgroup process behaviour

Root cgroup processes are hosted in an implicit leaf child node. When distributing IO resources, this implicit child node is taken into account as if it was a normal child cgroup of the root cgroup with a weight value of 200.
cgroup namespace provides a mechanism to virtualize the view of the "/proc/$PID/cgroup" file and cgroup mounts. The CLONE_NEWCGROUP clone flag can be used with clone(2) and unshare(2) to create a new cgroup namespace. The process running inside the cgroup namespace will have its "/proc/$PID/cgroup" output restricted to the cgroupns root. The cgroupns root is the cgroup of the process at the time of creation of the cgroup namespace.
Without cgroup namespace, the "/proc/$PID/cgroup" file shows the complete path of the cgroup of a process. In a container setup where a set of cgroups and namespaces are intended to isolate processes, the "/proc/$PID/cgroup" file may leak potential system level information to the isolated processes. The full cgroup path can be considered system data and is undesirable to expose to the isolated processes. cgroup namespace can be used to restrict visibility of this path. For example, before creating a cgroup namespace, one would see::

  # ls -l /proc/self/ns/cgroup
  lrwxrwxrwx 1 root root 0 2014-07-15 10:37 /proc/self/ns/cgroup -> cgroup:[4026531835]
  # cat /proc/self/cgroup
  0::/batchjobs/container_id1

After unsharing a new cgroup namespace, the view changes::

  # ls -l /proc/self/ns/cgroup
  lrwxrwxrwx 1 root root 0 2014-07-15 10:35 /proc/self/ns/cgroup -> cgroup:[4026532183]
  # cat /proc/self/cgroup
  0::/
When some thread from a multi-threaded process unshares its cgroup namespace, the new cgroupns gets applied to the entire process (all the threads).

A cgroup namespace is alive as long as there are processes inside it or mounts pinning it. When the last usage goes away, the cgroup namespace is destroyed. The cgroupns root and the actual cgroups remain.
The 'cgroupns root' for a cgroup namespace is the cgroup in which the process calling unshare(2) is running. For example, if a process in the /batchjobs/container_id1 cgroup calls unshare, cgroup /batchjobs/container_id1 becomes the cgroupns root. For the init_cgroup_ns, this is the real root ('/') cgroup.

The cgroupns root cgroup does not change even if the namespace creator process later moves to a different cgroup::

  # ~/unshare -c # unshare cgroupns in some cgroup
  # cat /proc/self/cgroup
  0::/
  # mkdir sub_cgrp_1
  # echo 0 > sub_cgrp_1/cgroup.procs
  # cat /proc/self/cgroup
  0::/sub_cgrp_1
Each process gets its namespace-specific view of "/proc/$PID/cgroup".

Processes running inside the cgroup namespace will be able to see cgroup paths (in /proc/self/cgroup) only inside their root cgroup. From within an unshared cgroupns::

  # sleep 100000 &
  [1] 7353
  # echo 7353 > sub_cgrp_1/cgroup.procs
  # cat /proc/7353/cgroup
  0::/sub_cgrp_1

From the initial cgroup namespace, the real cgroup path will be visible::

  $ cat /proc/7353/cgroup
  0::/batchjobs/container_id1/sub_cgrp_1
From a sibling cgroup namespace (that is, a namespace rooted at a different cgroup), the cgroup path relative to its own cgroup namespace root will be shown. For instance, if PID 7353's cgroup namespace root is at '/batchjobs/container_id2', then it will see::

  # cat /proc/7353/cgroup
  0::/../container_id1/sub_cgrp_1

Note that the relative path always starts with '/' to indicate that it is relative to the cgroup namespace root of the caller.
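The path shown inside a cgroup namespace is the target's absolute cgroup path rewritten relative to the caller's cgroupns root. A sketch of the at-or-below-the-root case (the function name is ours; targets outside the root are shown by the kernel with ".." components, which this sketch doesn't reproduce):

```shell
# Rewrite an absolute cgroup path relative to a cgroupns root,
# for targets at or below that root.
ns_view() {    # $1 = cgroupns root, $2 = target's absolute cgroup path
    if [ "$2" = "$1" ]; then
        printf '/\n'                    # the root itself is shown as "/"
    else
        printf '/%s\n' "${2#"$1"/}"     # strip the root prefix
    fi
}

ns_view /batchjobs/container_id1 /batchjobs/container_id1/sub_cgrp_1
# -> /sub_cgrp_1
```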
Processes inside a cgroup namespace can move into and out of the namespace root if they have proper access to external cgroups. For example, from inside a namespace with cgroupns root at /batchjobs/container_id1, and assuming that the global hierarchy is still accessible inside the cgroupns::

  # cat /proc/7353/cgroup
  0::/sub_cgrp_1
  # echo 7353 > batchjobs/container_id2/cgroup.procs
  # cat /proc/7353/cgroup
  0::/../container_id2

Note that this kind of setup is not encouraged. A task inside a cgroup namespace should only be exposed to its own cgroupns hierarchy.

setns(2) to another cgroup namespace is allowed when:

(a) the process has CAP_SYS_ADMIN against its current user namespace
(b) the process has CAP_SYS_ADMIN against the target cgroup namespace's userns

No implicit cgroup changes happen with attaching to another cgroup namespace. It is expected that someone moves the attaching process under the target cgroup namespace root.
A namespace specific cgroup hierarchy can be mounted by a process running inside a non-init cgroup namespace::

  # mount -t cgroup2 none $MOUNT_POINT

This will mount the unified cgroup hierarchy with the cgroupns root as the filesystem root.

The virtualization of the /proc/self/cgroup file combined with restricting the view of the cgroup hierarchy by a namespace-private cgroupfs mount provides a properly isolated cgroup view inside the container.
This section contains kernel programming information in the areas where interacting with cgroup is necessary. cgroup core and controllers are not covered.

A filesystem can support cgroup writeback by updating address_space_operations->writepage[s]() to annotate bio's using the following function.

  wbc_init_bio(@wbc, @bio)
    Should be called for each bio carrying writeback data and associates the bio with the inode's owner cgroup. Can be called anytime between bio allocation and submission.

With writeback bio's annotated, cgroup support can be enabled per super_block by setting SB_I_CGROUPWB in ->s_iflags. This allows for selective disabling of cgroup writeback support, which is helpful when certain filesystem features, e.g. journaled data mode, are incompatible.

wbc_init_bio() binds the specified bio to its cgroup. Depending on the configuration, the bio may be executed at a lower priority.
- The "tasks" file is removed and "cgroup.procs" is not sorted.

- "cgroup.clone_children" is removed.

- /proc/cgroups is meaningless for v2. Use the "cgroup.controllers" file at the root instead.
cgroup v1 allowed an arbitrary number of hierarchies and each hierarchy could host any number of controllers. While this seemed to provide a high level of flexibility, it wasn't useful in practice.

It greatly complicated cgroup core implementation but more importantly the support for multiple hierarchies restricted how cgroup could be used in general and what controllers were able to do. For example, because a controller could be on only one hierarchy, a thread's cgroup membership couldn't be described in finite length - it could be as long as the number of hierarchies.

cgroup v1 allowed threads of a process to belong to different cgroups. This didn't make sense for some controllers and those controllers ended up implementing different ways to ignore such situations.
cgroup v1 had an ambiguously defined delegation model which got abused in combination with thread granularity. cgroups were delegated to individual applications, which effectively raised cgroup to the status of a syscall-like API exposed to lay programs.

First of all, cgroup has a fundamentally inadequate interface to be exposed this way. For a process to access its own knobs, it has to extract the path on the target hierarchy from /proc/self/cgroup, construct the knob's path by appending its name, and then open and read and/or write to it.

Also, cgroup controllers implemented a number of knobs which would never be accepted as public APIs because they were just adding control knobs to a system-management pseudo filesystem. cgroup ended up with interface knobs which were not properly abstracted or refined. These knobs got exposed to individual applications through the ill-defined delegation mechanism, effectively abusing cgroup as a shortcut to implementing public APIs without going through the required scrutiny.
cgroup v1 allowed threads to be in any cgroups, which created an interesting problem where threads belonging to a parent cgroup and its children cgroups competed for resources. Different controllers handled this differently.

The cpu controller considered threads and cgroups as equivalents and mapped nice levels to cgroup weights. This worked for some cases but fell flat when children wanted to be allocated specific ratios of CPU cycles and the number of internal threads fluctuated.

The io controller implicitly created a hidden leaf cgroup to host the threads. The hidden leaf had its own copies of all the knobs with "leaf_" prefixed. These differences among controllers made cgroup as a whole highly inconsistent.

This clearly is a problem which needs to be addressed from cgroup core in a uniform way.
cgroup v1 grew without oversight and developed a large number of idiosyncrasies and inconsistencies. One issue on the cgroup core side was how an empty cgroup was notified - a userland helper binary was forked and executed for each event.

There also was no consistency across controllers. When a new cgroup was created, some controllers defaulted to not imposing extra restrictions while others disallowed any resource usage until explicitly configured. Some controllers exposed a large amount of inconsistent implementation details to userland.

cgroup v2 establishes common conventions where appropriate and updates controllers so that they expose minimal and consistent interfaces.
A cgroup enjoys reclaim protection when it's within its low boundary, which makes delegation of subtrees possible.

An assumption of the original cgroup design was that global or parental pressure would always be enough to reclaim excess usage.

There is a basic premise that cgroup controllers should account and limit specific physical resources.