memory.rst - OpenGrok cross reference for /Linux-v6.1/Documentation/admin-guide/cgroup-v1/memory.rst

Lines Matching +full:enable +full:- +full:charge +full:- +full:control
18       we call it "memory cgroup". When you see git-log and source code, you'll
30    Memory-hungry applications can be isolated and limited to a smaller
34 c. Virtualization solutions can control the amount of memory they want
36 d. A CD/DVD burner could control the amount of memory used by the
42 Current Status: linux-2.6.34-mmotm(development version of 2010/April)
46  - accounting anonymous pages, file caches, swap caches usage and limiting them.
47  - pages are linked to per-memcg LRU exclusively, and there is no global LRU.
48  - optionally, memory+swap usage can be accounted and limited.
49  - hierarchical accounting
50  - soft limit
51  - moving (recharging) the account when moving a task is selectable.
52  - usage threshold notifier
53  - memory pressure notifier
54  - oom-killer disable knob and oom-notifier
55  - Root cgroup has no limit controls.
60 Brief summary of control files.
93                                      it will return -ENOTSUPP.
111 there were several implementations for memory control. The goal of the
113 for memory control. The first RSS controller was posted by Balbir Singh[2]
119 Cache Control [11].
121 2. Memory Control
134 3. Kernel user memory accounting and slab control
140 -----------
148 ---------------
152 		+--------------------+
155 		+--------------------+
158            +---------------+  |        +---------------+
161            +---------------+  |        +---------------+
163                               + --------------+
165            +---------------+           +------+--------+
166            | page          +---------->  page_cgroup|
168            +---------------+           +---------------+
184 If everything goes well, a page meta-data-structure called page_cgroup is
186 (*) page_cgroup structure is allocated at boot/memory-hotplug time.
189 ------------------------
197 inserted into inode (radix-tree). While it's mapped into the page tables of
201 unaccounted when it's removed from radix-tree. Even if RSS pages are fully
204 A swapped-in page is accounted after adding into swapcache.
206 Note: The kernel does swapin-readahead and reads multiple swaps at once.
212 Note: we just account pages-on-LRU because our purpose is to control amount
213 of used pages; not-on-LRU pages tend to be out-of-control from VM view.
216 --------------------------
222 the cgroup that brought it in -- this will happen on memory pressure).
228 --------------------------------------
235  - memory.memsw.usage_in_bytes.
236  - memory.memsw.limit_in_bytes.
249 The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
257 When a cgroup hits memory.memsw.limit_in_bytes, it's useless to do swap-out
258 in this cgroup. Then, swap-out will not be done by cgroup routine and file
264 -----------
271 cgroup. (See 10. OOM Control below.)
274 pages that are selected for reclaiming come from the per-cgroup LRU
288 -----------
292   Page lock (PG_locked bit of page->flags)
293     mm->page_table_lock or split pte_lock
294       lock_page_memcg (memcg->move_lock)
295         mapping->i_pages lock
296           lruvec->lru_lock.
298 Per-node-per-memcgroup LRU (cgroup's private LRU) is guarded by
299 lruvec->lru_lock; PG_lru bit of page->flags is cleared before
300 isolating a page from its LRU under lruvec->lru_lock.
303 -----------------------------------------------
311 it can be disabled system-wide by passing cgroup.memory=nokmem to the kernel
326 -----------------------------------------------
350 ----------------------
363     deployments where the total amount of memory per-cgroup is overcommitted.
365     box can still run out of non-reclaimable memory.
385 ------------------
387 a. Enable CONFIG_CGROUPS
388 b. Enable CONFIG_MEMCG
391 -------------------------------------------------------------------
395 	# mount -t tmpfs none /sys/fs/cgroup
397 	# mount -t cgroup none /sys/fs/cgroup/memory -o memory
414   We can write "-1" to reset the ``*.limit_in_bytes(unlimited)``.
432 availability of memory on the system. The user is required to re-read
454 Page-fault scalability is also important. When measuring a parallel
455 page-fault test, a multi-process test may be better than a multi-thread
462 -------------------
473 To know what happens, disabling OOM_Kill as per "10. OOM Control" (below) and
477 ------------------
479 When a task migrates from one cgroup to another, its charge is not
481 remain charged to it, the charge is dropped when the page is freed or
488 ---------------------
491 cgroup might have some charge associated with it, even though all
492 tasks have migrated away from it. (because we charge against pages, not
495 We move the stats to parent, and no change on the charge except uncharging
506 ---------------
516   charged file caches. Some out-of-use page caches may keep charged until
520 -------------
524 per-memory cgroup local status
546 inactive_file	# of bytes of file-backed memory on inactive LRU list.
547 active_file	# of bytes of file-backed memory on active LRU list.
593 --------------
604 -----------
616 ------------------
626 -------------
628 This is similar to numa_maps but operates on a per-memcg basis.  This is
635 per-node page counts including "hierarchical_<counter>" which sums up all
670 ---------------------------------------
684 is to allow control groups to use as much of the memory as needed, provided
689 When the system detects memory contention or low memory, control groups
690 are pushed back to their soft limits. If the soft limit of each control
692 sure that one control group does not starve the others of memory.
694 Please note that soft limits are a best-effort feature; it comes with
701 -------------
723 is, uncharge task's pages from the old cgroup and charge them to the new cgroup.
728 -------------
733 If you want to enable it::
741       Charges are moved only when you move mm->owner, in other words,
755 --------------------------------------
762 +---+--------------------------------------------------------------------------+
765 | 0 | A charge of an anonymous page (or swap of it) used by the target task.   |
766 |   | You must enable Swap Extension (see 2.4) to enable move of swap charges. |
767 +---+--------------------------------------------------------------------------+
768 | 1 | A charge of file pages (normal file, tmpfs file (e.g. ipc shared memory) |
774 |   | page_mapcount(page) > 1). You must enable Swap Extension (see 2.4) to    |
775 |   | enable move of swap charges.                                             |
776 +---+--------------------------------------------------------------------------+
779 --------
781 - All moving-charge operations are done under cgroup_mutex. It's not good
793 - create an eventfd using eventfd(2);
794 - open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
795 - write string like "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to
801 It's applicable for root and non-root cgroup.
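The three registration steps for a usage threshold can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: it assumes Python 3.10+ (for ``os.eventfd``), a mounted cgroup v1 memory hierarchy, and that the registration string is written to the group's ``cgroup.event_control`` file, a name that does not appear in the matched lines above. The helper names ``threshold_event_line`` and ``register_threshold`` are hypothetical.

```python
import os


def threshold_event_line(event_fd, usage_fd, threshold_bytes):
    """Build the string "<event_fd> <fd of memory.usage_in_bytes> <threshold>"."""
    if threshold_bytes < 0:
        raise ValueError("threshold must be non-negative")
    return f"{event_fd} {usage_fd} {threshold_bytes}"


def register_threshold(cgroup_dir, threshold_bytes):
    """Register a usage threshold for the memory cgroup at cgroup_dir.

    Returns (event_fd, usage_fd). The caller reads 8 bytes from event_fd
    to wait for a notification and closes both descriptors when done.
    """
    # Step 1: create an eventfd (requires Python 3.10+ on Linux).
    event_fd = os.eventfd(0)
    # Step 2: open the usage file whose value is compared to the threshold.
    usage_fd = os.open(os.path.join(cgroup_dir, "memory.usage_in_bytes"),
                       os.O_RDONLY)
    # Step 3: write the registration string to the control file.
    # ASSUMPTION: the control file is cgroup.event_control.
    control_fd = os.open(os.path.join(cgroup_dir, "cgroup.event_control"),
                         os.O_WRONLY)
    try:
        line = threshold_event_line(event_fd, usage_fd, threshold_bytes)
        os.write(control_fd, line.encode())
    finally:
        os.close(control_fd)
    return event_fd, usage_fd
```

A caller would typically block on ``os.read(event_fd, 8)`` after registration; opening ``memory.memsw.usage_in_bytes`` instead would follow the same pattern.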
803 10. OOM Control
814  - create an eventfd using eventfd(2)
815  - open memory.oom_control file
816  - write string like "<event_fd> <fd of memory.oom_control>" to
822 You can disable the OOM-killer by writing "1" to memory.oom_control file, as:
826 If OOM-killer is disabled, tasks under cgroup will hang/sleep
827 in memory cgroup's OOM-waitqueue when they request accountable memory.
843 	- oom_kill_disable 0 or 1
844 	  (if 1, oom-killer is disabled)
845 	- under_oom	   0 or 1
847         - oom_kill         integer counter
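The fields listed above can be read with a few lines of Python. This is a minimal sketch that assumes the file contains one ``<name> <integer>`` pair per line; the function name ``parse_oom_control`` and the sample text are illustrative, not taken from a real system.

```python
def parse_oom_control(text):
    """Parse the contents of memory.oom_control into a dict of integers.

    ASSUMPTION: each line has the form "<name> <integer>"; lines that do
    not match this shape are skipped.
    """
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            fields[parts[0]] = int(parts[1])
    return fields


# Illustrative sample; real values come from reading memory.oom_control.
sample = "oom_kill_disable 0\nunder_oom 0\noom_kill 3\n"
status = parse_oom_control(sample)
```

With the sample above, ``status["oom_kill_disable"]`` is 0 (the OOM-killer is enabled) and ``status["oom_kill"]`` is 3.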
869 resources that can be easily reconstructed or re-read from a disk.
872 about to run out of memory (OOM) or even the in-kernel OOM killer is on its
878 events are not pass-through. For example, you have three cgroups: A->B->C. Now
888  - "default": this is the default behavior specified above. This mode is the
892  - "hierarchy": events always propagate up to the root, similar to the default
897  - "local": events are pass-through, i.e. they only receive notifications when
906 specified by a comma-delimited string, e.g. "low,hierarchy" specifies
907 hierarchical, pass-through, notification for all ancestor memcgs. Notification
908 that is the default, non pass-through behavior, does not specify a mode.
909 "medium,local" specifies pass-through notification for the medium level.
914 - create an eventfd using eventfd(2);
915 - open memory.pressure_level;
916 - write string as "<event_fd> <fd of memory.pressure_level> <level[,mode]>"
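The ``<level[,mode]>`` part of the registration string can be assembled and validated before it is written. The sketch below is illustrative only: the helper name ``pressure_event_line`` is hypothetical, the modes are the three named above, and the level list assumes the levels are "low", "medium" and "critical" (only the first two appear in the matched lines).

```python
# ASSUMPTION: valid levels are low, medium and critical.
LEVELS = ("low", "medium", "critical")
# Modes as described above: default, hierarchy, local.
MODES = ("default", "hierarchy", "local")


def pressure_event_line(event_fd, pressure_fd, level, mode=None):
    """Build "<event_fd> <fd of memory.pressure_level> <level[,mode]>"."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    if mode is None:
        # No mode given: the default, non pass-through behavior applies.
        spec = level
    else:
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        spec = f"{level},{mode}"
    return f"{event_fd} {pressure_fd} {spec}"
```

For example, ``pressure_event_line(5, 6, "low", "hierarchy")`` yields the string ``5 6 low,hierarchy``, where 5 and 6 stand in for the eventfd and the open ``memory.pressure_level`` descriptor.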
938    (Expect a bunch of notifications, and eventually, the oom-killer will
944 1. Make per-cgroup scanner reclaim not-shared pages first
945 2. Teach controller to account for shared-pages
959 2. Singh, Balbir. Memory Controller (RSS Control),
967 6. Menage, Paul. Control Groups v10, http://lwn.net/Articles/236032/
968 7. Vaidyanathan, Srinivasan, Control Groups: Pagecache accounting and control
975     https://lore.kernel.org/r/20070819094658.654.84837.sendpatchset@balbir-laptop
977     https://lore.kernel.org/r/20070817084228.26003.12568.sendpatchset@balbir-laptop