memory.rst (/Linux-v6.6/Documentation/admin-guide/cgroup-v1/memory.rst)

18       we call it "memory cgroup". When you see git-log and source code, you'll
30    Memory-hungry applications can be isolated and limited to a smaller
34 c. Virtualization solutions can control the amount of memory they want
36 d. A CD/DVD burner could control the amount of memory used by the
42 Current Status: linux-2.6.34-mmotm (development version of April 2010)
46  - accounting and limiting usage of anonymous pages, file caches, and swap caches.
47  - pages are linked to per-memcg LRU exclusively, and there is no global LRU.
48  - optionally, memory+swap usage can be accounted and limited.
49  - hierarchical accounting
50  - soft limit
51  - moving (recharging) a task's charges when the task is moved is selectable.
52  - usage threshold notifier
53  - memory pressure notifier
54  - oom-killer disable knob and oom-notifier
55  - Root cgroup has no limit controls.
59  <cgroup-v1-memory-kernel-extension>`)
61 Brief summary of control files.
119 there were several implementations for memory control. The goal of the
121 for memory control. The first RSS controller was posted by Balbir Singh [2]_
127 Cache Control [11]_.
129 2. Memory Control
142 3. Kernel user memory accounting and slab control
148 -----------
156 ---------------
158 .. code-block::
161 		+--------------------+
164 		+--------------------+
167            +---------------+  |        +---------------+
170            +---------------+  |        +---------------+
172                               + --------------+
174            +---------------+           +------+--------+
175            | page          +---------->  page_cgroup|
177            +---------------+           +---------------+
192 If everything goes well, a page metadata structure called page_cgroup is
194 (*) page_cgroup structure is allocated at boot/memory-hotplug time.
197 ------------------------
212 A swapped-in page is accounted after it is added to the swapcache.
214 Note: The kernel does swapin-readahead and reads multiple swap entries at once.
220 Note: we only account pages on the LRU because our purpose is to control the amount
221 of used pages; pages not on the LRU tend to be outside the VM's control.
224 --------------------------
230 the cgroup that brought it in -- this will happen on memory pressure).
232 But see :ref:`section 8.2 <cgroup-v1-memory-movable-charges>` when moving a
237 --------------------------------------
244  - memory.memsw.usage_in_bytes.
245  - memory.memsw.limit_in_bytes.
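Because memsw counts memory plus swap, a cgroup's swap share can be estimated by subtracting memory.usage_in_bytes from memory.memsw.usage_in_bytes. The sketch below assumes both files have already been read into strings. The helper name ``memcg_swap_usage()`` is hypothetical. The two files are not read atomically, so the result is only approximate.

.. code-block:: c

    #include <stdlib.h>

    /*
     * Hypothetical helper: estimate swap usage from the contents of
     * memory.usage_in_bytes and memory.memsw.usage_in_bytes.
     * memsw counts memory+swap, so swap ~= memsw - memory.
     * Returns -1 if either string is not a plain decimal number.
     */
    long long memcg_swap_usage(const char *usage_str, const char *memsw_str)
    {
        char *end;
        unsigned long long usage, memsw;

        usage = strtoull(usage_str, &end, 10);
        if (end == usage_str || (*end != '\0' && *end != '\n'))
            return -1;
        memsw = strtoull(memsw_str, &end, 10);
        if (end == memsw_str || (*end != '\0' && *end != '\n'))
            return -1;

        /* The two reads are not atomic, so clamp at zero. */
        if (memsw < usage)
            return 0;
        return (long long)(memsw - usage);
    }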
259 The global LRU (kswapd) can swap out arbitrary pages. Swap-out means
268 When a cgroup hits memory.memsw.limit_in_bytes, it is useless to swap out
269 pages in this cgroup. Then, swap-out will not be done by the cgroup routine and file
275 -----------
282 cgroup. (See :ref:`10. OOM Control <cgroup-v1-memory-oom-control>` below.)
285 pages that are selected for reclaiming come from the per-cgroup LRU
296 (See :ref:`oom_control <cgroup-v1-memory-oom-control>` section)
299 -----------
303   Page lock (PG_locked bit of page->flags)
304     mm->page_table_lock or split pte_lock
305       folio_memcg_lock (memcg->move_lock)
306         mapping->i_pages lock
307           lruvec->lru_lock.
309 The per-node, per-memcg LRU (the cgroup's private LRU) is guarded by
310 lruvec->lru_lock; the PG_lru bit of page->flags is cleared before
311 isolating a page from its LRU under lruvec->lru_lock.
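The nesting order above can be read as a ranking: a lock may only be taken while holding locks of lower rank. The kernel checks real lock dependencies at runtime with lockdep. The user-space sketch below only illustrates the ordering rule. Its enum names and ``lock_sequence_ok()`` are hypothetical labels, not kernel identifiers.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /* Illustrative ranks following the documented nesting order. */
    enum memcg_lock_rank {
        RANK_PAGE_LOCK = 0,    /* PG_locked bit of page->flags */
        RANK_PAGE_TABLE_LOCK,  /* mm->page_table_lock or split pte_lock */
        RANK_MOVE_LOCK,        /* folio_memcg_lock (memcg->move_lock) */
        RANK_I_PAGES_LOCK,     /* mapping->i_pages lock */
        RANK_LRU_LOCK,         /* lruvec->lru_lock */
    };

    /*
     * Return true if taking the locks seq[0..n-1] in that order (all held
     * together) respects the ordering: each rank must be strictly higher
     * than the previous one. Skipping levels is allowed.
     */
    bool lock_sequence_ok(const enum memcg_lock_rank *seq, size_t n)
    {
        for (size_t i = 1; i < n; i++) {
            if (seq[i] <= seq[i - 1])
                return false;
        }
        return true;
    }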
313 .. _cgroup-v1-memory-kernel-extension:
316 -----------------------------------------------
324 it can be disabled system-wide by passing cgroup.memory=nokmem to the kernel
339 -----------------------------------------------
363 ----------------------
376     deployments where the total amount of memory per-cgroup is overcommitted.
378     box can still run out of non-reclaimable memory.
399 1. Enable CONFIG_CGROUPS and CONFIG_MEMCG options
401    <cgroups-why-needed>` for the background information)::
403 	# mount -t tmpfs none /sys/fs/cgroup
405 	# mount -t cgroup none /sys/fs/cgroup/memory -o memory
427    We can write "-1" to reset ``*.limit_in_bytes`` (unlimited).
441 availability of memory on the system. The user is required to re-read
463 Page-fault scalability is also important. When measuring parallel
464 page-fault performance, a multi-process test may be better than a multi-thread
470 .. _cgroup-v1-memory-test-troubleshoot:
473 -------------------
484 To know what happens, disabling OOM_Kill as per :ref:`"10. OOM Control"
485 <cgroup-v1-memory-oom-control>` (below) and seeing what happens will be
488 .. _cgroup-v1-memory-test-task-migration:
491 ------------------
493 When a task migrates from one cgroup to another, its charge is not
495 remain charged to it, the charge is dropped when the page is freed or
499 See :ref:`8. "Move charges at task migration" <cgroup-v1-memory-move-charges>`
502 ---------------------
505 <cgroup-v1-memory-test-troubleshoot>` and :ref:`4.2
506 <cgroup-v1-memory-test-task-migration>`, a cgroup might have some charge
508 we charge against pages, not against tasks.)
510 We move the stats to the parent, and the charge is unchanged except for uncharging
521 ---------------
531   charged file caches. Some out-of-use page caches may remain charged until
535 -------------
539   * per-memory cgroup local status
561     inactive_file   # of bytes of file-backed memory and MADV_FREE anonymous
563     active_file     # of bytes of file-backed memory on active LRU list.
608 --------------
619 -----------
631 ------------------
641 -------------
643 This is similar to numa_maps but operates on a per-memcg basis.  This is
650 per-node page counts including "hierarchical_<counter>" which sums up all
685 ---------------------------------------
699 is to allow control groups to use as much of the memory as needed, provided
704 When the system detects memory contention or low memory, control groups
705 are pushed back to their soft limits. If the soft limit of each control
707 sure that one control group does not starve the others of memory.
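Here is a simplified model of the push-back described above. When memory is tight, reclaim first targets the control group that exceeds its soft limit by the largest amount. This loosely mirrors how the v1 implementation orders groups by their excess over the soft limit. The structure, the function name, and the single-pass selection are assumptions made for illustration.

.. code-block:: c

    #include <stddef.h>

    /* Hypothetical snapshot of one control group. */
    struct memcg_snapshot {
        const char *name;
        unsigned long long usage;       /* bytes charged */
        unsigned long long soft_limit;  /* memory.soft_limit_in_bytes */
    };

    /*
     * Simplified model: pick the group whose usage exceeds its soft limit
     * by the largest amount. Returns its index, or -1 if no group is
     * above its soft limit.
     */
    int pick_soft_limit_victim(const struct memcg_snapshot *g, size_t n)
    {
        unsigned long long best_excess = 0;
        int best = -1;

        for (size_t i = 0; i < n; i++) {
            if (g[i].usage > g[i].soft_limit) {
                unsigned long long excess = g[i].usage - g[i].soft_limit;

                if (excess > best_excess) {
                    best_excess = excess;
                    best = (int)i;
                }
            }
        }
        return best;
    }

Repeating the selection after each reclaim step pushes every group back toward its soft limit, largest offender first.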
709 Please note that soft limits are a best-effort feature; it comes with
716 -------------
735 .. _cgroup-v1-memory-move-charges:
744 cgroups to allow fine-grained policy adjustments without having to
745 move physical pages between control domains.
748 is, uncharge the task's pages from the old cgroup and charge them to the new cgroup.
753 -------------
758 If you want to enable it::
765       <cgroup-v1-memory-movable-charges>` for details.
768       Charges are moved only when you move mm->owner, in other words,
783 .. _cgroup-v1-memory-movable-charges:
786 --------------------------------------
793 +---+--------------------------------------------------------------------------+
796 | 0 | A charge of an anonymous page (or swap of it) used by the target task.   |
797 |   | You must enable Swap Extension (see 2.4) to enable move of swap charges. |
798 +---+--------------------------------------------------------------------------+
799 | 1 | A charge of file pages (normal file, tmpfs file (e.g. ipc shared memory) |
805 |   | page_mapcount(page) > 1). You must enable Swap Extension (see 2.4) to    |
806 |   | enable move of swap charges.                                             |
807 +---+--------------------------------------------------------------------------+
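memory.move_charge_at_immigrate is a bitmask. Per the table, bit 0 selects anonymous-page (and swap) charges and bit 1 selects file-page charges. Below is a minimal sketch of building and decoding that value. The macro and function names are hypothetical.

.. code-block:: c

    #include <stdbool.h>

    /* Bit positions from the move_charge_at_immigrate table. */
    #define MOVE_CHARGE_ANON_BIT 0  /* anonymous pages (and their swap) */
    #define MOVE_CHARGE_FILE_BIT 1  /* file pages, including tmpfs files */

    /* Decode a value read from or written to the file. */
    static inline bool moves_anon_charges(unsigned long val)
    {
        return val & (1UL << MOVE_CHARGE_ANON_BIT);
    }

    static inline bool moves_file_charges(unsigned long val)
    {
        return val & (1UL << MOVE_CHARGE_FILE_BIT);
    }

    /* Build the value to write for the requested combination. */
    unsigned long move_charge_value(bool anon, bool file)
    {
        unsigned long val = 0;

        if (anon)
            val |= 1UL << MOVE_CHARGE_ANON_BIT;
        if (file)
            val |= 1UL << MOVE_CHARGE_FILE_BIT;
        return val;
    }

For example, writing 3 (both bits set) to the destination cgroup's memory.move_charge_at_immigrate requests moving both kinds of charge.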
810 --------
812 - All charge-moving operations are done under cgroup_mutex. It's not good
824 - create an eventfd using eventfd(2);
825 - open memory.usage_in_bytes or memory.memsw.usage_in_bytes;
826 - write string like "<event_fd> <fd of memory.usage_in_bytes> <threshold>" to
832 It applies to both root and non-root cgroups.
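The three threshold-registration steps can be sketched in C as follows. This is an illustration, not the kernel's reference listener. The function names, the buffer sizes, and the example cgroup path are assumptions. The sketch also closes the memory.usage_in_bytes and cgroup.event_control descriptors after the write, assuming the registration does not need them kept open.

.. code-block:: c

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    /*
     * Format "<event_fd> <fd of memory.usage_in_bytes> <threshold>".
     * Returns the string length, or -1 if buf is too small.
     */
    int format_threshold_event(char *buf, size_t len, int efd, int ufd,
                               unsigned long long threshold)
    {
        int n = snprintf(buf, len, "%d %d %llu", efd, ufd, threshold);

        return (n < 0 || (size_t)n >= len) ? -1 : n;
    }

    /*
     * Register a usage threshold for the memory cgroup directory cg_dir,
     * e.g. "/sys/fs/cgroup/memory/mygroup" (a hypothetical path).
     * Returns an eventfd to read(2) notifications from, or -1 on error.
     */
    int register_usage_threshold(const char *cg_dir, unsigned long long threshold)
    {
        char path[4096], cmd[64];
        int efd, ufd, cfd, ok;

        efd = eventfd(0, 0);                    /* step 1: eventfd(2) */
        if (efd < 0)
            return -1;

        /* step 2: open memory.usage_in_bytes (and the control file) */
        snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cg_dir);
        ufd = open(path, O_RDONLY);
        snprintf(path, sizeof(path), "%s/cgroup.event_control", cg_dir);
        cfd = open(path, O_WRONLY);

        /* step 3: write the control string to cgroup.event_control */
        ok = ufd >= 0 && cfd >= 0 &&
             format_threshold_event(cmd, sizeof(cmd), efd, ufd, threshold) >= 0 &&
             write(cfd, cmd, strlen(cmd)) == (ssize_t)strlen(cmd);

        /* Assumption: the registration does not need these fds kept open. */
        if (ufd >= 0)
            close(ufd);
        if (cfd >= 0)
            close(cfd);
        if (!ok) {
            close(efd);
            return -1;
        }
        return efd;
    }

Each notification can then be consumed by reading the 8-byte counter from the returned eventfd with read(2).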
834 .. _cgroup-v1-memory-oom-control:
836 10. OOM Control
847  - create an eventfd using eventfd(2)
848  - open memory.oom_control file
849  - write string like "<event_fd> <fd of memory.oom_control>" to
855 You can disable the OOM-killer by writing "1" to memory.oom_control file, as:
859 If the OOM-killer is disabled, tasks under the cgroup will hang/sleep
860 in memory cgroup's OOM-waitqueue when they request accountable memory.
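Reading memory.oom_control reports the oom_kill_disable, under_oom and oom_kill fields as "key value" lines. A small parser for that text might look like the sketch below. The structure and function names are hypothetical, and unknown keys are simply skipped.

.. code-block:: c

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Fields reported when reading memory.oom_control. */
    struct oom_control_state {
        bool oom_kill_disable;        /* 1: OOM-killer disabled */
        bool under_oom;               /* 1: cgroup is currently under OOM */
        unsigned long long oom_kill;  /* number of OOM-killed processes */
    };

    /*
     * Parse the "key value" lines of memory.oom_control.
     * Returns 0 on success, -1 if some line cannot be parsed.
     */
    int parse_oom_control(const char *text, struct oom_control_state *st)
    {
        char key[32];
        unsigned long long val;
        int consumed;

        memset(st, 0, sizeof(*st));
        while (sscanf(text, "%31s %llu%n", key, &val, &consumed) == 2) {
            if (strcmp(key, "oom_kill_disable") == 0)
                st->oom_kill_disable = val != 0;
            else if (strcmp(key, "under_oom") == 0)
                st->under_oom = val != 0;
            else if (strcmp(key, "oom_kill") == 0)
                st->oom_kill = val;
            text += consumed;
        }
        /* Anything left other than whitespace means malformed input. */
        return strspn(text, " \t\n") == strlen(text) ? 0 : -1;
    }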
876 	- oom_kill_disable 0 or 1
877 	  (if 1, oom-killer is disabled)
878 	- under_oom	   0 or 1
880         - oom_kill         integer counter
902 resources that can be easily reconstructed or re-read from a disk.
905 about to run out of memory (OOM) or even the in-kernel OOM killer is on its
911 events are not pass-through. For example, suppose you have three cgroups: A->B->C. Now
921  - "default": this is the default behavior specified above. This mode is the
925  - "hierarchy": events always propagate up to the root, similar to the default
930  - "local": events are pass-through, i.e. they only receive notifications when
939 specified by a comma-delimited string, e.g. "low,hierarchy" specifies
940 hierarchical, pass-through, notification for all ancestor memcgs. Notification
941 with the default, non-pass-through behavior does not specify a mode.
942 "medium,local" specifies pass-through notification for the medium level.
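A client can validate the level and optional mode before writing the "<event_fd> <fd of memory.pressure_level> <level[,mode]>" string. The sketch below accepts the levels low, medium and critical and the modes default, hierarchy and local. The helper names are hypothetical, and the kernel parses the string itself, so this is only a client-side sanity check.

.. code-block:: c

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* True if s[0..len) exactly matches one of the NULL-terminated opts. */
    static bool is_one_of(const char *s, size_t len, const char *const *opts)
    {
        for (; *opts; opts++)
            if (strlen(*opts) == len && strncmp(s, *opts, len) == 0)
                return true;
        return false;
    }

    /* Check a "<level[,mode]>" argument such as "low,hierarchy". */
    bool pressure_arg_valid(const char *arg)
    {
        static const char *const levels[] = { "low", "medium", "critical", NULL };
        static const char *const modes[] = { "default", "hierarchy", "local", NULL };
        const char *comma = strchr(arg, ',');

        if (!comma)
            return is_one_of(arg, strlen(arg), levels);
        return is_one_of(arg, (size_t)(comma - arg), levels) &&
               is_one_of(comma + 1, strlen(comma + 1), modes);
    }

    /* Format the full control string, or return -1 if invalid/too long. */
    int format_pressure_event(char *buf, size_t len, int efd, int pfd,
                              const char *arg)
    {
        int n;

        if (!pressure_arg_valid(arg))
            return -1;
        n = snprintf(buf, len, "%d %d %s", efd, pfd, arg);
        return (n < 0 || (size_t)n >= len) ? -1 : n;
    }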
947 - create an eventfd using eventfd(2);
948 - open memory.pressure_level;
949 - write string as "<event_fd> <fd of memory.pressure_level> <level[,mode]>"
971    (Expect a bunch of notifications, and eventually, the oom-killer will
977 1. Make per-cgroup scanner reclaim not-shared pages first
978 2. Teach the controller to account for shared pages
992 .. [2] Singh, Balbir. Memory Controller (RSS Control),
1001 6. Menage, Paul. Control Groups v10, http://lwn.net/Articles/236032/
1002 7. Vaidyanathan, Srinivasan, Control Groups: Pagecache accounting and control
1009     https://lore.kernel.org/r/20070819094658.654.84837.sendpatchset@balbir-laptop
1012    https://lore.kernel.org/r/20070817084228.26003.12568.sendpatchset@balbir-laptop