Lines Matching +full:keys +full:- +full:per +full:- +full:group
1 .. SPDX-License-Identifier: GPL-2.0
9 :Authors: - Fenghua Yu <fenghua.yu@intel.com>
10 - Tony Luck <tony.luck@intel.com>
11 - Vikas Shivappa <vikas.shivappa@intel.com>
38 # mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MBps]] /sys/fs/resctrl
54 pseudo-locking is a unique way of using cache control to "pin" or
56 "Cache Pseudo-Locking".
93 own settings for cache use which can override
111 "shareable_bits" but no resource group will
117 well as a resource group's allocation.
123 one resource group. No sharing allowed.
125 Corresponding region is pseudo-locked. No
145 non-linear. This field is purely informational
156 "per-thread":
206 5 Reads to slow memory in the non-local NUMA domain
208 3 Non-temporal writes to non-local NUMA domain
209 2 Non-temporal writes to local NUMA domain
210 1 Reads to memory in the non-local NUMA domain
252 counter can be considered for re-use.
265 mask f7 has non-consecutive 1-bits
271 system. The default group is the root directory which, immediately
283 group that is their ancestor. These are called "MON" groups in the rest
286 Removing a directory will move all tasks and cpus owned by the group it
290 Moving MON group directories to a new parent CTRL_MON group is supported
291 for the purpose of changing the resource allocations of a MON group
295 MON group.
301 this group. Writing a task id to the file will add a task to the
302 group. If the group is a CTRL_MON group the task is removed from
303 whichever previous CTRL_MON group owned the task and also from
304 any MON group that owned the task. If the group is a MON group,
306 group. The task is removed from any previous MON group.
311 this group. Writing a mask to this file will add and remove
312 CPUs to/from this group. As with the tasks file a hierarchy is
314 parent CTRL_MON group.
315 When the resource group is in pseudo-locked mode this file will
317 pseudo-locked region.
327 A list of all the resources available to this group.
328 Each resource has its own line and format - see below for details.
336 The "mode" of the resource group dictates the sharing of its
337 allocations. A "shareable" resource group allows sharing of its
338 allocations while an "exclusive" resource group does not. A
339 cache pseudo-locked region is created by first writing
340 "pseudo-locksetup" to the "mode" file before writing the cache
341 pseudo-locked region's schemata to the resource group's "schemata"
342 file. On successful pseudo-locked region creation the mode will
343 automatically change to "pseudo-locked".
351 directories have one file per event (e.g. "llc_occupancy",
352 "mbm_total_bytes", and "mbm_local_bytes"). In a MON group these
354 all tasks in the group. In CTRL_MON groups these files provide
355 the sum for all tasks in the CTRL_MON group and all tasks in
359 -------------------------
364 1) If the task is a member of a non-default group, then the schemata
365 for that group is used.
367 2) Else if the task belongs to the default group, but is running on a
368 CPU that is assigned to some specific group, then the schemata for the
369 CPU's group is used.
371 3) Otherwise the schemata for the default group is used.
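The three rules above amount to a simple precedence order: task membership first, then CPU assignment, then the default group. A minimal sketch of that order follows. The integer group identifiers and the `DEFAULT_GROUP` constant are hypothetical and exist only for illustration; they are not part of the resctrl interface.

```c
#include <assert.h>

/* Hypothetical identifier for the default (root) resource group. */
#define DEFAULT_GROUP 0

/*
 * Return the group whose schemata applies to a running task.
 * task_group: group the task belongs to (DEFAULT_GROUP if none other).
 * cpu_group:  group the current CPU is assigned to (DEFAULT_GROUP if none).
 */
int effective_alloc_group(int task_group, int cpu_group)
{
    assert(task_group >= 0 && cpu_group >= 0);

    if (task_group != DEFAULT_GROUP)   /* rule 1: task membership wins */
        return task_group;
    if (cpu_group != DEFAULT_GROUP)    /* rule 2: fall back to the CPU's group */
        return cpu_group;
    return DEFAULT_GROUP;              /* rule 3: default group */
}
```

For example, a task in group 2 running on a CPU assigned to group 3 uses the schemata of group 2, while a default-group task on the same CPU uses group 3.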
374 -------------------------
375 1) If a task is a member of a MON group, or non-default CTRL_MON group
376 then RDT events for the task will be reported in that group.
378 2) If a task is a member of the default CTRL_MON group, but is running
379 on a CPU that is assigned to some specific group, then the RDT events
380 for the task will be reported in that group.
383 "mon_data" group.
388 When moving a task from one group to another you should remember that
390 a task in a monitor group showing 3 MB of cache occupancy. If you move
391 it to a new group and immediately check the occupancy of the old and new
392 groups you will likely see that the old group is still showing 3 MB and
393 the new group zero. When the task accesses locations still in cache from
395 you will likely see the occupancy in the old group go down as cache lines
396 are evicted and re-used while the occupancy in the new group rises as
398 membership in the new group.
400 The same applies to cache allocation control. Moving a task to a group
405 to identify a control group and a monitoring group respectively. Each of
406 the resource groups is mapped to these IDs based on the kind of group. The
409 and creation of "MON" group may fail if we run out of RMIDs.
411 max_threshold_occupancy - generic concepts
412 ------------------------------------------
418 limbo RMIDs but which are not ready to be used, the user may see an -EBUSY
424 Schemata files - general concepts
425 ---------------------------------
431 ---------
432 On current generation systems there is one L3 cache per socket and L2
443 ---------------------
450 0x3, 0x6 and 0xC are legal 4-bit masks with two bits set, but 0x5, 0x9
451 and 0xA are not. On a system with a 20-bit mask each bit represents 5%
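The contiguity rule and the per-bit share can be checked with a few lines of C. This is a sketch under the assumption that the hardware requires a contiguous block of set bits, as the text describes; the function names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* True if the set bits of mask form one contiguous block. */
bool cbm_is_contiguous(unsigned long mask)
{
    unsigned long low, shifted;

    if (mask == 0)
        return false;
    low = mask & (~mask + 1);       /* isolate the lowest set bit */
    shifted = mask / low;           /* move the block down to bit 0 */
    return (shifted & (shifted + 1)) == 0;
}

/* Share of the cache covered by mask, in percent, for a cbm_len-bit mask. */
double cbm_capacity_percent(unsigned long mask, unsigned int cbm_len)
{
    unsigned int bits = 0;

    assert(cbm_len > 0);
    for (; mask != 0; mask >>= 1)
        bits += (unsigned int)(mask & 1);
    return bits * 100.0 / cbm_len;
}
```

With a 20-bit mask, `cbm_capacity_percent(0x1, 20)` yields 5.0 and a five-bit mask such as 0xf8000 yields 25.0.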
516 ----------------------------------------------------------------
522 ------------------------------------------------------------------
530 ------------------------
543 ------------------------------------------
551 ---------------------------------------------
559 ---------------------------------------
578 ---------------------------------
593 --------------------------------------------------
613 --------------------------------------------------------------------
632 Cache Pseudo-Locking
635 application can fill. Cache pseudo-locking builds on the fact that a
636 CPU can still read and write data pre-allocated outside its current
637 allocated area on a cache hit. With cache pseudo-locking, data can be
640 pseudo-locked memory is made accessible to user space where an
644 The creation of a cache pseudo-locked region is triggered by a request
646 to be pseudo-locked. The cache pseudo-locked region is created as follows:
648 - Create a CAT allocation CLOSNEW with a CBM matching the schemata
649 from the user of the cache region that will contain the pseudo-locked
652 while the pseudo-locked region exists.
653 - Create a contiguous region of memory of the same size as the cache
655 - Flush the cache, disable hardware prefetchers, disable preemption.
656 - Make CLOSNEW the active CLOS and touch the allocated memory to load
658 - Set the previous CLOS as active.
659 - At this point the closid CLOSNEW can be released - the cache
660 pseudo-locked region is protected as long as its CBM does not appear in
661 any CAT allocation. Even though the cache pseudo-locked region will from
663 any CLOS will be able to access the memory in the pseudo-locked region since
665 - The contiguous region of memory loaded into the cache is exposed to
666 user-space as a character device.
668 Cache pseudo-locking increases the probability that data will remain
672 “locked” data from cache. Power management C-states may shrink or
673 power off cache. Deeper C-states will automatically be restricted on
674 pseudo-locked region creation.
676 It is required that an application using a pseudo-locked region runs
678 with the cache on which the pseudo-locked region resides. A sanity check
679 within the code will not allow an application to map pseudo-locked memory
681 pseudo-locked region resides. The sanity check is only done during the
685 Pseudo-locking is accomplished in two stages:
688 of cache that should be dedicated to pseudo-locking. At this time an
691 2) During the second stage a user-space application maps (mmap()) the
692 pseudo-locked memory into its address space.
694 Cache Pseudo-Locking Interface
695 ------------------------------
696 A pseudo-locked region is created using the resctrl interface as follows:
698 1) Create a new resource group by creating a new directory in /sys/fs/resctrl.
699 2) Change the new resource group's mode to "pseudo-locksetup" by writing
700 "pseudo-locksetup" to the "mode" file.
701 3) Write the schemata of the pseudo-locked region to the "schemata" file. All
705 On successful pseudo-locked region creation the "mode" file will contain
706 "pseudo-locked" and a new character device with the same name as the resource
707 group will exist in /dev/pseudo_lock. This character device can be mmap()'ed
708 by user space in order to obtain access to the pseudo-locked memory region.
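The three steps can also be driven from C instead of the shell. The following is a minimal sketch, not a reference implementation: the helper names `write_line` and `pseudo_lock_setup` are hypothetical, the resctrl mount point is passed in as `root` (for example "/sys/fs/resctrl"), and error reporting is reduced to a single return code.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Write one line to <dir>/<file>; returns 0 on success, -1 on error. */
static int write_line(const char *dir, const char *file, const char *text)
{
    char path[512];
    FILE *f;
    int n, ok;

    n = snprintf(path, sizeof(path), "%s/%s", dir, file);
    if (n < 0 || n >= (int)sizeof(path))
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%s\n", text) > 0;
    /* A rejected write only surfaces when stdio flushes, so check fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Steps 1-3: create the group, request pseudo-locksetup, write the schemata. */
int pseudo_lock_setup(const char *root, const char *name, const char *schemata)
{
    char dir[256];
    int n;

    n = snprintf(dir, sizeof(dir), "%s/%s", root, name);
    if (n < 0 || n >= (int)sizeof(dir))
        return -1;
    if (mkdir(dir, 0755) != 0)                              /* step 1 */
        return -1;
    if (write_line(dir, "mode", "pseudo-locksetup") != 0)   /* step 2 */
        return -1;
    return write_line(dir, "schemata", schemata);           /* step 3 */
}
```

A call such as `pseudo_lock_setup("/sys/fs/resctrl", "newlock", "L2:1=0x3")` would request the region used in the example later in this document. A caller should still read back the "mode" file to confirm that it now contains "pseudo-locked".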
710 An example of cache pseudo-locked region creation and usage can be found below.
712 Cache Pseudo-Locking Debugging Interface
713 ----------------------------------------
714 The pseudo-locking debugging interface is enabled by default (if
718 location is present in the cache. The pseudo-locking debugging interface uses
720 the pseudo-locked region:
724 example below). In this test the pseudo-locked region is traversed at
732 When a pseudo-locked region is created a new debugfs directory is created for
734 write-only file, pseudo_lock_measure, is present in this directory. The
735 measurement of the pseudo-locked region depends on the number written to this
756 In this example a pseudo-locked region named "newlock" was created. Here is
762 # echo 'hist:keys=latency' > /sys/kernel/tracing/events/resctrl/pseudo_lock_mem_latency/trigger
770 # trigger info: hist:keys=latency:vals=hitcount:sort=hitcount:size=2048 [active]
790 In this example a pseudo-locked region named "newlock" was created on the L2
803 # _-----=> irqs-off
804 # / _----=> need-resched
805 # | / _---=> hardirq/softirq
806 # || / _--=> preempt-depth
808 # TASK-PID CPU# |||| TIMESTAMP FUNCTION
810 pseudo_lock_mea-1672 [002] .... 3132.860500: pseudo_lock_l2: hits=4097 miss=0
818 On a two socket machine (one L3 cache per socket) with just four bits
823 # mount -t resctrl resctrl /sys/fs/resctrl
829 The default resource group is unmodified, so we have access to all parts
832 Tasks that are under the control of group "p0" may only allocate from the
834 Tasks in group "p1" use the "lower" 50% of cache on both sockets.
836 Similarly, tasks that are under the control of group "p0" may use a
838 Tasks in group "p1" may also use 50% memory b/w on both sockets.
841 b/w that the group may be able to use and the system admin can configure
856 Again two sockets, but this time with a more realistic 20-bit mask.
859 processor 1 on socket 0 on a 2-socket and dual core machine. To avoid noisy
860 neighbors, each of the two real-time tasks exclusively occupies one quarter
864 # mount -t resctrl resctrl /sys/fs/resctrl
867 First we reset the schemata for the default group so that the "upper"
873 Next we make a resource group for our first real time task and give
880 Finally we move our first real time task into this resource group. We
887 # taskset -cp 1 1234
894 # taskset -cp 2 5678
903 # echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata
909 # echo -e "L3:0=f8000;1=fffff\nMB:0=20;1=100" > p0/schemata
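A cache line of a schemata file, such as "L3:0=f8000;1=fffff", is a resource name followed by semicolon-separated domain=mask pairs. The sketch below parses one such line. It is an illustration only: `struct schema_entry` and `parse_cache_schemata` are hypothetical names, and the parser handles only hexadecimal cache masks, not the decimal values used on MB lines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct schema_entry {
    int domain;             /* cache id, e.g. 0 or 1 */
    unsigned long value;    /* capacity bitmask, parsed as hex */
};

/*
 * Parse "RES:id=mask;id=mask...". Stores the resource name in resource and
 * the pairs in out. Returns the number of pairs, or -1 on malformed input.
 */
int parse_cache_schemata(const char *line, char *resource, size_t resource_len,
                         struct schema_entry *out, int max_entries)
{
    const char *colon, *p;
    int n = 0;

    assert(line && resource && out);
    colon = strchr(line, ':');
    if (!colon || colon == line || (size_t)(colon - line) >= resource_len)
        return -1;
    memcpy(resource, line, (size_t)(colon - line));
    resource[colon - line] = '\0';

    p = colon + 1;
    while (*p && *p != '\n') {
        char *end;
        long domain;
        unsigned long value;

        if (n == max_entries)
            return -1;
        domain = strtol(p, &end, 10);
        if (end == p || *end != '=')
            return -1;
        p = end + 1;
        value = strtoul(p, &end, 16);
        if (end == p)
            return -1;
        out[n].domain = (int)domain;
        out[n].value = value;
        n++;
        p = end;
        if (*p == ';')
            p++;                    /* another pair follows */
        else if (*p && *p != '\n')
            return -1;              /* unexpected trailing character */
    }
    return n;
}
```

Parsing "L3:0=f8000;1=fffff" this way gives the resource name "L3" and two entries: domain 0 with mask 0xf8000 and domain 1 with mask 0xfffff.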
913 A single socket system which has real-time tasks running on cores 4-7 and
914 a non-real-time workload assigned to cores 0-3. The real-time tasks share text
915 and data, so a per task association is not required and due to interaction
920 # mount -t resctrl resctrl /sys/fs/resctrl
923 First we reset the schemata for the default group so that the "upper"
929 Next we make a resource group for our real time cores and give it access
937 Finally we move cores 4-7 over to the new group and make sure that the
939 also get 50% of memory bandwidth assuming that the cores 4-7 are SMT
940 siblings and only the real time threads are scheduled on the cores 4-7.
948 mode allowing sharing of their cache allocations. If one resource group
949 configures a cache allocation then nothing prevents another resource group
952 In this example a new exclusive resource group will be created on an L2 CAT
953 system with two L2 cache instances that can be configured with an 8-bit
954 capacity bitmask. The new exclusive resource group will be configured to use
958 # mount -t resctrl resctrl /sys/fs/resctrl/
961 First, we observe that the default group is configured to allocate to all L2
967 We could attempt to create the new resource group at this point, but it will
968 fail because of the overlap with the schemata of the default group::
975 -sh: echo: write error: Invalid argument
979 To ensure that there is no overlap with another resource group the default
980 resource group's schemata has to change, making it possible for the new
981 resource group to become exclusive.
992 A new resource group will on creation not overlap with an exclusive resource
993 group::
1007 A resource group cannot be forced to overlap with an exclusive resource group::
1010 -sh: echo: write error: Invalid argument
1012 overlaps with exclusive group
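The overlap rule behind these errors is a bitwise test: two capacity bitmasks for the same cache instance overlap when they share at least one set bit. A minimal sketch follows. The function names and the mask values in the usage note are illustrative assumptions, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* True if two capacity bitmasks share at least one cache portion. */
bool cbm_overlaps(unsigned long a, unsigned long b)
{
    return (a & b) != 0;
}

/*
 * True if candidate overlaps none of the n masks already in use on the
 * same cache instance, which is what an exclusive group requires.
 */
bool cbm_can_be_exclusive(unsigned long candidate,
                          const unsigned long *others, int n)
{
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        if (cbm_overlaps(candidate, others[i]))
            return false;
    }
    return true;
}
```

With an 8-bit mask, a candidate of 0x03 cannot be exclusive while another group still holds 0xff, but it can once that group is reduced to 0xfc.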
1014 Example of Cache Pseudo-Locking
1016 Lock a portion of the L2 cache from cache id 1 using CBM 0x3. Pseudo-locked
1021 # mount -t resctrl resctrl /sys/fs/resctrl/
1024 Ensure that there are bits available that can be pseudo-locked. Since only
1025 unused bits can be pseudo-locked, the bits to be pseudo-locked need to be
1026 removed from the default resource group's schemata::
1034 Create a new resource group that will be associated with the pseudo-locked
1035 region, indicate that it will be used for a pseudo-locked region, and
1036 configure the requested pseudo-locked region capacity bitmask::
1039 # echo pseudo-locksetup > newlock/mode
1042 On success the resource group's mode will change to pseudo-locked, the
1043 bit_usage will reflect the pseudo-locked region, and the character device
1044 exposing the pseudo-locked region will exist::
1047 pseudo-locked
1050 # ls -l /dev/pseudo_lock/newlock
1051 crw------- 1 root root 243, 0 Apr 3 05:01 /dev/pseudo_lock/newlock
1056 * Example code to access one page of pseudo-locked cache region
1069 * cores associated with the pseudo-locked region. Here the cpu
1106 /* Application interacts with pseudo-locked memory @mapping */
1120 ----------------------------
1128 1. Read the cbmmasks from each directory or the per-resource "bit_usage"
1159 $ flock -s /sys/fs/resctrl/ find /sys/fs/resctrl
1163 $ cat create-dir.sh
1165 mask = function-of(output.txt)
1169 $ flock /sys/fs/resctrl/ ./create-dir.sh
1188 exit(-1);
1200 exit(-1);
1212 exit(-1);
1221 if (fd == -1) {
1223 exit(-1);
1237 ----------------------
1240 group or CTRL_MON group.
1243 Example 1 (Monitor CTRL_MON group and subset of tasks in CTRL_MON group)
1244 ------------------------------------------------------------------------
1245 On a two socket machine (one L3 cache per socket) with just four bits
1248 # mount -t resctrl resctrl /sys/fs/resctrl
1256 The default resource group is unmodified, so we have access to all parts
1259 Tasks that are under the control of group "p0" may only allocate from the
1261 Tasks in group "p1" use the "lower" 50% of cache on both sockets.
1263 Create monitor groups and assign a subset of tasks to each monitor group.
1281 The parent CTRL_MON group shows the aggregated data.
1288 --------------------------------------------
1289 On a two socket machine (one L3 cache per socket)::
1291 # mount -t resctrl resctrl /sys/fs/resctrl
1295 An RMID is allocated to the group once it is created and hence the <cmd>
1308 ---------------------------------------------------------------------
1312 But the user can create different MON groups within the root group thereby
1319 # mount -t resctrl resctrl /sys/fs/resctrl
1327 Monitor the groups separately and also get per domain data. From the
1343 -----------------------------------
1345 A single socket system which has real time tasks running on cores 4-7
1350 # mount -t resctrl resctrl /sys/fs/resctrl
1354 Move the cpus 4-7 over to p1::
1367 -----------------------------------------------------------------
1382 +---------------+---------------+---------------+-----------------+
1446 …958/https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html
1448 2. Erratum BDF102 in Intel Xeon E5-2600 v4 Processor Product Family Specification Update:
1449 …w.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v4-spec-update.pdf
1452 …are.intel.com/content/www/us/en/develop/articles/intel-resource-director-technology-rdt-reference-…