.. SPDX-License-Identifier: GPL-2.0

======
Design
======


Overall Architecture
====================

The DAMON subsystem consists of three layers:

- Operations Set: Implements fundamental operations for DAMON that depend on
  the given monitoring target address space and the available set of
  software/hardware primitives,
- Core: Implements core logic including monitoring overhead/accuracy control
  and access-aware system operations on top of the operations set layer, and
- Modules: Implements kernel modules for various purposes that provide
  interfaces for the user space, on top of the core layer.


Configurable Operations Set
---------------------------

For data access monitoring and additional low level work, DAMON needs a set of
implementations for specific operations that are dependent on and optimized for
the given target address space. On the other hand, the accuracy and overhead
tradeoff mechanism, which is the core logic of DAMON, is in the pure logic
space. DAMON separates the two parts in different layers, namely DAMON
Operations Set and DAMON Core Logics Layers, respectively. It further defines
the interface between the layers to allow various operations sets to be
configured with the core logic.

Due to this design, users can extend DAMON for any address space by configuring
the core logic to use the appropriate operations set. If no appropriate set
is available, users can implement one on their own.

For example, physical memory, virtual memory, swap space, those for specific
processes, NUMA nodes, files, and backing memory devices would be supportable.
Also, if some architectures or devices support special optimized access
check primitives, those will be easily configurable.
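
The separation between the two layers can be illustrated with a small user
space sketch. This is not the kernel interface: ``OperationsSet``,
``FakePhysicalOps``, and ``count_accesses`` are hypothetical names chosen only
for illustration, and the toy access check simply looks up a fixed set of
addresses.

```python
from typing import Protocol


class OperationsSet(Protocol):
    """Hypothetical interface between the core and an operations set."""

    def target_ranges(self) -> list[tuple[int, int]]:
        """Return the monitoring target address ranges."""
        ...

    def accessed(self, addr: int) -> bool:
        """Report whether `addr` was accessed since the last check."""
        ...


class FakePhysicalOps:
    """Toy operations set: the user supplies the target ranges manually."""

    def __init__(self, ranges, hot_addrs):
        self._ranges = list(ranges)
        self._hot = set(hot_addrs)

    def target_ranges(self):
        return self._ranges

    def accessed(self, addr):
        return addr in self._hot


def count_accesses(ops: OperationsSet, probes_per_range: int) -> dict:
    """Address-space-agnostic core logic: probe evenly spaced addresses."""
    counts = {}
    for start, end in ops.target_ranges():
        step = max((end - start) // probes_per_range, 1)
        counts[(start, end)] = sum(
            ops.accessed(addr) for addr in range(start, end, step)
        )
    return counts


# The core logic only talks to the interface, never to a concrete address space.
ops = FakePhysicalOps(ranges=[(0, 8), (16, 24)], hot_addrs=[0, 2, 4, 20])
counts = count_accesses(ops, probes_per_range=4)
```

Supporting another address space in this sketch would only require another
class providing the same two methods, while ``count_accesses`` stays unchanged.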


Programmable Modules
--------------------

The core layer of DAMON is implemented as a framework, and exposes its
application programming interface to all kernel space components such as
subsystems and modules. For common use cases of DAMON, the DAMON subsystem
provides kernel modules that are built on top of the core layer using the API,
which can be easily used by user space end users.


Operations Set Layer
====================

The monitoring operations are defined in two parts:

1. Identification of the monitoring target address range for the address space.
2. Access check of specific address range in the target space.

DAMON currently provides the implementations of the operations for the physical
and virtual address spaces. The two subsections below describe how those work.


VMA-based Target Address Range Construction
-------------------------------------------

This is only for the virtual address space monitoring operations
implementation. That for the physical address space simply asks users to
manually set the monitoring target address ranges.

Only small parts in the super-huge virtual address space of the processes are
mapped to the physical memory and accessed. Thus, tracking the unmapped
address regions is just wasteful. However, because DAMON can deal with some
level of noise using the adaptive regions adjustment mechanism, tracking every
mapping is not strictly required and could even incur a high overhead in some
cases. That said, excessively large unmapped areas inside the monitoring target
should be removed so that the adaptive mechanism does not waste time on them.

For this reason, this implementation converts the complex mappings to three
distinct regions that cover every mapped area of the address space. The two
gaps between the three regions are the two biggest unmapped areas in the given
address space.
The two biggest unmapped areas would be the gap between the
heap and the uppermost mmap()-ed region, and the gap between the lowermost
mmap()-ed region and the stack in most cases. Because these gaps are
exceptionally huge in usual address spaces, excluding these will be sufficient
to make a reasonable trade-off. Below shows this in detail::

    <heap>
    <BIG UNMAPPED REGION 1>
    <uppermost mmap()-ed region>
    (small mmap()-ed regions and munmap()-ed regions)
    <lowermost mmap()-ed region>
    <BIG UNMAPPED REGION 2>
    <stack>


PTE Accessed-bit Based Access Check
-----------------------------------

Both of the implementations for physical and virtual address spaces use PTE
Accessed-bit for basic access checks. The only difference is the way of
finding the relevant PTE Accessed bit(s) from the address. While the
implementation for the virtual address walks the page table for the target task
of the address, the implementation for the physical address walks every page
table having a mapping to the address. In this way, the implementations find
and clear the bit(s) for the next sampling target address and check whether the
bit(s) are set again after one sampling period. This could disturb other kernel
subsystems using the Accessed bits, namely Idle page tracking and the reclaim
logic. DAMON does nothing to avoid disturbing Idle page tracking, so handling
the interference is the responsibility of sysadmins. However, it solves the
conflict with the reclaim logic using ``PG_idle`` and ``PG_young`` page flags,
as Idle page tracking does.


Core Logics
===========


Monitoring
----------

The following sections describe each of the DAMON core mechanisms and the five
monitoring attributes, ``sampling interval``, ``aggregation interval``,
``update interval``, ``minimum number of regions``, and ``maximum number of
regions``.


Access Frequency Monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of DAMON says which pages are accessed how frequently for a given
duration. The resolution of the access frequency is controlled by setting
``sampling interval`` and ``aggregation interval``. In detail, DAMON checks
access to each page per ``sampling interval`` and aggregates the results. In
other words, it counts the number of accesses to each page. After each
``aggregation interval`` passes, DAMON calls callback functions that were
previously registered by users so that users can read the aggregated results,
and then clears the results. This can be described in the simple pseudo-code
below::

    while monitoring_on:
        # one access check per page for each sampling interval
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        # report and reset the counters once per aggregation interval
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        sleep(sampling_interval)

The monitoring overhead of this mechanism will increase without bound as the
size of the target workload grows.


Region Based Sampling
~~~~~~~~~~~~~~~~~~~~~

To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
that are assumed to have the same access frequencies into a region. As long as
the assumption (pages in a region have the same access frequencies) is kept,
only one page in the region is required to be checked. Thus, for each
``sampling interval``, DAMON randomly picks one page in each region, waits for
one ``sampling interval``, checks whether the page is accessed meanwhile, and
increases the access frequency of the region if so. Therefore, the monitoring
overhead is controllable by setting the number of regions. DAMON allows users
to set the minimum and the maximum number of regions for the trade-off.
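
The region based sampling described above can be sketched in user space
Python as follows. This is only an illustration under simplifying
assumptions: ``Region``, ``sample_once``, ``aggregate``, and the
``was_accessed`` callback are hypothetical names, and the wait for one
``sampling interval`` between picking a page and checking it is omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Region:
    """A contiguous range of pages assumed to share one access frequency."""
    start: int            # first page index (inclusive)
    end: int              # last page index (exclusive)
    nr_accesses: int = 0


def sample_once(regions, was_accessed, rng):
    """Run one sampling interval: check a single random page per region.

    `was_accessed(page)` is a hypothetical stand-in for the access check
    primitive supplied by the operations set.
    """
    for region in regions:
        page = rng.randrange(region.start, region.end)
        if was_accessed(page):
            region.nr_accesses += 1


def aggregate(regions, samples_per_aggregation, was_accessed, rng):
    """Accumulate per-region access counts over one aggregation interval."""
    for _ in range(samples_per_aggregation):
        sample_once(regions, was_accessed, rng)
    return [(r.start, r.end, r.nr_accesses) for r in regions]


# Toy workload: pages 0-99 are always accessed, pages 100-199 never are.
regions = [Region(0, 100), Region(100, 200)]
result = aggregate(regions, 20, lambda page: page < 100, random.Random(0))
```

In this sketch the cost of one sampling interval is proportional to the number
of regions rather than the number of pages, which is why bounding the number
of regions bounds the overhead.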

This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed.


Adaptive Regions Adjustment
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even if the initial monitoring target regions are well constructed to fulfill
the assumption (pages in the same region have similar access frequencies), the
data access pattern can change dynamically. This will result in low
monitoring quality. To keep the assumption as much as possible, DAMON
adaptively merges and splits each region based on their access frequency.

For each ``aggregation interval``, it compares the access frequencies of
adjacent regions and merges those if the frequency difference is small. Then,
after it reports and clears the aggregated access frequency of each region, it
splits each region into two or three regions if the total number of regions
will not exceed the user-specified maximum number of regions after the split.

In this way, DAMON provides its best-effort quality and minimal overhead while
keeping the bounds users set for their trade-off.


Age Tracking
~~~~~~~~~~~~

By analyzing the monitoring results, users can also find how long the current
access pattern of a region has been maintained. That could be used for a better
understanding of the access pattern. For example, a page placement algorithm
utilizing both the frequency and the recency could be implemented using that.
To make such analysis of how long an access pattern has been maintained
easier, DAMON maintains yet another counter called ``age`` in each region. For
each ``aggregation interval``, DAMON checks if the region's size and access
frequency (``nr_accesses``) have significantly changed. If so, the counter is
reset to zero. Otherwise, the counter is increased.


Dynamic Target Space Updates Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The monitoring target address range could change dynamically.
For example,
virtual memory could be dynamically mapped and unmapped. Physical memory could
be hot-plugged.

As the changes could be quite frequent in some cases, DAMON allows the
monitoring operations to check dynamic changes including memory mapping changes
and apply them to monitoring operations-related data structures such as the
abstracted monitoring target memory area only once per user-specified time
interval (``update interval``).


.. _damon_design_damos:

Operation Schemes
-----------------

One common purpose of data access monitoring is access-aware system efficiency
optimizations. For example,

    paging out memory regions that are not accessed for more than two minutes

or

    using THP for memory regions that are larger than 2 MiB and showing a high
    access frequency for more than one minute.

One straightforward approach for such schemes would be profile-guided
optimizations. That is, getting data access monitoring results of the
workloads or the system using DAMON, finding memory regions of special
characteristics by profiling the monitoring results, and making system
operation changes for the regions. The changes could be made by modifying or
providing advice to the software (the application and/or the kernel), or
reconfiguring the hardware. Both offline and online approaches could be
available.

Among those, providing advice to the kernel at runtime would be flexible and
effective, and therefore be widely used. However, implementing such schemes
could impose unnecessary redundancy and inefficiency. The profiling could be
redundant if the type of interest is common. Exchanging the information
including monitoring results and operation advice between kernel and user
spaces could be inefficient.

To allow users to reduce such redundancy and inefficiencies by offloading the
work, DAMON provides a feature called Data Access Monitoring-based Operation
Schemes (DAMOS). It lets users specify their desired schemes at a high
level. For such specifications, DAMON starts monitoring, finds regions having
the access pattern of interest, and applies the user-desired operation actions
to the regions as soon as they are found.


.. _damon_design_damos_action:

Operation Action
~~~~~~~~~~~~~~~~

The management action that the users desire to apply to the regions of their
interest. For example, paging out, prioritizing for next reclamation victim
selection, advising ``khugepaged`` to collapse or split, or doing nothing but
collecting statistics of the regions.

The list of supported actions is defined in DAMOS, but the implementation of
each action is in the DAMON operations set layer because the implementation
normally depends on the monitoring target address space. For example, the code
for paging specific virtual address ranges out would be different from that for
physical address ranges. Also, the monitoring operations implementation sets
are not mandated to support all actions of the list. Hence, the availability of
a specific DAMOS action depends on which operations set is selected to be used
together.

Applying an action to a region is considered as changing the region's
characteristics. Hence, DAMOS resets the age of regions when an action is
applied to those.


.. _damon_design_damos_access_pattern:

Target Access Pattern
~~~~~~~~~~~~~~~~~~~~~

The access pattern of the schemes' interest. The patterns are constructed with
the properties that DAMON's monitoring results provide, specifically the size,
the access frequency, and the age. Users can describe their access pattern of
interest by setting minimum and maximum values of the three properties.
If a
region's three properties are in the ranges, DAMOS classifies it as one of the
regions that the scheme is interested in.


.. _damon_design_damos_quotas:

Quotas
~~~~~~

DAMOS upper-bound overhead control feature. DAMOS could incur high overhead if
the target access pattern is not properly tuned. For example, if a huge memory
region having the access pattern of interest is found, applying the scheme's
action to all pages of the huge region could consume unacceptably large system
resources. Preventing such issues by tuning the access pattern could be
challenging, especially if the access patterns of the workloads are highly
dynamic.

To mitigate that situation, DAMOS provides an upper-bound overhead control
feature called quotas. It lets users specify an upper limit of time that DAMOS
can use for applying the action, and/or a maximum number of bytes of memory
regions that the action can be applied to within a user-specified time
duration.


.. _damon_design_damos_quotas_prioritization:

Prioritization
^^^^^^^^^^^^^^

A mechanism for making a good decision under the quotas. When the action
cannot be applied to all regions of interest due to the quotas, DAMOS
prioritizes regions and applies the action to only regions having high enough
priorities so that it will not exceed the quotas.

The prioritization mechanism should be different for each action. For example,
rarely accessed (colder) memory regions would be prioritized for page-out
scheme action. In contrast, the colder regions would be deprioritized for huge
page collapse scheme action. Hence, the prioritization mechanisms for each
action are implemented in each DAMON operations set, together with the actions.

Though the implementation is up to the DAMON operations set, it would be common
to calculate the priority using the access pattern properties of the regions.
Some users would want the mechanisms to be personalized for their specific
case. For example, some users would want the mechanism to weigh the recency
(``age``) more than the access frequency (``nr_accesses``). DAMOS allows users
to specify the weight of each access pattern property and passes the
information to the underlying mechanism. Nevertheless, how and even whether
the weight will be respected are up to the underlying prioritization mechanism
implementation.


.. _damon_design_damos_watermarks:

Watermarks
~~~~~~~~~~

Conditional DAMOS (de)activation automation. Users might want DAMOS to run
only under certain situations. For example, when a sufficient amount of free
memory is guaranteed, running a scheme for proactive reclamation would only
consume unnecessary system resources. To avoid such consumption, the user would
need to manually monitor some metrics such as free memory ratio, and turn
DAMON/DAMOS on or off.

DAMOS allows users to offload such work using three watermarks. It allows the
users to configure the metric of their interest, and three watermark values,
namely high, middle, and low. If the value of the metric becomes above the
high watermark or below the low watermark, the scheme is deactivated. If the
metric becomes below the middle watermark but above the low watermark, the
scheme is activated. If all schemes are deactivated by the watermarks, the
monitoring is also deactivated. In this case, the DAMON worker thread only
periodically checks the watermarks and therefore incurs nearly zero overhead.


.. _damon_design_damos_filters:

Filters
~~~~~~~

Non-access pattern-based target memory regions filtering. If users run
self-written programs or have good profiling tools, they could know something
more than the kernel, such as future access patterns or some special
requirements for specific types of memory.
For example, some users may know that
only anonymous pages can impact their program's performance. They can also
have a list of latency-critical processes.

To let users optimize DAMOS schemes with such special knowledge, DAMOS provides
a feature called DAMOS filters. The feature allows users to set an arbitrary
number of filters for each scheme. Each filter specifies the type of target
memory, and whether it should exclude the memory of the type (filter-out), or
all except the memory of the type (filter-in).

Currently, anonymous page, memory cgroup, address range, and DAMON monitoring
target type filters are supported by the feature. Some filter target types
require additional arguments. The memory cgroup filter type asks users to
specify the file path of the memory cgroup for the filter. The address range
type asks for the start and end addresses of the range. The DAMON monitoring
target type asks for the index of the target from the context's monitoring
targets list. Hence, users can apply specific schemes to only anonymous pages,
non-anonymous pages, pages of specific cgroups, all pages excluding those of
specific cgroups, pages in specific address range, pages in specific DAMON
monitoring targets, and any combination of those.

To handle filters efficiently, the address range and DAMON monitoring target
type filters are handled by the core layer, while others are handled by the
operations set. If a memory region is filtered by a core layer-handled filter,
it is not counted as a region that the scheme has tried. In contrast, if a
memory region is filtered by an operations set layer-handled filter, it is
counted as one that the scheme has tried. The difference in accounting leads
to changes in the statistics.


Application Programming Interface
---------------------------------

The programming interface for kernel space data access-aware applications.
DAMON is a framework, so it does nothing by itself. Instead, it only helps
other kernel components such as subsystems and modules build their data
access-aware applications using DAMON's core features. For this, DAMON exposes
all its features to other kernel components via its application programming
interface, namely ``include/linux/damon.h``. Please refer to the API
:doc:`document </mm/damon/api>` for details of the interface.


Modules
=======

Because the core of DAMON is a framework for kernel components, it doesn't
provide any direct interface for the user space. Such interfaces should be
implemented by each DAMON API user kernel component, instead. The DAMON
subsystem itself implements such DAMON API user modules, which are supposed to
be used for general purpose DAMON control and special purpose data access-aware
system operations, and provides stable application binary interfaces (ABI) for
the user space. The user space can build their efficient data access-aware
applications using the interfaces.


General Purpose User Interface Modules
--------------------------------------

DAMON modules that provide user space ABIs for general purpose DAMON usage at
runtime.

DAMON user interface modules, namely 'DAMON sysfs interface' and 'DAMON debugfs
interface', are DAMON API user kernel modules that provide ABIs to the
user space. Please note that the DAMON debugfs interface is currently
deprecated.

Like many other ABIs, the modules create files on sysfs and debugfs, and allow
users to specify their requests to and get the answers from DAMON by writing to
and reading from the files. In response to such I/O, the DAMON user interface
modules control DAMON and retrieve the results as the user requested via the
DAMON API, and return the results to the user space.

The ABIs are designed to be used for user space application development,
rather than human beings' fingers. Human users are recommended to use user
space tools built on the ABIs. One such Python-written user space tool is
available at Github (https://github.com/awslabs/damo), Pypi
(https://pypistats.org/packages/damo), and Fedora
(https://packages.fedoraproject.org/pkgs/python-damo/damo/).

Please refer to the ABI :doc:`document </admin-guide/mm/damon/usage>` for
details of the interfaces.


Special-Purpose Access-aware Kernel Modules
-------------------------------------------

DAMON modules that provide user space ABI for specific purpose DAMON usage.

The DAMON sysfs/debugfs user interfaces are for full control of all DAMON
features at runtime. For each special-purpose system-wide data access-aware
system operation such as proactive reclamation or LRU lists balancing, the
interfaces could be simplified by removing unnecessary knobs for the specific
purpose, and extended for boot-time and even compile-time control. Default
values of DAMON control parameters for the usage would also need to be
optimized for the purpose.

To support such cases, yet more DAMON API user kernel modules that provide
simpler and optimized user space interfaces are available. Currently, two
modules for proactive reclamation and LRU lists manipulation are provided. For
more detail, please read the usage documents for those
(:doc:`/admin-guide/mm/damon/reclaim` and
:doc:`/admin-guide/mm/damon/lru_sort`).