.. SPDX-License-Identifier: GPL-2.0

===========================================
Intel(R) Memory Protection Extensions (MPX)
===========================================

Intel(R) MPX Overview
=====================

Intel(R) Memory Protection Extensions (Intel(R) MPX) is a new capability
introduced into Intel Architecture. Intel MPX provides hardware features
that can be used in conjunction with compiler changes to check memory
references whose compile-time intended bounds are violated at runtime by
a buffer overflow or underflow.

You can tell if your CPU supports MPX by looking in /proc/cpuinfo::

        cat /proc/cpuinfo | grep ' mpx '

For more information, please refer to Intel(R) Architecture Instruction
Set Extensions Programming Reference, Chapter 9: Intel(R) Memory Protection
Extensions.

Note: As of December 2014, no hardware with MPX is available but it is
possible to use SDE (Intel(R) Software Development Emulator) instead, which
can be downloaded from
http://software.intel.com/en-us/articles/intel-software-development-emulator


How to get the advantage of MPX
===============================

For MPX to work, changes are required in the kernel, binutils and compiler.
No source changes are required for applications, just a recompile.

There are a lot of moving parts that all have to work right. The following
is how we expect the compiler, application and kernel to work together.

1) Application developer compiles with -fmpx. The compiler will add the
   instrumentation as well as some setup code called early after the app
   starts. New instruction prefixes are noops for old CPUs.
2) That setup code allocates (virtual) space for the "bounds directory",
   points the "bndcfgu" register to the directory (must also set the valid
   bit) and notifies the kernel (via the new prctl(PR_MPX_ENABLE_MANAGEMENT))
   that the app will be using MPX.
   The app must be careful not to access
   the bounds tables between the time when it populates "bndcfgu" and
   when it calls the prctl(). This might be hard to guarantee if the app
   is compiled with MPX. You can add "__attribute__((bnd_legacy))" to
   the function to disable MPX instrumentation to help guarantee this.
   Also be careful not to call out to any other code which might be
   MPX-instrumented.
3) The kernel detects that the CPU has MPX, allows the new prctl() to
   succeed, and notes the location of the bounds directory. Userspace is
   expected to keep the bounds directory at that location. We note it
   instead of reading it each time because the 'xsave' operation needed
   to access the bounds directory register is an expensive operation.
4) If the application needs to spill bounds out of the 4 registers, it
   issues a bndstx instruction. Since the bounds directory is empty at
   this point, a bounds fault (#BR) is raised, the kernel allocates a
   bounds table (in the user address space) and makes the relevant entry
   in the bounds directory point to the new table.
5) If the application violates the bounds specified in the bounds registers,
   a separate kind of #BR is raised which will deliver a signal with
   information about the violation in the 'struct siginfo'.
6) Whenever memory is freed, we know that it can no longer contain valid
   pointers, and we attempt to free the associated space in the bounds
   tables. If an entire table becomes unused, we will attempt to free
   the table and remove the entry in the directory.
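
As a rough illustration of the notification part of step 2, the sketch
below wraps the two prctl() commands described in this document. It is not
the GCC runtime's actual code: the helper names are hypothetical, the
fallback values 43 and 44 are the ones listed in the prctl section of this
document, and the sketch assumes the bounds directory has already been
allocated and "bndcfgu" already loaded, which is not shown.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallback values as given in this document; libc headers may define them. */
#ifndef PR_MPX_ENABLE_MANAGEMENT
#define PR_MPX_ENABLE_MANAGEMENT  43
#define PR_MPX_DISABLE_MANAGEMENT 44
#endif

/*
 * Hypothetical helper: ask the kernel to start managing bounds tables.
 * Assumes the caller already pointed "bndcfgu" at a bounds directory.
 * Returns 0 on success or a negative errno value on failure.
 */
int mpx_enable_management(void)
{
    if (prctl(PR_MPX_ENABLE_MANAGEMENT, 0, 0, 0, 0) == -1)
        return -errno;
    return 0;
}

/*
 * Hypothetical helper: ask the kernel to stop managing bounds tables.
 * Returns 0 on success or a negative errno value on failure.
 */
int mpx_disable_management(void)
{
    if (prctl(PR_MPX_DISABLE_MANAGEMENT, 0, 0, 0, 0) == -1)
        return -errno;
    return 0;
}
```

On a CPU or kernel without MPX support, mpx_enable_management() is expected
to return a negative errno value, so a caller would normally treat failure
as "run without kernel-managed bounds tables" rather than as a fatal error.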

To summarize, there are essentially three things interacting here:

GCC with -fmpx:
 * enables annotation of code with MPX instructions and prefixes
 * inserts code early in the application to call in to the "gcc runtime"
GCC MPX Runtime:
 * Checks for hardware MPX support in cpuid leaf
 * allocates virtual space for the bounds directory (malloc() essentially)
 * points the hardware BNDCFGU register at the directory
 * calls a new prctl(PR_MPX_ENABLE_MANAGEMENT) to notify the kernel to
   start managing the bounds directories
Kernel MPX Code:
 * Checks for hardware MPX support in cpuid leaf
 * Handles #BR exceptions and sends SIGSEGV to the app when it violates
   bounds, like during a buffer overflow.
 * When bounds are spilled in to an unallocated bounds table, the kernel
   notices in the #BR exception, allocates the virtual space, then
   updates the bounds directory to point to the new table. It keeps
   special track of the memory with a VM_MPX flag.
 * Frees unused bounds tables at the time that the memory they described
   is unmapped.


How does MPX kernel code work
=============================

Handling #BR faults caused by MPX
---------------------------------

When MPX is enabled, there are 2 new situations that can generate
#BR faults.

  * new bounds tables (BT) need to be allocated to save bounds.
  * bounds violation caused by MPX instructions.

We hook the #BR handler to handle these two new situations.

On-demand kernel allocation of bounds tables
--------------------------------------------

MPX only has 4 hardware registers for storing bounds information. If
MPX-enabled code needs more than these 4 registers, it needs to spill
them somewhere. It has two special instructions for this which allow
the bounds to be moved between the bounds registers and some new "bounds
tables".

#BR exceptions are a new class of exceptions just for MPX.
They are
similar conceptually to a page fault and will be raised by the MPX
hardware both during bounds violations and when the tables are not
present. The kernel handles those #BR exceptions for not-present tables
by carving the space out of the normal process's address space and then
pointing the bounds directory over to it.

The tables need to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register points to
memory. Any direct kernel involvement (like a syscall) to access the
tables would obviously destroy performance.

Why not do this in userspace? MPX does not strictly require anything in
the kernel. It can theoretically be done completely from userspace. Here
are a few ways this could be done. We don't think any of them are practical
in the real world, but here they are.

:Q: Can virtual space simply be reserved for the bounds tables so that we
    never have to allocate them?
:A: An MPX-enabled application may create a lot of bounds tables in the
    process address space to save bounds information. These tables can take
    up huge swaths of memory (as much as 80% of the memory on the system)
    even if we clean them up aggressively. In the worst-case scenario, the
    tables can be 4x the size of the data structure being tracked. IOW, a
    1-page structure can require 4 bounds-table pages. An X-GB virtual
    area needs 4*X GB of virtual space, plus 2GB for the bounds directory.
    If we were to preallocate them for the 128TB of user virtual address
    space, we would need to reserve 512TB+2GB, which is larger than the
    entire virtual address space today. This means they can not be reserved
    ahead of time. Also, a single process's pre-populated bounds directory
    consumes 2GB of virtual *AND* physical memory.
    IOW, it's completely
    infeasible to prepopulate bounds directories.

:Q: Can we preallocate bounds table space at the same time memory is
    allocated which might contain pointers that might eventually need
    bounds tables?
:A: This would work if we could hook the site of each and every memory
    allocation syscall. This can be done for small, constrained applications.
    But, it isn't practical at a larger scale since a given app has no
    way of controlling how all the parts of the app might allocate memory
    (think libraries). The kernel is really the only place to intercept
    these calls.

:Q: Could a bounds fault be handed to userspace and the tables allocated
    there in a signal handler instead of in the kernel?
:A: mmap() is not on the list of safe async handler functions and even
    if mmap() would work it still requires locking or nasty tricks to
    keep track of the allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.

Decoding MPX instructions
-------------------------

If a #BR is generated due to a bounds violation caused by MPX, we need
to decode the MPX instruction to get the violation address and set this
address into the extended struct siginfo.

The _sigfault field of struct siginfo is extended as follows::

        /* SIGILL, SIGFPE, SIGSEGV, SIGBUS */
        struct {
                void __user *_addr; /* faulting insn/memory ref. */
        #ifdef __ARCH_SI_TRAPNO
                int _trapno;    /* TRAP # which caused the signal */
        #endif
                short _addr_lsb; /* LSB of the reported address */
                struct {
                        void __user *_lower;
                        void __user *_upper;
                } _addr_bnd;
        } _sigfault;

The '_addr' field refers to the violation address, and the new '_addr_bnd'
field refers to the upper/lower bounds in effect when the #BR was raised.

Glibc will also be updated to support this new siginfo, so users can
get the violation address and bounds when bounds violations occur.

Cleanup unused bounds tables
----------------------------

When a BNDSTX instruction attempts to save bounds to a bounds directory
entry marked as invalid, a #BR is generated. This is an indication that
no bounds table exists for this entry. In this case the fault handler
will allocate a new bounds table on demand.

Since the kernel allocated those tables on-demand without userspace
knowledge, it is also responsible for freeing them when the associated
mappings go away.

The solution is to hook do_munmap() and check whether the process is
MPX enabled. If it is, the bounds tables covering the virtual address
region being unmapped are freed as well.

Adding new prctl commands
-------------------------

Two new prctl commands are added to enable and disable MPX bounds tables
management in the kernel.
::

        #define PR_MPX_ENABLE_MANAGEMENT        43
        #define PR_MPX_DISABLE_MANAGEMENT       44

The runtime library in userspace is responsible for allocating the bounds
directory, so the kernel has to use the XSAVE instruction to get the base
of the bounds directory from the BNDCFG register.

But XSAVE is expected to be very expensive. As an optimization, we get
the base of the bounds directory once, while executing the
PR_MPX_ENABLE_MANAGEMENT command, and save it in struct mm_struct for
later use.


Special rules
=============

1) If userspace is requesting help from the kernel to do the management
of bounds tables, it may not create or modify entries in the bounds directory.

Certainly users can allocate bounds tables and forcibly point the bounds
directory at them through the XSAVE instruction, and then set the valid bit
of the bounds entry to make the entry valid.
But the kernel will decline
to assist in managing these tables.

2) Userspace may not take multiple bounds directory entries and point
them at the same bounds table.

This is allowed architecturally. For more information, see "Intel(R)
Architecture Instruction Set Extensions Programming Reference" (9.3.4).

However, if users did this, the kernel might be fooled in to unmapping an
in-use bounds table since it does not recognize sharing.