.. SPDX-License-Identifier: GPL-2.0

======================
Memory Protection Keys
======================

Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
which is found on Intel's Skylake "Scalable Processor" Server CPUs.
It will be available in future non-server parts.

For anyone wishing to test or use this feature, it is available in
Amazon's EC2 C5 instances and is known to work there using an Ubuntu
17.04 image.

Memory Protection Keys provides a mechanism for enforcing page-based
protections, but without requiring modification of the page tables
when an application changes protection domains.  It works by
dedicating 4 previously ignored bits in each page table entry to a
"protection key", giving 16 possible keys.

There is also a new user-accessible register (PKRU) with two separate
bits (Access Disable and Write Disable) for each key.  Being a CPU
register, PKRU is inherently thread-local, potentially giving each
thread a different set of protections from every other thread.

There are two new instructions (RDPKRU/WRPKRU) for reading and writing
to the new register.  The feature is only available in 64-bit mode,
even though there is theoretically space in the PAE PTEs.  These
permissions are enforced on data access only and have no effect on
instruction fetches.

Syscalls
========

There are 3 system calls which directly interact with pkeys::

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
	int pkey_free(int pkey);
	int pkey_mprotect(unsigned long start, size_t len,
			  unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc().  An application calls the WRPKRU instruction
directly in order to change access permissions to memory covered
with a key.  In this example WRPKRU is wrapped by a C function
called pkey_set().
::

	int real_prot = PROT_READ|PROT_WRITE;
	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
	... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access::

	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
	*ptr = foo; // assign something
	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use::

	munmap(ptr, PAGE_SIZE);
	pkey_free(pkey);

.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
          An example implementation can be found in
          tools/testing/selftests/x86/protection_keys.c.

Behavior
========

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect().  For instance if you do this::

	mprotect(ptr, size, PROT_NONE);
	something(ptr);

you can expect the same effects with protection keys when doing this::

	pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE);
	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
	something(ptr);

That should be true whether something() is a direct access to 'ptr'
like::

	*ptr = foo;

or when the kernel does the access on the application's behalf like
with a read()::

	read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
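
WRPKRU replaces the rights of all 16 keys at once, so a wrapper such as
pkey_set() has to do a read-modify-write of PKRU.  The following is a
minimal sketch of only the bit arithmetic involved, not of the wrapper
itself.  It relies on the PKRU layout of two bits per key, with Access
Disable in the lower bit and Write Disable in the upper one.  The helper
name pkru_set_rights() and the macros PKRU_BITS_PER_PKEY and NR_PKEYS are
hypothetical and are not taken from the selftest.  The PKEY_DISABLE_*
values match the Linux uapi headers and are repeated here only so that
the snippet compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the Linux uapi headers; redefined so the sketch is
 * self-contained. */
#define PKEY_DISABLE_ACCESS	0x1u
#define PKEY_DISABLE_WRITE	0x2u

/* Hypothetical names used only in this sketch. */
#define PKRU_BITS_PER_PKEY	2	/* one AD bit and one WD bit per key */
#define NR_PKEYS		16	/* 4 PTE bits give 16 keys */

/*
 * Return the PKRU value that gives 'pkey' exactly the restrictions in
 * 'rights' while leaving the bits of every other key untouched.
 */
static uint32_t pkru_set_rights(uint32_t pkru, int pkey, uint32_t rights)
{
	const uint32_t mask = PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE;
	int shift;

	assert(pkey >= 0 && pkey < NR_PKEYS);
	assert((rights & ~mask) == 0);

	shift = pkey * PKRU_BITS_PER_PKEY;
	pkru &= ~(mask << shift);	/* drop the old AD/WD bits of this key */
	pkru |= rights << shift;	/* install the requested ones */
	return pkru;
}
```

A real pkey_set() would read the current value with RDPKRU, pass it
through a helper like this one, and write the result back with WRPKRU.
Working code for that part is in
tools/testing/selftests/x86/protection_keys.c.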