== Overview ==

Original x86-64 was limited by 4-level paging to 256 TiB of virtual
address space and 64 TiB of physical address space. We are already
bumping into this limit: some vendors offer servers with 64 TiB of
memory today.

To overcome the limitation, upcoming hardware will introduce support for
5-level paging. It is a straightforward extension of the current page
table structure, adding one more layer of translation.

It bumps the limits to 128 PiB of virtual address space and 4 PiB of
physical address space. This "ought to be enough for anybody" ©.

QEMU 2.9 and later support 5-level paging.

Virtual memory layout for 5-level paging is described in
Documentation/x86/x86_64/mm.txt

== Enabling 5-level paging ==

CONFIG_X86_5LEVEL=y enables the feature.

A kernel built with CONFIG_X86_5LEVEL=y is still able to boot on 4-level
hardware. In this case the additional page table level, p4d, is folded
at runtime.

== User-space and large virtual address space ==

On x86, 5-level paging enables a 56-bit userspace virtual address space.
Not all user space is ready to handle wide addresses. It's known that
at least some JIT compilers use the higher bits in pointers to encode
their own information. With 5-level paging this collides with valid
pointers and leads to crashes.

To mitigate this, we are not going to allocate virtual address space
above 47-bit by default.

But userspace can ask for allocation from the full address space by
specifying a hint address (with or without MAP_FIXED) above 47-bit.

If the hint address is set above 47-bit, but MAP_FIXED is not specified,
we try to look for an unmapped area at the specified address. If it's
already occupied, we look for an unmapped area in the *full* address
space, rather than in the 47-bit window.

A high hint address only affects the allocation in question, not any
future mmap()s.
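
The hint-based opt-in can be sketched from the application side as
follows. This is a minimal illustration, not code from the kernel tree:
the helper names map_with_high_hint() and is_above_47bit() are
hypothetical, the hint value 1 << 48 is an arbitrary address above the
47-bit boundary, and a 64-bit Linux target is assumed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Assumption: the example only makes sense with 64-bit pointers. */
static_assert(sizeof(void *) == 8, "example assumes a 64-bit address space");

/* First address above the default 47-bit allocation window. */
#define WINDOW_47BIT_END ((uintptr_t)1 << 47)

/*
 * Hypothetical helper: request an anonymous private mapping and pass a
 * hint above the 47-bit boundary. MAP_FIXED is not used, so the kernel
 * is free to place the mapping elsewhere. Returns MAP_FAILED on error.
 */
static void *map_with_high_hint(size_t len)
{
    /* Arbitrary hint above 47-bit; it only affects this one call. */
    void *hint = (void *)((uintptr_t)1 << 48);

    return mmap(hint, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Hypothetical helper: report whether p lies above the 47-bit window. */
static int is_above_47bit(const void *p)
{
    return (uintptr_t)p >= WINDOW_47BIT_END;
}
```

A caller would check the result of map_with_high_hint() against
MAP_FAILED and can then use is_above_47bit() to see where the mapping
actually landed. A result inside the 47-bit window is not an error; the
caller should treat it as an ordinary mapping.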

Specifying a high hint address on an older kernel or on a machine
without 5-level paging support is safe. The hint will be ignored and the
kernel will fall back to allocation from the 47-bit address space.

This approach makes it easy to make an application's memory allocator
aware of the large address space without manually tracking allocated
virtual address space.

One important case we need to handle here is interaction with MPX.
MPX (without MAWA extension) cannot handle addresses above 47-bit, so we
need to make sure that MPX cannot be enabled if we already have a VMA
above the boundary, and forbid creating such VMAs once MPX is enabled.