.. SPDX-License-Identifier: GPL-2.0

=================
KVM VCPU Requests
=================

Overview
========

KVM supports an internal API enabling threads to request a VCPU thread to
perform some activity. For example, a thread may request a VCPU to flush
its TLB with a VCPU request. The API consists of the following functions::

  /* Check if any requests are pending for VCPU @vcpu. */
  bool kvm_request_pending(struct kvm_vcpu *vcpu);

  /* Check if VCPU @vcpu has request @req pending. */
  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);

  /* Clear request @req for VCPU @vcpu. */
  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Check if VCPU @vcpu has request @req pending. When the request is
   * pending it will be cleared and a memory barrier, which pairs with
   * another in kvm_make_request(), will be issued.
   */
  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);

  /*
   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
   * with another in kvm_check_request(), prior to setting the request.
   */
  void kvm_make_request(int req, struct kvm_vcpu *vcpu);

  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);

Typically a requester wants the VCPU to perform the activity as soon
as possible after making the request. This means most requests
(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
and kvm_make_all_cpus_request() has the kicking of all VCPUs built
into it.

VCPU Kicks
----------

The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
order to perform some KVM maintenance. To do so, an IPI is sent, forcing
a guest mode exit. However, a VCPU thread may not be in guest mode at the
time of the kick. Therefore, depending on the mode and state of the VCPU
thread, there are two other actions a kick may take.
All three actions are listed below:

1) Send an IPI. This forces a guest mode exit.
2) Wake a sleeping VCPU. Sleeping VCPUs are VCPU threads outside guest
   mode that wait on waitqueues. Waking them removes the threads from
   the waitqueues, allowing the threads to run again. This behavior
   may be suppressed; see KVM_REQUEST_NO_WAKEUP below.
3) Do nothing. When the VCPU is not in guest mode and the VCPU thread is
   not sleeping, there is nothing to do.

VCPU Mode
---------

VCPUs have a mode state, ``vcpu->mode``, that is used to track whether the
guest is running in guest mode or not, as well as some specific
outside guest mode states. The architecture may use ``vcpu->mode`` to
ensure VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"),
as well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and
even to ensure IPI acknowledgements are waited upon (see "Waiting for
Acknowledgements"). The following modes are defined:

OUTSIDE_GUEST_MODE

  The VCPU thread is outside guest mode.

IN_GUEST_MODE

  The VCPU thread is in guest mode.

EXITING_GUEST_MODE

  The VCPU thread is transitioning from IN_GUEST_MODE to
  OUTSIDE_GUEST_MODE.

READING_SHADOW_PAGE_TABLES

  The VCPU thread is outside guest mode, but it wants the sender of
  certain VCPU requests, namely KVM_REQ_TLB_FLUSH, to wait until the VCPU
  thread is done reading the page tables.

VCPU Request Internals
======================

VCPU requests are simply bit indices of the ``vcpu->requests`` bitmap.
This means general bitops, like those documented in [atomic-ops]_, could
also be used, e.g. ::

  clear_bit(KVM_REQ_UNBLOCK & KVM_REQUEST_MASK, &vcpu->requests);

However, VCPU request users should refrain from doing so, as it would
break the abstraction.
The first 8 bits are reserved for architecture
independent requests; all additional bits are available for architecture
dependent requests.

Architecture Independent Requests
---------------------------------

KVM_REQ_TLB_FLUSH

  KVM's common MMU notifier may need to flush all of a guest's TLB
  entries, calling kvm_flush_remote_tlbs() to do so. Architectures that
  choose to use the common kvm_flush_remote_tlbs() implementation will
  need to handle this VCPU request.

KVM_REQ_VM_DEAD

  This request informs all VCPUs that the VM is dead and unusable, e.g. due
  to a fatal error or because the VM's state has been intentionally
  destroyed.

KVM_REQ_UNBLOCK

  This request informs the vCPU to exit kvm_vcpu_block(). It is used, for
  example, from timer handlers that run on the host on behalf of a vCPU,
  or in order to update the interrupt routing and ensure that assigned
  devices will wake up the vCPU.

KVM_REQ_OUTSIDE_GUEST_MODE

  This "request" ensures the target vCPU has exited guest mode prior to the
  sender of the request continuing on. No action needs to be taken by the
  target, and so no request is actually logged for the target. This request
  is similar to a "kick", but unlike a kick it guarantees the vCPU has
  actually exited guest mode. A kick only guarantees the vCPU will exit at
  some point in the future, e.g. a previous kick may have started the
  process, but there's no guarantee the to-be-kicked vCPU has fully exited
  guest mode.

KVM_REQUEST_MASK
----------------

VCPU requests should be masked by KVM_REQUEST_MASK before using them with
bitops. This is because only the lower 8 bits are used to represent the
request's number. The upper bits are used as flags. Currently only two
flags are defined.
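
The split between request number and flags can be illustrated with a small
userspace sketch. It is not the kernel's implementation: every ``DEMO_``
name is hypothetical, the flag bit positions are assumptions chosen only
for illustration, and the bitmap update is deliberately non-atomic. Only
the 8-bit width of the request number is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* The low 8 bits hold the request number (as described above). */
#define DEMO_REQUEST_MASK      0xffu
/* Hypothetical flag positions above the request number (assumed). */
#define DEMO_REQUEST_NO_WAKEUP (1u << 8)
#define DEMO_REQUEST_WAIT      (1u << 9)

/* A made-up request: number 5, carrying both flags. */
#define DEMO_REQ_EXAMPLE (5u | DEMO_REQUEST_WAIT | DEMO_REQUEST_NO_WAKEUP)

static_assert(((DEMO_REQUEST_NO_WAKEUP | DEMO_REQUEST_WAIT) &
	       DEMO_REQUEST_MASK) == 0,
	      "flags must not overlap the request number");

/* Bit index to use with bitops on the requests bitmap. */
static inline unsigned int demo_request_nr(unsigned int req)
{
	return req & DEMO_REQUEST_MASK;
}

/* Check whether a request value carries a given flag. */
static inline bool demo_request_has_flag(unsigned int req, unsigned int flag)
{
	return (req & flag) != 0;
}

/* Set the request's bit in a plain (non-atomic) bitmap word. */
static inline void demo_set_request(unsigned long *requests, unsigned int req)
{
	*requests |= 1ul << demo_request_nr(req);
}

/* Test the request's bit in the bitmap word. */
static inline bool demo_test_request(unsigned long requests, unsigned int req)
{
	return (requests >> demo_request_nr(req)) & 1ul;
}
```

Masking first matters because passing the unmasked value to a bitop would
use the flag bits as part of the bit index and touch the wrong bit.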

VCPU Request Flags
------------------

KVM_REQUEST_NO_WAKEUP

  This flag is applied to requests that only need immediate attention
  from VCPUs running in guest mode. That is, sleeping VCPUs do not need
  to be awakened for these requests. Sleeping VCPUs will handle the
  requests when they are awakened later for some other reason.

KVM_REQUEST_WAIT

  When requests with this flag are made with kvm_make_all_cpus_request(),
  the caller will wait for each VCPU to acknowledge its IPI before
  proceeding. This flag only applies to VCPUs that would receive IPIs.
  If, for example, the VCPU is sleeping, so no IPI is necessary, then
  the requesting thread does not wait. This means that this flag may be
  safely combined with KVM_REQUEST_NO_WAKEUP. See "Waiting for
  Acknowledgements" for more information about requests with
  KVM_REQUEST_WAIT.

VCPU Requests with Associated State
===================================

Requesters that want the receiving VCPU to handle new state need to ensure
the newly written state is observable to the receiving VCPU thread's CPU
by the time it observes the request. This means a write memory barrier
must be inserted after writing the new state and before setting the VCPU
request bit. Additionally, on the receiving VCPU thread's side, a
corresponding read barrier must be inserted after reading the request bit
and before proceeding to read the new state associated with it. See
scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
[memory-barriers]_.

The pair of functions, kvm_check_request() and kvm_make_request(), provide
the memory barriers, allowing this requirement to be handled internally by
the API.
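
The ordering requirement can be sketched in userspace with C11 atomics.
This is an analogue under stated assumptions, not the kernel code: the
``demo_`` names and ``struct demo_vcpu`` are hypothetical, and C11 release
and acquire fences stand in for the write and read barriers described
above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VCPU: one payload plus a request bitmap. */
struct demo_vcpu {
	int new_state;          /* state associated with the request */
	atomic_ulong requests;  /* request bits */
};

#define DEMO_REQ_BIT 3u

static_assert(DEMO_REQ_BIT < 8 * sizeof(unsigned long),
	      "request bit must fit in the bitmap word");

/* Requester: publish the state, then set the request bit. */
static inline void demo_make_request(struct demo_vcpu *vcpu, int state)
{
	vcpu->new_state = state;
	/* Write barrier: order the state store before the request store. */
	atomic_thread_fence(memory_order_release);
	atomic_fetch_or_explicit(&vcpu->requests, 1ul << DEMO_REQ_BIT,
				 memory_order_relaxed);
}

/* Receiver: test and clear the bit, then read the state. */
static inline bool demo_check_request(struct demo_vcpu *vcpu, int *state)
{
	unsigned long bit = 1ul << DEMO_REQ_BIT;

	if (!(atomic_load_explicit(&vcpu->requests,
				   memory_order_relaxed) & bit))
		return false;
	atomic_fetch_and_explicit(&vcpu->requests, ~bit,
				  memory_order_relaxed);
	/* Read barrier: order the request load before the state load. */
	atomic_thread_fence(memory_order_acquire);
	*state = vcpu->new_state;
	return true;
}
```

Without the two fences, the receiver could observe the request bit and
still read a stale ``new_state``, which is exactly the hazard the paired
barriers are meant to rule out.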

Ensuring Requests Are Seen
==========================

When making requests to VCPUs, we want to avoid the receiving VCPU
executing in guest mode for an arbitrarily long time without handling the
request. We can be sure this won't happen as long as we ensure the VCPU
thread checks kvm_request_pending() before entering guest mode and that a
kick will send an IPI to force an exit from guest mode when necessary.
Extra care must be taken to cover the period after the VCPU thread's last
kvm_request_pending() check and before it has entered guest mode, as kick
IPIs will only trigger guest mode exits for VCPU threads that are in guest
mode or at least have already disabled interrupts in order to prepare to
enter guest mode. This means that an optimized implementation (see "IPI
Reduction") must be certain when it's safe to not send the IPI. One
solution, which all architectures except s390 apply, is to:

- set ``vcpu->mode`` to IN_GUEST_MODE between disabling the interrupts and
  the last kvm_request_pending() check;
- enable interrupts atomically when entering the guest.

This solution also requires memory barriers to be placed carefully in both
the requesting thread and the receiving VCPU. With the memory barriers we
can exclude the possibility of a VCPU thread observing
!kvm_request_pending() on its last check and then not receiving an IPI for
the next request made of it, even if the request is made immediately after
the check. This is done by way of the Dekker memory barrier pattern
(scenario 10 of [lwn-mb]_). As the Dekker pattern requires two variables,
this solution pairs ``vcpu->mode`` with ``vcpu->requests``.
Substituting
them into the pattern gives::

  CPU1                                    CPU2
  =================                       =================
  local_irq_disable();
  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
  smp_mb();                               smp_mb();
  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
                                              IN_GUEST_MODE) {
      ...abort guest entry...                 ...send IPI...
  }                                       }

As stated above, the IPI is only useful for VCPU threads in guest mode or
that have already disabled interrupts. This is why this specific case of
the Dekker pattern has been extended to disable interrupts before setting
``vcpu->mode`` to IN_GUEST_MODE. WRITE_ONCE() and READ_ONCE() are used to
pedantically implement the memory barrier pattern, guaranteeing the
compiler doesn't interfere with ``vcpu->mode``'s carefully planned
accesses.

IPI Reduction
-------------

As only one IPI is needed to get a VCPU to check for any/all requests,
IPIs may be coalesced. This is easily done by having the first
IPI-sending kick also change the VCPU mode to something !IN_GUEST_MODE.
The transitional state, EXITING_GUEST_MODE, is used for this purpose.

Waiting for Acknowledgements
----------------------------

Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
be sent, and the acknowledgements to be waited upon, even when the target
VCPU threads are in modes other than IN_GUEST_MODE. For example, one case
is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
is set after disabling interrupts. To support these cases, the
KVM_REQUEST_WAIT flag changes the condition for sending an IPI from
checking that the VCPU is IN_GUEST_MODE to checking that it is not
OUTSIDE_GUEST_MODE.
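
The IPI decision described in "IPI Reduction" and "Waiting for
Acknowledgements" can be sketched as follows. This is a userspace
illustration using C11 atomics, not the kernel's code: all ``DEMO_`` and
``demo_`` names are hypothetical, and the position of the wait flag is an
assumption made for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum demo_mode {
	DEMO_OUTSIDE_GUEST_MODE,
	DEMO_IN_GUEST_MODE,
	DEMO_EXITING_GUEST_MODE,
	DEMO_READING_SHADOW_PAGE_TABLES,
};

#define DEMO_REQUEST_MASK 0xffu
/* Hypothetical position of the wait flag (assumed for illustration). */
#define DEMO_REQUEST_WAIT (1u << 9)

static_assert((DEMO_REQUEST_WAIT & DEMO_REQUEST_MASK) == 0,
	      "flag must not overlap the request number");

/*
 * Try to move the VCPU from IN_GUEST_MODE to EXITING_GUEST_MODE and
 * return the mode observed before the attempt.
 */
static inline int demo_exiting_guest_mode(atomic_int *mode)
{
	int old = DEMO_IN_GUEST_MODE;

	/* On failure, 'old' is updated to the mode actually observed. */
	atomic_compare_exchange_strong(mode, &old, DEMO_EXITING_GUEST_MODE);
	return old;
}

/* Decide whether a kick accompanying request @req needs an IPI. */
static inline bool demo_needs_ipi(atomic_int *mode, unsigned int req)
{
	int old = demo_exiting_guest_mode(mode);

	/* Waiting requests target every mode except OUTSIDE_GUEST_MODE. */
	if (req & DEMO_REQUEST_WAIT)
		return old != DEMO_OUTSIDE_GUEST_MODE;
	/* Otherwise only the first kick of a VCPU in guest mode sends one. */
	return old == DEMO_IN_GUEST_MODE;
}
```

In this sketch a second kick without the wait flag finds the mode already
changed to the exiting state and sends nothing, which is the coalescing
effect; a waiting request still asks for an IPI while the VCPU is in the
shadow-page-table reading mode.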

Request-less VCPU Kicks
-----------------------

As the determination of whether or not to send an IPI depends on the
two-variable Dekker memory barrier pattern, it's clear that
request-less VCPU kicks are almost never correct. Without the assurance
that a non-IPI generating kick will still result in an action by the
receiving VCPU, as the final kvm_request_pending() check does for
request-accompanying kicks, the kick may not do anything useful at
all. If, for instance, a request-less kick was made to a VCPU that was
just about to set its mode to IN_GUEST_MODE, meaning no IPI is sent, then
the VCPU thread may continue its entry without actually having done
whatever it was the kick was meant to initiate.

One exception is x86's posted interrupt mechanism. In this case, however,
even the request-less VCPU kick is coupled with the same
local_irq_disable() + smp_mb() pattern described above; the ON bit
(Outstanding Notification) in the posted interrupt descriptor takes the
role of ``vcpu->requests``. When sending a posted interrupt, PIR.ON is
set before reading ``vcpu->mode``; dually, in the VCPU thread,
vmx_sync_pir_to_irr() reads PIR after setting ``vcpu->mode`` to
IN_GUEST_MODE.

Additional Considerations
=========================

Sleeping VCPUs
--------------

VCPU threads may need to consider requests before and/or after calling
functions that may put them to sleep, e.g. kvm_vcpu_block(). Whether they
do or not, and, if they do, which requests need consideration, is
architecture dependent. kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
to check if it should awaken. One reason to do so is to provide
architectures a function where requests may be checked if necessary.

References
==========

.. [atomic-ops] Documentation/atomic_bitops.txt and Documentation/atomic_t.txt
.. [memory-barriers] Documentation/memory-barriers.txt
.. [lwn-mb] https://lwn.net/Articles/573436/