The KVM halt polling system
===========================

The KVM halt polling system provides a feature within KVM whereby the latency
of a guest can, under some circumstances, be reduced by polling in the host
for some time period after the guest has elected to no longer run by ceding.
That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
before giving up the cpu to the scheduler in order to let something else run.

Polling provides a latency advantage in cases where the guest can be run again
very quickly, by at least saving us a trip through the scheduler, normally on
the order of a few microseconds, although performance benefits are workload
dependent. In the event that no wakeup source arrives during the polling
interval, or some other task on the runqueue is runnable, the scheduler is
invoked. Thus halt polling is especially useful on workloads with very short
wakeup periods, where the time spent halt polling is minimised and the time
savings of not invoking the scheduler are distinguishable.

The generic halt polling code is implemented in:

        virt/kvm/kvm_main.c: kvm_vcpu_block()

The powerpc kvm-hv specific case is implemented in:

        arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()

Halt Polling Interval
=====================

The maximum time for which to poll before invoking the scheduler, referred to
as the halt polling interval, is increased and decreased based on the perceived
effectiveness of the polling in an attempt to limit pointless polling.
This value is stored in either the vcpu struct:

        kvm_vcpu->halt_poll_ns

or in the case of powerpc kvm-hv, in the vcore struct:

        kvmppc_vcore->halt_poll_ns

Thus this is a per vcpu (or vcore) value.

During polling, if a wakeup source is received within the halt polling interval,
the interval is left unchanged.
In the event that a wakeup source isn't received during the polling interval
(and thus schedule is invoked) there are two options: either the polling
interval and total block time[0] were less than the global max polling interval
(see module params below), or the total block time was greater than the global
max polling interval.

In the event that both the polling interval and total block time were less than
the global max polling interval then the polling interval can be increased in
the hope that next time, during the longer polling interval, the wakeup source
will be received while the host is polling and the latency benefits will be
received. The polling interval is grown in the function grow_halt_poll_ns(): it
is multiplied by the module parameter halt_poll_ns_grow, or set to
halt_poll_ns_grow_start when it is grown from zero.

In the event that the total block time was greater than the global max polling
interval then the host will never poll for long enough (limited by the global
max) to wake up during the polling interval, so it may as well be shrunk in
order to avoid pointless polling. The polling interval is shrunk in the function
shrink_halt_poll_ns() and is divided by the module parameter
halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.

It is worth noting that this adjustment process attempts to home in on some
steady state polling interval but will only really do a good job for wakeups
which come at an approximately constant rate, otherwise there will be constant
adjustment of the polling interval.

[0] total block time: the time between when the halt polling function is
                      invoked and a wakeup source is received (irrespective of
                      whether the scheduler is invoked within that function).

Module Parameters
=================

The kvm module has 4 tuneable module parameters to adjust the global max
polling interval, the value the polling interval starts from when grown from
zero, and the rates at which the polling interval is grown and shrunk.
These variables are defined in include/linux/kvm_host.h and as module
parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
powerpc kvm-hv case.

Module Parameter        |     Description           |     Default Value
--------------------------------------------------------------------------------
halt_poll_ns            | The global max polling    | KVM_HALT_POLL_NS_DEFAULT
                        | interval which defines    | (per arch value)
                        | the ceiling value of the  |
                        | polling interval for      |
                        | each vcpu.                |
--------------------------------------------------------------------------------
halt_poll_ns_grow       | The value by which the    | 2
                        | halt polling interval is  |
                        | multiplied in the         |
                        | grow_halt_poll_ns()       |
                        | function.                 |
--------------------------------------------------------------------------------
halt_poll_ns_grow_start | The initial value to grow | 10000
                        | to from zero in the       |
                        | grow_halt_poll_ns()       |
                        | function.                 |
--------------------------------------------------------------------------------
halt_poll_ns_shrink     | The value by which the    | 0
                        | halt polling interval is  |
                        | divided in the            |
                        | shrink_halt_poll_ns()     |
                        | function.                 |
--------------------------------------------------------------------------------

These module parameters can be set from the sysfs files in:

        /sys/module/kvm/parameters/

Note that these module parameters are system wide values and cannot be tuned
on a per vm basis.

Further Notes
=============

- Care should be taken when setting the halt_poll_ns module parameter as a
large value has the potential to drive the cpu usage to 100% on a machine which
would be almost entirely idle otherwise.
This is because even if a guest has
wakeups during which very little work is done and which are quite far apart, if
the period is shorter than the global max polling interval (halt_poll_ns) then
the host will always poll for the entire block time and thus cpu utilisation
will go to 100%.

- Halt polling essentially presents a trade off between power usage and latency
and the module parameters should be used to tune the balance between the two.
Idle cpu time is essentially converted to host kernel time with the aim of
decreasing latency when entering the guest.

- Halt polling will only be conducted by the host when no other tasks are
runnable on that cpu, otherwise the polling will cease immediately and
schedule will be invoked to allow that other task to run. Thus this doesn't
allow a guest to mount a denial of service attack on the cpu.
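
The grow and shrink rules described under "Halt Polling Interval" can be
modelled with a short sketch. This is an illustration only, not the kernel
implementation: the function names next_poll_ns(), grow() and shrink() are
hypothetical, the 200000 ns global max is an assumed example value (the real
default is per arch), and the other defaults are taken from the module
parameter table.

```python
# Illustrative model of the halt polling interval adjustment.
# Not kernel code: function names are hypothetical, and MAX_NS is an
# assumed example value (the real halt_poll_ns default is per arch).

MAX_NS = 200_000        # stands in for halt_poll_ns (assumed value)
GROW = 2                # halt_poll_ns_grow default from the table
GROW_START = 10_000     # halt_poll_ns_grow_start default from the table
SHRINK = 0              # halt_poll_ns_shrink default from the table


def grow(poll_ns, max_ns=MAX_NS, factor=GROW, start=GROW_START):
    """Grow the interval: start value from zero, else multiply; cap at max."""
    val = start if poll_ns == 0 else poll_ns * factor
    return min(val, max_ns)


def shrink(poll_ns, divisor=SHRINK):
    """Shrink the interval: divide, or drop to 0 when the divisor is 0."""
    if divisor == 0:
        return 0
    return poll_ns // divisor


def next_poll_ns(poll_ns, block_ns, max_ns=MAX_NS, divisor=SHRINK):
    """Return the polling interval to use after one block of block_ns."""
    if block_ns <= poll_ns:
        # The wakeup arrived while polling: leave the interval unchanged.
        return poll_ns
    if block_ns > max_ns:
        # Polling could never have lasted long enough: shrink.
        return shrink(poll_ns, divisor)
    if poll_ns < max_ns and block_ns < max_ns:
        # A slightly longer poll might have caught the wakeup: grow.
        return grow(poll_ns, max_ns)
    return poll_ns
```

Under these assumptions, a vcpu that starts with an interval of 0 and then
blocks for 5000, 50000, 15000 and 500000 ns ends up with intervals of 10000,
20000, 20000 and 0 ns respectively, which shows why irregular wakeups lead to
constant adjustment.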