.. _float_v2:

Floating Point Services
#######################

The kernel allows threads to use floating point registers on board
configurations that support these registers.

.. note::
    Floating point services are currently available only for boards
    based on ARM Cortex-M SoCs supporting the Floating Point Extension,
    the ARM64 architecture, the Intel x86 architecture, the RISC-V
    architecture, the SPARC architecture and ARCv2 SoCs supporting the
    Floating Point Extension. The services provided are architecture
    specific.

    The kernel does not support the use of floating point registers by ISRs,
    except on the architectures whose sections below state otherwise
    (ARM64 and RISC-V).

.. contents::
    :local:
    :depth: 2

Concepts
********

The kernel can be configured to provide only the floating point services
required by an application. Three modes of operation are supported,
which are described below. On x86, the kernel's support for the SSE
registers can additionally be included or omitted, as desired.

No FP registers mode
====================

This mode is used when the application has no threads that use floating point
registers. It is the kernel's default floating point services mode.

If a thread uses any floating point register,
the kernel generates a fatal error condition and aborts the thread.

Unshared FP registers mode
==========================

This mode is used when the application has only a single thread
that uses floating point registers.

On x86 platforms, the kernel initializes the floating point registers so they can
be used by any thread (initialization is skipped on ARM Cortex-M platforms and
ARCv2 platforms). The floating point registers are left unchanged whenever a
context switch occurs.

.. note::
    The behavior is undefined if two or more threads attempt to use
    the floating point registers, as the kernel does not attempt to detect
    (or prevent) multiple threads from using these registers.

Shared FP registers mode
========================

This mode is used when the application has two or more threads that use
floating point registers. Depending upon the underlying CPU architecture,
the kernel supports one or more of the following thread sub-classes:

* non-user: A thread that cannot use any floating point registers

* FPU user: A thread that can use the standard floating point registers

* SSE user: A thread that can use both the standard floating point registers
  and SSE registers

The kernel initializes and enables access to the floating point registers,
so they can be used by any thread, then saves and restores these registers
during context switches to ensure the computations performed by each FPU user
or SSE user are not impacted by the computations performed by the other users.

ARM Cortex-M architecture (with the Floating Point Extension)
-------------------------------------------------------------

.. note::
    The Shared FP registers mode is the default Floating Point
    Services mode in ARM Cortex-M.

On the ARM Cortex-M architecture with the Floating Point Extension, the kernel
treats *all* threads as FPU users when shared FP registers mode is enabled.
This means that any thread is allowed to access the floating point registers.
The ARM kernel automatically detects that a given thread is using the floating
point registers the first time the thread accesses them.

Pretag a thread that intends to use the FP registers by
using one of the techniques listed below.

* A statically-created ARM thread can be pretagged by passing the
  :c:macro:`K_FP_REGS` option to :c:macro:`K_THREAD_DEFINE`.

* A dynamically-created ARM thread can be pretagged by passing the
  :c:macro:`K_FP_REGS` option to :c:func:`k_thread_create`.
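
The two pretagging techniques above can be sketched as follows. This is a
minimal sketch rather than code taken from the kernel sources: the names
``harmonic_sum``, ``fp_worker_entry``, ``start_fp_worker``, ``FP_STACK_SIZE``
and ``FP_PRIORITY`` are hypothetical, and the stack size and priority are
placeholder values. The thread definitions are guarded by ``__ZEPHYR__``,
which is assumed to be defined by the Zephyr build, so that the arithmetic
helper can also be compiled off-target.

.. code-block:: c

    /* Plain C helper: the floating point work the thread performs. */
    static float harmonic_sum(int n)
    {
        float sum = 0.0f;

        for (int i = 1; i <= n; i++) {
            sum += 1.0f / (float)i;
        }

        return sum;
    }

    #ifdef __ZEPHYR__
    #include <zephyr/kernel.h>

    #define FP_STACK_SIZE 1024 /* hypothetical; size for the application */
    #define FP_PRIORITY   5    /* hypothetical */

    static volatile float fp_result;

    static void fp_worker_entry(void *p1, void *p2, void *p3)
    {
        ARG_UNUSED(p1);
        ARG_UNUSED(p2);
        ARG_UNUSED(p3);

        fp_result = harmonic_sum(10);
    }

    /* Statically-created thread, pretagged through the options argument. */
    K_THREAD_DEFINE(fp_static_tid, FP_STACK_SIZE, fp_worker_entry,
                    NULL, NULL, NULL, FP_PRIORITY, K_FP_REGS, 0);

    /* Dynamically-created thread, pretagged the same way. */
    K_THREAD_STACK_DEFINE(fp_dyn_stack, FP_STACK_SIZE);
    static struct k_thread fp_dyn_thread;

    void start_fp_worker(void)
    {
        k_thread_create(&fp_dyn_thread, fp_dyn_stack,
                        K_THREAD_STACK_SIZEOF(fp_dyn_stack),
                        fp_worker_entry, NULL, NULL, NULL,
                        FP_PRIORITY, K_FP_REGS, K_NO_WAIT);
    }
    #endif /* __ZEPHYR__ */

In both cases only the ``options`` argument differs from an untagged thread,
so pretagging can be added to existing thread definitions without touching the
thread entry function.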

Pretagging a thread with the :c:macro:`K_FP_REGS` option instructs the
MPU-based stack protection mechanism to properly configure the size of
the thread's guard region to always guarantee stack overflow detection,
and enable lazy stacking for the given thread upon thread creation.

During thread context switching the ARM kernel saves the *callee-saved*
floating point registers, if the switched-out thread has been using them.
Additionally, the *caller-saved* floating point registers are saved on
the thread's stack. If the switched-in thread has been using the floating
point registers, the kernel restores the *callee-saved* FP registers of
the switched-in thread and the *caller-saved* FP context is restored from
the thread's stack. Thus, the kernel does not save or restore the FP
context of threads that are not using the FP registers.

Each thread that intends to use the floating point registers must provide
an extra 72 bytes of stack space where the callee-saved FP context can
be saved.

`Lazy Stacking
<https://developer.arm.com/documentation/dai0298/a>`_
is currently enabled in Zephyr applications on ARM Cortex-M
architecture, minimizing interrupt latency, when the floating
point context is active.

When the MPU-based stack protection mechanism is not enabled, lazy stacking
is always active in the Zephyr application. When the MPU-based stack protection
is enabled, the following rules apply with respect to lazy stacking:

* Lazy stacking is activated by default on threads that are pretagged with
  :c:macro:`K_FP_REGS`.
* Lazy stacking is activated dynamically on threads that are not pretagged with
  :c:macro:`K_FP_REGS`, as soon as the kernel detects that they are using the
  floating point registers.

If an ARM thread does not require use of the floating point registers any
more, it can call :c:func:`k_float_disable`.
This instructs the kernel
not to save or restore its FP context during thread context switching.

ARM64 architecture
------------------

.. note::
    The Shared FP registers mode is the default Floating Point
    Services mode on ARM64. The compiler is free to optimize code
    using FP/SIMD registers, and library functions such as ``memcpy``
    are known to make use of them.

On the ARM64 (AArch64) architecture the kernel treats each thread as an FPU
user on a case-by-case basis. A "lazy save" algorithm is used during context
switching which updates the floating point registers only when it is absolutely
necessary. For example, the registers are *not* saved when switching from an
FPU user to a non-user thread, and then back to the original FPU user.

FPU register usage by ISRs is supported although not recommended. When an
ISR uses floating point or SIMD registers, the access is trapped, the
current FPU user context is saved in the thread object and the ISR is resumed
with interrupts disabled so as to prevent another IRQ from interrupting the ISR
and potentially requesting FPU usage. Because ISRs don't have a persistent
register context, there is no provision for saving an ISR's FPU context
either, hence the IRQ disabling.

Each thread object becomes 512 bytes larger when Shared FP registers mode
is enabled.

ARCv2 architecture
------------------

On the ARCv2 architecture, the kernel treats each thread as a non-user
or FPU user and the thread must be tagged by one of the
following techniques.

* A statically-created ARC thread can be tagged by passing the
  :c:macro:`K_FP_REGS` option to :c:macro:`K_THREAD_DEFINE`.

* A dynamically-created ARC thread can be tagged by passing the
  :c:macro:`K_FP_REGS` option to :c:func:`k_thread_create`.

If an ARC thread does not require use of the floating point registers any
more, it can call :c:func:`k_float_disable`.
This instructs the kernel
not to save or restore its FP context during thread context switching.

During thread context switching the ARC kernel saves the *callee-saved*
floating point registers, if the switched-out thread has been using them.
Additionally, the *caller-saved* floating point registers are saved on
the thread's stack. If the switched-in thread has been using the floating
point registers, the kernel restores the *callee-saved* FP registers of
the switched-in thread and the *caller-saved* FP context is restored from
the thread's stack. Thus, the kernel does not save or restore the FP
context of threads that are not using the FP registers. An extra 16 bytes
(single floating point hardware) or 32 bytes (double floating point hardware)
of stack space is required to load and store floating point registers.

RISC-V architecture
-------------------

On the RISC-V architecture the kernel treats each thread as an FPU
user on a case-by-case basis with the FPU access allocated on demand.
A "lazy save" algorithm is used during context switching which updates
the floating point registers only when it is absolutely necessary.
For example, the FPU registers are *not* saved when switching from an
FPU user to a non-user thread (or an FPU user that doesn't touch the FPU
during its scheduling slot), and then back to the original FPU user.

FPU register usage by ISRs is supported although not recommended. When an
ISR uses floating point or SIMD registers, the access is trapped, the
current FPU user context is saved in the thread object and the ISR is resumed
with interrupts disabled so as to prevent another IRQ from interrupting the ISR
and potentially requesting FPU usage. Because ISRs don't have a persistent
register context, there is no provision for saving an ISR's FPU context
either, hence the IRQ disabling.

As an optimization, the FPU context is preemptively restored upon scheduling
back an "active FPU user" thread that had its FPU context saved away due to
FPU usage by another thread. Active FPU users are so designated when they
make the FPU state "dirty" during their most recent scheduling slot before
being scheduled out. So if a thread doesn't modify the FPU state within its
scheduling slot and another thread claims the FPU for itself afterwards, then
that first thread will be subjected to the on-demand regime and won't have
its FPU context restored until it attempts to access it again. But if that
thread does modify the FPU before being scheduled out, then it is likely to
continue using it when scheduled back in, and preemptively restoring its FPU
context saves the exception trap overhead that would occur otherwise.

Each thread object becomes 136 bytes (single-precision floating point
hardware) or 264 bytes (double-precision floating point hardware) larger
when Shared FP registers mode is enabled.

SPARC architecture
------------------

On the SPARC architecture, the kernel treats each thread as a non-user
or FPU user and the thread must be tagged by one of the
following techniques:

* A statically-created thread can be tagged by passing the
  :c:macro:`K_FP_REGS` option to :c:macro:`K_THREAD_DEFINE`.

* A dynamically-created thread can be tagged by passing the
  :c:macro:`K_FP_REGS` option to :c:func:`k_thread_create`.

During a thread context switch at exit from an interrupt handler, the SPARC
kernel saves *all* floating point registers, if the FPU was enabled in
the switched-out thread. Floating point registers are saved on the thread's
stack. Floating point registers are restored when a thread context is restored
if and only if they were saved at the context save. Saving and restoring of the
floating point registers is synchronous and thus not lazy.
The FPU is always disabled
when an ISR is called (independent of :kconfig:option:`CONFIG_FPU_SHARING`).

Floating point disabling with :c:func:`k_float_disable` is not implemented.

When :kconfig:option:`CONFIG_FPU_SHARING` is used, then 136 bytes of stack space
is required for each FPU user thread to load and store floating point
registers. No extra stack is required if :kconfig:option:`CONFIG_FPU_SHARING` is
not used.

x86 architecture
----------------

On the x86 architecture the kernel treats each thread as a non-user,
FPU user or SSE user on a case-by-case basis. A "lazy save" algorithm is used
during context switching which updates the floating point registers only when
it is absolutely necessary. For example, the registers are *not* saved when
switching from an FPU user to a non-user thread, and then back to the original
FPU user. The following table indicates the amount of additional stack space a
thread must provide so the registers can be saved properly.

=========== =============== ==========================
Thread type FP register use Extra stack space required
=========== =============== ==========================
cooperative any             0 bytes
preemptive  none            0 bytes
preemptive  FPU             108 bytes
preemptive  SSE             464 bytes
=========== =============== ==========================

The x86 kernel automatically detects that a given thread is using
the floating point registers the first time the thread accesses them.
The thread is tagged as an SSE user if the kernel has been configured
to support the SSE registers, or as an FPU user if the SSE registers are
not supported. If this would result in a thread that is an FPU user being
tagged as an SSE user, or if the application wants to avoid the exception
handling overhead involved in auto-tagging threads, it is possible to
pretag a thread using one of the techniques listed below.

* A statically-created x86 thread can be pretagged by passing the
  :c:macro:`K_FP_REGS` or :c:macro:`K_SSE_REGS` option to
  :c:macro:`K_THREAD_DEFINE`.

* A dynamically-created x86 thread can be pretagged by passing the
  :c:macro:`K_FP_REGS` or :c:macro:`K_SSE_REGS` option to
  :c:func:`k_thread_create`.

* An already-created x86 thread can pretag itself once it has started
  by passing the :c:macro:`K_FP_REGS` or :c:macro:`K_SSE_REGS` option to
  :c:func:`k_float_enable`.

If an x86 thread uses the floating point registers infrequently it can call
:c:func:`k_float_disable` to remove its tagging as an FPU user or SSE user.
This eliminates the need for the kernel to take steps to preserve
the contents of the floating point registers during context switches
when there is no need to do so.
When the thread again needs to use the floating point registers it can re-tag
itself as an FPU user or SSE user by calling :c:func:`k_float_enable`.

Implementation
**************

Performing Floating Point Arithmetic
====================================

No special coding is required for a thread to use floating point arithmetic
if the kernel is properly configured.

The following code shows how a routine can use floating point arithmetic
to avoid overflow issues when computing the average of a series of integer
values.

.. code-block:: c

    int average(int *values, int num_values)
    {
        double sum = 0.0;
        int i;

        /* An empty series has no average; avoid dividing by zero. */
        if (num_values <= 0) {
            return 0;
        }

        /* Accumulate in a double so the running sum cannot overflow an int. */
        for (i = 0; i < num_values; i++) {
            sum += values[i];
        }

        /* Adding 0.5 rounds to nearest for non-negative averages. */
        return (int)((sum / num_values) + 0.5);
    }

Suggested Uses
**************

Use the kernel floating point services when an application needs to
perform floating point operations.
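
As a small illustration of such a use, the following host-runnable sketch
restates the averaging routine from Performing Floating Point Arithmetic and
feeds it values whose integer sum would not fit in an ``int``. The names
``average_of`` and ``average_of_two_max`` are hypothetical and are not kernel
APIs, and a 32-bit ``int`` is assumed.

.. code-block:: c

    #include <limits.h>

    /* Hypothetical restatement of the averaging routine; not a kernel API. */
    int average_of(const int *values, int num_values)
    {
        double sum = 0.0;

        if (num_values <= 0) {
            return 0;
        }

        for (int i = 0; i < num_values; i++) {
            sum += values[i];
        }

        /* Adding 0.5 rounds to nearest for non-negative averages. */
        return (int)((sum / num_values) + 0.5);
    }

    /* Two INT_MAX samples: an int accumulator would overflow, a double does not. */
    int average_of_two_max(void)
    {
        const int samples[2] = { INT_MAX, INT_MAX };

        return average_of(samples, 2);
    }

On a target, a thread running code like this is an FPU user and needs the
kernel configured for one of the FP register modes described above.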

Configuration Options
*********************

To configure unshared FP registers mode, enable the :kconfig:option:`CONFIG_FPU`
configuration option and leave the :kconfig:option:`CONFIG_FPU_SHARING` configuration
option disabled.

To configure shared FP registers mode, enable both the :kconfig:option:`CONFIG_FPU`
configuration option and the :kconfig:option:`CONFIG_FPU_SHARING` configuration option.
Also, ensure that any thread that uses the floating point registers has
sufficient added stack space for saving floating point register values
during context switches, as described above.

For x86, use the :kconfig:option:`CONFIG_X86_SSE` configuration option to enable
support for SSEx instructions.

API Reference
*************

.. doxygengroup:: float_apis
