######################################################
Code sharing between independently linked XIP binaries
######################################################

:Author: Tamas Ban
:Organization: Arm Limited
:Contact: tamas.ban@arm.com

**********
Motivation
**********
Cortex-M devices are usually constrained in terms of flash and RAM. Therefore,
it is often challenging to fit bigger projects in the available memory. The PSA
specifications require a device to both have a secure boot process in place at
device boot-up time, and to have a partition in the SPE which provides
cryptographic services at runtime. These two entities have some overlapping
functionality. Some cryptographic primitives (e.g. hash calculation and digital
signature verification) are required both in the bootloader and the runtime
environment. In the current TF-M code base, both firmware components use the
mbed-crypto library to implement these requirements. During the build process,
the mbed-crypto library is built twice, with different configurations (the
bootloader requires less functionality) and then linked to the corresponding
firmware component. As a result of this workflow, the same code is placed in the
flash twice. For example, the code for the SHA-256 algorithm is included in
MCUboot, but the exact same code is duplicated in the SPE cryptography
partition. In most cases, there is no memory isolation between the bootloader
and the SPE, because both are part of the PRoT code and run in the secure
domain. So, in theory, the code of the common cryptographic algorithms could be
reused among these firmware components. This could significantly reduce the
code footprint, because the cryptographic algorithms are usually flash hungry.
Code size reduction can be a good opportunity for very constrained devices,
which might need to use TF-M Profile Small anyway.

*******************
Technical challenge
*******************
Code sharing in a regular OS environment is easily achievable with dynamically
linked libraries. However, this is not the case in Cortex-M systems where
applications might run bare-metal, or on top of an RTOS, which usually lacks
dynamic loading functionality. One major challenge to be solved in the Cortex-M
space is how to share code between independently linked XIP applications that
are tied to a certain memory address range to be executable and have absolute
function and global data memory addresses. In this case, the code is not
relocatable, and in most cases, there is no loader functionality in the system
that can perform code relocation. Also, the lack of an MMU makes the address
space flat, constant and not reconfigurable at runtime by privileged code.

One other difficulty is that the bootloader and the runtime use the same RAM
area during execution. The runtime firmware is executed strictly after the
bootloader, so normally, it can reuse the whole secure RAM area, as it would be
the exclusive user. No attention needs to be paid as to where global data is
placed by the linker. The bootloader does not need to retain its state. The low
level startup of the runtime firmware can freely overwrite the RAM with its data
without corrupting bootloader functionality. However, with code sharing between
bootloader and runtime firmware, these statements are no longer true. Global
variables used by the shared code must either retain their value or must be
reinitialised during low level startup of the runtime firmware. The startup code
is not allowed to overwrite the shared global variables with arbitrary data. The
following design proposal provides a solution to these challenges.
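
To make the last constraint concrete, the sketch below shows a startup-style RAM
initialisation that leaves a retained region untouched. It is only an
illustration and is not taken from TF-M: the function name
`ram_init_keep_shared()` and its parameters are hypothetical, and it assumes
that the retained region occupies the first `shared_size` bytes of the RAM area.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /*
     * Hypothetical helper: zero-fill the RAM area [ram, ram + ram_size) but
     * skip the first shared_size bytes, which are assumed to hold the global
     * variables of the shared code and therefore must keep their values.
     */
    void ram_init_keep_shared(uint8_t *ram, size_t ram_size, size_t shared_size)
    {
        if (shared_size >= ram_size) {
            /* The retained region covers the whole area: nothing to clear. */
            return;
        }

        /* Clear only the part that follows the retained region. */
        memset(ram + shared_size, 0, ram_size - shared_size);
    }

In a real system the boundaries would come from linker-defined symbols rather
than from function arguments.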

**************
Design concept
**************
The bootloader is sometimes implemented as ROM code (BL1) or stored in a region
of the flash which is lockable, to prevent tampering. In a secure system, the
bootloader is immutable code and thus implements a part of the Root of Trust
anchor in the device, which is trusted implicitly. The shared code is primarily
part of the bootloader, and is reused by the runtime SPE firmware at a later
stage. Not all of the bootloader code is reused by the runtime SPE, only some
cryptographic functions.

Simplified steps of building with code sharing enabled:

 - Complete the bootloader build process to have a final image that contains
   the absolute addresses of the shared functions, and the global variables
   used by these functions.
 - Extract the addresses of the functions and related global variables that are
   intended to be shared from the bootloader executable.
 - When building runtime firmware, provide the absolute addresses of the shared
   symbols to the linker, so that it can pick them up, instead of instantiating
   them again.

The execution flow looks like this:

.. code-block:: bash

    SPE      MCUboot func1()    MCUboot func2()      MCUboot func3()
     |
     | Hash()
     |------------->|
                    |----------------->|
                                       |
                            Return     |
         Return     |<-----------------|
     |<-------------|
     |
     |
     |----------------------------------------------------->|
                                                            |
         Function pointer in shared global data()           |
     |<-----------------------------------------------------|
     |
     | Return
     |----------------------------------------------------->|
                                                            |
         Return                                             |
     |<-----------------------------------------------------|
     |
     |

The execution flow usually returns from a shared function back to the SPE with
an ordinary function return. So usually, once a shared function is called in the
call path, all further functions in the call chain will be shared as well.
However, this is not always the case, as it is possible for a shared function to
call a non-shared function in SPE code through a global function pointer.

For shared global variables, a dedicated data section must be allocated in the
linker configuration file. This area must have the same memory address in both
MCUboot's and the SPE's linker files, to ensure the integrity of the variables.
For simplicity's sake, this section is placed at the very beginning of the RAM
area. Also, the RAM wiping functionality at the end of the secure boot flow
(that is intended to remove any possible secrets from the RAM) must not clear
this area. Furthermore, it must be ensured that the linker places shared globals
into this data section. There are two ways to achieve this:

 - Put a filter pattern in the section body that matches the shared global
   variables.
 - Mark the global variables in the source code with a special attribute:
   `__attribute__((section(<NAME_OF_SHARED_SYMBOL_SECTION>)))`

RAM memory layout in MCUboot with code sharing enabled:

.. code-block:: bash

    +------------------+
    | Shared symbols   |
    +------------------+
    | Shared boot data |
    +------------------+
    | Data             |
    +------------------+
    | Stack (MSP)      |
    +------------------+
    | Heap             |
    +------------------+

RAM memory layout in SPE with code sharing enabled:

.. code-block:: bash

    +-------------------+
    | Shared symbols    |
    +-------------------+
    | Shared boot data  |
    +-------------------+
    | Stack (MSP)       |
    +-------------------+
    | Stack (PSP)       |
    +-------------------+
    | Partition X Data  |
    +-------------------+
    | Partition X Stack |
    +-------------------+
              .
              .
              .
    +-------------------+
    | Partition Z Data  |
    +-------------------+
    | Partition Z Stack |
    +-------------------+
    | PRoT Data         |
    +-------------------+
    | Heap              |
    +-------------------+

Patching Mbed TLS
=================
In order to share some global function pointers from mbed-crypto that are
related to dynamic memory allocation, their scope must be extended from private
to global. This is needed because some compiler toolchains only extract the
addresses of public functions and global variables, and extraction of addresses
is a requirement to share them among binaries. Therefore, a short patch was
created for the mbed-crypto library, which "globalises" these function pointers:

`lib/ext/mbedcrypto/0002-Enable-crypto-code-sharing-between-independent-binar.patch`

The patch needs to be manually applied in the Mbed TLS repo, if code sharing is
enabled. The patch has no effect on the functional behaviour of the
cryptographic library; it only extends the scope of some variables.

*************
Tools support
*************
All the currently supported compilers provide a way to achieve the above
objectives. However, there is no standard way, which means that the code sharing
functionality must be implemented on a per compiler basis. The following steps
are needed:

 - Extraction of the addresses of all global symbols.
 - The filtering out of the addresses of symbols that aren't shared. The goal is
   to not need to list all the shared symbols by name. Only a simple pattern
   has to be provided, which matches the beginning of the symbol's name.
   Matching symbols will be shared. Examples are in:
   `bl2/shared_symbol_template.txt`
 - Provision of the addresses of shared symbols to the linker during the SPE
   build process.
 - The resolution of symbol collisions during SPE linking.
   Because mbed-crypto is linked to both firmware components as a static
   library, the external shared symbols will conflict with the same symbols
   found within it. In order to prioritize the external symbol, the symbol with
   the same name in mbed-crypto must be marked as weak in the symbol table.

The above functionalities are implemented in the toolchain specific CMake files:

 - `toolchain_ARMCLANG.cmake`
 - `toolchain_GNUARM.cmake`

They are provided by the following two functions:

 - `target_share_symbols()`: Extract and filter shared symbol addresses
   from MCUboot.
 - `target_link_shared_code()`: Link shared code to the SPE and resolve symbol
   conflict issues.

ARMCLANG
========
The toolchain specific steps are:

 - Extract all symbols from MCUboot: add `-symdefs` to the compiler command line
 - Filter shared symbols: call CMake script `FilterSharedSymbols.cmake`
 - Weaken duplicated (shared) symbols in the mbed-crypto static library which
   is linked to the SPE: `arm-none-eabi-objcopy`
 - Link shared code to SPE: Add the filtered output of `-symdefs` to the SPE
   source file list.

GNUARM
======
The toolchain specific steps are:

 - Extract all symbols from MCUboot: `arm-none-eabi-nm`
 - Filter shared symbols: call CMake script `FilterSharedSymbols.cmake`
 - Strip unshared code from MCUboot: `arm-none-eabi-strip`
 - Weaken duplicated (shared) symbols in the mbed-crypto static library which
   is linked to the SPE: `arm-none-eabi-objcopy`
 - Link shared code to SPE: Add `-Wl -R <SHARED_STRIPPED_CODE.axf>` to the
   compiler command line

IAR
===
This functionality is currently not implemented, but the toolchain is capable
of supporting it.

**************************
Memory footprint reduction
**************************
The measurements below were taken with the following setup:

 - Build type: MinSizeRel
 - Platform: mps2/an521
 - Version: TF-M v1.2.0 + code sharing patches
 - MCUboot image encryption support is disabled.

+------------------+-------------------+-------------------+-------------------+
|                  |   ConfigDefault   |  ConfigProfile-M  |  ConfigProfile-S  |
+------------------+----------+--------+----------+--------+----------+--------+
|                  | ARMCLANG | GNUARM | ARMCLANG | GNUARM | ARMCLANG | GNUARM |
+------------------+----------+--------+----------+--------+----------+--------+
| CODE_SHARING=OFF |   122268 | 124572 |    75936 |  75996 |    50336 |  50224 |
+------------------+----------+--------+----------+--------+----------+--------+
| CODE_SHARING=ON  |   113264 | 115500 |    70400 |  70336 |    48840 |  48988 |
+------------------+----------+--------+----------+--------+----------+--------+
| Difference       |     9004 |   9072 |     5536 |   5660 |     1496 |   1236 |
+------------------+----------+--------+----------+--------+----------+--------+

If MCUboot image encryption support is enabled, the saving can reach
approximately 13-15 KB.

.. Note::

   Code sharing on Musca-B1 was tested only with software-only crypto, so crypto
   hardware acceleration must be turned off: `-DCRYPTO_HW_ACCELERATOR=OFF`


************************
Usability considerations
************************
Functions that only use local variables can be shared easily. Functions that
rely on global variables need more care. They can still be shared, but all
global variables they use must be placed in the shared symbol section, to
prevent overwriting and to enable the retention of their values.

Some global variables might need to be reinitialised to their original values by
runtime firmware, if they have been used by the bootloader, but need to have
their original value when runtime firmware starts to use them. If so, the
reinitialising functionality must be implemented explicitly, because the low
level startup code in the SPE does not initialise the shared variables, which
means they retain their value after MCUboot stops running.
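
The explicit reinitialisation described above could be sketched as follows. This
is only an illustration under stated assumptions: the section name
`.shared_symbols_demo`, the variables `shared_calloc_hook` and
`shared_call_count`, and the function `runtime_reinit_shared_state()` are
hypothetical and are not taken from TF-M or Mbed TLS.

.. code-block:: c

    #include <stddef.h>
    #include <stdlib.h>

    /*
     * Hypothetical section name. In a real build it has to match the
     * dedicated section defined in both linker configuration files.
     */
    #if defined(__GNUC__) && defined(__ELF__)
    #define SHARED_SYMBOL __attribute__((section(".shared_symbols_demo")))
    #else
    #define SHARED_SYMBOL
    #endif

    typedef void *(*alloc_fn_t)(size_t, size_t);

    /*
     * Shared globals (hypothetical). The SPE startup code does not touch
     * them, so they still hold whatever the bootloader left behind.
     */
    SHARED_SYMBOL alloc_fn_t shared_calloc_hook = NULL;
    SHARED_SYMBOL unsigned int shared_call_count = 0;

    /* Allocator that the runtime wants the shared code to use. */
    static void *runtime_calloc(size_t n, size_t size)
    {
        return calloc(n, size);
    }

    /* Explicitly bring the shared globals back to a known state. */
    void runtime_reinit_shared_state(void)
    {
        shared_calloc_hook = runtime_calloc;
        shared_call_count = 0;
    }

In such a scheme, the runtime firmware would call
`runtime_reinit_shared_state()` once, before its first use of the shared code.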

If a bug is discovered in the shared code, it cannot be fixed with a firmware
upgrade, if the bootloader code is immutable. If this is the case, disabling
code sharing might be a solution, as the new runtime firmware could contain the
fixed code instead of relying on the unfixed shared code. However, this would
increase code footprint.

API backward compatibility can also be an issue. If the API has changed in a
newer version of the shared code, then new code cannot rely on the shared
version. The changed code and all the other shared code where it is referenced
from must be ignored and the updated version of the functions must be compiled
in the SPE binary. The Mbed TLS library is API compatible with its current
version (``v2.24.0``) since the ``mbedtls-2.7.0`` release (2018-02-03).

To minimise the risk of incompatibility, use the same compiler flags to build
both firmware components.

The artifacts of the shared code extraction steps must be preserved so as to
remain available if new SPE firmware (that relies on shared code) is built and
released. Those files are necessary to know the address of shared symbols when
linking the SPE.

************************
How to use code sharing?
************************
Considering the above, code sharing is an optional feature, which is disabled
by default. It can be enabled from the command line with a compile time switch:

 - `TFM_CODE_SHARING`: Set to `ON` to enable code sharing.

With the default settings, only the common part of the mbed-crypto library is
shared between MCUboot and the SPE. However, there might be other device
specific code (e.g. device drivers) that could be shared. The shared
cryptography code consists mainly of the SHA-256 algorithm, the `bignum` library
and some RSA related functions. If image encryption support is enabled in
MCUboot, then AES algorithms can be shared as well.

Sharing code between the SPE and an external project is possible, even if
MCUboot isn't used as the bootloader. For example, a custom bootloader can also
be built in such a way as to create the necessary artifacts to share some of its
code with the SPE. The same artifacts must be created as in the case of MCUboot:

 - `shared_symbols_name.txt`: Contains the names of the shared symbols. Used by
   the script that prevents symbol collision.
 - `shared_symbols_address.txt`: Contains the type, address and name of shared
   symbols. Used by the linker when linking runtime SPE.
 - `shared_code.axf`: GNUARM specific. The stripped version of the firmware
   component, which only contains the shared code. It is used by the linker
   when linking the SPE.

.. Note::

   The artifacts of the shared code extraction steps must be preserved to be
   able to link them to any future SPE version.

When an external project is sharing code with the SPE, the `SHARED_CODE_PATH`
compile time switch must be set to the path of the artifacts mentioned above.

********************
Further improvements
********************
This design focuses only on sharing the cryptography code. However, other code
could be shared as well. Some possibilities:

- Flash driver
- Serial driver
- Image metadata parsing code
- etc.

--------------

*Copyright (c) 2020-2024, Arm Limited. All rights reserved.*