# Arm(R) Ethos(TM)-U core driver

This repository contains a device driver for the Arm(R) Ethos(TM)-U NPU.

## Building

The source code comes with a CMake-based build system. The driver is expected to
be cross-compiled for any of the supported Arm Cortex(R)-M CPUs, which requires
the user to configure the build to match their system configuration.

One such requirement is to define the target CPU, normally by setting
`CMAKE_SYSTEM_PROCESSOR`. **Note** that when using the toolchain files provided
in [core_platform](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-platform.git),
the variable `TARGET_CPU` must be used instead of `CMAKE_SYSTEM_PROCESSOR`.

The target CPU is specified in the form `cortex-m<nr><features>`, for example
`cortex-m55+nodsp+nofp`.

Similarly, the target NPU configuration is controlled by setting
`ETHOSU_TARGET_NPU_CONFIG`, for example `ethos-u55-128`.

The build configuration can be defined either in the toolchain file or
by passing options on the command line.

```[bash]
$ cmake -B build \
    -DCMAKE_TOOLCHAIN_FILE=<toolchain> \
    -DCMAKE_SYSTEM_PROCESSOR=cortex-m<nr><features> \
    -DETHOSU_TARGET_NPU_CONFIG=ethos-u<nr>-<macs>
$ cmake --build build
```

or, when using toolchain files from [core_platform](https://git.mlplatform.org/ml/ethos-u/ethos-u-core-platform.git):

```[bash]
$ cmake -B build \
    -DCMAKE_TOOLCHAIN_FILE=<core_platform_toolchain> \
    -DTARGET_CPU=cortex-m<nr><features> \
    -DETHOSU_TARGET_NPU_CONFIG=ethos-u<nr>-<macs>
$ cmake --build build
```

## Driver APIs

The driver APIs are defined in `include/ethosu_driver.h` and the related types
in `include/ethosu_types.h`. Inferences can be invoked in two ways:
synchronously or asynchronously. The two types of invocation can be freely mixed
in a single application.
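
To make the idea of mixing the two styles concrete, here is a small
self-contained model. It does not call the driver: `mock_driver`, `mock_invoke`,
`mock_invoke_async`, `mock_wait` and `run_mixed_demo` are hypothetical stand-ins,
and their return conventions (0 for done, a positive value for "still running",
a negative value for an error) are assumptions made for this sketch only.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a reserved driver handle (not struct ethosu_driver). */
struct mock_driver {
    int polls_left; /* polls remaining until the pending job finishes */
    bool pending;   /* true while an asynchronous job is in flight */
    int completed;  /* number of finished jobs */
};

/* Synchronous style: returns 0 only after the job has finished. */
int mock_invoke(struct mock_driver *drv) {
    if (drv->pending) {
        return -1; /* this model allows one job at a time */
    }
    drv->completed++;
    return 0;
}

/* Asynchronous style: start the job and return immediately. */
int mock_invoke_async(struct mock_driver *drv) {
    if (drv->pending) {
        return -1;
    }
    drv->pending    = true;
    drv->polls_left = 3; /* arbitrary: the job "finishes" after three polls */
    return 0;
}

/* 0 = job done, > 0 = still running (non-blocking only), < 0 = nothing to wait for. */
int mock_wait(struct mock_driver *drv, bool block) {
    if (!drv->pending) {
        return -1;
    }
    if (!block && drv->polls_left > 0) {
        drv->polls_left--;
        return 1;
    }
    drv->pending = false;
    drv->completed++;
    return 0;
}

/* Interleave both styles on one handle; returns the number of completed jobs. */
int run_mixed_demo(void) {
    struct mock_driver drv = {0};
    int ret;

    if (mock_invoke(&drv) != 0) {       /* synchronous job */
        return -1;
    }
    if (mock_invoke_async(&drv) != 0) { /* asynchronous job */
        return -1;
    }
    do {
        ret = mock_wait(&drv, false);   /* poll without blocking */
    } while (ret > 0);
    if (ret < 0) {
        return -1;
    }
    if (mock_invoke(&drv) != 0) {       /* synchronous again */
        return -1;
    }
    return drv.completed;
}
```

In this model the only constraint is that a new job cannot start while an
asynchronous one is still pending. Whether the real driver imposes the same
constraint should be checked in `include/ethosu_driver.h`.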

### Synchronous invocation

A typical use of the driver looks like this:

```[C]
// reserve a driver to be used (this call could block until a driver is available)
struct ethosu_driver *drv = ethosu_reserve_driver();
...
// run one or more inferences
int result = ethosu_invoke(drv,
                           custom_data_ptr,
                           custom_data_size,
                           base_addr,
                           base_addr_size,
                           num_base_addr);
...
// release the driver for others to use
ethosu_release_driver(drv);
```

### Asynchronous invocation

A typical use of the driver looks like this:

```[C]
// reserve a driver to be used (this call could block until a driver is available)
struct ethosu_driver *drv = ethosu_reserve_driver();
...
// start an inference without waiting for it to complete
int result = ethosu_invoke_async(drv,
                                 custom_data_ptr,
                                 custom_data_size,
                                 base_addr,
                                 base_addr_size,
                                 num_base_addr,
                                 user_arg);
...
// do some other work
...
int ret;
do {
    // second argument: true = blocking, false = non-blocking
    // ret > 0 means inference not completed (only for non-blocking mode)
    ret = ethosu_wait(drv, false);
} while (ret > 0);
...
// release the driver for others to use
ethosu_release_driver(drv);
```

Note that if `ethosu_wait` is invoked from a different thread and concurrently
with `ethosu_invoke_async`, the user is responsible for guaranteeing that
`ethosu_wait` is called only after `ethosu_invoke_async` has completed
successfully. Otherwise `ethosu_wait` might fail and not actually wait for the
inference to complete.

### Driver initialization

Before a driver can be used, it must be initialized by calling the `init`
function, which also registers the handle in the list of available drivers.
A driver can be torn down by using the `deinit` function, which also removes the
driver from the list.

The correct mapping is one driver per NPU device.
Note that all NPUs must have the same configuration, because the driver
supports only one NPU configuration, which is defined at compile time.

## Implementation design

The driver is structured in two main parts: the driver part, which is
responsible for providing a unified API to the user; and the device part, which
deals with the details at the hardware level.

In order to do its task, the driver needs a device implementation. There can be
multiple device implementations for different hardware models and/or
configurations. Note that the driver can be compiled to target only one NPU
configuration, by specializing the device part at compile time.

## Data caching

When running the driver on Arm CPUs that are configured with a data cache,
care must be taken to ensure cache coherency. The driver expects that cache
clean/flush has been done by the user application before it is invoked. The
driver does provide a deprecated, weakly linked function `ethosu_flush_dcache`
that can be overridden, causing the driver to cache flush/clean the base
pointers marked in the flush mask before each inference. By default the flush
mask is set to only clean the scratch base pointer containing RW data (IFM in
particular). It is recommended not to implement this function, and instead to
have the user application make sure that IFM data has been written to memory
before invoking an inference on the NPU.

The driver also exposes a weakly linked symbol for cache invalidation called
`ethosu_invalidate_dcache`, which must be overridden when the data cache is used.
After starting an inference on the NPU, the driver will call this function to
invalidate the base pointers marked in the invalidation mask. By default it
invalidates the scratch base pointer holding RW data, to ensure cache coherency
after the inference is done.
The invalidation call is made before waiting for the NPU to finish the
inference, so that, depending on the network, the cycles spent invalidating the
cache may be completely hidden (the CPU performs cache invalidation before
yielding while waiting for the NPU to finish).

Make sure that any base pointers marked for flush/invalidation are aligned to
the cache line size of your CPU, typically 32 bytes. Although it is not
implemented, an advanced user aiming for maximum performance could in theory
tell the network compiler to align the IFM/OFM to the cache line size, and
modify the driver so that only OFM data is invalidated (and, if left to the
driver, only IFM data is cache cleaned/flushed). Due to the uncertainty of
tensor alignment, the driver only flushes/invalidates at base pointer level.

By default the cache flush and invalidation masks are set to only mark the
default scratch base pointer (base pointer 1). For maximum flexibility, the
driver provides a function to modify the cache flush/invalidate masks called
`ethosu_set_basep_cache_mask`. This function sets the two 8-bit masks, one for
flush and one for invalidate, where bit 0 corresponds to base pointer 0, bit 1
corresponds to base pointer 1, and so on. See `include/ethosu_driver.h` for more
information.

An example implementation of the weak functions, using CMSIS primitives, could
look like this:

```[C++]
#include <cstddef>
#include <cstdint>
// Also include your CMSIS device header, which provides the SCB_* cache functions.

extern "C" {
// Deprecated - recommended to flush/clean in application code
// p must be 32 byte aligned
void ethosu_flush_dcache(uint32_t *p, size_t bytes) {
    SCB_CleanDCache_by_Addr(p, bytes);
}

// p must be 32 byte aligned
void ethosu_invalidate_dcache(uint32_t *p, size_t bytes) {
    SCB_InvalidateDCache_by_Addr(p, bytes);
}
}
```

The NPU contains memory attributes that should be set to match the settings used
in the MPU configuration for the memories used.
See `NPU_MEM_ATTR_[0-3]` for Ethos-U85 and `AXI_LIMIT[0-3]_MEM_TYPE` for
Ethos-U55/Ethos-U65 in the corresponding `src/ethosu_config_uX5.h` files.

## Mutex and semaphores

To ensure correct operation, the driver uses mutexes and semaphores internally.
The default implementations of mutexes and semaphores are designed for a
single-threaded baremetal environment. Hence, for integration in environments
where multi-threading is possible, e.g. an RTOS, the user is responsible for
providing implementations of the mutexes and semaphores to be used by the
driver.

The mutexes and semaphores are used as synchronisation mechanisms and, unless
otherwise specified, the timeout is required to be 'forever'.

The driver allows an RTOS to set a timeout for the NPU interrupt semaphore.
The timeout can be set with the CMake variable `ETHOSU_INFERENCE_TIMEOUT`, which
is then used as the `timeout` argument for the interrupt semaphore take call.
Note that the unit is implementation-defined: the value is passed as is to the
`ethosu_semaphore_take()` function, and an override implementation should cast
it to the appropriate type and/or convert it to the desired unit.

A macro `ETHOSU_SEMAPHORE_WAIT_FOREVER` is defined in the driver header file
and must map to the RTOS equivalent of 'no timeout/wait forever'. The inference
timeout defaults to this value if left unset. The macro is also used internally
in the driver when waiting for an available NPU, so the driver does NOT support
setting a timeout other than forever when waiting for an NPU to become
available (global `ethosu_semaphore`).

The mutex and semaphore APIs are defined as weakly linked functions that can be
overridden by the user.
The APIs are the usual ones and are described below:

```[C]
// create a mutex and return a handle to it
void *ethosu_mutex_create(void);
// lock the given mutex
int ethosu_mutex_lock(void *mutex);
// unlock the given mutex
int ethosu_mutex_unlock(void *mutex);

// create a (binary) semaphore and return a handle to it
void *ethosu_semaphore_create(void);
// take the given semaphore, accepting a timeout (unit impl. defined)
int ethosu_semaphore_take(void *sem, uint64_t timeout);
// give the given semaphore
int ethosu_semaphore_give(void *sem);
```

## Begin/End inference callbacks

The driver provides weakly linked functions as hooks to receive callbacks
whenever an inference begins and ends. The user can override these functions
when needed. To avoid memory leaks, any allocation done in
`ethosu_inference_begin()` must be balanced by a corresponding free in the
`ethosu_inference_end()` callback.

The end callback will always be called if the begin callback has been called,
including in the event of an interrupt semaphore take timeout.

```[C]
void ethosu_inference_begin(struct ethosu_driver *drv, void *user_arg);
void ethosu_inference_end(struct ethosu_driver *drv, void *user_arg);
```

Note that the `void *user_arg` pointer passed to the invoke function is the same
pointer that is passed to the begin and end callbacks. For example:

```[C]
void my_function() {
    ...
    struct my_data data = {...};
    int result = ethosu_invoke_v3(drv,
                                  custom_data_ptr,
                                  custom_data_size,
                                  base_addr,
                                  base_addr_size,
                                  num_base_addr,
                                  (void *)&data);
    ....
}

void ethosu_inference_begin(struct ethosu_driver *drv, void *user_arg) {
    struct my_data *data = (struct my_data *)user_arg;
    // use drv and data here
}

void ethosu_inference_end(struct ethosu_driver *drv, void *user_arg) {
    struct my_data *data = (struct my_data *)user_arg;
    // use drv and data here
}
```

## License

The Arm Ethos-U core driver is provided under an Apache-2.0 license. Please see
[LICENSE.txt](LICENSE.txt) for more information.

## Contributions

The Arm Ethos-U project welcomes contributions under the Apache-2.0 license.

Before we can accept your contribution, you need to certify its origin and give
us your permission. For this process we use the Developer Certificate of Origin
(DCO) V1.1 (https://developercertificate.org).

To indicate that you agree to the terms of the DCO, you "sign off" your
contribution by adding a line with your name and e-mail address to every git
commit message. You must use your real name; no pseudonyms or anonymous
contributions are accepted. If there is more than one contributor, everyone
adds their name and e-mail to the commit message.

```[]
Author: John Doe <john.doe@example.org>
Date:   Mon Feb 29 12:12:12 2016 +0000

Title of the commit

Short description of the change.

Signed-off-by: John Doe <john.doe@example.org>
Signed-off-by: Foo Bar <foo.bar@example.org>
```

Contributions will be code reviewed by Arm before they can be accepted into
the repository.

To submit a contribution, push your patch to
`ssh://<GITHUB_USER_ID>@review.mlplatform.org:29418/ml/ethos-u/ethos-u-core-driver`.
To do this you will need to sign in to
[review.mlplatform.org](https://review.mlplatform.org) using a GitHub account
and add your SSH key under your settings.
If there is a problem adding the SSH
key make sure there is a valid email address in the Email Addresses field.

## Security

Please see [Security](SECURITY.md).

## Trademark notice

Arm, Cortex and Ethos are registered trademarks of Arm Limited (or its
subsidiaries) in the US and/or elsewhere.