# CMSIS-DSP

![GitHub release (latest by date including pre-releases)](https://img.shields.io/github/v/release/ARM-software/CMSIS-DSP?include_prereleases) ![GitHub](https://img.shields.io/github/license/ARM-software/CMSIS-DSP)

## About

CMSIS-DSP is an optimized compute library for embedded systems (DSP is in the name for legacy reasons).

It provides optimized compute kernels for Cortex-M and Cortex-A.

Different variants are available depending on the core, and most functions use a vectorized implementation when the Helium or Neon extension is available.

This repository contains the CMSIS-DSP library and several other projects:

* Test framework for bare metal Cortex-M or Cortex-A
* Examples for bare metal Cortex-M
* ComputeGraph
* PythonWrapper

You don't need any of the other projects to build and use the CMSIS-DSP library. Building the other projects may require other libraries (CMSIS), other tools (Arm Virtual Hardware) or the CMSIS build tools.

### License Terms

CMSIS-DSP is licensed under [Apache License 2.0](LICENSE).

### CMSIS-DSP Kernels

Kernels provided by CMSIS-DSP (list not exhaustive):

* Basic mathematics (real, complex, quaternion, linear algebra, fast math functions)
* DSP (filtering)
* Transforms (FFT, MFCC, DCT)
* Statistics
* Classical ML (Support Vector Machine, distance functions for clustering ...)

Kernels are provided for several datatypes: f64, f32, f16, q31, q15, q7.

### Python wrapper

A [PythonWrapper](https://pypi.org/project/cmsisdsp/) is also available and can be installed with:

`pip install cmsisdsp`

With this wrapper you can design your algorithm in Python using an API as close as possible to the C API. The wrapper is compatible with NumPy, supports fixed point arithmetic and works in Google Colab.

The goal is to make it easier to move from a design to a final implementation in C.

### Compute Graph

CMSIS-DSP also provides an experimental [static scheduler for compute graph](ComputeGraph/README.md) to describe streaming solutions:

* You define your compute graph in Python
* A static and deterministic schedule (computed by the Python script) is generated
* The static schedule can be run on the device with low overhead

The Python scripts for the static scheduler generator are part of the CMSIS-DSP Python wrapper.

The header files are part of the CMSIS-DSP pack (version 1.10.2 and above).

The Compute Graph makes it easier to implement a streaming solution: connecting different compute kernels, each consuming and producing a different amount of data.

## Support / Contact

For any questions or to reach the CMSIS-DSP team, please create a new issue at https://github.com/ARM-software/CMSIS-DSP/issues

## Table of contents

* [Building for speed](#building-for-speed)
  * [Options to use](#options-to-use)
  * [Options to avoid](#options-to-avoid)
* [Half float support](#half-float-support)
* [How to build](#how-to-build)
  * [How to build with MDK or Open CMSIS-Pack](#how-to-build-with-mdk-or-open-cmsis-pack)
  * [How to build with Make](#how-to-build-with-make)
  * [How to build with cmake](#how-to-build-with-cmake)
  * [How to build with any other build system](#how-to-build-with-any-other-build-system)
  * [How to build for aarch64](#how-to-build-for-aarch64)
* [Code size](#code-size)
* [Folders and files](#folders-and-files)
  * [Folders](#folders)
  * [Files](#files)

## Building for speed

CMSIS-DSP is used when you need performance. As a consequence, CMSIS-DSP should be compiled with the options giving the best performance:

### Options to use

* `-Ofast` must be used for best performance.
* When using Helium it is strongly advised to use `-Ofast`
* `GCC` currently does not give good performance when targeting Helium.
  You should use the Arm compiler.

When floats are used, the FPU should be selected to ensure that the compiler does not use software float emulation.

When building with Helium support, it will be automatically detected by CMSIS-DSP. For Neon, this is not the case and you must enable the option `-DARM_MATH_NEON` for the C compilation. With `cmake` this option is controlled with `-DNEON=ON`.

* `-DLOOPUNROLL=ON` can also be used when compiling with cmake
* It corresponds to the C option `-DARM_MATH_LOOPUNROLL`

Compilers already perform loop unrolling, so this option may not be needed, but this is highly dependent on the compiler. With some compilers, this option is needed to get better performance.

Memory speed is important. If you can map the data and the constant tables used by CMSIS-DSP in `DTCM` memory, performance will be better. If you have a cache, enable it.

### Options to avoid

* `-fno-builtin`
* `-ffreestanding` because it enables the previous option

The library does some type [punning](https://en.wikipedia.org/wiki/Type_punning) to process a 32-bit word from memory as a pair of `q15` or a quadruple of `q7`. Those type manipulations are done through `memcpy` calls. Most compilers should be able to optimize out those function calls when the length to copy is small (4 bytes).

This optimization will **not** occur when `-fno-builtin` is used and it will have a **very bad** impact on performance.

Some compilers may also require the option `-munaligned-access` to specify that unaligned accesses are used.

## Half float support

The `f16` data type (half float) has been added to the library. It is useful only if your Cortex core has some half float hardware acceleration (for instance with the Helium extension). If you don't need `f16`, you should disable it since it may cause compilation problems. Just define `-DDISABLEFLOAT16` when building.
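
Coming back to the type punning described under "Options to avoid": the sketch below illustrates the general `memcpy` technique for reading and writing two adjacent `q15` samples as one 32-bit word. It is only an illustration and is not copied from the library. The helper names `load_q15_pair`, `store_q15_pair` and `copy_q15_by_pairs` are hypothetical, and `q15_t` is redefined locally so that the snippet is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local definition so the sketch builds without the CMSIS-DSP headers. */
typedef int16_t q15_t;

/* Hypothetical helper: read two adjacent q15 samples as one 32-bit word.
   With builtins enabled, a 4-byte memcpy is normally compiled to a single
   load. With -fno-builtin it stays a real function call. */
static inline uint32_t load_q15_pair(const q15_t *src)
{
    uint32_t word;
    memcpy(&word, src, sizeof word);
    return word;
}

/* Hypothetical helper: write one 32-bit word back as two q15 samples. */
static inline void store_q15_pair(q15_t *dst, uint32_t word)
{
    memcpy(dst, &word, sizeof word);
}

/* Copy a q15 buffer two samples at a time. */
void copy_q15_by_pairs(const q15_t *src, q15_t *dst, size_t n_pairs)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n_pairs; i++) {
        store_q15_pair(dst + 2 * i, load_q15_pair(src + 2 * i));
    }
}
```

Going through `memcpy` avoids the undefined behavior of casting a `q15_t *` to a `uint32_t *`, and it does not assume that the buffer is 4-byte aligned. This is also why the compiler must be allowed to treat `memcpy` as a builtin: otherwise each pair costs a function call.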

## How to build

You can build CMSIS-DSP with the Open CMSIS-Pack, cmake or a Makefile, and it is also easy to build with any other build tool.

### How to build with MDK or Open CMSIS-Pack

The standard way to build is by using the CMSIS pack technology. CMSIS-DSP is available as a pack.

This pack technology is supported by some IDEs like [Keil MDK](https://www.keil.com/download/product/) or [Keil Studio](https://www.keil.arm.com/).

You can also use those packs with the [Open CMSIS-Pack](https://www.open-cmsis-pack.org/) technology, from the command line, on any platform.

You should first install the tools from https://github.com/Open-CMSIS-Pack/devtools/tree/main/tools

You can get the CMSIS-Toolbox, which contains the package installer, cmsis build and cmsis project manager. Here is some documentation:

* Documentation about [CMSIS Build](https://open-cmsis-pack.github.io/devtools/buildmgr/latest/index.html)
* Documentation about [CMSIS Pack](https://open-cmsis-pack.github.io/Open-CMSIS-Pack-Spec/main/html/index.html)
* Documentation about [CMSIS Project manager](https://github.com/Open-CMSIS-Pack/devtools/blob/main/tools/projmgr/docs/Manual/Overview.md)

Once you have installed the tools, you'll need to download the pack index using the `cpackget` tool.

Then, you'll need to convert a solution file into `.cprj`. For instance, for the CMSIS-DSP Examples, you can go to:

`Examples/cmsis_build`

and then type

`csolution convert -s examples.csolution_ac6.yml`

This command processes the `examples.csolution_ac6.yml` file, which describes how to build the examples for several platforms. It generates many `.cprj` files that can be built with `cbuild`.

If you want to build the `FFT` example for the `Corstone-300` virtual hardware platform, you could just do:

`cbuild "fftbin.Release+VHT-Corstone-300.cprj"`

### How to build with Make

There is an example `Makefile` in `Source`.

In each source folder (like `BasicMathFunctions`), you'll see files with no `_datatype` suffix (like `BasicMathFunctions.c` and `BasicMathFunctionsF16.c`).

Those files are all you need in your makefile. They include all the other C files from the source folders.

Then, for the includes you'll need to add the paths: `Include`, `PrivateInclude` and, since there is a dependency on CMSIS Core, `Core/Include` from `CMSIS_5/CMSIS`.

If you are building for `Cortex-A` and want to use Neon, you'll also need to include `ComputeLibrary/Include` and the source file in `ComputeLibrary/Source`.

### How to build with cmake

Create a `CMakeLists.txt` and add a project inside it.

Add CMSIS-DSP as a subdirectory. In the example below, the variable `CMSISDSP` is the path to the CMSIS-DSP repository.

```cmake
cmake_minimum_required (VERSION 3.14)

# Define the project
project (testcmsisdsp VERSION 0.1)

add_subdirectory(${CMSISDSP}/Source bin_dsp)
```

CMSIS-DSP depends on the CMSIS Core includes. So, you should define `CMSISCORE` on the cmake command line. The path used by CMSIS-DSP will be `${CMSISCORE}/Include`.

You should also set the compilation options used to build the library.

If you build for Helium, you should use any of the options `MVEF`, `MVEI` or `HELIUM`.

If you build for Neon, use `NEON` and/or `NEONEXPERIMENTAL`.

#### Launching the build

Once cmake has generated the makefiles, you can use GNU Make to build.

`make VERBOSE=1`

### How to build with any other build system

You need the following folders:

* Source
* Include
* PrivateInclude
* ComputeLibrary (only if you target Neon)

In the `Source` subfolders, you may either build all of the source files with a datatype suffix (like `_f32.c`), or just compile the files without a datatype suffix. For instance, for `BasicMathFunctions`, you can build all the C files except `BasicMathFunctions.c` and `BasicMathFunctionsF16.c`, or you can just build those two files (they include all of the other C files of the folder).

`f16` files are not mandatory. You can build with `-DDISABLEFLOAT16`.

### How to build for aarch64

The intrinsics defined in `Core_A/Include` are not available on recent Cortex-A processors.

But you can still build for those Cortex-A cores and benefit from the Neon intrinsics.

You need to build with `-D__GNUC_PYTHON__` on the compiler command line. This flag was introduced for building the Python wrapper and disables the use of the CMSIS Core includes.

When this flag is enabled, CMSIS-DSP defines a few macros used in the library for compiler portability:

```C
#define __ALIGNED(x) __attribute__((aligned(x)))
#define __STATIC_FORCEINLINE static inline __attribute__((always_inline))
#define __STATIC_INLINE static inline
```

If the compiler you are using requires different definitions, you can add them to `arm_math_types.h` in the `Include` folder of the library. MSVC and XCode are already supported and in those cases, you don't need to define `-D__GNUC_PYTHON__`.

Then, you need to define `-DARM_MATH_NEON`.

For cmake the equivalent options are:

* `-DHOST=ON`
* `-DNEON=ON`

cmake automatically includes the `ComputeLibrary` folder. If you are using a different build system, you need to include this folder too to build with Neon support.
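
As a rough illustration of what the portability macros listed in this section do, here is a minimal sketch that assumes a GCC or Clang compatible compiler. The macros are redefined locally (guarded by `#ifndef`) so that the snippet builds without the CMSIS-DSP headers. The names `demo_buffer`, `demo_scale` and `demo_fill` are hypothetical and are not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same definitions as quoted above, only applied if nothing else
   (for instance a CMSIS Core header) has already defined them. */
#ifndef __ALIGNED
#define __ALIGNED(x) __attribute__((aligned(x)))
#endif
#ifndef __STATIC_FORCEINLINE
#define __STATIC_FORCEINLINE static inline __attribute__((always_inline))
#endif

/* Hypothetical buffer aligned on 16 bytes, a typical requirement
   for vector loads and stores. */
static float demo_buffer[8] __ALIGNED(16);

/* Hypothetical helper that the compiler is asked to always inline. */
__STATIC_FORCEINLINE float demo_scale(float x, float gain)
{
    return x * gain;
}

/* Fill the buffer with i * gain for i = 0..7. */
void demo_fill(float gain)
{
    /* The alignment attribute guarantees this address property. */
    assert(((uintptr_t)demo_buffer % 16u) == 0u);
    for (size_t i = 0; i < 8; i++) {
        demo_buffer[i] = demo_scale((float)i, gain);
    }
}
```

A compiler with a different attribute syntax would need its own definitions of these macros, which is the case covered by editing `arm_math_types.h` as described above.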

## Code size

Previous versions of the library used compilation directives to control the code size. This was too complex and not available when CMSIS-DSP is only delivered as a static library.

Now, the library relies again on the linker to do the code size optimization. But this implies some constraints on the code you write, and new functions had to be introduced.

If you know the size of your FFT in advance, use initialization functions like `arm_cfft_init_64_f32` instead of the generic initialization function `arm_cfft_init_f32`. Using the generic function prevents the linker from deducing which functions and tables must be kept for the FFT, so everything will be included.

There are similar functions for RFFT, MFCC ...

If the flag `ARM_DSP_CONFIG_TABLES` is still set, you'll now get a compilation error to remind you that this flag no longer has any effect on code size and that you may have to rework the initializations.

## Folders and files

The only folders required to build and use the CMSIS-DSP library are:

* Source
* Include
* PrivateInclude
* ComputeLibrary (only when using Neon)

Other folders are part of different projects, tests or examples.

### Folders

* cmsisdsp:
  * Required to build the CMSIS-DSP PythonWrapper for the Python repository
  * It contains all Python packages
* ComputeLibrary:
  * Some kernels required when building CMSIS-DSP with Neon acceleration
* Examples:
  * Examples of use of CMSIS-DSP on bare metal Cortex-M
  * Require the use of CMSIS Build tools
* Include:
  * Include files for CMSIS-DSP
* PrivateInclude:
  * Some includes needed to build CMSIS-DSP
* PythonWrapper:
  * C code for the CMSIS-DSP PythonWrapper
  * Examples for the PythonWrapper
* Scripts:
  * Debugging scripts
  * Script to generate some coefficient tables used by CMSIS-DSP
* ComputeGraph:
  * Not needed to build CMSIS-DSP. This project relies on the CMSIS-DSP library
  * Examples for the Compute Graph
  * C++ templates for the Compute Graph
  * Default (and optional) nodes
* Source:
  * CMSIS-DSP source
* Testing:
  * CMSIS-DSP Test framework for bare metal Cortex-M and Cortex-A
  * Require the use of CMSIS build tools

### Files

Some files are needed to generate the PythonWrapper:

* PythonWrapper_README.md
* LICENSE
* MANIFEST.in
* pyproject.toml
* setup.py

And we have a script to make it easier to customize the build:

* cmsisdspconfig.py:
  * Web browser UI to generate build configurations (temporary until the CMSIS-DSP configuration is reworked to be simpler and more maintainable)