1# Overview {#mainpage}
2
3## Introduction
4
This user manual describes the CMSIS-DSP software library, a suite of common compute processing functions for use on Cortex-M and Cortex-A processor-based devices.

The library is divided into a number of function groups, each covering a specific category:
8
9 - \ref groupMath "Basic math functions"
10 - \ref groupFastMath "Fast math functions"
11 - \ref groupCmplxMath "Complex math functions"
12 - \ref groupFilters "Filtering functions"
13 - \ref groupMatrix "Matrix functions"
14 - \ref groupTransforms "Transform functions"
15 - \ref groupController "Motor control functions"
16 - \ref groupStats "Statistical functions"
17 - \ref groupSupport "Support functions"
18 - \ref groupInterpolation "Interpolation functions"
19 - \ref groupSVM "Support Vector Machine functions (SVM)"
20 - \ref groupBayes "Bayes classifier functions"
21 - \ref groupDistance "Distance functions"
22 - \ref groupQuaternionMath "Quaternion functions"
23 - \ref groupWindow "Window functions"
24
The library generally provides separate functions for operating on 8-bit integers, 16-bit integers, 32-bit integers, 32-bit floating-point values and 64-bit floating-point values.

The library provides vectorized versions of most algorithms for Helium and of most f32 algorithms for Neon.

When using a vectorized version, allow for a small amount of padding (3 words) after the end of each buffer, because the vectorized code may read slightly past the end of the buffer. You do not have to modify your buffers; just ensure that the end of the buffer plus the padding still lies inside a valid memory region.
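If a buffer would otherwise sit at the very end of a memory region, one way to satisfy the padding requirement is to over-allocate it. The following is a minimal sketch rather than library code: the names `SAMPLES`, `VECTOR_READ_PADDING_WORDS`, `PADDING_ELEMS` and `readable_bytes_after_end` are hypothetical, and the 3-word figure is taken from the paragraph above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; none of them is defined by CMSIS-DSP. */
#define SAMPLES                   256u
#define VECTOR_READ_PADDING_WORDS 3u   /* 3 x 32-bit words, as stated above */

/* Number of elements of the given size needed to cover the padding (rounded up). */
#define PADDING_ELEMS(elem_size) \
    ((VECTOR_READ_PADDING_WORDS * sizeof(uint32_t) + (elem_size) - 1u) / (elem_size))

/* 16-bit samples: 12 bytes of padding correspond to 6 extra elements. */
static int16_t samples_q15[SAMPLES + PADDING_ELEMS(sizeof(int16_t))];

/* Bytes that vectorized code may read past the logical end of the buffer
   while still staying inside the allocated array. */
size_t readable_bytes_after_end(void)
{
    size_t logical_bytes = SAMPLES * sizeof(samples_q15[0]);
    assert(sizeof(samples_q15) >= logical_bytes);
    return sizeof(samples_q15) - logical_bytes;
}
```

Only the first `SAMPLES` elements carry data; the extra elements are never written and exist purely so that an over-read stays inside the array.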
30
31## Related projects
32
33### Python wrapper
34
A Python wrapper is also available, with a Python API as close as possible to the C one. It can be used to start developing and testing an algorithm with NumPy and SciPy before writing the C version. It is available on [PyPI.org](https://pypi.org/project/cmsisdsp/) and can be installed with: `pip install cmsisdsp`.
36
37### Experimental C++ template extension
38
This extension is a set of C++ headers. They only need to be included to start using the features.

These headers are not yet part of the pack; you need to get them from the [GitHub repository](https://github.com/ARM-software/CMSIS-DSP/tree/main/dsppp/Include).

See @ref dsppp_main "DSP++" for more documentation about this extension.
44
45## Using the CMSIS-DSP Library {#using}
46
The library is released in source form. It is strongly advised to compile the library with `-Ofast` optimization to get the best performance.

The following options should be avoided:

* `-fno-builtin`
* `-ffreestanding` because it enables the previous option

The library uses type [punning](https://en.wikipedia.org/wiki/Type_punning) to process a 32-bit word from memory as a pair of `q15` values or a quadruple of `q7` values. These type manipulations are done through `memcpy` calls. Most compilers are able to optimize out those calls when the length to copy is small (4 bytes).

This optimization will **not** occur when `-fno-builtin` is used, and this has a **very bad** impact on performance.
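The `memcpy`-based punning pattern can be sketched as follows. This is an illustrative sketch, not the library's implementation: `q15_t` is redefined locally so that the block compiles on its own, and the helper names `load_q15x2` and `unpack_q15x2` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the library's q15 type, so the sketch is self-contained. */
typedef int16_t q15_t;

/* Read two consecutive q15 samples as one 32-bit word without breaking
   strict-aliasing rules. With builtins enabled, compilers typically turn
   this 4-byte memcpy into a single load instruction. */
uint32_t load_q15x2(const q15_t *src)
{
    uint32_t word;
    assert(src != NULL);
    memcpy(&word, src, sizeof(word));
    return word;
}

/* Split a 32-bit word back into its two q15 halves, again through memcpy. */
void unpack_q15x2(uint32_t word, q15_t out[2])
{
    assert(out != NULL);
    memcpy(out, &word, sizeof(word));
}
```

With `-fno-builtin`, each of these helpers would compile to a real call to `memcpy`, which is the overhead the paragraph above warns about.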
57
58
59The library functions are declared in the public file `Include/arm_math.h`. Simply include this file to use the CMSIS-DSP library. If you don't want to include everything, you can also rely on individual header files from the `Include/dsp/` folder and include only those that are needed in the project.
60
61## Examples {#example}
62
63The library ships with a number of examples which demonstrate how to use the library functions. Please refer to \ref groupExamples.
64
65## Toolchain Support {#toolchain}
66
The library is currently tested on Fast Models, with builds done using CMake. The Cortex-M0, M4, M7, M33 and M55 cores are tested.
68
69## Access to CMSIS-DSP {#pack}
70
71CMSIS-DSP is actively maintained in the [**CMSIS-DSP GitHub repository**](https://github.com/ARM-software/CMSIS-DSP) and is released as a standalone [**CMSIS-DSP pack**](https://www.keil.arm.com/packs/cmsis-dsp-arm/versions/) in the [CMSIS-Pack format](https://open-cmsis-pack.github.io/Open-CMSIS-Pack-Spec/main/html/index.html).
72
The table below describes the contents of the **ARM::CMSIS-DSP** pack.
74
75 Directory                             | Description
76:--------------------------------------|:------------------------------------------------------
 ComputeLibrary                        | Small Neon kernels when building on Cortex-A
 Documentation                         | Folder with this CMSIS-DSP documentation
 Example                               | Example projects demonstrating the usage of the library functions
 Include                               | Include files for using and building the lib
 PrivateInclude                        | Private include files for building the lib
 Source                                | Source files
 ARM.CMSIS-DSP.pdsc                    | CMSIS-Pack description file
 LICENSE                               | License Agreement (Apache 2.0)
85
86See [CMSIS Documentation](https://arm-software.github.io/CMSIS_6/) for an overview of CMSIS software components, tools and specifications.
87
88
89## Preprocessor Macros {#preprocessor}
90
The following preprocessor macros control how the library is built.
92
93 - `ARM_MATH_BIG_ENDIAN`:
   - Define macro ARM_MATH_BIG_ENDIAN to build the library for big endian targets. By default, the library builds for little endian targets.

 - `ARM_MATH_MATRIX_CHECK`:
   - Define macro ARM_MATH_MATRIX_CHECK to check the input and output sizes of matrices.

 - `ARM_MATH_ROUNDING`:
   - Define macro ARM_MATH_ROUNDING to enable rounding in support functions.
101
102 - `ARM_MATH_LOOPUNROLL`:
103   - Define macro ARM_MATH_LOOPUNROLL to enable manual loop unrolling in DSP functions
104
105 - `ARM_MATH_NEON`:
   - Define macro ARM_MATH_NEON to enable Neon versions of the DSP functions. It is not enabled by default when Neon is available because performance depends on the compiler and target architecture.

 - `ARM_MATH_NEON_EXPERIMENTAL`:
   - Define macro ARM_MATH_NEON_EXPERIMENTAL to enable experimental Neon versions of some DSP functions. Experimental Neon versions currently do not perform better than the scalar versions.

 - `ARM_MATH_HELIUM`:
   - It implies the flags ARM_MATH_MVEF, ARM_MATH_MVEI and ARM_MATH_MVE_FLOAT16.

 - `ARM_MATH_HELIUM_EXPERIMENTAL`:
   - Only taken into account when ARM_MATH_MVEF, ARM_MATH_MVEI or ARM_MATH_MVE_FLOAT16 is defined. Enables some vector versions which may perform worse than the scalar versions, depending on the core / compiler configuration.
116
117 - `ARM_MATH_MVEF`:
118   - Select Helium versions of the f32 algorithms. It implies ARM_MATH_FLOAT16 and ARM_MATH_MVEI.
119
120 - `ARM_MATH_MVEI`:
121   - Select Helium versions of the int and fixed point algorithms.
122
123 - `ARM_MATH_MVE_FLOAT16`:
   - MVE Float16 implementations of some algorithms (requires the MVE extension).

 - `DISABLEFLOAT16`:
   - Disable float16 algorithms when __fp16 is not supported for a specific compiler / core configuration. This is only valid for scalar code. When the vector architecture supports f16, it cannot be disabled.
128
129 - `ARM_MATH_AUTOVECTORIZE`:
130   - With Helium or Neon, disable the use of vectorized code with C intrinsics and use pure C instead. The vectorization is then done by the compiler.
131
 - `ARM_DSP_ATTRIBUTE`: Can be set to define CMSIS-DSP functions as weak functions. This can either be set on the command line when building or in a new `arm_dsp_config.h` header (see below).

 - `ARM_DSP_TABLE_ATTRIBUTE`: Can be set to define in which section constant tables must be mapped. This can either be set on the command line when building or in a new `arm_dsp_config.h` header (see below). Another way to set those sections is by modifying the linker scripts, since the constant tables are defined only in a restricted set of source files.

 - `ARM_DSP_CUSTOM_CONFIG`: When set, the file `arm_dsp_config.h` is included by the `arm_math_types.h` header. You can use this file to define any of the above compilation symbols.
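
As an illustration, a custom `arm_dsp_config.h` might look like the sketch below. It assumes GCC/Clang attribute syntax and that `ARM_DSP_CUSTOM_CONFIG` is defined on the compiler command line; the include guard name and the section name `.rodata.dsp_tables` are hypothetical, and the section must match whatever your own linker script provides.

```c
/* arm_dsp_config.h: only included when ARM_DSP_CUSTOM_CONFIG is defined. */
#ifndef ARM_DSP_CONFIG_H   /* hypothetical include guard name */
#define ARM_DSP_CONFIG_H

/* Enable manual loop unrolling and matrix size checks. */
#define ARM_MATH_LOOPUNROLL
#define ARM_MATH_MATRIX_CHECK

/* Assumption: GCC/Clang syntax. Declares the library functions as weak. */
#define ARM_DSP_ATTRIBUTE __attribute__((weak))

/* Assumption: GCC/Clang syntax and a hypothetical section name that
   has to exist in your linker script. */
#define ARM_DSP_TABLE_ATTRIBUTE __attribute__((section(".rodata.dsp_tables")))

#endif /* ARM_DSP_CONFIG_H */
```

Other toolchains use different attribute spellings, so treat the two attribute definitions as placeholders to adapt.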
137
138## Code size
139
Previous versions used many compilation flags to control code size. This mechanism was enabled with `ARM_DSP_CONFIG_TABLES`. It became too complex and has been removed. Code size optimizations now rely on the linker.

You no longer need to use compilation flags such as `ARM_TABLE_TWIDDLECOEF_F32_2048` or `ARM_FFT_ALLOW_TABLES`; they have been removed.

Constant tables can use a lot of read-only memory, but the linker can remove unused functions and constant tables if it can deduce that they are not used.

For this, you need to use the right initialization functions in the library and the right linker options (they are compiler dependent).

For all transform functions (CFFT, RFFT ...), instead of using a generic initialization function that works for all lengths (like `arm_cfft_init_f32`), use a dedicated initialization function for a specific size (like `arm_cfft_init_1024_f32`).
151
152By using the right initialization function, you're telling the linker what is really used.
153
154If you use a generic function, the linker cannot deduce the used lengths and thus will keep all the constant tables required for each length.
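
The reason can be seen in a small stand-alone model of the pattern. The sketch below does not use the library: every name in it (`demo_instance`, `demo_table_16`, `demo_table_1024`, `demo_init`, `demo_init_1024`) is hypothetical and only mimics the shape of a generic versus a size-specific initialization function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for per-length constant tables. */
const float demo_table_16[16]     = {0};
const float demo_table_1024[1024] = {0};

typedef struct {
    const float *table;
    size_t       length;
} demo_instance;

/* Generic init: references every table, so a program calling it forces the
   linker to keep all of them, even if only one length is ever used. */
int demo_init(demo_instance *s, size_t length)
{
    assert(s != NULL);
    switch (length) {
    case 16:   s->table = demo_table_16;   break;
    case 1024: s->table = demo_table_1024; break;
    default:   return -1;   /* unsupported length */
    }
    s->length = length;
    return 0;
}

/* Size-specific init: references a single table. If this is the only init
   function called, the other tables are unreferenced and can be discarded. */
int demo_init_1024(demo_instance *s)
{
    assert(s != NULL);
    s->table  = demo_table_1024;
    s->length = 1024;
    return 0;
}
```

A program that only calls `demo_init_1024` never references `demo_table_16`, which is what allows section garbage collection to drop it.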
155
Then you need to use the right compiler and linker options so that the unused tables and functions are removed. They are toolchain dependent, but the options are generally named like `-ffunction-sections`, `-fdata-sections`, `--gc-sections` ...
157
158## Variations between the architectures
159
Some algorithms may give slightly different results on different architectures (like M0 or M4/M7 or M55). It is a tradeoff made for speed and to make the best use of the different instruction sets.

All algorithms are compared with a double precision reference, and the versions for the different architectures have the same characteristics when compared to that reference (SNR bound, max bound for sample error ...).

As a consequence, the small differences that may exist between the different architecture implementations should be too small to have any practical consequences.
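
To make the idea of an SNR bound concrete, the sketch below compares a single-precision output with a double precision reference. It is a minimal illustration and not the library's test code: the function name `snr_linear` is hypothetical, and the ratio is returned as a linear energy ratio rather than in dB.

```c
#include <assert.h>
#include <stddef.h>

/* Ratio of reference energy to error energy (linear SNR, not in dB).
   ref  : double precision reference output
   test : output of the implementation under test */
double snr_linear(const double *ref, const float *test, size_t n)
{
    double signal = 0.0;
    double noise  = 0.0;
    assert(ref != NULL && test != NULL);
    for (size_t i = 0; i < n; i++) {
        double err = ref[i] - (double)test[i];
        signal += ref[i] * ref[i];
        noise  += err * err;
    }
    /* Identical outputs give zero noise; report a very large ratio
       instead of dividing by zero. */
    return (noise == 0.0) ? 1e300 : signal / noise;
}
```

Two implementations for different architectures can then be considered equivalent for practical purposes when both exceed the same bound against the reference, even if their outputs are not bit-identical.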
165
166
167
168## License {#license}
169
CMSIS-DSP is provided free of charge under the [Apache 2.0 License](https://raw.githubusercontent.com/ARM-software/CMSIS-DSP/main/LICENSE).
171