<!--
Semi-automated TOC generation with instructions from
https://github.com/ekalinin/github-markdown-toc#auto-insert-and-update-toc
-->

<!--ts-->

* [Porting to a new platform](#porting-to-a-new-platform)
   * [Requirements](#requirements)
   * [Getting started](#getting-started)
   * [Troubleshooting](#troubleshooting)
   * [Optimizing for your platform](#optimizing-for-your-platform)
   * [Code module organization](#code-module-organization)
   * [Implementing more optimizations](#implementing-more-optimizations)

<!-- Added by: advaitjain, at: Mon 05 Oct 2020 02:36:46 PM PDT -->

<!--te-->

***Please note that we are currently pausing acceptance of new platforms***.
Please see our [contributions guide](../CONTRIBUTING.md) for more details and
context.

Parts of the documentation below will likely change as we start accepting new
platforms again.

# Porting to a new platform

The remainder of this document provides guidance on porting TensorFlow Lite for
Microcontrollers to new platforms. You should read the
[developer documentation](https://www.tensorflow.org/lite/microcontrollers)
first.

## Requirements

Since the core neural network operations are pure arithmetic and don't require
any I/O or other system-specific functionality, the code doesn't need many
dependencies. We've tried to enforce this, so that it's as easy as possible to
get TensorFlow Lite Micro running even on 'bare metal' systems without an OS.
Here are the core requirements that a platform needs to run the framework:

- A C/C++ compiler with C++11 support. This is probably the most restrictive
  requirement, since C++11 is not as widely adopted in the embedded world as
  it is elsewhere. We decided to require it because one of the main goals of
  TFL Micro is to share as much code as possible with the wider TensorFlow
  codebase, and since that codebase relies on C++11 features, we need
  compatibility to achieve it. We only use a small subset of C++ though, so
  don't worry about having to deal with template metaprogramming or similar
  challenges!

- Debug logging. The core network operations don't need any I/O functions, but
  to be able to run tests and tell if they've worked as expected, the
  framework needs some way to write out a string to some kind of debug
  console. This will vary from system to system; for example, on Linux it
  could just be `fprintf(stderr, debug_string)`, whereas an embedded device
  might write the string out to a specified UART. As long as there's some
  mechanism for outputting debug strings, you should be able to use TFL Micro
  on that platform.

- Math library. The standard C math library (`libm.a`) is needed to handle
  some of the mathematical operations used to calculate neural network
  results.

- Global variable initialization. In some places we rely on global variables
  being initialized before `main()` runs, so you'll need to make sure your
  compiler toolchain supports this.

And that's it! You may be wondering about some other common requirements that
are needed by a lot of non-embedded software, so here's a brief list of things
that aren't necessary to get started with TFL Micro on a new platform:

- Operating system. Since the only platform-specific function we need is
  `DebugLog()`, there's no requirement for any kind of POSIX or similar
  functionality around files, processes, or threads.

- C or C++ standard libraries. The framework tries to avoid relying on any
  standard library functions that require link-time support. This includes
  things like string functions, but still allows us to use headers like
  `stdint.h`, which typically just define constants and typedefs.
  Unfortunately this distinction isn't officially defined by any standard, so
  it's possible that different toolchains may require linked code even for the
  subset we use, but in practice we've found the decision is usually obvious
  and stable across platforms and toolchains.

- Dynamic memory allocation. All the TFL Micro code avoids dynamic memory
  allocation, instead relying on local variables on the stack in most cases,
  or global variables in a few situations. These are all fixed-size, which can
  mean some compile-time configuration to ensure there's enough space for
  particular networks, but does avoid any need for a heap and an
  implementation of `malloc`/`new` on a platform.

- Floating point. Eight-bit integer arithmetic is enough for inference on many
  networks, so if a model sticks to these kinds of quantized operations, no
  floating-point instructions should be required or executed by the framework.

## Getting started

We recommend that you start by compiling and running one of the simplest tests
in the framework. The full TensorFlow codebase can seem overwhelming to work
with at first, so instead you can begin with a collection of self-contained
project folders that only include the source files needed for a particular
test or executable. You can find a set of pre-generated projects
[here](https://drive.google.com/open?id=1cawEQAkqquK_SO4crReDYqf_v7yAwOY8).

As mentioned above, the one function you will need to implement for a
completely new platform is debug logging. If your device is just a variation
on an existing platform, you may be able to reuse code that's already been
written. To understand what's available, begin with the default reference
implementation at
[tensorflow/lite/micro/debug_log.cc](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/debug_log.cc),
which uses `fprintf()` and `stderr`. If your platform's toolchain has this
level of support for the C standard library, then you can just reuse this.
Otherwise, you'll need to do some research into how your platform and device
can communicate logging statements to the outside world. As another example,
take a look at
[the Mbed version of `DebugLog()`](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/mbed/debug_log.cc),
which creates a UART object and uses it to output strings to the host's
console if it's connected.
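
If your target has neither stdio support nor an existing port to borrow from,
the port can still be very small. Here's a minimal sketch, assuming the
single-string `DebugLog(const char* s)` interface used by the reference
implementation, and a hypothetical `uart_send_byte()` routine supplied by your
board support code:

```
// debug_log.cc: sketch of a DebugLog() port for a bare-metal target.
#include "tensorflow/lite/micro/debug_log.h"

// Hypothetical blocking transmit routine; substitute whatever your board
// support code or HAL actually provides.
extern void uart_send_byte(char c);

extern "C" void DebugLog(const char* s) {
  // Push the string out one character at a time over the debug UART.
  for (; *s != '\0'; ++s) {
    uart_send_byte(*s);
  }
}
```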
Begin by navigating to the `micro_error_reporter_test` folder in the
pre-generated projects you downloaded. Inside, you'll see a set of folders
containing all the source code you need. If you look through them, you should
find a total of around 60 C or C++ files that, compiled together, will create
the test executable. There's an example makefile in the directory that lists
all of the source files and include paths for the headers. If you're building
on a Linux or macOS host system, you may be able to reuse that same makefile
to cross-compile for your system, as long as you swap out the `CC` and `CXX`
variables from their defaults to point to your cross compiler instead (for
example `arm-none-eabi-gcc` or `riscv64-unknown-elf-gcc`), as shown in the
sketch below. Otherwise, set up a project in the build system you are using.
It should hopefully be fairly straightforward, since all of the source files
in the folder need to be compiled, so on many IDEs you can just drag the whole
lot in. Then you need to make sure that C++11 compatibility is turned on, and
that the right include paths (as mentioned in the makefile) have been added.
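
Since `make` lets you override variables on the command line, the
cross-compile invocation might look roughly like this (a sketch; the exact
variable names follow the example makefile described above):

```
# Run from inside the micro_error_reporter_test project folder.
make CC=arm-none-eabi-gcc CXX=arm-none-eabi-g++
```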
You'll see the default `DebugLog()` implementation in
`tensorflow/lite/micro/debug_log.cc` inside the `micro_error_reporter_test`
folder. Modify that file to add the right implementation for your platform,
and then you should be able to build the set of files into an executable.
Transfer that executable to your target device (for example by flashing it),
and then try running it. You should see output that looks something like this:

```
Number: 42
Badly-formed format string
Another badly-formed format string
~~~ALL TESTS PASSED~~~
```

If not, you'll need to debug what went wrong, but hopefully with this small
starting project it should be manageable.

## Troubleshooting

When we've been porting to new platforms, it's often been hard to figure out
some of the fundamentals like linker settings and other toolchain setup flags.
If you are having trouble, see if you can find a simple example program for
your platform, like one that just blinks an LED. If you're able to build and
run that successfully, then start to swap parts of the TF Lite Micro codebase
into that working project, taking it a step at a time and ensuring it's still
working after every change. For example, a first step might be to paste in
your `DebugLog()` implementation and call `DebugLog("Hello World!")` from the
main function.

Another common problem on embedded platforms is the stack size being too
small. Mbed defaults to 4 KB for the main thread's stack, which is too small
for most models, since TensorFlow Lite allocates buffers and other data
structures that require more memory. The exact size will depend on which model
you're running, but try increasing it if you are running into strange
corruption issues that might be related to stack overwriting.

## Optimizing for your platform

The default reference implementations in TensorFlow Lite Micro are written to
be portable and easy to understand, not fast, so you'll want to replace
performance-critical parts of the code with versions specifically tailored to
your architecture. The framework has been designed with this in mind, and we
hope the combination of small modules and many tests makes it as
straightforward as possible to swap in your own code a piece at a time,
ensuring you have a working version at every step. To write specialized
implementations for a platform, it's useful to understand how optional
components are handled inside the build system.

## Code module organization

We have adopted a system of small modules with platform-specific
implementations to help with portability. Every module is just a standard `.h`
header file containing the interface (either functions or a class), with an
accompanying reference implementation in a `.cc` with the same name. The
source file implements all of the code that's declared in the header. If you
have a specialized implementation, you can create a folder in the same
directory as the header and reference source, name it after your platform, and
put your implementation in a `.cc` file inside that folder. We've already seen
one example of this, where the Mbed and Bluepill versions of `DebugLog()` are
inside
[mbed](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/mbed)
and
[bluepill](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/bluepill)
folders, children of the
[same directory](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro)
where the stdio-based
[`debug_log.cc`](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/debug_log.cc)
reference implementation is found.
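
Laid out on disk, that `DebugLog()` module looks like this (an illustrative
sketch of the directory structure just described):

```
tensorflow/lite/micro/
  debug_log.h      <- platform-agnostic interface
  debug_log.cc     <- portable reference implementation (stdio)
  mbed/
    debug_log.cc   <- Mbed-specific implementation (UART)
  bluepill/
    debug_log.cc   <- Bluepill-specific implementation
```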
The advantage of this approach is that we can automatically pick specialized
implementations based on the current build target, without having to manually
edit build files for every new platform. It allows incremental optimizations
from an always-working foundation, without cluttering the reference
implementations with a lot of variants.

To see why we're doing this, it's worth looking at the alternatives.
TensorFlow Lite has traditionally used preprocessor macros to separate out
some platform-specific code within particular files, for example:

```
#ifndef USE_NEON
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#define USE_NEON
#include <arm_neon.h>
#endif
#endif  // USE_NEON
```

There's also a tradition in gemmlowp of using file suffixes to indicate
platform-specific versions of particular headers, with `kernel_neon.h` being
included by `kernel.h` if `USE_NEON` is defined. As a third variation, kernels
are separated out using a directory structure, with
`tensorflow/lite/kernels/internal/reference` containing portable
implementations, and `tensorflow/lite/kernels/internal/optimized` holding
versions optimized for NEON on Arm platforms.

These approaches are hard to extend to multiple platforms. Using macros means
that platform-specific code is scattered throughout files in a hard-to-find
way, and can make following the control flow difficult, since you need to
understand the macro state to trace it. For example, I temporarily introduced
a bug that disabled NEON optimizations for some kernels when I removed
`tensorflow/lite/kernels/internal/common.h` from their includes, without
realizing it was where `USE_NEON` was defined!

It's also tough to port to different build systems, since figuring out the
right combination of macros to use can be hard, especially since some of them
are automatically defined by the compiler, and others are only set by build
scripts, often across multiple rules.

The approach we are using extends the file system approach that we use for
kernel implementations, but with some specific conventions:

- For each module in TensorFlow Lite, there will be a parent directory that
  contains tests, interface headers used by other modules, and portable
  implementations of each part.
- Portable means that the code doesn't include code from any libraries except
  flatbuffers or other TF Lite modules. You can include a limited subset of
  standard C or C++ headers, but you can't use any functions that require
  linking against those libraries, including `fprintf()`, etc. You can link
  against functions in the standard math library, in `<math.h>`.
- Specialized implementations are held inside subfolders of the parent
  directory, named after the platform or library that they depend on. So, for
  example, if you had `my_module/foo.cc`, a version that used RISC-V
  extensions would live in `my_module/riscv/foo.cc`. If you had a version that
  used the CMSIS library, it should be in `my_module/cmsis/foo.cc`.
- These specialized implementations should completely replace the top-level
  implementations. If this involves too much code duplication, the top-level
  implementation should be split into smaller files, so only the
  platform-specific code needs to be replaced.
- There is a convention about how build systems pick the right implementation
  file. There will be an ordered list of 'tags' defining the preferred
  implementations, and to generate the right list of source files, each module
  will be examined in turn. If a subfolder with a tag's name contains a `.cc`
  file with the same base name as one in the parent folder, then it will
  replace the parent folder's version in the list of build files. If there are
  multiple subfolders with matching tags and file names, then the tag that's
  latest in the ordered list will be chosen. This allows us to express "I'd
  like generically-optimized fixed point if it's available, but I'd prefer
  something using the CMSIS library" using the list `fixed_point cmsis`. These
  tags are passed in as `TAGS="<foo>"` on the command line when you use the
  main Makefile to build. (See the worked example after this list.)
- There is an implicit "reference" tag at the start of every list, so that
  it's possible to support directory structures like the current
  `tensorflow/lite/kernels/internal`, where portable implementations are held
  in a "reference" folder that's a sibling to the NEON-optimized folder.
- The headers for each unit in a module should remain platform-agnostic, and
  be the same for all implementations. Private headers inside a subfolder can
  be used as needed, but shouldn't be referred to by any portable code at the
  top level.
- Tests should be at the parent level, with no platform-specific code.
- No platform-specific macros or `#ifdef`s should be used in any portable
  code.

The implementation of these rules is handled inside the Makefile, with a
[`specialize` function](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/tools/make/helper_functions.inc#L42)
that takes a list of reference source file paths as an input, and returns the
equivalent list with specialized versions of those files swapped in if they
exist.
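
As a worked example of the tag rules above, consider a hypothetical module
(`my_module` and its files are invented for illustration). Building with
`TAGS="fixed_point cmsis"` would resolve the source list like this:

```
my_module/
  foo.h              <- shared, platform-agnostic interface
  foo.cc             <- portable reference implementation
  bar.cc             <- portable reference implementation
  fixed_point/
    foo.cc           <- generically-optimized fixed point version
  cmsis/
    foo.cc           <- version using the CMSIS library

TAGS="fixed_point cmsis" selects:
  my_module/cmsis/foo.cc  <- both tags match; cmsis is latest in the list
  my_module/bar.cc        <- no specialized version, so the reference is kept
```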
## Implementing more optimizations

Clearly, getting debug logging working is only the beginning of the work
you'll need to do on a particular platform. It's very likely that you'll want
to optimize the core deep learning operations that take up the most time when
running the models you care about. The good news is that the process for
providing optimized implementations is the same as the one you just went
through to provide your own logging. You'll need to identify the parts of the
code that are bottlenecks, and then add specialized implementations in their
own folders. These don't need to be platform-specific; they can also be broken
out by the library they rely on, for example.
[Here's where we do that for the CMSIS implementation of integer fast Fourier
transforms](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/examples/micro_speech/simple_features/simple_features_generator.cc).
This more complex case shows that you can also add helper source files
alongside the main implementation, as long as you
[mention them in the platform-specific makefile](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/examples/micro_speech/CMSIS/Makefile.inc).
You can also do things like update the list of libraries that need to be
linked in, or add include paths to required headers.
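
To give a feel for what that involves, here's a rough sketch of such a
platform-specific makefile fragment. The variable and file names below are
hypothetical; see the CMSIS `Makefile.inc` linked above for the conventions
the build actually uses:

```
# Sketch of a platform-specific Makefile.inc fragment. Variable and file
# names here are hypothetical; follow the linked CMSIS Makefile.inc for
# the conventions the build actually uses.
ifneq ($(filter cmsis,$(ALL_TAGS)),)
  # Extra helper sources built alongside the specialized implementation.
  MICROLITE_CC_SRCS += \
    tensorflow/lite/micro/examples/micro_speech/CMSIS/my_fft_helpers.cc
  # Include path needed by the CMSIS-based code.
  INCLUDES += -I$(CMSIS_PATH)/CMSIS/DSP/Include
endif
```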