<!-- mdformat off(b/169948621#comment2) -->

<!--
Semi-automated TOC generation with instructions from
https://github.com/ekalinin/github-markdown-toc#auto-insert-and-update-toc
-->

<!--ts-->
   * [Memory Management in TensorFlow Lite Micro](#memory-management-in-tensorflow-lite-micro)
      * [Tensor Arena](#tensor-arena)
         * [Head Section](#head-section)
            * [Offline planned tensor allocations](#offline-planned-tensor-allocations)
         * [Temporary Section](#temporary-section)
         * [Tail Section](#tail-section)
      * [Recording Memory APIs](#recording-memory-apis)
         * [Allocation Section Details](#allocation-section-details)

<!-- Added by: freddan80, at: Mon 29 Mar 2021 01:47:42 PM CEST -->

<!--te-->

# Memory Management in TensorFlow Lite Micro

This document outlines how memory is managed internally by TensorFlow Lite Micro
(TFLM) today. It describes the "online" allocation strategy used by the default
TFLM APIs for loading a model into a shared tensor arena.

## Tensor Arena

The main "working" space for TFLM allocations is a single `char` or `int8_t`
buffer. This buffer can be managed by passing it directly into a
`tflite::MicroInterpreter` constructor, or through a `tflite::MicroAllocator`
instance that is itself passed into a `tflite::MicroInterpreter` constructor.
Internally, the `tflite::MicroAllocator` classifies allocations into three
sections:

* **Head** - non-persistent allocations.
* **Temporary** - short-term "scoped" allocations.
* **Tail** - persistent allocations.

The illustration below represents typical allocations in TFLM:

```
--------------------------------------------------------------------------------
|                                 |                       |                    |
|              HEAD               |<--    TEMPORARY    -->|        TAIL        |
|                                 |                       |                    |
--------------------------------------------------------------------------------
* Lowest Address                                                Highest Address *
```

### Head Section

This non-persistent section typically holds shared Tensor buffers. This section
is not allocated in small, iterative chunks; instead, a single length is set
for the entire section.

The allocation length of this section is managed by the
`tflite::GreedyMemoryPlanner`. That memory planner looks at the entire graph of
a model and tries to reuse as many buffers as possible, to create the smallest
possible length for the head. The Tensor buffers for this section can be
accessed via a `TfLiteEvalTensor` or `TfLiteTensor` instance on the
`tflite::MicroInterpreter`.

#### Offline planned tensor allocations

All tensors, or a subset of them, can be allocated using an offline planner. An
offline planner performs tensor allocation on e.g. a host PC. The offline
tensor allocation plan is added to the model metadata; see the format below.

For each non-constant tensor in the `tensors:[Tensor]` list of the subgraph, a
byte offset to the start of the head section of the memory arena is given. An
offset of -1 indicates that the tensor will be allocated at runtime by the
`tflite::GreedyMemoryPlanner`. The offline plan is permitted to overlap buffers
if it knows that the data will not be used at the same time.

The offline tensor allocation plan is encoded in the `metadata:[Metadata]`
field of the model, using the following encoding:

| Metadata component | Value |
|-|-|
| name:string | “OfflineMemoryAllocation” |
| buffer:uint | Index of the buffer containing the offline tensor allocation data |

The buffer contents for the offline tensor allocation are a list of 32-bit
integers in the following format:

| Offset | Value |
|-|-|
| 0 | Offline allocation format version |
| 1 | Subgraph index to which this allocation applies |
| 2 | Number of offsets following: n |
| 3 | Byte offset of tensor #0, or -1 to allocate at runtime |
| 4 | Byte offset of tensor #1, or -1 to allocate at runtime |
| ... | ... |
| 3+(n-1) | Byte offset of tensor #(n-1), or -1 to allocate at runtime |

The `tflite::GreedyMemoryPlanner` treats the provided offline tensor allocation
plan as constant, fixed offsets from the start of the head section, and will
attempt to fit any other tensors (such as scratch tensors added at runtime
using the `RequestScratchBufferInArena` API of `TfLiteContext`) around those
fixed offsets.
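For illustration, the buffer referenced by the metadata entry could look like
the following sketch. The version, subgraph index, tensor count, and offsets
are all made-up example values, not taken from a real model:

```c++
#include <cstdint>

// Hypothetical offline plan for a subgraph with four non-constant tensors.
// Tensors #0 and #2 are pinned at fixed byte offsets from the start of the
// head section; tensors #1 and #3 are left to the runtime planner (-1).
const int32_t kOfflineMemoryAllocation[] = {
    1,     // Offset 0: offline allocation format version.
    0,     // Offset 1: subgraph index this plan applies to.
    4,     // Offset 2: number of offsets that follow (n).
    0,     // Offset 3: tensor #0 starts at byte 0 of the head section.
    -1,    // Offset 4: tensor #1 is allocated at runtime.
    1024,  // Offset 5: tensor #2 starts at byte 1024 of the head section.
    -1,    // Offset 6: tensor #3 is allocated at runtime.
};
```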
### Temporary Section

This section is used to allocate "scoped" or short-term, non-guaranteed
buffers. Allocations from this section start from the current end address of
the head section and grow towards the tail section. An allocation chain can be
reset (and must be reset before adjusting the head); resetting moves the
current allocation start address back to the end of the head section.

TFLM currently uses these allocations for scoped allocations of large C structs
or scratch memory that is expected to be valid for at least the lifetime of a
method call, as in the sketch below.
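As a minimal sketch (not code from the TFLM sources), a kernel can request
scratch memory while it is being prepared and resolve it again during
evaluation through the `TfLiteContext` scratch-buffer APIs. The `OpData` struct
and the buffer size here are illustrative assumptions:

```c++
#include "tensorflow/lite/c/common.h"

// Illustrative per-operator state; holds the handle returned by
// RequestScratchBufferInArena.
struct OpData {
  int scratch_index;
};

TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) {
  auto* data = static_cast<OpData*>(node->user_data);
  // Ask the allocator for 1024 bytes of scratch space in the arena.
  return context->RequestScratchBufferInArena(context, 1024,
                                              &data->scratch_index);
}

TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) {
  auto* data = static_cast<OpData*>(node->user_data);
  // Resolve the handle to an actual pointer; only valid during this call.
  void* scratch = context->GetScratchBuffer(context, data->scratch_index);
  if (scratch == nullptr) return kTfLiteError;
  // ... use the scratch buffer ...
  return kTfLiteOk;
}
```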
### Tail Section

This section holds all persistent allocations used by TFLM. It contains many
variable-sized allocations and grows towards the end of the head section.
Allocations in this section come from a variety of areas inside TFLM. TFLM
provides a [recording API](#recording-memory-apis) to assist with auditing the
contents of this section.

## Recording Memory APIs

TFLM provides simple APIs for auditing memory usage in the shared tensor arena.
These APIs are opt-in and require some additional memory overhead and a working
debug logging implementation
[(reference implementation)](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/debug_log.cc).

A typical bare-bones TFLM interpreter setup looks like this:

```c++
// Buffer for the tensor arena:
constexpr size_t tensor_arena_size = 2048;
uint8_t tensor_arena[tensor_arena_size];

// Interpreter using the shared tensor arena above:
tflite::MicroInterpreter interpreter(
    tflite::GetModel(my_model_data), ops_resolver,
    tensor_arena, tensor_arena_size, error_reporter);

// Invoke one time which will allocate internals:
if (interpreter.Invoke() != kTfLiteOk) {
  TF_LITE_REPORT_ERROR(error_reporter, "Exception during invoke()!");
}
```

The recording API can be used by simply including the
`RecordingMicroInterpreter` class (`recording_micro_interpreter.h`) and
replacing `tflite::MicroInterpreter` with `tflite::RecordingMicroInterpreter`.
The same call to `Invoke()` is performed, but an additional call to
`PrintAllocations()` outputs detailed allocation logging:

```c++
// Add an include to the recording API:
#include "recording_micro_interpreter.h"

// Simply change the class name from 'MicroInterpreter' to 'RecordingMicroInterpreter':
tflite::RecordingMicroInterpreter interpreter(
    tflite::GetModel(my_model_data), ops_resolver,
    tensor_arena, tensor_arena_size, error_reporter);

// Invoke one time which will allocate internals:
if (interpreter.Invoke() != kTfLiteOk) {
  TF_LITE_REPORT_ERROR(error_reporter, "Exception during invoke()!");
}

// Print out detailed allocation information:
interpreter.GetMicroAllocator().PrintAllocations();
```

The output of this call will look similar to the following (output from the
[memory_arena_threshold_test](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/memory_arena_threshold_test.cc#L205)):

```bash
[RecordingMicroAllocator] Arena allocation total 9568 bytes
[RecordingMicroAllocator] Arena allocation head 7744 bytes
[RecordingMicroAllocator] Arena allocation tail 1824 bytes
[RecordingMicroAllocator] 'TfLiteEvalTensor data' used 360 bytes with alignment overhead (requested 360 bytes for 15 allocations)
[RecordingMicroAllocator] 'Persistent TfLiteTensor data' used 0 bytes with alignment overhead (requested 0 bytes for 0 tensors)
[RecordingMicroAllocator] 'Persistent TfLiteTensor quantization data' used 0 bytes with alignment overhead (requested 0 bytes for 0 allocations)
[RecordingMicroAllocator] 'TfLiteTensor variable buffer data' used 0 bytes with alignment overhead (requested 0 bytes for 0 allocations)
[RecordingMicroAllocator] 'NodeAndRegistration struct' used 392 bytes with alignment overhead (requested 392 bytes for 7 NodeAndRegistration structs)
[RecordingMicroAllocator] 'Operator runtime data' used 136 bytes with alignment overhead (requested 136 bytes for 5 OpData structs)
```

### Allocation Section Details

More information about each recorded allocation section:

* 'TfLiteEvalTensor data'
  * C struct that holds the data type, dimensions, and a pointer to the
    buffer representing the Tensor.
* 'Persistent TfLiteTensor data'
  * C struct that holds more information than a `TfLiteEvalTensor` struct in
    the graph.
  * Allocations in this bucket only show up when tensors are accessed through
    the accessors on `tflite::MicroInterpreter`.
* 'Persistent TfLiteTensor quantization data'
  * Length of persistent quantization data assigned to persistent
    `TfLiteTensor` structs.
  * Allocations in this bucket only show up when tensors are accessed through
    the accessors on `tflite::MicroInterpreter`.
* 'TfLiteTensor variable buffer data'
  * Length of buffer data from a variable tensor (retains data throughout
    calls to `Invoke()`).
* 'NodeAndRegistration struct'
  * C struct that holds a `TfLiteRegistration` and `TfLiteNode` struct
    instance.
  * Each operator in a model contains one `NodeAndRegistration` struct.
* 'Operator runtime data'
  * Persistent allocations of data cached by TFLM kernels (e.g. quantization
    params, multipliers, etc.).
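The recorded figures can also be queried programmatically, e.g. to assert
memory budgets in a test. A minimal sketch, assuming the
`GetRecordedAllocation()` accessor and the `tflite::RecordedAllocationType`
enum from `recording_micro_allocator.h` (as used by the
memory_arena_threshold_test linked above):

```c++
#include "recording_micro_interpreter.h"

// Sketch: check that the persistent per-operator data stays within a budget.
// RecordedAllocation reports the requested bytes, the bytes actually used
// (including alignment overhead), and the number of allocations.
bool OpDataWithinBudget(const tflite::RecordingMicroInterpreter& interpreter,
                        size_t max_op_data_bytes) {
  const tflite::RecordedAllocation op_data =
      interpreter.GetMicroAllocator().GetRecordedAllocation(
          tflite::RecordedAllocationType::kOpData);
  return op_data.used_bytes <= max_op_data_bytes;
}
```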