<!-- mdformat off(b/169948621#comment2) -->

<!--
Semi-automated TOC generation with instructions from
https://github.com/ekalinin/github-markdown-toc#auto-insert-and-update-toc
-->

<!--ts-->
   * [Memory Management in TensorFlow Lite Micro](#memory-management-in-tensorflow-lite-micro)
      * [Tensor Arena](#tensor-arena)
         * [Head Section](#head-section)
            * [Offline planned tensor allocations](#offline-planned-tensor-allocations)
         * [Temporary Section](#temporary-section)
         * [Tail Section](#tail-section)
      * [Recording Memory APIs](#recording-memory-apis)
         * [Allocation Section Details](#allocation-section-details)

<!-- Added by: freddan80, at: Mon 29 Mar 2021 01:47:42 PM CEST -->

<!--te-->

# Memory Management in TensorFlow Lite Micro

This document outlines how memory is managed internally by TensorFlow Lite
Micro (TFLM) today. It describes the "online" allocation strategy used by the
default TFLM APIs for loading a model into a shared tensor arena.

## Tensor Arena

The main "working" space for TFLM allocations is inside a single `char` or
`int8_t` buffer. This buffer can be managed by passing it directly into a
`tflite::MicroInterpreter` constructor or through a `tflite::MicroAllocator`
instance that can be passed into a `tflite::MicroInterpreter` constructor (a
sketch of the latter option follows the diagram below). Internally, the
`tflite::MicroAllocator` classifies allocations into three different sections:

*   **Head** - non-persistent allocations.
*   **Temporary** - short-term "scoped" allocations.
*   **Tail** - persistent allocations.

The illustration below represents typical allocations in TFLM:

```
--------------------------------------------------------------------------------
|        |                     |                                               |
|  HEAD  |<--  TEMPORARY    -->|                    TAIL                       |
|        |                     |                                               |
--------------------------------------------------------------------------------
* Lowest Address                                               Highest Address *
```
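
The `tflite::MicroAllocator` option can be sketched as follows. This is a
minimal sketch, assuming `my_model_data`, `ops_resolver`, and `error_reporter`
are defined elsewhere; the exact constructor overloads can vary between TFLM
versions:

```c++
constexpr size_t kArenaSize = 2048;
uint8_t tensor_arena[kArenaSize];

// Create an allocator that manages the shared tensor arena:
tflite::MicroAllocator* allocator =
    tflite::MicroAllocator::Create(tensor_arena, kArenaSize, error_reporter);

// Pass the allocator (instead of the raw arena) to the interpreter:
tflite::MicroInterpreter interpreter(
    tflite::GetModel(my_model_data), ops_resolver, allocator, error_reporter);
```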

### Head Section

This non-persistent section typically holds shared tensor buffers. This section
does not allocate small iterative chunks; it can only be set to a specific
length for the entire section.

The allocation length of this section is managed by the
`tflite::GreedyMemoryPlanner`. That memory planner looks at the entire graph of
a model and tries to reuse as many buffers as possible to create the smallest
length for the head. The tensor buffers for this section can be accessed via a
`TfLiteEvalTensor` or `TfLiteTensor` instance on the `tflite::MicroInterpreter`.
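
For instance, given a constructed `tflite::MicroInterpreter` (like
`interpreter` from the sketch above), the input and output tensor buffers that
live in the head can be reached through the interpreter's accessors. A small
sketch, assuming a model with a single float input and output:

```c++
// Valid after AllocateTensors() (or the first Invoke()) has run.
// TfLiteTensor views of the head-allocated input and output buffers:
TfLiteTensor* input = interpreter.input(0);
TfLiteTensor* output = interpreter.output(0);

// Fill the input buffer (assumes a float model input):
input->data.f[0] = 1.0f;

// After Invoke(), read the result from the output buffer:
float result = output->data.f[0];
```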

#### Offline planned tensor allocations

All, or a subset of, tensors can be allocated using an offline planner. An
offline planner performs tensor allocation on, for example, a host PC. The
offline tensor allocation plan is added to the model's metadata, using the
format below.

For each non-constant tensor in the `tensors:[Tensor]` list of the subgraph, a
byte offset relative to the start of the head section of the memory arena is
given. A value of -1 indicates that the tensor will be allocated at runtime by
the `tflite::GreedyMemoryPlanner`. The offline plan is permitted to overlap
buffers if it knows that the data will not be used at the same time.

The offline tensor allocation plan is encoded in the `metadata:[Metadata]`
field of the model, using the following encoding:

| Metadata component | Value |
|-|-|
| name:string | "OfflineMemoryAllocation" |
| buffer:uint | Index of the buffer containing the offline tensor allocation data |

The buffer contents for the offline tensor allocation is a list of 32-bit
integers in the following format:

| Offset | Value |
|-|-|
| 0 | Offline allocation format version |
| 1 | Subgraph index to which this allocation applies |
| 2 | Number of offsets following: n |
| 3 | Byte offset of tensor #0, or -1 to allocate at runtime |
| 4 | Byte offset of tensor #1, or -1 to allocate at runtime |
| ... | ... |
| 3+(n-1) | Byte offset of tensor #(n-1), or -1 to allocate at runtime |
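
As a hypothetical example, the raw buffer contents for a subgraph with three
tensors, where tensor #1 is left to the runtime planner, could look like this:

```c++
// Hypothetical offline allocation metadata buffer (version 1, subgraph 0):
const int32_t offline_plan[] = {
    1,   // Offset 0: offline allocation format version
    0,   // Offset 1: subgraph index
    3,   // Offset 2: number of offsets following (n = 3)
    0,   // Offset 3: tensor #0 starts at byte offset 0 of the head
    -1,  // Offset 4: tensor #1 is allocated at runtime
    64,  // Offset 5: tensor #2 starts at byte offset 64 of the head
};
```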

The `tflite::GreedyMemoryPlanner` treats the provided offline tensor allocation
plan as constant, fixed offsets relative to the start of the head section and
will attempt to fit any other tensors (such as scratch tensors added at runtime
using the `RequestScratchBufferInArena` API of `TfLiteContext`) around those
fixed offsets.
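
For reference, scratch buffers are requested through `TfLiteContext` during an
operator's prepare stage and resolved to a pointer during eval; a minimal
sketch:

```c++
// In an operator's Prepare(): reserve a scratch buffer in the head section
// and remember the returned index (e.g. in the op's persistent data):
int scratch_index = -1;
TF_LITE_ENSURE_OK(context,
                  context->RequestScratchBufferInArena(
                      context, /*bytes=*/1024, &scratch_index));

// In the operator's Eval(): resolve the index to a pointer into the arena:
void* scratch = context->GetScratchBuffer(context, scratch_index);
```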

### Temporary Section

This section is used to allocate "scoped" or short-term, non-guaranteed
buffers. Allocations from this section start from the current end address of
the head section and grow towards the tail section. An allocation chain can be
reset (and must be reset before adjusting the head); resetting moves the
current allocation start address back to the end of the head section.

TFLM currently uses these allocations for scoped allocations of large C structs
or scratch memory that is expected to be valid for at least the lifetime of a
method call.
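
As an illustration, recent TFLM versions route these temporary allocations
through `tflite::MicroContext` in kernel code. The following is a rough sketch
only; the `AllocateTempInputTensor` and `DeallocateTempTfLiteTensor` method
names may differ between releases:

```c++
TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) {
  tflite::MicroContext* micro_context = tflite::GetMicroContext(context);

  // Temporary TfLiteTensor view, valid only within this method call:
  TfLiteTensor* input = micro_context->AllocateTempInputTensor(node, 0);
  TF_LITE_ENSURE(context, input != nullptr);

  // ... inspect input->type, input->dims, etc. ...

  // Return the temporary allocation before leaving the scope:
  micro_context->DeallocateTempTfLiteTensor(input);
  return kTfLiteOk;
}
```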

### Tail Section

This section holds all persistent allocations used by TFLM. It contains many
variably sized allocations and grows towards the end of the head section.
Allocations in this section come from a variety of areas inside of TFLM. TFLM
provides a [recording API](#recording-memory-apis) to assist with auditing the
contents of this section.
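
For example, kernels typically place their per-operator data in the tail via
the `AllocatePersistentBuffer` callback on `TfLiteContext`; a minimal sketch
with a hypothetical `OpData` struct:

```c++
// Hypothetical per-operator data that must outlive Invoke():
struct OpData {
  int32_t output_multiplier;
  int output_shift;
};

// Init() runs once per operator instance; the returned allocation lives in
// the tail for the lifetime of the interpreter:
void* Init(TfLiteContext* context, const char* buffer, size_t length) {
  return context->AllocatePersistentBuffer(context, sizeof(OpData));
}
```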

## Recording Memory APIs

TFLM provides simple APIs for auditing memory usage in the shared tensor arena.
These APIs are opt-in and require some additional memory overhead and a working
debug logging implementation
[(reference implementation)](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/debug_log.cc).

A typical bare-bones TFLM interpreter setup looks like this:

```c++
// Buffer for the tensor arena (the size must be a compile-time constant to
// avoid a non-standard variable-length array):
constexpr size_t tensor_arena_size = 2048;
uint8_t tensor_arena[tensor_arena_size];

// Interpreter using the shared tensor arena above:
tflite::MicroInterpreter interpreter(
    tflite::GetModel(my_model_data), ops_resolver,
    tensor_arena, tensor_arena_size, error_reporter);

// Invoke one time, which will allocate internals:
if (interpreter.Invoke() != kTfLiteOk) {
  TF_LITE_REPORT_ERROR(error_reporter, "Exception during invoke()!");
}
```

The recording API can be used by including the `RecordingMicroInterpreter`
class (`recording_micro_interpreter.h`) and replacing `tflite::MicroInterpreter`
with `tflite::RecordingMicroInterpreter`. The same call to `Invoke()` is
performed, but an additional call is made to `PrintAllocations()`, which will
output detailed allocation logging:

```c++
// Add an include for the recording API:
#include "tensorflow/lite/micro/recording_micro_interpreter.h"

// Simply change the class name from 'MicroInterpreter' to
// 'RecordingMicroInterpreter':
tflite::RecordingMicroInterpreter interpreter(
    tflite::GetModel(my_model_data), ops_resolver,
    tensor_arena, tensor_arena_size, error_reporter);

// Invoke one time, which will allocate internals:
if (interpreter.Invoke() != kTfLiteOk) {
  TF_LITE_REPORT_ERROR(error_reporter, "Exception during invoke()!");
}

// Print out detailed allocation information:
interpreter.GetMicroAllocator().PrintAllocations();
```

The output of this call will look similar to this (output from the
[memory_arena_threshold_test](https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/memory_arena_threshold_test.cc#L205)):

```bash
[RecordingMicroAllocator] Arena allocation total 9568 bytes
[RecordingMicroAllocator] Arena allocation head 7744 bytes
[RecordingMicroAllocator] Arena allocation tail 1824 bytes
[RecordingMicroAllocator] 'TfLiteEvalTensor data' used 360 bytes with alignment overhead (requested 360 bytes for 15 allocations)
[RecordingMicroAllocator] 'Persistent TfLiteTensor data' used 0 bytes with alignment overhead (requested 0 bytes for 0 tensors)
[RecordingMicroAllocator] 'Persistent TfLiteTensor quantization data' used 0 bytes with alignment overhead (requested 0 bytes for 0 allocations)
[RecordingMicroAllocator] 'TfLiteTensor variable buffer data' used 0 bytes with alignment overhead (requested 0 bytes for 0 allocations)
[RecordingMicroAllocator] 'NodeAndRegistration struct' used 392 bytes with alignment overhead (requested 392 bytes for 7 NodeAndRegistration structs)
[RecordingMicroAllocator] 'Operator runtime data' used 136 bytes with alignment overhead (requested 136 bytes for 5 OpData structs)
```
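
The recorded figures can also be inspected programmatically, as the linked
threshold test does; a sketch, assuming the `GetRecordedAllocation` accessor
and `RecordedAllocationType` enum from `recording_micro_allocator.h`:

```c++
// Query a single recorded bucket instead of printing all of them:
const tflite::RecordingMicroAllocator& allocator =
    interpreter.GetMicroAllocator();
tflite::RecordedAllocation recorded = allocator.GetRecordedAllocation(
    tflite::RecordedAllocationType::kTfLiteEvalTensorData);

// requested_bytes, used_bytes, and count correspond to the
// 'TfLiteEvalTensor data' line in the log above:
size_t used = recorded.used_bytes;
```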

### Allocation Section Details

More information about each recorded allocation section:

*   'TfLiteEvalTensor data'
    *   C struct that holds the data type, dimensions, and a pointer to the
        buffer representing the tensor.
*   'Persistent TfLiteTensor data'
    *   C struct that holds more information than a `TfLiteEvalTensor` struct
        in the graph.
    *   Allocations in this bucket will only show up when accessing tensors
        from the accessors on `tflite::MicroInterpreter`.
*   'Persistent TfLiteTensor quantization data'
    *   Length of persistent quantization data assigned to persistent
        `TfLiteTensor` structs.
    *   Allocations in this bucket will only show up when accessing tensors
        from the accessors on `tflite::MicroInterpreter`.
*   'TfLiteTensor variable buffer data'
    *   Length of buffer data from a variable tensor (retains data throughout
        calls to `Invoke()`).
*   'NodeAndRegistration struct'
    *   C struct that holds a `TfLiteRegistration` and `TfLiteNode` struct
        instance.
    *   Each operator in a model will contain one `NodeAndRegistration` struct.
*   'Operator runtime data'
    *   Persistent allocations of data cached by TFLM kernels (e.g. quantization
        parameters, multipliers, etc.).