<!--ts-->

*   [Online Memory Allocation Overview in TensorFlow Lite Micro](#online-memory-allocation-overview-in-tensorflow-lite-micro)
    *   [Arena](#arena)
    *   [Existing buffers in the flatbuffer](#existing-buffers-in-the-flatbuffer)
    *   [Model Init Phase](#model-init-phase)
    *   [Model Prepare Phase](#model-prepare-phase)
    *   [Finish Model Allocation Phase](#finish-model-allocation-phase)

<!-- Added by: kreeger, at: Wed Apr 28 10:52:04 CDT 2021 -->

<!--te-->

# Online Memory Allocation Overview in TensorFlow Lite Micro

This document outlines how "online" memory is managed in TensorFlow Lite Micro
(TFLM).

## Arena

Online memory planning strategically places allocations in a single `uint8_t`
buffer array. The buffer is split into two main sections: the “head” and the
“tail”. Generally, non-persistent allocations are placed in the “head” and
persistent allocations are placed in the “tail”. More details about the arena
can be [found here](memory_management.md#tensor-arena).
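
For reference, applications typically hand TFLM this arena as one statically
allocated buffer; the size and alignment below are illustrative and model
dependent:

```c++
// A single arena provided by the application. TFLM places persistent
// allocations in the "tail" and non-persistent allocations in the "head"
// of this one buffer.
constexpr size_t kTensorArenaSize = 10 * 1024;  // Model dependent.
alignas(16) uint8_t tensor_arena[kTensorArenaSize];
```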

## Existing buffers in the flatbuffer

The TFLite flatbuffer model contains a variety of information required to run a
model in TFLite or TFLM. The TFLM online memory planner walks the main subgraph
and finds all tensors required for the model (represented as `TfLiteTensor` and
`TfLiteEvalTensor` C structs at runtime). Persistent tensors in the flatbuffer
(e.g. weight tensors) point at buffers inlined in the flatbuffer. These buffers
are reused during online memory planning rather than copied into the arena: the
corresponding C structs point directly back at the buffers packed into the
flatbuffer.
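
The lightweight `TfLiteEvalTensor` struct is what makes this aliasing cheap at
runtime; it is approximately the following (see `tensorflow/lite/c/common.h`):

```c++
// Approximate definition from tensorflow/lite/c/common.h.
typedef struct TfLiteEvalTensor {
  TfLitePtrUnion data;   // For weight tensors, points into the flatbuffer.
  TfLiteIntArray* dims;  // Tensor shape.
  TfLiteType type;       // Element type, e.g. kTfLiteInt8.
} TfLiteEvalTensor;
```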

## Model Init Phase

Online model allocation begins either with the first call to
`MicroInterpreter::Invoke()` or with an explicit call to
`MicroInterpreter::AllocateTensors()`. The `MicroInterpreter` instance invokes
`MicroAllocator::StartModelAllocation()`, which begins pulling data out of the
serialized flatbuffer and walking through the main subgraph.
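
A sketch of the application-level calls that trigger this phase (constructor
arguments vary slightly across TFLM versions, e.g. older releases also take an
`ErrorReporter`; `model_data` and the resolver contents are placeholders):

```c++
const tflite::Model* model = tflite::GetModel(model_data);

// The resolver lists only the operators this model actually uses.
tflite::MicroMutableOpResolver<4> resolver;
// resolver.AddFullyConnected(); resolver.AddSoftmax(); ...

tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                     kTensorArenaSize);

// Online allocation runs here; alternatively it runs lazily inside the
// first call to interpreter.Invoke().
if (interpreter.AllocateTensors() != kTfLiteOk) {
  // The arena was too small or the model could not be prepared.
}
```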

The method `MicroAllocator::StartModelAllocation()` begins allocation in the
following order:

*   Initializes internal state for scratch buffer allocations.
*   Allocates a list of `TfLiteEvalTensor` C structs based on the number of
    tensors in the subgraph.
    *   Allocations are persistent and stored in the tail section.
    *   Tensors that reference buffers in the flatbuffer are assigned at this
        point.
*   Allocates a list of `TfLiteRegistration` and `TfLiteNode` C structs for
    every operator in the model subgraph.
    *   Allocations are persistent and stored in the tail section.
*   Walks back through the list of subgraph operators and populates each C
    struct with the relevant information from the flatbuffer.

At the conclusion of this phase, the operator kernel implementations are ready
for calls to the `TfLiteRegistration::init()` function. The `MicroInterpreter`
walks through the operator list and invokes `init()` on every operator
implementation that provides it. Typically, an operator implementation returns
the object that the interpreter stores in the `user_data` field of the
corresponding `TfLiteNode` struct.
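
For example, a kernel's `init` implementation commonly allocates its
op-specific state from the persistent (tail) section and returns it so the
interpreter can store it in `user_data`. A minimal sketch (the `OpData` struct
is hypothetical and is reused by the later sketches):

```c++
// Hypothetical per-op state; real kernels define whatever fields they need.
struct OpData {
  int scratch_index;  // Filled in during prepare (see the next phase).
};

void* Init(TfLiteContext* context, const char* buffer, size_t length) {
  // Persistent allocations land in the tail section of the arena and live
  // for the lifetime of the interpreter.
  return context->AllocatePersistentBuffer(context, sizeof(OpData));
}
```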

## Model Prepare Phase

After the interpreter has initialized all operator kernels, another pass through
the subgraph is done. This time, each operator implementation that provides a
`TfLiteRegistration::prepare()` function is called. Kernels use this phase in
TFLM to verify capabilities from model information, validate shapes, request
any scratch buffers they will need at `Eval` time (through
`TfLiteContext::RequestScratchBufferInArena()`), and calculate quantization
runtime data.

At this time, operator implementations request tensor data through the
`TfLiteTensor` C struct. This struct is heavier-weight and carries more of the
information (e.g. quantization parameters) that operators need during this
phase of initialization. Internally, TFLM allocates these instances per request
in the temp section, which is the space between the head and the tail of the
arena. During the prepare phase, nothing has yet been placed in the head
section, so this extra space between the head and tail is used to allocate
buffers that remain available until `MicroAllocator::ResetTempAllocations()` is
called. Additional information is
[available here](memory_management.md#temporary-section).

NOTE: The `TfLiteTensor` struct is only available in TFLM during
`TfLiteRegistration::prepare()`; after this allocation phase, tensor data can
only be accessed via a `TfLiteEvalTensor` struct.

Additionally, at this time each operator implementation may request scratch
buffers through `TfLiteContext::RequestScratchBufferInArena()`. These requests
are limited to `kMaxScratchBuffersPerOp` per operator and are stored in an
instance variable while that operator's prepare block runs. All requests are
moved to the head section when the interpreter moves on to the next operator.
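
A typical `prepare` implementation therefore inspects tensors through the
temporary `TfLiteTensor` structs and registers the scratch memory it will need
at `Eval` time. A minimal sketch, continuing the hypothetical `OpData` from
above and using the shared `GetInput()` helper from the kernel utilities; the
sizing logic is illustrative:

```c++
TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) {
  // OpData was allocated in Init() and stored by the interpreter.
  OpData* data = static_cast<OpData*>(node->user_data);

  // Temporary TfLiteTensor structs are only valid during prepare.
  const TfLiteTensor* input = GetInput(context, node, 0);
  TF_LITE_ENSURE(context, input != nullptr);

  // Only the request (size + returned index) is recorded here; the actual
  // pointer is fetched with GetScratchBuffer() during Eval.
  TF_LITE_ENSURE_STATUS(context->RequestScratchBufferInArena(
      context, input->bytes, &data->scratch_index));
  return kTfLiteOk;
}
```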

After each call to `TfLiteRegistration::prepare()`, the `MicroInterpreter` calls
`MicroAllocator::FinishPrepareNodeAllocations()`. This method resets temp
allocations and begins to store all scratch buffer requests inside the head
section of the arena.

After all operators have been prepared, the `MicroInterpreter` calls
`MicroAllocator::FinishModelAllocation()` to begin finalizing the online memory
plan.

## Finish Model Allocation Phase

The last phase of online memory planning is handled in
`MicroAllocator::FinishModelAllocation()`. This function performs the following
tasks:

*   Allocates space in the tail for all persistent buffer requests that are
    currently in the head.
*   Commits the static memory plan:
    *   Uses the `GreedyMemoryPlanner` to optimize the non-persistent space in
        the head.
    *   Optimizes for the operator that requires the largest buffer (in bytes).
    *   Allocates a list of pointers in the tail that point into the shared
        head space at the planned offsets.
    *   Sets the size of the head based on the result of
        `GreedyMemoryPlanner::GetMaximumMemorySize()` (see the sizing sketch
        after this list).
*   Allocates variable tensor buffers in the tail section.
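
Because the head is sized from this plan, the total arena requirement is only
known once allocation finishes. One way to right-size the arena (assuming the
`interpreter` object from the earlier sketch) is to query
`MicroInterpreter::arena_used_bytes()` after `AllocateTensors()` has completed
and trim `kTensorArenaSize` accordingly:

```c++
// After AllocateTensors() has completed, the committed plan is final and the
// interpreter can report how much of the arena was actually used.
size_t used = interpreter.arena_used_bytes();
printf("Arena bytes used: %u of %u\n", static_cast<unsigned>(used),
       static_cast<unsigned>(kTensorArenaSize));
```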

Once TFLM has finalized online model allocation, all buffers are in place and
inference can run at optimal speed. After this point, the system no longer
allows operator implementations to allocate scratch buffers.
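
At `Eval` time, kernels therefore work only with `TfLiteEvalTensor` structs and
retrieve previously requested scratch memory by index. A minimal sketch,
continuing the hypothetical kernel from the earlier phases and using the
`tflite::micro::GetEvalInput()` / `GetEvalOutput()` helpers from the TFLM
kernel utilities:

```c++
TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) {
  OpData* data = static_cast<OpData*>(node->user_data);

  // Only the lightweight TfLiteEvalTensor view is available after prepare.
  const TfLiteEvalTensor* input = tflite::micro::GetEvalInput(context, node, 0);
  TfLiteEvalTensor* output = tflite::micro::GetEvalOutput(context, node, 0);

  // Fetch the scratch buffer that was requested during prepare.
  void* scratch = context->GetScratchBuffer(context, data->scratch_index);

  // ... kernel math using input, scratch, and output goes here ...
  (void)input;
  (void)output;
  (void)scratch;
  return kTfLiteOk;
}
```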