<!--ts-->

* [Online Memory Allocation Overview in TensorFlow Lite Micro](#online-memory-allocation-overview-in-tensorflow-lite-micro)
   * [Arena](#arena)
   * [Existing buffers in the flatbuffer](#existing-buffers-in-the-flatbuffer)
   * [Model Init Phase](#model-init-phase)
   * [Model Prepare Phase](#model-prepare-phase)
   * [Finish Model Allocation Phase](#finish-model-allocation-phase)

<!-- Added by: kreeger, at: Wed Apr 28 10:52:04 CDT 2021 -->

<!--te-->

# Online Memory Allocation Overview in TensorFlow Lite Micro

This document outlines how "online" memory is managed in TensorFlow Lite Micro
(TFLM).

## Arena

Online memory planning strategically places allocations in a single `uint8_t`
buffer array. The buffer is split into two main sections: the “head” and the
“tail”. Generally, non-persistent allocations are placed in the “head” and
persistent allocations are placed in the “tail”. More details about the arena
can be [found here](memory_management.md#tensor-arena).

## Existing buffers in the flatbuffer

The TFLite flatbuffer model contains a variety of information required to run a
model in TFLite or TFLM. The TFLM online memory planner walks the main subgraph
and finds all tensors required for the model (represented as `TfLiteTensor` and
`TfLiteEvalTensor` C structs at runtime). Persistent tensors in the flatbuffer
(e.g. weight tensors) point at buffers inlined in the flatbuffer. These buffers
are reused during online memory planning: the corresponding C structures point
back at the buffers packed into the flatbuffer.

## Model Init Phase

Online model allocation begins either with the first call to
`MicroInterpreter::Invoke()` or with an explicit call to
`MicroInterpreter::AllocateTensors()`. The `MicroInterpreter` instance invokes
`MicroAllocator::StartModelAllocation()`. This function begins pulling data out
of the serialized flatbuffer and walking through the main subgraph.

The method `MicroAllocator::StartModelAllocation()` begins allocation in the
following order:

*   Initializes internal state for scratch buffer allocations.
*   Allocates a list of `TfLiteEvalTensor` C structs based on the number of
    tensors in the subgraph.
    *   Allocations are persistent and stored in the tail section.
    *   Tensors that reference buffers in the flatbuffer are assigned at this
        point.
*   Allocates a list of `TfLiteRegistration` and `TfLiteNode` C structs for
    every operator in the model subgraph.
    *   Allocations are persistent and stored in the tail section.
*   Walks back through the list of subgraph operators and assigns all C structs
    with relevant information from the flatbuffer.

At the conclusion of this phase, the operator kernel implementations are ready
for calls to the `TfLiteRegistration::init()` function. The `MicroInterpreter`
walks through the operator list and invokes this function on every operator
implementation that provides it. Typically, operator implementations return the
object to store in the `user_data` field of a `TfLiteNode` struct.

## Model Prepare Phase

After the interpreter has initialized all operator kernels, another pass is
made through the subgraph. This time, each operator implementation that
provides a `TfLiteRegistration::prepare()` function is called. This phase in
TFLM is used for kernels to verify capabilities from model information,
validate shapes, allocate any scratch buffers requested (through
`TfLiteContext::GetScratchBuffer()`), and calculate quantization runtime data.
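To make the prepare-phase responsibilities concrete, below is a minimal sketch
of a kernel prepare function. The `OpData` struct, tensor index, and
scratch-buffer size are hypothetical; the temp-section `TfLiteTensor`
allocation and `TfLiteContext::RequestScratchBufferInArena()` mechanics it
relies on are described in the rest of this section.

```c++
#include "tensorflow/lite/c/common.h"
#include "tensorflow/lite/kernels/kernel_util.h"

// Hypothetical per-operator state, returned from this kernel's init() and
// stored by the interpreter in TfLiteNode::user_data.
struct OpData {
  int scratch_buffer_index;
};

TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) {
  OpData* data = static_cast<OpData*>(node->user_data);

  // This TfLiteTensor is allocated in the temp section of the arena and is
  // only valid for the duration of this prepare call.
  const TfLiteTensor* input = tflite::GetInput(context, node, /*index=*/0);
  TF_LITE_ENSURE(context, input != nullptr);

  // Validate shapes while the full TfLiteTensor is still available.
  TF_LITE_ENSURE_EQ(context, tflite::NumDimensions(input), 2);

  // Request a scratch buffer (the size here is illustrative). Only an index
  // is returned now; the buffer itself lands in the head section once the
  // memory plan is finalized.
  const size_t scratch_bytes = input->dims->data[1] * sizeof(int32_t);
  TF_LITE_ENSURE_STATUS(context->RequestScratchBufferInArena(
      context, scratch_bytes, &data->scratch_buffer_index));
  return kTfLiteOk;
}
```

At eval time, the kernel would fetch the planned buffer with
`context->GetScratchBuffer(context, data->scratch_buffer_index)`.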
At this time, operator implementations request tensor data through the
`TfLiteTensor` C struct. This struct is heavier-weight and contains more of the
information that operators need during this phase of initialization.
Internally, TFLM allocates these instances per request in the temp section: the
space between the head and the tail in the arena. During the prepare phase,
nothing has yet been placed in the head section, so this extra space between
the head and tail is used to allocate buffers that remain available until
`MicroAllocator::ResetTempAllocations()` is called. Additional information is
[available here](memory_management.md#temporary-section).

NOTE: The `TfLiteTensor` struct is only available in TFLM during
`TfLiteRegistration::prepare()`; after this allocation phase, tensor data can
only be accessed via a `TfLiteEvalTensor` struct.

Additionally, at this time each operator implementation may make scratch buffer
requests through `TfLiteContext::RequestScratchBufferInArena()`. These requests
are limited to `kMaxScratchBuffersPerOp` and are stored in an instance variable
while each operator's prepare block runs. All requests are eventually moved to
the head section when the interpreter moves on to the next operator.

After each call to `TfLiteRegistration::prepare()`, the `MicroInterpreter`
calls `MicroAllocator::FinishPrepareNodeAllocations()`. This method resets temp
allocations and begins to store all scratch buffer requests inside the head
section of the arena.

After all operators have been prepared, the `MicroInterpreter` calls
`MicroAllocator::FinishModelAllocation()` to begin finalizing the online memory
plan.

## Finish Model Allocation Phase

The last phase of online memory planning is handled in
`MicroAllocator::FinishModelAllocation()`. This function performs the following
tasks:

*   Allocates space in the tail for all persistent buffer requests that are
    currently in the head.
*   Commits the static memory plan:
    *   Uses the `GreedyMemoryPlanner` to optimize the non-persistent space in
        the head.
    *   Optimizes for the operator that requires the largest byte-width buffer.
    *   Allocates pointers in the tail that point into the shared space in the
        head, at the offsets chosen by the planner.
    *   Sets the size of the head based on the result of
        `GreedyMemoryPlanner::GetMaximumMemorySize()`.
*   Allocates variable tensor buffers in the tail section.

Once TFLM has finalized online model allocation, all buffers are prepared and
ready for inference at optimal speed. After this point, the system no longer
allows operator implementations to allocate scratch buffers.
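To tie the phases together, here is a minimal sketch of how an application
drives allocation, assuming a hypothetical model symbol (`g_model_data`), op
set, and arena size. The single `AllocateTensors()` call runs the init,
prepare, and finish-allocation phases described above.

```c++
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Hypothetical model data compiled into the binary.
extern const unsigned char g_model_data[];

// Arena size chosen by experimentation; see arena_used_bytes() below.
constexpr size_t kTensorArenaSize = 16 * 1024;
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];

TfLiteStatus RunModel() {
  static tflite::MicroErrorReporter error_reporter;
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Resolver registering the (hypothetical) ops this model needs.
  static tflite::MicroMutableOpResolver<2> resolver;
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                       kTensorArenaSize, &error_reporter);

  // Triggers MicroAllocator::StartModelAllocation(), the per-operator
  // init()/prepare() passes, and MicroAllocator::FinishModelAllocation().
  TF_LITE_ENSURE_STATUS(interpreter.AllocateTensors());

  // Once the plan is finalized, this reports how much of the arena was
  // actually used, which helps right-size kTensorArenaSize.
  const size_t used_bytes = interpreter.arena_used_bytes();
  (void)used_bytes;

  return interpreter.Invoke();
}
```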