.. _zms_api:

Zephyr Memory Storage (ZMS)
###########################

Zephyr Memory Storage is a new key-value storage system that is designed to work with all types
of non-volatile storage technologies. It supports classical on-chip NOR flash as well as newer
technologies like RRAM and MRAM that do not require a separate erase operation at all, meaning
that data on these types of devices can be overwritten directly at any time.

General behavior
****************

ZMS divides the memory space into sectors (minimum 2), and each sector is filled with key-value
pairs until it is full.

The key-value pair is divided into two parts:

- The key part is written in an ATE (Allocation Table Entry) called "ID-ATE", which is stored
  starting from the bottom of the sector.
- The value part is defined as "data" and is stored raw starting from the top of the sector.

Additionally, each sector stores header ATEs at its last positions. These describe the status of
the sector (closed, open) and the current version of ZMS.

When the current sector is full, ZMS first verifies that the following sector is empty, garbage
collects sector N+2 (where N is the current sector number) by moving the valid ATEs to the empty
sector N+1, erases the garbage-collected sector, and then closes the current sector by writing a
garbage_collect_done ATE and the close ATE (one of the header entries).
Afterwards it moves forward to the next sector and starts writing entries again.

This behavior is repeated until it reaches the end of the partition. Then it starts again from
the first sector after garbage collecting it and erasing its content.

Composition of a sector
=======================

A sector is organized in this form (example with 3 sectors):

.. list-table::
   :widths: 25 25 25
   :header-rows: 1

   * - Sector 0 (closed)
     - Sector 1 (open)
     - Sector 2 (empty)
   * - Data_a0
     - Data_b0
     - Data_c0
   * - Data_a1
     - Data_b1
     - Data_c1
   * - Data_a2
     - Data_b2
     - Data_c2
   * - GC_done
     - .
     - .
   * - .
     - .
     - .
   * - .
     - .
     - .
   * - .
     - ID ATE_b2
     - ID ATE_c2
   * - ID ATE_a2
     - ID ATE_b1
     - ID ATE_c1
   * - ID ATE_a1
     - ID ATE_b0
     - ID ATE_c0
   * - ID ATE_a0
     - GC_done ATE
     - GC_done ATE
   * - Close ATE (cyc=1)
     - Close ATE (cyc=1)
     - Close ATE (cyc=1)
   * - Empty ATE (cyc=1)
     - Empty ATE (cyc=2)
     - Empty ATE (cyc=2)

Definition of each element in the sector
========================================

``Empty ATE`` is written when erasing a sector (last position of the sector).

``Close ATE`` is written when closing a sector (second to last position of the sector).

``GC_done ATE`` is written to indicate that the next sector has already been garbage-collected.
This ATE could be at any position of the sector.

``ID ATE`` are entries that contain a 32-bit key and describe where the data is stored, its
size and its CRC32.

``Data`` is the actual value associated with the ID-ATE.

How does ZMS work?
******************

Mounting the storage system
===========================

Mounting the storage system starts by getting the flash parameters, checking that the file system
properties are correct (``sector_size``, ``sector_count`` ...) and then calling the ``zms_init``
function to make the storage ready.

To mount the filesystem, the following elements in the ``zms_fs`` structure must be initialized:

.. code-block:: c

   struct zms_fs {
           /** File system offset in flash */
           off_t offset;

           /** Storage system is split into sectors. The size of each sector must be a
            * multiple of the erase block size if the device has erase capabilities.
            */
           uint32_t sector_size;
           /** Number of sectors in the file system */
           uint32_t sector_count;

           /** Flash device runtime structure */
           const struct device *flash_device;
   };
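A minimal mounting sketch is shown below, modeled on how other Zephyr storage samples resolve a
partition through the flash map API. The partition label ``storage_partition``, the function name
``storage_init()``, and the choice of one erase block per sector with 4 sectors in total are
illustrative assumptions; adapt them to your devicetree and storage technology (devices without
an erase operation may not expose a flash page layout, in which case the sector size can be
chosen freely as a multiple of ``write-block-size``).

.. code-block:: c

   #include <errno.h>
   #include <zephyr/device.h>
   #include <zephyr/drivers/flash.h>
   #include <zephyr/fs/zms.h>
   #include <zephyr/storage/flash_map.h>

   /* Assumed partition label; adjust to your board's devicetree. */
   #define ZMS_PARTITION storage_partition

   static struct zms_fs fs = {
           .flash_device = FIXED_PARTITION_DEVICE(ZMS_PARTITION),
           .offset = FIXED_PARTITION_OFFSET(ZMS_PARTITION),
   };

   int storage_init(void)
   {
           struct flash_pages_info info;
           int rc;

           if (!device_is_ready(fs.flash_device)) {
                   return -ENODEV;
           }

           /* One erase block per sector, 4 sectors in total
            * (3 usable, 1 kept empty for garbage collection).
            */
           rc = flash_get_page_info_by_offs(fs.flash_device, fs.offset, &info);
           if (rc) {
                   return rc;
           }
           fs.sector_size = (uint32_t)info.size;
           fs.sector_count = 4;

           return zms_mount(&fs);
   }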
Initialization
==============

As ZMS has a fast-forward write mechanism, it must find the last sector and the last pointer of
the entry where it stopped the last time.
It must look for a closed sector followed by an open one, then within the open sector it finds
(recovers) the last written ATE.
After that, it checks that the sector after this one is empty, or else it erases it.

ZMS ID/data write
=================

To avoid rewriting the same data with the same ID again, ZMS first checks all sectors to see
whether the same ID already exists and then compares its data. If the data is identical, no
write is performed.
If it must perform a write, then an ATE and the data (if the operation is not a delete) are
written in the sector.
If the sector is full (cannot hold the current data + ATE), ZMS has to move to the next sector,
garbage collect the sector after the newly opened one, then erase it.
Data whose size is smaller than or equal to 8 bytes is written within the ATE.

ZMS ID/data read (with history)
===============================

By default, ZMS looks for the last data with the same ID by browsing through all stored ATEs from
the most recent ones to the oldest ones. If it finds a valid ATE with a matching ID, it retrieves
its data and returns the number of bytes that were read.
If a history count is provided and different from 0, older data with the same ID is retrieved.

ZMS free space calculation
==========================

ZMS can also return the free space remaining in the partition.
However, this operation is very time-consuming, as it needs to browse through all valid ATEs
in all sectors of the partition and, for each valid ATE, try to find whether an older one exists.
It is not recommended for applications to use this function often, as it could slow down the
calling thread.
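The write, read, and free space operations described above map to a small set of API calls.
The following sketch shows a typical sequence, assuming a ``zms_fs`` instance mounted as in the
previous example; the ID value and buffer sizes are arbitrary.

.. code-block:: c

   #include <string.h>
   #include <zephyr/fs/zms.h>

   /* Arbitrary 32-bit key chosen for this example. */
   #define EXAMPLE_ID 0x12345

   int storage_use(struct zms_fs *fs)
   {
           char buf[16];
           ssize_t rc;

           /* Two writes of the same ID; the second one supersedes the first.
            * ZMS skips a write if identical data is already stored under the ID.
            */
           rc = zms_write(fs, EXAMPLE_ID, "hello", strlen("hello") + 1);
           if (rc < 0) {
                   return (int)rc;
           }
           rc = zms_write(fs, EXAMPLE_ID, "world", strlen("world") + 1);
           if (rc < 0) {
                   return (int)rc;
           }

           /* Reads the most recent value, "world". */
           rc = zms_read(fs, EXAMPLE_ID, buf, sizeof(buf));
           if (rc < 0) {
                   return (int)rc;
           }

           /* History count 1 retrieves the previous value, "hello". */
           rc = zms_read_hist(fs, EXAMPLE_ID, buf, sizeof(buf), 1);
           if (rc < 0) {
                   return (int)rc;
           }

           /* Delete the entry: this writes a delete ATE, nothing is erased. */
           (void)zms_delete(fs, EXAMPLE_ID);

           /* Expensive: browses all valid ATEs in all sectors; use sparingly. */
           rc = zms_calc_free_space(fs);

           return (rc < 0) ? (int)rc : 0;
   }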
The cycle counter
=================

Each sector has a lead cycle counter, a ``uint8_t`` value that is used to validate all the other
ATEs.
The lead cycle counter is stored in the empty ATE.
To become valid, an ATE must have the same cycle counter as the one stored in the empty ATE.
Each time an ATE is moved from one sector to another, it must get the cycle counter of the
destination sector.
To erase a sector, the cycle counter of the empty ATE is incremented and a single write of the
empty ATE is done.
All the ATEs in that sector thereby become invalid.

Closing sectors
===============

To close a sector, a close ATE is added at the end of the sector; it must have the same cycle
counter as the empty ATE.
When closing a sector, all the remaining space that has not been used is filled with garbage data
to avoid having old ATEs with a valid cycle counter.

Triggering garbage collection
=============================

Some applications need to make sure that storage writes have a maximum defined latency.
When calling ZMS to make a write, the current sector could be almost full, in which case ZMS
needs to trigger the GC to switch to the next sector.
This operation is time-consuming and will cause some applications to miss their real-time
constraints.
ZMS therefore provides an API for the application to get the current remaining free space in a
sector.
The application can then decide to switch to the next sector while the current one is almost
full. This will of course trigger the garbage collection operation on the next sector, but it
guarantees that the following write won't trigger garbage collection, as shown in the sketch
below.
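A minimal sketch of this pattern, assuming ``zms_active_sector_free_space()`` reports the free
space remaining in the active sector and ``zms_sector_use_next()`` performs the manual switch
(see the API reference at the end of this page); the threshold value is an arbitrary example:

.. code-block:: c

   #include <zephyr/fs/zms.h>

   /* Largest ATE + data footprint the latency-critical path will write;
    * 32 bytes is an arbitrary example value.
    */
   #define MAX_WRITE_SIZE 32

   /* Call from a non-critical context: if the active sector cannot hold the
    * next write, switch sectors now (paying the GC cost here) so that the
    * next latency-critical zms_write() does not trigger garbage collection.
    */
   int storage_prepare_fast_write(struct zms_fs *fs)
   {
           ssize_t free_space = zms_active_sector_free_space(fs);

           if (free_space < 0) {
                   return (int)free_space;
           }

           if (free_space < MAX_WRITE_SIZE) {
                   return zms_sector_use_next(fs);
           }

           return 0;
   }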
ATE (Allocation Table Entry) structure
======================================

An entry has 16 bytes divided between several fields; see the :c:struct:`zms_ate` structure.

.. note:: The CRC of the data is checked only when a full read of the data is made.
   It is not checked for a partial read, as it is computed over the whole element.

.. warning:: Enabling the CRC feature on previously existing ZMS content that did not have it
   enabled will make all existing data invalid.

Available space for user data (key-value pairs)
***********************************************

ZMS always needs an empty sector to be able to perform the garbage collection (GC).
So, if we suppose that 4 sectors exist in a partition, ZMS will only use 3 sectors to store
key-value pairs and keep one sector empty to be able to perform GC.
The empty sector will rotate between the 4 sectors in the partition.

.. note:: The maximum single data length that can be written at once in a sector is 64K
   (this could change in future versions of ZMS).

Small data values
=================

Values smaller than or equal to 8 bytes are stored within the entry (ATE) itself, without
writing data at the top of the sector.
ZMS has an entry size of 16 bytes, which means that the maximum space available in a partition
to store data is computed in this scenario as:

.. math::

   \small\frac{(NUM\_SECTORS - 1) \times (SECTOR\_SIZE - (5 \times ATE\_SIZE)) \times DATA\_SIZE}{ATE\_SIZE}

Where:

``NUM_SECTORS``: Total number of sectors

``SECTOR_SIZE``: Size of the sector

``ATE_SIZE``: 16 bytes

``(5 * ATE_SIZE)``: Reserved ATEs for header and delete items

``DATA_SIZE``: Size of the small data values (range from 1 to 8)

For example, for 4 sectors of 1024 bytes, the free space for 8-byte data is
:math:`\frac{3 \times 944 \times 8}{16} = 1416 \text{ bytes}`.

Large data values
=================

Large data values (> 8 bytes) are stored separately at the top of the sector.
In this case, it is hard to estimate the free available space, as it depends on the size of
the data. But we can take into account that for N bytes of data (N > 8 bytes), an additional
16 bytes of ATE must be added at the bottom of the sector.

Let's take an example:

For a partition that has 4 sectors of 1024 bytes and a data size of 64 bytes,
only 3 sectors are available for writes, with a capacity of 944 bytes each.
Each key-value pair needs an extra 16 bytes for the ATE, which makes it possible to store 11
pairs in each sector (:math:`\lfloor\frac{944}{80}\rfloor = 11`).
The total data that could be stored in this partition in this case is
:math:`11 \times 3 \times 64 = 2112 \text{ bytes}`.

Wear leveling
*************

This storage system is optimized for devices that do not require an erase.
Storage systems that rely on an erase value (NVS as an example) need to emulate the erase with
write operations. This causes a significant decrease in the life expectancy of these devices,
as well as added delays for write operations and for initialization of the device when it is
empty.
ZMS uses a cycle count mechanism that avoids emulating erase operations for these devices.
It also guarantees that every memory location is written only once per sector write cycle.

As an example, to erase a 4096-byte sector on a device that does not require an erase operation,
NVS performs 256 flash writes (supposing that ``write-block-size`` = 16 bytes), while ZMS needs
only a single 16-byte write. This operation is 256 times faster in this case.

The garbage collection operation also reduces the memory cell life expectancy, as it performs
write operations when moving blocks from one sector to another.
To keep the garbage collector from affecting the life expectancy of the device, it is recommended
to dimension the partition appropriately. Its size should be double the maximum size of the
data (including headers) that could be written in the storage.

See `Available space for user data <#available-space-for-user-data-key-value-pairs>`_.

Device lifetime calculation
===========================

Storage devices, whether they are classical flash or new technologies like RRAM/MRAM, have a
limited life expectancy which is determined by the number of times memory cells can be
erased/written.
Flash devices are erased one page at a time as part of their functional behavior (otherwise
memory cells cannot be overwritten), and for storage devices that do not require an erase
operation, memory cells can be overwritten directly.

A typical scenario is shown here to calculate the life expectancy of a device.
Let's suppose that we store an 8-byte variable using the same ID, and that its content changes
every minute. The partition has 4 sectors of 1024 bytes each.
Each write of the variable requires 16 bytes of storage.
As we have 944 bytes available for ATEs in each sector, and because ZMS is a fast-forward
storage system, we are going to rewrite the first location of the first sector after
:math:`\frac{944 \times 4}{16} = 236 \text{ minutes}`.

In addition to the normal writes, the garbage collector will move the data that is still valid
from old sectors to new ones.
As we are using the same ID and a big partition size, no data will be moved by the garbage
collector in this case.
For a storage device that can be written 20 000 times, the storage will last about
4 720 000 minutes (~9 years).

To derive a more general formula, we must first compute the effective size that our typical set
of data occupies in ZMS.
For ID/data pairs with data <= 8 bytes, ``effective_size`` is 16 bytes.
For ID/data pairs with data > 8 bytes, ``effective_size`` is ``16 + sizeof(data)`` bytes.
Let's suppose that ``total_effective_size`` is the total effective size of the set of data
written in the storage and that the partition is sized appropriately (double the effective size)
to avoid having the garbage collector moving blocks all the time.

The expected lifetime of the device in minutes is computed as:

.. math::

   \small\frac{SECTOR\_EFFECTIVE\_SIZE \times SECTOR\_NUMBER \times MAX\_NUM\_WRITES}{TOTAL\_EFFECTIVE\_SIZE \times WR\_MIN}

Where:

``SECTOR_EFFECTIVE_SIZE``: The sector size minus the header size (80 bytes)

``SECTOR_NUMBER``: The number of sectors

``MAX_NUM_WRITES``: The life expectancy of the storage device in number of writes

``TOTAL_EFFECTIVE_SIZE``: Total effective size of the set of written data

``WR_MIN``: Number of writes of the set of data per minute
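Applying this formula to the scenario above confirms the earlier estimate. With
:math:`SECTOR\_EFFECTIVE\_SIZE = 944`, :math:`SECTOR\_NUMBER = 4`,
:math:`MAX\_NUM\_WRITES = 20\,000`, :math:`TOTAL\_EFFECTIVE\_SIZE = 16` (one 8-byte variable)
and :math:`WR\_MIN = 1`:

.. math::

   \small\frac{944 \times 4 \times 20\,000}{16 \times 1} = 4\,720\,000 \text{ minutes} \approx 9 \text{ years}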
Features
********

ZMS introduces many features compared to existing storage systems like NVS and will evolve
from its initial version to include more features that satisfy the requirements of new
technologies, such as low latency and bigger storage space.

Existing features
=================

Version 1
---------

- Supports storage devices that do not require an erase operation (only one write operation
  to invalidate a sector)
- Supports large partition and sector sizes (64-bit address space)
- Supports 32-bit IDs
- Small-sized data (<= 8 bytes) is stored in the ATE itself
- Built-in data CRC32 (included in the ATE)
- Versioning of ZMS (to handle future evolutions)
- Supports large ``write-block-size`` (only for platforms that need it)

Future features
===============

- Add multiple-format ATE support to be able to use ZMS with different ATE formats that satisfy
  application requirements
- Add the possibility to skip the garbage collector for some application usages where ID/value
  pairs are written periodically and do not exceed half of the partition size (there is always
  an old entry with the same ID)
- Divide IDs into namespaces and allocate IDs on demand from the application to handle
  collisions between IDs used by different subsystems or samples
- Add the possibility to retrieve the wear-out value of the device based on the cycle count value
- Add a recovery function that can recover a storage partition if something went wrong
- Add a library/application to allow migration from NVS entries to ZMS entries
- Add the possibility to force formatting the storage partition to the ZMS format if something
  went wrong when mounting the storage

ZMS and other storage systems in Zephyr
=======================================

This section describes ZMS in the wider context of storage systems in Zephyr (not full
filesystems, but simpler, non-hierarchical ones).
Today Zephyr includes at least two other systems that are somewhat comparable in scope and
functionality: :ref:`NVS <nvs_api>` and :ref:`FCB <fcb_api>`.
Which one to use in your application will depend on your needs and the hardware you are using,
and this section provides information to help make a choice.

- If you are using devices that do not require an erase operation, like RRAM or MRAM,
  :ref:`ZMS <zms_api>` is definitely the best fit for your storage subsystem, as it is designed
  to avoid emulating erase operations with large block writes on these devices, replacing them
  with a single write call.
- For devices that have a large ``write_block_size`` and/or need a sector size that is different
  from the classical flash page size (equal to ``erase_block_size``), :ref:`ZMS <zms_api>` is
  also the best fit, as these parameters can be customized to add support for such devices.
- For classical flash technology devices, :ref:`NVS <nvs_api>` is recommended, as it has a lower
  footprint (smaller ATEs and smaller header ATEs). Erasing flash in NVS is also very fast and
  does not require an additional write operation as in ZMS.
  For these devices, NVS reads/writes will also be faster than in ZMS, thanks to the smaller
  ATE size.
- If your application needs more than 64K IDs for storage, :ref:`ZMS <zms_api>` is recommended,
  as it has a 32-bit ID field.
- If your application works in a FIFO mode (First-In First-Out), then :ref:`FCB <fcb_api>` is
  the best storage solution for this use case.

More generally, to make the right choice between NVS and ZMS, all the blockers should first be
verified to make sure that the application can work with one subsystem or the other. Then, if
both solutions could be implemented, the best choice should be based on the device life
expectancy calculation described in `Wear leveling <#wear-leveling>`_.

Recommendations to increase performance
***************************************

Sector size and count
=====================

- The total size of the storage partition should be set appropriately to achieve the best
  performance with ZMS.
  All the information regarding the effectively available free space in ZMS can be found above;
  see `Available space for user data <#available-space-for-user-data-key-value-pairs>`_.
  It is recommended to choose a storage partition size that is double the size of the key-value
  pairs that will be written in the storage.
- The sector size needs to be set such that a sector can fit the maximum data size that will be
  stored.
  Increasing the sector size will slow down the garbage collection operation but make it occur
  less frequently.
  Decreasing it, on the contrary, will make the garbage collection operation faster but also
  occur more frequently.
- For some subsystems like :ref:`Settings <settings_api>`, all path-value pairs are split into
  two ZMS entries (ATEs). The headers needed by the two entries should be accounted for when
  computing the needed storage space.
- Storing small data (<= 8 bytes) in ZMS entries can increase performance, as this data is
  written within the entry.
  For example, for the :ref:`Settings <settings_api>` subsystem, choosing a path name that is
  less than or equal to 8 bytes can make reads and writes faster.

Cache size
==========

- When using the ZMS API directly, the recommendation for the cache size is to make it at least
  equal to the number of different entries that will be written in the storage.
- Each additional cache entry adds 8 bytes to your RAM usage, so the cache size should be
  chosen carefully.
- If you use ZMS through :ref:`Settings <settings_api>`, you have to take into account that each
  Settings entry is divided into two ZMS entries. The recommendation for the cache size is to
  make it at least twice the number of Settings entries.
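As an illustration, a project storing roughly 100 Settings entries (about 200 ZMS entries) could
enable and size the lookup cache as below. The option names assume the ZMS lookup-cache Kconfig
symbols; check the ZMS Kconfig for the exact names and valid values in your Zephyr version.

.. code-block:: cfg

   CONFIG_ZMS=y
   # Assumed option names; verify against the ZMS Kconfig.
   CONFIG_ZMS_LOOKUP_CACHE=y
   # ~200 ZMS entries -> 256 cache entries = 2 KiB of RAM.
   CONFIG_ZMS_LOOKUP_CACHE_SIZE=256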
API Reference
*************

The ZMS API is provided by ``zms.h``:

.. doxygengroup:: zms_data_structures

.. doxygengroup:: zms_high_level_api

.. comment
   not documenting .. doxygengroup:: zms