.. _zms_api:

Zephyr Memory Storage (ZMS)
###########################
Zephyr Memory Storage is a new key-value storage system designed to work with all types
of non-volatile storage technologies. It supports classical on-chip NOR flash as well as newer
technologies like RRAM and MRAM that do not require a separate erase operation at all, that is,
data on these types of devices can be overwritten directly at any time.

General behavior
****************
ZMS divides the memory space into sectors (minimum 2), and each sector is filled with key-value
pairs until it is full.

The key-value pair is divided into two parts:

- The key part is written in an ATE (Allocation Table Entry) called "ID-ATE", which is stored
  starting from the bottom of the sector
- The value part is defined as "DATA" and is stored raw starting from the top of the sector

Additionally, each sector stores header ATEs at its last positions. These ATEs describe the
status of the sector (closed or open) and the current version of ZMS.

When the current sector is full, ZMS first verifies that the following sector is empty, then
garbage collects sector N+2 (where N is the current sector number) by moving its valid ATEs to
the empty sector N+1, erases the garbage-collected sector, and finally closes the current
sector by writing a garbage_collect_done ATE and the close ATE (one of the header entries).
Afterwards, it moves forward to the next sector and starts writing entries again.

This behavior is repeated until the end of the partition is reached. Then ZMS starts again from
the first sector after garbage collecting it and erasing its contents.

Composition of a sector
=======================
A sector is organized in this form (example with 3 sectors):

.. list-table::
   :widths: 25 25 25
   :header-rows: 1

   * - Sector 0 (closed)
     - Sector 1 (open)
     - Sector 2 (empty)
   * - Data_a0
     - Data_b0
     - Data_c0
   * - Data_a1
     - Data_b1
     - Data_c1
   * - Data_a2
     - Data_b2
     - Data_c2
   * - GC_done
     - .
     - .
   * - .
     - .
     - .
   * - .
     - .
     - .
   * - .
     - ATE_b2
     - ATE_c2
   * - ATE_a2
     - ATE_b1
     - ATE_c1
   * - ATE_a1
     - ATE_b0
     - ATE_c0
   * - ATE_a0
     - GC_done
     - GC_done
   * - Close (cyc=1)
     - Close (cyc=1)
     - Close (cyc=1)
   * - Empty (cyc=1)
     - Empty (cyc=2)
     - Empty (cyc=2)

Definition of each element in the sector
========================================

``Empty ATE:`` is written when erasing a sector (last position of the sector).

``Close ATE:`` is written when closing a sector (second to last position of the sector).

``GC_done ATE:`` is written to indicate that the next sector has already been garbage
collected. This ATE could be at any position of the sector.

``ID-ATE:`` are entries that contain a 32-bit key and describe where the data is stored, its
size and its CRC32.

``Data:`` is the actual value associated with the ID-ATE.

How does ZMS work?
******************

Mounting the Storage system
===========================

Mounting the storage starts by getting the flash parameters, checking that the file system
properties are correct (``sector_size``, ``sector_count``, ...), then calling the ``zms_init``
function to make the storage ready.

To mount the filesystem, some fields of the ``zms_fs`` structure must be initialized first:

.. code-block:: c

   struct zms_fs {
           /** File system offset in flash **/
           off_t offset;

           /** Storage system is split into sectors, each sector size must be multiple of
            * erase-blocks if the device has erase capabilities
            */
           uint32_t sector_size;
           /** Number of sectors in the file system */
           uint32_t sector_count;

           /** Flash device runtime structure */
           const struct device *flash_device;
   };
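For reference, a minimal mount sequence, given as a non-authoritative sketch in the spirit of
the :zephyr:code-sample:`zms` sample: it assumes a fixed flash partition labeled
``storage_partition`` in the devicetree and uses one erase page per sector; adapt the partition
name and the sector geometry to your board.

.. code-block:: c

   #include <zephyr/device.h>
   #include <zephyr/drivers/flash.h>
   #include <zephyr/fs/zms.h>
   #include <zephyr/storage/flash_map.h>

   static struct zms_fs fs;

   int storage_init(void)
   {
           struct flash_pages_info info;
           int rc;

           /* "storage_partition" is an assumed devicetree partition label */
           fs.flash_device = FIXED_PARTITION_DEVICE(storage_partition);
           if (!device_is_ready(fs.flash_device)) {
                   return -ENODEV;
           }
           fs.offset = FIXED_PARTITION_OFFSET(storage_partition);

           /* Use one erase page per sector; any multiple of it also works */
           rc = flash_get_page_info_by_offs(fs.flash_device, fs.offset, &info);
           if (rc) {
                   return rc;
           }
           fs.sector_size = (uint32_t)info.size;
           fs.sector_count = 4U;

           return zms_mount(&fs);
   }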
Initialization
==============

As ZMS has a fast-forward write mechanism, initialization must find the last sector and the
offset of the last written entry.
It looks for a closed sector followed by an open one, then recovers the last written ATE
(Allocation Table Entry) within the open sector.
After that, it checks that the sector following this one is empty, and erases it if it is not.

ZMS ID-Data write
=================

To avoid rewriting the same data with the same ID, ZMS first checks all sectors for an entry
with the same ID and compares its data; if the data is identical, no write is performed.
If a write must be performed, an ATE and the data (unless the operation is a delete) are
written in the sector.
If the sector is full (cannot hold the current data + ATE), ZMS moves to the next sector,
garbage collects the sector after the newly opened one, and then erases it.
Data whose size is smaller than or equal to 8 bytes is written within the ATE itself.

ZMS ID/data read (with history)
===============================

By default, ZMS looks for the last data with the same ID by browsing through all stored ATEs,
from the most recent ones to the oldest ones. If it finds a valid ATE with a matching ID, it
retrieves its data and returns the number of bytes that were read.
If a history counter different from 0 is provided, older data with the same ID is retrieved.
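The basic ID/data operations look like the following sketch, assuming the storage was mounted
into the ``fs`` structure shown above; the ID value ``0x1234AA`` is an arbitrary example.

.. code-block:: c

   #include <string.h>
   #include <zephyr/fs/zms.h>

   #define KEY_ID 0x1234AAU /* arbitrary 32-bit ID */

   int storage_demo(struct zms_fs *fs)
   {
           char buf[16];
           ssize_t rc;

           /* Write a value; rewriting identical data performs no flash write */
           rc = zms_write(fs, KEY_ID, "hello", strlen("hello") + 1);
           if (rc < 0) {
                   return (int)rc;
           }

           /* Read the most recent value back */
           rc = zms_read(fs, KEY_ID, buf, sizeof(buf));
           if (rc < 0) {
                   return (int)rc;
           }

           /* Read the previous value of the same ID (history counter = 1) */
           rc = zms_read_hist(fs, KEY_ID, buf, sizeof(buf), 1);
           if (rc < 0) {
                   return (int)rc;
           }

           /* Delete the pair: only a delete ATE is written, no data part */
           return zms_delete(fs, KEY_ID);
   }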
ZMS free space calculation
==========================

ZMS can also return the free space remaining in the partition.
However, this operation is very time-consuming: it needs to browse all valid ATEs in all
sectors of the partition and, for each valid ATE, check whether an older one exists.
Applications should not use this function often, as it could slow down the calling thread.

The cycle counter
=================

Each sector has a lead cycle counter, which is a uint8_t used to validate all the other ATEs.
The lead cycle counter is stored in the empty ATE.
To be valid, an ATE must have the same cycle counter as the one stored in the empty ATE.
Each time an ATE is moved from one sector to another, it takes the cycle counter of the
destination sector.
To erase a sector, the cycle counter of the empty ATE is incremented and a single write of the
empty ATE is done.
All the ATEs in that sector then become invalid.

Closing sectors
===============

To close a sector, a close ATE is added at the end of the sector; it must have the same cycle
counter as the empty ATE.
When closing a sector, all the remaining space that has not been used is filled with garbage
data to avoid having old ATEs with a valid cycle counter.

Triggering Garbage collection
=============================

Some applications need to make sure that storage writes have a bounded maximum latency.
When calling a ZMS write, the current sector could be almost full, forcing ZMS to trigger the
garbage collector in order to switch to the next sector.
This operation is time-consuming and may cause some applications to miss their real-time
constraints.
ZMS therefore adds an API that lets the application query the remaining free space in the
current sector.
The application can then decide, whenever convenient, to switch to the next sector if the
current one is almost full; this switch triggers the garbage collection on the next sector.
This guarantees the application that the next write won't trigger the garbage collection.
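A possible latency-control pattern based on this API is sketched below, using
``zms_active_sector_free_space()`` to query the current sector and ``zms_sector_use_next()`` to
force the switch; the 64-byte threshold is an arbitrary, application-specific worst case.

.. code-block:: c

   #include <zephyr/fs/zms.h>

   /* Assumed upper bound of the next write (ATE + data), chosen by the application */
   #define NEXT_WRITE_WORST_CASE 64

   void maybe_switch_sector(struct zms_fs *fs)
   {
           ssize_t free_space = zms_active_sector_free_space(fs);

           if ((free_space >= 0) && (free_space < NEXT_WRITE_WORST_CASE)) {
                   /* Pay the garbage collection cost now, at a convenient time,
                    * so that the next write is guaranteed not to trigger it.
                    */
                   (void)zms_sector_use_next(fs);
           }
   }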
ATE (Allocation Table Entry) structure
======================================

An entry is 16 bytes long and divided between these fields:

.. code-block:: c

   struct zms_ate {
           uint8_t crc8;      /* crc8 check of the entry */
           uint8_t cycle_cnt; /* cycle counter for non-erasable devices */
           uint16_t len;      /* data len within sector */
           uint32_t id;       /* data id */
           union {
                   uint8_t data[8]; /* used to store small size data */
                   struct {
                           uint32_t offset; /* data offset within sector */
                           union {
                                   uint32_t data_crc; /* crc for data */
                                   uint32_t metadata; /* Used to store metadata information
                                                       * such as storage version.
                                                       */
                           };
                   };
           };
   } __packed;

.. note:: The CRC of the data is checked only when the whole element is read.
   The CRC of the data is not checked for a partial read, as it is computed for the whole
   element.

.. note:: Enabling the CRC feature on previously existing ZMS content without CRC enabled
   will make all existing data invalid.

.. _free-space:

Available space for user data (key-value pairs)
***********************************************

In both scenarios described below, ZMS should always have an empty sector to be able to perform
garbage collection (GC).
So, if a partition has 4 sectors, ZMS will use only 3 sectors to store key-value pairs and keep
one sector empty to be able to launch GC.
The empty sector rotates among the 4 sectors of the partition.

.. note:: The maximum single data length that could be written at once in a sector is 64K
   (this could change in future versions of ZMS).

Small data values
=================

Values smaller than or equal to 8 bytes are stored within the entry (ATE) itself, without
writing data at the top of the sector.
ZMS has an entry size of 16 bytes, which means that the maximum space available in a partition
to store data is computed in this scenario as:

.. math::

   \small\frac{(NUM\_SECTORS - 1) \times (SECTOR\_SIZE - (5 \times ATE\_SIZE))}{2}

Where:

``NUM_SECTORS:`` Total number of sectors

``SECTOR_SIZE:`` Size of the sector

``ATE_SIZE:`` 16 bytes

``(5 * ATE_SIZE):`` Reserved ATEs for header and delete items

For example, for 4 sectors of 1024 bytes each, the free space for data is
:math:`\frac{3 \times 944}{2} = 1416 \text{ bytes}`.

Large data values
=================

Large data values (> 8 bytes) are stored separately at the top of the sector.
In this case, it is hard to estimate the free available space, as this depends on the size of
the data. But we can take into account that for N bytes of data (N > 8 bytes) an additional
16 bytes of ATE must be added at the bottom of the sector.

Let's take an example:

For a partition that has 4 sectors of 1024 bytes each and a data size of 64 bytes, only 3
sectors are available for writes, with a capacity of 944 bytes each.
Each key-value pair needs an extra 16 bytes for its ATE, which makes it possible to store 11
pairs in each sector (:math:`\frac{944}{80} \approx 11`).
The total data that could be stored in this partition in this case is
:math:`11 \times 3 \times 64 = 2112 \text{ bytes}`.

.. _wear-leveling:

Wear leveling
*************

This storage system is optimized for devices that do not require an erase operation.
Storage systems that rely on an erase value (NVS as an example) need to emulate the erase with
write operations on such devices. This causes a significant decrease in the life expectancy of
these devices as well as more delays for write operations and for initialization.
ZMS uses a cycle count mechanism that avoids emulating the erase operation for these devices.
It also guarantees that every memory location is written only once for each cycle of sector
write.

As an example, to erase a 4096-byte sector on a non-erasable device using NVS, 256 flash writes
must be performed (supposing that write-block-size = 16 bytes), while using ZMS only 1 write of
16 bytes is needed. This operation is 256 times faster in this case.

The garbage collection operation also adds some writes that count against the memory cells'
life expectancy, as it moves some blocks from one sector to another.
To keep the garbage collector from affecting the life expectancy of the device, it is
recommended to dimension the partition correctly. Its size should be double the maximum size of
the data (including extra headers) that could be written in the storage.

See :ref:`free-space`.

Device lifetime calculation
===========================

Storage devices, whether classical flash or new technologies like RRAM/MRAM, have a limited
life expectancy determined by the number of times memory cells can be erased/written.
Flash devices are erased one page at a time as part of their functional behavior (otherwise
memory cells cannot be overwritten), while on non-erasable storage devices memory cells can be
overwritten directly.

A typical scenario is shown here to calculate the life expectancy of a device:
let's suppose that we store an 8-byte variable using the same ID, and that its content changes
every minute. The partition has 4 sectors of 1024 bytes each.
Each write of the variable requires 16 bytes of storage.
As we have 944 bytes available for ATEs in each sector, and because ZMS is a fast-forward
storage system, we are going to rewrite the first location of the first sector after
:math:`\frac{944 \times 4}{16} = 236 \text{ minutes}`.

In addition to the normal writes, the garbage collector will move the still-valid data from old
sectors to new ones.
As we are using the same ID and a large partition size, no data will be moved by the garbage
collector in this case.
For storage devices that can be written 20,000 times, the storage will last about
4,720,000 minutes (~9 years).

To derive a more general formula, we must first compute the effective size used in ZMS by our
typical set of data:

- For an ID/data pair with data <= 8 bytes, the effective size is 16 bytes.
- For an ID/data pair with data > 8 bytes, the effective size is 16 bytes + sizeof(data).

Let's suppose that total_effective_size is the total size of the set of data that is written in
the storage and that the partition is well dimensioned (double the effective size) to avoid
having the garbage collector moving blocks all the time.

The expected life of the device in minutes is computed as:

.. math::

   \small\frac{(SECTOR\_EFFECTIVE\_SIZE \times SECTOR\_NUMBER \times MAX\_NUM\_WRITES)}{(TOTAL\_EFFECTIVE\_SIZE \times WR\_MIN)}

Where:

``SECTOR_EFFECTIVE_SIZE``: the sector size minus the header size (80 bytes)

``SECTOR_NUMBER``: the number of sectors

``MAX_NUM_WRITES``: the life expectancy of the storage device in number of writes

``TOTAL_EFFECTIVE_SIZE``: the total effective size of the set of written data

``WR_MIN``: the number of writes of the set of data per minute
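As a consistency check, applying this formula to the earlier scenario (an 8-byte variable
written once per minute, so :math:`TOTAL\_EFFECTIVE\_SIZE = 16` and :math:`WR\_MIN = 1`, with
4 sectors of effective size :math:`1024 - 80 = 944` bytes and
:math:`MAX\_NUM\_WRITES = 20000`) gives the same result as before:

.. math::

   \small\frac{944 \times 4 \times 20000}{16 \times 1} = 4{,}720{,}000 \text{ minutes} \approx 9 \text{ years}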
Features
********
ZMS introduces many features compared to existing storage systems like NVS and will evolve from
its initial version to include more features that satisfy the requirements of new technologies,
such as low latency and bigger storage space.

Existing features
=================
Version1
--------
- Supports non-erasable devices (only one write operation to erase a sector)
- Supports large partition sizes and sector sizes (64-bit address space)
- Supports 32-bit IDs to store ID/Value pairs
- Small-sized data (<= 8 bytes) is stored in the ATE itself
- Built-in data CRC32 (included in the ATE)
- Versioning of ZMS (to handle future evolution)
- Supports large write-block-sizes (only for platforms that need this)

Future features
===============

- Add multiple-format ATE support to be able to use ZMS with different ATE formats that satisfy
  application requirements
- Add the possibility to skip the garbage collector for some application usages where ID/value
  pairs are written periodically and do not exceed half of the partition size (there is always
  an old entry with the same ID).
- Divide IDs into namespaces and allocate IDs on demand from the application to handle
  collisions between IDs used by different subsystems or samples.
- Add the possibility to retrieve the wear-out value of the device based on the cycle count
  value
- Add a recovery function that can recover a storage partition if something went wrong
- Add a library/application to allow migration from NVS entries to ZMS entries
- Add the possibility to force formatting the storage partition to the ZMS format if something
  went wrong when mounting the storage.

ZMS and other storage systems in Zephyr
=======================================
This section describes ZMS in the wider context of storage systems in Zephyr (not full
filesystems, but simpler, non-hierarchical ones).
Today Zephyr includes at least two other systems that are somewhat comparable in scope and
functionality: :ref:`NVS <nvs_api>` and :ref:`FCB <fcb_api>`.
Which one to use in your application will depend on your needs and the hardware you are using,
and this section provides information to help make a choice.

- If you are using a non-erasable technology device like RRAM or MRAM, :ref:`ZMS <zms_api>` is
  definitely the best fit for your storage subsystem, as it is designed to avoid emulating the
  erase operation with large block writes on these devices and replaces it with a single write
  call.
- For devices with a large write_block_size and/or that need a sector size different from the
  classical flash page size (equal to erase_block_size), :ref:`ZMS <zms_api>` is also the best
  fit, as these parameters can be customized to add support for such devices.
- For classical flash technology devices, :ref:`NVS <nvs_api>` is recommended, as it has a
  lower footprint (smaller ATEs and smaller header ATEs). Erasing flash in NVS is also very
  fast and, unlike in ZMS, does not require an additional write operation.
  For these devices, NVS reads/writes will be faster than ZMS as well, thanks to the smaller
  ATE size.
- If your application needs more than 64K IDs for storage, :ref:`ZMS <zms_api>` is recommended,
  as it has a 32-bit ID field.
- If your application works in a FIFO mode (First-In First-Out), then :ref:`FCB <fcb_api>` is
  the best storage solution for this use case.

More generally, to make the right choice between NVS and ZMS, any blockers should be verified
first to make sure that the application could work with one subsystem or the other. If both
solutions could be implemented, the best choice should be based on the calculations of the life
expectancy of the device described in :ref:`wear-leveling`.

Recommendations to increase performance
***************************************

Sector size and count
=====================

- The total size of the storage partition should be well dimensioned to achieve the best
  performance for ZMS.
  All the information regarding the effectively available free space in ZMS can be found in
  :ref:`free-space`.
  We recommend choosing a storage partition that can hold double the size of the key-value
  pairs that will be written in the storage.
- The size of a sector needs to be dimensioned to hold the maximum data length that will be
  stored.
  Increasing the size of a sector will slow down the garbage collection operation, which will
  occur less frequently.
  Decreasing its size, conversely, will make the garbage collection operation faster, but it
  will occur more frequently.
- For some subsystems like :ref:`Settings <settings_api>`, all path-value pairs are split into
  two ZMS entries (ATEs).
  The header needed by the two entries should be accounted for when computing the needed
  storage space.
- Storing small data in the ZMS entries can increase performance, as this data is written
  within the entry (ATE) itself.
  For example, for the :ref:`Settings <settings_api>` subsystem, choosing a path name that is
  less than or equal to 8 bytes can make reads and writes faster.

Dimensioning cache
==================

- When using the ZMS API directly, the recommended cache size should be, at least, equal to
  the number of different entries that will be written in the storage.
- Each additional cache entry adds 8 bytes to your RAM usage, so the cache size should be
  carefully chosen.
- If you use ZMS through :ref:`Settings <settings_api>`, you have to take into account that
  each Settings entry is divided into two ZMS entries. The recommended cache size should be, at
  least, twice the number of Settings entries.
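As an illustrative sketch, an application that uses ZMS through Settings with up to 16 Settings
entries could dimension the cache in its ``prj.conf`` as follows (the lookup cache option names
used here are an assumption; check the ZMS Kconfig for your Zephyr version):

.. code-block:: cfg

   CONFIG_ZMS=y
   # Each Settings entry maps to two ZMS entries: 16 entries -> cache size of 32
   CONFIG_ZMS_LOOKUP_CACHE=y
   CONFIG_ZMS_LOOKUP_CACHE_SIZE=32
   # RAM cost of the cache: 32 entries x 8 bytes = 256 bytes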
Sample
******

A sample of how ZMS can be used is supplied in :zephyr:code-sample:`zms`.

API Reference
*************

The ZMS subsystem APIs are provided by ``zms.h``:

.. doxygengroup:: zms_data_structures

.. doxygengroup:: zms_high_level_api

.. comment
   not documenting .. doxygengroup:: zms