Lines Matching full:is
16 filesystem that is resilient to power-loss and flash wear without using
19 This document covers the high-level design of littlefs, how it is different
31 Flash itself is an interesting piece of technology with its own quirks and
33 operations: erasing and programming. Programming (setting bits to 0) is
39 systems to lose power at any time. Usually, microcontroller code is simple and
51 1. **Wear leveling** - Writing to flash is destructive. If a filesystem
62 were possible. For RAM we have a stronger requirement: all RAM usage is
77 Here storage is divided into blocks, with each file being stored in a
79 power-loss resilient, so updating a file is as simple as rewriting the blocks
105 and smallest. However, the lack of power resilience is not great, and the
110 [JFFS], [YAFFS], and [SPIFFS], storage location is not bound to a piece of
111 data; instead, the entire storage is used for a circular log which is
131 The main downside is performance. If we look at garbage collection, the
138 SPIFFS is a very interesting case here, as it uses the fact that repeated
139 programs to NOR flash are both atomic and masking. This is a very neat
142 3. Perhaps the most common type of filesystem, a journaling filesystem is the
189 reach the root of our filesystem, which is often stored in a very small log.
242 complexity is _O(1)_ by bounding the input.
247 can write atomically!), but what it does do is prevent the upward motion of
250 large _n_ (> branching factor), wear propagation is no longer a problem.
252 See where this is going? Separate, logging and COW are imperfect solutions and
256 This is the idea behind littlefs. At the sub-block level, littlefs is built
258 on the filesystem. At the super-block level, littlefs is a CObW tree of blocks
295 every _n_ writes. And there is still the challenge of keeping the RAM usage
309 We could make our logs larger than two blocks, but the next challenge is how
311 during writes, using a data structure to track these blocks is complicated.
312 The simple solution here is to store two block addresses for every metadata
317 In order to determine which metadata block is the most recent, we store a
337 block 1 is |----------------|----------------|
357 1. If our block is not full and the program size is small enough to let us
383 operation. What we can do instead is group multiple entries into a commit
406 entries to make space for new ones. This process is called garbage
410 Compared to other filesystems, littlefs's garbage collector is relatively
413 written. If the entry is the most recent, we append it to our new block. This
414 is where having two blocks becomes important: if we lose power, we still have
420 loss. It's only when the commit's checksum is written that the compacted
441 3. If our block is full of entries _and_ we can't find any garbage, then what?
443 more space is available, but because we have small logs, overflowing a log
450 This is a tradeoff, as this approach does use more storage space, but at the
492 There is another complexity that crops up when dealing with small logs. The
493 amortized runtime cost of garbage collection is not only dependent on its
499 1. Log is empty, garbage collection occurs once every _n_ updates
500 2. Log is full, garbage collection occurs **every** update
522 updating an entry given how full the log is:
546 entry, the worst case complexity is _O(nm²)_, with an amortized complexity
582 linked-list. Appending a block, which is the basis for writing files, means we
623 is divisible by 2‍_ˣ_, that block contains a pointer to block
654 The path to data block 0 is even faster, requiring only two jumps:
668 note is that this optimal path occurs naturally if we greedily choose the
671 So now we have a [COW] data structure that is cheap to append with a runtime
673 that this runtime is also divided by the amount of data we can store in a
674 block, this cost is fairly reasonable.
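
As a rough illustration of the rule described in the matched lines above, here is a minimal C sketch of how a CTZ skip-list block's pointer count and the greedy traversal could look. This is my reading of the fragments, not littlefs's actual code: the names `ctz_pointer_count` and `ctz_next_hop` are made up, `__builtin_ctz` assumes GCC/Clang, and the "block 0 stores no pointers" case is an assumption consistent with the rule.

```c
#include <stdio.h>

// Sketch only: block `index` stores a pointer to block index - 2^x for
// every x where 2^x divides index, i.e. ctz(index) + 1 pointers
// (block 0 is assumed to store none).
unsigned ctz_pointer_count(unsigned index) {
    return index == 0 ? 0 : (unsigned)__builtin_ctz(index) + 1;
}

// Greedy traversal: from `current`, follow the largest available
// back-pointer that does not pass `target` (target <= current).
unsigned ctz_next_hop(unsigned current, unsigned target) {
    for (unsigned x = ctz_pointer_count(current); x-- > 0;) {
        if (current - (1u << x) >= target) {
            return current - (1u << x);
        }
    }
    return current; // already at the target block
}

int main(void) {
    // e.g. block 12 (divisible by 4) holds ctz(12)+1 = 3 pointers
    printf("pointers in block 12: %u\n", ctz_pointer_count(12));
    // walking from block 12 toward block 0 takes the 2^2 jump first
    printf("next hop from 12 toward 0: %u\n", ctz_next_hop(12, 0));
    return 0;
}
```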
678 This is a new data structure, so we still have several questions. What is the
686 series. Solving this tells us that on average our storage overhead is only
691 Because our file size is limited by the word width we use to store sizes, we can
706 pointers if they are smaller than 104 bytes. This is an easy requirement, as
708 is larger than 104 bytes, we can avoid the extra logic needed to handle
711 The last question is how do we store CTZ skip-lists? We need a pointer to the
716 However, calculating the index + offset pair from the size is a bit
726 doesn't need to touch the disk, so the practical impact is minimal.
729 this equation to an _O(1)_ form. While browsing the amazing resource that is
755 Unfortunately, the popcount function is non-injective, so we can't solve this
756 equation for our index. But what we can do is solve for an ![n'] index that
757 is greater than ![n] with error bounded by the range of the popcount function.
759 is smaller than our integer resolution. As it turns out, we only need to
774 CTZ skip-lists give us a COW data structure that is easily traversable in
846 Deciding which block to use next is the responsibility of the block allocator.
847 In filesystem design, block allocation is often a second-class citizen, but in
848 a COW filesystem its role becomes much more important as it is needed for
852 the filesystem that is updated with free blocks. However, with power
858 littlefs relies on the fact that the filesystem on disk is a mirror image of
890 is simply a noop. This "drop it on the floor" strategy greatly reduces the
918 The block allocator in littlefs is a compromise between a disk-sized bitmap and
922 is empty, we scan the filesystem for more free blocks, populating our lookahead
963 Wear leveling is the process of distributing wear across all blocks in the
977 In littlefs, it is fairly straightforward to detect bad blocks at write time.
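
The matched lines here and just below suggest that write errors are caught because the data just written still lives in RAM. Below is a minimal sketch of that idea, under stated assumptions: `bd_prog` and `bd_read` are hypothetical block-device hooks for illustration, not littlefs's real driver interface.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

// Hypothetical block-device hooks, for illustration only.
int bd_prog(uint32_t block, const void *buf, size_t size);
int bd_read(uint32_t block, void *buf, size_t size);

// Detect a bad block at write time: program the block, read it back,
// and compare against the copy still held in RAM. A mismatch signals
// that the block should be evicted and the data written elsewhere.
int prog_and_verify(uint32_t block, const void *buf, size_t size,
                    void *scratch) {
    int err = bd_prog(block, buf, size);
    if (err) {
        return err;
    }
    err = bd_read(block, scratch, size);
    if (err) {
        return err;
    }
    return memcmp(buf, scratch, size) == 0 ? 0 : -1; // -1: bad block
}
```

The eviction itself can then reuse the normal COW machinery described in the lines that follow.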
984 errors, we have a copy of the corrupted data in RAM, so all we need to do is
988 The actual act of evicting the bad block and replacing it with a new block is
990 property of CObW data structures is that any block can be replaced during a
991 COW operation. The bounded-writes part is normally triggered by a counter, but
1172 We may find that the new block is also bad, but hopefully after repeating this
1176 error. This is technically true, as there are no more good blocks, but as an
1184 original data even after it has been corrupted. One such mechanism for this is
1187 ECC is an extension to the idea of a checksum. Where a checksum such as CRC can
1189 correct some amount of errors. However, there is a limit to how many errors ECC
1192 can no longer fix the data. If we've reached this point, the block is
1197 filesystems, correcting errors without RAM is complicated, and ECC fits better
1202 In littlefs, ECC is entirely optional. Read errors can instead be prevented
1213 One way to do this is to detect when the number of errors in a block exceeds
1214 some threshold, but is still recoverable. With ECC we can do this at write
1218 A different, more generic strategy is to proactively distribute wear across
1220 rest of storage is approaching the end of its usable life. This is called
1234 dynamic wear leveling. This is a best-effort solution. Wear is not distributed
1235 perfectly, but it is distributed among the free blocks and greatly extends the
1239 means is that we don’t actively track wear; instead, we rely on a uniform
1241 algorithm. Despite the long name, this is actually a simplification of dynamic
1244 The uniform distribution of wear is left up to the block allocator, which
1245 creates a uniform distribution in two parts. The easy part is when the device
1246 is powered, in which case we allocate the blocks linearly, circling the device.
1247 The harder part is what to do when the device loses power. We can't just
1250 filesystem. As long as this random offset is uniform, the combined allocation
1251 pattern is also a uniform distribution.
1257 different random numbers on each boot. However, the filesystem is in a
1258 relatively unique situation in that it is sitting on top of a large amount of
1262 generator. In practice, this is implemented by xoring the checksums of each
1263 metadata pair, which are already calculated to fetch and mount the filesystem.
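
A tiny sketch of the idea in the two lines above, with made-up names (`mount_seed`, `pair_crcs`) rather than littlefs's actual API: the seed is just the xor of the per-metadata-pair checksums that mounting computes anyway.

```c
#include <stdint.h>
#include <stddef.h>

// Sketch: accumulate a pseudorandom seed by xoring the CRCs already
// computed for each metadata pair while mounting the filesystem.
uint32_t mount_seed(const uint32_t *pair_crcs, size_t count) {
    uint32_t seed = 0;
    for (size_t i = 0; i < count; i++) {
        seed ^= pair_crcs[i];
    }
    return seed;
}
```

The resulting seed changes whenever any metadata pair changes, which matches the note further down that unique values only appear when the filesystem is modified.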
1292 Note that this random number generator is not perfect. It only returns unique
1293 random numbers when the filesystem is modified. This is exactly what we want
1295 is not useful for general use.
1303 aggressive wear leveling is desired, you can always combine littlefs with a
1334 However, this doesn't work well when files are small, which is common for
1335 embedded systems. Compared to PCs, _all_ data in an embedded system is small.
1339 with 4 KiB blocks, this is 12 KiB of overhead. A ridiculous 3072x increase.
1379 to do this is to directly associate a directory with a metadata pair (or a
1424 The second improvement we can make is noticing that for very small files, our
1426 have a ~4x storage cost, so if our file is smaller than 1/4 the block size,
1482 On its own, each directory is a linked-list of metadata pairs. This lets us
1512 The main complication is, once again, traversal with a constant amount of
1513 [RAM]. The directory tree is a tree, and the unfortunate fact is you can't
1517 skip-lists, we're not limited to strict COW operations. One thing we can do is
1551 only containing metadata pairs found in our filesystem, it is allowed to
1807 Finding orphans and half-orphans is expensive, requiring an _O(n²)_
1809 is a power resilient filesystem that works with only a bounded amount of RAM.
1819 We have one last challenge: the move problem. Phrasing the problem is simple:
1839 is found. Unfortunately, this interacts poorly with our threaded tree and
1854 Global state is a small set of state that can be updated from _any_ metadata
1909 the filesystem is mounted.
1911 You may have noticed that global state is very expensive. We keep a copy in
1916 state is incredibly valuable.
1997 is duplicated in both the source and destination directories. If this happens,
2027 We can also move directories the same way we move files. There is the threaded
2105 And the result is surprisingly performant, only needing the minimum number