Lines Matching full:log

40 The reason for these differences is to keep the amount of log space and CPU time
46 The method used to log an item or chain modifications together isn't
57 XFS has two types of high level transactions, defined by the type of log space
141 a "log force" to flush the outstanding committed transactions to stable storage
146 tend to use log forces to ensure modifications are on stable storage only when
160 A transaction reservation provides a guarantee that there is physical log space
164 log in the worst case. This means that if we are modifying a btree in the
179 log has this much space available before the transaction is allowed to proceed
180 so that when we come to write the dirty metadata into the log we don't run out
181 of log space half way through the write.
185 also have a "log count" that affects the size of the reservation that is to be
197 transaction, we might set the reservation log count to a value of 2 to indicate
202 Hence when the permanent transaction is first allocated, the log space
204 reservations. That multiple is defined by the reservation log count, and this
206 log space when we roll the transaction. This ensures that the common
207 modifications we make only need to reserve log space once.
209 If the log count for a permanent transaction reaches zero, then it needs to
210 re-reserve physical space in the log. This is somewhat complex, and requires
211 an understanding of how the log accounts for space that has been reserved.
214 Log Space Accounting
217 The position in the log is typically referred to as a Log Sequence Number (LSN).
218 The log is circular, so the positions in the log are defined by the combination
219 of a cycle number - the number of times the log has been overwritten - and the
220 offset into the log. An LSN carries the cycle in the upper 32 bits and the
223 available space in the log.
225 Log space accounting is done via a pair of constructs called "grant heads". The
227 available in the log is defined by the distance between the position of the
228 grant head and the current log tail. That is, how much space can be
229 reserved/consumed before the grant heads would fully wrap the log and overtake
235 into the log rather than basic blocks. Hence it technically isn't using LSNs to
236 represent the log position, but it is still treated like a split {cycle,offset}
241 and need to write into the log. The reserve head is used to prevent new
243 tail. It will block new reservations in a FIFO queue and as the log tail moves
245 mechanism ensures no transaction is starved of resources when log space
249 head contains an LSN and it tracks the physical space usage in the log. While
255 These differences when a permanent transaction is rolled and the internal "log
257 exhausted. At this point, we still require a log space reservation to continue
259 sleep during the transaction commit process waiting for new log space to become
261 locked while we sleep could end up pinning the tail of the log before there is
262 enough free space in the log to fulfill all of the pending reservations and
267 we need to be able to *overcommit* the log reservation space. As has already
268 been detailed, we cannot overcommit physical log space. However, the reserve
271 over the tail of the log all it means is that new reservations will be throttled
272 immediately and remain throttled until the log tail is moved forward far enough
274 can overcommit the reserve head without violating the physical log head and tail
278 xfs_trans_commit() calls, while the physical log space reservation - tracked by
281 physical log space to be reserved from the write grant head, but only if one
284 Code using permanent reservations must always log the items they hold
289 physical head of the log and so do not pin the tail of the log. If a locked item
290 pins the tail of the log when we sleep on the write reservation, then we will
291 deadlock the log as we cannot take the locks needed to write back that item and
292 move the tail of the log forwards to free up write grant space. Re-logging the
293 locked items avoids this deadlock and guarantees that the log reservation we are
297 progress independently because nothing will block the progress of the log
307 the log at any given time. This allows the log to avoid needing to flush each
311 existing changes in the new transaction that is written to the log.
314 written to disk after change D, we would see in the log the following series
315 of transactions, their contents and the log sequence number (LSN) of the
328 the aggregation of all the previous changes currently held only in the log.
330 This relogging technique allows objects to be moved forward in the log so that
331 an object being relogged does not prevent the tail of the log from ever moving
340 removal operation. This keeps them moving forward in the log as the operation
342 log wraps around.
347 the log - repeated operations to the same objects write the same changes to
348 the log over and over again. Worse is the fact that objects tend to get
350 metadata into the log.
354 until either a log buffer is filled (a log buffer can hold multiple
355 transactions) or a synchronous operation forces the log buffers holding the
357 in memory - batching them, if you like - to minimise the impact of the log IO on
361 log buffers made available by the log manager. By default there are 8 log
366 that can be made to the filesystem at any point in time - if all the log
369 be able to issue enough transactions to keep the log buffers full and under
378 multiple times before they are committed to disk in the log buffers. If we
380 transactions A through D are committed to disk in the same log buffer.
382 That is, a single log buffer may contain multiple copies of the same object,
385 necessary copy in the log buffer, and three stale copies that are simply
387 objects, these "stale objects" can be over 90% of the space used in the log
389 log would greatly reduce the amount of metadata we write to the log, and this
393 memory == log buffer), only it is doing it extremely inefficiently. It is using
396 formatting the changes in a transaction to the log buffer. Hence we cannot avoid
397 accumulating stale objects in the log buffers.
400 changes to objects in memory outside the log buffer infrastructure. Because of
404 them and get them to the log in a consistent, recoverable manner.
410 metadata changes from the size and number of log buffers available. In other
412 written to the log at any point in time, there may be a much greater amount
416 It should be noted that this does not change the guarantee that log recovery
426 log is used effectively in many filesystems including ext3 and ext4. Hence
433 1. Reduce the amount of metadata written to the log by at least
438 4. No on-disk format change (metadata or log format).
449 existing log item dirty region tracking) is that when it comes to writing the
450 changes to the log buffers, we need to ensure that the object we are formatting
452 concurrent modification. Hence flushing the logical changes to the log would
459 trying to get the lock on object A to flush it to the log buffer. This appears
465 vector array that points to the changed regions in the item. The log write code
466 simply copies the memory these vectors point to into the log buffer during
468 using the log buffer as the destination of the formatting code, we can use an
473 the changes in a format that is compatible with the log buffer writing code.
480 asynchronous transactions to the log. The differences between the existing
484 Current format log vector::
493 Log Buffer +-V1-+-V2-+----V3----+
515 buffer is to support splitting vectors across log buffer boundaries correctly.
517 are in the item, so we'd need a new encapsulation method for regions in the log
519 change and as such is not desirable. It also means we'd have to write the log
521 region state that needs to be placed into the headers during the log write.
525 self-describing object that can be passed to the log buffer write code to be
526 handled in exactly the same manner as the existing log vectors are handled.
536 them so that they can be written to the log at some later point in time. The
537 log item is the natural place to store this vector and buffer, and also makes sense
541 The log item is already used to track the log items that have been written to
542 the log but not yet written to disk. Such log items are considered "active"
544 doubly linked list. Items are inserted into this list during log buffer IO
547 and then moved forward in the AIL when the log buffer IO completes for that
554 committed item tracking needs its own locks, lists and state fields in the log
558 called the Committed Item List (CIL). The list tracks log items that have been
571 When we have a log synchronisation event, commonly known as a "log force",
572 all the items in the CIL must be written into the log via the log buffers.
576 log replay - all the changes in all the objects in a given transaction must
577 either be completely replayed during log recovery, or not replayed at all. If
578 a transaction is not replayed because it is not complete in the log, then
581 To fulfill this requirement, we need to write the entire CIL in a single log
582 transaction. Fortunately, the XFS log code has no fixed limit on the size of a
583 transaction, nor does the log replay code. The only fundamental limit is that
584 the transaction cannot be larger than just under half the size of the log. The
585 reason for this limit is that to find the head and tail of the log, there must
586 be at least one complete transaction in the log at any given time. If a
587 transaction is larger than half the log, then there is the possibility that a
589 only complete previous transaction in the log. This will result in a recovery
591 size of a checkpoint to be slightly less than half the log.
595 formatted log items and a commit record at the tail. From a recovery
600 Because the checkpoint is just another transaction and all the changes to log
601 items are stored as log vectors, we can use the existing log buffer writing
602 code to write the changes into the log. To do this efficiently, we need to
604 transaction. The current log write code enables us to do this easily with the
605 way it separates the writing of the transaction contents (the log vectors) from
607 per-checkpoint context that travels through the log write process through to
619 are formatting the checkpoint into the log. It also allows concurrent
620 checkpoints to be written into the log buffers in the case of log force heavy
622 requires that we strictly order the commit records in the log so that
623 checkpoint sequence order is maintained during log replay.
626 the same time another transaction modifies the item and inserts the log item
627 into the new CIL, then checkpoint transaction commit code cannot use log items
628 to store the list of log vectors that need to be written into the transaction.
629 Hence log vectors need to be able to be chained together to allow them to be
630 detached from the log items. That is, when the CIL is flushed the memory
631 buffer and log vector attached to each log item needs to be attached to the
632 checkpoint context so that the log item can be released. In diagrammatic form,
638 Log Item <-> log vector 1 -> memory buffer
641 Log Item <-> log vector 2 -> memory buffer
647 Log Item <-> log vector N-1 -> memory buffer
650 Log Item <-> log vector N -> memory buffer
653 And after the flush the CIL head is empty, and the checkpoint context log
659 log vector 1 -> memory buffer
661 | -> Log Item
663 log vector 2 -> memory buffer
665 | -> Log Item
670 log vector N-1 -> memory buffer
672 | -> Log Item
674 log vector N -> memory buffer
676 -> Log Item
679 start, while the checkpoint flush code works over the log vector chain to
682 Once the checkpoint is written into the log buffers, the checkpoint context is
683 attached to the log buffer that the commit record was written to along with a
684 completion callback. Log IO completion will call that callback, which can then
685 run transaction committed processing for the log items (i.e. insert into AIL
686 and unpin) in the log vector chain and then free the log vector chain and
689 Discussion Point: I am uncertain as to whether the log item is the most
691 it. The fact that we walk the log items (in the CIL) just to chain the log
692 vectors and break the link between the log item and the log vector means that
693 we take a cache line hit for the log item list modification, then another for
694 the log vector chaining. If we track by the log vectors, then we only need to
695 break the link between the log item and the log vector, which means we should
696 dirty only the log item cachelines. Normally I wouldn't be concerned about one
697 vs two dirty cachelines except for the fact I've seen upwards of 80,000 log
706 committed transactions with the log sequence number of the transaction commit.
709 committed to the log. In the rare case that a dependent operation occurs (e.g.
710 re-using a freed metadata extent for a data extent), a special, optimised log
714 transaction. This LSN comes directly from the log buffer the transaction is
717 written directly into the log buffers. Hence some other method of sequencing
728 Then, instead of assigning a log buffer LSN to the transaction commit LSN
732 result, the code that forces the log to a specific LSN now needs to ensure that
733 the log forces to a specific checkpoint.
736 that are currently committing to the log. When we flush a checkpoint, the
740 we can also wait on the log buffer that contains the commit record, thereby
741 using the existing log force mechanisms to execute synchronous forces.
744 mitigation algorithms similar to the current log buffer code to allow
749 The main concern with log forces is to ensure that all the previous checkpoints
753 synchronisation in the log force code so that we don't need to wait anywhere
754 else for such serialisation - it only matters when we do a log force.
756 The only remaining complexity is that a log force now also has to handle the
759 simple addition to the existing log forcing code to check the sequence numbers
761 the log force code enables the current mechanism for issuing synchronous
763 force the log at the LSN of that transaction) and so the higher level code
766 Delayed Logging: Checkpoint Log Space Accounting
769 The big issue for a checkpoint transaction is the log space reservation for the
771 ahead of time, nor how many log buffers it will take to write out, nor the
772 number of split log vector regions are going to be used. We can track the
773 amount of log space required as we add items to the commit item list, but we
774 still need to reserve the space in the log for the checkpoint.
776 A typical transaction reserves enough space in the log for the worst case space
777 usage of the transaction. The reservation accounts for log record headers,
782 of log vectors in the transaction).
786 there are lots of transactions that only contain an inode core and an inode log
793 space. From this, it should be obvious that a static log space reservation is
802 log buffer metadata used such as log header records.
804 However, even using a static reservation for just the log metadata is
805 problematic. Typically log record headers use at least 16KB of log space per
806 1MB of log space consumed (512 bytes per 32k) and the reservation needs to be
812 A static reservation needs to manipulate the log grant counters - we can take a
819 checkpoints to be able to free up log space (refer back to the description of
821 space available in the log if we are to use static reservations, and that is
825 The simpler way of doing this is tracking the entire log space used by the
826 items in the CIL and using this to dynamically calculate the amount of log
827 space required by the log metadata. If this log metadata space changes as a
832 maximal amount of log metadata space they require, and such a delta reservation
836 are added to the CIL and avoid the need for reserving and regranting log space
841 log. Hence as part of the reservation growing, we need to also check the size
843 the maximum threshold, we need to push the CIL to the log. This is effectively
845 a CIL push triggered by a log force, only that there is no waiting for the
850 they will be flushed by the periodic log force issued by the xfssyncd. This log
852 allow the idle log to be covered (effectively marked clean) in exactly the same
854 whether this log force needs to be done more frequently than the current rate
858 Delayed Logging: Log Item Pinning
861 Currently log items are pinned during transaction commit while the items are
864 that items get pinned once for every transaction that is committed to the log
865 buffers. Hence items that are relogged in the log buffers will have a pin count
869 pending transactions. Thus the pinning and unpinning of a log item is symmetric
870 as there is a 1:1 relationship with transaction commit and log item completion.
876 log item completion. The result of this is that pinning and unpinning of the
877 log items becomes unbalanced if we retain the "pin on transaction commit, unpin
919 the amount of space available in the log for their reservations. The practical
921 128MB log, which means that it is generally one per CPU in a machine.
924 relatively long period of time - the pinning of log items needs to be done
934 flushing the CIL could involve walking a list of tens of thousands of log
955 that is run as part of the checkpoint commit and log force sequencing. The code
956 path that triggers a CIL flush (i.e. whatever triggers the log force) will enter
957 an ordering loop after writing all the log vectors into the log buffers but
960 record write. As a result it needs a lock and a wait variable. Log force
967 (obtained through completion of a commit record write) while log force
981 The existing log item life cycle is as follows::
988 Allocate log item
989 Attach log item to owner item
990 Attach log item to transaction
992 Record modifications in log item
995 Format item into log buffer
998 Attach transaction to log buffer
1000 <log buffer IO dispatched>
1001 <log buffer IO completes>
1004 Mark log item committed
1005 Insert log item into AIL
1006 Write commit LSN into log item
1007 Unpin log item
1010 Mark log item clean
1015 9. Log item removed from AIL
1016 Moves log tail
1022 at the same time. If the log item is in the AIL or between steps 6 and 7
1033 Allocate log item
1034 Attach log item to owner item
1035 Attach log item to transaction
1037 Record modifications in log item
1040 Format item into log vector + buffer
1041 Attach log vector and buffer to log item
1042 Insert log item into CIL
1046 <next log force>
1050 Chain log vectors and buffers together
1053 write log vectors into log
1055 attach checkpoint context to log buffer
1057 <log buffer IO dispatched>
1058 <log buffer IO completes>
1061 Mark log item committed
1063 Write commit LSN into log item
1064 Unpin log item
1067 Mark log item clean
1070 10. Log item removed from AIL
1071 Moves log tail
1077 committing of the log items to the log itself and the completion processing.
1078 Hence delayed logging should not introduce any constraints on log item
1084 mount option. Fundamentally, there is no reason why the log manager would not