btrfs-progs: docs: add section about zoned devices
Signed-off-by: David Sterba <dsterba@suse.com>
parent 4693e82261
commit 6710641ad5
@@ -18,6 +18,7 @@ tools. Currently covers:
. filesystem limits
. bootloader support
. file attributes
. zoned mode
. control device
. filesystems with multiple block group profiles
. seeding device
@@ -668,8 +669,9 @@ kernel, see `btrfs`(5)
 *zoned*::
 (since: 5.12)
 +
-zoned mode is allocation/write friendly to host-managed devices, allocation
-space is split into fixed-size zones that must be updated sequentially
+zoned mode is allocation/write friendly to host-managed zoned devices,
+allocation space is partitioned into fixed-size zones that must be updated
+sequentially, see 'ZONED MODE'
 
 SWAPFILE SUPPORT
 ~~~~~~~~~~~~~~~~
@@ -1044,6 +1046,76 @@ refers to what `xfs_io`(8) provides:
'no dump', same as the attribute


ZONED MODE
----------

Since kernel version 5.12 btrfs supports the so-called 'zoned mode'. This is a
special on-disk format and allocation/write strategy that's friendly to zoned
devices. In short, a device is partitioned into fixed-size zones and each zone
can only be updated in an append-only manner, or reset. As btrfs has no fixed
data structures, except the super blocks, the zoned mode only requires block
placement that follows the device constraints. You can learn about the whole
architecture at https://zonedstorage.io .

Such devices are also known as SMR/ZBC/ZNS devices in 'host-managed' mode. Note
that there are devices that appear as non-zoned but actually are zoned
internally; this is the 'drive-managed' mode and using zoned mode won't help.

The zone size depends on the device, typical sizes are 256MiB or 1GiB. In
general it must be a power of two. Emulated zoned devices like 'null_blk' allow
setting various zone sizes.
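
To check what a given device reports, the zoned model and zone size can be
read from sysfs. The following is a minimal sketch, not part of btrfs-progs:
it assumes the Linux block layer attributes `queue/zoned` and
`queue/chunk_sectors` (zone size in 512-byte sectors on zoned devices); the
helper names and the `sysfs_root` parameter are hypothetical and exist only
to make the example testable.

```python
from pathlib import Path

SECTOR_SIZE = 512  # sysfs reports chunk_sectors in 512-byte units


def read_zone_info(device, sysfs_root="/sys/block"):
    """Return (zoned_model, zone_size_bytes) for a device name like 'sda'.

    zoned_model is the raw content of queue/zoned, e.g. 'none' or
    'host-managed'. The size is only meaningful as a zone size when the
    device is actually zoned.
    """
    queue = Path(sysfs_root) / device / "queue"
    model = (queue / "zoned").read_text().strip()
    sectors = int((queue / "chunk_sectors").read_text().strip())
    return model, sectors * SECTOR_SIZE


def is_power_of_two(n):
    """True if n is a positive power of two, as required for the zone size."""
    return n > 0 and (n & (n - 1)) == 0


def zoned_mode_applies(device, sysfs_root="/sys/block"):
    """Illustrative check: host-managed model and a power-of-two zone size."""
    model, zone_size = read_zone_info(device, sysfs_root)
    return model == "host-managed" and is_power_of_two(zone_size)
```

For example, `read_zone_info("sda")` on a host-managed drive with 256MiB zones
would be expected to return the model string and 268435456; a drive-managed
device reports `none` and is indistinguishable from a regular disk here.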

REQUIREMENTS, LIMITATIONS
~~~~~~~~~~~~~~~~~~~~~~~~~

* all devices must have the same zone size
* maximum zone size is 8GiB
* mixing zoned and non-zoned devices is possible, the zone writes are emulated,
  but this is mainly for testing
* the super block is handled in a special way and is at different locations
  than on a non-zoned filesystem:
  * primary: 0B (and the next two zones)
  * secondary: 512GiB (and the next two zones)
  * tertiary: 4TiB (4096GiB, and the next two zones)
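
The arithmetic behind these locations can be sketched as follows. This is an
illustration only, not the kernel implementation: it assumes that each copy
occupies two consecutive zones (matching the 0-16GiB figure given for 8GiB
zones in the SUPER BLOCK section) and that a copy is skipped when its range
does not fit on the device. The function and constant names are hypothetical.

```python
MIB, GIB, TIB = 1024 ** 2, 1024 ** 3, 1024 ** 4

MAX_ZONE_SIZE = 8 * GIB
# Start offsets of the three super block copies in zoned mode.
SB_COPY_OFFSETS = {"primary": 0, "secondary": 512 * GIB, "tertiary": 4 * TIB}
# Assumption: two consecutive zones are reserved per copy.
ZONES_PER_COPY = 2


def superblock_ranges(zone_size, device_size):
    """Map copy name -> (start, end) byte range reserved for that copy."""
    if zone_size <= 0 or zone_size & (zone_size - 1):
        raise ValueError("zone size must be a power of two")
    if zone_size > MAX_ZONE_SIZE:
        raise ValueError("zone size exceeds the 8GiB maximum")
    ranges = {}
    for name, start in SB_COPY_OFFSETS.items():
        end = start + ZONES_PER_COPY * zone_size
        # Assumption: copies that do not fit on the device are not written.
        if end <= device_size:
            ranges[name] = (start, end)
    return ranges
```

With 8GiB zones on a 20TiB device this yields 0-16GiB, 512-528GiB and
4096-4112GiB; with 256MiB zones only 512MiB is reserved per copy.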

INCOMPATIBLE FEATURES
~~~~~~~~~~~~~~~~~~~~~

The main constraint of the zoned devices is the lack of in-place update of the
data. This is inherently incompatible with some features:

* nodatacow - overwrite in-place, cannot create such files
* fallocate - preallocating space for in-place first write
* mixed-bg - unordered writes to data and metadata, fixing that means using
  separate data and metadata block groups
* booting - the zone at offset 0 contains the super block, resetting the zone
  would destroy the bootloader data

Initial support lacks some features but they're planned:

* only single profile is supported
* fstrim - due to dependency on free space cache v1

SUPER BLOCK
~~~~~~~~~~~

As said above, the super block is handled in a special way. In order to be crash
safe, at least one zone in a known location must contain a valid super block.
This is implemented as a ring buffer in two consecutive zones, starting from
known offsets 0, 512GiB and 4TiB. The values are different than on non-zoned
devices. Each new super block is appended to the end of the zone; once it's
filled, the zone is reset and writes continue to the next one. Looking up the
latest super block requires reading the offsets of both zones to determine the
last written version.
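
The two-zone ring buffer can be modelled with a small toy simulation. This is
a sketch under stated assumptions, not the kernel code: zones are plain lists,
a super block is represented only by its generation number, and the order of
operations (switch to the other zone, reset it, then append) is an assumption
chosen so that one zone always holds a valid copy. All class and method names
are hypothetical.

```python
class Zone:
    """Append-only zone holding a fixed number of super block slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = []  # len(self.blocks) plays the role of the write pointer

    def is_full(self):
        return len(self.blocks) == self.capacity

    def reset(self):
        self.blocks.clear()

    def append(self, generation):
        if self.is_full():
            raise RuntimeError("zone is full, it must be reset first")
        self.blocks.append(generation)


class SuperBlockLog:
    """Ring buffer over two consecutive zones."""

    def __init__(self, slots_per_zone):
        self.zones = [Zone(slots_per_zone), Zone(slots_per_zone)]
        self.active = 0
        self.generation = 0

    def write(self):
        zone = self.zones[self.active]
        if zone.is_full():
            # The full zone keeps the latest copy while the other is reset.
            self.active ^= 1
            zone = self.zones[self.active]
            zone.reset()
        self.generation += 1
        zone.append(self.generation)
        return self.generation

    def latest(self):
        """Inspect the last written block of both zones, return the newest."""
        candidates = [z.blocks[-1] for z in self.zones if z.blocks]
        return max(candidates) if candidates else None
```

After seven writes with three slots per zone, the first zone holds only
generation 7 and the second still holds 4 to 6, so a lookup has to consult both
zones to find the newest copy.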

The amount of space reserved for the super block depends on the zone size. The
secondary and tertiary copies are at distant offsets as the capacity of the
devices is expected to be large, tens of terabytes. The maximum supported zone
size is 8GiB, which would mean that e.g. offset 0-16GiB would be reserved just
for the super block on a hypothetical device of that zone size. This is wasteful
but required to guarantee crash safety.


CONTROL DEVICE
--------------
@@ -242,6 +242,14 @@ reduced-size metadata for extent references, saves a few percent of metadata
improved representation of file extents where holes are not explicitly
stored as an extent, saves a few percent of metadata if sparse files are used

*zoned*::
(kernel support since 5.12)
+
zoned mode, data allocation and write friendly to zoned/SMR/ZBC/ZNS devices,
see 'ZONED MODE' in `btrfs`(5), the mode is automatically selected when
a zoned device is detected


RUNTIME FEATURES
----------------