btrfs-progs: docs: add section about zoned devices

Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba 2021-06-17 23:21:03 +02:00
parent 4693e82261
commit 6710641ad5
2 changed files with 82 additions and 2 deletions


@ -18,6 +18,7 @@ tools. Currently covers:
. filesystem limits
. bootloader support
. file attributes
. zoned mode
. control device
. filesystems with multiple block group profiles
. seeding device
@ -668,8 +669,9 @@ kernel, see `btrfs`(5)
*zoned*::
(since: 5.12)
+
zoned mode is allocation/write friendly to host-managed zoned devices:
allocation space is partitioned into fixed-size zones that must be updated
sequentially, see 'ZONED MODE'
SWAPFILE SUPPORT
~~~~~~~~~~~~~~~~
@ -1044,6 +1046,76 @@ refers to what `xfs_io`(8) provides:
'no dump', same as the attribute
ZONED MODE
----------
Since version 5.12 btrfs supports so-called 'zoned mode'. This is a special
on-disk format and allocation/write strategy that's friendly to zoned devices.
In short, a device is partitioned into fixed-size zones and each zone can only
be written in an append-only manner, or reset. As btrfs has no fixed data
structures except for the super blocks, the zoned mode only requires block
placement that follows the device constraints. You can learn about the whole
architecture at https://zonedstorage.io .
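The append-or-reset rule can be illustrated with a small model of a single
zone and its write pointer. This is a sketch for illustration only: the class
`Zone` and its methods are hypothetical names, and real zone states
(open/closed/full), conventional zones and the ZBC/ZNS commands are ignored.

```python
class Zone:
    """Simplified model of one sequential-write zone (illustrative only)."""

    def __init__(self, start: int, size: int) -> None:
        self.start = start
        self.size = size
        self.write_pointer = start  # next byte offset that may be written

    def append(self, length: int) -> int:
        """Write `length` bytes at the write pointer and return their offset."""
        if self.write_pointer + length > self.start + self.size:
            raise ValueError("zone is full, it must be reset first")
        offset = self.write_pointer
        self.write_pointer += length
        return offset

    def write_at(self, offset: int, length: int) -> int:
        """Accept a write only if it starts exactly at the write pointer."""
        if offset != self.write_pointer:
            raise ValueError("rejected: no in-place or out-of-order writes")
        return self.append(length)

    def reset(self) -> None:
        """Discard the zone contents by rewinding the write pointer."""
        self.write_pointer = self.start


MiB = 1024 * 1024
zone = Zone(start=0, size=256 * MiB)
first = zone.append(4096)    # lands at offset 0
second = zone.append(4096)   # lands at offset 4096
try:
    zone.write_at(first, 4096)  # overwriting the first block in place
    overwrite_rejected = False
except ValueError:
    overwrite_rejected = True
zone.reset()                 # the only way to reuse the space
```

This is why btrfs never rewrites a block in place on a zoned device: every
new version of data or metadata goes to the write pointer of some zone.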
Such devices are also called SMR/ZBC/ZNS, in 'host-managed' mode. Note that
some devices appear as non-zoned but are zoned internally; this is called
'drive-managed' mode and using zoned mode won't help with them.
The zone size depends on the device; typical sizes are 256MiB or 1GiB. In
general it must be a power of two. Emulated zoned devices like 'null_blk' allow
setting various zone sizes.
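One practical consequence of a power-of-two zone size is that mapping a byte
offset to its zone needs only a shift and a mask. A minimal sketch (the
function names are hypothetical, not btrfs code):

```python
from __future__ import annotations


def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...: exactly one bit is set."""
    return n > 0 and (n & (n - 1)) == 0


def zone_of(offset: int, zone_size: int) -> tuple[int, int]:
    """Return (zone number, zone start) for a byte offset on the device."""
    if not is_power_of_two(zone_size):
        raise ValueError(f"zone size {zone_size} is not a power of two")
    shift = zone_size.bit_length() - 1      # log2(zone_size)
    zone_number = offset >> shift           # same as offset // zone_size
    zone_start = offset & ~(zone_size - 1)  # round down to the zone boundary
    return zone_number, zone_start


MiB = 1024 ** 2
GiB = 1024 ** 3
print(zone_of(1 * GiB + 5, 256 * MiB))  # (4, 1073741824)
print(is_power_of_two(96 * MiB))        # False
```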
REQUIREMENTS, LIMITATIONS
~~~~~~~~~~~~~~~~~~~~~~~~~
* all devices must have the same zone size
* maximum zone size is 8GiB
* mixing zoned and non-zoned devices is possible (the zone writes are
emulated), but this is mainly intended for testing
* the super block is handled in a special way and is at different locations
than on a non-zoned filesystem:
** primary: 0B (two consecutive zones starting at this offset)
** secondary: 512GiB (two consecutive zones starting at this offset)
** tertiary: 4TiB (4096GiB, two consecutive zones starting at this offset)
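The zone size requirements above can be checked before creating a
filesystem. The sketch below assumes the zone sizes of the zoned devices are
already known in bytes; `check_zone_sizes` and the device names are
hypothetical, and this is not the check performed by mkfs.btrfs.

```python
from __future__ import annotations

GiB = 1024 ** 3
MiB = 1024 ** 2
MAX_ZONE_SIZE = 8 * GiB  # limit stated in the list above


def check_zone_sizes(zone_sizes: dict[str, int]) -> int:
    """Validate zone sizes (bytes) of zoned devices; return the common size."""
    if not zone_sizes:
        raise ValueError("no zoned devices given")
    sizes = set(zone_sizes.values())
    if len(sizes) != 1:
        raise ValueError(f"devices have different zone sizes: {sorted(sizes)}")
    size = sizes.pop()
    if size <= 0 or size & (size - 1):
        raise ValueError(f"zone size {size} is not a power of two")
    if size > MAX_ZONE_SIZE:
        raise ValueError(f"zone size {size} exceeds the 8GiB maximum")
    return size


ok = check_zone_sizes({"devA": 256 * MiB, "devB": 256 * MiB})
rejected = []
for bad in ({"devA": 256 * MiB, "devB": 1 * GiB}, {"devA": 16 * GiB}):
    try:
        check_zone_sizes(bad)
    except ValueError as err:
        rejected.append(str(err))  # mismatched sizes, then a too large zone
```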
INCOMPATIBLE FEATURES
~~~~~~~~~~~~~~~~~~~~~
The main constraint of zoned devices is the lack of in-place updates of data.
This is inherently incompatible with some features:
* nodatacow - data is overwritten in place, such files cannot be created
* fallocate - preallocated space is written in place by the first write
* mixed-bg - data and metadata writes to the same block group are unordered,
fixing that means using separate data and metadata block groups
* booting - the zone at offset 0 contains the super block, resetting the zone
would destroy the bootloader data
The initial support lacks some features, but they're planned:
* only the single profile is supported
* fstrim - not supported due to its dependency on the free space cache v1
SUPER BLOCK
~~~~~~~~~~~
As said above, the super block is handled in a special way. In order to be
crash safe, at least one zone in a known location must contain a valid super
block. This is implemented as a ring buffer in two consecutive zones, starting
at the known offsets 0B, 512GiB and 4TiB. These offsets differ from the ones
used on non-zoned devices. Each new super block is appended to the current
zone; once that zone is full, writing continues in the other zone, which is
reset first. Looking up the latest super block requires reading the last
written copy in both zones and determining which one was written last.
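The ring buffer can be modelled in a few lines. This is a simplified sketch,
not the kernel implementation: each super block copy is represented only by a
generation number that increases with every write, each zone is assumed to
hold `capacity` copies, and `SbZone`/`SuperBlockLog` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SbZone:
    capacity: int                                         # copies per zone
    generations: list[int] = field(default_factory=list)  # copies written

    def full(self) -> bool:
        return len(self.generations) >= self.capacity


class SuperBlockLog:
    """Two-zone ring buffer of super block copies (simplified model)."""

    def __init__(self, capacity: int) -> None:
        self.zones = [SbZone(capacity), SbZone(capacity)]
        self.current = 0

    def write(self, generation: int) -> None:
        zone = self.zones[self.current]
        if zone.full():
            # Switch to the other zone. It only holds older copies, so it is
            # reset (emptied) before the first append; the full zone keeps
            # the newest valid copy until then.
            self.current ^= 1
            zone = self.zones[self.current]
            zone.generations.clear()
        zone.generations.append(generation)

    def latest(self) -> int | None:
        """Read the last copy of each zone and return the newest one."""
        last = [z.generations[-1] for z in self.zones if z.generations]
        return max(last, default=None)


log = SuperBlockLog(capacity=3)
for generation in range(1, 8):
    log.write(generation)
print(log.zones[0].generations, log.zones[1].generations)  # [7] [4, 5, 6]
print(log.latest())                                        # 7
```

Note how, right after the switch, the second zone still holds generation 6:
if the write of generation 7 were interrupted, a valid copy would remain.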
The amount of space reserved for the super block depends on the zone size.
The secondary and tertiary copies are at distant offsets as the capacity of the
devices is expected to be large, tens of terabytes. The maximum supported zone
size is 8GiB, which would mean that e.g. the range 0-16GiB would be reserved
just for the primary super block on a hypothetical device with that zone size.
This is wasteful but required to guarantee crash safety.
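The reservation follows directly from the offsets and the two zones per copy.
A small calculation, assuming each copy starts exactly at its offset (true for
power-of-two zone sizes up to 8GiB, since 512GiB and 4TiB are multiples of
them); the names are hypothetical:

```python
from __future__ import annotations

MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4

SB_LOG_OFFSETS = (0, 512 * GiB, 4 * TiB)  # primary, secondary, tertiary
SB_LOG_ZONES = 2                           # two consecutive zones per copy


def sb_reserved_ranges(zone_size: int) -> list[tuple[int, int]]:
    """Return the [start, end) byte ranges reserved for the super block copies."""
    return [(off, off + SB_LOG_ZONES * zone_size) for off in SB_LOG_OFFSETS]


for zone_size in (256 * MiB, 1 * GiB, 8 * GiB):
    ranges = sb_reserved_ranges(zone_size)
    total = sum(end - start for start, end in ranges)
    print(f"zone size {zone_size // MiB}MiB: primary 0-{ranges[0][1] // MiB}MiB, "
          f"{total // MiB}MiB reserved in total")
```

For the three zone sizes this reports 1536MiB, 6144MiB and 49152MiB reserved
in total; with 8GiB zones the primary copy alone occupies 0-16384MiB, i.e. the
0-16GiB range mentioned above.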
CONTROL DEVICE
--------------


@ -242,6 +242,14 @@ reduced-size metadata for extent references, saves a few percent of metadata
improved representation of file extents where holes are not explicitly
stored as an extent, saves a few percent of metadata if sparse files are used
*zoned*::
(kernel support since 5.12)
+
zoned mode, data allocation and write friendly to zoned/SMR/ZBC/ZNS devices,
see 'ZONED MODE' in `btrfs`(5); the mode is selected automatically when
a zoned device is detected
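On Linux, whether a block device is zoned can be read from sysfs: the
attribute `queue/zoned` contains `none`, `host-aware` or `host-managed`, and
for zoned devices `queue/chunk_sectors` gives the zone size in 512-byte
sectors. The sketch below reads these attributes; it is not the detection code
of mkfs.btrfs, and the device name `nullb0` is only an example. The demo runs
against a fake sysfs tree so it works without a zoned device.

```python
import tempfile
from pathlib import Path

SYSFS_BLOCK = Path("/sys/block")


def zoned_model(dev: str, sysfs: Path = SYSFS_BLOCK) -> str:
    """Zone model from <sysfs>/<dev>/queue/zoned: none, host-aware or host-managed."""
    return (sysfs / dev / "queue" / "zoned").read_text().strip()


def zone_size_bytes(dev: str, sysfs: Path = SYSFS_BLOCK) -> int:
    """For zoned devices queue/chunk_sectors is the zone size in 512-byte sectors."""
    return int((sysfs / dev / "queue" / "chunk_sectors").read_text()) * 512


# Fake sysfs tree; on a real system call e.g. zoned_model("nullb0") directly.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    queue = root / "nullb0" / "queue"
    queue.mkdir(parents=True)
    (queue / "zoned").write_text("host-managed\n")
    (queue / "chunk_sectors").write_text("524288\n")
    model = zoned_model("nullb0", sysfs=root)
    size = zone_size_bytes("nullb0", sysfs=root)

print(model, size // (1024 * 1024), "MiB")  # host-managed 256 MiB
```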
RUNTIME FEATURES
----------------