btrfs-progs: docs: add section about zoned devices
Signed-off-by: David Sterba <dsterba@suse.com>
parent
4693e82261
commit
6710641ad5
|
@ -18,6 +18,7 @@ tools. Currently covers:
|
||||||
. filesystem limits
|
. filesystem limits
|
||||||
. bootloader support
|
. bootloader support
|
||||||
. file attributes
|
. file attributes
|
||||||
|
. zoned mode
|
||||||
. control device
|
. control device
|
||||||
. filesystems with multiple block group profiles
|
. filesystems with multiple block group profiles
|
||||||
. seeding device
|
. seeding device
|
||||||
|
@ -668,8 +669,9 @@ kernel, see `btrfs`(5)
|
||||||
*zoned*::
|
*zoned*::
|
||||||
(since: 5.12)
|
(since: 5.12)
|
||||||
+
|
+
|
||||||
zoned mode is allocation/write friendly to host-managed devices, allocation
|
zoned mode is allocation/write friendly to host-managed zoned devices,
|
||||||
space is split into fixed-size zones that must be updated sequentially
|
allocation space is partitioned into fixed-size zones that must be updated
|
||||||
|
sequentially, see 'ZONED MODE'
|
||||||
|
|
||||||
SWAPFILE SUPPORT
|
SWAPFILE SUPPORT
|
||||||
~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~
|
||||||
|
@ -1044,6 +1046,76 @@ refers to what `xfs_io`(8) provides:
|
||||||
'no dump', same as the attribute
|
'no dump', same as the attribute
|
||||||
|
|
||||||
|
|
||||||
|
ZONED MODE
|
||||||
|
----------
|
||||||
|
|
||||||
|
Since version 5.12 btrfs supports so-called 'zoned mode'. This is a special
|
||||||
|
on-disk format and allocation/write strategy that's friendly to zoned devices.
|
||||||
|
In short, a device is partitioned into fixed-size zones and each zone can be
|
||||||
|
updated in an append-only manner, or reset. As btrfs has no fixed data structures,
|
||||||
|
except the super blocks, the zoned mode only requires block placement that
|
||||||
|
follows the device constraints. You can learn about the whole architecture at
|
||||||
|
https://zonedstorage.io .
|
||||||
|
|
||||||
|
The devices are also called SMR/ZBC/ZNS, in 'host-managed' mode. Note that
|
||||||
|
there are devices that appear as non-zoned but actually are zoned; this is
|
||||||
|
'drive-managed' and using zoned mode won't help.
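On Linux, the zone model a block device reports can be read from the `queue/zoned` sysfs attribute. The following is a minimal sketch of such a check; the helper names `zoned_model` and `needs_zoned_mode` and the `sysfs_root` parameter are illustrative assumptions, not part of btrfs-progs. A drive-managed device hides its zones from the host, so it is expected to report `none` here.

```python
from pathlib import Path


def zoned_model(disk: str, sysfs_root: str = "/sys/block") -> str:
    """Return the zone model string the kernel reports for a disk, e.g. 'sda'."""
    attr = Path(sysfs_root) / disk / "queue" / "zoned"
    return attr.read_text().strip()


def needs_zoned_mode(model: str) -> bool:
    # Per the text above, zoned mode targets devices in host-managed mode.
    return model == "host-managed"
```

Calling `needs_zoned_mode(zoned_model("sda"))` would then tell whether the (hypothetical) disk `sda` is a candidate for zoned mode.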
|
||||||
|
|
||||||
|
The zone size depends on the device, typical sizes are 256MiB or 1GiB. In
|
||||||
|
general it must be a power of two. Emulated zoned devices like 'null_blk' allow
|
||||||
|
various zone sizes to be set.
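The power-of-two requirement keeps the mapping from a byte offset to a zone cheap. A small sketch of that arithmetic follows; the function names are illustrative and not taken from btrfs.

```python
MIB = 1024 ** 2


def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


def zone_of(offset: int, zone_size: int) -> int:
    """Return the index of the zone that contains the given byte offset."""
    if not is_power_of_two(zone_size):
        raise ValueError("zone size must be a power of two")
    # With a power-of-two size the division reduces to a right shift.
    return offset >> (zone_size.bit_length() - 1)


zone_size = 256 * MIB
print(zone_of(0, zone_size))           # 0
print(zone_of(1024 * MIB, zone_size))  # 4
```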
|
||||||
|
|
||||||
|
REQUIREMENTS, LIMITATIONS
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
* all devices must have the same zone size
|
||||||
|
* maximum zone size is 8GiB
|
||||||
|
* mixing zoned and non-zoned devices is possible, the zone writes are emulated,
|
||||||
|
but this is mainly for testing
|
||||||
|
* the super block is handled in a special way and is at different locations
|
||||||
|
than on a non-zoned filesystem:
|
||||||
|
* primary: 0B (and the next two zones)
|
||||||
|
* secondary: 512GiB (and the next two zones)
|
||||||
|
* tertiary: 4TiB (4096GiB, and the next two zones)
|
||||||
|
|
||||||
|
INCOMPATIBLE FEATURES
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The main constraint of zoned devices is the lack of in-place updates of data.
|
||||||
|
This is inherently incompatible with some features:
|
||||||
|
|
||||||
|
* nodatacow - overwrite in-place, cannot create such files
|
||||||
|
* fallocate - preallocating space for in-place first write
|
||||||
|
* mixed-bg - unordered writes to data and metadata, fixing that means using
|
||||||
|
separate data and metadata block groups
|
||||||
|
* booting - the zone at offset 0 contains superblock, resetting the zone would
|
||||||
|
destroy the bootloader data
|
||||||
|
|
||||||
|
Initial support lacks some features but they're planned:
|
||||||
|
|
||||||
|
* only single profile is supported
|
||||||
|
* fstrim - due to dependency on free space cache v1
|
||||||
|
|
||||||
|
SUPER BLOCK
|
||||||
|
~~~~~~~~~~~
|
||||||
|
|
||||||
|
As said above, the super block is handled in a special way. In order to be crash
|
||||||
|
safe, at least one zone in a known location must contain a valid superblock.
|
||||||
|
This is implemented as a ring buffer in two consecutive zones, starting from
|
||||||
|
known offsets 0, 512GiB and 4TiB. The values are different than on non-zoned
|
||||||
|
devices. Each new super block is appended to the end of the zone; once it's
|
||||||
|
filled, the zone is reset and writes continue to the next one. Looking up the
|
||||||
|
latest super block needs to read offsets of both zones and determine the last
|
||||||
|
written version.
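The ring buffer can be illustrated with a toy model. This is only a sketch under stated assumptions, not the kernel implementation: each zone is a list that holds a fixed number of super block copies (`slots`), a copy is represented by its generation number, and the reset is applied to the zone about to be reused so that the full zone keeps a valid copy until the next write lands. The class name `SuperBlockLog` is hypothetical.

```python
class SuperBlockLog:
    """Toy model of a super block ring buffer in two consecutive zones."""

    def __init__(self, slots: int):
        self.slots = slots        # copies that fit into one zone (assumed)
        self.zones = [[], []]     # each list only grows by appending
        self.active = 0

    def write(self, generation: int) -> None:
        if len(self.zones[self.active]) == self.slots:
            # Active zone is full: switch to the other zone and reset it
            # before writing the new copy there.
            self.active ^= 1
            self.zones[self.active].clear()
        self.zones[self.active].append(generation)

    def latest(self):
        # Look at the last written copy of both zones, keep the newer one.
        tails = [zone[-1] for zone in self.zones if zone]
        return max(tails) if tails else None


log = SuperBlockLog(slots=2)
for generation in range(1, 6):
    log.write(generation)
print(log.zones)     # [[5], [3, 4]]
print(log.latest())  # 5
```

At every point at least one zone holds a complete copy, which is the property the crash safety argument relies on.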
|
||||||
|
|
||||||
|
The amount of space reserved for the super block depends on the zone size. The
|
||||||
|
secondary and tertiary copies are at distant offsets as the capacity of the
|
||||||
|
devices is expected to be large, tens of terabytes. Maximum zone size supported
|
||||||
|
is 8GiB, which would mean that e.g. offset 0-16GiB would be reserved just for
|
||||||
|
the super block on a hypothetical device of that zone size. This is wasteful
|
||||||
|
but required to guarantee crash safety.
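The reserved ranges can be worked out from the zone size. The sketch below assumes two zones per copy, as in the ring buffer description, and the three start offsets named in the text; the constant and function names are illustrative.

```python
GIB = 1024 ** 3

# Start offsets of the primary, secondary and tertiary copy (from the text).
SB_OFFSETS = [0, 512 * GIB, 4096 * GIB]
# Assumption: two consecutive zones per copy, as in the ring buffer description.
SB_ZONES = 2


def reserved_ranges(zone_size: int):
    """Return (start, end) byte ranges reserved for each super block copy."""
    return [(start, start + SB_ZONES * zone_size) for start in SB_OFFSETS]


for start, end in reserved_ranges(8 * GIB):
    print(f"{start // GIB}GiB - {end // GIB}GiB")
# 0GiB - 16GiB
# 512GiB - 528GiB
# 4096GiB - 4112GiB
```

With the more typical 256MiB zone size the same calculation gives 512MiB per copy.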
|
||||||
|
|
||||||
|
|
||||||
CONTROL DEVICE
|
CONTROL DEVICE
|
||||||
--------------
|
--------------
|
||||||
|
|
||||||
|
|
|
@ -242,6 +242,14 @@ reduced-size metadata for extent references, saves a few percent of metadata
|
||||||
improved representation of file extents where holes are not explicitly
|
improved representation of file extents where holes are not explicitly
|
||||||
stored as an extent, saves a few percent of metadata if sparse files are used
|
stored as an extent, saves a few percent of metadata if sparse files are used
|
||||||
|
|
||||||
|
*zoned*::
|
||||||
|
(kernel support since 5.12)
|
||||||
|
+
|
||||||
|
zoned mode, data allocation and write friendly to zoned/SMR/ZBC/ZNS devices,
|
||||||
|
see 'ZONED MODE' in `btrfs`(5), the mode is automatically selected when
|
||||||
|
a zoned device is detected
|
||||||
|
|
||||||
|
|
||||||
RUNTIME FEATURES
|
RUNTIME FEATURES
|
||||||
----------------
|
----------------
|
||||||
|
|
||||||
|
|