btrfs-progs: docs: updates, clarifications

Update spelling, add notable kernel/version features or updates.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-12-14 02:56:16 +01:00 · 2022-12-14 02:56:16 +01:00 · 8cec98cb75
parent 719b5a592f
commit 8cec98cb75
9 changed files with 160 additions and 28 deletions
--- a/Documentation/Auto-repair.rst
+++ b/Documentation/Auto-repair.rst
@@ -2,7 +2,13 @@ Auto-repair on read
 ===================

 Data or metadata that are found to be damaged (e.g. because the checksum does
-not match) at the time they're read from the device can be salvaged in case the
+not match) at the time they're read from a device can be salvaged in case the
 filesystem has another valid copy when using block group profile with redundancy
-(DUP, RAID1, RAID5/6). The correct data are returned to the user application
-and the damaged copy is replaced by it.
+(DUP, RAID1-like, RAID5/6). The correct data are returned to the user application
+and the damaged copy is replaced by it. When this happens, a message is emitted
+to the system log.
+
+If there are more copies of data and one of them is damaged but not read by
+the user application, the damage is not detected. To verify all data and
+metadata copies, run :doc:`scrub<Scrub>`, which must be started manually;
+damaged copies are repaired automatically in that case as well.
--- a/Documentation/Custom-ioctls.rst
+++ b/Documentation/Custom-ioctls.rst
@@ -6,14 +6,15 @@ call interface to let user applications access the advanced features. They're
 low level and the following list gives only an overview of the capabilities or
 a command if available:

-- reverse lookup, from file offset to inode, ``btrfs inspect-internal
-  logical-resolve``
+- reverse lookup, from file offset to inode, as command ``btrfs
+  inspect-internal logical-resolve``

-- resolve inode number to list of name, ``btrfs inspect-internal inode-resolve``
+- resolve inode number to list of names, as command ``btrfs inspect-internal inode-resolve``

 - tree search, given a key range and tree id, lookup and return all b-tree items
  found in that range, basically all metadata at your hand but you need to know
-  what to do with them
+  what to do with them; the ioctl is privileged as it has full access to all
+  filesystem metadata

 - informative, about devices, space allocation or the whole filesystem, many of
  which is also exported in ``/sys/fs/btrfs``
--- a/Documentation/Deduplication.rst
+++ b/Documentation/Deduplication.rst
@@ -41,8 +41,8 @@ File based deduplication
 ------------------------

 The tool takes a list of files and tries to find duplicates among data only
-from that files. This is suitable e.g. for files that originated from the same
-base image, source of a reflinked file. Optionally the tools could track a
+from these files. This is suitable e.g. for files that originated from the same
+base image, source of a reflinked file. Optionally the tool could track a
 database of hashes and allow to deduplicate blocks from more files, or use that
 for repeated runs and update the database incrementally.

@@ -53,7 +53,7 @@ The tool typically scans the filesystem and builds a database of file block
 hashes, then finds candidate files and deduplicates the ranges. The hash
 database is kept as an ordinary file and can be scaled according to the needs.

-As the file changes, the hash database may get out of sync and the scan has to
+As the files change, the hash database may get out of sync and the scan has to
 be done repeatedly.

 Safety of block comparison
@@ -64,12 +64,13 @@ a source file, destination file and the range. The blocks from both files are
 compared for exact match before merging to the same range (i.e. there's no
 hash based comparison). Pages representing the extents in memory are locked
 prior to deduplication and prevent concurrent modification by buffered writes
-or mmaped writes.
+or mmaped writes. Blocks are compared byte by byte and not using any hash-based
+approach, i.e. the existing checksums are not used.

 Limitations, compatibility
 --------------------------

-Files that are subject do deduplication must have the same status regarding
+Files that are subject to deduplication must have the same status regarding
 COW, i.e. both regular COW files with checksums, or both NOCOW, or files that
 are COW but don't have checksums (NODATASUM attribute is set).

--- a/Documentation/Feature-by-version.rst
+++ b/Documentation/Feature-by-version.rst
@@ -384,10 +384,10 @@ features see [[Status]] page.

 5.17 - send and relocation
        Send and relocation (balance, device remove, shrink, block group
-        reclaim) can now work in parallel
+        reclaim) can now work in parallel.

 5.17 - device add vs balance
-        It is possible to add a device with paused balance
+        It is possible to add a device with paused balance.

        .. note::
           Since kernel 5.17.7 and btrfs-progs 5.17.1
@@ -414,11 +414,116 @@ features see [[Status]] page.
        the VFS limitation to reflink files on separate subvolume mounts of the
        same filesystem has been removed

+5.18 - syslog error messages with filesystem state
+        Messages are printed with a one letter tag ("state: X") that denotes in
+        which state the filesystem was at this point:
+
+        * A - transaction aborted (permanent)
+        * E - filesystem error (permanent)
+        * M - remount in progress (transient)
+        * R - device replace in progress (transient)
+        * C - checksum checks disabled by mount option (rescue=ignoredatacsums)
+        * L - log tree replay did not complete due to some error
+
+5.18 - tree-checker verifies transaction id pre-write
+        Metadata buffer to be written gets an extra check if the stored
+        transaction number matches the current state of the filesystem.
+
 5.19 - subpage support pages > 4KiB
        Metadata node size is supported regardless of the CPU page size
-        (minimum size is 4KiB), data sectorsize is supported <= page size.
+        (minimum size is 4KiB), data sector size is supported <= page size.
        Additionally subpage also supports RAID56.

 5.19 - per-type background threshold for reclaim
        Add sysfs tunable for background reclaim threshold for all block group
        types (data, metadata, system).
+
+5.19 - automatically repair device number mismatch
+        Device information is stored in two places, the number in the super
+        block and items in the device tree. When this goes out of sync, e.g.
+        by device removal shortly before unmount, the next mount could fail.
+        The b-tree is the authoritative information and can be used to
+        override the stale value in the superblock.
+
+5.19 - defrag can convert inline files to regular ones
+        The logic has been changed so that inline files are considered for
+        defragmentation even if the mount option max_inline would prevent that.
+        No defragmentation might happen but the inlined files are not skipped.
+
+5.19 - explicit minimum zone size is 4MiB
+        Set the minimum zone size on zoned devices to 4MiB. Zones on real
+        devices are much larger; this limit is for emulated devices.
+
+5.19 - sysfs tunable for automatic block group reclaim
+        Add possibility to set a threshold to automatically reclaim block groups
+        also in non-zoned mode. By default completely empty block groups are
+        reclaimed automatically but the threshold can be tuned in
+        /sys/fs/btrfs/FSID/allocation/PROFILE/bg_reclaim_threshold .
+
+5.19 - tree-checker verifies metadata block ownership
+        Additional check done by tree-checker to verify relationship between a
+        tree block and its tree root owner.
+
+6.x
+---
+
+6.0 - send protocol v2
+        Send protocol update that adds new commands and extends existing
+        functionality to write large data chunks. Compressed (and encrypted)
+        extents can be optionally emitted and transferred as-is without the need
+        to recompress (or reencrypt) on the receiving side.
+
+6.0 - sysfs exports commit stats
+        The file /sys/fs/btrfs/FSID/commit_stats shows number of commits and
+        various time related statistics.
+
+6.0 - sysfs exports chunk sizes
+        Chunk size value can be read from
+        /sys/fs/btrfs/FSID/allocation/PROFILE/chunk_size .
+
+6.0 - sysfs shows zoned mode among features
+        The zoned mode has been supported since 5.10 and has been gaining
+        functionality since. Now it's advertised among features.
+
+6.0 - checksum implementation is logged at mount time
+        When a filesystem is mounted the implementation backing the checksums
+        is logged. The information is also accessible in
+        /sys/fs/btrfs/FSID/checksum .
+
+6.1 - sysfs support to temporarily skip exact qgroup accounting
+        Allow the user to override qgroup accounting and make it temporarily
+        out of date, e.g. when several subvolumes are deleted and the qgroup
+        numbers need to be updated at some cost; a single update afterwards
+        can amortize the costs.
+
+6.1 - scrub also repairs superblock
+        An improvement to scrub: if the superblock is detected to be
+        corrupted, the repair happens immediately. Previously it was delayed
+        for performance reasons until the next transaction commit, which would
+        eventually store an updated and correct copy.
+
+6.1 - block group tree
+        An incompatible change that has to be enabled at mkfs time. Add a new
+        b-tree item that stores information about block groups in a compact way
+        that significantly improves mount time that's usually long due to
+        fragmentation and scattered b-tree items tracking the individual block
+        groups. Requires and also enables the free-space-tree and no-holes
+        features.
+
+6.1 - discard stats available in sysfs
+        The directory '/sys/fs/btrfs/FSID/discard' exports statistics and
+        tunables related to discard.
+
+6.1 - additional qgroup stats in sysfs
+        The overall status of qgroups is exported in
+        /sys/fs/btrfs/FSID/qgroups/ .
+
+6.1 - check that superblock is unchanged at thaw time
+        Do a full check of the superblock once a filesystem is thawed. This
+        happens namely when the system resumes from suspend or hibernation.
+        Accidental changes by other operating systems will be detected.
+
+6.2 - discard=async on by default
+        Devices that support trim/discard will enable the asynchronous discard
+        for the whole filesystem.
+
--- a/Documentation/Glossary.rst
+++ b/Documentation/Glossary.rst
@@ -19,7 +19,7 @@ balance
 	again. It is primarily intended to rebalance the data in the filesystem
 	across the *devices* when a device is added or removed. A balance
 	will regenerate missing copies for the redundant *RAID* levels, if a
-	device has failed. As of linux kernel 3.3, a balance operation can be
+	device has failed. As of Linux kernel 3.3, a balance operation can be
 	made selective about which parts of the filesystem are rewritten.

 barrier
--- a/Documentation/btrfs-ioctl.rst
+++ b/Documentation/btrfs-ioctl.rst
@@ -163,7 +163,7 @@ fd
    ignored
 name
    name of the subvolume, although the buffer can be almost 4k, the file
-    size is limited by linux VFS to 255 characters and must not contain a slash
+    size is limited by Linux VFS to 255 characters and must not contain a slash
    ('/')

 BTRFS_IOC_SUBVOL_CREATE_V2
@@ -190,7 +190,7 @@ qgroup_inherit
    ...
 name
    name of the subvolume, although the buffer can be almost 4k, the file size
-    is limited by linux VFS to 255 characters and must not contain a slash ('/')
+    is limited by Linux VFS to 255 characters and must not contain a slash ('/')
 devid
    ...

--- a/Documentation/ch-bootloaders.rst
+++ b/Documentation/ch-bootloaders.rst
@@ -4,8 +4,11 @@ booting from BTRFS with respect to features.
 U-boot (https://www.denx.de/wiki/U-Boot/) has decent support for booting but
 not all BTRFS features are implemented, check the documentation.

-EXTLINUX (from the https://syslinux.org project) can boot but does not support
-all features. Please check the upstream documentation before you use it.
+EXTLINUX (from the https://syslinux.org project) has limited support for BTRFS
+boot and hasn't been updated for a long time, so it is not recommended as a
+bootloader.

-The first 1MiB on each device is unused with the exception of primary
-superblock that is on the offset 64KiB and spans 4KiB.
+In general, the first 1MiB on each device is unused with the exception of the
+primary superblock that is at offset 64KiB and spans 4KiB. The rest can be
+freely used by bootloaders or for other system information. Note that booting
+from a filesystem on a :doc:`zoned device<Zoned-mode>` is not supported.
--- a/Documentation/ch-fs-limits.rst
+++ b/Documentation/ch-fs-limits.rst
@@ -1,6 +1,9 @@
 maximum file name length
        255

+        This limit is imposed by Linux VFS, the structures of BTRFS could store
+        larger file names.
+
 maximum symlink target length
        depends on the *nodesize* value, for 4KiB it's 3949 bytes, for larger nodesize
        it's 4095 due to the system limit PATH_MAX
@@ -13,16 +16,29 @@ maximum number of inodes
        2\ :sup:`64` but depends on the available metadata space as the inodes are created
        dynamically

+        Each subvolume is an independent namespace of inodes and thus their
+        numbers, so the limit is per subvolume, not for the whole filesystem.
+
 inode numbers
-        minimum number: 256 (for subvolumes), regular files and directories: 257
+        minimum number: 256 (for subvolumes), regular files and directories: 257,
+        maximum number: (2\ :sup:`64` - 256)
+
+        The inode numbers that can be assigned to user created files are from
+        the whole 64bit space except the first 256 and last 256 numbers in
+        that range, which are reserved for internal b-tree identifiers.

 maximum file length
-        inherent limit of btrfs is 2\ :sup:`64` (16 EiB) but the linux VFS limit is 2\ :sup:`63` (8 EiB)
+        inherent limit of BTRFS is 2\ :sup:`64` (16 EiB) but the practical
+        limit of Linux VFS is 2\ :sup:`63` (8 EiB)

 maximum number of subvolumes
-        the subvolume ids can go up to 2\ :sup:`64` but the number of actual subvolumes
-        depends on the available metadata space, the space consumed by all subvolume
-        metadata includes bookkeeping of shared extents can be large (MiB, GiB)
+        the subvolume ids can go up to 2\ :sup:`48` but the number of actual subvolumes
+        depends on the available metadata space
+
+        The space consumed by all subvolume metadata, which includes
+        bookkeeping of shared extents, can be large (MiB, GiB). The range is
+        not the full 64bit range because of qgroups that use the upper 16 bits
+        for other purposes.

 maximum number of hardlinks of a file in a directory
        65536 when the *extref* feature is turned on during mkfs (default), roughly
--- a/Documentation/ch-swapfile.rst
+++ b/Documentation/ch-swapfile.rst
@@ -1,6 +1,6 @@
 A swapfile is file-backed memory that the system uses to temporarily offload
 the RAM.  It is supported since kernel 5.0. Use ``swapon(8)`` to activate the
-swapfile. There are some limitations of the implementation in BTRFS and linux
+swapfile. There are some limitations of the implementation in BTRFS and Linux
 swap subsystem:

 * filesystem - must be only single device
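
As a reading aid for the limits added to ch-fs-limits.rst in this commit, the inode number range and the 48-bit subvolume id range can be sketched in Python. This is only an illustration, not code from btrfs-progs: the constant and function names are hypothetical, the maximum inode number is taken as stated in the text (2^64 - 256), and the assumption that the upper 16 bits of a qgroup id carry the qgroup level is an interpretation of "another purposes" in the text.

```python
# Illustrative sketch only; names below are hypothetical, not btrfs-progs API.

FIRST_USER_INO = 256            # minimum inode number stated in the text
LAST_USER_INO = 2**64 - 256     # maximum inode number stated in the text
SUBVOL_ID_BITS = 48             # subvolume ids are limited to 48 bits
LEVEL_BITS = 16                 # assumption: upper 16 bits hold the qgroup level


def is_user_inode_number(ino: int) -> bool:
    """Return True if ino falls in the range usable for user created files."""
    return FIRST_USER_INO <= ino <= LAST_USER_INO


def make_qgroupid(level: int, subvolid: int) -> int:
    """Pack a level and a subvolume id into one 64-bit qgroup id."""
    if not 0 <= level < 2**LEVEL_BITS:
        raise ValueError("level does not fit into 16 bits")
    if not 0 <= subvolid < 2**SUBVOL_ID_BITS:
        raise ValueError("subvolume id does not fit into 48 bits")
    return (level << SUBVOL_ID_BITS) | subvolid


def split_qgroupid(qgroupid: int) -> tuple[int, int]:
    """Split a 64-bit qgroup id back into (level, subvolume id)."""
    return qgroupid >> SUBVOL_ID_BITS, qgroupid & (2**SUBVOL_ID_BITS - 1)
```

Under these assumptions, a subvolume id above 2^48 - 1 would collide with the bits reserved for the level, which is why the packing function rejects it.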