btrfs-progs: docs: updates, clarifications

Update spelling, add notable kernel/version features or updates.

Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba 2022-12-14 02:56:16 +01:00
parent 719b5a592f
commit 8cec98cb75
9 changed files with 160 additions and 28 deletions

@ -2,7 +2,13 @@ Auto-repair on read
===================
Data or metadata that are found to be damaged (e.g. because the checksum does
not match) at the time they're read from the device can be salvaged in case the
not match) at the time they're read from a device can be salvaged in case the
filesystem has another valid copy when using block group profile with redundancy
(DUP, RAID1, RAID5/6). The correct data are returned to the user application
and the damaged copy is replaced by it.
(DUP, RAID1-like, RAID5/6). The correct data are returned to the user application
and the damaged copy is replaced by it. When this happens, a message is emitted
to the system log.
If there are more copies of data and one of them is damaged but not read by
the user application, the damage is not detected. To verify all data and metadata
copies there's :doc:`scrub<Scrub>`, which needs to be started manually; automatic
repairs happen in that case.

@ -6,14 +6,15 @@ call interface to let user applications access the advanced features. They're
low level and the following list gives only an overview of the capabilities or
a command if available:
- reverse lookup, from file offset to inode, ``btrfs inspect-internal
logical-resolve``
- reverse lookup, from file offset to inode, as command ``btrfs
inspect-internal logical-resolve``
- resolve inode number to list of name, ``btrfs inspect-internal inode-resolve``
- resolve inode number to list of names, as command ``btrfs inspect-internal inode-resolve``
- tree search, given a key range and tree id, lookup and return all b-tree items
found in that range, basically all metadata at your hand but you need to know
what to do with them
what to do with them, the ioctl is privileged as it has full access to all
filesystem metadata
- informative, about devices, space allocation or the whole filesystem, many of
which are also exported in ``/sys/fs/btrfs``

@ -41,8 +41,8 @@ File based deduplication
------------------------
The tool takes a list of files and tries to find duplicates among data only
from that files. This is suitable e.g. for files that originated from the same
base image, source of a reflinked file. Optionally the tools could track a
from these files. This is suitable e.g. for files that originated from the same
base image, source of a reflinked file. Optionally the tool could track a
database of hashes and allow to deduplicate blocks from more files, or use that
for repeated runs and update the database incrementally.
@ -53,7 +53,7 @@ The tool typically scans the filesystem and builds a database of file block
hashes, then finds candidate files and deduplicates the ranges. The hash
database is kept as an ordinary file and can be scaled according to the needs.
As the file changes, the hash database may get out of sync and the scan has to
As the files change, the hash database may get out of sync and the scan has to
be done repeatedly.
Safety of block comparison
@ -64,12 +64,13 @@ a source file, destination file and the range. The blocks from both files are
compared for exact match before merging to the same range (i.e. there's no
hash based comparison). Pages representing the extents in memory are locked
prior to deduplication and prevent concurrent modification by buffered writes
or mmaped writes.
or mmaped writes. Blocks are compared byte by byte and not using any hash-based
approach, i.e. the existing checksums are not used.
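The byte-by-byte comparison described above can be illustrated in user space. This is only a sketch of the idea, not the kernel implementation: the function name ``ranges_identical`` and the ``chunk`` parameter are hypothetical, and no page locking is done here, so it does not give the same safety guarantees against concurrent writes.

```python
def ranges_identical(src_path, src_off, dst_path, dst_off, length, chunk=65536):
    """Return True if both file ranges hold exactly the same bytes.

    Hypothetical helper: compares raw bytes, no hashes or checksums involved.
    """
    with open(src_path, "rb") as src, open(dst_path, "rb") as dst:
        src.seek(src_off)
        dst.seek(dst_off)
        remaining = length
        while remaining > 0:
            n = min(chunk, remaining)
            a = src.read(n)
            b = dst.read(n)
            # A short read means the range runs past end of file: not a match.
            if len(a) != n or a != b:
                return False
            remaining -= n
    return True
```

A deduplication tool would only ask the filesystem to merge the two ranges when such a comparison succeeds; the kernel repeats the check itself under page locks.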
Limitations, compatibility
--------------------------
Files that are subject do deduplication must have the same status regarding
Files that are subject to deduplication must have the same status regarding
COW, i.e. both regular COW files with checksums, or both NOCOW, or files that
are COW but don't have checksums (NODATASUM attribute is set).

@ -384,10 +384,10 @@ features see [[Status]] page.
5.17 - send and relocation
Send and relocation (balance, device remove, shrink, block group
reclaim) can now work in parallel
reclaim) can now work in parallel.
5.17 - device add vs balance
It is possible to add a device with paused balance
It is possible to add a device with paused balance.
.. note::
Since kernel 5.17.7 and btrfs-progs 5.17.1
@ -414,11 +414,116 @@ features see [[Status]] page.
the VFS limitation to reflink files on separate subvolume mounts of the
same filesystem has been removed
5.18 - syslog error messages with filesystem state
Messages are printed with a one letter tag ("state: X") that denotes in
which state the filesystem was at this point:
* A - transaction aborted (permanent)
* E - filesystem error (permanent)
* M - remount in progress (transient)
* R - device replace in progress (transient)
* C - checksum checks disabled by mount option (rescue=ignoredatacsums)
* L - log tree replay did not complete due to some error
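A small lookup table makes the tags above easier to read when scanning logs. The sketch below is an illustration only: the names ``STATE_TAGS`` and ``describe_state`` are hypothetical, and it assumes that several letters may appear together in one tag string, which the list above does not state explicitly.

```python
# Descriptions copied from the list of state tags above.
STATE_TAGS = {
    "A": "transaction aborted (permanent)",
    "E": "filesystem error (permanent)",
    "M": "remount in progress (transient)",
    "R": "device replace in progress (transient)",
    "C": "checksum checks disabled by mount option (rescue=ignoredatacsums)",
    "L": "log tree replay did not complete due to some error",
}


def describe_state(tags):
    """Translate a string of one-letter state tags into descriptions.

    Assumption: 'tags' is just the letters, e.g. "EA"; unknown letters
    are reported rather than silently dropped.
    """
    return [STATE_TAGS.get(t, "unknown tag %r" % t) for t in tags]
```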
5.18 - tree-checker verifies transaction id pre-write
Metadata buffer to be written gets an extra check if the stored
transaction number matches the current state of the filesystem.
5.19 - subpage support pages > 4KiB
Metadata node size is supported regardless of the CPU page size
(minimum size is 4KiB), data sectorsize is supported <= page size.
(minimum size is 4KiB), data sector size is supported <= page size.
Additionally subpage also supports RAID56.
5.19 - per-type background threshold for reclaim
Add sysfs tunable for background reclaim threshold for all block group
types (data, metadata, system).
5.19 - automatically repair device number mismatch
Device information is stored in two places, the number in the super
block and items in the device tree. When these go out of sync, e.g.
by device removal shortly before unmount, the next mount could fail.
The b-tree is the authoritative information and can be used to override
the stale value in the superblock.
5.19 - defrag can convert inline files to regular ones
The logic has been changed so that inline files are considered for
defragmentation even if the mount option max_inline would prevent that.
No defragmentation might happen but the inlined files are not skipped.
5.19 - explicit minimum zone size is 4MiB
Set the minimum limit of zone size on zoned devices to 4MiB. Zones on real
devices are much larger, this limit is for emulated devices.
5.19 - sysfs tunable for automatic block group reclaim
Add possibility to set a threshold to automatically reclaim block groups
also in non-zoned mode. By default completely empty block groups are
reclaimed automatically but the threshold can be tuned in
/sys/fs/btrfs/FSID/allocation/PROFILE/bg_reclaim_threshold .
5.19 - tree-checker verifies metadata block ownership
Additional check done by tree-checker to verify relationship between a
tree block and its tree root owner.
6.x
---
6.0 - send protocol v2
Send protocol update that adds new commands and extends existing
functionality to write large data chunks. Compressed (and encrypted)
extents can be optionally emitted and transferred as-is without the need
to recompress (or reencrypt) on the receiving side.
6.0 - sysfs exports commit stats
The file /sys/fs/btrfs/FSID/commit_stats shows number of commits and
various time related statistics.
6.0 - sysfs exports chunk sizes
Chunk size value can be read from
/sys/fs/btrfs/FSID/allocation/PROFILE/chunk_size .
6.0 - sysfs shows zoned mode among features
The zoned mode has been supported since 5.10 and has been gaining functionality.
Now it's advertised among features.
6.0 - checksum implementation is logged at mount time
When a filesystem is mounted the implementation backing the checksums
is logged. The information is also accessible in
/sys/fs/btrfs/FSID/checksum .
6.1 - sysfs support to temporarily skip exact qgroup accounting
Allow user override of qgroup accounting and make it temporarily out
of date, e.g. when several subvolumes are deleted and the qgroup numbers
would need to be updated at some cost each time; a single update afterwards
can amortize the costs.
6.1 - scrub also repairs superblock
An improvement to scrub in case the superblock is detected to be
corrupted, the repair happens immediately. Previously it was delayed
until the next transaction commit for performance reasons that would
store an updated and correct copy eventually.
6.1 - block group tree
An incompatible change that has to be enabled at mkfs time. Add a new
b-tree item that stores information about block groups in a compact way
that significantly improves mount time that's usually long due to
fragmentation and scattered b-tree items tracking the individual block
groups. Requires and also enables the free-space-tree and no-holes
features.
6.1 - discard stats available in sysfs
The directory '/sys/fs/btrfs/FSID/discard' exports statistics and
tunables related to discard.
6.1 - additional qgroup stats in sysfs
The overall status of qgroups is exported in
/sys/fs/btrfs/FSID/qgroups/ .
6.1 - check that superblock is unchanged at thaw time
Do full check of super block once a filesystem is thawed. This namely
happens when system resumes from suspend or hibernation. Accidental
change by other operating systems will be detected.
6.2 - discard=async on by default
Devices that support trim/discard will enable the asynchronous discard
for the whole filesystem.

@ -19,7 +19,7 @@ balance
again. It is primarily intended to rebalance the data in the filesystem
across the *devices* when a device is added or removed. A balance
will regenerate missing copies for the redundant *RAID* levels, if a
device has failed. As of linux kernel 3.3, a balance operation can be
device has failed. As of Linux kernel 3.3, a balance operation can be
made selective about which parts of the filesystem are rewritten.
barrier

@ -163,7 +163,7 @@ fd
ignored
name
name of the subvolume, although the buffer can be almost 4k, the file
size is limited by linux VFS to 255 characters and must not contain a slash
size is limited by Linux VFS to 255 characters and must not contain a slash
('/')
BTRFS_IOC_SUBVOL_CREATE_V2
@ -190,7 +190,7 @@ qgroup_inherit
...
name
name of the subvolume, although the buffer can be almost 4k, the file size
is limited by linux VFS to 255 characters and must not contain a slash ('/')
is limited by Linux VFS to 255 characters and must not contain a slash ('/')
devid
...

@ -4,8 +4,11 @@ booting from BTRFS with respect to features.
U-boot (https://www.denx.de/wiki/U-Boot/) has decent support for booting but
not all BTRFS features are implemented, check the documentation.
EXTLINUX (from the https://syslinux.org project) can boot but does not support
all features. Please check the upstream documentation before you use it.
EXTLINUX (from the https://syslinux.org project) has limited support for BTRFS
boot and hasn't been updated for a long time so it is not recommended as a
bootloader.
The first 1MiB on each device is unused with the exception of primary
superblock that is on the offset 64KiB and spans 4KiB.
In general, the first 1MiB on each device is unused with the exception of
primary superblock that is on the offset 64KiB and spans 4KiB. The rest can be
freely used by bootloaders or for other system information. Note that booting
from a filesystem on :doc:`zoned device<Zoned-mode>` is not supported.
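The layout of the first 1MiB can be worked out from the numbers above. The following is a sketch under those stated values only; the constant and function names are hypothetical and do not correspond to any btrfs-progs interface.

```python
KIB = 1024
MIB = 1024 * KIB

RESERVED_END = 1 * MIB          # the first 1MiB of each device is unused
SUPERBLOCK_OFFSET = 64 * KIB    # primary superblock offset
SUPERBLOCK_SIZE = 4 * KIB       # primary superblock size


def bootloader_ranges():
    """Byte ranges (start, end) in the first 1MiB, end exclusive,
    that do not contain the primary superblock."""
    sb_end = SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE
    return [(0, SUPERBLOCK_OFFSET), (sb_end, RESERVED_END)]


def overlaps_superblock(offset, length):
    """True if the range [offset, offset + length) touches the superblock."""
    sb_end = SUPERBLOCK_OFFSET + SUPERBLOCK_SIZE
    return offset < sb_end and offset + length > SUPERBLOCK_OFFSET
```

With these values the usable space is 64KiB before the superblock and 956KiB after it.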

@ -1,6 +1,9 @@
maximum file name length
255
This limit is imposed by Linux VFS, the structures of BTRFS could store
larger file names.
maximum symlink target length
depends on the *nodesize* value, for 4KiB it's 3949 bytes, for larger nodesize
it's 4095 due to the system limit PATH_MAX
@ -13,16 +16,29 @@ maximum number of inodes
2\ :sup:`64` but depends on the available metadata space as the inodes are created
dynamically
Each subvolume is an independent namespace of inodes and thus their
numbers, so the limit is per subvolume, not for the whole filesystem.
inode numbers
minimum number: 256 (for subvolumes), regular files and directories: 257
minimum number: 256 (for subvolumes), regular files and directories: 257,
maximum number: (2\ :sup:`64` - 256)
The inode numbers that can be assigned to user created files are from
the whole 64bit space except first 256 and last 256 in that range that
are reserved for internal b-tree identifiers.
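The inode number range above can be written down as a simple check. This is an illustrative sketch using only the minimum and maximum values stated here; the constant and function names are hypothetical.

```python
MIN_INODE = 256              # minimum number, used by subvolumes
MIN_REGULAR_INODE = 257      # first number for regular files and directories
MAX_INODE = 2**64 - 256      # maximum number


def is_assignable_inode(number):
    """True if 'number' lies in the range available to user created objects."""
    return MIN_INODE <= number <= MAX_INODE


def is_regular_inode(number):
    """True if 'number' could belong to a regular file or directory."""
    return MIN_REGULAR_INODE <= number <= MAX_INODE
```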
maximum file length
inherent limit of btrfs is 2\ :sup:`64` (16 EiB) but the linux VFS limit is 2\ :sup:`63` (8 EiB)
inherent limit of BTRFS is 2\ :sup:`64` (16 EiB) but the practical
limit of Linux VFS is 2\ :sup:`63` (8 EiB)
maximum number of subvolumes
the subvolume ids can go up to 2\ :sup:`64` but the number of actual subvolumes
depends on the available metadata space, the space consumed by all subvolume
metadata includes bookkeeping of shared extents can be large (MiB, GiB)
the subvolume ids can go up to 2\ :sup:`48` but the number of actual subvolumes
depends on the available metadata space
The space consumed by all subvolume metadata, which includes bookkeeping
of shared extents, can be large (MiB, GiB). The range is not the full 64bit
range because of qgroups that use the upper 16 bits for other
purposes.
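The split of the 64bit value can be sketched as follows. The sketch assumes that the upper 16 bits hold the qgroup level, as in the usual ``level/id`` notation of qgroup identifiers; the function names are hypothetical.

```python
SUBVOLID_BITS = 48
MAX_SUBVOLID = 2**SUBVOLID_BITS - 1   # lower 48 bits are left for the subvolume id


def make_qgroupid(level, subvolid):
    """Pack a level (assumed to live in the upper 16 bits) and a subvolume id."""
    if not 0 <= level < 2**16:
        raise ValueError("level does not fit into 16 bits")
    if not 0 <= subvolid <= MAX_SUBVOLID:
        raise ValueError("subvolume id does not fit into 48 bits")
    return (level << SUBVOLID_BITS) | subvolid


def split_qgroupid(qgroupid):
    """Return (level, subvolid) for a 64bit qgroup id."""
    return qgroupid >> SUBVOLID_BITS, qgroupid & MAX_SUBVOLID
```

A subvolume id larger than 2\ :sup:`48` - 1 would spill into the upper bits, which is why the usable range stops there.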
maximum number of hardlinks of a file in a directory
65536 when the *extref* feature is turned on during mkfs (default), roughly

@ -1,6 +1,6 @@
A swapfile is file-backed memory that the system uses to temporarily offload
the RAM. It is supported since kernel 5.0. Use ``swapon(8)`` to activate the
swapfile. There are some limitations of the implementation in BTRFS and linux
swapfile. There are some limitations of the implementation in BTRFS and Linux
swap subsystem:
* filesystem - must be only single device