btrfs-progs: docs: updates

- group features on status page
- update developer docs
- add cross references

Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba 2023-09-06 17:03:49 +02:00
parent 2726a83952
commit b40943dea4
9 changed files with 262 additions and 110 deletions


@ -13,7 +13,7 @@ in meeting your performance expectations for your specific workload.
Combinations of features can vary in performance; the table does not
cover all possibilities.
**The table is based on the latest released linux kernel: 6.4**
**The table is based on the latest released linux kernel: 6.5**
The columns for each feature reflect the status of the implementation
in following ways:
@ -43,26 +43,34 @@ in following ways:
- Stability
- Performance
- Notes
* - :doc:`discard (synchronous)<Trim>`
* - :doc:`Subvolumes, snapshots<Subvolumes>`
- :statusok:`OK`
- OK
-
- mounted with `-o discard` (has performance implications), also see `fstrim`
* - :doc:`discard (asynchronous)<Trim>`
- :statusok:`OK`
-
- mounted with `-o discard=async` (improved performance)
* - Autodefrag
- :statusok:`OK`
-
-
* - :doc:`Defrag<Defragmentation>`
- :statusmok:`mostly OK`
-
- extents get unshared (see below)
* - :doc:`Compression<Compression>`
- :statusok:`OK`
-
-
* - :doc:`Checksumming algorithms<Checksumming>`
- :statusok:`OK`
- OK
-
* - :doc:`Defragmentation<Defragmentation>`
- :statusmok:`mostly OK`
-
- extents get unshared (see below)
* - :ref:`Autodefrag<mount-option-autodefrag>`
- :statusok:`OK`
-
-
* - :doc:`Discard (synchronous)<Trim>`
- :statusok:`OK`
-
- mounted with `-o discard` (has performance implications), also see `fstrim`
* - :doc:`Discard (asynchronous)<Trim>`
- :statusok:`OK`
-
- mounted with `-o discard=async` (improved performance)
* - :doc:`Out-of-band dedupe<Deduplication>`
- :statusok:`OK`
- :statusmok:`mostly OK`
@ -71,10 +79,14 @@ in following ways:
- :statusok:`OK`
- :statusmok:`mostly OK`
- (reflink), heavily referenced extents have a noticeable performance hit (see below)
* - :doc:`More checksumming algorithms<Checksumming>`
* - :doc:`Filesystem resize<Resize>`
- :statusok:`OK`
- OK
-
- shrink, grow
* - :doc:`Device replace<Volume-management>`
- :statusmok:`mostly OK`
- mostly OK
- (see below)
* - :doc:`Auto-repair<Auto-repair>`
- :statusok:`OK`
- OK
@ -87,18 +99,66 @@ in following ways:
- :statusmok:`mostly OK`
- mostly OK
-
* - :ref:`Degraded mount<mount-option-degraded>`
- :statusok:`OK`
- n/a
-
* - :doc:`Balance<Balance>`
- :statusok:`OK`
- OK
- balance + qgroups can be slow when there are many snapshots
* - :doc:`Send<Send-receive>`
- :statusok:`OK`
- OK
-
* - :doc:`Receive<Send-receive>`
- :statusok:`OK`
- OK
-
* - Offline UUID change
- :statusok:`OK`
- OK
-
* - Metadata UUID change
- :statusok:`OK`
- OK
-
* - :doc:`Seeding<Seeding-device>`
- :statusok:`OK`
- OK
-
* - :doc:`Quotas, qgroups<Qgroups>`
- :statusmok:`mostly OK`
- mostly OK
- qgroups with many snapshots slow down balance
* - :doc:`Swapfile<Swapfile>`
- :statusok:`OK`
- n/a
- with some limitations
* - nodatacow
- :statusok:`OK`
- OK
-
* - :doc:`Device replace<Volume-management>`
* - :doc:`Subpage block size<Subpage>`
- :statusmok:`mostly OK`
- mostly OK
- (see below)
* - Degraded mount
- :statusok:`OK`
- n/a
-
- See also the table below for more detailed compatibility information.
* - :doc:`Zoned mode<Zoned-mode>`
- :statusmok:`mostly OK`
- mostly OK
- Not yet feature-complete but moderately stable; see also the table below
for more detailed compatibility information.
Block group profiles
^^^^^^^^^^^^^^^^^^^^
.. list-table::
:header-rows: 1
* - Feature
- Stability
- Performance
- Notes
* - :ref:`Single (block group profile)<mkfs-section-profiles>`
- :statusok:`OK`
- OK
@ -131,50 +191,59 @@ in following ways:
- :statusunstable:`unstable`
- n/a
- (see below)
* - Mixed block groups
* - :ref:`Mixed block groups<mkfs-feature-mixed-bg>`
- :statusok:`OK`
- OK
-
* - :doc:`Filesystem resize<Resize>`
- :statusok:`OK`
- OK
- shrink, grow
* - :doc:`Balance<Balance>`
- :statusok:`OK`
- OK
- balance + qgroups can be slow when there are many snapshots
* - Offline UUID change
On-disk format
^^^^^^^^^^^^^^
Features that are typically set at *mkfs* time (some of them can be changed or
converted later).
.. list-table::
:header-rows: 1
* - Feature
- Stability
- Performance
- Notes
* - :ref:`extended-refs<mkfs-feature-extended-refs>`
- :statusok:`OK`
- OK
-
* - Metadata UUID change
* - :ref:`skinny-metadata<mkfs-feature-skinny-metadata>`
- :statusok:`OK`
- OK
-
* - :doc:`Subvolumes, snapshots<Subvolumes>`
* - :ref:`no-holes<mkfs-feature-no-holes>`
- :statusok:`OK`
- OK
-
* - :doc:`Send<Send-receive>`
* - :ref:`Free space tree<mkfs-feature-free-space-tree>`
- :statusok:`OK`
- OK
-
* - :doc:`Receive<Send-receive>`
* - :ref:`Block group tree<mkfs-feature-block-group-tree>`
- :statusok:`OK`
- OK
-
* - :doc:`Seeding<Seeding-device>`
- :statusok:`OK`
- OK
-
* - :doc:`Quotas, qgroups<Qgroups>`
- :statusmok:`mostly OK`
- mostly OK
- qgroups with many snapshots slow down balance
* - :doc:`Swapfile<Swapfile>`
- :statusok:`OK`
- n/a
- with some limitations
Interoperability
^^^^^^^^^^^^^^^^
Integration with other Linux features or external systems.
:doc:`See also<Interoperability>`.
.. list-table::
:header-rows: 1
* - Feature
- Stability
- Performance
- Notes
* - :ref:`NFS<interop-nfs>`
- :statusok:`OK`
- OK
@ -183,10 +252,6 @@ in following ways:
- :statusok:`OK`
- OK
- IO controller
* - :ref:`Samba<interop-samba>`
- :statusok:`OK`
- OK
- compression, server-side copies, snapshots
* - :ref:`io_uring<interop-io-uring>`
- :statusok:`OK`
- OK
@ -199,35 +264,10 @@ in following ways:
- :statusok:`OK`
- OK
-
* - :ref:`Free space tree<mkfs-feature-free-space-tree>`
- :statusok:`OK`
-
-
* - Block group tree
- :statusok:`OK`
-
-
* - :ref:`no-holes<mkfs-feature-no-holes>`
* - :ref:`Samba<interop-samba>`
- :statusok:`OK`
- OK
-
* - :ref:`skinny-metadata<mkfs-feature-skinny-metadata>`
- :statusok:`OK`
- OK
-
* - :ref:`extended-refs<mkfs-feature-extended-refs>`
- :statusok:`OK`
- OK
-
* - :doc:`Subpage block size<Subpage>`
- :statusmok:`mostly OK`
- mostly OK
- See also the table below for more detailed compatibility information.
* - :doc:`Zoned mode<Zoned-mode>`
- :statusmok:`mostly OK`
- mostly OK
- Not yet feature-complete but moderately stable; see also the table below
for more detailed compatibility information.
- compression, server-side copies, snapshots
Please open an issue if:
@ -256,7 +296,7 @@ with subpage or require another feature to work:
- The max_inline mount option value is ignored, as if mounted with max_inline=0
* - Free space cache v1
- :statusunsupp:`unsupported`
- Free space tree is mandatory, v1 has some assumptions about page size
- Free space tree is mandatory, v1 makes some assumptions about page size
* - Compression
- :statusok:`partial support`
- Only page-aligned ranges can be compressed
@ -303,12 +343,6 @@ are unaffected by the zoned device constraints.
* - Free space tree
- :statusok:`supported`
-
* - single profile
- :statusok:`supported`
- Both data and metadata
* - DUP profile
- :statusok:`partial support`
- Only for metadata
* - Filesystem resize
- :statusok:`supported`
-


@ -150,6 +150,33 @@ DATA STRUCTURES AND DEFINITIONS
__u64 rsv_excl;
};
.. _struct_btrfs_ioctl_fs_info_args:
.. code-block:: c

   /* Request information about checksum type and size */
   #define BTRFS_FS_INFO_FLAG_CSUM_INFO        (1 << 0)

   /* Request information about filesystem generation */
   #define BTRFS_FS_INFO_FLAG_GENERATION       (1 << 1)

   /* Request information about filesystem metadata UUID */
   #define BTRFS_FS_INFO_FLAG_METADATA_UUID    (1 << 2)

   struct btrfs_ioctl_fs_info_args {
           __u64 max_id;                           /* out */
           __u64 num_devices;                      /* out */
           __u8 fsid[BTRFS_FSID_SIZE];             /* out */
           __u32 nodesize;                         /* out */
           __u32 sectorsize;                       /* out */
           __u32 clone_alignment;                  /* out */
           /* See BTRFS_FS_INFO_FLAG_* */
           __u16 csum_type;                        /* out */
           __u16 csum_size;                        /* out */
           __u64 flags;                            /* in/out */
           __u64 generation;                       /* out */
           __u8 metadata_uuid[BTRFS_FSID_SIZE];    /* out */
           __u8 reserved[944];                     /* pad to 1k */
   };

.. list-table::
:header-rows: 1
@ -157,10 +184,14 @@ DATA STRUCTURES AND DEFINITIONS
- Value
* - BTRFS_UUID_SIZE
- 16
* - BTRFS_FSID_SIZE
- 16
* - BTRFS_SUBVOL_NAME_MAX
- 4039
* - BTRFS_PATH_NAME_MAX
- 4087
* - BTRFS_VOL_NAME_MAX
- 255
OVERVIEW
--------
@ -296,9 +327,9 @@ LIST OF IOCTLS
* - BTRFS_IOC_DEV_INFO
-
-
* - BTRFS_IOC_FS_INFO
-
-
* - :ref:`BTRFS_IOC_FS_INFO<BTRFS_IOC_FS_INFO>`
- get information about filesystem (device count, fsid, ...)
- :ref:`struct btrfs_ioctl_fs_info_args<struct_btrfs_ioctl_fs_info_args>`
* - BTRFS_IOC_BALANCE_V2
-
-
@ -555,6 +586,26 @@ Change the flags of a subvolume.
* - ioctl args
- uint64_t, either 0 or `BTRFS_SUBVOL_RDONLY`
.. _BTRFS_IOC_FS_INFO:
BTRFS_IOC_FS_INFO
~~~~~~~~~~~~~~~~~
Read internal information about the filesystem. The data can be exchanged in
both directions: the caller sets *flags* to request optional information, and
only the corresponding parts of the structure are filled in. The reserved
bytes can be used to return new kinds of information in the future, always
depending on the flags set.
.. list-table::
:header-rows: 1
* - Field
- Description
* - ioctl fd
- file descriptor of any file/directory in the filesystem
* - ioctl args
- :ref:`struct btrfs_ioctl_fs_info_args<struct_btrfs_ioctl_fs_info_args>`
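The following is a minimal sketch of calling the ioctl from C. It assumes that the UAPI header `<linux/btrfs.h>` is installed and is new enough to define the `BTRFS_FS_INFO_FLAG_*` values shown above. The helper names `query_fs_info` and `print_fs_info` are hypothetical. The sketch also assumes that, on return, the kernel leaves a bit set in *flags* only for the optional information it actually filled in. For that reason, the optional fields are read only when their flag is set.

.. code-block:: c

   #include <errno.h>
   #include <fcntl.h>
   #include <stdio.h>
   #include <string.h>
   #include <sys/ioctl.h>
   #include <unistd.h>
   #include <linux/btrfs.h>

   /* The structure is documented as padded to 1KiB. */
   _Static_assert(sizeof(struct btrfs_ioctl_fs_info_args) == 1024,
                  "unexpected size of struct btrfs_ioctl_fs_info_args");

   /*
    * Hypothetical helper: fill 'args' for the filesystem containing 'path'.
    * Returns 0 on success, -1 on error with errno set.
    */
   static int query_fs_info(const char *path, struct btrfs_ioctl_fs_info_args *args)
   {
       int fd;
       int ret;
       int saved_errno;

       fd = open(path, O_RDONLY);
       if (fd < 0)
           return -1;

       /* Start from a zeroed structure and request optional data via flags. */
       memset(args, 0, sizeof(*args));
       args->flags = BTRFS_FS_INFO_FLAG_CSUM_INFO | BTRFS_FS_INFO_FLAG_GENERATION;

       ret = ioctl(fd, BTRFS_IOC_FS_INFO, args);
       saved_errno = errno;        /* close() must not clobber the ioctl error */
       close(fd);
       errno = saved_errno;
       return ret < 0 ? -1 : 0;
   }

   /* Hypothetical helper: print the fields, the optional ones only if flagged. */
   static void print_fs_info(const struct btrfs_ioctl_fs_info_args *args)
   {
       printf("devices: %llu, nodesize: %u, sectorsize: %u\n",
              (unsigned long long)args->num_devices,
              (unsigned int)args->nodesize, (unsigned int)args->sectorsize);
       if (args->flags & BTRFS_FS_INFO_FLAG_CSUM_INFO)
           printf("csum type: %u, csum size: %u\n",
                  (unsigned int)args->csum_type, (unsigned int)args->csum_size);
       if (args->flags & BTRFS_FS_INFO_FLAG_GENERATION)
           printf("generation: %llu\n", (unsigned long long)args->generation);
   }

Any file or directory on the filesystem can be used as *path*, for example the mount point.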
.. _BTRFS_IOC_GET_SUBVOL_INFO:
BTRFS_IOC_GET_SUBVOL_INFO


@ -52,6 +52,8 @@ OPTIONS
change fsid stored as *metadata_uuid* to a randomly generated UUID,
see also *-U*
.. _btrfstune-feature-metadata-uuid:
-M <UUID>
(since kernel: 5.0)


@ -68,7 +68,7 @@ No other attributes are supported. For the complete list please refer to the
XFLAGS
^^^^^^
There's overlap of letters assigned to the bits with the attributes, this list
There's an overlap of letters assigned to the bits with the attributes; this list
refers to what ``xfs_io(8)`` provides:
i


@ -27,13 +27,15 @@ acl, noacl
The support for ACL is build-time configurable (CONFIG_BTRFS_FS_POSIX_ACL) and
mount fails if *acl* is requested but the feature is not compiled in.
.. _mount-option-autodefrag:
autodefrag, noautodefrag
(since: 3.0, default: off)
Enable automatic file defragmentation.
When enabled, small random writes into files (in a range of tens of kilobytes,
currently it's 64KiB) are detected and queued up for the defragmentation process.
Not well suited for large database workloads.
May not be well suited for large database workloads.
The read latency may increase due to reading the adjacent blocks that make up the
range for defragmentation, successive write will merge the blocks in the new
@ -170,10 +172,12 @@ datasum, nodatasum
The cost of checksumming of the blocks in memory is much lower than the IO,
modern CPUs feature hardware support of the checksumming algorithm.
.. _mount-option-degraded:
degraded
(default: off)
Allow mounts with less devices than the RAID profile constraints
Allow mounts with fewer devices than the RAID profile constraints
require. A read-write mount (or remount) may fail when there are too many devices
missing, for example if a stripe member is completely missing from RAID0.
@ -261,12 +265,12 @@ flushoncommit, noflushoncommit
one transaction commit.
fragment=<type>
(depends on compile-time option BTRFS_DEBUG, since: 4.4, default: off)
(depends on compile-time option CONFIG_BTRFS_DEBUG, since: 4.4, default: off)
A debugging helper to intentionally fragment given *type* of block groups. The
type can be *data*, *metadata* or *all*. This mount option should not be used
outside of debugging environments and is not recognized if the kernel config
option *BTRFS_DEBUG* is not enabled.
option *CONFIG_BTRFS_DEBUG* is not enabled.
nologreplay
(default: off, even read-only)
@ -287,8 +291,8 @@ max_inline=<bytes>
with a K suffix (case insensitive). In practice, this value
is limited by the filesystem block size (named *sectorsize* at mkfs time),
and memory page size of the system. In case of sectorsize limit, there's
some space unavailable due to leaf headers. For example, with a 4KiB sectorsize, the
maximum size of inline data is about 3900 bytes.
some space unavailable due to b-tree leaf headers. For example, with a 4KiB
sectorsize, the maximum size of inline data is about 3900 bytes.
Inlining can be completely turned off by specifying 0. This will increase data
block slack if file sizes are much smaller than block size but will reduce


@ -3,6 +3,11 @@ On-disk Format
This document describes the Btrfs on-disk format.
.. note::
This document contains outdated and incomplete information and has been
copied from the original btrfs.wiki.kernel.org with little review.
Overview
~~~~~~~~


@ -5,12 +5,13 @@ There's some common functionality found in many places like help, parsing
values, sorting, extensible arrays, etc. Not all places are unified, and some still
use old code implementing it manually. Below is a list of usable APIs that should
be adopted in places where they are still not used. A need for a new API might emerge from
cleanups, then it should appear here.
cleanups, then it should appear here. The text below gives pointers and is not
exhaustive; search for the definitions and actual uses in other code too.
Option parsing
--------------
Files: common/help.h, common/parse-utils.h
Files: :file:`common/help.h`, :file:`common/parse-utils.h`
Global options need to be processed and consumed by `clean_args_no_options`,
argument count by `check_argc_*`, `usage_*` for handling usage.
@ -18,6 +19,21 @@ argument count by `check_argc_*`, `usage_*` for handling usage.
Options are parsed by `getopt` or `getopt_long`. Individual values from options
are recognized by `parse_*`, basic types and custom types are supported.
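A generic `getopt_long` loop of the kind described above could look like the sketch below. The command options (`--verbose`, `--count`) and the `parse_example_opts` helper are hypothetical. `strtoull` stands in for the `parse_*` helpers, whose exact signatures are not shown here. After the loop, the remaining argument count would be checked, for example by one of the `check_argc_*` helpers.

.. code-block:: c

   #include <errno.h>
   #include <getopt.h>
   #include <stdbool.h>
   #include <stdlib.h>

   /* Hypothetical options of an example command, for illustration only. */
   struct example_opts {
       bool verbose;
       unsigned long long count;
   };

   /*
    * Parse short and long options with getopt_long(). Returns the index of
    * the first non-option argument (e.g. a path), or -1 on an unknown option,
    * a missing option argument or an invalid number.
    */
   static int parse_example_opts(int argc, char **argv, struct example_opts *opts)
   {
       static const struct option long_options[] = {
           { "verbose", no_argument, NULL, 'v' },
           { "count", required_argument, NULL, 'n' },
           { NULL, 0, NULL, 0 }
       };

       opts->verbose = false;
       opts->count = 1;

       while (1) {
           char *end;
           int c = getopt_long(argc, argv, "vn:", long_options, NULL);

           if (c < 0)
               break;
           switch (c) {
           case 'v':
               opts->verbose = true;
               break;
           case 'n':
               /* Stand-in for a parse_* helper: digits only, no sign or spaces. */
               if (optarg[0] < '0' || optarg[0] > '9')
                   return -1;
               errno = 0;
               opts->count = strtoull(optarg, &end, 10);
               if (errno != 0 || *end != '\0')
                   return -1;
               break;
           default:
               return -1;
           }
       }
       return optind;
   }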
Size unit pretty printing
-------------------------
Files: :file:`common/units.h`
Many commands print byte sizes with suffixes, and the output format can be
affected by command line options. In the help text, the options are specified by
either `HELPINFO_UNITS_SHORT_LONG` (both long and short options) or just
`HELPINFO_UNITS_LONG` in case the short option letters would conflict.
Automatic parsing of the options from *argv* is done by `get_unit_mode_from_arg`.
Printing a value is done by `pretty_size_mode`, which takes the value and the unit
mode. The default mode is human-readable; the macros defining the modes are in the
`UNITS_*` namespace.
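To illustrate what a human-readable mode does, here is a stand-alone sketch that scales a byte count by powers of 1024 and appends a binary suffix. It does not use `pretty_size_mode`. The names `scale_to_binary_unit` and `format_size_binary` are hypothetical, and the exact output format of the btrfs-progs helpers may differ.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <stdio.h>

   static const char *const binary_suffixes[] = {
       "B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
   };

   /* Divide by 1024 until the value fits the unit; returns the suffix index. */
   static unsigned int scale_to_binary_unit(uint64_t bytes, double *value)
   {
       const unsigned int max = sizeof(binary_suffixes) / sizeof(binary_suffixes[0]) - 1;
       unsigned int i = 0;

       *value = (double)bytes;
       while (*value >= 1024.0 && i < max) {
           *value /= 1024.0;
           i++;
       }
       return i;
   }

   /* Format the size with two decimals and a binary suffix into buf. */
   static void format_size_binary(uint64_t bytes, char *buf, size_t len)
   {
       double value;
       unsigned int i = scale_to_binary_unit(bytes, &value);

       snprintf(buf, len, "%.2f%s", value, binary_suffixes[i]);
   }

For example, `format_size_binary(1536, buf, sizeof(buf))` writes ``1.50KiB`` into *buf*.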
TODO
----
@ -33,4 +49,3 @@ Undocumented or incomplete APIs:
* common/string-table.h
* common/string-table.h
* common/task-utils.h
* common/units.h


@ -19,14 +19,36 @@ Data types
Raw data types. Integer values are stored in little endian byte order.
- unsigned int 8bit (u8)
- unsigned int 16bit (u16)
- unsigned int 32bit (u32)
- unsigned int 64bit (u64)
- variable length binary data (data)
- variable length string (string)
- UUID, 16 bytes (uuid)
- time specification, 64bit seconds, 32bit nanoseconds (timespec)
.. list-table::
:header-rows: 1
* - Meaning
- Size
- Name
* - unsigned int
- 8 bit
- u8
* - unsigned int
- 16 bit
- u16
* - unsigned int
- 32 bit
- u32
* - unsigned int
- 64 bit
- u64
* - variable length binary data
- variable
- data
* - variable length string
- variable
- string
* - UUID
- 16 bytes
- uuid
* - time specification
- 64 bit seconds, 32 bit nanoseconds
- timespec
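A reader of the stream has to decode the little-endian byte order regardless of the host byte order. The following is a minimal sketch that assembles the values byte by byte. The helper names `get_le16`, `get_le32`, `get_le64` and `get_timespec` are hypothetical. The sketch also assumes that a *timespec* value is stored packed: the 64 bit seconds are immediately followed by the 32 bit nanoseconds, for 12 bytes in total.

.. code-block:: c

   #include <stdint.h>

   /* Decode little-endian integers byte by byte, independent of host order. */
   static uint16_t get_le16(const uint8_t *p)
   {
       return (uint16_t)(p[0] | (p[1] << 8));
   }

   static uint32_t get_le32(const uint8_t *p)
   {
       return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
              ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
   }

   static uint64_t get_le64(const uint8_t *p)
   {
       return (uint64_t)get_le32(p) | ((uint64_t)get_le32(p + 4) << 32);
   }

   /* Assumed layout: u64 seconds followed by u32 nanoseconds, no padding. */
   struct stream_timespec {
       uint64_t sec;
       uint32_t nsec;
   };

   static struct stream_timespec get_timespec(const uint8_t *p)
   {
       struct stream_timespec ts;

       ts.sec = get_le64(p);
       ts.nsec = get_le32(p + 8);
       return ts;
   }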
Stream structure
----------------


@ -79,6 +79,8 @@ OPTIONS
On multiple devices the default is *raid1*.
.. _mkfs-feature-mixed-bg:
-M|--mixed
Normally the data and metadata block groups are isolated. The *mixed* mode
will remove the isolation and store both types in the same block group type.
@ -300,12 +302,29 @@ free-space-tree
(default since btrfs-progs 5.15, kernel support since 4.5)
Enable the free space tree (mount option *space_cache=v2*) for persisting the
free space cache.
free space cache in a b-tree. This is built on top of the COW mechanism
and has better performance than v1.
Offline conversion from filesystems that don't have this feature
enabled at *mkfs* time is possible, see :doc:`btrfstune`.
Online conversion can be done by mounting with ``space_cache=v2``; this
needs to be done only once.
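For illustration, the online conversion comes down to passing the option in the mount data string. The following is a minimal sketch using ``mount(2)`` that assumes root privileges. The helper name and the device and mount point paths passed to it are hypothetical placeholders.

.. code-block:: c

   #include <stdio.h>
   #include <sys/mount.h>

   /*
    * Hypothetical helper: mount a btrfs filesystem with the free space tree
    * enabled. Returns 0 on success, -1 on error (reported by perror).
    */
   static int mount_with_free_space_tree(const char *device, const char *mountpoint)
   {
       int ret = mount(device, mountpoint, "btrfs", 0, "space_cache=v2");

       if (ret < 0)
           perror("mount");
       return ret;
   }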
.. _mkfs-feature-block-group-tree:
block-group-tree
(kernel support since 6.1)
Enable the block group tree to greatly reduce mount time for large filesystems.
Enable a dedicated b-tree for block group items. This greatly reduces
mount time for large filesystems due to better data locality that
avoids seeking. On rotational devices, a filesystem is considered *large*
starting from about 2-4TiB. The feature can be used on other types of
devices (SSD, NVMe, ...) as well.
Offline conversion from filesystems that don't have this feature
enabled at *mkfs* time is possible, see :doc:`btrfstune`. Online
conversion is not possible.
.. _mkfs-section-profiles: