Commit graph

93 commits

Author SHA1 Message Date
David Sterba 1c551e22cf btrfs-progs: make all parameters of rb_tree search/insert const
Tree comparators never change parameters, make them all const and also
change the rb-tree prototypes.

Signed-off-by: David Sterba <dsterba@suse.com>
2024-03-12 21:43:54 +01:00
Qu Wenruo e54514aaea btrfs-progs: fix stray fd close in open_ctree_fs_info()
[BUG]
Although commit b2a1be83b8 ("btrfs-progs: mkfs: keep file descriptors
open during whole time") is making sure we're only closing the writeable
fds after the fs is properly created, there is still a missing fd not
following the requirement.

And this explains the issue why sometimes after mkfs.btrfs, lsblk still
doesn't give a valid uuid.

Shown by the strace output (the command is "mkfs.btrfs -f
/dev/test/scratch1"):

  openat(AT_FDCWD, "/dev/test/scratch1", O_RDWR) = 5 <<< Writeable open
  fadvise64(5, 0, 0, POSIX_FADV_DONTNEED) = 0
  sysinfo({uptime=2529, loads=[8704, 6272, 2496], totalram=4104548352, freeram=3376611328, sharedram=9211904, bufferram=43016192, totalswap=3221221376, freeswap=3221221376, procs=190, totalhigh=0, freehigh=0, mem_unit=1}) = 0
  lseek(5, 0, SEEK_END)                   = 10737418240
  lseek(5, 0, SEEK_SET)                   = 0
  ......
  close(5)                                = 0 <<< Closed now
  pwrite64(6, "O\250\22\261\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1163264) = 16384
  pwrite64(6, "\201\316\272\342\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1179648) = 16384
  pwrite64(6, "K}S\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1196032) = 16384
  pwrite64(6, "\207j$\265\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1212416) = 16384
  pwrite64(6, "q\267;\336\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 5242880) = 16384
  fsync(6) <<< But we're still writing into the disk.

[CAUSE]
After more digging, it turns out we have a very obvious escape in
open_ctree_fs_info():

	open_ctree_fs_info()
	|- fp = open(oca->filename, flags);
	|- info = __open_ctree_fd();
	|- close(fp);

As later we only do IO using the device fd, this close() seems fine.

But the truth is, for mkfs usage, this fs_info is a temporary one, with
a special magic number for the disk.  And since mkfs is doing writeable
operations, this close() would immediately trigger udev scan.

And since at this stage, the fs is not yet fully created, udev can race
with mkfs, and may get the invalid temporary superblock.

[FIX]
Introduce a new btrfs_fs_info member, initial_fd, for
open_ctree_fs_info() to record the fd.

And on close_ctree(), if we find fs_info::initial_fd is a valid fd, then
close it.

By this, we make sure all writeable fds are only closed after we have
written valid super blocks into the disk.

Issue: #734
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2024-02-08 08:30:37 +01:00
David Sterba d739e3b73a btrfs-progs: kernel-shared: use kmalloc and kfree
All the code in kernel-shared should use the proper memory allocation
helpers.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-11-03 18:04:37 +01:00
Naohiro Aota 8816a65fec btrfs-progs: zoned: check SB zone existence properly
Currently, write_dev_supers() compares the superblock location vs the size
of the device to check if it can write the superblock. This is not correct
for a zoned device, whose superblock location is different than a regular
device.

Introduce check_sb_location() to check if the superblock zone exists for
the zoned case.

Running btrfs check can fail on a certain zoned device setup (e.g,
zone size = 128MB, device size = 16GB).

From generic/330:

  yes | btrfs check --repair --force /dev/nullb1
  [1/7] checking root items
  Fixed 0 roots.
  [2/7] checking extents
  ERROR: zoned: failed to read zone info of 4096 and 4097: Invalid argument
  ERROR: failed to write super block for devid 1: write error: Input/output error
  failed to write new super block err -5
  failed to repair damaged filesystem, aborting

This happens because write_dev_supers() is comparing the original
superblock location vs the device size to check if it can write out a
superblock copy or not.

For the above example, since the first copy location (64MB) < device size
(16GB), it tries to write out the copy. But, the copy must be written into
zone 4096 (512G / zone size (128M) = 4096), which is out of the device.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-21 15:51:06 +02:00
David Sterba b421fdff95 btrfs-progs: move raid-stripe-tree and squota build out of experimental
The kernel patches for RST and squota are queued for 6.7, we need to be
able to test the features so it's not necessary to hide the mkfs support
under experimental build. The kernel may still need debug build to
enable mount.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-17 19:33:59 +02:00
David Sterba 21aa6777b2 btrfs-progs: clean up includes, using include-what-you-use
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:57 +02:00
Josef Bacik f4e16e0238 btrfs-progs: update read_tree_block to take a btrfs_parent_tree_check
In the kernel we've added a control struct to handle the different
checks we want to do on extent buffers when we read them.  Update our
copy of read_tree_block to take this as an argument, then update all of
the callers to use the new structure.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:57 +02:00
Josef Bacik a38570f9d6 btrfs-progs: use btrfs_tree_parent_check for btrfs_read_extent_buffer
In the kernel we have a control structure call btrfs_tree_parent_check
to pass around the various sanity checks we have for extent buffers.
Add this to btrfs_tree_parent_check and then update the callers.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:57 +02:00
Josef Bacik ec9cbf2f43 btrfs-progs: add commit_root_sem to btrfs_fs_info
This is used in ctree.c around getting the old root, add this to our
btrfs_fs_info to make it more straightforward to sync ctree.c into
btrfs-progs.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:56 +02:00
Josef Bacik f7271ef547 btrfs-progs: add trans_lock to fs_info
This exists in the kernel, and is touched by ctree.c, add it to the
btrfs_fs_info to make syncing ctree.c more straightforward.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:56 +02:00
Josef Bacik 2d8058ae09 btrfs-progs: replace blocksize with parent argument for btrfs_alloc_tree_block
In the kernel we pass in the parent to btrfs_alloc_tree_block instead of
the blocksize and simply derive the blocksize from the fs_info.  Update
the function to match the kernel's convention and update all of the
callers so we can sync ctree.c easily.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:56 +02:00
Josef Bacik f94ad0c516 btrfs-progs: pass btrfs_trans_handle through btrfs_clear_buffer_dirty
This is the calling convention in the kernel because we track dirty
blocks per transaction instead of globally in the fs_info.  Simply
mirror what we do in the kernel to make it easier to sync ctree.c
locally.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-03 01:11:55 +02:00
Johannes Thumshirn b6490733a8 btrfs-progs: read fs with stripe tree from disk
When encountering a filesystem formatted with the raid stripe tree
feature, read it from disk.

Also add the incompat declaration to the tree printer.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-10-02 18:41:08 +02:00
Anand Jain 66c4c9632f btrfs-progs: prepare the latest device's superblock for commit
Add a flag to copy the superblock of the latest device to the
fs_info::super_copy for the commit process, rather than using the
superblock from the device specified in the argument.

This serves as groundwork to enable recovery from an incomplete
btrfstune -M|m|u|U operation.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-08-28 17:24:58 +02:00
Anand Jain d46a0ef6a0 btrfs-progs: rename struct open_ctree_flags to open_ctree_args
The struct open_ctree_flags currently holds arguments for
open_ctree_fs_info(), it can be confusing when mixed with a local variable
named open_ctree_flags as below in the function cmd_inspect_dump_tree().

  cmd_inspect_dump_tree()
  ::
  struct open_ctree_flags ocf = { 0 };
  ::
  unsigned open_ctree_flags;

So rename struct open_ctree_flags to struct open_ctree_args.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-07-26 15:00:47 +02:00
David Sterba 22cf63d8ee btrfs-progs: kernel-shared: add helper write_extent_buffer_chunk_tree_uuid
Sync the helper write_extent_buffer_chunk_tree_uuid from kernel.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-27 23:40:56 +02:00
David Sterba 339de9b2d7 btrfs-progs: kernel-shared: use write_extent_buffer_fsid where possible
We already have the helper but don't use it everywhere.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-06-27 16:29:58 +02:00
Qu Wenruo 33a21e7578 btrfs-progs: tune: rework the main idea of csum change
The existing attempt for changing csum types is as the following:

- Create a new temporary csum root
- Generate new data csums into the temporary csum root
- Drop the old csum tree and make the temporary one as csum root
- Change the checksums for metadata in-place

Unfortunately after some experiments, the csum root switch method has a
big pitfall, the backref items in extent tree.

Those backref items still point back to the old tree, meaning without a
lot of extra tricks, the extent tree would be corrupted.

Thus we have to go a new single tree variant:

- Generate new data csums into the csum root
  The new data csums would have a different objectid to distinguish
  them.
- Drop the old data csum items
- Change the key objectids of the new csums
- Change the checksums for metadata in-place

This means unfortunately we have to revert most of the old code, and
update the temporary item format.

The new temporary item would only record the target csum type.
At every stage we have a method to determine the progress, thus no need
for an item, but in the future it's still open for change.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:32 +02:00
David Sterba 8f33e591e9 btrfs-progs: partial sync of ctree.c from kernel
Sync checksum helpers, extent buffer helpers with kernel 6.4-rc1.

Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:31 +02:00
Qu Wenruo 46364d3766 btrfs-progs: replace write_and_map_eb() by write_data_to_disk()
The function write_and_map_eb() is quite abused as a way to write any
generic buffer back to disk.

But we have a more suitable function already, write_data_to_disk().

This patch would remove the abused write_data_to_disk() calls, and
convert the only three valid call sites to write_data_to_disk() instead.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:31 +02:00
Josef Bacik e150c843ea btrfs-progs: sync tree-checker.[ch] from kernel
This syncs tree-checker.c from the kernel.  The main modification was to
add a open ctree flag to skip the deeper leaf checks, and plumbing this
through tree-checker.c.  We need this for things like fsck or
btrfs-image that need to work with slightly corrupted file systems, and
these checks simply make us unable to look at the corrupted blocks.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:30 +02:00
Josef Bacik 361e4bea13 btrfs-progs: rename btrfs_check_* to __btrfs_check_*
These helpers are called __btrfs_check_* in the kernel as they return
the special enum to indicate what part of the leaf/node failed.  Rename
the uses in btrfs-progs to match the kernel naming convention to make it
easier to sync that code.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:30 +02:00
Josef Bacik a474379935 btrfs-progs: add a btrfs_read_extent_buffer helper
This exists in the kernel to do the read on an extent buffer we may have
already looked up and initialized.  Simply create this helper by
extracting out the existing code from read_tree_block and make
read_tree_block call this helper.  This gives us the helper we need to
sync ctree.c into btrfs-progs, and keeps the code the same in
btrfs-progs.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:30 +02:00
Josef Bacik e176d5eab6 btrfs-progs: add an atomic parameter to btrfs_buffer_uptodate
We have this extra parameter in the kernel to indicate if we are atomic
and thus can't lock the io_tree when checking the transid for an extent
buffer.  This isn't necessary in btrfs-progs, but to allow for easier
syncing of ctree.c add this argument to our copy of btrfs_buffer_uptodate.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:30 +02:00
Josef Bacik 5d84ee58e9 btrfs-progs: update arguments of find_extent_buffer
In the kernel we only take a bytenr for this as the extent buffer cache
is indexed on bytenr.  Since we're passing in the btrfs_fs_info we can
simply use the ->nodesize for the blocksize, and drop the blocksize
argument completely.  This brings us into parity with the kernel, which
will allow the syncing of ctree.c.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:30 +02:00
Josef Bacik b3477244f9 btrfs-progs: update read_tree_block to match the kernel definition
The in-kernel version of read_tree_block adds some extra sanity checks
to make sure we don't return blocks that don't match what we expect.
This includes the owning root, the level, and the expected first key.
We don't actually do these checks in btrfs-progs, however kernel code
we're going to sync will expect this calling convention, so update it to
match the in-kernel code and then update all the callers.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:30 +02:00
Josef Bacik 6da52f41c7 btrfs-progs: pass root_id for btrfs_free_tree_block
In the kernel we pass in the root_id for btrfs_free_tree_block instead
of the root itself.  Update the btrfs-progs version of the helper to
match what we do in the kernel.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:30 +02:00
Josef Bacik 656938b665 btrfs-progs: rename clear_extent_buffer_dirty to btrfs_clear_buffer_dirty
This is a mirror of the change I've done in the kernel, but in progs
it's even more simply because clean_tree_block was just a wrapper around
clear_extent_buffer_dirty.  Change this to btrfs_clear_buffer_dirty, and
then update all the callers to use this helper instead of
clean_tree_block and clear_extent_buffer_dirty.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:29 +02:00
Josef Bacik 5780714b58 btrfs-progs: add btrfs_locking_nest to btrfs_alloc_tree_block
This is how btrfs_alloc_tree_block is defined in the kernel, so when we
go to sync this code in it'll be easier if we're already setup to accept
this argument.  Since we're in progs we don't care about nesting so just
use BTRFS_NORMAL_NESTING everywhere, as we sync in the kernel code it'll
get updated to whatever is appropriate.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:29 +02:00
Josef Bacik 006baaecdd btrfs-progs: rename btrfs_alloc_free_block to btrfs_alloc_tree_block
This is in keeping with what the function actually does, and is named
this way in the kernel.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:29 +02:00
Josef Bacik 8e427ada49 btrfs-progs: copy btrfs_root::state from kernel
We changed from members in the root for all the different flags to a
bit based flag system.  In order to make syncing the kernel code into
btrfs-progs easier go ahead and sync in the bits we use and update all
the users of the old ->track_dirty and ->ref_cows to use the state bits.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:29 +02:00
Josef Bacik 4a9a8f2a8a btrfs-progs: sync extent-io-tree.[ch] and misc.h from the kernel
This is a bit larger than the previous syncs, because we use
extent_io_tree's everywhere.  There's a lot of stuff added to
kerncompat.h, and then I went through and cleaned up all the API
changes, which were

- extent_io_tree_init takes an fs_info and an owner now.
- extent_io_tree_cleanup is now extent_io_tree_release.
- set_extent_dirty takes a gfpmask.
- clear_extent_dirty takes a cached_state.
- find_first_extent_bit takes a cached_state.

The diffstat looks insane for this, but keep in mind extent-io-tree.c
and extent-io-tree.h are ~2000 loc just by themselves.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:29 +02:00
Josef Bacik b0a4eab561 btrfs-progs: remove parent_key arg from btrfs_check_* helpers
Now that this is unused by these helpers and only used by the repair
related code we can remove this argument from the main helpers.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:28 +02:00
Josef Bacik d36571ed9a btrfs-progs: remove fs_info argument from btrfs_check_* helpers
This can be pulled out of the extent buffer that is passed in, drop the
fs_info argument from the function.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-05-26 18:02:28 +02:00
Qu Wenruo 68a04bc710 btrfs-progs: tune: add new option to convert back to extent tree
With previous btrfstune support to convert to block-group-tree, it has
implemented most of the infrastructure for bi-directional conversion.

This patch will implement the remaining conversion support to go back to
extent tree.

The modification includes:

- New convert_to_extent_tree() function in btrfstune.c
  It's almost the same as convert_to_bg_tree(), but with small changes:
  * No need to set extra features like NO_HOLES/FST.
  * Need to delete the block group tree when everything finished.

- Update btrfs_delete_and_free_root() to handle non-global roots
  Currently the function can only accepts global roots (extent/csum/free
  space trees)

  If we pass a non-global root into the function, we will screw up
  global_roots_tree and crash.

  Since we're going to use btrfs_delete_and_free_root() to free block
  group tree which is not a global tree, this is needed.

- New handling for half converted fs in get_last_converted_bg()
  There are two cases need to be handled:

  * The bg tree is already empty
    We need to grab the first bg in extent tree.
    Or at conversion function we will fail at grabbing the first bg.

  * The bg tree is not empty
    Then we need to grab the last bg in extent tree.

- Extra root switching in involved functions. This involves:

  * read_converting_block_groups()
  * insert_block_group_item()
  * update_block_group_item()

  We just need to update our target root according to the current
  compat_ro and super flags.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2023-04-19 01:10:24 +02:00
David Sterba 9be33f558c btrfs-progs: tune: update checksum conversion
The checksum conversion is still experimental and still does not convert
all filesystems correctly. Do not use on valuable data.

Previous implementation copied the UUID conversion which was not a good
base for the checksum conversion so it left out basically all trees
except extent and checksum.

This update adds the base for the required safety features:

- let the old csum tree intact until the full conversion is done (ie.
  all data are still verifiable)
- add on-disk status tracking item, this should keep the from/to
  checksum conversion, last generation to catch potential updates of the
  underlying filesystem if conversion is interrupted and the filesystem
  mounted
- convert most of the fundamental trees, the subvolumes, tree log and
  relocation trees are not converted
- trees are converted in-place to avoid potentially running out of space
  but this might be better done by transaction protection with a
  temporary tree

Known issues:

- not all trees are converted
- not all checksums are correctly inserted into the new tree and reading
  the files leads to EIO

Issue: #438
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-28 20:11:22 +01:00
Josef Bacik fac1fae3ef btrfs-progs: rename extent buffer flags to EXTENT_BUFFER_*
We have been overloading the extent_state flags for use on the extent
buffers as well.  When we sync extent-io-tree.[ch] this will become
impossible, so rename these flags to EXTENT_BUFFER_* and use those
definitions instead of the extent_state definitions.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:44 +01:00
Josef Bacik 20d88c17e7 btrfs-progs: move extent cache code directly into btrfs_fs_info
We have some extra features in the btrfs-progs copy of the
extent_io_tree that don't exist in the kernel.  In order to make syncing
easier simply move this functionality into btrfs_fs_info, that way we
can sync in the new extent_io_tree code and not have to worry about
breaking anything.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:44 +01:00
Josef Bacik 412eea9e97 btrfs-progs: do not pass io_tree into verify_parent_transid
We do not use the io_tree, don't bother passing it into
verify_parent_transid.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:44 +01:00
Josef Bacik ccee633f3a btrfs-progs: move dirty eb tracking to it's own io_tree
btrfs-progs has a cache tree embedded in the extent_io_tree in order to
track extent buffers.  We use the extent_io_tree part to track dirty,
and the cache tree to keep the extent buffers in.  When we sync
extent-io-tree.[ch] we'll lose this ability, so separate out the dirty
tracking into its own extent_io_tree.  Subsequent patches will adjust
the extent buffer lookup so it doesn't use the custom extent_io_tree
thing.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:43 +01:00
Josef Bacik af30cf2e3e btrfs-progs: make the find extent buffer helpers take fs_info
This is a cleanup patch to make syncing the btrfs kernel code into
btrfs-progs easier.  In btrfs-progs we have an extra cache in the
extent_io_tree that's exclusively used for the extent buffer tracking.
In order to untangle this dependency start passing around the fs_info to
search for extent_buffers, and then have the helpers use the appropriate
structure to find the extent buffer.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-11-28 18:57:43 +01:00
David Sterba c2be0e2ce0 btrfs-progs: use template for out of memory error messages
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:09 +02:00
David Sterba feef6aaaf6 btrfs-progs: kernel-lib: remove radix-tree
The radix-tree is not used in userspace code. In kernel it's for
tracking unpersisted and in-memory structures and has been replaced by
the xarray.

Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:08:07 +02:00
Qu Wenruo d8f3355734 btrfs-progs: unexport csum_tree_block()
The function csum_tree_block() is not really utilized by anyone, all
current callers just use csum_tree_block_size().

Furthermore there is a stale definition in common/utils.h which is using
the old "struct btrfs_root" as the first argument, while we have already
migrated to "struct btrfs_fs_info".

So just unexport csum_tree_block() and remove the stale definition.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-10-11 09:06:11 +02:00
Qu Wenruo 2f2f6bfe17 btrfs-progs: btrfstune: add the ability to convert to block group tree feature
The new '-b' option will be responsible for converting to block group
tree compat ro feature.

The workflow looks like this for new convert:

- Setting CHANGING_BG_TREE flag
  And initialize fs_info->last_converted_bg_bytenr value to (u64)-1.

  Any bg with bytenr >= last_converted_bg_bytenr will have its bg item
  update go to the new root (bg tree).

- Iterate each block group by their bytenr in descending order
  This involves:
  * Delete the old bg item from the old tree (extent tree)
  * Update last_converted_bg_bytenr to the bytenr of the bg
  * Add the new bg item into the new tree (bg tree)
  * If we have converted a bunch of bgs, commit current transaction

- Clear CHANGING_BG_TREE flag
  And set the new BLOCK_GROUP_TREE compat ro flag and commit.

And since we're doing the convert in multiple transactions, we also need
to resume from last interrupted convert.

In that case, we just grab the last unconverted bg, and start from it.

And to co-operate with the new kernel requirement for both no-holes and
free-space-tree features, the convert tool will check for
free-space-tree feature. If not enabled, will error out with an error
message to how to continue (by mounting with "-o space_cache=v2").

For missing no-holes feature, we just need to set the flag during
convert.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-12 18:25:32 +02:00
Qu Wenruo 1430b41427 btrfs-progs: separate block group tree from extent tree v2
Block group tree feature is completely a standalone feature, and it has
been over 5 years before the initial introduction to solve the long
mount time.

I don't really want to waste another 5 years waiting for a feature which
may or may not work, but definitely not properly reviewed for its
preparation patches.

So this patch will separate the block group tree feature into a
standalone compat RO feature.

There is a catch, in mkfs create_block_group_tree(), current
tree-checker only accepts block group item with valid chunk_objectid,
but the existing code from extent-tree-v2 didn't properly initialize it.

This patch will also fix above mentioned problem so kernel can mount it
correctly.

Now mkfs/fsck should be able to handle the fs with block group tree.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-12 18:25:32 +02:00
Qu Wenruo c5a21a7814 btrfs-progs: don't save block group root into super block
The extent tree v2 (thankfully not yet fully materialized) needs a
new root for storing all block group items.

My initial proposal years ago just added a new tree rootid, and load it
from tree root, just like what we did for quota/free space tree/uuid/extent
roots.

But the extent tree v2 patches introduced a completely new (and to me,
wasteful) way to store block group tree root into super block.

Currently there are only 3 trees stored in super blocks, and they all
have their valid reasons:

- Chunk root
  Needed for bootstrap.

- Tree root
  Really the entrance of all trees.

- Log root
  This is special as log root has to be updated out of existing
  transaction mechanism.

There is not even any reason to put block group root into super blocks,
the block group tree is updated at the same timing as old extent tree,
no need for extra bootstrap/out-of-transaction update.

So just move block group root from super block into tree root.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-09-12 15:31:27 +02:00
Qu Wenruo 75fea7496c btrfs-progs: use write_data_to_disk() to handle RAID56 in write_and_map_eb()
Function write_data_to_disk() can handle RAID56 writes without any
problem.

So just call write_data_to_disk() inside write_and_map_eb() instead of
manually doing the RAID56 write.

Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:12 +02:00
Qu Wenruo fc6925bfd3 btrfs-progs: avoid repeated data write for metadata
[BUG]
Shinichiro reported that "mkfs.btrfs -m DUP" is doing repeated write
into the device.
For non-zoned device this is not a big deal, but for zoned device this
is critical, as zoned device doesn't support overwrite at all.

[CAUSE]
The problem is related to write_and_map_eb() call, since commit
2a93728391 ("btrfs-progs: use write_data_to_disk() to replace
write_extent_to_disk()"), we call write_data_to_disk() for metadata
write back.

But the problem is, write_data_to_disk() will call btrfs_map_block()
with rw = WRITE.

By that btrfs_map_block() will always return all stripes, while in
write_data_to_disk() we also iterate through each mirror of the range.

This results above repeated writeback.

[FIX]
Fix this problem by completely remove @mirror argument
from write_data_to_disk().
With extra comments to explicitly show that function will write to
all mirrors.

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 2a93728391 ("btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk()")
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-08-16 15:18:12 +02:00
Qu Wenruo 3ff9d35257 btrfs-progs: use read_data_from_disk() to replace read_extent_from_disk() and replace read_extent_data()
The function read_extent_from_disk() is only a wrapper to read tree
block.

And read_extent_data() is just a while loop to eliminate short read
caused by stripe boundary.

In fact, a lot of call sites of read_extent_data() are either reading
metadata (thus no possible short read) or doing extra loop by
themselves.

This patch will replace those two functions with read_data_from_disk(),
making it the only entrance for data/metadata read.
And update read_data_from_disk() to return the read bytes, so caller can
do a simple while loop.

For the few callers of read_extent_data(), open-code a small while loop
for them.

This will allow later RAID56 read repair using P/Q much easier.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2022-04-25 19:08:30 +02:00