btrfs-progs

Author	SHA1	Message	Date
David Sterba	1c551e22cf	btrfs-progs: make all parameters of rb_tree search/insert const Tree comparators never change parameters, make them all const and also change the rb-tree prototypes. Signed-off-by: David Sterba <dsterba@suse.com>	2024-03-12 21:43:54 +01:00
Qu Wenruo	e54514aaea	btrfs-progs: fix stray fd close in open_ctree_fs_info() [BUG] Although commit `b2a1be83b8` ("btrfs-progs: mkfs: keep file descriptors open during whole time") is making sure we're only closing the writeable fds after the fs is properly created, there is still a missing fd not following the requirement. And this explains the issue why sometimes after mkfs.btrfs, lsblk still doesn't give a valid uuid. Shown by the strace output (the command is "mkfs.btrfs -f /dev/test/scratch1"): openat(AT_FDCWD, "/dev/test/scratch1", O_RDWR) = 5 <<< Writeable open fadvise64(5, 0, 0, POSIX_FADV_DONTNEED) = 0 sysinfo({uptime=2529, loads=[8704, 6272, 2496], totalram=4104548352, freeram=3376611328, sharedram=9211904, bufferram=43016192, totalswap=3221221376, freeswap=3221221376, procs=190, totalhigh=0, freehigh=0, mem_unit=1}) = 0 lseek(5, 0, SEEK_END) = 10737418240 lseek(5, 0, SEEK_SET) = 0 ...... close(5) = 0 <<< Closed now pwrite64(6, "O\250\22\261\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1163264) = 16384 pwrite64(6, "\201\316\272\342\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1179648) = 16384 pwrite64(6, "K}S\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1196032) = 16384 pwrite64(6, "\207j$\265\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 1212416) = 16384 pwrite64(6, "q\267;\336\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384, 5242880) = 16384 fsync(6) <<< But we're still writing into the disk. [CAUSE] After more digging, it turns out we have a very obvious escape in open_ctree_fs_info(): open_ctree_fs_info() |- fp = open(oca->filename, flags); |- info = __open_ctree_fd(); |- close(fp); As later we only do IO using the device fd, this close() seems fine. But the truth is, for mkfs usage, this fs_info is a temporary one, with a special magic number for the disk. And since mkfs is doing writeable operations, this close() would immediately trigger udev scan. And since at this stage, the fs is not yet fully created, udev can race with mkfs, and may get the invalid temporary superblock. [FIX] Introduce a new btrfs_fs_info member, initial_fd, for open_ctree_fs_info() to record the fd. And on close_ctree(), if we find fs_info::initial_fd is a valid fd, then close it. By this, we make sure all writeable fds are only closed after we have written valid super blocks into the disk. Issue: #734 Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2024-02-08 08:30:37 +01:00
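The [FIX] described in e54514aaea can be illustrated with a minimal sketch. This is not the btrfs-progs code: `fs_info_sketch`, `open_fs_sketch()` and `close_fs_sketch()` are hypothetical stand-ins for `btrfs_fs_info`, `open_ctree_fs_info()` and `close_ctree()`. Only the idea of recording the fd in an `initial_fd` member and closing it at teardown comes from the commit message.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for the relevant part of btrfs_fs_info. */
struct fs_info_sketch {
	int initial_fd;	/* fd from the first open(), or -1 if none */
};

/* Open the target and record the fd instead of closing it right away. */
static struct fs_info_sketch *open_fs_sketch(const char *path, int flags)
{
	struct fs_info_sketch *info = malloc(sizeof(*info));

	if (!info)
		return NULL;
	info->initial_fd = open(path, flags);
	if (info->initial_fd < 0) {
		free(info);
		return NULL;
	}
	return info;
}

/*
 * Teardown: the recorded fd is closed only here, which in the real tool
 * would be after the final, valid super blocks have been written.
 */
static void close_fs_sketch(struct fs_info_sketch *info)
{
	assert(info);
	if (info->initial_fd >= 0)
		close(info->initial_fd);
	free(info);
}
```

Deferring the close() matters because, per the commit message, closing a writeable fd triggers a udev scan that may read the temporary superblock.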
David Sterba	d739e3b73a	btrfs-progs: kernel-shared: use kmalloc and kfree All the code in kernel-shared should use the proper memory allocation helpers. Signed-off-by: David Sterba <dsterba@suse.com>	2023-11-03 18:04:37 +01:00
Naohiro Aota	8816a65fec	btrfs-progs: zoned: check SB zone existence properly Currently, write_dev_supers() compares the superblock location vs the size of the device to check if it can write the superblock. This is not correct for a zoned device, whose superblock location is different than a regular device. Introduce check_sb_location() to check if the superblock zone exists for the zoned case. Running btrfs check can fail on a certain zoned device setup (e.g., zone size = 128MB, device size = 16GB). From generic/330: yes | btrfs check --repair --force /dev/nullb1 [1/7] checking root items Fixed 0 roots. [2/7] checking extents ERROR: zoned: failed to read zone info of 4096 and 4097: Invalid argument ERROR: failed to write super block for devid 1: write error: Input/output error failed to write new super block err -5 failed to repair damaged filesystem, aborting This happens because write_dev_supers() is comparing the original superblock location vs the device size to check if it can write out a superblock copy or not. For the above example, since the first copy location (64MB) < device size (16GB), it tries to write out the copy. But, the copy must be written into zone 4096 (512G / zone size (128M) = 4096), which is out of the device. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-21 15:51:06 +02:00
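The arithmetic in 8816a65fec can be checked with a short sketch. The function name and the macro below are hypothetical and this is not the actual check_sb_location(). The 512G offset, the 128M zone size and the 16G device size are taken from the commit message. Requiring two consecutive zones is an assumption based on the "zone info of 4096 and 4097" error line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of the first superblock copy on a zoned device, per the example. */
#define SKETCH_SB_COPY1_ZONED_OFFSET	(512ULL << 30)

/*
 * Return true if the zones holding a superblock copy exist on the device.
 * Assumption: the copy needs zone N and zone N + 1, as the error message
 * in the commit mentions zones 4096 and 4097.
 */
static bool sb_zone_exists_sketch(uint64_t sb_offset, uint64_t zone_size,
				  uint64_t device_size)
{
	assert(zone_size > 0);

	uint64_t sb_zone = sb_offset / zone_size;	/* 512G / 128M = 4096 */
	uint64_t nr_zones = device_size / zone_size;	/* 16G / 128M = 128 */

	return sb_zone + 1 < nr_zones;
}
```

With a 16G device there are only 128 zones, so zone 4096 does not exist and the copy has to be skipped, even though the regular-device offset (64MB) is well inside the device.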
David Sterba	b421fdff95	btrfs-progs: move raid-stripe-tree and squota build out of experimental The kernel patches for RST and squota are queued for 6.7, we need to be able to test the features so it's not necessary to hide the mkfs support under experimental build. The kernel may still need debug build to enable mount. Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-17 19:33:59 +02:00
David Sterba	21aa6777b2	btrfs-progs: clean up includes, using include-what-you-use Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-03 01:11:57 +02:00
Josef Bacik	f4e16e0238	btrfs-progs: update read_tree_block to take a btrfs_parent_tree_check In the kernel we've added a control struct to handle the different checks we want to do on extent buffers when we read them. Update our copy of read_tree_block to take this as an argument, then update all of the callers to use the new structure. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-03 01:11:57 +02:00
Josef Bacik	a38570f9d6	btrfs-progs: use btrfs_tree_parent_check for btrfs_read_extent_buffer In the kernel we have a control structure call btrfs_tree_parent_check to pass around the various sanity checks we have for extent buffers. Add this to btrfs_tree_parent_check and then update the callers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-03 01:11:57 +02:00
Josef Bacik	ec9cbf2f43	btrfs-progs: add commit_root_sem to btrfs_fs_info This is used in ctree.c around getting the old root, add this to our btrfs_fs_info to make it more straightforward to sync ctree.c into btrfs-progs. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-03 01:11:56 +02:00
Josef Bacik	f7271ef547	btrfs-progs: add trans_lock to fs_info This exists in the kernel, and is touched by ctree.c, add it to the btrfs_fs_info to make syncing ctree.c more straightforward. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-03 01:11:56 +02:00
Josef Bacik	2d8058ae09	btrfs-progs: replace blocksize with parent argument for btrfs_alloc_tree_block In the kernel we pass in the parent to btrfs_alloc_tree_block instead of the blocksize and simply derive the blocksize from the fs_info. Update the function to match the kernel's convention and update all of the callers so we can sync ctree.c easily. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-03 01:11:56 +02:00
Josef Bacik	f94ad0c516	btrfs-progs: pass btrfs_trans_handle through btrfs_clear_buffer_dirty This is the calling convention in the kernel because we track dirty blocks per transaction instead of globally in the fs_info. Simply mirror what we do in the kernel to make it easier to sync ctree.c locally. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-03 01:11:55 +02:00
Johannes Thumshirn	b6490733a8	btrfs-progs: read fs with stripe tree from disk When encountering a filesystem formatted with the raid stripe tree feature, read it from disk. Also add the incompat declaration to the tree printer. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-10-02 18:41:08 +02:00
Anand Jain	66c4c9632f	btrfs-progs: prepare the latest device's superblock for commit Add a flag to copy the superblock of the latest device to the fs_info::super_copy for the commit process, rather than using the superblock from the device specified in the argument. This serves as groundwork to enable recovery from an incomplete btrfstune -M|m|u|U operation. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-08-28 17:24:58 +02:00
Anand Jain	d46a0ef6a0	btrfs-progs: rename struct open_ctree_flags to open_ctree_args The struct open_ctree_flags currently holds arguments for open_ctree_fs_info(), it can be confusing when mixed with a local variable named open_ctree_flags as below in the function cmd_inspect_dump_tree(). cmd_inspect_dump_tree() :: struct open_ctree_flags ocf = { 0 }; :: unsigned open_ctree_flags; So rename struct open_ctree_flags to struct open_ctree_args. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-07-26 15:00:47 +02:00
David Sterba	22cf63d8ee	btrfs-progs: kernel-shared: add helper write_extent_buffer_chunk_tree_uuid Sync the helper write_extent_buffer_chunk_tree_uuid from kernel. Signed-off-by: David Sterba <dsterba@suse.com>	2023-06-27 23:40:56 +02:00
David Sterba	339de9b2d7	btrfs-progs: kernel-shared: use write_extent_buffer_fsid where possible We already have the helper but don't use it everywhere. Signed-off-by: David Sterba <dsterba@suse.com>	2023-06-27 16:29:58 +02:00
Qu Wenruo	33a21e7578	btrfs-progs: tune: rework the main idea of csum change The existing attempt for changing csum types is as the following: - Create a new temporary csum root - Generate new data csums into the temporary csum root - Drop the old csum tree and make the temporary one as csum root - Change the checksums for metadata in-place Unfortunately after some experiments, the csum root switch method has a big pitfall, the backref items in extent tree. Those backref items still point back to the old tree, meaning without a lot of extra tricks, the extent tree would be corrupted. Thus we have to go a new single tree variant: - Generate new data csums into the csum root The new data csums would have a different objectid to distinguish them. - Drop the old data csum items - Change the key objectids of the new csums - Change the checksums for metadata in-place This means unfortunately we have to revert most of the old code, and update the temporary item format. The new temporary item would only record the target csum type. At every stage we have a method to determine the progress, thus no need for an item, but in the future it's still open for change. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:32 +02:00
David Sterba	8f33e591e9	btrfs-progs: partial sync of ctree.c from kernel Sync checksum helpers, extent buffer helpers with kernel 6.4-rc1. Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:31 +02:00
Qu Wenruo	46364d3766	btrfs-progs: replace write_and_map_eb() by write_data_to_disk() The function write_and_map_eb() is quite abused as a way to write any generic buffer back to disk. But we have a more suitable function already, write_data_to_disk(). This patch would remove the abused write_data_to_disk() calls, and convert the only three valid call sites to write_data_to_disk() instead. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:31 +02:00
Josef Bacik	e150c843ea	btrfs-progs: sync tree-checker.[ch] from kernel This syncs tree-checker.c from the kernel. The main modification was to add a open ctree flag to skip the deeper leaf checks, and plumbing this through tree-checker.c. We need this for things like fsck or btrfs-image that need to work with slightly corrupted file systems, and these checks simply make us unable to look at the corrupted blocks. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:30 +02:00
Josef Bacik	361e4bea13	btrfs-progs: rename btrfs_check_* to __btrfs_check_* These helpers are called __btrfs_check_* in the kernel as they return the special enum to indicate what part of the leaf/node failed. Rename the uses in btrfs-progs to match the kernel naming convention to make it easier to sync that code. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:30 +02:00
Josef Bacik	a474379935	btrfs-progs: add a btrfs_read_extent_buffer helper This exists in the kernel to do the read on an extent buffer we may have already looked up and initialized. Simply create this helper by extracting out the existing code from read_tree_block and make read_tree_block call this helper. This gives us the helper we need to sync ctree.c into btrfs-progs, and keeps the code the same in btrfs-progs. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:30 +02:00
Josef Bacik	e176d5eab6	btrfs-progs: add an atomic parameter to btrfs_buffer_uptodate We have this extra parameter in the kernel to indicate if we are atomic and thus can't lock the io_tree when checking the transid for an extent buffer. This isn't necessary in btrfs-progs, but to allow for easier syncing of ctree.c add this argument to our copy of btrfs_buffer_uptodate. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:30 +02:00
Josef Bacik	5d84ee58e9	btrfs-progs: update arguments of find_extent_buffer In the kernel we only take a bytenr for this as the extent buffer cache is indexed on bytenr. Since we're passing in the btrfs_fs_info we can simply use the ->nodesize for the blocksize, and drop the blocksize argument completely. This brings us into parity with the kernel, which will allow the syncing of ctree.c. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:30 +02:00
Josef Bacik	b3477244f9	btrfs-progs: update read_tree_block to match the kernel definition The in-kernel version of read_tree_block adds some extra sanity checks to make sure we don't return blocks that don't match what we expect. This includes the owning root, the level, and the expected first key. We don't actually do these checks in btrfs-progs, however kernel code we're going to sync will expect this calling convention, so update it to match the in-kernel code and then update all the callers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:30 +02:00
Josef Bacik	6da52f41c7	btrfs-progs: pass root_id for btrfs_free_tree_block In the kernel we pass in the root_id for btrfs_free_tree_block instead of the root itself. Update the btrfs-progs version of the helper to match what we do in the kernel. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:30 +02:00
Josef Bacik	656938b665	btrfs-progs: rename clear_extent_buffer_dirty to btrfs_clear_buffer_dirty This is a mirror of the change I've done in the kernel, but in progs it's even more simply because clean_tree_block was just a wrapper around clear_extent_buffer_dirty. Change this to btrfs_clear_buffer_dirty, and then update all the callers to use this helper instead of clean_tree_block and clear_extent_buffer_dirty. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:29 +02:00
Josef Bacik	5780714b58	btrfs-progs: add btrfs_locking_nest to btrfs_alloc_tree_block This is how btrfs_alloc_tree_block is defined in the kernel, so when we go to sync this code in it'll be easier if we're already setup to accept this argument. Since we're in progs we don't care about nesting so just use BTRFS_NORMAL_NESTING everywhere, as we sync in the kernel code it'll get updated to whatever is appropriate. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:29 +02:00
Josef Bacik	006baaecdd	btrfs-progs: rename btrfs_alloc_free_block to btrfs_alloc_tree_block This is in keeping with what the function actually does, and is named this way in the kernel. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:29 +02:00
Josef Bacik	8e427ada49	btrfs-progs: copy btrfs_root::state from kernel We changed from members in the root for all the different flags to a bit based flag system. In order to make syncing the kernel code into btrfs-progs easier go ahead and sync in the bits we use and update all the users of the old ->track_dirty and ->ref_cows to use the state bits. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:29 +02:00
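The move from per-flag members to a bit-based state word, as described in 8e427ada49, might look roughly like the sketch below. All names here are hypothetical. The real bit names and helpers live in the kernel sources and are not reproduced from them.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical bit numbers replacing the old ->track_dirty and ->ref_cows. */
enum {
	ROOT_SKETCH_TRACK_DIRTY,
	ROOT_SKETCH_REF_COWS,
};

/* Hypothetical stand-in for btrfs_root: one state word instead of members. */
struct root_sketch {
	unsigned long state;
};

static void root_set_state_sketch(struct root_sketch *root, unsigned int bit)
{
	assert(bit < sizeof(root->state) * CHAR_BIT);
	root->state |= 1UL << bit;
}

static bool root_test_state_sketch(const struct root_sketch *root,
				   unsigned int bit)
{
	assert(bit < sizeof(root->state) * CHAR_BIT);
	return (root->state >> bit) & 1UL;
}
```

A caller that used to write `root->track_dirty = 1` would instead set the corresponding bit, and a check like `if (root->ref_cows)` becomes a bit test.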
Josef Bacik	4a9a8f2a8a	btrfs-progs: sync extent-io-tree.[ch] and misc.h from the kernel This is a bit larger than the previous syncs, because we use extent_io_tree's everywhere. There's a lot of stuff added to kerncompat.h, and then I went through and cleaned up all the API changes, which were - extent_io_tree_init takes an fs_info and an owner now. - extent_io_tree_cleanup is now extent_io_tree_release. - set_extent_dirty takes a gfpmask. - clear_extent_dirty takes a cached_state. - find_first_extent_bit takes a cached_state. The diffstat looks insane for this, but keep in mind extent-io-tree.c and extent-io-tree.h are ~2000 loc just by themselves. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:29 +02:00
Josef Bacik	b0a4eab561	btrfs-progs: remove parent_key arg from btrfs_check_* helpers Now that this is unused by these helpers and only used by the repair related code we can remove this argument from the main helpers. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:28 +02:00
Josef Bacik	d36571ed9a	btrfs-progs: remove fs_info argument from btrfs_check_* helpers This can be pulled out of the extent buffer that is passed in, drop the fs_info argument from the function. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-05-26 18:02:28 +02:00
Qu Wenruo	68a04bc710	btrfs-progs: tune: add new option to convert back to extent tree With previous btrfstune support to convert to block-group-tree, it has implemented most of the infrastructure for bi-directional conversion. This patch will implement the remaining conversion support to go back to extent tree. The modification includes: - New convert_to_extent_tree() function in btrfstune.c It's almost the same as convert_to_bg_tree(), but with small changes: * No need to set extra features like NO_HOLES/FST. * Need to delete the block group tree when everything finished. - Update btrfs_delete_and_free_root() to handle non-global roots Currently the function can only accepts global roots (extent/csum/free space trees) If we pass a non-global root into the function, we will screw up global_roots_tree and crash. Since we're going to use btrfs_delete_and_free_root() to free block group tree which is not a global tree, this is needed. - New handling for half converted fs in get_last_converted_bg() There are two cases need to be handled: * The bg tree is already empty We need to grab the first bg in extent tree. Or at conversion function we will fail at grabbing the first bg. * The bg tree is not empty Then we need to grab the last bg in extent tree. - Extra root switching in involved functions. This involves: * read_converting_block_groups() * insert_block_group_item() * update_block_group_item() We just need to update our target root according to the current compat_ro and super flags. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2023-04-19 01:10:24 +02:00
David Sterba	9be33f558c	btrfs-progs: tune: update checksum conversion The checksum conversion is still experimental and still does not convert all filesystems correctly. Do not use on valuable data. Previous implementation copied the UUID conversion which was not a good base for the checksum conversion so it left out basically all trees except extent and checksum. This update adds the base for the required safety features: - let the old csum tree intact until the full conversion is done (ie. all data are still verifiable) - add on-disk status tracking item, this should keep the from/to checksum conversion, last generation to catch potential updates of the underlying filesystem if conversion is interrupted and the filesystem mounted - convert most of the fundamental trees, the subvolumes, tree log and relocation trees are not converted - trees are converted in-place to avoid potentially running out of space but this might be better done by transaction protection with a temporary tree Known issues: - not all trees are converted - not all checksums are correctly inserted into the new tree and reading the files leads to EIO Issue: #438 Signed-off-by: David Sterba <dsterba@suse.com>	2023-02-28 20:11:22 +01:00
Josef Bacik	fac1fae3ef	btrfs-progs: rename extent buffer flags to EXTENT_BUFFER_* We have been overloading the extent_state flags for use on the extent buffers as well. When we sync extent-io-tree.[ch] this will become impossible, so rename these flags to EXTENT_BUFFER_* and use those definitions instead of the extent_state definitions. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-11-28 18:57:44 +01:00
Josef Bacik	20d88c17e7	btrfs-progs: move extent cache code directly into btrfs_fs_info We have some extra features in the btrfs-progs copy of the extent_io_tree that don't exist in the kernel. In order to make syncing easier simply move this functionality into btrfs_fs_info, that way we can sync in the new extent_io_tree code and not have to worry about breaking anything. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-11-28 18:57:44 +01:00
Josef Bacik	412eea9e97	btrfs-progs: do not pass io_tree into verify_parent_transid We do not use the io_tree, don't bother passing it into verify_parent_transid. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-11-28 18:57:44 +01:00
Josef Bacik	ccee633f3a	btrfs-progs: move dirty eb tracking to it's own io_tree btrfs-progs has a cache tree embedded in the extent_io_tree in order to track extent buffers. We use the extent_io_tree part to track dirty, and the cache tree to keep the extent buffers in. When we sync extent-io-tree.[ch] we'll lose this ability, so separate out the dirty tracking into its own extent_io_tree. Subsequent patches will adjust the extent buffer lookup so it doesn't use the custom extent_io_tree thing. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-11-28 18:57:43 +01:00
Josef Bacik	af30cf2e3e	btrfs-progs: make the find extent buffer helpers take fs_info This is a cleanup patch to make syncing the btrfs kernel code into btrfs-progs easier. In btrfs-progs we have an extra cache in the extent_io_tree that's exclusively used for the extent buffer tracking. In order to untangle this dependency start passing around the fs_info to search for extent_buffers, and then have the helpers use the appropriate structure to find the extent buffer. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-11-28 18:57:43 +01:00
David Sterba	c2be0e2ce0	btrfs-progs: use template for out of memory error messages Signed-off-by: David Sterba <dsterba@suse.com>	2022-10-11 09:08:09 +02:00
David Sterba	feef6aaaf6	btrfs-progs: kernel-lib: remove radix-tree The radix-tree is not used in userspace code. In kernel it's for tracking unpersisted and in-memory structures and has been replaced by the xarray. Signed-off-by: David Sterba <dsterba@suse.com>	2022-10-11 09:08:07 +02:00
Qu Wenruo	d8f3355734	btrfs-progs: unexport csum_tree_block() The function csum_tree_block() is not really utilized by anyone, all current callers just use csum_tree_block_size(). Furthermore there is a stale definition in common/utils.h which is using the old "struct btrfs_root" as the first argument, while we have already migrated to "struct btrfs_fs_info". So just unexport csum_tree_block() and remove the stale definition. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-10-11 09:06:11 +02:00
Qu Wenruo	2f2f6bfe17	btrfs-progs: btrfstune: add the ability to convert to block group tree feature The new '-b' option will be responsible for converting to block group tree compat ro feature. The workflow looks like this for new convert: - Setting CHANGING_BG_TREE flag And initialize fs_info->last_converted_bg_bytenr value to (u64)-1. Any bg with bytenr >= last_converted_bg_bytenr will have its bg item update go to the new root (bg tree). - Iterate each block group by their bytenr in descending order This involves: * Delete the old bg item from the old tree (extent tree) * Update last_converted_bg_bytenr to the bytenr of the bg * Add the new bg item into the new tree (bg tree) * If we have converted a bunch of bgs, commit current transaction - Clear CHANGING_BG_TREE flag And set the new BLOCK_GROUP_TREE compat ro flag and commit. And since we're doing the convert in multiple transactions, we also need to resume from last interrupted convert. In that case, we just grab the last unconverted bg, and start from it. And to co-operate with the new kernel requirement for both no-holes and free-space-tree features, the convert tool will check for free-space-tree feature. If not enabled, will error out with an error message to how to continue (by mounting with "-o space_cache=v2"). For missing no-holes feature, we just need to set the flag during convert. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-09-12 18:25:32 +02:00
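The routing rule from 2f2f6bfe17 (any block group with bytenr >= last_converted_bg_bytenr is updated in the new tree) can be sketched as a small helper. The enum and function names are hypothetical. Only the comparison and the initial value of (u64)-1 come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the two possible homes of a block group item. */
enum bg_target_sketch {
	BG_TARGET_EXTENT_TREE,	/* old location */
	BG_TARGET_BG_TREE,	/* new location */
};

/*
 * While the conversion flag is set, pick the tree for a block group item.
 * Block groups are converted in descending bytenr order, so everything at
 * or above the last converted bytenr already lives in the bg tree.
 */
static enum bg_target_sketch bg_item_target_sketch(uint64_t bg_bytenr,
						   uint64_t last_converted)
{
	if (bg_bytenr >= last_converted)
		return BG_TARGET_BG_TREE;
	return BG_TARGET_EXTENT_TREE;
}
```

Starting from `UINT64_MAX` means no ordinary block group matches at first, so every update still goes to the extent tree until the first block group has been moved.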
Qu Wenruo	1430b41427	btrfs-progs: separate block group tree from extent tree v2 Block group tree feature is completely a standalone feature, and it has been over 5 years before the initial introduction to solve the long mount time. I don't really want to waste another 5 years waiting for a feature which may or may not work, but definitely not properly reviewed for its preparation patches. So this patch will separate the block group tree feature into a standalone compat RO feature. There is a catch, in mkfs create_block_group_tree(), current tree-checker only accepts block group item with valid chunk_objectid, but the existing code from extent-tree-v2 didn't properly initialize it. This patch will also fix above mentioned problem so kernel can mount it correctly. Now mkfs/fsck should be able to handle the fs with block group tree. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-09-12 18:25:32 +02:00
Qu Wenruo	c5a21a7814	btrfs-progs: don't save block group root into super block The extent tree v2 (thankfully not yet fully materialized) needs a new root for storing all block group items. My initial proposal years ago just added a new tree rootid, and load it from tree root, just like what we did for quota/free space tree/uuid/extent roots. But the extent tree v2 patches introduced a completely new (and to me, wasteful) way to store block group tree root into super block. Currently there are only 3 trees stored in super blocks, and they all have their valid reasons: - Chunk root Needed for bootstrap. - Tree root Really the entrance of all trees. - Log root This is special as log root has to be updated out of existing transaction mechanism. There is not even any reason to put block group root into super blocks, the block group tree is updated at the same timing as old extent tree, no need for extra bootstrap/out-of-transaction update. So just move block group root from super block into tree root. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-09-12 15:31:27 +02:00
Qu Wenruo	75fea7496c	btrfs-progs: use write_data_to_disk() to handle RAID56 in write_and_map_eb() Function write_data_to_disk() can handle RAID56 writes without any problem. So just call write_data_to_disk() inside write_and_map_eb() instead of manually doing the RAID56 write. Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-16 15:18:12 +02:00
Qu Wenruo	fc6925bfd3	btrfs-progs: avoid repeated data write for metadata [BUG] Shinichiro reported that "mkfs.btrfs -m DUP" is doing repeated write into the device. For non-zoned device this is not a big deal, but for zoned device this is critical, as zoned device doesn't support overwrite at all. [CAUSE] The problem is related to write_and_map_eb() call, since commit `2a93728391` ("btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk()"), we call write_data_to_disk() for metadata write back. But the problem is, write_data_to_disk() will call btrfs_map_block() with rw = WRITE. By that btrfs_map_block() will always return all stripes, while in write_data_to_disk() we also iterate through each mirror of the range. This results above repeated writeback. [FIX] Fix this problem by completely remove @mirror argument from write_data_to_disk(). With extra comments to explicitly show that function will write to all mirrors. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: `2a93728391` ("btrfs-progs: use write_data_to_disk() to replace write_extent_to_disk()") Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-08-16 15:18:12 +02:00
Qu Wenruo	3ff9d35257	btrfs-progs: use read_data_from_disk() to replace read_extent_from_disk() and replace read_extent_data() The function read_extent_from_disk() is only a wrapper to read tree block. And read_extent_data() is just a while loop to eliminate short read caused by stripe boundary. In fact, a lot of call sites of read_extent_data() are either reading metadata (thus no possible short read) or doing extra loop by themselves. This patch will replace those two functions with read_data_from_disk(), making it the only entrance for data/metadata read. And update read_data_from_disk() to return the read bytes, so caller can do a simple while loop. For the few callers of read_extent_data(), open-code a small while loop for them. This will allow later RAID56 read repair using P/Q much easier. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-04-25 19:08:30 +02:00
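The "simple while loop" that 3ff9d35257 mentions for callers can be sketched as follows. Both functions are hypothetical: `read_some_sketch()` only imitates a reader that returns the number of bytes read and stops at a stripe boundary, using an in-memory buffer as the "disk" and a tiny stripe length so the short reads are visible. It is not read_data_from_disk().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STRIPE_LEN_SKETCH 4	/* deliberately tiny, for demonstration only */

/*
 * Hypothetical reader: copies at most up to the next stripe boundary and
 * returns the number of bytes read, or -1 if start is out of range.
 */
static int read_some_sketch(const char *disk, uint64_t disk_len,
			    uint64_t start, char *buf, uint64_t len)
{
	assert(disk && buf);
	if (start >= disk_len)
		return -1;

	uint64_t n = STRIPE_LEN_SKETCH - (start % STRIPE_LEN_SKETCH);

	if (n > len)
		n = len;
	if (n > disk_len - start)
		n = disk_len - start;
	memcpy(buf, disk + start, n);
	return (int)n;
}

/* Caller-side loop: keep reading until the whole range has been filled. */
static int read_full_sketch(const char *disk, uint64_t disk_len,
			    uint64_t start, char *buf, uint64_t len)
{
	uint64_t done = 0;

	while (done < len) {
		int ret = read_some_sketch(disk, disk_len, start + done,
					   buf + done, len - done);
		if (ret < 0)
			return ret;
		done += ret;
	}
	return 0;
}
```

Because the reader reports how much it actually read, the caller needs no knowledge of stripe boundaries and simply advances by the returned count.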


93 commits