btrfs_fs_info::block_group_cache and the bit BLOCK_GROUP_DIRY are not
used anymore, so is the block_group_state_bits(). Remove them.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit organises block groups cache in
btrfs_fs_info::block_group_cache_tree. And any dirty block groups are
linked in transaction_handle::dirty_bgs.
To keep coherence of bisect, it does almost replace in place:
1. Replace the old btrfs group lookup functions with new functions
introduced in former commits.
2. set_extent_bits(..., BLOCK_GROUP_DIRYT) things are replaced by linking
the block group cache into trans::dirty_bgs. Checking and clearing bits
are transformed too.
3. set_extent_bits(..., bit | EXTENT_LOCKED) things are replaced by
new the btrfs_add_block_group_cache() which inserts caches into
btrfs_fs_info::block_group_cache_tree directly. Other operations are
converted to tree operations.
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We are going to touch dirty_bgs in transaction directly, so every call
chain should pass @trans to the leaf functions.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The old style uses extent bit BLOCK_GROUP_DIRTY to mark dirty block
groups in extent cache. To replace it, add btrfs_trans_handle::dirty_bgs
and btrfs_block_group_cache::dirty_list.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The new function btrfs_add_block_group_cache() abstracts the old
set_extent_bits and set_state_private operations.
Rename the rb tree version to btrfs_add_block_group_cache_kernel().
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Change @cotnains to @next of block_group_cache_tree_search(). Now, the
function will try to search the block group containing the @bytenr. If
not found, return NULL if @next is zero. Or It will return the next
block group.
The mode of search used in kernel has the parameter updated.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Simple copy and paste, remove useless lock operantions in progs. Th new
coming lookup functions are temporarily named with suffix _kernel.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Just break loop and return the error code if failed. Functions in the
call chain are able to handle it.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
For certain btrfs images, a BUG_ON() can be triggered at open_ctree()
time:
Opening filesystem to check...
extent_io.c:158: insert_state: BUG_ON `end < start` triggered, value 1
btrfs(+0x2de57)[0x560c4d7cfe57]
btrfs(+0x2e210)[0x560c4d7d0210]
btrfs(set_extent_bits+0x254)[0x560c4d7d0854]
btrfs(exclude_super_stripes+0xbf)[0x560c4d7c65ff]
btrfs(btrfs_read_block_groups+0x29d)[0x560c4d7c698d]
btrfs(btrfs_setup_all_roots+0x3f3)[0x560c4d7c0b23]
btrfs(+0x1ef53)[0x560c4d7c0f53]
btrfs(open_ctree_fs_info+0x90)[0x560c4d7c11a0]
btrfs(+0x6d3f9)[0x560c4d80f3f9]
btrfs(main+0x94)[0x560c4d7b60c4]
/usr/lib/libc.so.6(__libc_start_main+0xf3)[0x7fd189773ee3]
btrfs(_start+0x2e)[0x560c4d7b635e]
[CAUSE]
This is caused by passing @len == 0 to add_excluded_extent(), which
means one reverse mapped range is just out of the block group range,
normally means a by-one error.
[FIX]
Fix the boundary check on the reserve mapped range against block group
range. If a reverse mapped super block starts at the end of the block
group, it doesn't cover so we don't need to bother the case.
Issue: #210
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
For a fuzzed image, `btrfs check` both modes trigger BUG_ON():
Opening filesystem to check...
volumes.c:1795: btrfs_chunk_readonly: BUG_ON `!ce` triggered, value 1
btrfs(+0x2f712)[0x557beff3b712]
btrfs(+0x32059)[0x557beff3e059]
btrfs(btrfs_read_block_groups+0x282)[0x557beff30972]
btrfs(btrfs_setup_all_roots+0x3f3)[0x557beff2ab23]
btrfs(+0x1ef53)[0x557beff2af53]
btrfs(open_ctree_fs_info+0x90)[0x557beff2b1a0]
btrfs(+0x6d3f9)[0x557beff793f9]
btrfs(main+0x94)[0x557beff200c4]
/usr/lib/libc.so.6(__libc_start_main+0xf3)[0x7f623ac97ee3]
btrfs(_start+0x2e)[0x557beff2035e]
[CAUSE]
The fuzzed image has a bad extent tree:
item 0 key (288230376165343232 BLOCK_GROUP_ITEM 8388608) itemoff 16259 itemsize 24
block group used 0 chunk_objectid 256 flags DATA
There is no corresponding chunk for the block group.
In then we hit the BUG_ON(), which expects chunk mapping for
btrfs_chunk_readonly().
[FIX]
Remove that BUG_ON() with proper error handling, and make
btrfs_read_block_groups() handle the -ENOENT error from
read_one_block_group() to continue.
So one corrupted block group item won't screw up the remaining block
group items.
Issue: #209
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The progs side function btrfs_lookup_first_block_group() calls
find_first_extent_bit() to find block group which contains bytenr
or after the bytenr. This behavior differs from kernel code, so
add the comments.
Add the coments of btrfs_lookup_block_group() too, this one works
like kernel side.
Signed-off-by: Su Yue <Damenly_Su@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We access btrfs_block_group_cache::item mostly for @used and @flags.
@flags is already a dedicated member in btrfs_block_group_cache, only
@used doesn't have a dedicated member.
This patch will remove btrfs_block_group_cache::item and add
btrfs_block_group_cache::used.
It's the btrfs-progs equivalent of the following kernel patches:
btrfs: move block_group_item::used to block group
btrfs: move block_group_item::flags to block group
btrfs: remove embedded block_group_cache::item
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch does the following refactor:
- Refactor parameter from @root to @fs_info
- Refactor the large loop body into another function
Now we have a helper function, read_one_block_group(), to handle
block group cache and space info related routine.
- Refactor the return value
Even we have the code handling ret > 0 from find_first_block_group(),
it never works, as when there is no more block group,
find_first_block_group() just return -ENOENT other than 1.
This is super confusing, it's almost a mircle it even works.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The following functions are just using @root to reach fs_info:
- exclude_super_stripes
- free_excluded_extents
- add_excluded_extent
Refactor them to use fs_info directly.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
With the introduction of xxhash64 to btrfs-progs we created a crypto/
directory for all the hashes used in btrfs (although no
cryptographically secure hash is there yet).
Move the crc32c implementation from kernel-lib/ to crypto/ as well so we
have all hashes consolidated.
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: David Sterba <dsterba@suse.com>
Discovered with cppcheck. Fix signed/unsigned int mismatches, sizeof and
long formats.
Pull-request: #197
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This piece of code has been commented since 2009, given the number of
changes that have happened it's unlikely it could be made to work or is
needed at all. Just delete it.
The code was disabled in commit 95d3f20b51 ("Mixed back reference
(FORWARD ROLLING FORMAT CHANGE)") that changed the format significantly
and we don't need the compatibility code anymore.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
'pin' is always true in __free_extent so there is no point in checking
it. Just remove the if and unindent the code.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
There is a bug report of unexpected ENOSPC from btrfs-convert, issue #123.
After some debugging, even when we have enough unallocated space, we
still hit ENOSPC at btrfs_reserve_extent().
[CAUSE]
Btrfs-progs relies on chunk preallocator to make enough space for
data/metadata.
However after the introduction of delayed-ref, it's no longer reliable
to rely on btrfs_space_info::bytes_used and
btrfs_space_info::bytes_pinned to calculate used metadata space.
For a running transaction with a lot of allocated tree blocks,
btrfs_space_info::bytes_used stays its original value, and will only be
updated when running delayed ref.
This makes btrfs-progs chunk preallocator completely useless. And for
btrfs-convert/mkfs.btrfs --rootdir, if we're going to have enough
metadata to fill a metadata block group in one transaction, we will hit
ENOSPC no matter whether we have enough unallocated space.
[FIX]
This patch will introduce btrfs_space_info::bytes_reserved to track how
many space we have reserved but not yet committed to extent tree.
To support this change, this commit also introduces the following
modification:
- More comment on btrfs_space_info::bytes_*
To make code a little easier to read
- Export update_space_info() to preallocate empty data/metadata space
info for mkfs.
For mkfs, we only have a temporary fs image with SYSTEM chunk only.
Export update_space_info() so that we can preallocate empty
data/metadata space info before we start a transaction.
- Proper btrfs_space_info::bytes_reserved update
The timing is the as kernel (except we don't need to update
bytes_reserved for data extents)
* Increase bytes_reserved when call alloc_reserved_tree_block()
* Decrease bytes_reserved when running delayed refs
With the help of head->must_insert_reserved to determine whether we
need to decrease.
Issue: #123
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In github issues, one user reports unexpected ENOSPC error if enabling
datasum druing convert. After some investigation, it looks like that
during ext2_saved/image creation, we could create large file extent
whose size can be 128M (max data extent size).
In that case, its csum block will be at least 128K. Under certain case
we need to allocate extra metadata chunks to fulfill such space
requirement.
However we only do metadata prealloc if we're reserving extents for fs
trees. (we use btrfs_root::ref_cows to determine whether we should do
metadata prealloc, and that member is only set for fs trees).
There is no explaination on why we only do metadata prealloc for file
trees, but from my educated guess, it could be related to avoid nested
extent/chunk tree modication.
At least extent reservation for csum tree shouldn't be a problem with
metadata block group preallocation.
So adding new condition for metadata preallocate to avoid unexpected
ENOSPC problem.
Issue: #123
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a indirect recursion which can reach the extent reservation:
btrfs_reserve_extent() <--|
|- do_chunk_alloc() |
|- btrfs_alloc_chunk() |
|- btrfs_insert_item() |
|- btrfs_reserve_extent() <--|
Currently, we're using root->ref_cows to determine whether we should do
chunk prealloc to avoid such loop.
But that's still a hidden trap. Instead of solving it using some hidden
tricks, this patch will make chunk/block group allocation exclusive.
Now if do_chunk_alloc() determines to alloc chunk, it will set a flag in
transaction handle so new call of do_chunk_alloc() will refuse to
allocate new chunk until current chunk allocation finishes.
The chunks get over-allocated by 2M so there's enough space in case the
recursive call asks for a different type of blockgroup.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
BTRFS_COMPAT_EXTENT_TREE_V0 is introduced for a short time in kernel,
and it's over 10 years ago.
Nowadays there should be no user for that feature, and kernel has remove
this support in Jun, 2018. There is no need for btrfs-progs to support
it.
This patch will remove EXTENT_TREE_V0 related code and replace those
BUG_ON() to a more graceful error message.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There is a bug report of BUG_ON() which is caused by __free_extent()
failed to lookup a backref extent:
Failed to find [1429288337408, 168, 16384]
btrfs unable to find ref byte nr 1429288583168 parent 0 root 2 owner 0 offset 0
convert/source-ext2.c:834: ext2_copy_inodes: BUG_ON ret triggered, value -5
./btrfs-convert[0x410941]
./btrfs-convert(main+0x1fdc)[0x40d3b8]
/lib64/libc.so.6(__libc_start_main+0xf3)[0x7f93bb7d2f33]
./btrfs-convert(_start+0x2e)[0x40a96e]
It's still unclear how this bug can be triggered, but adding such debug
output will provide more info for us to debug.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[BUG]
When running fuzz-tests/003 and fuzz-tests/009, btrfs-progs will crash
due to BUG_ON().
[CAUSE]
We abused BUG_ON() in btrfs_commit_transaction(), which is one of the
most error prone function for fuzzed images.
Currently to cleanup the aborted transaction, we only need to clean up
the only per-transaction data: delayed refs.
This patch will introduce a new function, btrfs_destroy_delayed_refs()
to cleanup delayed refs when we failed to commit transaction.
With that function, we will gently destroy per-trans delayed ref, and
remove the BUG_ON()s in btrfs_commit_transaction().
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This patch will refactor btrfs_finish_extent_commit():
- Make it return void
There is no failure pattern for btrfs_finish_extent_commit(), thus it
always return 0. And the caller doesn't care about the return value.
So no need to return int.
- Remove @root and @unpin parameters
@root is only used to extract fs_info, which can be extracted from
transaction handler already.
@unpin is always fs_info->pinned_extents.
All these parameters can be extracted from @trans, no need to pass
them.
The function signature now matches the kernel counterpart.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
cleanup_ref_head() will only return 0 or 1, no way to return a negative
value. So remove the dead branch.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Commit 7a12d8470e ("btrfs-progs: Do metadata preallocation as long as
we're not modifying extent tree") tries to fix#123, however due to the
fact that chunk tree also has root->ref_cows set, we will call
do_chunk_alloc() until call stack explodes.
So revert that offending patch until we have a much better comment on
root->ref_cows and find a better solution to this problem.
Signed-off-by: Qu Wenruo <wqu@suse.com>
In github issues, one user reports unexpected ENOSPC error if enabling
datasum druing convert. After some investigation, it looks like that
during ext2_saved/image creation, we could create large file extent
whose size can be 128M (max data extent size).
In that case, its csum block will be at least 128K. Under certain case
we need to allocate extra metadata chunks to fulfill such space
requirement.
However we only do metadata prealloc if we're reserving extents for fs
trees. (we use btrfs_root::ref_cows to determine whether we should do
metadata prealloc, and that member is only set for fs trees).
There is no explaination on why we only do metadata prealloc for file
trees, but at least from my investigation, it could be related to avoid
nested extent tree modication.
At least extent reservation for csum tree shouldn't be a problem with
metadata block group preallocation.
So change the metadata block group preallocation check from
"root->ref_cow" to "root->root_key.objectid !=
BTRFS_EXTENT_TREE_OBJECTID", and add some comment for it.
Issue: #123
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Found using -Wmissing-prototypes in GCC. This should improve LTO
behavior.
Note that set_free_space_tree_thresholds is an unused function. Adding
inline seems to remove the unused function warning.
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Rosen Penev <rosenp@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
For now this doesn't change the functionality since FST code is not yet
enabled via the compat bits. But this will be needed when it's enabled
so that the FST is correctly modified during repair operations that
allocate/deallocate extents.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that delayed refs have been wired let's merge the two function. In
the process also remove one BUG_ON since alloc_reserved_tree_block's
callers can handle errors. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Now that delayed refs have been all wired up clean up the __free_extent2
adapter function since it's no longer needed. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Given that the new delayed refs infrastructure is implemented and wired
up, there is no point in keeping the old code. So just remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit enables the delayed refs infrastructures. This entails doing
the following:
1. Replacing existing calls of btrfs_extent_post_op (which is the
equivalent of delayed refs) with the proper btrfs_run_delayed_refs.
As well as eliminating open-coded calls to finish_current_insert and
del_pending_extents which execute the delayed ops.
2. Wiring up the addition of delayed refs when freeing extents
(btrfs_free_extent) and when adding new extents (alloc_tree_block).
3. Adding calls to btrfs_run_delayed refs in the transaction commit
path alongside comments why every call is needed, since it's not
always obvious (those call sites were derived empirically by running
and debugging existing tests)
4. Correctly flagging the transaction in which we are reinitialising
the extent tree.
5. Moving btrfs_write_dirty_block_groups to
btrfs_write_dirty_block_groups since blockgroups should be written to
disk after the last delayed refs have been run.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The root argument is used only to get a reference to the fs_info, this
can be achieved with the transaction handle being passed so use that.
This is in preparation for moving this function in the main transaction
commit routine. No functional changes.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This commit pulls those portions of the kernel implementation of
delayed refs which are necessary to have them working in user-space.
I've done the following modifications:
1. Replaced all kmem_cache_alloc calls to kmalloc.
2. Removed all locking-related code, since we are single threaded in
userspace.
3. Removed code which deals with data refs - delayed refs in user space
are going to be used only for cowonly trees.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a simple adapter function to convert the delayed-refs structures
to the current arguments of alloc_reserved_tree_block.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is a simple adapter to convert the arguments delayed ref arguments
to the existing arguments of __free_extent.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When running test fuzz/003, we could hit the following BUG_ON:
====== RUN MAYFAIL btrfs check --init-csum-tree tests//fuzz-tests/images/bko-155621-bad-block-group-offset.raw.restored
Unable to find block group for 0
Unable to find block group for 0
Unable to find block group for 0
extent-tree.c:2657: alloc_tree_block: BUG_ON `ret` triggered, value -28
failed (ignored, ret=134): btrfs check --init-csum-tree tests/fuzz-tests/images/bko-155621-bad-block-group-offset.raw.restored
mayfail: returned code 134 (SIGABRT), not ignored
test failed for case 003-multi-check-unmounted
Just remove that BUG_ON() and allow us to exit gracefully, the caller
handles the errors.
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Currently some instances of btrfs_free_extent are called with the
last parameter ("offset") being set to 1. This makes no sense, since
offset is used for data extents. I suspect this is a left-over from
95d3f20b51 ("Mixed back reference (FORWARD ROLLING FORMAT CHANGE)")
since this commit changed the signature of the function from :
-int btrfs_free_extent(struct btrfs_trans_handle *trans, struct btrfs_root
- *root, u64 bytenr, u64 num_bytes, u64 parent,
- u64 root_objectid, u64 ref_generation,
- u64 owner_objectid, int pin);
to
+int btrfs_free_extent(struct btrfs_trans_handle *trans,
+ struct btrfs_root *root,
+ u64 bytenr, u64 num_bytes, u64 parent,
+ u64 root_objectid, u64 owner, u64 offset);
I.e the last parameter was "pin" and not offset. So these are just
leftovers with no semantic meaning. Fix this by passing 0.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is not really needed, since we can reference the fs_info from the
passed transaction. This is in preparation for delayed-refs support.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This argument is no longer used in this function so remove it.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
They are not really needed, what free_extent_hook wants is really a
pointer to fs_info so give it to it directly. This is in preparation
of delayed refs code.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
This is in preparation of delayed refs code.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Instead of updating this during update_block_group, move the updating
code at the places where we free/allocate a block. This resembles the
current state of the kernel code. This is in prep for delayed refs.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
It's not needed, since we can obtain a reference to fs_info from the
passed transaction handle. This is needed by delayed refs code.
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>