Introduce a new function, scrub_one_block_group(), to scrub a block group.
For Single/DUP/RAID0/RAID1/RAID10, we use the old mirror-number-based
map_block and check extent by extent.
For parity-based profiles (RAID5/6), we use the new map_block_v2() and
check full stripe by full stripe.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Introduce a new function, scrub_one_full_stripe(), to check a full
stripe.
It handles the full stripe scrub in the following steps:
0) Check if we need to check the full stripe
   If the full stripe contains no extent, why waste our CPU and IO?
1) Read out the full stripe
   Then we know how many devices are missing or have read errors.
   If the damage is beyond repair, then exit.
   If any device is missing or has a read error, try to recover here.
2) Check data stripes against csum
   We add data stripes with csum errors as corrupted stripes, just like
   missing devices or read errors.
   Then recheck whether the csum mismatch count is still below tolerance.
Finally we check the full stripe using 2 factors only:
A) Whether the full stripe has ever gone through recovery
B) Whether the full stripe has a csum error
Combining factors A and B we get:
1) A && B: Recovered, csum mismatch
   Screwed up totally
2) A && !B: Recovered, csum match
   Recoverable, data corrupted but P/Q is good enough to recover
3) !A && B: Not recovered, csum mismatch
   Try to recover the corrupted data stripes
   If the recovered data matches its csum, then recoverable
   Else, screwed up
4) !A && !B: Not recovered, no csum mismatch
   Best case, just check if P/Q matches.
   If P/Q matches, everything is good
   Else, just P/Q is screwed up, still recoverable.
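The four combinations above can be condensed into a small decision
function. This is only a sketch: the enum, the name
classify_full_stripe() and its boolean parameters are made up for
illustration and are not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical verdicts, named for this sketch only. */
enum stripe_verdict {
	STRIPE_GOOD,		/* data and P/Q are all fine */
	STRIPE_RECOVERABLE,	/* damaged, but can be rebuilt */
	STRIPE_UNRECOVERABLE,	/* damaged beyond repair */
};

/*
 * recovered:     factor A, the stripe already went through recovery
 * csum_error:    factor B, at least one data stripe fails its csum
 * retry_csum_ok: only consulted for !A && B, true if the csums match
 *                after recovering the corrupted data stripes
 * parity_ok:     only consulted for !A && !B, true if P/Q match
 */
static enum stripe_verdict classify_full_stripe(bool recovered,
						bool csum_error,
						bool retry_csum_ok,
						bool parity_ok)
{
	if (recovered && csum_error)		/* case 1 */
		return STRIPE_UNRECOVERABLE;
	if (recovered)				/* case 2 */
		return STRIPE_RECOVERABLE;
	if (csum_error)				/* case 3 */
		return retry_csum_ok ? STRIPE_RECOVERABLE :
				       STRIPE_UNRECOVERABLE;
	/* case 4: only P/Q can be wrong, which is still repairable */
	return parity_ok ? STRIPE_GOOD : STRIPE_RECOVERABLE;
}
```

Note that only case 4 can report a fully healthy stripe; every other
path has already seen at least one damaged stripe.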
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce an internal helper, write_full_stripe(), to calculate P/Q and
write the whole full stripe.
This is useful to recover RAID56 stripes.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce a function, recover_from_parities(), to recover data stripes.
It just wraps raid56_recov() with extra checks on the
scrub_full_stripe structure.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce a new function, btrfs_check_extent_exists(), to check if there
is any extent in the range specified by user.
The parameter can be a large range, and if any extent exists in the
range, it will return >0 (in fact it will return 1).
Or return 0 if no extent is found.
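The return convention can be illustrated with a plain overlap test. The
real function searches the extent tree; the sketch below instead walks a
hypothetical in-memory array (struct extent_range and range_has_extent()
are invented names) and only shows the 1/0 result for a possibly large
range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an extent item: [start, start + len). */
struct extent_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Return 1 if any extent overlaps [start, start + len), 0 otherwise.
 * Two half-open ranges overlap when each one starts before the other
 * one ends.
 */
static int range_has_extent(const struct extent_range *extents, size_t nr,
			    uint64_t start, uint64_t len)
{
	for (size_t i = 0; i < nr; i++) {
		if (extents[i].start < start + len &&
		    start < extents[i].start + extents[i].len)
			return 1;
	}
	return 0;
}
```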
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Introduce a new function, verify_parities(), to check whether the parities
match a full stripe whose data stripes already match their csums.
Caller should fill the scrub_full_stripe structure properly before
calling this function.
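As a rough illustration of the parity check, the P stripe of RAID5/6 is
the bytewise XOR of all data stripes, so it can be verified as below.
This sketch covers P only (Q needs Galois field arithmetic and is left
out), and p_parity_matches() with its parameters is a hypothetical
helper, not the function added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if p[] equals the XOR of the nr_data data stripes.
 * Every buffer must be stripe_len bytes long.
 */
static bool p_parity_matches(const uint8_t *const *data, size_t nr_data,
			     const uint8_t *p, size_t stripe_len)
{
	assert(nr_data > 0);

	for (size_t off = 0; off < stripe_len; off++) {
		uint8_t x = 0;

		for (size_t i = 0; i < nr_data; i++)
			x ^= data[i][off];
		if (x != p[off])
			return false;
	}
	return true;
}
```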
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Introduce a new function, scrub_one_data_stripe(), to check all data and
tree blocks inside the data stripe.
This function will not try to recover from any error; it only checks
whether any data/tree block has a mismatched csum.
If data is missing its csum, which is completely valid for cases like
nodatasum, it is just recorded and not reported as an error.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Introduce a new function, scrub_one_extent(), as a wrapper to check one
mirror-based extent.
It will accept a btrfs_path parameter @path, which must point to a
META/EXTENT_ITEM.
And @start, @len, which must be a subset of META/EXTENT_ITEM.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Introduce new functions, check/recover_data_mirror(), to check and recover
mirror-based data blocks.
Unlike tree blocks, data blocks must be recovered sector by sector, so we
introduce corrupted_bitmap for checking and recovery.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Introduce new functions, check/recover_tree_mirror(), to check and
recover mirror-based tree blocks (Single/DUP/RAID0/1/10).
check_tree_mirror() can also be used on in-memory tree blocks using @data
parameter.
This is very handy for the RAID5/6 case: either check the data stripe
tree block by @bytenr with 0 as @mirror, or use the @data parameter for
recovered in-memory data.
recover_tree_mirror(), on the other hand, is only used for mirror-based
profiles, as RAID56 recovery is done by stripe unit, not mirror unit.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Introduce new local structures, scrub_full_stripe and scrub_stripe, for
the upcoming offline RAID56 scrub support.
For pure stripe/mirror-based profiles, like raid0/1/10/dup/single, we
will follow the original bytenr and mirror number based iteration, so
they don't need any extra structures.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
Introduce a new function, btrfs_read_data_csums(), to read out csums
for the sectors in a range.
This is quite useful for reading out data csums, so callers don't need
to open-code it.
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
For READ, callers normally want exactly the range they requested rather
than the full stripe map.
In this case, we should remove the unrelated parts of the stripe map, as
in the following case:
                   32K               96K
                   |<-request range->|
          0                 64K               128K
RAID0:    |     Data 1      |     Data 2      |
                 disk1             disk2
Before this patch, we return the full stripe:
Stripe 0: Logical 0, Physical X, Len 64K, Dev disk1
Stripe 1: Logical 64k, Physical Y, Len 64K, Dev disk2
After this patch, we limit the stripe result to the request range:
Stripe 0: Logical 32K, Physical X+32K, Len 32K, Dev disk1
Stripe 1: Logical 64k, Physical Y, Len 32K, Dev disk2
And if it's a RAID5/6 stripe, we just handle it like RAID0, ignoring
parities.
This should make the result easier for callers to use.
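The trimming in the example above boils down to intersecting each
stripe with the requested range and shifting the physical address by
the same amount as the logical one. A minimal sketch, assuming a
hypothetical struct stripe_map and helper trim_stripe() (neither name
comes from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-stripe record for this sketch. */
struct stripe_map {
	uint64_t logical;
	uint64_t physical;
	uint64_t length;
};

/*
 * Trim @s to its intersection with [req_start, req_start + req_len).
 * Returns 1 if some part of the stripe remains, or 0 if the stripe
 * does not overlap the request at all (in which case @s is untouched).
 */
static int trim_stripe(struct stripe_map *s, uint64_t req_start,
		       uint64_t req_len)
{
	uint64_t req_end = req_start + req_len;
	uint64_t s_end = s->logical + s->length;
	uint64_t start = s->logical > req_start ? s->logical : req_start;
	uint64_t end = s_end < req_end ? s_end : req_end;

	assert(req_len > 0);

	if (start >= end)
		return 0;

	/* Keep physical in step with the new logical start. */
	s->physical += start - s->logical;
	s->logical = start;
	s->length = end - start;
	return 1;
}
```

For the request 32K..96K, this turns "Logical 0, Physical X, Len 64K"
into "Logical 32K, Physical X+32K, Len 32K" and shortens the second
stripe to 32K without moving its start.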
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Introduce a new function, __btrfs_map_block_v2().
Unlike the old btrfs_map_block(), which needs different parameters to
handle different RAID profiles, this new function uses a unified
btrfs_map_block structure to handle all RAID profiles in a more
meaningful way: it returns the physical address along with the logical
address for each stripe.
For RAID1/Single/DUP (non-striped):
The result would look like:
Map block: Logical 128M, Len 10M, Type RAID1, Stripe len 0, Nr_stripes 2
Stripe 0: Logical 128M, Physical X, Len: 10M Dev dev1
Stripe 1: Logical 128M, Physical Y, Len: 10M Dev dev2
The result will be as long as possible, since it's not striped at all.
For RAID0/10 (striped without parity):
Result will be aligned to full stripe size:
Map block: Logical 64K, Len 128K, Type RAID10, Stripe len 64K, Nr_stripes 4
Stripe 0: Logical 64K, Physical X, Len 64K Dev dev1
Stripe 1: Logical 64K, Physical Y, Len 64K Dev dev2
Stripe 2: Logical 128K, Physical Z, Len 64K Dev dev3
Stripe 3: Logical 128K, Physical W, Len 64K Dev dev4
For RAID5/6 (striped with parity and dev-rotation):
Result will be aligned to full stripe size:
Map block: Logical 64K, Len 128K, Type RAID6, Stripe len 64K, Nr_stripes 4
Stripe 0: Logical 64K, Physical X, Len 64K Dev dev1
Stripe 1: Logical 128K, Physical Y, Len 64K Dev dev2
Stripe 2: Logical RAID5_P, Physical Z, Len 64K Dev dev3
Stripe 3: Logical RAID6_Q, Physical W, Len 64K Dev dev4
The new unified layout should be very flexible and can even handle things
like N-way RAID1 (which the old mirror_num based one can't handle well).
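To make the layout more concrete, here is a sketch of how a caller
might consume such a result. The struct definitions, the SKETCH_*
marker values and find_stripe() are all assumptions made for this
example; the real structure in the patch may differ in field names and
types.

```c
#include <stdint.h>

/* Assumed marker values for the logical address of parity stripes. */
#define SKETCH_RAID5_P	((uint64_t)-2)
#define SKETCH_RAID6_Q	((uint64_t)-1)

/* Hypothetical per-stripe entry: every stripe carries its own logical. */
struct sketch_stripe {
	uint64_t logical;
	uint64_t physical;
	uint64_t length;
	int devid;
};

/* Hypothetical map result, sized for the 4-stripe examples above. */
struct sketch_map_block {
	uint64_t start;
	uint64_t length;
	uint32_t stripe_len;
	int num_stripes;
	struct sketch_stripe stripes[4];
};

/*
 * Return the index of the first stripe that covers @logical, or -1.
 * Parity stripes are skipped, as their logical is only a marker.
 */
static int find_stripe(const struct sketch_map_block *map, uint64_t logical)
{
	for (int i = 0; i < map->num_stripes; i++) {
		const struct sketch_stripe *s = &map->stripes[i];

		if (s->logical == SKETCH_RAID5_P ||
		    s->logical == SKETCH_RAID6_Q)
			continue;
		if (logical >= s->logical && logical - s->logical < s->length)
			return i;
	}
	return -1;
}
```

Because each stripe records its own logical address, the same lookup
works for mirrored, striped and parity profiles without any
profile-specific parameter.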
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
This new test checks inspect-internal rootid
- handle path to subvolume/directory/file as an argument
- get different id for each subvolume
- get the expected id for each file/directory (i.e. the same as
containing subvolume)
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The post-rollback helper still assumes just extN; we need an extra
argument that'll get passed to fsck. Change all callsites at once so the
tests do not fail temporarily.
Signed-off-by: David Sterba <dsterba@suse.com>
The first patch causes test-convert to fail. This is because
generate_dataset() creates a name containing trailing spaces for the
"slow_symlink" type, which causes a getfacl error in convert_test_perm().
(This was not noticed since the original run_check_stdout() throws away
the error.)
Fix this by using a space as the delimiter for cut.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
run_check_stdout() uses "... | tee ... || _fail". However, since tee
won't fail, _fail() is not called even if the first command fails.
Fix this by checking PIPESTATUS at the end.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Add test case which checks if -r|--rootdir mkfs option can handle
symlink/char/block/fifo files.
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[Bug]
If mkfs.btrfs is used with the "-r" parameter and the specified directory
contains a fifo/socket/char/block special file, the created filesystem
can't pass fsck:
------
checking fs roots
unresolved ref dir 241158 index 3 namelen 9 name S.dirmngr filetype 0 errors 80, filetype mismatch
ERROR: errors found in fs roots
------
[Reason]
Btrfs dir items/indexes record the inode type, but "-r" only handles
directories, regular files and symlinks. It treats such special files as
regular files, which causes the problem.
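To illustrate the mismatch, here is a minimal sketch of a mode to file
type mapping that covers every inode type, including the special files.
The FT_* values are assumed to mirror the on-disk BTRFS_FT_* constants
but are defined locally, and mode_to_filetype() is a hypothetical name,
not the helper used by mkfs.

```c
#define _DEFAULT_SOURCE	/* expose S_IFSOCK and friends under strict -std */
#include <stdint.h>
#include <sys/stat.h>

/* Local copies, assumed to match the BTRFS_FT_* on-disk values. */
enum {
	FT_UNKNOWN = 0,
	FT_REG_FILE = 1,
	FT_DIR = 2,
	FT_CHRDEV = 3,
	FT_BLKDEV = 4,
	FT_FIFO = 5,
	FT_SOCK = 6,
	FT_SYMLINK = 7,
};

/* Derive the dir item file type from the inode mode. */
static uint8_t mode_to_filetype(mode_t mode)
{
	switch (mode & S_IFMT) {
	case S_IFREG:
		return FT_REG_FILE;
	case S_IFDIR:
		return FT_DIR;
	case S_IFCHR:
		return FT_CHRDEV;
	case S_IFBLK:
		return FT_BLKDEV;
	case S_IFIFO:
		return FT_FIFO;
	case S_IFSOCK:
		return FT_SOCK;
	case S_IFLNK:
		return FT_SYMLINK;
	default:
		return FT_UNKNOWN;
	}
}
```

A mapping that only knows the first two cases plus symlinks would record
the wrong type for a socket such as S.dirmngr in the fsck output above.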
[Fix]
Add the missing types to add_directory_items(), so that the result of
"mkfs.btrfs -r" can pass fsck.
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Signed-off-by: David Sterba <dsterba@suse.com>
make_btrfs is too long to understand; move the creation of the root tree
into a function.
Some of the tree roots are now created in a loop, where the code was just
copy-pasted before. We now make use of the reference_root_table to
translate block index to root objectid.
Signed-off-by: Gu Jinxiang <gujx@cn.fujitsu.com>
[ updated changelog ]
Signed-off-by: David Sterba <dsterba@suse.com>
Since cmd_inspect_rootid() calls btrfs_open_dir(), it rejects a file
given as the argument. But as the documentation says, a file should be
supported.
This patch introduces btrfs_open_file_or_dir(), which is a counterpart
of btrfs_open_dir(), to safely check and open a btrfs file or directory.
The original btrfs_open_dir() content is moved to btrfs_open() and shared
by both functions.
Signed-off-by: Tomohiro Misono <misono.tomohiro@jp.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
While looking at a log of a corrupted fs I needed to verify we were
missing csums for a given range. Make this easier by printing out the
range of bytes a csum item covers.
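The covered range follows from the item itself: the key offset is the
first byte covered, and each csum stored in the item covers one sector.
A rough sketch of the arithmetic, with a hypothetical helper name and
with the csum size and sector size supplied by the caller:

```c
#include <assert.h>
#include <stdint.h>

/* Half-open byte range [start, end). */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Compute the logical byte range covered by one csum item.
 * @key_offset: offset field of the item key (first byte covered)
 * @item_size:  size of the item payload in bytes
 * @csum_size:  size of a single csum (e.g. 4 for crc32c)
 * @sectorsize: bytes covered by a single csum
 */
static struct byte_range csum_item_range(uint64_t key_offset,
					 uint32_t item_size,
					 uint32_t csum_size,
					 uint32_t sectorsize)
{
	struct byte_range r;

	assert(csum_size > 0 && item_size % csum_size == 0);

	r.start = key_offset;
	r.end = key_offset + (uint64_t)(item_size / csum_size) * sectorsize;
	return r;
}
```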
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
When ino is BTRFS_EMPTY_SUBVOL_DIR_OBJECTID, the item does not refer to
any file tree, so lookup_path_rootid() doesn't return any meaningful
value.
As was reported, this can be triggered by:
$ btrfs sub create test1
$ btrfs sub create test1/test2
$ btrfs sub snap test1 test1.snap
$ btrfs fi du -s test1
Total Exclusive Set shared Filename
0.00B 0.00B 0.00B test1
$ btrfs fi du -s test1.snap
Total Exclusive Set shared Filename
ERROR: cannot check space of 'test1.snap': Inappropriate ioctl for device
Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: David Sterba <dsterba@suse.com>
In du_walk_dir(), when du_add_file() returns an error it is usually
ignored. However, if the error comes from querying the last item, it is
returned to the caller.
Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it>
Signed-off-by: David Sterba <dsterba@suse.com>
Sometimes it's needed to do a check on a mounted filesystem. This should
work fine on a quiescent filesystem or a read-only mount. Changes made to
the block device by the kernel might confuse the userspace checker, and
it might crash when it reads some stale data.
Repair without mount checks is not supported right now.
Signed-off-by: David Sterba <dsterba@suse.cz>
In some cases it's clear from the context which item is being printed,
so the redundant description can be removed. If the item has no data,
some description is still desired (e.g. orphan or various backrefs).
Signed-off-by: David Sterba <dsterba@suse.com>