Commit graph

5565 commits

Author SHA1 Message Date
David Sterba 9be1b1c442 btrfs-progs: qgroup create: accept only valid qgroupid
Many qgroup commands accept the level/id format and also a path to
subvolume, the qgroup id is derived from that. This does not make sense
for the create command as we can't create the 0/subvolid qgroup (thus
can't be derived from the path), only the higher levels.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 14:20:42 +02:00
David Sterba 47b04b747d btrfs-progs: move parse_qgroupid to parse utils
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 14:20:42 +02:00
David Sterba cb5a542871 btrfs-progs: factor out plain qgroupid parsing
We'll use plain qgroupid parsing function elsewhere so split that part
from parse_qgroupid_or_path. The parsing is slightly reworked and goes
from start to end, while previously it looked up the slash and worked
from there. In case a valid qgroupid is also a valid path, the path must
be specified as absolute.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 14:20:42 +02:00
David Sterba ebf500b43d btrfs-progs: rename parse_qgroupid
This helper can parse a qgroupid or a path, so rename it accordingly, so
a plain qgroupid parsing can be factored out as a standalone helper.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 14:20:42 +02:00
Qu Wenruo 9a11b1b792 btrfs-progs: backport btrfs_check_node() from kernel
The btrfs_check_node() has far less meaningful error message compared to
kernel counterpart, and it even lacks certain checks like level check.

Backport btrfs_check_node() to btrfs-progs to not only unify the code
but greatly improve the readability of the error messages.

Extra modification includes:

- No fs_info needed
  As we don't need to output fsid.

- Remove unlikely() macro

- Extra BTRFS_TREE_BLOCK_* error type

- Btrfs-progs specific error handling
  To record the corrupted tree blocks.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 14:20:41 +02:00
Qu Wenruo 8f8cafa2ce btrfs-progs: backport btrfs_check_leaf() from kernel
Currently btrfs_check_leaf() provides almost meaningless messages for
things like invalid item offset:

  incorrect offsets 8492 3707786077

While kernel tree-checker is doing a way better job, so it's wise to
backport btrfs_check_leaf() from kernel.

There are some modification needed:

- New generic_err() helper

- Remove unlikely() macro

- Remove empty essential tree check
  Mkfs still needs to create empty essential trees.

- Using BTRFS_TREE_BLOCK_* return value
  Original mode check still relies on them to do certain repair.

- No need for btrfs_fs_info
  We no longer need fsid output, thus no need for btrfs_fs_info.

- No item contents check

- Still using the fail: label for btrfs-progs specific error handling

The new output looks like:

  corrupt leaf: root=2 block=72164753408 slot=109, unexpected item end, have 3707786077 expect 8492

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 14:19:54 +02:00
Qu Wenruo 1f8dfe681f btrfs-progs: use btrfs_key for btrfs_check_node() and btrfs_check_leaf()
In kernel space we hardly use btrfs_disk_key, unless for very lowlevel
code.

There is no need to intentionally use btrfs_disk_key in btrfs-progs
either.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 13:58:44 +02:00
David Sterba c3ee6a8a09 btrfs-progs: unify GPL header comments
Add the GPL v2 header to files where it was missing and is not from an
external source, update to the most recent version with the address.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 13:58:44 +02:00
David Sterba f3a132fa1b btrfs-progs: factor out compression type name parsing to common utils
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 13:58:44 +02:00
David Sterba 0c3991e171 btrfs-progs: mkfs: use common parser of bg profiles
Replace open-coded profile parser with the common one.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 13:58:44 +02:00
David Sterba 31df7dc295 btrfs-progs: factor out profile parsing to common utils
There are some duplicate parsers of the profile names, factor out the
one from balance to the common code.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 13:58:44 +02:00
Sidong Yang dd3b931b1c btrfs-progs: props: init compression prop_handlers with field name
Compression prop_handler is initialized without field name. This patch
corrects that it's initialized with field name.

Signed-off-by: Sidong Yang <realwakka@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 13:58:44 +02:00
David Sterba 6b93a7336c btrfs-progs: move number and range parsing helpers to parse-utils.c
Some of the parsers in cmds/balance.c are generic enough for the common
part.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-07 13:58:43 +02:00
David Sterba af56460de8 btrfs-progs: split parsing helpers from utils.c
There are various parsing helpers scattered everywhere, unify them to
one file and start with helpers already in utils.c.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-06 17:15:51 +02:00
David Sterba d15b7b0d58 btrfs-progs: parse profile name from the raid table
We can use the raid table to match profile names, additionally make the
test case insensitive. The single profile is not represented as a bit
and must be set manually for now.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-06 17:15:13 +02:00
David Sterba 7572839a74 btrfs-progs: add and use bit masks for RAID1 and RAID56 profiles
Many test conditions can be simplified in case they check all the
related profiles.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-06 16:36:18 +02:00
David Sterba 7fe4396467 btrfs-progs: copy some raid_attr helpers from kernel
There are convenience helpers for the raid attr table, copy them from
kernel for further cleanups.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-06 16:36:17 +02:00
Josef Bacik 79e534def9 btrfs-progs: add the incompat flag for extent tree v2
I will have a lot of preparatory patches to reduce the review pain of
this large feature.  In order to enable that work define the incompat
flag.  Once all of the work lands to support the feature there will be a
patch to actually enable us to select it and manipulate file systems
with that incompat flag set.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-06 16:36:17 +02:00
Josef Bacik 4b6cf2a3eb btrfs-progs: mkfs: generate free space tree at make_btrfs() time
With extent-tree-v2 we won't be able to cache block groups based on the
extent tree, so we need to have a valid free space tree before we open
the temporary file system to finish setting the file system up.  Set up
the basic free space entries for our temporary system chunk if we have
the free space tree enabled and stop generating the tree after the fact.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-06 16:36:17 +02:00
Josef Bacik 826e466028 btrfs-progs: add add_block_group_free_space helper
This exists in the kernel free-space-tree.c but not in progs.  We need
it to generate the free space items for new block groups, which is
needed when we start creating the free space tree in make_btrfs().

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-06 16:36:17 +02:00
Josef Bacik abc2b30443 btrfs-progs: mkfs: add the block group item in make_btrfs()
Currently we build a bare-bones file system in make_btrfs(), and then we
load it up and fill in the rest of the file system after the fact.  One
thing we omit in make_btrfs() is the block group item for the temporary
system chunk we allocate, because we just add it after we've opened the
file system.

However I want to be able to generate the free space tree at
make_btrfs() time, because extent tree v2 will not have an extent tree
that has every block allocated in the system.  In order to do this I
need to make sure that the free space tree entries are added on block
group creation, which is annoying if we have to add this chunk after
I've created a free space tree.

So make future work simpler by simply adding our block group item at
make_btrfs() time, this way I can do the right things with the free
space tree in the generic make block group code without needing a
special case for our temporary system chunk.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-06 16:36:17 +02:00
Josef Bacik 3d870a491f btrfs-progs: make sure track_dirty and ref_cows is set properly
Adding support for the per-block group roots means we will be reading
the roots directly in different places.  Make sure we set ->track_dirty
and ->ref_cows properly in the helper so we don't have to do this
everywhere.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-03 15:33:53 +02:00
Josef Bacik 41c3fa5d0b btrfs-progs: mkfs: add helper for writing empty tree nodes
With extent tree v2 we're going to be writing some more empty trees for
the initial mkfs step, so take this common code and make it a helper.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-03 15:33:30 +02:00
Josef Bacik 9b4207a0f9 btrfs-progs: mkfs: set nritems based on root items written
For the root tree we were just hard setting the nritems to 4, which will
change when we move to extent tree v2.  Instead set the nritems after
we've added all the root items we need to the root tree.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-03 15:32:40 +02:00
Josef Bacik 6ee697fb25 btrfs-progs: mkfs: use blocks_nr to determine the super used bytes
We were setting the super block's used bytes to a static number.
However the number of blocks we have to write has the correct used size,
so just add up the total number of blocks we're allocating as we
determine their offsets.  This value will be used later which is why I'm
calculating it this way instead of doing the math to set the bytes_super
specifically.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-03 15:32:16 +02:00
Josef Bacik 5a164401e9 btrfs-progs: mkfs: get rid of MKFS_SUPER_BLOCK
We use these block's in order to keep track of which blocks need to be
added to the extent tree and where their roots need to be written.
However we skip MKFS_SUPER_BLOCK for all of these helpers, and we don't
actually need to keep track of the specific block we allocated because
it is always BTRFS_SUPER_INFO_OFFSET.  Remove this enum as we don't need
it.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-03 15:30:54 +02:00
Josef Bacik 5e63118215 btrfs-progs: mkfs: use an associative array for init blocks
Allow creating trees more flexible, eg. in no fixed order.  To handle
this we want to rework the initial mkfs step to take an array of the
blocks we want to create and use the array to keep track of which blocks
we need to create. Use that for current format, make it ready for extent
tree v2.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-09-03 15:26:35 +02:00
David Sterba 0534441e2e btrfs-progs: tests: add mkfs test for raid0/1 and raid10/2
Extend basic tests with the degenerate raid0 and raid10, coming in 5.15.
Mount of a freshly created filesystem works even on older kernels too.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-27 15:40:53 +02:00
David Sterba a177ef7dd4 btrfs-progs: mkfs: allow degenerate raid0/raid10
Kernel patch b2f78e88052bc0bee ("btrfs: allow degenerate raid0/raid10")
in
5.15 will allow mounting and converting to single device raid0 or two
device raid10.  Let mkfs create such filesystem.

"The motivation is to allow to preserve the profile type as long as it
 possible for some intermediate state (device removal, conversion), or
 when there are disks of different size, with raid0 the otherwise
 unusable space of the last device will be used too.  Similarly for
 raid10, though the two largest devices would need to be the same."

Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-27 15:40:53 +02:00
Li Zhang b199123b33 btrfs-progs: build: fix detection of ext4 i_{a,c,a}time_extra
Running convert-tests.sh Reported that the 019-ext4-copy-timestamps test
failed:

  ...
  mount -o loop -t ext4 btrfs-progs/tests/test.img btrfs-progs/tests/mnt
  ====== RUN CHECK touch btrfs-progs/tests/mnt/file
  ====== RUN CHECK stat btrfs-progs/tests/mnt/file
  File: 'btrfs-progs/tests/mnt/file'
  Size: 0           Blocks: 0          IO Block: 4096   regular empty file
  Device: 700h/1792d  Inode: 13          Links: 1
  Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
  Context: unconfined_u:object_r:unlabeled_t:s0
  Access: 2021-08-24 22:10:21.999209679 +0800
  Modify: 2021-08-24 22:10:21.999209679 +0800
  Change: 2021-08-24 22:10:21.999209679 +0800
  ...
  btrfs-progs/btrfs-convert btrfs-progs/tests/test.img
  ...
  ====== RUN CHECK mount -t btrfs -o loop btrfs-progs/tests/test.img btrfs-progs/tests/mnt
  ====== RUN CHECK stat btrfs-progs/tests/mnt/file
  File: 'btrfs-progs/tests/mnt/file'
  Size: 0           Blocks: 0          IO Block: 4096   regular empty file
  Device: 2ch/44d Inode: 267         Links: 1
  Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
  Context: unconfined_u:object_r:unlabeled_t:s0
  Access: 2021-08-24 22:10:21.000000000 +0800
  Modify: 2021-08-24 22:10:21.000000000 +0800
  Change: 2021-08-24 22:10:21.000000000 +0800
  ...
  atime on converted inode does not match
  test failed for case 019-ext4-copy-timestamps

Obviously, the log says that btrfs-convert does not support nanoseconds.
I looked at the source code and found that only if ext2_fs.h defines
EXT4_EPOCH_MASK btrfs-convert to support nanoseconds. But in e2fsprogs,
EXT4_EPOCH_MASK was introduced in v1.43, but in some older versions,
such as v1.40, e2fsprogs actually supports nanoseconds. It seems that if
struct ext2_inode_large contains the i_atime_extra member, ext4 is
supports nanoseconds, so I updated the logic to determine whether the
current ext4 file system supports nanosecond precision.  In addition, I
imported some definitions to encode and decode tv_nsec (copied from
e2fsprogs source code).

Author: Li Zhang <zhanglikernel@gmail.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-26 20:34:36 +02:00
Qu Wenruo 325dba6432 btrfs-progs: check: output proper csum values for --check-data-csum
[BUG]
When running "btrfs check --check-data-csum" on fs with corrupted data,
the error message almost makes no sense:

  $ btrfs check --check-data-csum /dev/test/test
  Opening filesystem to check...
  Checking filesystem on /dev/test/test
  UUID: c31afe0a-55bc-4e7d-aba0-9dfa9ddf8090
  [1/7] checking root items
  [2/7] checking extents
  [3/7] checking free space cache
  [4/7] checking fs roots
  [5/7] checking csums against data
  mirror 1 bytenr 13631488 csum 19 expected csum 152 <<<
  ERROR: errors found in csum tree
  [6/7] checking root refs
  [7/7] checking quota groups skipped (not enabled on this FS)
  found 147456 bytes used, error(s) found
  total csum bytes: 16
  total tree bytes: 131072
  total fs tree bytes: 32768
  total extent tree bytes: 16384
  btree space waste bytes: 124799
  file data blocks allocated: 16384
   referenced 16384

[CAUSE]
We're just outputting the first byte and in decimal, which is completely
different from what we did in kernel space, nor what we did for metadata
csum mismatch.

[FIX]
Use btrfs_format_csum() for btrfs-check to output csum.

Now the result looks much better:

  [5/7] checking csums against data
  mirror 1 bytenr 13631488 csum 0x13fec125 expected csum 0x98757625

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-26 14:27:40 +02:00
Qu Wenruo 773afad3e6 btrfs-progs: slightly enhance btrfs_format_csum()
- Change it void
  The old one always return csum_size.

- Use snprintf()

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-26 14:27:01 +02:00
Qu Wenruo 991a598f53 btrfs-progs: move btrfs_format_csum() to common/utils.[ch]
Function btrfs_format_csum() is a special helper only used in
btrfs-progs.

Move it to common/utils.[ch] other than leaving it in
kernel-shared/disk-io.c.

Since we're moving the code, also introduce a macro,
BTRFS_CSUM_STRING_LEN, to replace open-coded string length calculation.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-26 14:26:13 +02:00
Josef Bacik 4d57632e2f btrfs-progs: tests: add image with an invalid super bytes_used
This is used to validate the detection and correction code in both fsck
modes for an invalid bytes_used value in the super block.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:43:13 +02:00
Josef Bacik e64af00bd1 btrfs-progs: check: detect and fix problems with super_bytes_used
We do not detect problems with our bytes_used counter in the super
block.  Thankfully the same method to fix block groups is used to re-set
the value in the super block, so simply add some extra code to validate
the bytes_used field and then piggy back on the repair code for block
groups.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:39:48 +02:00
Josef Bacik 04520da77b btrfs-progs: check: do not infinite loop on corrupt keys with lowmem mode
By enabling the lowmem checks properly I uncovered the case where test
fsck/007 will infinite loop at the detection stage.  This is because
when checking the inode item we will just btrfs_next_item(), and because
we ignore check tree block failures at read time we don't get an -EIO
from btrfs_next_leaf.  Generally what check usually does is validate the
leaves/nodes as we hit them, but in this case we're not doing that.  Fix
this by checking the leaf if we move to the next one and if it fails
bail.  This allows us to pass the fsck/007 test with lowmem.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:39:48 +02:00
Josef Bacik 4a1863d638 btrfs-progs: check: do not double add unaligned extent records
The repair cycle in the main check will drop all of our cache and loop
through again to make sure everything is still good to go.
Unfortunately we record our unaligned extent records on a per-root list
so they can be retrieved when we're checking the fs roots.  This isn't
straightforward to clean up, so instead simply check our current list of
unaligned extent records when we are adding a new one to make sure we're
not duplicating our efforts.  This makes us able to pass fsck/001 with
my super bytes_used fix applied.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:54 +02:00
Josef Bacik 8c3c13bb45 btrfs-progs: check blocks in btrfs_next_sibling_block
By enabling the lowmem checks properly I uncovered the case where test
fsck/007 will infinite loop at the detection stage.  This is because
when checking the inode item we will just btrfs_next_item(), and because
we ignore check tree block failures at read time we don't get an -EIO
from btrfs_next_leaf.

This occurs because we allow fsck to raw-read blocks even if they fail
basic sanity checks, because we want the opportunity to repair the
blocks.  However this means corrupt blocks are sitting in cache marked
as uptodate.  btrfs_search_slot() handles this by doing a check_block()
on every block we add to the path, so that anything that is doing a
search gets a proper -EIO.

btrfs_next_sibling_block() needs a similar check.  With this fix we now
return -EIO on btrfs_next_leaf() properly and we no longer infinite loop
on fsck/007 with lowmem.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:54 +02:00
Josef Bacik 71404c50e8 btrfs-progs: tests: fix running check mode lowmem tests
When I added the invalid super image I saw that the lowmem tests were
passing, despite not having the detection code yet.  Turns out this is
because we weren't using a run command helper which does the proper
expansion and adds the --mode=lowmem option.  Fix this to use the proper
handler, and now the lowmem test fails properly without my patch to add
this support to the lowmem mode.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:54 +02:00
Josef Bacik a4190da45e btrfs-progs: check btrfs_super_used in lowmem check
We can already fix this problem with the block accounting code, we just
need to keep track of how much we should have used on the file system,
and then check it against the bytes_super.  The repair just piggy backs
on the block group used repair.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:54 +02:00
Josef Bacik c3521f8a57 btrfs-progs: propagate extent item errors in lowmem mode
Test 044 was failing with lowmem because it was not bubbling up the
error to the user.  This is because we try to allow repair the
opportunity to clear the error, however if repair isn't set we simply do
not add the temporary error to the main error return variable.  Fix this
by adding the tmp_err to err before moving on to the next item.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:54 +02:00
Josef Bacik 477775946d btrfs-progs: propagate fs root errors in lowmem mode
We have a check that will return an error only if ret < 0, but we return
the lowmem specific errors which are all > 0.  Fix this by simply
checking if (ret).  This allows test 010 to pass with lowmem properly.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:54 +02:00
David Sterba 28df4bbd74 btrfs-progs: corupt-block: leave only long option for --block-group
Long options are always preferred and in case there's a long list of
single letter options it's the best practice to keep the options sane.

Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:54 +02:00
Josef Bacik 72a37dca27 btrfs-progs: tests: add image with a corrupt block group item
This image has a broken used field of a block group item to validate
fsck does the correct thing.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:54 +02:00
Josef Bacik a832d36b59 btrfs-progs: check: detect and fix invalid used for block groups
The lowmem mode validates the used field of the block group item, but
the normal mode does not.  Fix this by keeping a running tally of what
we think the used value for the block group should be, and then if it
mismatches report an error and fix the problem if we have repair set.
We have to keep track of pending extents because we process leaves as we
see them, so it could be much later in the process that we find the
block group item to associate the extents with.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:54 +02:00
Josef Bacik 572a0e888a btrfs-progs: corrupt-block: add ability to corrupt block group items
While doing the extent tree v2 stuff I noticed that fsck doesn't detect
an invalid ->used value on the block group item in the normal mode.  To
build a test case for this I need the ability to corrupt block group
items.  This allows us to corrupt the various fields of a block group.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:54 +02:00
Qu Wenruo 3875c14a5a btrfs-progs: image: fix restored image size misalignment
[BUG]
There is a small device size misalignment between the super block device
size and the device extent size:

  total_bytes             10737418240 	<<<
  bytes_used              15097856
  dev_item.total_bytes    10737418240
  dev_item.bytes_used     1094713344

        item 0 key (DEV_ITEMS DEV_ITEM 1) itemoff 16185 itemsize 98
                devid 1 total_bytes 1095761920 bytes_used 1094713344
				    ^^^^^^^^^^

[CAUSE]
In fixup_device_size(), we only reset superblock device item size, which
will be overwritten in write_dev_supers() using btrfs_device::total_bytes.

And it doesn't touch btrfs_superblock::total_bytes either.

[FIX]
So fix the small mismatch by also resetting btrfs_device::total_bytes,
btrfs_device::bytes_used and btrfs_superblock::total_bytes.

Thankfully since commit 73dd4e3c87 ("btrfs-progs: image: Don't modify
the chunk and device tree if the source dump is single device") single
device dump won't have such problem, but it's still worth for
multi-device dump.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:53 +02:00
Qu Wenruo c4ff83cb5d btrfs-progs: image: reduce memory requirements for decompression
With recent change to enlarge max_pending_size to 256M for data dump,
the decompress code requires quite a lot of memory, up to 256M * 4.

The reason is we're using wrapped uncompress() function call, which
needs the buffer to be large enough to contain the decompressed data.

This patch will re-work the decompress work to use inflate() which can
resume it decompression so that we can use a much smaller buffer size.

Use 512K as buffer size.

Now the memory consumption for restore is reduced to

 cluster data size + 512K * nr_running_threads

Instead of the original one:

 cluster data size + 1G * nr_running_threads

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:53 +02:00
Qu Wenruo 0cc8a7bd4a btrfs-progs: image: introduce -d option to dump data
This new experimental data dump feature will dump the whole image, not
only the existing tree blocks but also all its data extents(*).

This feature will rely on the new dump format (_DUmP_v1), as it needs
extra large extent size limit, and older btrfs-image dump can't handle
such large item/cluster size.

Since we're dumping all extents including data extents, for the restored
image there is no need to use any extra super block flags to inform
kernel.
Kernel should just treat the restored image as any ordinary btrfs.

This new feature will be hidden behind the experimental features, that's
to say, if --enable-experimental is not enabled, although we still have
the option, it will not do anything but output an error message.

*: The data extents will be dumped as is, that's to say, even for
preallocated extent, its (meaningless) data will be read out and
dumpped.
This behavior will cause extra space usage for the image, but we can
skip all the complex partially shared preallocated extent check.

Issue: #394
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:53 +02:00
Qu Wenruo 6b03e0dc40 btrfs-progs: image: introduce framework for more dump versions
The original dump format only contains a magic member to verify the
format, this means if we want to introduce new on-disk format or change
certain size limit, we can only introduce new magic as version.

Introduce the framework to allow multiple magic numbers to co-exist for
further extensions.

Introduce the following members for each dump version.

- max_pending_size
  The threshold size of a cluster. It's not a hard limit but a soft
  one. One cluster can go larger than max_pending_size for one item, but
  next item would go to the next cluster.

- magic_cpu
  The magic number in CPU byte order.

- extra_sb_flags
  If the super block of this restore needs extra super block flags like
  BTRFS_SUPER_FLAG_METADUMP_V2.
  For incoming data dump feature, we don't need any extra super block
  flags.

This change also implies that all image dumps will use the same magic
for all clusters. No mixing is allowed, as we will use the first cluster
to determine the dump version.

Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-08-25 15:38:53 +02:00