btrfs-progs

Author	SHA1	Message	Date
David Sterba	a177ef7dd4	btrfs-progs: mkfs: allow degenerate raid0/raid10 Kernel patch b2f78e88052bc0bee ("btrfs: allow degenerate raid0/raid10") in 5.15 will allow mounting and converting to single device raid0 or two device raid10. Let mkfs create such filesystem. "The motivation is to allow to preserve the profile type as long as it possible for some intermediate state (device removal, conversion), or when there are disks of different size, with raid0 the otherwise unusable space of the last device will be used too. Similarly for raid10, though the two largest devices would need to be the same." Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-27 15:40:53 +02:00
Qu Wenruo	991a598f53	btrfs-progs: move btrfs_format_csum() to common/utils.[ch] Function btrfs_format_csum() is a special helper only used in btrfs-progs. Move it to common/utils.[ch] rather than leaving it in kernel-shared/disk-io.c. Since we're moving the code, also introduce a macro, BTRFS_CSUM_STRING_LEN, to replace the open-coded string length calculation. Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-26 14:26:13 +02:00
Josef Bacik	8c3c13bb45	btrfs-progs: check blocks in btrfs_next_sibling_block By enabling the lowmem checks properly I uncovered the case where test fsck/007 will infinite loop at the detection stage. This is because when checking the inode item we will just btrfs_next_item(), and because we ignore check tree block failures at read time we don't get an -EIO from btrfs_next_leaf. This occurs because we allow fsck to raw-read blocks even if they fail basic sanity checks, because we want the opportunity to repair the blocks. However this means corrupt blocks are sitting in cache marked as uptodate. btrfs_search_slot() handles this by doing a check_block() on every block we add to the path, so that anything that is doing a search gets a proper -EIO. btrfs_next_sibling_block() needs a similar check. With this fix we now return -EIO on btrfs_next_leaf() properly and we no longer infinite loop on fsck/007 with lowmem. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-25 15:38:54 +02:00
Qu Wenruo	a138daac17	btrfs-progs: mkfs: set super_cache_generation to 0 if we're using free space tree [HICCUP] There is a bug report that mkfs.btrfs -R free-space-tree still makes the kernel try to clean up the v1 space cache: # mkfs.btrfs -R free-space-tree -f /dev/test/scratch1 # mount /dev/test/scratch1 /mnt/btrfs # dmesg | grep cleaning BTRFS info (device dm-6): cleaning free space cache v1 [CAUSE] By default, mkfs.btrfs sets the super cache generation to (u64)-1, which informs the kernel that the v1 space cache is invalid and needs to be regenerated. But for the free space tree, the kernel sets the super cache generation to 0 to indicate that the v1 space cache is not in use. This means that even if we enable the free space tree with all the RO compatible bits and the new tree, as long as the super cache generation is not 0, the kernel still considers the fs to have some invalid v1 space cache and will try to remove it. [FIX] This is not a big deal, but to make "-R free-space-tree" really work as the kernel does, we also need to set the super cache generation to 0. Reported-by: Chris Murphy <lists@colorremedies.com> Link: https://lore.kernel.org/linux-btrfs/CAJCQCtSvgzyOnxtrqQZZirSycEHp+g0eDH5c+Kw9mW=PgxuXmw@mail.gmail.com/ Reviewed-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-08-20 14:24:55 +02:00
David Sterba	6527771668	btrfs-progs: add nparity for raid1c34 definitions The values of .ncopies was not explicitly set. Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-23 00:59:27 +02:00
Qu Wenruo	07ecf878c1	btrfs-progs: check: batch v1 space cache inodes when clearing Currently v1 space cache clearing deletes just one cache inode per transaction, and then starts a new transaction to delete the next inode. This is far from efficient and can make the already slow v1 space cache deletion even slower, as a large fs has tons of cache inodes to delete. This patch speeds up the process by batching up to 16 inode deletions into one transaction. A quick benchmark of deleting 702 v1 space cache inodes looks like this: Unpatched: 4.898s Patched: 0.087s Which is obviously a big win. Reported-by: Joshua <joshua@mailmag.net> Link: https://lore.kernel.org/linux-btrfs/0b4cf70fc883e28c97d893a3b2f81b11@mailmag.net/ Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-22 16:26:05 +02:00
Sidong Yang	94f3b75c00	btrfs-progs: zoned: fix memory leak in btrfs_sb_io() In btrfs_sb_io(), blk_zone_report is used for getting information about zones. But it is not freed if the code takes the usual path. This patch frees the variable right after it is used. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Sidong Yang <realwakka@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-02 17:27:53 +02:00
David Sterba	1dc6f33c28	btrfs-progs: zoned: use fixed width type when reading zone size The ioctl BLKGETZONESZ expects 32bit integer, declare the target variable as such. Signed-off-by: David Sterba <dsterba@suse.com>	2021-07-02 17:27:53 +02:00
David Sterba	b1f374dd1d	btrfs-progs: switch %Lu to %llu format The %Lu format is not standard and we use %llu everywhere else, so switch the remaining cases. Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-19 22:07:49 +02:00
David Sterba	9f6c055e38	btrfs-progs: dump-tree: add options to dump checksums Add new options to dump checksums in node headers and in the checksum items: $ btrfs inspect dump-tree --csum-headers image root tree leaf 471515136 items 19 free space 12186 generation 15 owner ROOT_TREE leaf 471515136 flags 0x1(WRITTEN) backref revision 1 csum 0x756b2d54 fs uuid df0348df-5773-47dd-81e9-a18221461239 For nodes/leaves it's appended on the 2nd line of the header. Checksum items are stored in leaves as EXTENT_CSUM key type, with offset value as the logical offset starting. As the array would be hard to parse or match, each offset value is printed with the checksum. For crc32c it's 4 values on a line, for xxhash it's 2 and for the long 256bit checksums it's one checksum per line. $ btrfs inspect dump-tree --csum-items image leaf 5423104 items 1 free space 30 generation 6 owner CSUM_TREE leaf 5423104 flags 0x1(WRITTEN) backref revision 1 fs uuid bd7c981e-16ff-4081-a734-3ef5d50cafc1 chunk uuid 13f4c76c-7845-4984-88ed-f01b52e05cf8 item 0 key (EXTENT_CSUM EXTENT_CSUM 22020096) itemoff 55 itemsize 16228 range start 22020096 end 38637568 length 16617472 [22020096] 0x8941f998 [22024192] 0x8941f998 [22028288] 0x8941f998 [22032384] 0x8941f998 [22036480] 0x8941f998 [22040576] 0x8941f998 [22044672] 0x8941f998 [22048768] 0x8941f998 ... $ btrfs inspect dump-tree --csum-items image leaf 5718016 items 1 free space 7746 generation 6 owner CSUM_TREE leaf 5718016 flags 0x1(WRITTEN) backref revision 1 fs uuid f453a5b4-8b4a-4fbf-90a2-2925e4fe2335 chunk uuid eb1da63b-248b-44c2-82da-71b2564bf50e item 0 key (EXTENT_CSUM EXTENT_CSUM 52387840) itemoff 7771 itemsize 8512 range start 52387840 end 53477376 length 1089536 [52387840] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f [52391936] 0x686ede9288c391e7e05026e56f2f91bfd879987a040ea98445dabc76f55b8e5f ... The options are not on by default, the header checksum is not important for the structures. Data checksums can be quite big so that would make the dump long and without any actual data to match against. Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-19 22:07:49 +02:00
David Sterba	72d710637c	btrfs-progs: print-tree: convert mode to bitmask Replace follow and traverse by one parameter that takes bits to affect the behaviour. This allows to extend btrfs_print_tree output with more modes from one place. Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-09 20:31:49 +02:00
David Sterba	6134973527	btrfs-progs: zoned: make it work without kernel support There's a report that a system with 4.19 kernel fails boot because device scan exits with error. This is because zoned support is compiled in btrfs-progs but not in kernel. To make new progs and old kernels work, do a fallback when the zoned ioctl is not available, as if it were a non-zoned device. There is no other option, but this is safe at least for the device scan that would not error out. Any unaligned writes to a zoned device will fail as expected. Issue: #376 Signed-off-by: David Sterba <dsterba@suse.com>	2021-06-07 17:38:46 +02:00
Su Yue	80a86f1b47	btrfs-progs: do not BUG_ON if btrfs_add_to_fsid succeeded to write superblock Commit `8ef9313cf2` ("btrfs-progs: zoned: implement log-structured superblock") changed the superblock write to BTRFS_SUPER_INFO_SIZE bytes. Previously, the number of bytes written was sectorsize. This causes mkfs.btrfs to fail on my 16k pagesize kvm: $ /usr/bin/mkfs.btrfs -s 16k -f -mraid0 /dev/vdb2 /dev/vdb3 btrfs-progs v5.12 See http://btrfs.wiki.kernel.org for more information. ERROR: superblock magic doesn't match ERROR: superblock magic doesn't match common/device-scan.c:195: btrfs_add_to_fsid: BUG_ON `ret != sectorsize` triggered, value 1 /usr/bin/mkfs.btrfs(btrfs_add_to_fsid+0x274)[0xaaab4fe8a5fc] /usr/bin/mkfs.btrfs(main+0x1188)[0xaaab4fe4dc8c] /usr/lib/libc.so.6(__libc_start_main+0xe8)[0xffff7223c538] /usr/bin/mkfs.btrfs(+0xc558)[0xaaab4fe4c558] [1] 225842 abort (core dumped) /usr/bin/mkfs.btrfs -s 16k -f -mraid0 /dev/vdb2 /dev/vdb3 btrfs_add_to_fsid() now always calls sbwrite() to write BTRFS_SUPER_INFO_SIZE bytes to the device, so change the condition of the BUG_ON(). Also add comments for sbread() and sbwrite(). Signed-off-by: Su Yue <l@damenly.su> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-12 16:00:14 +02:00
David Sterba	6c53222add	btrfs-progs: delete bogus zero checksum check The check condition (csum_result == 0) does not make sense anymore as it's not the buffer and not the crc32c result as it used to be. The message does not bring any value and looks like it's some debugging aid from the old times (added in 2008 as `bb7055ec21` ("Add some extra debugging around file data checksum failures")). Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-08 00:58:51 +02:00
David Sterba	c19ac510a7	btrfs-progs: move repair.[ch] to common/ Move the file to common as it's used by several parts, while still keeping the name 'repair' although the only thing it does is adding a corrupted extent. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:47 +02:00
David Sterba	b19a603d62	btrfs-progs: remove unnecessary linux/*.h includes Decrease dependency on system headers, remove where they're not needed or became stale after code moved. The path-utils.h encapsulate path operations so include linux/limits.h here, that's where PATH_MAX is defined. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:47 +02:00
David Sterba	aa56bf3a31	btrfs-progs: zoned: replace raw ioctl with a helper for device size Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	c7b5f884e0	btrfs-progs: add prefix to zero_blocks This is a public helper for devices, add the prefix to make it clear. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	2b5d4f2e6f	btrfs-progs: add prefix to discard_blocks This is a helper for devices, make it clear in the function name. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	bc6864967b	btrfs-progs: add prefix to exported queue_param As this is a public helper, add a prefix that makes it clear what is the queue related to. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
David Sterba	38254c4934	btrfs-progs: kerncompat: add const_ilog2 The newly added zoned mode constants can utilize the const ilog2 version. Copy it from kernel include/linux/log2.h. Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	8c2dfa6387	btrfs-progs: zoned: wipe temporary superblocks in superblock log zone mkfs.btrfs uses a temporary superblock during the initialization process. The temporary superblock uses BTRFS_MAGIC_TEMPORARY as its magic which is different from a regular superblock. As a result, libblkid, which only supports the usual magic, cannot recognize the volume as btrfs. So, let's wipe the temporary magic before writing out the usual superblock. Technically, we can add the temporary magic to the libblkid's table. But, it will result in recognizing a half-baked filesystem as btrfs, which is not ideal. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	8bbb0c5744	btrfs-progs: zoned: support zero out on zoned block device If we zero out a region in a sequential write required zone, we cannot write to the region until we reset the zone. Thus, we must prohibit zeroing out to a sequential write required zone. zero_dev_clamped() is modified to take the zone information and it calls zero_zone_blocks() if the device is host managed to avoid writing to sequential write required zones. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	58ec593892	btrfs-progs: zoned: support resetting zoned device All zones of zoned block devices should be reset before writing. Support this by introducing PREP_DEVICE_ZONED. btrfs_reset_all_zones() walks all the zones on a device and resets a zone if it is a sequential write required zone, or discards the zone range otherwise. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:46 +02:00
Naohiro Aota	bfdb3ae237	btrfs-progs: zoned: reset zone of freed block group When freeing a chunk, we can/should reset the underlying device zones for the chunk. Introduce btrfs_reset_chunk_zones() and reset the zones. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	bfd34b7876	btrfs-progs: zoned: redirty clean extent buffers Tree manipulating operations like merging nodes often release once-allocated tree nodes. Btrfs cleans such nodes so that pages in the node are not uselessly written out. On ZONED drives, however, such optimization blocks the following IOs as the cancellation of the write out of the freed blocks breaks the sequential write sequence expected by the device. Check if the next dirty extent buffer is contiguous with the previously written one. If not, redirty the extent buffers between the previous one and the next one, so that all dirty buffers are written sequentially. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	feff533e34	btrfs-progs: zoned: calculate allocation offset for conventional zones Conventional zones do not have a write pointer, so we cannot use it to determine the allocation offset for sequential allocation if a block group contains a conventional zone. But instead, we can consider the end of the highest addressed extent in the block group for the allocation offset. For new block group, we cannot calculate the allocation offset by consulting the extent tree, because it can cause deadlock by taking extent buffer lock after chunk mutex, which is already taken in btrfs_make_block_group(). Since it is a new block group anyways, we can simply set the allocation offset to 0. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	50ae9f62c7	btrfs-progs: zoned: implement sequential extent allocation Implement a sequential extent allocator for zoned filesystems. This allocator only needs to check if there is enough space in the block group after the allocation pointer to satisfy the extent allocation request. Since the allocator is really simple, we implement it directly in find_search_start(). Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	f08410f078	btrfs-progs: zoned: load zone's allocation offset A zoned filesystem must allocate blocks at the zones' write pointer. The device's write pointer position can be mapped to a logical address within a block group. To facilitate this, add an "alloc_offset" to the block group to track the logical addresses of the write pointer. This logical address is populated in btrfs_load_block_group_zone_info() from the write pointers of corresponding zones. For now, zoned filesystems support only the single profile. Supporting non-single profiles with zone append writing is not trivial. For example, in the DUP profile, we send a zone append writing IO to two zones on a device. The device replies with the written LBAs for the IOs. If the offsets of the returned addresses from the beginning of the zone are different, then it results in different logical addresses. We need fine-grained logical to physical mapping to support such separated physical address issue. Since it should require additional metadata type, disable non-single profiles for now. This commit supports the case where all the zones in a block group are sequential. The next patch will handle the case having a conventional zone. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	b031fe84fd	btrfs-progs: zoned: implement zoned chunk allocator Implement a zoned chunk and device extent allocator. One device zone becomes a device extent so that a zone reset affects only this device extent and does not change the state of blocks in the neighbor device extents. To implement the allocator, we need to extend the following functions for a zoned filesystem: - init_alloc_chunk_ctl - dev_extent_search_start - dev_extent_hole_check - decide_stripe_size Here, dev_extent_hole_check() is newly introduced to check the validity of a hole found. init_alloc_chunk_ctl_zoned() is mostly the same as regular one. It always set the stripe_size to the zone size and aligns the parameters to the zone size. dev_extent_search_start() only aligns the start offset to zone boundaries. We don't care about the first 1MB like in regular filesystem because we anyway reserve the first two zones for superblock logging. dev_extent_hole_check_zoned() checks if zones in given hole are either conventional or empty sequential zones. Also, it skips zones reserved for superblock logging. With the change to the hole, the new hole may now contain pending extents. So, in this case, loop again to check that. Finally, decide_stripe_size_zoned() should shrink the number of devices instead of stripe size because we need to honor stripe_size == zone_size. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	8ef9313cf2	btrfs-progs: zoned: implement log-structured superblock Superblock (and its copies) is the only data structure in btrfs which has a fixed location on a device. Since we cannot overwrite in a sequential write required zone, we cannot place superblock in the zone. One easy solution is limiting superblock and copies to be placed only in conventional zones. However, this method has two downsides: one is reduced number of superblock copies. The location of the second copy of superblock is 256GB, which is in a sequential write required zone on typical devices in the market today. So, the number of superblock and copies is limited to be two. Second downside is that we cannot support devices which have no conventional zones at all. To solve these two problems, we employ superblock log writing. It uses two adjacent zones as a circular buffer to write updated superblocks. Once the first zone is filled up, start writing into the second one. Then, when both zones are filled up and before starting to write to the first zone again, reset the first zone. We can determine the position of the latest superblock by reading write pointer information from a device. One corner case is when both zones are full. For this situation, we read out the last superblock of each zone, and compare them to determine which zone is older. The following zones are reserved as the circular buffer on ZONED btrfs. - primary superblock: offset 0B (and the following zone) - first copy: offset 512G (and the following zone) - second copy: offset 4T (4096G, and the following zone) If these reserved zones are conventional, superblock is written fixed at the start of the zone without logging. Currently, superblock reading/writing is done by pread/pwrite. This commit replaces the call sites with sbread/sbwrite to wrap the functions. For zoned btrfs, btrfs_sb_io which is called from sbread/sbwrite reverses the IO position back to a mirror number, maps the mirror number into the superblock logging position, and does the IO. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	49d5ce4d0f	btrfs-progs: zoned: allow zoned filesystems on non-zoned block devices Run a zoned filesystem on non-zoned devices. This is done by "slicing up" the block device into fixed-sized chunks and emulate a conventional zone on each of them. The emulated zone size is determined from the size of device extent. This is mainly aimed at testing of zoned filesystems, i.e. the zoned chunk allocator, on regular block devices. Currently, we always use EMULATED_ZONE_SIZE (256MiB) for the emulated zone size. In the future, this will be customized by mkfs option. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	707f0716e0	btrfs-progs: zoned: disallow mixed-bg in ZONED mode Placing both data and metadata in a block group is impossible in ZONED mode. For data, we can allocate a space for it and write it immediately after the allocation. For metadata, however, we cannot do that, because the logical addresses are recorded in other metadata buffers to build up the trees. As a result, a data buffer can be placed after a metadata buffer, which is not written yet. Writing out the data buffer will break the sequential write rule. Check and disallow MIXED_BG with ZONED mode. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	3c0f83e541	btrfs-progs: zoned: introduce max_zone_append_size The zone append write command has a maximum IO size restriction it accepts. This is because a zone append write command cannot be split, as we ask the device to place the data into a specific target zone and the device responds with the actual written location of the data. Introduce max_zone_append_size to zone_info and fs_info to track the value, so we can limit all I/O to a zoned block device that we want to write using the zone append command to the device's limits. Zone append command is mandatory for zoned btrfs. So, reject a device with max_zone_append_size == 0. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	7e520022ff	btrfs-progs: zoned: check and enable ZONED mode Introduce function btrfs_check_zoned_mode() to check if ZONED flag is enabled on the file system and if the file system consists of zoned devices with equal zone size. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	384840b9c0	btrfs-progs: zoned: get zone information of zoned block devices Get the zone information (number of zones and zone size) from all the devices, if the volume contains a zoned block device. To avoid costly run-time zone report commands to test the device zones type during block allocation, it also records all the zone status (zone type, write pointer position, etc.). Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	242c8328bc	btrfs-progs: zoned: add new ZONED feature flag With the zoned feature enabled, a zoned block device-aware btrfs allocates block groups aligned to the device zones and always written in sequential zones at the zone write pointer position. It also supports "emulated" zoned mode on a non-zoned device. In the emulated mode, btrfs emulates conventional zones by slicing the device into fixed-size zones. We don't support conversion from the ext4 volume with the zoned feature because we can't be sure all the converted block groups are aligned to zone boundaries. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	acdd22ab68	btrfs-progs: provide fs_info from btrfs_device Likewise in the kernel code, provide fs_info access from struct btrfs_device. This will help to unify the code between the kernel and the userland. Since fs_info can be NULL at the time of btrfs_add_to_fsid(), let's use btrfs_open_devices() to set fs_info to the devices. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	cf67267d33	btrfs-progs: rename calc_size to stripe_size alloc_chunk_ctl::calc_size is actually the stripe_size in the kernel side code. Let's rename it to clarify what the "calc" is. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	a968606632	btrfs-progs: simplify arguments of chunk_bytes_by_type() chunk_bytes_by_type() takes type, calc_size, and ctl as arguments. But the first two can be obtained from the ctl. Let's drop these arguments for simplicity. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:45 +02:00
Naohiro Aota	3428487d90	btrfs-progs: drop alloc_chunk_ctl::stripe_len Since commit `b9444efb66` ("btrfs-progs: don't pretend RAID56 has a different stripe length"), alloc_chunk_ctl::stripe_len is always fixed to BTRFS_STRIPE_LEN. Let's replace alloc_chunk_ctl::stripe_len with BTRFS_STRIPE_LEN, like in the kernel code. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:44 +02:00
Naohiro Aota	2d3b31d604	btrfs-progs: use round_down for allocation calcs Several calculations in the chunk allocation process use this pattern. x /= y; x *= y; Replace this pattern with round_down(). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:44 +02:00
Naohiro Aota	907c5fd7a4	btrfs-progs: fix to use half the available space for DUP profile In the DUP profile, we can use only half of the space available in a device extent. Fix the calculation of calc_size for it. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:44 +02:00
Naohiro Aota	13a6cff8b6	btrfs-progs: rewrite btrfs_alloc_data_chunk() using create_chunk() btrfs_alloc_data_chunk() and create_chunk() have the most part in common. Let's rewrite btrfs_alloc_data_chunk() using create_chunk(). There are two differences between btrfs_alloc_data_chunk() and create_chunk(). create_chunk() uses find_next_chunk() to decide the logical address of the chunk, and it uses btrfs_alloc_dev_extent() to decide the physical address of a device extent. On the other hand, btrfs_alloc_data_chunk() uses *start for both logical and physical addresses. To support the btrfs_alloc_data_chunk()'s use case, we use ctl->start and ctl->dev_offset. If these values are set (non-zero), use the specified values as the address. It is safe to use 0 to indicate the value is not set here. Because both lower addresses of logical (0..BTRFS_FIRST_CHUNK_TREE_OBJECT_ID) and physical (0..BTRFS_BLOCK_RESERVED_1M_FOR_SUPER) are reserved. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:44 +02:00
Naohiro Aota	1e344dc8cf	btrfs-progs: factor out create_chunk() Factor out create_chunk() from btrfs_alloc_chunk(). This new function creates a chunk. There are no functional changes. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:44 +02:00
Naohiro Aota	0e3865206e	btrfs-progs: factor out decide_stripe_size() Factor out decide_stripe_size() from btrfs_alloc_chunk(). This new function calculates the actual stripe size to allocate and decides the size of a stripe (ctl->calc_size). This commit has no functional changes. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:44 +02:00
Naohiro Aota	3acac33e8c	btrfs-progs: consolidate parameter initialization of regular allocator Move parameter initialization code for regular allocator to init_alloc_chunk_ctl_policy_regular(). This will help adding another allocator in the future. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:44 +02:00
Naohiro Aota	605ffad6f0	btrfs-progs: convert type of alloc_chunk_ctl::type Convert alloc_chunk_ctl::type to take the original type in btrfs_alloc_chunk(). This will help refactoring in the following commits. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:44 +02:00
Naohiro Aota	ff65b83306	btrfs-progs: refactor find_free_dev_extent_start() Factor out the function dev_extent_search_start() from find_free_dev_extent_start() to decide the starting position of a device extent search. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:44 +02:00
Naohiro Aota	1da9fede64	btrfs-progs: introduce chunk allocation policy Introduce chunk allocation policy for btrfs. This policy controls how chunks and device extents are allocated from devices. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-05-06 16:41:44 +02:00
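
Note on commit 2d3b31d604 ("use round_down for allocation calcs"): the message quotes the open-coded pattern `x /= y; x *= y;` and says it is replaced with round_down(). The sketch below only illustrates why the two forms agree. The helper names round_down_u64(), round_down_open_coded() and round_down_selfcheck() are hypothetical and are not taken from btrfs-progs, whose actual round_down() macro is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the btrfs-progs macro): round x down to the
 * nearest multiple of y. Assumes y is non-zero.
 */
static inline uint64_t round_down_u64(uint64_t x, uint64_t y)
{
	return x - (x % y);
}

/* The open-coded pattern quoted in the commit message: x /= y; x *= y; */
static inline uint64_t round_down_open_coded(uint64_t x, uint64_t y)
{
	x /= y;
	x *= y;
	return x;
}

/* Small self-check showing that both forms give the same result. */
void round_down_selfcheck(void)
{
	/* 1000 / 64 = 15 (integer division), 15 * 64 = 960 */
	assert(round_down_open_coded(1000, 64) == 960);
	assert(round_down_u64(1000, 64) == 960);

	/* Values that are already aligned stay unchanged. */
	assert(round_down_u64(128, 64) == 128);
	assert(round_down_open_coded(128, 64) == 128);
}
```

Both forms rely on unsigned integer division truncating toward zero, so a named helper mainly makes the intent explicit at each call site.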

85 commits