# zpool data (re)distribution



## xibo (Mar 16, 2012)

Hello,

The way I "guess" it, a zpool consisting of multiple vdevs acts like a raid-0 of these vdevs. In raid-0, data is split into fixed-size blocks, which are distributed across the disks (here, the vdevs) in round-robin order. So my questions are:


1. Does this apply to zpools, too, or will an entire file (independent of size) get onto one vdev, and then the next file go to the next vdev and so on?
2. If spreading happens by files and not blocks, how are inodes spread?
3. When another vdev is added to a zpool that already contains data, will only future allocations be distributed, or will a redistribution take place when the new devices are added?
4. If a redistribution does not take place, is there a way to force it (other than backup+replay)?

Alonso


----------



## phoenix (Mar 16, 2012)

xibo said:

> The way I "guess" it, a zpool consisting of multiple vdevs acts like a raid-0 of these vdevs.



Similar to a RAID0, yes, but there are differences.



> Does this apply to zpools, too, or will an entire file (independent of size) get onto one vdev, and then the next file go to the next vdev and so on?



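Everything in ZFS is block-based, so data is striped across vdevs block by block, not file by file: a large file will normally have blocks on several vdevs. Within a raidz vdev, each block is further spread across the vdev's disks (a mirror vdev writes a full copy to each disk). The exact placement depends on a lot of variables (block size, number of vdevs, redundancy level of each vdev, how full each vdev is, etc.).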
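
To make the block-versus-file distinction concrete, here is a toy Python model. It is not the real ZFS allocator (which also weighs free space, block size and vdev state); it simply assumes a hypothetical rule where a fixed number of consecutive blocks (`chunk_blocks`) go to one vdev before the allocator rotates to the next. Under that assumption, any file bigger than one chunk already lands on more than one vdev.

```python
from __future__ import annotations

from collections import Counter


def stripe_file(num_blocks: int, num_vdevs: int, chunk_blocks: int = 4) -> list[int]:
    """Return the vdev index chosen for each block of a single file.

    Toy model (assumption, not ZFS's metaslab allocator): write
    `chunk_blocks` consecutive blocks to one vdev, then move on to the
    next vdev, wrapping around at the end.
    """
    if num_vdevs < 1 or chunk_blocks < 1:
        raise ValueError("need at least one vdev and a positive chunk size")
    return [(i // chunk_blocks) % num_vdevs for i in range(num_blocks)]


if __name__ == "__main__":
    # A 20-block file in a pool of 3 vdevs.
    placement = stripe_file(num_blocks=20, num_vdevs=3)
    print(placement)           # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 1, 1, 1, 1]
    print(Counter(placement))  # Counter({0: 8, 1: 8, 2: 4})
```

In this model a three-block file would stay entirely on vdev 0, which is why small files can look like "one file per vdev" even though the unit of placement is the block.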



> When another vdev is added to a zpool that already contains data, will only future allocations be distributed, or will a redistribution take place when the new devices are added?



ZFS does not rebalance existing data. Writes are biased toward "less full" vdevs, so the bulk of new writes will go to the new vdev, but some will still go to the other vdevs in the pool.
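
The effect of that bias can be sketched with another toy model, again an assumption rather than ZFS's actual code: each new block picks a vdev at random with probability proportional to the vdev's free space. The function name `simulate_writes` and the block counts are made up for illustration.

```python
from __future__ import annotations

import random


def simulate_writes(free_blocks: list[int], writes: int, seed: int = 0) -> list[int]:
    """Return how many of `writes` new blocks land on each vdev.

    Toy model (assumption): each block goes to a vdev chosen at random,
    weighted by that vdev's remaining free blocks. Existing data never moves.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    free = list(free_blocks)
    placed = [0] * len(free)
    for _ in range(writes):
        if sum(free) == 0:
            raise RuntimeError("pool is full")
        (vdev,) = rng.choices(range(len(free)), weights=free)
        free[vdev] -= 1
        placed[vdev] += 1
    return placed


if __name__ == "__main__":
    # Two nearly full old vdevs (100 free blocks each) plus a new, empty one (1000 free).
    print(simulate_writes([100, 100, 1000], writes=300))
```

Most of the 300 writes land on the new vdev, but the two old vdevs still receive a share, which matches the behaviour described above.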



> If a redistribution does not take place, is there a way to force it (other than backup+replay)?



No, there is no way to rebalance data in a pool. That requires the "block-pointer rewrite" feature, which will arrive "Real Soon Now(tm)". (It has been in development for over five years now.)

You can "fake it" with zfs send/recv, which rewrites every block so the new copy is allocated across all of the pool's current vdevs, or with a backup/destroy/create/restore cycle.
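
For a single dataset with no children, the send/recv route boils down to four `zfs` commands. The helper below only builds and prints them (a dry run) so they can be reviewed before being run by hand; the helper name, the `@rebalance` snapshot name and the `.new` suffix are hypothetical choices. It assumes the pool has enough free space for a second full copy and that nothing writes to the dataset during the copy, since changes made after the snapshot would be lost.

```python
from __future__ import annotations

import shlex


def rebalance_plan(dataset: str, snap: str = "rebalance") -> list[str]:
    """Build (but do not run) the commands that rewrite `dataset` in place.

    The received copy is written fresh, so its blocks are allocated across
    all vdevs the pool has now. Assumes a single dataset without children.
    """
    if not dataset or "@" in dataset:
        raise ValueError("expected a dataset name such as tank/data")
    src = shlex.quote(dataset)
    tmp = shlex.quote(dataset + ".new")
    snapshot = shlex.quote(f"{dataset}@{snap}")
    return [
        f"zfs snapshot {snapshot}",                  # freeze the current contents
        f"zfs send {snapshot} | zfs receive {tmp}",  # rewrite every block into a new dataset
        f"zfs destroy -r {src}",                     # drop the old copy and its snapshots
        f"zfs rename {tmp} {src}",                   # put the new copy under the old name
    ]


if __name__ == "__main__":
    for cmd in rebalance_plan("tank/data"):
        print(cmd)
```

Destroying the original is irreversible, so verify the received copy before running the last two commands.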

Instead of "guessing" how things work, have a read through the ZFS Admin Guide and the ZFS Best Practices Guide. And if you can get access to a 9-STABLE box (one updated since roughly a month after the 9.0 release), read through the zpool(8) and zfs(8) man pages. They have been updated to reflect FreeBSD technologies and device naming, and unsupported features have been removed for clarity.


----------

