# How to resize a ZFS filesystem



## tovo (Jun 11, 2010)

Hi all,
I'm using a server with 7 disks in hardware RAID 5. I built a ZFS filesystem on it (using gpart and zpool on FreeBSD 8). I added 1 disk to the RAID 5 array, and now I'm looking for a way to extend my ZFS filesystem without destroying all my data.
I didn't find any documentation on it and any help is welcome.

Thanks in advance


----------



## SirDice (Jun 11, 2010)

AFAIK you can't extend an existing RAIDZ volume. You can add another RAIDZ volume to an existing one but that would require at least 3 disks.


----------



## tovo (Jun 11, 2010)

Actually, I don't use RAIDZ.
Here are the commands I used to build my ZFS :

```
gpart create -s GPT aacd0
gpart add -t freebsd-zfs aacd0
zpool create data /dev/aacd0p1
zpool add -f data aacd0p1
zfs set mountpoint=/home/data data
```

Maybe there's something wrong with these commands?


----------



## SirDice (Jun 11, 2010)

Nothing wrong with the commands. Do I understand you have a hardware RAID 5 controller?

You usually cannot add 1 disk to an existing RAID 5 volume. This will depend on your controller though. It may have this feature.


----------



## tovo (Jun 11, 2010)

Indeed, I have a hardware controller (Adaptec 5805), and it allows warm (hot?) expansion of a RAID 5 array.


----------



## phoenix (Jun 11, 2010)

Does the array appear larger in dmesg output?  If so, then all you need to do is export the pool and import the pool.  After that, the extra space will appear in the pool.
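
A minimal sketch of that export/import cycle, assuming the pool is named `data` as in the commands posted earlier and that the pool sits directly on the device that grew (a pool on a GPT partition would also need the partition enlarged first). The `run` helper and the `DRY_RUN` switch are illustrative additions, not part of ZFS; by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only. "data" is the pool name used earlier in this thread.
# DRY_RUN=1 (the default) prints each command instead of running it.
POOL="${POOL:-data}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reimport_pool() {
    run zpool list "$POOL"     # note the SIZE column before
    run zpool export "$POOL"   # unmounts the pool's filesystems
    run zpool import "$POOL"   # device sizes are re-read on import
    run zpool list "$POOL"     # compare SIZE with the first listing
}

reimport_pool
```

Running it with `DRY_RUN=0` as root would execute the four commands for real; anything using the pool's filesystems has to be stopped first, since the export unmounts them.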

However, you really shouldn't use ZFS on top of a hardware RAID array.  You lose half the features of ZFS by doing so.


----------



## tovo (Jun 14, 2010)

Unfortunately, the array does not appear larger in the dmesg output. Maybe that means the problem is at a lower level. When I run the Adaptec utility:

```
/opt/StorMan/arcconf getconfig 1 LD
```
I get:

```
<snip>
Logical device number 0
   Logical device name                      : eph-dat
   RAID level                               : 5
   Status of logical device                 : Logical device Reconfiguring
   Size                                     : 13332470 MB
   Stripe-unit size                         : 256 KB
<snip>
```
But

```
dmesg | grep MB
```
gives:

```
aacd0: 11427830MB (23404195840 sectors)
```
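
A quick arithmetic check of the two figures above, as a sketch. It assumes 512-byte sectors, which the dmesg line does not state; the variable names are only for illustration.

```shell
#!/bin/sh
# Figures copied from the arcconf and dmesg output above.
sectors=23404195840
controller_mb=13332470
sector_bytes=512   # assumption: 512-byte sectors

# Convert the sector count to MB (1 MB = 1048576 bytes here).
os_mb=$(( sectors * sector_bytes / 1048576 ))
missing_mb=$(( controller_mb - os_mb ))

echo "size seen by FreeBSD: ${os_mb} MB"      # 11427830 MB
echo "not yet visible:      ${missing_mb} MB" # 1904640 MB
```

Under that assumption the sector count matches the dmesg size exactly, and the gap of 1904640 MB is roughly one data disk's worth (11427830 MB spread over the 6 data disks of a 7-disk RAID 5 is about 1904638 MB each). So the controller reports the larger size while the `aacd0` device still shows the old one.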


----------



## tovo (Jun 14, 2010)

Well,
Finally, I decided to make a backup and erase all the data in order to rebuild the partition and the pool.
Thank you all for trying to help me.
See you soon


----------



## phoenix (Jun 14, 2010)

If you are going to rebuild the array anyway, consider not using a RAID5 array.  Instead, put the controller into "single disk" mode or JBOD mode.  If the controller doesn't support those, then create a bunch of 1-disk RAID0 arrays.

Then create the pool using the individual disks, and let ZFS manage it all.

For example, to create an 8-disk raidz2 pool:
`# zpool create poolname raidz2 /dev/aacd0 /dev/aacd1 /dev/aacd2 /dev/aacd3 /dev/aacd4 /dev/aacd5 /dev/aacd6 /dev/aacd7`


----------



## danbi (Jun 16, 2010)

Some RAID controllers take eons to reconfigure a RAID array, especially when increasing its size -- they need to re-replicate every single block. With current large drives this is a useless feature.

Follow the advice phoenix gave. He only forgot to remind you that you had better glabel the drives.

On-demand growth is the primary reason why I prefer to use mirror vdevs on ZFS. If you need more drives, you can add these in pairs.

If you forget about your controller's 'hardware RAID' you may increase the capacity in ZFS by replacing your existing drives with larger capacity drives. Here again, it is best to use mirror vdevs, as you need to replace only two drives in order to see new capacity. If you have 5 drive raidz (RAID-5) then you need to replace all 5 in order to see more capacity.


----------



## Matty (Jun 16, 2010)

danbi said:

> If you forget about your controller's 'hardware RAID' you may increase the capacity in ZFS by replacing your existing drives with larger capacity drives. Here again, it is best to use mirror vdevs, as you need to replace only two drives in order to see new capacity. If you have 5 drive raidz (RAID-5) then you need to replace all 5 in order to see more capacity.



I thought you need zfsv16 for this to work.


----------



## tovo (Jun 16, 2010)

phoenix said:

> If you are going to rebuild the array anyway, consider not using a RAID5 array.  Instead, put the controller into "single disk" mode or JBOD mode.  If the controller doesn't support those, then create a bunch of 1-disk RAID0 arrays.
> 
> Then create the pool using the individual disks, and let ZFS manage it all.


Really? And what about performance? I always thought hardware RAID was always better because of performance (read/write speed, data integrity) and CPU usage.
My controller supports JBOD mode, but how would I implement redundancy that way?


----------



## phoenix (Jun 16, 2010)

Nope, it's been available in ZFS since before ZFSv6 (FreeBSD 7.0).

In the ZFSv6 days, you needed to export the pool and then import the pool in order for the new space to become available.

In later ZFS versions (don't know the exact version off-hand), there's an "autoexpand" property that can be set on the pool.  If set, the extra space becomes available as soon as all devices in a vdev are replaced.  If not set, you need to export/import.
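
A rough sketch of that replace-and-expand sequence on a ZFS version that has the `autoexpand` property. The `replace_plan` function is a hypothetical helper that only prints the commands for review, and the pool and device names (`mypool`, `da0`, `da4`, ...) are placeholders.

```shell
#!/bin/sh
# Prints a command plan for review; it does not touch any pool.
# Pool and device names passed in are placeholders.
replace_plan() {
    pool=$1
    shift
    echo "zpool set autoexpand=on $pool"
    # Remaining arguments are old/new device pairs.
    while [ "$#" -ge 2 ]; do
        echo "zpool replace $pool $1 $2"
        echo "zpool status $pool    # wait until the resilver completes"
        shift 2
    done
    echo "zpool list $pool"
}

# Hypothetical two-disk mirror: da0 -> da4, da1 -> da5
replace_plan mypool da0 da4 da1 da5
```

Each replacement should finish resilvering before the next one starts. On versions without `autoexpand`, the first line would be dropped and an export/import added at the end instead.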

I've replaced 2 vdevs worth of drives so far this year.  1 on FreeBSD 7.x with ZFSv13.  1 on FreeBSD 8-STABLE with ZFSv14.  Works quite nicely.


----------



## phoenix (Jun 16, 2010)

tovo said:

> Really? And what about performance? I always thought hardware RAID was always better because of performance (read/write speed, data integrity) and CPU usage.



Depends on the RAID controller.  Some high-end controllers from Areca, LSI/3Ware (PCIe 8x+) are super-fast and may be faster than software RAID.  However, if you have lots of CPU and RAM, software RAID may be faster.  Depends on the workload.

In this day of multi-GHz, multi-core CPUs, you don't need high-end, specialised controllers, if using software like ZFS, gmirror, graid3/graid5.



> My controller supports JBOD mode, but how would I implement redundancy that way?



Via ZFS:  `# zpool create mypool raidz2 da0 da1 da2 da3 da4 da5 raidz2 da6 da7 da8 da9 da10 da11`

That creates a ZFS pool named "mypool", which consists of two raidz2 (RAID6) vdevs, each with 6 drives.  This is, essentially, a "RAID60" array, as ZFS will stripe reads/writes across all the vdevs in the pool (essentially a RAID0 stripe).


----------



## tovo (Jun 17, 2010)

Well,
I think I didn't express myself very well, but I understand what you mean. What I'm going to do is set up JBOD and then build a raidz2 pool with all 8 disks. I hope it will be a good compromise between safety and space loss.
Thanks a lot


----------



## chappjc (Jun 30, 2010)

phoenix said:

> Depends on the RAID controller.  Some high-end controllers from Areca, LSI/3Ware (PCIe 8x+) are super-fast and may be faster than software RAID.  However, if you have lots of CPU and RAM, software RAID may be faster.  Depends on the workload.
> 
> In this day of multi-GHz, multi-core CPUs, you don't need high-end, specialised controllers, if using software like ZFS, gmirror, graid3/graid5.
> 
> ...



Thanks to phoenix for this info.  I too just assembled a hardware RAID (RAID6) and was curious how the filesystem would expand when I add drives in the future.  Since I am using an Areca ARC-1680ML, which has an Intel IOP348 XOR engine, I believe it makes sense to keep using the hardware RAID.  ZFS just comes in handy for a huge (>10TB) file system.  

If I stick with the hardware RAID, is the zfs+zpool expansion accomplished by simply a zpool export/import or a reboot since I am using the disk without a partition (my zpool comprises just da0)?  I'm afraid the autoexpand option isn't mentioned in the zfs or zpool man pages and does not appear in zfs get.

Possibly it is still better to go with raidz2, even with a good RAID card?

Also, regarding the raidz2 command above, it seems that a stripe of two raidz2's will gobble up 4 drives' worth of capacity for parity.  Just the price for performance?


----------



## phoenix (Jun 30, 2010)

Yes, if you expand the size of the hardware RAID array, the extra space will be added to the pool after an export/import.

RAID arrays should be shallow (small number of drives) to get the best performance.  And then you add them all together into a RAID0 stripe.

IOW, a single RAID6 array using 12 drives will be slower than 2 RAID6 arrays using 6 drives each in a RAID0 stripeset.  Yes, you lose a bit more raw storage space ... but you gain a lot more redundancy (can lose 4 drives instead of just 2 without losing data) and a lot more raw throughput.
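
To put numbers on that trade-off, here is a small sketch that counts whole drives only (it ignores metadata overhead and assumes equal-sized drives). The `layout` function is a hypothetical helper written for this comparison.

```shell
#!/bin/sh
# layout VDEVS WIDTH PARITY: count data and parity drives for a pool made
# of VDEVS equal groups, each WIDTH drives wide with PARITY parity drives.
layout() {
    vdevs=$1; width=$2; parity=$3
    data=$(( vdevs * (width - parity) ))
    lost=$(( vdevs * parity ))
    echo "${vdevs} x ${width}-drive group(s): ${data} data, ${lost} parity"
}

layout 1 12 2   # one 12-drive RAID6/raidz2
layout 2 6 2    # two 6-drive RAID6/raidz2 groups, striped
layout 1 8 2    # the single 8-drive raidz2 planned earlier in the thread
```

This prints 10 data drives for the single 12-drive group against 8 for the two 6-drive groups. The "4 drives" figure is the best case: each group survives any 2 failures, so 4 failures are survivable only when they fall 2 per group.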

Putting ZFS on top of a single device (hardware RAID array) causes you to miss out on close to half the features of ZFS as it can only detect errors, it cannot fix them.


----------



## chappjc (Jun 30, 2010)

phoenix said:

> IOW, a single RAID6 array using 12 drives will be slower than 2 RAID6 arrays using 6 drives each in a RAID0 stripeset.  Yes, you lose a bit more raw storage space ... but you gain a lot more redundancy (can lose 4 drives instead of just 2 without losing data) and a lot more raw throughput.



I see -- the benefits are twofold.  Still, I'm not sure I want to spare 4 drives for parity, especially since my daily access will be bottlenecked by gigabit ethernet.



			
phoenix said:

> Putting ZFS on top of a single device (hardware RAID array) causes you to miss out on close to half the features of ZFS as it can only detect errors, it cannot fix them.



I'm still a little foggy about what those features add over a RAID card, which can also rebuild when a replacement drive is added.  Although, I must say, Areca's CLI utility is pretty crummy, so I would not miss that!


----------



## phoenix (Jul 1, 2010)

If a file is corrupted on a hardware RAID array, then that data is lost permanently.  Hardware RAID is only redundant at the drive level; ZFS is redundant at the data block level.

If a file is corrupted on a ZFS filesystem using redundant vdevs, ZFS will read that data from a redundant block (so the read succeeds), and it will write the correct data over the bad data (so future reads will also succeed).  With only a single device underneath ZFS, though, that can't happen (there's no redundant data for ZFS to read).
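
The idea can be illustrated with a toy model in plain shell. This is not how ZFS is implemented: two ordinary files stand in for the two sides of a mirror, `cksum` stands in for the block checksum, and the file names and the `heal` function are made up for the example.

```shell
#!/bin/sh
# Toy model only: two files stand in for the two sides of a mirror.
# Real ZFS does this per block, inside the pool, with its own checksums.
dir=$(mktemp -d)
printf 'important data\n' > "$dir/copy_a"
cp "$dir/copy_a" "$dir/copy_b"
good_sum=$(cksum < "$dir/copy_a")          # checksum recorded at write time

printf 'imp0rtant data\n' > "$dir/copy_a"  # simulate silent corruption

# Read path: verify the checksum, fall back to the other copy, then repair.
heal() {
    if [ "$(cksum < "$dir/copy_a")" != "$good_sum" ]; then
        if [ "$(cksum < "$dir/copy_b")" = "$good_sum" ]; then
            cp "$dir/copy_b" "$dir/copy_a"
            echo "copy_a repaired from copy_b"
        else
            echo "no good copy left: error detected but not fixable"
        fi
    else
        echo "copy_a ok"
    fi
}

heal
```

With both copies present the bad one is rewritten from the good one. Without `copy_b`, the checksum mismatch would still be noticed, but there would be nothing to repair from, which is the single-device situation described above.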

That's the biggest advantage to having ZFS handle the redundancy.


----------

