# zfs, raidz and total number of vdevs?



## wonslung (Jun 9, 2009)

Hello, i'm looking to upgrade my home NAS. My current one is about a year or two old... it's running linux 2.6 and using mdadm software raid. While looking to upgrade i've been drawn to ZFS, and now that i've been using ZFS in freebsd, that's the route i want to go.


My question is this: is there really a problem with using more than 9 drives in a raidz setup? And/or is there a way to separate them into smaller groups but still have them all in the same pool?

I was looking at going with 12 1 TB hard drives in raidz, 11 + 1 spare. The case i have is upgradeable to 20 drives, leaving me room for 8 more which i eventually planned to use. If i did smaller raidz setups like 9 + 1 spare x 2, they would be in separate pools. Would it be possible to somehow POOL those into 1 large pool, or is that just a terrible idea?


or can i run 19+1 or 18+2 without much trouble?


----------



## graudeejs (Jun 9, 2009)

You may add as many raidz vdevs/disks to a pool as you want.


check out this one:
http://forums.freebsd.org/showthread.php?t=3689


From what i can recall, even Sun recommends using at most about 8 disks per raidz; that gives maximum I/O bandwidth.


----------



## phoenix (Jun 9, 2009)

wonslung said:

> My question is this: is there really a problem with using more than 9 drives in a raidz setup? And/or is there a way to separate them into smaller groups but still have them all in the same pool?



Yes!!  Most definitely there is!!  See the post killasmurf86 linked to for the details on our backup servers, which use 24 drives in a single pool.

The way raidz works, you get the IOps (I/O operations per second) of a single drive for each raidz vdev.  Also, when resilvering (rebuilding the array after replacing a drive) ZFS has to touch every drive in the raidz vdev.  If there are more than 8 or 9 drives, this process will thrash the drives and can take several days to complete (if it ever does).

We made the mistake of building a storage server with a single 24-drive raidz2 vdev.  It gave us over 10 TB of storage (400 GB drives), but was extremely slow for writes.  Then a drive died.  We spent over a week trying to get it to resilver.  It was horrible.

Then I started reading some of the Sun blogs about ZFS, and came across all their info on IOps and recommendations for their 48-drive Thumper servers (multiple mirror and raidz vdevs in a single pool).  The consensus is "Don't use more than 8-9 drives in any single raidz vdev".



> I was looking at going with 12 1 TB hard drives in raidz, 11 + 1 spare. The case i have is upgradeable to 20 drives, leaving me room for 8 more which i eventually planned to use. If i did smaller raidz setups like 9 + 1 spare x 2, they would be in separate pools. Would it be possible to somehow POOL those into 1 large pool, or is that just a terrible idea?



You should only have (need) 1 pool per server.  That's the point of pooled storage.  You just keep adding vdevs into the pool as time goes by.

For 12 drives, I'd go with 2x 6-drive raidz2 vdevs.  That will give you 8 TB of disk space (using 1 TB drives), and will allow you to lose up to 4 drives (2 from each vdev) before losing data.

Later, when you add the extra 8 drives, you can set them up as a 6-drive raidz2 vdev, and 2 spares (or, a spare and a cache/log device, if using ZFSv13 in FreeBSD 7-STABLE or 8-CURRENT).

The zpool commands would look something like:

```
# zpool create mypool raidz2 da0 da1 da2 da3 da4 da5
# zpool add mypool raidz2 da6 da7 da8 da9 da10 da11
# zpool add mypool raidz2 da12 da13 da14 da15 da16 da17
# zpool add mypool spare da18
# zpool add mypool log da19
```

(That last line might be cache instead of log, can't remember off-hand, as I haven't used ZFSv13 yet, and am going by memory of a blog post.)



> or can i run 19+1 or 18+2 without much trouble?



Definitely do NOT do that.


----------



## wonslung (Jun 9, 2009)

oh wow, i didn't realize it worked like this.
i guess i'm still thinking about things in the traditional raid way.

I was thinking that each group of drives had to be in its own pool.

you're saying i can make 3 raidz pools and then pool THOSE together into one big pool.

this is awesome.

so i'm going to have 12 drives at first....can i add new drives to the vdevs later if i decide?

let's say i do decide to make 2 groups of 5 drives, 1 spare, and one log device (i'm still kind of fuzzy on how/why i would do that; i'll have to look it up, but i'm just going by your example)

later, if i wanted to add more drives, could i add one to each group or would i have to add a completely new third group?



```
# zpool create mypool raidz da0 da1 da2 da3 da4
# zpool add mypool raidz da5 da6 da7 da8 da9 
# zpool add mypool spare da10
# zpool add mypool log da11
```

later, if i wanted to add more drives, i'd pretty much only be able to add them as groups of 5?


----------



## phoenix (Jun 9, 2009)

wonslung said:

> oh wow, i didn't realize it worked like this.
> i guess i'm still thinking about things in the traditional raid way.
> 
> I was thinking that each group of drives had to be in it's own pool.
> ...



3 raidz *vdevs*, added into a single *pool*.

ZFS is organised like this:

```
(drive) (drive) (drive)   (drive) (drive) (drive)    (drive)         (drive)
   \       |       /         \       |       /          \               /
    -(raidz vdev)--           -(raidz vdev)--            -(mirror vdev)-
           \                         |                           /
            \                        |                          /
             ----------------------(pool)-----------------------
              /     /      |       |        |      |     \     \
             /      |      |       |        |      |      |     \
          (fs)     (fs)   (fs)   (fs)     (fs)    (fs)  (fs)    (fs)
```

The pool is comprised of *vdevs* (virtual devices).  A vdev can be a single file, a single slice, a single drive, a mirrored set of drives, or a raidz set of drives.  You can add as many vdevs to a pool as needed.  (Obviously, using files or slices for vdevs is not recommended for production use, but can be useful for testing and playing.)

Note:  adding a non-redundant vdev to the pool can compromise the integrity of the pool, as losing the non-redundant vdev will cause data-loss to the pool (possibly the loss of the entire pool).

Just to clarify some terminology.  



> so i'm going to have 12 drives at first....can i add new drives to the vdevs later if i decide?



No. You cannot *extend* a raidz vdev (i.e. change a 6-drive raidz vdev into an 8-drive one).

However, you can replace the drives in a raidz vdev with larger drives, to *expand* the total size of a raidz vdev.  You have to replace each drive individually.  Once all the drives in the raidz vdev have been replaced, you export the pool, and import the pool, and all the extra space becomes available.
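As a rough sketch of that replace-and-grow cycle (the pool and device names here are made up for illustration):

```
# Swap each drive in the raidz vdev for a larger one, one at a
# time; ZFS resilvers onto the replacement before you move on.
zpool replace mypool da0 da20
zpool status mypool            # wait for the resilver to finish
# ...repeat for the remaining drives in the vdev (da1..da5)...

# Once every member drive is the larger size, cycle the pool so
# the extra capacity becomes visible:
zpool export mypool
zpool import mypool
```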

And you can always add more raidz (or mirror) vdevs to a pool.



> let's say i do decide to made 2 groups of 5 drives, 1 spare and one log device (i'm still kind of fuzzy on how/why i would do that, i'll have to look it up but i'm just going by your example)



With later versions of ZFS, you can move the ZIL (ZFS Intent Log, the journal) to a separate drive, which spreads the I/O around a bit better, and improves performance.  It's not available in ZFSv6, which is what's available in FreeBSD 7.0-7.2.  And add drives as cache drives to speed up certain operations (mainly reads), which is really useful if you have an SSD that can sit between the slow harddrives and the fast RAM.  (Advanced topics, perhaps.)



> later, if i wanted to add more drives could i add one to each group or would i have to add a completely new third group?



See above.  You have to add the drives as another vdev.



> later if i wanted to add more drives i'd pretty much only be able to add them as groups of 5?



No, the vdevs don't have to be symmetrical.  You can create a pool with a mirrored vdev, a 5-drive raidz1 vdev, a 6-drive raidz2 vdev, a single-drive vdev, and so on.  ZFS will then create, in essence, a RAID0 stripe across all the vdevs.


----------



## wonslung (Jun 9, 2009)

yah, one question though,  is it SAFE to have a single drive as the log device or should it be mirrored?

also, how big does the log device need to be?

thanks again

and yah, i understand the idea of a vdev now...for some reason i was thinking it meant a drive or partition....now i see it CAN be a drive or partition or a raidz group or a mirror....it's basically just anything that you use to make the pool, and the pool is pretty much a single drive or a raid0 group of vdevs....is that right?

also, originally i was planning on having the os on its own mirrored pool of 2 smaller drives, but if i understand this correctly it's almost a waste of space to do that.  I would get much more out of spending that money on extra big drives for the pool, and the added i/o of a new vdev, than having it separate.

is this correct?

also, i can have a single spare and have it belong to the whole pool, right? or do i need a spare for each vdev?


----------



## phoenix (Jun 9, 2009)

wonslung said:

> yah, one question though,  is it SAFE to have a single drive as the log device or should it be mirrored?



It's fine on a single drive.  If ZFS notices any issues with it, it switches back automatically to using the pool for the ZIL.  There are a couple of blogs on the sun site that cover it in more detail.



> also, how big does the log device need to be?



Not very big at all.  I don't recall the specifics, but the ZIL is written out to disk fairly often, so the pending data is never that big.



> and yah, i understand the idea of a vdev now...for some reason i was thinking it meant a drive or partition....now i see it CAN be a drive or partition or a raidz group or a mirror....it's basically just anything that you use to make the pool



Correct.



> , and the pool is pretty much a single drive or a raid0 group of vdevs....that's right?



A pool is a collection of vdevs.



> also originally i was planning on having the os on it's own mirrored pool of 2 smaller drives but if i understand this correctly it's almost a waste of space to do that.



What we've been doing, and it's working very well, is to grab a couple of 2 or 4 GB CompactFlash cards, pop them into either a CF-to-IDE or CF-to-SATA adapter, and use those for / and /usr.  Create a single large slice and partition on one, install to it, then create a gmirror(8) array and add the other CF disk to make a software RAID 1.

Then create a storage pool out of all the real harddrives in the system.  And finally, create ZFS filesystems for /var, /home, /usr/local, /usr/ports, /usr/src, /usr/obj, and /tmp.

That way, only the base OS is on the CF, and everything else is on ZFS.
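A minimal sketch of that layout, assuming the CF disks show up as ad4/ad6 and the data drives as da0-da5 (all device and pool names hypothetical); the mirror part follows the standard gmirror(8) procedure rather than anything specific to these servers:

```
# Mirror the two CF disks for the base OS:
gmirror label -v -b round-robin gm0 /dev/ad4
echo 'geom_mirror_load="YES"' >> /boot/loader.conf
# (reboot onto /dev/mirror/gm0, then add the second CF disk)
gmirror insert gm0 /dev/ad6

# Everything else lives in a ZFS pool on the real drives:
zpool create tank raidz2 da0 da1 da2 da3 da4 da5
zfs create -o mountpoint=/var       tank/var
zfs create -o mountpoint=/home      tank/home
zfs create -o mountpoint=/usr/ports tank/ports
```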

Alternatively, you can use USB flash drives, although I had a bad USB stick in one server that corrupted a bunch of files in the gmirror array, requiring several days of careful surgery to get things working with new drives.



> also, i can have a single spare and it belong to the whole pool right? or do i need a spare for each vdev?



Spares are assigned to the pool, and are available for use by any vdev that needs it.  So, yes, 1 spare for the whole pool would be doable.


----------



## wonslung (Jun 9, 2009)

thanks. so if i understand it correctly, a small, fast device would be better for the log device...something like an ssd?

so for the compactflash, just to be clear
do i need 2 or 4?

2 for / and 2 for /usr?

also, can you add the log device later, or does it have to be added when you make the original pool?

what about using a usb thumb drive as a log device? would that be a bad idea? from what i gather it just needs to be as big as half your ram.


----------



## phoenix (Jun 9, 2009)

wonslung said:

> thanks, so if i understand it correctly, a small fast device would be better for the log device...something like a ssd?



Correct.



> so for the compactflash, just to be clear
> do i need 2 or 4?



FreeBSD will install into less than 1 GB.  2 GB is plenty.  I believe the 4 GB and larger CF disks support DMA, though.  We're currently using 2 GB disks in one server, and 4 GB disks in the other.  Use whatever is least expensive.  



> also, can you add the log device later or does it have to be added when you make the original pool



Can be added at any time.


----------



## wonslung (Jun 9, 2009)

thanks for all this great information.

I think i'm going to take your advice more or less and install to cf.
thanks again.


i remember now where i saw the thing about the log device needing a mirror
it was in this zfs best practices wiki
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

it also says what you said about it reverting to the pool on failure so it's kind of confusing.


----------



## phoenix (Jun 10, 2009)

Ah, it seems that in certain versions of ZFS, the failure of a separate log vdev is treated like the failure of a "root" vdev (whatever that is), and can render a pool unusable.  Hence the note to create mirrored log vdevs.

There's a fix available that is incorporated into later versions of ZFS that allows a system to continue on automatically if a single, separate log vdev becomes unusable.

There's no (easy) indication of which versions of ZFS this applies to, though.


----------



## wonslung (Jun 10, 2009)

that's kinda scary =)

do you find booting from cf to be as fast as or faster than traditional hard drives? i'm mainly using this as a media server, so everything ZFS brings to the table is very appealing. I'm also wondering about the compression settings and which one i should use.


----------



## phoenix (Jun 10, 2009)

Booting off CF using a SATA adapter is very speedy.  The longest part of the boot, on these servers, is the POST, RAID controller initialisation, and 10-second count-down at the loader prompt.    The rest scrolls by too fast for me to read anything.

Booting off CF using an IDE adapter is still speedy, but I can read bits and pieces of the kernel/bootup messages.

Booting off a USB stick is slow, although these are 2 GB consumer (i.e. < $10) sticks.

Except for the USB boot, it's as fast (if not faster) than booting off a normal SATA drive.

We use gzip-9 compression for our backups filesystem.  The CPU usage (monitored via SNMP every 2 minutes) never goes above 20% (or 5% per CPU as there are 4) during the backup runs.

We also use lzjb compression on /usr/ports and /usr/src.  Additional CPU usage is barely noticeable during cvsup, portsnap, or buildworld.

So long as you have over 1 GHz of CPU and 2 GB of RAM, using ZFS doesn't really add any load to the system.


----------



## wonslung (Jun 10, 2009)

i'm upgrading my e7300 dual core to a q9550 quad core (intel)
going from 4gb ddr2 800 to 8 gb ddr2 800
decided to order 2 cf-->sata2 adapters and 2 233x 8 GB compact flash cards for the boot device (thanks for that)
ordered the smallest sata drive i could find for the log device, might upgrade it to a ssd when i can afford it.

already have 6 1tb sata drives in mdadm linux raid which i'll be using and ordered 6 more sata drives, and found a great 4u 20 hot swap drive case on newegg.

which brings me to my next question. To migrate the data to the new server, what i'm going to have to do is copy it to single 1 TB drives, then put the server together starting with just 1 raidz vdev, then copy it from the single drives to the pool, then add the second vdev to the pool.  In traditional raid it would take a long time; i understand zfs will let me do it right away, but it will have all the data only on the first vdev.

is there a way to force it to split it across both vdevs after i add the second group of drives?


----------



## wonslung (Jun 10, 2009)

> No, the vdevs don't have to be symmetrical.  You can create a pool with a mirrored vdev, a 5-drive raidz1 vdev, a 6-drive raidz2 vdev, a single-drive vdev, and so on.  ZFS will then create, in essence, a RAID0 stripe across all the vdevs.



I've read on a couple of wikis that you have to use like vdevs.  Did this change in 7.2?


from the zfs wikipedia entry:
You cannot mix vdev types in a zpool. For example, if you had a striped ZFS pool consisting of disks on a SAN, you cannot add the local disks as a mirrored vdev.
http://en.wikipedia.org/wiki/ZFS


----------



## phoenix (Jun 10, 2009)

wonslung said:

> I've read on a couple wiki's that you have to use like vdevs.  Did this change in 7.2?
> 
> from the zfs wikipedia entry:
> # You cannot mix vdev types in a zpool. For example, if you had a striped ZFS pool consisting of disks on a SAN, you cannot add the local-disks as a mirrored vdev.
> http://en.wikipedia.org/wiki/ZFS



I'm pretty sure that's talking about mixing storage technologies on vdevs (iSCSI vs local), and not about mixing vdev "types" (mirror, raidz1, raidz2).  I'll have to look up all the blogs about configuring the Thumper storage servers (48-drive behemoths), as I recall they all recommended using a mirrored vdev (OS), a mirrored log/cache vdev, and raidz2 vdevs for bulk storage.

Maybe I'll test this out tomorrow, as we have a spare 16-drive server sitting on the work bench.


----------



## wonslung (Jun 10, 2009)

i was just curious =)
thanks for all of your wonderful help though

another thing i meant to ask you, but forgot.

for swap space, do you use a separate partition/drive or do you use a zvol on top of zfs?

I was thinking zvol swap would be better, if it worked properly.


----------



## phoenix (Jun 10, 2009)

wonslung said:

> which brings me to my next question. To migrate the data to the new server, what i'm going to have to do is copy it to single 1 TB drives, then put the server together starting with just 1 raidz vdev, then copy it from the single drives to the pool, then add the second vdev to the pool.  In traditional raid it would take a long time; i understand zfs will let me do it right away, but it will have all the data only on the first vdev.
> 
> is there a way to force it to split it across both vdevs after i add the second group of drives?



Not AFAIK.  But any new writes will be done primarily to the new vdev, as the pool tries to balance the storage usage across the vdevs in the pool.

I don't think there's any way to check how much storage space is used by any one vdev.  If there was, you could probably do some manual copying of data between directories, then deleting directories, and checking the storage usage.


----------



## wonslung (Jun 10, 2009)

well when you do zpool list it shows how much is on each group
i DO know that much.

but what i meant was this:

I know that when i copy all the data over, it's going to be ONLY on the first raidz vdev, because the second one WON'T exist yet.
then i'm going to add the second vdev.
is there a command to FORCE it to spread the data between the two, or does it just do it slowly on its own later?

edit:

going back over this thread, i can't help but feel how exciting this is.  I've fallen in love with having a home media server.  I've been using XFS on linux software raid5, but i've run into some limitations that ZFS really just blows out of the water.  My system will be able to hold 20 drives; i'm going to have 12 drives to start with.  It would almost be better for me to do smaller vdevs (4 drives in raidz) x3 instead of (6 drives in raidz) x2, wouldn't it?


----------



## phoenix (Jun 11, 2009)

No, when you do a *zpool list*, it shows you the stats for the entire pool, not for the individual vdevs in the pool.

If you could get the stats for the individual vdevs, then you could do copy/delete tricks to spread the load around and watch to make sure it's actually doing it.  Without knowing how much is on each vdev, though, you can't really do this.

ZFS will favour writing to newer/emptier vdevs, and (over time) will balance the writes across all the vdevs.  There's no way, that I know of yet, to force it to rebalance the data across all the vdevs to spread the load onto new vdevs.  This probably wouldn't work, anyway, due to the way snapshots work.


----------



## wonslung (Jun 11, 2009)

i got the command wrong.


here

```
We can see where the data is currently written in our pool using zpool iostat -v:

zpool iostat -v trout
                                 capacity     operations    bandwidth
pool                           used  avail   read  write   read  write
----------------------------  -----  -----  -----  -----  -----  -----
trout                         64.5M   181M      0      0  13.7K    278
  mirror                      64.5M  58.5M      0      0  19.4K    394
    /home/ocean/disk2             -      -      0      0  20.6K  15.4K
    /home/ocean/disk1             -      -      0      0      0  20.4K
  mirror                          0   123M      0      0      0      0
    /home/ocean/disk3             -      -      0      0      0    768
    /home/ocean/disk4             -      -      0      0      0    768
----------------------------  -----  -----  -----  -----  -----  -----
```

taken from this page
http://flux.org.uk/howto/solaris/zfs_tutorial_01


----------



## wonslung (Jun 11, 2009)

phoenix said:

> .  This probably wouldn't work, anyway, due to the way snapshots work.



yeah, i didn't think of that....ok, well i guess zfs is smart enough to work it out on its own...my issue is that i don't have enough drives to just build all the vdevs and then copy the data over...i'll have to build 1 or 2 of the vdevs (depending on which way i go), copy the data, then build the last one.

i still haven't decided if i want to go with 2 raidz vdevs of 6 drives each or 3 raidz vdevs of 4 drives each...i lose 1 TB the second way, but it would be much faster, right?

i'm also still curious about using a zvol for swap space.  It would seem to me that it would be faster than a single drive or partition for swap because of all the added i/o but of course i don't really know as well as you probably do.


----------



## phoenix (Jun 11, 2009)

wonslung said:

> i got the command wrong. zpool



Ah, cool.  Didn't realise that was available.  Thanks.

I've always used gstat(8) to monitor disk throughput in real-time, although I have used iostat a couple of times.  Guess I never paid attention to the output from -v.


----------



## phoenix (Jun 11, 2009)

wonslung said:

> i still haven't decided if i want to go with 2 raidz vdevs of 6 drives each or 3 raidz vdevs of 4 drives each...i lose 1 tb the second way but it would be much faster right?



Yes, going with 3 vdevs would be faster than 2.  In theory, it should be 50% faster.  



> i'm also still curious about using a zvol for swap space.  It would seem to me that it would be faster than a single drive or partition for swap because of all the added i/o but of course i don't really know as well as you probably do.



It's kind of a catch-22 to use swap-on-zvol.  ZFS needs to allocate a bit of memory to track various things when accessing stuff on a zvol.  If the OS is short on RAM, it will write stuff out to swap ... generating more memory requests, and the cycle continues.  In theory, things should work.  In practice, some Solaris and FreeBSD users have run into kernel panics and out-of-memory situations due to using swap-on-zvol.

In using ZFS since August of last year, I have not run into this issue on any of the 3 servers I run ZFS on. [knock wood]  All three use swap-on-zvol.
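For what it's worth, swap-on-zvol is only a couple of commands (pool name and size are illustrative; the org.freebsd:swap property is what later FreeBSD rc scripts look at to enable it at boot, so treat that part as version-dependent):

```
# Create a 4 GB zvol and swap onto it:
zfs create -V 4g tank/swap
swapon /dev/zvol/tank/swap

# On versions whose rc scripts support it, this property makes
# the zvol get swapon'd automatically at boot:
zfs set org.freebsd:swap=on tank/swap
```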


----------



## wonslung (Jun 11, 2009)

well, i will have 8 GB of ram, so i'm guessing swap on a zvol will be ok, as i doubt i'll ever really need swap.

so far, this is what i'll have on monday.

I already have a system i'm going to use to upgrade.  It's a regular desktop motherboard, socket 775, with an e7400 intel core2duo and 6 1 TB hard drives.  I've ordered a new 4u case which holds 20 hotswap drives (my old case is 4u as well, but it only holds 6 hotswap drives, so to expand i really need the new one).
i also ordered an intel q9550 quadcore and 4 more GB of ddr2 800, for a total of 8 GB.
also ordered 6 more 1 TB hard drives, so i'll have a total of 12.
ordered 2 8 GB compact flash cards and the sata>cf adapters, and an extra 80 GB hard drive.

my original plan didn't include the cf cards or the 80 GB drive, but after listening to your ideas/suggestions i decided to go ahead and go that route.  i was thinking i'd use the small hard drive as a log device, and later i plan on buying one of those super fast ssd drives to use as a cache device.  I guess now i've decided to go with raidz vdevs made up of 4 disks each.  My next order in a month or two will probably be for the ssd and at least one more 1 TB drive for a hot spare.  I'm just hoping i'll be ok with no hot spare until then.


i'm just curious, but why did you go with ufs instead of zfs for root?

i was thinking how cool it would be to have zfs snapshots for when you decide to upgrade stuff.

I guess you could maintain a backup on zfs as well, is that what you do?


----------



## wonslung (Jun 11, 2009)

Another question just popped up.  When dealing with compression, can you switch types? If so, it only affects the new data, right?


----------



## phoenix (Jun 12, 2009)

wonslung said:

> Another question just popped up.  When dealing with compression, Can you switch types? if so it only affects the new data right?



Correct.  The ZFS property *compression* applies to new data as it is written to the filesystem.  And you can change it at any time.
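In other words (dataset name is just an example):

```
# Change the compression algorithm for a filesystem at any time;
# only blocks written after the change use the new setting.
zfs set compression=gzip-9 tank/backups
zfs get compression tank/backups
# Existing data keeps its old compression until it is rewritten.
```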


----------



## phoenix (Jun 12, 2009)

wonslung said:

> i'm just currious, but why did you go with ufs instead of zfs for root.



Booting off ZFS isn't enabled in FreeBSD 7.2, and all the hacked up methods for getting / on ZFS were too hackish for use in production.  Once ZFS booting is enabled on a -RELEASE, we'll look at migrating to it.



> i was thinking how cool it would be to have zfs snapshots for when you decide to upgrade stuff.



Yes, this is indeed very interesting.  There's a project out there for creating Boot Environments, where you can boot into different filesystems, snapshots, and clones.  So you can install, create a BE, upgrade, create a BE, and have access to either one at boot time.

Solaris is using this now, it's part of their installer/upgrade tools.



> I guess you could maintain a backup on zfs as well, is that what you do?



Yeah, I added "localhost" as a "remote server" to be backed up via the normal backup process, with an exclude list for everything except / and /usr.    That way, if things go horribly, horribly wrong, and we can't boot off either part of the mirror, we can boot off a LiveCD, export/import the pool, and restore from the backups.


----------



## wonslung (Jun 12, 2009)

well, i know the boot loader won't boot zfs, but what's wrong with using the compact flash for /boot and putting everything else on zfs? i mean, is there a technical reason not to do this?

i'm asking mainly because that's the way i did it on the last install i did, and i want to know if there is any major issue with that.

as far as the backup system goes, i'm really going to have to read up on what you have going.

I'm also interested in using NFS and maybe samba as well.

My main purpose is a media server for tv show and movie backups, which stream to 3 htpcs, 2 xbox 360's, 2 xboxes with xbox media center, and a couple of ps3's (i know, it's a lot, haha).  I'm really looking forward to the compression aspect of zfs.  From what i've read on the net, it seems that if you have a powerful enough cpu it actually speeds up file reads/writes due to less actual disk i/o.  I'm hoping that zfs compression=gzip or gzip-9 will be ok with media files on a quad core cpu, but i'll do some testing (can't really find much information on people using it for media servers yet).

anyways, most of the computers in the house run freebsd or linux of some kind, with the exception of 2 windows machines (both laptops), so nfs would be awesome, especially for home directories over the network.


----------



## UnixMan (Jun 12, 2009)

First of all, thanks for all the tips you gave out in this thread Phoenix. I appreciate it, as I'm sure many others do.



			
phoenix said:

> No, the vdevs don't have to be symmetrical.  You can create a pool with a mirrored vdev, a 5-drive raidz1 vdev, a 6-drive raidz2 vdev, a single-drive vdev, and so on.  ZFS will then create, in essence, a RAID0 stripe across all the vdevs.



I had trouble trying to add unsymmetrical vdevs to my pool.

```
[root@Touzyoh ~]# zpool add mokona raidz da2 da3 da4 da5
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: pool uses 5-way raidz and new vdev uses 4-way raidz
```

when I ran zpool upgrade -v I noticed that I was running version 6 of ZFS. :q I assume the error I'm getting is because of my extremely outdated version (I saw you mention version 13). What I'm confused about is that zpool upgrade -v says my current platform will only support version 6. I just rebuilt world to 7.2-STABLE last month; how can I get version 13 so I can upgrade my pool? I thought rebuilding world would have done the trick.


```
root@Touzyoh ~]# zpool upgrade -v
This system is currently running ZFS version 6.

The following versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property 
For more information on a particular version, including supported releases, see:

http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number.
[root@Touzyoh ~]# uname -a 
FreeBSD Touzyoh.example.me 7.2-STABLE FreeBSD 7.2-STABLE #2: Mon May 11 08:32:11 UTC 2009     root@Touzyoh.example.me:/usr/obj/usr/src/sys/GENERIC  amd64
```


----------



## wonslung (Jun 12, 2009)

UnixMan said:

> v
> This system is currently running ZFS version 6.
> 
> The following versions are supported:
> ...




version 6 is what comes with 7.0-7.1 (and originally 7.2)

13 has been MFC'd and you can update to 13 using cvsup the normal way.

I'm using 13; it's great, with all kinds of new features like delegated management and refquotas...and a ton of other stuff. Let me find the release notes and edit this when i have them.

http://svn.freebsd.org/viewvc/base?view=revision&revision=192498
http://www.bsdunix.ch/serendipity/i...FS-Version-13-to-FreeBSD-stable-RELENG_7.html


----------



## phoenix (Jun 12, 2009)

wonslung said:

> well i know the boot loader won't boot zfs but what's wrong with using the compact flash for /boot and putting everything else on zfs.  i mean, is there a technical reason not to do this?
> 
> i'm asking mainly because that's the way i did it on the last install i did and i want to know if there is any major issue with that?



Yeah, that should work.  You'll have to change a few loader.conf settings to tell it to boot from /kernel/kernel instead of /boot/kernel/kernel, but otherwise it should work.

As these were work boxes, we went with a very conservative setup, keeping the OS on UFS (/ and /usr) and just putting the data on ZFS.  We also started with ZFS back when it first was imported into FreeBSD (7.0 timeframe) and there were some issues.  We wanted to make sure we could boot into a full OS to track down/fix any ZFS issues.

Now, ZFS is much more stable and reliable in FreeBSD, so you shouldn't have any issues putting just /boot on UFS and everything else on ZFS.



> anyways, most of the computer in the house run freebsd or linux of some kind with the exception of 2 windows machines, both laptops so nfs would be awesome, especially for home directories over the network.



NFS support is built right into ZFS.  You still set all the same /etc/rc.conf variables (nfs_server, mountd, statd, etc).  Then you just set the *sharenfs* property for the filesystem you want to share, using the same syntax as in the /etc/exports file.  And ZFS does the rest, calling nfsd and mountd to export the filesystem.  And anytime you edit the sharenfs property, mountd gets refreshed.
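For example (the pool/filesystem name and export options below are hypothetical):

```
# /etc/rc.conf entries for the NFS server, same as a UFS setup:
#   nfs_server_enable="YES"
#   mountd_enable="YES"
#   rpcbind_enable="YES"

# Share a filesystem, using /etc/exports syntax in the sharenfs property:
zfs set sharenfs="-maproot=root -network 192.168.0.0/24" tank/home

# Check the property and the resulting export:
zfs get sharenfs tank/home
showmount -e localhost
```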

CIFS and iSCSI support are not built into ZFS on FreeBSD, so you still need to use the Samba port or an iSCSI target port.


----------



## phoenix (Jun 12, 2009)

UnixMan said:
			
		

> I had trouble trying to add unsymmetrical vdevs to my pool.
> 
> ```
> [root@Touzyoh ~]# zpool add mokona raidz da2 da3 da4 da5
> ...



Hrm, good to know.  I haven't tried to use any non-symmetrical vdevs, just assumed you could.  The docs that I've read don't specify this, and just made it sound like you could add any vdevs to the pool.  Guess that's a reason to have multiple pools per system (always wondered why anyone would do that).
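For illustration, with hypothetical pool and device names, the behaviour looks roughly like this (a sketch, not verified output):

```
# Adding a raidz vdev of a different width than the pool's existing vdevs
# is refused by default:
zpool add mokona raidz da2 da3 da4 da5
# -> fails with a "mismatched replication level" complaint

# -f overrides the check, but an unbalanced pool is usually a bad trade:
zpool add -f mokona raidz da2 da3 da4 da5
```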

You need to update to a newer 7-STABLE in order to get ZFSv13 support.  It was only MFC'd in the last couple of weeks.


----------



## MasterCATZ (Oct 11, 2011)

Sorry for reviving a dead thread, but which do you think is best?

4-drive vdev1, raidz1
4-drive vdev2, raidz1
pooled together (RAID5-like per vdev),
allowing for 2 drive failures as long as they're not both in the same vdev;
less IO, but shorter resilvering?

or

one 8-drive vdev, raidz2,
RAID6-like (2 drive failures tolerated, longer resilvering times)?

Normally I have 8 drives together, but my new SAS cards only have 4 ports per channel, and to make things easier I was thinking of just grouping the drives by these smaller connection groups.

The final system will be 3 SAS cards pooled together.

I am just unsure which will be best: one 8-disk raidz2 vdev, or 2x 4-disk raidz1 vdevs.

It's only for multimedia storage, and I need better read speeds than write speeds (which seems to be what happened when I went to 4 disks; the write speeds seemed better?? But I'm also on a different OS version now, so I'm unsure).
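In zpool terms, with hypothetical device names da0-da7, the two layouts being compared are:

```
# Two 4-disk raidz1 vdevs in one pool (ZFS stripes writes across the vdevs):
zpool create tank raidz1 da0 da1 da2 da3 raidz1 da4 da5 da6 da7

# Versus one 8-disk raidz2 vdev:
zpool create tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7
```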


----------



## phoenix (Oct 13, 2011)

Depends on the drive size.  If you are using drives under 1 TB, then raidz1 would be okay, and give better performance.  For drives over 1 TB, you should use raidz2.  Ideally with 6 disks per vdev, although 8 works as well.

The reason?  The time it takes to resilver a dead drive.  For drives over 1 TB, you're looking at several days to over a week (depending on how full and fragmented the pool is), during which time a raidz1 vdev would have 0 redundancy.  If you lose a second drive while the first is resilvering ... you lose the pool!!

Thus, for larger drives that take a long time to resilver, use raidz2 (or even raidz3) to protect the pool during the resilver.


----------



## bbzz (Oct 13, 2011)

What does resilvering time depend on, besides pool type? Disk size, other machine specs?
How long would it take to resilver a 2 TB drive in a 6-drive raidz2 vdev?


----------



## phoenix (Oct 13, 2011)

Resilvering touches every block of data on the disk, in the order that it was created (temporal order).  If you create and delete a lot of snapshots, and you create and delete a lot of files, then new blocks will be interspersed with old blocks.  Thus, resilvering will thrash the drive heads.

If you search through the archives for the zfs-discuss mailing list, you'll find several threads where various formulas are given for determining the worst-case resilvering time of a vdev, based on the number of drives in the vdev, the type of vdev (raidz1, raidz2, etc), and the size of the drives.  There's no "one true number".

Suffice to say, a 2 TB drive in a 50% full pool will take several days to resilver in a raidz2 vdev.
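Those mailing-list formulas vary, but even a crude lower bound makes the point (a hypothetical sketch; the drive size, fill level, and throughput here are assumed for illustration, not taken from the thread):

```shell
# Naive best-case resilver estimate: treat the rebuild as one sequential read
# of the used portion of a single drive at a fixed throughput. Real resilvers
# are random I/O over fragmented, snapshot-heavy pools, so actual times are
# far worse -- this only shows how large that gap is.
drive_mb=2000000   # 2 TB drive, in MB
pct_full=50        # pool roughly 50% full
mb_per_s=100       # optimistic sequential throughput

used_mb=$((drive_mb * pct_full / 100))
seconds=$((used_mb / mb_per_s))
echo "best case: ~$((seconds / 3600)) hours"
```

The ideal sequential case comes out to a few hours; the several days reported in this thread show how badly head-thrashing random I/O dominates.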

In our oldest ZFS box (we built it within a week of ZFSv6 hitting FreeBSD 7-STABLE), it takes almost 3 weeks to resilver a 500 GB drive in an 8-drive raidz2 vdev.  The pool is over 2 years old, with snapshots created daily and deleted after about 6-8 months, and with data changing on a daily basis.  The pool has 3x 8-drive raidz2 vdevs, with the drives connected to 3Ware PCIe controllers as Single Disk arrays.

In our newest ZFS box, resilvering a 500 GB drive takes about 3 days.  This pool is under 50% full, has snapshots created daily, has no snapshots deleted, has dedupe enabled, and has 4x 6-drive raidz2 vdevs.  Drives connected to SuperMicro AOC-USAS-8Li SATA controllers.


----------



## olav (Oct 13, 2011)

ZFS really needs the block-rewrite functionality. Right now, if you want to defrag, the only solution is to move your data to a new storage pool every year or so.
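That migration is typically a recursive snapshot plus send/receive (pool names here are hypothetical):

```
# Snapshot the whole hierarchy, replicate it to a fresh pool, and retire
# the old pool once the copy is verified:
zfs snapshot -r oldtank@migrate
zfs send -R oldtank@migrate | zfs receive -F newtank
```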


----------



## bbzz (Oct 13, 2011)

If a resilvering is abruptly stopped, say during a brief power outage, what happens to the disk and the pool? Does it continue where it stopped?


----------



## olav (Oct 13, 2011)

No, you have to restart.


----------



## bbzz (Oct 13, 2011)

From some opensolaris guide http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbvf.html


> Resilvering is interruptible and safe. If the system loses power or is rebooted, the resilvering process resumes exactly where it left off, without any need for manual intervention.


----------



## olav (Oct 13, 2011)

Ah, that's correct for resilvering. But a scrub will restart?


----------

