# Help me build my ZFS box



## SilverJS (Jun 20, 2011)

Hey guys,

Have slowly been researching this for the past little while, and this forum has been a very nice source of info - thanks to all so far.

I want to build myself a ZFS server, most likely using Sub.Mesa's distro. This would be a back-up server; I already have a file server with 10 TB (five 2TB drives that Windows 7 sees as "one" with its built-in disk manager) but I have no redundancy at all in there. I'm only looking for something to serve as a back-up box for music, movies, documents, photos, my music studio sessions, etc.; therefore, a 10 TB pool is what I'm after. No streaming from the ZFS box. Performance is not that big of a concern, but reliability is.

I do have a few questions, though, for now mostly relating to hardware, specifically the motherboard and hard drives themselves.

First off, I've already got an Athlon X2 250, a Scythe Shuriken cooler for it, and a Seasonic 300W power supply lying around. So, I'd like to use those. With that:

1. Mobo: I was thinking the Gigabyte GA-890GPA-UD3H. This has 8 built-in SATA ports. I know this doesn't support ECC memory; but can you suggest an AM3 board that does? The board would (much preferably) have an HDMI port, since all I've got right now for an extra monitor is a 32" 1080p TV. (BTW, the plan would be for this to run headless once configured.)

2. Memory: ECC or non-ECC? I know the whole thing about doing it once and doing it right, but is it worth it? My readings seem contradictory on this.

Hard drives: This is also a big one. I'd like to run RAIDZ2, which means either 6 or 10 drives, as I recall, for optimum performance (or is this only for the 4K corrected thing?). I suppose the easiest way to reach the 10 TB goal is with 4 data drives and 2 parity, which points to 3TB drives - but am I missing something? I guess I could also run nine 1.5TB drives in RAIDZ. (The numbers for RAIDZ were, as I recall, either 2, 3, 5 or 9 drives for optimum performance?) Not that performance is THAT big a deal, as I've mentioned previously...

So anyhow:

3. What drive size/number combination would you recommend?
4. Using which specific hard drives?

Awesome, thanks!


----------



## carlton_draught (Jun 21, 2011)

Some points.

Go with ECC. The premium is not much, and if reliability is truly a concern then you should be using ECC. All ZFS does is ensure that data is correct once it is written to disk. If your RAM is faulty you may unknowingly be writing errant data to disk, which ZFS won't do anything about.

From memory, Asus motherboards have ECC support as standard (assuming the processor supports it). I'd check there first.

The first link has some comments about drives. My choice is Hitachi or Samsung. If you have lots of data, 3TB makes things easier.

Also, if reliability is truly as big a concern as you say, then IMO the 5*2TB of non-redundant data that sounds like it's important to you is an accident waiting to happen, unless backed up frequently. You can use SMART data as a poor man's method for determining data corruption, but it won't catch everything and obviously it won't self-heal.


----------



## SilverJS (Jun 21, 2011)

Cool, thanks.

Yeah, I was actually looking at the Asus 880G series of motherboards - I've got one in one of my computers right now! - I think they'd do fine.  I'd need a SAS card though, probably the BR10i.

Either way - for ECC RAM, is this what I'd need?

http://www.ncix.com/products/?sku=61737&vpn=kvr1333d3d8r9s/4g&manufacture=Kingston 

or 

http://www.ncix.com/products/?sku=51397&vpn=KVR1066D3D8R7SK2/8G&manufacture=Kingston

I understand that in this application (as in most, I guess), the more RAM, the better - so, maybe two of the latter item...?

And yes, I understand that my current setup is far less than ideal.  =)  Hence, the desire to setup a back-up box, ASAP...=)


----------



## vermaden (Jun 21, 2011)

SilverJS said:

> 2. Memory: ECC or non-ECC? I know the whole thing about doing it once and doing it right, but is it worth it? My readings seem contradictory on this;



I would not go for ECC personally here. I once had a box with ECC memory; all later boxes were without ECC, and I do not see the point in paying extra for it.



SilverJS said:

> Hard drives: This is also a big one. I'd like to run RAIDZ2, which means either 6 or 10 drives, as I recall


RAIDZ2 (similar to RAID6) requires at least 4 drives, so 4/5/6/7/8/9/10/11/${MOAR} will do.

By comparison, RAIDZ (similar to RAID5) requires only 3 drives.

You should ask yourself whether 2 x RAIDZ (striped) + hot spare would not be better: (2*4 + 1) or (2*5 + 1).



> (or is this only for the 4K corrected thing)


I personally stick with 512B drives, currently 2TB Seagate Barracuda LP (I do not need performance on that box as these are 'low power' drives).



> 3. What drive size/number combination would you recommend?



I would go for one of these:

STRIPE[ RAIDZ(4 * 2TB) + RAIDZ(4 * 2TB) ] + HOTSPARE(1) = 6TB + 6TB = 12TB with 8 drives + hotspare

STRIPE[ RAIDZ(3 * 2TB) + RAIDZ(3 * 2TB) ] + HOTSPARE(1) = 4TB + 4TB = 8TB with 6 drives + hotspare
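As a sketch, the first layout could be created roughly like this (disk names like *da0* are just examples here; adjust to what your controller presents):

```shell
# stripe of two 4-disk RAIDZ vdevs plus one hot spare -- device names are examples
zpool create tank \
    raidz da0 da1 da2 da3 \
    raidz da4 da5 da6 da7 \
    spare da8

# the smaller 6-drive variant would be:
# zpool create tank raidz da0 da1 da2 raidz da3 da4 da5 spare da6
```

ZFS stripes across the two top-level raidz vdevs automatically; the *spare* keyword attaches the hot spare to the pool.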




> 4. Using which specific hard drives?


I would avoid 4K drives; check *lists.freebsd.org* for how the 3TB drives are working - but they are all 4K drives.


----------



## AndyUKG (Jun 21, 2011)

vermaden said:

> RAIDZ2 (similar to RAID6) ... By comparison, RAIDZ (similar to RAID5)
> 
> You should ask yourself whether 2 x RAIDZ (striped) + hot spare would not be better: (2*4 + 1) or (2*5 + 1).



The main reason not to choose RAIDZ1 is that with very large disks (2TB etc.), in the event of a disk failure the RAID set can take over 24 hours to rebuild, which leaves all the data in your pool vulnerable to a second disk failure. It's obviously not very likely to have 2 disks fail within a day or two, but nonetheless it's a real risk.
If you can't or won't take regular backups, can't afford to lose any data (i.e. data changed between backups), or if uptime is really critical, then go for RAIDZ2.

cheers Andy.


----------



## danbi (Jun 21, 2011)

vermaden said:

> RAIDZ2 (similar to RAID6) requires at least 4 drives, so 4/5/6/7/8/9/10/11/${MOAR} will do.
> 
> By comparison, RAIDZ (similar to RAID5) requires only 3 drives.



This is not correct.

RAIDZ1 requires N+1 drives, therefore the smallest RAIDZ array is 2 drives.
RAIDZ2 requires N+2 drives, therefore the smallest RAIDZ2 array is 3 drives.

Today's large capacity commodity drives are rather flaky. There is a high risk of drive failure while rebuilding. For a system of this size, you are risking too much in having only one parity disk. You may well consider using 8-stable, which already has ZFS v28 support and RAIDZ3 (3 parity drives per vdev).

'RAID' was really designed to use a large number of small disks, where rebuild times etc. are much, much smaller -- but obviously for a home system you need to conserve drive bays, drive ports and power.


----------



## SilverJS (Jun 21, 2011)

Awesome - thanks for the replies, guys.  Still not sure about ECC or not - I'm not sure what I should get for ECC.  Although I love Asus boards, they do have a reputation for being very finicky about memory, and even though they list ECC support, there are very few modules in their memory support list that are ECC (a particular board only had one!).

So I can do RAIDZ3 in FreeBSD 8?  I didn't know that!  Heck, while we're at it, might as well.  So, it would be a total of 9 drives - 6 data, 3 parity?  Then I'll have to research the concept of a hot spare; I'm not too sure how to implement that in FreeBSD.  But, if I have the room in the case, the SATA port, and the money, is there any disadvantage to using a hot spare?

Cheers!


----------



## carlton_draught (Jun 22, 2011)

SilverJS said:

> Awesome - thanks for the replies, guys.  Still not sure about ECC or not - I'm not sure what I should get for ECC.  Although I love Asus boards, they do have a reputation for being very finicky about memory, and even though they list ECC support, there are very few modules in their memory support list that are ECC (a particular board only had one!).


Here's what I do. I go to one of the big memory vendors with a good reputation (e.g. I use Kingston), use their memory tool, select the motherboard I would use, and it spits out a list of suitable memory sticks. I compare with the motherboard's pdf manual to make sure of the rules about how many I need, where to put them and any constraints e.g. quad/dual channel etc. I then find prices from a parts vendor I trust.

A mobo manufacturer only has incentive to do so much validation, and often only with the memory existing at the time. A memory vendor will produce new memory modules and validate them against existing motherboards, because if they don't, people won't buy their product. The memory manufacturers will guarantee that their sticks work with your motherboard, so do you really need the blessing of two companies?

Another reason I like ECC btw is that your motherboard will log ECC errors and whether they were correctable or not. It's nice to be able to check that out. With non-ECC RAM, the only way you'd know is if you had a reason to suspect and were willing to have it do memtest over a weekend.


----------



## Terry_Kennedy (Jun 22, 2011)

carlton_draught said:

> Here's what I do. I go to one of the big memory vendors with a good reputation (e.g. I use Kingston), use their memory tool, select the motherboard I would use, and it spits out a list of suitable memory sticks. I compare with the motherboard's pdf manual to make sure of the rules about how many I need, where to put them and any constraints e.g. quad/dual channel etc. I then find prices from a parts vendor I trust.


I discovered that for at least some high-end modules, "Kingston" memory is actually some other brand with a Kingston label on it. In particular, I purchased Kingston KVR1333D3D4R9S/8G which was actually re-labeled Hynix HMT31GR7AFR4C-H9. I had purchased it based on a "this seems like the right part" gut feeling, as the Kingston part number wasn't listed as supported on the Supermicro motherboard (X8DTH-iF) I was using. Once I received the parts and noticed the Hynix part number, I checked and that part was listed as supported by Supermicro.

Getting the exact part isn't as important when using more common memory modules - 8GB registered modules are somewhat unusual.



> Another reason I like ECC btw is that your motherboard will log ECC errors and whether they were correctable or not. It's nice to be able to check that out. With non-ECC RAM, the only way you'd know is if you had a reason to suspect and were willing to have it do memtest over a weekend.


Some people will say that memory errors will never happen. This is from one of my other servers (not the Kingston modules) on Sunday:

```
+MCA: Global Cap 0x0000000000000005, Status 0x0000000000000000
+MCA: Vendor "GenuineIntel", ID 0x6b4, APIC ID 0
+MCA: CPU 1 UNCOR PCC OVER BUSL0 Source RD Memory
+MCA: Address 0x703ae14
+MCA: Bank 0, Status 0xf624210022200810
+MCA: Vendor "GenuineIntel", ID 0x6b4, APIC ID 0
+MCA: CPU 1 UNCOR PCC OVER BUSL0 Source RD Memory
+MCA: Address 0x703ae14
+MCA: Bank 0, Status 0xb601a00022000800
```
If someone is going to use checksummed RAID (like ZFS), it makes sense to also use ECC memory. Otherwise ZFS could be reliably storing garbled data.


----------



## SilverJS (Jun 22, 2011)

carlton_draught said:

> Here's what I do. I go to one of the big memory vendors with a good reputation (e.g. I use Kingston), use their memory tool, select the motherboard I would use, and it spits out a list of suitable memory sticks. I compare with the motherboard's pdf manual to make sure of the rules about how many I need, where to put them and any constraints e.g. quad/dual channel etc. I then find prices from a parts vendor I trust.



Now why hadn't I thought of that?  Perfect!  Thanks for that.

So would this be feasible, then?  RAIDZ3 plus hot spare?  10 drives total, I guess?  (Plus boot drive)


----------



## SilverJS (Jun 28, 2011)

OK - so I'll be making the drive down this weekend to pick everything up.  Memory Express (in Edmonton) has the 3TB 5K3000 drives, as well as some Seagate Barracuda 2TB drives, which I believe are 4K.

Do you guys see any issues with using 6 X 3TB drives in a RAIDZ2 setup?  The board I want to use (M4A88T-M) has six SATA ports, so that's awesome.  I'd use some random IDE drive on the board's PATA port for a boot drive.

Any show stoppers here?  I could also get the 2TB drives, along with a PCI-E SATA expansion card, but I think the less hardware, the better...?


----------



## vermaden (Jun 29, 2011)

SilverJS said:

> 3TB 5K3000 drives, or also some Seagate Barracuda 2TB drives, which I believe are 4K.



I have the Seagate Barracuda LP 2TB and they are 512B (not 4K), but there are more Barracudas than just the LP, so check your exact model.


----------



## rusty (Jun 29, 2011)

Just installed a Seagate Green SATA-III 2TB; these are 4K and use SmartAlign.

A standard zpool on a single disk and no messing with ashift etc.
Transfers of a 10GB .mkv give:
read:  132 MB/s
write: 115 MB/s

Not bad at all for a 5900 rpm drive.


----------



## vermaden (Jun 29, 2011)

@rusty

True, power consumption has also improved since the LP series:
http://www.silentpcreview.com/article1181-page6.html


----------



## AndyUKG (Jun 29, 2011)

rusty said:

> Just installed a Seagate Green SATA-III 2TB these are 4K and use SmartAlign.



Anyone any idea what SmartAlign actually does? The two PDFs I pulled off the Seagate site don't really explain anything, other than that it's automagic!

http://www.seagate.com/docs/pdf/whitepaper/mb604_4k_transition_faq.pdf

cheers Andy.


----------



## vermaden (Jun 29, 2011)

AndyUKG said:

> Anyone any idea what SmartAlign actually does?



Check here: http://consumer.media.seagate.com/2010/06/the-digital-den/advanced-format-drives-with-smartalign/


----------



## AndyUKG (Jun 29, 2011)

vermaden said:

> Check here: http://consumer.media.seagate.com/2010/06/the-digital-den/advanced-format-drives-with-smartalign/



Thanks, it still just seems to say it's automagic. Well, if it really works it's great! And it begs the question why all 4K drive manufacturers haven't used something similar - it would have saved a lot of problems!


----------



## Sebulon (Jun 29, 2011)

rusty said:

> Just installed a Seagate Green SATA-III 2TB these are 4K and use SmartAlign.
> 
> A standard zpool on a single disk and no messing with ashift etc.
> Transfers of a 10GB .mkv gives
> ...



Hey,

Gotta ask you, since I've gotten burned myself once using a 3TB WD Green - have you also scrubbed that pool afterwards, and tried to use zfs send/recv? I got no indication of error until I did just that. I got as many checksum errors as I had files stored on it. =) Then, after I used gpart and created an aligned partition, I haven't had any more issues with it. Could be that it's been fixed in firmware, or that it's WD Green only, but I thought I'd at least give you a heads up, just in case.
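Roughly what I did to fix it, from memory - so treat the exact flags as an example, and *ada0*/*disk0* are placeholder names:

```shell
# create a GPT scheme and a 4K-aligned freebsd-zfs partition with a label
gpart create -s gpt ada0
gpart add -t freebsd-zfs -a 4k -l disk0 ada0

# build the pool on the labeled partition instead of the raw disk
zpool create tank gpt/disk0

# then load some data, scrub, and check for CKSUM errors
zpool scrub tank
zpool status -v tank
```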

/Sebulon


----------



## rusty (Jun 30, 2011)

I did a scrub after the original rsync of 534GB to the pool, and just tried a *zfs send | zfs recv* with no errors, I'm relieved to say.


----------



## SilverJS (Jul 3, 2011)

vermaden said:

> I would go for one of these:
> 
> STRIPE[ RAIDZ(4 * 2TB) + RAIDZ(4 * 2TB) ] + HOTSPARE(1) = 6TB + 6TB = 12TB with 8 drives + hotspare
> 
> STRIPE[ RAIDZ(3 * 2TB) + RAIDZ(3 * 2TB) ] + HOTSPARE(1) = 4TB + 4TB = 8TB with 6 drives + hotspare



OK, I'm now at the point where I have to decide how to allocate all those drives.  =)  I've just received the hardware, which is 8 3TB drives.  I plan on using 7, and keeping the 8th as a spare for when one fails.  I am seriously considering the second option above; I think that's the best split between performance, redundancy, and resilvering times if required.

I have actually started another thread on these forums on this, but I later remembered that somebody had posted some info about this in one of my previous threads, and here it is. =)  Thanks.

So, what do you think?  [RAIDZ (3 * 3TB) + RAIDZ (3 * 3TB)] + HOTSPARE?


----------



## tingo (Jul 3, 2011)

As long as you have a maximum of 9 spindles (drives) per vdev, you should be fine.


----------



## vermaden (Jul 3, 2011)

SilverJS said:

> [RAIDZ (3 * 3TB) + RAIDZ (3 * 3TB)] + HOTSPARE?



Let's get rid of the marketing first: 3 * 1000^4 / 1024^4 = 2.72 TB (instead of 3 TB)

The average WRITE speed of, for example, the *Hitachi 7K3000* is about 115 MB/s: http://www.pcdiy.com.tw/cont_img/107291_4.jpg

You have about 2.72 TB = 2785 GB = 2851840 MB of data to resilver if a drive fails; let's see how much time that would take: 2851840 / 115 / 60^2 = 6.9 (about 7 hours)
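A quick one-liner confirms the estimate, assuming the drive sustains 115 MB/s for the whole resilver:

```shell
# 3 TB (decimal) converted to MiB, divided by 115 MiB/s, converted to hours
awk 'BEGIN { printf "%.1f hours\n", 3 * 1000^4 / 1024^2 / 115 / 3600 }'
# prints: 6.9 hours
```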

Which is quite acceptable for me, so if you have 8 drives, then RAIDZ(3) + RAIDZ(3) + hotspare(2) should be OK.


----------



## SilverJS (Jul 4, 2011)

I do have eight drives, but one of them is not physically installed; I plan on using that as an actual spare to replace a physically failed drive.

So is there a performance advantage to going with two RAIDZ(3)+hotspare vice one RAIDZ2(6)+hotspare?


----------



## vermaden (Jul 4, 2011)

SilverJS said:

> So is there a performance advantage to going with two RAIDZ(3)+hotspare vice one RAIDZ2(6)+hotspare?


Yes, with 2 * RAIDZ(3) you will get about 2x the performance of RAIDZ2(6), because these RAIDZ(3) vdevs are striped.


----------



## SilverJS (Jul 4, 2011)

I see!  I'm just starting to get into all of this, and a LOT more research is required - but so far, using FreeNAS 8.0.1 BETA3 and a RAIDZ2(6)+hotspare, I'm getting write and read speeds of about 11-12 MB per second, which to me seems abysmally low... again, I'll have to research whether this is due to a network bottleneck or whatever (I'm very green!), but we'll see.  Either way, I agree, a 7-hour downtime is not a lot, unless I'm away, which I tend to be somewhat often with the job.  I guess that's another consideration.


----------



## vermaden (Jul 4, 2011)

SilverJS said:

> Either way, I agree, a 7-hour downtime is not a lot, unless I'm away, which I tend to be somewhat often with the job.



It's NOT downtime; the pool/data will be available the whole time. It's just how long it would take for the HOTSPARE to RESILVER, after which the ZFS pool is fully REDUNDANT again.


----------



## jalla (Jul 4, 2011)

vermaden said:

> Yes, with 2 * RAIDZ(3) You will get about 2 x performance of RAIDZ2(6) because these RAIDZ(3) are striped.



What makes you think that? RAIDZ2 stripes data across 6 disks (2 parity), RAIDZ stripes data across 2x3 disks (2 parity). Performance should be very similar.

The big difference is in fault-tolerance where RAIDZ2 is much better for the same number of disks. See this for a comparison of different protection schemes.


----------



## jalla (Jul 4, 2011)

SilverJS said:

> I see!  I'm just starting to get into all of this, and a LOT more research is required - but so far, using FreeNAS 8.0.1 BETA3 and a RAIDZ2(6)+hotspare, I'm getting write and read speeds of about 11-12 MB per second, which to me seems abysmally low... again, I'll have to research whether this is due to a network bottleneck or whatever (I'm very green!), but we'll see.  Either way, I agree, a 7-hour downtime is not a lot, unless I'm away, which I tend to be somewhat often with the job.  I guess that's another consideration.



11-12 MB/s would be normal for a 100 Mbit network.
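For reference, the theoretical ceiling of fast ethernet, before protocol overhead, is:

```shell
# 100 Mbit/s divided by 8 bits per byte
awk 'BEGIN { printf "%.1f MB/s\n", 100 / 8 }'
# prints: 12.5 MB/s
```

Real-world CIFS/NFS transfers land a bit below that, so 11-12 MB/s fits.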


----------



## vermaden (Jul 4, 2011)

jalla said:

> What makes you think that?


It's the same difference as putting 4 drives into a 4-way mirror [1] versus putting the drives into a stripe of 2 mirrors [2]; the second one would be 2x faster. It's like comparing RAID10 to RAID11.

[1]

```
[1]-[2]-[3]-[4] MIRROR
```

[2]

```
[1]-[2] MIRROR \
                 STRIPE
[3]-[4] MIRROR /
```

RAIDZ(3) + RAIDZ(3) is like RAID50 while RAIDZ2(6) is 'only' a RAID6.
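In zpool terms, the two diagrams above would be roughly as follows (disk names are examples):

```shell
# [1] a single 4-way mirror vdev
zpool create tank mirror da0 da1 da2 da3

# [2] a stripe of two 2-way mirrors -- ZFS stripes across top-level vdevs automatically
# zpool create tank mirror da0 da1 mirror da2 da3
```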


----------



## jalla (Jul 4, 2011)

OP says nothing about mirroring. The question here is about two striped vdevs vs one.

Mirroring two RAIDZ vdevs halves the effective storage.


----------



## vermaden (Jul 4, 2011)

@jalla

Have you ever heard of examples, maybe?

Start understanding what you are reading; that could help take this conversation to another level.


----------



## jalla (Jul 4, 2011)

@vermaden
You're answering a question that hasn't been posed. As such, you bring nothing useful to the discussion.


----------



## AndyUKG (Jul 4, 2011)

Comparing 2x RAIDz1 vs 1x RAIDz2, the former will provide twice the write IOPS of the latter. I think read performance should be similar for both options, but I don't have that 100% clear myself. But I think stating twice the performance is oversimplifying a little...

With regard to rebuild times, that is in a perfect world. I see over-24-hour rebuilds on disks with similar average write performance, which may be due to the rather old Dell server I have, but also due to other factors such as other IO load on the zpool, etc.

Andy.


----------



## jalla (Jul 4, 2011)

AndyUKG said:

> Comparing 2x RAIDz1 vs 1x RAIDz2, the former will provide twice the write IOPS of the latter. I think read performance should be similar for both options, but I don't have that 100% clear myself. But I think stating twice the performance is oversimplifying a little...


I don't agree.

The limiting factor is the combined number of IOPS the disks can sustain. If you're reading/writing 128k records, that should distribute the load evenly across all 6 disks in parallel, regardless of whether they are organized in 2x3 or 1x6 vdevs.

There are different scenarios that could affect performance slightly (read vs write, sequential vs random, small vs large block), but basically, with the same number of disks and parity, plain raidz and raidz2 should have similar performance.


----------



## AndyUKG (Jul 4, 2011)

There are quite a few explanations for this behaviour out there; here is one:

http://blogs.oracle.com/roch/entry/when_to_and_not_to


----------

