# zfs: configure many disks



## nORKy (May 30, 2012)

Hi,

I'm building a big syslog server.

I have 4 HP MSA20 disk enclosures:

- bay 1: 12x500G
- bay 2: 6x500G
- bay 3: 12x750G
- bay 4: 10x750G

(and 2x36G for the OS)

What can I do with this? Use only ZFS RAID? Or use hardware RAID? Both?

Thank you for your ideas.


----------



## einthusan (May 31, 2012)

I'm not an expert, but I would use ZFS RAID only. Hardware RAID can't detect silent errors (whatever that means). Also, with ZFS you can export a pool and import it on a different system, so you can actually move the physical set of drives to a different server if you ever need to. It's also cheaper, since you don't need so many hardware RAID cards. But common sense tells me not to use BOTH software and hardware RAID; that will probably mess things up.
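"Silent errors" are blocks that come back from the disk with different contents than were written (bit rot, misdirected or torn writes) while the drive reports no error. A controller that only tracks parity has no way to tell which copy is wrong; a filesystem that stores a checksum with every block can. The toy model below only illustrates detection; it is not ZFS code. SHA-256 stands in for ZFS's own checksums (Fletcher-based by default, SHA-256 optionally), and the class and method names are hypothetical.

```python
"""Toy model of end-to-end checksumming (illustration only, not ZFS code)."""
import hashlib


def checksum(data: bytes) -> str:
    # Stand-in for the block checksum a filesystem like ZFS keeps in metadata.
    return hashlib.sha256(data).hexdigest()


class ChecksummedStore:
    """Keeps each block next to a checksum computed at write time."""

    def __init__(self) -> None:
        self.blocks = {}  # block id -> bytes as stored "on disk"
        self.sums = {}    # block id -> checksum recorded when written

    def write(self, block_id: int, data: bytes) -> None:
        self.blocks[block_id] = data
        self.sums[block_id] = checksum(data)

    def read(self, block_id: int) -> bytes:
        data = self.blocks[block_id]
        if checksum(data) != self.sums[block_id]:
            # A controller without checksums would hand these bytes back as-is.
            raise IOError(f"checksum mismatch on block {block_id}")
        return data


store = ChecksummedStore()
store.write(0, b"example syslog line")
# Simulate a bit flip on the medium that the drive itself never reports.
flipped = bytearray(store.blocks[0])
flipped[0] ^= 0x01
store.blocks[0] = bytes(flipped)
try:
    store.read(0)
except IOError as err:
    print(err)  # prints: checksum mismatch on block 0
```

With redundancy (mirror or raidz), ZFS can go one step further and rewrite the bad block from a copy whose checksum still matches.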


----------



## nORKy (May 31, 2012)

einthusan said:

> Also, it's cheaper since you don't need so many hardware RAID cards.



We need many, many terabytes of data. And this is recycled hardware, which is why I don't have 2TB disks.


----------



## Sebulon (May 31, 2012)

@nORKy

One big pool, made up like this:

- bay 1: 12x500G = 2x6 raidz2 (_vdev0,1_)
- bay 2: 6x500G = 1x6 raidz2 (_vdev2_)
- bay 3: 12x750G = 2x6 raidz2 (_vdev3,4_)
- bay 4: 10x750G = 1x6 raidz2 (_vdev5_)
- plus 4 spares (the 4 leftover 750G drives in bay 4)

That is about 15TB usable, in what amounts to a massive RAID60. It provides the best performance/safety/storage ratio.
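The "about 15TB" figure, and where the rest of the raw space goes, can be checked with a little arithmetic. This sketch assumes raidz2 costs two drives' worth of space per vdev, uses decimal GB as printed on the drives, and ignores ZFS metadata and padding overhead, so real numbers will come out somewhat lower. The names `LAYOUT`, `usable_gb` and `raw_gb` are just for illustration.

```python
# (vdev count, drives per vdev, drive size in GB) for each raidz2 group
LAYOUT = [
    (2, 6, 500),  # bay 1: vdev0, vdev1
    (1, 6, 500),  # bay 2: vdev2
    (2, 6, 750),  # bay 3: vdev3, vdev4
    (1, 6, 750),  # bay 4: vdev5
]
SPARES = [750] * 4  # the 4 leftover drives in bay 4
PARITY = 2          # raidz2: two parity drives' worth per vdev


def usable_gb(layout, parity):
    # Only the data drives of each vdev hold user data.
    return sum(n * (width - parity) * size for n, width, size in layout)


def raw_gb(layout, spares):
    # Every drive in the bays, including parity and spares.
    return sum(n * width * size for n, width, size in layout) + sum(spares)


print(usable_gb(LAYOUT, PARITY))  # 15000
print(raw_gb(LAYOUT, SPARES))     # 25500
```

Of the 10500 GB difference, 7500 GB is parity (two drives in each of the six vdevs) and 3000 GB is the four spares.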



> (and 2*36G for OS)


Perfect! Make sure to mirror them.

/Sebulon


----------



## nORKy (May 31, 2012)

I will try, thank you.

But one question: 15T when the maximum is 25T, isn't that rather little? I lose 10T.


----------



## nORKy (May 31, 2012)

Another question: do you think two mirrored SSDs as a ZIL could help my syslog server? It does a lot of writes.


----------



## Sebulon (May 31, 2012)

@nORKy

I understand how you feel.

There were a lot of factors I took into account when thinking about what you could do with those drives, and as I said, it provides the best performance/safety/storage ratio. If you prioritized differently, for example considering only safety, you could end up with an entirely different configuration. The same goes for performance or storage. This, however, is the best middle ground. You are of course free to build your system any way you choose.

/Sebulon


----------



## nORKy (May 31, 2012)

Why do you split the RAID6 here, "bay 1: 12*500G = 2x6 raidz2 vdev0,1", instead of using "1x12 raidz2"?


----------



## Sebulon (May 31, 2012)

nORKy said:

> Another question: do you think 2 mirrored SSDs as a ZIL could help my syslog server? It does a lot of writes.



Only if it is doing *sync* writes! Otherwise it will actually hurt writes. One way to get sync writes is to export a filesystem over NFS: when a client mounts it and starts writing, those writes will be synced. belon_cfy reported that istgt did not issue sync writes by default with the default ZFS settings, and I know that Samba doesn't either. Typically it is quite all right not to have synced writes unless *every* write counts. For example, with a ginormous 25TB database where some transactions are corrupted and the application doesn't know which ones, things become a *beep*storm quite fast. But for ordinary data, if a transfer goes wrong, you just send it again and there's no worry.
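Whether a write is "sync" is decided by the application, not by the disks: it has to ask for durability, for example by calling `fsync()` after writing or by opening the file with `O_SYNC`. Only those writes are recorded in the ZIL (and so can benefit from a separate log device); ordinary buffered writes are simply committed with the next transaction group. The sketch below shows the two kinds of write from the application side. The file name and function names are placeholders, and whether a particular daemon such as syslog-ng calls `fsync()` depends on its own configuration, which is not covered here.

```python
import os
import tempfile


def append_async(path: str, line: str) -> None:
    # Buffered write: returns once the data is in memory; the OS and
    # filesystem decide later when it reaches the disks.
    with open(path, "a") as f:
        f.write(line + "\n")


def append_sync(path: str, line: str) -> None:
    # Synchronous write: fsync() does not return until the filesystem
    # reports the data as committed to stable storage.
    with open(path, "a") as f:
        f.write(line + "\n")
        f.flush()             # push Python's userspace buffer to the OS
        os.fsync(f.fileno())  # ask the OS/filesystem to commit it now


with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "messages")
    append_async(log, "async line")
    append_sync(log, "sync line")
    with open(log) as f:
        print(f.read().splitlines())  # ['async line', 'sync line']
```

Both functions produce the same file contents; the difference is only when the caller is told the data is safe, which is exactly the latency a dedicated log device is meant to reduce.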

/Sebulon


----------



## Sebulon (May 31, 2012)

nORKy said:

> Why do you split the RAID6 here, "bay 1: 12*500G = 2x6 raidz2 vdev0,1", instead of using "1x12 raidz2"?



Oracle/Sun best practice is to never have vdevs larger than 8 drives. It has to do with resilvering times: reports here on the forum and on the mailing lists, as well as my own experience, show resilvers taking several weeks to complete, during which time the pool's performance and fault tolerance are severely crippled.

Also, giving every vdev the same number of drives is optimal for performance, as ZFS stripes writes across vdevs. With my recommendation you would have 6 drives in every vdev. Having one vdev with 6 and another with 12 would be suboptimal in that regard.
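The two rules above (at most 8 drives per vdev, the same width everywhere) can be turned into a small helper that splits each bay into raidz2 groups and prints a matching `zpool create` command. This is a sketch under assumptions: the disk names `da0`...`da39`, the bay order and the pool name `tank` are all hypothetical, so check the actual device names on your system before running anything it prints.

```python
MAX_WIDTH = 8  # widest vdev allowed by the rule of thumb above


def build_vdevs(bays, width):
    """Split each bay's disks into groups of `width`; leftovers become spares."""
    vdevs, spares, next_disk = [], [], 0
    for count in bays:
        # Hypothetical device names, numbered bay by bay.
        disks = [f"da{next_disk + i}" for i in range(count)]
        next_disk += count
        full = count // width * width
        vdevs += [disks[i:i + width] for i in range(0, full, width)]
        spares += disks[full:]
    return vdevs, spares


def zpool_create(pool, vdevs, spares):
    """Return a zpool create command line after checking the width rules."""
    widths = {len(v) for v in vdevs}
    if len(widths) != 1:
        raise ValueError(f"unequal vdev widths: {sorted(widths)}")
    if widths.pop() > MAX_WIDTH:
        raise ValueError(f"vdev wider than {MAX_WIDTH} drives")
    args = ["zpool", "create", pool]
    for v in vdevs:
        args += ["raidz2", *v]
    if spares:
        args += ["spare", *spares]
    return " ".join(args)


vdevs, spares = build_vdevs([12, 6, 12, 10], width=6)
print(len(vdevs), spares)  # 6 ['da36', 'da37', 'da38', 'da39']
print(zpool_create("tank", vdevs, spares))
```

Asking for `width=12` instead yields two 12-drive vdevs and is rejected by the width check. Keep in mind that `zpool create` overwrites the disks it is given.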

/Sebulon


----------



## nORKy (May 31, 2012)

Sebulon said:

> Only if it is doing *sync* writes! Otherwise it will actually hurt writes. One way to get sync writes is to export a filesystem over NFS: when a client mounts it and starts writing, those writes will be synced. belon_cfy reported that istgt did not issue sync writes by default with the default ZFS settings, and I know that Samba doesn't either. Typically it is quite all right not to have synced writes unless *every* write counts. For example, with a ginormous 25TB database where some transactions are corrupted and the application doesn't know which ones, things become a *beep*storm quite fast. But for ordinary data, if a transfer goes wrong, you just send it again and there's no worry.
> 
> /Sebulon



So my syslog-ng setup doesn't need one.


----------



## nORKy (May 31, 2012)

Sebulon said:

> Oracle/Sun best practice is to never have vdevs larger than 8 drives. It has to do with resilvering times: reports here on the forum and on the mailing lists, as well as my own experience, show resilvers taking several weeks to complete, during which time the pool's performance and fault tolerance are severely crippled.
> 
> Also, giving every vdev the same number of drives is optimal for performance, as ZFS stripes writes across vdevs. With my recommendation you would have 6 drives in every vdev. Having one vdev with 6 and another with 12 would be suboptimal in that regard.
> 
> /Sebulon



And what about performance between RAIDZ1 and RAIDZ2? Is it the same?


----------



## jalla (May 31, 2012)

nORKy said:

> And what about performance between RAIDZ1 and RAIDZ2? Is it the same?



In my experience they're the same *for the same number of data drives*, i.e. a 5-drive raidz1 performs the same as a 6-drive raidz2 (both have 4 data drives).
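The comparison rests on counting data drives, which is just the vdev width minus the parity level. A minimal sketch of that count (a simple calculation, not a benchmark; the names are illustrative):

```python
PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def data_drives(level: str, width: int) -> int:
    """Drives' worth of data in one raidz vdev (width minus parity)."""
    if width <= PARITY[level]:
        raise ValueError("vdev needs more drives than parity disks")
    return width - PARITY[level]


for level, width in (("raidz1", 5), ("raidz2", 6)):
    print(level, width, data_drives(level, width))
# raidz1 5 4
# raidz2 6 4
```

For the same number of data drives, raidz2 costs one extra disk per vdev and in exchange survives a second failure in that vdev.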


----------



## bbzz (May 31, 2012)

Since these are old, recycled disks, many of them might die suddenly at the same time. The question is how much of those 25TB you really need, and whether or not the data is backed up regularly.

Three-way mirrors would give you the absolute best performance and survivability, but you only get 1/3 of the raw capacity as usable storage.
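Applied to the disks in this thread, the 1/3 figure works out as follows. The sketch assumes mirrors are only built from same-size drives, that leftover drives become spares, and it uses decimal GB with no ZFS overhead; a 2-way layout is shown for comparison.

```python
DRIVES = {500: 12 + 6, 750: 12 + 10}  # drive size in GB -> number of drives


def mirror_usable_gb(drives, ways):
    """Usable GB and leftover drives when building `ways`-way mirrors."""
    usable, leftover = 0, 0
    for size, count in drives.items():
        usable += (count // ways) * size  # one drive's worth per mirror
        leftover += count % ways          # drives that don't fill a mirror
    return usable, leftover


print(mirror_usable_gb(DRIVES, 3))  # (8250, 1)
print(mirror_usable_gb(DRIVES, 2))  # (12750, 0)
```

8250 GB out of 25500 GB raw is indeed close to one third, against 15000 GB for the raidz2 layout proposed earlier in the thread.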


----------



## jalla (Jun 1, 2012)

bbzz said:

> Since these are old, recycled disks, many of them might die suddenly at the same time. The question is how much of those 25TB you really need, and whether or not the data is backed up regularly.
> 
> Three-way mirrors would give you the absolute best performance and survivability, but you only get 1/3 of the raw capacity as usable storage.



Partially true. A 3-way mirror is fast on reads, but slow on writes.


----------



## jalla (Jun 1, 2012)

bbzz said:

> Since these are old, recycled disks, many of them might die suddenly at the same time.


Do you have any empirical evidence to back this up? I mean the claim that older disks are more likely to die.


----------



## Crest (Jun 1, 2012)

jalla said:

> Do you have any empirical evidence to back this up? I mean the claim that older disks are more likely to die.



Ooohhh yes. I once lost 3 drives after one planned reboot. I revived two of them with a hair dryer, bringing them back to operating temperature, which allowed the 6-disk RAID10 to rebuild.


----------



## bbzz (Jun 2, 2012)

jalla said:

> Do you have any empirical evidence to back this up? I mean the claim that older disks are more likely to die.



Well, not exactly "evidence", but you could call it common sense, and it has happened to me on more than one occasion. I used to recycle older disks, mostly between 80GB and 500GB and anything older than 6 years, for torrent and multimedia storage and backups; the disks weren't even always on.

As time went on, they started showing bad sectors, and some wouldn't even start after a while. My guess is they were pretty trashed back in their day. The point is that multiple disks "died" at around the same time. Maybe newer disks are more resilient.

And you are right about 3-way mirrors and writes, although I'd like to know how much slower they are compared to 2-way mirrors. I don't think it's all that significant, given that you have plenty of vdevs.
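A rough way to frame the 2-way vs 3-way question is an idealized model built on common rules of thumb, not measurements: every write must land on all disks of a mirror, so a mirror vdev writes at roughly one disk's speed; reads can be spread over all of its disks; and the pool stripes across vdevs. Under those assumptions the per-vdev write speed is the same for 2-way and 3-way mirrors, and the difference comes from having fewer vdevs for the same number of disks. The 40-disk count matches this thread; `disk_iops` is a normalized, hypothetical unit.

```python
def pool_estimate(disks: int, ways: int, disk_iops: float = 1.0):
    """Idealized relative throughput of a pool of `ways`-way mirrors."""
    vdevs = disks // ways
    return {
        "vdevs": vdevs,
        "write": vdevs * disk_iops,         # ~one disk's worth per mirror vdev
        "read": vdevs * ways * disk_iops,   # every disk in a mirror can serve reads
    }


two = pool_estimate(40, 2)
three = pool_estimate(40, 3)
print(two)    # {'vdevs': 20, 'write': 20.0, 'read': 40.0}
print(three)  # {'vdevs': 13, 'write': 13.0, 'read': 39.0}
print(round(three["write"] / two["write"], 2))  # 0.65
```

Real pools deviate from this (caching, transaction-group batching, mixed drive speeds), so treat the ratio as an order-of-magnitude hint only.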


----------



## Terry_Kennedy (Jun 3, 2012)

jalla said:

> Do you have any empirical evidence to back this up? I mean the claim that older disks are more likely to die.


You can generalize this to "if you have one drive failure, you're more likely to have another during the RAID rebuild process".

After replacing a failed drive, data needs to be reconstructed on the new drive to make the RAID optimal. [The terminology varies, but the point remains the same.] Reading the other drives to reconstruct the data can cause a higher-than-normal rate of activity on those drives and may induce a failure in one or more of them.
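A back-of-the-envelope way to see why exposure grows with the number of drives involved in a rebuild: if each surviving drive independently fails with probability p during the rebuild window, the chance that at least one more fails is 1 - (1 - p)^n. The point above is that rebuild stress makes failures correlated rather than independent, so this formula is probably optimistic. The value of p below is made up purely for illustration.

```python
def p_another_failure(surviving: int, p: float) -> float:
    """P(at least one of `surviving` drives fails), assuming independence."""
    return 1 - (1 - p) ** surviving


P = 0.01  # hypothetical per-drive failure probability during one rebuild
for width in (6, 12):
    # In a vdev of `width` drives, width - 1 drives are read during the rebuild.
    print(width, round(p_another_failure(width - 1, P), 3))
# 6 0.049
# 12 0.105
```

Wider vdevs put more drives under rebuild load at once and, per the resilver reports in this thread, also keep them under that load for longer.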


----------



## nORKy (Jun 4, 2012)

For now, there is no backup. But there will be HAST replication in the future.


----------

