# ZFS with SSD Read Cache Throughput



## einthusan (Mar 25, 2012)

Hi All,

Can a ZFS pool (without hardware RAID) saturate a 1 GbE link? Assume I am using 4x 500 GB SATA drives.
If not, can adding a 30 GB SSD as a read cache saturate the gigabit line?

Does ZFS work with hardware RAID, or is it meant to be an alternative to it?

I'm using this for video streaming in a production environment. Each video is 1-2 GB; some are 100+ MB.

I'm just trying to figure out a way so that if, for instance, 100 people were simultaneously streaming videos, the bottleneck would be the gigabit connection and not the hard disk I/O.

Thanks in advance for anyone's help. Any links to other discussions that I might have overlooked would be appreciated as well.


----------



## phoenix (Mar 25, 2012)

If you want high I/Ops and high throughput (MBps), then be sure to use mirror vdevs and not raidz (any level) vdevs.

Be sure to put in as much RAM as you can afford, and give as much as possible (within the limits of any apps running on the system) to the ARC.

Adding an L2ARC won't help the streaming case, as large sequential (aka streaming) reads bypass the L2ARC.

A 'normal' SATA hard drive has approx. 100 MBps of sequential bandwidth available.  4 drives in 2 mirror vdevs in the same pool would give you approx. 200 MBps of sequential read throughput in a perfect benchmark, which should (in theory) saturate a gigabit link.

Adding 2 more drives to make 3 mirror vdevs should give you enough cushion to have multiple reads going over a gigabit link with the NIC being the bottleneck.
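The arithmetic behind that claim is easy to check. A quick sketch, taking the ~100 MBps-per-drive figure above as the assumption:

```shell
# Back-of-envelope check: N mirror vdevs of ~100 MB/s drives vs. the
# ~125 MB/s ceiling of a gigabit link (1000 Mbit/s / 8 bits per byte).
drive_mbs=100
two_vdevs=$((2 * drive_mbs))     # 4 disks as 2 mirror pairs
three_vdevs=$((3 * drive_mbs))   # 6 disks as 3 mirror pairs
gige=$((1000 / 8))
echo "2 vdevs: ${two_vdevs} MB/s; 3 vdevs: ${three_vdevs} MB/s; GbE: ${gige} MB/s"
```

With 3 vdevs the pool's theoretical sequential read rate is more than double the link's ceiling, which is the cushion being described.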


----------



## jalla (Mar 25, 2012)

phoenix said:

> If you want high I/Ops and high throughput (MBps), then be sure to use mirror vdevs and not raidz (any level) vdevs.


It might surprise you that a 4-disk raidz is faster than a 4-disk mirror:


```
gnome:/medialib/video/Movies# ls -l Goodfellas.mpg 
-rw-r--r--  1 tl  wheel  6675808936 Jan 23  2011 Goodfellas.mpg
gnome:/medialib/video/Movies# time cp Goodfellas.mpg /dev/null
0.007u 1.276s 0:26.78 4.7%	20+1475k 0+0io 0pf+0w
gnome:/medialib/video/Movies# dc
6675808936
27/p
247252182
1024
1024*
/p
235
```

I.e., clearly more than enough to saturate a 1 Gb link.
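For anyone who doesn't read dc's postfix notation, the same arithmetic in plain shell (using the rounded 27-second copy time from the timing above):

```shell
# 6675808936 bytes copied in ~27 s, converted to MB/s with the same
# integer math as the dc session.
bytes=6675808936
secs=27
mbs=$((bytes / secs / 1024 / 1024))
echo "${mbs} MB/s"
```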

The disks in this case are older Samsung F1s (500 GB), which probably have a max throughput of 80-90 MBps.


```
gnome:/medialib/video/Movies# zpool status media
  pool: media
 state: ONLINE
 scan: scrub canceled on Sun Mar 11 18:26:39 2012
config:

	NAME                STATE     READ WRITE CKSUM
	media               ONLINE       0     0     0
	  raidz1-0          ONLINE       0     0     0
	    label/media_p0  ONLINE       0     0     0
	    label/media_p1  ONLINE       0     0     0
	    label/media_p2  ONLINE       0     0     0
	    label/media_p3  ONLINE       0     0     0
```


----------



## phoenix (Mar 25, 2012)

A single 4-disk raidz vdev will not be faster than 2x 2-disk mirror vdevs (aka, RAID10), especially if you try to do more than 1 thing at a time.

My home media server was originally a 3-disk raidz1 vdev using 160 GB SATA and a 2-disk mirror vdev using 120 GB IDE drives (only using master on each channel).  Had to use a USB key as an L2ARC cache in order to get anywhere close to 100 MBps throughput while downloading torrents and watching shows on the separate HTPC.

Switched to a 4-disk raidz1 using 500 GB SATA disks.  Still had to use the USB-based L2ARC to get good speeds through it.  Was much better than the mismatched, unbalanced pool prior.  But still noticed stuttering and buffering with the HTPC while the wife or I used the media server PC for anything.

Migrated to a dual 2-disk mirror setup using the same 500 GB SATA disks with the USB-based L2ARC device.  Pool performance was still pretty bad.  Removing the USB L2ARC device, suddenly things were speedy!  Where before I was lucky to get 700 KBps (just under 6 Mbps) downloads on torrents and anything over 1 Mbps to the HTPC was iffy ... now I can download at over 2 MBps (just under 20 Mbps) across multiple torrents, while watching x264 streams to the HTPC, while the wife is on Facebook, without any issues.

Watching gstat(1) output, I routinely see all 4 drives doing more than 30 Mbps each in reads and writes, and the resilver I did this morning to replace a dead drive topped 250 MBps.

For a single sequential read stream, you can (theoretically) get better performance out of a raidz1 vdev than a mirror vdev.  But not for 2+ simultaneous read streams, or simultaneous read/write, or pretty much any other workload.

raidz vdevs are great for storage and redundancy.  But they're not that great for performance.  Good enough for some things?  Definitely.  But streaming workloads aren't one of them.


----------



## jalla (Mar 25, 2012)

Well, read the numbers. 235 MB/s speaks for itself.


----------



## phoenix (Mar 25, 2012)

Yes, for a single sequential stream.  Now try two simultaneously.  Or a read and a write.  And watch the numbers tank.


----------



## jalla (Mar 25, 2012)

For sequential data, I usually see 120-140 MB/s from the pool.

Here is a typical view of gstat while reading 3 multi-GB files:

```
dT: 10.002s  w: 10.000s  filter: media
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   10    554    554  41924   17.8      0      0    0.0   99.5| label/media_p1
   10    573    573  42996   16.1      0      0    0.0   95.6| label/media_p0
   10    573    573  43012   16.2      0      0    0.0   96.5| label/media_p3
   10    564    564  42700   17.8      0      0    0.0  100.0| label/media_p2
```
Increasing to 5 readers shows a slight drop:

```
dT: 10.001s  w: 10.000s  filter: media
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   10    503    503  37898   19.8      0      0    0.0   99.9| label/media_p1
   10    504    504  38023   19.8      0      0    0.0   99.8| label/media_p0
   10    502    502  37664   19.9      0      0    0.0  100.0| label/media_p3
   10    497    497  37328   20.1      0      0    0.0  100.0| label/media_p2
```
 
So how about simultaneous reads and writes?
Here's 3 readers with MythTV doing 3 recordings at the same time:

```
dT: 10.001s  w: 10.000s  filter: media
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   10    585    567  42266   16.4     18   1255   28.2   99.5| label/media_p1
   10    585    566  42092   15.8     18   1256   14.2   95.6| label/media_p0
    0    584    565  41965   15.4     19   1256   15.2   93.6| label/media_p3
   10    563    544  40420   17.0     19   1255   37.6  100.0| label/media_p2
```
 
In all cases, that's a throughput of well over 100 MB/s.


----------



## einthusan (Mar 25, 2012)

Thanks to both of you for your input and the tests you've shown. However, I just wanted to clarify some things (for me and others who might read this). I am somewhat new to RAID, ZFS, and storage-related bottleneck issues, so sorry if I say something stupid.

1. I'm looking at streaming 1 GB files to not just 1 or 2 people simultaneously; I'm talking 50-100 people at, say, 700 kB/s per stream (moderating each stream's max speed).

2. Would 1 TB or 2 TB drives be better than 500 GB ones? So basically 4 drives, each with 1 TB or 2 TB capacity.

3. We already have offsite backups, so I hope the two methods explained don't waste storage space on "redundancy"; if any redundancy is done, it's for the sole purpose of high I/O throughput.

4. Are these throughputs achieved by storing the exact same data on all 4 drives? So basically, if we used 4x 500 GB, could I only store around 400 movies of 1 GB each?

5. Is it recommended that I don't fill the drives entirely with movies, and instead leave at least 20% free?
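On points 3 and 4, the usable-space arithmetic differs between the two layouts discussed earlier in the thread: mirrors store every block twice, while raidz1 spends one disk's worth of space on parity. A rough sketch, assuming 4x 500 GB drives and ignoring filesystem overhead:

```shell
# Usable capacity of 4x 500 GB under the two pool layouts discussed.
raw=$((4 * 500))            # 2000 GB of raw disk
mirrors=$((raw / 2))        # 2x 2-disk mirrors: every block stored twice
raidz1=$((raw * 3 / 4))     # 4-disk raidz1: one disk's worth of parity
echo "mirrors: ${mirrors} GB usable; raidz1: ${raidz1} GB usable"
```

So neither layout stores identical copies on all 4 drives; the redundancy cost is half the raw space for mirrors and a quarter for raidz1.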

The data below is from our current setup, which, believe it or not, just uses a UFS2 filesystem with no hardware or software RAID at all. No mirroring, no nothing, just direct disk reads. It's able to do 25 MB/s, and I don't know how much higher it can go because we don't have enough real-world traffic to test. I think there are at least 20 simultaneous streams causing the 25 MB/s throughput. Also, we don't care about write performance at all; we just want read performance.


```
dT: 1.001s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   80    210    208  25830  471.1      2     32  588.9  100.2| ada3s1d
```


----------



## t1066 (Mar 25, 2012)

phoenix said:

> Adding an L2ARC won't help the streaming case, as large sequential (aka streaming) reads bypass the L2ARC.



Actually,

`# sysctl vfs.zfs.l2arc_noprefetch=0`

will make the L2ARC serve out streaming data. And with the current generation of SSDs, like the Crucial M4, the gain in speed and latency is quite respectable.
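To make that setting survive a reboot on FreeBSD, it can also go in /etc/sysctl.conf (shown as a config fragment, not a command):

```
# /etc/sysctl.conf
vfs.zfs.l2arc_noprefetch=0   # let the L2ARC cache prefetched (streaming) reads
```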


----------



## jalla (Mar 26, 2012)

einthusan said:

> 1. I'm looking at streaming 1 GB files to not just 1 or 2 people simultaneously; I'm talking 50-100 people at, say, 700 kB/s per stream (moderating each stream's max speed).



I don't see any significant difference in reading 100 files compared to just a handful. Here's a typical gstat while copying 113 big files to /dev/null in parallel.


```
dT: 20.002s  w: 20.000s  filter: media
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
   10    530    530  39632   18.9      0      0    0.0  100.0| label/media_p1
   10    522    522  39050   19.2      0      0    0.0  100.0| label/media_p0
   10    516    516  38544   18.4      0      0    0.0   96.5| label/media_p3
   10    510    510  38164   19.7      0      0    0.0  100.0| label/media_p2
```
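The test itself is easy to reproduce. A sketch of the same idea using a throwaway directory of small files instead of real movies (paths and file count here are made up), with gstat watched in another terminal:

```shell
# Create a scratch directory of files, then read them all in parallel.
dir=$(mktemp -d)
for i in 1 2 3 4 5; do
    head -c 1048576 /dev/zero > "$dir/file$i"   # 1 MB each
done
for f in "$dir"/*; do
    cp "$f" /dev/null &     # one background reader per file
done
wait
n=$(ls "$dir" | wc -l | tr -d ' ')
echo "read $n files in parallel"
rm -rf "$dir"
```

With files this small everything lands in cache, so the real test needs files larger than RAM, as in the run above.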


----------



## einthusan (Mar 26, 2012)

Wow, that's exactly what I want. So your setup is 4x 500 GB drives in a ZFS raidz? A bit more information about your disk configuration would be greatly helpful. Thanks.


----------



## jalla (Mar 27, 2012)

Nothing special, just 4 disks in raidz connected to the onboard SATA ports on an Asus motherboard.

I should note that I'm using the old ata(4) driver instead of ahci(4). ahci(4) should give better performance, but unfortunately I can't trust it with my hardware (after losing a drive for the third time, I went back to ata(4)).


----------

