# ZFS problems with read speed.



## zux (Feb 2, 2012)

Hi, I've installed FreeBSD 9.0 on a separate SSD and I'm testing ZFS on four other disks (7200 rpm SATA disks, 1 TB each).

I like what I'm seeing, except for the read speed in the tests I've done. I have tried several types of ZFS layouts (raidz, two mirrors). Right now I have created a pool like this:

```
zpool create data da0 da1 da2 da3
```

This gives me a wonderful write speed:

```
# dd if=/dev/zero of=/data/test.iso bs=1024M count=10
10+0 records in
10+0 records out
10737418240 bytes transferred in 37.657154 secs (285136213 bytes/sec)
```

but the read speed is even slower than the write speed, and never exceeds that of a single drive.


```
# dd if=/data/test.iso of=/dev/null 
20971520+0 records in
20971520+0 records out
10737418240 bytes transferred in 96.546774 secs (111214677 bytes/sec)
```

What am I doing wrong? Or am I expecting the wrong results? This is my first time with ZFS, and I'm not very familiar with FreeBSD either.

P.S. This box is a quad-core Xeon with 8 GB of RAM. The four drives are connected to a 3ware controller. I've tested read speed from all four drives at the same time, and they all gave ~106-110 MB/s, so the controller shouldn't be the problem.
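(For anyone reproducing this: a quick way to check whether all four disks are actually being read in parallel is to watch per-vdev I/O while the dd runs. The pool name `data` and devices `da0`-`da3` below match the `zpool create` above.)

```shell
# In one terminal, start the read test:
dd if=/data/test.iso of=/dev/null bs=1M

# In another, print per-device throughput once per second.
# If only one of da0-da3 shows activity at a time, the read
# is not being striped across the drives.
zpool iostat -v data 1
```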


----------



## Terry_Kennedy (Feb 5, 2012)

zux said:

> What am I doing wrong? Or am I expecting wrong results? This is my first time with zfs, and I'm not very familiar with FreeBSD also.
> 
> P.S. this box is a quadcore Xeon with 8 GB of RAM. The four drives are connected to a 3ware controller. I've tested reading speed from all four drives at the same time, and they all gave ~106-110 MB/s, so the controller shouldn't be a problem.


You don't mention what FreeBSD version or what ZFS pool version and ZFS options you're using, so it is hard to say.

Using a 3Ware 9650, I get somewhat faster results than you:


```
(0:6) rz1:/data/DVD Movies# time dd if=filename-censored.ISO of=/dev/null bs=1024m
20+1 records in
20+1 records out
21575630848 bytes transferred in 27.665664 secs (779870343 bytes/sec)
0.000u 20.314s 0:27.79 73.0%    27+1536k 165862+0io 0pf+0w
```

Remember that ZFS will only operate on one raidz member at a time, so for maximum speed you will need a pool composed of multiple raidz's.
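For example (a sketch only; the device names `da0` through `da7` are placeholders, not the OP's actual four-disk setup), a pool striped across two raidz vdevs would be created like this:

```shell
# Two raidz vdevs of four disks each.  ZFS stripes data across
# the two vdevs, so a large sequential read can be serviced by
# both vdevs at once instead of one raidz at a time.
zpool create data \
    raidz da0 da1 da2 da3 \
    raidz da4 da5 da6 da7
```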


----------



## shitson (Feb 5, 2012)

This isn't really the best test, but try modifying the block size for your transfer to /dev/null, maybe something like *bs=1m*, and post the results. If you leave the block size up to dd it defaults to 512 bytes, which can inhibit the transfer speed.


```
hydra# dd if=/troop/data/my.iso1 of=/dev/null bs=1M
5000+0 records in
5000+0 records out
5242880000 bytes transferred in 4.098166 secs (1279323486 bytes/sec)
hydra# dd if=/troop/data/my.iso of=/dev/null bs=1M
2000+0 records in
2000+0 records out
2097152000 bytes transferred in 2.181354 secs (961399289 bytes/sec)
hydra# dd if=/troop/data/my.iso of=/dev/null
4096000+0 records in
4096000+0 records out
2097152000 bytes transferred in 19.749541 secs (106187379 bytes/sec)
hydra# ls -lah /troop/data/my.iso
-rw-r--r--  1 root  wheel     2G Feb  5 16:51 /troop/data/my.iso
hydra# ls -lah /troop/data/my.iso1
-rw-r--r--  1 root  wheel   4.9G Feb  5 19:18 /troop/data/my.iso1
```


----------



## phoenix (Feb 6, 2012)

And you really shouldn't use dd(1) as a benchmarking tool. Especially as a disk benchmarking tool. And definitely not as a ZFS benchmarking tool. Especially when using /dev/zero.

Use a proper disk benchmarking tool, and understand how the tool works and interacts with CPU, RAM, ARC, L2ARC, controller cache, disk caches, etc.
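As a sketch, the benchmarks/iozone port can run a sequential throughput test like this (the file path and sizes are placeholders; pick a file size larger than RAM, here 16 GB against 8 GB of RAM, so the ARC can't cache the whole working set):

```shell
# iozone sequential write (-i 0) and sequential read (-i 1) test:
#   -s 16g  test file size (must exceed RAM to defeat the ARC)
#   -r 1m   record (block) size per I/O
#   -f ...  path of the test file, on the pool being measured
iozone -i 0 -i 1 -s 16g -r 1m -f /data/iozone.tmp
```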


----------



## Terry_Kennedy (Feb 6, 2012)

phoenix said:

> And you really shouldn't use dd(1) as a benchmarking tool.  Especially as a disk benchmarking tool.  And definitely not as a ZFS benchmarking tool.  Especially when using /dev/zero.
> 
> Use a proper disk benchmarking tool.  And understand how the tool works and interacts with CPU, RAM, ARC, L2ARC, controller cache, disk caches, etc.


Very true. I just posted my dd(1) results as it was very easy to do, and generated a result that the OP could definitely relate to.

The benchmarks/iozone port can test a sometimes-bewilderingly large number of things. Interpreting them is the challenging part. Fortunately, the iozone author is quite helpful. When trying to understand an unusual trough in one of my benchmarks, he correctly guessed that I was using a particular CPU model due to the result being from an anomaly in its cache handling.

I recommend the add-on Excel macros which he sells on his web site for a very modest cost. They allow you to visualize the data graphically.


----------



## zux (Feb 6, 2012)

Terry_Kennedy said:

> You don't mention what FreeBSD version or what ZFS pool version and ZFS options you're using, so it is hard to say.
> 
> Using a 3Ware 9650, I get somewhat faster results than you:
> 
> Remember that ZFS will only operate on one raidz member at a time, so for maximum speed you will need a pool composed of multiple raidz's.



This is FreeBSD 9.0, with the default ZFS pool version (28?).
My 3ware controller is:

```
vendor     = '3ware Inc'
device     = '9650SE SATA-II RAID PCIe'
```


shitson said:


> This isn't really the best test, but try and modify your sector size for your transfer to /dev/null, maybe something like bs=1m and post results. If you leave up the bs size to dd it may select something that will inhibit the speed of transfer.



Oh, you're right:

```
[root@piekuuns2 ~]# dd if=/data/test.iso of=/dev/null bs=1M
10240+0 records in
10240+0 records out
10737418240 bytes transferred in 34.432641 secs (311838360 bytes/sec)
[root@piekuuns2 ~]# dd if=/data/test.iso of=/dev/null bs=1024M
10+0 records in
10+0 records out
10737418240 bytes transferred in 30.079750 secs (356965009 bytes/sec)
```

As for the benchmarking tools, I think it's best to test with real work. This box will be secondary backup storage, for large backup files. I'm right now copying a 200 GB database backup. For the moment the network is the bottleneck, so I'll start looking at aggregating NICs.
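(A sketch of LACP aggregation with lagg(4) on FreeBSD; the interface names `em0`/`em1` and the address `192.0.2.10/24` are placeholders, and the switch ports must also be configured for LACP:)

```shell
# Create a lagg interface and bond two NICs with LACP, live:
ifconfig em0 up
ifconfig em1 up
ifconfig lagg0 create
ifconfig lagg0 up laggproto lacp laggport em0 laggport em1 192.0.2.10/24

# Equivalent /etc/rc.conf fragment to make it persistent:
#   ifconfig_em0="up"
#   ifconfig_em1="up"
#   cloned_interfaces="lagg0"
#   ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 192.0.2.10/24"
```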

Thanks for your answers.


----------

