# ZFS performance drops when mirroring; a single disk is OK



## gnoma (Dec 5, 2012)

Hello,

I am currently testing an iSCSI target on a VMware virtual machine (2GB RAM, 2 CPUs). If I can solve the ZFS performance issue, I will move it to a physical host with 6GB RAM.

The problem is not istgt; I've already checked that.
The root pool is on a single disk, which is located on the same physical disk as one of the mirror disks. The other mirror disk sits alone on another physical disk.
However, the zfsroot pool seems to perform a lot better than the mirror.
Check this out:


```
sandbox# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
datacore         146G   204M    31K  /datacore
datacore/istgt   146G   145G  1.29G  -
zfsroot         6.08G  56.4G  1.95G  /
zfsroot/swap    4.13G  60.6G    16K  -
sandbox#
sandbox# zpool status
  pool: datacore
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        datacore    ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0

errors: No known data errors

  pool: zfsroot
 state: ONLINE
 scan: scrub repaired 0 in 0h1m with 0 errors on Wed Dec  5 16:16:53 2012
config:

        NAME        STATE     READ WRITE CKSUM
        zfsroot     ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors
sandbox#
sandbox# dd if=/dev/zero of=/datacore/file.foo bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 6.949640 secs (15088206 bytes/sec)
sandbox# dd if=/dev/zero of=/root/file.foo bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 0.292284 secs (358752432 bytes/sec)
sandbox#
```
Does anybody know a reason for this? Writing 100MB takes 0.3 seconds on the single disk and 7 seconds on the mirror. Note that checksum is off on the mirror (datacore) pool and on for zfsroot.
Any ideas? I hope this is not normal behavior for a ZFS mirror.
Thank you.


----------



## usdmatt (Dec 6, 2012)

Using /dev/zero is absolutely useless for testing throughput. I think people use it because it gives big numbers and makes them feel like their system is fast. Check these numbers from my ZFS system:


```
backup# dd if=/dev/zero of=test.dat bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 2.178107 secs (48141620 bytes/sec)
backup# dd if=/dev/zero of=test.dat bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 0.176244 secs (594956922 bytes/sec)
```

First run ~48MBps, but the second 567MBps? Anything over about 130-150MBps with current SATA disks is probably an error. There's no way your single disk is running at ~350MBps. Your results are also heavily influenced by the fact that all those writes will be going into RAM first, then flushed to disk in the background.
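
To put dd's bytes/sec figures on one scale, here is a small sketch that converts them to decimal MB/s and flags anything above the rough 150MB/s single-disk ceiling from the paragraph above. `mb_per_sec` and `check_rate` are hypothetical helper names, and the threshold is only a rule of thumb, not a hard limit. In decimal units, the 594956922 bytes/sec run above is about 595MB/s (567MiB/s).

```shell
#!/bin/sh
# Convert "bytes transferred in N secs" from dd into decimal MB/s.
# $1 = bytes transferred, $2 = seconds.
mb_per_sec() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# Flag rates above an assumed ~150 MB/s ceiling for one SATA disk;
# anything faster was most likely absorbed by RAM, not written to disk.
check_rate() {
    rate=$(mb_per_sec "$1" "$2")
    if awk -v r="$rate" 'BEGIN { exit !(r + 0 > 150) }'; then
        echo "$rate MB/s: suspect (probably cached in RAM)"
    else
        echo "$rate MB/s: plausible"
    fi
}
```

Fed the two runs from the first post, this reports about 15.1MB/s for the mirror and 358.8MB/s for zfsroot, and only the second one is flagged.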

If you want to actually test performance, use proper performance tools like benchmarks/bonnie++, specifying a test size twice your RAM. You may also want to reboot between tests just to make sure it's fair.
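
The "twice your RAM" rule can be turned into a bonnie++ command line. `bonnie_cmd` is a hypothetical helper. `-d` (test directory), `-s` (data size in MiB), `-r` (RAM size in MiB) and `-u` (user to run as, required when started as root) are standard bonnie++ options. On FreeBSD, the RAM size in bytes can be read with `sysctl -n hw.physmem`.

```shell
#!/bin/sh
# Print a bonnie++ command whose data set is twice the installed RAM.
# $1 = directory on the pool under test, $2 = RAM in bytes.
bonnie_cmd() {
    dir=$1
    ram_mb=$(( $2 / 1048576 ))   # bytes -> MiB
    size_mb=$(( ram_mb * 2 ))    # test size = 2 x RAM, so the ARC cannot hold it all
    # bonnie++ refuses to run as root unless -u names a user.
    echo "bonnie++ -d $dir -s $size_mb -r $ram_mb -u root"
}
```

For example, `bonnie_cmd /datacore "$(sysctl -n hw.physmem)"` on a VM reporting exactly 2GiB prints a command with `-s 4096`. On the 6GB physical host it would be `-s 12288`.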

If you do find the mirror is noticeably slower, try off-lining the disks one at a time and redoing the test to make sure it's not one bad disk in the pair dragging it down.
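
The one-disk-at-a-time check, sketched as a dry run: by default every command is only printed, and setting `RUN=` (empty) executes them. The pool, disk names and mountpoint come from this thread. The helper `test_one_side` and the 4096MiB bonnie++ size (twice the 2GB VM's RAM) are assumptions, and the test needs enough free space in the pool for bonnie++'s files.

```shell
#!/bin/sh
# Dry run by default: commands are only printed. Set RUN= (empty) to execute.
: "${RUN=echo}"

# $1 = pool, $2 = disk to take offline, $3 = directory on the pool to test.
test_one_side() {
    pool=$1
    disk=$2
    dir=$3
    # The pool keeps running, DEGRADED, on the remaining disk.
    $RUN zpool offline "$pool" "$disk"
    # Assumed benchmark: 4096 MiB = twice the 2GB of RAM in the test VM.
    $RUN bonnie++ -d "$dir" -s 4096 -u root
    # Bring the disk back; ZFS resilvers only what changed while it was offline.
    $RUN zpool online "$pool" "$disk"
}
```

Run `test_one_side datacore da1 /datacore`, then wait until `zpool status datacore` shows the resilver has finished before repeating with da2. Until then, da1 does not hold a complete copy of the data.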


----------



## jem (Dec 6, 2012)

How are the two disks connected?  What sort of controller?  What sort of host bus?

Writing to a mirror could take up to twice as long compared to a single disk, but reading can be faster due to being able to interleave the requests between two drives.
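
The following is a deliberately simplified model of where "up to twice as long" comes from, and it ignores caching and queueing. If both halves of a mirror are written in parallel over independent paths, a write finishes when the slower disk does. If both copies have to pass through one shared bottleneck (for example, a single saturated bus or spindle), they are effectively serialized and the times add up. `mirror_write_ms` is a hypothetical helper that works in milliseconds.

```shell
#!/bin/sh
# Simplified model of how long one write takes on a two-way mirror.
# $1, $2 = time each disk needs for the write (ms); $3 = "parallel" or "shared".
mirror_write_ms() {
    t1=$1
    t2=$2
    if [ "$3" = shared ]; then
        # Both copies queue behind one bottleneck, so the times add up.
        echo $(( t1 + t2 ))
    elif [ "$t1" -ge "$t2" ]; then
        # Independent paths: the write completes when the slower disk finishes.
        echo "$t1"
    else
        echo "$t2"
    fi
}
```

With two equal disks at 400ms each, the parallel case gives 400ms and the shared case 800ms, which is the "twice as long" upper bound.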


----------



## gnoma (Dec 6, 2012)

Aloha,



> If you want to actually test performance, use proper performance tools like benchmarks/bonnie++, specifying a test size twice your RAM. You may also want to reboot between tests just to make sure it's fair.


I am not interested in the exact numbers, which is why I don't need a benchmark or another testing tool. I know very well that the numbers I got are probably wrong, or at least not representative of real-world production tasks like databases or file transfers. What bothers me is the huge performance difference between the 2 pools on the same task. I know that in ZFS, memory and cache play a big role, so I tried reversing the order and testing the single-disk pool first and then the mirror pool. The result was almost the same.


> How are the two disks connected? What sort of controller? What sort of host bus?


The disks are connected via the VMware "LSI Logic Parallel" SCSI controller. They are all 150GB VMware virtual disks.
I don't think the physical disks and controller matter, because they are all controlled by ESXi 5.1. The virtual machine hardware version is 8 (the newest, compatible only with ESXi 5 and later). VMware Tools are installed and up to date.
One of the mirrored disks is located on the same physical disk as the zfsroot disk; the other mirrored disk is alone on the 2nd physical disk.


> Writing to a mirror could take up to twice as long compared to a single disk, but reading can be faster due to being able to interleave the requests between two drives.


As far as I know, writing data to a mirror RAID array should take as long as writing that data to the slowest disk in the array, not twice as long, because it is written to all disks at the same time. I don't think a ZFS mirror should be an exception to this rule.

However, the results I got are far more disappointing.



> If you do find the mirror is noticeably slower, try off-lining the disks one at a time and redoing the test to make sure it's not one bad disk in the pair dragging it down.


Done. I removed one of the mirrored disks, so now there is just a single disk (the one that is alone on its physical disk). The pool status is DEGRADED. I rebooted the system just in case. Now writing 100MB takes a little more than 5 seconds. I tried it 2-3 times and it doesn't get faster. It is a little faster than with both mirrored disks, but still more than 10 times slower than the zfsroot pool, which is a single-disk pool.

Any more ideas? Could it be because 99% of the pool is taken up by a ZVOL (/dev/zvol/datacore/istgt) that is not mounted as a filesystem?

Thank you for the response.


----------



## Martillo1 (Dec 6, 2012)

Why not a pure mirror configuration?


----------



## usdmatt (Dec 6, 2012)

> I am not interested in the exact numbers, which is why I don't need a benchmark or another testing tool. I know very well that the numbers I got are probably wrong, or at least not representative of real-world production tasks like databases or file transfers. What bothers me is the huge performance difference between the 2 pools on the same task.



Exact numbers aren't really the issue. It's entirely possible for write tests using /dev/zero to be all over the place and to bear absolutely no relation to real performance (as I demonstrated with a pool that gave ~50MBps one minute and 550 the next). You cannot compare those two dd runs and come to any serious conclusion that one pool is faster than the other, as the ~350MBps result you got for the single disk is obviously wrong.

Considering you are on top of ESXi using the Parallel driver I would personally consider 48MBps to be about right and your far-too-fast-for-the-hardware performance in the other dd is an anomaly.


- I would recommend using the LSI SAS adapter in VMware rather than the Parallel one.
- If you really think one pool is slower than the other, please get some genuine performance figures for both that can be compared with some level of accuracy.
- Having a ZVOL take up almost all the space could cause slower writes. I didn't notice before, but you've only got 200MB left, which isn't ideal. I would create the ZVOL using the -s option to make it sparse so it doesn't reserve all the pool space.
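
A minimal sketch of recreating the volume as sparse, printed as a dry run unless `RUN=` (empty) is set. The volume name `datacore/istgt` comes from the thread; the 145G size is an assumption and `recreate_sparse_zvol` is a hypothetical helper. Stop istgt first, and note that destroying the ZVOL erases the LUN's contents.

```shell
#!/bin/sh
# Dry run by default: commands are only printed. Set RUN= (empty) to execute.
# WARNING: when executed for real, "zfs destroy" deletes all data on the volume.
: "${RUN=echo}"

recreate_sparse_zvol() {
    vol=$1     # e.g. datacore/istgt
    size=$2    # e.g. 145G (assumed; pick the size the LUN really needs)
    $RUN zfs destroy "$vol"
    # -s makes the volume sparse: no refreservation equal to its size.
    $RUN zfs create -s -V "$size" "$vol"
    # A sparse volume should report refreservation "none".
    $RUN zfs get volsize,refreservation "$vol"
}
```

Without `-s`, the volume's full size is reserved up front, which is why `zfs list` showed only 204M available on datacore.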

Also, is there any particular reason for creating a ZFS mirror (presumably for data safety) and then turning checksum off? Turning checksum off is almost always a mistake.


----------



## gnoma (Dec 6, 2012)

Hello,

The purpose of this ZFS pool is to serve as an iSCSI target LUN. That's why it is meant to be 100% a block device. I think checksums only play a role for files. The idea here is to have only a block device, exported as a LUN and formatted with the VMFS filesystem.
Adding -s when creating the ZVOL on the mirror pool seems to help: using dd, the speed is now the same as writing a file in the zfsroot pool.
However, on large write tasks, performance is still extremely slow. Trying to migrate an 8GB virtual machine to this iSCSI datastore ended with a timeout error; only 1.6GB was transferred, and syslog filled up with:

```
Dec  6 14:42:05 datacore istgt[1424]: istgt_iscsi.c: 777:istgt_iscsi_write_pdu_internal: ***ERROR*** iscsi_write() failed (errno=32)
Dec  6 14:42:05 datacore istgt[1424]: istgt_iscsi.c:3082:istgt_iscsi_transfer_in_internal: ***ERROR*** iscsi_write_pdu() failed
Dec  6 14:42:05 datacore istgt[1424]: istgt_iscsi.c:3447:istgt_iscsi_task_response: ***ERROR*** iscsi_transfer_in() failed
```
It is still possible that virtualization plays a role here, but if it still performs the same once I move it to a physical machine with 6GB RAM and a Xeon CPU, I think I'll use gmirror for the iSCSI target.


----------



## usdmatt (Dec 6, 2012)

I can't really help with the iSCSI errors or comment on what performance you should expect.

Disks -> ESXi -> FreeBSD (ZFS ZVOL) -> iSCSI -> ESXi to me seems like it's bound to have performance problems though. Are you running this all on one system or do you have a second ESXi system accessing the storage?

I personally would not consider running a storage system for VMs on anything other than raw hardware. I don't like the idea of a server whose sole purpose is providing storage (with the highest possible performance) accessing its own disks & network through virtualisation.

And yes, checksum still fully applies to ZVOLs. With checksum on, any error a scrub (or normal read) finds in a record can be repaired automatically by reading the good copy from the other disk and re-mirroring it. All data in a ZPOOL is stored on disk as checksummed records (assuming you have checksum on), regardless of whether it was written to a ZFS file system or a ZVOL.
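
A sketch of turning checksums back on for the whole pool, printed as a dry run unless `RUN=` (empty) is set; `enable_checksums` is a hypothetical helper. The checksum property is inherited by child datasets and ZVOLs unless they set it locally, but changing it only affects blocks written afterwards, so data written while checksum was off stays unverifiable until it is rewritten.

```shell
#!/bin/sh
# Dry run by default: commands are only printed. Set RUN= (empty) to execute.
: "${RUN=echo}"

enable_checksums() {
    pool=$1
    # Set on the pool's root dataset; children and ZVOLs inherit it
    # unless they have checksum set locally.
    $RUN zfs set checksum=on "$pool"
    # Show the effective value and its source for every dataset and ZVOL.
    $RUN zfs get -r checksum "$pool"
    # Read everything back and verify (and repair) what has a checksum.
    $RUN zpool scrub "$pool"
}
```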


----------



## usdmatt (Dec 6, 2012)

> Adding -s when creating the mirror pool seems to give results, now using dd, the speed is the same as writing file in the zfsroot pool.
> However, at large write task, it still have extremely slow performance.



This is what I was talking about. You're now seeing similar figures with dd, but that's not any real indication of what performance you are actually getting. You're seeing the 'real' performance when trying to write lots of data.


----------



## gnoma (Dec 6, 2012)

There is a second ESXi host with its own storage. I am taking the 8GB VM from it and trying to migrate it to this LUN. The ESXi OS is running on a different drive, so these 2 physical disks should not be affected by anything except this VM.
After destroying the ZVOL and recreating it with the -s option, I turned checksum on.
This datacore machine is only for testing. I think the configuration is complete now, or at least I won't be able to do more on a virtual machine, so next week, when I get the hardware, I am going to put everything on real hardware and see whether the issue remains. Then it will probably be RAID10 for the LUN and a mirror for the root pool, but if I have this issue with RAID1, it will surely show up to some degree with RAID10 too.

Thank you.


----------



## usdmatt (Dec 6, 2012)

I would expect a RAID10 pool on real hardware to perform a lot better than the system you are testing with.
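
In ZFS terms, "RAID10" is a single pool striped across several mirror vdevs. Here is a dry-run sketch that builds that layout from a list of disks, two per mirror; the helper `create_striped_mirrors` and the disk names da1 to da4 are hypothetical, and commands are only printed unless `RUN=` (empty) is set.

```shell
#!/bin/sh
# Dry run by default: commands are only printed. Set RUN= (empty) to execute.
: "${RUN=echo}"

# $1 = pool name, remaining args = disks, paired up in order.
create_striped_mirrors() {
    pool=$1
    shift
    vdevs=""
    while [ $# -ge 2 ]; do
        vdevs="$vdevs mirror $1 $2"   # each pair becomes one mirror vdev
        shift 2
    done
    if [ $# -ne 0 ]; then
        echo "odd number of disks; each mirror needs two" >&2
        return 1
    fi
    # $vdevs is left unquoted on purpose so it splits into separate words.
    $RUN zpool create "$pool" $vdevs
}
```

ZFS spreads writes across the mirror vdevs, and each block is stored on both disks of the mirror it lands on.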


----------



## gnoma (Dec 6, 2012)

Just for the test, I destroyed the datacore pool and made a gmirror array. I started istgt without changing anything in istgt.conf except pointing it to the block device /dev/mirror/gm0. It works perfectly: I migrated a virtual machine (a VPN server with almost no storage load) to it without getting a single iSCSI write error. The migration is slow, and inside the migrated VM storage performance is even slower (this is normal considering the SAN is a virtual machine for testing), but so far this is much better than the ZFS pool.
When I get the hardware, I will test it on ZFS just to see the score, but I think I'm giving up on the ZFS idea for iSCSI and will probably use gmirror.
I will still have 1 or 2 disks of fault tolerance with RAID 10, but I think the performance will be a lot better than with ZFS.
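
For comparison, the gmirror setup described above roughly corresponds to this dry-run sketch; the helper `make_gmirror` and the disk names are assumptions, and commands are only printed unless `RUN=` (empty) is set. `gmirror label` stores its metadata in the last sector of each provider and creates `/dev/mirror/<name>`, which is what istgt.conf was pointed at.

```shell
#!/bin/sh
# Dry run by default: commands are only printed. Set RUN= (empty) to execute.
: "${RUN=echo}"

# $1 = mirror name (e.g. gm0), remaining args = disks (e.g. da1 da2).
make_gmirror() {
    name=$1
    shift
    # Load the geom_mirror kernel module.
    $RUN gmirror load
    # -v: verbose; -b round-robin: alternate reads between the members.
    $RUN gmirror label -v -b round-robin "$name" "$@"
}
```

To have the module loaded at boot, the FreeBSD Handbook adds `geom_mirror_load="YES"` to `/boot/loader.conf`.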


----------

