# ZFS slow write



## miks (Jul 5, 2010)

I just can't understand why, on 3 different disks, ZFS always has a lower write speed than UFS. In some cases the difference is dramatic.
Also, I'm seeing interesting things in gstat. While UFS is in use, the busy column never goes red and the busy percentage is steady. When I'm copying files to ZFS, at some moments the busy column shows 0%, but within the next few seconds it goes red at some 87-110%.
In the moments when the ZFS busy column is red, rsync has some kind of freeze - the current speed report stops updating.

OS: FreeBSD 8.1 RC2
Controller: 9690SA-4I
RAM: 8 GB
Controller write and read cache are on for all drives.

When the disks are formatted with ZFS, each is a simple single-disk pool (no raidz, etc.).

Intel SSD X25-M G2 80GB
with UFS

```
[root@host /]# rsync --progress /home/user/test.img  /ufs-ssd/test.img
test.img
  3222128640 100%   78.84MB/s    0:00:38 (xfer#1, to-check=0/1)
sent 3222522038 bytes  received 31 bytes  81582837.19 bytes/sec
```

with ZFS

```
[root@host /]# rsync -av --progress /home/user/test.img /zfs-ssd/test.img
sending incremental file list
test.img
  3222128640 100%   46.29MB/s    0:01:06 (xfer#1, to-check=0/1)
sent 3222522042 bytes  received 31 bytes  48458978.54 bytes/sec
```

Seagate DiamondMax 1TB 7.2k
with UFS

```
[root@host /]# rsync -av --progress /home/user/test.img /ufs-1b/test.img 
test.img
  3222128640 100%   74.97MB/s    0:00:40 (xfer#1, to-check=0/1)
sent 3222522042 bytes  received 31 bytes  77651134.29 bytes/sec
```

with ZFS

```
[root@host /]# rsync -av --progress /home/user/test.img /zfs-1tb/test.img
test.img
  3222128640 100%   61.32MB/s    0:00:50 (xfer#1, to-check=0/1)
sent 3222522042 bytes  received 31 bytes  63812318.28 bytes/sec
```

Western Digital Raptor 150GB 10k
with UFS

```
[root@host /]# rsync -av --progress /home/user/test.img /ufs-150gb/test.img
test.img
  3222128640 100%   73.47MB/s    0:00:41 (xfer#1, to-check=0/1)
sent 3222522042 bytes  received 31 bytes  75824048.78 bytes/sec
```

with ZFS

```
[root@host /]# rsync -av --progress /home/user/test.img /zfs-150gb/test.img
test.img
  3222128640 100%   44.93MB/s    0:01:08 (xfer#1, to-check=0/1)
sent 3222522042 bytes  received 31 bytes  46367224.07 bytes/sec
```

Any suggestions to try out?


----------



## Alt (Jul 5, 2010)

You're benchmarking software RAID (with RAID functions off) against non-RAID.


----------



## miks (Jul 5, 2010)

There's no RAID in this case; I just copied a file from single disk to single disk.


----------



## Alt (Jul 5, 2010)

When you say "to *disk*", it means you copy to /dev/adX. In your case you are copying to a ZFS pool with 1 disk.

Btw, what do you expect to see?

So why is it slower in this case?
1. It's software.
2. It's a stripe with one disk. It's still RAID.
3. (Too much) caching.
4. ZFS reserves blocks (writing to unused ones) to do snapshots etc. (and God knows what other reasons).
5. You get scalability, but you must lose something. For example, if you add a mirror vdev you will get greater read speed. "Nature's law", if you wish. =)

About the write blackouts - I think you need to reduce the write cache timeouts/buffers.


----------



## miks (Jul 5, 2010)

Regarding the write cache timeouts/buffers:
is that vfs.zfs.txg.timeout?


----------



## Alt (Jul 6, 2010)

Seems so, yes:
http://permalink.gmane.org/gmane.os.freebsd.devel.file-systems/8875
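For reference, this tunable can also be set persistently at boot from /boot/loader.conf (the value here is only an example, not a recommendation):

```
# /boot/loader.conf - shorten the ZFS transaction group timeout
# (default on FreeBSD 8.x is 30 seconds)
vfs.zfs.txg.timeout="5"
```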


----------



## miks (Jul 6, 2010)

Tried it with 5 and 1; no big difference.
Also, I don't want to believe that with ZFS you get close to 2x slower writes in exchange for the benefits it can give.


----------



## User23 (Jul 6, 2010)

Maybe the CPU overhead of ZFS is too much for your CPU.

I have tested ZFS myself on a single device (HW RAID 5) and on multiple devices (every drive as a single drive) in different ZFS configurations with a 9690SA-8I:

http://forums.freebsd.org/showthread.php?t=9859

And I would not use rsync to benchmark.


----------



## miks (Jul 6, 2010)

I have two quad-core Xeon CPUs, so the problem can't be a slow CPU.
Why would you not recommend using rsync? Is it not optimized for ZFS?
I believe it's quite realistic daily usage, not some synthetic benchmarking tool.
I also got very similar results with a simple cp command. The SSD and the 10k WD seem to be close to 2x faster at writing with UFS than with ZFS.
But for me the most annoying thing with ZFS is the periodic freeze while it's writing to disks.


----------



## User23 (Jul 7, 2010)

miks said:

> I have two xeon quadra core cpu, so there can't be problem because of slow cpu.
> Why would you not recommend to use rsync? Its not optimized for ZFS ?



Because what you compare is copying files UFS->UFS and UFS->ZFS.



			
miks said:

> I believe it's quite real daily usage not some synthetic benchmarking tools.



So your daily usage is copying from UFS->ZFS? On single drives??



			
miks said:

> I also got very similar results with simple cp command. SSD and 10K WD digital seems to be close to 2x faster in writing with UFS than with ZFS.



Yes, because even rsync is just copying your files too.



			
miks said:

> But for me most annoying thing with ZFS is periodical freeze while it's writing to disks.



I don't think you will be happy with ZFS ... not in that way ^^


----------



## miks (Jul 7, 2010)

> Because what you compare is copying files UFS->UFS and UFS->ZFS.


No, I'm copying from a ZFS mirror.



> So your daily usage is copying from UFS->ZFS? On single drives??


The results are the same if I copy data from the single-disk ZFS pool back to the mirrored one.
Will raidz with 4 x 1TB drives give a much more stable write speed than a 2-drive mirror?



> I dont think you will get happy with ZFS ... not in that way


Is write performance a known problem with ZFS?


----------



## t1066 (Jul 7, 2010)

I use the command

`# sysctl vfs.zfs.txg.write_limit_override=1048576000`

to limit the write speed to around 100 MB/s.

Tweaking the above command may alleviate the periodic freeze.
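As a sanity check on that number (assuming the tunable is counted in bytes, which the magnitude suggests), the override value works out to 1000 MiB of dirty data per transaction group:

```python
# vfs.zfs.txg.write_limit_override caps how much dirty data a single
# transaction group (txg) may accumulate before it is forced to sync.
override_bytes = 1048576000

# Convert to MiB: 1 MiB = 1024 * 1024 bytes
mib = override_bytes / (1024 * 1024)
print(mib)  # -> 1000.0
```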


----------



## miks (Jul 7, 2010)

t1066 said:

> I use the command
> 
> `# sysctl vfs.zfs.txg.write_limit_override=1048576000`
> 
> ...



Thanks, now ZFS writing is even faster than UFS!


----------



## Matty (Jul 8, 2010)

t1066 said:

> I use the command
> 
> `# sysctl vfs.zfs.txg.write_limit_override=1048576000`
> 
> ...



I've been using it for a couple of weeks now and it gives me the best results so far.


----------



## miks (Jul 8, 2010)

Can someone explain what this tunable does?
From the name it seems to limit writes, but where? Even with UFS I got close to 80 MB/s max write speed.


----------



## Matty (Jul 8, 2010)

miks said:

> Can someone explain what this tunable is doing?
> From name it seems that it's limiting write, but where? Even with UFS I got close to 80mb/s max write speed.



You tune this variable to the max MB/s your HD can handle; that way there are a lot fewer write stalls.
Before this tunable, you would tune the txg wait time down to 4 s or 5 s, but that is only convenient if you're writing at full speed.

So now the txg gets written when (a) 30 seconds have passed, or (b) x amount of data is in the txg (x = the max your hard disk can handle).
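That either/or flush rule can be sketched as a simple condition (a simplified model, not the actual ZFS code; `timeout_s` and `write_limit_bytes` stand in for vfs.zfs.txg.timeout and vfs.zfs.txg.write_limit_override):

```python
def txg_should_sync(elapsed_s: float, dirty_bytes: int,
                    timeout_s: float = 30.0,
                    write_limit_bytes: int = 1048576000) -> bool:
    """Simplified model: a transaction group syncs when either the
    timeout expires or the accumulated dirty data hits the limit."""
    return elapsed_s >= timeout_s or dirty_bytes >= write_limit_bytes

# A txg that has already accumulated the full limit syncs immediately...
print(txg_should_sync(2.0, 1048576000))    # -> True
# ...while a lightly loaded txg keeps waiting for the timeout.
print(txg_should_sync(2.0, 10 * 1024**2))  # -> False
```

Capping the per-txg data at what the disk can absorb in one sync interval is what smooths out the stalls: the sync never has to drain a huge backlog at once.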


----------



## User23 (Jul 8, 2010)

I have tested with default


```
sysctl vfs.zfs.txg.write_limit_override=0
```

and


```
sysctl vfs.zfs.txg.write_limit_override=1048576000
```


on a single disk with ZFS, and I see no change in write speed.

FreeBSD 8.1-RC1 amd64


----------



## Matty (Jul 8, 2010)

User23 said:

> I have tested with default
> 
> 
> ```
> ...



Did you remove the txg timeout in loader.conf?
And it's not so much about the speed as about getting rid of the write stalls.


----------



## miks (Jul 8, 2010)

User23 said:

> I have tested with default
> 
> 
> ```
> ...



What kind of disk and controller do you have?


----------



## User23 (Jul 8, 2010)

3ware 9550SXU-8LP with a WD5002ABYS configured as a single device.
I'll check this again with a faster WD3000HLFS. Unfortunately, I have no free SSD.


----------



## Matty (Jul 8, 2010)

User23 said:

> 3ware 9550SXU-8LP with a WD5002ABYS configured as single device.
> I'll check this again with a faster WD3000HLFS. Unfortunately i have no free SSD



I think the RAID controller cache helps quite a bit.


----------



## miks (Jul 8, 2010)

From this test http://www.xbitlabs.com/articles/storage/display/1tb-14hdd-roundup_9.html it looks like the WD RE3 has a sequential write speed of around 100 MB/s.
Maybe ZFS only has problems with disks that have a lower write speed?


----------



## Matty (Jul 8, 2010)

miks said:

> From this test http://www.xbitlabs.com/articles/storage/display/1tb-14hdd-roundup_9.html looks like WD RE3 have sequential write speed around 100mb/s.
> Maybe ZFS have problems only with disks who have lower write speed?



I have a raid10 zpool with 4 Samsung F3 1TB drives and had the same issue.

Now I can write 180 MB/s without any problems or stalls on an AMD X2 with 4 GB.


----------



## miks (Jul 8, 2010)

Exactly with sysctl vfs.zfs.txg.write_limit_override=1048576000, or a little bit higher?


----------



## Matty (Jul 8, 2010)

miks said:

> exactly with these sysctl vfs.zfs.txg.write_limit_override=1048576000 values or little bit higher?



I'm at work right now, but as far as I remember it was around 180,000,000. I'll check when I get home.
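That number fits the rule of thumb above: set the limit to roughly what the pool can flush in one target sync interval. A back-of-the-envelope sketch (the helper name and the example figures are illustrative, not measured values):

```python
def write_limit_override(max_write_mb_s: float, synctime_s: float = 1.0) -> int:
    """Bytes of dirty data one txg may hold: what the pool can
    sync in one target interval (using 1 MB = 1,000,000 bytes)."""
    return int(max_write_mb_s * 1_000_000 * synctime_s)

# A pool that sustains ~180 MB/s with a 1-second sync target:
print(write_limit_override(180))  # -> 180000000
```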


----------



## User23 (Jul 8, 2010)

```
ada0 at ahcich0 bus 0 scbus1 target 0 lun 0
ada0: <WDC WD3000HLFS-01G6U0 04.04V01> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 286168MB (586072368 512 byte sectors: 16H 63S/T 16383C)
```


```
# zpool status
  pool: home
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        home        ONLINE       0     0     0
          ada0      ONLINE       0     0     0

errors: No known data errors
```

With sysctl vfs.zfs.txg.write_limit_override=0 I got:


```
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
                16G    84  99 95612  28 50712  15   239  99 115950  17 177.3  19
Latency               436ms   12484ms    6157ms   45120us    2803ms    1309ms
Version  1.96       ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 20065  97 +++++ +++ 13700  98 16218  97 +++++ +++ 14248  98
Latency             16900us     224us     361us   39299us     123us     622us
```

and with sysctl vfs.zfs.txg.write_limit_override=1048576000 I got:


```
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
                16G    96  99 92547  28 48933  14   223  92 117699  17 180.9  16
Latency               441ms    3122ms    4999ms    1009ms     550ms    1091ms
Version  1.96       ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 18500  97 +++++ +++ 17332  98 14217  97 +++++ +++ 17178  98
Latency             21881us     153us     216us   43772us      45us      94us
```


----------



## da1 (Feb 11, 2011)

This thread was very useful for me, so I will post my conf:

8.1 amd64 (ZFS on root) with 2x 500 GB WD RE3 (mirror) and 2x 1.5 TB WD Caviar Black (mirror).

/boot/loader.conf

```
vfs.root.mountfrom="zfs:zroot"
zfs_load="YES"
geom_mirror_load="YES"
linux_load="YES"
sound_load="YES"
snd_ich_load="YES"
#nvidia_load="YES"
accf_http_load="YES"
#vboxdrv_load="YES"
loader_logo="beastie"
coretemp_load="YES"
geom_label_load="YES"
atapicam_load="YES"                     # allows ATAPI devices to be accessed through the SCSI subsystem, cam(4)
ahci_load="YES"                         # Allow S-ATA extra features (NCQ,etc)
if_tap_load="YES"

############# ZFS tunnables
## for AHCI
vfs.zfs.vdev.min_pending=4              #default=4
vfs.zfs.vdev.max_pending=8              #default = 35
## NO AHCI
#vfs.zfs.vdev.min_pending=4             #default=4
#vfs.zfs.vdev.max_pending=8             #default = 35

# Increase vm.kmem_size to allow for ZFS ARC to utilise more memory.
vm.kmem_size="2048M"
vm.kmem_size_max="2048M"
vfs.zfs.arc_max="2048M"

# Disable ZFS prefetching (we will not disable it because we have 6GB of RAM)
# http://southbrain.com/south/2008/04/the-nightmare-comes-slowly-zfs.html
# vfs.zfs.prefetch_disable="0"

# Decrease ZFS txg timeout value from 30 (default) to 5 seconds.  This
# should increase throughput and decrease the "bursty" stalls that
# happen during immense I/O with ZFS.
# http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007343.html
# http://lists.freebsd.org/pipermail/freebsd-fs/2009-December/007355.html
vfs.zfs.txg.timeout="5"

# Target seconds to sync a txg
vfs.zfs.txg.synctime="1"
```

bonnie:

```
[root@mainserver ~]# bonnie
File './Bonnie.2543', size: 104857600
Writing with putc()...done
Rewriting...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
Seeker 1...Seeker 2...Seeker 3...start 'em...done...done...done...
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          100 251650 96.8 83025 20.1 119967 30.6 225507 99.0 905097 96.1 160539.4 175.5
```

and bonnie++:

```
[root@mainserver ~]# bonnie++ -s 16000 -u root
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
mainserver   16000M   104  99 76228  21 48577  13   238  99 172447  20 167.8   3
Latency               115ms    1337ms    2428ms   67607us     596ms     840ms
Version  1.96       ------Sequential Create------ --------Random Create--------
mainserver          -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 11034  95  6396  97  2048  99 20678  89 +++++ +++ 18925  97
Latency             21623us   11981us   15821us   14862us    4989us     102us
1.96,1.96,mainserver,1,1297439902,16000M,,104,99,76228,21,48577,13,238,99,172447,20,167.8,3,16,,,,,11034,95,6396,97,2048,99,20678,89,+++++,+++,18925,97,115ms,1337ms,2428ms,67607us,596ms,840ms,21623us,11981us,15821us,14862us,4989us,102us
```


```
vfs.zfs.txg.write_limit_override: 268435456
```


----------

