# ZFS performance degradation over time



## garrettmoore (Jan 1, 2010)

Hi,

I'm having problems with ZFS performance. When my system comes up, read/write speeds are excellent (testing with `dd if=/dev/zero of=/tank/bigfile` and `dd if=/tank/bigfile of=/dev/null`); I get at least 100MB/s on both reads and writes, and I'm happy with that.
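For anyone who wants to reproduce the test, this is roughly what I'm running; the 64MB size and /tmp path below are just stand-ins for the multi-GB file on /tank. (Note that /dev/zero data would compress away to nothing if the dataset had compression enabled, which inflates the numbers; mine is off.)

```
# Sequential write test; /dev/zero is fully compressible, so keep
# compression off on the target dataset or the figure is meaningless.
F="${F:-/tmp/zfs_bench_file}"
dd if=/dev/zero of="$F" bs=1048576 count=64 2>/dev/null

# Sequential read test: stream the same file back out.
dd if="$F" of=/dev/null bs=1048576 2>/dev/null

# Bytes actually written.
wc -c < "$F"
```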

The longer the system is up, the worse my performance gets. Currently my system has been up for 4 days, and read/write performance is down to about 10MB/s at best.

The system is only accessed by 3 clients: myself, my roommate, and our HTPC. Usually, only one client will be doing anything at a time, so it is not under heavy load or anything.

*Software:*

```
FreeBSD leviathan 8.0-RELEASE FreeBSD 8.0-RELEASE #0: Sat Nov 21 15:02:08 UTC 2009
root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
```
The following apps are running and touching data on the zpool:

- rTorrent - read and write; usually active, but not doing much reading (no major seeding)
- SABnzbd+ - write only; not always active
- Lighttpd - running ruTorrent (the web interface for rTorrent); nothing else
- Samba - all of our clients run Windows, so we use Samba to network-mount the zpool

*Hardware:*

- AMD Athlon II X2 250 dual-core processor, Socket AM3, 3.0GHz
- Gigabyte MA790GP-UD4H AMD 790GX ATX AM2+/AM3 Sideport 2PCI-E Sound GBLAN HDMI CrossFireX motherboard
- Corsair XMS2 TWIN2X4096-6400C5 4GB DDR2 (2x2GB)
- Supermicro AOC-USASLP-L8I (LSI 1068E) 8-port RAID 0/1/10 UIO SATA/SAS controller, 16MB, low profile
- *8x* Western Digital WD15EADS Caviar Green 1.5TB SATA, 32MB cache, 3.5"

*ZFS setup:*
I have the 1.5TB drives in one RAIDZ pool. All 8 drives are connected to the Supermicro L8I controller. The controller is set to 'disabled', so it isn't doing anything with the drives except presenting them to the system untouched. (So I'm really only using it as an expansion card, for the extra ports).

```
[root@leviathan ~]# zpool status
  pool: tank
 state: ONLINE
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     ONLINE       0     0     0
            da5     ONLINE       0     0     0
            da6     ONLINE       0     0     0
            da7     ONLINE       0     0     0

errors: No known data errors
```

Any suggestions as to what might be causing the performance to degrade with system uptime? If I missed anything or more information is needed, please let me know. Thanks in advance.


----------



## tobiastheviking (Jan 2, 2010)

No suggestions, but I have exactly the same problem as you do.

All my hardware is different, but the setup is mostly the same (small file server, only used by me, etc.).

One thing I have found is that `top` will say "Mem: 962M Active" even when I have closed all programs and nothing should be using memory at all.

The memory stays active and is never marked inactive. At the same time, I see zfskern using a lot of processing power.

I've been trying to debug this for some time. I have even done a complete reinstall of the system. No luck thus far.

I've just tried disabling the ZIL in loader.conf; if it does anything, I will write back.


----------



## tobiastheviking (Jan 2, 2010)

OK, that did nothing.


```
last pid: 95448;  load averages:  0.00,  0.00,  0.00                                                                                 up 0+09:06:40  12:02:25
71 processes:  3 running, 68 sleeping
CPU:  0.4% user,  0.0% nice,  2.6% system,  0.4% interrupt, 96.7% idle
Mem: 1099M Active, 109M Inact, 666M Wired, 68M Cache, 213M Buf, 29M Free
Swap: 8192M Total, 7896K Used, 8184M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 1900 <username>      1  64   20 85344K 36460K select   3:07  0.00% /usr/local/bin/python2.6 -OO /usr/local/bin/SABnzbd.py
```
I have sorted it by size (i.e., memory usage). In the 9 hours since I rebooted, nothing has really been done.

I moved less than 10GB from a UFS drive to a ZFS drive. irssi has been running in a screen. A few other programs have been doing minor stuff as well, but nothing major.

The program that is currently using the most memory is SABnzbd, which is using 85MB (it has done NOTHING since I started it). So I have no idea why so much memory is marked as active.

From experience, killing all the programs I have running will NOT free any more memory.


----------



## wonslung (Jan 2, 2010)

This could be related to the prefetch bug.

Have you tried disabling prefetch or applying the new prefetch patches?


----------



## oliverh (Jan 2, 2010)

wonslung said:

> This could be related to the prefetch bug.
> 
> have you tried disabling prefetch or applying the new prefetch patches?



ZFS prefetch isn't used with <=4GB of memory.


----------



## tobiastheviking (Jan 2, 2010)

I only have 2GB.


----------



## gkontos (Jan 2, 2010)

You will need some ZFS tuning. First, try limiting the ARC size.
For example:

```
vfs.zfs.arc_min: 122880000
vfs.zfs.arc_max: 983040000
```
That's a nice start for a 2GB system.
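(Those are the runtime sysctl names; on 8.0 the ARC bounds are boot-time tunables, so to actually apply them they would go in /boot/loader.conf, along these lines:)

```
# /boot/loader.conf -- cap the ARC on a 2GB machine (same values as above)
vfs.zfs.arc_min="122880000"
vfs.zfs.arc_max="983040000"
```

A reboot is needed for loader tunables to take effect.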

Regards,

George


----------



## wonslung (Jan 2, 2010)

oliverh said:

> ZFS prefetch isn't used with <=4GB of memory.



I swear I read it wrong... I thought he had more than 4GB of memory.

My bad. Yeah, it's probably an ARC issue then...

But honestly, the best option would be to get more memory... memory is cheap... tuning helps short-term, but unless the motherboard won't support it, get 4+ GB.

I use 8 myself and it works well.

EDIT:
Went back and read his post again... he's got 4GB of memory... I thought prefetch was only disabled if you had LESS than 4GB of memory, but was ON if you had exactly 4GB.


----------



## tty23 (Jan 2, 2010)

Hi,

I have similar problems with ZFS. I want to move my home server from Debian/Linux to FreeBSD. Now FreeBSD 8 is up and running, and everything seems to be fine except the long-term ZFS performance.
I created a 4-drive RAIDZ pool and am currently copying the data from my old drives (the Linux ones with ext3) to the new pool. The first 10GB are copied really fast, but then the performance decreases drastically. At the start of a copy, the Linux drive usually reads at 20-30MB/s (according to `iostat 1`), and the ZFS raid writes even faster; `zpool iostat 1` reported values > 200MB/s.
After some time the performance drops to 5MB/s (on the last drive I copied); today I started copying another drive and am currently getting about 1MB/s (according to zpool iostat).
The drives I am copying are between 160 and 320GB, so copying at 1MB/s takes some time.
I followed the ZFS tuning guide in the wiki (http://wiki.freebsd.org/ZFSTuningGuide), which basically says that you should increase the kern.maxvnodes setting.
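(For anyone following along, that change looks like the fragment below; the number is only an illustration I picked, not a value from the guide:)

```
# /etc/sysctl.conf -- raise the vnode limit as the ZFSTuningGuide suggests
# (400000 is an illustrative value; size it to your RAM)
kern.maxvnodes=400000
```

It can also be changed at runtime with `sysctl kern.maxvnodes=400000`.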

Note that in my experience the last days, there seems no connection between uptime and ZFS performance. It looks more like the amount of ZFS usage leads to ZFS performance degradation.

At the beginning, I also tried different values of the other settings mentioned in the ZFS tuning guide, but found no notable differences (I played with vm.kmem_size_max, vm.kmem_size, vfs.zfs.arc_max).

It is good to know that others have similar problems, so I guess my hardware is not the cause.

And NFS with ZFS is even worse. As already mentioned, I am currently copying a drive to the ZFS raid. When I try to access this ZFS raid via NFS during the copy, the share is so slow it is barely usable.
Note that I am directly connected via a gigabit network link, and NFS access to the ext3 disks of my old Linux system is really fast.


----------



## oliverh (Jan 2, 2010)

wonslung said:

> i swear i read wrong...i thouhgt he had more than 4gb memory.
> 
> my bad.  yah, it's probably an arc issue then...
> 
> ...



I'm using it with 2GB and 4GB (both of them 64-bit) without any problems. Well, I don't have a datacenter, but sometimes I have to transfer big data (sat pictures, high-res photography (photogrammetry), etc.). I wouldn't try it with anything <2GB or 32-bit; too much fuss about nothing ;-)


----------



## tobiastheviking (Jan 3, 2010)

gkontos said:

> You will need some ZFS tuning. First try limiting the arc size.
> For example
> 
> ```
> ...



Since you say 2GB, that only applies to me, I think.

Those values are way higher than what I currently have for the ARC, but I'll try them anyway.

Currently it's at:

```
vfs.zfs.arc_min: 53856640
vfs.zfs.arc_max: 430853120
```


----------



## garrettmoore (Jan 3, 2010)

It seems like we have a decent number of people with the same issue, based on how many replies there have been already.

It would be really helpful if someone familiar with the codebase could step in, since they would probably have some insight as to what tuning parameters are causing it, and since we have several people to test with, we should be able to narrow it down.

I'm going to take an initial stab at it and guess that it's some sort of performance tuning ZFS is attempting to do "on the fly" which fails utterly for our usage patterns.


----------



## tobiastheviking (Jan 3, 2010)

Agreed.

Something that might help ease the pain a bit would be something like ionice (for when moving stuff around), but I haven't been able to find anything like it for FreeBSD yet.


----------



## tobiastheviking (Jan 3, 2010)

I noticed a funny thing while moving data from a UFS drive (ad22) to a zraid (ad12, ad14, ad16, ad18).

gstat output

```
dT: 1.003s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
-snip-
    4    278      2    128    6.2    270   1894    9.0   72.9| ad12
    6    182      0      0    0.0    176   1975    7.3   65.2| ad14
    5    189      2    128    6.8    181   2015    7.2   59.7| ad16
    6    184      2    128    4.8    176   1975    7.2   68.7| ad18
    0     87     40   5105    2.7     47   2903    2.3   15.6| ad22
-snip-
```
Can it really be true that it only uses 15% on ad22 (which is also getting data moved TO it from another computer), while it uses ~70% on the zraid drives taking the data from ad22?

The only disk activity to the zraid is the move. This is how it looks constantly.


----------



## tobiastheviking (Jan 3, 2010)

Oh, and setting arc_min and arc_max to the new values (which were higher than my originals) did naught.


----------



## hedwards (Jan 4, 2010)

tobiastheviking said:

> Can it really be true that it only uses 15% on ad22(which is also getting data moved TO it from another computer) while it uses ~70% on the zraid drives moving data from ad22.
> 
> The only disk activity to the zraid is the move. This is how it looks constantly.


If I'm understanding you correctly, the answer is yes, it could very well be. If it takes 15% to do just one side of the operation on UFS, then it's not necessarily unreasonable for it to take ~70% when doing both sides of it.

Assuming you're moving data from ad22 to the ZRAID or vice versa, it's not that unreasonable at all. You're not just copying the files; the computer also has to touch the other disks in the array and compute any relevant checksums.


----------



## tty23 (Jan 4, 2010)

Some info about my case (if anyone is interested):

Hardware:
- Memory: 3065 MB
- Controller: Intel ICH7
- HDDs: 4x WDC WD10EARS-00Y5B1 (Western Digital Green 1TB)

Software
- FreeBSD 8
- ZFS RaidZ with the 4 mentioned HDDs
- 3 file systems on the pool

To find out the speed of degradation, I rebooted the server and
copied an 8GB file from my old Linux drive to the ZFS pool several
times.
Write speed was measured with `zpool iostat 1`.

1. copy:
  - throughput start: 25MB/s
  - throughput end: 20MB/s 
2. copy
  - throughput start: 20MB/s
  - throughput end: 5-10MB/s (alternating between 5 and 10MB/s)
3. copy
  - throughput start: 5-10MB/s
  - throughput end: 5MB/s
4. copy
  - throughput start: 5MB/s
  - throughput end: 5MB/s

`# sysctl -a | grep zfs`
The command was run after the 4th copy.

```
vfs.zfs.arc_meta_limit: 168369920
vfs.zfs.arc_meta_used: 15162272
vfs.zfs.mdcomp_disable: 0
vfs.zfs.arc_min: 84184960
vfs.zfs.arc_max: 673479680
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 1
vfs.zfs.recover: 0
vfs.zfs.txg.synctime: 5
vfs.zfs.txg.timeout: 30
vfs.zfs.scrub_limit: 10
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 10485760
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 35
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_disable: 0
vfs.zfs.version.zpl: 3
vfs.zfs.version.vdev_boot: 1
vfs.zfs.version.spa: 13
vfs.zfs.version.dmu_backup_stream: 1
vfs.zfs.version.dmu_backup_header: 2
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
kstat.zfs.misc.arcstats.hits: 183630
kstat.zfs.misc.arcstats.misses: 38211
kstat.zfs.misc.arcstats.demand_data_hits: 114956
kstat.zfs.misc.arcstats.demand_data_misses: 13889
kstat.zfs.misc.arcstats.demand_metadata_hits: 68674
kstat.zfs.misc.arcstats.demand_metadata_misses: 24322
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 0
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 0
kstat.zfs.misc.arcstats.mru_hits: 67371
kstat.zfs.misc.arcstats.mru_ghost_hits: 19058
kstat.zfs.misc.arcstats.mfu_hits: 116259
kstat.zfs.misc.arcstats.mfu_ghost_hits: 14086
kstat.zfs.misc.arcstats.deleted: 268507
kstat.zfs.misc.arcstats.recycle_miss: 56554
kstat.zfs.misc.arcstats.mutex_miss: 38
kstat.zfs.misc.arcstats.evict_skip: 35618
kstat.zfs.misc.arcstats.hash_elements: 2645
kstat.zfs.misc.arcstats.hash_elements_max: 10052
kstat.zfs.misc.arcstats.hash_collisions: 15960
kstat.zfs.misc.arcstats.hash_chains: 46
kstat.zfs.misc.arcstats.hash_chain_max: 3
kstat.zfs.misc.arcstats.p: 33528320
kstat.zfs.misc.arcstats.c: 84184960
kstat.zfs.misc.arcstats.c_min: 84184960
kstat.zfs.misc.arcstats.c_max: 673479680
kstat.zfs.misc.arcstats.size: 15296928
kstat.zfs.misc.arcstats.hdr_size: 551824
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 415384
kstat.zfs.misc.vdev_cache_stats.delegations: 7553
kstat.zfs.misc.vdev_cache_stats.hits: 35799
kstat.zfs.misc.vdev_cache_stats.misses: 13472
```


----------



## deepdish (Jan 4, 2010)

Is there any compression enabled or changes to checksum algorithms on your zpool's ?


----------



## tty23 (Jan 4, 2010)

Nope, default config, the only thing I changed is sharenfs=on.


----------



## garrettmoore (Jan 4, 2010)

Default configuration for me as well. I created my zpool with `zpool create tank da{0,1,2,3,4,5,6,7}`.

Maybe what we need to do is a test like tty23 did, except with the output from `sysctl -a | grep zfs` immediately after the system boots, and also after *each* copy test, to see how the parameters are changing. If we can narrow it down a bit more it may mean something to someone. 
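A rough sketch of what each tester could run; the copy source and destination are obviously placeholders for your own pool paths:

```
#!/bin/sh
# Snapshot the ZFS sysctls, do one copy pass, snapshot again, and diff,
# so only the counters that moved during the copy show up.
OUT="${OUT:-/tmp/zfs_snap}"

snap() { sysctl -a 2>/dev/null | grep zfs > "$OUT.$1"; }

snap before
# cp /mnt/olddisk/testfile /tank/testfile   # <- one copy pass goes here
snap after

diff "$OUT.before" "$OUT.after" || :
```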

Would this be useful data?


----------



## deepdish (Jan 4, 2010)

garrettmoore said:

> Default configuration for me as well. I created my zpool with `zpool create tank da{0,1,2,3,4,5,6,7}`.
> 
> Maybe what we need to do is a test like tty23 did, except with the output from `sysctl -a | grep zfs` immediately after the system boots, and also after *each* copy test, to see how the parameters are changing. If we can narrow it down a bit more it may mean something to someone.
> 
> Would this be useful data?



I have 2 zpools on my system and I have no issue participating in these tests. As long as the commands used are all laid out, I'll post my results and we can compare.

My results will be skewed since I have compression enabled globally on all of my pools (using lzjb), but the idea of performance degradation due to usage should still apply.


----------



## Alt (Jan 4, 2010)

If it helps: I can't see the same results with a 100MB file on a RAIDZ (4 virtual disks) under VMware.


----------



## Matty (Jan 5, 2010)

Here are my results:
AMD X2 2GHz, 4GB RAM, a single WD hard disk with ZFS on ad0s2, an empty pool, running 8.0-RELEASE.


```
[matty@fb ~]$ cat /boot/loader.conf 
zfs_load="YES"
vm.kmem_size="4G"
kern.maxusers=2048 
vfs.zfs.txg.timeout="5"
```


```
tank  type                  filesystem             -
tank  creation              Tue Jan  5 14:07 2010  -
tank  used                  2.93G                  -
tank  available             100G                   -
tank  referenced            2.93G                  -
tank  compressratio         1.00x                  -
tank  mounted               yes                    -
tank  quota                 none                   default
tank  reservation           none                   default
tank  recordsize            128K                   default
tank  mountpoint            /tank                  default
tank  sharenfs              off                    default
tank  checksum              on                     default
tank  compression           off                    default
tank  atime                 off                    local
tank  devices               on                     default
tank  exec                  on                     default
tank  setuid                on                     default
tank  readonly              off                    default
tank  jailed                off                    default
tank  snapdir               hidden                 default
tank  aclmode               groupmask              default
tank  aclinherit            restricted             default
tank  canmount              on                     default
tank  shareiscsi            off                    default
tank  xattr                 off                    temporary
tank  copies                1                      default
tank  version               3                      -
tank  utf8only              off                    -
tank  normalization         none                   -
tank  casesensitivity       sensitive              -
tank  vscan                 off                    default
tank  nbmand                off                    default
tank  sharesmb              off                    default
tank  refquota              none                   default
tank  refreservation        none                   default
tank  primarycache          all                    default
tank  secondarycache        all                    default
tank  usedbysnapshots       0                      -
tank  usedbydataset         2.93G                  -
tank  usedbychildren        55.5K                  -
tank  usedbyrefreservation  0                      -
```

I ran

```
dd if=/dev/urandom of=./file1 bs=1m count=1000
```

six times, incrementing the number in of=./fileX each run. Between runs I waited until the disk had written everything out of its cache/RAM.

Here are the results:


```
1048576000 bytes transferred in 16.884601 secs (62102504 bytes/sec)
1048576000 bytes transferred in 16.782244 secs (62481274 bytes/sec)
1048576000 bytes transferred in 17.148042 secs (61148439 bytes/sec)
1048576000 bytes transferred in 16.877827 secs (62127429 bytes/sec)
1048576000 bytes transferred in 16.804360 secs (62399044 bytes/sec)
1048576000 bytes transferred in 17.143705 secs (61163908 bytes/sec)
```

And one with a 10GB dump

```
10485760000 bytes transferred in 200.109637 secs (52400075 bytes/sec)
```


----------



## deepdish (Jan 5, 2010)

I wonder if we should be including sync(8) in these tests?
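Something like this would fold the flush into the measurement, so a run isn't declared finished while data is still queued up in memory (the size and path are stand-ins):

```
# Write, then force buffered data out before stopping the clock; wrap
# the whole sequence in time(1) when benchmarking for real.
OUT="${OUT:-/tmp/sync_test_file}"
dd if=/dev/zero of="$OUT" bs=1048576 count=16 2>/dev/null
sync
wc -c < "$OUT"
```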


----------



## Matty (Jan 11, 2010)

My results posted above were on an empty pool.

After filling the pool to 90%, I also see some degradation, from 61MB/s down to 35-40MB/s.


----------



## Matty (Jan 12, 2010)

The issue with too much active memory only seems to happen when copying data from one filesystem to ZFS, and not when uploading data via AFP (OS X).

 loader.conf:

```
vfs.zfs.arc_max="1800M"
vfs.zfs.arc_min="1000M"
```
I think with these values I managed to hold at least 1GB for ZFS, and with that I kept rather good performance, judging by the values in iostat.

Could someone try to confirm this?


----------



## Cellsplicer (Jan 16, 2010)

I've also been playing around with my ARC sizes, as my ZFS write performance seems to decrease after about a day of uptime on 8.0-RELEASE-p2. I've found that when this happens, kstat.zfs.misc.arcstats.memory_throttle_count rises as I write to the pool.

The problem is, FreeBSD seems to ignore the values I set in /boot/loader.conf. I've set the following values:


```
vm.kmem_size="2048M"
vm.kmem_size_max="2048M"
vfs.zfs.arc_min="256M"
vfs.zfs.arc_max="1024M"
```
Upon rebooting it seems that FreeBSD is still automatically tuning those values by itself:


```
vm.kmem_size_max: 329853485875
vm.kmem_size: 2770690048
vfs.zfs.arc_min: 216460160
vfs.zfs.arc_max: 1731681280
```
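One way to see the mismatch at a glance is to convert the loader.conf strings to bytes and compare them against the sysctl readout; a quick sketch (to_bytes is just a throwaway helper):

```
# Convert loader.conf-style sizes ("256M", "2G") to plain bytes so they
# can be compared with the numeric sysctl output after a reboot.
to_bytes() {
  case "$1" in
    *G) echo $(( ${1%G} * 1073741824 )) ;;
    *M) echo $(( ${1%M} * 1048576 )) ;;
    *K) echo $(( ${1%K} * 1024 )) ;;
    *)  echo "$1" ;;
  esac
}

to_bytes 1024M   # 1073741824 -- but sysctl reports vfs.zfs.arc_max: 1731681280
to_bytes 2048M   # 2147483648 -- but sysctl reports vm.kmem_size: 2770690048
```

Since neither number matches the readout, the loader.conf lines clearly never took effect.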


----------



## wonslung (Jan 17, 2010)

Matty said:

> My result posted above were on an empty pool.
> 
> after filling the pool to 90% I also see some kind of degradation from 61MB/s to 35/40MB/s.




This is to be expected. ZFS has this issue on ALL systems.

It mainly has to do with the algorithm ZFS uses to pick the next free block. I've read that they are going to start using different algorithms depending on how full the pool is, and maybe even make it a tunable.


----------



## Cellsplicer (Jan 18, 2010)

Cellsplicer said:

> I've also been playing around with my ARC sizes as my zfs write performance seems to decrease after about a day of uptime on 8.0-RELEASE-p2. I've found that at this time, kstat.zfs.misc.arcstats.memory_throttle_count seems to rise when I write to it.
> 
> The problem is, FreeBSD seems to ignore any of the values I set in /boot/loader.conf. I've set the following values in loader.conf:
> 
> ...



Nevermind. I had a typo in my loader.conf! Sorry for the inconvenience.


----------



## Savagedlight (Jan 19, 2010)

*Information + first test*

I have similar issues with performance degradation.
I've noticed it when copying from a UFS2+softupdates FS to a ZFS filesystem. Once the performance issue hits, it affects *all* writes to the ZFS pool, be it direct copy from UFS2 to ZFS, over samba, ftp, or using dd to dump.

It does not manifest itself until `top` says the amount of free memory is below 200M, and even then it isn't constantly present.

The amount of inact memory increases roughly as fast as free memory decreases - which makes sense.

Below are details on the setup and initial tests.
Due to character limits, I'll have to post the rest of the results in followup posts.


```
CPU: Intel(R) Core(TM)2 Duo CPU     E7400  @ 2.80GHz (2806.98-MHz K8-class CPU)
real memory  = 8589934592 (8192 MB)
avail memory = 8253591552 (7871 MB)
atapci1: <Intel ICH10 SATA300 controller> port 0xb000-0xb007,0xac00-0xac03,0xa880-0xa887,0xa800-0xa803,0xa480-0xa49f mem 0xf9fff000-0xf9fff7ff irq 19 at device 31.2 on pci0
```

Pool:

```
NAME           STATE    READ  WRITE CKSUM
storage        ONLINE     0     0     0
 raidz1        ONLINE     0     0     0
   ad10        ONLINE     0     0     0
   ad12        ONLINE     0     0     0
   ad14        ONLINE     0     0     0
   ad16        ONLINE     0     0     0
```
These are 4x Western Digital Greenpower 1.5TB disks.

file: /boot/loader.conf

```
kern.hz=1000
net.inet.tcp.tcbhashsize=4096
net.inet.tcp.hostcache.hashsize=1024
kern.ipc.nmbclusters=65536
vm.kmem_size="4G"
vfs.zfs.vdev.min_pending=1 #default=4
vfs.zfs.vdev.max_pending=1 #default = 35
vfs.zfs.arc_min="3G"
vfs.zfs.arc_max="3G"
```

`# zpool get all storage`

```
NAME     PROPERTY       VALUE       SOURCE
storage  size           5.44T       -
storage  used           635G        -
storage  available      4.82T       -
storage  capacity       11%         -
storage  altroot        -           default
storage  health         ONLINE      -
storage  guid           <#here>     -
storage  version        13          default
storage  bootfs         -           default
storage  delegation     on          default
storage  autoreplace    off         default
storage  cachefile      -           default
storage  failmode       wait        default
storage  listsnapshots  off         default
```

Before writing anything
`# top -SP`

```
last pid:  1843;  load averages:  0.00,  0.00,  0.00  up 0+00:15:44  04:41:49
186 processes: 3 running, 163 sleeping, 20 waiting
CPU 0:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 1:  0.4% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.6% idle
Mem: 40M Active, 21M Inact, 161M Wired, 284K Cache, 44M Buf, 7692M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root        2 171 ki31     0K    32K CPU0    0  31:37 200.00% idle
   12 root       20 -60    -     0K   320K WAIT    0   0:07  0.00% intr
 1730 root        1  44    0 12652K  2200K nanslp  0   0:00  0.00% gstat
```

`# sysctl kstat.zfs.misc.arcstats`

```
kstat.zfs.misc.arcstats.hits: 1481
kstat.zfs.misc.arcstats.misses: 110
kstat.zfs.misc.arcstats.demand_data_hits: 0
kstat.zfs.misc.arcstats.demand_data_misses: 0
kstat.zfs.misc.arcstats.demand_metadata_hits: 1477
kstat.zfs.misc.arcstats.demand_metadata_misses: 100
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 4
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 10
kstat.zfs.misc.arcstats.mru_hits: 294
kstat.zfs.misc.arcstats.mru_ghost_hits: 0
kstat.zfs.misc.arcstats.mfu_hits: 1183
kstat.zfs.misc.arcstats.mfu_ghost_hits: 0
kstat.zfs.misc.arcstats.deleted: 28
kstat.zfs.misc.arcstats.recycle_miss: 0
kstat.zfs.misc.arcstats.mutex_miss: 0
kstat.zfs.misc.arcstats.evict_skip: 0
kstat.zfs.misc.arcstats.hash_elements: 82
kstat.zfs.misc.arcstats.hash_elements_max: 83
kstat.zfs.misc.arcstats.hash_collisions: 0
kstat.zfs.misc.arcstats.hash_chains: 0
kstat.zfs.misc.arcstats.hash_chain_max: 0
kstat.zfs.misc.arcstats.p: 1610612736
kstat.zfs.misc.arcstats.c: 3221225472
kstat.zfs.misc.arcstats.c_min: 3221225472
kstat.zfs.misc.arcstats.c_max: 3221225472
kstat.zfs.misc.arcstats.size: 821232
kstat.zfs.misc.arcstats.hdr_size: 17056
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 0
```

First write test
Writing 1GB to the pool.
`# dd if=/dev/urandom of=./file1 bs=1m count=1024`

```
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 14.762347 secs (72735170 bytes/sec)
```

*During write*
`# zpool iostat 5`

```
storage      635G  4.82T      0      0      0      0
storage      635G  4.82T      1     29  3.00K  1.09M
storage      635G  4.82T      0      0      0      0
storage      635G  4.82T      0      0      0      0
storage      635G  4.82T      0      0      0      0
storage      635G  4.82T      0  1.59K      0   204M
storage      635G  4.82T      0      0      0      0
```
The first line is the start of the write; the end of the write is at the third line. Some backlogging of writes?

*After write*
`# top -SP`

```
last pid:  1946;  load averages:  0.00,  0.02,  0.00  up 0+00:31:05  04:57:10
185 processes: 3 running, 162 sleeping, 20 waiting
CPU 0:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 1:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
Mem: 40M Active, 21M Inact, 1245M Wired, 284K Cache, 44M Buf, 6608M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root        2 171 ki31     0K    32K CPU0    0  60:30 199.56% idle
   12 root       20 -60    -     0K   320K WAIT    0   0:13  0.00% intr
    0 root       40  -8    0     0K   624K -       0   0:02  0.00% kernel
```

`# sysctl kstat.zfs.misc.arcstats`

```
kstat.zfs.misc.arcstats.hits: 1500
kstat.zfs.misc.arcstats.misses: 117
kstat.zfs.misc.arcstats.demand_data_hits: 0
kstat.zfs.misc.arcstats.demand_data_misses: 0
kstat.zfs.misc.arcstats.demand_metadata_hits: 1496
kstat.zfs.misc.arcstats.demand_metadata_misses: 107
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 4
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 10
kstat.zfs.misc.arcstats.mru_hits: 311
kstat.zfs.misc.arcstats.mru_ghost_hits: 0
kstat.zfs.misc.arcstats.mfu_hits: 1185
kstat.zfs.misc.arcstats.mfu_ghost_hits: 0
kstat.zfs.misc.arcstats.deleted: 38
kstat.zfs.misc.arcstats.recycle_miss: 0
kstat.zfs.misc.arcstats.mutex_miss: 0
kstat.zfs.misc.arcstats.evict_skip: 0
kstat.zfs.misc.arcstats.hash_elements: 8348
kstat.zfs.misc.arcstats.hash_elements_max: 8348
kstat.zfs.misc.arcstats.hash_collisions: 193
kstat.zfs.misc.arcstats.hash_chains: 190
kstat.zfs.misc.arcstats.hash_chain_max: 1
kstat.zfs.misc.arcstats.p: 1610612736
kstat.zfs.misc.arcstats.c: 3221225472
kstat.zfs.misc.arcstats.c_min: 3221225472
kstat.zfs.misc.arcstats.c_max: 3221225472
kstat.zfs.misc.arcstats.size: 1077603800
kstat.zfs.misc.arcstats.hdr_size: 1736384
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 0
```


----------



## Savagedlight (Jan 19, 2010)

*second batch of tests.*

Second write test
Writing 1GB to the pool.
`# dd if=/dev/urandom of=./file2 bs=1m count=1024`

```
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 15.163315 secs (70811814 bytes/sec)
```

*During write*
`# zpool iostat 1`

```
capacity     operations    bandwidth
pool         used  avail   read  write   read  write
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0    136      0  17.1M
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0      0      0      0
storage      636G  4.82T      0  1.74K      0   211M
storage      636G  4.82T      0  1.82K      0   232M
storage      636G  4.82T      0  2.02K      0   257M
storage      636G  4.82T      0  1.90K      0   241M
storage      636G  4.82T      0    575      0  62.1M
storage      638G  4.81T      0      0      0      0
storage      638G  4.81T      0      0      0      0
storage      638G  4.81T      0      0      0      0
storage      638G  4.81T      0      0      0      0
storage      638G  4.81T      0      0      0      0
```

The first line is the start of the write. dd had already finished before the first 211M line, so some writes are still being flushed afterwards. (This backlog may be intended behaviour?)

*After write*
[CMD="top"]-SP[/CMD]

```
last pid:  1994;  load averages:  0.04,  0.03,  0.01  up 0+00:38:17  05:04:22
185 processes: 3 running, 162 sleeping, 20 waiting
CPU 0:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 40M Active, 21M Inact, 2281M Wired, 284K Cache, 44M Buf, 5571M Free
Swap: 4096M Total, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root        2 171 ki31     0K    32K CPU0    0  74:32 199.17% idle
   12 root       20 -60    -     0K   320K WAIT    0   0:16  0.00% intr
    0 root       40  -8    0     0K   624K -       1   0:04  0.00% kernel
```

[CMD="sysctl"]kstat.zfs.misc.arcstats[/CMD]

```
kstat.zfs.misc.arcstats.hits: 1518
kstat.zfs.misc.arcstats.misses: 117
kstat.zfs.misc.arcstats.demand_data_hits: 0
kstat.zfs.misc.arcstats.demand_data_misses: 0
kstat.zfs.misc.arcstats.demand_metadata_hits: 1514
kstat.zfs.misc.arcstats.demand_metadata_misses: 107
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 4
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 10
kstat.zfs.misc.arcstats.mru_hits: 323
kstat.zfs.misc.arcstats.mru_ghost_hits: 0
kstat.zfs.misc.arcstats.mfu_hits: 1191
kstat.zfs.misc.arcstats.mfu_ghost_hits: 0
kstat.zfs.misc.arcstats.deleted: 48
kstat.zfs.misc.arcstats.recycle_miss: 0
kstat.zfs.misc.arcstats.mutex_miss: 0
kstat.zfs.misc.arcstats.evict_skip: 0
kstat.zfs.misc.arcstats.hash_elements: 16605
kstat.zfs.misc.arcstats.hash_elements_max: 16605
kstat.zfs.misc.arcstats.hash_collisions: 927
kstat.zfs.misc.arcstats.hash_chains: 883
kstat.zfs.misc.arcstats.hash_chain_max: 3
kstat.zfs.misc.arcstats.p: 2150481920
kstat.zfs.misc.arcstats.c: 3221225472
kstat.zfs.misc.arcstats.c_min: 3221225472
kstat.zfs.misc.arcstats.c_max: 3221225472
kstat.zfs.misc.arcstats.size: 2154261472
kstat.zfs.misc.arcstats.hdr_size: 3453840
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 0
```

Write tests three through seven show no performance degradation (69-72 MB/s).
`top -SP` (after the seventh run)

```
last pid:  2058;  load averages:  0.20,  0.15,  0.06  up 0+00:45:42  05:11:47
186 processes: 3 running, 163 sleeping, 20 waiting
CPU 0:  0.0% user,  0.0% nice,  0.4% system,  0.0% interrupt, 99.6% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 40M Active, 21M Inact, 3288M Wired, 284K Cache, 44M Buf, 4564M Free
Swap: 4096M Total, 4096M Free
```

So far so good.


----------



## Savagedlight (Jan 19, 2010)

*Third batch.*

Copying a file from a UFS2+softupdate FS (/usr/home) to the pool.
The file is 23178715182 bytes (22G). Let's see what happens.

```
15s: Mem: 42M Active, 287M Inact, 4211M Wired, 284K Cache, 828M Buf, 3374M Free
25s: Mem: 42M Active, 1139M Inact, 4128M Wired, 284K Cache, 828M Buf, 2605M Free
35s: Mem: 42M Active, 1932M Inact, 4081M Wired, 284K Cache, 828M Buf, 1860M Free
45s: Mem: 42M Active, 2767M Inact, 4180M Wired, 284K Cache, 828M Buf, 925M Free
```
`zpool iostat` says between 0 and 170MB/s writes.
Later:

```
Mem: 42M Active, 3304M Inact, 4085M Wired, 284K Cache, 828M Buf, 483M Free
```
Performance just dropped to about 60MB/s, according to `zpool iostat`.

Even later: 

```
Mem: 56M Active, 3506M Inact, 4025M Wired, 116M Cache, 828M Buf, 211M Free
```
`zpool iostat` shows between 10M and 50M of writes when there are any; gstat still reports "healthy" busy levels for all disks (below 70%).

*The problem*
Even later, still copying the same file.
`zpool iostat`:

```
capacity     operations    bandwidth
pool         used  avail   read  write   read  write
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0     50      0  5.77M
storage      683G  4.77T      0      1      0   255K
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0      1      0   255K
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0     45      0  5.73M
storage      683G  4.77T      0     73      0  9.22M
storage      683G  4.77T      1    283  2.49K  26.3M
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0     50      0  5.77M
storage      683G  4.77T      0    362      0  41.1M
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      2      2  3.99K  1.50K
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0     17      0  25.4K
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0     20      0  29.9K
storage      683G  4.77T      0      0      0    510
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0     16      0  27.9K
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0     46      0  5.80M
storage      683G  4.77T      0    229      0  28.4M
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0     50      0  5.87M
storage      683G  4.77T      0      0      0      0
storage      683G  4.77T      0    100      0  11.5M
```
`gstat`

```
dT: 1.003s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1     91     91  11506    1.4      0      0    0.0   13.1| ad8
    1     91     91  11506    1.5      0      0    0.0   13.2| ad8s1
    0      0      0      0    0.0      0      0    0.0    0.0| ad8s1a
    0      0      0      0    0.0      0      0    0.0    0.0| ad8s1b
    0      0      0      0    0.0      0      0    0.0    0.0| ad8s1d
    0      0      0      0    0.0      0      0    0.0    0.0| ad8s1e
    1     91     91  11506    1.5      0      0    0.0   13.3| ad8s1f
    1     25      0      0    0.0     24   1963    0.9  184.6| ad10
    0    106      0      0    0.0    106   8761    1.1   12.0| ad12
    0    107      0      0    0.0    107   8760    1.1   12.2| ad14
    0    109      0      0    0.0    109   8754    1.0   11.1| ad16
```

`top`:

```
last pid:  2161;  load averages:  0.25,  0.33,  0.27  up 0+01:02:50  05:28:55
187 processes: 3 running, 164 sleeping, 20 waiting
CPU 0:  0.0% user,  0.0% nice,  0.0% system,  0.4% interrupt, 99.6% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 56M Active, 3606M Inact, 4029M Wired, 38M Cache, 828M Buf, 186M Free
Swap: 4096M Total, 48K Used, 4096M Free
```
The moment the "Mem: Free" status in top went below 200M, there was a severe performance hit.
Okay, this seems to be pretty much exactly what I observed earlier.
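To catch that threshold without staring at top, one could poll the kernel's free-page counter. A minimal sketch (assumptions on my part: the FreeBSD sysctl names `vm.stats.vm.v_free_count` and `hw.pagesize`, and the ~200M danger zone observed above):

```
#!/bin/sh
# Convert a free-page count into MB, so a loop or cron job can log
# when free memory drops below the ~200M danger zone.
pages_to_mb() {
    # $1 = number of free pages, $2 = page size in bytes
    echo $(( $1 * $2 / 1048576 ))
}

# On the live system these values would come from sysctl:
#   free_pages=$(sysctl -n vm.stats.vm.v_free_count)
#   page_size=$(sysctl -n hw.pagesize)
# Illustrative values: 51200 pages of 4KB = 200MB
pages_to_mb 51200 4096
```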

`sysctl kstat.zfs.misc.arcstats`

```
kstat.zfs.misc.arcstats.hits: 49109
kstat.zfs.misc.arcstats.misses: 1769
kstat.zfs.misc.arcstats.demand_data_hits: 22112
kstat.zfs.misc.arcstats.demand_data_misses: 1
kstat.zfs.misc.arcstats.demand_metadata_hits: 6058
kstat.zfs.misc.arcstats.demand_metadata_misses: 1335
kstat.zfs.misc.arcstats.prefetch_data_hits: 20310
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 629
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 433
kstat.zfs.misc.arcstats.mru_hits: 26543
kstat.zfs.misc.arcstats.mru_ghost_hits: 1073
kstat.zfs.misc.arcstats.mfu_hits: 1627
kstat.zfs.misc.arcstats.mfu_ghost_hits: 27
kstat.zfs.misc.arcstats.deleted: 294263
kstat.zfs.misc.arcstats.recycle_miss: 3619
kstat.zfs.misc.arcstats.mutex_miss: 0
kstat.zfs.misc.arcstats.evict_skip: 702
kstat.zfs.misc.arcstats.hash_elements: 28690
kstat.zfs.misc.arcstats.hash_elements_max: 48274
kstat.zfs.misc.arcstats.hash_collisions: 63018
kstat.zfs.misc.arcstats.hash_chains: 2737
kstat.zfs.misc.arcstats.hash_chain_max: 5
kstat.zfs.misc.arcstats.p: 3199266304
kstat.zfs.misc.arcstats.c: 3221225472
kstat.zfs.misc.arcstats.c_min: 3221225472
kstat.zfs.misc.arcstats.c_max: 3221225472
kstat.zfs.misc.arcstats.size: 3154617112
kstat.zfs.misc.arcstats.hdr_size: 6015360
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 1103
```


----------



## Savagedlight (Jan 19, 2010)

*And the final bunch of data*

Now to rerun the previous tests.
Test 9
`dd if=/dev/urandom of=./file9 bs=1m count=1024`

```
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 26.445673 secs (40601796 bytes/sec)
```
It went down from ~70MB/s to ~40MB/s!

`sysctl kstat.zfs.misc.arcstats`

```
kstat.zfs.misc.arcstats.hits: 50068
kstat.zfs.misc.arcstats.misses: 1772
kstat.zfs.misc.arcstats.demand_data_hits: 22496
kstat.zfs.misc.arcstats.demand_data_misses: 1
kstat.zfs.misc.arcstats.demand_metadata_hits: 6257
kstat.zfs.misc.arcstats.demand_metadata_misses: 1338
kstat.zfs.misc.arcstats.prefetch_data_hits: 20686
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 629
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 433
kstat.zfs.misc.arcstats.mru_hits: 27114
kstat.zfs.misc.arcstats.mru_ghost_hits: 1073
kstat.zfs.misc.arcstats.mfu_hits: 1639
kstat.zfs.misc.arcstats.mfu_ghost_hits: 30
kstat.zfs.misc.arcstats.deleted: 294281
kstat.zfs.misc.arcstats.recycle_miss: 3643
kstat.zfs.misc.arcstats.mutex_miss: 0
kstat.zfs.misc.arcstats.evict_skip: 702
kstat.zfs.misc.arcstats.hash_elements: 38480
kstat.zfs.misc.arcstats.hash_elements_max: 48274
kstat.zfs.misc.arcstats.hash_collisions: 65447
kstat.zfs.misc.arcstats.hash_chains: 4725
kstat.zfs.misc.arcstats.hash_chain_max: 5
kstat.zfs.misc.arcstats.p: 3199118848
kstat.zfs.misc.arcstats.c: 3221225472
kstat.zfs.misc.arcstats.c_min: 3221225472
kstat.zfs.misc.arcstats.c_max: 3221225472
kstat.zfs.misc.arcstats.size: 3149963720
kstat.zfs.misc.arcstats.hdr_size: 8003840
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 1169
```

Copying a bunch of files from the UFS2 filesystem to the ZFS filesystem, all of them LARGE files.
Performance is good for a while (zpool iostat claims ~40MB/s), until suddenly this happens:
`zpool iostat 1`

```
capacity     operations    bandwidth
pool         used  avail   read  write   read  write
storage      670G  4.78T      0    439      0  44.6M
storage      670G  4.78T      0      0      0      0
storage      670G  4.78T      0      0      0      0
storage      670G  4.78T      0      0      0      0
storage      671G  4.78T      0      3      0   464K
storage      671G  4.78T      0     50      0  5.75M
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0    435      0  44.6M
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0     50      0  5.77M
storage      671G  4.78T      0    276      0  34.2M
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0    158      0  10.3M
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0    288      0  35.5M
storage      671G  4.78T      0    199      0  15.1M
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0      0      0      0
storage      671G  4.78T      0     51      0  5.77M
storage      671G  4.78T      0    440      0  45.1M
```


`sysctl kstat.zfs.misc.arcstats`

```
kstat.zfs.misc.arcstats.hits: 246502
kstat.zfs.misc.arcstats.misses: 12076
kstat.zfs.misc.arcstats.demand_data_hits: 26592
kstat.zfs.misc.arcstats.demand_data_misses: 7
kstat.zfs.misc.arcstats.demand_metadata_hits: 193843
kstat.zfs.misc.arcstats.demand_metadata_misses: 7060
kstat.zfs.misc.arcstats.prefetch_data_hits: 24321
kstat.zfs.misc.arcstats.prefetch_data_misses: 2
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 1746
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 5007
kstat.zfs.misc.arcstats.mru_hits: 141208
kstat.zfs.misc.arcstats.mru_ghost_hits: 2008
kstat.zfs.misc.arcstats.mfu_hits: 79334
kstat.zfs.misc.arcstats.mfu_ghost_hits: 32
kstat.zfs.misc.arcstats.deleted: 365410
kstat.zfs.misc.arcstats.recycle_miss: 3737
kstat.zfs.misc.arcstats.mutex_miss: 0
kstat.zfs.misc.arcstats.evict_skip: 702
kstat.zfs.misc.arcstats.hash_elements: 33907
kstat.zfs.misc.arcstats.hash_elements_max: 48274
kstat.zfs.misc.arcstats.hash_collisions: 79293
kstat.zfs.misc.arcstats.hash_chains: 3648
kstat.zfs.misc.arcstats.hash_chain_max: 5
kstat.zfs.misc.arcstats.p: 3214337024
kstat.zfs.misc.arcstats.c: 3221225472
kstat.zfs.misc.arcstats.c_min: 3221225472
kstat.zfs.misc.arcstats.c_max: 3221225472
kstat.zfs.misc.arcstats.size: 1900052248
kstat.zfs.misc.arcstats.hdr_size: 7131904
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 1535
```

`top`:

```
last pid:  2307;  load averages:  0.32,  0.28,  0.17  up 0+01:23:49  05:49:54
186 processes: 3 running, 163 sleeping, 20 waiting
CPU 0:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
CPU 1:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 58M Active, 4637M Inact, 2894M Wired, 102M Cache, 828M Buf, 222M Free
Swap: 4096M Total, 40K Used, 4096M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
   11 root        2 171 ki31     0K    32K RUN     0 157:22 198.10% idle
    0 root       40  -8    0     0K   624K -       1   1:43  0.00% kernel
   12 root       20 -60    -     0K   320K WAIT    0   1:01  0.00% intr
```

After some time, performance picks up again, into the 40-50MB/s range. Still nowhere near where it was in the beginning, though.


Whew, that ended up being a lot of data.
I really hope this information helps pin down where the write performance degradation comes from.


----------



## Matty (Jan 19, 2010)

The problem is in maxvnodes. Yesterday I set maxvnodes to 10000 (down from 800000), and even after copying a few gigs from UFS to ZFS, active/inactive memory stayed below 500MB, which leaves enough free memory for ZFS.

If free memory goes below a certain level, then performance drops as well.
As far as I can see, vnode caching goes into active/inactive memory, while the ZFS ARC cache goes into wired. Correct me if I'm wrong.


----------



## Cellsplicer (Jan 19, 2010)

Today I added the following directive to my /boot/loader.conf:


```
vfs.zfs.txg.synctime="1"
```

It seems to have made a difference, and write throttles occur less often. The pool is able to sustain writes of over 80MB/s, versus the 20-30MB/s I was getting with the default sync time of 5 seconds. Of course, this is still down from the 200MB/s+ that I should be getting, but it's a good start.

My theory is that by lowering the sync time, ZFS starts writing to the disk straight away, before the ARC fills up. Write throttles seem to occur when the ARC fills up and it needs to write to the disks. If the drives can't keep up during an I/O intensive operation then a write throttle will occur and applications will have to wait until ZFS can sync the disks.


----------



## Savagedlight (Jan 19, 2010)

No performance gain from doing that here - write speed actually plummeted to 30-40MB/s right away.

I notice I forgot to mention I'm running FBSD 8.0-RELEASE-p2 AMD64.


----------



## garrettmoore (Jan 19, 2010)

*@Savagedlight:* Nice work. Thanks for your data. I see the exact same patterns as you do.

*@Matty:* I'm not sure about your vnodes theory. Whether my performance is good or bad, my vnode count stays around 12,000.


*Memory:*
I've been watching my memory usage and I have no idea what is consuming memory as 'Active'.

Last night I had around 6500MB 'Active' again, 1500MB Wired, no inact, ~30MB buf, no free, and ~100MB swap used. My performance copying ZFS->ZFS was again slow (<1MB/s). I tried killing rTorrent and no significant amount of memory was reclaimed - maybe 100MB. `ps aux` showed no processes using any significant amount of memory, and I was definitely nowhere near 6500MB usage.

I tried running a perl one-liner to hog a bunch of memory (`perl -e '$x="x"x3000000000'`), and almost all of the Active memory was IMMEDIATELY marked as Free, and my performance was excellent again.

I'm not sure what in userland could be causing the issue. The only things I've installed are rTorrent, lighttpd, samba, smartmontools, vim, bash, Python, Perl, and SABNZBd. There is nothing that *should* be consuming any serious amount of memory.


*Drives:*
One thing that came up on FreeBSD-stable is that *WD Green drives have a serious issue with their power management under FreeBSD/Linux*. The drive heads park after 8 seconds of inactivity, but BSD/Linux sync to disk less frequently than this, so the drives are continually parking and unparking.

The drives are rated for ~300,000 load cycles; after 2000 power on hours my drives are already over 90,000 load cycles. I used a tool released by WD to change the idle timeout from 8 seconds to 5 minutes (the maximum) and haven't seen my load cycle count increase yet.

Use smartctl to check the Load_Cycle_Count attribute on your drives; if it is unusually high, get the 'wdidle3' tool and boot into a DOS environment to configure your drive timeouts.
Details: http://www.silentpcreview.com/Terabyte_Drive_Fix
wdidle3: http://home.arcor.de/ghostadmin/wdidle3_1_00.zip

Doing this does not seem to have helped with my performance issues, but it ought to help drive longevity.
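The smartctl check mentioned above can be scripted. A sketch (assumptions: the drive shows up as /dev/ad10, and smartctl's attribute table puts the raw value in the last column; the sample line below is illustrative, not from a real drive):

```
#!/bin/sh
# Extract the raw Load_Cycle_Count value from smartctl's attribute table.
# On the live system, this would be:
#   smartctl -A /dev/ad10 | awk '/Load_Cycle_Count/ {print $NF}'
# Here we parse a sample line of the same shape:
sample='193 Load_Cycle_Count 0x0032 170 170 000 Old_age Always - 91345'
echo "$sample" | awk '/Load_Cycle_Count/ {print $NF}'
```

A count climbing by tens of thousands per thousand power-on hours, as described above, is the signature of the parking problem.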


----------



## Matty (Jan 19, 2010)

garrettmoore said:

> *Memory:*
> ...

*Which of these programs is reading or writing on UFS?*


----------



## garrettmoore (Jan 19, 2010)

All of the apps are installed to a UFS gmirror. Configuration files are stored on UFS.

All data (rTorrent torrents, folders shared via samba, download locations for sabnzbd, etc) is on ZFS.


----------



## Savagedlight (Jan 20, 2010)

I realize it's a dirty workaround, but I've set the perl one-liner to run every five minutes. It really does make ZFS's performance improve for a while. By improve, I mean "jump up from about 20MB/s writes to about 230MB/s writes".
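For reference, a crontab entry for that workaround might look like this (assumptions on my part: perl lives at /usr/bin/perl, and the 3GB allocation size from the earlier post):

```
# min   hour  dom  mon  dow  command
*/5     *     *    *    *    /usr/bin/perl -e '$x="x"x3000000000'
```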

It seems to me like there is *something* related to the filesystems (a cache, perhaps?) that leaves memory marked as Inact when it no longer needs that particular data - but the system simply won't *reuse* Inact memory, even when there's no 'free' memory left.

Right now, while copying from UFS2+SoftUpdates to ZFS:

```
Mem: 14M Active, 2487M Inact, 5155M Wired, 202M Cache, 828M Buf, 56M Free
```
(yes, I bumped the ARC size to 4G and kernel size to start at 5G)


----------



## Cellsplicer (Jan 20, 2010)

I can confirm that the perl trick works. After running the following:


```
perl -e '$x="x"x3000000000'
```

I am able to write at over 200MB/s for a short while.


----------



## Savagedlight (Jan 20, 2010)

*Slightly off-topic*:


			
garrettmoore said:

> *Drives:*
> I used a tool released by WD to change the idle timeout from 8 seconds to 5 minutes (the maximum) and haven't seen my load cycle count increase yet.



The documentation for the tool states you can use the /D switch to disable the timer. I assume this means it won't ever enter idle mode. Is this worth it, or would that cause longevity issues of its own?

On-topic again:
*Custom kernel*:
In the hassle of trying to make ZFS work properly, I forgot I had compiled a custom kernel. Said kernel was built with these make.conf lines:

```
CPUTYPE=core2
COPTFLAGS= -O2 -pipe
```

It also had these customizations:

```
#makeoptions     DEBUG=-g                # Build kernel without gdb(1) debug symbols
options         QUOTA
options         DEVICE_POLLING
```

I've recompiled the kernel w/o the COPTFLAGS, and reenabled the default debug options. Going to see if this changes anything.

*Memory*
Is there any way to tell FreeBSD's memory manager to flush out inactive memory, other than forcing it as described earlier in the thread?


----------



## garrettmoore (Jan 20, 2010)

Savagedlight said:

> *Slightly off-topic*:
> The documentation for the tool states you can use the /D switch to disable the timer. I assume this means it won't ever enter idle mode. Is this worth it, or would that cause longevity issues of its own?


People on the silentpcreview forums claimed that sometimes the disable switch would cause the timer to just stay at its default setting of 8 seconds instead of actually disabling it. I set the idle time to 5 minutes over 24 hours ago, and the drives haven't parked since.


----------



## Matty (Jan 20, 2010)

Savagedlight said:

> I realize it's a dirty workaround, but I've set the perl one-liner to run every five minutes. It really does make ZFS' performance improve for a while. By improve, I mean "jump up from about 20MB/s writes to about 230MB/s writes".
> 
> It seems to me like there is *something* in relation to the filesystems (cache perhaps?) which causes memory to be marked as Inact when it doesn't really need that particular data any more - but simply doesn't want to *use* Inact memory, even when there's no 'free' memory.
> 
> ...


You could try setting maxvnodes to a much lower level.
On my 4GB machine, a 2800M ARC cache leaves around 1GB free, and with maxvnodes set to 12000 a maximum of 500MB gets used by UFS, which leaves enough room for proper ZFS performance.
Worked for me.




> 11.13.3.1 kern.maxvnodes
> 
> A vnode is the internal representation of a file or directory. So increasing the number of vnodes available to the operating system cuts down on disk I/O.
> Normally this is handled by the operating system and does not need to be changed. In some cases where disk I/O is a bottleneck and the system is running out of vnodes, this setting will need to be increased.
> ...


----------



## bb (Jan 20, 2010)

The one thing that's for sure is that zfs on FreeBSD competes for your RAM with other filesystems. 

Meaning: whenever top shows memory as being "Inactive", it effectively means that zfs cannot use this memory for its caches.

So what happens when you copy a large amount of data from a ufs/ext2/msdos filesystem to a zfs filesystem is that more and more memory becomes "Inactive", while the zfs caches are shrunk to a minimum. Even after the copy job finishes, the memory stays "Inactive". You have to unmount the ufs/ext2/msdos filesystem to make the "Inactive" memory usable for zfs again.

That doesn't mean there are no other problems. I don't use zfs on a permanently running machine, and I don't use raidz. But check with top whether you have large amounts of "Inactive" memory, and if you do, unmount the non-zfs filesystems (if you can) and see if performance returns to normal.


----------



## Savagedlight (Jan 20, 2010)

Seems like Matty's last post did the trick for me.
ZFS performance seems to be good after adding these lines to /etc/sysctl.conf:

```
kern.maxvnodes=10000
kern.minvnodes=1000
```
(I figured it made sense to change minvnodes to something less than maxvnodes, since it defaulted to 25k for me)
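Since /etc/sysctl.conf is only read at boot, the same values can also be applied to the running system. A sketch, using the values from this post:

```
# Apply the vnode limits at runtime, no reboot needed:
sysctl kern.maxvnodes=10000
sysctl kern.minvnodes=1000
# Then watch how many vnodes are actually in use:
sysctl vfs.numvnodes
```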


----------



## garrettmoore (Jan 23, 2010)

Lowering my maxvnodes doesn't seem to help my Active memory usage.


----------



## Dorlas (Jan 24, 2010)

garrettmoore said:

> Hi,
> 
> I'm having problems with ZFS performance. When my system comes up, read/write speeds are excellent (testing with dd if=/dev/zero of=/tank/bigfile and dd if=/tank/bigfile of=/dev/null); I get at least 100MB/s on both reads and writes, and I'm happy with that.
> 
> ...



Welcome!

I encountered this problem too - the culprit was rtorrent (it eventually marks all RAM as Active, leaving no memory for ZFS - hence the slowdown).

Translate and read this link: http://www.opennet.ru/openforum/vsluhforumID1/87116.html

And use this patch for rtorrent: http://www.opennet.ru/openforum/vsluhforumID1/87116.html # 14

Good Luck!


----------



## wonslung (Jan 26, 2010)

garrettmoore said:

> I'm not sure what in userland could be causing the issue. The only things I've installed are rTorrent, lighttpd, samba, smartmontools, vim, bash, Python, Perl, and SABNZBd. There is nothing that *should* be consuming any serious amount of memory.



One thing I can say for sure is that if you do not LIMIT the memory rtorrent has available to it on a ZFS system, it will not work well.

I had this EXACT same issue on a server; by limiting both the ARC and rtorrent's memory I was able to get consistent performance.


----------



## wonslung (Jan 26, 2010)

Dorlas said:

> Welcome!
> 
> Encountered this problem - the culprit was a program rtorrent (it eventually all RAM mark as Active - and had no memory for ZFS - hence the slowdown).
> 
> ...



You shouldn't have to patch rtorrent.
It's QUITE simple to just set a maximum amount of memory for rtorrent with a line like this:


```
max_memory_usage = 1024M
```


----------



## phoenix (Jan 27, 2010)

Not sure if this has been mentioned yet, but lighttpd with sendfile support enabled leads to horrible performance with ZFS.

And rtorrent with pre-allocate space enabled will drag down a ZFS pool.

Disable those two options (requires re-compiling lighttpd) and see how things go.


----------



## Matty (Jan 27, 2010)

phoenix said:

> And rtorrent with pre-allocate space enabled will drag down a ZFS pool.


How does ZFS handle torrents anyway? I get poor read performance when reading from torrent files compared to a file that was uploaded via e.g. samba.

Will all the small bits of a torrent be spread all over the hard disk, and is this causing the poorer performance?


----------



## wonslung (Jan 27, 2010)

Matty said:

> How does zfs handel torrents anyway? because I got poor read performance when reading from a torrent files compared to a file that was uploaded by eq samba?
> 
> will all the small bits of a torrent be spread all over the harddisk and is this causing the poorer performance?



It totally depends on hardware and settings.

When I was running FreeBSD I NEVER had a problem with ZFS and rtorrent, and I am a HEAVY user, but I also had a lot of I/O to go around (12 7200 RPM drives).

Now I'm on OpenSolaris and I get even better performance, but I really miss a lot of the features in FreeBSD... The only reason I switched was dedup; it was a really good feature for what I needed.

Also, OpenSolaris allows me to run Xen, which is QUITE cool, but I still use FreeBSD and ZFS for a LOT of rtorrent + rutorrent seedboxes, so I can ASSURE you that if you set things up correctly you will not have problems.


Also, ZFS tends to cache writes in the ARC and flush to disk every 5-30 seconds, so torrents are no problem in THAT regard.


----------



## xmanpsk (Jan 27, 2010)

wonslung said:

> you shouldnt' have to patch rtorrent.
> It's QUITE simple to just set rtorrent to a maximum amount of memory with a line line this:
> 
> 
> ...



No, that didn't help me. I had the same problem with Active memory as *garrettmoore*. Before patching I had 4-5GB Active memory with swapping (6GB total installed); after patching, only 250-300MB and great ZFS performance!

Here's instructions for rtorrent-0.8.6:
1). Make libtorrent backup (if something goes wrong):
pkg_create -b libtorrent-0.12.6
2). Extract libtorrent source:
cd /usr/ports/net-p2p/libtorrent
make clean extract
3). Open in your favourite editor file /usr/ports/net-p2p/libtorrent/work/libtorrent-0.12.6/src/data/memory_chunk.cc and find MemoryChunk::unmap() function definition.
4). Make changes:

```
void
  MemoryChunk::unmap() {
    if (!is_valid())
      throw internal_error("MemoryChunk::unmap() called on an invalid object");

+   if (msync(m_ptr, m_end - m_ptr,MS_INVALIDATE) != 0)
+       throw internal_error("MemoryChunk::unmap() - msync() system call failed");
+
    if (munmap(m_ptr, m_end - m_ptr) != 0)
      throw internal_error("MemoryChunk::unmap() system call failed: " + std::string(rak::error_number::current().c_str()));
  }
```
5) Build and install the patched libtorrent:
`cd /usr/ports/net-p2p/libtorrent`
`make && make deinstall reinstall`

Thanks *Dorlas* for the link.


----------



## tobiastheviking (Jan 28, 2010)

I am getting the problem with "Active" memory, but not from rtorrent (which I do have, but there is no correlation between its usage and memory usage).

During testing I have also disabled rtorrent and SABnzbd+ and just done a cp from a UFS disk to a ZFS disk (or from ZFS to ZFS, for that matter), and gotten the same problem.

As it is right now, both rtorrent and SABnzbd+ are running (on a UFS drive), and I had <100M Active. Then I started an rsync of some 5+ GB of data; Active memory is now 1160M. Neither rtorrent nor SABnzbd+ is actually doing anything right now, they are just running.


----------



## Matty (Jan 29, 2010)

tobiastheviking said:

> I am getting the problem with "Active" memory, but not from rtorrent (which I do have, but there is no correlation between its usage and memory usage).
> 
> During testing I have also disabled rtorrent and SABnzbd+ and just done a cp from a UFS disk to a ZFS disk (or from ZFS to ZFS, for that matter), and gotten the same problem.
> 
> As it is right now, both rtorrent and SABnzbd+ are running (on a UFS drive), and I had <100M Active. Then I started an rsync of some 5+ GB of data; Active memory is now 1160M. Neither rtorrent nor SABnzbd+ is actually doing anything right now, they are just running.


try http://forums.freebsd.org/showpost.php?p=63019&postcount=46


----------



## tobiastheviking (Feb 16, 2010)

Matty said:

> try http://forums.freebsd.org/showpost.php?p=63019&postcount=46


Done and done.

Well, I currently have an uptime of 21 days, which is bound to be a record on this system.

I still have 1132M marked as Active, though I can't see what could possibly be using it. But for now, it looks usable.


----------



## Matty (Feb 16, 2010)

tobiastheviking said:

> Done and done.
> 
> Well, I currently have an uptime of 21 days, which is bound to be a record on this system.
> 
> I still have 1132M marked as Active, though I can't see what could possibly be using it. But for now, it looks usable.



Samba with sendfile sharing a ZFS filesystem? Just a guess.


----------



## garrettmoore (Feb 17, 2010)

I'm still having issues. I have 8GB of ram now and it's been good for a while, but I'm downloading a 52GB torrent and my write performance keeps going to hell. Running that perl command to flush memory works, but is both tedious and sloppy. 

Has changing min/max vnodes been the fix for everyone else? When I tried lowering my maxvnodes to such a small amount my performance seemed to be much, much worse.


----------



## Matty (Feb 17, 2010)

garrettmoore said:

> I'm still having issues. I have 8GB of ram now and it's been good for a while, but I'm downloading a 52GB torrent and my write performance keeps going to hell. Running that perl command to flush memory works, but is both tedious and sloppy.
> 
> Has changing min/max vnodes been the fix for everyone else? When I tried lowering my maxvnodes to such a small amount my performance seemed to be much, much worse.



have you tried another torrent client?


----------



## xmanpsk (Feb 18, 2010)

garrettmoore said:

> I'm still having issues. I have 8GB of ram now and it's been good for a while, but I'm downloading a 52GB torrent and my write performance keeps going to hell. Running that perl command to flush memory works, but is both tedious and sloppy.
> 
> Has changing min/max vnodes been the fix for everyone else? When I tried lowering my maxvnodes to such a small amount my performance seemed to be much, much worse.



Did you try rtorrent patch?


----------



## wonslung (Feb 18, 2010)

xmanpsk said:

> Did you try rtorrent patch?



I can vouch for the patch. I was a doubter myself, having never really noticed the issue (I was on a high-RAM system with a lot of fast drives), but when I installed on a 2 GB RAM machine it was clearly evident, even on UFS, that rtorrent eventually fills all the RAM.

This patch fixes it. I've mailed it to the maintainer of rtorrent; I think it should be included.


----------



## tobiastheviking (Feb 21, 2010)

Matty said:

> Samba with sendfile and sharing a zfs filesystem? just a guess



Nope, NFS only. But the problem happens even if I boot into single-user mode; without NFS I get the same problem. Samba isn't even installed.


----------



## tobiastheviking (Feb 21, 2010)

wonslung said:

> I can vouch for the patch. I was a doubter myself, having never really noticed the issue (I was on a high-RAM system with a lot of fast drives), but when I installed on a 2 GB RAM machine it was clearly evident, even on UFS, that rtorrent eventually fills all the RAM.
> 
> This patch fixes it. I've mailed it to the maintainer of rtorrent; I think it should be included.



Is there a bug report for this patch?


----------



## wonslung (Feb 22, 2010)

I opened a bug report with rtorrent's people, but they insist the problem lies with FreeBSD's code, not rtorrent's.

The fact of the matter is, I do not know which is true. All I know is that when I use this patch I notice no adverse effects on my rtorrent experience, only the benefit of my memory not being slowly stuck in ACTIVE.


----------



## dmdx86 (Feb 23, 2010)

Looks like the patch to libtorrent was put into the ports tree. Thanks guys!


----------



## Jago (Feb 24, 2010)

wonslung said:

> I opened a bug report with rtorrent's people, but they insist the problem lies with FreeBSD's code, not rtorrent's.


The hilarious thing is that they insist rtorrent is basically perfect and is just good at uncovering filesystem issues in different operating systems. This was a developer comment on a single rtorrent bug report confirmed by people using Linux, FreeBSD, Mac OS X and OpenSolaris across multiple versions of reiserfs, ext2, UFS and ZFS. The arrogance of the rtorrent developers is *mindboggling*.


----------



## wonslung (Feb 24, 2010)

Jago said:

> The hilarious thing is that they insist rtorrent is basically perfect and is just good at uncovering filesystem issues in different operating systems. This was a developer comment on a single rtorrent bug report confirmed by people using Linux, FreeBSD, Mac OS X and OpenSolaris across multiple versions of reiserfs, ext2, UFS and ZFS. The arrogance of the rtorrent developers is *mindboggling*.



I know. While I love rtorrent and think it is currently the best torrent client out there, it has OBVIOUS issues on FreeBSD, not because of FreeBSD but because of how its kernel is different; yet whenever you bring these issues up (this memory issue is one instance; another REALLY good example is slow hashing) they blame it on FreeBSD.

I think it's more likely an issue due to a poor understanding of the FreeBSD kernel. Of course, I am not really a developer, so it's easy to sit down here and make judgements; in all honesty, I'm not qualified to make them. But what I CAN do is see that this patch solves the problem, with no degradation of rtorrent's seeding/downloading ability.

I'd like to see a FreeBSD dev create a patch to make it hash faster on BSD, though.


----------



## garrettmoore (Mar 17, 2010)

I tried the patch supplied for libtorrent and I still have all of my memory eventually being marked as Inact when I download torrents. How can I check to make sure rtorrent is definitely using the version of libtorrent I modified and reinstalled?


----------



## wonslung (Mar 17, 2010)

garrettmoore said:

> I tried the patch supplied for libtorrent and I still have all of my memory eventually being marked as Inact when I download torrents. How can I check to make sure rtorrent is definitely using the version of libtorrent I modified and reinstalled?



the latest libtorrent from ports includes the patch
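If you want to be sure your rtorrent binary is actually using the new library, a rough check (assuming rtorrent links libtorrent dynamically) is:

```
# pkg_info -Ix libtorrent
# ldd `which rtorrent` | grep torrent
```

The first shows which libtorrent package is installed; the second shows which shared library the binary resolves at run time.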


----------



## garrettmoore (Mar 18, 2010)

How do I make sure I fully upgrade my libtorrent, etc.? I'm running 8.0. I haven't used ports much.


----------



## DutchDaemon (Mar 18, 2010)

`portmaster -Rf rtorrent\*` or `portupgrade -Rf rtorrent\*`.


----------



## chrcol (Mar 20, 2010)

I have been doing dozens of tests on a two-drive mirrored ZFS setup, along with someone else, adjusting many ZFS variables.

Conclusions so far on this setup.

The optimal queue settings seem to be:

```
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 8
```

The default max of 35 seems to flood the drives and causes the system to lag too much when they are under heavy load, whilst 1 is definitely too low to get full speed; however, 1 would be a good setting if there is very little multitasking.
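To make values like these persist, a sketch for /boot/loader.conf (on 8.x these are loader tunables, so a reboot is needed for them to take effect):

```
vfs.zfs.vdev.min_pending="4"
vfs.zfs.vdev.max_pending="8"
```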

Prefetch is a tough one: if it's on, it almost triples sequential read speed; however, it increases latency noticeably and has a noticeable effect on system response when the disks are under stress.

Compression: the default gzip and gzip-9 both cause the server to reboot when stressing the drives; not sure yet whether this is software or hardware. Also, even under light use gzip causes system lag (mkdir, ls, etc.). However, lzjb seems quite impressive and yields very good performance.

Disabling the data cache (caching metadata only) really improves disk latency; however, throughput drops sharply.

Reducing the ARC size only hurts performance in our case; however, we haven't done much testing on this one yet.

Disabling checksums gives about 10% on read speed and about 4% on write speed. This was done really just out of curiosity.

The hardware is 12GB of RAM and a quad-core i7.


----------



## garrettmoore (Jul 15, 2011)

Just as an update --

I am no longer running any torrent apps on my system. All I run now is a few python apps (SickBeard / CouchPotato / SABNZBD), Apache22 serving some static html files, and samba.

I am still having the same performance issues as originally brought up in this thread. Performance is good if I force flush everything out of memory, and then degrades again over time.

I wanted to check to see if anyone had any new suggestions, ideas, or had made some headway in figuring out some tuning options.


----------



## Sebulon (Jul 15, 2011)

Hi,

I have personally had issues with WD Green drives, which seem to be the one thing in common here but have never been mentioned. My issues were never with performance, though I only used them as a replication target, never as primary storage. Please show the output of:

```
# zdb <pool>
```

Can't make any promises, but it's worth ruling out. Also worth trying STABLE, now that it's got ZFS v28. My sequentials, like scrubs and resilvers, have gone up dramatically; 4x. My primary pool is 8 Samsung Spinpoint F3 drives in raidz2, on a Supermicro X7SBE with 8GB RAM and a 2.13GHz Core 2 Duo. I am now scrubbing at 300MB/s and can shuffle data over NFS async at 100MB/s and Samba async at 100MB/s.

/boot/loader.conf

```
ahci_load="YES"
aio_load="YES"
zfs_load="YES"
vfs.zfs.prefetch_disable="1"
```

/usr/local/etc/smb.conf

```
socket options = TCP_NODELAY SO_SNDBUF=131072 SO_RCVBUF=131072
use sendfile = no
min receivefile size = 16384
aio read size = 16384
aio write size = 16384
aio write behind = yes
```
Awesome Samba tuning, in case anyone's interested.

/Sebulon


----------



## garrettmoore (Jul 16, 2011)

zdb is still running, I'll edit the rest of the results in later. It is taking forever.


```
# zdb tank
    version=13
    name='tank'
    state=0
    txg=2002316
    pool_guid=15631058209680076792
    hostid=1304739570
    hostname='leviathan'
    vdev_tree
        type='root'
        id=0
        guid=15631058209680076792
        children[0]
                type='raidz'
                id=0
                guid=14185904529334632668
                nparity=1
                metaslab_array=23
                metaslab_shift=36
                ashift=9
                asize=12002376286208
                is_log=0
                children[0]
                        type='disk'
                        id=0
                        guid=15290683616584576164
                        path='/dev/da0'
                        whole_disk=0
                        DTL=35
                children[1]
                        type='disk'
                        id=1
                        guid=8251901779817056534
                        path='/dev/da1'
                        whole_disk=0
                        DTL=34
                children[2]
                        type='disk'
                        id=2
                        guid=9617199221839498887
                        path='/dev/da2'
                        whole_disk=0
                        DTL=33
                children[3]
                        type='disk'
                        id=3
                        guid=11494989113403118025
                        path='/dev/da3'
                        whole_disk=0
                        DTL=94
                children[4]
                        type='disk'
                        id=4
                        guid=10053854906903946266
                        path='/dev/da4'
                        whole_disk=0
                        DTL=31
                children[5]
                        type='disk'
                        id=5
                        guid=2928242912600629893
                        path='/dev/da5'
                        whole_disk=0
                        DTL=87
                children[6]
                        type='disk'
                        id=6
                        guid=13488841482098780283
                        path='/dev/da6'
                        whole_disk=0
                        DTL=27
                children[7]
                        type='disk'
                        id=7
                        guid=668559818837929671
                        path='/dev/da7'
                        whole_disk=0
                        DTL=128
Uberblock

        magic = 0000000000bab10c
        version = 13
        txg = 2210544
        guid_sum = 9377515222552487103
        timestamp = 1310829373 UTC = Sat Jul 16 11:16:13 2011

Dataset mos [META], ID 0, cr_txg 4, 46.7M, 131 objects
Dataset tank [ZPL], ID 16, cr_txg 1, 4.43T, 78621 objects
```

_I am actually considering replacing all 8 drives with different ones. I have had way too many issues with these and I'm honestly fed up. A bunch of them also have a really high load cycle count and will probably need to be sent in under warranty anyway (I've already had to RMA 4 of the 8). I'm in Canada so I have limited drive selection - mainly ncix.com, newegg.ca, and canadacomputers.com. I can't find the exact drives you mention, but I could get *WD Caviar Blacks*. Do they have any ridiculous settings which would be problematic for RAIDZ, or would they work well?_

edit: Just got back from the store, bought 8x *Western Digital Caviar Black (WD2002FAEX) 2000GB (2TB) SATA3 7200RPM 64M Cache (OEM)*. I'm going to start replacing my drives one at a time.


I'm still running 8.0-RELEASE; I've never done an upgrade of FreeBSD. Is it reliable/safe to upgrade both the OS and a ZFS pool? Not losing data is the most important thing to me.


----------



## Sebulon (Jul 16, 2011)

Hi,

the main issue with WD Green drives is that they are "Advanced Format", aka 4k-sector drives (anyone second me on that?). You can confirm it by looking up your exact model and posting it here, or googling it yourself. If so, then your pool needs to be redone from scratch, or each drive zpool-replaced one by one with regular 512b drives, like e.g. those Black drives you mentioned. If the Black drives are bigger, you will gain that capacity after the final drive is replaced. If you decide to redo the pool, you need to back everything up, destroy the pool, then:

```
# gnop create -S 4096 da0
# zpool create tank raidz da0.nop da{1,2,3,4,5,6,7}
# zpool export tank
# gnop destroy da0.nop
# zpool import tank
# zdb tank
```

And look for "ashift=12" instead of the 9 you have now. Worth noting: the ashift value is per vdev, so you can mix 4k and regular drives in the same pool in separate vdevs, but not in the same vdev, or your performance goes kaput.
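A quick way to check just that field: zdb prints the vdev config near the start of its output, so you can grep for it and interrupt once it appears:

```
# zdb tank | grep ashift
```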

PS. I am paranoid enough to run raidz2 on eight drives. I've actually had two drives give up on me at the same time, so I was glad I made that choice. If you decide to redo your pool, maybe that is worth considering?

/Sebulon


----------



## garrettmoore (Jul 16, 2011)

I'm so sick of the Green drives doing stupid stuff like the load cycling (due to 'wdidle') that I figured I'd just cough up the money to replace them entirely.

I'm not going to recreate the pool from scratch because I have 5TB of used space and nowhere else to put that data. I'll just incrementally replace each drive with the 2TB ones.
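Each swap would look roughly like this (assuming the new disk comes up on the same device node after the physical swap; let each resilver finish before pulling the next drive):

```
# zpool replace tank da0
# zpool status tank
```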

I'm assuming I'll be OK as long as I don't have 2 simultaneous failures. My entire array is for backups of my workstation and for "backups" of my media, so it's nothing irreplaceable. If I lose it I'll be sad, but I can always rebuild it.


Also, how long will that zdb run take? It's been running for at least 4 or 5 hours. Is there any point in letting it run, or should I just kill it?


----------



## Sebulon (Jul 17, 2011)

Kill it. zdb traverses all of your data before it's done, so how long it takes depends on how fast it reads and how much data is stored, which in your case is quite a lot.

And I wouldn't be too sad about not finding any Samsung drives, if I were you; they drop like flies, and I've had to replace all of mine within a year. The main thing nowadays, I think, is to use 5.4k rpm 3.5" drives instead of 7.2k rpm ones. They are just as good at shuffling data at large block sizes, but they generally run a lot cooler, so they last longer.

Another option, if you have enough interfaces and power connectors, would be to connect all of the new drives at once, create a tank2, and zfs send/recv between the pools. Perhaps less tedious than replacing one drive at a time and waiting for each resilver to finish?

I wish you all the best!

/Sebulon


----------



## garrettmoore (Jul 21, 2011)

Well, this is infuriating.

I finished rebuilding my array with 8x 2TB WD Black drives. My resilvering speed increased as fewer Green drives remained in the array; watching gstat, the final drive was resilvering at up to 60MB/s.

My read speed from the array locally seems to be good. Copying files from the pool to somewhere else in the pool gives me around 80MB/s. Copying to my UFS system drive (300GB Seagate Barracuda) gives me about 60MB/s, which is probably around the max speed of that drive.

Writing to the array over the network seems good - around 75MB/s, although it fluctuates. Watching gstat is weird: no disk activity for 5 seconds, then all of the data is written at once.

Reading from the array over the network - AWFUL. I can't get more than 8MB/s or so. I get the exact same speeds via FTP and Samba.

I just removed samba33 and installed samba34 with AIO and some of the suggestions here. No difference.

I was using kern.minvnodes=25000 and kern.maxvnodes=75000, and tried setting them to 1000/10000 respectively; no difference.

Prefetch was disabled; I enabled it and rebooted. No difference.

What the hell is going on?


*edit:* All the above was from my FreeBSD box to my Win7 workstation with a 300GB VelociRaptor HDD. I just tried copying files to my Win7 HTPC with an Agility 3 SSD, and I get 70MB/s+ over SMB. All the computers are plugged into the same 48-port switch _and in the same room_. Oh my god, I think I'm going to have a brain aneurysm.


----------



## AndyUKG (Jul 21, 2011)

garrettmoore said:

> Reading from the array over the network - AWFUL. I can't get more than 8MB/s or so. I get the exact same speeds via FTP and Samba.



Is read performance better if you share a UFS filesystem via Samba or FTP? It would certainly be very odd if this were due to ZFS, given the other performance tests you mention in your last post...

thanks Andy.


----------



## garrettmoore (Jul 22, 2011)

I get the exact same transfer speeds to my desktop from a UFS partition. It seems like it's some sort of networking issue. 

Server is using a Realtek onboard nic on a Gigabyte motherboard:

```
re0: <RealTek 8168/8168B/8168C/8168CP/8168D/8168DP/8111B/8111C/8111CP/8111DP PCIe Gigabit Ethernet> port 0xce00-0xceff mem
 0xfddff000-0xfddfffff,0xfdde0000-0xfddeffff irq 18 at device 0.0 on pci3
```

Desktop is also using a Realtek onboard nic on a Gigabyte motherboard.

HTPC is using a Marvell-Yukon onboard nic.

Performance seems pretty stable now on the server.
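To take the disks out of the equation completely, it may be worth measuring raw TCP throughput between the machines, e.g. with benchmarks/iperf from ports (run the server side on the FreeBSD box; the hostname here is just your server's):

```
server# iperf -s
client# iperf -c leviathan
```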

edit: Just tried benchmarking with bonnie++ with all default settings. My gmirror RAID1 array (used for the OS, UFS, 2x300GB Seagate Barracuda):

```
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
leviathan       16G  1044  99 46138   6 18059   3  1962  93 40800   4 239.1   4
Latency              8094us     322ms    3969ms   93464us     230ms    4940ms
Version  1.96       ------Sequential Create------ --------Random Create--------
leviathan           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 16907  23 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
Latency               292ms      36us      39us   95931us      63us      50us
```

ZFS:

```
Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
leviathan       16G   141  97 182266  37 148702  29   459  99 483250  53 182.0   9
Latency             64380us    5597ms    5687ms   37178us     542ms     595ms
Version  1.96       ------Sequential Create------ --------Random Create--------
leviathan           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 20104  61 29755  54 20827  94 29241  95 +++++ +++ 28826  91
Latency               184ms     252ms     353us   20234us      64us     174us
```

If I'm reading that right, 182MB/s write and 483MB/s read? Seems good to me! Also, my speeds are totally consistent - I'm not noticing any slowdown like I used to see. Hooray!!!


----------

