# L2ARC degraded



## pillai_hfx (Aug 8, 2014)

I have a new ZFS deployment with an Intel DC S3500 as L2ARC, running FreeBSD 9.3. SMART monitoring tools show no errors on the drive, but zfs-stats shows the following:


```
L2 ARC Summary: (DEGRADED)
        Passed Headroom:                        23.31m
        Tried Lock Failures:                    4.28m
        IO In Progress:                         13.87k
        Low Memory Aborts:                      103
        Free on Write:                          1.44m
        Writes While Full:                      2.40m
        R/W Clashes:                            1.74k
        Bad Checksums:                          1.15m
        IO Errors:                              501.55k
        SPA Mismatch:                           11
```


```
kstat.zfs.misc.arcstats.l2_compress_successes: 51418193
kstat.zfs.misc.arcstats.l2_compress_zeros: 0
kstat.zfs.misc.arcstats.l2_compress_failures: 108469007
```

I have identical hardware deployed on FreeBSD 9.2 without this problem.
Looks like this may be related to L2ARC compression, according to the URLs below:


https://bugs.freenas.org/projects/f...ions/ededc82f03e5dd5a518ccda3b0aae506fbfd8fc8

http://svnweb.freebsd.org/base?view=revision&sortby=file&revision=256889

https://bugs.freenas.org/issues/3418

http://lists.freebsd.org/pipermail/freebsd-current/2013-October/045088.html

Until the above fixes show up in release versions, is there anything that can be done now to work around this issue? I would prefer to stick with release versions. The server is otherwise working fine. Thanks.


----------



## Toast (Aug 9, 2014)

pillai_hfx said:

> http://svnweb.freebsd.org/base?view=revision&sortby=file&revision=256889



Release 9.3 and 10.0 should have that patch already.
http://svnweb.freebsd.org/base?view=rev ... ion=262173
http://svnweb.freebsd.org/base?view=rev ... ion=257058


----------



## pillai_hfx (Aug 9, 2014)

Thanks. I looked at arc.c and it indeed has that patch in 9.3, but this didn't fix the issue. It would have been nice if L2ARC compression could easily be disabled via a sysctl for now.


----------



## ZFSZealot (Aug 27, 2014)

I can confirm, running 9.3-RELEASE, that this is happening. This is on Intel 120GB 320 SSDs attached to the motherboard ICH9 controller ports, AHCI enabled, on a SuperMicro X7SBE board. I've been using this set of hardware since 8.0 and never saw the problem until 9.3-RELEASE with the L2ARC compression. I tend to trust Intel to report brokenness via SMART. ashift on my root pool and 15k RPM stripe/mirror pool is 9, and on my 7200 RPM raidz3 pool it's 12.

Any updates or any way to disable L2ARC compression?


----------



## SirDice (Aug 27, 2014)

ZFSZealot said:

> Any updates or any way to disable L2ARC compression?


Not sure if it's going to help but have you tried removing and re-adding the L2ARC?
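For anyone trying this, the cache vdev holds no persistent pool data, so dropping and re-adding it is safe and quick. A minimal sketch, using a pool and device name from the listings later in this thread as an example:

```shell
# Drop and re-add an L2ARC (cache) device -- safe, it holds no pool data.
# Pool/device names here are examples taken from the listings in this thread.
zpool remove sas15k gpt/sas15kl2arc
zpool add sas15k cache gpt/sas15kl2arc
# Note: the kstat.zfs.misc.arcstats error counters persist until reboot,
# so afterwards watch whether they keep increasing, not their absolute value.
```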


----------



## ZFSZealot (Aug 28, 2014)

@SirDice, yes, as an experiment, I removed the L2ARC from all of the pools and then added them back to just rpool and my 15k RPM pool, as both of them are ashift=9. I'm waiting for those to fill up before declaring that the errors aren't increasing (`sysctl` of course still shows the same number of errors as I haven't rebooted). Right now I'm seeing no increase in errors (kstat.zfs.misc.arcstats.l2_cksum_bad and kstat.zfs.misc.arcstats.l2_io_error) and almost 60GB of L2ARC written. It would be nice to be able to isolate this just to situations where ashift=12 as is the case with my bulk storage pool.
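For anyone watching the same two counters, a small hypothetical helper like this sums them from `sysctl` output, so it can be logged periodically to see whether the counts are still climbing:

```shell
# Sum the two L2ARC error counters from sysctl-style "name: value" output.
# Hypothetical helper; on a live box, pipe in:
#   sysctl kstat.zfs.misc.arcstats | l2arc_error_total
l2arc_error_total() {
    awk -F': ' '/l2_cksum_bad|l2_io_error/ { sum += $2 } END { print sum + 0 }'
}

# Example with canned output:
printf '%s\n' \
    'kstat.zfs.misc.arcstats.l2_cksum_bad: 1290021' \
    'kstat.zfs.misc.arcstats.l2_io_error: 128275' | l2arc_error_total
# prints 1418296
```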


----------



## Sebulon (Nov 5, 2014)

ZFSZealot said:


> @SirDice, yes, as an experiment, I removed the L2ARC from all of the pools and then added them back to just rpool and my 15k RPM pool, as both of them are ashift=9. I'm waiting for those to fill up before declaring that the errors aren't increasing (`sysctl` of course still shows the same number of errors as I haven't rebooted). Right now I'm seeing no increase in errors (kstat.zfs.misc.arcstats.l2_cksum_bad and kstat.zfs.misc.arcstats.l2_io_error) and almost 60GB of L2ARC written. It would be nice to be able to isolate this just to situations where ashift=12 as is the case with my bulk storage pool.



Since you never posted an update to this, I'm asking: how did it go?

/Sebulon


----------



## ZFSZealot (Jan 4, 2015)

Thanks for checking back.  Yes, unfortunately I do see "degraded" with L2ARC devices only on `ashift=9` pools.  This seems to show up when the L2ARC devices fill completely.  The other thing that's strange is that it looks like it's allocated a lot more cache than there is space for (`zfs-stats -L`):


```
L2 ARC Summary: (DEGRADED)
  Passed Headroom:  87.71m
  Tried Lock Failures:  102.49m
  IO In Progress:  14.96k
  Low Memory Aborts:  194.42k
  Free on Write:  338.78k
  Writes While Full:  57.84k
  R/W Clashes:  7.84k
  Bad Checksums:  11.20m
  IO Errors:  956.77k
  SPA Mismatch:  142.81b

L2 ARC Size: (Adaptive)  222.59  GiB
  Header Size:  2.00% 4.45  GiB

L2 ARC Evicts:
  Lock Retries:  291
  Upon Reading:  0

L2 ARC Breakdown:  251.27m
  Hit Ratio:  16.09% 40.42m
  Miss Ratio:  83.91% 210.85m
  Feeds:  4.02m

L2 ARC Buffer:
  Bytes Scanned:  2.17 PiB
  Buffer Iterations:  4.02m
  List Iterations:  256.91m
  NULL List Iterations:  69.43m

L2 ARC Writes:
  Writes Sent:  100.00%  1.85m
```

Output of `zpool list -v`:


```
NAME  SIZE  ALLOC  FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  85G  48.2G  36.8G  56%  1.00x  ONLINE  -
  mirror  85G  48.2G  36.8G  -
  gpt/disk1  -  -  -  -
  gpt/disk2  -  -  -  -
  gpt/disk0  -  -  -  -
cache  -  -  -  -  -  -
  ada3  14.9G  14.9G  8.00M  -
sas15k  1.09T  683G  429G  61%  1.00x  ONLINE  -
  mirror  278G  171G  107G  -
  da9  -  -  -  -
  da13  -  -  -  -
  mirror  278G  171G  107G  -
  da10  -  -  -  -
  da14  -  -  -  -
  mirror  278G  171G  107G  -
  da11  -  -  -  -
  da15  -  -  -  -
  mirror  278G  171G  107G  -
  da12  -  -  -  -
  da16  -  -  -  -
cache  -  -  -  -  -  -
  gpt/sas15kl2arc  32.0G  32.0G  6.95M  -
sata7k  20T  14.0T  6.02T  69%  1.00x  ONLINE  -
  raidz3  20T  14.0T  6.02T  -
  da8  -  -  -  -
  da3  -  -  -  -
  da4  -  -  -  -
  da7  -  -  -  -
  da22  -  -  -  -
  da20  -  -  -  -
  da5  -  -  -  -
  da21  -  -  -  -
  da6  -  -  -  -
  da0  -  -  -  -
  da2  -  -  -  -
cache  -  -  -  -  -  -
  gpt/sata7kl2arc  32.0G  32.0G  8M  -
scratch  79.5G  2.00G  77.5G  2%  1.00x  ONLINE  -
  gpt/scratch  79.5G  2.00G  77.5G  -
```

Notice I have a total of 15+32+32 GB of L2ARC devices, but `zfs-stats -L` says the L2ARC size is 222 GB.

If this isn't a problem anyone else is seeing, I'm willing to blame hardware. The L2ARC devices are Intel 320 series SSDs on a SuperMicro X7SBE motherboard SATA controller (Intel ICH9). It has been quite stable and I don't see any SMART errors, though, and it's odd that "degraded" doesn't show up until the L2ARC devices are full. Also, any hardware failure would have to have coincidentally happened at the exact same time as I updated to 9.3.

Is there any way to see which one of the L2ARC devices ZFS thinks is reading bad data?


----------



## ZFSZealot (Jan 4, 2015)

Sorry, just realized that 222 GB is possible on 79 GB of space because of compression. Still getting the errors, though. It seems a lot like this post, just with smaller numbers:

http://lists.freebsd.org/pipermail/freebsd-current/2013-October/045088.html
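A rough sanity check of that figure (222.59 GiB of cached data reported on 15+32+32 GB of raw device capacity) gives a compression ratio that is at least plausible for LZ4:

```shell
# Sanity check: effective cached data vs. raw L2ARC device capacity.
# 15 + 32 + 32 GB of devices; zfs-stats reports 222.59 GiB of L2ARC.
raw_gb=$((15 + 32 + 32))                       # 79 GB raw
ratio=$(awk -v c=222.59 -v r="$raw_gb" 'BEGIN { printf "%.2f", c / r }')
echo "implied compression ratio: ${ratio}x"    # prints 2.82x
```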


----------



## ZFSZealot (Feb 11, 2015)

And now I've moved the setup to 10.1-RELEASE on a different server (PowerEdge 2950) with different SSDs (but the same spinning disks and external enclosures), and have found it's happening there also. Another strange artifact: notice it says 16.0E (exabytes???!) are free on my "sata7kl2arc" device:


```
root@cadence:/ # zpool list -v
NAME                                     SIZE  ALLOC   FREE   FRAG  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
rpool                                    247G  11.6G   235G      -         -     4%  1.00x  ONLINE  -
  mirror                                 247G  11.6G   235G      -         -
    da1p3                                   -      -      -      -         -
    da0p3                                   -      -      -      -         -
cache                                       -      -      -      -      -      -
  gpt/rpooll2arc                        16.0G  1.15G  14.8G     0%         -
sas15k                                  1.09T   678G   434G      -         -    60%  1.00x  ONLINE  -
  mirror                                 278G   170G   108G      -         -
    diskid/DISK-BJ00PA1041UG                -      -      -      -         -
    diskid/DISK-BJ00P86011L5                -      -      -      -         -
  mirror                                 278G   170G   108G      -         -
    diskid/DISK-BJ00PA1040D3                -      -      -      -         -
    diskid/DISK-BJ00P86011F0                -      -      -      -         -
  mirror                                 278G   169G   109G      -         -
    diskid/DISK-BJ00PA1041YB                -      -      -      -         -
    diskid/DISK-BJ00PA1041P5                -      -      -      -         -
  mirror                                 278G   170G   108G      -         -
    diskid/DISK-BJ00PA1041H9                -      -      -      -         -
    diskid/DISK-BJ00P86011FU                -      -      -      -         -
cache                                       -      -      -      -      -      -
  gpt/sas15kl2arc                       32.0G  17.8G  14.2G     0%         -
sata7k                                    30T  14.5T  15.5T      -         -    48%  1.00x  ONLINE  -
  raidz3                                  30T  14.5T  15.5T      -         -
    diskid/DISK-%20%20%20%20%20%20MK0371YHK8475A      -      -      -      -         -
    diskid/DISK-%20%20%20%20%20%20MK0371YHK524RG      -      -      -      -         -
    diskid/DISK-%20%20%20%20%20%20MK0371YHK9K5YA      -      -      -      -         -
    diskid/DISK-%20%20%20%20%20%20MK0371YHK9DPNA      -      -      -      -         -
    diskid/DISK-%20%20%20%20%20%20MK0351YHGEW0DA      -      -      -      -         -
    diskid/DISK-%20%20%20%20%20%20PN2234P8KKP4WY      -      -      -      -         -
    diskid/DISK-%20%20%20%20%20%20MK0371YHK9EATA      -      -      -      -         -
    diskid/DISK-%20%20%20%20%20%20MK0371YHHRUSMA      -      -      -      -         -
    diskid/DISK-%20%20%20%20%20%20MK0371YHJZB13G      -      -      -      -         -
    diskid/DISK-%20%20%20%20%20%20MK0371YHK9K6TA      -      -      -      -         -
    diskid/DISK-%20%20%20%20%20%20MK0371YHK9DX3A      -      -      -      -         -
cache                                       -      -      -      -      -      -
  gpt/sata7kl2arc                       32.0G  41.3G  16.0E     0%         -
```

I've tested, wiped and tested the SSDs again and again.  SMART reports are clean.  Pool status looks OK also:


```
root@cadence:/ # zpool status
  pool: rpool
state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on software that does not support feature
        flags.
  scan: scrub repaired 0 in 0h7m with 0 errors on Mon Feb  9 20:56:59 2015
config:

        NAME              STATE     READ WRITE CKSUM
        rpool             ONLINE       0     0     0
          mirror-0        ONLINE       0     0     0
            da1p3         ONLINE       0     0     0
            da0p3         ONLINE       0     0     0
        cache
          gpt/rpooll2arc  ONLINE       0     0     0

errors: No known data errors

  pool: sas15k
state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on software that does not support feature
        flags.
  scan: scrub repaired 0 in 1h23m with 0 errors on Sun Feb  8 12:26:21 2015
config:

        NAME                          STATE     READ WRITE CKSUM
        sas15k                        ONLINE       0     0     0
          mirror-0                    ONLINE       0     0     0
            diskid/DISK-BJ00PA1041UG  ONLINE       0     0     0
            diskid/DISK-BJ00P86011L5  ONLINE       0     0     0
          mirror-1                    ONLINE       0     0     0
            diskid/DISK-BJ00PA1040D3  ONLINE       0     0     0
            diskid/DISK-BJ00P86011F0  ONLINE       0     0     0
          mirror-2                    ONLINE       0     0     0
            diskid/DISK-BJ00PA1041YB  ONLINE       0     0     0
            diskid/DISK-BJ00PA1041P5  ONLINE       0     0     0
          mirror-3                    ONLINE       0     0     0
            diskid/DISK-BJ00PA1041H9  ONLINE       0     0     0
            diskid/DISK-BJ00P86011FU  ONLINE       0     0     0
        cache
          gpt/sas15kl2arc             ONLINE       0     0     0

errors: No known data errors

  pool: sata7k
state: ONLINE
status: The pool is formatted using a legacy on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on software that does not support feature
        flags.
  scan: scrub repaired 0 in 7h42m with 0 errors on Sat Feb  7 00:32:55 2015
config:

        NAME                                              STATE     READ WRITE CKSUM
        sata7k                                            ONLINE       0     0     0
          raidz3-0                                        ONLINE       0     0     0
            diskid/DISK-%20%20%20%20%20%20MK0371YHK8475A  ONLINE       0     0     0
            diskid/DISK-%20%20%20%20%20%20MK0371YHK524RG  ONLINE       0     0     0
            diskid/DISK-%20%20%20%20%20%20MK0371YHK9K5YA  ONLINE       0     0     0
            diskid/DISK-%20%20%20%20%20%20MK0371YHK9DPNA  ONLINE       0     0     0
            diskid/DISK-%20%20%20%20%20%20MK0351YHGEW0DA  ONLINE       0     0     0
            diskid/DISK-%20%20%20%20%20%20PN2234P8KKP4WY  ONLINE       0     0     0
            diskid/DISK-%20%20%20%20%20%20MK0371YHK9EATA  ONLINE       0     0     0
            diskid/DISK-%20%20%20%20%20%20MK0371YHHRUSMA  ONLINE       0     0     0
            diskid/DISK-%20%20%20%20%20%20MK0371YHJZB13G  ONLINE       0     0     0
            diskid/DISK-%20%20%20%20%20%20MK0371YHK9K6TA  ONLINE       0     0     0
            diskid/DISK-%20%20%20%20%20%20MK0371YHK9DX3A  ONLINE       0     0     0
        cache
          gpt/sata7kl2arc                                 ONLINE       0     0     0

errors: No known data errors
```

The pools are all version 28, and I've hesitated upgrading them as I don't need any of the newer features, and they'll still import into Solaris if need be.

Should I post a PR per the handbook?  Is there any other info I can provide?


----------



## Sebulon (Feb 11, 2015)

Hey man!

Try going up to 10.1-STABLE instead. It seems a lot was fixed after 10.1-RELEASE:
FreeBSD 10.1-STABLE #0 r277949
`# zpool list -v`

```
NAME  SIZE  ALLOC  FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
pool1  3.97G  639M  3.34G  -  11%  15%  1.00x  ONLINE  -
  mirror  3.97G  639M  3.34G  -  11%  15%
  gpt/disk1  -  -  -  -  -  -
  gpt/disk2  -  -  -  -  -  -
pool2  59.5G  26.2G  33.3G  -  40%  44%  1.00x  ONLINE  -
  raidz1  29.8G  13.1G  16.7G  -  41%  43%
  gpt/disk3  -  -  -  -  -  -
  gpt/disk4  -  -  -  -  -  -
  gpt/disk5  -  -  -  -  -  -
  raidz1  29.8G  13.1G  16.7G  -  40%  44%
  gpt/disk6  -  -  -  -  -  -
  gpt/disk7  -  -  -  -  -  -
  gpt/disk10  -  -  -  -  -  -
cache  -  -  -  -  -  -
  gpt/disk9  9.99G  9.19G  820M  -  0%  91%
```

/Sebulon


----------



## gkontos (Feb 11, 2015)

Sebulon said:


> Try going up to 10.1-STABLE instead. It seems a lot was fixed after 10.1-RELEASE:



Hi, do you have any particular details? Because in the release notes I don't see anything...


----------



## ZFSZealot (Feb 11, 2015)

I can give it a shot, as I can always get back through the magic of snapshots, but this is production rather than test, so it may be a few days... In the meantime, is it worth filing a PR?

At the risk of possibly conflating different problems, and acknowledging this was 9.2, I'm going to say this sounds exactly like https://bugs.freenas.org/issues/5347, but that was determined by the reporter to be a hardware problem. As with the cited FreeNAS bug, my problem doesn't show up until one of the L2ARC devices gets full. Again, I never saw this problem until L2ARC compression was committed.

I'm using Intel 120GB 320 and 330 SSDs (da2, da3, da5) in a PowerEdge 2950, connected to the LSI 1068e based controller in the storage slot, so very different hardware.  I've partitioned smaller 32GB and 16GB slices to constrain L2ARC size as the server only has 32GB of RAM.


----------



## User23 (Feb 11, 2015)

I have the same problem, but only on one server with FreeBSD 10.1. Unfortunately the server needed a reboot two days ago, so everything looks fine now.
Before the reboot it showed around 100k+ IO errors on 2x http://ark.intel.com/products/56601/Intel-SSD-X25-M-Series-80GB-2_5in-SATA-3Gbs-34nm-MLC

At the moment:

```
L2 ARC Summary: (HEALTHY)
    Passed Headroom:            3.34m
    Tried Lock Failures:            11.80m
    IO In Progress:                92.44k
    Low Memory Aborts:            44
    Free on Write:                35.06k
    Writes While Full:            3.75k
    R/W Clashes:                1.71k
    Bad Checksums:                0
    IO Errors:                0
    SPA Mismatch:                23.74k

L2 ARC Size: (Adaptive)                130.00    GiB
    Header Size:            2.09%    2.72    GiB

L2 ARC Breakdown:                111.71m
    Hit Ratio:            11.92%    13.31m
    Miss Ratio:            88.08%    98.39m
    Feeds:                    191.58k

L2 ARC Buffer:
    Bytes Scanned:                104.72    TiB
    Buffer Iterations:            191.58k
    List Iterations:            12.11m
    NULL List Iterations:            5.87m

L2 ARC Writes:
    Writes Sent:            100.00%    128.13k
```

I think it's worth filing a PR.


----------



## Sebulon (Feb 11, 2015)

gkontos said:


> Hi, do you have any particular details because in the release notes I don't see anything...



Sure have: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164

/Sebulon


----------



## ZFSZealot (Feb 12, 2015)

Sebulon said:


> Sure have: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197164
> 
> /Sebulon



I had a problem similar, if not identical, to the one described in that bug report, on the original hardware that had only 8GB of RAM. I was using 120GB + 120GB + 16GB SSD L2ARC devices, which was already far more than the 5x RAM recommendation, and when L2ARC compression came along in 9.3-RELEASE it pushed the RAM usage for L2ARC headers to the point where the system was actively swapping and then hung as described. I caught the system before it completely hung a couple of times and removed the L2ARC devices, freeing up a large amount of RAM and restoring the system to a working state. Then I started constraining the size of the L2ARC devices using partitions. I had analyzed it as a misconfiguration, not a bug.

Unfortunately, I'm not convinced that 197164 has anything to do with the L2ARC IO errors and degraded state, although that also showed up at the same time as L2ARC compression. 197164 seems more like what Karl was describing here: http://lists.freebsd.org/pipermail/freebsd-bugs/2014-March/055604.html.

Someone did already file a PR for the problem described here, though: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=195746.

I will try updating to 10.1-STABLE in a few days if possible.


----------



## Sebulon (Feb 13, 2015)

ZFSZealot
Now I see what you mean; no, that's not fixed:

```
cache  -  -  -  -  -  -
  gpt/cache1  238G  306G  16.0E  -  0%  128%
  gpt/cache2  238G  303G  16.0E  -  0%  127%
```
I'm thinking it's purely cosmetic though, since everything else seems OK. It looks like the "FREE" calculation just freaks out since "ALLOC" > "SIZE", which is possible now due to the compression in L2ARC that didn't exist back when the guys at Sun wrote it.

/Sebulon


----------



## User23 (Feb 13, 2015)

And any IO Errors in the L2 ARC Summary?


----------



## Sebulon (Feb 13, 2015)

User23 said:


> And any IO Errors in the L2 ARC Summary?



Yes there are, but they may also just be "normal" if kstat.zfs.misc.arcstats.l2_compress_failures counts as errors. I don't think a failure to compress something is an error; it did its best and just couldn't, and the L2ARC shouldn't be negatively affected by it, in my opinion.

/Sebulon


----------



## User23 (Feb 13, 2015)

I don't know if these compress failures count as IO Errors.


----------



## Sebulon (Feb 13, 2015)

Neither do I, was just thinking out loud, is all.

/Sebulon


----------



## User23 (Feb 13, 2015)

Well, the "degraded L2ARC problem" can't be only compression related. Compression is not enabled on my datasets, and the L2ARC was degraded after 60-70 days of uptime. Before 10.1, the server needed to be restarted at least once a month.
I'll stay with 10.1-RELEASE until this happens again and test 10.1-STABLE after that.


----------



## ZFSZealot (Feb 13, 2015)

Sebulon said:


> ZFSZealot
> Now I see what you mean, no that´s not fixed:
> 
> ```
> ...



Right, that's one of the issues I pointed out, but yes, it's probably cosmetic. What I'm concerned about is `zfs-stats -L` showing DEGRADED, with IO errors and bad checksums. Again, this didn't happen until I upgraded to 9.3-RELEASE, and it continues to show up on different hardware with 10.1-RELEASE. I've extensively tested the hardware and there is no indication of any hardware problem that would explain the I/O errors. I also see exactly 0 I/O errors until one of the L2ARC devices fills completely with cache data.


```
root@cadence:/ # zfs-stats -L

------------------------------------------------------------------------
ZFS Subsystem Report                            Fri Feb 13 10:06:58 2015
------------------------------------------------------------------------

L2 ARC Summary: (DEGRADED)
        Passed Headroom:                        30.19m
        Tried Lock Failures:                    24.75m
        IO In Progress:                         247
        Low Memory Aborts:                      103
        Free on Write:                          54.95k
        Writes While Full:                      10.62k
        R/W Clashes:                            562
        Bad Checksums:                          1.29m
        IO Errors:                              128.28k
        SPA Mismatch:                           48.35b

L2 ARC Size: (Adaptive)                         33.47   GiB
        Header Size:                    2.24%   767.31  MiB

L2 ARC Evicts:
        Lock Retries:                           18
        Upon Reading:                           0

L2 ARC Breakdown:                               35.45m
        Hit Ratio:                      26.64%  9.45m
        Miss Ratio:                     73.36%  26.01m
        Feeds:                                  567.04k

L2 ARC Buffer:
        Bytes Scanned:                          528.84  TiB
        Buffer Iterations:                      567.04k
        List Iterations:                        36.05m
        NULL List Iterations:                   961.29k

L2 ARC Writes:
        Writes Sent:                    100.00% 135.47k

------------------------------------------------------------------------
```

And here:


```
root@cadence:/ # sysctl kstat.zfs.misc.arcstats.l2_io_error
kstat.zfs.misc.arcstats.l2_io_error: 128275
root@cadence:/ # sysctl kstat.zfs.misc.arcstats.l2_cksum_bad
kstat.zfs.misc.arcstats.l2_cksum_bad: 1290021
```


----------



## ZFSZealot (Feb 13, 2015)

User23 said:


> Well the "degraded L2ARC problem" can't be only compression related. Compression is not enabled on my datasets,
> and the L2ARC was degraded after 60-70 days uptime. Before 10.1 the server need to be restarted at least one time a month.
> I'll stay with 10.1 Release until this happens again and test 10.1 Stable after that.



I'm not convinced that leaving compression disabled on your filesystems stops ZFS from compressing the L2ARC anyway. Ever since 9.3-RELEASE it compresses, and there doesn't seem to be any way to turn it off. This whole thread is worth a read, and I think this problem has been around a while and hasn't ever been fixed:

http://lists.freebsd.org/pipermail/freebsd-current/2013-October/045695.html

You can infer that compression is enabled thus:


```
root@cadence:/ # sysctl kstat.zfs.misc.arcstats.l2_compress_successes
kstat.zfs.misc.arcstats.l2_compress_successes: 1352012
```

What I have not looked for is whether or not this problem exists only on FreeBSD or if it's present on other operating systems that use OpenZFS.


----------



## User23 (Feb 13, 2015)

You're right. It is compressing anyway, even on 9.2-RELEASE.


----------



## ZFSZealot (Feb 13, 2015)

User23 said:


> You're right. It is compressing anyway. Even on 9.2 Release.



There are a lot of changes to /usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c between 10.1-RELEASE and 10.1-STABLE.  I've checked out STABLE and will try to duplicate the problem there.


----------



## Frank de Bot (Mar 17, 2015)

I have the same problem and have posted it to the freebsd-stable mailing list ( https://lists.freebsd.org/pipermail/freebsd-stable/2015-March/081907.html ). I've tried to find out what the problem could be and witnessed a few strange things, but I can't figure it out.

-update-

I forced my L2ARC device to ashift 9; my pool is ashift 12. I removed the cache devices, set the vfs.zfs.max_auto_ashift sysctl(8) to 9, and created them again. I ran the same tests for several days and it didn't get to the 16.0E free space situation. In the tests I ran before (the same tests, writing and reading a lot of data to and from the pool) it often occurred within hours, and with smaller L2ARC sizes even within a few dozen minutes.
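The steps above would look roughly like this. This is only a sketch: the sysctl name is the one given in the post, while the pool and cache device names are examples borrowed from earlier in this thread:

```shell
# Recreate a cache device so it gets ashift=9, per the workaround above.
# sata7k / gpt/sata7kl2arc are example names from this thread.
zpool remove sata7k gpt/sata7kl2arc
sysctl vfs.zfs.max_auto_ashift=9     # newly added vdevs now get ashift <= 9
zpool add sata7k cache gpt/sata7kl2arc
sysctl vfs.zfs.max_auto_ashift=13    # restore the previous value (13 is the usual default)
```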


----------



## gkontos (Mar 26, 2015)

Do we know if this is fixed in 10.1-STABLE?


----------



## roper (Mar 27, 2015)

I had the same issue running 10.1-RELEASE. I removed and recreated the cache device while temporarily setting `sysctl vfs.zfs.min_auto_ashift=9` and `sysctl vfs.zfs.max_auto_ashift=9` to match the 512-byte sector size of the Intel SSD. I was unsure of which sysctl to modify; setting both didn't cause a problem. The cache filled and now again shows 16.0E free, which seems harmless. The L2ARC has given no bad checksums since it was recreated and is still reported as healthy by zfs-stats after 48 hours.


edit: and now 6 hours later I have to retract that.  Bad checksums and degraded L2ARC once again. Frustrating.


----------



## gkontos (Mar 28, 2015)

FreeBSD 10.1-STABLE does not fix the problem.


----------



## roper (Mar 30, 2015)

Well, since the last boot it's been 48 hours. The L2ARC filled and has remained healthy according to zfs-stats for over 24 hours. The workload at present is write-heavy copying of incompressible files. Last time it was also write-heavy but with much more compressible data.

`zfs-stats -a` returns:

```
------------------------------------------------------------------------
ZFS Subsystem Report Mon Mar 30 08:59:30 2015
------------------------------------------------------------------------

System Information:

Kernel Version: 1001000 (osreldate)
Hardware Platform: amd64
Processor Architecture: amd64

ZFS Storage pool Version: 5000
ZFS Filesystem Version: 5

FreeBSD 10.1-RELEASE-p6 #0: Tue Feb 24 19:00:21 UTC 2015 root
8:59AM  up 2 days, 15:13, 1 user, load averages: 0.22, 0.24, 0.24

------------------------------------------------------------------------

System Memory:

0.01% 3.57 MiB Active, 0.22% 71.44 MiB Inact
81.22% 25.27 GiB Wired, 0.01% 3.24 MiB Cache
18.54% 5.77 GiB Free, 0.00% 0 Gap

Real Installed: 32.00 GiB
Real Available: 99.84% 31.95 GiB
Real Managed: 97.39% 31.11 GiB

Logical Total: 32.00 GiB
Logical Used: 81.75% 26.16 GiB
Logical Free: 18.25% 5.84 GiB

Kernel Memory: 333.86 MiB
Data: 92.28% 308.09 MiB
Text: 7.72% 25.78 MiB

Kernel Memory Map: 31.11 GiB
Size: 72.60% 22.59 GiB
Free: 27.40% 8.53 GiB

------------------------------------------------------------------------

ARC Summary: (HEALTHY)
Memory Throttle Count: 0

ARC Misc:
Deleted: 10.29m
Recycle Misses: 336.23k
Mutex Misses: 453
Evict Skips: 33.49m

ARC Size: 76.60% 23.07 GiB
Target Size: (Adaptive) 76.60% 23.07 GiB
Min Size (Hard Limit): 12.50% 3.76 GiB
Max Size (High Water): 8:1 30.11 GiB

ARC Size Breakdown:
Recently Used Cache Size: 98.86% 22.81 GiB
Frequently Used Cache Size: 1.14% 268.56 MiB

ARC Hash Breakdown:
Elements Max: 1.29m
Elements Current: 99.56% 1.28m
Collisions: 3.12m
Chain Max: 6
Chains: 161.42k

------------------------------------------------------------------------

ARC Efficiency: 7.34m
Cache Hit Ratio: 57.06% 4.19m
Cache Miss Ratio: 42.94% 3.15m
Actual Hit Ratio: 47.12% 3.46m

Data Demand Efficiency: 97.99% 621.19k
Data Prefetch Efficiency: 68.95% 109.89k

CACHE HITS BY CACHE LIST:
Anonymously Used: 15.72% 658.13k
Most Recently Used: 37.09% 1.55m
Most Frequently Used: 45.48% 1.90m
Most Recently Used Ghost: 0.87% 36.26k
Most Frequently Used Ghost: 0.84% 35.27k

CACHE HITS BY DATA TYPE:
Demand Data: 14.54% 608.69k
Prefetch Data: 1.81% 75.77k
Demand Metadata: 67.99% 2.85m
Prefetch Metadata: 15.66% 655.72k

CACHE MISSES BY DATA TYPE:
Demand Data: 0.40% 12.50k
Prefetch Data: 1.08% 34.12k
Demand Metadata: 95.73% 3.02m
Prefetch Metadata: 2.79% 87.74k

------------------------------------------------------------------------

L2 ARC Summary: (HEALTHY)
Passed Headroom: 7.19m
Tried Lock Failures: 157.41k
IO In Progress: 200
Low Memory Aborts: 24
Free on Write: 8.74k
Writes While Full: 77.95k
R/W Clashes: 1
Bad Checksums: 0
IO Errors: 0
SPA Mismatch: 199.45m

L2 ARC Size: (Adaptive) 112.05 GiB
Header Size: 0.19% 214.16 MiB

L2 ARC Evicts:
Lock Retries: 110
Upon Reading: 0

L2 ARC Breakdown: 3.15m
Hit Ratio: 0.67% 21.09k
Miss Ratio: 99.33% 3.13m
Feeds: 300.96k

L2 ARC Buffer:
Bytes Scanned: 144.97 TiB
Buffer Iterations: 300.96k
List Iterations: 18.65m
NULL List Iterations: 6.86m

L2 ARC Writes:
Writes Sent: (FAULTED) 160.59k
Done Ratio: 100.00% 160.59k
Error Ratio: 0.00% 0

------------------------------------------------------------------------

File-Level Prefetch: (HEALTHY)

DMU Efficiency: 111.63m
Hit Ratio: 95.15% 106.21m
Miss Ratio: 4.85% 5.42m

Colinear: 5.42m
Hit Ratio: 0.01% 583
Miss Ratio: 99.99% 5.42m

Stride: 106.14m
Hit Ratio: 99.99% 106.14m
Miss Ratio: 0.01% 6.74k

DMU Misc:
Reclaim: 5.42m
Successes: 0.08% 4.44k
Failures: 99.92% 5.41m

Streams: 75.21k
+Resets: 2.96% 2.23k
-Resets: 97.04% 72.98k
Bogus: 0

------------------------------------------------------------------------

VDEV Cache Summary: 182.22k
Hit Ratio: 49.20% 89.66k
Miss Ratio: 41.23% 75.14k
Delegations: 9.56% 17.43k

------------------------------------------------------------------------

ZFS Tunables (sysctl):
kern.maxusers                           2380
vm.kmem_size                            33408720896
vm.kmem_size_scale                      1
vm.kmem_size_min                        0
vm.kmem_size_max                        1319413950874
vfs.zfs.arc_max                         32334979072
vfs.zfs.arc_min                         4041872384
vfs.zfs.arc_average_blocksize           8192
vfs.zfs.arc_meta_used                   1291794608
vfs.zfs.arc_meta_limit                  8083744768
vfs.zfs.l2arc_write_max                 8388608
vfs.zfs.l2arc_write_boost               8388608
vfs.zfs.l2arc_headroom                  2
vfs.zfs.l2arc_feed_secs                 1
vfs.zfs.l2arc_feed_min_ms               200
vfs.zfs.l2arc_noprefetch                0
vfs.zfs.l2arc_feed_again                1
vfs.zfs.l2arc_norw                      1
vfs.zfs.anon_size                       19696640
vfs.zfs.anon_metadata_lsize             0
vfs.zfs.anon_data_lsize                 0
vfs.zfs.mru_size                        23915642368
vfs.zfs.mru_metadata_lsize              195094016
vfs.zfs.mru_data_lsize                  23458551808
vfs.zfs.mru_ghost_size                  854230528
vfs.zfs.mru_ghost_metadata_lsize        479495680
vfs.zfs.mru_ghost_data_lsize            374734848
vfs.zfs.mfu_size                        6684672
vfs.zfs.mfu_metadata_lsize              1024
vfs.zfs.mfu_data_lsize                  68608
vfs.zfs.mfu_ghost_size                  14649464320
vfs.zfs.mfu_ghost_metadata_lsize        197552128
vfs.zfs.mfu_ghost_data_lsize            14451912192
vfs.zfs.l2c_only_size                   118107354112
vfs.zfs.dedup.prefetch                  1
vfs.zfs.nopwrite_enabled                1
vfs.zfs.mdcomp_disable                  0
vfs.zfs.dirty_data_max                  3430408601
vfs.zfs.dirty_data_max_max              4294967296
vfs.zfs.dirty_data_max_percent          10
vfs.zfs.dirty_data_sync                 67108864
vfs.zfs.delay_min_dirty_percent         60
vfs.zfs.delay_scale                     500000
vfs.zfs.prefetch_disable                0
vfs.zfs.zfetch.max_streams              8
vfs.zfs.zfetch.min_sec_reap             2
vfs.zfs.zfetch.block_cap                256
vfs.zfs.zfetch.array_rd_sz              1048576
vfs.zfs.top_maxinflight                 32
vfs.zfs.resilver_delay                  2
vfs.zfs.scrub_delay                     4
vfs.zfs.scan_idle                       50
vfs.zfs.scan_min_time_ms                1000
vfs.zfs.free_min_time_ms                1000
vfs.zfs.resilver_min_time_ms            3000
vfs.zfs.no_scrub_io                     0
vfs.zfs.no_scrub_prefetch               0
vfs.zfs.metaslab.gang_bang              131073
vfs.zfs.metaslab.fragmentation_threshold 70
vfs.zfs.metaslab.debug_load             0
vfs.zfs.metaslab.debug_unload           0
vfs.zfs.metaslab.df_alloc_threshold     131072
vfs.zfs.metaslab.df_free_pct            4
vfs.zfs.metaslab.min_alloc_size         10485760
vfs.zfs.metaslab.load_pct               50
vfs.zfs.metaslab.unload_delay           8
vfs.zfs.metaslab.preload_limit          3
vfs.zfs.metaslab.preload_enabled        1
vfs.zfs.metaslab.fragmentation_factor_enabled 1
vfs.zfs.metaslab.lba_weighting_enabled  1
vfs.zfs.metaslab.bias_enabled           1
vfs.zfs.condense_pct                    200
vfs.zfs.mg_noalloc_threshold            0
vfs.zfs.mg_fragmentation_threshold      85
vfs.zfs.check_hostid                    1
vfs.zfs.spa_load_verify_maxinflight     10000
vfs.zfs.spa_load_verify_metadata        1
vfs.zfs.spa_load_verify_data            1
vfs.zfs.recover                         0
vfs.zfs.deadman_synctime_ms             1000000
vfs.zfs.deadman_checktime_ms            5000
vfs.zfs.deadman_enabled                 1
vfs.zfs.spa_asize_inflation             24
vfs.zfs.txg.timeout                     5
vfs.zfs.vdev.cache.max                  16384
vfs.zfs.vdev.cache.size                 16777216
vfs.zfs.vdev.cache.bshift               16
vfs.zfs.vdev.trim_on_init               1
vfs.zfs.vdev.mirror.rotating_inc        0
vfs.zfs.vdev.mirror.rotating_seek_inc   5
vfs.zfs.vdev.mirror.rotating_seek_offset 1048576
vfs.zfs.vdev.mirror.non_rotating_inc    0
vfs.zfs.vdev.mirror.non_rotating_seek_inc 1
vfs.zfs.vdev.max_active                 1000
vfs.zfs.vdev.sync_read_min_active       10
vfs.zfs.vdev.sync_read_max_active       10
vfs.zfs.vdev.sync_write_min_active      10
vfs.zfs.vdev.sync_write_max_active      10
vfs.zfs.vdev.async_read_min_active      1
vfs.zfs.vdev.async_read_max_active      3
vfs.zfs.vdev.async_write_min_active     1
vfs.zfs.vdev.async_write_max_active     10
vfs.zfs.vdev.scrub_min_active           1
vfs.zfs.vdev.scrub_max_active           2
vfs.zfs.vdev.trim_min_active            1
vfs.zfs.vdev.trim_max_active            64
vfs.zfs.vdev.aggregation_limit          131072
vfs.zfs.vdev.read_gap_limit             32768
vfs.zfs.vdev.write_gap_limit            4096
vfs.zfs.vdev.bio_flush_disable          0
vfs.zfs.vdev.bio_delete_disable         0
vfs.zfs.vdev.trim_max_bytes             2147483648
vfs.zfs.vdev.trim_max_pending           64
vfs.zfs.max_auto_ashift                 12
vfs.zfs.min_auto_ashift                 12
vfs.zfs.zil_replay_disable              0
vfs.zfs.cache_flush_disable             0
vfs.zfs.zio.use_uma                     1
vfs.zfs.zio.exclude_metadata            0
vfs.zfs.sync_pass_deferred_free         2
vfs.zfs.sync_pass_dont_compress         5
vfs.zfs.sync_pass_rewrite               2
vfs.zfs.snapshot_list_prefetch          0
vfs.zfs.super_owner                     0
vfs.zfs.debug                           0
vfs.zfs.version.ioctl                   4
vfs.zfs.version.acl                     1
vfs.zfs.version.spa                     5000
vfs.zfs.version.zpl                     5
vfs.zfs.vol.mode                        1
vfs.zfs.trim.enabled                    1
vfs.zfs.trim.txg_delay                  32
vfs.zfs.trim.timeout                    30
vfs.zfs.trim.max_interval               1

------------------------------------------------------------------------
```
I expect the device will fault again, but I don't have the knowledge to interpret much of that output. I'm concerned about the

```
Writes Sent: (FAULTED)                160.59k
```
and the low memory aborts. The system has 32GB of RAM.

I'm also wondering if `zfs-stats` includes some Oracle-specific code that could be reporting garbage on FreeBSD. I don't think `zpool status` listed the device as degraded last time, but I'm waiting to verify that when/if it happens again.
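For what it's worth, the DEGRADED verdict seems to come straight from the error counters rather than from any vendor-specific logic. A minimal sketch, assuming the rule is simply "degraded if the bad-checksum or I/O-error counter is nonzero" (the kstat names in the comment are the stock FreeBSD arcstats; the rule itself is an assumption, not confirmed from the `zfs-stats` source):

```shell
#!/bin/sh
# Sketch of the (assumed) health verdict: DEGRADED when either error
# counter is nonzero, HEALTHY otherwise.
l2arc_health() {
    # $1 = l2_cksum_bad, $2 = l2_io_error
    if [ "$1" -gt 0 ] || [ "$2" -gt 0 ]; then
        echo DEGRADED
    else
        echo HEALTHY
    fi
}

# On a live system you would feed it the raw counters, e.g.:
#   l2arc_health "$(sysctl -n kstat.zfs.misc.arcstats.l2_cksum_bad)" \
#                "$(sysctl -n kstat.zfs.misc.arcstats.l2_io_error)"
```

If the raw counters are zero while `zfs-stats` says DEGRADED, that would point at a reporting bug rather than a real device problem.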


----------



## gkontos (Apr 5, 2015)

I don't think that it is a zfs-stats bug. I am in exactly the same situation, and running `zpool iostat -v` also displays the following:


```
capacity     operations    bandwidth
pool                  alloc   free   read  write   read  write
....
cache                     -      -      -      -      -      -
  gpt/cache0           645G  16.0E     10      2   139K   138K
  gpt/cache1           646G  16.0E     12      0   153K      0
```
Notice the 16.0E value.


----------



## roper (Apr 5, 2015)

```
capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
....
cache             -      -      -      -      -      -
  gpt/cache0   225G  16.0E      0     38  3.62K  4.65M
```

Here a week passed during which the reported size remained close to the actual size of my device (112G). `zpool iostat -v` began reporting 16.0E free while the size reported was still equal to the actual size. After that the reported size grew very gradually, to 113G and then 125G, during which time `zfs-stats` still reported the L2ARC as healthy with no checksum errors. After about 6TB of writes to the pool, the capacity reported by `zpool iostat -v` quickly jumped to the present value, and `zfs-stats` now reports a degraded L2ARC along with bad checksums again.

The faulted writes status reported by `zfs-stats`, which I mentioned earlier, resolved itself and hasn't recurred.

`zpool status` doesn't list the checksum errors or list the device as faulted.


```
NAME           STATE     READ WRITE CKSUM
nas            ONLINE       0     0     0
  raidz1-0     ONLINE       0     0     0
    gpt/hdd0   ONLINE       0     0     0
    gpt/hdd1   ONLINE       0     0     0
    gpt/hdd2   ONLINE       0     0     0
    gpt/hdd3   ONLINE       0     0     0
    gpt/hdd4   ONLINE       0     0     0
logs
  mirror-1     ONLINE       0     0     0
    gpt/log0   ONLINE       0     0     0
    gpt/log1   ONLINE       0     0     0
cache
  gpt/cache0   ONLINE       0     0     0


----------



## gkontos (Apr 5, 2015)

Same here, it does not list any checksum errors when `zfs-stats` reports errors.


```
NAME                  STATE     READ WRITE CKSUM
storage               ONLINE       0     0     0
  raidz2-0            ONLINE       0     0     0
    multipath/disk1   ONLINE       0     0     0
    multipath/disk2   ONLINE       0     0     0
    multipath/disk25  ONLINE       0     0     0
    multipath/disk4   ONLINE       0     0     0
    multipath/disk5   ONLINE       0     0     0
    multipath/disk6   ONLINE       0     0     0
  raidz2-1            ONLINE       0     0     0
    multipath/disk7   ONLINE       0     0     0
    multipath/disk8   ONLINE       0     0     0
    multipath/disk9   ONLINE       0     0     0
    multipath/disk26  ONLINE       0     0     0
    multipath/disk11  ONLINE       0     0     0
    multipath/disk12  ONLINE       0     0     0
  raidz2-2            ONLINE       0     0     0
    multipath/disk13  ONLINE       0     0     0
    multipath/disk14  ONLINE       0     0     0
    multipath/disk15  ONLINE       0     0     0
    multipath/disk16  ONLINE       0     0     0
    multipath/disk17  ONLINE       0     0     0
    multipath/disk18  ONLINE       0     0     0
  raidz2-3            ONLINE       0     0     0
    multipath/disk19  ONLINE       0     0     0
    multipath/disk20  ONLINE       0     0     0
    multipath/disk21  ONLINE       0     0     0
    multipath/disk22  ONLINE       0     0     0
    multipath/disk23  ONLINE       0     0     0
    multipath/disk24  ONLINE       0     0     0
logs
  mirror-4            ONLINE       0     0     0
    gpt/zil0          ONLINE       0     0     0
    gpt/zil1          ONLINE       0     0     0
cache
  gpt/cache0          ONLINE       0     0     0
  gpt/cache1          ONLINE       0     0     0
spares
  multipath/disk3     AVAIL   
  multipath/disk27    AVAIL   
  multipath/disk28    AVAIL   
  multipath/disk10    AVAIL
```
But the sizes are reported incorrectly:

```
cache                     -      -      -      -      -      -
  gpt/cache0           670G  16.0E      9     15   616K  1.95M
  gpt/cache1           671G  16.0E      9     15   622K  1.95M
```
Their actual capacity is 500GB each.


----------



## roper (Apr 6, 2015)

When it happens again I'm going to check the contents of `sysctl kstat.zfs.misc.arcstats.l2_cksum_bad` and others that may be relevant.
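A small sketch for doing that check in bulk: it reads `name value` pairs and prints any counter that is nonzero. The kstat names in the usage comment are the standard FreeBSD arcstats; treat the exact list as an assumption about which ones matter here.

```shell
#!/bin/sh
# Print every counter whose value is nonzero; reads "name value" pairs
# on stdin, one per line.
nonzero_counters() {
    while read -r name value; do
        if [ "${value:-0}" -gt 0 ]; then
            echo "$name = $value"
        fi
    done
}

# Live usage: feed it the L2ARC error kstats, e.g.
#   sysctl kstat.zfs.misc.arcstats.l2_cksum_bad \
#          kstat.zfs.misc.arcstats.l2_io_error \
#          kstat.zfs.misc.arcstats.l2_writes_error | tr ':' ' ' | nonzero_counters
```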


----------



## roper (Apr 8, 2015)

I finished copying data to the new server. The workload is no longer write intensive and the L2ARC seems fine now. There may be a problem, but it could be weeks before it happens again here.


----------



## gkontos (Apr 8, 2015)

roper said:


> I finished copying data to the new server. The workload is no longer write intensive and the L2ARC seems fine now. There may be a problem, but it could be weeks before it happens again here.


What is different about the new server?


----------



## roper (Apr 8, 2015)

gkontos said:


> What is different about the new server?



My old NAS is ZFS but has no separate log or cache devices. My new NAS does have redundant log SSDs and a separate cache SSD. It had this issue cropping up while 9TB of data was being written to it. That copy is done, and now the cache isn't using the entire SSD. The allocated portion of the drive is very slowly increasing, but at this rate it will be at least two weeks before it fills.


----------



## gkontos (Apr 8, 2015)

Hm... Sounds like my situation exactly. I have 2x Intel SSDs that are partitioned for the OS, ZIL and CACHE. 


```
=>         34  1172123501  ada0  GPT  (559G)
          34           6        - free -  (3.0K)
          40        1024     1  freebsd-boot  (512K)
        1064    33554432     2  freebsd-swap  (16G) ----> SWAP (striped)
    33555496    20971520     3  freebsd-zfs  (10G) ----> OS (mirror)
    54527016    67108864     4  freebsd-zfs  (32G) ----> ZIL (mirror)
  121635880  1048576000     5  freebsd-zfs  (500G) ----> CACHE (striped)
  1170211880     1911655        - free -  (933M)
```


----------



## gkontos (Jun 9, 2015)

I was informed that there is a relevant patch that solves this. However, we need to try it. The problem is that my server is in full production and it is difficult to even reboot it. Is there anyone else with this problem who has a testing machine?

*Source*: https://reviews.freebsd.org/D2764?download=true

*EDIT:* The link became broken. I have contacted the developer.


----------



## gkontos (Jun 9, 2015)

UPDATED SOURCE: https://reviews.freebsd.org/D2764?download=true


----------



## User23 (Jun 10, 2015)

I have yet to see the error again on
FreeBSD 10.1-STABLE #1 r281486: Mon Apr 13

Heavy read, moderate write; `primarycache` and `secondarycache` are both set to `metadata`.


```
# uptime
 8:35AM  up 28 days, 21:12, 1 user, load averages: 0.22, 0.45, 0.87

# zpool iostat -v 1

              capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
poolname     11.7T  4.62T    515     74  34.6M  2.15M

  raidz1    4.56T   903G    187     31  12.4M   911K
    da6         -      -    135     10  2.09M   185K
    da8         -      -    130     10  2.08M   184K
    da7         -      -    135     10  2.09M   185K
    da9         -      -    130     10  2.09M   184K
    da10        -      -    135     10  2.09M   185K
    da11        -      -    130     10  2.09M   184K

  raidz1    7.14T  3.73T    327     43  22.1M  1.26M
    da0         -      -    242     12  3.72M   262K
    da1         -      -    234     12  3.70M   261K
    da2         -      -    242     12  3.72M   262K
    da3         -      -    234     12  3.70M   261K
    da4         -      -    242     12  3.72M   262K
    da5         -      -    234     12  3.70M   261K

cache           -      -      -      -      -      -
  ada3      37.1G  37.4G     70      0   301K  25.2K
  ada2      37.2G  37.3G     70      0   301K  25.2K
----------  -----  -----  -----  -----  -----  -----

---

L2 ARC Summary: (HEALTHY)
    Passed Headroom:            64.21m
    Tried Lock Failures:            3.27m
    IO In Progress:                1.76m
    Low Memory Aborts:            771
    Free on Write:                77.82k
    Writes While Full:            3.38k
    R/W Clashes:                17.82k
    Bad Checksums:                0
    IO Errors:                0
    SPA Mismatch:                1.77k

L2 ARC Size: (Adaptive)                235.83    GiB
    Header Size:            1.71%    4.04    GiB

L2 ARC Breakdown:                1.73b
    Hit Ratio:            20.25%    350.00m
    Miss Ratio:            79.75%    1.38b
    Feeds:                    2.47m

L2 ARC Buffer:
    Bytes Scanned:                92.65    TiB
    Buffer Iterations:            2.47m
    List Iterations:            157.64m
    NULL List Iterations:            68.98m

L2 ARC Writes:
    Writes Sent:            100.00%    1.07m

---

kstat.zfs.misc.arcstats.l2_compress_successes: 21766856
kstat.zfs.misc.arcstats.l2_compress_zeros: 0
kstat.zfs.misc.arcstats.l2_compress_failures: 19
```


----------



## justin0 (Jun 19, 2015)

I applied the patch and have been using it for a week on a secondary server (replicas and backups). I removed the original L2ARC device and added an 8G L2ARC partition for testing. Looks OK to me so far. I am open to suggestions for further testing.
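One low-effort test is to log just the error lines from the `zfs-stats` output at intervals and diff successive runs. A sketch that filters on the field names as they appear in the outputs above (the `-L` flag and log path in the usage comment are assumptions):

```shell
#!/bin/sh
# Extract only the L2ARC error-related lines so successive runs are easy
# to diff; matches the field names as printed by zfs-stats in this thread.
l2_error_lines() {
    grep -E 'Bad Checksums|IO Errors'
}

# Live usage (assuming zfs-stats' -L flag prints the L2ARC section):
#   zfs-stats -L | l2_error_lines >> /var/log/l2arc-errors.log
```

Any change in those two counters between runs would show the problem recurring under the patch.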


```
# zfs-stats -a | grep -v ^$
------------------------------------------------------------------------
ZFS Subsystem Report  Fri Jun 19 16:00:28 2015
------------------------------------------------------------------------
System Information:
  Kernel Version:  1001518 (osreldate)
  Hardware Platform:  amd64
  Processor Architecture:  amd64
  ZFS Storage pool Version:  5000
  ZFS Filesystem Version:  5
FreeBSD 10.1-STABLE #0 r284285M: Fri Jun 12 08:15:43 EDT 2015 root
4:00PM  up 7 days,  6:02, 1 user, load averages: 0.42, 0.71, 0.69
------------------------------------------------------------------------
System Memory:
  0.01%  13.75  MiB Active,  15.34%  19.12  GiB Inact
  83.62%  104.22  GiB Wired,  0.00%  0 Cache
  1.03%  1.28  GiB Free,  0.00%  4.00  KiB Gap
  Real Installed:  128.00  GiB
  Real Available:  99.95%  127.94  GiB
  Real Managed:  97.41%  124.63  GiB
  Logical Total:  128.00  GiB
  Logical Used:  84.06%  107.60  GiB
  Logical Free:  15.94%  20.40  GiB
Kernel Memory:  1.03  GiB
  Data:  97.33%  1021.76 MiB
  Text:  2.67%  28.07  MiB
Kernel Memory Map:  124.63  GiB
  Size:  76.82%  95.74  GiB
  Free:  23.18%  28.89  GiB
------------------------------------------------------------------------
ARC Summary: (HEALTHY)
  Memory Throttle Count:  0
ARC Misc:
  Deleted:  196.90m
  Recycle Misses:  80.61m
  Mutex Misses:  33.12k
  Evict Skips:  1.08b
ARC Size:  78.49%  97.04  GiB
  Target Size: (Adaptive)  78.51%  97.06  GiB
  Min Size (Hard Limit):  12.50%  15.45  GiB
  Max Size (High Water):  8:1  123.63  GiB
ARC Size Breakdown:
  Recently Used Cache Size:  93.92%  91.16  GiB
  Frequently Used Cache Size:  6.08%  5.90  GiB
ARC Hash Breakdown:
  Elements Max:  11.31m
  Elements Current:  46.72%  5.28m
  Collisions:  69.55m
  Chain Max:  11
  Chains:  811.88k
------------------------------------------------------------------------
ARC Efficiency:  3.01b
  Cache Hit Ratio:  90.91%  2.74b
  Cache Miss Ratio:  9.09%  273.80m
  Actual Hit Ratio:  60.11%  1.81b
  Data Demand Efficiency:  93.11%  459.72m
  Data Prefetch Efficiency:  1.87%  138.03m
  CACHE HITS BY CACHE LIST:
  Anonymously Used:  33.07%  905.65m
  Most Recently Used:  14.21%  389.22m
  Most Frequently Used:  51.91%  1.42b
  Most Recently Used Ghost:  0.17%  4.72m
  Most Frequently Used Ghost:  0.64%  17.63m
  CACHE HITS BY DATA TYPE:
  Demand Data:  15.63%  428.05m
  Prefetch Data:  0.09%  2.58m
  Demand Metadata:  50.23%  1.38b
  Prefetch Metadata:  34.04%  932.43m
  CACHE MISSES BY DATA TYPE:
  Demand Data:  11.57%  31.67m
  Prefetch Data:  49.47%  135.45m
  Demand Metadata:  29.48%  80.71m
  Prefetch Metadata:  9.49%  25.97m
------------------------------------------------------------------------
L2 ARC Summary: (HEALTHY)
  Passed Headroom:  36.23m
  Tried Lock Failures:  391.87k
  IO In Progress:  101
  Low Memory Aborts:  3.57k
  Free on Write:  4.02m
  Writes While Full:  415.37k
  R/W Clashes:  11.44k
  Bad Checksums:  0
  IO Errors:  0
  SPA Mismatch:  11.72k
L2 ARC Size: (Adaptive)  10.60  GiB
  Header Size:  0.19%  20.17  MiB
L2 ARC Evicts:
  Lock Retries:  2.04k
  Upon Reading:  12
L2 ARC Breakdown:  273.80m
  Hit Ratio:  1.17%  3.19m
  Miss Ratio:  98.83%  270.61m
  Feeds:  869.27k
L2 ARC Buffer:
  Bytes Scanned:  58.90  TiB
  Buffer Iterations:  869.27k
  List Iterations:  46.63m
  NULL List Iterations:  4.11m
L2 ARC Writes:
  Writes Sent:  100.00% 624.93k
------------------------------------------------------------------------
File-Level Prefetch: (HEALTHY)
DMU Efficiency:  1.79b
  Hit Ratio:  72.45%  1.30b
  Miss Ratio:  27.55%  492.68m
  Colinear:  492.68m
  Hit Ratio:  0.03%  123.77k
  Miss Ratio:  99.97%  492.56m
  Stride:  1.13b
  Hit Ratio:  100.00% 1.13b
  Miss Ratio:  0.00%  32.01k
DMU Misc:
  Reclaim:  492.56m
  Successes:  0.28%  1.38m
  Failures:  99.72%  491.18m
  Streams:  169.08m
  +Resets:  0.01%  18.56k
  -Resets:  99.99%  169.06m
  Bogus:  0
------------------------------------------------------------------------
VDEV cache is disabled
------------------------------------------------------------------------
ZFS Tunables (sysctl):
  kern.maxusers  8524
  vm.kmem_size  133821857792
  vm.kmem_size_scale  1
  vm.kmem_size_min  0
  vm.kmem_size_max  1319413950874
  vfs.zfs.trim.max_interval  1
  vfs.zfs.trim.timeout  30
  vfs.zfs.trim.txg_delay  32
  vfs.zfs.trim.enabled  1
  vfs.zfs.vol.unmap_enabled  1
  vfs.zfs.vol.mode  1
  vfs.zfs.version.zpl  5
  vfs.zfs.version.spa  5000
  vfs.zfs.version.acl  1
  vfs.zfs.version.ioctl  4
  vfs.zfs.debug  0
  vfs.zfs.super_owner  0
  vfs.zfs.sync_pass_rewrite  2
  vfs.zfs.sync_pass_dont_compress  5
  vfs.zfs.sync_pass_deferred_free  2
  vfs.zfs.zio.exclude_metadata  0
  vfs.zfs.zio.use_uma  1
  vfs.zfs.cache_flush_disable  0
  vfs.zfs.zil_replay_disable  0
  vfs.zfs.min_auto_ashift  9
  vfs.zfs.max_auto_ashift  13
  vfs.zfs.vdev.trim_max_pending  10000
  vfs.zfs.vdev.bio_delete_disable  0
  vfs.zfs.vdev.bio_flush_disable  0
  vfs.zfs.vdev.write_gap_limit  4096
  vfs.zfs.vdev.read_gap_limit  32768
  vfs.zfs.vdev.aggregation_limit  131072
  vfs.zfs.vdev.trim_max_active  64
  vfs.zfs.vdev.trim_min_active  1
  vfs.zfs.vdev.scrub_max_active  2
  vfs.zfs.vdev.scrub_min_active  1
  vfs.zfs.vdev.async_write_max_active  10
  vfs.zfs.vdev.async_write_min_active  1
  vfs.zfs.vdev.async_read_max_active  3
  vfs.zfs.vdev.async_read_min_active  1
  vfs.zfs.vdev.sync_write_max_active  10
  vfs.zfs.vdev.sync_write_min_active  10
  vfs.zfs.vdev.sync_read_max_active  10
  vfs.zfs.vdev.sync_read_min_active  10
  vfs.zfs.vdev.max_active  1000
  vfs.zfs.vdev.async_write_active_max_dirty_percent 60
  vfs.zfs.vdev.async_write_active_min_dirty_percent 30
  vfs.zfs.vdev.mirror.non_rotating_seek_inc 1
  vfs.zfs.vdev.mirror.non_rotating_inc  0
  vfs.zfs.vdev.mirror.rotating_seek_offset 1048576
  vfs.zfs.vdev.mirror.rotating_seek_inc  5
  vfs.zfs.vdev.mirror.rotating_inc  0
  vfs.zfs.vdev.trim_on_init  1
  vfs.zfs.vdev.cache.bshift  16
  vfs.zfs.vdev.cache.size  0
  vfs.zfs.vdev.cache.max  16384
  vfs.zfs.vdev.metaslabs_per_vdev  200
  vfs.zfs.txg.timeout  5
  vfs.zfs.space_map_blksz  4096
  vfs.zfs.spa_slop_shift  5
  vfs.zfs.spa_asize_inflation  24
  vfs.zfs.deadman_enabled  1
  vfs.zfs.deadman_checktime_ms  5000
  vfs.zfs.deadman_synctime_ms  1000000
  vfs.zfs.recover  0
  vfs.zfs.spa_load_verify_data  1
  vfs.zfs.spa_load_verify_metadata  1
  vfs.zfs.spa_load_verify_maxinflight  10000
  vfs.zfs.check_hostid  1
  vfs.zfs.mg_fragmentation_threshold  85
  vfs.zfs.mg_noalloc_threshold  0
  vfs.zfs.condense_pct  200
  vfs.zfs.metaslab.bias_enabled  1
  vfs.zfs.metaslab.lba_weighting_enabled  1
  vfs.zfs.metaslab.fragmentation_factor_enabled 1
  vfs.zfs.metaslab.preload_enabled  1
  vfs.zfs.metaslab.preload_limit  3
  vfs.zfs.metaslab.unload_delay  8
  vfs.zfs.metaslab.load_pct  50
  vfs.zfs.metaslab.min_alloc_size  33554432
  vfs.zfs.metaslab.df_free_pct  4
  vfs.zfs.metaslab.df_alloc_threshold  131072
  vfs.zfs.metaslab.debug_unload  0
  vfs.zfs.metaslab.debug_load  0
  vfs.zfs.metaslab.fragmentation_threshold 70
  vfs.zfs.metaslab.gang_bang  16777217
  vfs.zfs.free_max_blocks  -1
  vfs.zfs.no_scrub_prefetch  0
  vfs.zfs.no_scrub_io  0
  vfs.zfs.resilver_min_time_ms  3000
  vfs.zfs.free_min_time_ms  1000
  vfs.zfs.scan_min_time_ms  1000
  vfs.zfs.scan_idle  50
  vfs.zfs.scrub_delay  4
  vfs.zfs.resilver_delay  2
  vfs.zfs.top_maxinflight  32
  vfs.zfs.zfetch.array_rd_sz  1048576
  vfs.zfs.zfetch.block_cap  256
  vfs.zfs.zfetch.min_sec_reap  2
  vfs.zfs.zfetch.max_streams  8
  vfs.zfs.prefetch_disable  0
  vfs.zfs.delay_scale  500000
  vfs.zfs.delay_min_dirty_percent  60
  vfs.zfs.dirty_data_sync  67108864
  vfs.zfs.dirty_data_max_percent  10
  vfs.zfs.dirty_data_max_max  4294967296
  vfs.zfs.dirty_data_max  4294967296
  vfs.zfs.max_recordsize  1048576
  vfs.zfs.mdcomp_disable  0
  vfs.zfs.nopwrite_enabled  1
  vfs.zfs.dedup.prefetch  1
  vfs.zfs.l2c_only_size  10929157632
  vfs.zfs.mfu_ghost_data_lsize  61428802560
  vfs.zfs.mfu_ghost_metadata_lsize  31655070720
  vfs.zfs.mfu_ghost_size  93083873280
  vfs.zfs.mfu_data_lsize  7424311296
  vfs.zfs.mfu_metadata_lsize  126976
  vfs.zfs.mfu_size  8044302336
  vfs.zfs.mru_ghost_data_lsize  2597035008
  vfs.zfs.mru_ghost_metadata_lsize  5703695360
  vfs.zfs.mru_ghost_size  8300730368
  vfs.zfs.mru_data_lsize  81873646080
  vfs.zfs.mru_metadata_lsize  11627892224
  vfs.zfs.mru_size  94030831616
  vfs.zfs.anon_data_lsize  0
  vfs.zfs.anon_metadata_lsize  0
  vfs.zfs.anon_size  7489536
  vfs.zfs.l2arc_norw  1
  vfs.zfs.l2arc_feed_again  1
  vfs.zfs.l2arc_noprefetch  1
  vfs.zfs.l2arc_feed_min_ms  200
  vfs.zfs.l2arc_feed_secs  1
  vfs.zfs.l2arc_headroom  2
  vfs.zfs.l2arc_write_boost  8388608
  vfs.zfs.l2arc_write_max  8388608
  vfs.zfs.arc_meta_limit  33187028992
  vfs.zfs.arc_free_target  226534
  vfs.zfs.arc_shrink_shift  5
  vfs.zfs.arc_average_blocksize  8192
  vfs.zfs.arc_min  16593514496
  vfs.zfs.arc_max  132748115968
------------------------------------------------------------------------
```


----------



## Compizfox (Oct 31, 2015)

I think I'm affected by this issue, but I'm running FreeBSD 10.2-RELEASE.


```
L2 ARC Summary: (DEGRADED)
  Passed Headroom:  267.76m
  Tried Lock Failures:  333.09k
  IO In Progress:  1.35k
  Low Memory Aborts:  50
  Free on Write:  11.63k
  Writes While Full:  14.12k
  R/W Clashes:  83
  Bad Checksums:  6.06m
  IO Errors:  906.42k
  SPA Mismatch:  4.21b

L2 ARC Size: (Adaptive)  167.86  GiB
  Header Size:  0.19%  324.11  MiB

L2 ARC Evicts:
  Lock Retries:  26
  Upon Reading:  0

L2 ARC Breakdown:  74.65m
  Hit Ratio:  12.38%  9.24m
  Miss Ratio:  87.62%  65.41m
  Feeds:  4.35m

L2 ARC Buffer:
  Bytes Scanned:  279.66  TiB
  Buffer Iterations:  4.35m
  List Iterations:  277.98m
  NULL List Iterations:  1.51m

L2 ARC Writes:
  Writes Sent:  100.00% 95.42k
```

Apparently my L2ARC size is 167.86 GiB, but that's impossible: the partition the L2ARC is on is only 30 GB.

I also see quite a low hit ratio, and high IO Errors and Bad Checksums counts.
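The bogus figure is easy to spot mechanically as well. A sketch that prints any device whose `free` column in `zpool iostat -v` output shows the impossible 16.0E value reported earlier in this thread (the pool name in the usage comment is a placeholder):

```shell
#!/bin/sh
# Print the name of any vdev whose "free" column (field 3) shows the
# impossible 16.0E value seen on broken L2ARC devices in this thread.
bogus_free_devices() {
    awk '$3 == "16.0E" { print $1 }'
}

# Live usage: zpool iostat -v <pool> | bogus_free_devices
```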


----------



## justin0 (Nov 2, 2015)

The commits did not make it into 10.2-RELEASE; you need to be running a 10.2-STABLE build from about 3 or 4 weeks ago or later. Good luck!


----------



## smerz (Dec 3, 2015)

Any update on this? Or perhaps a link to the MFC?
Would be great to get this one out of the way! ;-)


----------



## junovitch@ (Dec 4, 2015)

The above-mentioned review was closed with commit https://reviews.FreeBSD.org/rS287099 on head. The corresponding stable/10 commit was https://reviews.FreeBSD.org/rS287665, two months ago.


----------



## smerz (Dec 9, 2015)

Thank you!

We're currently testing FreeBSD 10.2-STABLE #0 r291769; we'll let you know how it goes.

Looking good so far but it's not been very long yet!


----------



## petehodur (Dec 29, 2015)

Hello

Is there any solution for 10.1 releng?


----------



## junovitch@ (Dec 29, 2015)

petehodur said:


> Hello
> 
> Is there any solution for 10.1 releng?


If you are asking about official errata that you can apply via freebsd-update(8), then no, no erratum has been issued for this particular issue (see https://www.FreeBSD.org/security/notices.html).


----------



## petehodur (Dec 30, 2015)

junovitch@ said:


> If you are asking about official errata that you can apply via freebsd-update(8), then no, no erratum has been issued for this particular issue (see https://www.FreeBSD.org/security/notices.html).



Hmmm, I have a custom kernel and world rebuilt from source (`/usr/bin/svnlite checkout https://svn.FreeBSD.org/base/releng/10.1/ /usr/src`), which is more or less the same as freebsd-update.

So it seems that L2ARC will never work on 10.1 with pools with ashift=12. This is strange.

Does anybody know if this is fixed in releng/10.2? Or is it best to wait for 10.3?


----------



## gkontos (Dec 30, 2015)

junovitch@ said:


> The above-mentioned review was closed with commit https://reviews.FreeBSD.org/rS287099 on head. The corresponding stable/10 commit was https://reviews.FreeBSD.org/rS287665, two months ago.


If I understand correctly, it was committed to 10.1-STABLE two months ago. At least in the version that I run, I have not run into that problem again. Do you know why it was not included in 10.2-RELEASE?


----------



## junovitch@ (Jan 1, 2016)

gkontos said:


> If I understand correctly, it was committed to 10.1-STABLE two months ago. At least in the version that I run, I have not run into that problem again. Do you know why it was not included in 10.2-RELEASE?


10.2-RELEASE was tagged from releng/10.2 at SVN r286666.  It would have been too soon to have had this fix.


----------



## petehodur (Jan 1, 2016)

OK, what is the best way to fix this issue if we want to stay with releng/10.1 or releng/10.2?


----------

