# Occasional ZFS hangs



## walterheukels (Jun 23, 2014)

We've been running NFS over ZFS on a couple of fairly busy servers for about a month now, and in general we're very happy with the setup. However, I've had two systems become completely unresponsive, to the point where there's no I/O to the zpool and commands like `zpool status` or `zfs list` just hang. There was nothing in the logs, literally not one relevant message.

As I don't currently have anything to go on, my question is this: if this occurs again, what should I do to troubleshoot the problem before rebooting?  Are there any statistics that I should be gathering in the meantime?  Any suggestions would be appreciated.

We're running 10-STABLE from May 2014.


----------



## SirDice (Jun 23, 2014)

Did you read this? 

https://wiki.freebsd.org/ZFSTuningGuide#NFS_tuning

And how much memory do those machines have? And how big are your zpools?


----------



## wblock@ (Jun 23, 2014)

If you can, try updating one of those systems to a recent 10-stable.  There have been some memory fixes lately, possibly since the last update.  Leaving one at the older version will help to show if the problem is fixed.


----------



## walterheukels (Jun 30, 2014)

I've updated some servers to r267917, but it's too early to tell whether it makes any difference.  From what we've been able to determine so far, the hangs seem to be related to memory allocation called from the ARC code.
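
For anyone wanting to capture evidence the next time a box locks up, here is a minimal sketch of a collector, not a tested procedure. It assumes Python is installed, that the output directory is not on the affected pool, and that the four commands in `DEFAULT_COMMANDS` are the interesting ones (they are standard FreeBSD tools, but the selection is a guess). `zpool status` and `zfs list` are left out on purpose because they are the commands reported to hang. The `collect` name and the file layout are made up for illustration.

```python
import subprocess
from pathlib import Path

# Assumed selection of standard FreeBSD tools; adjust to taste.
DEFAULT_COMMANDS = {
    "kernel_stacks": ["procstat", "-kk", "-a"],         # kernel stack of every thread
    "arcstats": ["sysctl", "kstat.zfs.misc.arcstats"],  # ARC counters
    "uma_zones": ["vmstat", "-z"],                      # kernel memory zone usage
    "processes": ["ps", "-axl"],                        # process states and wait channels
}


def collect(commands, out_dir, timeout=10):
    """Run each command, save its output to <out_dir>/<name>.txt, return a status per name."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    results = {}
    for name, argv in commands.items():
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
            text = proc.stdout + proc.stderr
            status = "ok" if proc.returncode == 0 else f"exit {proc.returncode}"
        except subprocess.TimeoutExpired:
            # Best effort only: a process stuck inside the kernel may not die when killed.
            text, status = "", "timeout"
        except OSError as exc:
            # Command missing or not executable on this system.
            text, status = str(exc), "not run"
        (out / f"{name}.txt").write_text(text)
        results[name] = status
    return results
```

Calling `collect(DEFAULT_COMMANDS, "/some/non-zfs/path")` as root should leave one text file per command. The timeout is only a safeguard: if a command blocks in an uninterruptible kernel wait, the script itself can still get stuck.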


----------



## kfoda (Jul 28, 2014)

http://www.denninger.net/FreeBSD-Patches/arc-patch fixed ZFS stalls for me!
Also see http://www.freebsd.org/cgi/query-pr.cgi?pr=187594&cat= ; unfortunately, the patch has not been committed yet.

Cheers,


----------



## belon_cfy (Jul 28, 2014)

walterheukels said:

> We've been running NFS over ZFS on a couple of fairly busy servers for about a month now, and in general we're very happy with the setup. However, I've had two systems become completely unresponsive, to the point where there's no I/O to the zpool and commands like `zpool status` or `zfs list` just hang. There was nothing in the logs, literally not one relevant message.
> 
> As I don't currently have anything to go on, my question is this: if this occurs again, what should I do to troubleshoot the problem before rebooting?  Are there any statistics that I should be gathering in the meantime?  Any suggestions would be appreciated.
> 
> We're running 10-STABLE from May 2014.



I'm in the same boat: my FreeBSD 10 server shows the same symptom, even after updating to the latest version.

I'm moving back to FreeBSD 9.3.


----------



## noons (Sep 20, 2014)

Did the downgrade actually resolve the issue? I'm running 10-stable at the moment (new install) and hitting the exact same issue. Also see: https://bugs.freebsd.org/bugzilla/show_ ... ?id=187594 By the way, it sounds like this is a long-standing issue (I could be wrong, though).


----------



## belon_cfy (Sep 22, 2014)

noons said:

> Did the downgrade actually resolve the issue? I'm running 10-stable at the moment (new install) and hitting the exact same issue. Also see: https://bugs.freebsd.org/bugzilla/show_ ... ?id=187594 By the way, it sounds like this is a long-standing issue (I could be wrong, though).


Not entirely resolved, but my FreeBSD 9.3 server is slightly more stable than 10. So far I have experienced one outage with no I/O at all on the server, during heavy read/write I/O (the same symptom as on FreeBSD 10). I'm not sure whether it is related to the amount of free memory: free memory on my FreeBSD 9.3 server keeps shrinking, yet the ARC size is dropping from 100% to 93%, and it will eventually drop to 60% or less.

L2ARC remains the same.
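
To track the ARC percentage over time rather than eyeballing it, something like the sketch below might help. It assumes the percentage means current ARC size relative to the configured maximum, i.e. `kstat.zfs.misc.arcstats.size` divided by `kstat.zfs.misc.arcstats.c_max`, which may not be how the figures above were obtained. The function names and the sample numbers are made up for illustration.

```python
def parse_sysctl(text):
    """Parse 'name: value' lines as printed by sysctl into a dict of integers."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            values[name.strip()] = int(value)
    return values


def arc_fill_percent(values):
    """Current ARC size as a percentage of the ARC maximum (assumed definition)."""
    size = values["kstat.zfs.misc.arcstats.size"]
    c_max = values["kstat.zfs.misc.arcstats.c_max"]
    return 100.0 * size / c_max


# Hypothetical sample, shaped like the output of:
#   sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max
sample = """kstat.zfs.misc.arcstats.size: 15000000000
kstat.zfs.misc.arcstats.c_max: 16000000000
"""

print(f"ARC fill: {arc_fill_percent(parse_sysctl(sample)):.1f}%")
```

Logging this value every minute or so alongside free memory would show whether the ARC really shrinks before a stall.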


----------



## walterheukels (Oct 8, 2014)

After much experimentation and troubleshooting, we've found that this problem is limited to zvols.  After moving our data from zvols into files, things have been stable.


----------

