# nfsd not working when restarting it (as opposed to kill -HUP or reload)



## peetaur (Sep 30, 2011)

If I run this command:
`# /etc/rc.d/rpcbind restart ; /etc/rc.d/mountd restart ; /etc/rc.d/nfsd restart`

NFS mounts stop working.
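For comparison, the reload path mentioned in the thread title can be sketched as below. This is only a sketch: `reload_exports` is a hypothetical helper name, and it assumes mountd keeps its PID in `/var/run/mountd.pid` (the FreeBSD default) and rereads the exports file on SIGHUP, as mountd(8) describes.

```shell
# Hypothetical helper (not from the original post): ask mountd to reread
# its exports file by sending SIGHUP instead of restarting the daemons.
# Assumption: the PID file is /var/run/mountd.pid unless a path is given.
reload_exports() {
    pidfile="${1:-/var/run/mountd.pid}"
    if [ ! -r "$pidfile" ]; then
        echo "cannot read pid file: $pidfile" >&2
        return 1
    fi
    # SIGHUP makes mountd reload the exports without touching nfsd.
    kill -HUP "$(cat "$pidfile")"
}
```

Run as root with no argument to use the default PID file. Since this leaves rpcbind and nfsd running, it does not exercise the restart path that triggers the problem.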

As soon as one of the exports is mounted by a remote system, the nfsd process uses about 130% CPU (shown as WCPU in top). STATE is sometimes "RUN", but most of the time it is "rpcsvc".


```
PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
30459 root        4  45    0  5828K  1028K RUN    15   6:54 140.58% nfsd
```


```
PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
30459 root        4  45    0  5828K  1028K rpcsvc 14   9:40 141.02% nfsd
```

In io mode, top shows over 800,000 in the VCSW column:


```
PID USERNAME   VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
30459 root     972777      0      0      0      0      0   0.00% nfsd
```

vmstat also shows over 800,000 context switches per second (the cs column):

`# vmstat 1`

```
procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr da0 da1   in   sy   cs us sy id
 1 0 0    335M    46G   369   2   3   0   455   0   0   0  152  914 318625  0  3 97
 2 0 0    335M    46G     3   0   1   0     0   0   0   0   26  115 1005671  0 10 90
 2 0 0    335M    46G     0   0   0   0     0   0   0   0   24  115 1005776  0  9 91
 1 0 0    335M    46G     0   0   0   0     0   0   0   0   26  115 973746  0  9 91
 2 0 0    335M    46G     0   0   0   0     0   0   0   0   30  120 914220  0  8 92
 2 0 0    335M    46G     0   0   0   0   115   0   0   0  134  115 967830  0  8 92
```
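To catch the moment the context-switch rate spikes, the vmstat output can be piped through a small awk filter. A minimal sketch: `flag_cs` and its default threshold of 100000 are assumptions for illustration, and it relies on the column layout shown above, where cs is the fourth field from the end.

```shell
# Hypothetical helper: read vmstat output on stdin and report rows whose
# context-switch count (cs) exceeds a threshold.
# Assumption: each data row ends with the columns "in sy cs us sy id".
flag_cs() {
    threshold="${1:-100000}"
    awk -v t="$threshold" '
        # Data rows start with a number; header rows do not.
        $1 ~ /^[0-9]+$/ && NF >= 6 {
            cs = $(NF - 3)
            if (cs + 0 > t + 0) print "high cs:", cs
        }'
}
```

On a live system this would be used as `vmstat 1 | flag_cs 100000`.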

Same result with:

`$ uname -a`

```
FreeBSD bcnas1.bc.local 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Thu Feb 17 02:41:51 UTC 2011     root@mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  amd64
```


```
FreeBSD bcnas1.bc.local 8.2-STABLE FreeBSD 8.2-STABLE #0: Thu Sep 29 15:06:03 CEST 2011     root@bcnas1.bc.local:/usr/obj/usr/src/sys/GENERIC  amd64
```


```
FreeBSD bcnas1bak.bc.local 8.3-STABLE FreeBSD 8.3-STABLE #2: Thu Jun  7 12:15:33 CEST 2012     root@bcnas1bak.bc.local:/usr/obj/usr/src/sys/GENERIC  amd64
```


While this is happening, gstat shows all my disks basically idle.

Can someone please tell me why it would do that?

And how would I fix this problem without rebooting?

Thanks


----------



## peetaur (Nov 2, 2011)

Update: I think this might be related to doing this:

`# zfs set snapdir=visible tank`

And then sharing some sub-datasets over NFS.

Here is a related log entry:

```
Oct 31 14:55:39 bcnas1 mountd[47733]: can't delete exports for /tank/fs1/.zfs/snapshot/daily-2011-10-06T09:27:52: Invalid argument
```

The log entry suggests that NFS is somehow incompatible with the snapshot directories.
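To see which paths mountd complains about, the log can be filtered for this message. A rough sketch, assuming the message format matches the entry above; `failed_exports` is a hypothetical helper name, and `/var/log/messages` is assumed to be where syslog writes these entries.

```shell
# Hypothetical helper: read syslog lines on stdin and print the path from
# each mountd "can't delete exports" message.
# Assumption: lines look like the entry quoted above, with the path followed
# by ": <error text>".
failed_exports() {
    sed -n "s/.*mountd\[[0-9]*\]: can't delete exports for \(.*\): .*/\1/p"
}
```

Example: `failed_exports < /var/log/messages | sort -u` lists each affected path once.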

I figured that out after that volume (tank/fs1) hung completely whenever anyone viewed the .zfs/snapshot directory. Long ago, viewing it did not cause a hang; instead, the snapshots just looked wrong, showing the wrong files, or the snapshot directories appeared as binary files. Recently, with more snapshots, it hangs until reboot. During the reboot, I think nfsd failed to stop properly and printed something about a snapshot directory, so I blamed nfsd, which reminded me of this thread I started.


----------



## peetaur (Nov 3, 2011)

Update: Hiding snapdir does not prevent nfsd from failing on a restart.

On the second system, which is set up the same way but has no clients using NFS, restarting nfsd works fine.


----------



## peetaur (Jun 11, 2012)

I opened this PR: http://www.freebsd.org/cgi/query-pr.cgi?pr=168942


----------

