# Nginx + FUSE stalls with 'grbmaw'



## ScopeDog (May 28, 2020)

Hi.
I'm developing a distributed file system with FUSE (+ fusefs-libs) on FreeBSD 12-RELEASE and STABLE. However, when Nginx accesses the FUSE-mounted file system, it works correctly for a while and then stalls in the 'grbmaw' state.
This grbmaw state seems to be related to virtual memory and to be used by vm_page_busy_sleep(). Meanwhile, my FUSE program is just waiting for the next command in fuse_session_loop_mt().
When this happens I need to reboot my server, but it doesn't always shut down and I then have to power it off manually. (This is a real pain for me now that I'm working from home.)

Does anybody know what's going on with it?


----------



## ScopeDog (May 29, 2020)

I guess it's better to send-pr this. It seems to be related to fuse.ko, and a userland program should never be able to cause a VM stall.


----------



## bigbrother (Aug 29, 2020)

I can report that this happens not only with FUSE but also with an NFS file system.

Short summary: nginx gets stuck while sending a file from an NFS-mounted directory.

nginx stuck:
# ps axuwww | grep nginx
www        12593   0.0  0.1   19552    5980  -  D    03:04       0:00.30 nginx: worker process (nginx)
www        12594   0.0  0.1   19356    5752  -  D    03:04       0:01.35 nginx: worker process (nginx)


procstat -kka | grep nginx
12593 100132 nginx               -                   mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 vm_page_busy_sleep+0x8f vm_page_grab_pages+0x417 allocbuf+0x34a getblkx+0x5c4 breadn_flags+0x3d vfs_bio_getpages+0x323 ncl_getpages+0x2be VOP_GETPAGES_APV+0x7c vop_stdgetpages_async+0x49 VOP_GETPAGES_ASYNC_APV+0x7c vnode_pager_getpages_async+0x7e vn_sendfile+0xd9c sendfile+0x12b amd64_syscall+0x364 fast_syscall_common+0x101
12594 101754 nginx               -                   mi_switch+0xe2 sleepq_wait+0x2c _sleep+0x247 vm_page_busy_sleep+0x8f vm_page_grab_pages+0x417 allocbuf+0x34a getblkx+0x5c4 breadn_flags+0x3d vfs_bio_getpages+0x323 ncl_getpages+0x2be VOP_GETPAGES_APV+0x7c vop_stdgetpages_async+0x49 VOP_GETPAGES_ASYNC_APV+0x7c vnode_pager_getpages_async+0x7e vn_sendfile+0xd9c sendfile+0x12b amd64_syscall+0x364 fast_syscall_common+0x101



procstat -t 12593
  PID    TID COMM                TDNAME              CPU  PRI STATE   WCHAN
12593 100132 nginx               -                    -1  120 sleep   grbmaw
root@bigb5:/ # procstat -t 12594
  PID    TID COMM                TDNAME              CPU  PRI STATE   WCHAN
12594 101754 nginx               -                    -1  120 sleep   grbmaw





This happened while I was downloading a 500 MB file from my nginx server, with the file located in an NFS-mounted directory. After some minutes, all NFS accesses to this directory stalled, with vfs_busy:

#ps axuww | grep df
root       86591   0.0  0.0   11292    2216  1  DN   21:26       0:00.00 df -h
root       86599   0.0  0.0   11432    2368  1  SN+  21:27       0:00.00 grep df
root       85320   0.0  0.0   11292    2220  6  DN   21:23       0:00.00 df -h
# procstat -t  86591
  PID    TID COMM                TDNAME              CPU  PRI STATE   WCHAN
86591 100591 df                  -                    -1  255 sleep   vfs_busy

The NFS server was operating successfully for the other machines on the LAN. The problem existed only on this machine and only with this mounted directory. I could access other NFS-mounted directories on this machine without any problem.


FreeBSD XXXX 12.1-RELEASE-p8 FreeBSD 12.1-RELEASE-p8 GENERIC  amd64

# kldstat
Id Refs Address                Size Name
 1   48 0xffffffff80200000  2448f20 kernel
 2    1 0xffffffff82649000     2ca0 coretemp.ko
 3    3 0xffffffff8264c000    49ba8 ipfw.ko
 4    1 0xffffffff82a21000     cb50 geom_eli.ko
 5    1 0xffffffff82a2e000     88d8 tmpfs.ko
 6    1 0xffffffff82a37000     18a0 uhid.ko
 7    1 0xffffffff82a39000     1aa0 wmt.ko
 8    1 0xffffffff82a3b000     19c8 ipdivert.ko
 9    1 0xffffffff82a3d000     2450 ipfw_nat.ko
10    1 0xffffffff82a40000     ac32 libalias.ko
11    1 0xffffffff82a4b000     1010 cpuctl.ko
12    3 0xffffffff82a4d000    529c8 vboxdrv.ko
13    2 0xffffffff82aa0000     2ce0 vboxnetflt.ko
14    2 0xffffffff82aa3000     9e30 netgraph.ko
15    1 0xffffffff82aad000     1710 ng_ether.ko
16    1 0xffffffff82aaf000     3f30 vboxnetadp.ko
17    1 0xffffffff82ab3000   2472e0 zfs.ko
18    1 0xffffffff82cfb000     7628 opensolaris.ko
19    1 0xffffffff82d03000     2940 nullfs.ko
20    1 0xffffffff82d06000     30c1 if_tap.ko




Unable to do anything else, I resorted to an unclean_reboot.c program that I wrote, whose main() contains only a single call:
return reboot(RB_NOSYNC);

I did this because this is a remote server and there was a high chance of it hanging during a normal shutdown.

After the reboot, I set sendfile to off in nginx. If the problem reappears, I will post a follow-up; otherwise, if you have any suggestions, let me know.
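
For anyone wanting to try the same workaround, a minimal sketch of the nginx change follows. Placing the directive in the `http` block is an assumption; it can also go in a `server` or `location` block, and the rest of the configuration is omitted here.

```nginx
http {
    # Workaround described above: do not use sendfile(2) when serving files.
    # Assumed placement; adjust to wherever sendfile is currently enabled.
    sendfile off;
}
```

The change takes effect after nginx reloads its configuration.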


----------



## ScopeDog (Aug 29, 2020)

This is very interesting. I recommend posting this to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246886.
In that Bugzilla thread, it is said that sendfile causes the problem. However, using 12.0R's kern_sendfile on 12.1R didn't solve it. 13-current seems to have totally new sendfile implemented by Netflix and I also recommend to try with 13-current.


----------



## CyberCr33p (Aug 30, 2020)

I had Nginx hangs too, with sendfile enabled and without using FUSE.


----------



## glebius@ (Sep 21, 2020)

> 13-current seems to have totally new sendfile implemented by Netflix and I also recommend to try with 13-current.

The new sendfile appeared in 12.0-RELEASE. There are some differences between 12 and 13, of course, but nothing substantial.


----------

