# mbufs leaks



## ironudjin (Jan 18, 2021)

Hello,

I have a heavily loaded web server. It usually handles ~5000 conn/s and 100 Mbit/s of traffic. Once every two weeks the server stops responding to network connections.
In dmesg I see:

```
[zone: mbuf] kern.ipc.nmbufs limit reached
```
`netstat -m` shows that ~19 GB of RAM is used by the network stack:


```
78365279/16/78365295 mbufs in use (current/cache/total)
43996/18104/62100/12244576 mbuf clusters in use (current/cache/total/max)
2429/16 mbuf+clusters out of packet secondary zone in use (current/cache)
12584/2975/15559/6122288 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/1814011 9k jumbo clusters in use (current/cache/total/max)
0/0/0/1020381 16k jumbo clusters in use (current/cache/total/max)
19729647K/48112K/19777759K bytes allocated to network (current/cache/total)
42375955/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
404032/0/33003 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
1899771 sendfile syscalls
399 sendfile syscalls completed without I/O request
1004979 requests for I/O initiated by sendfile
7900311 pages read by sendfile as part of a request
399 pages were valid at time of a sendfile request
0 pages were valid and substituted to bogus page
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
```

`vmstat -z | grep -E '^ITEM|mbuf'`:


```
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
mbuf_packet:            256, 78365295,    2360,      12,1353222657,   0,39166
mbuf:                   256, 78365295,78362923,       0,27304412405,45796642,434120
mbuf_cluster:          2048, 12244576,   44213,   17887,1529376894,   0,   0
mbuf_jumbo_page:       4096, 6122288,   12388,    3306,4741583680,   0,   0
mbuf_jumbo_9k:         9216, 1814011,       0,       0,       0,   0,   0
mbuf_jumbo_16k:       16384, 1020381,       0,       0,       0,   0,   0
```
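As a rough cross-check of the two outputs above, the in-use counts from `netstat -m` can be multiplied by the object sizes from the SIZE column of `vmstat -z`. This is only a sketch: it assumes the "bytes allocated to network" figure is the sum of count × size for each zone, and the variable names are illustrative.

```python
# Object sizes in bytes, taken from the SIZE column of `vmstat -z`.
MBUF_BYTES = 256
CLUSTER_BYTES = 2048
JUMBO_PAGE_BYTES = 4096

# "current" counts copied from the `netstat -m` output.
mbufs_in_use = 78_365_279
clusters_in_use = 43_996
jumbo_4k_in_use = 12_584


def kib(count: int, size_bytes: int) -> float:
    """Memory used by `count` objects of `size_bytes` each, in KiB."""
    return count * size_bytes / 1024


mbuf_kib = kib(mbufs_in_use, MBUF_BYTES)
cluster_kib = kib(clusters_in_use, CLUSTER_BYTES)
jumbo_kib = kib(jumbo_4k_in_use, JUMBO_PAGE_BYTES)

total_kib = mbuf_kib + cluster_kib + jumbo_kib
mbuf_share = mbuf_kib / total_kib

print(f"plain mbufs : {mbuf_kib:,.0f} KiB")
print(f"2k clusters : {cluster_kib:,.0f} KiB")
print(f"4k jumbo    : {jumbo_kib:,.0f} KiB")
print(f"total       : {total_kib:,.0f} KiB")
print(f"mbuf share  : {mbuf_share:.1%}")
```

Under that assumption the total matches the 19729647K that `netstat -m` reports, and plain 256-byte mbufs account for more than 99% of it. The clusters and jumbo pages are not the problem.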

OS: FreeBSD 12.2-STABLE 41cf333f9b2a(stable/12)-dirty: Sat Jan  2 01:49:01 EET 2021

/boot/loader.conf

```
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
opensolaris_load="YES"
zfs_load="YES"

accf_http_load="YES"
accf_data_load="YES"
autoboot_delay="7"
ioat_load="YES"
cc_htcp_load="YES"
imcsmb_load="YES"
aesni_load="YES"
fuse_load="YES"
tcp_rack_load="YES"

# zfs
vfs.zfs.vdev.cache.size=0
vfs.zfs.arc_max=32G
#vfs.zfs.arc_max=64G

# syncache tuning
net.inet.tcp.syncache.hashsize=1024
net.inet.tcp.syncache.bucketlimit=100
net.inet.tcp.syncache.cachelimit=1048576

# hostcache tuning
net.inet.tcp.hostcache.hashsize=4096
net.inet.tcp.hostcache.bucketlimit=100
net.inet.tcp.hostcache.cachelimit=65536

net.link.ifqmaxlen=2048

kern.ipc.shmseg=10240
kern.ipc.shmmni=10240

net.inet.tcp.tcbhashsize=65536
hw.intr_storm_threshold=32000
kern.msgbufsize=262144
kern.ipc.nmbclusters=0
net.inet.tcp.soreceive_stream=1

# disable ARC compression
#vfs.zfs.compressed_arc_enabled=0

cpu_microcode_load="YES"
cpu_microcode_name="/boot/firmware/intel-ucode.bin"

boot_multicons="YES"
boot_serial="YES"
comconsole_speed="115200"
console="comconsole"
comconsole_port="0x2f8"
```

/etc/sysctl.conf

```
vfs.usermount=1
security.bsd.see_other_uids=1
security.bsd.see_other_gids=1
security.bsd.see_jail_proc=0
security.bsd.unprivileged_read_msgbuf=0
security.bsd.unprivileged_proc_debug=0
vfs.zfs.min_auto_ashift=12

net.inet.ip.redirect=0
net.inet.icmp.drop_redirect=1
net.inet.icmp.log_redirect=0

vfs.zfs.prefetch_disable=1

kern.ipc.somaxconn=65535

net.inet.tcp.maxtcptw=102400

# maximum number of interrupts per second on any interrupt level
# (vmstat -i for total rate). If you still see Interrupt Storm detected messages,
# increase the limit to a higher number and look for the culprit. (default 1000)
hw.intr_storm_threshold=12000

kern.ipc.maxsockbuf=33554432
kern.maxvnodes=8000000

net.inet.tcp.cc.algorithm=htcp
net.inet.tcp.cc.htcp.adaptive_backoff=1
net.inet.tcp.cc.htcp.rtt_scaling=1
net.inet.icmp.icmplim=5000

net.inet.tcp.tso=1

vfs.zfs.txg.timeout=2
vfs.zfs.trim.txg_delay=3
# for NVMe
vfs.zfs.delay_min_dirty_percent=95
vfs.zfs.dirty_data_max=12884901888
vfs.zfs.top_maxinflight=128
vfs.zfs.vdev.aggregation_limit=524288
vfs.zfs.vdev.scrub_max_active=3
vm.lowmem_period=0

net.inet.tcp.fast_finwait2_recycle=1

net.inet.tcp.delayed_ack=1
net.inet.tcp.delacktime=100

net.inet.tcp.blackhole=0
net.inet.udp.blackhole=1
kern.ipc.maxsockbuf=2097152
net.inet.udp.maxdgram=57344
net.inet.ip.intr_queue_maxlen=5000
kern.ipc.shmmax=2147483648
kern.ipc.maxsockbuf=83886080
net.route.netisr_maxqlen=4096

net.inet.tcp.maxtcptw=3149624
net.inet.tcp.nolocaltimewait=1

net.inet.ip.portrange.first=1024
net.inet.ip.portrange.last=65535
net.inet.ip.portrange.randomized=0
net.inet.tcp.msl=15000
net.inet.tcp.path_mtu_discovery=1
net.inet.tcp.drop_synfin=1
net.inet.ip.process_options=0

kern.corefile="/var/tmp/%U.%N.core"

kern.ipc.shm_use_phys=1

net.inet.tcp.rfc3390=1

kern.ipc.shm_allow_removed=1
net.inet.tcp.sendspace=65536
net.inet.tcp.sendbuf_inc=32768
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.sendbuf_auto=1
net.inet.tcp.recvspace=32768
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.recvbuf_inc=8192
net.inet.tcp.recvbuf_auto=1

# for localhost
net.inet.raw.maxdgram=16384
net.inet.raw.recvspace=16384
net.local.stream.sendspace=163840    # lo0 mtu 16384 x 10
net.local.stream.recvspace=163840    # lo0 mtu 16384 x 10
net.local.dgram.maxdgram=65535

net.inet.tcp.fastopen.server_enable=1
vfs.timestamp_precision=0
vfs.read_max=128
kern.sync_on_panic=1
net.inet.tcp.hostcache.expire=1200
net.inet.tcp.keepinit=5000
net.inet.tcp.ecn.enable=1          # explicit congestion notification (ecn) warning: some ISP routers abuse ECN (default 0)
net.inet.tcp.mssdflt=1460
net.inet.tcp.cc.abe=1
net.inet.tcp.minmss=536
net.inet.ip.maxfragpackets=1024
net.inet.ip.maxfragsperpacket=16
net.inet.tcp.abc_l_var=44
net.inet.tcp.initcwnd_segments=44
net.inet.tcp.delacktime=20

net.inet.tcp.rfc6675_pipe=1

vm.swap_idle_enabled=1

# tune for postgres
vfs.zfs.metaslab.lba_weighting_enabled=0
kern.ipc.shmall=2097152
kern.ipc.shmmax=17179877376

# nfsv4
vfs.nfs.enable_uidtostring=1
vfs.nfsd.enable_stringtouid=1
vfs.nfsd.issue_delegations=1
vfs.nfsd.enable_locallocks=1
#vfs.nfsd.async=1
vfs.nfs.nfs_directio_enable=1

# dump cores
#kern.sugid_coredump=1

dev.ixl.0.iflib.rx_budget=65535
dev.ixl.1.iflib.rx_budget=65535

net.inet.tcp.functions_default=rack
net.inet.tcp.syncookies=0
```
How can I find out where the mbufs are leaking? Is there something wrong with my settings?

Thank you!


----------



## suntzu00 (Jan 18, 2021)

https://wiki.freebsd.org/DevSummit/20170907/ZFS says "ZFS currently does not play nicely with sendfile". Maybe disable sendfile in the web server for a while and see what happens?


----------



## ironudjin (Jan 19, 2021)

suntzu00 said:


> https://wiki.freebsd.org/DevSummit/20170907/ZFS says "ZFS currently does not play nicely with sendfile". Maybe disable sendfile in the web server for a while and see what happens?


Thank you for the suggestion. I've disabled sendfile(), but the mbuf counter keeps growing.
Here is the output of a simple script that calculates the mbuf counter diff in real time:


> 10:35:43: +1208
> 10:35:44: +928
> 10:35:45: +1020
> 10:35:46: +1036
> ...


Here is `netstat -m` 7 hours after reboot:


> 2457785/24910/2482695 mbufs in use (current/cache/total)
> 27140/22858/49998/12244576 mbuf clusters in use (current/cache/total/max)
> 2543/12890 mbuf+clusters out of packet secondary zone in use (current/cache)
> 2917/16627/19544/6122288 4k (page size) jumbo clusters in use (current/cache/total/max)
> ...


----------



## suntzu00 (Jan 19, 2021)

I would start by cleaning up that sysctl.conf file. For example, kern.ipc.maxsockbuf appears 3 times and you don't know what value is actually being used. Look up all the sysctl variables and what they do, especially combined with other variables (some combos can lead to an increase in memory usage). Once you have a base sysctl.conf, start adding stuff back bit by bit and test things properly.
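To make the duplicates easy to spot, here is a minimal sketch that lists every key assigned more than once in a sysctl.conf-style file. The function name `find_duplicate_keys` is made up for illustration, and the sample text contains only the three `kern.ipc.maxsockbuf` lines from the file posted above. It assumes the file is applied top to bottom, so the last assignment is the one that takes effect; check the live value with `sysctl kern.ipc.maxsockbuf` to be sure.

```python
from collections import defaultdict


def find_duplicate_keys(conf_text: str) -> dict[str, list[str]]:
    """Return {key: [values in file order]} for keys assigned more than once."""
    seen: dict[str, list[str]] = defaultdict(list)
    for raw_line in conf_text.splitlines():
        # Drop comments (whole-line or trailing) and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        seen[key.strip()].append(value.strip())
    return {key: values for key, values in seen.items() if len(values) > 1}


# Sample input: the three assignments from the posted /etc/sysctl.conf.
sample = """
kern.ipc.maxsockbuf=33554432
kern.maxvnodes=8000000
kern.ipc.maxsockbuf=2097152
kern.ipc.maxsockbuf=83886080   # trailing comments are ignored
"""

duplicates = find_duplicate_keys(sample)
for key, values in duplicates.items():
    # Assumption: later lines override earlier ones.
    print(f"{key}: {len(values)} assignments, last one is {values[-1]}")
```

To check the real file, read `/etc/sysctl.conf` and pass its contents to the function instead of `sample`.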


----------



## ironudjin (Jan 20, 2021)

suntzu00 said:


> I would start by cleaning up that sysctl.conf file. For example, kern.ipc.maxsockbuf appears 3 times and you don't know what value is actually being used. Look up all the sysctl variables and what they do, especially combined with other variables (some combos can lead to an increase in memory usage). Once you have a base sysctl.conf, start adding stuff back bit by bit and test things properly.


I cleaned up sysctl.conf. The mbufs still leak, but more slowly than before:

```
18:07:24: 231 mbufs: 1568222
18:07:25: 89 mbufs: 1568311
18:07:26: 124 mbufs: 1568435
18:07:27: -91 mbufs: 1568344
18:07:28: 188 mbufs: 1568532
18:07:29: 826 mbufs: 1569358
18:07:30: -724 mbufs: 1568634
18:07:31: 233 mbufs: 1568867
18:07:32: 171 mbufs: 1569038
18:07:33: -2 mbufs: 1569036
18:07:34: 446 mbufs: 1569482
18:07:35: -431 mbufs: 1569051
18:07:36: 284 mbufs: 1569335
18:07:37: 140 mbufs: 1569475
18:07:38: -44 mbufs: 1569431
18:07:40: 65 mbufs: 1569496
18:07:41: 50 mbufs: 1569546
18:07:42: -8 mbufs: 1569538
18:07:43: 115 mbufs: 1569653
18:07:44: 137 mbufs: 1569790
18:07:45: -18 mbufs: 1569772
18:07:46: 36 mbufs: 1569808
18:07:47: 194 mbufs: 1570002
18:07:48: 53 mbufs: 1570055
18:07:49: -84 mbufs: 1569971
18:07:50: 93 mbufs: 1570064
18:07:51: 174 mbufs: 1570238
18:07:52: -7 mbufs: 1570231
18:07:53: 268 mbufs: 1570499
```
Mbufs now use almost 500 MB of RAM after ~14 hours of uptime under constant network activity (~5000 conn/s). I don't think this is normal behaviour.
Is there a way to debug mbuf consumption?
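
For anyone who wants to reproduce the per-second counter shown above, here is a minimal sketch of a watcher. It is not the script used in this thread. It assumes `netstat -m` prints a first line of the form `current/cache/total mbufs in use`, as in the outputs posted here, and the helper names are illustrative. It only measures the growth rate; it does not show which subsystem holds the mbufs.

```python
import re
import subprocess
import time

# Matches e.g. "2457785/24910/2482695 mbufs in use (current/cache/total)".
# It does not match the "mbuf clusters in use" line.
MBUF_LINE = re.compile(r"^(\d+)/(\d+)/(\d+) mbufs in use", re.MULTILINE)


def parse_mbufs_in_use(netstat_output: str) -> int:
    """Extract the 'current' mbuf count from `netstat -m` output."""
    match = MBUF_LINE.search(netstat_output)
    if match is None:
        raise ValueError("no 'mbufs in use' line found")
    return int(match.group(1))


def format_sample(stamp: str, previous: int, current: int) -> str:
    """Render one line in the 'HH:MM:SS: delta mbufs: total' style."""
    return f"{stamp}: {current - previous} mbufs: {current}"


def watch(interval: float = 1.0) -> None:
    """Poll `netstat -m` forever and print the change between samples."""
    previous = None
    while True:
        output = subprocess.run(
            ["netstat", "-m"], capture_output=True, text=True, check=True
        ).stdout
        current = parse_mbufs_in_use(output)
        if previous is not None:
            print(format_sample(time.strftime("%H:%M:%S"), previous, current))
        previous = current
        time.sleep(interval)
```

Calling `watch()` on the affected host prints one line per second. A total that keeps climbing while traffic is steady points to a leak rather than normal buffering.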


----------



## ironudjin (Jan 22, 2021)

I've found where the leaks come from.
I had a kernel built with:

```
makeoptions WITH_EXTRA_TCP_STACKS=1
options TCPHPTS
```
... and those options were causing the mbuf leaks.

Here is `netstat -m` after ~12 hours uptime:

```
30465/30705/61170 mbufs in use (current/cache/total)
26801/19279/46080/12244586 mbuf clusters in use (current/cache/total/max)
2063/12105 mbuf+clusters out of packet secondary zone in use (current/cache)
1439/7700/9139/6122293 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/1814012 9k jumbo clusters in use (current/cache/total/max)
0/0/0/1020382 16k jumbo clusters in use (current/cache/total/max)
66974K/77034K/144008K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were valid and substituted to bogus page
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
```

mbuf count over time:

```
14:39:58: 229 mbufs: 28739
14:39:59: -253 mbufs: 28486
14:40:00: -3 mbufs: 28483
14:40:01: -128 mbufs: 28355
14:40:02: -241 mbufs: 28114
14:40:03: 97 mbufs: 28211
14:40:04: 36 mbufs: 28247
14:40:05: -188 mbufs: 28059
14:40:06: -172 mbufs: 27887
14:40:07: 54 mbufs: 27941
14:40:08: -32 mbufs: 27909
14:40:09: 86 mbufs: 27995
14:40:10: -39 mbufs: 27956
14:40:11: 47 mbufs: 28003
14:40:12: 56 mbufs: 28059
14:40:13: 152 mbufs: 28211
14:40:14: 30 mbufs: 28241
14:40:15: -14 mbufs: 28227
14:40:16: -90 mbufs: 28137
14:40:17: -157 mbufs: 27980
14:40:18: 57 mbufs: 28037
14:40:19: -2 mbufs: 28035
14:40:20: -372 mbufs: 27663
14:40:21: 15 mbufs: 27678
14:40:22: -59 mbufs: 27619
14:40:23: 196 mbufs: 27815
14:40:24: -58 mbufs: 27757
14:40:25: -3 mbufs: 27754
14:40:26: 133 mbufs: 27887
```

PR 252913


----------



## suntzu00 (Jan 22, 2021)

Good stuff! I'm glad you figured it out! I think _tcp_rack_ and _tcp_bbr_ have yet to pass the test of time.


----------

