# Swap problem?  Memory management problem?



## wxppro (Dec 2, 2020)

Sorry if this is against forum rules: I originally posted here, before my post was moved to "Networking." However, as I gather more information, it looks less like a networking issue and more like a FreeBSD system issue. This is the original post, "FreeBSD 12.2 - Mysterious offline problem":









> **FreeBSD 12.2 - Mysterious offline problem** (forums.freebsd.org)
>
> Hello fellow FreeBSD users and experts,  I have been using FreeBSD for some time now.  I have some background in Unix and I found it very familiar in many ways.  In comparison to Linux, I like FreeBSD being very straightforward.  I am using FreeBSD 12.2 for a home server and I ran into a strange...




System outages happen quite frequently, two or three times a day, and each one can last from 15 minutes to an hour. The system is effectively "frozen" during an outage: there is no network connectivity, and, more troubling, scheduled tasks slow down. I have scripts that are supposed to record information every 2 or 5 minutes, but during an outage the intervals become much longer.

Is this a swap problem?  Is this a memory management problem?  I need your help.  What information should I collect to figure out what is going on?  This is puzzling.  Thanks in advance.


----------



## richardtoohey2 (Dec 2, 2020)

I have this running as a cron job, so you could try this or something similar. Every minute might not be granular enough.


```
@every_minute date >> /tmp/swap_usage.txt && top -n 100 >> /tmp/swap_usage.txt && pstat -sh >> /tmp/swap_usage.txt
```


----------



## wxppro (Dec 2, 2020)

Thanks. My current cron job runs every five minutes and records ifconfig, vmstat, and top. I will remove ifconfig and run it every minute; hopefully it will catch something. My concern is that the job will simply be delayed when the system enters this weird "freezing" mode.


----------



## richardtoohey2 (Dec 2, 2020)

Sorry, I missed the bit about your existing cron job; maybe the minute run will show something building up.

I think if this was a common problem there would be a lot more reports about it, so it _feels_ like it could be more of a hardware problem. Re-reading your original post, it sounds like there are lots of components/layers in your system, so this could be a difficult one to track down.


----------



## wxppro (Dec 3, 2020)

Had another outage roughly between 18:15 and 18:25 (based on ping logs on my router).

I checked the recorded logs on the problematic server. As I suspected, the entries are one minute apart until the outage. The last "normal" one was at 18:14:00; the next one was at 18:22:50, then 18:23:00, 18:24:00...
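Spotting the jump from 18:14:00 to 18:22:50 by eye works for one outage, but over several days of one-minute snapshots a small script is easier. A minimal sketch, assuming each snapshot starts with a plain `date` line as in the log below; the function name `find_log_gaps` and the 90-second default are hypothetical choices, not anything from the thread:

```shell
#!/bin/sh
# find_log_gaps FILE [MAX_SECONDS]
# Hypothetical helper: report places where consecutive snapshots in a cron log
# are further apart than MAX_SECONDS (default 90 s: one-minute job plus slack).
# Each snapshot is assumed to start with a plain date(1) line, e.g.
#   Wed Dec  2 18:14:00 EST 2020
# Only the HH:MM:SS field is used, so a gap spanning midnight gives a negative
# difference and is not reported.
find_log_gaps() {
    awk -v max="${2:-90}" '
        # date(1) header: weekday in field 1, HH:MM:SS in field 4
        $1 ~ /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)$/ && $4 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($4, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && now - prev > max + 0)
                printf "gap of %d s: %s -> %s\n", now - prev, last, $4
            prev = now
            last = $4
            seen = 1
        }
    ' "$1"
}
```

Run against a log like the one written by the cron line earlier in the thread (`find_log_gaps /tmp/swap_usage.txt`), the two entries posted here would be reported as `gap of 530 s: 18:14:00 -> 18:22:50`, which can then be lined up against the router's ping log.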

Hardware is unlikely to be the problem. This server used to run Ubuntu with the same apps, and it was stable for quite some time. The problem started when I switched to FreeBSD.

I attached two log entries, from 18:14:00 and 18:22:50. Please let me know if you see anything worth investigating further. I noticed the 100% WCPU value, so I looked at the syncthing log, and realized that the outages must be related to syncthing: syncthing restarted at every outage, specifically at the end of each one. I am not sure whether syncthing caused the outage, or something else caused the outage and subsequently made syncthing restart. A quick search suggests that syncthing may be the culprit:









> **Syncthing on FreeBSD 11.0 freezes machine · Issue #3733 · syncthing/syncthing** (github.com)
>
> Starting syncthing on one of my FreeBSD server freezes machine, while syncthing runs without problems on other FreeBSD machines. During freeze, console and network connections block, no log message...




But as one of the comments says, an app should not cause the OS to freeze.  Something is not right.

I will disable syncthing to see if outage still happens or not.


```
Wed Dec  2 18:14:00 EST 2020

procs     memory        page                    disks     faults         cpu
r b w     avm     fre  flt  re  pi  po    fr   sr mm0 mm0   in    sy    cs us sy id
0 0 1 542659736  133412    82   3   0   0   120  385   0   0  190  1055  1753  1  7 93

last pid:  6645;  load averages:  0.46,  0.51,  0.39  up 5+03:33:16    18:14:00
55 processes:  1 running, 52 sleeping, 2 zombie
CPU:  0.1% user,  0.7% nice,  2.5% system,  4.2% interrupt, 92.6% idle
Mem: 686M Active, 977M Inact, 72M Laundry, 1864M Wired, 130M Free
ARC: 667M Total, 240M MFU, 233M MRU, 866K Anon, 4331K Header, 189M Other
     304M Compressed, 395M Uncompressed, 1.30:1 Ratio
Swap: 8192M Total, 1810M Used, 6382M Free, 22% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
84307 wxppro       17  30    9  1585M   829M uwait    0  47:57   6.54% syncthing
 1768 root         11  20    0   533M   140M kqread   3 166:51   1.07% bhyve
 1769 root         11  20    0   550M   108M kqread   0 108:32   1.07% bhyve
 6637 root          1  20    0    13M  2376K sbwait   3   0:00   0.10% ftpd
 6640 root          1  20    0    13M  2376K sbwait   2   0:00   0.10% ftpd
 1220 root          1  20    0    38M  2248K select   2  15:53   0.00% nmbd
 1707    921        3  20    0    26M  1288K kqread   2   3:25   0.00% transmission-daemon
  895 root          1  52    0    11M  1276K select   0   2:55   0.00% dhclient
 1270 root          1  20    0    11M   924K select   0   2:32   0.00% powerd
 1225 root          1  21    0   173M  3840K select   0   1:03   0.00% smbd
 1207 ntpd          1  20    0    19M  1472K select   1   0:54   0.00% ntpd
 1193 wxppro       11  33    0   708M  9444K uwait    1   0:34   0.00% syncthing
 1798 root          1  20    0    17M  1448K select   0   0:14   0.00% sendmail
 1304 root          1  20    0    17M  1800K select   0   0:14   0.00% sendmail
 7992 root          1  20    0    11M   620K wait     0   0:10   0.00% sh
 1311 root          1  20    0    11M  1344K nanslp   3   0:05   0.00% cron
 1805 root          1  20    0    11M   548K nanslp   0   0:04   0.00% cron
 1119 root          1  20    0    11M  1092K select   3   0:02   0.00% syslogd
 1873 root          1  20    0    12M  1308K select   2   0:02   0.00% ftpd
 1665 root          1  20    0    11M   648K select   2   0:02   0.00% syslogd
 1298 root          1  20    0   131M  3736K select   1   0:01   0.00% smbd
 1303 root          1  20    0   173M  3288K select   0   0:01   0.00% smbd
 1297 root          2  20    0   133M  2828K select   0   0:01   0.00% smbd
 5252 root          1  20    0    13M  2344K sbwait   2   0:01   0.00% ftpd
  970 root          1  20    0    10M  1080K select   2   0:00   0.00% devd
  945 _dhcp         1  20    0    12M  1340K select   3   0:00   0.00% dhclient
 1192 wxppro        1  20    0    11M   884K piperd   2   0:00   0.00% daemon
 1307 smmsp         1  20    0    16M   836K pause    2   0:00   0.00% sendmail
 1801 smmsp         1  20    0    16M   688K pause    1   0:00   0.00% sendmail
 1528 root          1  52    0    11M   816K select   0   0:00   0.00% dhclient
 1578 _dhcp         1  20    0    12M   996K select   2   0:00   0.00% dhclient
  892 root          1  20    0    11M  1244K select   1   0:00   0.00% dhclient
 1525 root          1  20    0    11M   788K select   1   0:00   0.00% dhclient
 1300 root          1  20    0    19M  2084K select   0   0:00   0.00% sshd
 6638 root          1  20    0    13M  2376K sbwait   0   0:00   0.00% ftpd
 6642 root          1  33    0    11M  2660K wait     1   0:00   0.00% sh
 1765 smmsp         1  24    0    16M  1080K piperd   1   0:00   0.00% sendmail
 1764 smmsp         1  24    0    16M  1080K piperd   1   0:00   0.00% sendmail
 6641 root          1  21    0    11M  1560K piperd   2   0:00   0.00% cron
 1321 root          1  22    0    12M   952K piperd   1   0:00   0.00% cron
 1322 root          1  22    0    12M   952K piperd   0   0:00   0.00% cron
 1888 root          1  52    0    11M   856K ttyin    1   0:00   0.00% getty
 1884 root          1  52    0    11M   856K ttyin    1   0:00   0.00% getty
 1889 root          1  52    0    11M   856K ttyin    2   0:00   0.00% getty
 1891 root          1  52    0    11M   856K ttyin    0   0:00   0.00% getty
 1885 root          1  52    0    11M   856K ttyin    0   0:00   0.00% getty
 1890 root          1  52    0    11M   856K ttyin    0   0:00   0.00% getty
 1886 root          1  52    0    11M   856K ttyin    2   0:00   0.00% getty
 1887 root          1  52    0    11M   856K ttyin    1   0:00   0.00% getty
  127 root          1  52    0    11M     0B pause    2   0:00   0.00% <adjkerntz>
 1759 root          1  52    0    19M   632K select   2   0:00   0.00% sshd
 6628 root          1  20    0    10M  1872K nanslp   1   0:00   0.00% sleep
 6645 root          1  32    0    13M  2956K CPU3     3   0:00   0.00% top


Device          512-blocks     Used    Avail Capacity
/dev/ada1p8       16777216     1.8G     6.2G    22%


Wed Dec  2 18:22:50 EST 2020

procs     memory        page                    disks     faults         cpu
r b w     avm     fre  flt  re  pi  po    fr   sr mm0 mm0   in    sy    cs us sy id
4 0 1 542673200  131444    82   3   0   0   120  384   0   0  190  1054  1752  1  7 92

last pid:  6666;  load averages: 54.90, 43.66, 23.65  up 5+03:42:06    18:22:50
59 processes:  2 running, 54 sleeping, 2 zombie, 1 lock
CPU:  0.1% user,  0.7% nice,  2.5% system,  4.2% interrupt, 92.5% idle
Mem: 685M Active, 978M Inact, 72M Laundry, 1866M Wired, 128M Free
ARC: 678M Total, 248M MFU, 234M MRU, 2914K Anon, 4376K Header, 189M Other
     309M Compressed, 403M Uncompressed, 1.30:1 Ratio
Swap: 8192M Total, 1810M Used, 6382M Free, 22% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
84307 weiyong      17  30    9  1585M   829M uwait    0  56:12 100.29% syncthing
 6648 root          1  20    0    11M  2344K *re0     3   8:09 100.00% ping
 1768 root         11  20    0   533M   140M kqread   0 166:52   0.00% bhyve
 1769 root         11  20    0   550M   108M kqread   3 108:32   0.00% bhyve
 1220 root          1  20    0    38M  2248K select   0  15:53   0.00% nmbd
 1707    921        3  20    0    26M  1288K kqread   0   3:25   0.00% transmission-daemon
  895 root          1  52    0    11M  1276K select   0   2:55   0.00% dhclient
 1270 root          1  20    0    11M   924K select   2   2:32   0.00% powerd
 1225 root          1  21    0   173M  3840K select   0   1:03   0.00% smbd
 1207 ntpd          1  20    0    19M  1472K select   3   0:54   0.00% ntpd
 1193 weiyong      11  33    0   708M  9448K uwait    1   0:34   0.00% syncthing
 1798 root          1  20    0    17M  1484K select   3   0:14   0.00% sendmail
 1304 root          1  20    0    17M  1848K select   0   0:14   0.00% sendmail
 7992 root          1  20    0    11M   620K wait     3   0:10   0.00% sh
 1311 root          1  20    0    11M  1344K nanslp   0   0:05   0.00% cron
 1805 root          1  20    0    11M   548K nanslp   1   0:04   0.00% cron
 1119 root          1  20    0    11M  1092K zio->i   0   0:02   0.00% syslogd
 1873 root          1  20    0    12M  1308K select   0   0:02   0.00% ftpd
 1665 root          1  20    0    11M   648K select   3   0:02   0.00% syslogd
 1298 root          1  20    0   131M  3736K select   0   0:01   0.00% smbd
 1303 root          1  20    0   173M  3288K zio->i   1   0:01   0.00% smbd
 1297 root          2  20    0   133M  2828K select   3   0:01   0.00% smbd
 5252 root          1  20    0    13M  2344K sbwait   2   0:01   0.00% ftpd
  970 root          1  20    0    10M  1080K select   0   0:00   0.00% devd
  945 _dhcp         1  20    0    12M  1340K select   3   0:00   0.00% dhclient
 1192 weiyong       1  20    0    11M   884K piperd   3   0:00   0.00% daemon
 1307 smmsp         1  20    0    16M   836K pause    2   0:00   0.00% sendmail
 1801 smmsp         1  20    0    16M   688K pause    1   0:00   0.00% sendmail
 1528 root          1  52    0    11M   816K select   0   0:00   0.00% dhclient
 1578 _dhcp         1  20    0    12M   996K select   2   0:00   0.00% dhclient
  892 root          1  20    0    11M  1244K select   1   0:00   0.00% dhclient
 1525 root          1  20    0    11M   788K select   1   0:00   0.00% dhclient
 1300 root          1  20    0    19M  2084K select   0   0:00   0.00% sshd
 1765 smmsp         1  24    0    16M  1080K piperd   1   0:00   0.00% sendmail
 1764 smmsp         1  24    0    16M  1080K piperd   1   0:00   0.00% sendmail
 6659 root          1  28    0    11M  2660K wait     0   0:00   0.00% sh
 6657 root          1  22    0    11M  2652K zio->i   1   0:00   0.00% sh
 6658 operator      1  21    0    11M  2632K zio->i   3   0:00   0.00% sh
 1321 root          1  22    0    12M   952K piperd   1   0:00   0.00% cron
 1322 root          1  22    0    12M   952K piperd   0   0:00   0.00% cron
 6654 root          1  20    0    11M  1560K piperd   3   0:00   0.00% cron
 1888 root          1  52    0    11M   856K ttyin    1   0:00   0.00% getty
 1884 root          1  52    0    11M   856K ttyin    1   0:00   0.00% getty
 6666 root          1  27    0    13M  2960K CPU3     3   0:00   0.00% top
 1889 root          1  52    0    11M   856K ttyin    2   0:00   0.00% getty
 1891 root          1  52    0    11M   856K ttyin    0   0:00   0.00% getty
 1885 root          1  52    0    11M   856K ttyin    0   0:00   0.00% getty
 6664 operator      1  20    0   596K   400K RUN      2   0:00   0.00% sh
 1890 root          1  52    0    11M   856K ttyin    0   0:00   0.00% getty
 1886 root          1  52    0    11M   856K ttyin    2   0:00   0.00% getty
 1887 root          1  52    0    11M   856K ttyin    1   0:00   0.00% getty
  127 root          1  52    0    11M     0B pause    2   0:00   0.00% <adjkerntz>
 1759 root          1  52    0    19M   632K select   2   0:00   0.00% sshd
 6652 root          1  21    0    11M  1432K piperd   3   0:00   0.00% cron
 6651 root          1  20    0    11M  2340K select   3   0:00   0.00% ping
 6655 root          1  21    0    11M  1432K piperd   3   0:00   0.00% cron
 6662 root          1  20    0    11M  1244K piperd   3   0:00   0.00% cron


Device          512-blocks     Used    Avail Capacity
/dev/ada1p8       16777216     1.8G     6.2G    22%
```


----------



## richardtoohey2 (Dec 3, 2020)

Memory/swap looks OK between the two - but the load averages seem a bit bonkers on the second one?


```
last pid:  6666;  load averages: 54.90, 43.66, 23.65  up 5+03:42:06    18:22:50
```

That _seems_ to show huge load?  I'm definitely not an expert, but not what you'd expect to see?
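For context on why the three numbers differ so much: FreeBSD's load averages are exponentially damped averages of $n(t_k)$, the number of threads the scheduler counts as runnable, recomputed roughly every 5 seconds with time constants of 1, 5 and 15 minutes. Assuming (a simplification, not something the logs show directly) that this count jumped from near zero to a roughly constant $n$ when the outage started, about 8 minutes before the 18:22:50 snapshot, the step response gives:

$$
L_T(t_k) = L_T(t_{k-1})\, e^{-\Delta/T} + n(t_k)\,\bigl(1 - e^{-\Delta/T}\bigr), \qquad \Delta \approx 5\ \text{s},\quad T \in \{1, 5, 15\}\ \text{min}
$$

$$
L_T(t) \approx n\,\bigl(1 - e^{-t/T}\bigr)
$$

$$
n = 55,\ t = 8\ \text{min}: \quad 55\,(1 - e^{-8}) \approx 55.0, \quad 55\,(1 - e^{-8/5}) \approx 43.9, \quad 55\,(1 - e^{-8/15}) \approx 22.7
$$

That is close to the reported 54.90, 43.66 and 23.65, so the figures are consistent with a sustained load of about 55 for most of the outage rather than one short burst; the 15-minute value simply has not caught up yet.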

That link to the issue on FreeBSD 11 sounds like what you are seeing.

Looks like syncthing and ping each using 100% of a CPU ...


```
84307 weiyong      17  30    9  1585M   829M uwait    0  56:12 100.29% syncthing
 6648 root          1  20    0    11M  2344K *re0     3   8:09 100.00% ping
```

It will be interesting to see if things are better without syncthing.


----------



## jmos (Dec 3, 2020)

On every system I know, a load of 55 on a server means: unreachable. Without spending time investigating your setup, it points me to virtual machines with far too many jobs/requests and/or heavy disk access and/or running out of memory (or DoS attacks), so that jobs/requests can't be answered fast enough. The result is that more and more jobs/requests are left open, and the load keeps increasing.

If possible, I would take my services down one after another to identify which one causes this problem.


----------



## wxppro (Dec 3, 2020)

It's too early to tell yet; I have to give it a few days to confirm there are no outages with the Syncthing service stopped. But it has to be Syncthing. I found numerous posts on both the FreeBSD and Syncthing forums complaining about systems becoming "unresponsive" or "freezing".

And jmos, you are completely right that a server will be unresponsive under heavy load. In this case, the weird thing is that the average load is really low; it is occasional spikes of heavy load caused by Syncthing. But even so, should not the operating system itself still be responsive?  That is, be able to maintain network connectivity, and accessible by SSH?  I can understand that tasks take much longer to complete, but a whole system freeze for a modern operating system is beyond me...


----------



## richardtoohey2 (Dec 3, 2020)

Can you put syncthing into any debugging mode / verbose output / logging?  Might give some more clues as to what specifically it is doing to trigger this issue.

But maybe do that after you have the trial period without syncthing.

I _think_ fork bombs are still a thing - so yes, you can engage treacle-mode on an OS by overwhelming it with resource-intensive processes.

It does seem to be known for whacking machines (this is an old post): https://forum.syncthing.net/t/syncthing-freezes-my-computer/2450


----------



## jmos (Dec 4, 2020)

wxppro said:


> But even so, should not the operating system itself still be responsive?  That is, be able to maintain network connectivity, and accessible by SSH?  I can understand that tasks take much longer to complete, but a whole system freeze for a modern operating system is beyond me...


Who defined which binaries on your computer should be given priority? The system can hardly guess that an SSH request should now get higher priority; it cannot solve this for you in general (and the AI thing doesn't exist, it's a marketing fairy tale). And should a further, new job on top really be executed instead of finally finishing the current work?

It would be nice to have a responsive machine in such cases. But I have never sat in front of a machine on which this was possible out of the box. Instead, you wait minutes for a single letter to get through when logging in…


----------



## richardtoohey2 (Dec 4, 2020)

If I'm looking at the right machine it does also seem a bit wimpy (or is that just me?) given the layers of complexity you are using - a quad-core 1.1 GHz Celeron with 4GB of RAM - and you've got Bhyve, jails, and ZFS all in the mix.  Or am I getting too used to needing a bazillion GHz and a megaton of RAM just to run web browsers these days?  (My first computer had 1K of RAM until we got the 16K RAM pack!)


----------



## SirDice (Dec 4, 2020)

I'm trying to recap and skimmed through most of the thread, so I may have missed some information. Is this machine a VM or real iron? I see there's 4GB of memory and 8GB of swap; that looks good. The two bhyve(8) VMs, how much memory is assigned to them? Looks like 2 x 512MB? So that's 1 GB, give or take. Set your ARC limit to 1GB too. That will leave around 2GB for the OS and everything else that's running on it (I see Samba and ftpd(8)?). Also try without running powerd(8).
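The ARC cap is a loader tunable given in bytes, so "1GB" has to be written out as 1024 × 1024 × 1024 = 1073741824 (the value wxppro reports using later in the thread). A minimal sketch that produces the `/boot/loader.conf` line, assuming a whole number of GiB; the helper name `arc_max_line` is hypothetical:

```shell
#!/bin/sh
# arc_max_line WHOLE_GIB
# Hypothetical helper: print a /boot/loader.conf line that caps the ZFS ARC.
# The tunable is in bytes: 1 GiB = 1024 * 1024 * 1024 = 1073741824.
arc_max_line() {
    case $1 in
        '' | *[!0-9]*)
            echo "usage: arc_max_line WHOLE_GIB" >&2
            return 1
            ;;
    esac
    printf 'vfs.zfs.arc_max="%s"\n' "$(( $1 * 1024 * 1024 * 1024 ))"
}
```

As root, `arc_max_line 1 >> /boot/loader.conf` adds `vfs.zfs.arc_max="1073741824"`, which takes effect at the next boot. On 12.x I believe the same OID can also be lowered at runtime with `sysctl vfs.zfs.arc_max=1073741824`; if sysctl refuses, rely on loader.conf and reboot.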


----------



## wxppro (Dec 4, 2020)

Right, richardtoohey2, it is not a strong, powerful machine.  This one is only for home, with 2~3 users and some lightweight tasks.  To some extent, it is like a testing environment to check out how the system works.  It may seem to have many apps/layers installed, but most of them are just there, not really doing anything.  
 - The two VMs are Pihole on Debian, each assigned 512MB. They serve 5~10 home PCs and devices. Really lightweight; the load is close to zero.
 - The jail is for testing Transmission; no downloads have been done recently.
 - Samba and Syncthing are probably the only substantial apps, with about 2TB of data, almost all of it static with occasional access. Syncthing syncs files between two PCs and the server. There is no constant reading/writing at all.

I love how FreeBSD makes things straightforward. I also love the separation of the base system from userland. That is why I switched the operating system from Ubuntu to FreeBSD. On Ubuntu, all of the above (without the two Pihole VMs, and with Transmission running directly on the machine) was stable and reliable, with no issues whatsoever. I guess you can understand why I am a bit frustrated. jmos had a good point: the system cannot tell which processes should have higher priority, so the result is a whole-system slowdown / unresponsiveness. I was just expecting the server to handle this light load with ease.

I appreciate your advice, SirDice. I think what I neglected is that FreeBSD requires tuning to be effective. I probably got too used to just installing a system and trusting that the default values will work.

So far, it has been almost one day with Syncthing turned off, and no outage. It's too early to tell, but this seems to be the right diagnosis. I will follow SirDice's suggestion to set the ARC limit to 1GB (I had already set it to 2G, down from the default of 3/4 of system memory, roughly 3G), and turn Syncthing back on. Hopefully the system will be more stable. Will report back. Thanks to you all.


----------



## SirDice (Dec 4, 2020)

wxppro said:


> I appreciate your advice, SirDice. I think what I neglected is that FreeBSD requires tuning to be effective. I probably got too used to just installing a system and trusting that the default values will work.


FreeBSD "autotunes" itself and assumes various "sane" defaults that will work for most people. But in certain situations you have to help the autotuning a bit to do the right thing for specific use cases. It sometimes requires a bit of fiddling to get the best results. Luckily, a lot of these "autotune" parameters can be easily changed; some have to be set at boot, others can be changed "on-the-fly". Back in the earlier days when I started with FreeBSD, you had to recompile the kernel to modify them.


----------



## wxppro (Dec 4, 2020)

Haha... talking about the days when you used assembly language to deal with a few KBs. Nowadays, hardware resources such as memory have become so abundant that we almost forget there is a limit or that some tuning is needed.

This all started with figuring out what the right home server for me is. I do not like the idea of a full desktop PC as a home server; it seems too wasteful for my purpose. I also do not like NAS devices. I have used some before, but their functionality is too limited. A mini PC such as this Byte 3 seems very reasonable, and 4GB of memory should be fine for most home applications.

I learned that ZFS is good but requires quite a lot of resources. Since I am not very demanding about performance, it is fine to reserve fewer resources for ZFS. I hope the memory freed up for the OS will mitigate or eliminate the issue I am experiencing. Fingers crossed. Many thanks, SirDice.


----------



## PMc (Dec 5, 2020)

wxppro said:


> But even so, should not the operating system itself still be responsive?  That is, be able to maintain network connectivity, and accessible by SSH?  I can understand that tasks take much longer to complete, but a whole system freeze for a modern operating system is beyond me...


What else should it do?
It probably is still reachable and maintains network connectivity (try with `ping`), but SSH needs more to function: it needs to create new processes, read files, etc. etc. - so you need the process table operative, the disk subsystem operative, and probably a lot more. And all of this can technically get saturated.[1]

It is the task of the sysadmin to properly tune a server, or to auto-detect overloads and start countermeasures, or to fix or get rid of the annoying processes. There is an `rtprio` feature that gives important tasks the privilege to run, probably even under such conditions. There is also the `racct` resource accounting framework (managed with the `rctl` tool), which can limit resource consumption; it just needs to be configured. If you want the system to stay operational for mission-critical work, you can work with these. So enjoy!
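To make the `rctl` pointer concrete, here is a hedged sketch of capping the Syncthing user. Assumptions: resource accounting is switched on with `kern.racct.enable=1` in `/boot/loader.conf` (as far as I know it is compiled into the 12.x GENERIC kernel but disabled by default, so this needs a reboot), Syncthing runs as `wxppro` as in the first `top` snapshot, and 50000 is a placeholder rather than a recommendation. It limits open files because, as I understand it, each kqueue vnode watch holds an open descriptor.

```shell
# Prerequisite in /boot/loader.conf (reboot required):
#   kern.racct.enable=1

# As root: deny the Syncthing user more than 50000 open file descriptors
# (placeholder value).
rctl -a user:wxppro:openfiles:deny=50000

# List the rules currently loaded, then show the user's current resource usage.
rctl
rctl -hu user:wxppro

# To keep the rule across reboots, add the rule string itself to /etc/rctl.conf:
#   user:wxppro:openfiles:deny=50000
```

The rule applies to every process of that user, including SSH logins, so running Syncthing under a dedicated account would keep the cap away from interactive sessions. When the cap is hit I would expect Syncthing to get errors opening further files rather than the whole machine stalling, but I have not tested how it reacts.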


richardtoohey2 said:


> If I'm looking at the right machine it does also seem a bit wimpy (or is that just me?) given the layers of complexity you are using - a quad-core 1.1 GHz Celeron with 4GB of RAM - and you've got Bhyve, jails, and ZFS all in the mix.  Or am I getting too used to needing a bazillion GHz and a megaton of RAM just to run web browsers these days?



Yes, you do. One cannot get performance out of it, but one can make such a system run stably. Since SSDs are cheap and fast, one should even be able to run things from there, i.e. with most processes in swap and most data in L2ARC. But it is not advisable to do so, because you will run into corner-case bugs that are not eagerly fixed, since nobody else is hit by them.

[1] There is an issue with ZFS: it has these hundreds of threads in the kernel. So to do anything useful with the filesystem, you also need the task scheduler to switch threads easily, and at a loadavg like the one shown above it will have problems with that. On older systems this was easier: there you could just wait (for minutes) until your shell process finally got to run.


----------



## PacketMan (Dec 12, 2020)

wxppro said:


> It's too early to tell yet; I have to give it a few days to confirm there are no outages with the Syncthing service stopped. But it has to be Syncthing. I found numerous posts on both the FreeBSD and Syncthing forums complaining about systems becoming "unresponsive" or "freezing".
> 
> And jmos, you are completely right that a server will be unresponsive under heavy load. In this case, the weird thing is that the average load is really low; it is occasional spikes of heavy load caused by Syncthing. But even so, should not the operating system itself still be responsive?  That is, be able to maintain network connectivity, and accessible by SSH?  I can understand that tasks take much longer to complete, but a whole system freeze for a modern operating system is beyond me...


Syncthing for me on FreeBSD has been flawless. Got it running on three machines. Flawless.


----------



## takumo (Dec 14, 2020)

I had a similar problem with syncthing when the option *watch for changes* is enabled for any folder that contains many files ("many" depends on your CPU power; for me it was >1000).

The reason behind the problem is that this option uses the kernel's kqueue facility, and each file has to be added individually to the kernel watch list. Because *all* file accesses on the system are checked against this list, even simple commands or processes can take ages if the list becomes too long. Sometimes I could not even run `cat` on the console, although I had plenty of free memory and the CPU was idle.
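This explanation suggests something worth logging next to vmstat and top. As I understand it, a kqueue vnode watch on FreeBSD needs one open descriptor per watched file or directory, so a large watched folder should show up in the system-wide `kern.openfiles` count relative to `kern.maxfiles`. A minimal sketch; the function names are hypothetical and the 80% default threshold is arbitrary:

```shell
#!/bin/sh
# fd_usage_pct USED MAX
# Integer percentage of the system file table in use. On FreeBSD, USED and MAX
# would come from `sysctl -n kern.openfiles` and `sysctl -n kern.maxfiles`.
fd_usage_pct() {
    echo $(( $1 * 100 / $2 ))
}

# warn_fd_pressure USED MAX [THRESHOLD]
# Print a warning line only when usage reaches THRESHOLD percent (default 80).
warn_fd_pressure() {
    pct=$(fd_usage_pct "$1" "$2")
    if [ "$pct" -ge "${3:-80}" ]; then
        echo "WARNING: $1 of $2 file descriptors in use (${pct}%)"
    fi
}
```

From the existing cron job, `warn_fd_pressure "$(sysctl -n kern.openfiles)" "$(sysctl -n kern.maxfiles)"` would add a line to the log only when the file table is getting full. Keeping the arithmetic separate from the sysctl calls lets it be checked on any machine.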


----------



## wxppro (Dec 16, 2020)

OK, I think I have watched the system long enough. Here are two observations:

It is a memory management issue (the cause), not a networking issue (the effect). After setting vfs.zfs.arc_max to 1073741824 (1GB), this server has maintained network connectivity for the past 10 days, with no network outage observed. The default value of the parameter is 2837426176; per the FreeBSD Handbook, that is ¾ of available memory. I had tried setting it to 2G before. With both the default and 2G, network disconnections happened.

The other observation is that Syncthing has something (the kqueue stuff?) that can make FreeBSD go berserk. takumo nailed it. The “Watch for Changes” option must be turned off for folders with many files. Otherwise I see painfully slow SSH sessions, as if the server had been dragged down to a crawl, yet available memory and CPU utilization both seem to be fine.

To validate the second observation, I actually tested it. I turned on “Watch for Changes” for a large folder with more than 300K files (after raising kern.maxfiles and kern.maxfilesperproc to 500K). Syncthing started scanning, and within about an hour, all other services became unavailable: Samba, FTP, Transmission (in a jail), and SSH. The funny part is that the two Pihole virtual machines were still working, and I could still ping the server itself. So setting the ZFS ARC parameter to a smaller value does help the server maintain network connectivity. But it seems Syncthing opened so many files that the system stopped responding to any other services.
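The post does not say how the two limits were raised; presumably via sysctl(8), along these lines (as root; 500000 corresponds to the 500K mentioned above):

```shell
# Raise the system-wide and per-process open-file limits on the running system.
sysctl kern.maxfiles=500000 kern.maxfilesperproc=500000

# To keep them across reboots, add the same settings to /etc/sysctl.conf:
#   kern.maxfiles=500000
#   kern.maxfilesperproc=500000
```

A daemon's own descriptor limit (`ulimit -n`, or the `openfiles` capability in login.conf) can still cap it below `kern.maxfilesperproc`. And raising the ceiling only lets Syncthing open more watches; as the test above shows, it does not keep a watcher on 300K files from dragging the system down.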

Bottom line – It is possible to have a stable server with limited resources. 4GB of memory is good enough for a light load of Samba, FTP, Transmission, and Syncthing. But Syncthing must be configured and operated correctly: (1) for folders with many files, do not turn on “Watch for Changes”; (2) let it complete an initial scan before enabling syncing.

Thanks to you all for your help. I think I can put this behind me now.


----------

