# Out of memory



## JohnnySorocil (Feb 23, 2019)

Do you have problems with out-of-memory situations under FreeBSD?
I have had problems for many years with machines becoming unresponsive when they run out of memory and/or swap. As far as I can tell, enabling swap just prolongs the agony before some application is killed.

Tonight around 5 AM I was using Firefox when everything interactive stopped... the GUI panel clock stopped updating, and it wasn't possible to switch virtual desktops or start xkill or xterm. Trying to switch to the console (Ctrl-Alt-F1) was also unsuccessful.
Daemons (like hostapd, dhcpd, pf) continued to work, but I was not able to ssh to the machine.


```
other_machine% ssh machine1
Last login: Thu Feb 21 05:11:25 2019 from unix:0.0
FreeBSD 11.2-RELEASE-p8 (GENERIC) #0: Tue Jan  8 21:35:12 UTC 2019
```
And then... nothing, no shell, nothing.
The machine was left as is, and after 12 h the state was the same: black monitor, no reaction to keyboard or mouse. The ssh login was also in the same state.
There is nothing in the logs (the last entry is at 2:50) until the computer was rebooted.
After the reboot I can use the machine normally, but there is nothing in the logs:

```
# less /var/log/messages
Feb 21 01:22:54 innovator sshd[64784]: error: maximum authentication attempts exceeded for invalid user admin from 94.190.122.195 port 34315 ssh2 [preauth]
Feb 21 02:26:04 innovator in.tftpd[5747]: RRQ from 184.105.139.98 filename a.pdf
Feb 21 02:40:13 innovator sshd[81165]: error: Received disconnect from 185.254.120.6 port 14618:3: com.jcraft.jsch.JSchException: Auth fail [preauth]
Feb 21 02:40:15 innovator sshd[81901]: error: Received disconnect from 185.254.120.6 port 15936:3: com.jcraft.jsch.JSchException: Auth fail [preauth]
Feb 21 02:40:32 innovator sshd[82338]: error: Received disconnect from 185.254.120.6 port 16246:3: com.jcraft.jsch.JSchException: Auth fail [preauth]
Feb 21 02:40:35 innovator sshd[83660]: error: Received disconnect from 185.254.120.6 port 18850:3: com.jcraft.jsch.JSchException: Auth fail [preauth]

...

Feb 21 02:50:00 innovator sshd[34824]: error: Received disconnect from 185.254.120.6 port 41940:3: com.jcraft.jsch.JSchException: Auth fail [preauth]
Feb 21 02:50:11 innovator sshd[34963]: error: Received disconnect from 185.254.120.6 port 42573:3: java.net.SocketTimeoutException: Read timed out [preauth]
Feb 21 16:48:22 innovator syslogd: kernel boot file is /boot/kernel/kernel
Feb 21 16:48:22 innovator kernel: Copyright (c) 1992-2018 The FreeBSD Project.
```

The same problem is observed on two machines, one running 12.0, the other 11.2-RELEASE, with 16 and 8 GB of RAM.
Both are my dev machines with ZFS, usually running Firefox, a music player, a GUI file manager and many xterms with tmux and vim.
It is not nice when a server-grade OS stops reacting to anything except power cycling...
How can I prevent that from happening (except maybe rebooting every day)? Is there a way to limit resource consumption per program (limits(1)/rcctl(1)/...something)?


```
machine1% uname -sr
FreeBSD 11.2-RELEASE-p8

machine1% limits
Resource limits (current):
  cputime              infinity secs
  filesize             infinity kB
  datasize              2097152 kB
  stacksize              524288 kB
  coredumpsize         infinity kB
  memoryuse            infinity kB
  memorylocked               64 kB
  maxprocesses            12085
  openfiles              231435
  sbsize               infinity bytes
  vmemoryuse           infinity kB
  pseudo-terminals     infinity
  swapuse              infinity kB
  kqueues              infinity
  umtxp                infinity

machine2% uname -sr
FreeBSD 12.0-RELEASE-p2

machine2% limits
Resource limits (current):
  cputime              infinity secs
  filesize             infinity kB
  datasize              4194304 kB
  stacksize              524288 kB
  coredumpsize         infinity kB
  memoryuse            infinity kB
  memorylocked               64 kB
  maxprocesses            19486
  openfiles              468477
  sbsize               infinity bytes
  vmemoryuse           infinity kB
  pseudo-terminals     infinity
  swapuse              infinity kB
  kqueues              infinity
  umtxp                infinity
```

Sometimes there are logs (and the machine becomes responsive again after 10 or more minutes).


```
# grep -i swap /var/log/messages
/var/log/messages:Jan 30 01:14:28 innovator kernel: GEOM_ELI: Device gpt/swap.eli destroyed.
/var/log/messages:Jan 30 01:14:28 innovator kernel: GEOM_ELI: Detached gpt/swap.eli on last close.
/var/log/messages:Jan 31 10:06:01 innovator kernel: pid 18922 (deadbeef), uid 1001, was killed: out of swap space
/var/log/messages:Feb  9 07:52:39 innovator kernel: pid 41854 (waterfox), uid 1001, was killed: out of swap space
/var/log/messages:Feb 14 12:52:35 innovator kernel: pid 36817 (deadbeef), uid 1001, was killed: out of swap space
/var/log/messages:Feb 16 21:07:42 innovator kernel: pid 89879 (waterfox), uid 1001, was killed: out of swap space
/var/log/messages:Feb 21 16:48:23 innovator kernel: GEOM_ELI: Device gpt/swap.eli created.
/var/log/messages:Feb 21 16:48:59 innovator kernel: GEOM_ELI: Device gpt/swap.eli destroyed.
/var/log/messages:Feb 21 16:48:59 innovator kernel: GEOM_ELI: Detached gpt/swap.eli on last close.
```

Sometimes there are log entries like this (but currently I do not have them in /var/log/messages):

```
swap_pager_getswapspace(x): failed
```

I am not too worried about a leaky web browser or music player being killed; the machine unresponsiveness is what concerns me.
I can recompile and install a debug kernel or whatever is needed, that's not a problem.


----------



## Bobi B. (Feb 23, 2019)

Usually, when a machine starts swapping, performance drops to zero. How much RAM do you have? Can you describe your hardware configuration? Do you use this machine as a server or as a desktop? How long until you run out of free RAM? You should pinpoint the program responsible for the out-of-RAM condition. Better disable swap temporarily. Also, since you're using ZFS, limit the ARC size to 1/3 or 1/2 of RAM, depending on what programs you run.
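For reference, the ARC cap is set with a loader tunable. A minimal sketch (the 4 GB value is just an illustration of the "half of RAM" rule for an 8 GB machine, not a tested recommendation):

```
# /boot/loader.conf -- cap the ZFS ARC (example: half of 8 GB RAM)
vfs.zfs.arc_max="4G"
```

It takes effect at the next boot; `sysctl vfs.zfs.arc_max` shows the value currently in force.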

There are built-in utilities to monitor resource use.


----------



## PMc (Feb 23, 2019)

JohnnySorocil said:


> Do you have problems with out-of-memory situations under FreeBSD?



No. Not at all. That's what swap is for: the machine gets slow, but it is recoverable and the issue can be analyzed.



> It is not nice when a server-grade OS stops reacting to anything except power cycling...



There may be a little misunderstanding here: a server-grade machine is not simply "better". It is more reliable _IF_ it is handled in a reliable way: a server setup needs proper planning. A consumer machine may forgive sloppy operation; a server machine usually does not.

The other problem is that you don't have an immediate console. Switching from graphics to the console needs an ugly lot of resources - so if you are already in a resource-exhaustion situation, at that point you're usually lost.



> I have had problems for many years with machines becoming unresponsive when they run out of memory and/or swap. As far as I can tell, enabling swap just prolongs the agony before some application is killed.



Well, in that case the most common approach is to have a look at what is eating how much of the memory, and how much is currently left - _before_ it gets exhausted. There is a tool called `top` that continuously shows that.



> ```
> other_machine% ssh machine1
> Last login: Thu Feb 21 05:11:25 2019 from unix:0.0
> FreeBSD 11.2-RELEASE-p8 (GENERIC) #0: Tue Jan  8 21:35:12 UTC 2019
> ```



It could be interesting to wait at that point until it maybe spits out an error message - which it may or may not do after a couple of minutes. This may give a clue about what is actually wrong.

I somehow doubt that this is out-of-memory. I suppose it is some other resource exhaustion (buffers, process slots, whatever) or a defect.



> How can I prevent that from happening (except maybe rebooting every day)? Is there a way to limit resource consumption per program (limits(1)/rcctl(1)/...something)?



Sure, lots of them. But we need to know _what_ resource the problem is before adjusting it. No therapy without diagnosis.



> ```
> # grep -i swap /var/log/messages
> /var/log/messages:Jan 30 01:14:28 innovator kernel: GEOM_ELI: Device gpt/swap.eli destroyed.
> /var/log/messages:Jan 30 01:14:28 innovator kernel: GEOM_ELI: Detached gpt/swap.eli on last close.
> ...



Now, this does not look like you're out of swap space; it looks like you somehow lose your swap device. And your swap is not just a plain disk device - you have encryption on it, and the whole thing is probably stacked above ZFS. If such a construct doesn't work properly, then you quite likely get the picture you describe.

What I would do first is check whether that swap functions at all: grab awk and let it bloat an array until it walks into swap. Watch with `top` how the swap usage grows, and kill the awk before swap fills up. Something like this should do:
`awk 'END { for(i=0; i<9999999; i++) a[i]="abcdefghij"; }' < /dev/null`

Second step: switch off the gpt/eli/whatever swap and configure a plain unencrypted swap partition outside of ZFS, e.g. on a USB stick. That will be slow, but it should work, and the machine should not stall. See if the failure repeats or is now avoided.

Third step: figure out the resource usage. In most cases that is immediately visible with `top`; otherwise stronger measures are needed. With 8 Gig of RAM your desktop stuff should almost run without swap, and should certainly not stall the machine - unless there is some flaw in the layout.
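The second step can be sketched as follows; the device name `da0` is an assumption for the USB stick - verify with `gpart show` first, since these commands destroy whatever is on that disk:

```
# WARNING: wipes da0 -- make sure it really is the test USB stick
gpart destroy -F da0                      # remove any old partition table
gpart create -s gpt da0                   # create a fresh GPT
gpart add -t freebsd-swap -l testswap da0
swapon /dev/gpt/testswap                  # enable the new swap right away
swapinfo                                  # confirm it shows up
```

A matching `/dev/gpt/testswap none swap sw 0 0` line in /etc/fstab would make it persistent, but for this experiment enabling it by hand is enough.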


----------



## JohnnySorocil (Feb 26, 2019)

Bobi B. said:


> Usually, when a machine starts swapping, performance drops to zero.



Agree. That is the reason why I (usually) disable swap on my machines.



Bobi B. said:


> How much RAM do you have? Can you describe your hardware configuration? Do you use this machine as a server or as a desktop? How long until you run out of free RAM? You should pinpoint the program responsible for the out-of-RAM condition. Better disable swap temporarily. Also, since you're using ZFS, limit the ARC size to 1/3 or 1/2 of RAM, depending on what programs you run.


8 GB in one machine, 16 GB in 2nd.
Both machines are used as a desktop (X11 WMs, xterm, text editor, firefox, music player, occasionally some compiling and video playing but nothing fancy).
After a few days (and a few firefox/waterfox instances later) the problem occurs (sometimes it is enough to leave a few web browser and media player instances running and wait).

```
[~][pts/10][19.02.26. 17:47:05]
% zfs-stats -AE

------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Feb 26 17:33:51 2019
------------------------------------------------------------------------

ARC Summary: (THROTTLED)
        Memory Throttle Count:                  32


ARC Misc:
        Deleted:                                2.10m
        Recycle Misses:                         0
        Mutex Misses:                           38.73k
        Evict Skips:                            15.75m

ARC Size:                               47.99%  1.79    GiB
        Target Size: (Adaptive)         50.42%  1.88    GiB
        Min Size (Hard Limit):          22.27%  849.52  MiB
        Max Size (High Water):          4:1     3.73    GiB

ARC Size Breakdown:
        Recently Used Cache Size:       64.62%  1.21    GiB
        Frequently Used Cache Size:     35.38%  680.55  MiB

ARC Hash Breakdown:
        Elements Max:                           511.70k
        Elements Current:               9.95%   50.93k
        Collisions:                             680.71k
        Chain Max:                              7
        Chains:                                 1.20k

------------------------------------------------------------------------

ARC Efficiency:                                 551.47m
        Cache Hit Ratio:                95.25%  525.25m
        Cache Miss Ratio:               4.75%   26.22m
        Actual Hit Ratio:               94.46%  520.91m

        Data Demand Efficiency:         85.73%  106.92m
        Data Prefetch Efficiency:       41.65%  159.30k

        CACHE HITS BY CACHE LIST:
          Anonymously Used:             0.61%   3.22m
          Most Recently Used:           10.05%  52.79m
          Most Frequently Used:         89.12%  468.12m
          Most Recently Used Ghost:     0.06%   330.26k
          Most Frequently Used Ghost:   0.15%   793.74k

        CACHE HITS BY DATA TYPE:
          Demand Data:                  17.45%  91.67m
          Prefetch Data:                0.01%   66.35k
          Demand Metadata:              81.39%  427.47m
          Prefetch Metadata:            1.15%   6.04m

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  58.18%  15.26m
          Prefetch Data:                0.35%   92.95k
          Demand Metadata:              37.34%  9.79m
          Prefetch Metadata:            4.12%   1.08m

------------------------------------------------------------------------
```



Bobi B. said:


> There are built-in utilities to monitor resources use.


Can you recommend something that would be useful in my situation?


Yesterday around 16:50 the machine with 8 GB of RAM froze (same problem as described - the taskbar clock stops, music stops after the currently playing song, the screen won't go off after a few hours, cannot log in over ssh, no response to Ctrl-Alt-F1,...).
The problem started when I had to run another Firefox instance. I didn't have a memory meter other than the one in Conky, but it was around 99% (or something like that, definitely close to the max).
But this time I waited.
I wasn't at home until around 17:00 today, but the machine was responsive again (music continued, the physical console (Ctrl-Alt-F1) was on screen).
logs:

```
Feb 23 19:25:46 innovator sshd[7909]: error: maximum authentication attempts exceeded for root from 37.110.104.187 port 45227 ssh2 [preauth]
Feb 23 21:11:33 innovator sshd[78410]: error: maximum authentication attempts exceeded for invalid user admin from 174.103.99.159 port 47898 ssh2 [preauth]
Feb 24 03:50:09 innovator sshd[28129]: error: maximum authentication attempts exceeded for root from 2.179.183.88 port 52200 ssh2 [preauth]
Feb 24 04:47:47 innovator sshd[38586]: error: maximum authentication attempts exceeded for root from 91.137.127.104 port 62504 ssh2 [preauth]
Feb 24 06:23:18 innovator sshd[53559]: error: maximum authentication attempts exceeded for root from 5.224.43.97 port 52978 ssh2 [preauth]
Feb 24 12:43:07 innovator sshd[99155]: error: maximum authentication attempts exceeded for invalid user admin from 116.3.101.243 port 33375 ssh2 [preauth]
Feb 24 16:47:37 innovator sshd[17678]: error: maximum authentication attempts exceeded for root from 182.45.53.77 port 38863 ssh2 [preauth]
Feb 25 10:41:01 innovator sshd[3877]: error: maximum authentication attempts exceeded for root from 222.220.84.95 port 40538 ssh2 [preauth]
Feb 25 12:09:43 innovator sshd[82956]: error: maximum authentication attempts exceeded for invalid user admin from 222.113.145.41 port 37131 ssh2 [preauth]
Feb 25 13:42:03 innovator sshd[80577]: error: maximum authentication attempts exceeded for root from 58.236.98.240 port 41500 ssh2 [preauth]
Feb 25 16:19:46 innovator sshd[29762]: error: maximum authentication attempts exceeded for root from 42.115.173.65 port 49458 ssh2 [preauth]
Feb 25 17:18:34 innovator sshd[11175]: error: maximum authentication attempts exceeded for root from 115.55.152.216 port 27027 ssh2 [preauth]
Feb 25 18:43:40 innovator ppp[11659]: tun0: Warning: ff02::/: Change route failed: errno: Network is unreachable
Feb 25 18:43:50 innovator ppp[11659]: tun0: Warning: ff02::/: Change route failed: errno: Network is unreachable
Feb 25 18:49:04 innovator sshd[68665]: error: maximum authentication attempts exceeded for invalid user admin from 93.138.63.121 port 52693 ssh2 [preauth]
Feb 25 23:04:23 innovator sshd[77023]: error: maximum authentication attempts exceeded for root from 119.130.105.246 port 49886 ssh2 [preauth]
Feb 26 08:15:03 innovator kernel: ath0: stuck beacon; resetting (bmiss count 4)
Feb 26 08:52:32 innovator kernel: ath0: device timeout
Feb 26 11:45:34 innovator kernel: pid 4188 (waterfox), uid 1001, was killed: out of swap space
Feb 26 14:31:47 innovator sshd[28062]: error: maximum authentication attempts exceeded for root from 119.130.105.246 port 43634 ssh2 [preauth]
Feb 26 16:27:17 innovator sshd[51415]: error: maximum authentication attempts exceeded for root from 123.24.67.128 port 47838 ssh2 [preauth]
Feb 26 17:08:42 innovator sshd[75261]: error: maximum authentication attempts exceeded for root from 81.83.212.173 port 40886 ssh2 [preauth]
```
So, after more than half a day the machine comes back to life, with logs that aren't that descriptive...


----------



## Sevendogsbsd (Feb 26, 2019)

There was a 10,000 (kidding) post thread on Firefox causing issues like this here; I can't remember the thread title - search for it. It may have been fixed. I have no issues like this, but I don't use Firefox and my machine only stays up around 8 hours max - I shut down after using it.

I "think" this is it: Thread 67657, but it seems shorter than I remember.


----------



## JohnnySorocil (Feb 26, 2019)

PMc said:


> The other problem is that you don't have an immediate console. Switching from graphics to the console needs an ugly lot of resources - so if you are already in a resource-exhaustion situation, at that point you're usually lost.


Interesting, I didn't know that. I'll try to reproduce it while on the console and see how it goes.




PMc said:


> Well, in that case the most common approach is to have a look at what is eating how much of the memory, and how much is currently left - _before_ it gets exhausted. There is a tool called `top` that continuously shows that.


Usually the memory-heavy programs are started well before exhaustion. But I expected that the guilty process would be killed, not that the whole machine would basically halt.




PMc said:


> It could be interesting to wait at that point until it maybe spits out an error message - which it may or may not do after a couple of minutes. This may give a clue about what is actually wrong.


Well, I have waited for a day, and there was only one useful line in the log (the freeze occurred at approximately Feb 25 16:50):

```
Feb 26 11:45:34 innovator kernel: pid 4188 (waterfox), uid 1001, was killed: out of swap space
```



PMc said:


> I somehow doubt that this is out-of-memory. I suppose it is some other resource exhaustion (buffers, process slots, whatever) or a defect.



Something like this?

```
sysctl -h vfs | \grep buffer
```



PMc said:


> Now, this does not look like you're out of swap space; it looks like you somehow lose your swap device. And your swap is not just a plain disk device - you have encryption on it, and the whole thing is probably stacked above ZFS. If such a construct doesn't work properly, then you quite likely get the picture you describe.


Sorry, that was the part of the log from when I disabled the swap device after boot. The swap partition (when used) is a plain GPT partition:

```
# swapinfo
Device          1K-blocks     Used    Avail Capacity
# gpart show -p ada0
=>       40  468862048    ada0  GPT  (224G)
         40       2008          - free -  (1.0M)
       2048     204800  ada0p1  efi  (100M)
     206848       1024  ada0p2  freebsd-boot  (512K)
     207872       1024          - free -  (512K)
     208896  461373440  ada0p3  freebsd-zfs  (220G)
  461582336    2097152  ada0p4  freebsd-swap  (1.0G)
  463679488    5182600          - free -  (2.5G)
```



PMc said:


> What I would do first is check whether that swap functions at all: grab awk and let it bloat an array until it walks into swap. Watch with `top` how the swap usage grows, and kill the awk before swap fills up. Something like this should do:
> `awk 'END { for(i=0; i<9999999; i++) a[i]="abcdefghij"; }' < /dev/null`
> 
> Second step: switch off the gpt/eli/whatever swap and configure a plain unencrypted swap partition outside of ZFS, e.g. on a USB stick. That will be slow, but it should work, and the machine should not stall. See if the failure repeats or is now avoided.
> ...



Thanks for the suggestions, I'll try to reproduce the same problem on machine with 16 GB RAM (and limited ARC).


----------



## JohnnySorocil (Feb 26, 2019)

Sevendogsbsd said:


> There was a 10,000 (kidding) post thread on Firefox causing issues like this here; I can't remember the thread title - search for it. It may have been fixed. I have no issues like this, but I don't use Firefox and my machine only stays up around 8 hours max - I shut down after using it.
> 
> I "think" this is it: Thread 67657, but it seems shorter than I remember.


Thanks, I'll look into it! Firefox is definitely a memory pig 

But can I prevent a program (Firefox or not, it shouldn't matter) from doing this to the machine? I would prefer that the OS kills the memory-intensive program before it stalls the whole machine for hours.


----------



## SirDice (Feb 26, 2019)

JohnnySorocil said:


> Agree. That is the reason why I (usually) disable swap on my machines.


Contrary to popular belief, swapping in and of itself is NOT bad. It's _excessive_ swapping that will cause performance issues. And even then it may not be bad: if the choice is between a system that completely hangs itself up due to memory starvation and a system that works slowly, but still works, then the choice should be simple.

In short, always add swap.


----------



## shkhln (Feb 26, 2019)

JohnnySorocil said:


> I would prefer that the OS kills the memory-intensive program before it stalls the whole machine for hours.



FreeBSD has an OOM killer, but it's not very reliable, as you can see.


----------



## JohnnySorocil (Feb 26, 2019)

shkhln said:


> FreeBSD has an OOM killer, but it's not very reliable, as you can see.



Maybe something in sys/vm/swap_pager.c and/or sys/vm/vm_pageout.c can be tuned?
In the worst case, I would be OK if the names of bad processes could somehow be hardcoded somewhere (so that at least X11 and the good processes(tm) survive).

Or, can 32-128-NNN MB of RAM somehow be reserved to keep the OOM killer responsive (so it has enough to kill bad processes)?
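For what it's worth, vm_pageout.c does expose one knob via sysctl: `vm.pageout_oom_seq`, the number of unsuccessful page-reclamation passes the page daemon makes before it invokes the OOM killer. A hedged sketch (the value shown is only an illustration; whether it helps with the stalls described here is untested):

```
sysctl vm.pageout_oom_seq        # show the current value
sysctl vm.pageout_oom_seq=5      # example: trigger the OOM killer sooner
```

A lower value makes the kill come earlier (more responsive, but more eager to kill); a higher one makes the kernel struggle longer before killing anything.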


----------



## PMc (Feb 26, 2019)

JohnnySorocil said:


> 8 GB in one machine, 16 GB in 2nd.
> Both machines are used as a desktop (X11 WMs, xterm, text editor, firefox, music player, occasionally some compiling and video playing but nothing fancy).
> After a few days (and a few firefox/waterfox instances later) the problem occurs (sometimes it is enough to leave a few web browser and media player instances running and wait).
> 
> ...



Something is very strange here. 
My desktop was running with 4 Gig of memory until last month - with firefox, 1-2 vlc instances, occasionally 1-2 additional browsers, OpenOffice, occasionally recompiling the OS and all ports - but I do not recall ever having seen a nonzero count on this memory-throttle metric. I had no problems whatsoever, except that it gets very slow when paging, and firefox producing core dumps at termination.



> Yesterday around 16:50 the machine with 8 GB of RAM froze (same problem as described - the taskbar clock stops, music stops after the currently playing song, the screen won't go off after a few hours, cannot log in over ssh, no response to Ctrl-Alt-F1,...).



This is not typical swapping behaviour. Swapping may reduce speed to about 1/100 (which is much too slow for most people), but it does not freeze things (for longer than occasionally a couple of seconds).



> Problem started when I had to run another Firefox instance.



That may indeed hurt. I found the newer firefox (with multiple content processes) to use an extreme amount of memory: about 2-3 Gig _per installed CPU_! So, depending on how many CPUs there are, a second instance of that ilk may lead to trouble.
-> Go to "Preferences", uncheck the "use recommended performance settings" box, see what it says for "content process limit" (it should be the number of your CPUs), and set it to a sensible value (2 or 3 should do). Set it to 1 for all secondary instances.


----------



## Bobi B. (Feb 26, 2019)

You can try setting resource limits, but how you do it depends on the shell you use. With tcsh(1) the command you need is `limit`, whereas with sh(1) and bash(1) it is probably `ulimit`.

To monitor resource use you can try top(1) (`top -S -o size`), of course, plus systat(1) (`systat -vmstat`) and vmstat(8) (`vmstat 1`). To monitor a process' resource usage, try procstat(1) (`procstat -l <pid>`, `procstat -r <pid>`). top(1) can also show you the processes doing the most I/O with `top -S -m io -o total`, and gstat(8) will display per-disk/partition utilization.

You might leave some of those running in an X11 terminal so you can look for the culprit when you notice the machine going down.
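Since the screen may already be unreadable by the time the machine stalls, it can also help to log such snapshots to a file, so the last entries before a freeze survive the reboot. A minimal sketch (the log path and 10-second interval are arbitrary choices):

```
#!/bin/sh
# Append a timestamped memory/swap snapshot to a log every 10 seconds.
LOG=/var/log/memwatch.log
while :; do
    date >> "$LOG"
    swapinfo >> "$LOG"    # swap devices and usage
    vmstat >> "$LOG"      # free memory and paging activity
    sleep 10
done
```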


----------



## PMc (Feb 26, 2019)

> Sorry, that was part of the log when I have disabled swap device after boot. Swap partition (when used) is plain GPT partition:
> 
> ```
> # swapinfo
> ...



No, that won't work. Give the beast some swap it can use - 10 Gig for starters. Or at least give it the 3.5 Gig that are there.
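If repartitioning is impractical, a swap file is a quick way to add that space; on FreeBSD it has to be attached through md(4). A sketch (the path and size are arbitrary examples; keeping the file on UFS rather than ZFS avoids stacking swap on the pool):

```
# Create a 10 GB file and attach it as swap via a memory disk.
dd if=/dev/zero of=/swapfile bs=1m count=10240
chmod 0600 /swapfile
mdconfig -a -t vnode -f /swapfile -u 99   # creates /dev/md99
swapon /dev/md99
swapinfo                                  # verify it is active
```

For a persistent setup, a `md99 none swap sw,file=/swapfile,late 0 0` line in /etc/fstab attaches it at boot.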


----------



## PMc (Feb 26, 2019)

JohnnySorocil said:


> Thanks, I'll look into it! Firefox is definitely a memory pig
> 
> But can I prevent a program (Firefox or not, it shouldn't matter) from doing this to the machine? I would prefer that the OS kills the memory-intensive program before it stalls the whole machine for hours.



Well, but I would NOT prefer such behaviour.
I want it to try and struggle its way through, if that by any means might work out. It's a server OS - and if somebody sends a bloated query to the database, we do NOT want it to continue running with one of the database processes killed (and the whole thing thereby becoming somehow half-operative).

`racct`/rctl(8) should allow you to do that; it can limit memory usage. But I didn't try that (as I don't want killed processes). What I did try was to use it to limit core memory usage and force the process to page out early, leaving space for other processes (which obviously requires ample swap space). That works, but it puts a lot of useless load on the vmdaemon, so the result didn't look very feasible to me.
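For completeness, a sketch of such limits with rctl(8); the user name and the 6 GB figure are made-up examples, and on stock kernels accounting must first be enabled with `kern.racct.enable=1` in /boot/loader.conf (plus a reboot):

```
# Kill any process of user "johnny" whose resident memory exceeds 6 GB:
rctl -a user:johnny:memoryuse:sigkill=6g

# Or deny further allocations instead of killing, forcing early page-out:
rctl -a user:johnny:vmemoryuse:deny=6g

rctl user:johnny    # display the rules matching that filter
```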


----------



## shkhln (Feb 26, 2019)

PMc said:


> This is not a typical swapping behaviour. Swapping may reduce speed to about 1/100 (which is much too slow for most people), but it does not freeze things (for longer than occasionally a couple of seconds).



It's not swapping behavior, it's memory-exhaustion behavior. Once you are out of _physical and swap_ memory, anything trying to call malloc (or mmap) is going to stall indefinitely.



PMc said:


> Something is very strange here.



Nothing is strange here. I have a similar configuration (that is, zfs with swap disabled) and I occasionally get similar lockups. I don't believe it's a consequence of the disabled swap, presumably they are just easier to trigger that way.


----------



## PMc (Feb 26, 2019)

shkhln said:


> It's not a swapping behavior, it's a memory exhaustion behavior. Once you are out of _physical and swap_ memory anything trying to call malloc (or mmap) is going to stall indefinitely.



Yepp.



> Nothing is strange here. I have a similar configuration (that is, zfs with swap disabled) and I occasionally get similar lock ups. I don't believe it's a consequence of a disabled swap, presumably they are just easier to trigger that way.



Ah. I never disable swap. And it is indeed not really strange when considering that there is only 1 Gig of swap - and that does not help with 8 Gig of memory. So if there are maybe 4 processors, firefox in its default configuration might well expand to 8 Gig for itself, will squeeze the ZFS ARC to uselessness, will at the same time walk out to swap, will soon hit the end of the swap, and there the show stops. From that viewpoint it figures.


----------



## Deleted member 30996 (Feb 27, 2019)

Here's the business end of my X61 .mp3 player running FreeBSD 11.1-RELEASE-p10 at 292 days of uptime. sysutils/screenfetch isn't showing the CPU for some reason, but it's an Intel Core 2 Duo T7300 @ 2.00 GHz with 4 GB RAM and a Scorpio Black 200 GB HDD:

[screenfetch screenshot]

It shows a total of 3947 MB of swap allocated with 82 MB in use, or 3947 MB with 2% in use, as you prefer.

I never turn sysutils/gkrellm2 or multimedia/xmms off, and I leave whatever songs I have loaded playing so I can just pick up my headphones and hear music. I never take it online, and it couldn't be running better, so I don't see the need to update ATM, but it is a full FreeBSD build and every bit as capable as the T61 with the same specs, running FreeBSD 12.0-RELEASE-p3, that I use as a desktop.

The T61 currently shows 3979 MB of swap allocated with 5608 K in use at 6 days of uptime. The T400 I'm on now - Intel Core2 Duo P8600 @ 2.40 GHz, 8 GB RAM and a Scorpio Black 200 GB HDD, running FreeBSD 11.2-RELEASE-p9 - shows 3979 MB of swap allocated and 3979 MB free at 3 days of uptime.

I've never had a problem with Swap on any of my machines and they're never short on RAM no matter what I use them for.


----------



## Deleted member 48958 (Feb 27, 2019)

JohnnySorocil said:


> Do you have problems with out-of-memory situations under FreeBSD?


Yes. I started to have some problems with memory usage after the update to 11.2, as far as I remember; IMO something happened after that release.

I always use 2 machines at the same time: I use the first machine to watch some content and listen to music while I work on my laptop, and I use ssh and vnc to manipulate the first machine. The first machine is running FreeBSD 12 now, the second one Devuan 9. I've almost never seen full RAM on the Linux machine, while after the upgrade the FreeBSD machine's RAM is almost always full... I just need to start some memory-intensive apps and soon RAM will be full, and it doesn't fully free up even when I close all applications; about 1.5-2 GB of RAM is always in use, as well as some amount of swap.

Maybe it's some kind of glitch which appeared after `freebsd-update`, and I just need to reinstall the OS, but unfortunately I don't have much time right now for a full reinstallation. BTW, I had much bigger memory problems with the 11.2 release - RAM and swap were almost always full (I downgraded to the 10.4 release at that time, which used to work well for me, including memory usage). 12.0 works a little better, but anyway, 11.0 used to use memory much more reasonably for me.


----------



## twllnbrck (Feb 27, 2019)

ILUXA said:


> I just need to start some memory-intensive apps and soon RAM will be full, and it doesn't fully free up even when I close all applications; about 1.5-2 GB of RAM is always in use, as well as some amount of swap.


I have noticed similar issues since the 11.2-R upgrade. The biggest memory consumer was firefox, which finally caused me to switch to www/chromium. It is still a heavy browser and a memory hog, but with one site open in both, firefox consumes half again as much RAM.
But the problem of memory not being fully freed remains, even on 12.0-p3.


----------



## PacketMan (Mar 19, 2019)

I'm running 11.2-RELEASE on a headless home server with 16 GB of RAM and a fair few programs running, and as far as I am concerned there are no memory issues there. I could let that thing run for a year and it would keep on going. And although I have not written down any numbers, I'm pretty sure that when I shut a program down, its memory is 'released' and 'free for future use'.


----------



## usdmatt (Mar 19, 2019)

Much as I like ZFS, I've had memory issues with it since day 1, and always limit the ARC manually. I tried a new server a few weeks ago without a manual limit to see if it was any better, but it started killing processes, and the NFS service (which is what it was being used for) went down within 24 hours. I actually posted on the mailing list about it, as I thought there had been improvements made over the years, but only got a few "me too" responses.

Not trying to knock FreeBSD too much; I've used it since 3.x and much prefer it to everything else. It's just frustrating that after more than 10 years, ZFS still seems, in my experience at least, to need draconian memory limits to stop the entire system starving itself to death.


----------



## twllnbrck (Mar 19, 2019)

usdmatt said:


> Much as I like ZFS, I've had memory issues with it since day 1, and always limit ARC manually


I'm also playing around with ZFS tuning on my desktop PC (8GB RAM). What tunables do you change besides vfs.zfs.arc_max, and are there any rules of thumb for getting some improvement?


----------



## D-FENS (Mar 19, 2019)

usdmatt said:


> Much as I like ZFS, I've had memory issues with it since day 1, and always limit ARC manually. I tried a new server a few weeks ago without a manual limit to see if it was any better, but it started killing processes and the NFS service (which is what it was being used for) went down within 24 hours. I actually posted on the mailing list about it as I thought they'd been improvements made over the years but only got a few "me too" responses.
> 
> Not trying to knock FreeBSD too much, I've used it since 3.x and much prefer it to everything else; It's just frustrating that after more than 10 years, ZFS still seems, in my experience at least, to need draconian memory limits to stop the entire system starving itself to death.


How much RAM does your system have? And how much disk space?


----------



## Ancient (Mar 19, 2019)

Look... I know that a FreeBSD installation consumes a minimum of 2GB. But you mentioned that you have a few programs running... look: Xorg alone consumes 1GB. I know it because I've used it.


----------



## usdmatt (Mar 19, 2019)

> How much RAM does your system have? And how much disk space?



The system I was running the other day had 12GB RAM and 2TB of storage (4 1TB disks in mirrors). I currently have ARC limited to 10GB and it's been fine since. Of course it shouldn't really matter that much. Filling a system with 10TB disks shouldn't mean you suddenly need 128GB of RAM to stop it from falling over. ARC should always leave some memory free and release some if the system starts to run out.
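For anyone wanting to do the same, such an ARC cap is normally set in /boot/loader.conf (a sketch only; the 10G value mirrors this post and must be sized to your own RAM and workload):

```
# /boot/loader.conf
# Cap the ZFS ARC so the kernel leaves headroom for userland.
vfs.zfs.arc_max="10G"
```

The setting takes effect at boot; on newer releases it can also be inspected and changed at runtime via the corresponding sysctl.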


----------



## D-FENS (Mar 19, 2019)

Ancient said:


> Look...I know that FreeBSD installations consume 2GB of minimum. But you mentioned that you have a few programs...look: only Xorg consumes 1GB. I know it because I used it.


In theory, yes. I have a GUI system with an XFCE desktop and FreeBSD 12.0 that runs just fine on 512 MB of RAM. 60 GB HDD and ZFS file system.

Edit: The system is quite new though, and it does not have a lot of snapshots. Maybe the massive snapshot count mentioned below is a hint?


----------



## D-FENS (Mar 19, 2019)

usdmatt said:


> The system I was running the other day had 12GB RAM and 2TB of storage (4 1TB disks in mirrors). I currently have ARC limited to 10GB and it's been fine since. Of course it shouldn't really matter that much. Filling a system with 10TB disks shouldn't mean you suddenly need 128GB of RAM to stop it from falling over. ARC should always leave some memory free and release some if the system starts to run out.


I have a system with 16 GB RAM and 4x4TB disks and I have not yet run out of memory. It's used mostly as a light server though.

Edit: I think it actually DID run out of memory. I've been having HDD problems when scrubbing for a while, but I thought it was a hardware problem. Looking deeper into it, it might actually be a memory problem.


----------



## PMc (Mar 21, 2019)

usdmatt said:


> Much as I like ZFS, I've had memory issues with it since day 1, and always limit ARC manually.



Agreed, that is a funny beast and I don't grasp it fully.
Lately I reduced my desktop to a single disk (for noise reasons) and implemented remote mirroring with zfs send. Then I watched it: 8GB of memory installed, a single 500G (spinning) disk, ZFS sitting there with a 4.5G ARC and 6.8G(!) of wired memory, Firefox pushed into swap(!), the machine mostly idle and the disk 100% busy at 2MB/sec throughput (the daily chksetuid not getting anywhere), and if that weren't already enough:

```
ARC Summary: (THROTTLED)
        Memory Throttle Count:                  3
```
That is not how it is supposed to work, but I couldn't figure out what problem the machine had (it seems to be related to massive snapshot usage).

I'm inclined to _not recommend_ ZFS for single-disk desktop use. I once tried adding an SSD L2ARC, but didn't perceive any improvement. It would probably help if the thing ran entirely from SSD - but then there might not be much point in a large ARC; the OS can buffer as well.



> I tried a new server a few weeks ago without a manual limit to see if it was any better, but it started killing processes and the NFS service (which is what it was being used for) went down within 24 hours. I actually posted on the mailing list about it as I thought they'd been improvements made over the years but only got a few "me too" responses.



I would like to contradict the stance that "ZFS needs much RAM". One can probably overcome most issues with an _extreme_ amount of RAM, but otherwise the matter needs planning or some good educated guesses: databases can cache, the OS can cache, and putting it all together and hoping it will settle into a nice interplay on its own may not be enough.



> Not trying to knock FreeBSD too much, I've used it since 3.x and much prefer it to everything else; It's just frustrating that after more than 10 years, ZFS still seems, in my experience at least, to need draconian memory limits to stop the entire system starving itself to death.



It seems ZFS deals with memory like some people deal with money: they never have enough, and if they happen to win the lottery, it only leads to them ending up deeply in debt. (The solution is to not give them more than they actually need.)
The ARC is mostly just a cache - and running a (big) server with 90% of memory used as a filesystem cache sounds suboptimal to me; in that case I would prefer the application to decide what it actually needs to cache. The "adaptiveness" of the ZFS cache is just a workaround for the lack of this, and it may or may not perform well, depending on the kind of workload.
I would rather start with a small ARC and then increase it as long as that brings actual performance improvement - and always prefer the application's ability to do its own caching.
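Along those lines, one rough way to judge whether a larger ARC still pays off is to watch the cumulative hit ratio (a sketch; the `kstat.zfs.misc.arcstats` sysctls are what FreeBSD exposes, but check your release):

```
#!/bin/sh
# Cumulative ARC hit ratio since boot; re-check after raising arc_max.
# If the ratio stops improving, a bigger ARC is probably wasted memory.
hits=$(sysctl -n kstat.zfs.misc.arcstats.hits)
misses=$(sysctl -n kstat.zfs.misc.arcstats.misses)
echo "ARC hit ratio: $(( hits * 100 / (hits + misses) ))%"
```

The counters are cumulative since boot, so comparing two samples taken some time apart gives a better picture than a single reading.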


----------



## PMc (Mar 21, 2019)

usdmatt said:


> The system I was running the other day had 12GB RAM and 2TB of storage (4 1TB disks in mirrors). I currently have ARC limited to 10GB and it's been fine since. Of course it shouldn't really matter that much. Filling a system with 10TB disks shouldn't mean you suddenly need 128GB of RAM to stop it from falling over. ARC should always leave some memory free and release some if the system starts to run out.



Early in the story (~2008?), when I wanted to use ZFS for data-integrity reasons (not for performance), I made it run on significantly less than 1G of RAM. I had to go into the code and change the adjustments made there. There were three parts to them:

1. Some very simple math to initially come up with the adjustable values: arc_max and some limits.
2. Ways for ZFS to continuously receive the state of the OS (free memory etc.) and react to it.
3. The internal mechanics of ZFS for adjusting ARC size and usage.

There are probably few people who understand item 3, and it comes from the developers of ZFS, while items 1 and 2 were done for FreeBSD integration. Item 2 is what I had to adjust, and it has evolved over time (though I haven't looked into it recently).
But your concern is with item 1, and that is just a simple best guess that hopefully works for the majority of users. You should definitely adjust these values, as you know your workload and will therefore almost always make better choices.

There is no rule that X TB of disk requires Y GB of RAM. (There is such a rule for deduplication, and also for the use of L2ARC - the latter depends on block and file sizes. And there also seems to be an issue with snapshots, but I still have to figure that one out.) The more interesting point is the application's disk access patterns, and therefore how it can benefit from caching.
Also, caching behaviour can be further adjusted by switching it on or off for each individual filesystem.
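The per-filesystem switch mentioned here is the `primarycache`/`secondarycache` dataset properties (a sketch; the pool and dataset names are made up):

```
# Cache only metadata, not file data, for a dataset whose
# application does its own caching (e.g. a database):
zfs set primarycache=metadata tank/db

# Keep a scratch dataset out of a configured L2ARC entirely:
zfs set secondarycache=none tank/scratch
```

Valid values for both properties are `all` (default), `metadata`, and `none`, so ARC pressure from one noisy dataset can be reduced without a global arc_max change.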


----------



## xtaz (Mar 22, 2019)

I run ZFS with a single hard drive on a server and a laptop, both of which have 4GB of memory. The ARC is limited to 1GB on both and despite the free memory quite often being close to zero I've never had a problem. They rarely use swap, and if they do it's usually things which have been idle for hours rather than active processes.

However, with the default settings where the ARC is not limited, then yes, it rapidly consumes all available memory and swaps like crazy. The out-of-the-box ZFS memory management really needs to be looked at, as people shouldn't have to limit the ARC size just to get it to behave.


----------



## ralphbsz (Mar 22, 2019)

Agree: I run ZFS on a 32-bit machine with 3GB of memory, with 4 disks under ZFS control and 4 pools ranging in size from 1TB to 4TB. No memory problem at all; I just have to set the well-known parameters in /boot/loader.conf:

```
vm.kmem_size="512M"
vm.kmem_size_max="512M"
vfs.zfs.arc_max="64M"
vfs.zfs.vdev.cache.size="8M"
vfs.zfs.prefetch_disable=0
```
Note that this is a server machine, without X, without a GUI, not running memory-intensive applications (like web browsers). I have not tuned the above settings for ideal performance, since performance is good enough for my needs and the time investment involved in tuning isn't worth it.

I agree it would be nice if ZFS memory usage could be preconfigured at installation time, or auto-tuned. But that's not the world we live in.


----------



## PMc (Mar 22, 2019)

Here it is 2G of RAM, 3-4 pools, no graphics.
Kernel options:

```
options KVA_PAGES=512
options KSTACK_PAGES=8   # I've seen a "double fault" with 4
```

in loader.conf:

```
vm.kmem_size="1408M"
vm.kmem_size_max="1408M"
vfs.zfs.arc_max="800M"
vfs.zfs.arc_min="200M"
vfs.zfs.prefetch_disable="1"
```

in sysctl.conf

```
vfs.zfs.l2arc_norw="0"
vfs.zfs.l2arc_noprefetch="0"
vfs.zfs.arc_meta_limit=471859200  # arc is for metadata, payload goes in l2arc
vfs.zfs.min_auto_ashift=12
```

The following are for the case where a scrub runs on the SSD during boot - otherwise the machine will not reach multiuser. In multiuser they get reverted and auto-adjusted to the current load.

```
vfs.zfs.scan_idle=86399999
vfs.zfs.scrub_delay=1
```

Beware: this is not meant for cut & paste. It is crafted for my specific needs; e.g. the machine does some telephony routing and other housekeeping, like collecting backups - performance is not important, but responsiveness must be maintained.


----------



## aragats (Mar 22, 2019)

ILUXA said:


> I started to have some problems with memory usage after update to 11.2...
> ... 12.0 works a little bit better, but anyway, 11.0 used to use memory much more reasonably for me.


I mostly agree with that. Now I'm on 12.0, and today I got this:
```
....
swap_pager_getswapspace(32): failed
pid 29726 (firefox), uid 1001, was killed: out of swap space
pid 38208 (chrome), uid 1001, was killed: out of swap space
```
Firefox had only one tab open, and Chromium about 15.
All the other programs were _urxvt_ terminals in DWM, plus one spreadsheet in LibreOffice.

I have 16G of RAM, 4G of which goes to bhyve's Win2019 guest; the ARC is limited to 2G, and I have 10G of swap.
How could that happen? It's hard to reproduce now...
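If it happens again, a few stock FreeBSD tools can show where the memory went before the machine locks up (a sketch; `-o res` sorts top(1) by resident size, `-b` is batch mode):

```
swapinfo -h                                          # swap devices and how full they are
top -b -o res 10                                     # ten biggest processes by resident memory
sysctl vfs.zfs.arc_max kstat.zfs.misc.arcstats.size  # configured vs. actual ARC size
```

Capturing this periodically (e.g. from cron) can at least show whether the pressure comes from a userland process, the ARC, or other wired kernel memory.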


----------



## D-FENS (Mar 22, 2019)

ralphbsz said:


> Agree: I run ZFS on a 32-bit machine with 3GB of memory, with 4 disks under ZFS control, 4 pools ranging in size from 1TB to 4TB.  No memory problem at all, just have to set the well-known parameters in /boot/loader.conf:
> 
> ```
> vm.kmem_size="512M"
> ...


How many snapshots do you have on this machine?
I ran into issues, and my machine has 16 GB of RAM and ~8000 snapshots so far. The disks are 4x 4TB.


----------



## ralphbsz (Mar 23, 2019)

Snapshots?  Fewer than fingers on one hand.  I hardly ever use snapshots.  I have a backup system that was built before I had a file system with snapshots available, so I don't use them for backups.  Also, no compression or dedup (although the backup system has all that built in).


----------



## PMc (Mar 23, 2019)

Alright, that figures now: snapshots are a problem. One can do the following: run send/receive on two pools of the local machine (or get the machine I/O-bound by other means), then run some script that works with snapshots - and then the fun starts: performance degrades and programs are pushed into swap, for whatever reason.
It seems that if ZFS cannot flush the snapshot (creation or deletion) out to disk immediately, it starts to do ugly things.

One can do many interesting things with snapshots. For instance, I use them for port building: rolling back is faster than doing `make clean`. So one may have script-based machinery that does massive create/destroy actions following some logic. In that way the snapshot becomes a programming primitive - and obviously, there can then be lots of them.
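The port-building trick sketched above would look roughly like this (the dataset name `tank/ports` is just an example):

```
# Snapshot the clean ports tree once:
zfs snapshot tank/ports@clean

# ... build a port, leaving work files and object directories behind ...

# Instead of `make clean`, roll the whole tree back to the snapshot:
zfs rollback tank/ports@clean
```

Note that `zfs rollback` discards everything written since the snapshot, which is exactly why it is fast - and why scripted create/destroy cycles like this multiply the snapshot count quickly.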


----------



## D-FENS (Mar 23, 2019)

PMc said:


> Alright, that now figures: snapshots are a problem. One can do the following: run send/receive on two pools of the local machine (or get the machine i/o bound by other means), then run some script that works with snapshots - and then the fun starts: performance degrades and programs are pushed into swap, for whatever reason.
> It seems if ZFS cannot put the snapshot (creation or deletion) out to disk immediately, it starts to do ugly things.
> 
> One can do many interesting things with snapshots. For instance, I use them for port building: this is faster than doing `make clean`. So, one may have some script based stuff that does massive create/destroy actions following some logic. In that way, the snapshot becomes a programming function - and obviousely, then there can be lots of them.


Yeah, I also think it has something to do with resource usage: massive numbers of snapshots seem to bloat the necessary cache, and eventually the system runs out of memory.
In my case it happens only when I scrub. When not scrubbing, everything is peachy.


----------



## ralphbsz (Mar 23, 2019)

Extra memory being used while scrubbing, enough for every scrub thread to keep a complete state of what it is looking at, makes sense.  But after scrubbing is done, the memory usage should go back down.  My machine often stays up for a month at a time (longer in the summer when there are fewer power outages), and it scrubs every 3 days, and memory usage doesn't creep up.  Could this be a bug in scrubbing, with the way your ZFS is being used (snapshots and such)?

If yes, it's hard to report this; a developer would need way more debug information than just telling them "memory leaks".


----------



## PMc (Mar 23, 2019)

It is most likely not a memory demand from scrubbing itself (I don't observe such a demand); it is rather the fact that scrubbing floods the I/O queues. I see similar effects without scrubbing, too, when the disk is busy enough.

Snapshots need to preserve data integrity while a filesystem is active. So certain data has to be put to disk, and quickly, or otherwise some of the regular ongoing I/O activity has to be held back for a while - and this, although I don't know the implementation details, seems likely to create urgent memory demands.

Anyway, I cannot confirm a memory leak. I would call it a (temporary) overcommit. We should check whether it can be contained by limiting arc_max appropriately - and continue to complain about what I would call a design weakness.


----------



## D-FENS (Mar 23, 2019)

So how do I work around this problem with the I/O queue flooding? Reduce the number of snapshots?


----------



## PMc (Mar 23, 2019)

Well, I don't know what is practical for you. And I don't yet see the full picture: is it the _number_ of snapshots that is problematic, the _frequency_ with which they get created/deleted, or the _size_ of the filesystems? Dunno.
Currently I have remote disk mirroring active - that uses only a dozen snapshots, but they span all the data and get created/deleted every few minutes. And the behaviour is far from troublesome, but still remarkably ugly.

For now, I would consider snapshots a valuable resource that brings certain expenses with it.

I am currently experimenting with limiting arc_max, but then my snapshots only reach the hundreds at most, and I never had an actual out-of-memory situation. OTOH, I still have early swapping enabled (`vm.swap_idle_enabled=1`, from earlier times with scarce memory), which makes the OS push data out to swap _before_ memory gets low. That might also be worth trying - but I will switch it off now; I don't like the browser having to climb out of swap first when I come back after an hour or so.

And then, there is something I don't understand (from top(1) output):

```
Mem: 963M Active, 353M Inact, 827M Laundry, 5385M Wired, 102M Buf, 317M Free
ARC: 3110M Total, 431M MFU, 2326M MRU, 3432K Anon, 102M Header, 253M Other
     2350M Compressed, 4676M Uncompressed, 1.99:1 Ratio
```
What accounts for the difference between the 5385M wired and the 3110M ARC?

It currently appears to me that limiting arc_max may not even tackle the actual issue, because memory gets claimed _outside_ of the ARC (but still within wired kernel memory). This memory gets reclaimed when user processes need it, but only then.
It could probably be figured out who is doing that, but, uuh, that looks like work.
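For reference, the wired-vs-ARC comparison can be scripted roughly like this (a sketch; sysctl names as FreeBSD 11/12 expose them):

```
#!/bin/sh
# Wired memory vs. ARC size, in MiB; the remainder is
# wired kernel memory claimed outside the ARC.
pagesize=$(sysctl -n hw.pagesize)
wired=$(( $(sysctl -n vm.stats.vm.v_wire_count) * pagesize ))
arc=$(sysctl -n kstat.zfs.misc.arcstats.size)
echo "wired: $(( wired / 1048576 ))M  ARC: $(( arc / 1048576 ))M  non-ARC wired: $(( (wired - arc) / 1048576 ))M"
```

Sampling this in a loop while reproducing the problem would show whether the non-ARC wired portion is the part that grows.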


----------



## DoItDrive (Mar 29, 2019)

Sounds like an old machine, but I'm guessing it isn't.


----------



## shkhln (Sep 16, 2019)

JohnnySorocil said:


> Maybe something in sys/swap_pager.c and/or sys/vm_pageout.c can be tuned?



For what it's worth, there are a few tunable parameters, but I have zero understanding of how they work. I now have the following line in _/etc/sysctl.conf_:

```
# Fuck you, Firefox.
vm.disable_swapspace_pageouts=1
```

That indeed seems to make FreeBSD much more enthusiastic about murdering processes.


----------

