# Apache Solr Java usage in FreeBSD



## last1 (Jan 25, 2022)

I am running Solr 8.6.2 under FreeBSD 12.3 inside a VM with 32 GB of RAM:

```
Hypervisor: Origin = "VMwareVMware"
real memory  = 34359738368 (32768 MB)
avail memory = 33370628096 (31824 MB)
```

Solr is installed from packages and I am limiting its memory like so:

```
/usr/local/openjdk11/bin/java -server -Xms16g -Xmx16g -XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled
```

Despite this, the process ends up taking an increasing amount of memory, goes into swap, eventually exhausts swap, and the box fails unless I monitor it and restart it every two weeks or so. Right now I'm stuck in monitoring hell, having to endure countless swap warning messages.

I asked the Solr team about this; they claim it's a FreeBSD issue and that their process really only occupies 16 GB of memory.

`top` reports this and I'm at a loss as to what to do next...

```
28 processes:  1 running, 27 sleeping
CPU: 35.9% user,  0.0% nice,  2.7% system,  0.1% interrupt, 61.3% idle
Mem: 23G Active, 4132M Inact, 1777M Laundry, 2258M Wired, 1572M Buf, 682M Free
Swap: 4096M Total, 1947M Used, 2149M Free, 47% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
82778 solr        284  52    0   252G    27G uwait    1 713:51   0.00% java
83076 zookeeper    56  52    0  9876M  1724M uwait    1  11:46   0.00% java
```


----------



## shkhln (Jan 26, 2022)

What does `jps -lv` print?


----------



## 3301 (Jan 26, 2022)

Looks like something is leaking memory. I would bet on Solr or the JVM. Please check which exact version of the JDK you run and search their issue trackers for bugs related to memory leaks. You can also attach to the running Solr process with tools like VisualVM and see if there is anything suspicious. Edit: what filesystem are you using? Maybe extensive I/O causes this strange behavior.
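
One way to see where the memory beyond the heap is going is the JVM's own native memory tracking. A minimal sketch, assuming you can restart Solr once with the extra flag (jcmd ships with OpenJDK 11 and must run as the same user as the Solr process):

```
# restart Solr with native memory tracking enabled (costs ~5-10% overhead)
# by adding this to its JVM flags:
#   -XX:NativeMemoryTracking=summary

# then compare what the JVM accounts for against RES in top:
jcmd <solr-pid> VM.native_memory summary
```

If the NMT total stays near 16 GB while RES keeps climbing, the growth is coming from something NMT doesn't track, such as memory-mapped files.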


----------



## last1 (Jan 26, 2022)

shkhln said:


> What does `jps -lv` print?




```
jps -lvV
83076 org.apache.zookeeper.server.quorum.QuorumPeerMain -Dzookeeper.root.logger=INFO,ROLLINGFILE -Dzookeeper.log.dir=/var/log/zookeeper -Dlog4j.configuration=file:/usr/local/etc/zookeeper/log4j.properties
88310 jdk.jcmd/sun.tools.jps.Jps -Dapplication.home=/usr/local/openjdk11 -Xms8m -Djdk.module.main=jdk.jcmd
```



3301 said:


> Looks like something is leaking memory. I would bet on Solr or the JVM. Please check which exact version of the JDK you run and search their issue trackers for bugs related to memory leaks. You can also attach to the running Solr process with tools like VisualVM and see if there is anything suspicious. Edit: what filesystem are you using? Maybe extensive I/O causes this strange behavior.



It's openjdk11, which was the dependency for solr-8.6.2. I'm using UFS, and disk usage is light, maybe 2-3 MB/sec.


----------



## shkhln (Jan 26, 2022)

last1 said:


> ```
> 83076 org.apache.zookeeper.server.quorum.QuorumPeerMain -Dzookeeper.root.logger=INFO,ROLLINGFILE -Dzookeeper.log.dir=/var/log/zookeeper -Dlog4j.configuration=file:/usr/local/etc/zookeeper/log4j.properties
> 88310 jdk.jcmd/sun.tools.jps.Jps -Dapplication.home=/usr/local/openjdk11 -Xms8m -Djdk.module.main=jdk.jcmd
> ```


And where exactly is Solr?


----------



## last1 (Jan 26, 2022)

shkhln said:


> And where exactly is Solr?



It's hidden because it was started with -XX:+PerfDisableSharedMem. Should I restart it without that flag?
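
As I understand it, that flag stops the JVM from writing the shared hsperfdata file that jps uses to discover running VMs, so the process simply doesn't show up. A quick check, assuming Solr runs as the solr user:

```
# jps discovers JVMs through per-user perf-data files under /tmp;
# with -XX:+PerfDisableSharedMem no file is created for the Solr pid
ls /tmp/hsperfdata_solr
```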


----------



## shkhln (Jan 26, 2022)

I'm only interested in double-checking the arguments; try `procstat -c <pid>`.


----------



## last1 (Jan 26, 2022)

shkhln said:


> I'm only interested in double-checking the arguments; try `procstat -c <pid>`.



Oh ok, here we go

```
PID COMM             ARGS
86150 java             /usr/local/openjdk11/bin/java -server -Xms16g -Xmx16g -XX:+UseG1GC -XX:+PerfDisableSharedMem -XX:+ParallelRefProcEnabled -XX:MaxGCPauseMillis=250 -XX:+UseLargePages -XX:+AlwaysPreTouch -XX:+ExplicitGCInvokesConcurrent -Xlog:gc*:file=/var/log/solr/solr_gc.log:time,uptime:filecount=9,filesize=20M -Dsolr.jetty.inetaccess.includes= -Dsolr.jetty.inetaccess.excludes= -DzkClientTimeout=15000 -DzkHost=192.168.1.4:2181,192.168.1.9:2181,192.168.1.10:2181,192.168.1.11:2181 -Dsolr.log.level=WARN -Dsolr.log.dir=/var/log/solr -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Dhost=192.168.1.11 -Duser.timezone=UTC -XX:-OmitStackTraceInFastThrow -Djetty.home=/usr/local/solr/server -Dsolr.solr.home=/var/db/solr -Dsolr.data.home= -Dsolr.install.dir=/usr/local/solr -Dsolr.default.confdir=/usr/local/solr/server/solr/configsets/_default/conf -Djetty.host=192.168.1.11 -Djetty.host=0.0.0.0 -Dlog4j2.formatMsgNoLookups=true -Xss256k -Dsolr.jetty.https.port=8983 -Dsolr.log.muteconsole -XX:OnOutOfMemoryError=/usr/local/solr/bin/oom_solr.sh 8983 /var/log/solr -jar start.jar --module=http
```


----------



## Jose (Jan 26, 2022)

Maybe it's Zookeeper that's running out of memory?

Does the /var/log/solr/solr_gc.log have anything interesting in it when the box runs out of swap? Are there oom logs in the /var/log/solr directory?
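
If the -XX:OnOutOfMemoryError hook from your command line ever fired, it should have left a trace as well; a quick check, assuming the bundled script logs under the same directory:

```
# oom_solr.sh is invoked by the JVM on OutOfMemoryError;
# look for any log it may have left behind
ls /var/log/solr | grep -i oom
```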








Reference: docker-solr/oom_solr.sh at master · docker-solr/docker-solr (github.com), a Docker build for Solr, to manage the official Docker hub solr image.


----------



## shkhln (Jan 27, 2022)

No, it's not Zookeeper. The poorly formatted `top` output posted earlier shows 27 GB of resident memory consumed by Solr, which is admittedly a bit too much versus the 16 GB heap limit. I'm assuming it's probably just index files mapped with MMapDirectory, but it wouldn't hurt to verify that assumption. (Still thinking about how this can be done.)
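
One way this could be checked, sketched here on the assumption that the extra memory is file-backed: FreeBSD's procstat can dump a process's VM map, and vnode-backed mappings are listed with their path, so Lucene index files should stand out.

```
# list the Solr JVM's memory mappings; file-backed entries show a path
procstat -v <solr-pid>

# narrow it down to index files under the Solr home from the command line
procstat -v <solr-pid> | grep /var/db/solr
```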


----------



## last1 (Jan 28, 2022)

Jose said:


> Maybe it's Zookeeper that's running out of memory?
> 
> Does the /var/log/solr/solr_gc.log have anything interesting in it when the box runs out of swap? Are there oom logs in the /var/log/solr directory?
> 
> ...



I'm not sure what to look for in that file; there's obviously tons of output. On this particular server, swap usage goes from 0% to 45% within 12 hours. Here's a small excerpt from the GC log in case it's relevant:

```
[2022-01-27T23:08:32.873+0500][69509.494s] GC(6095) Humongous regions: 922->805
[2022-01-27T23:08:32.873+0500][69509.494s] GC(6095) Metaspace: 253500K->253500K(1318912K)
[2022-01-27T23:08:32.873+0500][69509.494s] GC(6095) Pause Young (Normal) (G1 Evacuation Pause) 10569M->8882M(16384M) 84.918ms
[2022-01-27T23:08:32.873+0500][69509.494s] GC(6095) User=0.16s Sys=0.00s Real=0.08s
[2022-01-27T23:08:45.867+0500][69522.487s] GC(6096) Pause Young (Normal) (G1 Evacuation Pause)
[2022-01-27T23:08:45.867+0500][69522.487s] GC(6096) Using 2 workers of 2 for evacuation
[2022-01-27T23:08:45.960+0500][69522.580s] GC(6096)   Pre Evacuate Collection Set: 0.1ms
[2022-01-27T23:08:45.960+0500][69522.580s] GC(6096)   Evacuate Collection Set: 89.1ms
[2022-01-27T23:08:45.960+0500][69522.580s] GC(6096)   Post Evacuate Collection Set: 2.9ms
[2022-01-27T23:08:45.960+0500][69522.580s] GC(6096)   Other: 0.6ms
[2022-01-27T23:08:45.960+0500][69522.580s] GC(6096) Eden regions: 107->0(132)
[2022-01-27T23:08:45.960+0500][69522.580s] GC(6096) Survivor regions: 9->7(15)
[2022-01-27T23:08:45.960+0500][69522.580s] GC(6096) Old regions: 297->301
```

I'm now down to restarting this server daily just to keep it from becoming unresponsive.


----------



## last1 (Jan 28, 2022)

And to update my post: after I restarted Solr, it turned out it wasn't just Solr tying up that 45% of swap; Zookeeper was holding 22% of it on its own.

```
last pid: 93353;  load averages:  0.68,  1.03,  1.13                                                          up 376+08:11:50 23:18:10
31 processes:  1 running, 30 sleeping
CPU:  0.0% user,  0.0% nice,  0.2% system,  0.0% interrupt, 99.8% idle
Mem: 3035M Active, 7516M Inact, 867M Laundry, 2009M Wired, 1569M Buf, 18G Free
Swap: 4096M Total, 910M Used, 3186M Free, 22% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
83076 zookeeper    56  52    0  9864M   901M uwait    1  23:13   0.67% java
```

I've now restarted Zookeeper too, and that ~900 MB got released as well:

```
last pid: 93687;  load averages:  1.56,  1.35,  1.25                                                          up 376+08:14:13 23:20:33
32 processes:  1 running, 31 sleeping
CPU: 41.6% user,  0.0% nice,  3.5% system,  0.2% interrupt, 54.7% idle
Mem: 20G Active, 7893M Inact, 5808K Laundry, 2099M Wired, 1570M Buf, 1489M Free
Swap: 4096M Total, 12M Used, 4084M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
93412 solr        173  52    0   253G    19G uwait    1   2:29  89.37% java
93686 zookeeper    44  52    0  9854M   128M uwait    1   0:02   0.10% java
```


----------



## Jose (Jan 28, 2022)

This might be a legit bug in the OpenJDK port to FreeBSD. Not sure how to prove that beyond trying the same workload on Linux, unfortunately.


----------



## last1 (Jan 29, 2022)

Jose said:


> This might be a legit bug in the OpenJDK port to FreeBSD. Not sure how to prove that beyond trying the same workload on Linux, unfortunately.


That's pretty much what the Solr team told me as well. Should I forward this to someone on the FreeBSD side?


----------



## shkhln (Jan 29, 2022)

What bug? Where the hell do you see any bugs?


----------



## last1 (Jan 29, 2022)

shkhln said:


> What bug? Where the hell do you see any bugs?


OK, so is this expected behavior then? The Solr team doesn't think so.


----------



## Jose (Jan 29, 2022)

The bug would be the JVM ignoring the -Xmx parameter.


shkhln said:


> The poorly formatted `top` output posted earlier shows 27 GB of resident memory consumed by Solr, which is admittedly a bit too much versus the 16 GB heap limit. I'm assuming it's probably just index files mapped with MMapDirectory, but it wouldn't hurt to verify that assumption. (Still thinking about how this can be done.)


I think filing a bug might be your best recourse. It doesn't look like many humans participate in the freebsd-java mailing list.

The problem I see is that it's hard to reproduce. Basically you only see this after some time of heavy use? Can you reproduce it using synthetic traffic?

Another thing to try is reducing the heap size to, say, 10 GB and seeing how much memory the process consumes. Is resident usage a linear function of the -Xmx parameter, or is the parameter simply ignored? This may be unacceptable on a production machine, though, since it could mean even more frequent restarts.
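
Before committing to that experiment, it may be worth asking the running JVM directly what limit it actually applied; a sketch using standard JDK 11 tooling (run as the solr user, with the pid from top):

```
# the heap limit the JVM applied at startup
jcmd <solr-pid> VM.flags | tr ' ' '\n' | grep -E 'MaxHeapSize|InitialHeapSize'

# current heap occupancy against that limit
jcmd <solr-pid> GC.heap_info
```

If MaxHeapSize comes back as 16 GB and the heap stays under it, the growth is happening outside the heap, which points at mappings or native allocations rather than an ignored -Xmx.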


----------



## Jose (Jan 29, 2022)

You might look here for how to collect data to attach to your bug:








Reference: Different Ways to Capture Java Heap Dumps | Baeldung (www.baeldung.com), which covers multiple ways to capture a heap dump in Java.


----------



## grahamperrin@ (Jan 29, 2022)

Jose said:


> … I think filing a bug might be your best recourse. It doesn't look like many humans participate in the freebsd-java mailing list. …



+1

List archives are not ideal for tracking what might be a bug, and so on. 

Bugzilla is better suited to triage.


----------



## shkhln (Jan 29, 2022)

last1 said:


> OK, so is this expected behavior then? The Solr team doesn't think so.


Citation needed. Either you misunderstood them or they misunderstood you.



Jose said:


> The bug would be the JVM ignoring the -Xmx parameter.


I already mentioned MMapDirectory. It's not subject to the heap limit, since memory-mapped files don't use the JVM's heap in the first place.
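
To make the distinction concrete, here is a self-contained sketch (not Solr code, just an illustration): it maps a 1 GB sparse file and shows JVM heap usage essentially unchanged, even though the process's address space grows by the mapped size.

```
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapVsHeap
{
    public static void main( String[] args ) throws Exception
    {
        Runtime rt = Runtime.getRuntime();
        long heapBefore = rt.totalMemory() - rt.freeMemory();

        // Map 1 GB of a sparse temp file. The mapping inflates the
        // process's SIZE (and RES as pages get touched), but none of
        // it is allocated from the JVM heap.
        try ( RandomAccessFile f = new RandomAccessFile( "/tmp/mmap-demo.bin", "rw" ) )
        {
            f.setLength( 1L << 30 );
            MappedByteBuffer buf =
                f.getChannel().map( FileChannel.MapMode.READ_WRITE, 0, 1L << 30 );
            buf.put( 0, (byte) 1 ); // touch a page
        }

        long heapAfter = rt.totalMemory() - rt.freeMemory();
        System.out.printf( "heap used before: %d KB, after: %d KB%n",
            heapBefore / 1024, heapAfter / 1024 );
    }
}
```

That is exactly why `top` can report 252G SIZE and 27G RES against a 16 GB heap without anything being broken.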


----------



## Jose (Jan 30, 2022)

Interesting. I tried to find out the value of MMapDirectory.UNMAP_SUPPORTED on FreeBSD, expecting it to be false, but it turns out to be true, and the setUseUnmap() hack mentioned in the class's javadoc works:

```
import java.nio.file.Paths;

import org.apache.lucene.store.MMapDirectory;

public class TestLucene
{
    public static void main( String[] args ) throws Exception
    {
        // Lucene's static probe: can mapped index files be unmapped here?
        System.out.println(MMapDirectory.UNMAP_SUPPORTED);
        System.out.println(MMapDirectory.UNMAP_NOT_SUPPORTED_REASON);

        // setUseUnmap(true) throws if unmapping is unsupported on this platform
        MMapDirectory dir = new MMapDirectory( Paths.get( "/usr/home", "me" ) );
        dir.setUseUnmap( true );
    }
}
```
Output:

```
true
null
```
Maybe ask the Solr folks if there's a way to enable that hack on FreeBSD?
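
For what it's worth, Solr's MMapDirectoryFactory appears to expose the same switch as an init argument in solrconfig.xml; a hedged sketch, with the parameter name worth double-checking against the 8.6 reference guide:

```
<!-- solrconfig.xml: request explicit unmapping of index files;
     this should mirror the setUseUnmap(true) call above -->
<directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory">
  <bool name="unmap">true</bool>
</directoryFactory>
```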


----------

