# Currently 6G is swapped out on my desktop PC, but there is something newer: zswap



## Alain De Vos (Sunday at 5:45 AM)

Waiting on FreeBSD?








Memory Compression on the Mac Can Improve Performance (www.lifewire.com): "Compressed memory is part of the Mac. Your Mac can make better use of available RAM, improving performance while preventing paging memory to disk."


----------



## Crivens (Sunday at 8:56 AM)

We have a compressed ARC, and you may want to swap on a compressed ZVOL.


----------



## Alain De Vos (Sunday at 12:01 PM)

Did this:

```
zfs create -V 52G -b 8K -o compression=lz4 -o logbias=throughput -o sync=always -o primarycache=metadata  -o secondarycache=none -o com.sun:auto-snapshot=false -o volmode=full MySwap/Swap
```
No idea what all the options mean.
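For reference, my reading of what those options mean, per zfsprops(7) and zfs-create(8) (the MySwap/Swap names are taken from the command above):

```shell
# -V 52G                          create a 52G zvol (fixed-size block device)
# -b 8K                           volblocksize: the zvol's block size is 8K
# -o compression=lz4              compress each block with LZ4
# -o logbias=throughput           optimize synchronous writes for throughput, not latency
# -o sync=always                  commit every write synchronously to stable storage
# -o primarycache=metadata        let ARC cache only metadata, not data blocks
# -o secondarycache=none          never place this volume's blocks in L2ARC
# -o com.sun:auto-snapshot=false  tell auto-snapshot tools to skip this volume
# -o volmode=full                 expose the zvol as a regular GEOM block device
zfs create -V 52G -b 8K -o compression=lz4 -o logbias=throughput \
    -o sync=always -o primarycache=metadata -o secondarycache=none \
    -o com.sun:auto-snapshot=false -o volmode=full MySwap/Swap
```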


----------



## Crivens (Sunday at 12:22 PM)

I don't think you need sync. Let it compress swapouts and let the compressed data sit there in RAM. If a crash occurs, your swap content is worthless anyway. Now all we need to do is find the sysctl that increases the swapout of inactive memory. But here comes the downside of the memory compression. You can't really cache that or share it between processes. Only your private memory can be compressed.
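If it helps, the FreeBSD knobs I know of for pushing idle pages out earlier are the vm.swap_idle_* sysctls described in tuning(7); the threshold values below are only illustrative, and whether they achieve what we want here is untested:

```shell
# Enable swap-out of pages belonging to idle processes:
sysctl vm.swap_idle_enabled=1
# Idle-time thresholds (seconds) before a process becomes a swap-out
# candidate and before its pages are reclaimed more aggressively:
sysctl vm.swap_idle_threshold1=2
sysctl vm.swap_idle_threshold2=10
```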


----------



## Alain De Vos (Sunday at 12:48 PM)

Good idea. I increased the block size from 8K to 32K; this should allow better compression.


----------



## mer (Sunday at 1:45 PM)

Reading that article brought back some memories;  I had one of the original Macs, so had the limited physical memory.

Below is my opinion, based on pretty much nothing.

CPU performance has increased to the point where the cost to compress/decompress is negligible. Of course that depends on the algorithms used and how things are tuned, but most people never notice.

The usefulness of compressing data to RAM really hits more as RAM gets used up.  There is likely a lot of magic tuning going on for you, think about how Java garbage collection runs to reclaim memory for you.  The compressed memory needs to be uncompressed to actually use it, which has a theoretical lag to the user (yes, likely hidden by other things such as keystrokes, mouse movements).  I think it's a more useful feature on systems with limited RAM (typical smartphone) vs servers with tons of RAM (of course the specific workload on the server matters)

I wonder how it affects overall memory fragmentation in the system.  Simplistically,  all allocations are multiples of page size, so if an allocation gets compressed, does it get compressed in place or compressed to a new location (a smaller overall allocation)?  If compressed to new and the original deleted, is there a system level hit on reclaiming that?  Now when you go to use it, you need to decompress it which winds up needing more blocks to uncompress into.

ZFS storing the data in a compressed manner on the device uses less bandwidth on the I/O system so has a theoretical advantage reading/writing to the device.
Compressed ARC means for a given size of ARC it can hold more items, so the system can have a greater hit rate instead of going back to the device.  Uncompressing something pulled from ARC is the same penalty as uncompressing what is read from the device,  so the ability to hold more in ARC, giving a greater hit rate gives better overall system performance.
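The compressed-ARC effect can be observed directly on FreeBSD: the arcstats kstats export both the physical and the logical footprint, plus the hit/miss counters behind that hit rate. A quick sketch:

```shell
# Physical (compressed) vs. logical (uncompressed) bytes held in ARC:
sysctl kstat.zfs.misc.arcstats.compressed_size
sysctl kstat.zfs.misc.arcstats.uncompressed_size
# Hit rate: how often reads are served from ARC instead of the device:
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses
```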


----------



## Voltaire (Sunday at 2:15 PM)

DragonFly BSD is still the king of RAM efficiency in my experience. It was already very competitive with ZFS in 2009: https://oda.oslomet.no/oda-xmlui/bi..._MartinChristoffer.pdf?sequence=2&isAllowed=y

In terms of performance, I think HAMMER2 is much faster than HAMMER in some situations, and in some common situations faster than all Linux filesystems.

On a standard desktop with XFCE installed, DragonFly BSD seems to me to use about 300MB less RAM than FreeBSD, at least according to neofetch. On the other hand, FreeBSD has fewer bugs in apps and ZFS can be faster than HAMMER2 in some situations. I also think FreeBSD's CPU performance is slightly higher than DragonFly BSD.


----------



## Voltaire (Sunday at 2:46 PM)

This is my current RAM usage on DragonFly BSD:

(attachment 15351: screenshot of memory usage)

If I understand it correctly this is less than 9MB of _active RAM usage_. A truly spectacular result and better than 99% of Linux/BSD systems. The only system I can think of that does even better is Alpine Linux, but as far as I know BSD can manage memory better than Linux systems when RAM usage increases.


----------



## mer (Sunday at 4:26 PM)

I think Matt has done some excellent work with DragonFlyBSD; it shows the beauty of Open Source (I think his website has the history).
Hammer/Hammer2 has its own design goals, some overlapping with ZFS, some not. Like a lot of others, I'd love to see OpenZFS ported to DragonFly (yes, I know it's a lot of work and may not even be possible), but as Hammer2 gets more miles on it, it will likely wind up as stable as UFS/ZFS.

Memory management when approaching "End of RAM" is always interesting, and responses typically fall into one of two camps (this is not binary and there are many points in between):

- Fail spectacularly and reboot quickly.
- Run severely degraded until a human can intervene.

I hate reboots for unknown reasons, but you need to be able to log in to fix a degraded system. Neither is 100% correct, neither is 100% wrong. Reality seems to be a mix of the two.


----------



## Alain De Vos (Sunday at 4:38 PM)

Hammer2 on FreeBSD would be nice...


----------



## bob2112 (Sunday at 5:44 PM)

Voltaire said:


> This is my current RAM usage on DragonFly BSD:
> 
> View attachment 15351
> 
> If I understand it correctly this is less than 9MB of _active RAM usage_.



It's 80 MB of active RAM. However, if DragonFly is like FreeBSD, it's likely a meaningless figure, because the high value of Free+Cache+Inact creates no incentive to trim memory from the active queue.


----------



## Alain De Vos (Sunday at 7:55 PM)

I had a crash with swap on zfs-zvol with compression.

Reverting back to old freebsd-swap partition.


----------



## Voltaire (Sunday at 8:19 PM)

bob2112 said:


> It's 80 MB of active RAM. However, if DragonFly is like FreeBSD, it's likely a meaningless figure, because the high value of Free+Cache+Inact creates no incentive to trim memory from the active queue.


You're right. 

I have about 89MB of _active_ RAM usage right after I boot FreeBSD and log into dwm. On both FreeBSD and DragonFly I have a very limited number of services started, e.g. dbus and hald on DragonFly, and about three others. DragonFly and FreeBSD are ultimately quite similar in terms of active RAM usage right after login. I thought DragonFly scored better here, because in neofetch I see DragonFly with XFCE using over 200MB less RAM than my particular FreeBSD setup for some reason.



Alain De Vos said:


> I had a crash with swap on zfs-zvol with compression.



FreeBSD's default layout works for me. I never had a full system crash on FreeBSD; it's the only system I can say that about.


----------



## Alain De Vos (Sunday at 8:49 PM)

I heard OpenBSD is also good when it comes to not wasting memory.

Note: I tend to push my system to the limit, with the rule of thumb that the CPU should not sleep.


----------



## cracauer@ (Sunday at 11:15 PM)

Alain De Vos said:


> I had a crash with swap on zfs-zvol with compression.
> 
> Reverting back to old freebsd-swap partition.



What kind of crash?


----------



## Alain De Vos (Sunday at 11:23 PM)

A very normal, traditional crash: a sudden reboot, without any further warning or notice.
It happened while I was using poudriere aggressively, so it was eating swap space.
I have 50G of swap space, but the crash happened at around 6G of swap usage.
I did not analyse the crash; I just reverted back to traditional swap space to be certain.
Note: a crash is not exceptional when the swap is compressed.


----------



## cracauer@ (Sunday at 11:27 PM)

Alain De Vos said:


> Note : a crash is not exceptional when the swap is compressed.



Well, but it should be. So this can be reproduced by simply placing a swapfile on a compressed ZFS volume?


----------



## Alain De Vos (Sunday at 11:52 PM)

I have a "feeling", not "proof", that there might be a relation.
I have read that other people have had similar issues...
But there is another parameter I did not think about: "volblocksize"...


----------



## Alain De Vos (Monday at 3:55 AM)

It is important to set the ZVOL block size to match the system page size ...
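A sketch of what that looks like on amd64, where the page size is 4K (reusing the MySwap pool name from the earlier post; the volume size is illustrative):

```shell
# Confirm the system page size (4096 bytes on amd64):
sysctl hw.pagesize

# volblocksize (-b) matches the page size, so each swapped-out page
# maps onto exactly one zvol block:
zfs create -V 16G -b 4K -o compression=lz4 -o primarycache=metadata \
    -o com.sun:auto-snapshot=false MySwap/Swap4k
swapon /dev/zvol/MySwap/Swap4k
```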


----------



## Voltaire (Monday at 2:28 PM)

There are a few things I don't quite understand reading the Lifewire article:

- Won't this require more energy? RAM requires virtually no energy; doesn't it cost more energy to have the CPU do compression very frequently?
- Haven't we gotten to the point where this is largely redundant for desktop systems? I have very little problem even with 4GB of system RAM at the moment. I use dwm, which uses extremely little RAM. Many desktops now have 16GB of RAM; when do you ever use that on a desktop system? Only with (heavy) flight sims have I ever seen such high RAM usage on desktops, but I hardly know anyone who frequently plays a flight simulator, so that seems more of a niche to me.
- What are the numbers behind this story? The article talks of 'performance improvements', but no hard numbers for common scenarios are mentioned.

Seems like something that was invented to make use of the large number of cores that CPUs now have, not because desktop users really need RAM compression.

This kind of optimization seems more useful to me: https://www.phoronix.com/news/OpenZFS-Uncached-Prefetch


----------



## Alain De Vos (Monday at 2:36 PM)

Compiling vscode, iridium, or chromium takes very long when not done in memory.


----------



## Voltaire (Monday at 2:44 PM)

Alain De Vos said:


> Compiling vscode, iridium, or chromium takes very long when not done in memory.


That's true, but why would a desktop user, more specifically a macOS user, ever compile one of these apps themselves?
I recently compiled ZFS myself on Linux and it actually went pretty fast on my primitive system. 
Most apps require little RAM for compilation.


----------



## mer (Monday at 2:52 PM)

Voltaire, my opinion: it's moderately useful on low-memory systems as RAM gets "full". A lot is based on workload and the intent of the system; something building ports is effectively "kick off a batch job and walk away". How long it takes is really the only metric you care about, so use all the memory.
User-based graphical workstations? I think the load is a lot more variable: opening/closing apps, tabs in a browser, a quick editor over here, so subjective responsiveness is the driving metric (is it fast enough for the user). Would RAM ever get full enough to start desiring compression? I don't know.

A lot of the time it's not really about how many resources you have; it becomes how efficiently you use them. The original Macs with a 9-inch B&W screen, 128K of memory, and a 3.5-inch floppy? They were pretty responsive for their day, and the compression utilities helped because the system didn't need to go out to the floppy.


----------



## Voltaire (Monday at 3:23 PM)

mer said:


> Voltaire my opinion, it's moderately useful on low memory systems as RAM gets "full".  A lot is based on workload and intent of the system;  something building ports that's effectively kick off a batch job and walk away.  How long it takes is really the only metric you care about, so use all the memory.
> user based graphical workstations?  I think the load is a lot more variable, opening/closing apps, tabs on a browser, quick editor over here, so subjective responsiveness is the driving metric (is it fast enough for the user).  Would RAM ever get full enough start desiring compression?  I don't know.
> 
> A lot of time it's not really about how many resources you have, it becomes how efficiently do you use them.  The original Macs with 9inch B&W screen, 128K of memory, 3.5inch floppy?  They were pretty responsive for the day, the compression utilities helped because the system didn't need to go out to the floppy


I assumed that systems like the MacBook Pro would always have a minimum of 16GB of RAM as a base configuration at their price. But I checked, and Apple is asking €1,619.00 for 8 GB of memory and 256 GB of SSD storage. It's amazing how people waste their money on that. It also largely explains the article; I translate it in my head as: Apple is so stingy that they give 8GB of RAM to their most premium products.

Even then I don't think you often go over that 8GB. It is mainly recent games that need more than 8GB, and none of the best games of 2022 work on macOS. Think Elden Ring, Ghostwire: Tokyo, A Plague Tale: Requiem, and Stray. None of those games were developed for macOS.


----------



## Alain De Vos (Monday at 3:51 PM)

I just need to compile one port, chromium, in poudriere with the option TMPFS "all", and my system uses more than 16G of memory ...


----------



## Alain De Vos (Monday at 3:52 PM)

cracauer@ said:


> Well, but it should be. So this can be reproduced by simply placing a swapfile on a compressed ZFS volume?


Done additional tests, affirmative.


----------



## Voltaire (Monday at 4:05 PM)

Alain De Vos said:


> I just need to compile two ports, vscode&chromium in poudriere with TMPFS all, and my system uses more than 16G of memory ...


You can simply reduce the number of parallel jobs in MAKEOPTS and/or EMERGE_DEFAULT_OPTS to stay under 16GB RAM usage during compilation.

What interests me: why don't you install it via pkg? Chromium binary packages are available on almost all systems, including macOS and Linux, and they work fine.


----------



## Alain De Vos (Monday at 5:21 PM)

According to Mr. Spock, it's highly illogical to do things the difficult way when I can do them the simple way.
PS: Currently compiling only the Chromium & Iridium browsers, i.e. two ports, with poudriere & the TMPFS setting "yes".
16G memory & 19G swap usage. Foreseen end time: 5 hours.
[My CPU is a 12th-generation Intel.]
On my older PC compiling Chromium took 35 hours. With these compile times, if you can do something in parallel, you do it.


----------



## Voltaire (Monday at 5:56 PM)

Alain De Vos said:


> According to Mr. Spock, it's highly illogical to do things the difficult way when I can do them the simple way.
> PS: Currently compiling only the Chromium & Iridium browsers, i.e. two ports, with poudriere & the TMPFS setting "yes".
> 16G memory & 19G swap usage. Foreseen end time: 5 hours.
> [My CPU is a 12th-generation Intel.]
> On my older PC compiling Chromium took 35 hours. With these compile times, if you can do something in parallel, you do it.


Are you compiling both apps at the same time? That seems quite a long time for your hardware.
I used Arch Linux for a long time, and let's just say I had roughly 12 AUR packages that had to compile regularly on very weak hardware.
I've never seen anything take longer than 45 minutes.
It has apparently been true for quite a while that large C++ programs have extremely slow compilation times.
I do wonder why C++ is so popular if it causes such compilation problems.
Again, I suspect you can get faster performance by making sure no swap is used, by cutting down the parallel processes.





Knowledge Base: Emerge out of memory - Gentoo Wiki (wiki.gentoo.org):

_The system became extremely slow because of swap usage while emerging something._
_It is not advised to use parallel jobs in either MAKEOPTS nor EMERGE_DEFAULT_OPTS on systems that do not have much RAM (Raspberry Pis with 512 MB of RAM, old desktop computers, etc.)._
_Try to lower the number of parallel jobs for some packages which usually require more RAM to compile._


----------



## Alain De Vos (Monday at 6:18 PM)

You are right. Too much swapping influences performance badly.
In poudriere I configured:
# Use tmpfs(5)
# This can be a space-separated list of options:
# wrkdir    - Use tmpfs(5) for port building WRKDIRPREFIX
# data      - Use tmpfs(5) for poudriere cache/temp build data
# localbase - Use tmpfs(5) for LOCALBASE (installing ports for packaging/testing)
# all       - Run the entire build in memory, including builder jails.
# yes       - Enables tmpfs(5) for wrkdir and data
# no        - Disable use of tmpfs(5)
# EXAMPLE: USE_TMPFS="wrkdir data"
USE_TMPFS="yes"

PARALLEL_JOBS=13
PREPARE_PARALLEL_JOBS=17
In make.conf:
MAKE_JOBS_NUMBER=12

The number of cores on my CPU is 12.
A good measurement of the frequency of pages swapped in would be interesting, because swapping pages back in is expensive under load.
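FreeBSD does keep those counters; a sketch of sampling them:

```shell
# Cumulative pages swapped in/out since boot:
sysctl vm.stats.vm.v_swappgsin vm.stats.vm.v_swappgsout
# Live view: watch paging activity, one sample every 5 seconds:
vmstat -w 5
```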

I'll try now with:
MAKE_JOBS_NUMBER=6
This number seems to have hardly any influence on swap usage ...

Trying now:
USE_TMPFS="wrkdir"
No better swap usage: 16G.

Trying something new:
PARALLEL_JOBS=1
PREPARE_PARALLEL_JOBS=11
Meaning I build packages not in parallel but one after another. When done in memory, that gives good performance ...
With these settings swap usage dropped from 19G to 8G.


----------



## Voltaire (Monday at 7:53 PM)

How many threads does your CPU have?
You still use a lot of swap. I would give this option a try:

MAKEOPTS="-j8"
NINJAOPTS=-j8

Or, if this still uses a lot of swap, I would try:

MAKEOPTS="-j4"
NINJAOPTS=-j4

You can export those values with the export command.
Another thing often mentioned to reduce RAM usage is disabling jumbo-build.


----------



## Voltaire (Monday at 8:14 PM)

Gentoo's MAKEOPTS corresponds to FreeBSD's MAKE_JOBS_NUMBER:

##
# MAKE_JOBS_SAFE
#                               - This port can safely be built on multiple cpus in parallel.
#                                 The make will be invoked with -jX parameter where X equals
#                                 number of cores present in the system.
# MAKE_JOBS_UNSAFE
#                               - Disallow multiple jobs even when user set a global override.
#                                 To be used with known bad ports.
# DISABLE_MAKE_JOBS
#                               - Set to disable the multiple jobs feature.  User settable.
# FORCE_MAKE_JOBS
#                               - Force all ports to be built with multiple jobs, except ports
#                                 that are explicitly marked MAKE_JOBS_UNSAFE.  User settable.
# MAKE_JOBS_NUMBER
#                               - Override the number of make jobs to be used.  User settable.


You may need to combine the following two values:
*FORCE_MAKE_JOBS=yes*
MAKE_JOBS_NUMBER=8

It seems you are not using the bold option.
What I wonder is what the FreeBSD equivalent of NINJAOPTS is.


----------



## Alain De Vos (Monday at 10:08 PM)

For whoever is interested, my make.conf:

```
MYFLAGS="-fno-lto -O2 -pipe"
CFLAGS+="${MYFLAGS}"
CXXFLAGS+="${MYFLAGS}"
BATCH=yes
CPUTYPE?=alderlake
DISABLE_LICENSES=yes
MAKE_JOBS_NUMBER=8
MAKE_JOBS_UNSAFE=yes
MTREE_FOLLOWS_SYMLINKS= -L
NO_CHECKSUM=yes
OPENSSLBASE=/usr/local
WITHOUT_CCACHE_BUILD=yes
WITHOUT_MANCOMPRESS=yes

DEFAULT_VERSIONS+= mysql=10.6m
DEFAULT_VERSIONS+= ssl=openssl

MAKEOPTS="-j8"
NINJAOPTS="-j8"
```

& poudriere.conf

```
# Use tmpfs(5)
# This can be a space-separated list of options:
# wrkdir    - Use tmpfs(5) for port building WRKDIRPREFIX
# data      - Use tmpfs(5) for poudriere cache/temp build data
# localbase - Use tmpfs(5) for LOCALBASE (installing ports for packaging/testing)
# all       - Run the entire build in memory, including builder jails.
# yes       - Enables tmpfs(5) for wrkdir and data
# no        - Disable use of tmpfs(5)
# EXAMPLE: USE_TMPFS="wrkdir data"
USE_TMPFS="wrkdir data"

PARALLEL_JOBS=6
PREPARE_PARALLEL_JOBS=11
# By default MAKE_JOBS is disabled to allow only one process per cpu
# Use the following to allow it anyway
ALLOW_MAKE_JOBS=yes
```


----------

