# How to speed up/improve memcpy



## nbari (Sep 7, 2021)

I am running a Redis (version 6.2.5) "sentinel" cluster of 3 nodes (1 master, 2 replicas) on FreeBSD 13. The dataset is approximately 70GB, the system has 320GB RAM and SSD disks, and the servers (dedicated, not VMs) are in the same network/datacenter, so the hardware is probably not the problem. However, I have started to notice that after `BGSAVE` finishes there is a lag of approximately 10 seconds, and because of this the applications randomly get the error `NOREPLICAS Not enough good replicas to write`. Setting `min-replicas-max-lag` to 20 helps, but I would like to know if there is something I could fine-tune to speed up `memcpy`, judging from the flame graph (GitHub discussion here: https://github.com/redis/redis/discussions/9457):








The current /boot/loader.conf:


```
# Set Max size = 32GB
vfs.zfs.arc_max="34359738368"
# Min size = 4GB
vfs.zfs.arc_min="4294967296"


kern.ipc.semmns="2048"
kern.ipc.semmni="128"
kern.ipc.shmall="33554432"
kern.ipc.shmseg="1024"
kern.ipc.shmmax="137438953472"
```

In /etc/sysctl.conf:


```
kern.ipc.shm_use_phys=1
```

For building the kernel, I have this in /etc/make.conf:


```
CFLAGS=         -O2 -pipe -fno-strict-aliasing
COPTFLAGS=      -O2 -pipe -fno-strict-aliasing

BUILD_OPTIMIZED=        YES
BUILD_STATIC=           YES
OPTIMIZED_CFLAGS=       YES
WITHOUT_DEBUG=          YES
WITH_CPUFLAGS=          YES
WITH_OPTIMIZED_CFLAGS=  YES
MALLOC_PRODUCTION=      YES
```

From the flame graph, next to the Redis process there is one from ZFS:





Here I also notice that ZFS is using memcpy:











I am currently trying with `vm.pmap.pg_ps_enabled=0` (superpages disabled).


----------



## SirDice (Sep 7, 2021)

nbari said:


> For building the kernel I have this in /etc/make.conf


Remove those `CFLAGS` and `COPTFLAGS`.


----------



## diizzy (Sep 7, 2021)

...and BUILD_STATIC probably doesn't help either, but dropping it most likely won't fix your main issue.
You probably want to set/define CPUTYPE so you can take advantage of new(er) instructions.


----------



## Alain De Vos (Sep 7, 2021)

Why this tuning?

```
kern.ipc.semmns="2048"
kern.ipc.semmni="128"
kern.ipc.shmall="33554432"
kern.ipc.shmseg="1024"
kern.ipc.shmmax="137438953472"
```


----------



## mark_j (Sep 7, 2021)

That tuning is for things like databases communicating over IPC/shared memory; Oracle is big on all that stuff, particularly on Solaris.


----------



## nbari (Sep 7, 2021)

Alain De Vos said:


> Why this tuning ?
> 
> ```
> kern.ipc.semmns="2048"
> ...



As mark_j mentioned, I normally found this tuning for PostgreSQL and thought it could also help improve Redis performance (trying to keep everything in RAM).


diizzy, regarding `BUILD_STATIC`/`CPUTYPE`, is there any URL/doc or something specific?

Currently, I am rebuilding the kernel and world as SirDice advised, removing `CFLAGS`/`COPTFLAGS`.


----------



## SirDice (Sep 7, 2021)

nbari said:


> indeed I normally found also that for PostgreSQL and though could help to improve Redis performance. (try to keep all in ram)


PostgreSQL uses shared memory (which is what those settings are for). Redis, however, does not.


----------



## diizzy (Sep 7, 2021)

nbari,
https://cgit.freebsd.org/src/tree/share/examples/etc/make.conf#n25 will probably help.
As for static vs. dynamic, I should probably rephrase that as "it depends"; ffmpeg, for instance, is (or at least used to be) a lot slower when compiled as a static binary. Dynamic linking is also, in general, the preferred way of linking binaries and libraries in FreeBSD.


----------



## mark_j (Sep 7, 2021)

Back to the original question, "if there is something I could fine-tune to speed up memcpy": there's probably not. Clang (and gcc) are very good at inlining the code to optimise it. I guess you could experiment with `__SSE2__` or `__SSE3__` code paths (thus restricting possible optimisation to Intel-compatible chips) in your own version of memcpy, built with `-fno-builtin-memcpy`?

The only way to really speed it up is to get faster RAM or, where possible, to pass and move pointers rather than copying the data; but that assumes modifying this "redis" stuff.

If this stuff is all in memory, then judicious use of mmap would seem a better approach. Then again, I don't know what Redis does; it might already do so.


----------



## _martin (Sep 7, 2021)

I second mark_j's opinion. memcpy (memmove) comes from libc; for amd64 it is lib/libc/amd64/string/memmove.S. The _Note_ there says it uses no SIMD operations.
But writing your own memcpy (and making it better than the current one) could be a hard task.


----------



## Alain De Vos (Sep 8, 2021)

When I run:

```
pkg info | awk '{print $1}' | xargs -I {} pkg info -D  {} | grep kern.ipc
```
I don't find anything about kern.ipc.


----------



## _martin (Sep 8, 2021)

Alain De Vos, have a look at "Part II. Interprocess Communication" for some information about IPC.
Those tunables are accessible via standard sysctl, `$ sysctl kern.ipc`.


----------



## SirDice (Sep 8, 2021)

Alain De Vos said:


> when i run,
> 
> ```
> pkg info | awk '{print $1}' | xargs -I {} pkg info -D  {} | grep kern.ipc
> ...


You know you can just do `pkg info -aD` right?


----------



## nbari (Sep 13, 2021)

SirDice said:


> Remove those `CFLAGS` and `COPTFLAGS`.


Is there any benefit from using something like the following, or why is it better to remove all `CFLAGS`/`COPTFLAGS`?


```
CFLAGS="-O3"
```


----------



## SirDice (Sep 13, 2021)

nbari said:


> Is there any benefit from using something like or why is better to remove all CFLAGS/COPTFLGS?


The developers have already set the most optimal (for most people) options for every single part of the base and kernel. So unless you fully understand what all those options do with the compiler and the code it's best not to touch it. Just randomly adding options you found on the internet without actually understanding what they do typically makes things worse, not better.


----------



## covacat (Sep 13, 2021)

memcpy/memmove is already assembly code (without SIMD instructions), so CFLAGS won't matter.


----------



## hardworkingnewbie (Sep 13, 2021)

For me the more interesting question is: how did you configure your Redis instance? This very important piece of information is still missing.


----------



## nbari (Sep 14, 2021)

I am probably facing this issue:


> Latency induced by transparent huge pages
> 
> Unfortunately when a Linux kernel has transparent huge pages enabled, Redis incurs to a big latency penalty after the fork call is used in order to persist on disk. Huge pages are the cause of the following issue:
> 
> 
> Fork is called, two processes with shared huge pages are created.
> ...



How can I do the equivalent of Linux's "disable transparent huge pages" in FreeBSD, or something similar, to improve these latency issues?


----------



## nbari (Sep 14, 2021)

hardworkingnewbie said:


> For me the more interesting question is: how did you configure your Redis instance? This very important piece of information is still missing.











GitHub: "redis halts/idle more than 10 seconds after BGSAVE finishes" (Discussion #9457, redis/redis)





```
appendonly no
daemonize yes
databases 8
dbfilename dump.rdb
dir /var/db/redis
min-replicas-max-lag 20
min-replicas-to-write 1
pidfile /var/run/redis/redis.pid
protected-mode no

maxmemory 255092mb

save 900 1
save 300 10
save 60 10000

io-threads 13
io-threads-do-reads yes

client-output-buffer-limit replica 16gb 16gb 60
repl-backlog-size 4gb
repl-timeout 3600
```


----------



## mark_j (Sep 14, 2021)

nbari said:


> probably I am facing this issue:
> 
> 
> How in FreeBSD, like in Linux "disable transparent huge pages" or similar to improve these latency issues?


Read this.
You need to remember that Linux != FreeBSD. In some things, like providing POSIX, they're close, but nearly everything else is different.
What I'm trying to say is that while both might implement superpages, huge pages, W^X or whatever, the actual implementation and functionality will likely be hugely different.


----------

