# Is it worth building for native CPU on amd64?



## rowan194 (Dec 10, 2021)

Hi. Clang 10.0.1 from FreeBSD 12.2-RELEASE-p11 supports the following target CPUs, which I presume are all x86-based:


```
note: valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell,
      skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, icelake-server, tigerlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2,
      bdver1, bdver2, bdver3, bdver4, znver1, znver2, x86-64
```

I have a couple of servers that are updated from source, and I build world/kernel on each machine individually. Is it worth trying to build (on 64-bit x86) for the exact CPU each machine has? And is a Clang CPU target such as (for example) "ivybridge" actually i386, and therefore not relevant for a 64-bit system?

I tried hunting the big G for answers, but no luck. Since I have to build from source anyway, I'm curious whether I can improve efficiency by not having to support the oldest CPUs. Thanks.


----------



## Argentum (Dec 10, 2021)

rowan194 said:


> Hi. CLANG 10.0.1 from FreeBSD 12.2-RELEASE-p11 supports the following architectures, which I presume are all x86 based:
> 
> 
> ```
> ...
> ```


I once had a machine with a Haswell CPU and tried building exclusively for it. The generated code was different, but when I ran some tests, everything was actually a bit slower, so I abandoned the *haswell* flag. It may be different with other architectures.


----------



## Alain De Vos (Dec 10, 2021)

For Ivy Bridge (Core i7), I have this in make.conf:

```
CPUTYPE?=core-avx-i
```


----------



## Eric A. Borisch (Dec 10, 2021)

The easiest thing to do, assuming you're building on the host that will be running the compiled binaries, is to use `CPUTYPE ?= native` in `/etc/make.conf`. That way you don't have to change it for every system.


----------



## cmoerz (Dec 10, 2021)

That's a pretty complex topic. At least on x86, more modern CPUs come with additional instruction set extensions (see https://en.wikipedia.org/wiki/List_of_Intel_CPU_microarchitectures for reference), which, if used, may improve performance.

However, the gains may only be measurable under particular circumstances, e.g. in multithreaded applications. Workloads that run in only one thread (of which there are some in FreeBSD's kernel, if memory serves) may actually be affected negatively. Sometimes, optimizations even turn out to cause security concerns.

There are certainly some general optimizations that deliver measurable results. Just look at this, for example:

LLVM Clang 12 Benchmarks At Varying Optimization Levels, LTO - Phoronix (www.phoronix.com)

Overall, optimizations are something of a heuristic. There may be special cases where optimizing leads to unexpected or undesired behavior, for example:

LLVM 11 optimizations breaks code · Issue #10220 · crystal-lang/crystal (github.com)

As for compiling your kernel with CPU-specific code, you might simply try it and check whether it helps with your particular workload. Unfortunately, you're probably the only person who can definitively confirm whether CPU-specific code is worthwhile for your particular box.


----------



## Alain De Vos (Dec 10, 2021)

I think I compiled crystal-lang with LLVM 13 ...


----------



## Eric A. Borisch (Dec 10, 2021)

It would be interesting to benchmark ZFS in particular; it looks like it has switches in the raidz and fletcher code for AVX, etc.


----------



## rowan194 (Dec 11, 2021)

Eric A. Borisch said:


> The easiest thing to do, assuming you're building on the host that will be running the compiled binaries, is to use `CPUTYPE ?= native` in `/etc/make.conf`. That way you don't have to change it for every system.


Thanks, that's a nice tip.

After some further searching, I think I'm teetering at the edge of the rabbit hole: not only is there `-march`, there are also `-mtune` and `-mcpu` (the latter for legacy GCC only?), and besides `CPUTYPE` in `/etc/make.conf` there's also `MACHINE_CPU`.



UsingCPUTYPE - FreeBSD Wiki


FYI, package sysutils/hs-cputype will show the current CPU type.

I did some quick tests on a random file from the secp256k1 lib, and the following two commands produce byte-for-byte identical object files (with Clang 10.0.1 on an i7-3930K):

`cc -march=sandybridge -mtune=sandybridge ...`

`cc -march=native ...`

So the generic `native` setting seems to work as expected. The output is also slightly smaller when compiling for the specific CPU than with no `-march`/`-mtune` flags. (Note: I haven't benchmarked actual performance.)

There are some userland applications I use where further experiments to wring out the very last bit of optimisation would be worthwhile, but for the base system build I think I'll stick with `CPUTYPE ?= native`.

BTW, I'm not necessarily interested in just "faster": one of my low-power embedded CPUs runs very hot for some reason, with one core reporting around 68C at a load of only about 0.1 to 0.2. An identical device with a slightly slower CPU reports 52C at a load of 0.4.


----------



## Eric A. Borisch (Dec 11, 2021)

Have you tried lowering the clock speed (with powerd or powerdxx, or just by setting a low value in sysctl.conf), if it's CPU temperature (and not performance) you're trying to affect?

They all adjust `dev.cpu.0.freq`.


----------



## rowan194 (Dec 12, 2021)

Eric A. Borisch said:


> Have you tried lowering the clock speed (with powerd or powerdxx, or just by setting a low value in sysctl.conf), if it's CPU temperature (and not performance) you're trying to affect?
> 
> They all adjust `dev.cpu.0.freq`.


`powerd` is running, and does downclock the frequency. I've also tried forcing it lower via sysctl. Neither seems to make a lot of difference.

It's possible that the reported temperature is bogus, since there's a large difference between the two cores (and a fair difference in temperature today versus yesterday):

`dev.cpu.1.temperature: 53.0C`

`dev.cpu.0.temperature: 39.0C`

...or perhaps the temperatures are accurate, and it's something like a bad thermal connection. I can feel a lot of heat coming from it. For comparison, the other device currently shows 55C and 56C on its two cores.


----------

