# How does FreeBSD utilize multicore processors and multi-CPU systems?



## Sergei_Shablovsky (Feb 13, 2020)

Hi, FreeBSD gurus!
How does FreeBSD utilize multicore processors in ONE-CPU systems?
How does FreeBSD utilize multicore processors in multi-CPU systems?


----------



## multix (Apr 10, 2020)

I have a Pentium D, which is dual-core. It will say:

`FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)`

The second CPU is launched:

`SMP: AP CPU #1 Launched!`

and everything looks smooth!


----------



## ralphbsz (Apr 11, 2020)

Are you asking "does it work"? At least reasonably well. No, I have not done extensive benchmarks, nor run it on extreme hardware (like >100 cores or a dozen CPUs or highly NUMA architectures), but on run-of-the-mill single-socket Intel/AMD it works well enough for amateur usage. I'm not sure about high-performance applications.

This is from my home server:

```
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 hardware threads
Firmware Warning (ACPI): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20171214/tbfadt-748)
ioapic0: Changing APIC ID to 4
ioapic0 <Version 2.0> irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
```
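
For anyone who wants to read these boot messages programmatically, here is a minimal Python sketch. It assumes the topology line has the same shape as in the two outputs posted in this thread, with the "hardware threads" part left out when there is one thread per core, as in the Pentium D output above. The helper names `parse_smp_topology` and `count_launched_aps` are made up for illustration and are not part of FreeBSD.

```python
import re

# Hypothetical helper pattern: matches the kernel's
# "FreeBSD/SMP: N package(s) x M core(s) [x K hardware threads]" line.
_TOPOLOGY = re.compile(
    r"FreeBSD/SMP: (\d+) package\(s\) x (\d+) core\(s\)"
    r"(?: x (\d+) hardware threads)?"
)

def parse_smp_topology(dmesg_text):
    """Return (packages, cores_per_package, threads_per_core, logical_cpus)."""
    match = _TOPOLOGY.search(dmesg_text)
    if match is None:
        raise ValueError("no FreeBSD/SMP topology line found")
    packages = int(match.group(1))
    cores = int(match.group(2))
    # Assumption: a missing thread count means one thread per core.
    threads = int(match.group(3)) if match.group(3) else 1
    return packages, cores, threads, packages * cores * threads

def count_launched_aps(dmesg_text):
    """Count 'SMP: AP CPU #n Launched!' lines (every CPU except the boot CPU)."""
    return len(re.findall(r"^SMP: AP CPU #\d+ Launched!$", dmesg_text, re.MULTILINE))

sample = """\
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 hardware threads
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
"""

print(parse_smp_topology(sample))   # (1, 2, 2, 4)
print(count_launched_aps(sample))   # 3
```

The boot CPU is CPU #0 and is not listed as "Launched", so the number of launched APs should be one less than the number of logical CPUs.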

Are you asking "how is it implemented"? Read Marshall Kirk McKusick et al, "The Design and Implementation of the FreeBSD Operating System", 2nd edition (with the black cover with a daemon on it). The first few chapters talk about it.


----------



## PMc (Apr 11, 2020)

Sergei_Shablovsky said:


> Hi, FreeBSD gurus!
> How does FreeBSD utilize multicore processors in ONE-CPU systems?
> How does FreeBSD utilize multicore processors in multi-CPU systems?



With MPS 1.4 (or newer).


----------



## Sergei_Shablovsky (May 2, 2020)

PMc said:


> With MPS 1.4 (or newer).


Thank you for the reply!

Specifications are great. At the same time, we all know how much the implementation in real hardware (motherboard + CPU) AND in software matters.

It has been commonplace for the last 10+ years that software development moves faster (because of industry pressure and the dramatic growth of data) than hardware manufacturers are able to engineer and produce motherboards and CPUs.

So each time we need an effective and well-balanced solution, we need to “call on everyone”: the community of users of the particular software, the community of users of the particular operating system, and of course the forums/support of the hardware manufacturer.


----------



## Sergei_Shablovsky (May 2, 2020)

ralphbsz said:


> Are you asking "does it work"? At least reasonably well. No, I have not done extensive benchmarks, nor run it on extreme hardware (like >100 cores or a dozen CPUs or highly NUMA architectures), but on run-of-the-mill single-socket Intel/AMD it works well enough for amateur usage. I'm not sure about high-performance applications.


Thank you for the kind reply!

The opening question is only the first step.

The main question is how network-focused software (I’m interested specifically in
1. the pfSense firewall,
2. the HAProxy load balancer,
3. FreeNAS storage)
works on multi-CPU systems (which support multi-threading and have 4, 6, 8, or 12 cores).

This is a complex question because each solution has a different software architecture and puts a different load on the CPU, memory, and data bus.

What do you think about this?



ralphbsz said:


> Are you asking "how is it implemented"? Read Marshall Kirk McKusick et al, "The Design and Implementation of the FreeBSD Operating System", 2nd edition (with the black cover with a daemon on it). The first few chapters talk about it.


Thank you so much! I’ll try to find it.


----------



## Sergei_Shablovsky (Nov 6, 2020)

Could you please comment on managing iflib threads across several CPU cores? See the last reply in this thread on forum.netgate.com:
"How pfSense utilize multicore processors and multi-CPU systems ?"


----------



## Sergei_Shablovsky (Dec 16, 2020)

Also, for your attention, this post about FreeBSD network optimization and tuning: https://calomel.org/freebsd_network_tuning.html


----------



## Sergei_Shablovsky (Feb 11, 2021)

Hm. It looks like the right answer is hard to find...

Let me clarify the opening question a little:

Which system is better for network-related operations (e.g. firewall, load balancing, gateway, proxy, media streaming, ...):
a) 1 CPU with 4-10 cores, high frequency
b) 2-4 CPUs with 4-6 cores, mid frequency

And how do the CPU's L2 (2-56 MB) and L3 (2-57 MB) caches affect network-related operations (in cooperation with the NIC)?


----------



## richardtoohey2 (Feb 11, 2021)

Sergei_Shablovsky said:


> Which system is better for network-related operations (e.g. firewall, load balancing, gateway, proxy, media streaming, ...)


I don't have any answers but - won't it depend on the network and the bandwidth and the amount of traffic and type of traffic?

And also what you are doing with the traffic - pushing it through as fast as possible?  Or trying to analyse it and do more than just pushing packets?

If it's a LAN with max. 100 MB/s then it probably won't matter what machine or set-up you have - anything modern will (I think) cope with the network.


----------



## Sergei_Shablovsky (Feb 14, 2021)

richardtoohey2 said:


> I don't have any answers but - won't it depend on the network and the bandwidth and the amount of traffic and type of traffic?


Yes, because different types of traffic involve different software processing chains. For example, media streaming packets, VPN sessions, and ICMP are processed very differently inside the software that runs on top of BSD. Am I wrong?



richardtoohey2 said:


> And also what you are doing with the traffic - pushing it through as fast as possible?  Or trying to analyse it and do more than just pushing packets?


From the networking device's point of view, the main goal is "processing packets without errors and as fast as possible".

My questions relate more to the situation where FreeBSD is used as the core of a firewall, VPN gateway, or load balancer on ordinary Intel-based servers.


richardtoohey2 said:


> If it's a LAN with max. 100 MB/s then it probably won't matter what machine or set-up you have - anything modern will (I think) cope with the network.



The speeds we are talking about start at 10-20 Gb/s.
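
To put such speeds in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes plain Ethernet framing (20 bytes of preamble, start-of-frame delimiter, and inter-frame gap per frame on the wire) and an even spread of packets over the cores, which real systems only approximate. The function names are made up for illustration.

```python
# Standard Ethernet per-frame overhead on the wire:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20

def packets_per_second(link_gbps, frame_bytes):
    """Maximum frames per second at line rate for a given frame size.

    frame_bytes is the Ethernet frame size including the 4-byte FCS
    (64 is the minimum; 1518 is the maximum without VLAN tag or jumbo frames).
    """
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire

def ns_per_packet(link_gbps, frame_bytes, cores=1):
    """Time budget per packet in nanoseconds if `cores` cores share the load evenly."""
    return cores * 1e9 / packets_per_second(link_gbps, frame_bytes)

for gbps in (10, 20):
    for size in (64, 1518):
        pps = packets_per_second(gbps, size)
        print(f"{gbps} Gb/s, {size}-byte frames: {pps / 1e6:.2f} Mpps, "
              f"{ns_per_packet(gbps, size):.1f} ns per packet on one core")
```

At 10 Gb/s with minimum-size frames this comes to about 14.88 million packets per second, or roughly 67 ns per packet if a single core had to handle everything. That is why the packet size mix, and how evenly the NIC queues spread the work over cores, matters at least as much as the raw bit rate.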


----------



## richardtoohey2 (Feb 14, 2021)

Not sure you will get answers to those sorts of questions on these forums.

There's a book:

The Design and Implementation of the FreeBSD Operating System, 2nd Edition

And Netflix work on FreeBSD and networking e.g.






"Netflix and FreeBSD: Using Open Source to Deliver Streaming Video" (papers.freebsd.org)




_Using FreeBSD and commodity parts, we achieve 90 Gb/s serving TLS-encrypted connections with ~55% CPU on a 16-core 2.6-GHz CPU._

Pretty sure there are other Netflix papers on working with FreeBSD and NUMA etc.

Have a look at https://papers.freebsd.org/ e.g.






"In-kernel TLS Framing and Encryption for FreeBSD" (papers.freebsd.org)




I'm not sure if you are just trying to learn how things work or if you have a specific requirement or issue that you need to fix - maybe if you are more specific then someone can help.


----------



## Phishfry (Feb 14, 2021)

I know this is a subjective topic, but I prefer a single-socket server board.
The second CPU does not bring linear acceleration. There is a performance hit for dual CPU.
Witness the synthetic benchmarks.
Single CPU = 11K

PassMark - Intel Xeon E5-2650L v3 @ 1.80GHz - Price performance comparison

Same CPU dual = 18K

PassMark - [Dual CPU] Intel Xeon E5-2650L v3 @ 1.80GHz - Price performance comparison


But where a dual-CPU configuration can help is PCIe lanes. A typical Xeon had 40 lanes; with two CPUs that means 80 lanes.
For a setup requiring I/O this can be important. The newer LGA3647 Xeon has 48 lanes.
AMD EPYC has 128 lanes.

So the single EPYC/2 will smash most dual-CPU setups.


PassMark - AMD EPYC 7302P - Price performance comparison

PassMark - [Dual CPU] AMD EPYC 7302 - Price performance comparison


There are benefits to single CPU. Interprocess communication kept on die is superior.


----------



## GoNeFast_01 (Feb 16, 2021)

Phishfry said:


> So the single EPYC/2 will smash most Dual CPU setups.
> PassMark - AMD EPYC 7302P - Price performance comparison   PassMark - [Dual CPU] AMD EPYC 7302 - Price performance comparison


And that is why there are DUAL CORE EPYC boards now... Enjoying the boundaries being pushed, let's go quantum!!!


Well, I have installed FreeBSD on some 8-32 core processors and it performs relatively well: 4-8% CPU under very heavy media use as a desktop environment (100+ Firefox tabs, 40+ Chrome tabs, 20+ terminals, 2+ VMs). Sometimes RAM becomes an issue, but not CPU; in my experience it barely ever breaks 10%.


----------



## Phishfry (Feb 16, 2021)

Yea but did you notice the benchmarks?
Single EPYC=33K
Dual EPYC=40K
Yikes imagine buying 2 chips at $4K each and only getting marginal increase....
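
To put numbers on that, here is a small Python sketch using only the rounded PassMark scores quoted in this thread. The `scaling` helper is a made-up name for illustration, and "efficiency" here simply means speedup divided by the number of sockets.

```python
def scaling(single_score, multi_score, sockets=2):
    """Return (speedup, efficiency) of a multi-socket score vs. one socket."""
    speedup = multi_score / single_score
    efficiency = speedup / sockets  # 1.0 would be perfect linear scaling
    return speedup, efficiency

# Rounded PassMark figures as quoted in this thread (in thousands).
results = {
    "Xeon E5-2650L v3": (11, 18),
    "EPYC 7302P / dual 7302": (33, 40),
}

for name, (single, dual) in results.items():
    speedup, efficiency = scaling(single, dual)
    print(f"{name}: {speedup:.2f}x speedup, {efficiency:.0%} scaling efficiency")
```

With these figures the dual Xeon comes out at about 1.64x (82% efficiency) and the dual EPYC at about 1.21x (61%). These are scores from a single synthetic benchmark, so they say little about how a particular FreeBSD workload would scale.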

So is this a testing flaw? Passmark is a Windows thing so not representative of FreeBSD.
But I do feel that NUMA drags pretty hard.

Intel uses QPI for its core interconnect and it is quick. Going off die is costly.


----------



## GoNeFast_01 (Feb 17, 2021)

Phishfry said:


> Yea but did you notice the benchmarks?
> Single EPYC=33K
> Dual EPYC=40K
> Yikes imagine buying 2 chips at $4K each and only getting marginal increase....


This is a nice catch, and if what the benchmark says is true, I would be MAD as hell.

Honestly though, I think that is a software limitation or bottleneck (maybe WINDOWS, like you mentioned)... At the end of the day we need to put them in a production environment and let these things bleed. I like to test in a real-world environment; call me a benchmark skeptic.

I like vCores for the cloud's future; the cost is being driven into the ground... I mean, pretty soon everyone and their mother will have a VM for a computer; it will just be the most economical way. At these scales (128 or 256 vCores), NO ONE at home will use anything close to 50% of what these CPUs will be able to do. I mean, yes, you could use it for crypto mining.


----------



## Phishfry (Feb 17, 2021)

I should mention that the EPYC 7302P I posted is middle of the road: a $1000 chip, not $4000 like their champ, the 64-core EPYC 7702.

PassMark - AMD EPYC 7702 - Price performance comparison


Threadripper variants 3990X/3995WX are the only thing that tops this. Same price range.
With the PCIe-4 and 128 lanes EPYC really has some legs. Stomping all over their competitor.
14 AMD chips at the top of the charts. Intel's top offering there is at $7K for a 15th-place CPU.


----------



## GoNeFast_01 (Feb 17, 2021)

Phishfry said:


> Threadripper variants 3990X/3995WX are the only thing that tops this. Same price range.
> With the PCIe-4 and 128 lanes EPYC really has some legs. Stomping all over their competitor.
> 14 AMD chips at the top of the charts. Intel's top offering there is at $7K for a 15th-place CPU.


Yup, it's sad... Intel lost it, I switched to AMD...

Even on GPUs they're pushing NVIDIA for the first time, and I am glad; I have literally been a slave to NVIDIA... Actually I still am, due to NVIDIA's support for FreeBSD... But at least now I give AMD a look.


----------



## Mjölnir (Feb 17, 2021)

Sergei_Shablovsky said:


> Which system is better for network-related operations (e.g. firewall, load balancing, gateway, proxy, media streaming, ...):


OT, you might already know this... but I hope you do not intend to put the (external) packet filter (often loosely called "firewall") onto the same physical machine as other services.  Don't do that.  It must be on its own physical machine, solely for that purpose, with no other services on that host.  In contrast, you _can_ merge the _internal_ PF onto the same machine as a DMZ host (with gateway services (proxy, load balancer, mail etc.) jailed or in VMs), but not the _external_ one.


----------



## Sergei_Shablovsky (May 26, 2021)

Mjölnir said:


> In contrast, you _can_ merge the _internal_ PF onto the same machine as a DMZ host (with gateway services (proxy, load balancer, mail etc.) jailed or in VMs), but not the _external_ one.


Thank you for the informative reply.

Just from my second post in this thread:



> The main question is how network-focused software (I’m interested specifically in
> 1. the pfSense firewall,
> 2. the HAProxy load balancer,
> 3. FreeNAS storage)
> ...



That means we are talking about one physical machine.

Of course, for many reasons (sustainability, redundancy, single point of failure, etc.), some *functions* are better kept on separate machines: firewall + router + DPI on one, load balancer + SSL on another, etc.

In this thread I am just trying to get an answer to the question of “number of CPUs and number of cores vs. clock frequency in FreeBSD for routing packets, analyzing packets, encrypting/decrypting packets, and dealing with RAID controllers that serve databases/VMs”.


----------

