# Networking much better in 13.1 (upgraded from 12.2)



## Rudy (Apr 20, 2022)

Our FreeBSD router was getting taxed at 15 Gbps -- it started to drop packets, customers complained, etc.

I upgraded to 13.1, turned on hyperthreading, and upped the hw.cxgbe queues from 8 to 16, and the box is working fine.  I'm not sure if this is solely due to the hyperthreading or if the reworked networking stack in 13.1 is that much better.  Posting the good news here for others who may have throughput issues.

Also interesting: in the attached graphs, note that interrupts tanked when bandwidth and load went up under FreeBSD 12 -- that is when we started to see packet loss.

_Background:_ 3 Chelsio cards, dual deca-core CPUs:
 t6nex0: <Chelsio T62100-LP-CR> *numa-domain 0* on pci10
 t5nex0: <Chelsio T540-LP-CR> *numa-domain 1* on pci14
 t5nex1: <Chelsio T540-LP-CR> *numa-domain 1* on pci16
 CPU: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
We have a chelsio_affinity script that is *NUMA-aware* and binds the interrupts to different CPU cores. Without hyperthreading we could only support 8 queues per port; with hyperthreading we can do 16 queues (a power of two no greater than the number of cores on a CPU -- hyperthreading doubles the 10 physical cores per package to 20 logical ones).

We had followed the https://calomel.org/freebsd_network_tuning.html advice to disable hyperthreading on FreeBSD 12.
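For anyone hunting for the knobs involved: a sketch of the relevant /boot/loader.conf tunables. The names come from cxgbe(4) and the stock loader tunables; the values shown mirror what the post describes and are not a copy of our actual config.

```
# /boot/loader.conf -- sketch, not a verbatim config
machdep.hyperthreading_allowed="1"   # re-enable hyperthreading (the calomel guide sets this to 0)
hw.cxgbe.nrxq="16"                   # rx queues per port, up from 8
hw.cxgbe.ntxq="16"                   # tx queues per port, up from 8
```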


----------



## Phishfry (Apr 20, 2022)

Why did you choose to put the cards on different NUMA domains?
I put my three Chelsio T540s on one domain and all my NVMe on the other.
I would expect a performance penalty if threads have to jump CPUs. QPI maxes out at what? Plus overhead.


----------



## Phishfry (Apr 20, 2022)

What is your testing platform?


Rudy said:


> Without the hyperthreading, we could only support 8 queues per port, now with hyperthreading, we can do 16 queues


So I guess this is the advantage of using both CPUs' PCIe slots (NUMA domains)? You get more queues?


----------



## Alain De Vos (Apr 20, 2022)

Which software did you use to make the packet loss graph ?


----------



## Phishfry (Apr 20, 2022)

I dunno, but they're rrdtool-based graphs. Not Munin either.

If I zoom in, it says 'Monkeybrains uses Cacti'.

So it looks like net-mgmt/cacti.
Maybe Monkeybrains is the plugin.


----------



## Rudy (Apr 20, 2022)

Phishfry said:


> Why did you choose to put the cards on different numa domains?
> I put my three Chelsio T540 on one domain and all my NVMe on the other.
> I would expect a performance penalty if threads have to jump CPU's. QPI maxes out at? Plus overhead.


Two T5 cards in one domain, one T6 in the other.  We were maxing out the cores in one CPU (*htop* would show CPUs 11-20 pegged).

This box is just a router, with no disk activity, so spreading the queue processing across CPUs is our goal.


----------



## Rudy (Apr 20, 2022)

Alain De Vos said:


> Which software did you use to make the packet loss graph ?


I didn't post a packet loss graph; the last graph is context switches / interrupts per second.  When load was high, interrupts stopped getting processed as quickly.  If you divide bandwidth by interrupts, we were getting more bandwidth per interrupt, but there was also loss.  Latency was not as good either.

How did we detect loss? Our customer support queue! 

The tool we used to verify was just plain old *ping*.  At peak times, traffic through that router was lossy.  Turning off our Amazon peer (it turns out a lot of traffic comes from them) made the traffic drop and pings return to 100%.  This router is connected to an IX, and we had just started peering with Amazon -- that bumped traffic through the IX from 10Gbps up to 15Gbps.
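As a hypothetical sketch of that ping check: a tiny helper that extracts the loss percentage from ping's summary line. The host name and the canned summary below are placeholders; live use would point it at a host on the far side of the router.

```shell
#!/bin/sh
# Parse the "% packet loss" figure out of a ping summary line
# (the format printed by both FreeBSD and Linux ping).
loss_pct() {
    grep -o '[0-9.]*% packet loss' | cut -d% -f1
}

# Live use would be something like:
#   ping -c 100 -q some-far-side-host | loss_pct
# Canned example of a lossy run:
echo '100 packets transmitted, 85 packets received, 15.0% packet loss' | loss_pct
# -> 15.0
```

Anything persistently above 0 at peak times is the symptom described above.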


----------



## Rudy (Apr 20, 2022)

Phishfry said:


> So I guess this is the advantage of using both CPU's PCIe slots (numa domains)? You get more queues?


Access to more cores, since we bind each card's interrupts to its NUMA domain with `/usr/bin/cpuset -l ${CPU} -x ${IRQ}`.
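The actual chelsio_affinity script isn't shown in the thread, but here is a minimal sketch of the idea, assuming domain 0 owns cores 0-9 and domain 1 owns cores 10-19 (those core ranges and the queue count are assumptions for illustration, not our exact layout):

```shell
#!/bin/sh
# Hypothetical sketch of a NUMA-aware IRQ binding helper.
# Assumed layout: domain 0 = cores 0-9, domain 1 = cores 10-19.
CORES_PER_DOMAIN=10

# Map a queue index to a core inside the card's NUMA domain,
# wrapping around when there are more queues than physical cores.
queue_to_core() {
    domain=$1
    queue=$2
    echo $(( domain * CORES_PER_DOMAIN + queue % CORES_PER_DOMAIN ))
}

# Example: plan the binding for a 16-queue port on domain 1.
# On the router, the IRQ numbers would be parsed from `vmstat -i`
# and each one bound with:  /usr/bin/cpuset -l "$core" -x "$irq"
q=0
while [ "$q" -lt 16 ]; do
    echo "queue $q -> core $(queue_to_core 1 $q)"
    q=$((q + 1))
done
```

With hyperthreading on, the same mapping could instead spread across the domain's 20 logical cores.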

Read this paper: https://papers.freebsd.org/2018/asi...reebsd_for_routing_and_firewalling-slides.pdf


----------



## Sergei_Shablovsky (Jun 1, 2022)

Rudy said:


> I upgraded to 13.1, turned on hyperthreading, and upped the hw.cxgbe queues from 8 to 16, and the box is working fine. I'm not sure if this is solely due to the hyperthreading or if the reworked networking stack in 13.1 is that much better.



So, you changed four variables at once:
- upgraded to 13.1
- turned on hyperthreading
- pinned the NICs to a NUMA domain and bound their interrupts to specific CPU cores
- tuned the hw.cxgbe queues

Please show your measurements *before* and *after* turning hyperthreading on.

How did you find the *exact source* of your packet-loss problem?

I ask because traffic is constantly growing, and in certain cases (national holidays, worldwide sporting events, local wars, nationwide disasters, big movie releases, worldwide technology events, surges of video traffic in the next COVID-19 wave, etc.) you may hit the same packet-loss problem again...
Right now +30% of total traffic gives you packet loss, but what happens in the future, when a peak goes up to +50...+70% and you cannot pull the router out of service?


----------

