# Intel Pro 10G (ix) freezing during high traffic



## absduser (Nov 7, 2019)

I am seeing a recurring issue on FreeBSD 11.1-RELEASE with the ix driver (3.1.13-k): under high levels of traffic, my network card stops passing traffic. Here are the particulars:

- at the time of the "crash" the NIC is pushing out ~1GB of traffic (primarily outbound) and around 75-90k pps
- the NIC is still pingable locally, but cannot accept or send traffic externally

netstat -m: (at the time of the crash)

```
94039/21086/115125 mbufs in use (current/cache/total)
65737/12069/77806/16775612 mbuf clusters in use (current/cache/total/max)
65737/11934 mbuf+clusters out of packet secondary zone in use (current/cache)
1018/8622/9640/8387806 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/2485275 9k jumbo clusters in use (current/cache/total/max)
0/0/0/1397967 16k jumbo clusters in use (current/cache/total/max)
159055K/63897K/222953K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls
0 sendfile syscalls completed without I/O request
0 requests for I/O initiated by sendfile
0 pages read by sendfile as part of a request
0 pages were valid at time of a sendfile request
0 pages were requested for read ahead by applications
0 pages were read ahead by sendfile
0 times sendfile encountered an already busy page
0 requests for sfbufs denied
0 requests for sfbufs delayed
```

sysctl dev.ix | grep interrupt_rate: (at the time of the crash)

```
dev.ix.1.queue7.interrupt_rate: 0
dev.ix.1.queue6.interrupt_rate: 0
dev.ix.1.queue5.interrupt_rate: 0
dev.ix.1.queue4.interrupt_rate: 0
dev.ix.1.queue3.interrupt_rate: 0
dev.ix.1.queue2.interrupt_rate: 0
dev.ix.1.queue1.interrupt_rate: 0
dev.ix.1.queue0.interrupt_rate: 0
dev.ix.0.queue7.interrupt_rate: 500000
dev.ix.0.queue6.interrupt_rate: 500000
dev.ix.0.queue5.interrupt_rate: 500000
dev.ix.0.queue4.interrupt_rate: 500000
dev.ix.0.queue3.interrupt_rate: 500000
dev.ix.0.queue2.interrupt_rate: 500000
dev.ix.0.queue1.interrupt_rate: 500000
dev.ix.0.queue0.interrupt_rate: 500000
```

(Prior to the crash those figures were fluctuating.)
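To make the per-queue figures easier to compare between a healthy and a stalled state, the listing can be collapsed to one line per port. This is only a sketch: the function name `summarize_interrupt_rates` is made up for illustration, and it assumes the input has the same `dev.ix.N.queueM.interrupt_rate: VALUE` shape as the output above.

```sh
#!/bin/sh
# Hypothetical helper: reads `sysctl dev.ix` style lines on stdin and prints
# one summary line per ix unit: "ixN queues=<count> min=<rate> max=<rate>".
summarize_interrupt_rates() {
    awk -F'[.: ]+' '
        # Fields: dev, ix, <unit>, queue<M>, interrupt_rate, <value>
        $1 == "dev" && $2 == "ix" && $5 == "interrupt_rate" {
            unit = $3
            rate = $6 + 0
            if (!(unit in n)) { min[unit] = rate; max[unit] = rate }
            if (rate < min[unit]) min[unit] = rate
            if (rate > max[unit]) max[unit] = rate
            n[unit]++
        }
        END {
            for (unit in n)
                printf "ix%s queues=%d min=%d max=%d\n", unit, n[unit], min[unit], max[unit]
        }
    ' | sort   # awk iteration order is unspecified, so sort for stable output
}
```

On the affected host this would be run as `sysctl dev.ix | summarize_interrupt_rates`. Lines that are not interrupt_rate entries are ignored, so the grep is not needed.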

systat -tcp 1: (at the time of the crash)

```
/0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   ||||||||||||||||||||||||||||||||||||||


             TCP Connections                       TCP Packets
           0 connections initiated           75646 total packets sent
           0 connections accepted            64016 - data
           0 connections established           258 - data (retransmit by dupack)
           0 connections dropped               258 - data (retransmit by sack)
           0 - in embryonic state            11373 - ack-only
           0 - on retransmit timeout             0 - window probes
           0 - by keepalive                      0 - window updates
           0 - from listen queue                 0 - urgent data only
                                                 0 - control
                                                 0 - resends by PMTU discovery

             TCP Timers                      40420 total packets received
       20044 potential rtt updates           16807 - in sequence
       20599 - successful                       58 - completely duplicate
          11 delayed acks sent                   0 - with some duplicate data
           0 retransmit timeouts              3208 - out-of-order
           0 persist timeouts                  416 - duplicate acks
           0 keepalive probes                20599 - acks
           0 - timeouts                          0 - window probes
                                                 1 - window updates
                                                 0 - bad checksum
```

vmstat -i: (NOT at the time of the crash)

```
interrupt                          total       rate
irq5: uart2                        12665          0
irq18: ehci0 uhci5                     2          0
irq19: uhci2 uhci4                    27          0
cpu0:timer                     128218130       1725
cpu1:timer                      70989642        955
cpu4:timer                      77084729       1037
cpu23:timer                     57337377        771
cpu12:timer                     56820954        764
cpu6:timer                      74254569        999
cpu7:timer                      74549149       1003
cpu2:timer                      79381091       1068
cpu20:timer                     58652387        789
cpu10:timer                     59629106        802
cpu8:timer                      59224673        797
cpu22:timer                     58609410        788
cpu9:timer                      58162364        782
cpu16:timer                     57266397        770
cpu18:timer                     57564243        774
cpu19:timer                     56484023        760
cpu15:timer                     56022798        754
cpu5:timer                      76716821       1032
cpu11:timer                     58499243        787
cpu13:timer                     55827216        751
cpu17:timer                     56234589        756
cpu21:timer                     57115392        768
cpu3:timer                      77477070       1042
cpu14:timer                     57042863        767
irq256: igb0:que 0             180546537       2429
irq257: igb0:que 1             162536944       2187
irq258: igb0:que 2             155586807       2093
irq259: igb0:que 3             172733041       2324
irq260: igb0:que 4             103526741       1393
irq261: igb0:que 5             199118299       2679
irq262: igb0:que 6             157922942       2124
irq263: igb0:que 7             120417137       1620
irq264: igb0:link                      2          0
irq274: mps0                    41945045        564
irq275: mps1                    22103463        297
irq276: mps2                         511          0
irq277: ahci0:ch0                 210612          3
irq278: ahci0:ch1                 210884          3
irq279: ahci0:ch2                    133          0
irq280: ahci0:ch3                    133          0
irq281: ahci0:ch4                    133          0
irq293: ix0:q0                  61730991        830
irq294: ix0:q1                  22327729        300
irq295: ix0:q2                  90093508       1212
irq296: ix0:q3                  71431567        961
irq297: ix0:q4                  58330475        785
irq298: ix0:q5                  45056527        606
irq299: ix0:q6                  40896578        550
irq300: ix0:q7                  49991675        673
irq301: ix0:link                      28          0
irq302: ix1:q0                  70982750        955
irq303: ix1:q1                  48796468        656
irq304: ix1:q2                  71915750        967
irq305: ix1:q3                 100512928       1352
irq306: ix1:q4                  50342892        677
irq307: ix1:q5                  70621409        950
irq308: ix1:q6                  78816402       1060
irq309: ix1:q7                  40005799        538
irq310: ix1:link                      20          0
irq311: mps3                   512411663       6893
irq312: mps4                   417590889       5618
Total                         4797892342      64544
```

systat -ifstat -match igb0 -pps: (NOT at the time of the crash, but under similar network conditions, different card)

```
/0   /1   /2   /3   /4   /5   /6   /7   /8   /9   /10
     Load Average   |||||||||||||||||||||||||||||||

      Interface           Traffic               Peak                Total
           igb0  in     42.438 Kp/s         64.016 Kp/s            1.502 Gp
                 out    89.479 Kp/s         99.960 Kp/s            3.201 Gp
```

sysctls:

```
hw.ix.rxd=4096
hw.ix.txd=4096
net.isr.maxthreads="-1"

net.inet.tcp.drop_synfin=1
net.inet.ip.portrange.hifirst=62000
net.inet.ip.portrange.hilast=64000
security.mac.portacl.port_high=65535
net.inet.ip.fw.one_pass=0
net.inet.tcp.mssdflt=1460
net.inet.tcp.recvspace=2263000
net.inet.tcp.sendspace=2263000
net.inet.tcp.minmss=1300
net.inet.tcp.syncache.rexmtlimit=0
net.inet.tcp.tso=0
net.inet.tcp.cc.algorithm=htcp

kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendbuf_inc=16384
net.inet.tcp.recvbuf_inc=524288
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
hw.intr_storm_threshold=10000
```

I have also tried these (but no difference was seen):

```
dev.ix.0.fc=0
ifconfig ix0 -tso
hw.ix.enable_aim=0
```

Interestingly, after the crash the PCI capability listing for the card reported fatal errors (AER):

```
ix0@pci0:131:0:0: class=0x020000 card=0x00018086 chip=0x15288086 rev=0x01 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Ethernet Controller 10-Gigabit X540-AT2'
    class      = network
    subclass   = ethernet
    cap 01[40] = powerspec 3  supports D0 D3  current D0
    cap 05[50] = MSI supports 1 message, 64 bit, vector masks
    cap 11[70] = MSI-X supports 64 messages, enabled
                 Table in map 0x20[0x0], PBA in map 0x20[0x2000]
    cap 10[a0] = PCI-Express 2 endpoint max data 256(512) FLR RO NS
                 link x8(x8) speed 5.0(5.0) ASPM disabled(L0s/L1)
    ecap 0001[100] = AER 2 1 fatal 1 non-fatal 1 corrected
    ecap 0003[140] = Serial 1 a0369fffff3e4538
```

At boot that line read:

```
ecap 0001[100] = AER 2 0 fatal 0 non-fatal 1 corrected
```
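The before/after comparison of the AER line can be scripted so a change is caught without reading the whole listing. A minimal sketch follows; the helper names `aer_counts` and `aer_changed` are made up for illustration. It assumes the AER line always has the shape shown above, and that the first number after `AER` is the capability version rather than a count (my reading of the output, not something confirmed in this thread), so only the three numbers after it are compared.

```sh
#!/bin/sh
# Hypothetical helper: reads PCI capability listing text on stdin and prints
# "<fatal> <non-fatal> <corrected>" for every AER line it finds.
aer_counts() {
    sed -n 's/.*AER [0-9][0-9]* \([0-9][0-9]*\) fatal \([0-9][0-9]*\) non-fatal \([0-9][0-9]*\) corrected.*/\1 \2 \3/p'
}

# Hypothetical helper: succeeds (exit 0) when the AER figures differ between
# two saved AER lines, e.g. one taken at boot and one taken after a stall.
# usage: aer_changed "<before line>" "<after line>"
aer_changed() {
    before=$(printf '%s\n' "$1" | aer_counts)
    after=$(printf '%s\n' "$2" | aer_counts)
    [ "$before" != "$after" ]
}
```

Fed the two lines from this post, `aer_counts` prints `0 0 1` for the boot line and `1 1 1` for the post-crash line. On the host, something like `pciconf -lvc ix0 | aer_counts` should work, assuming your pciconf accepts a device name with `-l`; otherwise pipe the full `pciconf -lvc` listing through it.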


Install info:

dmesg:

```
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.1.13-k> mem 0xf7e00000-0xf7ffffff,0xf7dfc000-0xf7dfffff irq 17 at device 0.0 numa-domain 1 on pci10
ix0: Using MSIX interrupts with 9 vectors
ix0: Ethernet address: a0:36:9f:3e:7f:2c
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix0: netmap queues/slots: TX 8/4096, RX 8/4096
```

System:

```
FreeBSD 11.1-RELEASE-p6 #0: Tue Dec 19 13:52:29 PST 2017
    user@11_1:/usr/src/sys/amd64/compile/kernel.11_1amd64 amd64
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
VT(vga): resolution 640x480
CPU: Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz (2400.14-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x206c2  Family=0x6  Model=0x2c  Stepping=2
 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
 Features2=0x29ee3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,POPCNT,AESNI>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
real memory  = 274882101248 (262148 MB)
avail memory = 267105476608 (254731 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <123011 APIC1930>
FreeBSD/SMP: Multiprocessor System Detected: 24 CPUs
FreeBSD/SMP: 2 package(s) x 6 core(s) x 2 hardware threads
```


So far, to work around the problem I either reconfigure all networking onto another card (including ix1, which also crashes eventually) or "HUP" the NIC:

```
devctl disable ix0
devctl enable ix0
devctl suspend ix0
devctl resume ix0
```

There are no messages of any kind about failures or errors in logs or on console.
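Since nothing shows up in the logs, one option is to wrap the workaround in a small watchdog that saves the same diagnostics shown above before it resets the port. The following is a rough sketch, not something that has been run on this box. `TARGET` is a placeholder address that must be replaced with a host reachable only through the affected NIC, and `MAX_FAILS`, `LOGDIR` and all function names are made up for illustration. The reset simply repeats the four devctl commands above in the same order.

```sh
#!/bin/sh
IFACE=${IFACE:-ix0}
TARGET=${TARGET:-192.0.2.1}   # placeholder (documentation address), replace it
MAX_FAILS=${MAX_FAILS:-3}     # consecutive failed probes before a reset
LOGDIR=${LOGDIR:-/var/tmp}
fails=0

# One ICMP probe with a 2 second timeout (-t is the timeout on FreeBSD ping).
probe() {
    ping -c 1 -t 2 "$TARGET" >/dev/null 2>&1
}

# Save the diagnostics used in this thread while the port is still stalled.
capture_state() {
    out="$LOGDIR/$IFACE-stall-$(date +%Y%m%d-%H%M%S).log"
    {
        netstat -m
        sysctl dev.ix
        pciconf -lvc
    } >"$out" 2>&1
}

# Same sequence as the manual workaround.
reset_nic() {
    devctl disable "$IFACE"
    devctl enable "$IFACE"
    devctl suspend "$IFACE"
    devctl resume "$IFACE"
}

# Run one probe; after MAX_FAILS consecutive failures, capture and reset.
check_once() {
    if probe; then
        fails=0
        return 0
    fi
    fails=$((fails + 1))
    if [ "$fails" -ge "$MAX_FAILS" ]; then
        capture_state
        reset_nic
        fails=0
    fi
}
```

It would be driven by a loop such as `while :; do check_once; sleep 10; done`, run as root. Requiring several consecutive failures is meant to avoid resetting the port on a single lost ping.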


----------



## absduser (Nov 7, 2019)

hw.ix.max_interrupt_rate="-1" has also been tried


----------



## Phishfry (Nov 8, 2019)

Why are you using FreeBSD 11.1-RELEASE? It has been EOL since September 2018.
The ix driver has gone through many revisions in the two years since that release.
If you still have trouble after updating, I would also recommend trying the ports version of the module, net/intel-ix-kmod.


----------

