# Kernel panic with pf



## Ben (May 4, 2012)

Hi,

I just experienced a kernel panic, and I think I had the same one last week. This time I have the backtrace (please see the attachment). What could be the reason? It seems to be related to pf, but I have used these rules before. This time, though, I set a few sysctl parameters that had not been set before. Can you give me a hint which one it could be, or how I could find out more about the cause?

Note: I will add the sysctl parameters as soon as I have access to the server again.

Thanks for your help in advance. If you need further details, please let me know.

System: FreeBSD 9.0 GENERIC AMD64
Network driver: re(4)


----------



## Ben (May 5, 2012)

dmesg:

```
Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.0-RELEASE #0: Tue Jan  3 07:46:30 UTC 2012
    root@farrell.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz (3411.55-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x206a7  Family = 6  Model = 2a  Stepping = 7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,
 CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x17bae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,
 PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,AVX>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 17179869184 (16384 MB)
avail memory = 16419856384 (15659 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 SMT threads
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
 cpu4 (AP): APIC ID:  4
 cpu5 (AP): APIC ID:  5
 cpu6 (AP): APIC ID:  6
 cpu7 (AP): APIC ID:  7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <ALASKA A M I> on motherboard
ACPI Error: [RAMB] Namespace lookup failure, AE_NOT_FOUND (20110527/psargs-392)
ACPI Exception: AE_NOT_FOUND, Could not execute arguments for [RAMW] (Region) (20110527/nsinit-380)
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> port 0xf000-0xf03f mem 0xfe000000-0xfe3fffff,0xc0000000-0xcfffffff
 irq 16 at device 2.0 on pci0
pci0: <simple comms> at device 22.0 (no driver attached)
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfe503000-0xfe5033ff irq 23 at device 26.0 on pci0
usbus0: EHCI version 1.0
usbus0: <EHCI (generic) USB 2.0 controller> on ehci0
pcib2: <ACPI PCI-PCI bridge> irq 17 at device 28.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 28.5 on pci0
pci3: <ACPI PCI bus> on pcib3
xhci0: <XHCI (generic) USB 3.0 controller> mem 0xfe400000-0xfe407fff irq 17 at device 0.0 on pci3
xhci0: 32 byte context size.
usbus1 on xhci0
pcib4: <ACPI PCI-PCI bridge> irq 18 at device 28.6 on pci0
pci4: <ACPI PCI bus> on pcib4
re0: <RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet> port 0xe000-0xe0ff mem 0xd0004000-0xd0004fff,
 0xd0000000-0xd0003fff irq 18 at device 0.0 on pci4
re0: Using 1 MSI-X message
re0: Chip rev. 0x2c000000
re0: MAC rev. 0x00000000
miibus0: <MII bus> on re0
rgephy0: <RTL8169S/8110S/8211 1000BASE-T media interface> PHY 1 on miibus0
rgephy0:  none, 10baseT, 10baseT-FDX, 10baseT-FDX-flow, 100baseTX, 100baseTX-FDX, 100baseTX-FDX-flow,
 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, 1000baseT-FDX-flow, 1000baseT-FDX-flow-master,
 auto, auto-flow
re0: Ethernet address: 14:da:e9:b3:98:70
pcib5: <ACPI PCI-PCI bridge> irq 19 at device 28.7 on pci0
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> irq 17 at device 0.0 on pci5
pci6: <ACPI PCI bus> on pcib6
ehci1: <EHCI (generic) USB 2.0 controller> mem 0xfe502000-0xfe5023ff irq 23 at device 29.0 on pci0
usbus2: EHCI version 1.0
usbus2: <EHCI (generic) USB 2.0 controller> on ehci1
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
ahci0: <Intel Cougar Point AHCI SATA controller> port 0xf0b0-0xf0b7,0xf0a0-0xf0a3,0xf090-0xf097,0xf080-0xf083,
 0xf060-0xf07f mem 0xfe501000-0xfe5017ff irq 20 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_button0: <Power Button> on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
orm0: <ISA Option ROM> at iomem 0xcd800-0xce7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
est2: <Enhanced SpeedStep Frequency Control> on cpu2
p4tcc2: <CPU Frequency Thermal Control> on cpu2
est3: <Enhanced SpeedStep Frequency Control> on cpu3
p4tcc3: <CPU Frequency Thermal Control> on cpu3
est4: <Enhanced SpeedStep Frequency Control> on cpu4
p4tcc4: <CPU Frequency Thermal Control> on cpu4
est5: <Enhanced SpeedStep Frequency Control> on cpu5
p4tcc5: <CPU Frequency Thermal Control> on cpu5
est6: <Enhanced SpeedStep Frequency Control> on cpu6
p4tcc6: <CPU Frequency Thermal Control> on cpu6
est7: <Enhanced SpeedStep Frequency Control> on cpu7
p4tcc7: <CPU Frequency Thermal Control> on cpu7
ZFS filesystem version 5
ZFS storage pool version 28
Timecounters tick every 1.000 msec
usbus0: 480Mbps High Speed USB v2.0
usbus1: 5.0Gbps Super Speed USB v3.0
usbus2: 480Mbps High Speed USB v2.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
ugen1.1: <0x1b21> at usbus1
uhub1: <0x1b21 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus1
ugen2.1: <Intel> at usbus2
uhub2: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus2
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST33000651AS CC45> ATA-8 SATA 3.x device
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <ST33000651AS CC45> ATA-8 SATA 3.x device
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
SMP: AP CPU #1 Launched!
SMP: AP CPU #5 Launched!
SMP: AP CPU #6 Launched!
SMP: AP CPU #4 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #7 Launched!
SMP: AP CPU #2 Launched!
uhub1: 4 ports with 4 removable, self powered
Root mount waiting for: usbus2 usbus0
uhub0: 2 ports with 2 removable, self powered
uhub2: 2 ports with 2 removable, self powered
Root mount waiting for: usbus2 usbus0
ugen0.2: <vendor 0x8087> at usbus0
ugen2.2: <vendor 0x8087> at usbus2
uhub3: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus0
uhub4: <vendor 0x8087 product 0x0024, class 9/0, rev 2.00/0.00, addr 2> on usbus2
Root mount waiting for: usbus2 usbus0
uhub3: 6 ports with 6 removable, self powered
uhub4: 8 ports with 8 removable, self powered
Trying to mount root from zfs:tank/root []...
re0: link state changed to UP
```


----------



## Ben (May 5, 2012)

sysctl.conf:

```
security.bsd.see_other_uids=0
security.bsd.see_other_gids=0
net.inet.ip.check_interface=1
net.inet.tcp.blackhole=2
net.inet.udp.blackhole=1
net.inet.tcp.drop_synfin=1
net.inet.tcp.msl=10000
kern.ipc.somaxconn=32768
net.inet.ip.rtexpire=5
net.inet.ip.rtminexpire=5

net.inet.ip.fastforwarding=1
net.inet.ip.redirect=0
kern.random.sys.harvest.ethernet=0              
kern.random.sys.harvest.point_to_point=0
kern.random.sys.harvest.interrupt=0

net.inet.icmp.icmplim=50          
net.inet.ip.process_options=0     
net.inet.ip.rtmaxcache=256        
net.inet.icmp.drop_redirect=1     
net.inet.tcp.delayed_ack=0        
net.inet.tcp.nolocaltimewait=1  
net.inet.tcp.path_mtu_discovery=0 
net.inet.tcp.recvbuf_max=16777216 
net.inet.tcp.recvspace=8192       
net.inet.tcp.sendbuf_max=16777216 
net.inet.tcp.sendspace=16384
```


----------



## Ben (May 5, 2012)

Interestingly, this was almost exactly 7 days after the last panic.


----------



## da1 (May 6, 2012)

Regarding the sysctls, you can use `sysctl -d <name>` to get a short description of what that particular sysctl does.
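To check every tunable from the sysctl.conf posted above in one pass, something like the following sh sketch could help. It assumes the file only contains `name=value` lines, blank lines and `#` comments; `sysctl_conf_names` and `describe_sysctl_conf` are hypothetical helper names, not existing tools.

```sh
#!/bin/sh
# Print the variable names from a sysctl.conf-style file, one per line.
# Comments are stripped, lines without '=' are skipped, and spaces/tabs
# around the name are removed.
sysctl_conf_names() {
    sed -e 's/#.*//' "$1" | awk -F= 'NF > 1 { gsub(/[ \t]/, "", $1); print $1 }'
}

# Show FreeBSD's one-line description (sysctl -d) for every name in the file.
describe_sysctl_conf() {
    sysctl_conf_names "${1:-/etc/sysctl.conf}" | while read -r name; do
        sysctl -d "$name"
    done
}
```

Running `describe_sysctl_conf /etc/sysctl.conf` on the FreeBSD box should then print one `name: description` line per tunable.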

Regarding pf and the kernel crash, sorry, no idea, but you might want to send an e-mail to the pf mailing lists (see http://www.benzedrine.cx/mailinglist.html for more info).


----------



## Ben (May 6, 2012)

OK, thanks. I'll try that.


----------



## da1 (May 6, 2012)

Something just came to mind after analyzing the dump again: what process does PID 78566 belong to?


----------



## Ben (May 6, 2012)

How can I find out? I'm not very experienced with dump analysis or kernel problems.

I had to reboot the system to bring it back up.


----------



## da1 (May 6, 2012)

http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2004-11/1632.html


Later edit: http://www4.tw.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html

I think it would be advisable to get a core dump and then analyse it with kgdb or something else.


----------



## Ben (May 6, 2012)

Ok, I set dumpdev to auto in /etc/rc.conf. 

Maybe it will panic and dump again next week; then I will analyze it as described. Maybe I can find something out. I will keep this thread updated.

Thanks so far. You really helped me a lot.


----------



## val (May 12, 2012)

I have almost the same problem. My thread: http://forums.freebsd.org/showthread.php?t=31307


----------



## Ben (Jun 29, 2012)

Today it crashed again.

Almost the same backtrace. After the reset, I will check whether there is a dump.

EDIT: No dump in /var/crash.


----------



## Ben (Jun 29, 2012)

Crashed again with the exact same backtrace. I'm desperate.


----------



## glebius@ (Jun 30, 2012)

In 9-STABLE, some ioctl() calls into pf do an M_WAITOK malloc while holding the pf mutex, and I suppose that could be the cause of the panic.

In my pf branch I've fixed all such cases. For details see: http://lists.freebsd.org/pipermail/freebsd-pf/2012-June/006643.html


----------



## Ben (Jul 1, 2012)

Sorry, I'm not an expert in these topics.

What exactly causes the problem here? That a lock is taken and not released in time, or something of that kind? Sorry, I don't know what that means.

I read the whole mailing list conversation. I don't understand much of it, but I'm willing to try; it can't get worse than crashing like it does now. Last night it crashed again after 12 hours.

I changed the sysctl parameters to see whether I had set something wrong.

Unfortunately, I might need your assistance in compiling the new pf. As I understand it, I need to check out the source (sure), then merge your code into it, and then run buildworld?

Thanks.


----------



## glebius@ (Jul 1, 2012)

> Unfortunately, I might need your assistance in compiling the new pf. As I understand it, I need to check out the source (sure), then merge your code into it, and then run buildworld?



You don't need to merge or patch anything; the branch already contains my version of pf. What you do need is a good redundancy setup, so that you can quickly fall back to a well-tested version. My branch is considered more correct in its use of locks, but it is in beta state right now.

Another, less disruptive alternative is to patch 9-STABLE to fix the exact issue you are hitting. I am not sure what exactly causes the panic. To tell, you need to dump core after the panic, then open the core in kgdb and trace the PID that the panic string references. I suppose that you are hitting the M_WAITOK malloc in pfi_dynaddr_setup(). To fix it, please try the attached patch, but I can't guarantee it will help.
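Tracing the PID from a saved core could look roughly like this. It is a sketch that assumes the dump was saved as /var/crash/vmcore.0 and that the kernel on disk is the one that panicked; `list_dump_threads` and `threads_for_pid` are hypothetical helper names. As far as I recall, FreeBSD's kgdb lists each kernel thread in `info threads` with a `(PID=nnn: name)` suffix, which is what lets the PID from the panic message be mapped to a gdb thread number; treat that output format as an assumption.

```sh
#!/bin/sh
# Feed "info threads" to kgdb non-interactively for a saved crash dump.
# Both arguments are optional: kernel image and core file.
list_dump_threads() {
    kernel=${1:-/boot/kernel/kernel}
    core=${2:-/var/crash/vmcore.0}
    printf 'info threads\n' | kgdb "$kernel" "$core"
}

# Keep only the thread line(s) belonging to one PID (assumes "PID=nnn:" format).
threads_for_pid() {
    grep -F "PID=$1:"
}
```

`list_dump_threads | threads_for_pid 78566` would then show the gdb thread number in the first column; in an interactive `kgdb /boot/kernel/kernel /var/crash/vmcore.0` session, `thread <number>` followed by `bt` prints that thread's backtrace.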


----------



## Ben (Jul 2, 2012)

I don't have a dump, even though I set dumpdev to auto in rc.conf.

So I can check out the source, apply your patch and do a buildworld?


----------



## Ben (Jul 3, 2012)

Now it crashes every night at midnight. I really don't know what runs at that time to make the server crash. I am thinking about going back to 8.x.


----------



## Ben (Jul 3, 2012)

```
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: pf_if.c
|===================================================================
|--- pf_if.c     (revision 237890)
|+++ pf_if.c     (working copy)
--------------------------
File to patch: /usr/src/sys/contrib/pf/net/pf_if.c
Patching file /usr/src/sys/contrib/pf/net/pf_if.c using Plan A...
Hunk #1 failed at 506.
1 out of 1 hunks failed--saving rejects to /usr/src/sys/contrib/pf/net/pf_if.c.rej
done
```
This is what I get when I apply your patch.


----------



## Ben (Jul 3, 2012)

I applied the patch manually and compiled/installed the new kernel.

I hope it works. I will let you know if it helped.

Thanks!!


----------



## Ben (Jul 3, 2012)

It crashed again at midnight, like last night and the night before.


----------



## Ben (Jul 4, 2012)

OK, I found out how to make it crash: I just have to run `obspamd-setup`.

It loads 177387 (or more) IPs into a local table, which makes pf crash.

Does anybody have a hint whether I can tune some parameters so that pf can handle this number of IPs in tables?


----------



## glebius@ (Jul 5, 2012)

Ben,

you need to configure a crash dump device and make sure it works. Then reproduce the crash and dump core. Then please trace the PID from the panic message.


----------



## Ben (Jul 5, 2012)

I set dumpdev to auto and verified with dumpon. It just doesn't work.
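For reference, the rc.conf(5) side of this is just two variables. With `AUTO`, the boot scripts use the first usable swap device listed in /etc/fstab as the dump device, and on the next boot savecore(8) copies the dump into `dumpdir`. One thing worth checking on this machine, since it boots from ZFS (`zfs:tank/root` in the dmesg): as far as I know, FreeBSD cannot write a crash dump to swap that lives on a ZFS volume, so `AUTO` only helps if there is swap on a plain partition. Treat that as an assumption to verify.

```sh
# /etc/rc.conf fragment (rc.conf is plain sh variable assignments)
dumpdev="AUTO"          # dump to the first swap device listed in /etc/fstab
dumpdir="/var/crash"    # savecore(8) writes vmcore.N / info.N here on reboot
```

After a panic that was actually dumped, `ls /var/crash` should show `vmcore.0` and `info.0`.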


----------



## glebius@ (Jul 5, 2012)

Strange. Then add "options DDB" to the kernel; on panic, the new kernel will enter the debugger. In the debugger, type "call doadump". Would that work?

P.S. You can also compile pf/pfsync statically into the kernel; that would ease future debugging of the core.
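A custom kernel along those lines could be set up as below. The config name `PFDEBUG` and the helper `write_kernconf` are hypothetical; the config simply includes GENERIC and adds the debugger options plus pf, pflog and pfsync as static devices. If GENERIC already has `options KDB`, that line is redundant.

```sh
#!/bin/sh
# Write a hypothetical kernel config "PFDEBUG" into the given directory
# (normally /usr/src/sys/amd64/conf on this amd64 box).
write_kernconf() {
    confdir=${1:-/usr/src/sys/amd64/conf}
    cat > "$confdir/PFDEBUG" <<'EOF'
include		GENERIC
ident		PFDEBUG

options		KDB	# kernel debugger framework (DDB needs it)
options		DDB	# enter the interactive debugger on panic
device		pf	# pf compiled in instead of loaded as pf.ko
device		pflog
device		pfsync
EOF
}
```

After `write_kernconf`, the kernel would be built with `cd /usr/src && make buildkernel KERNCONF=PFDEBUG && make installkernel KERNCONF=PFDEBUG`, followed by a reboot. At the `db>` prompt after a panic, `trace <pid>` should print the backtrace of the PID from the panic message before running `call doadump`.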


----------



## kpa (Jul 5, 2012)

Please post the output of 
`# pfctl -s memory`

The last two numbers should be what you are looking for: the maximum number of tables and the maximum overall number of addresses in all tables. You can increase them in pf.conf(5), for example:


```
set limit tables 1000
set limit table-entries 500000
```
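To see how close the loaded tables are to that limit, the two numbers can be compared directly. A rough sh sketch, assuming the `pfctl -s memory` line for this limit has the form `table-entries hard limit N` (name first, number last) and counting only tables in the main ruleset, not in anchors; `table_entries_limit` and `table_entries_used` are hypothetical helper names.

```sh
#!/bin/sh
# Read `pfctl -s memory` output on stdin and print the table-entries limit.
table_entries_limit() {
    awk '$1 == "table-entries" { print $NF }'
}

# Print the total number of addresses across all tables in the main ruleset.
table_entries_used() {
    total=0
    for t in $(pfctl -s Tables 2>/dev/null); do
        n=$(pfctl -t "$t" -T show 2>/dev/null | wc -l | tr -d ' ')
        total=$((total + n))
    done
    echo "$total"
}
```

Usage: `echo "used $(table_entries_used) of $(pfctl -s memory | table_entries_limit)"`. Since the limit covers all tables together, summing over every table matters more than the size of any single one.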


----------



## Ben (Jul 6, 2012)

Thanks for this hint, it seems to have worked. At least it didn't panic after I set the limit higher. That said, the table now contains "only" 137507 IPs.


----------



## kpa (Jul 6, 2012)

Good to hear that it helped. This should be reported to the developers via a PR. In my opinion, exhausting the available table entries shouldn't panic the system. Include as much detail as possible about your system and settings in your PR.


----------



## Ben (Jul 6, 2012)

This is also my opinion.

Actually, you should be able to reproduce it by setting the limit to 5 and then loading 10 IPs. I will check whether I have a system available that I can crash. We don't have many FreeBSD 9 machines yet.
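A minimal reproduction along those lines, meant only for a disposable test machine, could generate a few more addresses than the limit allows and load them into a throwaway table. `gen_addrs` and the table name `crashtest` are hypothetical.

```sh
#!/bin/sh
# Print N distinct test addresses from 10.0.0.0/8, one per line.
gen_addrs() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "10.$((i / 65536 % 256)).$((i / 256 % 256)).$((i % 256))"
        i=$((i + 1))
    done
}
```

With `set limit table-entries 5` in pf.conf and the ruleset reloaded (`pfctl -f /etc/pf.conf`), running `gen_addrs 10 > /tmp/addrs && pfctl -t crashtest -T add -f /tmp/addrs` should either fail cleanly or, on an affected kernel, reproduce the panic for the PR.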


----------



## maxum (Jul 6, 2012)

Try changing the time to see whether it is caused by a scheduled job or is just chance.


----------



## Ben (Jul 6, 2012)

It crashed every day at midnight. Then I ran it manually and it crashed. After setting the new limits, I ran it again and it did not crash, so I re-enabled the job to see whether it crashes again.

Tonight I will see...


----------



## glebius@ (Jul 15, 2012)

The problem turned out to have been fixed in head a long time ago, in r230119. I'll merge it to stable/9 ASAP.


----------



## glebius@ (Jul 18, 2012)

Fix merged to stable/9.


----------



## gkontos (Jul 18, 2012)

glebius@ said:

> Fix merged to stable/9.



Good, will that make it to 9.1-RELEASE?


----------



## glebius@ (Jul 19, 2012)

Sure.

x.y-RELEASE is always cut from the releng/x.y branch, which in turn is cut from the stable/x branch. For now, releng/9.1 hasn't been cut yet.


----------



## gkontos (Jul 20, 2012)

glebius@ said:

> Sure.
> 
> x.y-RELEASE is always cut from the releng/x.y branch, which in turn is cut from the stable/x branch. For now, releng/9.1 hasn't been cut yet.



Great. I see there were some issues with this commit on the mailing list.


----------

