# Debugging a kernel panic



## Aayla Secura (Sep 5, 2017)

Greetings! I recently installed FreeBSD 11 on a server machine, Xeon E3-1275v6, with ZFS as the rootfs in a 3-way mirror configuration. On the first day of use I experienced two kernel panics which are not readily reproducible. I have the kernel crash dump and the output of `kgdb > backtrace`, but I don't know what to read from that. The first panic occurred when I tried printing the kernel routing table with `netstat -r`, while the other one occurred when I tried running `dhclient` on an inactive Ethernet port (that is, there was no cable plugged in). The motherboard has two Ethernet ports, both using the igb driver. `dhclient` sends broadcasts on the active one but gets no reply, so I just wanted to make sure I was using the correct port by running `dhclient` on the other one as well. It displayed "No link" and then the kernel panicked.

I've attached the output of `kgdb`; any help is appreciated.


----------



## SirDice (Sep 6, 2017)

This is weird, it looks like it was ntpd(8) that caused it:

```
current process		= 790 (ntpd)
```

Besides the CPU, what else can you tell us about the hardware?


----------



## Aayla Secura (Sep 7, 2017)

Thanks for replying! The motherboard is Intel DBS1200SPLR and it's got two 16GB Kingston KVR21E15D8/16 DIMMs (it's ECC). Maybe I should mention that one of the DIMMs is actually Kingston KVR21E15D8/16I which is identical except that it's "certified" for Intel. The store just didn't have two of those, so I got one ../16, which was still advertised as suitable for this processor.

I doubt it's of relevance but the rootfs is on three Kingston 120G HyperX SSDs and the PSU is a SeaSonic G-360W 80Plus Gold.


----------



## SirDice (Sep 7, 2017)

If you look at that 'current process' line, does each panic(9) happen for random processes? If the crashes are seemingly random I'm more inclined to suspect some bad memory. But if the crashes are always caused by the same process with a similar backtrace the issue may be driver related.
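
One rough way to make that comparison, assuming the crashinfo(8) text summaries are in `/var/crash` as `core.txt.N` (the default location, but check your own setup). The function name `panic_processes` is made up for this sketch; it just counts how often each process shows up on the 'current process' line:

```shell
# Count which process was current at each panic.
# Assumption: crashinfo(8) summaries are stored as core.txt.N in the given
# directory (defaults to /var/crash). The function name is hypothetical.
panic_processes() {
    dir="${1:-/var/crash}"
    # Extract the name from lines such as: current process = 790 (ntpd)
    cat "$dir"/core.txt.* 2>/dev/null |
        sed -n 's/^current process[[:space:]]*=[[:space:]]*[0-9][0-9]*[[:space:]]*(\(.*\))[[:space:]]*$/\1/p' |
        sort | uniq -c | sort -rn
}
```

Calling `panic_processes` with no argument reads `/var/crash`. If one process dominates the list, that points more towards a software or driver problem than towards bad memory.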


----------



## Aayla Secura (Sep 11, 2017)

So after some experimentation I figured out how to reproduce it. In short, I found two ways to get it to panic every time:

*1)*

```
service netif stop
dhclient igb<ACTIVE IF>
```
In the meantime, in another terminal:

```
service netif start
ntpd -q
```
If I don't start netif again, ntpd fails with `unable to bind to wildcard address ::`.

*2)*

```
service netif stop
dhclient igb<ACTIVE IF>
```
In the meantime, in another terminal:

```
netstat -r
```
In both cases it doesn't panic if dhclient is running on the inactive (unplugged) interface. And it doesn't happen unless I stop netif prior to running dhclient...

The error in all cases is the same: `Fatal trap 12: page fault while in kernel mode`.


----------



## Terry_Kennedy (Sep 11, 2017)

SirDice said:


> If you look at that 'current process' does each panic(9) happen for random processes? If the crashes are seemingly random I'm more inclined to suspect some bad memory. But if the crashes are always caused by the same process with a similar backtrace the issue may be driver related.


The backtrace here seems to have the same footprints as this freebsd-hackers@ post, starting at soclose+0x3c. There may be some underlying cause common to both that bug report and this one. Certainly, usermode code shouldn't cause references to NULL + somesmallnumber (in this case, 0x17). This looks like it is supposed to be a reference to offset 0x17 in some data structure. In most of 11.x (the original poster didn't specify the exact FreeBSD version, and the SVN tag would also help), uipc_socket.c:1046 is:

```
error = (*so->so_proto->pr_usrreqs->pru_disconnect)(so);
```
I'd suggest opening a PR in category base / kern with the information in the OP as well as a link to the freebsd-hackers@ thread and this reply. If you post the PR number here, people can follow it. But we really need one of the network stack developers to look at it.
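
For the version details, something along these lines could be pasted into the PR. This is only a sketch: `report_versions` is a made-up helper name, and `freebsd-version` is probed first since it only exists on FreeBSD. On a kernel built from source, the `uname -a` string typically carries the revision as well.

```shell
# Collect version details worth pasting into a bug report.
# Assumption: freebsd-version(1) is only present on FreeBSD, so probe for it.
# The function name is hypothetical.
report_versions() {
    echo "uname: $(uname -a)"
    if command -v freebsd-version >/dev/null 2>&1; then
        # -k: installed kernel version, -u: installed userland version
        echo "freebsd-version: $(freebsd-version -ku | tr '\n' ' ')"
    fi
}
```

Running `report_versions` prints one or two lines that can go straight into the report.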


----------



## SirDice (Sep 11, 2017)

Aayla Secura said:


> In short, I found two ways to get it to panic every time


That will be extremely helpful. It's so much easier to debug issues if you know of a way to reproduce the problem. As Terry_Kennedy noted, it's probably time to create a PR for it. Definitely mention how to reproduce the panic.


----------



## Aayla Secura (Sep 13, 2017)

Done, bug #222273. Thanks for the help.


----------



## Terry_Kennedy (Sep 13, 2017)

Aayla Secura said:


> Done, bug #222273. Thanks for the help.


Subscribed.


----------

