# FreeBSD server crashing (crash dump)



## Lem0nHead (Aug 18, 2009)

Hello,
I have a *FreeBSD 6.2-RELEASE-p8* server (http+mysql+pop3+imap+smtp) that has been crashing about twice a week for a month or so.
This server had been running fine for at least 8 months before that without a single crash.

I have already tried replacing the RAM and the PSU, but the problem keeps happening.

I managed to capture a `top` snapshot less than 10 seconds before the server crashed (I had set up a script to save it):

```
last pid: 61576;  load averages:  2.04,  2.38,  2.45  up 9+12:13:49    14:42:43
199 processes: 2 running, 196 sleeping, 1 zombie

Mem: 1245M Active, 347M Inact, 281M Wired, 100M Cache, 112M Buf, 31M Free
Swap: 2048M Total, 2096K Used, 2046M Free


  PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
16167 mysql       19  20    0   488M 85556K kserel 0  53.3H 27.88% mysqld
61398 apache       1   4    0   168M 33868K sbwait 0   0:01  5.89% httpd
92707 apache       1   4    0   169M 64084K sbwait 0   1:51  3.42% httpd
```
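The script itself isn't shown in the post. As a rough illustration only, here is a minimal Python sketch of the idea; the function names, the output path and the interval are hypothetical choices, not what the poster used. The detail that matters for crash hunting is `os.fsync()`: as far as I know, a FreeBSD panic reboots without syncing the buffer cache by default (`kern.sync_on_panic` is 0), so a snapshot written just before the crash could otherwise be lost.

```python
import os
import subprocess
import time


def save_snapshot(cmd, out_path):
    """Run `cmd` once and atomically replace `out_path` with its output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    tmp_path = out_path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write("# snapshot taken %s\n" % time.strftime("%Y-%m-%d %H:%M:%S"))
        f.write(result.stdout)
        f.flush()
        # Force the data to disk; a panic will not flush dirty buffers for us.
        os.fsync(f.fileno())
    # Atomic rename, so a reader never sees a half-written snapshot.
    os.replace(tmp_path, out_path)
    # Also fsync the directory so the rename itself reaches the disk.
    dir_fd = os.open(os.path.dirname(os.path.abspath(out_path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def run_forever(cmd, out_path, interval=5.0):
    """Take a new snapshot every `interval` seconds until interrupted."""
    while True:
        save_snapshot(cmd, out_path)
        time.sleep(interval)
```

On the server this might be started as `run_forever(["top", "-b", "-d", "1"], "/var/log/top-snapshot.txt")`, assuming FreeBSD's top(1), where `-b` selects batch mode and `-d 1` limits it to a single display. Writing to a temporary file and renaming it means the last complete snapshot always survives, even if the crash hits mid-write.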

And here's the dump:


```
Unread portion of the kernel message buffer:
kernel trap 12 with interrupts disabled


Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 06
fault virtual address   = 0x104
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0xc067a45d
stack pointer           = 0x28:0xe4f58c90
frame pointer           = 0x28:0xe4f58c9c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = resume, IOPL = 0
current process         = 5 (thread taskq)
trap number             = 12
panic: page fault
cpuid = 2
Uptime: 9d12h14m29s
Physical memory: 2039 MB
Dumping 338 MB: 323 307 291 275 259 243 227 211 195 179 163 147 131 115 99 83 67 51 35 19 3

#0  doadump () at pcpu.h:165
165             __asm __volatile("movl %%fs:0,%0" : "=r" (td));
(kgdb) backtrace
#0  doadump () at pcpu.h:165
#1  0xc0683236 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
#2  0xc068355d in panic (fmt=0xc08d7a75 "%s") at /usr/src/sys/kern/kern_shutdown.c:565
#3  0xc0889c70 in trap_fatal (frame=0xe4f58c50, eva=260) at /usr/src/sys/i386/i386/trap.c:837
#4  0xc0889426 in trap (frame=
      {tf_fs = -968949752, tf_es = -967507928, tf_ds = -453705688, tf_edi = -968921088, tf_esi = 4, tf_ebp = -453669732, tf_isp = -453669764, 
tf_ebx = -960082340, tf_edx = 6, tf_ecx = 0, tf_eax = 1, tf_trapno = 12, tf_err = 0, tf_eip = -1066949539, tf_cs = 32, tf_eflags = 65538, 
tf_esp = -941363984, tf_ss = 4})
    at /usr/src/sys/i386/i386/trap.c:270
#5  0xc087604a in calltrap () at /usr/src/sys/i386/i386/exception.s:139
#6  0xc067a45d in _mtx_lock_sleep (m=0xc6c64e5c, tid=3326046208, opts=0, file=0x0, line=0) at /usr/src/sys/kern/kern_mutex.c:546
#7  0xc06c97c2 in unp_gc (arg=0x0, pending=1) at /usr/src/sys/kern/uipc_usrreq.c:1714
#8  0xc06a3edf in taskqueue_run (queue=0xc64c9080) at /usr/src/sys/kern/subr_taskqueue.c:257
#9  0xc06a43c2 in taskqueue_thread_loop (arg=0x1) at /usr/src/sys/kern/subr_taskqueue.c:376
#10 0xc066c979 in fork_exit (callout=0xc06a4330 <taskqueue_thread_loop>, arg=0xc09d7048, frame=0xe4f58d38) at /usr/src/sys/kern/kern_fork.c:821
#11 0xc08760ac in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:208
```
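kgdb prints the trap frame in frame #4 as signed decimal integers, which hides how it lines up with the rest of the dump. A tiny helper (hypothetical, just for illustration) masks each value to 32 bits and prints it in hex: `tf_eip` comes out as `0xc067a45d`, the instruction pointer from the panic message and the address of frame #6 (`_mtx_lock_sleep`); `tf_ebp` matches the reported frame pointer `0xe4f58c9c`; `tf_ebx` equals the mutex argument `m=0xc6c64e5c`; and `eva=260` in frame #3 is the fault virtual address `0x104`.

```python
def to_addr(value):
    """Reinterpret a signed 32-bit trap-frame value as an unsigned hex address."""
    return "0x%08x" % (value & 0xFFFFFFFF)


# A few registers copied from frame #4 of the backtrace above.
trap_frame = {
    "tf_eip": -1066949539,
    "tf_ebp": -453669732,
    "tf_ebx": -960082340,
}

for name, value in trap_frame.items():
    print(name, to_addr(value))
# tf_eip 0xc067a45d
# tf_ebp 0xe4f58c9c
# tf_ebx 0xc6c64e5c

print("eva", hex(260))  # eva 0x104
```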

Any ideas whether I should try replacing some other hardware, or whether this may be a software/kernel problem?

The DC (data center) offered to swap my entire box, keeping only the HDDs.
This server's uptime is very important to me, so I'm trying to find the lowest-risk way to debug this.

Thanks


----------



## DutchDaemon (Aug 18, 2009)

The dump appears to suggest it's related to xpt_thr, which is part of CAM (SCSI, USB), xpt(4). Anything special going on in that area?


----------



## Alt (Aug 18, 2009)

DutchDaemon said:

> The dump appears to suggest it's related to xpt_thr, which is part of CAM (SCSI, USB), xpt(4).


How did you see that?


----------



## DutchDaemon (Aug 18, 2009)

```
current process         = 5 (thread taskq)
```

I checked on three different systems here, and it's always:


```
5  ??  DL     0:00.00 [xpt_thrd]
```

Etcetera


----------



## Alt (Aug 18, 2009)

Hmm, strange. On my 7.1-RELEASE there is:


> 5  ??  DL     0:00,00 [system_taskq]


And there are some lines about tasks/threads in the first post. Could it be a task scheduler failure? I can't find anything about xpt or similar on my system...
Maybe try running the server with its services disabled one at a time?


----------



## DutchDaemon (Aug 18, 2009)

Ah, I should've looked beyond the first three systems 

On two others:


```
5  ??  DL     0:00.00 [kqueue taskq]
```


```
5  ??  DL     0:00.05 [thread taskq]
```


----------



## DutchDaemon (Aug 18, 2009)

Here's another for 6.2:

http://lists.freebsd.org/pipermail/freebsd-hackers/2007-March/019926.html

In fact, this specific error appears to be closely tied to 6.x.

http://www.google.com/search?q="thread+taskq"


----------



## Lem0nHead (Aug 18, 2009)

DutchDaemon said:

> The dump appears to suggest it's related to xpt_thr, which is part of CAM (SCSI, USB), xpt(4). Anything special going on in that area?



Any ideas on how I could check that?
I see that /dev/xpt0 exists, and I also found this process:

```
root        21  0.0  0.0     0     8  ??  WL   Mon02PM   0:00.00 [swi2: cambio]
```


For taskq I have:

```
root         5  0.0  0.0     0     8  ??  DL   Mon02PM   0:55.67 [thread taskq]
root         9  0.0  0.0     0     8  ??  DL   Mon02PM   0:00.00 [kqueue taskq]
root        19  0.0  0.0     0     8  ??  WL   Mon02PM   0:00.01 [swi6: Giant taskq]
```
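The panic message already names the thread (`current process = 5 (thread taskq)`), and, as the posts above show, which kernel thread gets PID 5 differs between systems and releases. On this box PID 5 is indeed `[thread taskq]`, consistent with the panic. A small Python sketch for doing that lookup against saved `ps aux` output (the helper name is made up for illustration):

```python
def command_for_pid(ps_lines, pid):
    """Return the COMMAND column for `pid` in `ps aux`-style output, or None.

    `ps aux` prints 11 columns and only the last one (COMMAND) may contain
    spaces, so splitting at most 10 times keeps it in one piece.
    """
    for line in ps_lines:
        fields = line.split(None, 10)
        if len(fields) == 11 and fields[1] == str(pid):
            return fields[10]
    return None


# Lines copied from the ps output above.
sample = [
    "root         5  0.0  0.0     0     8  ??  DL   Mon02PM   0:55.67 [thread taskq]",
    "root         9  0.0  0.0     0     8  ??  DL   Mon02PM   0:00.00 [kqueue taskq]",
    "root        19  0.0  0.0     0     8  ??  WL   Mon02PM   0:00.01 [swi6: Giant taskq]",
]
print(command_for_pid(sample, 5))  # [thread taskq]
```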

Thanks


----------



## DutchDaemon (Aug 18, 2009)

It's not xpt, my bad.

All I know about 'thread taskq' is in here: taskqueue(9). I don't know whether the panics are due to e.g. scheduling, locking, or threaded apps misbehaving. I guess a developer should look at this.
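For context, `thread taskq` is the kernel thread that drains one of the taskqueue(9) queues; the backtrace shows it going `taskqueue_thread_loop` → `taskqueue_run` → `unp_gc(arg=0x0, pending=1)`. Below is a simplified, single-threaded Python model of the enqueue/run behaviour described in taskqueue(9). It is not kernel code, the class names are made up, and it deliberately leaves out all locking (the real panic happened while `unp_gc` was acquiring a mutex). What it illustrates is the `pending` argument: enqueueing a task that is already queued only bumps its pending count, and the handler receives that count when it finally runs.

```python
class Task:
    """Simplified stand-in for struct task: a handler, its context and a priority."""

    def __init__(self, func, context=None, priority=0):
        self.func = func          # called as func(context, pending)
        self.context = context
        self.priority = priority
        self.pending = 0          # 0 means "not currently queued"


class TaskQueue:
    """Single-threaded model of taskqueue_enqueue()/taskqueue_run(); no locking."""

    def __init__(self):
        self.tasks = []

    def enqueue(self, task):
        if task.pending:
            # Already queued: just record that it was requested again.
            task.pending += 1
            return
        # Insert before the first queued task with a lower priority,
        # or at the end if there is none.
        index = len(self.tasks)
        for i, queued in enumerate(self.tasks):
            if queued.priority < task.priority:
                index = i
                break
        self.tasks.insert(index, task)
        task.pending = 1

    def run(self):
        while self.tasks:
            task = self.tasks.pop(0)
            # Reset the count before calling the handler, passing the old value.
            pending, task.pending = task.pending, 0
            task.func(task.context, pending)


calls = []
gc_task = Task(lambda arg, pending: calls.append(pending))
queue = TaskQueue()
queue.enqueue(gc_task)
queue.enqueue(gc_task)   # coalesced with the first request
queue.run()
print(calls)             # [2]
```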


----------



## Lem0nHead (Aug 18, 2009)

Well, if it's really a kernel problem, I guess I should try upgrading to 6.3 or 7 and report back if it persists.
Maybe it's already fixed.

Thanks


----------



## torqueturns (Aug 21, 2009)

Lem0nHead said:

> Well, if it's really a kernel problem, I guess I should try upgrading to 6.3 or 7 and report back if it persists.
> Maybe it's already fixed.
> 
> Thanks



Look at your hardware again. I had a similar problem, and it turned out to be a bulging capacitor on the motherboard near the memory.


----------



## plamaiziere (Aug 23, 2009)

Lem0nHead said:

> Hello,
> I have a *FreeBSD 6.2-RELEASE-p8* server (http+mysql+pop3+imap+smtp) that has been crashing about twice a week for a month or so.
> This server had been running fine for at least 8 months before that without a single crash.



This looks similar to the issue mentioned in the 6.2 ERRATA (according to your dump, the box panicked in unp_gc, so I just googled that); see:

http://people.freebsd.org/~bmah/relnotes/6.2-RELEASE/errata.pdf

There is a patch; you could try it.


----------



## Lem0nHead (Aug 23, 2009)

Thanks, that's good news, since I have already upgraded to 6.4.
I most likely won't need to upgrade to 7 then, since it is probably fixed in 6.4.


----------

