# Random crashes - reason unknown



## yavor (Jan 14, 2009)

Hello all FreeBSD fans,

I have a nasty problem: my server crashes at random intervals, and until today I didn't even have an error message that could point me to where the problem might be.

I use 6.2-RELEASE mainly as a mail server, running Postfix with Dovecot. During today's crash I managed to copy down an error message from the screen:


> Sleeping thread (tid 100255, pid 67940) owns a non-sleepable lock
> panic: sleeping thread
> cpuid=0
> Uptime 4d.....
> ...



I'm not sure what this message means, but it looks to me like a hardware problem. Another possibility is that a faulty process causes the crash, but how could a non-root process crash the whole server and OS? As far as I know, FreeBSD is one of the most stable and reliable operating systems in this respect.

Where could the problem be: the hardware, the OS, or the installed software?

Any help is appreciated, thanks in advance!


----------



## trev (Jan 15, 2009)

Follow the instructions at http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/install-trouble.html for details on how to track down and isolate your problem(s).


----------



## sniper007 (Jan 15, 2009)

I'd advise you to check your RAM:

http://www.memtest.org/


----------



## yavor (Jan 15, 2009)

@trev, I'll read this document to see whether it helps and report back.

@sniper007, I've started memtest and it displays a lot of loops like this:





> Loop 17:
> Stuck Address       : ok
> Random Value        : ok
> Compare XOR         : ok
> ...



Every loop looks identical to the others. How many loops does it have to pass? Do I have to press Ctrl+C to stop memtest, or will it stop by itself?


----------



## yavor (Jan 19, 2009)

Hi guys,

I've read the troubleshooting page, but it didn't help me. It was about installation issues, whereas my system is installed and working. The problem is that it restarts spontaneously from time to time and I can't figure out why. Last time, I managed to write the error message down on paper by hand before the system started booting again and the message disappeared.

Any other ideas or suggestions?


----------



## danger@ (Jan 19, 2009)

http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug.html


----------



## kamikaze (Jan 19, 2009)

You need a swap partition, so that the kernel can create a dump. From the dump you can create a backtrace with kgdb.
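A minimal sketch of the next step, assuming the default dump directory /var/crash and savecore(8)'s vmcore.N naming (the latest_vmcore helper is hypothetical, for illustration only). It picks the highest-numbered dump so you can hand it to kgdb:

```shell
#!/bin/sh
# latest_vmcore: hypothetical helper. Prints the path of the highest-numbered
# vmcore.N file in a crash directory (default /var/crash).
# Returns 1 if no numbered vmcore file is found.
latest_vmcore() {
    local dir best bestn f n
    dir=${1:-/var/crash}
    best=""
    bestn=-1
    for f in "$dir"/vmcore.*; do
        [ -e "$f" ] || continue          # glob matched nothing
        n=${f##*/vmcore.}                # strip everything up to "vmcore."
        case $n in
            ''|*[!0-9]*) continue ;;     # skip names like vmcore.0.gz
        esac
        if [ "$n" -gt "$bestn" ]; then
            bestn=$n
            best=$f
        fi
    done
    [ -n "$best" ] || return 1
    printf '%s\n' "$best"
}
```

For example, with a kernel built with "makeoptions DEBUG=-g" in its config, you could run "kgdb /usr/obj/usr/src/sys/MYKERNEL/kernel.debug $(latest_vmcore)" (MYKERNEL standing for your own kernel config name) and type "bt" at the (kgdb) prompt to get the backtrace.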


----------



## yavor (Jan 21, 2009)

OK, I've prepared my system for dumps by adding this to rc.conf:


> dumpdev="AUTO"             # Device to crashdump to (device name, AUTO, or NO).
> dumpdir="/var/crash"       # Directory where crash dumps are to be stored
> savecore_flags=""          # Used if dumpdev is enabled above, and present.



Now I'll wait for the next crash. Can you tell me what to do with the contents of /var/crash after that?


----------



## sniper007 (Jan 21, 2009)

yavor said:

> @sniper007, I've started memtest and it displays a lot of loops like this



Did you run Memtest86+ from a CD?


----------



## yavor (Jan 22, 2009)

Not from a CD. I have it in /usr/local/bin/memtest; I can't remember whether I installed it or it was there by default.

Which CD do you mean?


----------



## sniper007 (Jan 22, 2009)

> Here is some pre-compiled distributions of memtest86+. Memtest86+ comes in three different way, first is a pre-build bootable ISO, second is a bootable binary and third an installable package for creating a bootable floppy. Third version are compressed in .zip and .tar.gz.



http://www.memtest.org/#downiso


----------



## yavor (Jan 22, 2009)

Yes, I've found memtest86 in the ports, and now I have to wait for the next crash to test with it.

In the meantime, I've replaced the old memory banks with new ones, so if the problem is the memory, it is solved radically.

I'll report back when I have more info.

Thanks a lot for your help.


----------



## yavor (Jan 27, 2009)

Today my system crashed again with the new memory banks.

After reading http://www.freebsd.org/doc/en/books/developers-handbook/kerneldebug-gdb.html, all I could do was run:





> cd /usr/obj/usr/src/sys/MYKERNEL/
> kgdb kernel.debug /var/crash/vmcore.0



The kgdb output was:


> hostname# kgdb kernel.debug /var/crash/vmcore.0
> [GDB will not be able to debug user-mode threads: /usr/lib/libthread_db.so: Undefined symbol "ps_pglobal_lookup"]
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> ...



And here I need your help: the output above means nothing to me.

What should I do?


----------



## danger@ (Jan 27, 2009)

Please post the output of the "bt" command at the (kgdb) prompt.


----------



## yavor (Jan 27, 2009)

Here is the output you asked for:


> (kgdb) bt
> #0  doadump () at pcpu.h:165
> #1  0xc062aec6 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:409
> #2  0xc062b1ed in panic (fmt=0xc08dc012 "%s") at /usr/src/sys/kern/kern_shutdown.c:565
> ...


----------



## rwatson@ (Jan 27, 2009)

Hi Yavor:

I'm sorry to hear about the problems you have been experiencing. A number of bugs in the UNIX domain socket code were fixed between FreeBSD 6.2 and 6.4, with at least a couple that might include symptoms such as those you are experiencing. Is it possible for you to upgrade to a more recent 6.x (ideally 6.4) to see if that corrects the problem? If you can't do a full system upgrade, it should be possible to upgrade the kernel and modules alone; however, I think moving it entirely forward to 6.4 would be preferable.

Thanks


----------



## yavor (Jan 27, 2009)

Thanks for the answer, rwatson,

I had been thinking about upgrading too, but I've never done it before, and since this system serves 1500 email clients I'm a little concerned about whether the upgrade will go smoothly.

I need to read more about the upgrade details, and any advice is welcome.

I'll report back when I have more info.


----------



## danger@ (Jan 27, 2009)

Well, if you have a spare machine, make a clone of the "production" box and update the test machine first to see what the upgrade process looks like. After all, 6.2->6.4 shouldn't be painful. If you have any specific questions, just ask.


----------



## yavor (Jan 27, 2009)

Unfortunately, I don't have a spare machine.

What are my upgrade options? Can I upgrade from 6.2 directly to 6.4, or do I have to go to 6.3 first and then to 6.4?

Also, if the upgrade fails for some reason, is there a way to rollback?

Thanks a lot for your help.


----------



## danger@ (Jan 27, 2009)

The easiest way for you might be (if you run the GENERIC kernel) to use the freebsd-update(8) tool. However, the version included in FreeBSD 6.2 does not support the "upgrade" command, so you will have to obtain a newer version. Please see http://www.freebsd.org/releases/6.3R/announce.html and follow the instructions described there. FYI, the freebsd-update(8) utility supports the "rollback" command.
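A minimal sketch of that binary upgrade path, assuming a GENERIC kernel, 6.4-RELEASE as the target, and that FU points at the newer freebsd-update obtained as the 6.3 announcement describes. FU, TARGET, DRY_RUN, run and fu_step are hypothetical names for illustration; with DRY_RUN=1 the commands are only printed. The sequence (upgrade, install, reboot, install again) follows the Handbook's freebsd-update chapter:

```shell
#!/bin/sh
# Hypothetical wrapper around freebsd-update(8) for a 6.x minor upgrade.
# Assumes a GENERIC kernel and that FU names the newer freebsd-update.
FU=${FU:-freebsd-update}
TARGET=${TARGET:-6.4-RELEASE}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

fu_step() {
    case "${1:-}" in
        before-reboot)
            run "$FU" -r "$TARGET" upgrade || return 1   # fetch and merge the new release
            run "$FU" install                            # first install: new kernel only
            ;;
        after-reboot)
            run "$FU" install                            # second install: new userland
            ;;
        rollback)
            run "$FU" rollback                           # back out the last installed updates
            ;;
        *)
            echo "usage: fu_step before-reboot|after-reboot|rollback" >&2
            return 2
            ;;
    esac
}
```

You would run "fu_step before-reboot", reboot into the new kernel, then run "fu_step after-reboot"; "fu_step rollback" is there if something goes wrong.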

Also, you may want to go through http://www.freebsd.org/doc/en/books/handbook/updating-upgrading-freebsdupdate.html.

Speaking for myself, I have used the freebsd-update(8) tool only once; I prefer doing source upgrades. Also, please see Robert's reply; he has a good point.


----------



## rwatson@ (Jan 27, 2009)

Hi Yavor:

It should be possible to update straight to FreeBSD 6.4 without going via 6.3. If you do a source update, you should be able to try out a 6.4 kernel and modules without upgrading userspace or applications, which is easy to back out. Make sure you do a "cp -r /boot/kernel /boot/kernel.backup" to keep a copy of the 6.2 kernel around in case you decide to roll back. Most problems with upgrades occur as a result of kernel changes, perhaps due to a change in device driver support, so this is a good way to do upgrades generally. FWIW, upgrades within a major release are generally low-risk and straightforward; it's major version upgrades that tend to be trickier.
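A minimal sketch of this kernel-only source route, assuming /usr/src already holds the 6.4 sources (for example, fetched with csup(1) and a RELENG_6_4 supfile) and that you pass your own kernel config name. The run and kernel_upgrade helpers and DRY_RUN are hypothetical; with DRY_RUN=1 the commands are only printed. The backup mirrors the "cp -r" above (using -Rp to also keep file modes), and buildworld only builds into /usr/obj, providing the toolchain for buildkernel without touching the installed 6.2 userland:

```shell
#!/bin/sh
# Hypothetical helper: build and install only a new kernel from /usr/src.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

kernel_upgrade() {
    local kconf
    kconf=${1:-GENERIC}   # your kernel config name, e.g. a custom one
    run cp -Rp /boot/kernel /boot/kernel.backup || return 1      # keep the 6.2 kernel + modules
    run make -C /usr/src buildworld || return 1                   # builds into /usr/obj only
    run make -C /usr/src buildkernel KERNCONF="$kconf" || return 1
    run make -C /usr/src installkernel KERNCONF="$kconf"          # userland is left as-is
}
```

Reboot afterwards. If the 6.4 kernel misbehaves, you should be able to type "boot kernel.backup" at the loader prompt to get back to the saved 6.2 kernel.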


----------



## yavor (Jan 27, 2009)

OK... and to make things more complicated, my kernel is customized.

What is the difference from GENERIC?


----------



## danger@ (Jan 27, 2009)

Wasn't it you who customized it?


----------



## yavor (Jan 27, 2009)

Sorry, I should have said: what is the difference between upgrading a custom kernel and a GENERIC one?

Something is not very clear to me. When I upgrade from 6.2 to 6.4, I have to:
1. Switch my kernel to GENERIC
2. Upgrade the GENERIC kernel, kernel modules, libraries, etc.
3. Run portupgrade -a to rebuild the installed software against the new libraries from step 2

Am I on the right track?


----------

