# After upgrade to 9.3 HP DL160 reboots every day



## Vladimir43 (Oct 22, 2014)

After upgrading to FreeBSD 9.3, the HP DL160 server reboots every 30-40 hours.

`# uname -a`

```
FreeBSD ... 9.3-RELEASE-p2 FreeBSD 9.3-RELEASE-p2 #0: Mon Sep 15 16:44:27 UTC 2014     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
```

Upgraded with the command `freebsd-update -r 9.3-RELEASE upgrade`.

Nothing is logged in /var/log/messages before each reboot. Some reboot history:

```
Oct 14 14:28:07
Oct 15 10:52:07
Oct 16 18:45:12
Oct 18 02:48:05
Oct 19 10:50:24
Oct 20 18:53:00
Oct 22 02:55:42
```
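For reference, the gaps between these reboots can be worked out with a short script. This is only a sketch: the timestamps are copied from the list above, the year 2014 is assumed from the post date, and the function name `reboot_intervals_hours` is made up for illustration.

```python
from datetime import datetime

# Reboot timestamps copied from the list above.
REBOOTS = [
    "Oct 14 14:28:07",
    "Oct 15 10:52:07",
    "Oct 16 18:45:12",
    "Oct 18 02:48:05",
    "Oct 19 10:50:24",
    "Oct 20 18:53:00",
    "Oct 22 02:55:42",
]


def reboot_intervals_hours(stamps, year=2014):
    """Return the gaps between consecutive timestamps, in hours.

    The log lines carry no year, so one is supplied (assumption: 2014).
    """
    times = [datetime.strptime(f"{year} {s}", "%Y %b %d %H:%M:%S") for s in stamps]
    return [(b - a).total_seconds() / 3600 for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for start, hours in zip(REBOOTS, reboot_intervals_hours(REBOOTS)):
        print(f"{start} -> next reboot after {hours:.2f} h")
```

With these timestamps the first gap is about 20.4 hours and the remaining five are all close to 32 hours, which suggests something periodic or count-based rather than random.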

ZFS mirror, some jails, Apache, MySQL and so on are running.

What information may help? `dmesg`, `dmidecode`?


----------



## ShelLuser (Oct 22, 2014)

Anything special in /var/log/dmesg.today or even /var/log/dmesg.yesterday? I'd expect something odd to show up in there, a system usually doesn't reboot on its own.

I also assume nothing odd shows up in /var/log/auth.log (just to rule everything out)?


----------



## Vladimir43 (Oct 22, 2014)

Nothing looks strange to me in the dmesg output (attached). How can I debug this problem?


----------



## ShelLuser (Oct 22, 2014)

Just to make sure: I wasn't referring to the dmesg command, but specifically to the log files mentioned above. The dmesg command only shows you the current system message buffer.

Other than that, it might help to log *.emerg messages. By default those get sent to the console, so it could be useful to store them; check /etc/syslog.conf for that.

Of course, this is all assuming that something malicious is going on. Another thing to look at is the security logs, to make sure that this isn't caused by someone else's actions, maybe even unintentionally.


----------



## Vladimir43 (Oct 28, 2014)

Here is part of my syslog.conf:

```
*.emerg                                         /var/log/console.log
console.*                                       /var/log/console.log
```
But nothing is written before the reboot:

```
Oct 27 08:52:59 ... kernel: Configuring jails:.
Oct 27 08:52:59 ... kernel: Starting jails:
Oct 27 08:53:06 ... kernel: Starting background file system checks in 60 seconds.
Oct 27 08:53:06 ... kernel:
Oct 27 08:53:06 ... kernel: Mon Oct 27 08:53:06 MSK 2014
Oct 28 17:14:01 ... kernel: Setting hostuuid: 81a00ce0-8ffe-d511-994d-18a905757ea7.
Oct 28 17:14:01 ... kernel: Setting hostid: 0x23c3240e.
Oct 28 17:14:01 ... kernel: Entropy harvesting: interrupts ethernet point_to_point kickstart.
```
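One way to confirm that a boot was not preceded by an orderly shutdown is to scan the log for boot lines that have no shutdown line before them. The sketch below rests on assumptions: that the kernel boot banner contains `Copyright (c) 1992` and that a clean shutdown leaves a `syslogd: exiting on signal` line in /var/log/messages. Both markers are parameters, so adjust them to whatever your log actually contains. The function name is hypothetical.

```python
def boots_without_shutdown(lines,
                           boot_marker="Copyright (c) 1992",
                           shutdown_marker="syslogd: exiting on signal"):
    """Return boot lines that were not preceded by a shutdown line.

    Both markers are assumptions about the log format. The first boot in
    the input is skipped, because the log does not show what came before it.
    """
    unclean = []
    seen_boot = False
    seen_shutdown = False
    for line in lines:
        if shutdown_marker in line:
            seen_shutdown = True
        elif boot_marker in line:
            if seen_boot and not seen_shutdown:
                unclean.append(line.rstrip("\n"))
            seen_boot = True
            seen_shutdown = False
    return unclean


if __name__ == "__main__":
    # Path taken from the thread; reading it may require root.
    with open("/var/log/messages", errors="replace") as fh:
        for boot_line in boots_without_shutdown(fh):
            print(boot_line)
```

Every line it prints is a boot that followed a reset with no logged shutdown, which is what a power loss or hardware-triggered reset would look like.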

To repeat: this started immediately after the upgrade to 9.3. I think this is a hardware incompatibility.


----------



## Chris_H (Nov 3, 2014)

Vladimir43 said:

> I repeat that this happened immediately after an upgrade to the 9.3 version. I think this is a hardware incompatibility.


Possibly. But even so, there *should* be some indication in /var/log/messages of what occurred and why. Check the time frames, _especially_ just before the boot messages begin. There should be some clues as to what happened, and why.

--Chris


----------



## Vladimir43 (Nov 7, 2014)

I turned off all services and unloaded kernel modules (jail, sendmail, NTP, ipmi.ko and so on), except the important ones, and now the restarts have stopped!
I will bring them back one by one and see.


----------



## Chris_H (Nov 7, 2014)

Nothing in /var/log/messages?
Best wishes. Your choice of direction for isolating the problem is a good one. I'm just surprised you haven't seen anything in, or looked at, messages, as that almost always provides good clues in such matters.

All the best.

--Chris


----------



## Vladimir43 (Nov 7, 2014)

I wrote at the beginning that there was nothing strange in /var/log/messages. It is just as if the power had been turned off.


----------



## kpa (Nov 7, 2014)

Do you have any ports/packages installed that install kernel modules? If you didn't recompile those after the upgrade there's a very good chance that the reboots are caused by an incompatible kernel module.


----------



## Chris_H (Nov 7, 2014)

Vladimir43 said:


> I wrote at the beginning that there was nothing strange in /var/log/messages. Just like power turned off.


Right. I caught that, but was _sure_ there would be at least _something_. Well. Hope you find the culprit. 

--Chris


----------



## Vladimir43 (Dec 18, 2014)

After long testing, it turned out that the server restarts after about 250 runs of `smartctl` against the SATA drives. Previously, these checks were run by Nagios.

I have two SATA drives on channels 0 and 3, two SAS drives on channels 1 and 2, and an HP Smart Array P410 controller (/dev/ciss0). So:

1. `smartctl -a /dev/ciss0 -d sat+cciss,3`
2. `smartctl -a /dev/ciss0 -d sat+cciss,0`
3. `smartctl -a /dev/ciss0 -d cciss,2`

The first and second commands lead to a reboot after about 250 repetitions; the third can run indefinitely.

Two possible explanations:

1. Bug in `smartctl`
2. Bug in the HP Smart Array P410

Is this enough to file a bug report?
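
For a bug report, a small reproduction loop that records its progress on disk may help pin down the exact repetition count. The following is a minimal sketch, not a tested reproducer: the `smartctl` arguments are copied from the post above, while the function name, the count of 300, and the log path are arbitrary choices made for illustration.

```python
import os
import subprocess
import sys


def run_repeated(cmd, count, log_path):
    """Run cmd `count` times, recording each iteration number on disk first.

    The counter is flushed and fsync'ed before every run, so after a sudden
    reset the last line of the log shows which iteration was in progress.
    Returns the number of runs that completed.
    """
    completed = 0
    with open(log_path, "a") as log:
        for i in range(1, count + 1):
            log.write(f"{i}\n")
            log.flush()
            os.fsync(log.fileno())
            # The exit status is ignored on purpose: smartctl uses non-zero
            # statuses as a bitmask describing disk health, not only errors.
            subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
            completed += 1
    return completed


if __name__ == "__main__":
    # Device and -d argument copied from the post; count and path are arbitrary.
    done = run_repeated(
        ["smartctl", "-a", "/dev/ciss0", "-d", "sat+cciss,3"],
        300,
        "/var/tmp/smartctl-loop.log",
    )
    print(f"completed {done} runs without a reset", file=sys.stderr)
```

If the machine resets mid-loop, the last number in the log file is the iteration that was running at the time. Running the same loop with `-d cciss,2` gives the control case that reportedly never triggers a reboot.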


----------

