# System halting



## finnsson (Apr 21, 2009)

Hi there.

I've got a strange problem with a 6.3-RELEASE-p5 web server. From time to time it stops responding to HTTP requests, and when I try to ssh into it, it prompts me for my username and then my password, but nothing happens after I type the password; the ssh session just hangs. The same thing happens when I try to log in on the console: I can type my username and it prompts me for a password, but since nothing is echoed I can't tell whether the session is already frozen at that point. However, the machine responds to ping just fine, and both ssh and the console still show their login prompts; I just can't log in, and all services are down.

When I momentarily press the power button or hit Ctrl+Alt+Del, the machine goes through the normal shutdown sequence and boots up just fine.

I'm not sure where to look for hints about what the problem is; the message logs show nothing and I'm sort of lost.


----------



## Mel_Flynn (Apr 21, 2009)

Either there's no disk access or /var is full.

Check /var/log/messages. Both are logged there (READ_DMA TIMEOUT and other ata errors for disk access). Of course, if /var is full, those lines might just be missing.

The most likely reason is disk access; I don't think login(8) needs to write anything to disk. So the user/password lookup in the password database (/etc/spwd.db, built from /etc/master.passwd) is not succeeding.
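The two checks suggested above can be scripted roughly as follows. This is a minimal sketch, not part of the original reply: the helper names `fs_usage` and `disk_errors` are hypothetical, and the grep pattern is an assumption based on the READ_DMA TIMEOUT messages mentioned here. Other disk drivers log their errors differently, so the pattern may need adjusting.

```shell
#!/bin/sh
# Sketch: is /var full, and did the kernel log disk errors?
# The grep pattern is an assumption; adjust it to what your
# disk driver actually prints.

# Print the percentage of space used on the filesystem holding a path.
fs_usage() {
    df -P "${1:-/var}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Count log lines that look like ATA DMA errors or timeouts.
disk_errors() {
    grep -c -E 'READ_DMA|WRITE_DMA|TIMEOUT' "${1:-/var/log/messages}"
}
```

For example, `fs_usage /var` prints a number such as 42, and `disk_errors /var/log/messages` prints how many matching lines were found (0 if none).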


----------



## finnsson (Apr 21, 2009)

Mel_Flynn said:

> Either there's no disk access or /var is full.
> 
> Check /var/log/messages. Both are logged there (READ_DMA TIMEOUT and other ata errors for disk access). Of course, if /var is full, those lines might just be missing.



Thanks for the reply, but nope: plenty of space. The latest entry in /var/log/messages is my BAD SU from a few hours before the halt, so the logs show me nothing :\


----------



## LateNiteTV (Apr 21, 2009)

Passwords aren't echoed back in FreeBSD. You won't see anything like ******** in the password field when you're typing it...


----------



## finnsson (Apr 21, 2009)

LateNiteTV said:

> Passwords aren't echoed back in FreeBSD. You won't see anything like ******** in the password field when you're typing it...



I know. That's why I'm saying I don't know whether the session freezes before or after I'm prompted for the password.


----------



## LateNiteTV (Apr 21, 2009)

lol OK, I see what you're saying now. It sounded like you thought the session was freezing because you couldn't see anything as you typed the password.


----------



## phoenix (Apr 22, 2009)

Sounds like you've hit a livelock (the system is locked up but still responds to some things, such as ping), most likely related to either kmem (kernel memory) or the network.

Are you using ZFS?  Which NIC chipset and driver?  How much RAM in the system?
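If you want to test the kmem/network theory, one option is to log kernel memory and mbuf statistics periodically so they can be inspected after the next lock-up. This is a hedged sketch, not something from the thread: the function `snapshot` and the log path `/var/log/memwatch.log` are hypothetical, while `netstat -m` (mbuf usage) and `vmstat -m` (kernel malloc usage by type) are standard FreeBSD commands.

```shell
#!/bin/sh
# Sketch: append timestamped network-buffer and kernel-memory
# statistics to a file so they survive a reboot.

# Append one snapshot to the file named by $1.
snapshot() {
    {
        echo "=== $(date)"
        netstat -m 2>&1    # mbuf and mbuf cluster usage
        vmstat -m 2>&1     # kernel malloc(9) usage by type
    } >> "$1"
    return 0
}

# Example (as root, in the background):
#   while true; do snapshot /var/log/memwatch.log; sleep 60; done &
```

Steadily growing mbuf or malloc counts right before a hang would point in that direction. Note that if the disk itself stops accepting writes, this log stops too.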


----------



## finnsson (Apr 24, 2009)

phoenix said:

> Are you using ZFS?  Which NIC chipset and driver?  How much RAM in the system?



2048 MB of RAM and this NIC:

```
bce1: <Broadcom NetXtreme II BCM5708 1000Base-SX (B2)> mem 0xf6000000-0xf7ffffff irq 16 at device 0.0 on pci3
```

...and no ZFS.

This is an HP blade server with SAN SCSI storage attached via a Fibre Channel (FC) mezzanine card.


----------



## phoenix (Apr 24, 2009)

There's been a lot of discussion regarding the bce(4) driver in FreeBSD 7.x on the -stable and -current mailing lists over the past two months. You may want to search the mailing list archives for threads with *bce* in the subject. They may be helpful.

Are you able to get to the physical console to see if anything is printed there when this happens? If not, you should edit */etc/syslog.conf* and enable the console.log entry. Then check */var/log/console.log* after rebooting the next time this happens.
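On a stock FreeBSD install the console.log entry in /etc/syslog.conf is present but commented out. Below is a minimal sketch of enabling it, assuming that stock `#console.info ... /var/log/console.log` line is present; the function name `enable_console_log` is hypothetical, and the follow-up commands are shown as comments.

```shell
#!/bin/sh
# Sketch: uncomment the stock "#console.info ... /var/log/console.log"
# line in a syslog.conf file (defaults to /etc/syslog.conf).
enable_console_log() {
    conf=${1:-/etc/syslog.conf}
    tmp=$(mktemp /tmp/syslog.conf.XXXXXX) || return 1
    # Strip the leading "#" only from the console.log line.
    if sed 's|^#\(console\.info[[:space:]].*/var/log/console\.log\)|\1|' \
        "$conf" > "$tmp"; then
        cat "$tmp" > "$conf"    # rewrite in place, keeping owner and mode
    fi
    rm -f "$tmp"
}

# Then, as root:
#   enable_console_log /etc/syslog.conf
#   touch /var/log/console.log    # syslogd appends only to existing files
#   /etc/rc.d/syslogd restart
```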


----------



## foo_daemon (Apr 25, 2009)

I have run into this same kind of system behavior too.  In my case, it turned out to be a bad Hitachi Deskstar drive, as evidenced by hearing the 'clicks of death' and clumps of g_vfs() error messages on tty1.

Maybe your HD is silently failing?


----------



## Mel_Flynn (Apr 25, 2009)

You can test disk access by running the following:

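```
#!/bin/sh
# Record the time of the last successful write to disk, once a minute.
me=$$

while true; do
    date > /tmp/disktest.${me}
    # Ask the kernel to flush dirty buffers so the timestamp is written
    # to the disk rather than only to the buffer cache.
    sync
    sleep 60
done
```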

After the halt and reboot, cat the disktest file to see when the system was last able to write to disk. When you notice the halt, keep pinging the machine for at least 60 seconds before rebooting, so that the disktest script has had a chance to write another date to the disk if disk access is still working.


----------

