# ZFS freezes my system, what could be wrong?



## olav (Sep 22, 2010)

After I got a kernel panic on the system (which is another story) the system wouldn't boot. After loading the ZFS module the system would freeze. I managed to remove the zfs_load in /etc/rc.conf and the system booted fine again.

However, if I run `kldload zfs.ko` and then `zpool status`, the whole system freezes. Nothing responds: I can't type or log in with ssh. I can still switch virtual consoles with Ctrl-Alt-Fx (but still can't type), and the screen saver logo keeps working. Ctrl-T also works and reports something about zfs and how long it has been running, but the output doesn't make much sense to me.

What can I do to figure out what has gone wrong and how to fix it?

Running FreeBSD 8.1-RELEASE, i386.


----------



## phoenix (Sep 23, 2010)

Try booting to single-user mode, and running the following:

```
# /etc/rc.d/hostid onestart
# kldload zfs
# zpool export <poolname>
# zpool import <poolname>
```

hostid needs to run before any ZFS commands; otherwise you'll get "pool imported on another host" warnings and nothing will work.

The export/import should reset things back to a usable state.

Use *CTRL+T* to get information on what's happening if it looks like things are hung.


----------



## olav (Sep 26, 2010)

That caused the system to freeze again. Running only `zpool export` was enough (`kldload zfs` works fine, though). I really didn't have time to find out what caused this, so I've now restored from a backup.

CTRL-T said something like zpool with some numbers. If it should happen again I will take a photo of the CTRL-T output.


----------



## olav (Oct 6, 2010)

Now it happened again. Here is a photo of the monitor of what I did in single user mode, and with output of ctrl+t.


----------



## phoenix (Oct 6, 2010)

As long as you can press CTRL+T and get output back, the box is still running.  It's just a long running process.  Leave it overnight and see if it's still sitting there like that in the morning.


----------



## Terry_Kennedy (Oct 7, 2010)

olav said:

> Now it happened again. Here is a photo of the monitor of what I did in single user mode, and with output of ctrl+t.


To expand on what phoenix said, the box is still running. In particular, the state of the zpool command is "runnable" with 0% CPU.

Most of the ZFS user commands actually trigger kernel routines that do the actual work. For example, a scrub command will return to the prompt after a brief period, even though the scrub is still running.

If your drives have visible activity LEDs, do they show any activity?

Since you're in single user mode, you can't check on the status from another session, but you might be able to background the process:
`# ^Z`
`# bg`

If that doesn't work and you wind up rebooting the box (I wouldn't suggest a reboot if not absolutely necessary, as it may cause further problems with your zpool), you can execute the command in the background by following it with an ampersand:
`# zpool export tank &`

Potentially useful commands to see what is going on:
`# top -S`
`# systat -v`
`# zpool iostat 1`

Note that the last of those 3 commands may hang, if it gets held up waiting for a ZFS lock (due to the export in progress).
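As a rough sketch of the backgrounding idea above, a small helper could print the state of the backgrounded command at intervals, giving the same kind of information as CTRL+T without tying up the terminal. The function name `sample_proc` and its arguments are made up for illustration, and it only helps if the system can still start new processes while the export is stuck.

```sh
# Hypothetical helper (not part of FreeBSD): print the state of one
# process a fixed number of times, pausing between samples.
#   $1 = PID to watch
#   $2 = number of samples (default 10)
#   $3 = seconds between samples (default 5)
sample_proc() {
    pid=$1
    count=${2:-10}
    interval=${3:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        # Stop early if ps fails, e.g. because the PID no longer exists.
        ps -o pid,state,wchan,command -p "$pid" || return 1
        sleep "$interval"
        i=$((i + 1))
    done
}
```

After starting `zpool export tank &`, it could be called as `sample_proc $! 12 5` to take twelve samples five seconds apart. A wait channel that never changes, with no disk activity, would suggest the command is blocked rather than working.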

Once you've pinned down what is keeping the system busy, people can suggest more detailed ways of investigating that one area.


----------



## olav (Oct 7, 2010)

Now I've waited over 24 hours without anything happening.

I can't use Ctrl+Z, and if I run:
`# zpool export tank &`
then I can type another command, but it will not execute; the system hangs waiting for the ZFS lock/export.

I see no activity on the HD LED.

Is there anything else I can help with? If I disable ZFS in rc.conf I can boot the system just fine.


----------



## wu (Aug 23, 2011)

*SOLVED: same problem*

My UPS failed a couple of days ago, and my FreeBSD server (8.2-RELEASE) lost power while it was running.  I powered it back up, and it hung when trying to mount the ZFS filesystems.  This is a raidz2 pool with four 2 TB disks that had been problem-free for months.  The raidz2 was built using this procedure:

http://forums.freebsd.org/archive/index.php/t-21644.html

I commented out zfs_enable from /etc/rc.conf and then I was able to boot up.  Running *kldload zfs* worked fine, but then when I ran any ZFS commands, the disk lights would come on for a few seconds and then go off, and the system would become mostly non-responsive.  ctrl+t showed that it was running zfs, but no cpu was being used, and that the load continued to increase over time.  I could toggle between the virtual consoles or hit ctrl+t, but beyond that, it would not accept any keyboard input.  It responded to pings but was otherwise non-responsive, and my ssh connections would just freeze.  I tried running *zpool export* and it froze, and I was worried that might have caused some additional problems.

I booted using the latest OpenSolaris live CD, and that told me that 3 of the 4 disks had 'corrupt data'.  That was really bumming me out since I only needed 2 disks to recover.  :/

I upgraded to 8-stable, and everything came back online and problem-free.  Yipee!!!  I have not yet done the *zpool upgrade*, but plan to do that after I get everything backed up properly.


----------



## dave (Sep 25, 2011)

I am in the same situation, but I do not want to upgrade to STABLE, because I want to stay on the RELEASE path.  I would freebsd-update to 9.0-BETA2, but it is not available because engineering wants people to test the fresh install.  What to do?  What is the best way for me to get a newer version of ZFS on my 8.2 system?


----------



## dave (Sep 26, 2011)

To follow up, here's what I did...

First a little background:  This all started when I accidentally rebooted a machine while it was in the middle of a zpool replace process.  The machine then began to exhibit the behaviour described above by the OP.

After a long wait to see if the machine would come back, and lots of head-scratching, I finally shut the machine down hard.  I downloaded the 8.2-STABLE livefs iso, booted that and entered FIXIT in liveCD mode.  Then...


```
Fixit# kldload /mnt2/boot/kernel/opensolaris.ko
Fixit# kldload /mnt2/boot/kernel/zfs.ko
```

I was able to recognize the pool, so...


```
Fixit# zpool import tank
cannot import 'tank': pool may be in use from other system, it was last accessed by [snip: host and date]
use '-f' to import anyway
```

At this point, I had very little to lose, so I...


```
Fixit# zpool import -f tank
```

...and it works!  Pool imports and begins resilvering again.  Once it was finished...


```
Fixit# zpool export tank
```

...and shutdown, remove replaced drive, and boot back into 8.2-RELEASE.
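The "once it was finished" step can be scripted instead of checked by hand. The sketch below polls `zpool status` until it stops reporting a resilver. The function name `wait_for_resilver` is invented here, and matching on the text "resilver in progress" is an assumption about the status output of the ZFS version in use, so check what your own `zpool status` prints first.

```sh
# Hypothetical helper: block until `zpool status` no longer reports
# a resilver for the given pool.
#   $1 = pool name
#   $2 = seconds between checks (default 60)
wait_for_resilver() {
    pool=$1
    interval=${2:-60}
    # Give up right away if the pool cannot be queried at all.
    zpool status "$pool" >/dev/null 2>&1 || return 1
    # ASSUMPTION: an active resilver shows up as the text
    # "resilver in progress" in the status output.
    while zpool status "$pool" | grep -q "resilver in progress"; do
        sleep "$interval"
    done
}
```

From the Fixit prompt this could be combined with the export as `wait_for_resilver tank 60 && zpool export tank`. If `zpool status` itself hangs, as described earlier in the thread, this loop hangs with it.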


----------

