# ZFS raidz2 keeps freezing 8.0 64bit system



## big_girl (Oct 10, 2010)

Hi all,

First, I initially posted about this at http://forums.freebsd.org/showthread.php?t=4623 but the problem is apparently not related to kvm_getenvv; rather, it seems related to ZFS, so I hope it belongs here.

I created one zpool approximately 3 months ago on 8.0 x86_64: a six-disk (1TB WD1001FALS) raidz2 pool, using ZFS v13. I also use GNOME. Starting a couple of weeks ago, typically during very long rsync transfers to computers across a local switch, I began to experience system hangs where the whole machine froze partway through the transfer. The other machine was fine. This has gotten worse, to the point where the system will totally freeze within seconds of mounting the ZFS without even doing anything with it. The pool sits on top of geli-encrypted volumes. I ran a scrub yesterday but the system froze about 3/4 of the way through, after approximately 6 hours.
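For reference, the pool was created along these lines (a from-memory sketch; the device names and key path here are placeholders, not my actual ones):

```shell
# Sketch of the setup: six geli providers backing one raidz2 pool.
# Device names and the key path are placeholders.
for d in ad4 ad6 ad8 ad10 ad12 ad14; do
    geli attach -k /path/to/tank.key /dev/$d
done
zpool create tank raidz2 \
    ad4.eli ad6.eli ad8.eli ad10.eli ad12.eli ad14.eli
```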

I just did a clean install on a separate disk with 8.0, and got the same problem. It also happened with 8.1 on an SSD.

At the crash in the fresh install of 8.0, after attaching the geli volumes and typing

```
zfs import {raidz2.volume}
```

the system hangs and the error message I see is:


```
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address = 0x28
fault code = supervisor read data, page not present
instruction pointer = (didn't copy here)
frame pointer = ""
code segment = ""
processor eflags = interrupt enabled, resume, IOPL=0
current process = 15 (txg_thread_enter)
trap number = 12
panic: page fault
cpuid = 3
Uptime: 9m4s
Cannot dump. Device not defined or unavailable.
Automatic reboot....
```


Then she freezes..

I also ran memtest but got no errors. In the machine are also a SCSI card and an LSI 8port SAS controller.

As Galactic_Dominator suggested in the other thread, I examined the drives for physical damage. I used WD 'Data Lifeguard Diagnostic Tools' and did the 'quick test' because it doesn't make any changes but no errors were found. I could do the more extensive test but would rather not since the drives are all pretty new (< 1 year) and it potentially makes changes if it finds errors. I'm afraid changes might make volumes unusable; but then again, two of them could be lost and the volume could still be recovered..

Any help on how to import this ZFS volume would be MUCH obliged!!

Thanks in advance,
-bg


----------



## big_girl (Oct 11, 2010)

Update - removing the LSI card and booting the system disk from USB via a SATA-to-USB adapter also froze within seconds after geli-attaching the six volumes and typing

```
zpool status -x
```

Funny thing is, it quickly returned 'All pools are healthy' before it froze. I worked with it for 20 minutes or so before attaching those volumes, ran fsck, etc., but as soon as that ZFS volume was touched, the whole thing froze.


----------



## Galactic_Dominator (Oct 11, 2010)

I specifically didn't suggest memtest because it can only show that memory is bad.  It cannot prove it is good.  There are certain memory errors memtest cannot detect.  Please use the procedure I gave earlier to validate your memory.  It's possible the ZFS/GELI combo is stressing your memory in ways normal operations don't.  As I said earlier, it looks like a hardware issue.  If there is any way you can take the drives to another system, I suggest you try it there.  If you have another computer, you could boot from an mfsbsd CD, import the pool, and see what happens.  If it works, you've at least ruled out a large set of possibilities.  

Also, an extended hard drive test isn't usually destructive.  I can't remember all of them, though, so I can't say for certain on your setup.

It may come down to having the crash dump analyzed.


----------



## big_girl (Oct 11, 2010)

Thanks for following up -- I swapped out the memory into two sets and got the same freeze. 

There were some issues with the WD1001FALS drives needing a utility to adjust the timeout (to make it longer), but I never did this, and I didn't find anything recommending it either way for FreeBSD + ZFS.

I'll move the pool over to another system and see what happens.. it will be a few hours before I can attempt this.. 

Thanks again,
-bg


----------



## Galactic_Dominator (Oct 11, 2010)

> big_girl said:
>
> Thanks for following up -- I swapped out the memory into two sets and got the same freeze.



Well that pretty much rules out RAM then, but still would be nice to see what happens on another system.  CPU, L1-2 cache and memory controller are still in play.



			
> big_girl said:
>
> There were some issues with the WD1001FALS drives and needing to use a program to adjust the timeout (to make it longer) but I never did this and I didn't find anything recommending this either way for freeBSD + ZFS.


I believe you're talking about the wdidle utility, and that in theory should not have anything to do with the issue you're seeing.  All that does is prevent the drive from parking so often.  Maybe you mean something else.

If all that fails, try the STABLE mailing list.  There are some good ZFS/SMART people there who may have better ideas.


----------



## big_girl (Oct 11, 2010)

I wonder if there's any utility in swapping out one of the six disks in the RAIDZ2 and seeing if I get the freeze? Very little extra work besides the reboots.. and would rule out a damaged HD?

Or a waste of time?

Thanks again,
-bg


----------



## Galactic_Dominator (Oct 11, 2010)

I think that would work.  You can try it before you try the pool elsewhere.  Worth a shot I guess.  I don't think I've ever seen a bad HD cause a panic, but then again I've never used your type of setup either.

EDIT:  I should have said I've never seen a bad HD cause a panic when it's part of an abstracted redundant device like gmirror or raidz.  A single-drive setup with a disappearing device of course can and does cause a panic.


----------



## big_girl (Oct 12, 2010)

Nope, no luck. Omitted two drives at a time, in each of three combinations, and got the freeze. Omitting all drives gave no freeze. 

Next to another system..


----------



## big_girl (Oct 12, 2010)

Don't know if this is helpful or not, but on the generic 8.0 R4 64bit kernel, I get the same instruction pointer for each freeze, which is 


```
0x20:0xffffffff80e662d3
```
The attributes of the code segment are also the same each time, while the stack and frame pointer addresses vary. 

Per http://www.freebsd.org/doc/en/books/faq/advanced.html one can use `nm` to get more info about the calling function, but I am unsure of how to get the kernel name, so I haven't succeeded with `nm`.
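From what I can tell, the booted kernel is normally /boot/kernel/kernel, so the lookup would go something like this (a sketch of the FAQ procedure; I haven't gotten it to work yet):

```shell
# Sort the kernel's symbols by address, then find the largest symbol
# address that is <= the instruction pointer from the panic message.
nm -n /boot/kernel/kernel | less
# e.g. search for the prefix of 0xffffffff80e662d3 and look at the
# symbol just before that address; that is the faulting function.
```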

Thanks,
-bg


----------



## big_girl (Oct 13, 2010)

Unfortunately on another system the exact same problem occurs shortly after importing the volume. 

The last thing before trying the disks on a different system was an attempted scrub a couple of nights ago, which crashed after getting nearly finished. When I imported the RAIDZ2 ZFS volume on the new system just now, after typing

```
zpool status
```

it informed me the scrub had resumed, but then I got the same freeze and error message on the screen again, with the same instruction pointer address, about 1 minute after importing. 

This was also on 8.0 R4 64bit. 

Thanks,
-bg


----------



## big_girl (Oct 14, 2010)

Pure excitement -- my laptop, a Lenovo G530, is also running 8.0 64bit, and appears to be showing the early signs of the same problems. I've got a geli-encrypted USB disk I connect to it, and also a geli-encrypted partition, both with ZFS volumes.

Either ZFS v13 on 8.0 doesn't really work, or I'm making systematic errors that cause all of my volumes to eventually produce system hangs that are not recoverable.

Please help. 

Love always,
-bg


----------



## Galactic_Dominator (Oct 14, 2010)

There have been a lot of ZFS improvements since 8.0.  I didn't suggest it earlier because you indicated you are worried about data loss, but upgrading to STABLE would be a logical step to see if any of those improvements resolve the issue.  I seriously doubt such an upgrade would eat your data, but you never know.  Otherwise, take it to the stable or fs mailing list; people there are far more expert in the area.


----------



## big_girl (Oct 14, 2010)

Word. I can easily dd the disks to duplicates in case of doing something risky, but the 2nd thing I tried previously was installing 8.1, then importing the RAIDZ2 volume. Same freeze.

I definitely feel like the data is there and is ok (plus I have a recent backup) but there's definitely something I'm missing..

Thanks,
-bg


----------



## Galactic_Dominator (Oct 14, 2010)

This may also help:

PR kern/117158


----------



## big_girl (Oct 14, 2010)

Thanks for this - I have always set

```
geli_autodetach="NO"
```

in /etc/rc.conf from the beginning, at install time, so that's likely/hopefully not it (including on the laptop and the other system I used to test the other day). 

But I'm pretty convinced I'm bunging something up, so here's more info:

Since the main system (the one with the 6x1TB WD1001FALS RAIDZ2 volume; this is the one that usually panics within a minute of issuing a zpool command, although if I run `zpool scrub` right away after geli-attaching the six volumes, it has gone for approx 6 hours, almost to completion) has 8GB RAM, I didn't tune the ZFS (v13) at all. 

I never decrypt/mount at boot time. I boot up the computer into gnome, start a root terminal session, decrypt the key (on a UFS2 USB stick), use the key to attach the volumes, unmount/destroy the key, and finally mount the zfs. 
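Concretely, each session looks something like this (paths and device names here are placeholders, not my real ones):

```shell
# Manual attach/mount routine (sketch; key path and devices are placeholders)
mount /dev/da9 /mnt/key                    # UFS2 USB stick holding the key
for d in ad4 ad6 ad8 ad10 ad12 ad14; do
    geli attach -k /mnt/key/tank.key /dev/$d
done
umount /mnt/key                            # key stick removed after attach
zfs mount -a                               # finally mount the pool
```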

I don't have any entries in fstab for any of these volumes. 

I'm trying to brainstorm anything else that's typical of how I've been setting up these geli/ZFS volumes. 

Does anything here stand out as a potential point of failure? Like I said, I did try once with an 8.1 install to import the pool, but appeared to get the same freeze requiring a hard restart. 

Thanks again,
-bg


EDIT - a couple of other thoughts. The freezes on the big volume also coincided with my switch from scp to rsync for fairly large (50GB-800GB) file transfers over a local network switch (if you google rsync, ZFS and FreeBSD there are apparently some issues). I think the last crash came when doing the biggest transfer so far, which was supposed to be 800GB and crashed after about 350GB. It was also to a ZFS filesystem within the zpool; otherwise, everything else in the zpool is in plain folders (not separate filesystems).

I usually set ownership to root and permissions to 400 within the zpool (which should include the ZFS filesystem in it, as well as all the folders created directly in the zpool folder which mounts at / when I type zfs mount), set ownership of the ZFS filesystem to me with permissions of something like 755, and do the transfer as me (not root). 

The other thing I can think of is that, also during a large file transfer, maybe a week or so before the problems started, my stupid cousin came to visit and started pushing buttons on my computers; she rebooted my FreeBSD box with the RAIDZ2 array while it was transferring files to a backup zpool (a 4 x 1TB RAIDZ) via either rsync or scp.


----------



## Galactic_Dominator (Oct 14, 2010)

I don't think there's anything wrong with the setup; it seems logical and definitely shouldn't cause a panic.  I use GELI/UFS volumes in a similar manner.  

Perhaps export the decrypted GELI over iSCSI, then connect to the iSCSI target from an OSOL host and import the pool.  See what happens.

The other system you connected the pool to earlier, did it have the identical SATA controller?


----------



## big_girl (Oct 14, 2010)

Yes, for the other system I used the same LSI card; on the main system where the volume lives I've used either the onboard (Intel G35) or the LSI card. 

For the card I added

```
mpt_load="YES"
```

to /etc/rc.conf recently, after the problems started, but had forgotten to do it previously. 

It will take a while, but that is a good idea to decrypt and export the volumes to new disks. That way I can see whether the issue stems from geli or from ZFS.  

Thanks,
-bg


----------



## Galactic_Dominator (Oct 14, 2010)

You can try setting kern.geom.eli.debug; 3 is the most verbose.

`# zdb` may also provide some info.


----------



## big_girl (Oct 14, 2010)

*dnode.c*

geli-attaching the disks with verbosity showed nothing awry; then (still as root) running `zdb -v tank.zfs` ran for a while, printed out a bunch of file names and attributes, then produced this error (but no freeze!):


```
Assertion failed: (size <=(1ULL << 17) (0x2c0000 <= 0x2000)), file 
/usr/src/cddl/lib/libzpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dnode.c, line 264.

Abort
```

Trying to reproduce the error now; then I will load 8.1-RELEASE and run `zdb` again.

-bg


----------



## big_girl (Oct 14, 2010)

Incidentally, it appears that *maybe* the issue with the ZFS volume on the USB disk had to do with the 'path=' often being incorrect; i.e., when I typed `zdb` it was revealed that the 'path=' parameter for this volume was set to 'da0.eli' when the disk was actually at 'da1.eli'. Typing `zdb <usb zfs disk name>` threw an error about not finding it. Unplugging and replugging so it was back on 'da0.eli' allowed `zdb <usb zfs disk name>` to run to completion and report no errors. 

But I can't help thinking that issues with unmounting and recent hangs might be related in the case of the USB drive. 

The problematic RAIDZ2 volume had 'path=' set correctly.

-bg


----------



## big_girl (Oct 16, 2010)

Alright, as previously running `zdb -v tank.zfs` produced the error I printed, I then ran `zdb tank.zfs` on the same RAIDZ2 volume again (I omitted the verbose argument this time) in an attempt to reproduce the error. 

However, instead of running for a short while (in minutes) as previously, then crashing, this time it ran for approximately 36 hours, then finally bonked with this error:


```
Assertion failed: fsize <= (1ULL<< 17) (0x15ce800 <= 0x2000)), 
file /usr/src/cddl/lib/zpool/../../../sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c, line 422.
Abort
```

For the heck of it, I then tried `zpool export tank.zfs` (without mounting the ZFS volume, as I cannot without generating a panic and system freeze), but then got the

```
Fatal trap 12: page fault while in kernel mode
```

error after a few seconds. This time it didn't freeze up but rather rebooted. 

I wonder if the next step might be trying 8.1-RELEASE and the `zdb` command?

Thanks again,
-bg


----------



## Galactic_Dominator (Oct 16, 2010)

Like I said earlier, 8-STABLE is a better choice.  You can then try upgrading to zpool version 15 and pick up all the other ZFS fixes/improvements since then.  Although I'm not sure about the wisdom of advising you to do the upgrade when a scrub will not successfully complete, but then again, given where it's at now...


----------



## big_girl (Oct 16, 2010)

Ahh, I see -- before your post I mistakenly assumed 8.1 RELEASE was what I wanted but clearly I want STABLE. I'll use this month's 8.1-STABLE amd64 snapshot from ftp://ftp.freebsd.org/pub/FreeBSD/snapshots/201010/ and give it a shot.

And as I said earlier, I do have a recent backup so it's not a huge issue if the RAIDZ2 gets mangled.

After booting into 8-STABLE, do I have to `zpool import` the pool BEFORE upgrading it to v15? Also, is there any wisdom in trying `zdb` or any other troubleshooting on this pool first (before import/upgrade) after booting 8-STABLE?

And as I mentioned previously, I have one zfs filesystem within the zpool; it was my understanding that the zpool should be upgraded first, and then any separate zfs filesystems within the pool also need to be upgraded separately.
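In other words, my understanding of the order is roughly this (pool name is a placeholder):

```shell
zpool import tank     # the pool must be imported before it can be upgraded
zpool upgrade tank    # upgrade the pool itself to the new on-disk version
zfs upgrade -r tank   # then upgrade each ZFS filesystem within the pool
```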

DD - thanks for your comment about the [cmd] and [code] tags -- I'll be more careful.

Thanks and all the best,
-bg


----------



## big_girl (Oct 17, 2010)

Getting killed here.. Same exact issue (Fatal trap 12) unfortunately after running 8-STABLE (ZFS v15) and attempting to import the zpool. 

Will run `zdb` again on this zpool and report how it crashes.. 

-bg


----------



## big_girl (Oct 17, 2010)

`zdb` won't run as it doesn't know about the pool..


----------



## danbi (Oct 19, 2010)

What is your setting for vm.kmem_size?

In the days of 8.0 I was routinely adding

```
vm.kmem_size="12G"
```
to /boot/loader.conf, as otherwise weird things happened under load with ZFS. This was for an 8GB RAM system. Now, with -STABLE, a few systems run without any tuning and no crashes.

It looks like you have some ZFS corruption, as you are getting those asserts from zdb. Perhaps a good idea is to rebuild and re-populate your pool.


----------



## big_girl (Oct 19, 2010)

Thanks for that. I saw your post but hadn't tried it.. unfortunately it gave the same (Fatal trap 12) error upon 
`# zpool import -a`

Since the zpool started having these problems after a really big rsync transfer (thinking more about it, there was a 93GB file in the bunch) I was actually wondering if I might be running out of memory and then swap, but maybe not..?

Prior to trying that, I also tried removing the /boot/zfs/zpool.cache file and importing again; the command executed, returning me to the prompt, but then the system froze a few seconds later, also with the same error. However, this time it also printed a

```
bufwrite: buffer is not busy???
```

error to the console as part of the Fatal trap 12 error.

The Gravity Test is looking more and more appealing..


----------



## Galactic_Dominator (Oct 20, 2010)

Perhaps it is the livelock issue mentioned here:

http://lists.freebsd.org/pipermail/svn-src-all/2010-October/030158.html


----------



## big_girl (Oct 20, 2010)

That does seem like a good match. I'm running 8.0-RELEASE-p4, and looking at /usr/src/sys/geom/eli/g_eli.c I see its version is 1.44.2.1.2.1, dated 10/25/09, so I assume my compiled version is the same? Since this is from before the 4/15/2010 change, unless I'm missing something, it couldn't cause this error (livelocking)?

Thanks,
-bg


----------



## danbi (Oct 20, 2010)

Have you tried booting recent OpenSolaris and trying to import the pool?


----------



## big_girl (Oct 20, 2010)

> danbi said:
>
> Have you tried booting recent OpenSolaris and trying to import the pool?



That may be what I'll end up doing.. it will be sort of a pain since I will have to decrypt the volumes with geli first. 

Am I right to assume that the compiled version of the corresponding system file is the same as the version of the source file /usr/src/sys/geom/eli/g_eli.c in my 8.0-RELEASE-p4 install? I'm not sure how to get the version otherwise..
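One thing I may try is ident(1), which prints the $FreeBSD$ version tags embedded in source files (and sometimes in compiled objects, if the tags survive the build):

```shell
# The tag in the source file definitely exists:
ident /usr/src/sys/geom/eli/g_eli.c
# This may or may not show anything, depending on whether the
# version tags survived compilation into the module:
ident /boot/kernel/geom_eli.ko
```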

Thanks!
-bg



EDIT: I installed 8.0-RELEASE without updating (from the DVD) and had the same problem. Thus I can rule out the livelock issue.


----------



## big_girl (Nov 8, 2010)

*A couple of other thoughts...*

Sorry to keep kicking a dead pig, but, I added


```
vm.kmem_size="12G"
vfs.zfs.arc_max="4G"
```
to /boot/loader.conf on 8.0-RELEASE, 8.1-RELEASE, and 8-STABLE, but had the same freeze.

I also typed

```
sysctl -a | grep vfs.zfs.zio.use_uma
```
and found that on my installations of 8.1-REL and 8-STABLE, where it is tunable, this parameter is disabled (set to "0" by default). (I found some threads out there about issues with this creating system hangs if it is enabled.) So probably no issue there..

Incidentally, this zpool was probably 80% full, but another thing that occurs to me is that I've never emptied the trash, and have deleted beyond the capacity of the raidz2 zpool volume. Searching another (living) ZFS volume, I can see where my deleted files go, but again it's unclear to me whether FreeBSD/ZFS will automatically purge my deleted files or not(?). Just wondering if perhaps what I'm seeing has to do with the volume being totally full..
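(On a healthy, imported pool I'd check the fullness with something like the below; the trash path is a guess, since each desktop puts it somewhere different:)

```shell
zpool list                           # SIZE/USED/AVAIL/CAP per imported pool
zfs list -o name,used,avail,refer    # per-filesystem space accounting
du -sh /tank/.Trash-*                # GNOME-style trash is just a directory;
                                     # ZFS never purges it on its own
```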

Also, another thing that happened over time was that somehow two of the six volumes seem to have switched places on the controller.. I noticed because the geli keys I use to decrypt them before zfs mount now decrypt two in the reverse order. They do decrypt OK, but I was wondering if this might potentially wreak havoc on my zpool?

At this point most of this is academic as I have a pretty recent backup of everything and it seems like there's no way this volume will EVER come back online. The main thing I'd actually be interested in is simply getting a list of files and directories created/modified after a given date, so that I can make sure I don't actually lose anything (or at least know what I have lost) -- if there is a way to get this using ZDB or similar (?), that would be really helpful.. 

Incidentally, are there any resources for how to actually use ZDB?

Thanks,
-bg


----------



## big_girl (Dec 18, 2010)

Final lesson: don't use ZFS + rsync if you value your data and time.. 

Of course, I may have to re-evaluate that down the road, but that really seems like the most likely culprit. Total bummer as I really like rsync..


----------



## phoenix (Dec 18, 2010)

ZFS + rsync works wonderfully.  Been using it since ZFSv6 hit the FreeBSD 7.0 tree at work: rsync backups of 127 remote servers every night, each into their own sub-directory, with snapshots taken every morning, then rsync'd to another server across town during the day.

Main server is now FreeBSD 7.3 with ZFSv14; secondary server is now FreeBSD 8.1 with ZFSv14.

We restore files and directories via rsync on an almost daily basis, and use Frenzy/Knoppix + rsync to restore entire servers at least once a month.

What really matters is what rsync options you use.


----------



## big_girl (Jan 6, 2011)

I just saw this post. 

Thanks for this - would you mind posting the rsync parameters you use routinely, as well as the parameters that tend to crash ZFS servers, in your experience? This would be really helpful for me. The parameters I use with rsync are simply '-av', and I've had problems with large (~500GB) transfers containing large files, such as disk images, which can individually be 80-100GB each. I don't recall any problems with smaller, quicker transfers, which led me to believe that my configuration options for ZFS memory use were probably incorrect/unsafe. I haven't been able to determine much about what has caused the crashes, and frankly, I've invested a foolishly large amount of time trying to find the reason when, for better or worse (and most likely due to my own inexperience), I don't seem to have the problem when I don't use rsync. 

In the past and currently, I've used rsync on large 64bit Fedora servers running very large (20TB) RAID volumes, both within the box and over networks miles away, and have never had any problems there with instability, yet have had unacceptably high failure rates using it on FreeBSD + ZFS, the details of which are posted earlier in this thread. 

Thanks,
-bg


----------



## phoenix (Jan 7, 2011)

Shameless plug:
Thread on rsync backups

Our current set of rsync options are:

```
--archive --delete-during --delete-excluded --hard-links --inplace --numeric-ids --partial --stats
```

Using HPN-enabled net/openssh-portable in place of base OpenSSH, with the following options (only enable None if on a network you trust, and if both ends are using HPN; the buffer is a must):

```
-oHPNBufferSize=8192 -oNoneEnabled=yes -oNoneSwitch=yes
```
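Put together, one nightly run looks roughly like this (host name and paths here are made up for the example):

```shell
# Nightly per-server backup run (sketch; host and paths are examples only)
rsync --archive --delete-during --delete-excluded --hard-links \
      --inplace --numeric-ids --partial --stats \
      -e "ssh -oHPNBufferSize=8192 -oNoneEnabled=yes -oNoneSwitch=yes" \
      root@server01.example.com:/ /backups/server01/
```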


----------



## big_girl (Jan 6, 2012)

Thinking more deeply about this issue over the last year (but not having the time to rebuild this server; thankfully it was backed up right before death & I'm too busy), I doubt FreeBSD or its ZFS implementation (or their initial, ZFS-related parameters) were to blame at all. In fact, FreeBSD is a beacon of stability in a turbulent world. 

There are several likely possibilities, each of which could have killed my zpool:

1) The power supply might have reset at high, sustained loads - I wouldn't recommend http://www.newegg.com/Product/Product.aspx?Item=N82E16817148040 After more consideration, a good power supply is never a bad deal. The high-end ones also have very long warranties, like 7 years. The cheap ones break right after the warranty ends. Consider the $$$/year.
2) Disks might have switched IDs. As this was my first time with FreeBSD and also a hodge-podge of leftover parts (except for the ZFS disks and power supply, which I had to buy), I swapped multiple controllers in this machine and suspect that two disks might have swapped at some point. I also used multiple system disks, connected to different controllers at various times. I'll be hardwiring SCSI IDs in the future.
3) Apparently pre-existing, weird behavior from the Fedora machines I was transferring to/from. Does anyone else have weird problems when transferring big files over a cheap switch between Fedora boxes? When the same machines are transferring data from a USB connection (as opposed to SATA) this does not happen (still over the same cheap switch). I'm talking about ~400GB and more data, mostly music files and large (80GB) disk images, all in the same rsync run. 
4) A few power outages, like the one due to my idiot cousin (& that ratty power supply). 
5) Untweaked rsync & ssh parameters.
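(On point 2, for next time: labeling each provider up front would make everything immune to controllers renumbering the disks. A sketch with placeholder names - and note the caution in the comments, since glabel and geli both store metadata in the provider's last sector:)

```shell
# Label a FRESH disk BEFORE geli init; glabel and geli both write
# metadata to the last sector, so never label an existing geli provider.
glabel label disk0 /dev/ad4
geli init -K /path/to/tank.key /dev/label/disk0
# From then on, attach by label so controller order no longer matters:
geli attach -k /path/to/tank.key /dev/label/disk0
```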

Fortunately because of the backup I can just consider this a pretty good initial experience. 

-BG


----------



## tnpimatt (Jan 6, 2012)

Hey big_girl,
I read through your ordeal with a bit of nostalgia. In 2008 I built a pair of ZFS servers for doing backups of Linux systems via rsync.  Read my post for some background. In short, I was running 3 concurrent rsync processes. Why 3?  Because empirical testing proved that 3 concurrent rsync processes could saturate my ZFS pools; more than 3 put more memory pressure on the system and slowed everything down.

Each rsync process was backing up a Linux VPS to my ZFS-based backup servers. For moving around that much data, rsync + SSH was a non-starter; it just could not move enough data across the network to get 8,000 servers backed up in less than a week. We had a private network available, so I pushed all the backup traffic across that network unencrypted, using rsyncd. I didn't have to tweak rsync params at all. IIRC, excepting some minor tweaks for monitoring and reporting, I used the default rsync options that rsnapshot defaults to.  

I highly doubt #5 was related to your problem. One thing I can tell you though is that ZFS on FreeBSD behaves much, much better when running atop real RAID controllers. I had all sorts of problems with ZFS on OpenSolaris and FreeBSD 7 during initial testing. With 24 disks across 3 controllers, performance and stability were both terrible. When ZFS only had to stripe data across two 12-disk RAID volumes with each controller having 1GB of BBWC, it performed fairly well.


----------



## big_girl (Apr 2, 2012)

6) Make sure your outlets are wired correctly. My landlord is less than worthless, and I recently discovered that the circuit powering most of my equipment was wired by an amateur electrician and was not properly grounded. After getting shocked and having a computer fried (not the same server as with the ZFS, for better or worse) one day not too long ago, I bought a $3 outlet tester and discovered this. Always test outlets before plugging into them. This might have contributed to instability.

So I got back into messing around with this box over the last few days... I bought a nice, efficient power supply with a 7yr warranty, rated at 910 watts with 12 SATA connectors. I would have preferred to get bona fide server equipment but I need frequent physical access to the machine and could not tolerate the noise. 

I installed 9.0-RELEASE and set up everything as noted before in this thread. The idea was to chuck this pool and create a new one from a backup I have from just before the pool became unusable.

For the hell of it, I decided to try to import the pool with my new setup before erasing the disks. As is typical, I got some interesting results. After I typed the import command, CPU usage went almost to 100% for a few hours before the system froze. I rebooted again, decrypted the volumes, and `zpool status -x` returned some information. Paraphrasing (except where quoted), this is what it said:

State said the pool was online.
Status said the pool was older and should be upgraded.
Action said to upgrade the pool since it was older.
Scan said scrub in progress since when I tried to import this pool (which was last night). "3.78T scanned out of 3.85T at 1/s, (scan is slow, no estimated time)
0 repaired, 98.32% done"
Config showed the pool correctly with no errors.

Then the system froze.

I rebooted again, decrypted the disks, and once more typed

```
zpool status -x
```

Then it froze instantly, stayed frozen for a second, and then rebooted. There wasn't anything that seemed to be related in /var/log/messages.

I ran `zdb -v tank`, which ran for a few minutes, then threw a nearly identical error to what I described earlier in this thread (the dnode.c error, see post 19) and returned me to the command line without any freeze or crash. Running it again causes the same crash/dump at exactly the same point (or at the same file, judging from the `zdb` output).

I then upgraded [successfully] to v28 a second before the system froze and then rebooted. Running `zdb` again on the upgraded pool bonked on exactly the same file as before. The `status` command then returned 'All pools are healthy' before she froze a few seconds later and rebooted.

What seems to be the case here is that there is a file or region on this raidz2 volume that chokes ZFS. E.g., each time I use the `status` command, it appears to cause ZFS to resume its scrub where it previously left off (at 98.32%); then it hits the rough patch and causes a kernel panic. Since this seems to correspond to a file or region, is there any way I can tweeze this file or group of files out for removal?

It seems like the pool is mostly fine and that I might be able to recover it, but then again I've spent countless hours already, making me think I should probably kill this f-ing thing once and for all and start over.. 
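If I do poke at it more before wiping, I may try drilling into whatever object `zdb` dies on, something like the below (the dataset name and object number are placeholders):

```shell
# -e examines the pool without importing it; -dddd dumps object details.
zdb -e -dddd tank
# Once the failing object number is known, inspect just that object:
zdb -e -dddd tank/somefs 12345
```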



EDIT: I was able to then export the pool, but now the system freezes upon import.


----------

