# Unstoppable resilver



## simplex (Apr 30, 2012)

Hi, I have a problem with my ZFS pool on FreeBSD 8.3-RELEASE. The pool is version 15 and is composed of four disks in two mirrors. This was the situation: I had a faulty disk in the second mirror, and before I was able to replace it the other one started having problems ("Already active DMA on this device"). I fixed that by disabling DMA (now I have other errors in dmesg, but that's another problem...).

After booting without DMA, the system was able to mount the ZFS pool and the data looked OK. I then replaced the originally dead disk and started the resilver. This is the situation now:

```
pool: pr0nserv
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Apr 30 10:29:09 2012
        284G scanned out of 1.18T at 277M/s, 0h56m to go
        16.6G resilvered, 23.51% done
config:

	NAME                       STATE     READ WRITE CKSUM
	pr0nserv                   DEGRADED     0     0   108
	  mirror-0                 ONLINE       0     0     0
	    ad4                    ONLINE       0     0     0
	    ad6                    ONLINE       0     0     0
	  mirror-1                 DEGRADED     0     0   648
	    replacing-0            DEGRADED   648     0     0
	      6530854401941125969  OFFLINE      0     0     0  was /dev/ad8/old
	      ad8                  ONLINE       0     0   648  (resilvering)
	    ad10                   ONLINE       0     0   648

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x87>
```
The problem:
I have an error in the metadata and I can't get rid of it. The resilver keeps restarting again and again: if I reboot, it resilvers again, and if I run `# zpool clear pr0nserv` or `# zpool clear pr0nserv mirror-1`, the resilver restarts as well. I've removed two files that were corrupted, but I can't "fix" the metadata error. I think a scrub could fix it, but I can't scrub because the pool just starts resilvering again.

If someone knows how to fix it, please tell me.

I think a brutal way to fix this would be to copy off all the files that are on the second mirror, remove it, re-create it and copy the files back, but I would like to avoid that if possible.

Thanks.


----------



## simplex (Apr 30, 2012)

Looks like I've solved it with a `# zpool detach pr0nserv 6530854401941125969`.
Now I'm scrubbing to see if that fixes the metadata error.
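
For reference, the detach-then-scrub sequence from this post could be sketched as the script below. The pool name and GUID are the ones from the status output earlier in the thread; the `run` helper and the `DRY_RUN` variable are hypothetical additions so that the commands are only printed unless you opt in to executing them.

```shell
#!/bin/sh
# Sketch of the detach-then-scrub sequence described in this post.
# POOL and OLD_GUID are taken from the `zpool status` output in the thread.
# run() and DRY_RUN are illustrative helpers, not part of ZFS.
POOL="pr0nserv"
OLD_GUID="6530854401941125969"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

detach_and_scrub() {
    # Detach the old, offline device (referenced by GUID) from the replacing-0 vdev.
    run zpool detach "$POOL" "$OLD_GUID"
    # Start a scrub so every block in the pool is re-read and checksummed.
    run zpool scrub "$POOL"
    # Show scan progress and the list of files with permanent errors.
    run zpool status -v "$POOL"
}

detach_and_scrub
```

Running it as-is only prints the three commands; setting `DRY_RUN=0` (as root) would actually execute them.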


----------



## simplex (May 2, 2012)

Looks like it's not finished...
I've upgraded the pool to version 28, scrubbed again and cleared the errors but the metadata error is still here:

```
[root@pr0nserv ~]# zpool status -v
  pool: pr0nserv
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
  scan: scrub repaired 389K in 16h37m with 1 errors on Wed May  2 04:15:07 2012
config:

        NAME        STATE     READ WRITE CKSUM
        pr0nserv    ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ad4     ONLINE       0     0     0
            ad6     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            ad8     ONLINE       0     0     0
            ad10    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x87>
```

Does anyone know how to fix it without destroying and re-creating the pool?
Thanks.
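
To keep track of whether the error list changes between scrubs, the entries can be pulled out of the status text with a small filter. This is only a sketch: the function name `list_permanent_errors` is made up for illustration, and it assumes the output layout shown above (an `errors: Permanent errors ...` line followed by indented entries).

```shell
#!/bin/sh
# list_permanent_errors (hypothetical helper): read `zpool status -v` text on
# stdin and print each entry listed after the "Permanent errors" line.
list_permanent_errors() {
    awk '
        /^errors: Permanent errors/ { in_list = 1; next }   # start of the list
        in_list && NF > 0 { sub(/^[ \t]+/, ""); print }     # strip indentation
    '
}

# Sample input copied from the status output in this post.
list_permanent_errors <<'EOF'
  scan: scrub repaired 389K in 16h37m with 1 errors on Wed May  2 04:15:07 2012
errors: Permanent errors have been detected in the following files:

        <metadata>:<0x87>
EOF
# prints: <metadata>:<0x87>
```

On a live system the same filter would be fed with `zpool status -v pr0nserv | list_permanent_errors`.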


----------



## simplex (May 2, 2012)

I've rebooted the machine and the resilver started again.
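
A quick way to confirm after each reboot whether a resilver has kicked off again is to look at the `scan:` line of the status output. The sketch below assumes the wording used by this pool version ("resilver in progress", "scrub in progress"); the function name `scan_state` is hypothetical.

```shell
#!/bin/sh
# scan_state (hypothetical helper): read `zpool status` text on stdin and print
# "resilver", "scrub", or "none" depending on what the scan: line reports.
scan_state() {
    awk '
        /^[ \t]*scan:/ {
            if ($0 ~ /resilver in progress/)   { print "resilver"; found = 1 }
            else if ($0 ~ /scrub in progress/) { print "scrub";    found = 1 }
        }
        END { if (!found) print "none" }   # no scan running (or no scan: line)
    '
}

# Sample scan: line copied from the first status output in this thread.
scan_state <<'EOF'
  scan: resilver in progress since Mon Apr 30 10:29:09 2012
EOF
# prints: resilver
```

On the real pool this would be run as `zpool status pr0nserv | scan_state`.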


----------

