# Cloning a failing ZFS drive



## Deke (Feb 14, 2011)

Hi, 

I've had a bit of a disaster with my ZFS pool. It's a 3-drive pool; one of the drives failed and was removed and sent for RA. A few days later a second drive in the pool failed.

I'm now trying all my best data-recovery tricks (the drive has issues powering up, so I'm going to try freezing it). I'm hoping to clone it to another drive using something like Ghost, in the hope that it will run long enough to do this and that ZFS will accept the cloned drive.

Has anyone done this before? Will I run into problems getting ZFS to accept the cloned drive as the original drive?

If not, is there anything else I should be trying?


----------



## SirDice (Feb 15, 2011)

I'm assuming that by 3-drive pool you mean a RAIDZ? When one of the drives fails you don't just remove the drive, you replace it.

Since there's now a second broken disk, it's time to check if your backup is working.


----------



## danbi (Feb 15, 2011)

You would be better off cloning the failing drive with dd, like

`# dd if=/dev/old_drive of=/dev/new_drive bs=1m`

You may wish to consult the dd(1) man page for options on how to skip unreadable blocks etc. The new drive needs to have at least the same number of sectors as the old drive.
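
For reference, a minimal sketch of the kind of options meant here, run against scratch image files rather than real disks. The file names and the 64k block size are assumptions for illustration only; on real hardware `if=` and `of=` would be the device nodes. `conv=noerror,sync` makes dd carry on after a read error and pad the failed block with zeros, so a smaller block size loses less data around each bad spot.

```shell
# Scratch files standing in for the two drives (hypothetical names).
CLONE_DIR="$(mktemp -d)"
OLD_IMG="$CLONE_DIR/old_drive.img"
NEW_IMG="$CLONE_DIR/new_drive.img"

# Fake "old drive": 1 MiB of random data, i.e. 16 blocks of 64k.
head -c 1048576 /dev/urandom > "$OLD_IMG"

# noerror: do not stop at the first read error.
# sync:    pad short or failed reads with zeros so that every later
#          block lands at the same offset as on the original.
dd if="$OLD_IMG" of="$NEW_IMG" bs=64k conv=noerror,sync

# Compare the copy with the source; prints nothing if they match.
cmp "$OLD_IMG" "$NEW_IMG"
```

On a healthy source the copy is byte-identical. On a failing one, the unreadable blocks come out as zeros on the clone.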


----------



## Deke (Feb 16, 2011)

danbi said:

> You would be better off cloning the failing drive with dd, like
> 
> `# dd if=/dev/old_drive of=/dev/new_drive bs=1m`
> 
> You may wish to consult the dd(1) man page for options on how to skip unreadable blocks etc. The new drive needs to have at least the same number of sectors as the old drive.



Thanks, I think I will end up going this way, as Ghost/Acronis True Image were not able to see the drive properly.

Ultimately the drive runs fine for about 30 minutes, so I think using ddrescue and having it resume from a certain sector may be the only way.


----------



## Galactic_Dominator (Feb 16, 2011)

Deke said:

> Thanks, I think I will end up going this way, as Ghost/Acronis True Image were not able to see the drive properly.
> 
> Ultimately the drive runs fine for about 30 minutes, so I think using ddrescue and having it resume from a certain sector may be the only way.



Use the tool in the base system for this, recoverdisk(1).


----------



## wblock@ (Feb 16, 2011)

Hadn't seen recoverdisk(1) before.  Not sure it'd be a good tool in some cases; when the rust is flaking off of a platter, rereading the bad spots multiple times could make it worse, or crash the heads.

The idea of dropping bad sectors from one drive of an array kind of makes me queasy.  You'd hope the RAID would rebuild that data, but it would have changed when that drive wasn't in the array: bad drive taken out of array, dd-ed to good drive, good drive replaced in array.  Will the RAID controller figure things out correctly and do the right thing?  Maybe...


----------



## Galactic_Dominator (Feb 16, 2011)

wblock said:

> Hadn't seen recoverdisk(1) before.  Not sure it'd be a good tool in some cases; when the rust is flaking off of a platter, rereading the bad spots multiple times could make it worse, or crash the heads.
> 
> The idea of dropping bad sectors from one drive of an array kind of makes me queasy.  You'd hope the RAID would rebuild that data, but it would have changed when that drive wasn't in the array: bad drive taken out of array, dd-ed to good drive, good drive replaced in array.  Will the RAID controller figure things out correctly and do the right thing?  Maybe...



Yes, further damaging the drive is a big concern here, but in this case I'm not sure there is much choice. And it isn't really any worse than dd(1), except that it will go back and continue to attempt to read the bad blocks, which is sometimes successful. You need to migrate the data somehow, so you might as well try something that will try harder at getting a pristine image. The OP didn't identify the RAID type, so it's hard to say exactly what is needed. If they are running a 3-way mirror, it's no problem. If it's RAIDZ, unless you get a perfect copy there will be data loss, maybe pool loss, depending on what data is bad. Given the OP's level of panic, I would assume it's RAIDZ, so if it were my setup, I'd run recoverdisk(1).

I did find a success story here:

http://robinbowes.com/article.php/20090420153906928


----------



## danbi (Feb 17, 2011)

In principle, ZFS should be able to recover from a partially bad disk. It should be able to reconstruct data from two or more partially damaged disks, as long as the damage does not result in loss of redundancy.

However, if the disk tends to go away after a while, you need some procedure to copy as much as you can before the disk wedges again. dd will try more than once (not dd itself, but the lower layers) to read sectors anyway. Also, with dd you can copy as much data as you can before you lose the source disk, then continue from about where it stopped.
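
A rough sketch of that stop-and-continue idea, again on scratch image files instead of real disks. The file names, the 64k block size and the "dropped out after 6 blocks" cut-off are all made up for the demonstration. On a real device you would take the number of completed blocks from the "records in" line dd prints when it stops, not from the size of the destination.

```shell
# Scratch files standing in for the failing and the new drive (hypothetical names).
RESUME_DIR="$(mktemp -d)"
SRC_IMG="$RESUME_DIR/old_drive.img"
DST_IMG="$RESUME_DIR/new_drive.img"
BS=65536

# Fake "old drive": 16 blocks of 64k.
head -c 1048576 /dev/urandom > "$SRC_IMG"

# First pass: pretend the source disappeared after 6 blocks.
dd if="$SRC_IMG" of="$DST_IMG" bs="$BS" count=6 conv=noerror,sync 2>/dev/null

# Number of whole blocks already on the copy (works here only because
# the destination is a plain file).
BLOCKS_DONE=$(( $(wc -c < "$DST_IMG") / BS ))

# Second pass: skip= moves forward on the source, seek= on the destination,
# both counted in blocks of $BS. notrunc keeps what was already written.
dd if="$SRC_IMG" of="$DST_IMG" bs="$BS" skip="$BLOCKS_DONE" seek="$BLOCKS_DONE" \
   conv=noerror,sync,notrunc 2>/dev/null

# Compare the finished copy with the source; prints nothing if they match.
cmp "$SRC_IMG" "$DST_IMG"
```

Restarting a block or two before the point where the drive dropped out is the cautious choice, since the last block reported may not have been fully written.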


----------

