# ZFS-like self-healing: which databases have this technology?



## olav (Jul 9, 2010)

After I came across ZFS and started reading about its self-healing capabilities, I started to wonder which other technologies also use this. The first thing that came to mind was that all serious databases should have it.

So I started googling to find the answers.

The big commercial databases like Oracle, DB2, and MS SQL Server claim to support self-healing.

For Windows / SQL Server I found this
http://www.infoworld.com/d/networking/windows-server-2008-windows-also-rises-286
and this 
http://technet.microsoft.com/en-us/library/cc771388(WS.10).aspx
It says:


> Self-healing NTFS repairs file system corruption in the background, without interrupting service.


Sounds quite similar to ZFS, but is it?


As for DB2 v9, the information was very sparse:
http://www.ibm.com/developerworks/data/library/techarticle/dm-0606ahuja2/
It claims to have self-healing capabilities, but how does it work?


For Oracle I found this, quite detailed
http://www.dba-oracle.com/oracle11g/oracle11g_healthchecks.htm


> Data Block Integrity: detects disk image block corruptions such as checksum failures, head/tail mismatch, and logical inconsistencies within the block.


Self-healing is also listed as one of the main features that came with Oracle 11g. So my guess is that Oracle has self-healing capabilities similar to ZFS?

PostgreSQL does not have self-healing according to this: http://www.xaprb.com/blog/2010/02/0...inst-partial-page-writes-and-data-corruption/
But it can be used with ZFS to get the self healing feature.


----------



## kdemidofff (Jul 9, 2010)

1. Just put stuff on RAID and back it up.
2. Replication to slaves can also help greatly.

Nothing saves you from machine hardware failure or a hack except a time-delayed slave.
In a production environment (read: many gigabytes) it means the master machine is dead and customers are waiting at checkout or whatever. It can be critical to promote a hot replicated slave to temporarily hold the master role, repair the main master, switch back, and continue.

http://www.mysqlperformanceblog.com/2009/01/12/should-you-move-from-myisam-to-innodb/


> Business Continuity / Disaster Recovery using InnoDB? We use an internal process based on mysqldump and innodb file-per table.
> Our “Database File-Per-Table Archives” web site summarizes like this: “31 Hosts 90 Databases 2919 Tables 91 Dates 354456 Backup Files”
> 
> The 31 Hosts are the Master hosts only. Add another 67 slaves in groups replicating upto 6 slaves machines per group. Largest instances top out at approx 60M rows. Most tables are < 0.5M rows. Some applications use MyISAM with partitioned tables. Since every machine is a dedicated instance, mixing engines is not an issue. All DBs are fronted by memcached machines (64 instances). All applications are written to use memcache before hitting DB. We use RAID0 with multiple disks on all SLAVES. MASTER machines use RAID5. Ratio of reads-to-writes is 7,000-to-1. Collectively, about 600M DB transactions per day after peeling off 78% of the reads on the memcaches.
> ...


----------



## olav (Jul 9, 2010)

I think you are missing the point. RAID and backups are still vulnerable to bit rot.

A database usually stores very important data, which should never be modified or deleted by data corruption. ZFS prevents this, as long as you store the database on a mirror.
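To make the mirror point concrete, here is a minimal toy sketch of the idea: keep a checksum for every block, verify it on each read, and if one copy fails verification, return a good copy and rewrite the bad one. This is not ZFS code or any ZFS API. `ToyMirror`, `checksum`, and the dict-based "disks" are hypothetical names made up for illustration, and SHA-256 is just an assumed choice of hash.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Hash of a block's contents (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(block).hexdigest()


class ToyMirror:
    """Toy N-way mirror with per-block checksums. Hypothetical, not ZFS itself."""

    def __init__(self, copies: int = 2):
        # Each "disk" is just a dict mapping block id -> bytes.
        self.disks = [dict() for _ in range(copies)]
        # Expected checksums are kept separately from the data blocks.
        self.sums = {}

    def write(self, block_id: str, data: bytes) -> None:
        self.sums[block_id] = checksum(data)
        for disk in self.disks:
            disk[block_id] = data

    def read(self, block_id: str) -> bytes:
        expected = self.sums[block_id]
        good = None
        bad_disks = []
        for disk in self.disks:
            if checksum(disk[block_id]) == expected:
                if good is None:
                    good = disk[block_id]
            else:
                bad_disks.append(disk)
        if good is None:
            # Every copy is corrupt: detection still works, repair does not.
            raise IOError(f"block {block_id!r}: no copy matches its checksum")
        # The "self-healing" step: overwrite corrupt copies with a verified one.
        for disk in bad_disks:
            disk[block_id] = good
        return good
```

With a single copy (`copies=1`) the same check can still detect corruption but has nothing to repair from, which is why the mirror matters.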


----------



## sossego (Jul 9, 2010)

Minix


----------



## kdemidofff (Jul 11, 2010)

olav said:

> I think you are missing the point. RAID and backups are still vulnerable to bit rot.
> 
> A database usually stores very important data, which should never be modified or deleted by data corruption. ZFS prevents this, as long as you store the database on a mirror.



Yes, that's true, but ZFS is useless if:
1. a hacker runs dd over your disk, or an administrator (accidentally) deletes the database
2. the system hardware dies

In those cases slaves (for serving while the master is gone) and replication (ideally also delayed) can help.
You will have a new healthy machine (which can use ZFS too) as opposed to a useless healthy disk in a dead machine. A database is a system, a whole solution and not just a filesystem; it needs more integrity than just files. And if you read my post, you can see I was talking about high availability and overall integrity, not just checksums.


----------



## gordon@ (Jul 11, 2010)

Calling this "self-healing" is disingenuous. It's really just proactive health checking.


----------



## fronclynne (Jul 13, 2010)

Indeed.  fsck(8) has "corrected" away several important files that I would have liked to have kept, but at least things were "consistent" afterwards, right?  I don't see zfs as much different.  It might be able to tell when something is corrupt, but I'd rather it didn't try (cack-handedly) to "correct" it without my say-so.

Let's just say I'm dubious about the whole self-healing thing.


----------



## olav (Jul 13, 2010)

But if you disable checksums, then it won't self-heal?


----------



## bsd10 (Oct 19, 2010)

I agree with gordon that self-healing is a misnomer, but even if it weren't, it would still probably be safer to take the checksum of the backup file itself rather than of the individual blocks.
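A minimal sketch of what checksumming the backup file itself could look like, assuming you record a digest when the backup is written and compare it before a restore. The function names and the choice of SHA-256 are my own assumptions for illustration, not something any particular backup tool does.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path: str, recorded_digest: str) -> bool:
    """Compare the file's current digest with the one recorded at backup time."""
    return file_sha256(path) == recorded_digest
```

Note that this only detects a damaged backup. Unlike the mirror case, there is nothing to repair from unless you keep a second copy.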


----------

