# Backup solution for ginormous ZFS pool?



## Terry_Kennedy (Jun 4, 2010)

First, this isn't a potshot at ZFS - I've been stress-testing it heavily for some weeks now and it is the greatest thing since sliced bread. But... it is somewhat lacking in native tools for disaster recovery.

I collected a bunch of parts to make a nice large NAS - 32TB of disk, and a 48-tape LTO4 robotic tape library. After test-filling the 32TB to about 25% capacity, I then tried to use `# dump` to make a test backup to the tape library. Result: whaddaya mean, _unknown file system_? x(

After doing a number of extensive searches for any remotely relevant keywords, I found people use a number of solutions:

- Use zfs snapshots
- Use zfs send/receive
- Use amanda
- Copy small pieces at a time to another filesystem and use dump
- Just hope nothing bad happens :\

ZFS snapshots and send/receive don't address the underlying issue - having one or more  complete copies of the data in another location for disaster recovery. For a similar perspective on this, I suggest reading the SmallNetBuilder article Smart SOHOs Don't Do RAID. Despite the unfortunate title, the article is sound - RAID != backup.

I installed the amanda port and was overwhelmed by the configuration options. I contacted zmanda and asked for a quote for configuring Amanda Community Edition, and got back a quote for building a zmanda-supported port of the Amanda Enterprise client to FreeBSD and having it send dumps to a supported server platform (such as Linux). I'm not sure if I was unclear in what I wanted, or if they feel that things are best done with Enterprise on another platform - in any event, it was a non-starter. I'll attach my original message to them so you can see what (I think) I was asking for.

So, I've got this large pile of data that wants to go to tape in some relatively-sane manner. I'm certainly not committed to using Amanda - just about any open-source solution will do, and if configuring whatever it is is beyond me, I'm willing to pay for some reasonable consulting to get it done. Any suggestions?

Here's the relevant piece of the message I sent to zmanda:



> I am setting up a new fileserver using FreeBSD 8.1. The system has 32TB of disk in a multi-level ZFS structure. A single mount point of 22TB is exposed by ZFS.
> 
> I also purchased a Dell TL4000 48-tape LTO4 robotic library (this is the same unit as the IBM TS3200). This library uses barcodes on the tapes to identify media in the library.
> 
> ...


----------



## carlton_draught (Jun 4, 2010)

Terry_Kennedy said:
			
		

> After doing a number of extensive searches for any remotely relevant keywords, I found people use a number of solutions:
> 
> - Use zfs snapshots
> - Use zfs send/receive
> ...


I realize you are looking at a tape solution. I just wanted to address the "complete copies of data in another location" thing. I completely agree that using RAID or ZFS raidz or mirrors alone is not in any way, shape or form a backup. But there is nothing wrong with using zfs send/receive (along with snapshots), provided that you are sending to at least two disks (or pools) that are either already offsite or regularly taken offsite.
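
In concrete terms, the bare-bones version of that looks something like the sketch below (pool/dataset names are made up, and this is only an illustration, not the script I've been working on):

```
# Assumes a live dataset "tank/data" and a pool "backup" on the removable disks.

# First run: full replication of the dataset plus all its snapshots/children.
zfs snapshot -r tank/data@base
zfs send -R tank/data@base | zfs receive -F backup/data

# Later runs: only the blocks changed since the previous common snapshot go over.
zfs snapshot -r tank/data@weekly-01
zfs send -R -i tank/data@base tank/data@weekly-01 | zfs receive -F backup/data

# Export the backup pool before unplugging/powering off the disks.
zpool export backup
```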

I've actually been working for a month or more now on a script to make the above disk-based backup much, much easier and more routine. It is not far from release. However, I can't help you with the tapes. I will be interested to hear how this plays out, though. Do tapes have anything corresponding to a ZFS checksum? i.e. when there is the tape equivalent of a bad sector, is anything going to let you know? And is there any redundancy?


----------



## magickan (Jun 4, 2010)

Depending on how ginormous this is, how big the deltas are, and your networking capabilities, is building another ginormous ZFS pool elsewhere and syncing between the two not an option?


----------



## jalla (Jun 4, 2010)

You could possibly set up a front-end to handle your tape robot, mount your ZFS volumes over NFS, and use a backup app that supports dumping NFS-mounted partitions.

A quick search for "amanda backup nfs" indicates it should be doable with amanda.


----------



## Terry_Kennedy (Jun 4, 2010)

magickan said:
			
		

> Depending on how ginormous this is, how big the deltas are, and your networking capabilities, is building another ginormous ZFS pool elsewhere and syncing between the two not an option?


There actually will be an off-site replication server a few blocks away, connected via GigE. That doesn't solve the case of there being a disaster in this city, though, and also doesn't deal with recovering files that were deleted for good reason but later found to be needed, which would be restored from tape. Even assuming no compression, a full 22TB ZFS setup will fit onto 25 or so LTO4 tapes at a total cost of $850 or so in tapes. I can keep a very large number of backup sets for the cost of just the disk drives, let alone the whole file server.


----------



## Terry_Kennedy (Jun 4, 2010)

jalla said:
			
		

> You could possibly setup a front-end to handle your taperobot, mount your zfs volumes over nfs, and use a backup app that supports dumping nfs-mounted partitions.
> 
> A quick search for "amanda backup nfs" indicates it should be doable with amanda.


As far as I can tell, Amanda can back up from zfs directly. So I'm not sure what I'd get by going with a front-end server. But this is the same thing the zmanda folks came up with, so if I'm missing something, please let me know.


----------



## carlton_draught (Jun 4, 2010)

Terry_Kennedy said:
			
		

> and also doesn't deal with recovering files that were deleted for good reason, but later found to be needed and restored from tape.


That's the purpose behind a regular snapshotting regime - so that you can restore files that were deleted for good reason. Snapshots only take up as much space as the data that changes over time, and they also work hand in hand with sending incremental updates to your backup pool.
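
To illustrate with made-up dataset/snapshot/file names, pulling a deleted file back out of a snapshot is just a copy from the hidden .zfs directory:

```
# See which snapshots exist for the dataset.
zfs list -t snapshot -r tank/data

# Every snapshot is browsable read-only under .zfs/snapshot (assuming the
# default mountpoint), so restoring one file is a plain copy:
cp /tank/data/.zfs/snapshot/daily-2010-06-03/projects/report.txt /tank/data/projects/

# Or roll the whole dataset back to a snapshot (only the most recent one,
# unless you use -r, which destroys any later snapshots):
zfs rollback tank/data@daily-2010-06-03
```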


> Even assuming no compression, a full 22TB ZFS setup will fit onto 25 or so LTO4 tapes at a total cost of $850 or so in tapes. I can keep a very large number of backup sets for the cost of just the disk drives, let alone the whole file server.


True. The HDDs alone would be about $1400 or so, assuming USD.


----------



## magickan (Jun 4, 2010)

Terry_Kennedy said:
			
		

> also doesn't deal with recovering files that were deleted for good reason, but later found to be needed and restored from tape.


Guess it depends on what you're using to sync; rsync for example would keep the deltas, but I would look more at the inbuilt snapshotting of ZFS. It's TCP-based though and single-streamed, so not great performance, but we see it being used for this function. If the deltas aren't too bad then it can work OK.



			
Terry_Kennedy said:

> That doesn't solve the case of there being a disaster in this city, though. ... Even assuming no compression, a full 22TB ZFS setup will fit onto 25 or so LTO4 tapes at a total cost of $850 or so in tapes. I can keep a very large number of backup sets for the cost of just the disk drives, let alone the whole file server.



Fair enough, the cost of the media is pretty cheap, but you do have ancillary costs like where the data is going to be housed. TBH though, I can't see how you're going to get around spending a fair bit of money to have full incremental backups offsite and in a different city.

If you can spend the money: get a connection, either point-to-point or via a provider that doesn't charge usage rates, plug GigE in at your end with the other end at a datacenter, rent half a rack, and then use snapshots with syncing?


----------



## Terry_Kennedy (Jun 5, 2010)

magickan said:
			
		

> Fair enough the cost of the media is pretty cheep, but you do have ancillary costs like where the data is going to be housed.  Tbh thou i can't see how your going to get around not spending a fair bit of money to have full incremental backups off site and in a different city.


It isn't nearly as expensive as it seems - a bunch of LTO4 tapes and some Imation DataGuard cases, and I'm all set.



> If you can spend the money, a connection, either p2p or via a provider that doesnt charge usage rates, plug gig e in your end, other end at a datacenter, rent half a rack, and then use snapshots with syncing?


I do have multiple GigE fiber links to a site a few blocks away, where the replication will happen. As I mentioned in an earlier reply, that isn't cost-effective for multiple complete sets of backup. Hence, the tape.

I also need to guard against a software failure taking out both the primary and replicated ZFS pools. I tried to explain this to DEC when they were developing the VAXft system - software failure is more likely than hardware failure, and all that VAXft achieved was to have a couple, arm in arm, walking off the same cliff at the same time. That did not earn me many friends in that product group. The only way to achieve that level of software fault-tolerance is with an N-way voted system with different software implementations on each member. In the past, I worked on one such system, a 5-way voted system. And even then, during development and testing we had some 3-to-2 votes where 3 of the systems were wrong.


----------



## Terry_Kennedy (Jun 5, 2010)

carlton_draught said:
			
		

> I've actually been working on a script to make the above disk-based backup much, much easier and routine for a month or more now. It is not far from release. However, I can't help you with the tapes.


I'd be interested in this when it is ready - as I mentioned elsewhere in the thread, I will have an off-site system with identical hardware for redundancy.


> I will be interested to hear how this plays out though. Do tapes have anything corresponding to a ZFS checksum? i.e. When there is the tape equivalent of a bad sector, is anything going to let you know? And is there any redundancy?


I haven't actually run into a tape read error (on a restore) since the days of 9-track open reel drives. Modern drives (at least DLT and LTO) compute their own checksums and are capable of decent error recovery. Which is good, because at least with DLT8000, a hard read error would usually make the rest of the tape inaccessible.

The only errors I've had with modern tapes have been the drive rejecting brand new media - my first SDLT600 drive + tapes came with a 10-pack of defective tapes, and I'd say I have run into about a 10% out-of-the-box failure rate on SDLT600 media. That, combined with Quantum declining to honor their "lifetime media warranty", is why I switched to LTO. Once the tapes are written, though, I've never had a problem restoring from them.


----------



## carlton_draught (Jun 6, 2010)

Terry_Kennedy said:
			
		

> I also need to guard against a software failure taking out both the primary and replicated ZFS pools. I tried to explain this to DEC when they were developing the VAXft system - software failure is more likely than hardware failure, and all that VAXft achieved was to have a couple, arm in arm, walking off the same cliff at the same time. That did not earn me many friends in that product group. The only way to achieve that level of software fault-tolerance is with an N-way voted system with different software implementations on each member. In the past, I worked on one such system, a 5-way voted system. And even then, during development and testing we had some 3-to-2 votes where 3 of the systems were wrong.


This is an interesting concept, one I had not considered before. Thanks for bringing it up.

I've been thinking about this though... are you talking about a failure of the filesystem? e.g. a failure of UFS, ZFS, ext3, etc? Because if you are, the probability of failure would be something like a/x, where a is the number of reported errors due to a bug in the filesystem, and x is the estimated total number of installations. You might get a feel for "a" by googling; I'm not sure how to get x - maybe add up the Solaris and FreeBSD install estimates and divide by 2, for example. Anyway, as the filesystem matures this should asymptote to zero.

I guess there is also the possibility for corruption of the operating system components that are responsible for filesystem maintenance. Which is why I run a ZFS mirror for that too.  And there is the backup software - e.g. if you use UFS, dump/restore, amanda, rsync, whatever. Though I would suspect that they have probably squeezed the critical bugs to near zero by now as well.

Thanks for the info on the tapes btw. Will let the forum know when the script is ready for release.


----------



## danbi (Jun 7, 2010)

The output of zfs send is just a stream of bytes - effectively a file.

I have not used tapes for many, many years, but in the old days one would do sort of

`# dd if=file of=tape`

and be done with it.

I suspect modern tape drives are much smarter than older ones and a robotic system would permit you to just send a (large) file to the tape drive and not worry about filling up the tape etc.

Think of zfs send/receive as the UFS dump/restore tools and life will be much easier for you. 
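
In rough terms, that would look like the sketch below (pool/snapshot names and the tape device are assumptions; /dev/nsa0 is FreeBSD's no-rewind SCSI tape node). As comes up later in the thread, this only works if the stream fits on one tape, and you can only restore the whole stream, not individual files:

```
# Write a replication stream straight to the tape drive.
zfs snapshot -r tank@backup-2010-06-07
zfs send -R tank@backup-2010-06-07 | dd of=/dev/nsa0 bs=128k

# Restoring is the reverse - the entire stream comes back into some pool.
dd if=/dev/nsa0 bs=128k | zfs receive -F restorepool
```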

What is the speed of your tape system read/write (real life)? If it is not much faster than the Gbit connection to an archive server, you might indeed consider building a backup/archive server, even if the initial costs seem high. For such a setup, I would be more concerned about power consumption/heat dissipation of the backup solution anyway. 

Restoring from tape is really only useful when you are restoring everything. With a disk-based archive server, you can restore individual files way, way faster.
Of course, with ZFS you do not need to restore 'deleted' files from any external media if you use snapshots.


----------



## Terry_Kennedy (Jun 8, 2010)

danbi said:
			
		

> I have not used tapes since many, many years, but in the old days, one would do sort of
> 
> `# dd if=file of=tape`
> 
> ...


The backup won't fit on a single tape in any currently-existing tape format - the LTO roadmap shows LTO-8 storing 12.8TB in 2017, but even that won't hold the whole filesystem on one tape.

Tape robots aren't that smart - each tape will report EOT to the host system, and then the host does whatever it needs to close the tape and command the robot to load the next tape.

Also, a simple dd or similar doesn't meet the requirements of being able to restore a single file or the contents of a directory, only a complete restore of the entire filesystem.



> What is the speed of your tape system read/write (real life)? If it is not much faster than the Gbit connection to an archive server, you might indeed consider building a backup/archive server, even if initial costs seem high. For such setup, I would be more concerned about power consumtion/heat dissipation of the backup solution anyway.


I'm seeing about 77MB/sec (from a slower source drive - a 7200RPM UFS gmirror):

```
16840130560 bytes transferred in 216.809384 secs (77672517 bytes/sec)
```
That's using only a single drive - the library has 2 (and can hold up to 4).

While slower than GigE, you can't beat the price (and portability) of tape media.


----------



## AndyUKG (Jun 14, 2010)

If you want to use tape then clearly you need something roughly equivalent to Veritas NetBackup. I don't have experience with any open-source solutions, but from a complexity point of view I think it is much simpler to implement something using zfs send/receive, which runs in conjunction with zfs snapshots. Your only issue was that you mentioned as a drawback that your secondary system is in the same city, with the implication that this isn't sufficiently far away geographically from your primary system.

Ignoring that minor detail, zfs send/receive and snapshots give you a fully replicated environment, minimise network traffic (zfs send/receive sends only changed blocks) and give you pretty much as many historical point-in-time snapshots as you want (for those legitimately deleted files you mentioned) for recovery. Obviously, from a restore perspective it completely negates the need to call back offsite tapes, put the tape in the drive, wait for the tape to position, etc.

I haven't actually got this fully running and tested myself, but I'm working on it now...

thanks Andy.


----------



## mix_room (Jun 15, 2010)

I have no idea if it does what you need, but this might help you:
http://blogs.sun.com/ako/entry/tape_backup_for_zfs


----------



## JohnDC (Sep 5, 2010)

*Terry, any progress there..?*

I've already spent too much time on Backup Exec for just daily jobs, much less consistent or reliable jobs, so I'm looking for options ASAP.

I'm new to BSD but need to get a working FreeBSD server backing up six Windows Server 2003 clients.

I was wondering if you had worked through the setup options successfully, and what you think the level of difficulty would be for an admin who is new to Unix.

My setup is fairly simple: I plan to back up to disk on my FreeBSD server at night, then to the Arcvault12/LTO tape library during the day for retention and offsite storage, since we already have the library and media.

I really need to replace the crappy Backup Exec 2003 server and run an Amanda or Bacula backup server on my new FreeBSD server, then back up the six Server 2003 machines (two running Notes, and a couple running SQL).

Any good progress with your related work?
Thank you,
jc


----------



## da1 (Sep 5, 2010)

carlton_draught said:
			
		

> Do tapes have anything corresponding to a ZFS checksum?


Basically no - they just get and store the data, period.
There are ways of verifying the integrity of the data (CRC, for instance), but that is not a feature of the tape but of the software.



> When there is the tape equivalent of a bad sector, is anything going to let you know?


You will know the instant you cannot back up/restore to/from the tape, or when you do some internal housekeeping like migration, storage-pool backup, reclamation, move data, audit volume, etc. Once a "sector" on a tape goes bad, that's pretty much it. Sure, you can recover the data, but you would need to send the tape to a lab (costly as hell and lengthy - usually ~1 month), and by the time you get the tape and data back you may well be past your data retention period. It's soooooo cool sometimes, lol.



			
carlton_draught said:

> And is there any redundancy?


Sure - if and only if you have a copy pool/tape. This of course implies that you are using only half (normally way less than half) of your total tape storage capacity.

Lovely, ain't it?


----------



## Terry_Kennedy (Sep 5, 2010)

JohnDC said:
			
		

> Any good progress with your related work?


Not yet - summer is racing season, so I've been out all over the country in my race car. [If my profile picture/avatar would show, you'd see it 8-]

I'm almost positive that Amanda can do what I want; I just have to sit down and figure out how to turn off the parts I don't need. I found this article, which may be helpful.


----------



## Terry_Kennedy (Sep 5, 2010)

da1 said:
			
		

> basically no. they just get and store the data .. period.
> there are ways of verifying the integrity of the data (CRC for instance) but that is not a feature of the tape but of the software


Modern tape drives that anybody is going to use (LTO, DLT, and so on) have extensive error detection and correction logic.

Users of operating systems that have been around for 30+ years (like VMS) which had extensive facilities to recover data from bad tapes are starting to question the conventional wisdom of leaving those facilities enabled. Some discussion here.


----------



## AndyUKG (Sep 6, 2010)

Terry_Kennedy said:
			
		

> Modern tape drives that anybody is going to use (LTO, DLT, and so on) have extensive error detection and correction logic.
> 
> Users of operating systems that have been around for 30+ years (like VMS) which had extensive facilities to recover data from bad tapes are starting to question the conventional wisdom of leaving those facilities enabled. Some discussion here.



If it's critical to have a good copy of your backup data, you will make multiple copies to tape (just like storing critical data on disk). You have no idea when a tape might go wrong and break or whatever...


----------



## da1 (Sep 6, 2010)

AndyUKG said:
			
		

> If its critical to have a good copy of your backup data, you will make multiple copies to tape (just like storing critical data on disk). You have no idea when a tape might go wrong and break or whatever...



My point exactly.

Due to my line of work, I see it every day. The latest and greatest fail too, no matter what super-mega-ultra $#it technology they use. The best way is to have a copy of the copy.

At work we have 3 copies (1 disk + 2 tape pools (primary/copy pool)). And even so, some tapes go to hell.

Bottom line: better safe than sorry when it comes to keeping backups.


----------



## Terry_Kennedy (Sep 7, 2010)

da1 said:
			
		

> Bottom line .. better safe than sorry when it comes to keeping backups.


Indeed. That's why I was so surprised to discover that there's no simple backup solution for large ZFS pools - I would have thought that Sun (at least) would have had a proper backup utility.


----------



## da1 (Sep 7, 2010)

AFAIK, Sun has no such thing.

For all our Sun machines we have TSM (IBM/Sun libraries).


----------



## AndyUKG (Sep 7, 2010)

Terry_Kennedy said:
			
		

> Indeed. That's why I was so surprised to discover that there's no simple backup solution for large ZFS pools - I would have thought that Sun (at least) would have had a proper backup utility.



People who pay for Sun kit generally also pay (a lot) for their backup solution. Sun will be very happy to resell you Veritas NetBackup or Legato Networker, etc. As for free or open-source solutions, Sun doesn't have anything in that space.


----------



## phoenix (Sep 7, 2010)

Terry_Kennedy said:
			
		

> Indeed. That's why I was so surprised to discover that there's no simple backup solution for large ZFS pools - I would have thought that Sun (at least) would have had a proper backup utility.



They do have a solution, it's called "a second pool configuration" that you "zfs send/recv" data to.  And you create as many "secondary pools" as you need, for off-site, redundant backups.


----------



## carlton_draught (Jun 1, 2011)

Terry_Kennedy said:
			
		

> I'd be interested in this when it is ready - as I mentioned elsewhere in the thread, I will have an off-site system with identical hardware for redundancy.


Hi Terry,

You might want to check out this thread. I've finally finished a set of articles that describe what I had in mind. Maybe it is of use to you. 

Although 22TB of data is several times more than what I have, you could buy 7+2 Hitachi 3TB HDDs for a RAIDZ2 setup and it would cost $1260. I realize you probably already have a tape drive (they start at $1500 on Newegg). As a comparison, if you have 5 free SATA ports that support port multiplication, you only have to buy five eSATA docks for something like $15 each on eBay, plus some SATA data + power connectors that would cost a total of about $100 all up. For each backup HDD pool, you'd want silicone covers for each HDD ($4 per HDD) and a padded camera bag for $60-$80.

You could probably have 3 backup HDD pools before tape starts being cheaper.


----------



## Terry_Kennedy (Jun 1, 2011)

carlton_draught said:
			
		

> You might want to check out this thread. I've finally finished a set of articles that describe what I had in mind. Maybe it is of use to you.


That's quite interesting. I've skimmed it and will take some time to look at it in more detail.



> Although 22TB data is several times more data than what I have, you could buy 7+2 Hitachi 3TB HDDs for a RAIDZ2 setup and it would cost $1260. I realize you probably already have tape drive (starting at $1500 on newegg).
> 
> You could probably have 3 backup HDD pools before tape starts being cheaper.


I'm probably being old-fashioned, but I want to have some sort of media that I can restore bits and pieces from, onto any sort of filesystem. I don't want to have to depend on ZFS for both my primary data and backups.

I had a problem early on where my ZIL SSD failed (it is a PCI Express card with flash daughterboards, and one of the daughterboard sockets failed). Fortunately, I was able to copy the data off the pool to another server (plus, I had a tape backup which would have been slower to restore). Any attempt to write to the pool crashed the system (post-8.2-RELEASE 8-STABLE).

I'm running 8-STABLE + ZFS v28 on two of the 32TB boxes and 8-STABLE with the "stock" ZFS v15 on the third. It is possible that the pool with the corrupted ZIL would have been fixable if it was at v28.


```
(0:867) new-gate:~terry# df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
[snip]
rz1:/data              21T    9.5T     11T    44%    /var/www/docs/media/data1
rz1m.offsite:/data     21T    9.6T     12T    44%    /var/www/docs/media/data1m
rz2:/data              21T    9.5T     12T    44%    /var/www/docs/media/data2
```

I have a Dell PowerVault TL4000 with (currently) a single LTO4 drive. That gives me 44 slots (48 if I don't care about the import/export mailbox) worth of tapes. New-with-warranty LTO4 media is about $26/tape in reasonable quantities. According to Amanda, the average tape capacity (compression enabled) is about 820GB. That's about what I'd expect, given that the ZFS pool only gets about 1.01x compression and the LTO4 native capacity is 800GB, so it costs me something like 3.2 cents/GB. Plus, I have nice DataGuard tape cases which hold 20 tapes, to move them around to the various offsite storage sites.

22TB is the usable size of my pool, which joins 3 5-drive raidz's with a spare drive for a total of 16 drives, all 2TB:


```
zpool create -f data \
raidz label/twd0 label/twd1 label/twd2 label/twd3 label/twd4 \
raidz label/twd5 label/twd6 label/twd7 label/twd8 label/twd9 \
raidz label/twd10 label/twd11 label/twd12 label/twd13 label/twd14 \
spare label/twd15 log label/ssd0
#
zpool set autoreplace=on data
zfs set dedup=on data
zfs set compression=on data
```

Those are all WD RE4 drives. I realize that I could use cheaper drives for backup. The least-expensive 2TB drive on Newegg is currently a Hitachi, at $69.99 [the rebate doesn't help, unless they're going to give rebates on bulk purchases], which comes in at around 3.4 cents per GB - more expensive than the LTO4 tapes. Of course, that doesn't factor in the cost of the tape library - OTOH, it also doesn't factor in the cost of the additional SATA controller ports needed for the drives, plus mounting.


----------



## carlton_draught (Jun 1, 2011)

Terry_Kennedy said:
			
		

> That's quite interesting. I've skimmed it and will take some time to look at it in more detail.


Thank you.


> I'm probably being old-fashioned, but I want to have some sort of media that I can restore bits and pieces from, onto any sort of filesystem. I don't want to have to depend on ZFS for both my primary data and backups.


Fair point. That's why I recommend having 3+ backup pools. If you are reinstalling a whole system, you're starting from that article set + scripts + LiveDVD + fixed/new system, so any issues with the existing system should be a bit of a moot point. And if your current system is usable (e.g. you are just restoring an old file or something), the snapshot scheme will give you a high likelihood of finding it on your system. Of course, you'd want to thoroughly test and understand such a system before trusting your data to it.



> I had a problem early on where my ZIL SSD failed (it is a PCI Express card with flash daughterboards, and one of the daughterboard sockets failed). Fortunately, I was able to copy the data off the pool to another server (plus, I had a tape backup which would have been slower to restore). Any attempt to write to the pool crashed the system (post-8.2-RELEASE 8-STABLE).


Yes, this is why if I ever run a ZIL, it will probably be a triple mirror using something like the Intel 320 series, which has capacitors for writing out buffered data in the event of a power failure, because the ZIL is not something that can really afford to fail. L2ARC devices, by contrast, are great in that you can use a single drive with no problems at all.



> I'm running 8-STABLE + ZFS v28 on two of the 32TB boxes and 8-STABLE with the "stock" ZFS v15 on the third. It is possible that the pool with the corrupted ZIL would have been fixable if it was at v28.


Perhaps. I'm going to wait until 9.0-RELEASE before using v28, exciting though dedup may be.



> Those are all WD RE4 drives. I realize that I could use cheaper drives for backup. The least-expensive 2TB drive on newegg is currently a Hitachi, at $69.99 [the rebate doesn't help, unless they're going to give rebates on bulk purchases], which comes in at around 3.4 cents per GB, which is more expensive than the LTO4 tapes. Of course, it doesn't factor into account the cost of the tape library - OTOH, it also doesn't factor in the cost of additional SATA controller ports needed for the drives, plus mounting.


I realize that you already have the hardware, so tape is obviously going to be cheaper for you at this point, and perhaps more convenient. I'm not sure whether you'd have need for a system similar to the one in my article set. However, it would be interesting to see if you can identify any more likely points of failure (other than the acknowledged N-way voting issue, etc.).


----------



## Terry_Kennedy (Jun 1, 2011)

carlton_draught said:
			
		

> Yes, this is why if I ever run a ZIL, it will probably be a triple mirror and using something like the Intel 320 series with capacitors for writing buffer data in the event of a power failure, because it's not something that can really afford to fail. L2arc by contrast are great in that you can use a single drive with no problems at all.


Since I don't have a high steady-state write rate (most access is reads of existing data), and writes are things like backups or file copies from other systems which can be re-done if needed, the possibility of a ZIL failure isn't especially worrisome, particularly as v28 should allow a clean removal of a failed log without damaging the pool (other than any data not yet committed to the main pool).

Since ZIL writes are intentionally synchronous, I'd be somewhat concerned about increasing the latency with a redundant ZIL, particularly when using MLC flash on a shared controller. The PCI Express ones tend to be a good bit faster. Unfortunately, most of those target the Windows or Linux environments, and many (most?) of them need proprietary drivers not available for FreeBSD. The ones I'm using are LSI-based and use the normal FreeBSD mpt driver.



> Perhaps. I'm going to wait until 9.0-RELEASE before using v28, exciting though dedup may be.


I upgraded my first pool (test system) to v28 when the patches for 8-STABLE first came out. I upgraded the second pool around 6 weeks into the "MFC after: 1 month" v28 commit to HEAD. Unfortunately, the plan for that MFC seems to have been a bit optimistic. I'd like to see it MFC'd at some point - the more people use it, the better it will be.



> However, it would be interesting to see if you can identify any more likely points of failure etc (other than the acknowledged N-way voting issue etc.)


I'm using separate OS drives (a gmirror'd pair of 320GB 2.5" drives). Ideally, those would be behind some sort of hardware RAID controller. My previous design from 6+ years ago kept the OS on the same drives as the data. For various reasons, I wasn't happy with that choice. The separate drives proved to be a good idea as I've done at least one set of full drive swaps on each of the systems (one system got 2 sets of full swaps due to a mis-communication with the manufacturer). It was convenient to be able to pull all 16 drives out at once. I expect the restore was also quite a bit faster than a sequence of 5 3-drive swaps/resilvers.


----------



## carlton_draught (Jun 1, 2011)

Terry_Kennedy said:
			
		

> I'm using separate OS drives (a gmirror'd pair of 320GB 2.5" drives). Ideally, those would be behind some sort of hardware RAID controller. My previous design from 6+ years ago kept the OS on the same drives as the data. For various reasons, I wasn't happy with that choice. The separate drives proved to be a good idea as I've done at least one set of full drive swaps on each of the systems (one system got 2 sets of full swaps due to a mis-communication with the manufacturer). It was convenient to be able to pull all 16 drives out at once. I expect the restore was also quite a bit faster than a sequence of 5 3-drive swaps/resilvers.


Ah. You might find something in that article of use then, as it puts (most) OS/applications on the SSD mirror, while the rest is on the HDD where space is cheap. In addition to that, there is an interim backup of what is on the SSD mirror, so you can easily restore that if the root mirror dies for whatever reason.

One thing I'm not sure of is how you would get that to play nice with your v28 pool. I guess you could use a newer version of the liveDVD, though I've only ever played with the RELEASE version.


----------



## AndyUKG (Jun 1, 2011)

Terry_Kennedy said:
			
		

> I'm probably being old-fashioned, but I want to have some sort of media that I can restore bits and pieces from, onto any sort of filesystem. I don't want to have to depend on ZFS for both my primary data and backups.



IMHO I think you are being sensible. I have written some scripts to manage hourly/weekly/monthly snapshots and replicate them to a remote system via SSH. I have them running on a few systems and it runs very well. However, if the data were really critical, then backing up to the same technology has obvious risks. You can't rule out a bug affecting both live and backup systems.
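
A stripped-down sketch of that kind of rotation looks something like this (hostnames, dataset names and retention are invented, it assumes the dataset has no children, and it is not the actual script):

```
#!/bin/sh
# Hourly snapshot + replication job, meant to be run from cron.
DATASET=tank/data
REMOTE=backup@replica.example.com
KEEP=24                       # hourly snapshots to keep locally

NEW="hourly-$(date +%Y%m%d%H)"
LAST=$(zfs list -H -t snapshot -o name -s creation -r ${DATASET} | grep "@hourly-" | tail -1 | cut -d@ -f2)

zfs snapshot ${DATASET}@${NEW}

if [ -n "${LAST}" ]; then
        # Incremental: only blocks changed since the last hourly snapshot.
        zfs send -i ${DATASET}@${LAST} ${DATASET}@${NEW} | ssh ${REMOTE} zfs receive -F ${DATASET}
else
        # First run: full stream.
        zfs send ${DATASET}@${NEW} | ssh ${REMOTE} zfs receive -F ${DATASET}
fi

# Prune everything but the newest ${KEEP} hourly snapshots on the local side.
zfs list -H -t snapshot -o name -S creation -r ${DATASET} | grep "@hourly-" | tail -n +$((KEEP + 1)) | xargs -n1 zfs destroy
```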

cheers Andy.


----------



## carlton_draught (Jun 1, 2011)

AndyUKG said:
			
		

> IMHO I think you are being sensible. I have written some scripts to manage hourly/weekly/monthly snapshots and replicate them to a remote system via Ssh. I have them running on a few systems and it runs very well. However if the data was really critical then backup to the same technology has obvious risks. You can't rule out a bug affecting both live and backup systems.
> 
> cheers Andy.


Is the risk contingent upon both systems being live, in effect? Or do you still see a risk when you have 5 or 6+ backup pools that spend most of their lives disconnected from anything, sitting in a firesafe or similar (i.e. offline), with a couple of the pools only updated once every few months so as to guard against a bug that is not overt?


----------



## AndyUKG (Jun 1, 2011)

carlton_draught said:
			
		

> Is the risk contingent upon both systems being live, in effect? Or do you still see a risk when you have 5 or 6+ backup pools that spend most of their lives disconnected from anything, sitting in a firesafe or similar (i.e. offline), with a couple of the pools only updated once every few months so as to guard against a bug that is not overt?



Hmm, well you are definitely making a lot of effort to minimise risk in the scenario you describe. I was certainly imagining ZFS replication being made to other online systems. But an offline system using a different technology vs. an offline system using the same, I would say different is clearly better (without getting into details about each system, and assuming both systems are considered production-ready). Also, in the case of backups where people are doing, for example, one backup a day or more, I think in the instance of disk-to-disk backup you won't commonly see people taking those disks offline each day.

As I mentioned, I am using ZFS as backup on some systems too; I'm not saying it's bad. But in summary, my opinion would be that if the data is really critical, then putting all your eggs in one basket is a sub-optimal backup solution. But then, when you are designing a backup solution you are always making choices, based on requirements which will differ from situation to situation, all of which will have pros and cons and a price...

cheers Andy.


----------



## carlton_draught (Jun 2, 2011)

Thanks for prompting this discussion, Andy.


			
AndyUKG said:

> Hmm, well you are definitely making a lot of effort to minimise risk in the scenario you describe. I was certainly imagining ZFS replication being made to other online systems.


If we are dealing with critical systems where there is no other record being made of the important data that accumulates during that time, it's probably not a bad idea to replicate online in addition to keeping the offsite, offline backups. Doing that is more of an interim measure that will mitigate the loss of the primary server by minimizing the data loss for little ongoing effort (as it's automated). 

However, a strictly online "backup" does not guard against an online disaster, e.g. if both boxes are rooted and the cracker decides to securely shred both of your copies. See this post. And read the thread, especially monkeyboy's posts. My aim was to resolve a lot of those issues with my attempt at a solution. To quote monkeyboy:


			
monkeyboy said:

> it ain't a "backup" unless it is 1) full, 2) on removable media, 3) offline, 4) offsite, 5) tested for actual restore...





			
AndyUKG said:

> But an offline system using a different system vs. an offline system using the same, I would say different is clearly better (without getting into details about each system and assuming both systems are considered production ready).


I can see that in the case where you have two backup systems (e.g. a straight ZFS system as in my example, and some sort of backup to tape that is not ZFS), it has the potential to be even more reliable. This is of course provided that you have allocated the extra funds to things like testing that the restore actually works properly, documenting all your procedures, and so on. Often IRL there are compromises made. Maybe you use fewer tapes/HDD pools than you would with a single solution. Maybe you don't document them well. Maybe one doesn't get tested properly. In the process of attempting to eliminate that risk, one can end up introducing more risk.

But then again, I've always been one to put all my eggs in as few baskets as possible, individually wrapped in bubble wrap, 100 feet under concrete under high ground, with castle walls, a moat, pillboxes with overlapping fields of fire... you get the picture.

And if we compare, say, ZFS on the regular system plus LTFS on the tapes vs. ZFS on both the regular system and the backups, then we also conceivably have two things that can go wrong in the former, where only one thing can go wrong in the latter. I realize we are probably thinking "LTFS is tape, it can't go wrong", but really it's just another filesystem. If anything, because something like ZFS is used on live systems, any bugs are going to be noticed and corrected that much sooner.

Filesystems would have to be some of the most extensively tested software in existence, because:

- Every computer uses at least one filesystem.
- The filesystem is in use all the time, every time the computer does virtually anything.
- When there are bugs, particularly data-destroying bugs, people get very, very mad. They WILL let someone know about the bug, and if it's much of a problem they will quickly use something else.
- Designers know this, and particularly with filesystems that are designed to be used in servers, especially servers on the more reliable end of the scale (e.g. ZFS), they are going to be more conservative, do more testing, etc.

Once a filesystem has been used reliably in the field for a reasonable period on a reasonable install base, the probability of a showstopping, data-destroying bug - especially one that would somehow not show itself after repeated successful zpool status checks, imports, exports and the like, and then simultaneously render all of your backup pools unusable, even those a few months old, while functioning perfectly on your primary system up until that point - is in my estimation remote. To the point where (/me puts on the Dogbert hat) a hand-crafted company-destruction script that makes it look like you are making backups when in actual fact you are shredding them, until the fateful day when your primary system is destroyed, might be more probable than that scenario. Or, even if you use tape, a sysadmin with an axe to grind might surreptitiously destroy all your tape archives that are theoretically only written once, along with all other copies of the organization's data.

I guess the thing to realize with risk management is that try as you might, you can never get the risk to zero. Even if you decide to nuke it from orbit, maybe the aliens are already on board.



			
AndyUKG said:

> Also in the case of backups, where people are doing for example 1 backup a day or more, I think in the instance of disk to disk backup you won't commonly see people taking those disks offline each day.


Where I used to work they would take a tape backup each day, and take the tapes offsite each day in a rotation. It would have made no difference to the ease with which that procedure was done to use HDDs instead. Even something like 12 disks in a (padded) camera bag, a woman can still carry that by herself. The way I suggest doing it (if you read all the articles I wrote, you'll find it in the preface articles) is to use standard internal HDDs and put them in the cheap $4 silicone HDD cases (which provide some small shock protection, stop them sliding around your desk, allow stacking them on said desk as high as you'd want (they interlock), and provide access to the data and power ports). Excuse the PATA HDD in the image, we'd use SATA of course.
e.g.







You connect the HDDs (still in the cases) to SATA data + power extenders (otherwise you'd have to remove the silicone cases).






You use dual e-SATA HDD docks, that you connect the above extender to.






You connect the docks via e-SATA to your e-SATA back plates, which are in turn connected to regular internal e-SATA ports on your motherboard or SATA card.






Making a backup is as easy as the following (a rough sketch of what the backup script might look like follows the list):

1. Put the HDDs on a flat surface.
2. Connect them to the SATA extenders coming from each HDD dock (two per dock, obviously).
3. Turn on each HDD dock.
4. Wait 10-20 seconds for your HDDs to spin up and be detected.
5. Execute the backup script.
6. Flick off the HDD dock switches when the script finishes.
7. Remove the HDDs from the extenders.
8. Stack the HDDs in a padded bag (e.g. a camera bag) to be taken offsite.
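
For illustration, the "backup script" step might boil down to something like this (pool and snapshot names invented; this is only a sketch, not the actual script from the articles):

```
#!/bin/sh
# Import the backup pool by name once the docked disks have spun up,
# snapshot the source, ship only the changes, and cleanly export again.
SRC=tank                       # live pool
DST=offsite1                   # pool living on the removable disks
PREV=backup-20110525           # last snapshot both pools already share
SNAP="backup-$(date +%Y%m%d)"  # today's snapshot

zpool import ${DST} || exit 1
zfs snapshot -r ${SRC}@${SNAP}
zfs send -R -i ${SRC}@${PREV} ${SRC}@${SNAP} | zfs receive -Fd ${DST}
zpool export ${DST}            # disks are safe to power off once this returns
```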



			
AndyUKG said:

> As I mentioned, I am using ZFS as backup on some systems too, I'm not saying its bad. But in summery my opinion would be that if the data is really critical, then putting all your eggs in one basket is a sub-optimal backup solution. But then when you are designing a backup solution you are always making choices, based on a requirement which will differ from situation to situation, all of which will have pros and cons and a price...


Exactly.


----------



## AndyUKG (Jun 2, 2011)

Hi,

I didn't say you couldn't offline disks each day, just that I don't think you will find many people using that system currently. Maybe that will change, but I think in big enterprises people still use tape and virtual tape. Outside big enterprises things are probably more mixed.

Online backup vs. offline backup: it's certainly a valid point that someone can destroy online backups all at once if your server is compromised. That's why in a real Rolls-Royce solution for critical systems, in say banks, you will find 2 or 3 online replicas/copies of data with snapshots and 1 or 2 backups to tape each day/hour.

With regards to introducing more risk by having some other tape system that you need to test and document, well, I guess that's true in a sense. Two systems to test and document means double the chance of some human error in one of them. I don't think that is a valid argument for putting all your eggs in one basket, i.e. weighing the risks against each other.

cheers Andy.

PS: If someone has no experience setting up backups (i.e. a home user or small company), then yes, trying to do too many things will probably land you in trouble; in that situation it's better to have one system that works really well. For medium to large companies, I think the argument starts to fall down.


----------



## carlton_draught (Jun 2, 2011)

AndyUKG said:
			
		

> Hi,
> 
> I didn't say you couldn't offline disks each day, just that I don't think you will find many people using that system currently. Maybe that will change, but I think in big enterprises people still use tape and virtual tape. In not big enterprise things are probably more mixed.


Definitely true. I was as much pointing out, I guess for anyone else reading, that it's very cheap and easy to use HDDs. My point is that now, if you use FreeBSD, you can have what is a very reliable system and a good means to back up. Certainly not as reliable as possible, but then, you pay for that too. At least, it can be a very good building block for small companies (very high data integrity, cheap, fast, convenient), with room for them to grow.


> Online backup vs offline backup its certainly a valid point that someone can destroy online backups all at once if your server is compromised. That's why in a real rolls royce solution for critical systems, in say banks, you will find 2 or 3 online replicas/copies of data with snapshots and 1 or 2 backups to tape each day/hour.


That's a good idea. Tape is certainly good if you can afford it.


> With regards to introducing more risk by having some other tape system that you need to test and document, well I guess thats true in a sense. Two systems to test and document means double the chance of some human error in one of those. I don't think I believe that is a valid arguement for putting all your eggs in one basket, ie weighing the risks against each other,


I certainly agree that tape is an excellent and probably superior addition to such a system of HDD backups, if the cost benefit calculations work out.


> PS If someone has no experience setting up backups (ie home user or small company), then yeah trying to do too many things you will probably end up in trouble, in that situation better to have one system that works really well. For medium to large companies, I think the argument starts to fall down.


Yes. I guess I am thinking like a small company/startup/medium company that likes to do IT stuff inexpensively. For even some medium-sized companies I know, this would be a vast improvement. It depends on your definition of medium and how dysfunctional the companies you know are. 

From what I hear, there have been a lot more cowboys in banks too, in recent times. It's a long way from the post-depression environment where you were judged by your mistakes rather than your (current) successes.


----------



## AndyUKG (Jun 2, 2011)

carlton_draught said:
			
		

> Yes. I guess I am thinking like a small company/startup/medium company that likes to do IT stuff inexpensively. For even some medium sized companies I know, this would be a vast improvement. It depends on your definition of medium and how disfunctional the companies you know are.



Yep, each person has to aim at the best design within his or her budget. One thing that can be said for people of all budgets: budget for backup from the beginning; it makes no sense to have all the bells and whistles on your servers but then be stuck with a poor backup solution because you already spent all your money.

cheers Andy.


----------



## danbi (Jun 3, 2011)

I play with backup solutions like these from time to time. I stopped using tapes years ago, because HDDs are cheaper/faster and... more flexible in the long run. I was using DVDs at one time, by the way. 

Anyway, why not use 2.5" drives? Current 3.5" capacity is 3TB, while current 2.5" capacity is 1TB or more. A 2.5" drive is much smaller/lighter than a 3.5" drive, and about as fast for backup purposes.

Reading your comments, I was thinking of a small multi-disk case with an integrated port multiplier (or SAS expander) and multiple 2.5" drives. A 16-disk enclosure should not be heavy or big to carry.

Also, you may use disk drives instead of tapes for "tape" backup -- that is, use the drives as linear storage, with or without a filesystem.

With the mirror-split feature in newer ZFS, one could use a different technique -- say you have 16 mirror pairs of 1TB: connect 16 external drives and attach each of them to the mirrors, in effect creating triple mirrors. Then wait for all the mirrors to resilver, split the mirrors off, and you have an identical ZFS pool.

You can do this even remotely with HAST, or ggate. Disks can be connected to the system via the network.
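
In command terms, the split-mirror trick looks roughly like this for one pair (device names invented; zpool split arrived with the newer v28-era code):

```
# "data" is a pool of mirrored pairs; attaching an external disk to one of the
# mirrors turns it into a three-way mirror (repeat for each pair).
zpool attach data da0 da16

# Wait for the new disks to finish resilvering.
zpool status data

# Split the extra copies off as a new, identical pool. The new pool is left
# exported, ready to be carried off and imported on another box.
zpool split data databackup
zpool import databackup        # only if you want to verify it locally first
```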


----------



## Terry_Kennedy (Jun 6, 2011)

Terry_Kennedy said:
			
		

> ZFS snapshots and send/receive don't address the underlying issue - having one or more  complete copies of the data in another location for disaster recovery.


[Side note: This topic is now a year old and is still going strong - please keep going.]

I was doing my regular scan through the lists.freebsd.org mailing list archives for ZFS-related posts, and came across 2 different threads that I'd like to excerpt here. The folks already participating in this thread know how important having some sort of backup is (regardless of what strategy is used), but for people just browsing this topic, these may be interesting reading...

freebsd-stable, May 2011:


			
Olaf Seibert said:

> I moved those directories to the side, for the moment, but I haven't been able to delete them yet. The data is a bit bigger than we're able to back up, so "just restoring a backup" isn't an easy thing to do. Possibly I could make a new filesystem in the same pool, if that would do the trick.

freebsd-fs, June 2011:

> ...


----------



## Jurgen (Dec 26, 2011)

ref: http://www.bacula.org/manuals/en/concepts/concepts/New_Features.html

BACULA

Solaris ZFS/NFSv4 ACLs 

This is an upgrade of the previous Solaris ACL backup code to the new library format, which will backup both the old POSIX(UFS) ACLs as well as the ZFS ACLs.

The new code can also restore POSIX (UFS) ACLs to a ZFS filesystem (it will translate the POSIX (UFS) ACL into a ZFS/NFSv4 one), so it can also be used to transfer from UFS to ZFS filesystems.


----------



## Terry_Kennedy (Jun 10, 2012)

Terry_Kennedy said:
			
		

> ZFS snapshots and send/receive don't address the underlying issue - having one or more  complete copies of the data in another location for disaster recovery.


I know this is an old topic, but I read something today that is extremely relevant. I don't know if they were using ZFS or even FreeBSD, but the lesson learned is relevant nonetheless.

Today I happened to be browsing blu-ray.com and the lead article is titled "Database Loss". I'll paste the lead paragraphs of that article below, but click on the above link for the whole thing.

Perhaps reading the article will help convince people that RAID is not backup, no matter what the implementation. Perhaps not. If it does manage to convince you, good.



			
www.blu-ray.com said:

> We are extremely sad to let you know that we've experienced 7 weeks of database loss. 7 weeks ago we moved to a new much improved server, but unfortunately earlier today the hard drives of the database crashed (was using RAID). On the old server we did daily backups, but since we changed server and setup, the old server backup solution didn't work anymore. We have been discussing the new backup system on a daily basis, but hadn't yet implemented it, so the timing couldn't have been worse.
> 
> What is missing the last 7 weeks
> 
> ...


----------



## fgordon (Jun 10, 2012)

My "big" homeserver is a 12x2 TB raidz2 setup  -  my backup server is a 9x1.5 raidz + 2x3 partly raidz partly nonraid-zfs-big-volume.

Though zfs send/receive is really nice - I do prefer rsync due to more flexibility - so I can change my backup system e.g. to Linux :O  if there is some exotic hardware that is supported by linux like a backup on stone tablets.

I personally like the flexibility of an rsync backup with easy including/excluding so I can decide which directories go to raidz backup and which into a just plain non raidz zfs (as I have them elsewhere) and it's quite fast - at least faster than gigabit though using low-end hardware


----------



## vermaden (Jun 10, 2012)

fgordon said:
			
		

> Though zfs send/receive is really nice - I do prefer rsync due to more flexibility - so I can change my backup system e.g. to Linux :O  if there is some exotic hardware that is supported by linux   like a backup on stone tablets.



You can use ZFS on Linux as well: http://zfsonlinux.org/ (native), or via FUSE.


----------



## badtux (Jun 15, 2012)

ZFS on Linux is pretty much a non-starter: the stability of the kernel-land port is lacking, and the performance of the user-land one is lacking. I looked at it and decided on FreeBSD, even though I'm a Linux penguin. ZFS on FreeBSD is worlds more stable than ZFS or BTRFS (their currently-buggy clone of ZFS) on Linux. 

The correct canonical way to do tape backups is with a tape library, a tape changer driver, and a program such as BRU or Bacula that knows how to invoke the tape changer driver when it hits the end of a tape. That said, recovery from tape backups is always problematic, and even if you do this you likely want to investigate some better option for normal recovery operations (the kind where a file server goes kablooey and you want to revive its contents, not the kind where your building burns down). At which point ZFS's replication capabilities become *very* interesting...
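
For reference, the low-level plumbing such a program drives on FreeBSD is chio(1) for the changer and mt(1) for the drive; done by hand it looks roughly like this (device names and slot numbers are just examples):

```
# See what is in which slot/drive of the library.
chio -f /dev/ch0 status

# Load the tape from storage slot 3 into drive 0.
chio -f /dev/ch0 move slot 3 drive 0

# Write to the no-rewind device, then rewind/unload when done.
mt -f /dev/nsa0 rewind
tar -cvf /dev/nsa0 /some/directory
mt -f /dev/nsa0 offline

# Put the tape back into its slot.
chio -f /dev/ch0 move drive 0 slot 3
```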


----------



## dareni (Oct 1, 2015)

I had problems using `zfs send/receive` to send 1TB to a remote system. I decided to break the single 1TB filesystem into several child filesystems. Now, after a network failure, at worst only a child needs to be resent. I use my script to take care of recursive sends and to keep the remote in sync: https://github.com/dareni/shellscripts/blob/master/zfsDup.sh

I finally polished this script up today; I hope it can be of use to someone else.

Example output:


```
# zfsDup.sh shelltests
Test: array_add()
Test: array_clear()
Test: array_iterator()
Test: nameValidation()
Test: isValidSnapshot()
Test: getSnapshotFilesystems()
Test: getSnapshotData()
Test: getRemoteDestination()
Test: printElapsed()
Test: convertToBytes()
Shell tests completed, check the output for errors.

# zfsDup.sh zfstests
Start zfs tests.
Test: new parent file system.
Test: new child file system.
Test: simulate a failed send of the child filesystem.
Test: duplicate and check the child@2 snapshot is resent.
Test: snapshot existing files with updated child data.
Test: simulate a fail send os child@3
Test: snapshot test1.
Test: snapshot test2.
Test: snapshot test3.
Snapshot tests completed ok.
Test: remote host free space.
Test: new remote FS with no quota.
Test: incremental remote FS update with no quota.
Cleaning up zroot/tmp/zfsDupTest/dest zroot/tmp/zfsDupTest/source
Test execution time: 89secs
ZFS tests completed, check the output for errors.


# zfs list -t all -r ztest
NAME  USED  AVAIL  REFER  MOUNTPOINT
ztest  344K  448M  19K  /ztest
ztest@1  9K  -  19K  -
ztest@6  9K  -  19K  -
ztest/backup  112K  448M  19K  /ztest/backup
ztest/backup@1  9K  -  19K  -
ztest/backup@2  0  -  19K  -
ztest/backup@3  0  -  19K  -
ztest/backup@4  9K  -  19K  -
ztest/backup@5  0  -  19K  -
ztest/backup@6  0  -  19K  -
ztest/backup/data  57.5K  448M  20.5K  /ztest/backup/data
ztest/backup/data@1  0  -  19.5K  -
ztest/backup/data@2  0  -  19.5K  -
ztest/backup/data@3  9K  -  19.5K  -
ztest/backup/data@4  9K  -  19.5K  -
ztest/backup/data@5  0  -  20.5K  -
ztest/backup/data@6  0  -  20.5K  -

# zfs list -t all -r zroot/tmp
NAME  USED  AVAIL  REFER  MOUNTPOINT
zroot/tmp  38K  443M  19K  /tmp
zroot/tmp/zfsDupTest  19K  443M  19K  /tmp/zfsDupTest

# zfsDup.sh ztest zroot/tmp root@localhost
================================================================================
Starting duplication 20151001 16:10:56 ...
ztest@6...new...19K...0hr.0min.1sec
ztest/backup@6...new...19K...0hr.0min.1sec
ztest/backup/data@6...new...20.5K...0hr.0min.0sec
Duplication complete 20151001 16:11:04.
================================================================================

# zfsDup.sh ztest zroot/tmp root@localhost
================================================================================
Starting duplication 20151001 16:11:25 ...
ztest@6...up to date
ztest/backup@6...up to date
ztest/backup/data@6...up to date
Duplication complete 20151001 16:11:29.
================================================================================

# zfs snapshot -r ztest@7
# zfsDup.sh ztest zroot/tmp root@localhost
================================================================================
Starting duplication 20151001 16:12:25 ...
ztest@7...incremental...9K...0hr.0min.1sec
ztest/backup@7...incremental...9K...0hr.0min.1sec
ztest/backup/data@7...incremental...10K...0hr.0min.0sec
Duplication complete 20151001 16:12:33.
================================================================================

# zfs list -t all -r zroot/tmp
NAME  USED  AVAIL  REFER  MOUNTPOINT
zroot/tmp  124K  442M  19K  /tmp
zroot/tmp/zfsDupTest  19K  442M  19K  /tmp/zfsDupTest
zroot/tmp/ztest  86K  442M  19K  /tmp/ztest
zroot/tmp/ztest@6  9K  -  19K  -
zroot/tmp/ztest@7  0  -  19K  -
zroot/tmp/ztest/backup  58K  442M  19K  /tmp/ztest/backup
zroot/tmp/ztest/backup@6  9K  -  19K  -
zroot/tmp/ztest/backup@7  0  -  19K  -
zroot/tmp/ztest/backup/data  30K  442M  20K  /tmp/ztest/backup/data
zroot/tmp/ztest/backup/data@6  10K  -  20K  -
zroot/tmp/ztest/backup/data@7  0  -  20K  -
```


----------

