# ZFS + dedup + backups



## xy16644 (Nov 21, 2013)

Hi all,

I've been thinking about a couple of ideas I have for my next server and I hope you can assist.

I want to use three old hard drives for backing up my entire server each day as follows:

- 1 x 300 GB for daily `ZFS send/receive` (used for DR scenarios)
- 1 x 300 GB for daily tar backups (used for restoring deleted files/directories)
- 1 x 500 GB for monthly backups (probably a mixture of tar and/or `ZFS send/receive`)

(Each hard drive will be its own ZFS pool.)

Currently, when I use ZFS send/receive, I compress the stream it creates with gzip. The question I have is: given the required RAM, will dedup help save space in this scenario? Will dedup work on gzipped files (i.e. filename.zfs.gz) and tar files?

When running these types of backups (full system backups) there's obviously going to be plenty of duplication each day, as many files on the system won't change. Will dedup help me here? All the backup and system drives will be encrypted with geli.

I think I know all the downsides to running dedup on a FreeBSD system so I will be installing 16 GB of ECC RAM in this server (later upgradeable to 32 GB). The ZFS root and boot directory are only 120 GB in total. 

Am I on the right track in thinking that dedup will help me save lots of disk space in this case? FYI: I won't be running dedup on the ZFS root and boot pool; dedup will only run on the three backup drives mentioned above.

Thanks guys!


----------



## ondra_knezour (Nov 22, 2013)

xy16644 said:

> Currently when I use ZFS send/receive I compress the ZFS file it creates with gzip



As I understand how send/receive works, if you save the sent stream to a file, a single flipped or unreadable bit may lead to the loss of all the data in the stream.
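
If you do keep send streams as files, one partial mitigation is to store a checksum alongside each stream so that corruption is at least detected before you attempt a restore. A minimal sketch with a stand-in file (GNU `sha256sum` shown; FreeBSD ships sha256(1) instead):

```shell
cd "$(mktemp -d)"
# stand-in for a real compressed stream file
printf 'pretend zfs stream' > 01.01.2014-zroot.zfs.gz
# record its checksum alongside it at backup time
sha256sum 01.01.2014-zroot.zfs.gz > 01.01.2014-zroot.zfs.gz.sha256
# later, verify before attempting a restore
sha256sum -c 01.01.2014-zroot.zfs.gz.sha256 && echo "stream file intact"
```

This doesn't make a damaged stream restorable, of course; it only tells you the file is no longer the one you wrote.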


----------



## wblock@ (Nov 22, 2013)

sysutils/rsnapshot can do the second and third steps in a space-efficient, automatic way, without a lot of setup.


----------



## kpa (Nov 22, 2013)

I doubt that the archive files will always have common parts usable for dedup, even if the source files have changed very little. Remember that dedup works at the block level, and even a single-byte shift means that large numbers of blocks differ and can no longer be deduped.
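
That block-alignment effect is easy to demonstrate outside ZFS. The sketch below (plain POSIX tools, nothing ZFS-specific) inserts a single byte at the front of a file and compares 128 KiB chunks, ZFS's default recordsize:

```shell
cd "$(mktemp -d)"
seq 1 50000 > orig.bin                        # deterministic data, ~280 KiB
{ printf 'X'; cat orig.bin; } > shifted.bin   # same data, shifted by one byte
# cut both files into 128 KiB chunks, as block-level dedup would see them
split -b 131072 orig.bin orig_
split -b 131072 shifted.bin shift_
# compare corresponding chunks: none of them match any longer
for c in aa ab ac; do
  cmp -s "orig_$c" "shift_$c" && echo "chunk $c identical" || echo "chunk $c differs"
done
```

Every chunk differs even though the two files share all but one byte, which is why an archive whose contents shifted slightly dedups so poorly.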


----------



## xy16644 (Nov 22, 2013)

OK, thanks guys. So dedup doesn't sound like it's going to help me. I thought that because the tar or compressed ZFS files contain plenty of duplicate files, it would help me save some space.


----------



## xy16644 (Nov 22, 2013)

One more question. My current server uses about 1 GB to 1.5 GB of RAM. The new server I am going to build will have 16 GB of RAM. Is it necessary for me to have swap? All drives will run ZFS and be encrypted with `geli`. I'd like to use ZFS prefetching if I can, and maybe limit ZFS's memory use to between 12 GB and 14 GB of RAM.

Can I run my new server without swap, and use ZFS prefetching?


----------



## AndyUKG (Nov 22, 2013)

Hi,

Perhaps I don't understand your requirements well, but I think you are missing the obvious solution when dealing with ZFS.

For argument's sake, let's say you take a snapshot once a day on your live ZFS data. This you can send to another ZFS pool via ZFS send/receive. If you do the receive on the remote pool (let's call it pool2), then you have the snapshot replicated onto pool2. So, for example, after a week you have seven snapshots on pool2, each snapshot being a complete point-in-time backup of your data. Given that a snapshot by definition only stores changes between snapshots, you have no need for deduplication (at least for the reason you stated, that many files will be the same). Additionally, with ZFS you can configure pool2 to use gzip compression natively, to optimise pool2 for space vs. performance.

You are free to delete old snapshots on your live pool to save space, so long as you always leave the newest one intact so you can do another incremental `zfs send` the next day.

Hope that helps. If I've misunderstood what you are trying to achieve, or you don't think this solution is applicable for some reason, let us know.

thanks, Andy.


----------



## xy16644 (Nov 22, 2013)

Hi Andy

Thanks for your reply!

I like what you have described and would like to give it a try on a test machine. It's quite different to the way I am currently doing it. All I am doing now is a `zfs send` to a compressed file as follows:

```
# recursive snapshot of zroot, named after today's date
/sbin/zfs snapshot -r zroot@`date +%d.%m.%Y`-zroot
# send the whole snapshot tree to a gzipped file
/sbin/zfs send -Rv zroot@`date +%d.%m.%Y`-zroot | gzip > /backups/zroot/`date +%d.%m.%Y`-zroot.zfs.gz
# remove the snapshot again
/sbin/zfs destroy -r zroot@`date +%d.%m.%Y`-zroot
```

You've given me something to think about (and test)...thanks!


----------



## xy16644 (Nov 30, 2013)

OK, so I have given this some thought and read some more about ZFS send/receive, but I'm still a bit unsure of how to achieve my goal of backing up my server. A bit more background: I'll be building a new server in the coming weeks (just waiting for the hardware I ordered yesterday to arrive), and the hard drives will be configured as follows:

- 2 x 120 GB SSDs in a root ZFS pool called "zroot" using mirroring (RAID 1). (ALL my data will be in this pool.)
- 1 x 1 TB SATA drive used as the backup drive, in a ZFS pool called "backups".

What I would like to do is run a script daily via cron that backs up my data and then deletes the oldest snapshot after 31 days.

So is this what I need to do:

```
zfs snapshot -r zroot@today
zfs send -R zroot@today | zfs receive -F backups
# then, after 31 days:
zfs destroy -r zroot@today
```

I'm basically wanting to have 31 days worth of backups on the 1 TB that I can use in case I need to restore something. Am I on the right track? 

How do I automate this so that after 31 days the oldest backups are deleted?
Do I need to use incremental backups in this somehow? 
Do the snapshots that are created on the ZFS pool "zroot" need to be kept after they are sent/received to the backup pool? Or can they be deleted?
I'd appreciate any help!


----------



## ralphbsz (Dec 2, 2013)

Will dedup save a lot of space?  Good question.  If you look at the academic literature on dedup (and there is piles and piles of that around), you get a wide variety of estimates of how much space dedup saves.  In particular, how much space it saves depends crucially on HOW dedup is done.  If you dedup whole files only (only dedup a second file if it contains exactly the same bits as the first file), the saving tends to be smaller.  If you dedup fixed size blocks (typically 4 KiB page-size blocks), you often save some more space, but not always.  The best saving can be accomplished by dedup'ing variable size blocks, using an algorithm that cuts data into blocks on content boundaries.  However, even that technique (which can be computationally quite expensive) will not always save a lot more disk space than whole file dedup.

One of the important questions here is: what is in your file system?  If your file system is a large file server, being used for an Internet-scale data center, then one of the largest consumers of disk space will be VM boot/root disk images.  Those are large (duh), and tend to show good savings when doing fixed- or variable-size block dedup (and very little saving on whole file dedup).  A lot of the excitement (hype?) around dedup is for these types of servers, where dedup actually does good (and where the money for vendors selling large storage servers is).

On a normal home user file system, it is not clear that dedup will actually save a lot of space.  It heavily depends on usage.  For example, if you have unpacked copies of 100 different releases of the Linux kernel, plus object and library files, it probably will save space.  Few people do stuff like that (but they tend to be the power users whose systems get talked about in the studies).

So, here is one real data point.  I have a FreeBSD server at home, used for minor stuff (a little software development, in-home server, router/firewall).  To show you how small it is: the /home file system is on ZFS, fits easily on 1 TB disks (using 2-way mirroring and ZFS), and is only half full.  The bulk of the disk space (not by the number of files, but by the bytes used) is in media files:

- video files from the camcorder (having a child who is active in sports and music, you end up with lots of material from band concerts and soccer games),
- ripped CDs (I am an avid music listener, with a largish CD collection, nearly all ripped to disk),
- pictures from the digital camera (snapshots).
I have my own backup program, which currently keeps a complete archive of all files that have ever existed on my /home file system, on two separate disk drives (one at home, one at a remote site).  This is done by walking the whole /home file system every hour and looking for files that have not been backed up yet.  The backup program performs whole-file dedup, in the following sense: it actually stores files indexed by their hash code (using an SHA-256 hash), and if two files are the same size and have the same hash, only one is stored.

You would think that doing this dedup saves a lot of space, wouldn't you?  You would be wrong.  If you look for files whose content appears under more than two file system entries in the backup, that amounts to just a few percent of the backup space.  So why do I do dedup?  Remember, I said that I save a complete archive of all files that ever existed, including ones that no longer exist (have been deleted or renamed).  And dedup is actually useful in removing the disk space usage from files that are renamed!  Imagine this: I take the SD card out of the camera after a few weekends that included a soccer tournament or band concert, and I copy a few GB of files to a temporary holding space (say /home/video/TEMP).  Then I get around to classifying the files, and I rename them to /home/video/soccer/Nov2013/GameAgainstAnotherHighSchool.mts or /home/video/band/Dec2013/ChristmasConcert.mts.  If my backup program didn't dedup, it would now contain two copies of the same file (one of which has been labeled as "deleted"); instead I have only one copy of the file, and two references to it.

BUT: even with this application of dedup, it only saves about 1/3 of the space (my backup would grow to about 133% of its current size if I didn't dedup).
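
A toy illustration of the whole-file scheme described above, with a hypothetical store layout (GNU `sha256sum`; on FreeBSD the equivalent is `sha256 -q`):

```shell
cd "$(mktemp -d)"
mkdir -p store/blobs store/names

# store a file under its SHA-256 so identical content is kept only once;
# each visible name is just a hard link to the single stored copy
backup() {
  h=$(sha256sum "$1" | awk '{print $1}')
  [ -e "store/blobs/$h" ] || cp "$1" "store/blobs/$h"
  ln -f "store/blobs/$h" "store/names/$(basename "$1")"
}

printf 'match footage' > 00001.mts                  # name from the SD card
cp 00001.mts GameAgainstAnotherHighSchool.mts       # same bytes, renamed
backup 00001.mts
backup GameAgainstAnotherHighSchool.mts

echo "copies stored: $(ls store/blobs | wc -l | tr -d ' ')"
echo "names kept:    $(ls store/names | wc -l | tr -d ' ')"
```

The renamed file hashes to the same blob, so only one copy is stored no matter how many names refer to it.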

So, before you go to extreme effort to dedup, consider whether it's worth it.


----------



## wblock@ (Dec 2, 2013)

`zdb -S poolname` can be used to simulate dedup and see how much space will be saved.


----------



## xy16644 (Dec 2, 2013)

Thanks @ralphbsz and @wblock@ for the reply!

I've decided not to use dedup. I use my server mainly for email, and it is used by family members. I thought that when someone sends an email with many photos attached to a few people on the same server, dedup would help save some space, but after all I have read there seem to be too many downsides to ZFS dedup currently.

I ran `sudo zdb -S zroot` and the results on my ZFS root pool are as follows:

```
dedup = 1.11, compress = 1.24, copies = 1.07, dedup * compress / copies = 1.30
```

Can anyone comment on setting up incremental backups with ZFS send/receive? I want the full/incremental backups to be stored on another pool for a month (after a month the oldest backup must be removed), BUT I don't want the snapshots to be kept on the ZFS root pool where they are taken; i.e., after a snapshot is taken on the ZFS root pool and successfully received by the backup pool, I want to delete it from the root pool. Is this possible?

What I'm trying to achieve is having a month's worth of backup on another drive that I can use for restores. Ideally I'd like to see something like:

JANUARY/1
JANUARY/2
JANUARY/3
etc.

And in each of these folders would be the incremental backups. I would assume one folder would have the FULL backup.

Can someone assist or point me in the right direction?


----------



## usdmatt (Dec 2, 2013)

I have my own script for managing ZFS snapshots but there are a few ports that may do what you want. Basically what you want is something like the following:

Create the first snapshot and send to the backup. This creates the backup file system at the same time. You will be left with the file system on both pools, both with the 2013-12-01 snapshot.

```
# zfs snapshot source/fs@2013-12-01
# zfs send source/fs@2013-12-01 | zfs recv backup/fs
```

The next day you create a new snapshot and send the differences between the two. You will end up with both snapshots on each pool. Once this has completed successfully you don't really need the first snapshot on the source filesystem anymore. You do need the snapshot you just took however, as you'll need that tomorrow to send the next set of differences.

```
# zfs snapshot source/fs@2013-12-02
# zfs send -i 2013-12-01 source/fs@2013-12-02 | zfs recv -F backup/fs
# zfs destroy source/fs@2013-12-01
```

If you keep repeating that, you will end up with all the snapshots on the backup pool, and only yesterday's on the source pool. You just need to come up with some way of clearing any snapshots on the backup pool older than 31 days. You could use a unixtime as the snapshot name and use that to determine how old they are (a bit ugly). You could use dates as above, then get your script to work out which dates should be kept. You can also use `zfs get -p creation some/fs@snap` to get the creation date of a snapshot in unixtime.
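
That pruning logic might look roughly like this. The name/creation pairs below are canned sample data standing in for real `zfs get -p creation` output, and the destroy is only echoed:

```shell
# pretend it is 2013-12-04 00:00 UTC; keep 31 days of snapshots
now=1386115200
cutoff=$(( now - 31*86400 ))

# sample "name creation" pairs, as the zfs tools would report them
printf '%s\n' \
  'backup/fs@2013-10-20 1382227200' \
  'backup/fs@2013-11-10 1384041600' \
  'backup/fs@2013-12-02 1385942400' |
while read -r name created; do
  if [ "$created" -lt "$cutoff" ]; then
    echo "would destroy $name"   # real script: zfs destroy "$name"
  else
    echo "keeping $name"
  fi
done
```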

In my script I use user properties to store the retention settings for each file system which is quite neat.

```
NAME          PROPERTY               VALUE          SOURCE
storage/mail  net.hiddendomain:snap  none,14,6,3,0  local
```
The first field is the pool to send to (in this case this filesystem is actually the backup, populated using rsync, and I'm just keeping snapshots); the other four are the numbers of daily, weekly, monthly, and yearly snapshots to keep.
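
Parsing such a property value is straightforward; a sketch assuming the field layout described above (target, then daily/weekly/monthly/yearly counts):

```shell
# value as read back from the user property, e.g. net.hiddendomain:snap
val="none,14,6,3,0"

# split the comma-separated fields into named variables
IFS=, read -r target daily weekly monthly yearly <<EOF
$val
EOF

echo "target=$target daily=$daily weekly=$weekly monthly=$monthly yearly=$yearly"
```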



> And in each of these folders would be the incremental backups. I would assume one folder would have the FULL backup.



You will end up with a full copy of your filesystem in /backup/fs/, which will look exactly like your source system did when you performed the last backup. If you look in /backup/fs/.zfs/snapshot/snapname/, you will see the entire filesystem as it was when that snapshot was taken.


----------



## xy16644 (Dec 3, 2013)

Thank you @usdmatt.

That's what I was trying to achieve but I was having a problem trying to automate it all (I'm not a programmer but can do some basic scripting).

I was working through this article: http://www.aisecure.net/2012/01/11/automated-zfs-incremental-backups-over-ssh/ (I'm not interested in using SSH as my backups will be on the local machine, so I removed the SSH bits from the examples and script). When I ran the script it just didn't seem to run OK and often failed. The article is almost two years old, so maybe it needs updating for FreeBSD 9.x?

Can anyone help with this script?

Another question:

Taking the above into account, let's say my first full backup is 20 GB, and that each incremental backup has 1 GB of changes. Am I correct in saying that in a month with 31 days (if I ran the backup script every day), it would take up 50 GB of disk space? I.e. 20 GB (for the one full backup) + 30 GB (30 days x 1 GB each day). Is that the correct amount of disk space that will be used for the incremental backups?

Also, if you deleted the oldest snapshot on the backup pool, wouldn't this delete all the newer snapshots?


----------



## usdmatt (Dec 4, 2013)

> Taking the above into account, lets say my first full backup is 20GB. Then lets say with the incremental backups theres 1GB of changes. Am I correct in saying that in a month with 31 days (and if I ran the backup script everyday) that it would take up 50GB of disk space? ie: 20GB (for the one full backup) + 30GB (30 days x 1GB each day).



Yes, if ~1 GB of data is being changed each day then your backup filesystem would use around 50 GB of space. In practice, you may find you can store a lot more than 31 days without really using that much more disk space than the original filesystem (unless you make regular heavy changes).



> Also, if you deleted the oldest snapshot on the backup pool, wouldn't this delete all the newer snapshots?



If you stored the original snapshot in a file, with each incremental in additional files (which really isn't recommended), then yes, deleting the original 'big' snapshot file would lose the data, as you'd only be left with the small incremental files.

However, it works completely differently when you are sending to a second pool. The zfs send/recv commands completely duplicate the filesystem on the remote pool. When the first snapshot is sent, the filesystem is recreated on the destination, identical to the source. Each time you send a snapshot, you are effectively bringing that second filesystem up to date. When you now delete the oldest snapshot, it functions just like deleting a snapshot on any other ZFS filesystem. All that happens is that blocks referenced *only* by that snapshot are freed. If blocks referenced by that snapshot are still in use by other snapshots (or the actual "live" filesystem), they are kept.


----------



## xy16644 (Dec 4, 2013)

usdmatt said:

> > Taking the above into account, lets say my first full backup is 20GB. Then lets say with the incremental backups theres 1GB of changes. Am I correct in saying that in a month with 31 days (and if I ran the backup script everyday) that it would take up 50GB of disk space? ie: 20GB (for the one full backup) + 30GB (30 days x 1GB each day).
> 
> 
> 
> ...



Thanks again @usdmatt.

I'm still battling to fully understand the incremental side of backups with ZFS. 

Let me use an example. Let's assume I don't back up to a file, don't use SSH, and only use `zfs send/recv` for my backups. Let's say the two pools are called source and backup. My source pool has the files a, b and c in it, and I take a full backup as follows:

```
# zfs snapshot source/fs@2013-12-01
# zfs send source/fs@2013-12-01 | zfs recv backup/fs
```
My backup/fs will now have a complete copy of the source/fs filesystem and show the files a, b and c, correct?

Let's say I delete file a off source/fs and create a new file called d. Now, if I do the first incremental backup as follows:

```
# zfs snapshot source/fs@2013-12-02
# zfs send -i 2013-12-01 source/fs@2013-12-02 | zfs recv -F backup/fs
# zfs destroy source/fs@2013-12-01
```
How will backup/fs look at this point, after the first incremental backup? Will it show:

- a, b, c, d; or
- b, c, d?

What will show in the backup/fs/.zfs/snapshot directories? Will I just see the deleted a file in backup/fs/.zfs/snapshot/2013-12-02?

If I ever want to restore a deleted file from an incremental backup do I just copy the relevant file from the backup/fs/.zfs/snapshot/ directory? (I.e. I don't want to rollback the entire snapshot.)


> When you now delete the oldest snapshot, it functions just like deleting a snapshot on any other ZFS filesystem. All that happens is that blocks referenced *only* by that snapshot are freed. If blocks referenced by that snapshot are still in use by other snapshots (or the actual "live" filesystem), they are kept.

So if I had my full backup and three incrementals (called Day 1, Day 2 and Day 3) and I deleted the Day 1 incremental snapshot on the backup/fs pool, would I still be ok? I assume you can never touch/delete the full backup.

On a more practical note, my current server uses about 20 GB of space and the backup drive is 1 TB. Not much changes day to day on this server except new emails received/sent, new logfiles generated and maybe the ports tree being updated. From what I understand, I would be able to store at least a year or two (or more!) of backups on the 1 TB drive if I had one full backup and then incrementals after that? My main goal is to just be able to copy deleted files/emails from the incremental backups when needed. 

Thanks again!


----------



## usdmatt (Dec 4, 2013)

> How will backup/fs look at this point after the first incremental backup? Will it show:
> a b c d
> b c d
> What will show in the backup/fs/.zfs/snapshot directories?



If you look in the 'live' filesystem, /backup/fs, you will just see b, c & d, exactly as on the primary. In /backup/fs/.zfs/snapshot you will have a folder for both snapshots. Looking in /backup/fs/.zfs/snapshot/2013-12-01/ you will see the entire filesystem as it was when that snapshot was taken, so a, b & c. Looking in /backup/fs/.zfs/snapshot/2013-12-02/ you will see b, c & d.

If you now delete the 2013-12-01 snapshot off the backup, file a will be gone, but b, c & d will still exist as they are being 'held' by the 'live' filesystem and the 2013-12-02 snapshot. You can even delete the 2013-12-02 snapshot; you'll have no snapshots left (and won't be able to do any more incremental sends), but you'll still have the backup/fs 'live' filesystem with b, c & d in it.

I really don't find snapshots that hard a subject to understand, although I'm having trouble explaining it clearly.
Do you understand how snapshots work in general? You have a filesystem containing 'live' data, then you take a snapshot. You can continue to use the 'live' filesystem as normal, but you can look in the snapshot at any point and see the entire filesystem as it was when that snapshot was taken. If you delete a snapshot, only the blocks that are no longer referenced by 'live' files or other snapshots are removed. The same exact logic as this applies on your backup filesystem when you use send/recv. You can go and delete the very first snapshot on the backup if you want (the one when you did the 'full' send), and it will only free up blocks that are not needed by the 'live' filesystem or any of its other snapshots.



> So if I had my full backup and three incrementals (called Day 1, Day 2 and Day 3) and I deleted the Day 1 incremental snapshot on the backup/fs pool, would I still be ok? I assume you can never touch/delete the full backup.



Once the data is on your backup pool there isn't really any concept of 'full' or 'incremental' backups; you just have an identical replica of your live filesystem and all its snapshots.



> From what I understand, I would be able to store at least a year or two (or more!) of backups on the 1 TB drive if I had one full backup and then incrementals after that?



Yes, you'll probably be able to store snapshots going back quite a long time easily. The nice thing is that if you start to get low on space, you can just delete a few of the older snapshots, which takes seconds.


----------



## xy16644 (Dec 5, 2013)

@usdmatt

The penny has finally dropped! Your explanation has helped me understand how this all works, so thanks very much. I think where I was going wrong was that I thought the "live" backup file system was static and contained the original full backup. I then thought the snapshots were the incremental backups, and that each incremental backup depended on the previous one. I set up a test VM last night, and everything you said was 100% right and makes sense now. I'm even more impressed with how ZFS works.

The last question I have is then: How do I automate ZFS incremental backups using snapshots?  I've tried the script at http://www.aisecure.net/2012/01/11/automated-zfs-incremental-backups-over-ssh/. The script is as follows (I have removed the SSH bits):

```
#!/bin/sh

pool="zroot/usr/src"
destination="tank/test"

# NOTE: $type is a leftover from the original SSH version of this script
# and is unset here
today=`date +"$type-%Y-%m-%d"`
yesterday=`date -v -1d +"$type-%Y-%m-%d"`

# create today's snapshot
snapshot_today="$pool@$today"
# look for a snapshot with this name
if zfs list -H -o name -t snapshot | sort | grep "$snapshot_today$" > /dev/null; then
    echo " snapshot, $snapshot_today, already exists"
    exit 1
else
    echo " taking today's snapshot, $snapshot_today"
    zfs snapshot -r $snapshot_today
fi

# look for yesterday's snapshot
snapshot_yesterday="$pool@$yesterday"
if zfs list -H -o name -t snapshot | sort | grep "$snapshot_yesterday$" > /dev/null; then
    echo " yesterday's snapshot, $snapshot_yesterday, exists; proceeding with backup"

    zfs send -R -i $snapshot_yesterday $snapshot_today | zfs receive -Fduv $destination

    echo " backup complete, destroying yesterday's snapshot"
    zfs destroy -r $snapshot_yesterday
    exit 0
else
    echo " yesterday's snapshot, $snapshot_yesterday, is missing; aborting"
    exit 1
fi
```

But I don't seem to be getting very far! What I am trying to achieve with this script is to automate the ZFS incremental snapshots with the following requirements:


- Only have one snapshot on the source pool at any point in time (no more than two).
- Keep one or two years' worth of incremental backups on the backup pool, and delete the oldest snapshot after the first or second year.

Can someone point me in the right direction? I'm not a coder/programmer so I can't write this myself from scratch.

Thanks!


----------



## xy16644 (Dec 8, 2013)

I spent some time this weekend trying to put together scripts that I can use to do ZFS incremental backups AND delete snapshots older than so many days. This is what I have come up with so far (and it seems to be working):

Source pool name: zroot
Destination backup pool name: tank/fs

ZFS incremental backup script (this assumes you have taken the first full backup manually):

```
#!/bin/sh
# NOTE: $type is unset here (a leftover from the script this was based on),
# so the dates come out as e.g. "2013.12.08"
today=`date +"$type%Y.%m.%d"`
yesterday=`date -v -1d +"$type%Y.%m.%d"`

# take today's recursive snapshot of the source pool
zfs snapshot -r zroot@Daily_`date +%Y.%m.%d`

# send the changes since yesterday's snapshot to the backup pool
zfs send -R -i zroot@Daily_$yesterday zroot@Daily_$today | zfs receive -duv tank/fs
```

This will create a snapshot on the backup pool called tank/fs@Daily_2013.12.08 if the date is 8 December 2013. The zroot snapshot uses the same naming.

I then run a new script to delete old snapshots as follows:

```
#!/bin/sh
# for each "prefix=N" pair below, keep the N newest matching snapshots
# and destroy the rest
FILESYSTEMS="zroot@Daily_=1 tank/fs@Daily_=5"
for filesystem in $FILESYSTEMS; do
    set -- `echo $filesystem | tr '=' ' '`
    echo $1 $2
    # list matching snapshots newest first, drop the $2 newest (kept),
    # and destroy whatever remains
    zfs list -t snapshot -o name -s name | grep "^$1" | sort -r | sed 1,$2d | sort | xargs -n 1 zfs destroy -r
done
```

This keeps one day's worth of snapshots for zroot and five days' worth for tank/fs.

I haven't used these scripts with cron yet but running them manually and changing the server's date seems to work fine.

Have I done this correctly? I'm not a scripter/programmer, so I have just taken various scripts/commands from the Internet and tried to combine them to do what I want. Ultimately it would be nice to combine this into one script.

Appreciate any feedback or suggestions!


----------



## xy16644 (Dec 8, 2013)

The one issue I have found so far is that if the backups skip a day (for whatever reason), the incremental backup fails with:

```
local fs tank/fs does not have fromsnap (Daily_2013.12.14 in stream); must have been deleted locally; ignoring
cannot receive new filesystem stream: destination 'tank/fs' exists
must specify -F to overwrite it
local fs tank/fs does not have fromsnap (Daily_2013.12.14 in stream); must have been deleted locally; ignoring
```

The snapshot on zroot is there for the day but on the backup pool there are no newer snapshots.

Is there a way to incorporate some kind of error checking into this script to account for skipped backups?

Thanks!


----------



## xy16644 (Dec 10, 2013)

Anyone? 

I just can't figure out how to handle a skipped backup in the script so that the incremental still runs. If a backup is skipped (for whatever reason), can the incremental run from the previous day's backup? How can I incorporate this into the script?

Thank you!


----------



## AndyUKG (Dec 11, 2013)

Hi,

I have been using scripts for a few years to send snapshots in this way. As the systems are servers, they are always on, so I never have to worry about the snapshot not being taken or sent the day before. On the odd occasion when there has been a system outage, I manually correct the issue. If you really need code to cope with situations where the last snapshot is missing on the target pool, it is possible: list the available snapshots on each pool and look for the newest snapshot that exists on both. However, I don't have this in my scripts, so I cannot provide you with a working script to do it. If it's of interest, I can provide you with my scripts as they are.

Thanks, Andy.


----------



## xy16644 (Dec 11, 2013)

Hi Andy

My server is on 24x7 too, but I was wondering: how do you handle an issue where the server is off when the next incremental backup runs? I.e. what if there's an extended power failure and you run out of battery on the UPS? How do you manually correct the issue if you miss a backup? That's what I am interested to know: how (and what) to do to deal with a skipped backup. I don't mind doing it manually, but I just want to know HOW.

Thanks!


----------



## da1 (Dec 12, 2013)

Hi,

Try `zfs send -I` and modify your script to:

1. get the latest snapshot from the backup server;
2. get the latest snapshot from server1;
3. create a new snapshot on server1;
4. send all snapshots between the backup server's snapshot and server1's new snapshot.

Quick (untested) example:

```
# get latest snapshot from backup server
remote_snapshot="`ssh -i $ssh_key <username>@<backup.server> \
  zfs list -Ht snapshot -o name -s name | grep <pool_name> | tail -1 | \
  awk -F'@' '{print $NF}'`"

# create the local snapshot first, so it is picked up below
zfs snapshot -r <pool_name>@<snapshot_name>

# extract the last local snapshot (now the one just created)
snapshot_last=`zfs list -Ht snapshot -o name -s name | grep ^${zpool_name}@ | \
  tail -n 1 | awk '{ print $1 }'`

# send everything between the backup server's snapshot and the new one
zfs send -I $remote_snapshot $snapshot_last | ssh -c arcfour -i $ssh_key \
  <username>@<backup.server> sudo zfs receive -dF <pool_name>
```


----------



## xy16644 (Dec 12, 2013)

How is this different to `zfs send -i`?


----------



## da1 (Dec 12, 2013)

```
-i snapshot
                 Generate an incremental stream from the -i snapshot to the
                 last snapshot.  The incremental source (the -i snapshot) can
                 be specified as the last component of the snapshot name (for
                 example, the part after the @), and it is assumed to be from
                 the same file system as the last snapshot.

                 If the destination is a clone, the source may be the origin
                 snapshot, which must be fully specified (for example,
                 pool/fs@origin, not just @origin).

-I snapshot
                 Generate a stream package that sends all intermediary
                 snapshots from the -I snapshot to the last snapshot.  For
                 example, -I @a fs@d is similar to -i @a fs@b; -i @b fs@c;
                 -i @c fs@d.  The incremental source snapshot may be
                 specified as with the -i option.
```


----------



## xy16644 (Dec 12, 2013)

Ah, got it. I will give this a try and see how it goes. Thanks!

What I'm thinking is: use my script as is (with the -i option), and then if I have a skipped backup, run it manually with -I. Is this correct?


----------



## da1 (Dec 12, 2013)

Why the hassle? Simply use the -I flag and forget about skipped backups entirely, as they will be picked up automatically (kind of "self-healing"). You will only have to intervene if the whole script fails for some reason.


----------



## xy16644 (Dec 12, 2013)

da1 said:

> Why the hassle? Simply use the -I flag and totally forget about a skipped backup as it will be picked up automatically (kind of "self healing"  ). You will only have to intervene if the whole script fails for some reason.



That works great if I run it manually. If I take an incremental snapshot today and then skip two days' worth of incrementals, the -I option lets me still do a `zfs send/recv` successfully.

The problem I have now is that when the automated script runs and I simulate skipping two days' worth of incremental backups, the script fails, as it is looking for today's date minus one, i.e.:

```
today=`date +"$type%Y.%m.%d"`
yesterday=`date -v -1d +"$type%Y.%m.%d"`

zfs snapshot -r zroot@Daily_`date +%Y.%m.%d`

zfs send -R -I zroot@Daily_$yesterday zroot@Daily_$today | zfs receive -duv tank/fs
```

Is there a way to build some logic into this so that if yesterday's incremental doesn't exist, it checks for the most recent incremental snapshot and runs as usual?

I'm so close!


----------



## kgatan (Dec 20, 2013)

> Is there a way to build some logic into this so that it says, if yesterdays incremental doesn't exist then check for the most recent incremental snapshot and run as usual?




```
zfs list -t snapshot -o name
```
The above command gives you a list of the names of all available snapshots, which you can then process with a script.

I run a very limited script in my environment which returns the last available snapshot, but it requires all snapshots to be named as dates in the form '131220'.


```
zfs list -t snapshot -o name | grep '/<filesystem name>@' | grep -o '[0-9]\{6\}' | tail -1
```
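
Building on that, one way to handle the skipped-day problem above is to pick the newest snapshot name present on both pools and use it as the -I base. The snapshot lists below are canned stand-ins for real `zfs list -H -t snapshot -o name` output:

```shell
# snapshots on the source pool (two days were skipped on the backup)
src='zroot@Daily_2013.12.10
zroot@Daily_2013.12.12'
# snapshots on the backup pool
dst='tank/fs@Daily_2013.12.08
tank/fs@Daily_2013.12.10'

# keep only the part after '@', then find names common to both lists;
# the newest common one is the base for an incremental send
base=$( { printf '%s\n' "$src" | awk -F@ '{print $2}'
          printf '%s\n' "$dst" | awk -F@ '{print $2}'; } | sort | uniq -d | tail -1)
echo "base: $base"

# the real send would then be something like:
# zfs send -R -I zroot@$base zroot@Daily_$today | zfs receive -duv tank/fs
```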


----------

