# ZFS: unmirrored files in a mirrored pool?



## ralphbsz (Jan 23, 2014)

I'm pretty much a novice in using ZFS, although I really like what I've seen so far.  My home server has a ZFS file system for /home, stored on a mirrored pool over two identical-size SATA disks.

I'm now discovering that a large fraction of the disk space on that file system is taken up by a few handfuls of large files (ranging from 2 to 20 GB each) which are in reality either temporary, or don't need to be stored redundantly because they can be recreated easily (for example, ripped copies of DVDs of which I own the original media, or backup copies of various databases).  If I could store these files non-mirrored, I could save a lot of disk space.

What is the most sensible way to tell ZFS that certain files (perhaps all files in a certain directory or under a certain mount point, or all files with a certain pattern in their name, or just individually identified files) shall be stored non-redundantly?  I'm perfectly willing to rename or move the files.

Right now, there is only one way I can see to do it, and it's very inflexible:

1. Estimate how much space should be reserved for non-redundant files.
2. Take one disk out of the mirrored pool.
3. Repartition it (with `gpart`, for example): shrink the existing ZFS partition, and create a new ZFS partition of about half the space needed by the non-redundant files (half, because that data will end up spread over both disks).
4. Put the (now slightly shrunk) ZFS partition back into the mirror, and let it resilver.
5. Once the resilvering is done, do the same to the other disk (this may have to be done in two steps, because of space constraints).
6. On the two small new partitions, create a new, non-mirrored ZFS pool, spread over both of them.
7. Move the offending files to the new non-redundant file system.

The reason this is inflexible: I have to guess the amount of space used by the two classes of files, and if that changes, I have to move all the data around.
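Sketched as commands, that procedure might look roughly like the following.  The device names (`ada0`/`ada1`), partition indexes, pool names (`tank`, `scratch`), and sizes are all placeholders I made up for illustration, not tested values:

```shell
# WARNING: hypothetical sketch only -- device/pool names and sizes are
# placeholders, and detaching a disk leaves the mirror without redundancy
# until the resilver completes.

# 1. Detach the second disk from the mirrored pool 'tank':
zpool detach tank ada1p1

# 2. Repartition: recreate the ZFS partition smaller, and add a second
#    partition in the freed space for the non-redundant data:
gpart delete -i 1 ada1
gpart add -t freebsd-zfs -i 1 -s 900G ada1   # shrunken mirror partition
gpart add -t freebsd-zfs -i 2 ada1           # leftover space

# 3. Re-attach the shrunken partition and let the mirror resilver:
zpool attach tank ada0p1 ada1p1

# 4. After repeating steps 1-3 for ada0, stripe a non-mirrored pool
#    over the two small partitions:
zpool create scratch ada1p2 ada0p2
```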

Is there a better way?  I can't see any policy-based placement in the ZFS man pages.  Nor do I see a way for the same raw volumes to be used in two pools (one mirrored, one not).  There is a `zfs set/get copies=n` property, but (a) it applies to a whole file system (or snapshot or volume), not to an individual file or directory, and (b) it doesn't reduce the mirroring already done by the mirrored pool; rather, it creates additional copies on top of it.
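To illustrate point (b), here is how `copies` behaves; the dataset name is a placeholder:

```shell
# 'copies' is a per-dataset property and cannot be set on a single file.
# On a mirrored pool, copies=2 stores two copies on *each* side of the
# mirror -- it only ever adds redundancy, never removes it.
zfs set copies=2 tank/home/important   # hypothetical dataset name
zfs get copies tank/home/important
```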

Good ideas, anyone?


----------



## usdmatt (Jan 23, 2014)

Unfortunately, multiple pools are about the best you can do. You've obviously got a good idea of how it works and what is/isn't possible already.

Do you have compression turned on? That will probably help save some space.


----------



## ralphbsz (Jan 23, 2014)

If I were running out of disk space, I could turn on compression.  Fortunately, I'm still good on space.  To be honest, I don't like compression, because it is complex to implement, and I'm not sure I would trust it enough.  Given the cost of hardware, and the ease of upgrading disks in a mirrored pool (one at a time), I would probably just buy bigger disks.  It's a bit silly to waste space storing multiple copies of things that don't need it, but in my personal situation the cost/benefit tradeoff favors throwing money at the problem rather than putting lots of work into it, if there isn't an easy answer.


----------



## usdmatt (Jan 24, 2014)

> I don't like compression, because it is complex to implement, and I'm not sure I would trust it enough.



Are you aware that ZFS has compression built in? I don't see how it's untrustworthy or complex to implement. It's incredibly easy to switch on*, extremely stable, and the general advice is to turn it on anyway: it saves space (often a lot of space), cuts down on disk I/O, and can actually perform faster, since the compression algorithms will outperform your disks.


```
# zfs set compress=on pool/dataset
# zfs set compress=lz4 pool/dataset
```

(`lz4` is much better if supported; it's available on 9.2/10 and later.)

*Turning on compression only affects newly written data, not existing files, so it's best to do it when a dataset is first created. There's no problem with turning it on or off at any time, though; the compression flag is stored with each record.
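Since only new writes are compressed, the usual trick for existing data is to rewrite the files after enabling compression, and `compressratio` shows what you're saving.  The dataset and file names below are placeholders:

```shell
# Report how much space compression is actually saving on the dataset:
zfs get compressratio tank/home

# Existing files are only compressed when their records are rewritten;
# copying a file and replacing the original achieves that:
cp /home/bigfile /home/bigfile.tmp && mv /home/bigfile.tmp /home/bigfile
```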

I do agree though that keeping a redundant pool and just making it big enough for everything makes more sense than trying to split it into multiple pools.


----------



## kpa (Jan 24, 2014)

Compression can in some cases result in faster read speeds, because a file that compresses well may need only 25% as many blocks to be read as its uncompressed form. The time spent on decompression is insignificant compared to the time spent reading the extra blocks from disk.


----------

