# ZFS - Best partitioning scheme between SSD and caviar red



## cchamberlain (Apr 7, 2013)

I'm building a home NAS for storage and hosting.  The more I learn about ZFS, the more I think I've proverbially shot myself in the foot on my initial partitioning scheme.

My drives -

Samsung 840 120GB SSD
WD Caviar Red 3TB (eventually will expand to 5 of these)

My initial plan was to use the SSD for all of the main mount points on UFS (boot, /, /usr, /var, etc.) and then just mount WD Red as a ZFS pool, and grow it as I acquire more of the WD Reds down the road.  I have read quite a few threads and can't find anybody doing this so I'm thinking that it probably will not use ZFS' memory features well and I do want the fastest and best overall setup regardless if I have to reformat and start over.

It looks like the sector size of the Red drive is 512 with a stripe size of 4096.  Does anybody have any suggestions for my situation?  I'm now leaning towards maybe putting some of the partitions on the SSD (boot, swap, L2ARC cache), and putting the rest on the ZFS pool.

Please let me know if I can provide any more detail; I'd appreciate any suggestions on the partition scheme between the two drives as well as FS types (UFS vs. ZFS for each partition).  Good partition offsets for my situation would be very helpful to know as well.

Thanks in advance!


----------



## priyadarshan (Apr 7, 2013)

Hi,

I am also quite interested in such a scheme. In my case, I have a 120 GB SSD and already three 3TB Caviar Red. I would like to set up two of them as a mirrored pool, and the third as a hot spare.

Thank you!


----------



## cchamberlain (Apr 8, 2013)

Glad to see I'm not the only person with this question. 

I was looking at this thread which sounds sort of similar to what we're doing - http://forums.freebsd.org/showthread.php?t=38740

I am currently planning on going with @usdmatt's solution with bootcode and cache on the SSD:


> I personally would just create a root pool on the SSD and a separate data pool (I don't know where the 3rd you mention comes from?) or do as suggested above - put bootcode on the SSD, a single disk pool on the HDD and use the remaining SSD space as a cache. Obviously the cache is always empty on boot but I don't think disk performance has much of an effect on boot time anyway, most of it is spent in BIOS, boot loader or device discovery.



I'm not sure what's optimal on the cache side, but since the boot partition would only be around 512 kB, that would leave nearly 120GB free for cache.  Does anyone know if all of this cache would be used, or if it would be smarter to put some part of the file system on the SSD besides the cache?

Probably will go ahead with this but if anyone can ring in with experience that would be great.


----------



## throAU (Apr 8, 2013)

I'd personally create a pool using the spinning disk as your storage and use the SSD as L2ARC.  

It's far more important to cache data for performance than the boot files which you will typically read once every boot (how often do you reboot?).

Bear in mind that you won't be able to expand your ZFS pool by adding single drives if you want any sort of redundancy.


What are you trying to achieve, what is your intended workload?  This will determine what trade-offs are "best" for your circumstances.  Because every storage setup is a trade-off in one way or another.


----------



## cchamberlain (Apr 8, 2013)

Thanks for the reply @throAU.  That is starting to sound like the best bet.

So basically, if I start off with one ZFS storage drive, I can expand the pool, but since it wasn't initially set up with redundancy, the new expanded pool wouldn't have redundancy, correct?  Would I be able to dump the filesystem to a separate computer, repartition and set the pools up with redundancy, and re-import the file system onto the new pool down the line when I get more drives?

Primary usage will be serving media over Samba to my home, but I have a Xeon e3-1245 processor and 16GB RAM so I will likely set up Apache and run usenet apps (SABnzbd, CouchPotato, SickBeard), Newznab and other applications along these lines.


----------



## vermaden (Apr 8, 2013)

cchamberlain said:

> Samsung 840 120GB SSD
> WD Caviar Red 3TB (eventually will expand to 5 of these)


Better start with at least 2 of them (ZFS mirror); for 5 or 6 I would use RAIDZ2 (RAID6).

Use that SSD as L2ARC cache device.

You can use another 2 SSDs (ZFS mirror) for ZIL.


----------



## usdmatt (Apr 8, 2013)

If you start with a single disk in your pool you can add a second to make a mirror. From this point onwards, the recommended action would be to continue adding mirrored pairs - i.e. go from 1 mirror to 2 mirrors, to 3 mirrors (effectively RAID10).

You will not be able to convert a single disk pool (or a mirror) into a raidz without destroying the pool and re-creating it.

Obviously you can start with 1 disk and just keep adding 1 more in a stripe setup but I would generally advise against a non-redundant config.

Remember that adding bootcode to the SSD and using whole disks for the pool makes managing disks in the pool slightly easier, but also means you won't be able to boot if the SSD fails. There's a fair argument for adding bootcode to all the pool disks and using the SSD purely as cache (or cache and ZIL, though you may not benefit that much from ZIL depending on your application).
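A sketch of that approach, assuming GPT disks that each carry a freebsd-boot partition at index 1 (the device names are only examples):

```shell
# Write the protective MBR and the ZFS-aware gptzfsboot loader to the
# freebsd-boot partition (index 1) of every disk in the pool, so the
# machine can still boot if the SSD (or any single disk) dies.
# ada1/ada2/ada3 are placeholder device names; adjust to your system.
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada2
gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada3
```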


----------



## kpa (Apr 8, 2013)

If the SSD fails you can boot from a USB memory stick that has the same bootcode.


----------



## usdmatt (Apr 8, 2013)

Just to add: the 3TB RED disks are definitely advanced format (4k), so you'll want to look up the posts on here about getting alignment right and using gnop(8) to get the right ZFS setup.
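For reference, the gnop workaround usually looks something like this sketch (device and pool names are examples; the temporary .nop device makes ZFS record ashift=12 when the pool is created):

```shell
# Create a transparent gnop provider that advertises 4096-byte sectors
# on top of the real partition (example label gpt/disk0):
gnop create -S 4096 /dev/gpt/disk0
# Create the pool on the .nop device so ZFS picks ashift=12:
zpool create tank /dev/gpt/disk0.nop
# Export, drop the gnop layer, and re-import; the ashift is permanent:
zpool export tank
gnop destroy /dev/gpt/disk0.nop
zpool import tank
# Verify the result (should report ashift: 12):
zdb -C tank | grep ashift
```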

I don't know when we're going to see a simple way to override the sector size for new pools, they've been discussing various methods on the mailing lists for ages now.

As with pretty much all disks, the REDs still claim to have a 512-byte sector to the OS, although they also show a 4k stripe size (I don't think this is universal though, as the devs would have already jumped on it as a simple way to identify AF drives).
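The arithmetic behind "getting alignment right" is simple enough to sanity-check by hand. A small illustrative helper (not part of any FreeBSD tool):

```python
def is_4k_aligned(start_sector, logical_sector_size=512):
    """Return True if a partition starting at start_sector (counted in
    logical sectors) begins on a 4096-byte boundary of the physical media."""
    return (start_sector * logical_sector_size) % 4096 == 0

# LBA 34 (the first usable GPT sector) is misaligned on a 4k drive,
# while LBA 40 lands cleanly on a physical-sector boundary:
print(is_4k_aligned(34))   # False: 34 * 512 = 17408, not a multiple of 4096
print(is_4k_aligned(40))   # True:  40 * 512 = 20480 = 5 * 4096
```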


----------



## cchamberlain (Apr 8, 2013)

Thanks for all the great advice!  I ended up just going for it last night and set up the pool with the intention that I will be doing it again soon (nothing is going to be set up optimally the first time I try it anyway, right?).

I'm glad to know they have the 4k sector size; I was getting thrown off by what the drive was reporting.  Planning on ordering some more WD Reds today and will read up on gnop. I had planned to offset the sectors for 4k optimization, but after hours of trying to destroy partitions and getting "Device busy" last night, I was just happy when I finally got ZFS partitioned on there.

FYI to anyone having this issue - I was initially going off these instructions on setting up ZFS on GPT partition. They are dated for the current USB installer (so far as I can tell) since I could not find a way to get into Fixit. I tried entering sysinstall from the shell and navigating to it like that but none of the Fixit menu area would let me actually get into it. Instead, I used the updated instructions here. The author calls out the issue with the current installer in the first paragraph.



			
usdmatt said:

> If you start with a single disk in your pool you can add a second to make a mirror. From this point onwards, the recommended action would be to continue adding mirrored pairs - i.e. go from 1 mirror to 2 mirrors, to 3 mirrors (effectively RAID10).



So this sounds to me like I should buy drives in even numbers while I'm adding mirrors, then wipe out everything if I make the jump to RAIDZ when I get 5 or 6 disks, right? Do I need to set up the first mirror before the file systems are created on the pool?


----------



## usdmatt (Apr 8, 2013)

You can start with a single disk, create file systems, put data on it and then convert to a mirror without any problem.


```
# zpool create pool disk1 (single disk)
(you can start adding filesystems/data now)
# zpool attach pool disk1 disk2 (mirror)
# zpool add pool mirror disk3 disk4 (2 mirrors)
# zpool add pool mirror disk5 disk6 (3 mirrors)
```

*Make sure you learn the difference between the attach and add subcommands*
They get a lot of people into trouble. attach creates a mirror (or adds another disk to a mirror), and add adds a new vdev to the pool (i.e. add-ing disk2 instead of attach-ing it would stripe your data across the 2 disks rather than make a mirror, and you can't undo it).

You can even have a single mirror, add another disk (giving one mirror + one single disk), then later on make that single disk a mirror (giving 2 * mirror) and so on but this isn't really advisable. When you have mirrors + a standalone disk in the pool, you lose everything if that standalone disk fails.
Just for interest that would go something like this:


```
# zpool create pool disk1 (single disk)
# zpool attach pool disk1 disk2 (single mirror)
# zpool add pool disk3 (mirror + single)
# zpool attach pool disk3 disk4 (2 mirrors)
# zpool add pool disk5 (2 mirrors + single)
# zpool attach pool disk5 disk6 (3 mirrors)
```

I seem to say this about once every few days now but I follow the method used by @vermaden in the forum post below. Do not use sysinstall. Whenever I've tried to use bsdinstall it seems to give me problems when I drop back to the console after the install to sort the zpool.cache stuff. I find the method that works for me without any issue is to just go into the live cd and do the whole thing by hand:

http://forums.freebsd.org/showthread.php?t=31662

Last time I installed a 4k drive I started the first real partition at 1m (using the -b 1m option), and the rest aligned to 4k (with the -a 4k option). Don't know if there's a preferred method.
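That recipe might look like the following sketch (the partition sizes and device name are only examples):

```shell
# Example GPT layout on a 4k drive; ada1 is a placeholder device name.
gpart create -s gpt ada1
# Small boot partition; its alignment doesn't matter much:
gpart add -t freebsd-boot -s 512k ada1
# Start the first "real" partition at 1 MiB...
gpart add -t freebsd-swap -b 1m -s 8g ada1
# ...and 4k-align whatever follows, letting it take the remaining space:
gpart add -t freebsd-zfs -a 4k ada1
```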

I too quite often run into device busy errors when messing with disks. I might be making this up (it's been a month or two) but I seem to remember the gpart commands complain about this, and I can usually get around it by using dd(1) to overwrite the start/end of the disk manually and then starting again.


----------



## cchamberlain (Apr 9, 2013)

Thanks @usdmatt, this was exactly what I was looking for.  Just ordered a second WD Red, should be here tomorrow and I'll stick it in as a mirror, then down the road when I get a third I will likely switch to RAIDZ for its space benefits.

I had a similar issue with bsdinstall, got through the step of deleting/creating the zfs partitions, then on committing it said that one of the partitions could not be deleted.  I will definitely be following @vermaden's advice on the base install when I switch over to RAIDZ down the line.

Out of curiosity, do you leave any space free at the end of the disk due to ZFS requiring that future disks are greater than or equal to the current disk size, and have you run into any issues with not being able to add additional drives because of the random variance in drive sizes?  I see some people saying not to use the full size and other people just saying to let it use the whole disk.


----------



## gpw928 (Apr 9, 2013)

Hi,

You are by no means alone.  It's been something of a wait for FreeBSD to get SSD support, and the WD Reds have a lot to offer at the value end of spinning disks.

I have a setup very similar to yours, and hope that this thread develops well.

Some observations from reading a lot of Sun stuff on ZFS regarding the ZFS intent log (ZIL) are:


- the ZIL never needs to be larger than 50% of main memory (so quite small);
- the ZIL turns random writes into sequential writes (performance wise); and
- the ZIL must be low-latency storage compared to the tank (i.e. SSD).

My plan has been to put the ZFS cache and ZIL onto my Samsung 840 Pro.  It's SATA 3 connected, so I'm hoping that there is enough bandwidth for both.
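In zpool terms that plan would be something like the following sketch, with the SSD split into a small ZIL partition and a large cache partition (the pool and partition names are hypothetical):

```shell
# ada0p1: a few GB for the ZIL (per the observations above, it never
# needs more than about half of RAM's worth of in-flight writes).
# ada0p2: the rest of the SSD as L2ARC read cache.
zpool add tank log ada0p1
zpool add tank cache ada0p2
```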

Also I have noticed that provided you keep the block size right (I'm using NFS) ZFS RAID1Z behaves like a stripe (i.e. very fast).

Cheers,


----------



## Terry_Kennedy (Apr 9, 2013)

cchamberlain said:

> My initial plan was to use the SSD for all of the main mount points on UFS (boot, /, /usr, /var, etc.) and then just mount WD Red as a ZFS pool, and grow it as I acquire more of the WD Reds down the road.  I have read quite a few threads and can't find anybody doing this so I'm thinking that it probably will not use ZFS' memory features well and I do want the fastest and best overall setup regardless if I have to reformat and start over.


What are your goals for this NAS? One important factor is the percentage of writes vs. reads. If you expect to be doing a fair number of writes, you might investigate using the SSD for a ZIL device and using something else for the base operating system storage. Remember, SSDs have a finite capacity for writes, and it may not be the best use of the drive to store things that have only a relatively short useful life, like most of the files in /var/log. You also don't need ultimate speed for operating system files - the user experience will be based on how fast your NAS can get data from the ZFS pool to the user (or vice versa).

And there's something else to consider regarding that last sentence - you'll only get 125Mbyte/sec (best case) on a Gigabit Ethernet, so if you aren't doing local processing such as media transcoding, your money may be better spent on slower disks with more capacity for the same price.

On my RAIDzilla II systems I'm using a pair of WD Blue notebook-class drives for the operating system (mirrored, UFS format), 16 x 2TB WD RE4 drives for the ZFS pool, and a 300GB (way oversize) PCIe SSD as a ZIL device. This config will read or write at > 600Mbyte/sec continuously, with burst writes above 4GB/sec (see JPEG here).

To answer your question about adding drives, you can add drives / vdevs to many ZFS configurations, but ZFS will not move pre-existing data to balance space between the existing and added devices. So, unless you will be adding lots of data, you'll wind up doing lots more I/O to the older vdev(s) than the new ones. To balance things you'd need to back up the pool data somewhere, re-create the pool, and restore. But that doesn't seem like it will work for you, as adding drives one at a time implies there isn't any place with enough space to back up the existing data. At pools of the size I'm using, it also gets challenging to move 13TB or so of data off so I can re-initialize the pool.

One point raised in a later reply was hot spares. There's no "auto" in ZFS autoreplace on FreeBSD - you'll need to manually tell ZFS to start using the hot spare. There was some discussion about adding this to devd(8), but I haven't kept track of where that went.
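So with a spare configured, the failover is a manual step; roughly like this (pool and device names are examples):

```shell
# Register ada3 as a hot spare in the pool:
zpool add tank spare ada3
# After ada1 fails, you must start the rebuild onto the spare yourself:
zpool replace tank ada1 ada3
# Once the failed ada1 has been physically removed, detach it:
zpool detach tank ada1
```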


----------



## wblock@ (Apr 9, 2013)

cchamberlain said:

> Thanks @usdmatt, this was exactly what I was looking for.  Just ordered a second WD Red, should be here tomorrow and I'll stick it in as a mirror, then down the road when I get a third I will likely switch to RAIDZ for its space benefits.



That will require a backup and reformat of the existing drives.  ZFS can't morph from a mirror to RAIDZ1 on its own.



> Out of curiosity, do you leave any space free at the end of the disk due to ZFS requiring that future disks are greater than or equal to the current disk size, and have you run into any issues with not being able to add additional drives because the random variance of drive sizes.  I see some people saying to not use the full size and other people just saying to let it use the whole disk.



Later versions of ZFS are reported to leave unused space at the end of the drive for just that reason.  Exactly which versions, and how much unused space, I have not found.


----------



## wblock@ (Apr 9, 2013)

gpw928 said:

> Hi,
> 
> You are by no means alone.  It's been something of a wait for FreeBSD to get SSD support,



Please expand on that--do you mean SSD support in the installer, or TRIM support in ZFS, or something else?



> Also I have noticed that provided you keep the block size right (I'm using NFS) ZFS RAID1Z behaves like a stripe (i.e. very fast).



My 3-disk RAIDZ1 seems to be about twice as fast as a single disk (180M/sec, AFAIR).  It's enough, but not as fast as I'd hoped.


----------



## rusty (Apr 9, 2013)

cchamberlain said:

> Out of curiosity, do you leave any space free at the end of the disk due to ZFS requiring that future disks are greater than or equal to the current disk size, and have you run into any issues with not being able to add additional drives because the random variance of drive sizes.  I see some people saying to not use the full size and other people just saying to let it use the whole disk.



Personally I like to partition; better safe than sorry.
It would be a minor annoyance with a 2-way mirror; anything beyond that would be really irritating.


----------



## cchamberlain (Apr 9, 2013)

Great responses, let me see if I can respond to everything in one go.



			
gpw928 said:

> My plan has been to put the ZFS cache and ZIL onto my Samsung 840 Pro. It's SATA 3 connected, so I'm hoping that there is enough bandwidth for both.



I was thinking about doing this but read that having them both on the same SSD might encumber performance.  In my case I have a Samsung 840 120GB (picked up a cheap one for this; I have the Pro in my desktop/laptop), which I think will be just about the perfect L2ARC size to go with my 16GB RAM and 15TB of HDD end state (3TB x 5).  As of right now I'm not planning on putting in multiple SSDs, since I'm fairly limited with 6 SATA connections (using a P8H77-I mini-ITX motherboard); my goal here was to pack as much power into a mini-ITX form factor as possible.  I have a single PCI-e slot on the board that could have hosted another SATA controller, but since I cannot seem to get the onboard video to work, I had to resort to using a graphics card.  Still no idea why it won't work, since my processor (E3-1245) has support for integrated video, but that's off topic.

In terms of read vs. write, I'd like to favor read speed since the primary usage will be streaming to my LAN.  Also, I went with the cheapest Samsung SSD I could find so as long as my data doesn't get corrupted (my understanding is if L2ARC dies everything will just run a little slower), I don't mind throwing the SSD in the garbage in 6 months if it comes down to that.



			
Terry_Kennedy said:

> To answer your question about adding drives, you can add drives / vdevs to many ZFS configurations, but ZFS will not move pre-existing data to balance space between the existing and added devices. So, unless you will be adding lots of data, you'll wind up doing lots more I/O to the older vdev(s) than the new ones. To balance things you'd need to back up the pool data somewhere, re-create the pool, and restore. But that doesn't seem like it will work for you, as adding drives one at a time implies there isn't any place with enough space to back up the existing data. At pools of the size I'm using, it also gets challenging to move 13TB or so of data off so I can re-initialize the pool.



On that question, I was mostly just curious if it was possible to add mirrors without reformatting.  I don't have any data on the server yet, right now I'm still tuning and don't mind reformatting if need be.  I have a beast of a desktop with a couple of WD Black 2TB alongside a 840 Pro which holds all my data at the moment so I have time to get it right.  Given that the zpool hasn't had all of my data moved to it (pretty much just the standard root file system currently), would it be beneficial to start over with the mirror I'm going to add today or should I just add the mirror on?



			
wblock@ said:

> That will require a backup and reformat of the existing drives.  ZFS can't morph from a mirror to RAIDZ1 on its own.



Yes, this was my understanding; I'll plan on backing up the data to my desktop when I switch over to RAIDZ1.  It's always better to be verbose though. 



			
wblock@ said:

> Later versions of ZFS are reported to leave unused space at the end of the drive for just that reason. Exactly which versions, and how much unused space, I have not found.



Thanks for clearing that up!

Will be moving forward with adding in the mirror tonight.


----------



## Terry_Kennedy (Apr 10, 2013)

cchamberlain said:

> Out of curiosity, do you leave any space free at the end of the disk due to ZFS requiring that future disks are greater than or equal to the current disk size, and have you run into any issues with not being able to add additional drives because the random variance of drive sizes.  I see some people saying to not use the full size and other people just saying to let it use the whole disk.


In many cases this is an over-emphasized concern. First, I don't know of any current drive manufacturer which will warranty-replace an "xGB" drive with one with fewer sectors. That means that any RMA replacements you get for the next 3 (or 5, etc.) years will be the same size or bigger. I've had experiences with both Seagate and WD where they've replaced a drive under warranty with a larger, newer model because they no longer stocked the older model. [For that matter, I've had the same thing with my PCIe SSDs - the manufacturer replaced my 256GB ones with 320GB ones because they no longer had any of the older ones to meet warranty requirements.]

The second concern is that drives in the "far" future might have slightly different capacities, and might be a few blocks smaller than previous models. I was surprised that I've never run into this - in fact, I recently replaced some Seagate Cheetah 15K.4 drives with Cheetah 10K.6 drives, which are from a completely different family and 2 generations apart, and they both have exactly 286749488 sectors.

One thing to be aware of is that sizes may vary between drive manufacturers. Custom firmware for OEM's like Dell, HP, etc. often reports a different number of sectors than the generic version of the same drive model. This is so the OEM's can customize the capacity so all of their xGB drives have the same number of sectors, regardless of manufacturer. That allows them to ship out any brand of, say, 300GB 7200 RPM SATA drive as a spare part regardless of whether they were made by Seagate, WD, Samsung, etc. without needing to worry about customers having problems re-adding them to an array.

In closing, if you haven't created a pool yet, you might want to consider sizing down the drive capacity a little "just in case". But if you have an existing pool utilizing the entire capacity of the drives, don't panic - it will probably make no difference.
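If you do want to hold back a little capacity, the simplest way is to give the ZFS partition an explicit size rather than letting it consume all remaining space. A sketch (the size and device name are only an illustration):

```shell
# On a ~2.72T disk, cap the ZFS partition slightly below full size so a
# marginally smaller replacement drive can still join the vdev later:
gpart add -t freebsd-zfs -a 4k -s 2780g ada1
```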


----------



## throAU (Apr 10, 2013)

If you're storing media files (or other large files that are streamed), and accessing them via 1 GbE, the benefit of a ZIL or L2ARC will probably be limited unless you have a *large* number of users (to randomize your IO), as your spinning disks will saturate 1 GbE already.  

However, I'd wager L2ARC will be more useful (workload more read-biased) and won't require an SSD mirror, as an SSD failure won't impact data integrity.

From what I've read, if you're wanting an SSD based ZIL, you really should have multiple SSDs set up as a mirror, otherwise a failure in an SSD can potentially cause pool corruption, etc.


----------



## Terry_Kennedy (Apr 10, 2013)

throAU said:

> From what I've read, if you're wanting an SSD based ZIL, you really should have multiple SSDs set up as a mirror, otherwise a failure in an SSD can potentially cause pool corruption, etc.


I have confirmed that this was fixed as of the ZFS v28 import (and subsequent changes over the next month or so after that). I had a ZFS v15 pool where the SSD-based ZIL failed (PCIe SSD which used flash chips on SODIMM-like modules, where a connector problem was common). The pool could be read, but any attempt to write it would cause the system to panic immediately.

Recently (a month or two ago) I built another RAIDzilla II on a bet that I couldn't do it for $3000 or less (not counting the 16 2TB drives). It ended up costing $3001.70. Anyway, that 'zilla had another one of those same SSD's, and experienced the same problem with bad flash connectors. However, this system was using 8-STABLE with the latest ZFS, and it was quite easy for me to remove the ZIL from the pool and install a replacement PCIe SSD (which, fortunately, no longer uses connectors) and add it to the pool.

Yes, if you lose the ZIL you may have some uncommitted writes. However, even with mirrored SSDs you can have this happen unless the SSD's DRAM memory is backed up by either a battery or a supercap. And there's also the issue of the "real" disk drives and controller reporting synchronous command completion correctly. If the drive or controller says the data was written, the OS (any OS) has no way of knowing otherwise.

I use a non-mirrored PCIe SSD ZIL with supercaps, a 3Ware 9650SE-16ML with battery backup, and WD RE4 'enterprise' drives. All four 'zillas (128TB total) are on a UPS which provides at least 4 hours of runtime if there's a power failure. Based on that, I believe I have taken reasonable steps against my pools getting irrecoverably damaged. Any data being written to the pool can be re-created, and I've never had ZFS lose or corrupt data once it has been committed to the pool.

I also do regular backups to LTO-4 media which is stored offsite, as well as replicating the data on other 'zillas at an offsite location via a dual 1Gbit/sec Ethernet link.

I could split the PCIe SSD and do mirroring - the underlying architecture is a LSI SAS2004 controller running the Integrated RAID firmware with 4 independent flash controllers and chips. So I can run it as a stripe (which is how I have it configured), or half-capacity as a mirror. At some point providing redundancy in the ZIL actually reduces overall performance - the ZFS pool without the SSD can do 600 to 700 Mbyte/sec reads or writes.

By the way, resilver performance when replacing a fast PCIe ZIL is amazing. This is actual output from the resilver I did after doing a replace on the ZIL:

```
(0:29) rz3:/sysprog/terry# zpool status
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Feb 27 14:23:19 2013
        5.36T scanned out of 17.8T at 3.56G/s, 0h59m to go
        0 resilvered, 30.18% done
```


----------



## cchamberlain (Apr 10, 2013)

More good info, thanks guys.

So I'm messing around with the mirror and having some trouble.  If I understand correctly, I need to first partition the hard drive the same way as my other drive.  On my first drive, I have the following partitions (please ring in if any of these look unnecessary or non-optimally sized) -


```
$ gpart show ada1
=>        34  5860533101  ada1  GPT  (2.7T)
          34           6        - free -  (3.0k)
          40         128     1  freebsd-boot  (64k)
         168    16777216     2  freebsd-swap  (8.0G)
    16777384  5843755744     3  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5k)
```

I setup the second drive in the same fashion with gpart -


```
$ gpart show ada2
=>        34  5860533101  ada2  GPT  (2.7T)
          34           6        - free -  (3.0k)
          40         128     1  freebsd-boot  (64k)
         168    16777216     2  freebsd-swap  (8.0G)
    16777384  5843755744     3  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5k)
```

I tried writing the bootcode on the second drive, then labeling it as disk1 using `glabel label -v disk1 /dev/ada2`, and got an instant "Corrupt or Invalid GPT detected.  GPT rejected -- may not be recoverable".  I then deleted the slices, destroyed the partition, and tried again, this time without the bootcode and from the Live CD, and got the same issue.  Any idea what I'm doing wrong?

My understanding is that I need to have the same partition layout on the disk and a label to use with the zpool attach command.  Here is some info on the zpool in case that helps -


```
$ zpool status
  pool: zroot
 state: ONLINE
  scan: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        zroot        ONLINE       0     0     0
          gpt/disk0  ONLINE       0     0     0
        cache
          ada0       ONLINE       0     0     0

errors: No known data errors
```


```
$ zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
zroot  2.72T  2.53G  2.72T     0%  1.00x  ONLINE  -
```


----------



## cchamberlain (Apr 10, 2013)

Terry_Kennedy said:

> Recently (a month or two ago) I built another RAIDzilla II on a bet that I couldn't do it for $3000 or less (not counting the 16 2TB drives). It ended up costing $3001.70. Anyway, that 'zilla had another one of those same SSD's, and experienced the same problem with bad flash connectors. However, this system was using 8-STABLE with the latest ZFS, and it was quite easy for me to remove the ZIL from the pool and install a replacement PCIe SSD (which, fortunately, no longer uses connectors) and add it to the pool.



Wow, all I can say about that is holy crap.  May I ask what your primary usage is?  Website?

By PCI-E SSD, is that an mSATA or something else?


----------



## kpa (Apr 10, 2013)

Do not use glabel(8) for labeling disks or partitions on GPT partitioned disks. GPT has its own labeling system that is superior in many ways. Also, labeling whole disks does not make sense if you want to identify partitions by easy names.

After creating the partitions as you did above:

`# gpart modify -l swap1 -i 2 ada1`
`# gpart modify -l swap2 -i 2 ada2`

`# gpart modify -l disk1 -i 3 ada1`
`# gpart modify -l disk2 -i 3 ada2`

Do these to force GEOM "retasting" to make the labels visible in /dev/gpt immediately:

`# true >/dev/ada1`
`# true >/dev/ada2`

You can see the labels in the output of

`# gpart show -l`

Then you can use the names gpt/swap1 and gpt/swap2 for building a gmirror(8) for swap.

`# gmirror label myswap gpt/swap1 gpt/swap2`

And build the ZFS pool using the names gpt/disk1 and gpt/disk2:

`# zpool create mypool mirror gpt/disk1 gpt/disk2`

The bootcode is written with these, on both disks:

`# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada1`
`# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada2`


----------



## Terry_Kennedy (Apr 10, 2013)

throAU said:

> If you're storing media files (or other large files that are streamed), and accessing them via 1 GbE, the benefit of a ZIL or L2ARC will probably be limited unless you have a *large *number of users (to randomize your IO), as your spinning disks will saturate 1 GbE already.


Very true. The systems I build do a large amount of local processing, so it's worth getting the highest possible performance from the pool. Serving files over a 1Gbit/sec LAN will not tax most ZFS configurations unless there is a lot of thrashing going on (as you point out). In fact, most of the better 2TB+ drives (7200 RPM, etc.) can probably saturate a 1GbE link with only a single drive, and a 2-drive stripeset definitely will.

Where things get interesting is _way_ at the high end of the scale. Here is a benchmarks/iozone graph on a pool with neither a ZIL nor a L2ARC device. FreeBSD 8.4-PRERELEASE. I can't say a lot about the hardware (work system vs. my hobby 'zillas). Peak performance of 7GB/sec while in the processor cache, then 4.5GB/sec from main memory, and finally dropping "down" to 1.5GB/sec when it actually has to hit the disks in the pool. The main problem I have is how long it can take to do a maximum iozone run, despite the fast storage in use.


----------



## Terry_Kennedy (Apr 10, 2013)

cchamberlain said:
			
		

> Wow, all I can say about that is holy crap.  May I ask what your primary usage is?  Website?


Hobby usage. All 128TB are available to my web servers, though most of the data isn't used on any of my web pages. A lot of it is large datasets relating to my other hobby - auto racing. I've got engine/vehicle performance, actual wind tunnel performance, and simulations of all of the above. Also lots of high-res laser scans of the car for 3D modeling. Sorry, no porn.

Work is different - substantial local computations on large(r) datasets, so CPU performance is also important.



> By PCI-E SSD, is that an mSATA or something else?


Something else. OCZ Enterprise Velodrive DC-HHPX8-320G - specs here. I picked up a number of them inexpensively since they were just EOL'd in the last few months.


----------



## cchamberlain (Apr 10, 2013)

Terry_Kennedy said:
			
		

> Hobby usage. All 128TB are available to my web servers, though most of the data isn't used on any of my web pages. A lot of it is large datasets relating to my other hobby - auto racing. I've got engine/vehicle performance, actual wind tunnel performance, and simulations of all of the above. Also lots of high-res laser scans of the car for 3D modeling. Sorry, no porn.
> 
> Work is different - substantial local computations on large(r) datasets, so CPU performance is also important.
> 
> ...



Sounds like you are in big data.  Those drives have some nice read speed.


----------



## cchamberlain (Apr 11, 2013)

kpa said:
			
		

> Do not use glabel(8) for labeling disks or partitions on GPT partitioned disks. GPT has its own labeling system that is superior in many ways. Also labeling the whole disks does not make sense if you want to identify partitions by easy names.
> 
> After creating the partitions as you did above:
> 
> ...




This was very helpful for getting the labels correct; however, I ran into an issue when mirroring the disks. I first tried the command you gave, but it didn't work because I already have a pool up and I'm trying to add a mirror to it (per usdmatt's instructions on page 1).

`# zpool attach zroot gpt/disk1 gpt/disk2`

```
cannot attach gpt/disk2 to gpt/disk1: no such device in pool
```

I also tried -

`# zpool attach zroot disk1 disk2`

```
cannot open 'disk2': no such GEOM provider
must be a full path or shorthand device name
```

I'll probably just wipe the disks again tonight and start over; at least I'm learning. 

If I'm doing something glaringly wrong, please let me know. Also, am I setting up the swap mirror in the typical fashion, or is there a better way? Does the ordering of the SATA connections on my motherboard matter? If I switch the connections, would the disks still stay ada1 and ada2? If I'm understanding correctly, the labels are put on the last sector of the disks, but are the ada1/ada2 GEOM names also written to the disks? Thanks again for bearing with me.


----------



## wblock@ (Apr 11, 2013)

cchamberlain said:
			
		

> `# zpool attach zroot gpt/disk1 gpt/disk2`
> 
> ```
> cannot attach gpt/disk2 to gpt/disk1: no such device in pool
> ```



To use those GPT labels that were assigned, give an absolute, not relative path to them:
`# zpool attach zroot /dev/gpt/disk1 /dev/gpt/disk2`


----------



## cchamberlain (Apr 11, 2013)

wblock@ said:
			
		

> To use those GPT labels that were assigned, give an absolute, not relative path to them:
> `# zpool attach zroot /dev/gpt/disk1 /dev/gpt/disk2`



Tried it, got:

```
cannot attach /dev/gpt/disk2 to /dev/gpt/disk1: no such device in pool
```

I did a `gpart show -l` and noticed that what used to be the ada1 GEOM device got renamed to ufsid/5162c4e28540c74a. Not sure how that happened, but I'm about ready to reformat; it feels like something is screwed up.

Another question I have is if I'm mirroring the disks and mirroring the swaps, am I not supposed to mirror the boot slice also?


----------



## wblock@ (Apr 11, 2013)

One problem at a time.  The first is referring to GPT labels, which appear in /dev/gpt.

I don't think attach is the right command there.  See Example 2 in zpool(8).


----------



## kpa (Apr 11, 2013)

@wblock@, the relative names work fine; all geom(8) utilities support shortcut names without the leading /dev.

Back to the problem: if you're adding a new mirror vdev to an existing pool, the correct command is `# zpool add`:

`# zpool add zroot mirror gpt/disk1 gpt/disk2`

However, I have to ask: how does the pool look without the mirror that is going to be added? If it's now a single-disk pool, adding a mirror vdev to it does not make much sense; the resulting pool wouldn't have full redundancy for all of the data stored.


----------



## wblock@ (Apr 11, 2013)

kpa said:
			
		

> @wblock@, the relative names work fine, all geom(8) utilities support using shortcut names without the leading /dev.



Hmm. I didn't count on ZFS behaving like native FreeBSD stuff.


----------



## cchamberlain (Apr 11, 2013)

kpa said:
			
		

> However I have to ask, how does the pool look without the mirror that is going to be added? If it's now a single disk pool adding a mirror vdev to it does not make much sense. The resulting pool wouldn't have full redundancy for all of the data stored.



It's pretty much just a fresh install. This is what I thought might end up happening; I wasn't sure whether adding the mirror copied the existing data or not, so that clears it up.

Sounds like the smartest move at this point would be to wipe the drives and start over clean, right?


----------



## cchamberlain (Apr 11, 2013)

usdmatt said:
			
		

> You can start with a single disk, create file systems, put data on it and then convert to a mirror without any problem.
> 
> 
> ```
> ...



I was going off the instructions that usdmatt posted on the first page. I was under the impression that using add on an existing pool would stripe the drives. He specifically mentioned not to use add when adding a second drive to make a mirror, as it cannot be undone (quite possibly I misunderstood something here).
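For anyone following along, the distinction between the two commands can be sketched like this; the pool and label names are hypothetical, and these require an existing pool and spare partitions:

```
# attach: converts an existing single-disk vdev into a mirror
# (existing data is resilvered onto the new disk)
zpool attach zroot gpt/disk1 gpt/disk2

# add: appends a NEW top-level vdev; data is then striped across
# vdevs, and the operation cannot be undone
zpool add zroot mirror gpt/disk3 gpt/disk4
```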


----------



## kpa (Apr 11, 2013)

cchamberlain said:
			
		

> Its pretty much just a fresh install.  This is what I was thinking might end up happening, wasn't sure if adding the mirror copied existing data or not so that clears that up.
> 
> Sounds like the smartest move at this point would be to wipe the drives and start over clean, right?



You're not offering much information, and most of what you offer contradicts itself. 

What kind of pool do you want to build? That's the first thing you have to make very clear to yourself, and state very clearly in your posts; otherwise we can't offer any reliable instructions.


----------



## cchamberlain (Apr 11, 2013)

kpa said:
			
		

> You're not offering much information and most of what you offer is contradicting itself.
> 
> What kind of pool you want to build, that's the first thing you have to make very clear to yourself and state it very clearly in your posts. Otherwise we can't offer any reliable instructions.



Sorry if I've made things complicated; I have a bad habit of typing what I'm thinking. First and foremost, my goal is to learn the ins and outs of FreeBSD as fast as possible via trial and error. I know what I want to do with the server, but I haven't locked down the best configuration to get there (because I'm jumping head-first into ZFS). I'll try to keep the questions a little more focused.

As I've said, my end-state usage is very clear: a mix of home media/file server and personal web server (mostly intranet usage: Usenet applications, SABnzbd, CouchPotato, SickBeard, etc.). I want to get this mirror going first, and then in a few weeks will likely wipe it out and go RAIDZ1 when I pick up a third drive. Again, the goal is to learn the gotchas of ZFS and improve on my final build. What's the point in redundancy if you accidentally wipe out all your data by typing the wrong command, right? I should have made that clearer from the start.

These are the primary questions I'm still researching:

What is the best practice for setting up boot and swap partition (slice?) mirroring on ZFS? If swap should be mirrored, why not boot?
How should the 4K offset/alignment be done on my drives?

And yes, I realize most of this is on the internet, and I have read quite a few pages and forums on the 4K issue alone. It's taking a while to process everything, since a lot seems to have changed in the install process that isn't reflected in the top (official-looking) Google hits, leading to a lot of confusion over contradictory information.
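For reference, the commonly cited recipe of this era (FreeBSD 9.x) for 4K-aligned partitions and an ashift=12 pool on 512-byte-emulation drives like the WD Red goes roughly like this; disk, label, and pool names are examples only:

```
# 4K-aligned ZFS partition on the first data disk
gpart add -t freebsd-zfs -a 4k -l disk1 ada1

# temporary 4K gnop provider so zpool create picks ashift=12
gnop create -S 4096 /dev/gpt/disk1
zpool create mypool gpt/disk1.nop
zpool export mypool
gnop destroy /dev/gpt/disk1.nop
zpool import mypool
```

The gnop device only needs to exist at pool creation; the ashift is recorded in the vdev labels and survives the reimport on the plain gpt/disk1 provider.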

Please let me know if anything is not clear at this point and thanks for all the help so far!


----------



## gpw928 (Apr 11, 2013)

wblock@ said:
			
		

> Please expand on that--do you mean SSD support in the installer, or TRIM support in ZFS, or something else?



Hi Warren,

I'm booting from a conventional GEOM mirror.  Some time soon I might get game enough to turn that into a ZFS mirror.

I was mostly referring to TRIM support for SSDs in general, and ZFS in particular.  Following all the threads on the 4K block thing has taken time too.

I know that ZFS is self-levelling to some extent.  But sorting out the issues takes time and effort.

Cheers,


----------



## wblock@ (Apr 11, 2013)

Experimental TRIM support for ZFS is under way and might be in -CURRENT already. UFS has had it for quite a while. See Using a Solid State Drive with FreeBSD.


----------



## kpa (Apr 11, 2013)

If I understand this correctly, you started with one disk and want to attach another disk to create a mirror vdev? In that case the proper procedure is `# zpool attach`, and yes, the existing data will be replicated onto the newly attached disk.

It goes something like this:

Partition the disks the way you already did.

Create the pool with one disk, assuming the partition on the first disk is labeled gpt/disk1:

`# zpool create pool gpt/disk1`

Now fill the pool with data.

Then attach the second disk, or more precisely the partition gpt/disk2 on it, to the pool:

`# zpool attach pool gpt/disk1 gpt/disk2`

This should give you exactly what you want, with the existing data replicated onto the second disk and ZFS handling the redundancy for you in a completely transparent way.


Of course, you could create the pool with a mirror vdev from the very start, but this is more of a proof that you can always turn a single-disk ZFS pool into a mirrored one.
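Creating it mirrored from the start, with the same example labels, would simply be:

`# zpool create pool mirror gpt/disk1 gpt/disk2`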


----------



## cchamberlain (Apr 12, 2013)

Thanks @kpa, I finally got around to repartitioning the system (I ended up going for a clean install). I followed @usdmatt's advice, used @vermaden's guide to a T, and added my SSD as a cache device. My setup now looks like this:
`# zpool status`

```
pool: sys
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        sys           ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            gpt/sys1  ONLINE       0     0     0
            gpt/sys2  ONLINE       0     0     0
        cache
          ada0        ONLINE       0     0     0
```

I'm happy with this setup for now, but down the line, if I get more drives and move to RAIDZ1, would I be able to use dump/restore to back up the pertinent data elsewhere, reformat the system, install as RAIDZ1, and restore the data? Or would I be better off using beadm to back everything up elsewhere?

If anyone has done this, I'm just looking for the simplest process; perhaps there is an up-to-date thread on this somewhere that I haven't come across yet.


----------



## kpa (Apr 12, 2013)

dump(8) does not work with ZFS. You'll have to use either `# zfs send` / `# zfs receive` to create/restore backups or use net/rsync for the same purpose.
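A minimal send/receive round trip looks something like this; the snapshot name, backup pool, and external path are placeholders:

```
# snapshot everything recursively, then stream it to another pool
zfs snapshot -r sys@backup
zfs send -R sys@backup | zfs receive -F backuppool/sys

# or stream it to a file on external storage for later restore
zfs send -R sys@backup > /mnt/external/sys-backup.zfs
```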


----------



## cchamberlain (Apr 13, 2013)

kpa said:
			
		

> dump(8) does not work with ZFS. You'll have to use either `# zfs send` / `# zfs receive` to create/restore backups or use net/rsync for the same purpose.



Okay thanks, I'll take a look at those. This thread has gotten me to a good point; from my perspective, you're good to mark it as solved. I appreciate all the help everyone has thrown in!


----------

