# ZFS Pool Setup with different vdevs?



## MasterOne (Jul 31, 2011)

I want to set up an Intel SS-4200 Entry Storage Server (Intel Celeron 420 1.6GHz EM64T CPU, RAM upgraded to 2GB) as a LAN storage/NFS server using ZFS, but I am not quite sure about the disk layout, because the disks I have available are quite a mixed bunch:


- 4 x 1TB SATA disks (connected to the internal SATA ports of the SS-4200)
- 2 x 2TB external SATA disks (connected to the eSATA ports of the SS-4200)
- 4 x 1TB SATA disks in an external DeLOCK RAID box (due to the lack of additional eSATA or USB 3.0 ports, I wanted to set up that box as RAID5 and connect it to one of the USB 2.0 ports of the SS-4200)
- 2 x 1GB SLC USB memory sticks (could be connected to two of the USB 2.0 ports of the SS-4200 and act as separate L2ARC and ZIL devices)

How would you do the ZFS pool setup with such hardware?

Is it right that a ZFS pool can consist of several (striped) mirrors, but not of a mix of RAIDZ and mirrors?

I guess there is no way to have only one large pool with the mentioned bunch of storage devices (which would not make much practical sense anyway, especially with the RAID box hanging off slow USB 2.0)?

Will 2GB of RAM be sufficient to operate that machine? (I am a little confused about the real RAM requirements for ZFS: the Solaris and FreeBSD wikis say "1GB or more of memory is recommended.", an additional note says "no tuning may be necessary on systems with more than 2 GB of RAM", and some user wiki pages for ZFS setups talk about 4-8GB of RAM.)

Any recommendations are highly appreciated.


----------



## Sebulon (Aug 1, 2011)

Hi,

it is possible to mix different types of raidz and mirror vdevs in the same pool, no problem.
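As a sketch (the pool and device names here are placeholders, not your actual disks), such a mixed pool is created in one command. Note that `zpool create` warns about the mismatched replication levels and needs `-f` to proceed:

```shell
# Hypothetical disk names -- substitute your own.
# One raidz1 vdev striped with one mirror vdev in the same pool;
# zpool(8) flags the mismatched redundancy levels, hence -f:
zpool create -f tank \
    raidz1 ada0 ada1 ada2 ada3 \
    mirror ada4 ada5

zpool status tank
```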

In my own experience, it is impossible to have a pool, or a vdev of a pool, connected over USB 2.0. Things like scrubbing are far too I/O intensive and will choke the link, cutting I/O to the whole pool.

USB is bad for L2ARC and ZIL in terms of throughput, since it is never going to exceed (in theory) 60MB/s, and a 1GB stick will be much slower than that. You will get better throughput without separate L2ARC and ZIL devices, letting the pool handle that instead. I have made an effort to rate the quality of different SSDs here; the total score is at the bottom of the page.

In general, RAM requirements are very small. The wiki has examples of tuned systems where 768MB is enough. Beyond that it depends on what other applications will be running alongside, since ZFS should be limited so the other apps get their fair share.

/Sebulon


----------



## vermaden (Aug 1, 2011)

> Is it right that a ZFS pool can consist of several (striped) mirrors, but not of a mix of RAIDZ and mirrors?



Create a VirtualBox machine with FreeBSD, with 100MB virtual disks standing in for the 1TB disks and 200MB ones for the 2TB drives, and make a mess: test and so on. IMHO it should work without a problem, but it is best to find out in a test environment.
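If you'd rather skip the VM, the same experiment can be done on any FreeBSD box with file-backed md(4) devices. A sketch (file paths, pool name and sizes are arbitrary choices):

```shell
# Sparse files standing in for the disks: 100MB for each "1TB"
# disk and 200MB for each "2TB" disk
truncate -s 100m /tmp/d0 /tmp/d1 /tmp/d2 /tmp/d3
truncate -s 200m /tmp/b0 /tmp/b1

# Attach each file as an md(4) device (requires root); mdconfig
# prints the device name it allocated (md0, md1, ...)
for f in /tmp/d0 /tmp/d1 /tmp/d2 /tmp/d3 /tmp/b0 /tmp/b1; do
    mdconfig -a -t vnode -f "$f"
done

# Build a throwaway pool, make a mess, then tear it down again
zpool create -f testpool raidz1 md0 md1 md2 md3 mirror md4 md5
zpool status testpool
zpool destroy testpool
```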



> Will 2GB of RAM be sufficient to operate that machine? (I am a little confused about the real RAM requirements for ZFS: the Solaris and FreeBSD wikis say "1GB or more of memory is recommended.", an additional note says "no tuning may be necessary on systems with more than 2 GB of RAM", and some user wiki pages for ZFS setups talk about 4-8GB of RAM.)



2GB RAM is enough for stability. I, for example, have a 2 x 2TB ZFS mirror with 1GB RAM and use that box for a lot more purposes than only storage, and it is 100% stable; but performance may be lower, since the ZFS ARC lives in RAM, and the more RAM you have, the more is cached there. You may use several cheap USB pendrives for ZFS cache (or L2ARC, I should say), but for ZIL get at least a pair of SSD drives.
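For reference, cache and log devices are attached with `zpool add` (pool and device names below are examples, not from the thread):

```shell
# A cheap USB stick as L2ARC (cache): losing it never loses data,
# and it can be removed again at any time:
zpool add tank cache da2
zpool remove tank da2

# A ZIL (log) device should be mirrored, since on older pool
# versions losing an unmirrored log device can cost you the pool:
zpool add tank log mirror da3 da4
```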


----------



## danbi (Aug 1, 2011)

Do not be tempted to enable dedup with only 2GB of RAM. It will work, but it will be very slow, because over time the dedup lookup table will grow until it no longer fits in memory, requiring read I/O for almost every write operation.

I am curious where you found the SLC USB flash drives. Who makes them? L2ARC via USB will not make sense, because you are limited by the bandwidth and (probably) IOPS of the USB controller. USB was, as I read somewhere quite brilliantly recently, already described as a "fast serial bus for low-speed peripherals".
It may help as ZIL for some very specific workloads, but it is probably not worth it.
What I have found to be the biggest trouble with USB is the highly varied degree of support/performance of USB 2.0 ports. I have a few systems that will happily kill any USB flash drive you connect, both low and high-end (flash drives and motherboards alike). You need to experiment with this if you decide to trust it.

There is no problem mixing and matching vdevs in ZFS -- this is how/why ZFS was designed in the first place. However, you need to understand that it does not have 'priorities' for vdevs (or I am missing something). If you have several vdevs, it distributes the write load depending on how empty each vdev is, in an attempt to equalize the load. So if you have lots of free space on your USB-attached vdev, you will get a lot of I/O there, which will saturate the USB bus and slow everything down.

It is also not a good idea to run ZFS on top of some other RAID system. That will prevent ZFS from saving your data should something bad happen with the RAID, and I would guess that a USB-attached RAID box does not have many management functions. If you have the ports, it is better to connect each drive via USB separately.

If you have a PCI-Express or PCI-X slot, you might expand using an external JBOD chassis and one of the SAS/SATA controllers that have external connectors.

As you have eSATA ports, you might consider an external port multiplier instead of the USB enclosure -- that is a much better option. But you need to verify that the server motherboard's SATA controller supports port multipliers.


----------



## MasterOne (Aug 4, 2011)

Took some time to think this over. So that's good news, that it is indeed possible to mix different vdevs in one pool -- exactly what I was planning to do. And yes, thinking about a vdev connected by USB 2.0 was a pretty lame idea. The idea of L2ARC on a USB stick came from the ZFS tuning guide in the wiki:





> To improve the random read performance, a separate L2ARC device can be used (zpool add <pool> cache <device>). A cheap solution is to add an USB memory stick (see http://www.leidinger.net/blog/2010/02/10/making-zfs-faster/). The high performance solution is to add a SSD.


Unfortunately that entry storage server hardware is pretty limited. I rechecked all possibilities, but the machine definitely cannot handle more than 2GB of RAM. There is one PCIe x1 slot onboard, and I am considering a flexible PCIe extender and a USB 3.0 expansion card, so that I can use the external RAID box in the same ZFS pool. A USB 3.0 expansion card has two USB connectors, so I could use an additional USB 3.0 SLC memory stick for either the root filesystem, L2ARC or ZIL (SSDs are out of the question due to $$$). How about the following updated setup:

- 4 x 1TB SATA disks (connected to the internal SATA ports of the SS-4200)
- 2 x 2TB external SATA disks (connected to the eSATA ports of the SS-4200)
- 4 x 1TB SATA disks in an external DeLOCK RAID box (connected to one of the two USB 3.0 ports of the PCIe x1 expansion card)
- 1 x 16GB or 32GB SLC USB 3.0 memory stick (connected to the second USB 3.0 port of the PCIe x1 expansion card)
- 2 x 1GB SLC USB 2.0 memory sticks (connected to two USB 2.0 ports of the SS-4200)
Again there is a bunch of uncertainties with this setup:
- Where to put the root filesystem? As a separate ZFS pool on the 4 x 1TB internal drives (which would require 3 partitions on each drive: swap + root-pool + data-pool), or onto the 16/32GB USB 3.0 memory stick?
- If the root-pool goes on the internal drives, what is the best use for the USB 3.0 SLC memory stick? L2ARC or ZIL, or both (split in half with 2 partitions)?
- Is there any point in using the two 1GB USB 2.0 SLC memory sticks in this setup, and if so, what for?
With all 10 drives in use, that setup would result in 8TB of usable storage:
- 4 x 1TB internal SATA disks as raidz1 (= 3TB)
- 2 x 2TB external SATA disks as mirror (= 2TB)
- 4 x 1TB SATA disks in the external RAID box as hardware RAID5 (= 3TB)
Which makes me wonder: why not invest in 4 x 3TB drives to get a pool with 9TB of usable space from only 4 running drives, instead of 10... Oh boy!
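A minimal sketch of that 8TB layout, assuming the internal disks show up as ada0-ada3, the eSATA disks as ada4/ada5, and the RAID box as a single da0 (your device names will differ):

```shell
# 4 x 1TB internal as raidz1 (~3TB) plus 2 x 2TB eSATA as a mirror
# (2TB); -f because the redundancy levels of the vdevs differ:
zpool create -f tank raidz1 ada0 ada1 ada2 ada3 mirror ada4 ada5

# The RAID box appears to ZFS as one plain ~3TB device. ZFS has no
# redundancy over it, so if the box dies, the whole pool is lost:
zpool add -f tank da0
```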

BTW, those two SLC USB 2.0 memory sticks are from Buffalo; I bought them some years ago. A quick search on Amazon turned up two sizes (16 and 32GB) of SLC USB 3.0 memory sticks from WINKOM, which come pretty cheap compared with equally sized SSDs.


----------



## Sebulon (Aug 7, 2011)

Hi,

I must urge you to first test setting up a separate pool on the single RAID5 da-device, copy in some data (a TB or so) and then test scrub and send/recv, since those HAVE to work for your setup to be successful. I would suspect that USB is still your weakest link and biggest point of failure.

Then, let's say you install FreeBSD on your 1GB USB drive and then have:

```
# /boot/loader.conf:
vfs.root.mountfrom="zfs:pool/root"
```

Then setup the zfs filesystems like:

```
pool
pool/root
pool/root/usr
pool/root/usr/local
pool/root/usr/home
pool/root/export
pool/root/export/project1
pool/root/export/project2
pool/root/var
etc.
```


```
# zfs set mountpoint=none pool
# zfs set mountpoint=legacy pool/root
# zfs set mountpoint=legacy pool/root/usr
etc.
```

And so on. You can use the 16GB drive for L2ARC, provided it gives satisfactory throughput compared to your pool; and since L2ARC devices can be added and removed on demand, you can simply remove it to test the difference. Remember that you still need RAM for the headers that track the L2ARC, which with normal usage limits a useful L2ARC to about double the size of your RAM. Forget about USB as ZIL, since nothing short of RAM-like speed is going to cut it in terms of throughput, and you don't have RAM to spare.

Use the pool as swap. It'll save you the trouble of partitioning.

```
# zfs create -V 4GB pool/swap
# zfs set checksum=off pool/swap
# zfs set compression=on pool/swap
```


```
# /etc/fstab:

/dev/zvol/pool/swap  none  swap sw 0 0
pool/root            /     zfs  rw 0 0
pool/root/usr        /usr  zfs  rw 0 0
etc.
```

Then, no matter what you do, you don't have to worry about space, since you just use USB to boot and everything else is on the pool.

Works for me.

/Sebulon


----------



## MasterOne (Aug 8, 2011)

Sebulon said:

> I must urge you to first test setting up a separate pool on the single RAID5 da-device, copy in some data (a TB or so) and then test scrub and send/recv, since those HAVE to work for your setup to be successful. I would suspect that USB is still your weakest link and biggest point of failure.


Do you really think USB 3.0 can be a bottleneck? AFAIK it offers more bandwidth than SATA2. That machine has exactly one PCIe x1 slot, which is supposedly for diagnostic purposes only, which is why I need a flexible PCIe extender to be able to connect a card at all (it would be placed inside the machine's case). I thought a USB 3.0 extension card might be a good idea for that slot, because it seems easier to route a USB cable outside through a hole in the back of the machine, but I could also use an eSATA extension card instead (in which case I would have to drop the 16/32GB USB 3.0 memory stick), since the external 4-bay RAID box can be connected either through eSATA or USB 3.0.

The idea of booting off the 1GB USB 2.0 memory stick was already in the plan, since I want to have the whole ZFS pool on top of GELI-encrypted volumes (I just forgot to mention the encryption in my initial posting).



			
Sebulon said:

> ```
> # zfs set mountpoint=none pool
> # zfs set mountpoint=legacy pool/root
> # zfs set mountpoint=legacy pool/root/usr
> ...
> ```


Not exactly on topic, but I guess I do not really understand that "mountpoint=legacy" option. I thought that for ZFS datasets no mountpoints have to be declared in /etc/fstab, since ZFS is supposed to handle mountpoints internally and automatically?



			
Sebulon said:

> you can use the 16GB drive for L2ARC, provided that it gives satisfactory throughput compared to your pool, but since L2ARC's can be added and removed on demand, you can just remove it to test the difference. Remember that you still have to have RAM to be able to allocate L2ARC with, which with normal usage is about double the size of your RAM. Forget about USB as ZIL, since nothing except for RAM is gonna cut it in terms of throughput, and you don't have any RAM.


Yes, the lack of RAM is my major concern with the whole setup, but the machine is already equipped with the maximum it can handle, which is 2GB. That's why I want to keep the setup as simple as possible, to not put any more stress on memory than necessary. The 16GB SLC USB 3.0 memory stick idea will have to be dropped anyway if I go with the eSATA extension card instead of USB 3.0, so maybe I should not bother with a separate L2ARC device at all. The overall performance of the machine does not have to be great, since the data will be accessed exclusively through NFS over a 1Gbit CAT7 copper network; my major concern is the lack of RAM and the known problems with NFS on ZFS.



			
Sebulon said:

> Use the pool as swap. It'll save you the trouble of partitioning.
> 
> ```
> # zfs create -V 4GB pool/swap
> ...
> ```


I don't recall where I read it, but especially when short on RAM one should not put swap on ZFS, because it puts more stress on the system. I don't mind partitioning the four internal drives as necessary, but I am still unsure whether I should create only one ZFS pool spanning all available disks and put the root filesystem (and swap) on it too, or install the root filesystem on a separate ZFS pool spanning only slices on the four internal disks (which was the idea behind partitioning the internal drives with 3 partitions each: swap + root-pool + data-pool), to keep system and data completely separated.



			
Sebulon said:

> Then, no matter what you do, you don't have to worry about space, since you just use USB to boot and everything else is on the pool.


Maybe that's still the best idea, the RAM shortage notwithstanding, but I am not quite sure about the benefits and disadvantages of having the system and data in the same pool, because it was mentioned somewhere that it is better to keep system data and main data in separate pools, so the system can be replaced and the data pool exported/imported easily.


----------



## Sebulon (Aug 8, 2011)

Hi,



> Do you really think USB 3.0 can be a bottleneck?


Better safe than sorry, right?



> Not exactly on topic, but I guess I do not really understand that "mountpoint=legacy" option. I thought that for ZFS datasets no mountpoints have to be declared in /etc/fstab, since ZFS is supposed to handle mountpoints internally and automatically?


Oh, it does. The "legacy" part tells ZFS to go "hands off" on the mounting and lets you do it manually through fstab.
There are two upsides to booting off of the USB!
1: It is less hassle getting FreeBSD to boot off of UFS than ZFS, and
2: You get a nice "rescue" environment in case something goes terribly wrong. When in peril, you can reboot, choose nr. 6 at the loader prompt and type e.g.:

```
set vfs.root.mountfrom=ufs:/dev/da0s1a
boot
```
And bam, you are booted entirely off of the USB. BUT if all the mountpoints for the ZFS filesystems are assigned automatically, then once you import your pool all of those filesystems will lay themselves "on top" of the USB, along with whatever problems come with them. Therefore, if you set them all to "legacy" and specify no filesystems in the USB's fstab, you can choose for yourself what to mount and where.
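With everything set to legacy, the rescue workflow is straightforward. A sketch (pool and dataset names follow the layout above; mount targets are examples):

```shell
# With all filesystems at mountpoint=legacy, importing mounts nothing:
zpool import pool

# ...so you decide what gets mounted, and where:
mount -t zfs pool/root /mnt
mount -t zfs pool/root/usr /mnt/usr
```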

Another point: if you want two pools replicated between each other with send/recv, both pools can't compete for the same mountpoints on the same system.



> I don't recall where I read it, but especially when short on RAM one should not put swap on ZFS, because it puts more stress on the system.


I have myself set up a Dell GX620 at work with 2GB of RAM and 4GB of ZFS swap, acting as an AD-bound SAMBA machine. As of today it has 30 days of uptime, so I would like to disagree.



> ...if I should create only one ZFS pool spanning all available disks, and put the root filesystem (and SWAP) on that one too, or if I should install the root filesystem on a separate ZFS pool only spanning slices on the four internal disks (which was the idea behind partitioning the internal drives with 3 partitions each: swap + root-pool + data-pool), to have the system and data completely separated.




```
pool
pool/root
pool/root/usr
pool/root/usr/local
pool/root/usr/home
pool/root/export
pool/root/export/project1
pool/root/export/project2
pool/root/var
etc.
```
This is completely separated. You have to think of every ZFS filesystem as a separate hard drive, or a partition of a very big hard drive. If you want to separate data exported with NFS or SAMBA, a logical placement would be the project1/project2 filesystems. This is similar to the ZFS filesystem structure that e.g. Sun/Oracle use in their Unified Storage Systems.



> Maybe that's still the best idea, nevertheless the RAM shortage, but I am not quite sure about the benefits or disadvantages of having the system and data within the same pool, because it was mentioned somewhere to better have the system data and main data in separate pools, so the system can be exchanged and the data-pool exported/imported easily.


That is true, but does it weigh up against the loss of usable space?
Besides, you can achieve the same effect with send/recv.

Oh, and if you:

```
# mount -o async 192.168.1.43:/export/project1 /mnt/project1
```
You'd just be doing what SAMBA and the rest are doing, with no speed penalty either. I have personally set up a server with dual 1Gbit NICs made into one lagg device, port-channeled at the switch with LACP, and two clients with one 1Gbit NIC each, all maxing out towards the server at the same time with RAM disks (md devices) configured. The NFS-exported md device on the server was writing data at 250-300MB/s, and both clients reported that they were able to write 100MB/s simultaneously.
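The lagg setup described above would look roughly like this in rc.conf (interface names and the address are examples, not taken from the actual server):

```shell
# /etc/rc.conf -- two gigabit NICs aggregated with LACP;
# the switch ports must be configured as an LACP port-channel too
ifconfig_em0="up"
ifconfig_em1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport em0 laggport em1 192.168.1.43/24"
```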

/Sebulon


----------



## danbi (Aug 9, 2011)

USB 3.0 may have a faster signaling rate than eSATA 2, but this does not mean it will run any faster -- just as Firewire 400 (400Mbit) is in practice way faster than any USB 2.0 (480Mbit).
It is typically IOPS you care about when connecting flash drives, and I doubt many USB controllers are high performance in that respect.

The support for eSATA in FreeBSD is likely to be better than the support for USB 3.0.
It is best to experiment, however.

As to swap on ZFS: you are not going to build a large ZFS array, and are therefore very unlikely to get serious performance out of it -- so if you have the space to spare for separate swap partitions, it is better not to swap over ZFS, especially considering your limited memory. You may have a swap partition on each of your four drives, and FreeBSD will interleave the swap usage.
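A sketch of that layout, assuming GPT-partitioned drives ada0-ada3 (labels, sizes and the p2 partition index are illustrative):

```shell
# One swap partition per internal drive; FreeBSD interleaves
# swap usage across all devices listed in fstab
for d in ada0 ada1 ada2 ada3; do
    gpart add -t freebsd-swap -s 1G -l swap-"$d" "$d"
done

# Corresponding /etc/fstab entries:
#   /dev/ada0p2  none  swap  sw  0  0
#   /dev/ada1p2  none  swap  sw  0  0
#   /dev/ada2p2  none  swap  sw  0  0
#   /dev/ada3p2  none  swap  sw  0  0
```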


----------

