# Clustered Filesystem/ZFS



## z3R0 (Feb 1, 2011)

Does anyone have recommendations for setting up clustered storage?
Specifically, ZFS on top of a clustered filesystem?

Is GlusterFS any good? http://www.gluster.org/

It would be great if I could find something similar to the Gluster Storage Platform, but based on FreeBSD!

Have a look at these videos for reference:

- Gluster Storage Platform Install
- Creating a Storage Volume on Gluster

Recommendations?

Thanks!

z3r0


----------



## vermaden (Feb 1, 2011)

Check out HAST, mate: http://wiki.freebsd.org/HAST


----------



## z3R0 (Feb 2, 2011)

HAST looks like a software RAID.

I'm looking for something that will allow a shared pool storage primarily for video.
Something to replace Xsan or StorNext.

Except one that is a clustered FS with ZFS on top, specifically on FreeBSD.

thanks!

z3r0



			
vermaden said:

> Check HAST mate: http://wiki.freebsd.org/HAST


----------



## vermaden (Feb 2, 2011)

ZFS is not clustered; you would have to export the pool on one node and import it on the second. I don't know of any clustered filesystem for FreeBSD. Also check this: http://blog.elitecoderz.net/cluster-filesystem-for-freebsd-gfs-ocfs2/2010/06/


----------



## z3R0 (Feb 7, 2011)

I think I would need to run ZFS on top of a clustered filesystem. I'm not entirely sure how this actually works. In other words, would the client be talking to the clustered FS or to ZFS?

I read some things about using lustreFS with ZFS on top, but I'm not sure if that's stable yet.

I'm going to give GlusterFS a shot with ZFS on top. The documentation on how to accomplish this is dated.
It seems GlusterFS runs in userspace, so I'll need FUSE. Hopefully a native cluster FS in the FreeBSD kernel will become a reality soon.



			
vermaden said:

> ZFS is not clustered, You would have to export it on one node and import on the second, I generally do not know any clustered filesystem for FreeBSD, also check this: http://blog.elitecoderz.net/cluster-filesystem-for-freebsd-gfs-ocfs2/2010/06/


----------



## AndyUKG (Feb 7, 2011)

Hi,

So fundamentally you want to put ZFS on top of another technology, and then have a ZFS pool imported on more than one host simultaneously? I don't think it matters what tool you use for the cluster part; I'm pretty sure you are just going to break ZFS (the pool) if you do this. When a ZFS pool is imported, the system assumes it has exclusive access to the devices in the pool. If another system starts making updates to them, it's going to get into a mess.
I'm not sure what ZFS functionality interests you, but I guess you could put GlusterFS on top of ZFS instead. You can create block volumes (zvols) on ZFS, format them (with UFS, for example), and generally treat them like any other block device (i.e. a disk). That way you can still take advantage of ZFS RAID/mirroring, snapshots, compression, etc.
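Andy's zvol suggestion, sketched in shell. This is a minimal illustration, assuming a pool named `tank` and a hypothetical 10G volume `tank/vol0`; the `run` helper and the `make_ufs_zvol` function are made up for this example, and setting `DRY_RUN` makes them print the commands instead of executing them:

```sh
#!/bin/sh
# Sketch: carve a block volume (zvol) out of a ZFS pool and format it with UFS.
# Pool/volume names and the size are hypothetical placeholders.

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# make_ufs_zvol POOL/VOLUME SIZE
make_ufs_zvol() {
    vol=$1
    size=$2
    # -V creates a volume (a block device) rather than a filesystem dataset
    run zfs create -V "$size" "$vol" || return 1
    # On FreeBSD the new volume appears as /dev/zvol/<pool>/<volume>
    run newfs "/dev/zvol/$vol"
}
```

`DRY_RUN=1 make_ufs_zvol tank/vol0 10G` prints `zfs create -V 10G tank/vol0` followed by `newfs /dev/zvol/tank/vol0`; without `DRY_RUN` the same call runs them (as root). The resulting UFS filesystem can then be mounted and handed to whatever sits on top, while ZFS still provides redundancy, snapshots and compression underneath.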

Thanks, Andy.


----------



## aragon (Feb 7, 2011)

z3R0 said:

> I read some things about using lustreFS with ZFS on top, but I'm not sure if that's stable yet.


That's not stable yet, and I haven't seen anyone working on any Lustre support in FreeBSD.

If commercial is an option, IBM's GPFS is rather good (Linux/AIX only).


----------



## phoenix (Feb 7, 2011)

Isn't it the other way around: using ZFS to manage the storage, exporting raw volumes, and running Lustre on top of the zvols? Yeah, that's the way Oracle describes it. ZFS is the bottom layer, with Lustre hooked into the DMU (Data Management Unit, ZFS's transactional object layer).


----------



## thrruss (Feb 8, 2011)

*GFS would be easier*

I don't understand why nobody has thought of porting GFS2 to the FreeBSD kernel.
It would be an easier solution than Lustre or GlusterFS!

I have much the same problem myself. I have one SAN (iSCSI) presenting the same LUN to three FreeBSD servers. It would be much easier for me to attach the iSCSI LUN and use GFS as the FS! But I think I will have to move from FreeBSD to L*nux (yuck)...


----------



## Pfarthing6 (Mar 1, 2011)

*hast and zfs?*

I'm trying something similar and attempting to go the HAST route. I'm running into a problem trying to create a HAST resource for a ZFS filesystem. I can do it with a ZVOL, but if I try it against the zpool I get an error saying it is a directory. And nothing is created in /dev for a plain zpool that I could hand to HAST.

For example...

```
f2# zpool create rz1 raidz da1 da2 da3 da4
f2# zpool list
NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rz1   3.97G   138K  3.97G     0%  ONLINE  -

f2# mount
/dev/da0s1a on / (ufs, local)
devfs on /dev (devfs, local, multilabel)
rz1 on /rz1 (zfs, local)

f2# cat /etc/hast.conf
 resource rz1 {
    on f1 {
         local rz1
         remote tcp4://10.1.10.90
    }
    on f2 {
         local rz1
         remote tcp4://10.1.10.89
    }
}

f2# hastctl create rz1
[ERROR] [rz1] Unable to open rz1: No such file or directory.

# that didn't work, so now try /rz1 in hast.conf

f2# cat /etc/hast.conf
 resource rz1 {
    on f1 {
         local /rz1
         remote tcp4://10.1.10.90
    }
    on f2 {
         local /rz1
         remote tcp4://10.1.10.89
    }
}

f2# hastctl create rz1
[ERROR] [rz1] Unable to open /rz1: Is a directory.
```

The wiki does mention exporting/importing the ZFS pool, but gives no examples. I know how to export/import, but that's generally only done to move a pool to another system. I did it anyway, but exporting and re-importing a zpool on the same system doesn't do much, so I don't see how it is meant to be applied in this tutorial.

Then is there any way to use HAST with ZFS without using a ZVOL? Is there another way to address my zpool in HAST that I am missing?

Update: I found some examples of creating a HAST device and then layering ZFS over it. What I want is to create a raidz and have it replicated. Building the raidz on a bunch of HAST devices instead of physical devices... well, that sounds pretty sketchy to me.

thanks!


----------



## phoenix (Mar 1, 2011)

HAST works on GEOM providers.  A ZFS pool is not a GEOM provider; a pool is made up of GEOM providers, and is the top layer of a storage stack.

HAST works best when done below the pool, so that each GEOM provider that makes up the pool is identical.

HAST is designed to be the bottom layer in the storage stack.
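Applied to the `hast.conf` above, that means one resource per disk, with `local` pointing at the raw disk (the GEOM provider) rather than at the pool name or its mountpoint. A sketch for the first disk, assuming the same `da1` to `da4` disks exist on both nodes and keeping the addresses from the earlier config; the resource name `disk1` is hypothetical, and `disk2` to `disk4` would follow the same pattern:

```
resource disk1 {
        on f1 {
                local /dev/da1
                remote tcp4://10.1.10.90
        }
        on f2 {
                local /dev/da1
                remote tcp4://10.1.10.89
        }
}
```

The raidz is then created from the resulting `/dev/hast/disk*` devices on whichever node is primary, instead of from `da1` to `da4` directly.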


----------



## Pfarthing6 (Mar 1, 2011)

So, Phoenix, does that mean that I "should" create a HAST device for each disk, then create my raidz out of those?

I tried experimenting with that, but for some reason hastctl create did not create the expected /dev/hast/<resource>. Maybe that's a different problem I need to sort out.

Also, I've used GEOM, e.g. to make a mirror, but I don't want to use it for software RAID across a bunch of disks. That's what ZFS/raidz does best! So when you say GEOM provider, is that any disk device that FreeBSD sees, or only those specifically managed by GEOM?

Update: I went ahead and created HAST devices for all my drives and then made a raidz on top of them. It worked. The second node, of course, doesn't know about my zpool. I tried an import, but the /dev/hast directory doesn't exist and there are no HAST devices. The docs said something about only the primary getting such devices. So what am I missing to get the secondary set up right?


----------



## phoenix (Mar 1, 2011)

You create the hast devices on the primary box.  You create the ZFS pool using the /dev/hast/*.

Then on the secondary box, you configure hast (be sure to point it at the correct local disk devices), set the role to secondary, and have it connect to the primary box.  Once the connection is established, you can check the status and see that it is syncing (copying the data from the primary).

You won't see the /dev/hast/* devices on the secondary box until it becomes primary. Once it has become primary and the HAST devices have appeared, you can import the pool.

There's lots of information on how to do this here, here, and here.
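The order of operations described above, as a shell sketch. It assumes four resources named `disk1` to `disk4` (hypothetical names that must match `/etc/hast.conf` on both nodes), a pool called `rz1`, and that `hastd` is not enabled in `rc.conf` (hence `onestart`). `setup_primary`, `setup_secondary` and `run` are illustrative helpers, not part of HAST, and `DRY_RUN` makes them print rather than execute:

```sh
#!/bin/sh
# Sketch of bringing up a HAST-backed raidz on two nodes.
# RESOURCES and POOL are hypothetical; they must match /etc/hast.conf.

RESOURCES="disk1 disk2 disk3 disk4"
POOL="rz1"

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Primary node: write HAST metadata, start hastd, take the primary role,
# then build the pool on the /dev/hast/* devices that appear.
setup_primary() {
    # $RESOURCES is unquoted on purpose: each name becomes its own argument
    run hastctl create $RESOURCES || return 1
    run service hastd onestart || return 1
    run hastctl role primary all || return 1
    devs=""
    for r in $RESOURCES; do
        devs="$devs /dev/hast/$r"
    done
    run zpool create "$POOL" raidz $devs
}

# Secondary node: same metadata and daemon, but the secondary role.
# No /dev/hast/* devices appear here, so no pool is created or imported.
setup_secondary() {
    run hastctl create $RESOURCES || return 1
    run service hastd onestart || return 1
    run hastctl role secondary all || return 1
    # Check that the resources are connected and syncing from the primary
    run hastctl status
}
```

Running `setup_primary` on one box and `setup_secondary` on the other mirrors the split above: the pool only ever exists on the primary, and the secondary just receives the replicated blocks until it is promoted.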


----------



## Pfarthing6 (Mar 3, 2011)

I've got some of the basics set up: I created my HAST devices and created a zpool on them. When the zpool comes back to life, though, it gives me an error message about ensuring that all disks are online. Should I script the export of the zpool before the HAST daemon is shut down/restarted? Or is it safe to just clear the message and ignore it?

I'm picking through the howtos now, and so far I have a hastd startup script with device configuration for all the drives.


----------



## Pfarthing6 (Mar 3, 2011)

Pfarthing6 said:

> I got some of the basics setup, created my hast devices and created a zpool/raidz on top of 4 of them. When the zpool comes back to life though, it gives me an error message about ensuring that all disks are online. Maybe I'll have to script the export of the zpool before the hast daemon is shutdown/restarted?
> 
> Picking through the howtos now and got a hastd startup script with device configuration for all the drives so far.



Update: It works! Now I get the part about importing and exporting the zpool too. Export the pool on the primary before either restarting hastd or switching its role to secondary, and import it on the node that is (now) primary.

I haven't sorted out the startup and CARP failover scripts yet, but I can run iftop and see the replication between nodes. Very fly!
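The export/import rule from this update, sketched as two helper functions for a manual switchover. The pool name `rz1`, switching `all` resources at once, and the function names `demote`/`promote` are assumptions for illustration; wiring them into CARP state changes is not shown, and `DRY_RUN` again makes them print rather than execute:

```sh
#!/bin/sh
# Sketch of a manual HAST role switch with a ZFS pool on top.
# POOL is a hypothetical name; "all" switches every resource in hast.conf.

POOL="rz1"

# Print the command instead of executing it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run on the current primary: release the pool first, then give up the role.
demote() {
    run zpool export "$POOL" || return 1
    run hastctl role secondary all
}

# Run on the other node after the old primary has been demoted.
promote() {
    run hastctl role primary all || return 1
    # The /dev/hast/* devices may take a moment to appear after the role change.
    # Adding -f to the import is only appropriate if the old primary died
    # without exporting the pool.
    run zpool import "$POOL"
}
```

Demoting before promoting is the point of the ordering: at no time should both nodes have the HAST resources in the primary role or the pool imported.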


----------

