# ZFS and disk labeling question



## craigyk (Aug 10, 2012)

So I've read a lot of advice about always making sure to give ZFS the "whole disk", but I'd also like to label the drives using something like GPT labels so they'll be easier to locate in this 70-drive MDS600 I've bought.

Are these two wants in contradiction?   If I label with GPT but still give ZFS the full disk, will things still be "OK"?  Or is the advice to always use whole disks outdated or no longer as important?

Thanks.


----------



## rabfulton (Aug 11, 2012)

I believe it is ok to use bsdlabel on whole disks and then create your array using the labels.


----------



## phoenix (Aug 11, 2012)

There's nothing wrong with using GPT to create a partition on the disk, especially if you want to align access to the disk sectors. Then label the partition. And then use 'gpt/labelname' as the disk device for creating the vdevs:

`# zpool create mypool raidz2 gpt/disk-a1 gpt/disk-a2 gpt/disk-a3 gpt/disk-a4 ...`
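For anyone following along, the whole sequence might look like the sketch below. It's a dry run that only *prints* the commands rather than running them; the disk names, label scheme, and pool name are made-up examples, and `-b 2048 -a 4k` is the 1 MiB / 4k alignment discussed further down the thread.

```shell
#!/bin/sh
# Dry-run sketch: print the gpart/zpool commands for labeling four
# disks by chassis slot and building a raidz2 vdev from the labels.
# Nothing here touches a disk -- it only echoes the commands.
# Disk names, label scheme, and pool name are hypothetical.
CMDS=""
for disk in da0 da1 da2 da3; do
    slot="disk-a$((${disk#da} + 1))"   # da0 -> disk-a1, etc.
    CMDS="$CMDS
gpart create -s gpt $disk
gpart add -t freebsd-zfs -b 2048 -a 4k -l $slot $disk"
done
CMDS="$CMDS
zpool create mypool raidz2 gpt/disk-a1 gpt/disk-a2 gpt/disk-a3 gpt/disk-a4"
printf '%s\n' "$CMDS"
```

On a real system you'd run those commands by hand after double-checking the device names; the labels then show up under /dev/gpt/.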


----------



## kangaroo (Aug 20, 2012)

Hi.

I'm relatively new to FreeBSD with no experience in this area.  However, I've spent half of my day reading up on labels, and have a few comments.  I'm hoping that someone like phoenix who is more experienced with labels might have some more to say on this topic.

I started the day with the same problem as craigyk, though smaller -- namely, I have a 16 disk storage unit with da0 through da15 that has wacky numbering (e.g. da4 is disk 9).  Now, up until today, I was just using whole devices (e.g. /dev/da4) in my zpool create because that's what the FreeBSD documentation demonstrated, that's what the various Solaris-based ZFS tutorials mention, etc., and after all, it seems "easiest" -- even though it meant that when replacing a disk, I had to be extra extra careful that I was removing the right disk.

Later, I read that if you use full device names in creating a zpool and the disk mapping changes, this can cause problems.  For example, suppose you remove disks from your chassis that have nothing to do with your zpool, and you reboot; if /dev/da4, which is part of a raidz1 vdev, changes from being disk 9 to disk 7, is the zpool then broken?  Is that really true?  Is shuffling the devices in the storage array enough to keep ZFS from finding the disks that make up a zpool if you don't use labels?  If that's the case, then surely labeling is really more than just convenient -- almost downright "necessary"!!

But labeling seems to have its fair share of "issues" as well.  The first method I read about was labeling full disks using "glabel" (GEOM-based labels).  Many people seem to be using the glabel method of labeling on full disks (as opposed to partitioned disks) since full disks are recommended for ZFS for anything but the boot volumes.  

Using glabel in itself seemed easy enough, but after reading through enough pages, I started to read about problems that *some* people had with that.  Apparently, glabel writes the label to the last block on the device, and if you are going to write to the last block of the device, and you're giving ZFS the whole device, then you may in fact be "corrupting" the filesystem, and this could come back to bite you at some point.  There are people who say this isn't a problem, and there are people who say that it is.  ugh.  I can understand how it might be a problem, though.  The obvious fix, to someone who doesn't know how ZFS works, would be for ZFS to automatically shave off a few blocks from the end of the disk so that this problem couldn't happen and labels and ZFS could co-exist on full disks without partitions, but I'm sure there are reasons why this doesn't happen.

In addition, I read that this type of GEOM labeling apparently isn't supported on OpenSolaris/Solaris -- so if you have problems with FreeBSD and decide to move to OpenSolaris/Solaris, it's not clear whether you can.  I did read that the labels will be overwritten by OpenSolaris/Solaris.  I certainly want to know whether the system I use to label my disks would make my zpool incompatible with other ZFS implementations.

One solution to at least the problem of GEOM writing the label to the end of the full disk seems to be to use a partition on the disk.  But adding a partition that encompasses the whole disk seems to "complicate" the issue of adding new ZFS disks to a system.  I mean, no, partitions aren't complicated in themselves -- they're just containers and all -- but having to partition the disk just to place a label on it seems a little like overkill to me, though something I would do if I had to.  It just seems easier to give ZFS a disk and have it go to work.  After all, if I partition my disks, then I guess I can't rely on ZFS to automatically replace disks with spares, because a replacement disk would first need to be partitioned and re-labeled according to its location in the disk chassis!

I then read various things about GPT partitions and labels.  It's not clear whether using a GEOM label or a GPT label is better.  One page said that GEOM labels are not compatible with OpenSolaris/Solaris and you had better use GPT labels, while another page says you had better use GEOM labels for compatibility.  It's the classic case of Google confusion.  The problem is, I don't know what to believe.

So, over the years, the recommendations on FreeBSD disk labeling seem to have been changing.  In August 2012, I'm looking for the best label mechanism for disks that would allow exporting the ZFS pool to OpenSolaris/Solaris later if need be.  If I need to partition individual disks to label them, I want to understand how an "automated" spare replacement would work.  Now, phoenix also mentioned sector alignment -- that's another tricky area with a lot of conflicting information.  If I *have* to partition my disks for labeling, I might as well sort out proper sector alignment at the same time, so a referral to a good document would be appreciated here as well.
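On the alignment point, the arithmetic itself is small. A sketch (assuming 512-byte logical sectors, which is what gpart's `-b` offset counts in):

```shell
#!/bin/sh
# Alignment arithmetic behind gpart's "-b 2048 -a 4k": start the
# partition on a 1 MiB boundary, expressed in 512-byte sectors,
# and confirm that boundary is also 4 KiB-aligned (safe for
# 4k-sector "Advanced Format" drives that report 512-byte sectors).
sectorsize=512
MiB=$((1024 * 1024))
start=$((MiB / sectorsize))            # 2048 sectors = 1 MiB
byte_offset=$((start * sectorsize))
echo "start sector: $start ($byte_offset bytes)"
[ $((byte_offset % 4096)) -eq 0 ] && echo "1 MiB boundary is 4k-aligned"
```

A 1 MiB boundary is divisible by every common physical sector and stripe size, which is why it has become the usual safe default.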

Thanks for any advice that you may have!

Jason.


----------



## wblock@ (Aug 20, 2012)

Labels are safe provided you use them correctly.  Label first, then use the label device.  Example:
`# glabel label zoot /dev/ada0`

That writes a label to the last block of /dev/ada0.  The label device is /dev/label/zoot.  Compare sizes and you will see that zoot is one block smaller than ada0.  Use /dev/label/zoot all you want, it's safe.  But don't label a device and then keep using the raw device.
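The size relationship is easy to picture with a little arithmetic (the numbers below are hypothetical; on a real system you'd compare `diskinfo ada0` against `diskinfo label/zoot`):

```shell
#!/bin/sh
# Sketch of the glabel size relationship: glabel(8) stores its
# metadata in the provider's last sector, so the label device
# exposes one sector less than the raw disk. Sizes are made up.
sectorsize=512
disk_sectors=3907029168              # hypothetical ada0
label_sectors=$((disk_sectors - 1))  # /dev/label/zoot
echo "ada0:       $disk_sectors sectors"
echo "label/zoot: $label_sectors sectors"
```

Because anything built on /dev/label/zoot sees the smaller size, it can never reach the sector holding the label metadata.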

GPT labels are even better, because they don't use that last block of the device.  But GPT has partition tables at the beginning and end of the drive, so can have other conflicts.  Should not be a problem with ZFS, but I haven't tested.

For compatibility, I wouldn't expect any other operating system to understand either GPT or GEOM labels.


----------



## craigyk (Aug 20, 2012)

kangaroo said:

> Later, I read that if you use full device names in creating a zpool and the disk mapping changes, this can cause problems.  Is shuffling the devices in the storage array enough to keep ZFS from finding the disks that make up a zpool if you don't use labels?  If that's the case, then surely labeling is really more than just convenient -- almost downright "necessary"!!



My understanding is that ZFS puts all kinds of metadata on the disks that keeps track of this stuff.  So it doesn't matter if the physical device name changes, because ZFS scans the disks at known locations and will know what pool and vdev each disk belongs to.  This means using physical disks for ZFS should be cross-platform.  One reason for recommending the use of full drives is that using partitions requires the OS to support the partitioning scheme before ZFS can scan the partitions and determine their pool status, if any.
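For what it's worth, the ZFS on-disk format keeps four redundant vdev labels of 256 KiB each, two at the front of the device and two at the back. A rough sketch of the offsets (the device size is a made-up example, and the real code aligns the end-of-device labels down to a 256 KiB boundary, which this ignores):

```shell
#!/bin/sh
# Rough sketch of where ZFS keeps its four redundant vdev labels
# (256 KiB each): L0/L1 at the front, L2/L3 at the back. Scanning
# these is how `zpool import` recognizes a member disk regardless
# of its current device name. Device size is illustrative only.
label=$((256 * 1024))
devsize=$((1024 * 1024 * 1024))      # hypothetical 1 GiB device
l0=0
l1=$label
l2=$((devsize - 2 * label))
l3=$((devsize - label))
echo "L0=$l0 L1=$l1 L2=$l2 L3=$l3"
```

Having copies at both ends is also what makes the label survive many kinds of partial overwrites at either end of the disk.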



			
kangaroo said:

> Using glabel in itself seemed easy enough, but after reading through enough pages, I started to read about problems that *some* people had with that.  Apparently, glabel writes the label to the last block on the device, and if you are going to write to the last block of the device, and you're giving ZFS the whole device, then you may in fact be "corrupting" the filesystem, and this could come back to bite you at some point.  There are people who say this isn't a problem, and there are people who say that it is.  ugh.  I can understand how it might be a problem, though.  The obvious fix, to someone who doesn't know how ZFS works, would be for ZFS to automatically shave off a few blocks from the end of the disk so that this problem couldn't happen and labels and ZFS could co-exist on full disks without partitions, but I'm sure there are reasons why this doesn't happen.



My guess is that ZFS leaves certain blocks alone near the start and end, and people exploit this to allow labels like glabel's to exist on ZFS devices.  I'm not sure if this is a documented part of ZFS, though.  It's possible that in the future they'll decide to use these blocks, in which case either the label will be overwritten, or labeling a drive in use will screw up ZFS data.



			
kangaroo said:

> So, over the years, the recommendations on FreeBSD disk labeling seem to have been changing.  In August 2012, I'm looking for the best label mechanism for disks that would allow exporting the ZFS pool to OpenSolaris/Solaris later if need be.  If I need to partition individual disks to label them, I want to understand how an "automated" spare replacement would work.  Now, phoenix also mentioned sector alignment -- that's another tricky area with a lot of conflicting information.  If I *have* to partition my disks for labeling, I might as well sort out proper sector alignment at the same time, so a referral to a good document would be appreciated here as well.



Really, it would be nice if ZFS had its own editable labeling scheme for the disks and vdevs in a pool, but even so I think labels are a poor way to manage and identify drives in large enclosures.  Personally, I'm waiting for the support being worked on to match drives with their slots in enclosures using SES, and even better integration of this with ZFS, so that ZFS could, for example, switch on the error light in an enclosure to identify a drive.


----------



## kpa (Aug 21, 2012)

> My guess is that ZFS leaves certain blocks alone near the start and end, and people exploit this to allow labels like glabel's to exist on ZFS devices. I'm not sure if this is a documented part of ZFS, though. It's possible that in the future they'll decide to use these blocks, in which case either the label will be overwritten, or labeling a drive in use will screw up ZFS data.



This is not possible; the drive is locked by ZFS, and trying to label it with glabel(8) will fail. You can, however, label the drive before using it in a ZFS pool. The size of the disk as seen by ZFS will be one sector smaller than the raw unlabeled disk, so ZFS won't even know that there is one extra block after the last block it uses.


----------



## kpa (Aug 21, 2012)

wblock@ said:

> For compatibility, I wouldn't expect any other operating system to understand either GPT or GEOM labels.



I thought that GPT labels would be universally supported by operating systems that support GPT partitioning? It can't be a FreeBSD-specific extension to GPT?


----------



## Sebulon (Aug 21, 2012)

craigyk said:

> My understanding is that ZFS puts all kinds of metadata on the disks that keeps track of this stuff.  So it doesn't matter if the physical device name changes, because ZFS scans the disks at known locations and will know what pool and vdev each disk belongs to.


Correct. The labeling is more for your own sake, so you don't mistakenly pull the wrong one while replacing a failed drive.



			
craigyk said:

> This means using physical disks for zfs should be cross-platform.  One reason for recommending the use of full drives is that using partitions requires that the OS support the partitioning scheme before ZFS can scan the partitions and determine their pool status, if any.


You'd think that, but actually I've noticed that Solaris requires its partitions to start at block 2048 (1 MiB) to see the drives and be able to cleanly import a pool created in FreeBSD. Solaris also uses GPT.

/Sebulon


----------



## wblock@ (Aug 21, 2012)

kpa said:

> I thought that GPT labels would be universally supported by operating systems that support GPT partitioning? It can't be a FreeBSD specific extension to GPT?



I suspect (but have not verified) that GPT labels come from the "partition name" field of a GPT partition entry.

Whether other operating systems will make that field available or usable as identifiers from within the operating system rather than just in a partition editor... put me in the "cynical" category.


----------



## kangaroo (Aug 21, 2012)

Sebulon said:

> Correct. The labeling is more for your own sake, so you don't mistakenly pull the wrong one while replacing a failed drive.
> 
> 
> You'd think that, but actually I've noticed that Solaris requires its partitions to start at block 2048 (1 MiB) to see the drives and be able to cleanly import a pool created in FreeBSD. Solaris also uses GPT.
> ...



So, this is to say that if I use full disks to form my zpool, then I wouldn't be able to export the pool and import it into, say, a Solaris system? ugh.  

You're saying that Solaris uses GPT, but I keep reading that to ensure compatibility when moving between FreeBSD and Solaris you should not use GPT -- yet in your experience today this works? (Because so much of the documentation out there is old and outdated...)

Unless I'm wrong, using GPT partitions means that you can't have the zpool replace disks automatically as they fail, because you need to manually repartition and relabel the disk, then replace it?

I wish there was a "best practices" document for creating zpools with GPT, enabling labels, and documented examples of exporting to other OSes.  The information on the web in this area is extremely confusing and in many cases outdated.


----------



## kangaroo (Aug 21, 2012)

By the way, here's one example that I'm referring to...

http://www.napp-it.org/downloads/free-bsd_en.html

In particular, quoted from that page ...



> If the disks were formatted in FreeBSD with GPT partitions (which FreeBSD recognizes but Solaris doesn't), you cannot import to Solaris.
> 
> Only pools with disks formatted with GEOM can be exported from FreeBSD/FreeNAS/ZFSGuru and reimported into Solaris.


----------



## Savagedlight (Aug 21, 2012)

I've created zpools on FreeBSD using GPT partitions with GPT labels, then exported and re-imported the pool. When I didn't tell it to look in /dev/gpt for the devices, it imported the pool using the partitions (/dev/da0p1 etc.).

IIRC, the labels are in the partition table, which is located at the start of the device (and backed up at the end) -- and as such, this shouldn't cause any problems.

I think it's reasonable to say that, if my above observations hold true (I think they do, but I may have observed wrongly), there shouldn't be any compatibility problems creating a zpool on FreeBSD using GPT labels, as long as the alternate system also handles GPT partitions, since the pool can be re-imported using the raw partitions.
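That matches the GPT layout as specified: primary header and partition entries at the front of the disk, backup copies at the very end. A sketch of the layout in LBAs (the last LBA is a made-up example; the entry count and size are the usual defaults):

```shell
#!/bin/sh
# On-disk GPT layout in LBAs (512-byte sectors), with the usual
# 128 entries of 128 bytes = 32 sectors of partition entries.
# The backup copies at the end of the disk are why a GPT and any
# whole-disk consumer can conflict over those last blocks.
last_lba=3907029167                  # hypothetical last sector
entry_sectors=$((128 * 128 / 512))   # 32 sectors of entries
echo "LBA 0:                        protective MBR"
echo "LBA 1:                        primary GPT header"
echo "LBA 2-$((1 + entry_sectors)):                     primary partition entries"
echo "LBA $((last_lba - entry_sectors))-$((last_lba - 1)): backup partition entries"
echo "LBA $last_lba:               backup GPT header"
```

The partition names (which FreeBSD exposes as GPT labels) live inside those entry sectors, not in any block a partition's consumer can see.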


----------



## Sebulon (Aug 23, 2012)

kangaroo said:

> By the way, here's one example that I'm referring to...
> 
> http://www.napp-it.org/downloads/free-bsd_en.html
> 
> In particular, quoted from that page ...



It's not "zpool" that does the replacing in Solaris, it's a daemon. I could be wrong, but the daemon makes sure to partition the drive you put in and uses that partition in a "zpool replace" command to make it look seamless. I remember reading about someone trying to script something similar for FreeBSD, but that was a while ago and I haven't read anything about it since.

I haven't had the time to verify this; I'm going to as soon as I get the time. But I'm pretty sure that Solaris is hard-coded to partition with GPT at boundary 2048. I'm going to test in a virtual machine: install FreeBSD, create one mirror pool with GPT partitions at boundary 0 and another mirror with GPT at boundary 2048, then boot Solaris 11 and see which pool is importable.

/Sebulon


----------



## Sebulon (Aug 23, 2012)

OK, it turns out I was *wrong* about Solaris's partition alignment. My bad, I had gotten that backwards. So this turned out to be a very healthy exercise.

First, I installed FreeBSD-9.0-RELEASE on da0 with plain UFS. Once booted, I created GPT on da1 through da4 and created GPT partitions inside them, with labels. The first two I aligned to start at 1 MiB and end on a 4k boundary (`-b 2048 -a 4k`); on the second two, I just made the partitions span the entire drive:
`# gpart show`

Then I created gnop(8) devices with a 4k sector size on top of the GPT labels, and used those gnops to create the pools, called "aligned" and "unaligned":
`# zpool status`

The VM was created with an emulated LSI SAS controller, but for some reason Solaris didn't like that, so I had to power it down and reconfigure it with a parallel SCSI controller instead. Then, once it had booted from the live CD, I started a terminal and tried to import the two pools:
`# zpool import`

As you can see, the pool created with aligned partitions is marked as FAULTED, while the pool created unaligned was OK, and I was also able to import it successfully afterwards.

The lesson learned here is that *a)* you can use GPT partitions and labels to create a pool in FreeBSD and import it cleanly into Solaris(or derivative), but *b)* only if the partitions span the entire drive.

/Sebulon


----------



## kangaroo (Aug 23, 2012)

*OpenSolaris and FreeBSD*

Thanks Sebulon for doing that testing.

I did some testing of my own as well with interesting results.

I created 3 ZFS pools, each comprised of 2 x 2-disk vdevs, under FreeBSD as follows...


1. using only straight physical devices,
2. using gpart label attached to straight physical devices,
3. using 1 gpt partition comprised of most of the disk (disks were 750GB, but I used -s 500G for testing) and gpt labels

I exported the ZFS pools and booted into a SmartOS live CD.  All 3 pools imported perfectly automatically!

With option 1, obviously FreeBSD device names are replaced by Solaris device names, but rather than seeing individual devices as cXtXd0 devices as I would have expected (full disk), they all show up as cXtXd0p0 devices.

With option 2, SmartOS didn't see the gpart labels, but imported the pool fine.  The devices were as above - cXtXd0p0.  This is what I expected to happen since Solaris doesn't support those labels anyway as wblock said.  One thing I'm not certain about is whether encoded in the ZFS pool is the fact that the last block of those disks is not to be used.  Sure, SmartOS can't read the labels, but I wonder whether ZFS on SmartOS knows not to write into the last block of those disks? 

With option 3, the gpt version, and pretty much the same test that Sebulon did, the import was successful as well.  GPT labels are not recognized under SmartOS.  The gpt version shows up as an EFI partition table on SmartOS with 1 partition.    The devices were all cXtXd0 devices.

Interestingly enough, if I create a ZFS pool in SmartOS, the partition table shows the same as the option 3 pool created above, yet that pool, when exported, will not import back into FreeBSD. On FreeBSD, when I try to import the pool via "zpool import apool", I get "cannot import apool : no such pool available".  However, I have since read that there are issues with creating a ZFS filesystem under OpenSolaris-type systems and importing it in FreeBSD due to GPT incompatibilities.  Very odd.  I went back into SmartOS just to verify that I could indeed see the pool there, and then back into FreeBSD, where I could not see the pool.  FreeBSD's gpart list did not show the GPT disk created under Solaris either.

Of course, the results when importing to a SPARC-based Solaris would be different, but I wanted to share my testing and include it in this thread so that if anyone else wonders about this, the answer is clearer.

Jason.


----------



## Sebulon (Aug 23, 2012)

@kangaroo



> 2. using gpart label attached to straight physical devices,



Now, by that, do you actually mean glabel(8)?

Because glabels are not GPT labels.

/Sebulon


----------



## kangaroo (Aug 23, 2012)

Sebulon said:

> @kangaroo
> 
> Now, by that, do you actually mean glabel(8)?
> 
> ...



Sorry ... good catch ... yes, that was my typo.  I used:


```
glabel label <label name> <disk device>
```
 for each device in step 1.

I wanted to test specifically what would happen when I imported that filesystem under OpenSolaris, and it worked as well ... (though as I said, I'm not sure what would happen when filling that filesystem -- would the label get "overwritten", or would the filesystem on Solaris stay within the bounds minus the label?)


----------



## Sebulon (Aug 24, 2012)

kangaroo said:

> for each device in step 1.



Now by that, do you actually mean "step 2"?



> 2. using gpart label attached to straight physical devices



/Sebulon


----------

