# A FreeBSD ZFS-root install script, supporting RAID0/1/5/10 and 4K alignment



## iceblood (Nov 24, 2012)

This is a script that installs FreeBSD onto a ZFS root.
https://code.google.com/p/iceblood/source/browse/FreeBSD_ZFS

You are welcome to test it.


Boot from the CD-ROM and select "Shell".
Set up the network:

```
ifconfig ETH x.x.x.x netmask 255.255.255.0   # replace ETH with your interface, e.g. em0
route add default x.x.x.y
mkdir /tmp/bsdinstall_etc
echo 'nameserver 8.8.8.8' > /etc/resolv.conf
```

Download the script:

```
cd /tmp
fetch http://iceblood.googlecode.com/svn/FreeBSD_ZFS/freebsd_zfs_inst.sh
chmod 555 freebsd_zfs_inst.sh
```

Run the script:

```
./freebsd_zfs_inst.sh
freebsd_zfs_inst.sh {normal|raid1|raid5|raid10}
normal   <---- stripe mode
raid1    <---- mirror mode
raid5    <---- raidz1 mode
raid10   <---- mirror and stripe mode
```







Thanks to Sebulon.


----------



## vermaden (Nov 25, 2012)

If You would like to tweak Your script to use ZFS Boot Environments with sysutils/beadm, then check the instructions from this howto (generally, the instructions can be put directly into the script): http://forums.freebsd.org/showthread.php?t=31662

Also, think about adding a mirror option to Your script, and possibly RAID10.


----------



## gkontos (Nov 25, 2012)

You are using the wrong gpart(8) values.
You are missing the 4K alignment.
*Do not* hardcode any tuning.
*Do not* play with vm.kmem_size.


----------



## wblock@ (Nov 25, 2012)

iceblood said:
> ```
> echo 'nameserver 8.8.8.8' > /etc/resolv.conf
> ```



The privacy implications of using Google's public DNS server ought to be mentioned.



> ```
> chmod 555 freebsd_zfs_inst.sh
> ```



Why not just chmod +x ?


----------



## iceblood (Nov 26, 2012)

wblock@ said:
> The privacy implications of using Google's public DNS server ought to be mentioned.
> 
> 
> 
> Why not just chmod +x ?


This is a temporary DNS setting; it is cleared after reboot.


----------



## iceblood (Nov 26, 2012)

RAID10 support is now added.
./freebsd_zfs_inst.sh raid10


----------



## Sebulon (Nov 26, 2012)

gkontos said:
> *Do not* play with vm.kmem_size



In general yes, but in this case I think it falls within best practice, since it is only applied if the system is i386. And it looks like it was basically copied from the FreeBSD ZFS wiki.
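To make that point concrete, here is a minimal sketch of how an i386-only tunable could be applied in such a script. The function name, the parameterization, and the 330M value are all illustrative assumptions, not taken from iceblood's script:

```shell
# Hypothetical helper: append kmem tunables to loader.conf only on i386,
# so the amd64 path writes nothing and the OS tunes itself.
tune_i386() {
    arch=$1    # machine architecture, e.g. the output of `uname -m`
    conf=$2    # path to the target loader.conf
    if [ "$arch" = "i386" ]; then
        echo 'vm.kmem_size="330M"' >> "$conf"
        echo 'vm.kmem_size_max="330M"' >> "$conf"
    fi
}

# usage: tune_i386 "$(uname -m)" /boot/loader.conf
```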

@iceblood

Nice work, man! A few observations:

- gpart has a flag to wipe a disk even if there are partitions on it: `gpart destroy -F`.
- I usually use tmpfs for /tmp, and you'd need an entry in fstab for that.
- These tunings are unnecessary on amd64, in my opinion; leave it to the OS to tune them automatically:

```
vfs.zfs.prefetch_disable=0
vfs.zfs.vdev.cache.size="10M"
```

- As gkontos said, you need to partition 4K-aligned, and use the "gnop trick" to get ashift=12 on every vdev in the pool. Otherwise performance will be severely crippled for people with 4K (Advanced Format) drives.
- You should use labels instead of the partition names when creating the pool.
- I may be alone on this, but I follow the Solaris way and first create a root filesystem for / in every pool, like pool/*root*/usr, pool/*root*/var, etc. I can't remember exactly where I read it, but it was definitely a Sun/Oracle ZFS document (maybe the admin guide) saying that using the top (pool) filesystem for anything should be considered bad practice, though I've forgotten why. It might have been because there are values that cannot be changed, or that aren't present, on the top (pool) filesystem.
- Also, like Solaris, I use a more generic name for the pool, like "system", "rpool", or "pool0".

@vermaden

Would creating a separate root filesystem be interfering with the Boot Environments philosophy? Just checking...

/Sebulon


----------



## vermaden (Nov 26, 2012)

Sebulon said:
> @vermaden
> 
> Would creating a separate root filesystem be interfering with the Boot Environments philosophy? Just checking...


For Boot Environments You need this schema: ${POOL}/ROOT/${BENAME}, and You need to boot from that pool with the bootfs=${POOL}/ROOT/${BENAME} property set accordingly. Of course, that is changed by the beadm script for different BEs.

You can of course add these AFTER the installation, even if the root (/) was placed directly on zroot, for example. I have made beadm smart enough that You can zfs send | zfs recv a BE from another system, as well as from the local system, and after activation it will just work (beadm takes care of the /boot/zfs/zpool.cache thingy).

But IMHO it is far better to set that up from the start. Boot Environments (even in their limited form, without a boot menu) are the best thing since sliced bread: You can do EVERYTHING to the working system and have a time machine that will take You back if You mess something up.
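As a rough sketch, the schema vermaden describes could be set up like this (the pool and dataset names are illustrative; in practice beadm creates and activates additional BEs for you):

```shell
# Container for boot environments (never mounted itself)
zfs create -o mountpoint=none zroot/ROOT
# One boot environment holding the root filesystem
zfs create -o mountpoint=/ zroot/ROOT/default
# Tell the loader which BE to boot; `beadm activate` changes this later
zpool set bootfs=zroot/ROOT/default zroot
```

These commands obviously require a live pool, so they belong inside the installer, not in a shell on a running system.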


----------



## iceblood (Nov 26, 2012)

Sebulon said:
> In general yes, but in this case I think it falls within best practice though, since it is only applied if system is i386. And looks like it was basically copy/pasted from FreeBSD ZFS Wiki.


Oh... it is not copied or pasted; the values are only for i386, and they come from my experience.



			
Sebulon said:

> Nice work man! A few observations;
> 
> 
> gpart has a flag to wipe a disk even if there are partitions on it; gpart destroy -F
> ...


Thanks for the observations.
About 4K alignment: I will improve it gradually.
About tmpfs: I do not agree.


----------



## Sebulon (Nov 27, 2012)

vermaden said:
> For Boot Environments You need that schema: ${POOL}/ROOT/${BENAME}...



Sweet! So it was quite the opposite, then.

/Sebulon


----------



## iceblood (Nov 28, 2012)

now 4k alignment added.


----------



## Sebulon (Nov 28, 2012)

iceblood said:
> now 4k alignment added.



More like "now 4k alignment added*?*"

Because I could only see it added in one case, called "normal)", which should instead be called "stripe)". There's nothing normal about creating a striped raid: you have *no* redundancy whatsoever. The slightest error on any disk in the pool and you are toast. Stripe is dangerous, so it should be clear what it is you are choosing.

It's good that you used gnop to create a 4K provider, but a few more steps are needed to make it truly 4K-optimized. And use labels when creating the pool. Let me give you an example:

`# gpart create -s gpt da0`
`# gpart create -s gpt da1`
`# gpart create -s gpt da2`
`# gpart create -s gpt da3`
`# gpart add -t freebsd-boot -s 64k da0`
`# gpart add -t freebsd-boot -s 64k da1`
`# gpart add -t freebsd-boot -s 64k da2`
`# gpart add -t freebsd-boot -s 64k da3`
`# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0`
`# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da1`
`# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da2`
`# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da3`
`# gpart add -t freebsd-zfs -l disk0 -b 2048 -a 4k da0`
`# gpart add -t freebsd-zfs -l disk1 -b 2048 -a 4k da1`
`# gpart add -t freebsd-zfs -l disk2 -b 2048 -a 4k da2`
`# gpart add -t freebsd-zfs -l disk3 -b 2048 -a 4k da3`
`# gnop create -S 4096 /dev/gpt/disk0`
`# gnop create -S 4096 /dev/gpt/disk2`
`# zpool create -o autoexpand=on pool0 mirror gpt/disk0.nop gpt/disk1 mirror gpt/disk2.nop gpt/disk3`
`# zpool export pool0`
`# gnop destroy /dev/gpt/disk0.nop`
`# gnop destroy /dev/gpt/disk2.nop`
`# zpool import -d /dev/gpt/ pool0`

This example was for a RAID10, and I showed you this because you only *need* to gnop every *first* disk in each vdev, since the ashift value is set per vdev. But since you're using a "for" loop, it might be easier just to do them all; your call.
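Since the script already iterates over its disks, the per-disk commands above can be folded into a loop. Here is a dry-run sketch: it only *prints* the commands, the device list is illustrative, and the real commands are destructive, so remove the `echo`s only when you mean it:

```shell
DISKS="da0 da1 da2 da3"   # illustrative device list
i=0
for d in $DISKS; do
    # `echo` makes this a dry run; drop it to execute for real
    echo gpart create -s gpt "$d"
    echo gpart add -t freebsd-boot -s 64k "$d"
    echo gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 "$d"
    echo gpart add -t freebsd-zfs -l "disk$i" -b 2048 -a 4k "$d"
    echo gnop create -S 4096 "/dev/gpt/disk$i"
    i=$((i + 1))
done
```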

/Sebulon


----------



## iceblood (Nov 28, 2012)

4K alignment was added to all modes yesterday.
Please see line 78.


----------



## iceblood (Nov 28, 2012)

Must " -a 4k " be used for it to be effective?


----------



## Sebulon (Nov 28, 2012)

iceblood said:
> Must be " -a 4K " is effective?



Yes: " -l diskX *-b 2048 -a 4k* ".

ashift != alignment, but it needs to be used as well.
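To make the distinction concrete: alignment is about where the partition starts on the disk, while ashift controls ZFS's minimum block size per vdev. A quick arithmetic check of the `-b 2048` offset, assuming 512-byte logical sectors:

```shell
# -b 2048 places the partition at sector 2048; with 512-byte sectors
# that is 2048 * 512 = 1048576 bytes (1 MiB), which 4096 divides
# evenly, so the partition is 4K-aligned.  ashift=12 (2^12 = 4096
# bytes) is set separately, per vdev, via the gnop trick shown earlier.
start_bytes=$((2048 * 512))
echo "start offset: ${start_bytes} bytes"
if [ $((start_bytes % 4096)) -eq 0 ]; then
    echo "4K aligned"
fi
```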

/Sebulon


----------



## iceblood (Nov 28, 2012)

Oh... I see.
Thanks. I will add it later.


----------



## iceblood (Nov 28, 2012)

Must "-l diskX" be used too?


----------



## Sebulon (Nov 28, 2012)

iceblood said:
> Must be "-l diskX" too?



https://forums.freebsd.org/showpost.php?p=198755&postcount=12

Read that again, two more times

/Sebulon


----------



## jem (Nov 28, 2012)

It dismays me to see yet another FreeBSD-on-ZFS installation article promoting the use of the top-level dataset for the root filesystem.

It's bad practice and seriously limits flexibility to change things around later.

Sun used the dataset 'rpool/ROOT/solaris' for the root filesystem in Solaris for a good reason.


----------



## Sebulon (Nov 28, 2012)

@jem

I'm not alone, yay!

/Sebulon


----------



## iceblood (Nov 28, 2012)

But I think "-l" is not a must, because it is just a label.


----------



## gkontos (Nov 28, 2012)

jem said:
> It dismays me to see yet another FreeBSD-on-ZFS installation article promoting the use of the top-level dataset for the root filesystem.
> 
> It's bad practice and seriously limits flexibility to change things around later.
> 
> Sun used the dataset 'rpool/ROOT/solaris' for the root filesystem in Solaris for a good reason.



Solaris uses BEs, so placing the root filesystem on the top-level dataset is not possible there. Do you think there are other limitations to this practice?


----------



## Sebulon (Nov 28, 2012)

iceblood said:
> But i think "-l" is not must.because it is lable.



It is very bad practice to use the raw partition names.

Imagine a person has:

HDD0 -> ada0
HDD1 -> ada1
HDD2 -> ada2

Then HDD0 dies, and for whatever reason the person reboots the server. The pool relation now looks like:

HDD0 -> dead
HDD1 -> ada0
HDD2 -> ada1

Now, in the worst case, ZFS will respond like "WTF just happened!?!?", refuse to boot, and curl up in a fetal position, feeling very sorry for itself.

This is just one scenario. 



If the person had used labels (as he should have), it would instead have looked like:

HDD0 -> disk0(ada0)
HDD1 -> disk1(ada1)
HDD2 -> disk2(ada2)

Then HDD0 dies, and for whatever reason the person reboots the server. But this time, the pool relation is unchanged:

HDD0 -> disk0, dead
HDD1 -> disk1(ada0)
HDD2 -> disk2(ada1)

ZFS happy.

Now, ZFS is supposed to have mechanisms to prevent this sort of error, but I've seen that fail. Better safe than sorry.

/Sebulon


----------



## iceblood (Nov 28, 2012)

Oh... I see. Thanks for your advice.


----------



## jem (Nov 28, 2012)

gkontos said:
> Solaris uses BE so placing the root pool in the top level is not possible. Do you think there are other limitations in this practice?



It's often cited as good practice to keep your system files and data files logically separate.  On a single-pool ZFS system, this means keeping them in different branches of your ZFS hierarchy, but if you're using the top level dataset then you can't do that.

Take the following example of a dataset hierarchy:


```
rpool				(container dataset - not mounted)
rpool/ROOT			(container dataset - not mounted)
rpool/ROOT/freebsd		OS root filesystem - mounted at /
rpool/ROOT/freebsd/usr		OS /usr filesystem
rpool/ROOT/freebsd/var		OS /var filesystem
rpool/DATA			(container dataset - not mounted)
rpool/DATA/home			Home directory container, mounted at /home
rpool/DATA/home/joe		Joe's homedir
rpool/DATA/mediafiles		Media files, music, movies etc
rpool/DATA/database		MySQL files
rpool/DATA/www			Webserver content
rpool/SWAP			zvol for swapspace
```

Here, the top-level rpool dataset isn't used for storing any files.  It's just a container for more datasets.  Likewise, rpool/ROOT and rpool/DATA are also containers.  These containers split the ZFS hierarchy into two main branches and allow you to manage them more independently of each other.

Now if I want to make a recursive snapshot of only my OS files, I can 'zfs snapshot -r rpool/ROOT@snapname' and my data files aren't touched.

I could also ZFS send all my data files by recursively sending rpool/DATA, without including any OS files.

If I want to install a new version of FreeBSD alongside the existing version and switch between them, I can create a new rpool/ROOT/freebsd10 dataset _alongside_ rpool/ROOT/freebsd and install to that.  That wouldn't be possible if I had used the top-level dataset for my OS root filesystem.

In general, it just eases management and flexibility to do things this way, and I suspect it's why Sun did it.
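The hierarchy jem describes could be built with something like the following sketch (abridged to the main branches; `mountpoint=none` keeps the container datasets unmounted, as described, and the swap size is illustrative):

```shell
# Container datasets: never store files in these directly
zfs create -o mountpoint=none rpool/ROOT
zfs create -o mountpoint=none rpool/DATA
# OS branch: root filesystem plus /usr and /var beneath it
zfs create -o mountpoint=/ rpool/ROOT/freebsd
zfs create rpool/ROOT/freebsd/usr
zfs create rpool/ROOT/freebsd/var
# Data branch: home directories mount independently of the OS branch
zfs create -o mountpoint=/home rpool/DATA/home
# Swap as a zvol
zfs create -V 4G rpool/SWAP
```

With this split, `zfs snapshot -r rpool/ROOT@snapname` touches only OS files, exactly as described above.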


----------



## iceblood (Nov 29, 2012)

Disk labels are now added.


----------



## gkontos (Nov 29, 2012)

jem said:
> If I want to install a new version of FreeBSD alongside the existing version and switch between them, I can create a new rpool/ROOT/freebsd10 dataset _alongside_ rpool/ROOT/freebsd and install to that.  That wouldn't be possible if I had used the top-level dataset for my OS root filesystem.



I think this is the most interesting part of all. And given the fact that FreeBSD installs almost all software under /usr/local, that would give us the chance to actually use two different versions with the same software.

This is a very interesting approach indeed.


----------



## vermaden (Nov 29, 2012)

gkontos said:
> This is a very interesting approach indeed.



This is the ZFS Boot Environments approach.


----------

