# Appending more storage space to the file system



## pietrasm (Jul 25, 2013)

Hi,

I have a 250 GB HDD in my home server. I need more space so I have just bought a new 4 TB HDD.

However, I am not sure how to add a new volume to the existing file system. Most of the space on the existing HDD is occupied by the home directory, so I figured I would just mount the new HDD at /usr/home. Is that a good idea? What's the best way to copy the existing home directory to a partition on the new HDD? Is just using cp(1) a good idea?

Another issue is how to add the next HDD in the future. I would like to avoid mounting it at a different path and having two subtrees of the file system split between two HDDs. Is it better to use software RAID 0 or ZFS for this purpose? Will it be possible to create a RAID/ZFS volume out of two HDDs without losing data on one of them? Or is it a better idea to create such a volume with just one HDD now and add another when it's needed?

Thanks.


----------



## wblock@ (Jul 25, 2013)

Unless there is a good reason to keep using the old drive, I would just copy everything to the new drive.  To set up a new drive, see Disk Setup On FreeBSD.  To back up or copy the old drive onto the new drive, see Backup Options For FreeBSD.

Some other observations:

- ZFS *is* software RAID.
- RAID0 is faster than a single drive but at least twice as likely to fail.  An SSD is much faster.
- ZFS can probably grow a single-drive pool by adding another single drive (untested).  That would be similar to RAID0, where there is no redundancy.  A three-drive RAID-Z arrangement is better, allowing any single drive to fail without data loss.
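A rough sketch of what growing a pool like that would look like (untested, as said; device names are examples only):

```shell
# Untested sketch: grow a single-disk pool by striping in a second disk.
# There is no redundancy -- losing either disk loses the whole pool.
zpool create tank ada1    # initial single-disk pool
zpool add tank ada2       # later: stripe in a second disk (RAID0-like)
zpool list tank           # capacity grows to roughly the sum of both
```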


----------



## pietrasm (Jul 26, 2013)

wblock@ said:

> Unless there is a good reason to keep using the old drive, I would just copy everything to the new drive.  To set up a new drive, see Disk Setup On FreeBSD.  To back up or copy the old drive onto the new drive, see Backup Options For FreeBSD.


The 250 GB HDD is a mid-range HDD. That's why I would like to keep it and use the cheaper 4 TB HDD only for home directories.


wblock@ said:

> Some other observations:
> 
> ZFS *is* software RAID.


Is there any ZFS-independent software RAID implementation?


wblock@ said:

> RAID0 is faster than a single drive but at least twice as likely to fail.  An SSD is much faster.


I don't care about speed. The most important thing to me is to have one logical drive built on top of a few HDDs. I would like to avoid the need to distribute files between HDDs manually.


wblock@ said:

> ZFS can probably grow a single-drive pool by adding another single drive (untested).  That would be similar to RAID0, where there is no redundancy.  A three-drive RAID-Z arrangement is better, allowing any single drive to fail without data loss.


If I have 3 x 4 TB HDDs, how much usable storage can I obtain in a 3-drive setup?

Thanks.


----------



## kpa (Jul 26, 2013)

RAID-Z is pretty slow unless you can stripe together multiple RAID-Z vdevs, which would mean at least 6 disks to be efficient and redundant enough. If I were you I would get one more disk and do two 2-disk mirror vdevs. Those would give you 8 TB of storage and the performance would be more than acceptable.


----------



## wblock@ (Jul 26, 2013)

pietrasm said:

> The 250 GB HDD is a mid-range HDD. That's why I would like to keep it and use the cheaper 4 TB HDD only for home directories.
> 
> Is there any ZFS-independent software RAID implementation?



Yes: gmirror(8), gstripe(8), gconcat(8).
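A gconcat(8) setup, for example, might look roughly like this (a sketch only; device names are examples, and newfs(8) destroys any existing data on them):

```shell
# Sketch: join two disks into one logical device with gconcat(8).
kldload geom_concat                 # load the module if not compiled in
gconcat label -v data ada1 ada2     # creates /dev/concat/data
newfs -U /dev/concat/data           # new UFS filesystem with soft updates
mount /dev/concat/data /usr/home
```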



> I don't care about speed. The most important thing to me is to have one logical drive built on top of a few HDDs. I would like to avoid the need to distribute files between HDDs manually.



The risk is that a single drive failure could make data on the other drives inaccessible and unrecoverable.



> If I have 3 x 4 TB HDDs, how much usable storage can I obtain in a 3-drive setup?



RAIDZ with three drives gives 2/3 the total amount of space for data, so 8 TB.
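The arithmetic, as a quick sketch (RAID-Z1 keeps one drive's worth of space for parity):

```shell
# RAID-Z1 usable space: (n - 1) drives' worth, since one drive's worth
# of space holds parity.
n=3; drive_tb=4
echo "$(( (n - 1) * drive_tb )) TB usable"   # prints "8 TB usable"
```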


----------



## pietrasm (Jul 26, 2013)

kpa said:

> RAID-Z is pretty slow unless you can stripe together multiple RAID-Z vdevs, which would mean at least 6 disks to be efficient and redundant enough. If I were you I would get one more disk and do two 2-disk mirror vdevs. Those would give you 8 TB of storage and the performance would be more than acceptable.


I don't get it. Do you mean to build two ZFS vdevs per HDD and then use all four of them as one logical drive without any redundancy?



wblock@ said:

> Yes: gmirror(8), gstripe(8), gconcat(8).
> 
> The risk is that a single drive failure could make data on the other drives inaccessible and unrecoverable.


I am aware of this fact.

What about extending either ZFS or RAID 0 with more HDDs in the future, without the need to back up all data to another drive, recreate the array, and restore the data from a backup? I read that it's possible for ZFS but not implemented yet. What about software RAID 0?


----------



## kpa (Jul 26, 2013)

I meant using two disks for each vdev in a mirror configuration and then striping them together into a single pool. This would give RAID 1 redundancy for each vdev. This is how it would be done with zpool(8):

`zpool create tank mirror ada0 ada1 mirror ada2 ada3`

Assuming that ada0 through ada3 are the four individual disks. There are, however, some issues with newer disks that use 4096-byte sectors but don't tell the OS about it. With those drives, and I'm quite sure that your 4 TB disks are such drives, it is necessary to use proper alignment and sector size when creating the ZFS pool. Search the forums for details, I don't have a good link right now.
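From memory (untested here), the trick uses gnop(8) to make one provider report 4096-byte sectors, so the pool is created with the larger alignment:

```shell
# Sketch of the gnop(8) alignment trick for 4K-sector drives.
gnop create -S 4096 ada0                     # fake 4K-sector provider
zpool create tank mirror ada0.nop ada1 mirror ada2 ada3
zpool export tank
gnop destroy ada0.nop                        # remove the fake provider
zpool import tank                            # pool keeps the 4K alignment
```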


----------



## pietrasm (Jul 26, 2013)

kpa said:

> I meant using two disks for each vdev in a mirror configuration and then striping them together into a single pool. This would give RAID 1 redundancy for each vdev. This is how it would be done with zpool(8):
> 
> `zpool create tank mirror ada0 ada1 mirror ada2 ada3`
> 
> Assuming that ada0 through ada3 are the four individual disks. There are, however, some issues with newer disks that use 4096-byte sectors but don't tell the OS about it. With those drives, and I'm quite sure that your 4 TB disks are such drives, it is necessary to use proper alignment and sector size when creating the ZFS pool. Search the forums for details, I don't have a good link right now.



As far as I understand, you suggest a solution that uses four 4 TB HDDs. I have only one 4 TB HDD.

After some more reading I would go for gconcat(8). It seems that one drive failure causes the loss of only the files on that drive. How are files spread between drives when using concat? Can I rely on the fact that most files won't be split between more than one drive? Is it possible to extend a concat volume by appending more HDDs without losing data?

Thanks.


----------



## wblock@ (Jul 26, 2013)

If one drive in a gconcat(8) array fails, it's effectively the same as a portion of a single hard drive failing.  It may not just be file contents that are lost.  If the failure happens in an important part of the filesystem, directory indexes and pointers to the files could be lost.  So the files are still there, there's just no way to locate them.


----------



## pietrasm (Jul 26, 2013)

wblock@ said:

> If one drive in a gconcat(8) array fails, it's effectively the same as a portion of a single hard drive failing.  It may not just be file contents that are lost.  If the failure happens in an important part of the filesystem, directory indexes and pointers to the files could be lost.  So the files are still there, there's just no way to locate them.



It doesn't sound too bad for my needs. I think I will go for it. Is there any way to locate and back up the directory indexes and pointers to files, or are they spread around the file system?

What about appending more HDDs?

Finally, what's the best way to copy /usr/home directory to a new HDD? Is using cp(1) sufficient to ensure that everything will be exactly copied including access rights, symlinks etc.?

Thanks.


----------



## wblock@ (Jul 26, 2013)

UFS has some backup directory information that might help in recovering after a failure.  Too often, it does not.  The failure rates of big hard drives are one of the things driving the acceptance of ZFS and RAID-Z.

cp(1) might be enough, with enough options.  dump(8)/restore(8) as shown in the link in post #2 is better.
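Copying a filesystem onto a freshly created partition with dump(8)/restore(8) looks roughly like this (a sketch; the device and mount point are examples):

```shell
# Sketch: duplicate a live filesystem with dump(8)/restore(8).
mount /dev/ada1p1 /mnt                       # the new, empty filesystem
cd /mnt
dump -C16 -b64 -0aL -f - / | restore -rf -   # -L: snapshot the live fs
```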


----------



## pietrasm (Jul 27, 2013)

wblock@ said:

> UFS has some backup directory information that might help in recovering after a failure.  Too often, it does not.  The failure rates of big hard drives are one of the things driving the acceptance of ZFS and RAID-Z.
> 
> cp(1) might be enough, with enough options.  dump(8)/restore(8) as shown in the link in post #2 is better.



Thanks for all the help guys.

I have just created a UFS filesystem on the new drive and I got just 3.5 TB of space:

```
root@Server:/dev # df -h
Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ada0p2    224G    203G    3.3G    98%    /
devfs          1.0k    1.0k      0B   100%    /dev
/dev/ada1p1    3.5T    8.0k    3.2T     0%    /mnt
root@Server:/dev #
```
Why is that? How can I fix it?

Thanks.


----------



## wblock@ (Jul 27, 2013)

That drive started with 4,000,000,000,000 bytes of space, or about 3.6T.  A typical `newfs` reserves 8% of space, leaving 3.3T by my calculations.  I don't know why it shows 3.5T.

If you did not align the first partition on that drive, writes will be slow.


----------



## pietrasm (Jul 27, 2013)

wblock@ said:

> That drive started with 4,000,000,000,000 bytes of space, or about 3.6T.  A typical `newfs` reserves 8% of space, leaving 3.3T by my calculations.  I don't know why it shows 3.5T.


It shows 3.2 TB as available space. I guess that is correct.


> If you did not align the first partition on that drive, writes will be slow.


I did it exactly as described in the Handbook:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/disks-adding.html
What are the benefits of aligning the first partition? How can I do this?

Thanks.


----------



## wblock@ (Jul 27, 2013)

pietrasm said:

> It shows 3.2 TB as available space. I guess that is correct.
> 
> I did it exactly as described in the Handbook:
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/disks-adding.html



That is correct as far as it goes.  I actually rewrote that section recently, and left alignment out because it would confuse the issue.



> What are the benefits of aligning the first partition? How can I do this?



The benefit is that writes will go as fast as the drive can go.  If partitions are not aligned, writes can take twice as long.  (Write one 4K block to an aligned partition, and it writes a single block on the drive.  Misaligned, part of that data is in one disk block, and part in another.  So the drive has to read the first block, modify it, then write it back out, then repeat for the second.)

Please show the output from `gpart show ada1`.  To be aligned, the data partition must start at an even multiple of 4K.  Drives usually pretend to have 512-byte sectors even when they really use 4K.  The primary GPT table takes 34 512-byte blocks.  The next aligned spot is at block 40.  I suggest starting the first data partition at 1M, or block 2048, for compatibility with other operating systems.
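Creating such a partition might look like this (a sketch; ada1 is an example device, and this destroys any existing data on it):

```shell
# Sketch: GPT scheme with the first data partition starting at block
# 2048 (1M) so it is 4K-aligned.
gpart create -s gpt ada1
gpart add -t freebsd-ufs -b 2048 -a 4k ada1
gpart show ada1            # the partition should start at block 2048
```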

Seagate has a patented method to avoid the need for alignment.  Other brands do not.


----------



## pietrasm (Jul 27, 2013)

wblock@ said:

> That is correct as far as it goes.  I actually rewrote that section recently, and left alignment out because it would confuse the issue.
> 
> 
> 
> ...




```
pietrasm@Server /u/h/pietrasm> gpart show
=>       34  488397101  ada0  GPT  (232G)
         34        128     1  freebsd-boot  (64k)
        162  478150528     2  freebsd-ufs  (228G)
  478150690    8388608     3  freebsd-swap  (4.0G)
  486539298    1857837        - free -  (907M)

=>        34  7814037101  ada1  GPT  (3.7T)
          34           6        - free -  (3.0k)
          40  7814037088     1  freebsd-ufs  (3.7T)
  7814037128           7        - free -  (3.5k)

pietrasm@Server /u/h/pietrasm>
```
It seems to be correct for the new HDD, as it starts at block 40. However, it looks like the partitions on the main HDD are not aligned correctly.

The first drive is a Seagate as well (Seagate Barracuda VB0250EAVER HPG7). Does that mean alignment doesn't matter?

Thanks.


----------



## wblock@ (Jul 27, 2013)

Excellent!  As far as I know, only drives 1T or larger use 4K blocks, meaning partitions on the smaller drive do not need any special alignment.


----------



## pietrasm (Jul 27, 2013)

wblock@ said:

> Excellent!  As far as I know, only drives 1T or larger use 4K blocks, meaning partitions on the smaller drive do not need any special alignment.



That's great, thanks. How can I check the block size, just to be sure?

Thanks.


----------



## wblock@ (Jul 27, 2013)

`diskinfo -v ada0 | grep stripesize` is a quick way.


----------



## pietrasm (Jul 27, 2013)

wblock@ said:

> `diskinfo -v ada0 | grep stripesize` is a quick way.




```
pietrasm@Server /u/h/pietrasm> diskinfo -v ada0 | grep stripesize
	0           	# stripesize
pietrasm@Server /u/h/pietrasm> diskinfo -v ada1 | grep stripesize
	4096        	# stripesize
pietrasm@Server /u/h/pietrasm>
```

What does zero mean? Does it mean that no special alignment is necessary?

By the way, there is a huge amount of free space at the end of the first HDD. I have no idea why it's like this. Is there any way to shift the swap partition and expand the UFS partition?

Thanks again.


----------



## wblock@ (Jul 28, 2013)

Zero would mean the stripesize is not any different than the sectorsize.  Look at the full output of `diskinfo -v ada0`.

907M at the end of that disk is not really that much, or not enough to make it worth juggling partitions around.


----------



## pietrasm (Jul 28, 2013)

		

> Zero would mean the stripesize is not any different than the sectorsize.  Look at the full output of `diskinfo -v ada0`.
> 
> 907M at the end of that disk is not really that much, or not enough to make it worth juggling partitions around.



Thanks, it's clear to me now.

I know it's not worth doing just to recover 907 MB, but I would like to try it to learn something new. Can I just delete the swap partition and then recreate it at the end of the HDD? Then just expand the root partition?


----------



## wblock@ (Jul 28, 2013)

Yes, that can be done with growfs(8).  The result is usable but not exactly what would be there if it had been created at that size.  I would back up, repartition, and restore.  The links in post #2 show both.
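The growfs(8) route would look roughly like this (an untested sketch; partition indexes and sizes are examples only, so check `gpart show ada0` first, and do it from a rescue system with the partitions unmounted):

```shell
# Sketch: move swap to the end of the disk and grow the root filesystem.
gpart delete -i 3 ada0             # remove the old swap partition
gpart resize -i 2 -s 232g ada0     # grow the UFS partition (example size)
growfs /dev/ada0p2                 # grow the filesystem into the new space
gpart add -t freebsd-swap ada0     # recreate swap in the remaining space
```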


----------



## pietrasm (Jul 28, 2013)

		

> Yes, that can be done with growfs(8).  The result is usable but not exactly what would be there if it had been created at that size.  I would back up, repartition, and restore.  The links in post #2 show both.



I don't think it's a good idea to do this on a live system. Can I reboot to single-user mode or some other mode in order to perform the repartitioning?

Thanks.


----------



## wblock@ (Jul 28, 2013)

Repartitioning can't be done on a system with mounted partitions.  mfsBSD is handy to use for this kind of thing.


----------



## pietrasm (Jul 28, 2013)

		

> Repartitioning can't be done on a system with mounted partitions.  mfsBSD is handy to use for this kind of thing.



So there is no way to unmount the root partition in single-user mode?

Thanks.


----------



## Crivens (Jul 29, 2013)

wblock@ said:

> Excellent!  As far as I know, only drives 1T or larger use 4K blocks, meaning partitions on the smaller drive do not need any special alignment.



The "WD 750GB WD7500BPVT" claims to have "Advanced Format technology", according to my local box-pusher. I think that was the 4K-per-sector format, so it may also appear in smaller disks.


----------



## J65nko (Jul 29, 2013)

You can boot an install CD or USB stick and select *<Live CD>*.


----------



## pietrasm (Jul 30, 2013)

wblock@ said:

> To be aligned, the data partition must start at an even multiple of 4K.  Drives usually pretend to have 512-byte sectors even when they really use 4K.  The primary GPT table takes 34 512-byte blocks.  The next aligned spot is at block 40.  I suggest starting the first data partition at 1M, or block 2048, for compatibility with other operating systems.




```
4 K = 8 512-byte blocks

40 / 8 = 5
```
Five is not even, so 40 is not an even multiple of 4 K, I guess.

Please could you explain?

P.S. How does gpart(8)'s `[-a alignment]` option work?


----------



## wblock@ (Jul 30, 2013)

Not even, _even multiple_:

40 * 512 = 20480
20480 / 4096 = 5, an integer (no fraction)

So block 40 (20480 bytes) is an even multiple of 4096, or 4K.
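The same check is easy to do in sh(1) arithmetic: an LBA is 4K-aligned when (lba * 512) % 4096 is zero.

```shell
# Remainder of 0 means the block address is 4K-aligned.
echo $(( (40 * 512) % 4096 ))     # 0    -> aligned
echo $(( (34 * 512) % 4096 ))     # 1024 -> not aligned (end of GPT table)
echo $(( (2048 * 512) % 4096 ))   # 0    -> aligned (the 1M boundary)
```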

All that the -a option in gpart(8) does is round the partition starting location and size up to the nearest multiple of the value given.


----------



## pietrasm (Jul 31, 2013)

wblock@ said:

> Not even, _even multiple_:
> 
> 40 * 512 = 20480
> 20480 / 4096 = 5, an integer (no fraction)
> ...



Thanks. It was my bad interpretation, sorry.

I have repartitioned the disk and restored a backup. I got a message like this during restore(8):

```
./.sujournal: (inode 4) not found on tape
```
Should I worry about this?

It seems like the system is working just fine after restoring. However, df(1) shows that there is -776M of available space. That's weird, considering that the new partition is bigger than the previous one. Is there any way to reclaim the missing disk space?

Thanks.


----------



## wblock@ (Jul 31, 2013)

pietrasm said:

> I have repartitioned the disk and restored a backup. I got a message like this during restore(8):
> 
> ```
> ./.sujournal: (inode 4) not found on tape
> ...



No, that file will be recreated.



> It seems like the system is working just fine after restoring. However, df(1) shows that there is -776M of available space. That's weird, considering that the new partition is bigger than the previous one. Is there any way to reclaim the missing disk space?



That sounds wrong, but it's hard to guess what is happening.  `du -k -d1` in various directories can be used to track down what is taking space.


----------



## pietrasm (Jul 31, 2013)

wblock@ said:

> That sounds wrong, but it's hard to guess what is happening.  `du -k -d1` in various directories can be used to track down what is taking space.



The drive was almost full before the operation and it's slightly over capacity now. The difference in volume is not significant, so it will be hard to track down. However, I will take a closer look.

Thanks.


----------



## wblock@ (Jul 31, 2013)

A file called restoresymtable is left in the root directory of each filesystem.  Deleting those will save a little space.


----------



## pietrasm (Jul 31, 2013)

wblock@ said:

> A file called restoresymtable is left in the root directory of each filesystem.  Deleting those will save a little space.



Thanks, I noticed this file just after restoring the root partition, but it's only about 60 MB.

I took a closer look at this. I tried to restore(8) the backup into another directory, and it takes exactly the same amount of space as on the root partition. I guess the dump(8) backup file isn't broken.

I think the issue is caused by increased file system overhead for some reason. I changed the minimum free space reserve to 7% and toggled optimization to space using tunefs(8)'s -m and -o switches, respectively. Then I restored the root partition once again, and I have almost 2 GB of available space now. I think it's about the same as before repartitioning.

Unfortunately, I don't remember the UFS configuration of the original file system and there is no way to check it. I created it using bsdinstall(8)'s automatic 'Entire Disk' option when I was installing FreeBSD 9.0. It's rather weird that bsdinstall(8) left almost a GB of free disk space at the end. Moreover, the smaller partition had greater effective capacity. Maybe bsdinstall(8) doesn't use newfs(8)'s defaults?

Thanks for help.


----------



## wblock@ (Jul 31, 2013)

bsdinstall(8) may use a higher number of inodes; there were problems with running out of them in earlier versions.  And the partition size may have been rounded down to the nearest full gigabyte.


----------



## pietrasm (Jul 31, 2013)

wblock@ said:

> bsdinstall(8) may use a higher number of inodes, there were problems with running out of them on earlier versions.  And partition size may have been rounded down to the nearest full gigabyte.



Do I need to increase the number of inodes when using newfs(8), or is the default fine now?

Thanks.


----------



## wblock@ (Jul 31, 2013)

Spread over a single big filesystem, it's probably okay.  You didn't run out when doing the restore, for instance.  If you did run out, it would just be a matter of a backup, newfs(8) with a higher number of inodes, and then a restore.


----------



## pietrasm (Aug 1, 2013)

wblock@ said:

> Spread over a single big filesystem, it's probably okay.  You didn't run out when doing the restore, for instance.  If you did run out, it would just be a matter of a backup, newfs(8) with a higher number of inodes, and then a restore.



Thanks.

I tried to dump(8) the /usr/home directory. It's located on the root partition; it's not a separate file system. I got the following error:

```
root@mfsbsd:/mnt2 # dump -C16 -b64 -0aL -h0 -f - /mnt/usr/home | restore -ruf -
dump: /mnt/usr/home: unknown file system
Tape is not a dump tape
root@mfsbsd:/mnt2 #
```
I guess the problem is that /mnt/usr/home is not a separate file system. Is there any way to make dump(8) dump only a subdirectory of a file system? What's the best way to move /mnt/usr/home to a new partition?

Thanks.


----------



## wblock@ (Aug 1, 2013)

Right, dump(8) only deals with full filesystems.

cp(1) can be used to recursively copy directories, but net/rsync is better at it.  (There's also sysutils/clone.)  For rsync(1), build the port with the FLAGS option enabled and use these options to make an exact duplicate of the source directory in the dest directory:
`rsync -axHAXS --delete --fileflags --force-change source/ dest/`

The trailing slashes on the source and dest directories are important.


----------



## pietrasm (Aug 7, 2013)

wblock@ said:

> Right, dump(8) only deals with full filesystems.
> 
> cp(1) can be used to recursively copy directories, but net/rsync is better at it.  (There's also sysutils/clone.)  For rsync(1), build the port with the FLAGS option enabled and use these options to make an exact duplicate of the source directory in the dest directory:
> `rsync -axHAXS --delete --fileflags --force-change source/ dest/`
> ...



I used sysutils/clone, as it seems to be less complicated. It's a new piece of software but it seems to work correctly. I have tested the server for the last few days and everything looks good.

Thank you all for help. Special thanks to @wblock@.


----------



## Crivens (Aug 8, 2013)

wblock@ said:

> `du -k -d1` in various directories can be used to track down what is taking space.


May I throw sysutils/gdmap in here to solve this? That tool can help a lot to find out where the disk space ends up.


----------

