# Very slow disk I/O (NAS)



## vso1 (Jun 23, 2010)

I bought a new (Asus) motherboard,
added a 4850E CPU and 2 GB of 800 MHz DDR2,
and connected 5 SATA drives.
In the BIOS I noticed that enabling AHCI made booting from the desired disk
impossible, so I set it to IDE
(too lazy to open up the case again and re-arrange the 5 SATA cables).
Then I installed FreeBSD from the 8.0-RC? DVD and did a custom install on 1 disk.
The 4 1 TB WD disks I added to a gvinum raid5 setup with a stripe size of 128K,
installed istgt and configured /dev/gvinum/raid5 to be used as the target.
On another machine I added the target and did a format.
When I started copying data to it over iSCSI from a temporary NAS I borrowed from a friend, I made a shocking discovery: it was SLOW.
So I did a
so I did a 

```
# iostat 5 disk1 disk2 disk3 disk4
```
and it showed that each disk only wrote at about 6 MB/s.
Fiddling with some sysctls made no difference at all, no increase or decrease.
I guess that recompiling the kernel won't increase my transfer rate much (maybe 5 or 10%),
although GENERIC should be sufficient to get some speed.
What do I need to check/set/configure to gain the biggest increase in transfer rate?


----------



## wblock@ (Jun 23, 2010)

Are they the new drives with 4K sectors?  If the partitions on those drives aren't on an even sector boundary, performance is poor.
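The alignment rule wblock@ mentions can be sketched as a quick check (a hypothetical illustration; the LBA values are examples, not taken from this system):

```python
# A 4K-sector ("Advanced Format") drive that reports 512-byte logical
# sectors performs a read-modify-write whenever a partition does not
# start on an 8-sector (4096-byte) boundary.

LOGICAL_SECTOR = 512
PHYSICAL_SECTOR = 4096

def is_aligned(start_lba: int) -> bool:
    """True if the partition's first logical sector lands on a 4K boundary."""
    return (start_lba * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0

# The classic offender: the traditional first-partition offset of 63 sectors.
print(is_aligned(63))    # misaligned
print(is_aligned(64))    # aligned: 64 * 512 = 32768, a multiple of 4096
```

A misaligned partition turns every 4K write into a read of two physical sectors plus two writes, which is one reason performance on these drives can collapse.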


----------



## vso1 (Jun 23, 2010)

```
diskinfo ad2
ad2     512     1000204886016   1953525168      1938021 16      63
```

I did a gvinum raid5 setup with a 128k stripe,
then shared /dev/gvinum/raid5 via istgt,

so I skipped the newfs command.


----------



## phoenix (Jun 23, 2010)

Posting the model numbers of the harddrives would be helpful.

And you didn't "skip" the newfs command, as you did it from the client system, over iSCSI.  Knowing how you partitioned the iSCSI disk would be very helpful.


----------



## vso1 (Jun 24, 2010)

I reinstalled the whole system and enabled AHCI in the BIOS this time,
but no "speed" gain.



			
phoenix said:

> Posting the model numbers of the harddrives would be helpful.
> 
> And you didn't "skip" the newfs command, as you did it from the client system, over iSCSI.  Knowing how you partitioned the iSCSI disk would be very helpful.



When I ran sysinstall I went into the disk geometry editor (can't recall the exact name right now). It said my geometry was wrong and suggested something else, I think 116101/255/??? (this is from the top of my head), so I used that on all 4 drives (wrote the config).

New raid5 config:

```
cat /root/raid5.conf
drive r0 device /dev/ad6
drive r1 device /dev/ad8
drive r2 device /dev/ad12
drive r3 device /dev/ad14
volume raid5
    plex org raid5 256k
    sd drive r0
    sd drive r1
    sd drive r2
    sd drive r3
```
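As a side note on the stripe size above: with 4 drives in RAID5, the full-stripe width works out as follows (a back-of-the-envelope sketch, not gvinum's actual code path):

```python
# RAID5 over n drives stores (n - 1) data stripes plus 1 parity stripe
# per full stripe. Writes smaller than a full stripe force a
# read-modify-write of the parity stripe.

def full_stripe_data_kib(drives: int, stripe_kib: int) -> int:
    """Data payload of one full stripe, in KiB."""
    return (drives - 1) * stripe_kib

print(full_stripe_data_kib(4, 256))  # 768: only writes >= 768 KiB avoid RMW
print(full_stripe_data_kib(4, 128))  # 384: the earlier 128K-stripe config
```

So with a 256k stripe, anything smaller than a 768 KiB sequential burst pays the parity read-modify-write penalty on every write.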

istgt is set up with

```
disk 
lun 0 storage /dev/gvinum/raid5 auto
```
iSCSI command queuing didn't give any speed improvement either.


After that I did a speed test and got the same result (writing large files to the volume): ~6 MB/s.

In Windows I used a dynamic disk instead of a basic disk, giving me the most disk space (because of the 2 TB limit).

So at this stage I've got a GENERIC system. I will post more details when I get home:
- drive type (WD ..., no EARS)
- need to post more?

Any extra speed improvements would be very welcome.


----------



## vso1 (Jun 24, 2010)

The requested diskinfo:


```
diskinfo -v /dev/ad6
/dev/ad6
        512             # sectorsize
        1000204886016   # mediasize in bytes (932G)
        1953525168      # mediasize in sectors
        1938021         # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        WD-WCAU46666283 # Disk ident.
```


----------



## vso1 (Jun 24, 2010)

I made slices with the following calculation (using the diskinfo above):

1953525168 / 32768 = 59616.85693359375

59616 * 32768 = 1953497088

1953497088 is the slice size
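The same rounding can be sketched in a few lines (just restating the arithmetic above):

```python
def aligned_sectors(total_sectors: int, boundary: int) -> int:
    """Round the sector count down to a multiple of `boundary`,
    so the slice ends on an even boundary."""
    return (total_sectors // boundary) * boundary

# The WD10EACS reports 1953525168 sectors; align to 32768-sector chunks.
print(aligned_sectors(1953525168, 32768))  # 1953497088
```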


----------



## sub_mesa (Jun 26, 2010)

geom vinum does not do write-back with RAID5, so it will be very slow even when properly set up and configured. You could try geom_raid5 (not part of the base system) or of course ZFS, if you prefer RAID5.


----------



## User23 (Jun 27, 2010)

You are using WD10EACS drives. The problem is the 4K sector size of the drive.


----------



## vso1 (Jun 28, 2010)

Hmmm, grrrr, watch this:

[attached image of the drive's jumper settings, no longer available]

OPT1? I'll check at home whether these drives have that setting.

The weird thing is that the off-the-shelf NAS uses that very same drive and gives good speed, without the jumper set.


----------



## sub_mesa (Jun 28, 2010)

User23 said:

> You are using WD10EACS drives. The problem is 4K sector size of the drive.


EACS = 512 byte sectors
EARS = 4096 byte sectors (aka Advanced Format)

So if he's using EACS this does not apply to him.


----------



## vso1 (Jun 28, 2010)

sub_mesa said:

> EACS = 512 byte sectors
> EARS = 4096 byte sectors (aka Advanced Format)
> 
> So if he's using EACS this does not apply to him.



And I have those EACS disks...


----------



## sub_mesa (Jun 29, 2010)

Could you try ZFS RAID-Z or geom_raid5 instead of gvinum raid5? gvinum doesn't have a good RAID5 implementation as it lacks write-back.


----------



## vso1 (Jun 29, 2010)

sub_mesa said:

> Could you try ZFS RAID-Z or geom_raid5 instead of gvinum raid5? gvinum doesn't have a good RAID5 implementation as it lacks write-back.



At this moment I've tried both, and both are SLOW, so as a last resort setting the jumper might be the solution (fingers crossed). I'll keep you guys updated.


----------



## User23 (Jun 29, 2010)

sub_mesa said:

> EACS = 512 byte sectors
> EARS = 4096 byte sectors (aka Advanced Format)
> 
> So if he's using EACS this does not apply to him.



True! Sorry for my wrong post.


----------



## vso1 (Jun 29, 2010)

The jumpers helped: a 1 MB/s gain.

Then

```
zpool add <raidz> cache <disk1>s1 <disk2>s1
zpool add <raidz> log <disk1>s2 <disk2>s2
```

(`zpool add`, since the cache and log devices were added to the existing pool.)

The latter doubled the speed... but it's now 12 MB/s. Yippee (not).


----------



## phoenix (Jun 29, 2010)

Don't use harddrives for cache or log devices, unless you are using 15,000 RPM SCSI/SAS drives.  cache and log devices need to be faster than the devices in the pool.  IOW, they really should be flash devices, preferably SSDs but USB sticks work in a pinch.

If you want the best performance, use mirror vdevs with ZFS:
`# zpool create poolname mirror disk1 disk2 mirror disk3 disk4`

Then, you need to create the ZFS volumes that will be exported via iSCSI:
`# zfs create -V 10G -b 32K poolname/volumename`
This will create a */dev/zvol/poolname/volumename* device.  This is the device to use in the iSCSI export.  The -b option is very important.  You really need to match this with the block size of the filesystem that will be used (ie, the remote client filesystem).  For example, 32K if using NTFS.  If the ZVol blocksize is larger or smaller than the blocksize of the filesystem on top, performance suffers greatly.
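The read-modify-write penalty phoenix describes can be approximated with a toy model (my simplification, not ZFS's actual internal accounting):

```python
def write_amplification(volblocksize: int, fs_block: int) -> float:
    """Bytes physically written per byte the client filesystem writes,
    assuming each client write is one filesystem block and a whole
    volblock must be rewritten when the client block is smaller."""
    if fs_block >= volblocksize:
        return 1.0
    return volblocksize / fs_block

# 4K NTFS clusters on a 32K volblocksize: each small write rewrites 32K.
print(write_amplification(32 * 1024, 4 * 1024))  # 8.0
# Matched sizes: no amplification.
print(write_amplification(4 * 1024, 4 * 1024))   # 1.0
```

This is why the thread later finds that matching the zvol blocksize to the NTFS cluster size makes such a difference.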

With only 2 GB of RAM, you will have to do a lot of tuning to get ZFS to perform well.  Adding more RAM will speed things up more than anything else.

Finally, as a test, do an iSCSI export of a single raw disk (no gvinum, no geom_raid, no ZFS, just the bare /dev/disk1 device).  See what the performance is for that.  That will tell you whether it's a network issue, an iSCSI issue, or a software RAID issue.


----------



## vso1 (Jun 30, 2010)

The current setup is just a test setup; the goal is to get between 30 and 60 MB/s, anything more is nice to have.

The FreeBSD box is only serving iSCSI NAS, maybe other things later, but that's not the current primary goal.




			
phoenix said:

> Don't use harddrives for cache or log devices, unless you are using 15,000 RPM SCSI/SAS drives.  cache and log devices need to be faster than the devices in the pool.  IOW, they really should be flash devices, preferably SSDs but USB sticks work in a pinch.


+


> With only 2 GB of RAM, you will have to do a lot of tuning to get ZFS to perform well.  Adding more RAM will speed things up more than anything else.


I've got 50,- to spend now. I am able to invest in:
- a 32 GB SATA SSD
- 1-4 USB sticks, 4 to 8 GB in size
- an additional 2 GB of RAM

Your advice would be to add the 2 GB of RAM? Next month maybe another 50,-.




> If you want the best performance, use mirror vdevs with ZFS:
> `# zpool create poolname mirror disk1 disk2 mirror disk3 disk4`


This would mean that I lose 700 GB, which is too much for my taste.
The current raidz is 2.6 TB (930 GB per disk).




> Then, you need to create the ZFS volumes that will be exported via iSCSI:
> `# zfs create -V 10G -b 32K poolname/volumename`
> This will create a */dev/zvol/poolname/volumename* device.  This is the device to use in the iSCSI export.  The -b option is very important.  You really need to match this with the block size of the filesystem that will be used (ie, the remote client filesystem).  For example, 32K if using NTFS.  If the ZVol blocksize is larger or smaller than the blocksize of the filesystem on top, performance suffers greatly.


Okay, this is a very nice addition, thanks in advance.

To create a 2600 GB iSCSI disk I would do:
`# zfs create -V 2600G -b ?? tank/iscsidisk`
The -b is something I need to research (currently I used "automatic" for the NTFS format).
If this adds the much-needed speed I would be very grateful.




> Finally, as a test, do an iSCSI export of a single raw disk (no gvinum, no geom_raid, no zfs, just the bare /dev/disk1 device).  See what the perform is for that.  That will tell you if it's a network issue, and iSCSI issue, or a software RAID issue.



The only thing that had better performance was geom_raid5, I think.
To be frank, I expected to start at 30 MB/s, not 6 MB/s.


My setup as intended was:
- 1 boot disk
- 2 RAID setups:
  * 1 for storage, slow speed = 60 MB/s (a RAID5-type setup or raidz)
  * 1 for OS (ESXi/desktop boot), fast speed (RAID10 or the ZFS equivalent)

PS: I have one USB stick attached for saving config files; in time the OS will move to the OS zpool or the USB stick.


Now I need to add 2 SSDs (no SATA connections left) to get decent performance?

Oh, does NCQ need to be enabled in the kernel, or is it on by default?


----------



## vso1 (Jul 2, 2010)

vso1 said:

> `# zfs create -V 2600G -b ?? tank/iscsidisk`
> the -b is something I need to research (currently i did "automatic" for ntfs format)
> if this adds the much needed speed I would be very greatfull


I did a 
`# zfs create -V 2600G -b 4096 tank/iscsidisk`

4096 = the recommended NTFS cluster size above 2 GB.
Of course I formatted the partition with the same value.

Speed doubled from 6 MB/s to 12 MB/s. Adding cache + log didn't improve much this time,
1 or 2 MB/s, so I'm going to remove them. The next thing will be adding the 2 GB of RAM, and later on a separate cache/log device.

Why was geom_raid5 removed? Its overall performance seemed to be okay.


----------



## vso1 (Jul 2, 2010)

Okay, I am stubborn, but I will test a single disk and NCQ.
Do I need to recompile the kernel?

A USB stick will be added for caching; I think I will add another one just for logging,
and will put a powered USB hub in between just in case.


----------



## vso1 (Jul 6, 2010)

Okay, the FreeBSD install disk crashed...
After replacing the disk and importing the zpool (loved that!)

I saw an increase in disk speed from 6 MB/s to 15 MB/s.

I also noticed (new config for testing, changed some values):


```
LU2: LUN0 file=/dev/zvol/tank/iscsidisk, size=160041885696
LU2: LUN0 312581808 blocks, 512 bytes/block
LU2: LUN0 149.0GB storage for iqn.2007-09.jp.ne.peach.istgt:test2disk
LU2: LUN0 command queuing disabled
```

Ouch, so the device had a 4096-byte blocksize and the NTFS format was 4096, but iSCSI was missing the "correct" setting.

Looking a bit deeper at iSCSI: does anyone know whether sending the disk type (SCSI/SATA) + geometry along can improve performance with Windows?


----------



## vso1 (Jul 15, 2010)

Okay, it's speedy "enough". The issue was an incorrect "cluster" size on iSCSI. I think setting the ZFS block size might have added some additional speed; ifstat gives 300/400 Mb/s.

So this can be marked as solved.


----------



## Matty (Jul 15, 2010)

vso1 said:

> oke it's speedy "enough"  issue was incorrect "cluster" size on ISCSI, I think setting the zfs block size might improved an aditionnal speed, ifstat gives 300/400 mbs
> 
> so this can be marked as solved



Great to hear, but what do you mean by cluster size?


----------



## vso1 (Jul 16, 2010)

Matty said:

> great to hear but what do you mean with cluster size?


I mean the size of the clusters used on the "disk":

*In Windows:* partition with a 4096 cluster size (see the Windows NTFS recommendations for which cluster size fits which partition size)
*istgt:* defaults to 512 bytes/block; I didn't set that one
*ZFS:* use the option *-b 4096* with the command `zfs create -V 2600G -b 4096 tank/iscsidisk`


----------



## pruik (Oct 28, 2010)

Can you please give some more information on the iSCSI bytes/block you're using?

I'm currently trying to increase the default istgt 512 bytes/block, but documentation is scarce ;-) Maybe it has no effect on performance at all; if so, please let me know.


----------



## danbi (Oct 28, 2010)

ZFS cache vdevs will help once the system gets some usage -- the cache has to fill. You can check with

`# zpool iostat -v`

how full the cache device is -- it is most efficient when this shows the cache device is almost full (don't worry, it will not overflow).

The log vdevs can help in mixed read/write cases, because the ZIL will not be allocated/freed on the storage drives and thus will not impact storage performance. You should see an immediate effect from the log, but not with typical tests like 'copy a file to the filesystem' -- rather with a more multi-tasking load.

There are opinions that iSCSI in FreeBSD is not yet performing very well. This might or might not be true, and could be the bottleneck in your case. What does

`# gstat`

show for the I/O utilization of drives while you do your tests?


----------



## phoenix (Oct 28, 2010)

danbi said:

> The zlog vdevs can help in read/write cases, because the ZIL will not be allocated/freed on the storage drives and thus will not impact storage performance.



Log devices are only written to, never read from.  The only time the data in the log is read is when the system crashes before the data in the log has been written to the pool.

The way a log device works is like this:

```
[data in RAM] ----> [written to log]
     \--later-----> [written to pool]
```
If a read request is made for data "in the log", it is read from the ARC, not from the log device.


----------

