# System 'pauses' during heavy disc usage



## ddaley (Dec 24, 2011)

I just replaced my old 160GB SATA drive with a 1.5TB SATA drive.  The new drive is a Western Digital "Green" drive with 64MB of cache (I think the cache may be important for this issue).  In general the drive seems to perform very well.  However, when building projects or doing anything with lots of disk writes, the system periodically "pauses" for about 5 seconds while the drive is being accessed (I assume it is being written to).

When I build a maven project using the old drive, it takes about 90 seconds.  When I build on the new drive (during which it pauses numerous times), it takes roughly twice as long (over 3 minutes).

I am running 8.2-RELEASE (currently -p4) on amd64, with up-to-date ports.

I am basically running the GENERIC kernel with *makeoptions DEBUG=-g* removed.


```
uname -a
FreeBSD shuttlebsd.localdomain 8.2-RELEASE-p4 FreeBSD 8.2-RELEASE-p4 #2: Tue Dec 20 09:25:14 CST 2011 \
     root@shuttlebsd.localdomain:/usr/obj/usr/src/sys/GENERIC_AMD64  amd64
```


```
df
Filesystem       1K-blocks      Used      Avail Capacity  Mounted on
/dev/ad4s1a         1012974   498042     433896    53%    /
devfs                     1        1          0   100%    /dev
/dev/ad4s1e         1012974      716     931222     0%    /tmp
/dev/ad4s1f      1401015662 26660628 1262273782     2%    /usr
/dev/ad4s1d        11915630   716140   10246240     7%    /var
procfs                    4        4          0   100%    /proc
```

The only thing I have tried so far is setting *vfs.write_behind=0*, which didn't seem to help.

Any suggestions on where to start to try to resolve this?


----------



## wblock@ (Dec 24, 2011)

Did you align the filesystem slices to the drive's 4K physical sector size?  Misalignment could make the drive much slower overall.


----------



## ddaley (Dec 24, 2011)

Thanks for the response!

How would I tell if they are aligned to the 4k sector size?  I used sysinstall to initialize the drive: I told it to use the entire drive and let it auto-allocate the slices.  Would sysinstall do this automatically?


----------



## wblock@ (Dec 24, 2011)

sysinstall does not think about alignment, AFAIK.  fdisk(8) will show it, but gpart(8) is preferred:

```
$ gpart show ada0
=>        63  1953525105  ada0  MBR  (931G)
          63  1953520002     1  freebsd  [active]  (931G)
  1953520065        5103        - free -  (2.5M)
```

This is a standard MBR setup, starting at sector 63, so not aligned (63 blocks * 512 / 4096 = 7.875).  The easy way to get alignment is to use gpart(8) to create partitions with the -a option.  For example:
`# gpart add -t freebsd-ufs -l gptmpfs -a 4k -s 4G da0`

Of course, that would require backing up the drive, recreating the partitions, and restoring.  And I can't guarantee it would fix the problem; it could be something else.  If someone wants to send me one of those Green drives to test, preferably to keep, I'd be happy to post performance comparisons (really happy!).  Failing that, this review, http://www.storagereview.com/western_digital_caviar_green_3tb_review_wd30ezrsdtl, has some poorly labeled graphs, some of which appear to show the performance difference between aligned and unaligned layouts.  There's also this: http://blog.des.no/2010/08/exploring-wd-advanced-format-drives.html.
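The alignment arithmetic above can be written out as a small check. This is a minimal sketch, assuming 512-byte logical sectors and a 4096-byte physical sector. It converts a partition's starting sector to a byte offset and tests whether that offset lands on a 4 KiB boundary. The function names are made up for illustration.

```python
# Assumes 512-byte logical sectors and a 4096-byte physical sector.

def start_offset_bytes(start_sector: int, sector_size: int = 512) -> int:
    """Byte offset from the start of the disk for a given starting sector."""
    return start_sector * sector_size


def is_aligned(start_sector: int, sector_size: int = 512, boundary: int = 4096) -> bool:
    """True if the partition begins exactly on a boundary-sized chunk."""
    return start_offset_bytes(start_sector, sector_size) % boundary == 0


# 63: the classic MBR start sysinstall uses; 2048: a 1 MiB start.
for start in (63, 2048):
    print(start, start_offset_bytes(start) / 4096, is_aligned(start))
```

A fractional result in the middle column (such as 7.875 for sector 63) means the partition straddles physical sectors.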


----------



## bbzz (Dec 24, 2011)

Doesn't 4K alignment mean the start sector is divisible by 8 (8 x 512 = 4096), not by 4096?
For example 
`# gpart add -t freebsd -a 4k da0`

gives

```
=>      63  15470529  da0  MBR  (7.4G)
        63        63       - free -  (31k)
       126  15470406    1  freebsd  (7.4G)
  15470532        60       - free -  (30k)
```

126 x 512 / 4k = 15.75
How does this work?


----------



## wblock@ (Dec 24, 2011)

Divisible by 8 assumes that the block size reported by the drive is 512 bytes.  Granted, 4K-block drives all lie and report 512 so far, but using bytes instead of blocks means it doesn't matter.
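The difference between the two rules can be sketched as follows. The divide-by-8 shortcut only holds when the drive reports 512-byte sectors, while a byte-offset check works for any reported sector size. The 4096-byte-logical-sector case below is hypothetical (as noted above, these drives currently report 512).

```python
BOUNDARY = 4096  # 4 KiB physical sector


def aligned_by_sector_count(start_sector: int) -> bool:
    # Shortcut rule: 8 x 512 = 4096, so the start must be a multiple of 8.
    return start_sector % 8 == 0


def aligned_by_bytes(start_sector: int, logical_sector_size: int) -> bool:
    # General rule: compare the byte offset against the boundary.
    return (start_sector * logical_sector_size) % BOUNDARY == 0


# With 512-byte logical sectors, both rules agree.
print(aligned_by_sector_count(2048), aligned_by_bytes(2048, 512))
# Hypothetical drive reporting 4096-byte logical sectors: every start is
# aligned, but the divide-by-8 shortcut would still reject sector 63.
print(aligned_by_sector_count(63), aligned_by_bytes(63, 4096))
```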


----------



## ddaley (Dec 24, 2011)

I tried running this on an extra drive first, just to try it out, and here is what I got:


```
[ddaley@shuttlebsd ~/tmp]$ gpart add -t freebsd-ufs -l gptmpfs -a 4k ada1
gpart: illegal option -- a
```

Is this a new option for FreeBSD 9?  I am running 8.2.


----------



## wblock@ (Dec 24, 2011)

Sorry, it was added to 8-STABLE in July.  Disk Setup On FreeBSD shows a full example, but not the -a option.  It does show a 1M offset for the first partition, and labels, both of which I recommend.

If you update to 8-STABLE, just add -a 4k to the command.

For 8.2-RELEASE, alignment will still work if you use the 1M offset for the first partition and keep partition sizes at exact multiples of 1M or 1G.
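The reasoning behind that workaround can be sketched by laying partitions out back to back from a 1 MiB start. Whole-MiB or whole-GiB sizes keep every later start on a 4K boundary, and a single odd-sized partition shifts everything after it. The sizes are example values only, and the sketch assumes 512-byte sectors with no gaps between partitions.

```python
SECTOR = 512
MiB = 1024 * 1024
GiB = 1024 * MiB


def layout(sizes_bytes, first_start_bytes=1 * MiB):
    """Return (start_sector, size_sectors) pairs for back-to-back partitions."""
    parts = []
    offset = first_start_bytes
    for size in sizes_bytes:
        parts.append((offset // SECTOR, size // SECTOR))
        offset += size
    return parts


def aligned(start_sector, boundary=4096):
    return (start_sector * SECTOR) % boundary == 0


# Example sizes: 2G root, 4G swap, 4G var, 1G tmp.
good = layout([2 * GiB, 4 * GiB, 4 * GiB, 1 * GiB])
# Same idea, but the first partition is 1001 sectors (not a multiple of 8),
# which pushes every later partition off the 4K boundary.
bad = layout([1001 * SECTOR, 4 * GiB, 4 * GiB])

for name, parts in (("whole MiB/GiB sizes", good), ("odd first size", bad)):
    print(name, [(start, aligned(start)) for start, _ in parts])
```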


----------



## ddaley (Dec 25, 2011)

Thanks wblock.

I spent most of the day doing dump, restore, gpart... repeat.  But, it was worth it.  Now the build times on my project are down to about 90 secs to 115 secs (from 3+ minutes).  More importantly, there are no noticeable pauses while writing to the file system.

However, in the process of all of this, I lost the FreeBSD boot manager.  Can I get that back without wiping anything out?

Here are the commands I issued, in case anyone is interested:


```
gpart create -s gpt /dev/ada0
gpart add -t freebsd-boot -l gpboot0 -s 128K ada0
gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 ada0
gpart add -t freebsd-ufs -l gprootfs0 -b 1M -s 5G ada0
gpart add -t freebsd-swap -l gpswap0 -s 16G ada0
gpart add -t freebsd-ufs -l gpvarfs0 -s 15G ada0
gpart add -t freebsd-ufs -l gptmpfs0 -s 2G ada0

gpart add -t freebsd-ufs -l gpusrfs0 -s 1359G ada0
# ended up with about 270M left over...
gpart add -t freebsd-ufs -l gpotherfs -s 270M ada0

newfs /dev/gpt/gprootfs0
newfs -U /dev/gpt/gpvarfs0
newfs -U /dev/gpt/gptmpfs0
newfs -U /dev/gpt/gpusrfs0
newfs -U /dev/gpt/gpotherfs
```
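The "about 270M left over" note can be checked with some back-of-the-envelope arithmetic. This is a sketch under stated assumptions: 512-byte sectors; gpart's default GPT with 128 entries (34 sectors reserved at the start, 33 at the end); the 1,500,301,910,016-byte capacity that smartctl reports for this drive later in the thread; and root, swap, var, tmp and usr placed back to back starting at 1M.

```python
SECTOR = 512
MiB = 1024 * 1024
GiB = 1024 * MiB

total_sectors = 1_500_301_910_016 // SECTOR
# Last LBA is total_sectors - 1; the backup GPT header and its 32 entry
# sectors occupy the 33 sectors at the end of the disk.
last_usable = (total_sectors - 1) - 33
first_data = 1 * MiB // SECTOR  # rootfs was placed with -b 1M

# rootfs 5G, swap 16G, var 15G, tmp 2G, usr 1359G, assumed back to back
used = (5 + 16 + 15 + 2 + 1359) * GiB // SECTOR
leftover_sectors = (last_usable - first_data + 1) - used
leftover_mib = leftover_sectors * SECTOR / MiB
print(f"{leftover_mib:.1f} MiB left")
```

Under those assumptions, the result is a little over 270 MiB. That is enough for the 270M gpotherfs partition.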


----------



## aragon (Dec 25, 2011)

ddaley said:

> However, in the process of all of this, I lost the FreeBSD boot manager.  Can I get that back without wiping anything out?


The FreeBSD boot manager (boot0) does not work with GPT partitioned disks.  You have to stick with MBR if you want to use it.

Perhaps you should do your 4k alignment with MBR and/or a BSD label?


----------



## ddaley (Dec 25, 2011)

Thanks for the info aragon.

Follow-up question, then: I currently have another drive in the system with an installation of FreeBSD.  Is it possible to boot from that drive as well?  During boot, can I tell the system to boot from that drive, or is there another boot manager that works with GPT and can present a menu at boot time to choose the other drive?

This guy managed to use GRUB to dual boot FreeBSD/Linux on GPT: http://www.rodsbooks.com/gdisk/booting.html

I noticed GRUB2 in ports. Is this an option here?


----------



## wblock@ (Dec 25, 2011)

Grub2 should work, but I don't use it and haven't tried it.


----------



## wblock@ (Dec 25, 2011)

bbzz said:

> 4K alignment means divisible by 8, not 4096, is it not (8 x 512 = 4096)?
> For example
> `# gpart add -t freebsd -a 4k da0`
> 
> ...



Better to add new messages than edit old ones that can be missed.

Anyway, that's a weird layout.  At least one mistake looks like using the freebsd type, which is an old disklabel type.  Use freebsd-ufs for standard UFS filesystems.  It is possible to create an MBR/disklabel setup with gpart(8): create da0 (for example) with the MBR scheme, then create da0s1 with a type of freebsd, then add partitions to da0s1.

Here's a sample of GPT setup with alignment:

```
# gpart create -s gpt da0
# gpart add -s 512k -a 4k -t freebsd-boot da0
# gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 da0
# gpart add -t freebsd-ufs -l gprootfs -b 1M -s 2G da0
# gpart add -t freebsd-swap -l gpswap -a 4k -s 4G da0
# gpart add -t freebsd-ufs -l gpvarfs -a 4k -s 4G da0
# gpart add -t freebsd-ufs -l gptmpfs -a 4k -s 1G da0
# gpart add -t freebsd-ufs -l gpusrfs -a 4k da0
# gpart show da0
=>      34  39070013  da0  GPT  (18G)
        34         6       - free -  (3.0k)
        40      1024    1  freebsd-boot  (512k)
      1064       984       - free -  (492k)
      2048   4194304    2  freebsd-ufs  (2.0G)
   4196352   8388608    3  freebsd-swap  (4.0G)
  12584960   8388608    4  freebsd-ufs  (4.0G)
  20973568   2097152    5  freebsd-ufs  (1.0G)
  23070720  15999320    6  freebsd-ufs  (7.6G)
  39070040         7       - free -  (3.5k)
```

I wanted the root partition to start at 1M (a common standard), but -a 4k wanted to put it at 1064.  Both are aligned; 1M just might be more convenient for compatibility later on.
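Reading start sectors by eye gets tedious, so here is a sketch of a checker for `gpart show` text. It flags any partition whose start is not on a 4 KiB boundary. It assumes 512-byte sectors and the column layout seen in this thread (start, size, index, type, human-readable size), and it skips the `=>` header row and `- free -` rows. The function name is hypothetical and this is not part of gpart(8) itself.

```python
def misaligned(gpart_show_text: str, sector_size: int = 512, boundary: int = 4096):
    """Return (index, start) for partitions whose start is off the boundary."""
    bad = []
    for line in gpart_show_text.splitlines():
        fields = line.split()
        # Skip blank lines, the "=>" header and "- free -" rows.
        if len(fields) < 3 or fields[0] == "=>" or fields[2] == "-":
            continue
        start, index = int(fields[0]), fields[2]
        if (start * sector_size) % boundary:
            bad.append((index, start))
    return bad


# GPT layout from the post above.
GPT_EXAMPLE = """\
=>      34  39070013  da0  GPT  (18G)
        34         6       - free -  (3.0k)
        40      1024    1  freebsd-boot  (512k)
      1064       984       - free -  (492k)
      2048   4194304    2  freebsd-ufs  (2.0G)
   4196352   8388608    3  freebsd-swap  (4.0G)
  12584960   8388608    4  freebsd-ufs  (4.0G)
  20973568   2097152    5  freebsd-ufs  (1.0G)
  23070720  15999320    6  freebsd-ufs  (7.6G)
  39070040         7       - free -  (3.5k)
"""

# Standard MBR layout from earlier in the thread.
MBR_EXAMPLE = """\
=>        63  1953525105  ada0  MBR  (931G)
          63  1953520002     1  freebsd  [active]  (931G)
  1953520065        5103        - free -  (2.5M)
"""

print("GPT:", misaligned(GPT_EXAMPLE))
print("MBR:", misaligned(MBR_EXAMPLE))
```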


----------



## bbzz (Dec 25, 2011)

wblock@ said:

> Better to add new messages than edit old ones that can be missed.



We posted at nearly the same time; I was typing when you posted.



> Anyway, that's a weird layout.  At least one mistake looks like using the freebsd type, which is an old disklabel type.  Use freebsd-ufs for standard UFS filesystems.  It is possible to create an MBR/disklabel setup with gpart(8): create da0 (for example) with the MBR scheme, then create da0s1 with a type of freebsd, then add partitions to da0s1.



Right, but with MBR you can't use freebsd-ufs when creating the slice, only when creating partitions inside the bsdlabel.  For the slice you must use freebsd.
The above partitioning is actually aligned (starts at 126).


----------



## wblock@ (Dec 25, 2011)

bbzz said:

> Right, but you can't use freebsd-ufs when creating slice, with MBR, only when creating bsdlabels. You must use freebsd.
> The above partitioning is actually aligned (starts at 126).



The "partition" created there is a disklabel.  Actual data partitions go inside it.  As to where it's getting the 126, maybe it's allowing for two blocks of disklabel data at the beginning.  Alignment would then be correct immediately after the disklabel, where the actual partitions start.  freebsd-ufs works for those.  I'm not sure if the bootcode is done correctly here.  Also I'm not sure if gpart(8) aligns partitions to the start of the drive or the start of the container (in which case, they might be misaligned on the drive).

```
# gpart create -s mbr da0
# gpart bootcode -b /boot/boot0 da0
# gpart add -t freebsd -a 4k da0
# gpart create -s bsd da0s1
# gpart add -t freebsd-ufs -a 4k -s 2g da0s1
# gpart add -t freebsd-swap -a 4k -s 4g da0s1
# gpart add -t freebsd-ufs -a 4k -s 4g da0s1
# gpart add -t freebsd-ufs -a 4k -s 1g da0s1
# gpart add -t freebsd-ufs -a 4k da0s1
# gpart show da0
=>      63  39070017  da0  MBR  (18G)
        63        63       - free -  (31k)
       126  39069891    1  freebsd  (18G)
  39070017        63       - free -  (31k)

# gpart show da0s1
=>       0  39069891  da0s1  BSD  (18G)
         0         2         - free -  (1.0k)
         2   4194304      1  freebsd-ufs  (2.0G)
   4194306   8388608      2  freebsd-swap  (4.0G)
  12582914   8388608      4  freebsd-ufs  (4.0G)
  20971522   2097152      5  freebsd-ufs  (1.0G)
  23068674  16001216      6  freebsd-ufs  (7.6G)
  39069890         1         - free -  (512B)
```


----------



## bbzz (Dec 25, 2011)

Yes, they are aligned: take the starting sector of any entry in the BSD label, add 126 (where the slice starts), divide by 8, and you get a whole number.  That's why I said it's easier to divide by 8.
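The rule of thumb above can be sketched in code. A partition inside a BSD label sits at (slice start + partition start) on the disk, so alignment has to be judged on that absolute sector, not on either number alone. The values are copied from the `gpart show da0` / `gpart show da0s1` output in the previous post, and 512-byte sectors are assumed.

```python
SLICE_START = 126  # da0s1 starts here on da0
BSD_STARTS = [2, 4194306, 12582914, 20971522, 23068674]  # starts within da0s1


def absolute_aligned(slice_start: int, part_start: int, sectors_per_4k: int = 8) -> bool:
    """Check alignment of the partition's absolute position on the disk."""
    return (slice_start + part_start) % sectors_per_4k == 0


print("slice start aligned:", SLICE_START % 8 == 0)
print("partitions aligned:", [absolute_aligned(SLICE_START, s) for s in BSD_STARTS])
```

The slice itself starts off a 4K boundary, but the two-sector gap at the front of the label brings every partition back onto one (126 + 2 = 128).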

The bootcode is OK, but you still need one more bootcode step, for the slice itself:

`# gpart bootcode -b /boot/boot da0s1`

Oh, and unlike GPT, with MBR you need to mark the slice as active:

`# gpart set -a active -i 1 da0`

That is, if you don't use boot0.


----------



## aragon (Dec 27, 2011)

ddaley said:

> I currently have another drive in the system with an installation of FreeBSD.  Is it possible to boot from that drive as well?  During boot, can I tell the system to boot from that driver, or is there another boot manager that works with GPT that can present a menu during boot time to boot from the other drive?


Many BIOSes these days give you the ability to choose the boot device on-the-fly at boot time.

Or, if your primary boot disk is MBR partitioned while your secondary disk is GPT, you could try loading boot0 onto your primary disk; it might let you boot the GPT disk with F5...


----------



## phoenix (Jan 1, 2012)

ddaley said:

> I just replaced my old 160GB SATA drive with a 1.5TB SATA drive.  The new drive is a Western Digital "Green" drive with 64MB of cache (I think the cache may be important for this issue).  The drive in general seems to perform very well.  However, when building projects or doing anything with lots of hard drive writes, the system will "pause" for about 5 seconds periodically while the hard drive is being accessed (I am assuming written to).



Western Digital Caviar Green and GP drives are configured by default to park the drive heads after only a few seconds of idle time (around 8 seconds is the commonly cited default).  So if you don't read or write the disk for that long, the heads are parked.  The next read or write then has to wait while the heads are loaded back into position.  You really, really, really do not want to use WD Green drives in any kind of "performance" setup.

Depending on the age of the disk, you may be able to disable the idle head-parking "feature" (or set the idle timeout high enough that it is effectively disabled) using WD's wdidle3.exe tool.  You have to boot to DOS, make sure that drive is the only one plugged in, then run the tool.

You can check whether this is happening by looking at the Load_Cycle_Count attribute in the output of smartctl(8), from the smartmontools port.  This number should increase slowly (tens or hundreds per year) in normal use.  The Green drives can increase it by several thousand per week, greatly shortening the useful life of the drive.

If at all possible, consider replacing the drives with non-Green versions.


----------



## ddaley (Jan 2, 2012)

Thanks Freddie.

Installing smartmontools right now...

This problem has pretty much disappeared since moving to GPT partitions as wblock suggested.  I need to read up on this so that the output will mean something to me, but this is what `smartctl -a /dev/ada0` outputs right now:



```
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format)
Device Model:     WDC WD15EARS-00Z5B1
Serial Number:    WD-WMAVU2533141
LU WWN Device Id: 5 0014ee 6aaac84a5
Firmware Version: 80.00A80
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Jan  2 09:19:37 2012 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (32400) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   183   183   021    Pre-fail  Always       -       5841
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       48
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       106
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       23
193 Load_Cycle_Count        0x0032   196   196   000    Old_age   Always       -       14242
194 Temperature_Celsius     0x0022   117   107   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```
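To track Load_Cycle_Count over time without reading the whole report, you can extract the raw value with a small script. This is a sketch that assumes the attribute-table layout smartmontools 5.42 prints above, where RAW_VALUE is the tenth whitespace-separated column. Other versions or vendors may format raw values differently. The helper name is hypothetical.

```python
from typing import Optional

# Two attribute rows copied from the `smartctl -a /dev/ada0` output above.
SAMPLE = """\
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       106
193 Load_Cycle_Count        0x0032   196   196   000    Old_age   Always       -       14242
"""


def raw_attribute(smartctl_text: str, name: str) -> Optional[int]:
    """Return the RAW_VALUE (10th column) of the named attribute, or None."""
    for line in smartctl_text.splitlines():
        fields = line.split()
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == name:
            return int(fields[9])
    return None


print("Load_Cycle_Count:", raw_attribute(SAMPLE, "Load_Cycle_Count"))
print("Power_On_Hours:", raw_attribute(SAMPLE, "Power_On_Hours"))
```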


----------



## ddaley (Jan 2, 2012)

This is the output from my OLD drive:


```
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.7 and 7200.7 Plus
Device Model:     ST3120026AS
Serial Number:    3JT0KBJ3
Firmware Version: 3.05
User Capacity:    120,034,123,776 bytes [120 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  ATA/ATAPI-6 T13 1410D revision 2
Local Time is:    Mon Jan  2 09:24:27 2012 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  38) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline 
data collection:                (  430) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  85) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   057   056   006    Pre-fail  Always       -       76165658
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       995
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000f   080   060   030    Pre-fail  Always       -       111390721
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       2540
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       890
194 Temperature_Celsius     0x0022   038   059   000    Old_age   Always       -       38
195 Hardware_ECC_Recovered  0x001a   057   055   000    Old_age   Always       -       76165658
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short captive       Interrupted (host reset)      60%        25         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```


----------



## wblock@ (Jan 2, 2012)

The new drive has been on only 106 hours, yet Load_Cycle_Count is 14,242!  For comparison, my drive has been on 7905 hours, but Load_Cycle_Count is 29.
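The comparison is easier to see as a rate. This sketch converts the two Load_Cycle_Count / Power_On_Hours pairs quoted above into load cycles per hour of power-on time. Only the numbers from this thread are used.

```python
def cycles_per_hour(load_cycles: int, power_on_hours: int) -> float:
    """Average head load/unload cycles per hour of power-on time."""
    return load_cycles / power_on_hours


green = cycles_per_hour(14242, 106)  # WD15EARS in this thread
other = cycles_per_hour(29, 7905)    # wblock's drive

print(f"WD Green: {green:.1f}/hour, about {green * 24 * 7:.0f} per week of power-on time")
print(f"other:    {other:.4f}/hour")
```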

WD does not show a wdidle download for that drive, and some drives ignore it anyway.  (Like the 2.5-inch BEVT Scorpio Blue drive I have, which is slow to respond to anything once it has been idle for the eight seconds it takes to go to sleep.  Avoid those.  Hmm... maybe there's a periodic disk access in Windows that hides the issue?)

The old Seagate drive doesn't track Load_Cycle_Count, but has two reallocated sectors.  Maybe not too bad for a drive that's probably five or six years old.


----------



## fnucc (Jan 2, 2012)

There have been some problems with 1TB and larger WD drives. Some were related to the head-parking problem, and some went away with a replacement drive. The problem is not specific to *nix; it can happen under Windows 7 too.


----------

