# Can an Upgrade to 9.2 cause storage issues?



## layer3guru (Oct 10, 2013)

The system was running 9.0 and I upgraded it to 9.2. The upgrade went very smoothly, with no issues in any application. I then noticed two strange things.
This is what I am running now:

```
FreeBSD vader2.digitalrage.org 9.2-RELEASE FreeBSD 9.2-RELEASE #0 r255898: Thu Sep 26 22:50:31 UTC 2013     root@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
```

Problem 1: Right after the upgrade I started getting error messages in /var/log/messages that were never there before. Here are the messages, followed by the `smartctl` output for the drive:

```
Oct  9 03:01:57 vader2 kernel: (ada1:ata3:0:0:0): READ_DMA. ACB: c8 00 e2 d9 b5 42 00 00 00 00 40 00
Oct  9 03:01:57 vader2 kernel: (ada1:ata3:0:0:0): CAM status: ATA Status Error
Oct  9 03:01:57 vader2 kernel: (ada1:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Oct  9 03:01:57 vader2 kernel: (ada1:ata3:0:0:0): RES: 51 40 ea d9 b5 02 02 00 00 00 00
Oct  9 03:01:57 vader2 kernel: (ada1:ata3:0:0:0): Retrying command
Oct  9 03:01:57 vader2 kernel: (ada1:ata3:0:0:0): READ_DMA. ACB: c8 00 e2 d9 b5 42 00 00 00 00 40 00
Oct  9 03:01:57 vader2 kernel: (ada1:ata3:0:0:0): CAM status: ATA Status Error
Oct  9 03:01:57 vader2 kernel: (ada1:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Oct  9 03:01:57 vader2 kernel: (ada1:ata3:0:0:0): RES: 51 40 ea d9 b5 02 02 00 00 00 00
Oct  9 03:01:57 vader2 kernel: (ada1:ata3:0:0:0): Error 5, Retries exhausted

vader2# smartctl -a /dev/ada1
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Blue Serial ATA
Device Model:     WDC WD5000AAJS-00TKA0
Serial Number:    WD-WCAPW5366807
LU WWN Device Id: 5 0014ee 200a65110
Firmware Version: 12.01C01
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Wed Oct  9 21:31:02 2013 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline 
data collection: 		(12600) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 157) minutes.
Conveyance self-test routine
recommended polling time: 	 (   6) minutes.
SCT capabilities: 	       (0x303f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   194   194   051    Pre-fail  Always       -       6645
  3 Spin_Up_Time            0x0003   171   170   021    Pre-fail  Always       -       6416
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       1
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       9066
 10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       32
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       312
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       327
194 Temperature_Celsius     0x0022   112   107   000    Old_age   Always       -       38
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 6517 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 6517 occurred at disk power-on lifetime: 9048 hours (377 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ea d9 b5 e2  Error: UNC at LBA = 0x02b5d9ea = 45472234

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 40 e2 d9 b5 02 00   1d+04:52:49.488  READ DMA
  c8 00 40 e2 d9 b5 02 00   1d+04:52:47.416  READ DMA
  c8 00 40 e2 d9 b5 02 00   1d+04:52:45.493  READ DMA
  c8 00 40 e2 d9 b5 02 00   1d+04:52:43.570  READ DMA
  c8 00 40 e2 d9 b5 02 00   1d+04:52:41.486  READ DMA

Error 6516 occurred at disk power-on lifetime: 9048 hours (377 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ea d9 b5 e2  Error: UNC at LBA = 0x02b5d9ea = 45472234

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 40 e2 d9 b5 02 00   1d+04:52:47.416  READ DMA
  c8 00 40 e2 d9 b5 02 00   1d+04:52:45.493  READ DMA
  c8 00 40 e2 d9 b5 02 00   1d+04:52:43.570  READ DMA
  c8 00 40 e2 d9 b5 02 00   1d+04:52:41.486  READ DMA
  c8 00 08 92 70 6c 05 00   1d+04:52:40.827  READ DMA

Error 6515 occurred at disk power-on lifetime: 9048 hours (377 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ea d9 b5 e2  Error: UNC at LBA = 0x02b5d9ea = 45472234

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 40 e2 d9 b5 02 00   1d+04:52:45.493  READ DMA
  c8 00 40 e2 d9 b5 02 00   1d+04:52:43.570  READ DMA
  c8 00 40 e2 d9 b5 02 00   1d+04:52:41.486  READ DMA
  c8 00 08 92 70 6c 05 00   1d+04:52:40.827  READ DMA
  c8 00 08 8a 70 6c 05 00   1d+04:52:40.826  READ DMA

Error 6514 occurred at disk power-on lifetime: 9048 hours (377 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ea d9 b5 e2  Error: UNC at LBA = 0x02b5d9ea = 45472234

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 40 e2 d9 b5 02 00   1d+04:52:43.570  READ DMA
  c8 00 40 e2 d9 b5 02 00   1d+04:52:41.486  READ DMA
  c8 00 08 92 70 6c 05 00   1d+04:52:40.827  READ DMA
  c8 00 08 8a 70 6c 05 00   1d+04:52:40.826  READ DMA
  c8 00 08 82 70 6c 05 00   1d+04:52:40.815  READ DMA

Error 6513 occurred at disk power-on lifetime: 9048 hours (377 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ea d9 b5 e2  Error: UNC at LBA = 0x02b5d9ea = 45472234

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 40 e2 d9 b5 02 00   1d+04:52:41.486  READ DMA
  c8 00 08 92 70 6c 05 00   1d+04:52:40.827  READ DMA
  c8 00 08 8a 70 6c 05 00   1d+04:52:40.826  READ DMA
  c8 00 08 82 70 6c 05 00   1d+04:52:40.815  READ DMA
  c8 00 08 3a e7 19 01 00   1d+04:52:40.815  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      9042         25663269

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay
```
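As a cross-check, the LBA that `smartctl` reports can be recomputed from the register dump in the error log. This is only a minimal Python sketch, assuming plain 28-bit LBA addressing (which is what READ DMA, command `c8`, uses); the function name is made up for illustration.

```python
def lba28_from_registers(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA.

    SN, CL and CH carry LBA bits 0-23; the low nibble of DH carries bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after the failed command (SN CL CH DH), from the SMART error log.
failed = lba28_from_registers(0xEA, 0xD9, 0xB5, 0xE2)

# Registers of the READ DMA request that triggered the error, with SC = 0x40.
start = lba28_from_registers(0xE2, 0xD9, 0xB5, 0x02)
count = 0x40

print(hex(failed), failed)  # 0x2b5d9ea 45472234
print(failed - start, "sectors into a", count, "sector read")
```

The result matches the `UNC at LBA = 0x02b5d9ea = 45472234` line, so every logged retry is hitting the same unreadable sector rather than failures scattered across the disk.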

Problem 2: I mount an external USB drive at boot through fstab, and a nightly `cron` job backs up to it with `rsync`. The drive has its own power supply and powers itself down when not in use. Since the upgrade, once it has powered down it will not remount when my backup script runs. This was not a problem in 9.0. Also, when I run the backup script, the errors from problem 1 litter the logs.

So did I just run into a coincidence, where the drive decided to die, or be on its way out, right as I upgraded? I bought a new SATA cable and replaced the existing one, and I still get the same error messages in the log.

Any help would be greatly appreciated.


----------



## wblock@ (Oct 10, 2013)

The second issue is completely unrelated and would be better posted in a separate thread. Whatever the cause, the pending sectors are questionable. Back everything up and run a SMART long test (`smartctl -t long /dev/ada1`). Even if it passes, distrust the drive. If it fails, replace the drive. The new Seagate NAS drives have decent reviews.
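
To keep an eye on the pending sectors, the attribute table can be pulled out of the `smartctl` output. The following is a rough Python sketch, not a finished tool: the function name and the choice of attributes to watch are assumptions, and the sample lines are copied from the output posted above.

```python
# Attribute IDs worth watching on a suspect drive (an illustrative selection).
WATCHED_IDS = {5, 197, 198}

# Three rows copied from the attribute table in the first post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
"""


def watched_raw_values(smartctl_output: str) -> dict:
    """Return {attribute name: raw value} for the watched attribute IDs."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            if int(fields[0]) in WATCHED_IDS:
                found[fields[1]] = int(fields[9])
    return found


for name, raw in watched_raw_values(SAMPLE).items():
    print(name, raw)
```

A nonzero `Current_Pending_Sector` means the drive has sectors it could not read and has not yet remapped, which fits the repeated UNC errors at a single LBA.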


----------



## layer3guru (Oct 11, 2013)

Interesting. Because of the number of USB issues being reported in 9.2 (I know I saw the patches), I decided to go back to 9.0 with a fresh install from CD.

I no longer have any of the issues I initially reported. Now I am wondering whether I really need to swap the boot drive.


----------

