# System doesn't recognize hdd after boot



## Vovas (Oct 24, 2012)

Hi all.
I installed 3 HDDs in my DLNA server and created a ZFS pool. After copying files from my PC to the server, I saw this message:

```
(aprobe0:ahcich1:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich1:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich1:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich1:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe0:ahcich1:0:0:0): Error 5, Retries exhausted
(aprobe1:ahcich1:0:15:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe1:ahcich1:0:15:0): CAM status: ATA Status Error
(aprobe1:ahcich1:0:15:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe1:ahcich1:0:15:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe1:ahcich1:0:15:0): Error 5, Retries exhausted
(aprobe0:ahcich1:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich1:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich1:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich1:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe0:ahcich1:0:0:0): Error 5, Retries exhausted
```
I tried this solution, but it had no effect after a reboot.
More info:

```
beast# camcontrol devlist -v
scbus0 on ahcich0 bus 0:
<SAMSUNG HD160JJ ZM100-47>         at scbus0 target 0 lun 0 (pass0,ada0)
<>                                 at scbus0 target -1 lun -1 ()
scbus1 on ahcich1 bus 0:
<WDC WD30EZRX-00MMMB0 80.00A80>    at scbus1 target 0 lun 0 (pass1)
<>                                 at scbus1 target -1 lun -1 ()
scbus2 on ahcich2 bus 0:
<WDC WD30EZRX-00MMMB0 80.00A80>    at scbus2 target 0 lun 0 (pass2,ada2)
<>                                 at scbus2 target -1 lun -1 ()
scbus3 on ahcich3 bus 0:
<WDC WD30EZRX-00MMMB0 80.00A80>    at scbus3 target 0 lun 0 (pass3,ada3)
<>                                 at scbus3 target -1 lun -1 ()
scbus4 on ahcich4 bus 0:
<>                                 at scbus4 target -1 lun -1 ()
scbus5 on ahcich5 bus 0:
<>                                 at scbus5 target -1 lun -1 ()
scbus6 on umass-sim0 bus 0:
<Generic- SD/MMC 1.00>             at scbus6 target 0 lun 0 (da0,pass4)
<Generic- Compact Flash 1.01>      at scbus6 target 0 lun 1 (da1,pass5)
<Generic- SM/xD-Picture 1.02>      at scbus6 target 0 lun 2 (da2,pass6)
<Generic- MS/MS-Pro 1.03>          at scbus6 target 0 lun 3 (da3,pass7)
scbus-1 on xpt0 bus 0:
<>                                 at scbus-1 target -1 lun -1 (xpt0)
```
`# dmesg -a | grep ada`

```
ada0: <SAMSUNG HD160JJ ZM100-47> ATA-7 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 152627MB (312581808 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad8
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD30EZRX-00MMMB0 80.00A80> ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada3: Previously was known as ad10
```
`# pciconf -lv`

```
ahci0@pci0:0:31:2:      class=0x010601 card=0x82d41043 chip=0x3a228086 rev=0x00 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801JI (ICH10 Family) SATA AHCI Controller'
    class      = mass storage
    subclass   = SATA
    bar   [10] = type I/O Port, range 32, base 0xac00, size  8, enabled
    bar   [14] = type I/O Port, range 32, base 0xa880, size  4, enabled
    bar   [18] = type I/O Port, range 32, base 0xa800, size  8, enabled
    bar   [1c] = type I/O Port, range 32, base 0xa480, size  4, enabled
    bar   [20] = type I/O Port, range 32, base 0xa400, size 32, enabled
    bar   [24] = type Memory, range 32, base 0xf9ffc000, size 2048, enabled
    cap 05[80] = MSI supports 16 messages enabled with 1 message
    cap 01[70] = powerspec 3  supports D0 D3  current D0
    cap 12[a8] = SATA Index-Data Pair
    cap 13[b0] = PCI Advanced Features: FLR TP
```
Please help.


----------



## Vovas (Oct 24, 2012)

About zpool:
`# zpool status`

```
pool: storage
state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
scan: resilvered 36,5M in 0h2m with 0 errors on Wed Oct 24 15:11:44 2012
config:

        NAME                     STATE     READ WRITE CKSUM
        storage                  DEGRADED     0     0     0
          raidz1-0               DEGRADED     0     0     0
            1180438976994044890  REMOVED      0     0     0  was /dev/ada1
            ada2                 ONLINE       0     0     0
            ada3                 ONLINE       0     0     0

errors: No known data errors
```


----------



## Sebulon (Oct 24, 2012)

Hey,

could you please also share the output of:
`# gpart show`

and:
`# zdb | grep ashift`

/Sebulon


----------



## Vovas (Oct 24, 2012)

*First*

```
beast# gpart show
=>       34  312581741  ada0  GPT  (149G)
         34        128     1  freebsd-boot  (64k)
        162  304086912     2  freebsd-ufs  (145G)
  304087074    8388608     3  freebsd-swap  (4.0G)
  312475682     106093        - free -  (51M)
```
*Second*

```
beast# zdb | grep ashift
            ashift: 9
```
I checked the SATA and power cables, and they're fine. I tried disconnecting the SATA cable from each HDD one at a time and booting: the OS boots fine, with no errors on any disk. But when I connect the SATA cables to all the new HDDs, I see this error. I don't understand it x(
==================
verbose dmesg.boot =>> http://pastebin.com/RdHMsZ8Y


----------



## Sebulon (Oct 24, 2012)

@Vovas

Thank you. You also have a couple of other "problems" that are unlikely to be contributing to your original issue, but I can at least start by explaining them.

The disks you used to build your pool are "Advanced Format" (AF) drives with 4 KiB physical sectors, but they lie and present themselves as 512-byte drives so as not to confuse less capable systems such as Windows XP. When you create the pool on these drives raw, ZFS sends all I/O unaligned, which severely impacts performance. So the first thing you have to do is partition the hard drives aligned to 1 MiB.
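To make the alignment arithmetic concrete, here is a minimal sketch that checks whether a partition's starting sector lands on a given byte boundary. The function name `is_aligned` is made up for illustration, and it assumes the 512-byte logical sectors these drives report.

```sh
#!/bin/sh
# Report whether a partition's starting sector falls on a given byte boundary.
# Assumes 512-byte logical sectors, as reported by the drives in this thread.
is_aligned() {
    start_sector=$1
    boundary_bytes=$2
    offset_bytes=$((start_sector * 512))
    if [ $((offset_bytes % boundary_bytes)) -eq 0 ]; then
        echo "aligned"
    else
        echo "unaligned"
    fi
}

is_aligned 2048 4096      # sector 2048 against a 4 KiB boundary
is_aligned 2048 1048576   # sector 2048 against a 1 MiB boundary
is_aligned 34 4096        # sector 34, the first usable sector of a GPT
```

Sector 2048 is byte offset 1048576, exactly 1 MiB, so it satisfies both boundaries. Sector 34 is byte offset 17408, which is not a multiple of 4096.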

The second problem is the ashift value that ZFS uses to determine the smallest I/O it can send. "ashift: 9" stands for "I will send 512-byte I/Os", while "ashift: 12" stands for "I will send 4 KiB I/Os", which is what these drives like, since that's what they really are.
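The ashift value is simply the base-2 logarithm of that smallest I/O size in bytes. A small sketch of the relationship, where `ashift_for` is a hypothetical helper name and not a ZFS command:

```sh
#!/bin/sh
# ashift is the base-2 logarithm of the smallest I/O size ZFS will issue.
# Counts how many times the size can be halved before reaching 1.
ashift_for() {
    size=$1
    shift_bits=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        shift_bits=$((shift_bits + 1))
    done
    echo "$shift_bits"
}

ashift_for 512    # prints 9
ashift_for 4096   # prints 12
```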

Remediation: back up and recreate. Sorry, there's no other way.

Aligned partitioning:
`# gpart create -s gpt ada(1,2,3)`
`# gpart add -t freebsd-zfs -b 2048 -a 4k -l disk(1,2,3) ada(1,2,3)`
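The `ada(1,2,3)` and `disk(1,2,3)` shorthand means running each command once per disk. As a rough sketch, the dry-run script below only prints the expanded commands so they can be reviewed before running them by hand. The function name `print_partition_cmds` is invented here, and it assumes disk N is `adaN` and gets the label `diskN`.

```sh
#!/bin/sh
# Dry run: print the partitioning commands for the given disk numbers
# instead of executing them.
# Assumes disk N is adaN and receives the GPT label diskN.
print_partition_cmds() {
    for n in "$@"; do
        echo "gpart create -s gpt ada$n"
        echo "gpart add -t freebsd-zfs -b 2048 -a 4k -l disk$n ada$n"
    done
}

print_partition_cmds 1 2 3
```

Printing rather than executing is deliberate: `gpart create` rewrites the partition table, so it is worth checking the device names first.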

Pool creation with "ashift: 12":
`# gnop create -S 4096 /dev/gpt/disk1`
`# zpool create storage raidz gpt/disk1.nop gpt/disk2 gpt/disk3`
`# zpool export storage`
`# gnop destroy /dev/gpt/disk1.nop`
`# zpool import -d /dev/gpt storage`

That will land you with aligned partitions and ZFS sending 4 KiB I/Os for optimal performance.

But there's another "snag" with these drives: their firmware says "park the read head if idle for 5 secs". The problem is that ZFS is a transactional file system that buffers I/O for about 5 secs between flushes, which means that these drives park and unpark their heads a gazillion times more than any other drive used with ZFS. The specification says they should be good for about a gazillion parking cycles, but acting like that may cause unnecessary wear. There is some sort of DOS firmware utility that removes that behavior. I think it is called "wdidle". Might be worth looking into.
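One way to watch how often the heads park is the SMART attribute `Load_Cycle_Count`. The sketch below assumes sysutils/smartmontools is installed and that the drive reports that attribute. The helper name `load_cycle_count` is invented for illustration.

```sh
#!/bin/sh
# Extract the raw Load_Cycle_Count value from `smartctl -A` output on stdin.
# The raw value is the last column of the attribute's table row.
load_cycle_count() {
    awk '$2 == "Load_Cycle_Count" { print $NF }'
}

# Usage on a live system (the device name is only an example):
#   smartctl -A /dev/ada2 | load_cycle_count
```

Comparing the number over a few hours of normal use gives a rough idea of how aggressively the drive is parking.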

About your original issue, maybe the BIOS is wonky? Make sure it's set to AHCI mode and that all SATA ports are treated equally.

/Sebulon


----------



## Vovas (Oct 24, 2012)

Thank you for the detailed answer. I will try it and report back.


----------



## Vovas (Oct 24, 2012)

@Sebulon
Thanks again. Very good guide.
Now the pool and disks are working very well.


----------



## Vovas (Oct 24, 2012)

It's like magic.
The errors are back:

```
ahcich1: Timeout on slot 1 port 0
ahcich1: is 00000000 cs 00000400 ss 000007fe rs 000007fe tfd 40 serr 00000000 cmd 0004c917
(ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 70 08 16 40 00 00 00 01 00 00
(ada1:ahcich1:0:0:0): CAM status: Command timeout
(ada1:ahcich1:0:0:0): Retrying command
ahcich1: AHCI reset: device not ready after 31000ms (tfd = 00000080)
(aprobe0:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ahcich1:0:0:0): CAM status: Unconditionally Re-queue Request
(aprobe0:ahcich1:0:0:0): Error 5, Retry was blocked
(ada1:(pass1:ahcich1:0:ahcich1:0:0:0:0): lost device
0): passdevgonecb: devfs entry is gone
(ada1:ahcich1:0:0:0): removing device entry
(aprobe0:ahcich1:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich1:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich1:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich1:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe0:ahcich1:0:0:0): Error 5, Retries exhausted
(aprobe1:ahcich1:0:15:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe1:ahcich1:0:15:0): CAM status: ATA Status Error
(aprobe1:ahcich1:0:15:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe1:ahcich1:0:15:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe1:ahcich1:0:15:0): Error 5, Retries exhausted
(aprobe0:ahcich1:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich1:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich1:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich1:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff
(aprobe0:ahcich1:0:0:0): Error 5, Retries exhausted
```
`# gpart show`

```
beast# gpart show
=>       34  312581741  ada0  GPT  (149G)
         34        128     1  freebsd-boot  (64k)
        162  304086912     2  freebsd-ufs  (145G)
  304087074    8388608     3  freebsd-swap  (4.0G)
  312475682     106093        - free -  (51M)

=>        34  5860533101  ada2  GPT  (2.7T)
          34        2014        - free -  (1M)
        2048  5860531080     1  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5k)

=>        34  5860533101  ada3  GPT  (2.7T)
          34        2014        - free -  (1M)
        2048  5860531080     1  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5k)
```
*ada1* disappeared.

```
beast# zdb | grep ashift
            ashift: 12
```
P.S. The BIOS is up to date, and AHCI mode is enabled for all drives. My motherboard is an ASUS P6T SE.


----------



## wblock@ (Oct 24, 2012)

Is ada1 connected to the motherboard or an add-in controller?  Have you checked the SMART data on that drive?  sysutils/smartmontools will show that data:
`# smartctl -a /dev/ada1`


----------



## Sebulon (Oct 24, 2012)

Uhm,

try shutting it down and change SATA ports to see if it's still disk1 that drops out, cause it feels as if it's just a case of disk1 failing. You should be able to send it back for warranty.

Install sysutils/smartmontools that can monitor how the disks are doing in the background.

/Sebulon


----------



## Vovas (Oct 24, 2012)

wblock@ said:

> Is ada1 connected to the motherboard or an add-in controller?  Have you checked the SMART data on that drive?  sysutils/smartmontools will show that data:
> `# smartctl -a /dev/ada1`

Sebulon said:

> Uhm,
> 
> try shutting it down and change SATA ports to see if it's still disk1 that drops out, cause it feels as if it's just a case of disk1 failing. You should be able to send it back for warranty.
> 
> ...

wblock@, yes. I tried running smartctl on ada1, but got no results.
After booting, ada1 disconnects from the system. I tested all the drives with DLDIAG, and ada1 is defective:



*Error code 0008*


> Self Monitoring, Analysis, and Reporting Technology (SMART) Error returned during SMART Status/Self Test Command. The drive is defective. Status - Replace Drive


I will exchange it at the service center tomorrow.


----------



## Vovas (Oct 25, 2012)

Hi all,
I exchanged the HDD at the service center today and replaced ada1, and now the pool is working fine.
Thanks for help.


----------

