# Drives physically disconnected, ZFS pool still ONLINE



## deepdish (Dec 15, 2009)

This is quite a strange issue, and I think it may be a driver issue with my SAS card.

I connected three 320 GB hard drives to my LSI SAS3081E-R (FreeBSD detects it as `mpt0`) on a machine running FreeBSD 8.0-RELEASE amd64. Out of the blue, I pulled the SFF-8482 connector (power and data in a single connector) from all three drives, one at a time. A few minutes later, I logged into the box and saw that my ZFS pool was still ONLINE:


```
pool: TEMPORARY
 state: ONLINE
 scrub: scrub completed after 1h32m with 0 errors on Mon Dec 14 03:06:04 2009
config:

	NAME        STATE     READ WRITE CKSUM
	TEMPORARY   ONLINE       0     0     0
	  raidz1    ONLINE       0     0     0
	    da5     ONLINE       0     0     0
	    da6     ONLINE       0     0     0
	    da7     ONLINE       0     0     0

errors: No known data errors
```


```
plutonium# zpool list
NAME        SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
TANK         10T  3.72T  6.28T    37%  ONLINE  -
TEMPORARY   888G   726G   162G    81%  ONLINE  -
plutonium#
```

I'm confused about why the pool still shows as ONLINE. I checked /var/log/messages for mpt0 driver messages and saw this:


```
Dec 15 00:58:42 plutonium kernel: mpt0: mpt_cam_event: 0x16
Dec 15 00:58:42 plutonium kernel: mpt0: mpt_cam_event: 0x12
Dec 15 00:58:42 plutonium kernel: mpt0: mpt_cam_event: 0x16
Dec 15 00:58:56 plutonium last message repeated 3 times
Dec 15 00:58:56 plutonium kernel: mpt0: mpt_cam_event: 0x12
Dec 15 00:58:56 plutonium kernel: mpt0: mpt_cam_event: 0x16
Dec 15 00:59:07 plutonium last message repeated 3 times
Dec 15 00:59:07 plutonium kernel: mpt0: mpt_cam_event: 0x12
Dec 15 00:59:07 plutonium kernel: mpt0: mpt_cam_event: 0x16
Dec 15 00:59:09 plutonium last message repeated 2 times
```
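To make a log like this a little less cryptic, here is a minimal Python sketch that tallies the `mpt_cam_event` codes in a syslog file and attaches names to them. The name table is an assumption: it follows the LSI Fusion-MPT event numbering as I understand it from the MPI header `mpi.h` (0x12 as a SAS PHY link status change, 0x16 as a SAS discovery event), so check it against the headers in your own source tree. `tally_mpt_events` and `describe` are hypothetical helpers, and syslog's `last message repeated N times` lines are folded into the count of the message they refer to.

```python
import re
import sys
from collections import Counter

# Assumed event names, taken from the LSI Fusion-MPT header mpi.h.
# Check them against the headers in your own source tree; other codes print as "unknown".
MPI_EVENT_NAMES = {
    0x00: "NONE",
    0x0F: "SAS_DEVICE_STATUS_CHANGE",
    0x12: "SAS_PHY_LINK_STATUS",
    0x16: "SAS_DISCOVERY",
}

EVENT_RE = re.compile(r"mpt_cam_event: (0x[0-9a-fA-F]+)")
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def tally_mpt_events(lines):
    """Count mpt_cam_event codes, expanding syslog's 'last message repeated N times'."""
    counts = Counter()
    last_code = None  # event code of the previous logged message, if it was one
    for line in lines:
        event = EVENT_RE.search(line)
        if event:
            last_code = int(event.group(1), 16)
            counts[last_code] += 1
            continue
        repeat = REPEAT_RE.search(line)
        if repeat:
            if last_code is not None:
                counts[last_code] += int(repeat.group(1))
            continue
        last_code = None  # an unrelated message; a later repeat refers to it instead
    return counts


def describe(counts):
    """Print one line per event code, in numeric order."""
    for code, n in sorted(counts.items()):
        print(f"0x{code:02x} {MPI_EVENT_NAMES.get(code, 'unknown')}: {n}")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        describe(tally_mpt_events(log))
```

Run as, for example, `python3 mpt_events.py /var/log/messages` (the script name is hypothetical). On the excerpt above it counts 3 events with code 0x12 and 12 with code 0x16 once the repeats are expanded; three link-status events would fit one per pulled drive, though the log alone doesn't prove that.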

The mpt0 messages above are far too vague to indicate that the three drives have been lost, and I believe that is why ZFS is confused about what happened to them. (If that's the case, I'm a bit disappointed that ZFS doesn't have its own mechanism to detect whether the drives are still present.)
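If ZFS only learns that a disk is gone when the driver reports it or an I/O actually fails (which is what the later log in this thread suggests), an idle pool can keep saying ONLINE. One crude external check is to force a small read from each leaf device of the pool. Below is a minimal Python sketch of that idea, not a ZFS feature. It assumes the leaf vdevs appear as bare `daN` names under `/dev`, as in the `zpool status` output above, and that reading the raw device node reaches the drive rather than a cache. `leaf_vdevs` and `probe` are hypothetical helpers. It needs root, and on a controller in the state shown here the read may block until the driver's own timeout fires instead of failing quickly.

```python
import os
import subprocess
import sys


def leaf_vdevs(status_text):
    """Return leaf vdev names from the config table of `zpool status` output.

    A row counts as a leaf when the next row is not indented further, so
    grouping rows such as the pool name or 'raidz1' are skipped.
    """
    rows = []
    in_table = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("NAME") and "STATE" in stripped:
            in_table = True
            continue
        if not in_table:
            continue
        if not stripped:
            if rows:
                break  # a blank line ends the config table
            continue
        indent = len(line) - len(line.lstrip())
        rows.append((indent, stripped.split()[0]))
    leaves = []
    for i, (indent, name) in enumerate(rows):
        next_indent = rows[i + 1][0] if i + 1 < len(rows) else -1
        if next_indent <= indent:
            leaves.append(name)
    return leaves


def probe(path, nbytes=4096):
    """Read nbytes from offset 0 of path; return None on success, else the error text."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return exc.strerror
    try:
        os.pread(fd, nbytes, 0)
    except OSError as exc:
        return exc.strerror
    finally:
        os.close(fd)
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    pool = sys.argv[1]
    status = subprocess.run(["zpool", "status", pool],
                            capture_output=True, text=True, check=True).stdout
    for dev in leaf_vdevs(status):
        err = probe("/dev/" + dev)
        print(f"{dev}: {'read ok' if err is None else err}")
```

For example, `python3 probe_pool.py TEMPORARY` (a hypothetical script name) prints one line per disk. A successful read only shows that the drive answered at that moment; it is not a health check.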

All pools are up to date:


```
plutonium# zpool upgrade 
This system is currently running ZFS pool version 13.

All pools are formatted using this version.
plutonium# zfs upgrade
This system is currently running ZFS filesystem version 3.

All filesystems are formatted with the current version.
plutonium#
```

Anyone have any ideas?


----------



## deepdish (Dec 15, 2009)

About 2 hours later...


```
Dec 15 03:01:50 plutonium kernel: (da7:mpt0:0:7:0): lost device
Dec 15 03:01:50 plutonium kernel: (da7:mpt0:0:7:0): Invalidating pack
Dec 15 03:01:50 plutonium kernel: (da7:mpt0:0:7:0): Synchronize cache failed, status == 0x4a, scsi status == 0x0
Dec 15 03:01:50 plutonium kernel: (da7:mpt0:0:7:0): removing device entry
Dec 15 03:01:50 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path=/dev/da7 offset=262144 size=8192 error=6
Dec 15 03:01:50 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path=/dev/da7 offset=320072318976 size=8192 error=6
Dec 15 03:01:50 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path=/dev/da7 offset=320072581120 size=8192 error=6
Dec 15 03:02:47 plutonium kernel: mpt0: request 0xffffff80005aeea0:44745 timed out for ccb 0xffffff00056d6000 (req->ccb 0xffffff00056d6000)
Dec 15 03:02:47 plutonium kernel: mpt0: attempting to abort req 0xffffff80005aeea0:44745 function 0
Dec 15 03:02:47 plutonium kernel: mpt0: mpt_wait_req(1) timed out
Dec 15 03:02:47 plutonium kernel: mpt0: mpt_recover_commands: abort timed-out. Resetting controller
Dec 15 03:02:47 plutonium kernel: mpt0: mpt_cam_event: 0x0
Dec 15 03:02:47 plutonium kernel: mpt0: mpt_cam_event: 0x0
Dec 15 03:02:47 plutonium kernel: mpt0: completing timedout/aborted req 0xffffff80005aeea0:44745
Dec 15 03:02:59 plutonium kernel: mpt0: mpt_cam_event: 0x16
Dec 15 03:02:59 plutonium last message repeated 2 times
Dec 15 03:02:59 plutonium kernel: mpt0: mpt_cam_event: 0x12
Dec 15 03:02:59 plutonium last message repeated 4 times
Dec 15 03:02:59 plutonium kernel: mpt0: mpt_cam_event: 0x16
Dec 15 03:02:59 plutonium last message repeated 2 times
Dec 15 03:03:03 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path=/dev/da6 offset=262144 size=8192 error=6
Dec 15 03:03:03 plutonium kernel: (da6:mpt0:0:6:0): lost device
Dec 15 03:03:03 plutonium kernel: (da6:mpt0:0:6:0): Invalidating pack
Dec 15 03:03:03 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path=/dev/da6 offset=320072318976 size=8192 error=6
Dec 15 03:03:03 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path=/dev/da6 offset=320072581120 size=8192 error=6
Dec 15 03:03:03 plutonium kernel: (da6:mpt0:0:6:0): Synchronize cache failed, status == 0x4a, scsi status == 0x0
Dec 15 03:03:03 plutonium kernel: (da6:mpt0:0:6:0): removing device entry
Dec 15 03:03:07 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path=/dev/da5 offset=262144 size=8192 error=6
Dec 15 03:03:07 plutonium kernel: (da5:mpt0:0:5:0): lost device
Dec 15 03:03:07 plutonium kernel: (da5:mpt0:0:5:0): Invalidating pack
Dec 15 03:03:07 plutonium kernel: (da5:mpt0:0:5:0): Synchronize cache failed, status == 0x4a, scsi status == 0x0
Dec 15 03:03:07 plutonium kernel: (da5:mpt0:0:5:0): removing device entry
Dec 15 03:03:07 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path=/dev/da5 offset=320072318976 size=8192 error=6
Dec 15 03:03:07 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path=/dev/da5 offset=320072581120 size=8192 error=6
Dec 15 03:03:07 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path= offset=632062541824 size=2048 error=6
Dec 15 03:03:07 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path= offset=452179589120 size=2048 error=6
Dec 15 03:03:07 plutonium root: ZFS: vdev failure, zpool=TEMPORARY type=vdev.no_replicas
Dec 15 03:03:07 plutonium kernel: (da4:mpt0:0:4:0): READ(6). CDB: 8 0 0 0 80 0 
Dec 15 03:03:07 plutonium kernel: (da4:mpt0:0:4:0): CAM Status: SCSI Status Error
Dec 15 03:03:07 plutonium kernel: (da4:mpt0:0:4:0): SCSI Status: Check Condition
Dec 15 03:03:07 plutonium kernel: (da4:mpt0:0:4:0): UNIT ATTENTION asc:29,0
Dec 15 03:03:07 plutonium kernel: (da4:mpt0:0:4:0): Power on, reset, or bus device reset occurred
Dec 15 03:03:07 plutonium kernel: (da4:mpt0:0:4:0): Retrying Command (per Sense Data)
Dec 15 03:03:07 plutonium kernel: (da3:mpt0:0:3:0): READ(6). CDB: 8 0 0 0 80 0 
Dec 15 03:03:07 plutonium kernel: (da3:mpt0:0:3:0): CAM Status: SCSI Status Error
Dec 15 03:03:07 plutonium kernel: (da3:mpt0:0:3:0): SCSI Status: Check Condition
Dec 15 03:03:07 plutonium kernel: (da3:mpt0:0:3:0): UNIT ATTENTION asc:29,0
Dec 15 03:03:07 plutonium kernel: (da3:mpt0:0:3:0): Power on, reset, or bus device reset occurred
Dec 15 03:03:07 plutonium kernel: (da3:mpt0:0:3:0): Retrying Command (per Sense Data)
Dec 15 03:03:07 plutonium kernel: (da2:mpt0:0:2:0): READ(6). CDB: 8 0 0 0 80 0 
Dec 15 03:03:07 plutonium kernel: (da2:mpt0:0:2:0): CAM Status: SCSI Status Error
Dec 15 03:03:07 plutonium kernel: (da2:mpt0:0:2:0): SCSI Status: Check Condition
Dec 15 03:03:07 plutonium kernel: (da2:mpt0:0:2:0): UNIT ATTENTION asc:29,0
Dec 15 03:03:07 plutonium kernel: (da2:mpt0:0:2:0): Power on, reset, or bus device reset occurred
Dec 15 03:03:07 plutonium kernel: (da2:mpt0:0:2:0): Retrying Command (per Sense Data)
Dec 15 03:03:07 plutonium kernel: (da1:mpt0:0:1:0): READ(6). CDB: 8 0 0 0 80 0 
Dec 15 03:03:07 plutonium kernel: (da1:mpt0:0:1:0): CAM Status: SCSI Status Error
Dec 15 03:03:07 plutonium kernel: (da1:mpt0:0:1:0): SCSI Status: Check Condition
Dec 15 03:03:07 plutonium kernel: (da1:mpt0:0:1:0): UNIT ATTENTION asc:29,0
Dec 15 03:03:07 plutonium kernel: (da1:mpt0:0:1:0): Power on, reset, or bus device reset occurred
Dec 15 03:03:07 plutonium kernel: (da1:mpt0:0:1:0): Retrying Command (per Sense Data)
Dec 15 03:03:07 plutonium kernel: (da0:mpt0:0:0:0): READ(6). CDB: 8 0 0 0 80 0 
Dec 15 03:03:07 plutonium kernel: (da0:mpt0:0:0:0): CAM Status: SCSI Status Error
Dec 15 03:03:07 plutonium kernel: (da0:mpt0:0:0:0): SCSI Status: Check Condition
Dec 15 03:03:07 plutonium kernel: (da0:mpt0:0:0:0): UNIT ATTENTION asc:29,0
Dec 15 03:03:07 plutonium kernel: (da0:mpt0:0:0:0): Power on, reset, or bus device reset occurred
Dec 15 03:03:07 plutonium kernel: (da0:mpt0:0:0:0): Retrying Command (per Sense Data)
Dec 15 03:03:10 plutonium root: ZFS: zpool I/O failure, zpool=TEMPORARY error=6
Dec 15 03:03:10 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path= offset= size= error=
Dec 15 03:03:11 plutonium root: ZFS: vdev failure, zpool=TEMPORARY type=vdev.open_failed
Dec 15 03:03:11 plutonium root: ZFS: zpool I/O failure, zpool=TEMPORARY error=6
Dec 15 03:03:11 plutonium root: ZFS: vdev I/O failure, zpool=TEMPORARY path= offset= size= error=
```

Why is there such a long delay?!
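To put a number on the delay, the following Python sketch measures, for each disk, the time from the first `mpt_cam_event` line in the first post (00:58:42, which I'm assuming is roughly when the first connector was pulled) to that disk's `lost device` message. It also decodes the `error=6` in the ZFS lines with Python's `errno` table; 6 is `ENXIO`, which FreeBSD describes as "Device not configured". Syslog timestamps omit the year, so 2009 is filled in from the post date. `stamp`, `lost_device_delays` and `zfs_errnos` are hypothetical helpers.

```python
import errno
import re
import sys
from datetime import datetime

# Syslog timestamps omit the year; 2009 is taken from the post date.
STAMP_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) ")
LOST_RE = re.compile(r"\((da\d+):[^)]*\): lost device")
ZFS_ERR_RE = re.compile(r"path=(/dev/\S+) .*error=(\d+)")


def stamp(line, year=2009):
    """Parse the leading syslog timestamp of a line, or return None."""
    m = STAMP_RE.match(line)
    if not m:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def lost_device_delays(lines, pulled_at):
    """Seconds from pulled_at to the first 'lost device' message for each daN."""
    delays = {}
    for line in lines:
        m = LOST_RE.search(line)
        if m and m.group(1) not in delays:
            delays[m.group(1)] = (stamp(line) - pulled_at).total_seconds()
    return delays


def zfs_errnos(lines):
    """Map each /dev path in ZFS 'vdev I/O failure' lines to a symbolic errno name."""
    names = {}
    for line in lines:
        m = ZFS_ERR_RE.search(line)
        if m:
            code = int(m.group(2))
            names[m.group(1)] = errno.errorcode.get(code, str(code))
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        lines = log.read().splitlines()
    # Assumption: the first mpt_cam_event line approximates when the first cable was pulled.
    first = next((l for l in lines if "mpt_cam_event" in l), None)
    if first is not None:
        for dev, secs in lost_device_delays(lines, stamp(first)).items():
            print(f"{dev}: 'lost device' {secs / 60:.1f} min after the first mpt_cam_event")
    print(zfs_errnos(lines))
```

On the excerpts in this thread, the delay comes out at a little over two hours for every disk (about 123 minutes for da7 and about 124 for da6 and da5), and `error=6` decodes to `ENXIO`.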


----------



## trasz@ (Feb 6, 2010)

Looks like a bug in the mpt(4) driver. There have been quite a few changes to it recently; could you check whether it still happens on FreeBSD-CURRENT?


----------

