# USB disks hang system



## twilk (Aug 5, 2020)

I have a Toshiba 4TB external USB hard disk that I'm trying to use with FreeBSD. I can read from and write to the disk just after the system boots (or just after I plug it in). However, if I leave the disk unused for ~5 minutes, any subsequent read or write hangs the system: the process doing the I/O cannot be killed, although I can still run other processes that do not touch the affected disk. Rebooting from within FreeBSD is impossible, so I have to hard-reset the machine.

It seems to me that the problem is the USB disk spinning down or going into power-saving mode, and FreeBSD cannot wake it again. (Linux handles the disk fine and does not exhibit this problem, so it seems unlikely to be a broken disk/hardware problem.)

This problem occurs when I connect the USB disk via USB-2 or USB-3.

Here's how I've tried to solve this problem:

- I set up a cron(8) job to `touch -c /dev/da0` every few minutes, but that seems to have no effect -- the disk still hangs after a while.
- I've run `camcontrol apm /dev/da0`, which should disable APM. The command produces no errors, but seems to have no effect -- the disk still hangs after a while.
- I've run `camcontrol standby /dev/da0 -t 0` and `camcontrol idle /dev/da0 -t 0`. As before, the commands produce no errors, but seem to have no effect -- the disk still hangs after a while.
- I've run `smartd` from sysutils/smartmontools with the following line in /usr/local/etc/smartd.conf, but that seems to have no effect -- the disk still hangs after a while:

  ```
  DEFAULT -e standby,off
  ```
- I set up a cron(8) job to run `date > /path/to/da0-mount/date.txt` every few minutes. This seems to keep the disk awake for extended periods of time!

What can I do to stop this disk from going to sleep? Is there a less hacky solution than writing to the disk every few minutes?


Error log

When I try to write to the disk once it has (likely) powered down, I get the following errors in /var/log/messages:


```
Aug  5 16:02:25 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 88 00 00 00 08 00 00
Aug  5 16:02:25 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:25 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug  5 16:02:31 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 88 00 00 00 08 00 00
Aug  5 16:02:31 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:31 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug  5 16:02:36 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 88 00 00 00 08 00 00
Aug  5 16:02:36 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:36 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug  5 16:02:42 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 88 00 00 00 08 00 00
Aug  5 16:02:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug  5 16:02:47 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 88 00 00 00 08 00 00
Aug  5 16:02:47 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:47 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug  5 16:02:53 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 90 00 00 00 08 00 00
Aug  5 16:02:53 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:53 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug  5 16:02:59 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 90 00 00 00 08 00 00
Aug  5 16:02:59 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:02:59 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug  5 16:03:04 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 90 00 00 00 08 00 00
Aug  5 16:03:04 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:04 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug  5 16:03:10 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 90 00 00 00 08 00 00
Aug  5 16:03:10 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:10 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug  5 16:03:16 server kernel: (da0:umass-sim0:0:0:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c 00 20 90 00 00 00 08 00 00
Aug  5 16:03:16 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:16 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug  5 16:03:21 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 e8 00 20 78 00 00 08 00
Aug  5 16:03:21 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:21 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug  5 16:03:27 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 e8 00 20 78 00 00 08 00
Aug  5 16:03:27 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:27 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug  5 16:03:33 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 e8 00 20 78 00 00 08 00
Aug  5 16:03:33 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:33 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug  5 16:03:38 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 e8 00 20 78 00 00 08 00
Aug  5 16:03:38 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:38 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug  5 16:03:44 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 e8 00 20 78 00 00 08 00
Aug  5 16:03:44 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug  5 16:03:44 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
```

...and then the system just hangs.
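The CDB bytes in the log above encode where each failed write was aimed. Here is a minimal sketch of decoding them in sh(1), assuming the standard SCSI layouts for the 10-byte and 16-byte READ/WRITE commands. The function name `decode_cdb` is made up for illustration and is not part of any FreeBSD tool.

```shell
#!/bin/sh
# decode_cdb "HEX BYTES": print the starting LBA and block count of a
# READ/WRITE (10) or (16) CDB, given as space-separated hex bytes exactly
# as they appear after "CDB:" in the kernel log.
decode_cdb() {
    # Split the string into positional parameters; byte 0 becomes $1.
    set -- $1
    case "$1" in
        88|8a)  # READ(16)/WRITE(16): bytes 2-9 = LBA, bytes 10-13 = length
            lba=$((0x$3$4$5$6$7$8$9${10}))
            len=$((0x${11}${12}${13}${14}))
            ;;
        28|2a)  # READ(10)/WRITE(10): bytes 2-5 = LBA, bytes 7-8 = length
            lba=$((0x$3$4$5$6))
            len=$((0x$8$9))
            ;;
        *)
            echo "decode_cdb: unsupported opcode: $1" >&2
            return 1
            ;;
    esac
    echo "lba=$lba blocks=$len"
}
```

For the first failing command, `decode_cdb "8a 00 00 00 00 01 1c 00 20 88 00 00 00 08 00 00"` gives an LBA above 2^32. That would explain why the 16-byte form was used there, while the later `2a` command fits in the 10-byte form. All of the failed writes are 8 blocks long.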


Here's some information about the USB disk:


```
# camcontrol powermode /dev/da0
camcontrol: Can't get ATA command status
```


```
# less /var/log/messages
[... snip ...]
Aug  5 18:42:19 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug  5 18:42:19 server kernel: da0: <TOSHIBA External USB 3.0 5438> Fixed Direct Access SPC-4 SCSI device
Aug  5 18:42:19 server kernel: da0: Serial Number [REDACTED]
Aug  5 18:42:19 server kernel: da0: 400.000MB/s transfers
Aug  5 18:42:19 server kernel: da0: 3815447MB (7814037164 512 byte sectors)
Aug  5 18:42:19 server kernel: da0: quirks=0x2<NO_6_BYTE>
[... snip ...]
```


```
# usbconfig -d 1.2 dump_curr_config_desc
ugen1.2: <TOSHIBA External USB 3.0> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (224mA)


Configuration index 0

    bLength = 0x0009
    bDescriptorType = 0x0002
    wTotalLength = 0x002c
    bNumInterfaces = 0x0001
    bConfigurationValue = 0x0001
    iConfiguration = 0x0000  <no string>
    bmAttributes = 0x0080
    bMaxPower = 0x0070

    Interface 0
      bLength = 0x0009
      bDescriptorType = 0x0004
      bInterfaceNumber = 0x0000
      bAlternateSetting = 0x0000
      bNumEndpoints = 0x0002
      bInterfaceClass = 0x0008  <Mass storage>
      bInterfaceSubClass = 0x0006
      bInterfaceProtocol = 0x0050
      iInterface = 0x0000  <no string>

     Endpoint 0
        bLength = 0x0007
        bDescriptorType = 0x0005
        bEndpointAddress = 0x0081  <IN>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400
        bInterval = 0x0000
        bRefresh = 0x0000
        bSynchAddress = 0x0000

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0e
       RAW dump:
       0x00 | 0x06, 0x30, 0x0e, 0x00, 0x00, 0x00


     Endpoint 1
        bLength = 0x0007
        bDescriptorType = 0x0005
        bEndpointAddress = 0x0002  <OUT>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400
        bInterval = 0x0000
        bRefresh = 0x0000
        bSynchAddress = 0x0000

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0e
       RAW dump:
       0x00 | 0x06, 0x30, 0x0e, 0x00, 0x00, 0x00
```


----------



## twilk (Aug 12, 2020)

A similar problem keeps happening with another USB disk, this time a Seagate one:


```
Aug 12 13:48:00 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1 (disconnected)
Aug 12 13:48:00 server kernel: umass0: at uhub1, port 1, addr 1 (disconnected)
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 0b ca 00 00 01 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 0b ca 00 00 01 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 0b ca 00 00 01 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 0b ca 00 00 01 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 0b ca 00 00 01 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 12 13:48:00 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 12 13:48:00 server kernel: da0: <Seagate Expansion Desk 0712>  s/n [REDACTED] detached
Aug 12 13:48:00 server kernel: (da0:umass-sim0:0:0:0): Periph destroyed
Aug 12 13:48:00 server kernel: umass0: detached
Aug 12 13:48:00 server ZFS[65732]: vdev state changed, pool_guid=$12612782409786294928 vdev_guid=$3198167944910114318
Aug 12 13:48:00 server ZFS[65748]: vdev is removed, pool_guid=$12612782409786294928 vdev_guid=$3198167944910114318
Aug 12 13:48:04 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1
Aug 12 13:48:04 server kernel: umass0 on uhub1
Aug 12 13:48:04 server kernel: umass0: <Seagate Expansion Desk, class 0/0, rev 3.00/1.00, addr 1> on usbus1
Aug 12 13:48:04 server kernel: umass0:  SCSI over Bulk-Only; quirks = 0x0100
Aug 12 13:48:04 server kernel: umass0:6:0: Attached to scbus6
Aug 12 13:48:11 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 12 13:48:11 server kernel: da0: <Seagate Expansion Desk 0712> Fixed Direct Access SPC-4 SCSI device
Aug 12 13:48:11 server kernel: da0: Serial Number [REDACTED]
Aug 12 13:48:11 server kernel: da0: 400.000MB/s transfers
Aug 12 13:48:11 server kernel: da0: 3815447MB (976754645 4096 byte sectors)
Aug 12 13:48:11 server kernel: da0: quirks=0x2<NO_6_BYTE>
```

This happens reliably a few hours after booting. However, setting up a cron(8) job that writes to the disk every 2 minutes (as in the post above) seems to make no difference -- the disk hangs the system after a while, whether it is being used or not. It even happens when it's in heavy use, unlike the Toshiba disk (which only hangs when not used at all for a few minutes).
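A log like the one above is long mostly because every failing command is logged again on each retry. Here is a rough sketch of condensing it to one line per command that ran out of retries, assuming the message format shown in this thread. The function name `summarize_cam_errors` is made up for illustration.

```shell
#!/bin/sh
# summarize_cam_errors: read kernel log lines on stdin and print, for each
# distinct command that ended in "Retries exhausted", how often that happened.
summarize_cam_errors() {
    awk '
        / CDB: / {
            # Drop the timestamp and "(da0:umass-sim0:...): " prefix so that
            # only "WRITE(10). CDB: ..." remains, and remember it.
            sub(/^.*\): /, "")
            last = $0
        }
        /Retries exhausted/ && last != "" {
            # The most recently logged command is the one that gave up.
            failed[last]++
        }
        END {
            for (cmd in failed) print failed[cmd], cmd
        }
    ' | sort -rn
}
```

It could be run as `summarize_cam_errors < /var/log/messages`. Note that it tracks only the most recent CDB line, so it may attribute failures wrongly if several devices log errors interleaved.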

For reference, I have this entry in my /etc/crontab, but the Seagate disk (mounted at /data) still hangs:


```
*/2     *       *       *       *       root    date > /data/.keepalive; fsync /data/.keepalive
```

Is there anything I can do to keep this from happening?


Edited to add some more information about the Seagate disk:


```
# camcontrol powermode da0
pass2: Active or Idle mode
```

(camcontrol(8) outputs "Active or Idle mode" when the disk is working and when it's wedged.)


```
# usbconfig -d 1.2 dump_curr_config_desc
ugen1.2: <Seagate Expansion Desk> at usbus1, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (36mA)


 Configuration index 0

    bLength = 0x0009 
    bDescriptorType = 0x0002 
    wTotalLength = 0x0079 
    bNumInterfaces = 0x0001 
    bConfigurationValue = 0x0001 
    iConfiguration = 0x0000  <no string>
    bmAttributes = 0x00c0 
    bMaxPower = 0x0012 

    Interface 0
      bLength = 0x0009 
      bDescriptorType = 0x0004 
      bInterfaceNumber = 0x0000 
      bAlternateSetting = 0x0000 
      bNumEndpoints = 0x0002 
      bInterfaceClass = 0x0008  <Mass storage>
      bInterfaceSubClass = 0x0006 
      bInterfaceProtocol = 0x0050 
      iInterface = 0x0000  <no string>

     Endpoint 0
        bLength = 0x0007 
        bDescriptorType = 0x0005 
        bEndpointAddress = 0x0081  <IN>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400 
        bInterval = 0x0000 
        bRefresh = 0x0000 
        bSynchAddress = 0x0000 

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0f
       RAW dump: 
       0x00 | 0x06, 0x30, 0x0f, 0x00, 0x00, 0x00


     Endpoint 1
        bLength = 0x0007 
        bDescriptorType = 0x0005 
        bEndpointAddress = 0x0002  <OUT>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400 
        bInterval = 0x0000 
        bRefresh = 0x0000 
        bSynchAddress = 0x0000 

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0f
       RAW dump: 
       0x00 | 0x06, 0x30, 0x0f, 0x00, 0x00, 0x00



    Interface 0 Alt 1
      bLength = 0x0009 
      bDescriptorType = 0x0004 
      bInterfaceNumber = 0x0000 
      bAlternateSetting = 0x0001 
      bNumEndpoints = 0x0004 
      bInterfaceClass = 0x0008  <Mass storage>
      bInterfaceSubClass = 0x0006 
      bInterfaceProtocol = 0x0062 
      iInterface = 0x0000  <no string>

     Endpoint 0
        bLength = 0x0007 
        bDescriptorType = 0x0005 
        bEndpointAddress = 0x0081  <IN>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400 
        bInterval = 0x0000 
        bRefresh = 0x0000 
        bSynchAddress = 0x0000 

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0f
       RAW dump: 
       0x00 | 0x06, 0x30, 0x0f, 0x05, 0x00, 0x00


      Additional Descriptor

      bLength = 0x04
      bDescriptorType = 0x24
      bDescriptorSubType = 0x03
       RAW dump: 
       0x00 | 0x04, 0x24, 0x03, 0x00


     Endpoint 1
        bLength = 0x0007 
        bDescriptorType = 0x0005 
        bEndpointAddress = 0x0002  <OUT>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400 
        bInterval = 0x0000 
        bRefresh = 0x0000 
        bSynchAddress = 0x0000 

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0f
       RAW dump: 
       0x00 | 0x06, 0x30, 0x0f, 0x05, 0x00, 0x00


      Additional Descriptor

      bLength = 0x04
      bDescriptorType = 0x24
      bDescriptorSubType = 0x04
       RAW dump: 
       0x00 | 0x04, 0x24, 0x04, 0x00


     Endpoint 2
        bLength = 0x0007 
        bDescriptorType = 0x0005 
        bEndpointAddress = 0x0083  <IN>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400 
        bInterval = 0x0000 
        bRefresh = 0x0000 
        bSynchAddress = 0x0000 

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x0f
       RAW dump: 
       0x00 | 0x06, 0x30, 0x0f, 0x05, 0x00, 0x00


      Additional Descriptor

      bLength = 0x04
      bDescriptorType = 0x24
      bDescriptorSubType = 0x02
       RAW dump: 
       0x00 | 0x04, 0x24, 0x02, 0x00


     Endpoint 3
        bLength = 0x0007 
        bDescriptorType = 0x0005 
        bEndpointAddress = 0x0004  <OUT>
        bmAttributes = 0x0002  <BULK>
        wMaxPacketSize = 0x0400 
        bInterval = 0x0000 
        bRefresh = 0x0000 
        bSynchAddress = 0x0000 

      Additional Descriptor

      bLength = 0x06
      bDescriptorType = 0x30
      bDescriptorSubType = 0x00
       RAW dump: 
       0x00 | 0x06, 0x30, 0x00, 0x00, 0x00, 0x00


      Additional Descriptor

      bLength = 0x04
      bDescriptorType = 0x24
      bDescriptorSubType = 0x01
       RAW dump: 
       0x00 | 0x04, 0x24, 0x01, 0x00
```


----------



## SirDice (Aug 12, 2020)

twilk said:


> A similar problem keeps happening with another USB disk, this time a Seagate one:


Is it perhaps in the same brand/type of enclosure? The problem might not be the disk but the USB->SATA controller that's in the enclosure.


----------



## twilk (Aug 12, 2020)

Hi SirDice, thank you very much for your reply!

They look pretty different from the outside -- the Toshiba one is quite small and USB-powered, while the Seagate one is much larger and has a separate power cable. They both have the same sort of USB cable -- a USB3-A to USB3 Micro-B cable (as shown in this figure) -- though I suppose that's standard.

How do I tell what USB-to-SATA controller they have? I can't find that info on Seagate's or Toshiba's websites.

I've got these hard drives:

- Seagate: 4TB; model no. SRD00F2; product no. 1D7AD8-500; datasheet
- Toshiba: 4TB; product no. HDTB440MK3CA; datasheet (apparently only available in German, but not very useful anyway)


----------



## SirDice (Aug 12, 2020)

Yeah, the cable is standard. There's usually a small PCB in these things. Those enclosures need to convert the USB umass(4) protocols to SATA commands the disk understands, and this conversion is typically done by a small controller chip. These chips are often cheaply manufactured and some definitely have bugs, which is why I asked whether it was the same controller.

Judging by the information you provided, I doubt the enclosures use the same controller. That's good; at least we can rule it out as a possible cause.


----------



## twilk (Aug 12, 2020)

Fair enough, thanks!

Here's all I can think of that might be the cause of these errors:

- bugs somewhere in the stack between the hard drive and FreeBSD (though I'm tempted to blame the FreeBSD drivers, as I've never had problems with these drives under Linux)
- for the Toshiba drive, the problem seems very likely to be that it goes into sleep/standby mode and FreeBSD can't wake it up again, as the problem disappears when writing to the disk frequently, and I get reliable hangs after a few minutes of no disk activity
- it seems like the Seagate drive has a different problem, as it hangs the system under light, moderate and even heavy load, and not on a predictable time scale (it seems to take between 2 and 30 hours of varying load for it to hang)
- I've recently re-seated the CPU on the "server" (actually an old desktop PC) and bent a few pins in the process, though I bent them back and haven't encountered any other mysterious hardware problems since
- the room the server is in gets fairly hot (mid 30s °C) with the warm weather here currently, but that hasn't been a problem before

Is there anything else that might be a cause I can investigate?


----------



## ralphbsz (Aug 13, 2020)

It could also be a problem in USB itself. Given that these disks are recent, and the enclosures are sold by reputable makers (Seagate and Toshiba), I expect them to have mostly bug-free USB -> SATA implementations. But perhaps the USB ports on your motherboard are somewhat unusual, and are giving the FreeBSD driver stack problems?

Little anecdote: I used to use a 1TB disk in an external enclosure (no name brand enclosure) via USB connected to my FreeBSD home server. Writing a few dozen GB to it every hour. The USB connection would come down every day or two, occasionally with the whole OS crashing. This was about 10 years ago, and using USB 2.0. I fixed it eventually by adding an eSATA connector to my server, and buying an eSATA enclosure. Eventually, I tried a newer USB 3.0 disk (this time name-brand Seagate enclosure) with a fresh FreeBSD install (11.x), and it worked perfectly. My suspicion (without proof!) is that newer FreeBSD versions have fixed bugs in the USB stack, and name-brand USB adapters generally have fewer problems.

About the temperature: Disks work best around 30...40 degrees, and electronics doesn't care until much higher temperatures, so that's probably not the problem.


----------



## Alain De Vos (Aug 13, 2020)

Could it be related to power saving?
Maybe run `ls /mnt/myusbdisk/*/*` to wake it up.


----------



## twilk (Aug 13, 2020)

Thanks for your replies, ralphbsz and Alain De Vos!



ralphbsz said:


> It could also be a problem in USB itself. Given that these disks are recent, and the enclosures are sold by reputable makers (Seagate and Toshiba), I expect them to have mostly bug-free USB -> SATA implementations. But perhaps the USB ports on your motherboard are somewhat unusual, and are giving the FreeBSD driver stack problems?
> 
> Little anecdote: I used to use a 1TB disk in an external enclosure (no name brand enclosure) via USB connected to my FreeBSD home server. Writing a few dozen GB to it every hour. The USB connection would come down every day or two, occasionally with the whole OS crashing. This was about 10 years ago, and using USB 2.0. I fixed it eventually by adding an eSATA connector to my server, and buying an eSATA enclosure. Eventually, I tried a newer USB 3.0 disk (this time name-brand Seagate enclosure) with a fresh FreeBSD install (11.x), and it worked perfectly. My suspicion (without proof!) is that newer FreeBSD versions have fixed bugs in the USB stack, and name-brand USB adapters generally have fewer problems.


I've tried plugging the Toshiba disk into some USB-2 ports on my motherboard instead of the USB-3 ports, and I got the same problem -- so it seems unlikely that it's USB-3-related weirdness, but my motherboard might just be weird overall. (I've got an ~7-year-old ASUS P8H61-M Pro mobo, which came with the ASUS CM6630 desktop it's installed in, which I've repurposed as a home server.)

I'm running FreeBSD 12.1-RELEASE-p8 by the way, which I installed about a week ago, replacing Debian (so I'm a complete BSD noob!) -- that means that I'm presumably already getting those USB fixes, and hitting different bugs.

If this is indeed motherboard weirdness on my side, what information should I submit in a bug report to help fix the bugs in FreeBSD's USB stack?



ralphbsz said:


> About the temperature: Disks work best around 30...40 degrees, and electronics doesn't care until much higher temperatures, so that's probably not the problem.


That's reassuring, thanks!



Alain De Vos said:


> Could it be related to power saving?
> Maybe run `ls /mnt/myusbdisk/*/*` to wake it up.



I'm doing something similar already with a cron(8) job every 2 minutes that writes the current date out to both disks (mounted at /backup and /data):

```
*/2     *       *       *       *       root    date > /backup/.keepalive; fsync /backup/.keepalive
*/2     *       *       *       *       root    date > /data/.keepalive; fsync /data/.keepalive
```
This seems to work for the Toshiba disk, but _not_ the Seagate one, which suggests to me that the problem with the Toshiba disk is related to power management, but the Seagate disk has another problem.

It's important to note that FreeBSD apparently can't wake these disks up once they've gone to sleep.

When the disks are wedged, reading from or writing to them just hangs the process doing it indefinitely. For example, when the Seagate disk hangs, and I run `ls /data`, ls(1) just hangs: there's no output, ls(1) runs forever, and can't be killed by `^C` or `kill -9`. This problem is not unique to ls(1), it happens to any process that tries to use the wedged disk. For instance, the cron(8) jobs above just accumulate, and if I run htop(1) I can see lots of `sh -c 'date > /data/.keepalive; fsync /data/.keepalive'` processes just hanging there. Also, e.g. typing `ls /data/` and pressing tab for auto-completion will completely hang my shell.
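The pile-up of hung cron(8) jobs described above can at least be limited. Here is a minimal sketch, assuming a lock directory on a filesystem other than the USB disk. The function name `keepalive`, the variable `KEEPALIVE_LOCKDIR` and the lock file naming are all made up for illustration. It does not fix the hang itself: it only makes later runs skip the disk while an earlier run is still stuck.

```shell
#!/bin/sh
# keepalive MOUNTPOINT: write a timestamp to MOUNTPOINT/.keepalive, unless a
# previous run for the same mount point has not finished yet.
keepalive() {
    mnt=$1
    # Hypothetical lock location; it must NOT be on the USB disk itself.
    lockdir=${KEEPALIVE_LOCKDIR:-/var/run}
    lock="$lockdir/keepalive.$(echo "$mnt" | tr '/' '_').lock"

    # mkdir is atomic and fails if an earlier run still holds the lock.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "keepalive: previous run on $mnt has not finished, skipping" >&2
        return 1
    fi

    status=0
    date > "$mnt/.keepalive" || status=1
    # fsync(1) exists on FreeBSD; skip it where the utility is missing.
    if command -v fsync >/dev/null 2>&1; then
        fsync "$mnt/.keepalive" || status=1
    fi

    # Only reached if the writes above returned, i.e. the disk responded.
    rmdir "$lock"
    return "$status"
}
```

If the write or the fsync never returns, the lock directory is left behind and every later invocation returns immediately, so at most one process per disk ends up stuck. A lock left over from a hang has to be removed by hand (or by a reboot, if the lock directory is cleared at boot).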


----------



## Alain De Vos (Aug 13, 2020)

Maybe search Google for "usb quirks toshiba freebsd".


----------



## teo (Aug 14, 2020)

Why don't you try installing NomadBSD on a USB stick and see how a system installed on USB behaves? I don't know why I hit so many FreeBSD bugs when trying to install on a 60 GB Toshiba USB stick.

There is not even a clear guide in the Handbook on how to install FreeBSD on a USB stick; in the middle of the installation the system ends up hanging.


----------



## Alain De Vos (Aug 14, 2020)

FreeBSD does not care if a disk is SATA or USB. Everything remains the same, so no guide is needed.
Just read `man gpart`.


----------



## teo (Aug 14, 2020)

Alain De Vos said:


> FreeBSD does not care if a disk is SATA or USB. Everything remains the same, so no guide is needed.
> Just read `man gpart`.


On the IDE HDD of a real computer or in a VirtualBox virtual machine, FreeBSD does not cause serious problems during installation, so do not confuse the two cases: installing FreeBSD on a real computer or in a VirtualBox virtual machine is described in detail in the Handbook. There the installer partitions the disk automatically, and there is no need to do it manually with another tool like gpart.


----------



## twilk (Aug 18, 2020)

Alain De Vos said:


> Maybe search Google for "usb quirks toshiba freebsd".


Searching for "freebsd seagate usb quirks" and "freebsd toshiba usb quirks" and variations on that doesn't turn up anything useful, unfortunately. Reading the usb_quirk(4) man page, nothing jumps out as immediately applicable to me. I tried:

```
usbconfig -d 1.2 add_quirk UQ_MSC_NO_SYNC_CACHE
```
where ugen1.2 is the Toshiba drive, but that didn't change anything.


----------



## mark_j (Aug 18, 2020)

Your disk's firmware likely has a default sleep timer, after which the disk "disappears" from the system. This can (hopefully) be overridden with an APM setting:

`camcontrol apm /dev/da0 -l 128`

(128 is the minimum value to prevent idle power down but you can go all the way to 254, which means higher I/O performance and NO sleeping).

Note: This will have to be done every time the disk is attached (whether at boot or afterwards).
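Since the setting has to be reapplied on every attach, it may be convenient to wrap it in a small helper that also rejects levels outside the 128-254 range mentioned above. This is a minimal sketch: the function name `set_apm` and the `CAMCONTROL` override variable are made up for illustration, and whether a given USB enclosure actually passes the command through to the drive is not guaranteed.

```shell
#!/bin/sh
# set_apm DEVICE [LEVEL]: set the drive's APM level, refusing values that
# would still allow the drive to spin down. LEVEL defaults to 254.
set_apm() {
    dev=$1
    level=${2:-254}

    # Reject empty or non-numeric levels before comparing them.
    case "$level" in
        ''|*[!0-9]*)
            echo "set_apm: level must be a number" >&2
            return 2
            ;;
    esac
    if [ "$level" -lt 128 ] || [ "$level" -gt 254 ]; then
        echo "set_apm: level $level is outside 128-254" >&2
        return 2
    fi

    # CAMCONTROL can be overridden (e.g. CAMCONTROL=echo) for a dry run.
    ${CAMCONTROL:-camcontrol} apm "$dev" -l "$level"
}
```

With `CAMCONTROL=echo` set, `set_apm da0 128` only prints the command it would run. The function could then be called from whatever runs when the disk attaches, for example an rc script or a devd(8) rule.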

In regards to the "CAM status: CCB request completed with an error" message, this is normally the result of a bad controller or cable. Be aware that any cable (standard or extension) to the USB port may cause timing issues (especially electrically poor ones). Not all cables are created equal.

When it does disappear, have you tried using `camcontrol reprobe /dev/da0`?


----------



## twilk (Aug 18, 2020)

Hi mark_j, thanks for the suggestion! Unfortunately, I've tried both

```
camcontrol apm da0
```
 and

```
camcontrol apm da0 -l 254
```
neither of which fixes the problem. On a fresh reboot, `camcontrol identify da0` outputs:

```
camcontrol: Can't get ATA command status
pass2: <TOSHIBA MQ04UBB400 JS000U> ACS-3 ATA SATA 2.x device
pass2: 400.000MB/s transfers

protocol              ACS-3 ATA SATA 2.x
device model          TOSHIBA MQ04UBB400
firmware revision     JS000U
serial number         [REDACTED]
WWN                   0000000000000000
additional product id 
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 4096, offset 0
LBA supported         268435455 sectors
LBA48 supported       7814037168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA5 
media RPM             5400
Zoned-Device Commands device managed

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
Native Command Queuing (NCQ)   yes              32 tags
NCQ Priority Information       no
NCQ Non-Data Command           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
NCQ Autosense                  no
SMART                          yes      yes
security                       yes      no
power management               yes      yes
microcode download             yes      yes
advanced power management      yes      yes     128/0x80
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            no       no
write-read-verify              yes      no      0/0x0
unload                         yes      yes
general purpose logging        yes      yes
free-fall                      no       no
sense data reporting           yes      no
extended power conditions      no       no
device statistics notification yes      no
Data Set Management (DSM/TRIM) no
Trusted Computing              no
encrypts all user data         no
Sanitize                       no
Host Protected Area (HPA)      yes      no      7814037168/0
HPA - Security                 yes      no 
Accessible Max Address Config  no
```
with the important line being

```
advanced power management      yes      yes     128/0x80
```
i.e. the disk should already be set not to power down, but it apparently still does.


----------



## mark_j (Aug 18, 2020)

Just *apm* on its own disables APM. This is not what you want, as you want to utilise it.

When you set it to 254 what does it show after an *identify*?

I'm not sure if it will work, but what does this command report:

`camcontrol epc -c status -P`

Scratch that, I see it doesn't support it.

Edit: It may well require a combination of both, i.e., setting APM and standby:

`camcontrol apm /dev/da0 -l 254`
`camcontrol standby /dev/da0 -t 3600`


----------



## twilk (Aug 18, 2020)

Aha, that might have been the problem! Running `camcontrol apm da0 -l 254` then `camcontrol identify da0` shows:

```
advanced power management      yes      yes     254/0xFE
```
And now the Toshiba disk doesn't seem to hang any more. Thank you very much!
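
For anyone who wants to script the check above, here is a minimal sketch in sh. It assumes the `camcontrol identify` output format shown in this thread; the function name `apm_level` is made up for illustration and is not part of camcontrol(8).

```sh
#!/bin/sh
# Hypothetical helper: print the APM level found in `camcontrol identify`
# output read from stdin. Assumes the line looks like the one in this thread:
#   advanced power management      yes      yes     254/0xFE
apm_level() {
    awk '/^advanced power management/ && $NF ~ /\// {
        split($NF, v, "/")   # last field is "decimal/hex"
        print v[1]
    }'
}

# Usage (as root, on the real device):
#   camcontrol apm da0 -l 254
#   camcontrol identify da0 | apm_level
```

If the value column is missing (for example because APM is disabled), the function prints nothing, so an empty result is worth treating as "not set".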


----------



## Mjölnir (Aug 18, 2020)

ralphbsz said:


> About the temperature: Disks work best around 30...40 degrees, and electronics doesn't care until much higher temperatures, so that's probably not the problem.


Ouch!  I can not let this go without contradiction:

- electronics DO care about temperature, because physical/electrical characteristics vary with temperature
- high temperature is one of the main factors of ageing -- this effect is used in the lab to estimate equipment's lifetime, so-called _burn-in tests_
- electronic parts can get _much_ hotter than the surrounding temperature because they dissipate heat


----------



## twilk (Aug 18, 2020)

mjollnir said:


> Ouch!  I can not let this go without contradiction:
> 
> - electronics DO care about temperature, because physical/electrical characteristics vary with temperature
> - high temperature is one of the main factors of ageing -- this effect is used in the lab to estimate equipment's lifetime, so-called _burn-in tests_
> - electronic parts can get _much_ hotter than the surrounding temperature because they dissipate heat


Fair enough. I've been monitoring the Seagate disk's temperature using smartctl(8). Results seem a little contradictory: I've had the disk hang the system at around 51°C, but if I just `dd if=/dev/da2 of=/dev/null bs=1M`, I get warnings from smartctl(8) around 55°C but dd(1) carries on fine (I `^C`'d it then to avoid damaging the disk).

So, overall, I'm still not sure what causes the hangs with the Seagate disk -- it might be high temperatures, or it might be something completely different. It seems like hangs are much more likely under heavy disk load (when the disk's temperature goes up), but I've also had one or two overnight under no or very light load (though I wasn't monitoring the temperature then).


----------



## mark_j (Aug 18, 2020)

twilk said:


> Aha, that might have been the problem! Running `camcontrol apm da0 -l 254` then `camcontrol identify da0` shows:
> 
> ```
> advanced power management      yes      yes     254/0xFE
> ...


Remember, you will have to do this every time the disk is attached, so at boot mount or when using something like automount/autofs.


----------



## twilk (Aug 18, 2020)

mark_j said:


> Remember, you will have to do this every time the disk is attached, so at boot mount or when using something like automount/autofs.


Got it, thanks! I've added the following to my /etc/crontab:

```
@reboot    root    camcontrol devlist | grep -e TOSHIBA -e Seagate | grep -o 'da[0-9]\+' | xargs -I X camcontrol apm X -l 254
```
Side note: is there a better way of finding out which physical device is represented by each `/dev/da*` device? My computer also has a CD/DVD drive that this command shouldn't be applied to. Are `/dev/da*` numbers given out predictably? Could I just hard-code `da0` and `da2` or is that a bad idea? Even better, is there an equivalent to Linux's /dev/disk/by-uuid/* (and similar) symlinks that point to numbered device files?
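
A slightly more defensive variant of the crontab one-liner above could look like the following sketch. It assumes that `camcontrol devlist` prints one line per device ending in a parenthesised pair such as `(da0,pass2)` or `(pass2,da0)`; the function name `usb_disks` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: read `camcontrol devlist` output on stdin and print
# the daN device name of every line whose vendor/model string matches the
# extended regex given as $1. CD/DVD drives (cdN) and ATA disks (adaN) are
# skipped because only names of the form "daN" inside the trailing
# parentheses are extracted.
usb_disks() {
    grep -E -- "$1" |
        sed -n 's/.*[(,]\(da[0-9][0-9]*\)[,)].*/\1/p'
}

# Usage (as root):
#   camcontrol devlist | usb_disks 'TOSHIBA|Seagate' |
#       while read -r dev; do camcontrol apm "$dev" -l 254; done
```

Note that an `@reboot` entry only runs at boot, so a disk that is unplugged and reattached later would still need the command run again.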


----------



## mark_j (Aug 18, 2020)

Yes, there is. Refer to `tunefs` and section 18.7 of the Handbook: Disk Labels. This is generally how you would handle USB detachable disks, anyway.

Also, about your Seagate issue, I would presume/assume this drive is SMR, so can you run `zonectl` on it and report back the results? Have you previously provided the results of `camcontrol identify` on this drive to the forum?


----------



## ralphbsz (Aug 19, 2020)

mjollnir said:


> Ouch!  I can not let this go without contradiction:
> 
> - electronics DO care about temperature, because physical/electrical characteristics vary with temperature
> - high temperature is one of the main factors of ageing -- this effect is used in the lab to estimate equipment's lifetime, so-called _burn-in tests_
> - electronic parts can get _much_ hotter than the surrounding temperature because they dissipate heat


Yes, but disks are not only electronics ... they are very complex electro-mechanical-magnetic systems. Every component of them has temperature sensitivity. For the electronics themselves (the chips), temperature within reason is probably not a problem; die temperatures of 70 or 80 degrees are not particularly harmful. Well, at least for the CPUs in the data and control path. When it comes to the preamps and write amps that attach to the heads, things get complicated, and because at that point, we're into the weird world of high-speed analog, I don't understand what really happens. I know that temperature compensating of RF amplifiers is very difficult.

But most of the tough stuff in the disk is not electronics. The spindle bearing runs on lubricants (something akin to oil or grease), which change behavior drastically with temperature. Make it cold, the motor has to work like mad to crank the spindle, causing strange heat flows (hot motor, cold case, cold platters), which causes mechanical tensions. Make it super hot, the lubricant starts flying around and splattering (usually, the air filter in the disk catches it, but sometimes it ends up on the platters). Speaking of platters, they are also covered in a "lubricant", but I don't think that is anything like an oil, it's more like a varnish or lacquer film that's highly polished. However, that lubricant is soft, so the effect of (unavoidable but not frequent) "head platter interactions" depends on temperature, and can be detrimental or helpful. Next effect is that obviously the platters change size with temperature, which seek algorithms have to correct for. Where it gets really insidious is that both the magnetic surface layer and the head are made from very bizarre materials (today, there is no iron oxide in the platter any more, which is why they are silver and not red). There are big temperature effects there. And finally: the heads fly on an air cushion; changing the temperature by 10 degrees changes the density of air by about 3.4% (10 / 293, if you think of temperature in Kelvin and assume that air is an ideal gas), which changes the fly height by about the same amount. Modern disks actively compensate for fly height, but you don't want to stress that compensation by running too hot or too cold (or at too high an altitude, there is a reason disk drives shut themselves down at extreme height).

The important part is that the sensitivity of overall disk reliability to temperature is extremely well studied, and is one of the few things in disk reliability that is actually published (meaning available to everyone without an NDA). Look for the proceedings of a FAST conference in the mid-2000s or early 2010s; there is a paper by some Google authors. There are also later papers by a professor from Toronto. There are several graphs of disk reliability as a function of temperature, and it seems that 30-40 degrees C is best for disks. A bit hotter (50 and up) gets bad pretty fast, while considerably cooler (down to 20) doesn't hurt very much. Below about 15 degrees C weird effects happen (the firmware will start acting differently).

Closely related to this is the question of what temperature data centers are kept at (most of the disks in the world are in data centers). For efficiency reasons, many data centers today are kept at very warm ambient temperatures on the outlet side of the computer (often above 40 degrees C), and the inlet side (known as the "cold aisle") is usually not terribly cold. These days, cold aisles run at minimum delta T to the hot aisle, and people in data centers more often run around in bikinis and rubber slippers than in the hiking boots and down parkas of the old days. (No, that's a joke: any employee found in a swimsuit and sandals in a data center would get at least reprimanded, if not fired on the spot, for both being unsafe and sexual harassment. Most data centers are unattended, and humans rarely venture in there.) Seriously, the "cold aisle" is usually cooled more for the "comfort" of the humans who have to work in there. If you look at the cooling efficiency literature, "cold" aisles running up to 29 degrees C are the norm today. Now consider that disk enclosures are usually air cooled (using the "cold" aisle inlet air, but typically with multiple layers of disks), while CPUs are always seriously heatsinked, and often water-cooled, so you see that running disks at 30-40 degrees C is both efficient and reliable.

And cooling efficiency of data centers is a HUGE deal, a gigantic industry, of seriously world-changing importance. Given that every human spends a lot of energy today on computing (most of it is spent on data centers that the human causes work to be done in), and given that computing is a larger and larger fraction of the total energy consumption on earth, it is important to keep the cooling overhead as small as possible. In the bad old days, the cooling overhead could easily be over 100% (for every 1 W that the computer uses, you needed at least another 1 W to remove that heat), and that has been improved by a factor of roughly 10.


----------



## Mjölnir (Aug 19, 2020)

twilk said:


> [...] is there an equivalent to Linux's /dev/disk/by-uuid/* (and similar) symlinks that point to numbered device files?


`ls /dev/{diskid,gpt{,id},label,msdosfs,ufs,zvol/t450s}`

```
ls: /dev/diskid: No such file or directory
ls: /dev/label: No such file or directory
ls: /dev/ufs: No such file or directory
ls: /dev/ufsid: No such file or directory
/dev/gpt:
DUMP IRST efiboot0 gptboot0

/dev/gptid:
3354896e-ab2e-11ea-a908-507b9d666b68 f3587124-b087-11ea-903f-507b9d666b68
33612b8d-ab2e-11ea-a908-507b9d666b68

/dev/msdosfs:
EFISYS

/dev/zvol/t450s:
SWAP
```
These are filesystem labels, partition labels, and, under /dev/label, IIRC geom labels (RTFM glabel(8)).  I find it handy to name the zpool(8) after the machine model or name, the disk model, or some other unique name like _bob_ or _mary_, or something functional like _dmz-host_.  I.e., give it a unique name to avoid confusion when moving disks between machines.  If you have several disks of the same model, stick a written label on each of them, numbered and/or otherwise uniquely named.  I recommend using functional partition labels in fstab(5).
ralphbsz TL;DR


----------



## twilk (Aug 26, 2020)

Thanks for your replies, everyone!



mark_j said:


> Yes there is. Refer to `tunefs` and section 18.7 of the handbook: Disk Labels. This is generally how you would handle USB detachable disks, anyway.


Thank you for the pointers, I'll RTFM. 

If you're interested, see below for problems I encountered when trying mjollnir's suggestions to the same effect.



mark_j said:


> Also, about your Seagate issue, I would presume/assume this drive is SMR, so can you run `zonectl` on it and report back the results?


It seems that neither drive uses SMR:

```
# zonectl -d /dev/da0 -c params
Zone Mode: None
Command support: None
Unrestricted Read in Sequential Write Required Zone (URSWRZ): No
Optimal Number of Open Sequential Write Preferred Zones: Not Set
Optimal Number of Non-Sequentially Written Sequential Write Preferred Zones: Not Set
Maximum Number of Open Sequential Write Required Zones: Not Set
# zonectl -d /dev/da0 -c rz
zonectl: DIOCZONECMD ioctl failed: Invalid argument
# zonectl -d /dev/da2 -c params
Zone Mode: None
Command support: None
Unrestricted Read in Sequential Write Required Zone (URSWRZ): No
Optimal Number of Open Sequential Write Preferred Zones: Not Set
Optimal Number of Non-Sequentially Written Sequential Write Preferred Zones: Not Set
Maximum Number of Open Sequential Write Required Zones: Not Set
# zonectl -d /dev/da2 -c rz
zonectl: DIOCZONECMD ioctl failed: Input/output error
```




mark_j said:


> Have you previously provided the results of `camcontrol identify` on this drive to the forum?


Nope, sorry! Here's the `camcontrol identify` output for the Seagate drive:

```
# camcontrol identify da0    
pass2: <ST4000DM000-1F2168 CC54> ACS-2 ATA SATA 3.x device
pass2: 400.000MB/s transfers

protocol              ACS-2 ATA SATA 3.x
device model          ST4000DM000-1F2168
firmware revision     CC54
serial number         [REDACTED]
WWN                   [REDACTED]
additional product id 
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 4096, offset 0
LBA supported         268435455 sectors
LBA48 supported       7814037168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6 
media RPM             5900
Zoned-Device Commands no

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
Native Command Queuing (NCQ)   yes              32 tags
NCQ Priority Information       no
NCQ Non-Data Command           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
NCQ Autosense                  no
SMART                          yes      yes
security                       yes      no
power management               yes      yes
microcode download             yes      yes
advanced power management      yes      yes     254/0xFE
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            yes      no
write-read-verify              yes      no      0/0x0
unload                         no       no
general purpose logging        yes      yes
free-fall                      no       no
sense data reporting           no       no
extended power conditions      no       no
device statistics notification no       no
Data Set Management (DSM/TRIM) no
Trusted Computing              no
encrypts all user data         no
Sanitize                       no
Host Protected Area (HPA)      yes      no      7814037168/7814037167
HPA - Security                 yes      no 
Accessible Max Address Config  no
```

Interestingly, this disk shows support for "power-up in Standby", but that feature is disabled. Can I enable it in some way or is it just disabled because I've set APM level 254?




mjollnir said:


> `ls /dev/{diskid,gpt{,id},label,msdosfs,ufs,zvol/t450s}`
> 
> ```
> ls: /dev/diskid: No such file or directory
> ...


Here's the output of `ls -lF /dev/{diskid,gpt{,id},label,msdosfs,ufs,zvol/t450s}` for me:

```
ls: /dev/gpt: No such file or directory
ls: /dev/label: No such file or directory
ls: /dev/ufs: No such file or directory
ls: /dev/zvol/t450s: No such file or directory
/dev/diskid:
total 0
crw-r-----  1 root  operator   0x7e Aug 23 21:30 DISK-20190615006211F

/dev/gptid:
total 0
crw-r-----  1 root  operator   0x7f Aug 23 21:30 6c654b1b-d4b1-11ea-91ef-3085a9a86c56

/dev/msdosfs:
total 0
crw-r-----  1 root  operator   0x80 Aug 23 21:30 EFISYS
```
/dev/diskid/DISK-20190615006211F seems to be /dev/da2, the Toshiba disk, as that's what shows up in `zpool status -v`. It can't be used interchangeably with /dev/da2 though:

```
# camcontrol apm /dev/diskid/DISK-20190615006211F -l 254 
camcontrol: cam_get_device: unable to find device unit number
```
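
One possible workaround is to resolve the label back to its `daN` name first and hand that to camcontrol(8). The sketch below assumes that `glabel status` lists the diskid label with the underlying device in the third column (Name, Status, Components); the function name `label_to_dev` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: read `glabel status` output on stdin and print the
# component (e.g. da2) that backs the label given as $1.
# Assumes three whitespace-separated columns: Name, Status, Components.
label_to_dev() {
    awk -v label="$1" '$1 == label { print $3 }'
}

# Usage (as root):
#   dev=$(glabel status | label_to_dev diskid/DISK-20190615006211F)
#   [ -n "$dev" ] && camcontrol apm "$dev" -l 254
```

This keeps the stable identifier in the script while still giving camcontrol a device name it can map to a unit number.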

I've also done a possibly weird thing in that I've not partitioned the USB disks, but just added them to zpools as-is, i.e.:

```
# zpool create pool1 /dev/da0
# zpool create pool2 /dev/da2
```
...which means that glabel(8) gets confused:

```
# glabel label -v twilk-server-seagate /dev/da0
glabel: Can't store metadata on /dev/da0: Operation not permitted.
# glabel label -v twilk-server-toshiba /dev/da2
glabel: Can't store metadata on /dev/da2: Operation not permitted.
```


----------



## jb_fvwm2 (Aug 26, 2020)

I found, with the PREVIOUS generation of USB drivers, that removing the USB disks and reconnecting them as SATA/EIDE was more reliable in the long run... unless one just onlines them for r/w, with a slow parameter [such as rsync's --bwlimit=700], then umounts them again.

fwiw.


----------



## mark_j (Aug 27, 2020)

twilk said:


> Interestingly, this disk shows support for "power-up in Standby", but that feature is disabled. Can I enable it in some way or is it just disabled because I've set APM level 254?



That would be my interpretation.

I didn't realise this is a ZFS pool, so in that regard, having PUIS enabled would be a "bad thing" [tm].

Looking at your Seagate drive, the difference seems to be that it's failing on both reads and writes. Have you run SMART on this drive to see if there are any reported issues?
It could be a cable or it could be a power supply or it could be a disk platter about to head off into space.

Make sure the cable is seated correctly, first off.

Forgive me if I asked this before, is the drive in an enclosure made by Seagate?

Edit: I forgot to ask: are there any firmware updates available for the drive?





Seagate Technology - Download Finder (apps1.seagate.com)
				




You can then flash it with https://www.seagate.com/au/en/suppo...as-drive-firmware-using-seaflashlin-007806en/


----------



## Mjölnir (Aug 27, 2020)

On labels: _t450s_ is the name of my zpool(8) (because it's the disk in my _ThinkPad T450s_ laptop), you should have replaced that with your _zpool_ name.  Do not give the same name to different disks: the benefit of labels is to have a _unique_ identifier.  ZFS labels disks, if you give it whole disks; thus the device can not be labeled by another utility.
See the drive's _Extended Power Conditions_: `camcontrol epc da0 -c status`
If the drive supports it, you could try to disable standby with `camcontrol epc da0 -c state -d -p Standby_y -s` & `camcontrol epc da0 -c state -d -p Standby_z -s`
The downside is no power saving when the system goes to standby.  Or disable the EPC timer values; RTFM camcontrol(8)


----------



## twilk (Aug 27, 2020)

Thanks for your replies, jb_fvwm2 and mark_j!



jb_fvwm2 said:


> I found, with the PREVIOUS generation of USB drivers, that removing the USB disks and reconnecting them as SATA/EIDE was more reliable in the long run... unless one just onlines them for r/w, with a slow parameter [such as rsync's --bwlimit=700], then umounts them again.
> ...


I've tried something like this and it seemed to work -- in my case, I wrote a script to copy files one-by-one, waiting before each file until the disk temperature was below 50°C. That seems impractical for normal use as I'd like to serve files off that disk over HTTP.
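
A temperature-gated copy of this kind might be sketched as follows. This is not the actual script, just an illustration: it assumes the `smartctl -A` attribute table layout shown in this thread (attribute 194, raw value in the tenth column), and the function names and paths are made up.

```sh
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# value of attribute 194 (Temperature_Celsius), i.e. the tenth column.
disk_temp() {
    awk '$1 == 194 { print $10; exit }'
}

# Hypothetical helper: block until the disk is cooler than the limit.
#   $1: device, $2: limit in degrees C, $3: seconds between polls
# Returns 1 if no temperature can be read, so callers do not loop forever.
wait_until_cool() {
    while :; do
        t=$(smartctl -A "$1" | disk_temp)
        [ -n "$t" ] || return 1
        [ "$t" -lt "$2" ] && return 0
        sleep "$3"
    done
}

# Usage (paths are placeholders):
#   for f in /src/*; do
#       wait_until_cool /dev/da0 50 60 && cp "$f" /dest/
#   done
```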



mark_j said:


> That would be my interpretation.
> 
> I didn't realise this is a zfs pool, so in that regard, having PUIS enabled would be a "bad thing" [tm].
> 
> ...



The Toshiba drive was failing on both reads and writes as well, before I set APM level 254.

That Seagate page shows no firmware updates available.

The drive is in its original Seagate-branded enclosure. I've checked both the USB and power cables; they're definitely seated correctly.

I'm running smartd(8), which checks the drive every night. Here's `smartctl -a /dev/da0`:

```
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.1-RELEASE-p8 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Desktop HDD.15
Device Model:     ST4000DM000-1F2168
Serial Number:    [REDACTED]
LU WWN Device Id: [REDACTED]
Firmware Version: CC54
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Aug 27 11:05:26 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (  117) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 518) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   106   099   006    Pre-fail  Always       -       11391720
  3 Spin_Up_Time            0x0003   094   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       809
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   070   060   030    Pre-fail  Always       -       43051781288
  9 Power_On_Hours          0x0032   054   054   000    Old_age   Always       -       41087
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       521
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   055   041   045    Old_age   Always   In_the_past 45 (Min/Max 45/47 #38)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       9
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       208771
194 Temperature_Celsius     0x0022   045   059   000    Old_age   Always       -       45 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       12622h+56m+20.040s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       67676762018
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       134357918370

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     41079         -
# 2  Short offline       Completed without error       00%     41055         -
# 3  Short offline       Completed without error       00%     41031         -
# 4  Short offline       Completed without error       00%     41007         -
# 5  Extended offline    Completed without error       00%     40991         -
# 6  Short offline       Completed without error       00%     40959         -
# 7  Short offline       Completed without error       00%     40935         -
# 8  Short offline       Completed without error       00%     40911         -
# 9  Short offline       Completed without error       00%     40887         -
#10  Short offline       Completed without error       00%     40864         -
#11  Short offline       Completed without error       00%     40845         -
#12  Extended offline    Completed without error       00%     40828         -
#13  Short offline       Completed without error       00%     40796         -
#14  Short offline       Completed without error       00%     40772         -
#15  Short offline       Completed without error       00%     40764         -
#16  Short offline       Completed without error       00%     40751         -
#17  Short offline       Completed without error       00%     40729         -
#18  Short offline       Completed without error       00%      5945         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
```
Apparently no problems except for the temperature becoming too high once -- that was when I did a `dd if=/dev/da0 of=/dev/null` to see if there was a specific threshold temperature that would cause failures. Sadly, it didn't seem so clear-cut: I've had write failures of the Seagate drive at 51°C, but that dd(1) command carried on reading all the way up to 55°C, at which point I stopped it. The temperature also got up to 55°C once during an extended SMART self-test overnight.

I haven't had problems with the Seagate drive for a few days, but for a while I'd regularly get unkillably stuck find(1) commands from various periodic(8) scripts running overnight, which were scanning that disk. Here are all of the messages in /var/log/messages from around that time:

```
Aug 22 03:50:00 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1 (disconnected)
Aug 22 03:50:00 server kernel: umass0: at uhub0, port 1, addr 1 (disconnected)
Aug 22 03:50:00 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 22 03:50:00 server kernel: da0: <Seagate Expansion Desk 0712>  s/n [REDACTED] detached
Aug 22 03:50:00 server kernel: (da0:umass-sim0:0:0:0): Periph destroyed
Aug 22 03:50:00 server kernel: umass0: detached
Aug 22 03:50:08 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1
Aug 22 03:50:08 server kernel: umass0 on uhub0
Aug 22 03:50:08 server kernel: umass0: <Seagate Expansion Desk, class 0/0, rev 3.00/1.00, addr 1> on usbus1
Aug 22 03:50:08 server kernel: umass0:  SCSI over Bulk-Only; quirks = 0x0100
Aug 22 03:50:08 server kernel: umass0:6:0: Attached to scbus6
Aug 22 03:50:08 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 22 03:50:08 server kernel: da0: <Seagate Expansion Desk 0712> Fixed Direct Access SPC-4 SCSI device
Aug 22 03:50:08 server kernel: da0: Serial Number [REDACTED]
Aug 22 03:50:08 server kernel: da0: 400.000MB/s transfers
Aug 22 03:50:08 server kernel: da0: 3815447MB (976754645 4096 byte sectors)
Aug 22 03:50:08 server kernel: da0: quirks=0x2<NO_6_BYTE>
```
The timing (disconnection at exactly 03:50:00) seems too specific to be random, though there's no cron(8) job scheduled then. I've set smartd(8) to run short self tests on Mon-Sat mornings between 3 and 4 am and extended self tests on Sundays at the same time. 22 August was a Saturday, so this error could correspond to a short self-test. However, that seems unlikely to cause this error, especially since I've run these tests every day since and they haven't caused the same error! According to my logs (sampling smartctl(8) every 2 minutes), the disk temperature was a constant 49°C that whole night.


----------



## twilk (Aug 27, 2020)

Thanks, mjollnir!


mjollnir said:


> On labels: _t450s_ is the name of my zpool(8) (because it's the disk in my _ThinkPad T450s_ laptop), you should have replaced that with your _zpool_ name.  Do not give the same name to different disks: the benefit of labels is to have a _unique_ identifier.  ZFS labels disks, if you give it whole disks; thus the device can not be labeled by another utility.


Unfortunately, /dev/zvol/ doesn't exist at all on my system! I never ran `zfs create`, just `zpool create`. That gave me a filesystem to mount, and `zfs list` shows the ones I created that way. Was that the wrong thing to do?



mjollnir said:


> See the drive's _Extended Power Conditions_: `camcontrol epc da0 -c status`




```
# camcontrol epc da0 -c status     
camcontrol: The epc subcommand only works with ATA protocol devices
# camcontrol epc da2 -c status
camcontrol: The epc subcommand only works with ATA protocol devices
```




mjollnir said:


> If the drive supports it, you could try to disable standby with `camcontrol epc da0 -c state -d -p Standby_y -s` & `camcontrol epc da0 -c state -d -p Standby_z -s`
> The downside is no power saving when the system goes to standby.  Or disable the EPC timer values; RTFM camcontrol(8)


Thanks, I'll have a look!


----------



## mark_j (Aug 28, 2020)

twilk said:


> Thanks for your replies, jb_fvvm2 and mark_j!
> 
> 
> I've tried something like this and it seemed to work -- in my case, I wrote a script to copy files one-by-one, waiting before each file until the disk temperature was below 50°C. That seems impractical for normal use as I'd like to serve files off that disk over HTTP.
> ...


Well, nothing stands out in the SMART info, bearing in mind that Seagate is notorious for producing convoluted raw numbers that only its own software interprets correctly.
E.g., your seek error rate reads 102108328, but your actual number of seek errors is 10, decoded from the raw value 43051781288.
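
The decoding can be reproduced with a couple of bit operations. The sketch below assumes the layout widely reported in community write-ups for Seagate's 48-bit raw value (error count in the upper 16 bits, seek count in the lower 32 bits); that layout is an assumption here, and `decode_seagate_raw` is a hypothetical helper name.

```python
def decode_seagate_raw(raw: int) -> tuple[int, int]:
    """Split a 48-bit raw SMART value into (upper 16 bits, lower 32 bits).

    Assumed meaning for Seagate's seek error rate: (errors, seeks).
    """
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw SMART values are 48 bits wide")
    return raw >> 32, raw & 0xFFFFFFFF

errors, seeks = decode_seagate_raw(43051781288)
print(f"{errors} seek errors in {seeks} seeks")
```

Under that assumption, the large number is the seek counter and only the small upper part counts errors.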

I guess the important point is they are showing either pre-fail or old age. (And "pre-fail" is a horrible notation unless the drive is actually in pre-fail mode.)

Why do you suspect the drive plays up at the same time? You've only shown one log entry. Are there others?

Remember, the OS has its own cron jobs that run. See /etc/crontab and /etc/periodic.

A gripe of mine is the vagueness of the error messages. The CDB messages are less than useless unless you take them as authoritative and assume the reported blocks really are unreadable/unwritable, which I doubt; a timeout is more likely. This is why I would have almost bet the house on this drive being shingled.

It's a quandary. Your drive, as reported by SMART, looks OK. That leads to the inevitable conclusion: it's FreeBSD. I need to think more about this.


----------



## twilk (Aug 29, 2020)

mark_j said:


> Why do you suspect the drive plays up at the same time? You've only shown one log entry. Are there others?


Here are the errors for the Seagate disk in my /var/log/messages:

```
[snip]
Aug 20 20:15:39 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1 (disconnected)
Aug 20 20:15:39 server kernel: umass0: at uhub0, port 1, addr 1 (disconnected)
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 07 a3 ab 00 00 01 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 07 a3 ab 00 00 01 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 07 a3 ab 00 00 01 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 07 a3 ab 00 00 01 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 07 a3 ab 00 00 01 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 07 a3 ac 00 00 04 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 07 a3 ac 00 00 04 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 07 a3 ac 00 00 04 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 07 a3 ac 00 00 04 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 07 a3 ac 00 00 04 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 af 0b 00 00 02 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 af 0b 00 00 02 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 af 0b 00 00 02 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 af 0b 00 00 02 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b 80 af 0b 00 00 02 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 82 00 00 02 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 80 89 c3 00 00 01 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 80 89 c3 00 00 01 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 80 89 c3 00 00 01 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 80 89 c3 00 00 01 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0c 80 89 c3 00 00 01 00 
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 20 20:15:39 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 20 20:15:39 server kernel: da0: <Seagate Expansion Desk 0712>  s/n [REDACTED] detached
Aug 20 20:15:39 server kernel: (da0:umass-sim0:0:0:0): Periph destroyed
Aug 20 20:15:39 server kernel: umass0: detached
Aug 20 20:15:39 server ZFS[67324]: vdev state changed, pool_guid=$12612782409786294928 vdev_guid=$3198167944910114318
Aug 20 20:15:39 server ZFS[67640]: vdev is removed, pool_guid=$12612782409786294928 vdev_guid=$3198167944910114318
Aug 20 20:15:43 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1
Aug 20 20:15:43 server kernel: umass0 on uhub0
Aug 20 20:15:43 server kernel: umass0: <Seagate Expansion Desk, class 0/0, rev 3.00/1.00, addr 1> on usbus1
Aug 20 20:15:43 server kernel: umass0:  SCSI over Bulk-Only; quirks = 0x0100
Aug 20 20:15:43 server kernel: umass0:6:0: Attached to scbus6
Aug 20 20:15:43 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 20 20:15:43 server kernel: da0: <Seagate Expansion Desk 0712> Fixed Direct Access SPC-4 SCSI device
Aug 20 20:15:43 server kernel: da0: Serial Number [REDACTED]
Aug 20 20:15:43 server kernel: da0: 400.000MB/s transfers
Aug 20 20:15:43 server kernel: da0: 3815447MB (976754645 4096 byte sectors)
Aug 20 20:15:43 server kernel: da0: quirks=0x2<NO_6_BYTE>
Aug 22 03:50:00 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1 (disconnected)
Aug 22 03:50:00 server kernel: umass0: at uhub0, port 1, addr 1 (disconnected)
Aug 22 03:50:00 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 22 03:50:00 server kernel: da0: <Seagate Expansion Desk 0712>  s/n [REDACTED] detached
Aug 22 03:50:00 server kernel: (da0:umass-sim0:0:0:0): Periph destroyed
Aug 22 03:50:00 server kernel: umass0: detached
Aug 22 03:50:08 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1
Aug 22 03:50:08 server kernel: umass0 on uhub0
Aug 22 03:50:08 server kernel: umass0: <Seagate Expansion Desk, class 0/0, rev 3.00/1.00, addr 1> on usbus1
Aug 22 03:50:08 server kernel: umass0:  SCSI over Bulk-Only; quirks = 0x0100
Aug 22 03:50:08 server kernel: umass0:6:0: Attached to scbus6
Aug 22 03:50:08 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 22 03:50:08 server kernel: da0: <Seagate Expansion Desk 0712> Fixed Direct Access SPC-4 SCSI device
Aug 22 03:50:08 server kernel: da0: Serial Number [REDACTED]
Aug 22 03:50:08 server kernel: da0: 400.000MB/s transfers
Aug 22 03:50:08 server kernel: da0: 3815447MB (976754645 4096 byte sectors)
Aug 22 03:50:08 server kernel: da0: quirks=0x2<NO_6_BYTE>
[snip]
Aug 23 22:34:42 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1 (disconnected)
Aug 23 22:34:42 server kernel: umass0: at uhub0, port 1, addr 1 (disconnected)
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 0b 80 09 75 00 00 09 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 0b 80 09 75 00 00 09 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 0b 80 09 75 00 00 09 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 0b 80 09 75 00 00 09 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 0b 80 09 75 00 00 09 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b b5 61 1e 00 00 20 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b b5 61 1e 00 00 20 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b b5 61 1e 00 00 20 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b b5 61 1e 00 00 20 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b b5 61 1e 00 00 20 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b b5 60 1e 00 00 20 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b b5 60 1e 00 00 20 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b b5 60 1e 00 00 20 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b b5 60 1e 00 00 20 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 0b b5 60 1e 00 00 20 00 
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 23 22:34:42 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 23 22:34:42 server kernel: da0: <Seagate Expansion Desk 0712>  s/n [REDACTED] detached
Aug 23 22:34:42 server kernel: (da0:umass-sim0:0:0:0): Periph destroyed
Aug 23 22:34:42 server kernel: umass0: detached
Aug 23 22:34:42 server ZFS[24697]: vdev state changed, pool_guid=$12612782409786294928 vdev_guid=$3198167944910114318
Aug 23 22:34:42 server ZFS[25288]: vdev is removed, pool_guid=$12612782409786294928 vdev_guid=$3198167944910114318
Aug 23 22:34:47 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1
Aug 23 22:34:47 server kernel: umass0 on uhub0
Aug 23 22:34:47 server kernel: umass0: <Seagate Expansion Desk, class 0/0, rev 3.00/1.00, addr 1> on usbus1
Aug 23 22:34:47 server kernel: umass0:  SCSI over Bulk-Only; quirks = 0x0100
Aug 23 22:34:47 server kernel: umass0:6:0: Attached to scbus6
Aug 23 22:34:47 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 23 22:34:47 server kernel: da0: <Seagate Expansion Desk 0712> Fixed Direct Access SPC-4 SCSI device
Aug 23 22:34:47 server kernel: da0: Serial Number [REDACTED]
Aug 23 22:34:47 server kernel: da0: 400.000MB/s transfers
Aug 23 22:34:47 server kernel: da0: 3815447MB (976754645 4096 byte sectors)
Aug 23 22:34:47 server kernel: da0: quirks=0x2<NO_6_BYTE>
[snip]
```
The full log is attached as var-log-messages.txt.
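
To see which blocks the failing requests touched, the CDB bytes in these messages can be decoded by hand or with a short script. The following Python sketch assumes the standard 10-byte READ(10)/WRITE(10) layout (opcode, flags, 32-bit big-endian LBA, group number, 16-bit transfer length, control); `decode_cdb10` is a hypothetical helper, and the 4096-byte sector size is taken from the da0 attach line.

```python
import struct

# Only the two opcodes that appear in the log excerpts are named here.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}

def decode_cdb10(cdb_hex, sector_size=4096):
    """Decode a 10-byte READ(10)/WRITE(10) CDB given as space-separated hex."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10:
        raise ValueError(f"expected 10 bytes, got {len(cdb)}")
    # Big-endian: opcode, flags, 32-bit LBA, group number, 16-bit length, control.
    opcode, _flags, lba, _group, blocks, _control = struct.unpack(">BBIBHB", cdb)
    return {
        "op": OPCODES.get(opcode, f"opcode 0x{opcode:02x}"),
        "lba": lba,
        "blocks": blocks,
        "byte_offset": lba * sector_size,
    }

if __name__ == "__main__":
    total_sectors = 976754645  # from the "da0: 3815447MB (...)" attach line
    for cdb in ("2a 00 0c 07 a3 ab 00 00 01 00", "28 00 3a 38 17 82 00 00 02 00"):
        d = decode_cdb10(cdb)
        print(d["op"], "lba", d["lba"], "blocks", d["blocks"],
              "sectors before end of disk:", total_sectors - d["lba"])
```

With the values from the log, the first CDB decodes to LBA 201827243 (one block) and the second to LBA 976754562, which is 83 sectors before the end of the disk. The failing requests appear to be spread across the disk rather than concentrated in one region.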



mark_j said:


> A gripe of mine is the vagueness of the error messages. The CDB messages are less than useless unless you take them as authoritative and assume the reported blocks really are unreadable/unwritable, which I doubt; a timeout is more likely. This is why I would have almost bet the house on this drive being shingled.
> 
> It's a quandary. Your drive, as reported by SMART, looks OK. That leads to the inevitable conclusion: it's FreeBSD. I need to think more about this.


It seems unlikely to me, too, that the blocks are actually unreadable/unwritable as e.g. force-rebooting the system "fixes" the problem and lets me read/write those blocks again.

Is it possible that the drives use SMR internally but don't expose it over USB? I forgot to mention earlier, when running `zonectl -d /dev/da0 -c rz`, I get the following error in /var/log/messages:

```
Aug 29 11:10:19 server kernel: (da0:umass-sim0:0:0:0): ZBC IN. CDB: 95 00 00 00 00 00 00 00 00 00 00 02 00 00 00 00 
Aug 29 11:10:19 server kernel: (da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
Aug 29 11:10:19 server kernel: (da0:umass-sim0:0:0:0): SCSI status: Check Condition
Aug 29 11:10:19 server kernel: (da0:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
Aug 29 11:10:19 server kernel: (da0:umass-sim0:0:0:0): Error 22, Unretryable error
```
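
The 16-byte CDB in that message can be unpacked to see what the host actually asked for. The sketch below assumes the commonly documented ZBC REPORT ZONES layout (opcode 0x95, service action in the low 5 bits of byte 1, an 8-byte zone start LBA, a 4-byte allocation length, then reporting options and control); `decode_zbc_in` is a hypothetical helper, and the layout should be checked against the ZBC specification before relying on it.

```python
import struct

def decode_zbc_in(cdb_hex):
    """Decode a 16-byte ZBC IN CDB given as space-separated hex (assumed layout)."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16:
        raise ValueError(f"expected 16 bytes, got {len(cdb)}")
    # Big-endian: opcode, service-action byte, 64-bit LBA, 32-bit length, options, control.
    opcode, byte1, start_lba, alloc_len, byte14, _control = struct.unpack(">BBQIBB", cdb)
    return {
        "opcode": opcode,
        "service_action": byte1 & 0x1F,       # 0 is assumed to mean REPORT ZONES
        "zone_start_lba": start_lba,
        "allocation_length": alloc_len,        # size of the reply buffer, in bytes
        "reporting_options": byte14 & 0x3F,
    }

if __name__ == "__main__":
    print(decode_zbc_in("95 00 00 00 00 00 00 00 00 00 00 02 00 00 00 00"))
```

For the CDB above this gives an allocation length of 131072 bytes. That value is the size of the reply buffer the host offered, so on its own it probably says little about what the drive supports; the sense data (`Invalid command operation code`) is the answer that came back from the device or its USB bridge.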


----------



## mark_j (Aug 30, 2020)

twilk said:


> ```
> Aug 29 11:10:19 server kernel: (da0:umass-sim0:0:0:0): ZBC IN. CDB: 95 00 00 00 00 00 00 00 00 00 00 02 00 00 00 00
> Aug 29 11:10:19 server kernel: (da0:umass-sim0:0:0:0): CAM status: SCSI Status Error
> Aug 29 11:10:19 server kernel: (da0:umass-sim0:0:0:0): SCSI status: Check Condition
> ```
> ...



This does suggest it does zone block reads, because there's an allocation size specified in the CDB. Perhaps it is "just" a misreading of the firmware by the driver, i.e., a bug.

I did trawl through your logs, and it does seem you do need a firmware update:

```
Aug 13 18:14:02 server smartd[37433]: Device: /dev/ada0, WARNING: A firmware update for this drive may be available,
Aug 13 18:14:02 server smartd[37433]: see the following Seagate web pages:
Aug 13 18:14:02 server smartd[37433]: http://knowledge.seagate.com/articles/en_US/FAQ/207931en
Aug 13 18:14:02 server smartd[37433]: http://knowledge.seagate.com/articles/en_US/FAQ/223651en
```

If you go to the second link it definitely shows a different firmware number. 
See: https://www.seagate.com/au/en/support/kb/barracuda-1tbdisk-platform-firmware-update-223651en/

Looking in the log you will see:

```
Aug 14 20:38:01 server kernel: ada0: <ST1000DM003-9YN162 CC4B> ATA8-ACS SATA 3.x device
Aug 14 20:38:01 server kernel: ada0: Serial Number S1D4H88M
```

Taking that serial number, selecting US as the region, and using this URL shows that your drive requires a firmware update:

Seagate Technology - Download Finder (apps1.seagate.com)

So, in the first instance, I suggest updating the firmware (back up your data first, if you need to).


----------



## twilk (Aug 30, 2020)

mark_j said:


> This does suggest it does zone block reads, because there's an allocation size specified in the CDB. Perhaps it is "just" a misreading of the firmware by the driver, i.e., a bug.


Hm, that's interesting. Assuming it isn't a bug and the disk does zone reads internally, how do I tell whether that's the origin of the problems/how do I fix it if it is?



mark_j said:


> I did trawl through your logs, and it does seem you do need a firmware update:
> 
> ```
> Aug 13 18:14:02 server smartd[37433]: Device: /dev/ada0, WARNING: A firmware update for this drive may be available,
> ```
> ...


True, though that's ada0 (the internal SATA-connected hard disk), *not* da0 (the USB-connected hard disk that I'm having problems with). That internal disk is working perfectly fine, though you're right, I should update its firmware. It seems unlikely that that would solve the problem with the external disk though.


----------



## mark_j (Aug 30, 2020)

twilk said:


> Hm, that's interesting. Assuming it isn't a bug and the disk does zone reads internally, how do I tell whether that's the origin of the problems/how do I fix it if it is?
> 
> 
> True, though that's ada0 (the internal SATA-connected hard disk), *not* da0 (the USB-connected hard disk that I'm having problems with). That internal disk is working perfectly fine, though you're right, I should update its firmware. It seems unlikely that that would solve the problem with the external disk though.


Oops, sorry, I mistook ada for da.


----------



## mark_j (Aug 31, 2020)

twilk said:


> Hm, that's interesting. Assuming it isn't a bug and the disk does zone reads internally, how do I tell whether that's the origin of the problems/how do I fix it if it is?



Well it reports itself as not SMR, so I guess you have to take the firmware's word for it.

The ZBC command returns an error, so the drive does not support zoned block commands. Perhaps the only other definitive way is to reformat the disk as UFS, run dd on it, and monitor the I/O. If it's SMR, it will show a burst of fast writes, then drop precipitously and settle at some really mediocre write rate. That's the shingled-drive modus operandi, because the write band or zone sits under another 'shingle' of data.
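
The "burst, then cliff" pattern described above can be checked against throughput samples (for example, MiB/s figures noted at regular intervals while dd runs). A rough Python sketch follows; the function name, the window size, and the 50% threshold are arbitrary assumptions for illustration, not an established SMR test.

```python
from statistics import median

def looks_like_smr_cliff(samples_mib_s, window=5, drop_ratio=0.5):
    """Heuristic: True if the end of the run settles well below the initial burst.

    `window` and `drop_ratio` are illustrative defaults, not calibrated values.
    """
    if len(samples_mib_s) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    burst = median(samples_mib_s[:window])    # rate during the first samples
    settled = median(samples_mib_s[-window:])  # rate during the last samples
    return settled < drop_ratio * burst

if __name__ == "__main__":
    steady = [150] * 20                  # constant rate for the whole run
    cliff = [180] * 5 + [40] * 15        # fast start, then a sustained drop
    print(looks_like_smr_cliff(steady), looks_like_smr_cliff(cliff))
```

A run that holds a constant rate from start to finish returns `False`; a run that starts fast and settles at less than half the initial rate returns `True`.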

(Those drives are so dodgy that I personally believe it's a criminal act to sell them, especially as Seagate and Western Digital go to pains to hide the fact.)

The other possibility is that it's drive-managed, so it's "hiding" the SMR from the OS. Can you provide the `diskinfo -v da0` output?

Output of `camcontrol zone da0 -v -c rz`?

(I apologise if I've asked these before, it's hard to keep track.)

If that still reports it as non-zoned, then there are 5 potential causes:

1. Disk is failing. (But smart should give some indication). Throw it away.
2. Power supply is failing. Take disk out of enclosure, fit it into another.
3. USB cable is damaged. Swap it out.
4. USB female socket is damaged on the host. Move the cable to another USB port.
5. There's a bug in the CAM driver. Report it via a PR.


----------



## twilk (Sep 2, 2020)

Thanks for the suggestions!



mark_j said:


> Well it reports itself as not SMR, so I guess you have to take the firmware's word for it.
> 
> The ZBC command returns an error, so the drive does not support zoned block commands. Perhaps the only other definitive way is to reformat the disk as UFS, run dd on it, and monitor the I/O. If it's SMR, it will show a burst of fast writes, then drop precipitously and settle at some really mediocre write rate. That's the shingled-drive modus operandi, because the write band or zone sits under another 'shingle' of data.
> 
> ...


I wiped the disk before formatting it with ZFS by dd(1)'ing from /dev/zero; that wrote the whole 4 TB at a constant 150 MiB/s, which seems decent over USB-3.

```
# diskinfo -v da0
da0
        4096            # sectorsize
        4000787025920   # mediasize in bytes (3.6T)
        976754645       # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        60800           # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
        Seagate Expansion Desk  # Disk descr.
        NA4MHXW9        # Disk ident.
        No              # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM
        Not_Zoned       # Zone Mode

# camcontrol zone da0 -v -c rz
(pass2:umass-sim0:0:0:0): ZBC IN. CDB: 95 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 
(pass2:umass-sim0:0:0:0): CAM status: SCSI Status Error
(pass2:umass-sim0:0:0:0): SCSI status: Check Condition
(pass2:umass-sim0:0:0:0): SCSI sense: ILLEGAL REQUEST asc:20,0 (Invalid command operation code)
```




mark_j said:


> If that still reports it as non-zoned, then there are 5 potential causes:
> 
> 1. Disk is failing. (But smart should give some indication). Throw it away.
> 2. Power supply is failing. Take disk out of enclosure, fit it into another.
> ...



1. Hopefully not, and Linux seemed to handle the disk fine even under heavy load while I got errors from FreeBSD.
2. I'd really like to avoid this as the enclosure is sealed and I'd have to break it to get the disk out, I can't just unscrew it.
3. I've tried swapping the Toshiba disk's cable with the Seagate one (the Toshiba works perfectly now), but that didn't make a difference.
4. I've tried that too, no difference.

I've had another one of these errors, by the way:

```
Aug 31 21:05:13 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1 (disconnected)
Aug 31 21:05:13 server kernel: umass0: at uhub0, port 1, addr 1 (disconnected)
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 11 0d 89 14 00 00 20 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 11 0d 89 14 00 00 20 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 11 0d 89 14 00 00 20 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 11 0d 89 14 00 00 20 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 11 0d 89 14 00 00 20 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 00 00 00 42 00 00 02 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 3 more tries remain
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 2 more tries remain
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 1 more tries remain
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Retrying command, 0 more tries remain
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): READ(10). CDB: 28 00 3a 38 17 42 00 00 02 00 
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): CAM status: CCB request completed with an error
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
Aug 31 21:05:13 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 31 21:05:13 server kernel: da0: <Seagate Expansion Desk 0712>  s/n [REDACTED] detached
Aug 31 21:05:13 server kernel: (da0:umass-sim0:0:0:0): Periph destroyed
Aug 31 21:05:13 server kernel: umass0: detached
Aug 31 21:05:13 server ZFS[77721]: vdev state changed, pool_guid=$12612782409786294928 vdev_guid=$3198167944910114318
Aug 31 21:05:13 server ZFS[78365]: vdev is removed, pool_guid=$12612782409786294928 vdev_guid=$3198167944910114318
Aug 31 21:05:19 server kernel: ugen1.2: <Seagate Expansion Desk> at usbus1
Aug 31 21:05:19 server kernel: umass0 on uhub0
Aug 31 21:05:19 server kernel: umass0: <Seagate Expansion Desk, class 0/0, rev 3.00/1.00, addr 1> on usbus1
Aug 31 21:05:19 server kernel: umass0:  SCSI over Bulk-Only; quirks = 0x0100
Aug 31 21:05:19 server kernel: umass0:6:0: Attached to scbus6
Aug 31 21:05:25 server kernel: da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
Aug 31 21:05:25 server kernel: da0: <Seagate Expansion Desk 0712> Fixed Direct Access SPC-4 SCSI device
Aug 31 21:05:25 server kernel: da0: Serial Number [REDACTED]
Aug 31 21:05:25 server kernel: da0: 400.000MB/s transfers
Aug 31 21:05:25 server kernel: da0: 3815447MB (976754645 4096 byte sectors)
Aug 31 21:05:25 server kernel: da0: quirks=0x2<NO_6_BYTE>
Sep  1 10:28:40 server ZFS[40593]: vdev state changed, pool_guid=$12612782409786294928 vdev_guid=$3198167944910114318
Sep  1 10:28:40 server ZFS[43612]: vdev state changed, pool_guid=$12612782409786294928 vdev_guid=$3198167944910114318
```
(Those last two lines are me running `zpool clear $da0pool`.)
This error happened while serving a 1.6GiB file over HTTP to another device on the LAN.
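
For anyone reading the CDB lines in the log, here is a minimal sketch that decodes a READ(10) command into its block address and length. It assumes the standard READ(10) layout (opcode `0x28` in byte 0, big-endian LBA in bytes 2-5, transfer length in blocks in bytes 7-8). The function name `decode_read10` is made up for illustration and is not part of any FreeBSD tool.

```sh
#!/bin/sh
# Hypothetical helper: decode a READ(10) CDB as printed in the CAM error lines.
# Assumed layout: byte 0 = opcode (0x28), bytes 2-5 = LBA (big-endian),
# bytes 7-8 = transfer length in blocks (big-endian).
decode_read10() {
    # Split the space-separated hex bytes into positional parameters.
    set -- $1
    if [ "$#" -ne 10 ] || [ "$1" != "28" ]; then
        echo "not a READ(10) CDB" >&2
        return 1
    fi
    # Concatenate the hex bytes and let shell arithmetic convert them.
    lba=$(( 0x$3$4$5$6 ))
    blocks=$(( 0x$8$9 ))
    echo "lba=$lba blocks=$blocks"
}

decode_read10 "28 00 00 00 00 42 00 00 02 00"   # prints: lba=66 blocks=2
decode_read10 "28 00 3a 38 17 42 00 00 02 00"   # prints: lba=976754498 blocks=2
```

Both failing reads are two blocks long. The first is near the start of the disk and the second is 147 sectors before the end of the 976754645-sector device reported by da0.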


----------



## mark_j (Sep 3, 2020)

Well, where to now?
I think the next step, should you be willing, is to test the USB subsystem. It will also involve using dtrace to attempt to find the potential software issue.


----------



## JonnySac (May 17, 2021)

I have the same (or almost the same) Toshiba 4TB USB hard drives and the same issue on FreeBSD 13.0. I set up a ZFS raidz with 4TB Toshiba USB 3.0 hard drives. Everything works great until they go to sleep; once they do, they will not wake up. Any file I/O hangs that process to the point where it cannot be killed in any way. I need to hold the power button for 10 seconds because it can't even shut down.

I get a bunch of "ccb request completed with failure" errors until it gives up. I tried everything in this thread, and searched for and tried every camcontrol power/sleep/standby setting, and nothing makes a difference. I tried them in USB 2.0 ports and still had the same problem.
Unfortunately I ended up having to move to Linux, where everything worked without issue using the same setup and ZFS. On another system I have different issues with even a simple 60GB mirror on FreeBSD, USB, & ZFS. I guess some hardware really doesn't work well with ZFS & USB.


----------



## Alain De Vos (May 17, 2021)

FreeBSD loads just fine. What do you use to load?


----------



## mark_j (May 18, 2021)

JonnySac said:


> Unfortunately I ended up having to move to Linux, where everything worked without issue using the same setup and ZFS. On another system I have different issues with even a simple 60GB mirror on FreeBSD, USB, & ZFS. I guess some hardware really doesn't work well with ZFS & USB.



There are lots of settings you can tweak in sysctl for timeouts, read_cache/write_cache just to name a few if you want to track down this issue, but it seems you've moved on. That's fine. Use whatever gets the job done quickest for you.
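
As a starting point for the timeout tuning mentioned above, here is a minimal sketch that only reads the da(4) driver's command timeout and retry count. `kern.cam.da.default_timeout` and `kern.cam.da.retry_count` are the names documented in da(4). The helper name `show_da_tunables` is made up for illustration, and whether changing either value helps with these enclosures is an assumption, not something established in this thread.

```sh
#!/bin/sh
# Hypothetical helper: print the da(4) timeout/retry tunables (read-only).
show_da_tunables() {
    for name in kern.cam.da.default_timeout kern.cam.da.retry_count; do
        # Fall back to a note when the sysctl is missing (e.g. not on FreeBSD).
        sysctl "$name" 2>/dev/null || echo "$name: not available on this system"
    done
}

show_da_tunables
```

To experiment, a value can be changed with something like `sysctl kern.cam.da.default_timeout=120` (the unit is seconds) and reverted the same way. Treat any specific value as a guess to be tested.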


----------



## JonnySac (May 18, 2021)

Well, I really only moved on to see if it works, which it does, and works great actually (Ubuntu Server 21.04, OpenZFS, Samba server 4.13.3). The disks go to sleep when there's no activity, and wake right up once needed. That being said, I'd still much rather go back to FreeBSD for this setup as I know it and like it better.
I made a raidz ZFS pool using zstd-6 compression with 3 external 4TB USB 3.0 hard drives, plus a Samba server to be used on my internal network. That way I can easily access it from any OS on my network: Windows, Mac, Linux, & FreeBSD. I also have an internal HD that runs the OS, so the 3 external HDs are completely separate for extra storage. This gives me about 7.5TB of space where 1 of the 3 drives can completely fail without losing any data. I've been trying it out today and I really like it so far (except that on FreeBSD `mount_smbfs` only works with SMBv1, but that's another story).

So basically my only issue is, once the hard drives go to sleep after a few minutes of no activity on FreeBSD, nothing will wake them up. I tried different combinations of things like:
```
camcontrol apm da2 -l 254
camcontrol standby /dev/da2 -t 3600
```
But no luck. Once they go to sleep, any file system call will indefinitely hang that process and terminal. No signals will work at all.
If I have to stay with Linux for this I will but I'd be willing to try some other things to get it working with FreeBSD.  What else could I try? I really don't want them spinning 24/7 either so there's got to be a way to have them wake up properly. Thanks!


----------



## covacat (May 18, 2021)

usb to sata enclosure makes a lot of difference
i have a mac which i use with an external ssd on usb3
cheap enclosures cause hangs / panics when the system goes to sleep

also see https://www.amazon.com/gp/customer-...viewpnt?ie=UTF8&ASIN=B00FYKRI9C#RPV1HWB1JSQ5Y and the link to another review inside which somehow duplicates my experience

enclosures based on jmicron 578 work ok for me (on mac)


----------

