# ZFS pool degraded - cannot buy exact replacement SSD



## dougs (Nov 5, 2020)

Hello-

A drive in a ZFS mirror containing the FreeBSD 12.1-RELEASE OS on a 9-year-old SuperMicro server has failed. The two mirrored SSDs are SanDisk SD6SB1M256G1022I, which cannot be found for purchase online.

I'm wondering if I could purchase two 256GB SSDs and use one as the replacement to sync with the working SSD. Once synced, I would take the old SSD offline, insert the second new SSD, and perform a `zpool replace`. Should this do the trick?

Are there any better solutions than this one?

~Doug


----------



## ralphbsz (Nov 5, 2020)

AFAIK, the replacement disk simply has to be the same size or bigger than the old one. I have upgraded a mirror pair of 1TB (spinning) disks by successively replacing them with 3TB and 4TB drives. Oh, and the new disk has to have the same sector size (512b or 4K).


----------



## dougs (Nov 5, 2020)

dougs said:


> Hello-
> 
> A drive in a ZFS mirror containing the FreeBSD 12.1-RELEASE OS on a 9-year-old SuperMicro server has failed. The two mirrored SSDs are SanDisk SD6SB1M256G1022I, which cannot be found for purchase online.
> 
> ...





ralphbsz said:


> AFAIK, the replacement disk simply has to be the same size or bigger than the old one. I have upgraded a mirror pair of 1TB (spinning) disks by successively replacing them with 3TB and 4TB drives. Oh, and the new disk has to have the same sector size (512b or 4K).


Is that logical or physical sector size? Many SSDs have 512 bytes logical sector size and 4KB physical sector size.


----------



## tOsYZYny (Nov 5, 2020)

I wasn't aware of the sector size limitation, the only thing I was aware of was the drives needing to be at least the same size as the current or larger.

Yes, that sounds right.  You will attach another drive to the mirror, let it fully resilver, then once that is done, add another one and take the old original offline.

It is up to you on ordering, but if the original hasn't died yet, I would leave it in while you're waiting to have 2 completely new drives in the mirror.  What is it going to hurt?

I have had a hard drive die on rebooting / power cycling a machine. Many years ago, I started seeing some disk errors, but I could still see the drive and access files (at least the important ones). I figured, ah, I'll just reboot after reseating the cables. Upon rebooting, the drive completely died. I didn't lose much as I had a backup from the day before, but I did lose a day of work and I'm still kicking myself for not syncing or attempting to sync before rebooting ...

If you need help with the exact commands, let me know.  AFAIK, you want to go from a 2-way mirror to 3-way (well, you have 1 failed drive), then after it is 3-way, add another drive to make it a 4-way mirror ...
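As a rough sketch of the grow-then-shrink sequence described above, the script below only prints the `zpool` commands rather than running them. The pool name and all four device labels are placeholders chosen for illustration, not names confirmed at this point in the thread.

```shell
#!/bin/sh
# Dry run: prints the commands instead of executing them.
# POOL and the four device labels are placeholder names (assumptions).
POOL="zroot"
OLD_A="gpt/disk0"   # failing member
OLD_B="gpt/disk1"   # member that still looks healthy
NEW_A="gpt/new0"    # first replacement
NEW_B="gpt/new1"    # second replacement

run() { printf '+ %s\n' "$*"; }

grow_then_shrink() {
    run zpool attach "$POOL" "$OLD_B" "$NEW_A"   # 2-way -> 3-way
    run zpool status "$POOL"                     # wait until the resilver has finished
    run zpool attach "$POOL" "$OLD_B" "$NEW_B"   # 3-way -> 4-way
    run zpool status "$POOL"                     # wait again
    run zpool detach "$POOL" "$OLD_A"            # drop the old members last
    run zpool detach "$POOL" "$OLD_B"
}

grow_then_shrink
```

Keeping the old members attached until both new ones have resilvered means there is never a moment with fewer than two copies of the data.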


----------



## ralphbsz (Nov 6, 2020)

dougs said:


> Is that logical or physical sector size? Many SSDs have 512 bytes logical sector size and 4KB physical sector size.


Logical. Physical sectors on SSDs are a strange concept anyway, since SSDs internally use much larger block sizes, and have interestingly complex internal metadata to map sectors to addresses.

And I'm not actually sure that for ZFS you absolutely have to match the sector size, but the new SSD has to be able to perform IOs with the sector size that was used for the pool. So if the old SSD had 512-byte sectors, and the pool is configured with small sectors (ashift=9), and the new disk has logical 4K sectors, the new disk won't be able to perform 512-byte IOs, and you're stuck. The other way around (pool configured for 4K sectors, ashift=12, new disk has 512-byte sectors) might actually work, because a 512-byte sector disk is capable of performing 4K IOs. But I'm not sure; if ZFS checks the sector size, it might break.
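The reasoning above can be written down as a small check. This is only a model of the argument (a disk can serve a pool if its logical sector size is no larger than the pool's I/O unit of 2^ashift bytes), not what ZFS itself tests; the function names are made up for this sketch.

```shell
#!/bin/sh
# Smallest I/O size in bytes that a pool issues for a given ashift: 2^ashift.
ashift_to_bytes() {
    echo $((1 << $1))
}

# Succeeds when a disk with the given logical sector size (bytes) can service
# I/O of 2^ashift bytes: the sector must not be larger than the I/O unit and
# must divide it evenly.
disk_fits_pool() {
    ashift=$1
    sector=$2
    io=$(ashift_to_bytes "$ashift")
    [ "$sector" -le "$io" ] && [ $((io % sector)) -eq 0 ]
}

for combo in "9 512" "9 4096" "12 512" "12 4096"; do
    set -- $combo
    if disk_fits_pool "$1" "$2"; then verdict="ok"; else verdict="too coarse"; fi
    echo "ashift=$1 sector=$2 -> $verdict"
done
```

Only the ashift=9 pool with a 4096-byte-sector disk comes out as "too coarse", which matches the "you're stuck" case. On a FreeBSD system the real inputs would typically come from something like `zdb -C zroot | grep ashift` for the pool and `diskinfo -v ada0` for the disk.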


----------



## tOsYZYny (Nov 6, 2020)

Can you not take a snapshot of the pool, then send that to the "new" drive?  Then once that is ready, attach a device to that pool making a mirror, let it resilver?


----------



## dougs (Nov 10, 2020)

It's a bit more complicated than that.

```
[root@backup 10.Nov 3:04pm ~]# gpart show ada0
=>       34  500118125  ada0  GPT  (238G)
         34       1024     1  freebsd-boot  (512K)
       1058    2096222     2  freebsd-swap  (1.0G)
    2097280  165150720     3  freebsd-zfs  (79G)
  167248000   67108864     4  freebsd-zfs  (32G)
  234356864   67108864     5  freebsd-zfs  (32G)
  301465728   16777216     6  freebsd-swap  (8.0G)
  318242944  181875215        - free -  (87G)

[root@backup 10.Nov 3:04pm ~]#
```
ada0p3 is the OS. 
ada0p4 is the zfs logs for another zfs array. 
ada0p5 is the cache also for the other zfs array. 
ada0p2 was the original swap that became too small and since I had lots of space at the end of the fifth partition, I merely created a second swap partition (ada0p6) and used this one instead of ada.
Below is more information regarding the two zpools on my system:

```
[root@backup 10.Nov 3:10pm ~]# zpool status 
  pool: zdata
 state: ONLINE
  scan: scrub repaired 0 in 0 days 10:40:04 with 0 errors on Wed Nov  4 18:20:04 2020
config:

        NAME                 STATE     READ WRITE CKSUM
        zdata                ONLINE       0     0     0
          raidz3-0           ONLINE       0     0     0
            gpt/data_disk10  ONLINE       0     0     0
            gpt/data_disk11  ONLINE       0     0     0
            gpt/data_disk12  ONLINE       0     0     0
            gpt/data_disk13  ONLINE       0     0     0
            gpt/data_disk14  ONLINE       0     0     0
            gpt/data_disk15  ONLINE       0     0     0
            gpt/data_disk16  ONLINE       0     0     0
            gpt/data_disk17  ONLINE       0     0     0
            gpt/data_disk18  ONLINE       0     0     0
            gpt/data_disk19  ONLINE       0     0     0
        logs
          mirror-1           ONLINE       0     0     0
            gpt/log0         ONLINE       0     0     0
            gpt/log1         ONLINE       0     0     0
        cache
          gpt/cache0         ONLINE       0     0     0
          gpt/cache1         ONLINE       0     0     0

errors: No known data errors

  pool: zroot
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 384K in 0 days 00:04:33 with 0 errors on Tue Nov 10 13:40:04 2020
config:

        NAME           STATE     READ WRITE CKSUM
        zroot          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     9
            gpt/disk1  ONLINE       0     0     1

errors: No known data errors
[root@backup 10.Nov 3:10pm ~]#
```

I've scrubbed and rescrubbed the zroot pool and I keep getting the same warning again and again. Running _smartctl_ on the mirrored disks gives the following:

```
[root@backup 10.Nov 3:17pm ~]# smartctl -a /dev/ada0
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.1-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Marvell based SanDisk SSDs
Device Model:     SanDisk SD6SB1M256G1022I
Serial Number:    140433400574
LU WWN Device Id: 5 001b44 bbe109efe
Firmware Version: X231600
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      Unknown (0x000a)
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 10 15:17:17 2020 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       73
  9 Power_On_Hours          0x0032   253   100   ---    Old_age   Always       -       44100
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       56
166 Min_W/E_Cycle           0x0032   100   100   ---    Old_age   Always       -       1
167 Min_Bad_Block/Die       0x0032   100   100   ---    Old_age   Always       -       36
168 Maximum_Erase_Cycle     0x0032   100   100   ---    Old_age   Always       -       5838
169 Total_Bad_Block         0x0032   100   100   ---    Old_age   Always       -       346
171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       69
172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
173 Avg_Write/Erase_Count   0x0032   100   100   ---    Old_age   Always       -       5604
174 Unexpect_Power_Loss_Ct  0x0032   100   100   ---    Old_age   Always       -       27
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       56
194 Temperature_Celsius     0x0022   065   045   ---    Old_age   Always       -       35 (Min/Max 24/45)
212 SATA_PHY_Error          0x0032   100   100   ---    Old_age   Always       -       0
230 Perc_Write/Erase_Count  0x0032   100   100   ---    Old_age   Always       -       0 0 47696
232 Perc_Avail_Resrvd_Space 0x0033   100   100   004    Pre-fail  Always       -       99
233 Total_NAND_Writes_GiB   0x0032   100   100   ---    Old_age   Always       -       1462359
241 Total_Writes_GiB        0x0030   253   253   ---    Old_age   Offline      -       467347
242 Total_Reads_GiB         0x0030   253   253   ---    Old_age   Offline      -       15118
243 Unknown_Marvell_Attr    0x0032   100   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 53 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 53 occurred at disk power-on lifetime: 44099 hours (1837 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  51 40 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 18 50 74 80 40 08      00:00:00.000  READ FPDMA QUEUED

Error 52 occurred at disk power-on lifetime: 44099 hours (1837 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  51 40 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 18 50 74 80 40 08      00:00:00.000  READ FPDMA QUEUED

Error 51 occurred at disk power-on lifetime: 44099 hours (1837 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  51 40 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 20 50 75 80 40 08      00:00:00.000  READ FPDMA QUEUED

Error 50 occurred at disk power-on lifetime: 44099 hours (1837 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  51 40 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  2f 00 01 10 00 00 00 08      00:00:00.000  READ LOG EXT

Error 49 occurred at disk power-on lifetime: 44099 hours (1837 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  51 40 00 00 00 00 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 78 71 80 40 08      00:00:00.000  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     34559         -
# 2  Short offline       Completed without error       00%     34535         -
# 3  Short offline       Completed without error       00%     34511         -
# 4  Short offline       Completed without error       00%     34487         -
# 5  Short offline       Completed without error       00%     34463         -
# 6  Short offline       Completed without error       00%     34439         -
# 7  Short offline       Completed without error       00%     34415         -
# 8  Short offline       Completed without error       00%     34391         -
# 9  Short offline       Completed without error       00%     34367         -
#10  Short offline       Completed without error       00%     34343         -
#11  Short offline       Completed without error       00%     29964         -
#12  Short offline       Completed without error       00%     29940         -
#13  Short offline       Completed without error       00%     29916         -
#14  Short offline       Completed without error       00%     29892         -
#15  Short offline       Completed without error       00%     29868         -
#16  Short offline       Completed without error       00%     29844         -
#17  Short offline       Completed without error       00%     29820         -
#18  Short offline       Completed without error       00%     29796         -
#19  Short offline       Completed without error       00%     29772         -
#20  Short offline       Completed without error       00%     29748         -
#21  Short offline       Completed without error       00%     29724         -

Selective Self-tests/Logging not supported

[root@backup 10.Nov 3:17pm ~]#
```


```
[root@backup 10.Nov 3:18pm ~]# smartctl -a /dev/ada1
smartctl 7.1 2019-12-30 r5022 [FreeBSD 12.1-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Marvell based SanDisk SSDs
Device Model:     SanDisk SD6SB1M256G1022I
Serial Number:    140433401574
LU WWN Device Id: 5 001b44 bbe10a2e6
Firmware Version: X231600
User Capacity:    256,060,514,304 bytes [256 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      Unknown (0x000a)
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Nov 10 15:19:07 2020 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever 
                                        been run.
Total time to complete Offline 
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 4
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   ---    Old_age   Always       -       81
  9 Power_On_Hours          0x0032   253   100   ---    Old_age   Always       -       44101
 12 Power_Cycle_Count       0x0032   100   100   ---    Old_age   Always       -       56
166 Min_W/E_Cycle           0x0032   100   100   ---    Old_age   Always       -       1
167 Min_Bad_Block/Die       0x0032   100   100   ---    Old_age   Always       -       30
168 Maximum_Erase_Cycle     0x0032   100   100   ---    Old_age   Always       -       5831
169 Total_Bad_Block         0x0032   100   100   ---    Old_age   Always       -       384
171 Program_Fail_Count      0x0032   100   100   ---    Old_age   Always       -       81
172 Erase_Fail_Count        0x0032   100   100   ---    Old_age   Always       -       0
173 Avg_Write/Erase_Count   0x0032   100   100   ---    Old_age   Always       -       5592
174 Unexpect_Power_Loss_Ct  0x0032   100   100   ---    Old_age   Always       -       27
187 Reported_Uncorrect      0x0032   100   100   ---    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   065   044   ---    Old_age   Always       -       35 (Min/Max 25/44)
212 SATA_PHY_Error          0x0032   100   100   ---    Old_age   Always       -       0
230 Perc_Write/Erase_Count  0x0032   100   100   ---    Old_age   Always       -       0 0 47656
232 Perc_Avail_Resrvd_Space 0x0033   100   100   004    Pre-fail  Always       -       99
233 Total_NAND_Writes_GiB   0x0032   100   100   ---    Old_age   Always       -       1457799
241 Total_Writes_GiB        0x0030   253   253   ---    Old_age   Offline      -       467334
242 Total_Reads_GiB         0x0030   253   253   ---    Old_age   Offline      -       15856
243 Unknown_Marvell_Attr    0x0032   100   100   ---    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     34559         -
# 2  Short offline       Completed without error       00%     34535         -
# 3  Short offline       Completed without error       00%     34511         -
# 4  Short offline       Completed without error       00%     34487         -
# 5  Short offline       Completed without error       00%     34463         -
# 6  Short offline       Completed without error       00%     34439         -
# 7  Short offline       Completed without error       00%     34415         -
# 8  Short offline       Completed without error       00%     34391         -
# 9  Short offline       Completed without error       00%     34367         -
#10  Short offline       Completed without error       00%     34343         -
#11  Short offline       Completed without error       00%     29964         -
#12  Short offline       Completed without error       00%     29940         -
#13  Short offline       Completed without error       00%     29916         -
#14  Short offline       Completed without error       00%     29892         -
#15  Short offline       Completed without error       00%     29868         -
#16  Short offline       Completed without error       00%     29844         -
#17  Short offline       Completed without error       00%     29820         -
#18  Short offline       Completed without error       00%     29796         -
#19  Short offline       Completed without error       00%     29772         -
#20  Short offline       Completed without error       00%     29748         -
#21  Short offline       Completed without error       00%     29724         -

Selective Self-tests/Logging not supported

[root@backup 10.Nov 3:19pm ~]#
```
The output above leads me to believe there is a major issue with /dev/ada0.

Assuming that the /dev/ada0 drive needs to be replaced, I am unsure as to how to recreate the logs mirror and the cache.

In the back of the 2U server chassis are two slots for the SSD drives. I aim to take the failing drive out and put in a new replacement drive. I wish to avoid sliding the server out of the rack but I will if there are no other more effective ways of dealing with this.

Assuming we are not sliding the server out, below is a list of steps I believe are needed in order to replace /dev/ada0:

```
# zpool detach zroot /dev/ada0
<take out failing drive and insert replacement drive>
# camcontrol devlist -v (verify device name and /dev designation)
# gpart create -s GPT ada0
# gpart add -b 34 -s 512k -t freebsd-boot -i 1 -l zfsboot0 ada0
# gpart add -b 1058 -s 2096222 -t freebsd-swap -i 2 -l swap0 ada0
# gpart add -b 2097280 -s 165150720 -t freebsd-zfs -i 3 -l disk0 ada0
# gpart add -b 167248000 -s 67108864 -t freebsd-zfs -i 4 -l log0 ada0
# gpart add -b 234356864 -s 67108864 -t freebsd-zfs -i 5 -l cache0 ada0
# gpart add -b 301465728 -s 16777216 -t freebsd-swap -i 6 -l swap20 ada0
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0
```
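Because the `-b` and `-s` values in the steps above are typed by hand, a quick arithmetic check can catch a typo before anything is written to the new disk. The sketch below takes the start/size pairs from the `gpart show ada0` output earlier in this post, assumes 512-byte sectors (as smartctl reports), and verifies that each partition begins where the previous one ends.

```shell
#!/bin/sh
# start/size pairs in 512-byte sectors, copied from the gpart show output.
LAYOUT="34 1024
1058 2096222
2097280 165150720
167248000 67108864
234356864 67108864
301465728 16777216"

# Prints each partition's size in MiB and reports any gap or overlap between
# consecutive partitions. Exit status is non-zero if a mismatch was found.
check_layout() {
    echo "$LAYOUT" | awk '
        NR > 1 && $1 != expected { printf "gap/overlap before sector %d\n", $1; bad = 1 }
        { printf "start=%d size=%.1f MiB\n", $1, $2 * 512 / 1048576; expected = $1 + $2 }
        END { exit bad }'
}

check_layout
```

With the numbers as posted, the layout is contiguous, so the function prints one line per partition and returns success.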
I am unsure as to what the next step should be. I've seen some postings where it states to use *zpool attach zroot /dev/ada0p3*. But that doesn't address the issue of creating the log0 and the cache0 mirrors. Should I use *zpool replace zroot ada0*? Section 19.3.5 in the FreeBSD documentation on ZFS (https://www.freebsd.org/doc/handbook/zfs-zpool.html) seems to indicate the use of the replace parameter. The zfs mirror has not yet entered into a degraded state so this should be considered a functioning mirror, yes?

Will the above method work? Or would it be easier to simply add the replacement drive to the mirror, add the bootcode, resilver and then detach the failing drive, and then boot down the server and move the replacement drive to the correct SSD drive slot in the back of the server?

Thanks in advance for any advice you may offer.

~Doug


----------



## tOsYZYny (Nov 11, 2020)

Hi Doug,

That seems complicated ...

Is it possible to "simulate" what you want to do with some other drives you have lying around?  You can replicate the setup as closely as possible so you are comfortable with what you're doing.

Next, I only partially understand what you're doing.  I think you're trying to accomplish having different RAID levels for each mount point / use.

I would still think you could take a snapshot of a volume(s) and at least back those up somewhere for the time being.  I would create a backup pool that is a mirror so there is some fault tolerance there.  If you make a mistake or the drive dies, you have a backup.

Once you have that backup, then you can rebuild new drives however you like.

For my setup (workstation / router), I have no replication or RAID.  I have spare drives and have a "backup" / image of my systems and can easily rebuild them.  I back up that configuration across several devices, some remote, so I have > 3 copies at all times (in a non-mirrored configuration, so it could be susceptible to silent data loss).  For my media, I use mirroring.

Hope that helps.

Walter


----------



## ralphbsz (Nov 11, 2020)

Some random observations: Disk ada0 seems worn out. Total write traffic is 1826 x the capacity (user writes, attribute 241, divided by logical capacity). After correcting for write sharing and overprovisioning, the average number of write cycles is 5604 (attribute 173). Typical NAND cell lifetime is 10^(3-4) cycles, so this one should be retired. The disk is also ~5 years old. Disk ada1 is very similar. I would replace both, obviously one at a time.
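The wear estimate can be reproduced from the raw SMART values for ada0 posted earlier. This sketch ignores the GiB/GB difference, as the estimate above does, and adds the ratio of NAND writes to host writes (commonly called write amplification), which is not something stated in the thread.

```shell
#!/bin/sh
# Raw values copied from the smartctl output for ada0.
HOST_WRITES_GIB=467347     # attribute 241, Total_Writes_GiB
NAND_WRITES_GIB=1462359    # attribute 233, Total_NAND_Writes_GiB
CAPACITY_GB=256            # nominal capacity; GiB vs GB is deliberately ignored

# How many times the host has written the full capacity of the drive.
drive_writes() {
    awk -v w="$HOST_WRITES_GIB" -v c="$CAPACITY_GB" 'BEGIN { printf "%.0f\n", w / c }'
}

# NAND bytes written per host byte written.
write_amplification() {
    awk -v n="$NAND_WRITES_GIB" -v w="$HOST_WRITES_GIB" 'BEGIN { printf "%.2f\n", n / w }'
}

echo "full-drive writes: $(drive_writes)"              # 1826
echo "write amplification: $(write_amplification)"     # 3.13
```

Swapping in the ada1 values (467334 and 1457799) gives nearly the same figures, which fits the remark that the two disks are very similar.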

About how to create the logs and caches: Do you need to recreate them in place? I think they only exist for performance purposes. How about completely destroying them, replacing the two disks (get the data resilvered), then creating new logs and caches from scratch? And while you are at it, think through whether the partition (slice) sizes and arrangement are still the way you want them?


----------



## PMc (Nov 11, 2020)

What you have is mainly a mechanical problem. It would be simple (and safest) to add a third disk to the mirror, then remove the flawed one.

Your partitioning is nice, here you don't need to care about disk size, and can treat each partition individually. Read this as: you _have to_ treat each partition individually.

Concerning cache and log: the cache can be removed at any time, and recreated later in any place. You will lose the content, and it will rebuild when recreated - there is no way around that. So performance will suffer for some time.
The log is mirrored, and behaves like any other piece of zfs mirror. The log is a safety measure for power loss; it may be possible to remove it and zfs would fall back to inline logging (with performance impact, etc.). But usually it is treated like any mirror (add and remove copies for replacement).



dougs said:


> I am unsure as to what the next step should be. I've seen some postings where it states to use *zpool attach zroot /dev/ada0p3*. But that doesn't address the issue of creating the log0 and the cache0 mirrors. Should I use *zpool replace zroot ada0*?


No! The log and cache are entirely separate matters! You have to treat the log AND the cache AND the zroot each for themselves, then replace the drive, then recreate each of them as appropriate. You should NOT zpool attach/replace/whatever ada0! ada0 is of no concern to your ZFS, only ada0p3/4/5 are!

Now get yourself two USB sticks, create some partitions on them, and play with that attach and replace stuff until you know what you're doing.

Obviously you have to detach the swap partitions also before plucking the drive, otherwise the system will crash. Again, separate issue.
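To make "treat each partition for itself" concrete, here is a dry-run sketch that only prints the commands for the pull-and-replace route. It assumes the `*0` labels (disk0, log0, cache0) sit on ada0 and the `*1` labels on ada1, as the gpart commands in the question suggest, and that ada0p6 is the swap partition in use; verify both before running anything for real.

```shell
#!/bin/sh
# Dry run only: prints the commands, executes nothing.
# Assumption: disk0/log0/cache0 are on ada0, disk1/log1 are on ada1.
run() { printf '+ %s\n' "$*"; }

before_pulling_ada0() {
    run swapoff /dev/ada0p6               # release the active swap partition
    run zpool remove zdata gpt/cache0     # cache: simply drop it
    run zpool detach zdata gpt/log0       # log mirror: detach one side
    run zpool detach zroot gpt/disk0      # root mirror: detach one side
}

after_repartitioning_ada0() {
    run zpool attach zroot gpt/disk1 gpt/disk0   # resilver the root mirror
    run zpool attach zdata gpt/log1 gpt/log0     # re-mirror the log
    run zpool add zdata cache gpt/cache0         # recreate the cache
    run swapon /dev/ada0p6
}

before_pulling_ada0
after_repartitioning_ada0
```

Note that every command names a partition label or partition device, never the whole disk ada0. Practising the same sequence on USB sticks first, as suggested above, is a cheap way to confirm the behaviour.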



dougs said:


> Will the above method work? Or would it be easier to simply add the replacement drive to the mirror, add the bootcode, resilver and then detach the failing drive, and then boot down the server and move the replacement drive to the correct SSD drive slot in the back of the server?


It would be safer. These drives are the same brand and appear to have used up their TBW, so I'm actually wondering why only one shows these errors and not the other. Read this as: it is safer to use different brands of disk for mirrors. (Same brand has a performance advantage with raid/stripe, not so with mirror).


----------

