# ZFS panic after replacing log device



## Terry_Kennedy (Nov 13, 2010)

I'm posting this here and to the freebsd-stable and freebsd-fs mailing lists. Follow-ups should probably happen on freebsd-fs.

I have a ZFS pool configured as:

`# zpool create data raidz da1 da2 da3 da4 da5 raidz da6 da7 da8 da9 da10 raidz da11 da12 da13 da14 da15 spare da16 log da0`

where da1-16 are WD2003FYYS drives (2TB RE4) and da0 is a 256GB PCI-Express SSD (name omitted to protect the guilty).

The SSD has been dropping offline randomly - it seems that one or more flash modules pop out of their sockets and need to be re-seated frequently for some reason.

The most recent time it did that, I replaced the SSD with another one (for some reason, the manufacturer ties the flash modules to a particular controller, so just moving the modules results in an offline SSD and inability to manage it due to "license limits exceeded" or some such nonsense).

ZFS wasn't happy with the log device being changed and reported it as corrupted, with the suggested corrective action being to `zpool clear` it. I did that, then ran `zpool replace data da0 da0`, which claimed to resilver it successfully. I then ran `zpool scrub`, and the scrub completed with no errors. So far, so good.
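For readers following along, the sequence described above corresponds roughly to the sketch below. It is reconstructed from the description, not copied from a transcript: the pool name `data` and device `da0` come from the post, the exact arguments given to `zpool clear` and `zpool scrub` are assumed, and the `ZPOOL` variable and the function name `replace_log_device` are hypothetical, there only so the steps can be dry-run.

```shell
#!/bin/sh
# Sketch of the steps described in the post (not a verbatim transcript).
# ZPOOL is a hypothetical override so the sequence can be dry-run (ZPOOL=echo).
ZPOOL="${ZPOOL:-zpool}"

replace_log_device() {
    pool="$1"   # "data" in the post
    dev="$2"    # "da0" in the post

    # Clear the error state ZFS reported for the swapped log device.
    $ZPOOL clear "$pool" "$dev" || return 1

    # Replace the device in place; the new SSD shows up under the same name.
    $ZPOOL replace "$pool" "$dev" "$dev" || return 1

    # Start a scrub. This only kicks it off; on a real pool, wait for the
    # resilver to finish first and watch progress with "zpool status".
    $ZPOOL scrub "$pool" || return 1
}
```

Running it with `ZPOOL=echo` set and calling `replace_log_device data da0` prints the three commands instead of executing them. Note that on this pool the same sequence ended in the panic shown below, so it is a record of what was done, not a recommended procedure.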

However, any attempt to write to the array results in a near-immediate panic:


```
panic: solaris assert: sm->sm_space + size <= sm->sm_size, file: /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/space_map.c, line: 93 cpuid=2
```

(Screenshot here in case I mis-typed something).

This is repeatable across reboot / scrub / test cycles. The system is 8-STABLE as of Fri Nov  5 19:08:35 EDT 2010, and the on-disk versions are ZFS filesystem version 4 / pool version 15, the same as the kernel.

I know that certain operations on log devices aren't supported until pool version 19 or thereabouts, but the error messages and zpool command results gave the impression that what I was doing was supported and worked (when it didn't). If this is truly a "you can't do that in pool version 15", perhaps a warning could be added so users don't get fooled into thinking it worked?
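A quick way to check which versions are in play is sketched below. It assumes the pool is named `data` as above; the `ZPOOL` and `ZFS` variables and the function name `show_versions` are hypothetical, added only so the commands can be dry-run.

```shell
#!/bin/sh
# Sketch: report the on-disk versions and what the running kernel supports.
# ZPOOL and ZFS are hypothetical overrides for dry runs (e.g. ZPOOL=echo).
ZPOOL="${ZPOOL:-zpool}"
ZFS="${ZFS:-zfs}"

show_versions() {
    pool="$1"   # "data" in the post

    # On-disk pool version (15 on this system).
    $ZPOOL get version "$pool"

    # On-disk filesystem version of the pool's root dataset (4 on this system).
    $ZFS get version "$pool"

    # Pool versions the running kernel supports, with a one-line
    # description of the feature each version introduced.
    $ZPOOL upgrade -v
}
```

Calling `show_versions data` on the affected machine would show whether the kernel lists a pool version that covers the log device operation in question.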

I can give a developer remote console / root access to the box if that would help. I have a couple days before I will need to nuke the pool and restore it from backups.


----------



## Terry_Kennedy (Nov 16, 2010)

Terry_Kennedy said:

> I can give a developer remote console / root access to the box if that would help. I have a couple days before I will need to nuke the pool and restore it from backups.


I haven't heard from anyone who wants to look into this, and I need to get the pool back into service soon. If I don't get any requests to postpone or offers to investigate by 00:00 GMT on the 18th, I'll proceed with re-initializing the pool (minus the SSD, which is _persona non grata_).


----------

