# Reproducible ZFS panic ("vrele: negative ref cnt")



## exscape (May 10, 2009)

Me again, with another OpenSolaris-related panic. I stumbled upon this while making a backup script to mirror my root pool via ggatel to a file on another computer. It occasionally panicked when the backup (zfs send | zfs recv) was complete, and sometimes when rebooting (I presume zpool export -a or somesuch is run on shutdown).
Anyway, here's a script that manages to crash my computer every time.

[root@chaos /usr/obj/usr/src/sys/DTRACE]# uname -a
FreeBSD chaos.exscape.org 7.2-RELEASE FreeBSD 7.2-RELEASE #0: Sat May  9 13:10:23 CEST 2009     root@chaos.exscape.org:/usr/obj/usr/src/sys/DTRACE  amd64


```
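#!/bin/sh
# First run: create a backing file of roughly 90 MB, attach it as ggate1532,
# and build a pool with a single file on it.
if [ ! -f "/usr/zfscrash" ]; then
        cd /usr
        dd if=/dev/zero of=./zfscrash bs=1000k count=90
        ggatel create -u 1532 /usr/zfscrash
        zpool create crashpool ggate1532
        touch /crashpool/test
fi

# Later runs: the previous run left the pool exported, so import it again.
if [ ! -f "/crashpool/test" ]; then
        zpool import crashpool
fi

# The export/import cycle that eventually ends in the panic.
zpool export crashpool
zpool import crashpool
touch /crashpool/test
zpool export crashpool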
```
Running the above in a loop (e.g., in bash, "while :; do /bin/sh crashpool.sh; done") panics within a minute or so. And, as I said, this does happen in real-world scenarios as well.
I'm not sure if it's ggate related or not, as I don't have a real disk to try it with.
(Oh, and do a "ggatel destroy -u 1532; rm /usr/zfscrash" later, of course.)
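For anyone who wants to try this with a bounded number of runs instead of an endless loop, here is a minimal sketch. The names `SCRIPT`, `MAX_RUNS`, `DRY_RUN`, `run`, `repro_loop` and `cleanup` are made up for illustration and are not part of the original script. By default it only prints the commands it would run; set `DRY_RUN=0` (as root) to actually execute them.

```shell
#!/bin/sh
# Hypothetical wrapper around the reproduction script above.
# All variable and function names here are illustrative assumptions.
SCRIPT=${SCRIPT:-./crashpool.sh}   # path to the script from the post
MAX_RUNS=${MAX_RUNS:-20}           # stop after this many runs
DRY_RUN=${DRY_RUN:-1}              # 1 = only print commands, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run the reproduction script up to MAX_RUNS times, stopping on failure.
repro_loop() {
    i=1
    while [ "$i" -le "$MAX_RUNS" ]; do
        run /bin/sh "$SCRIPT" || return 1
        i=$((i + 1))
    done
}

# The cleanup mentioned in the post: detach the ggate unit, remove the file.
cleanup() {
    run ggatel destroy -u 1532
    run rm /usr/zfscrash
}

repro_loop && cleanup
```

If the machine panics partway through, the cleanup step obviously never runs, so the ggate device and backing file have to be removed by hand after the reboot.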


```
[root@chaos /usr/obj/usr/src/sys/DTRACE]# kgdb kernel.debug /var/crash/ZFS_CRASH.vmcore 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:
panic: vrele: negative ref cnt
cpuid = 0
Uptime: 10m26s
Physical memory: 2031 MB
Dumping 133 MB: 118 102 86 70 54 38 22 6

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /bootdir/boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /bootdir/boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/smbfs.ko...Reading symbols from /bootdir/boot/kernel/smbfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/smbfs.ko
Reading symbols from /boot/kernel/libiconv.ko...Reading symbols from /bootdir/boot/kernel/libiconv.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/libiconv.ko
Reading symbols from /boot/kernel/libmchain.ko...Reading symbols from /bootdir/boot/kernel/libmchain.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/libmchain.ko
Reading symbols from /boot/kernel/geom_gate.ko...Reading symbols from /bootdir/boot/kernel/geom_gate.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/geom_gate.ko
#0  doadump () at pcpu.h:195
195             __asm __volatile("movq %%gs:0,%0" : "=r" (td));
(kgdb) bt
#0  doadump () at pcpu.h:195
#1  0xffffffff80517f28 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0xffffffff8051836c in panic (fmt=0xffffffff808c902c "vrele: negative ref cnt") at /usr/src/sys/kern/kern_shutdown.c:574
#3  0xffffffff8059c7f4 in vrele (vp=0x0) at /usr/src/sys/kern/vfs_subr.c:2146
#4  0xffffffff804e615e in fdfree (td=0xffffff0002ab8a50) at /usr/src/sys/kern/kern_descrip.c:1777
#5  0xffffffff804f2bf3 in exit1 (td=0xffffff0002ab8a50, rv=0) at /usr/src/sys/kern/kern_exit.c:284
#6  0xffffffff804fe093 in kthread_exit (ecode=0) at /usr/src/sys/kern/kern_kthread.c:149
#7  0xffffffff80d2479d in spa_async_thread () from /boot/kernel/zfs.ko
#8  0x0000000000000000 in ?? ()
#9  0xffffff0002ab8a50 in ?? ()
#10 0xfffffffebe80bc80 in ?? ()
#11 0xffffff001007b478 in ?? ()
#12 0xffffff001044c800 in ?? ()
#13 0xfffffffebe80bc70 in ?? ()
#14 0xffffffff804f433f in fork_exit (callout=Cannot access memory at address 0xffffffffffffffc0
) at /usr/src/sys/kern/kern_fork.c:810
Previous frame inner to this frame (corrupt stack?)
(kgdb)
```
I realize the backtrace isn't that helpful: it gives no clue as to WHY this happens. I'm not a kernel developer (nor any other kind of professional developer).
My kernel config is a GENERIC with the three magical DTrace lines added (although the DTrace modules are NOT loaded as I type this... nor were they at the time of the crash):

```
+options        KDTRACE_FRAME           # Ensure frames are compiled in
+options        KDTRACE_HOOKS           # Kernel DTrace hooks
+options        DDB_CTF                 # all architectures - kernel ELF linker loads CTF data
```

Any ideas? Should I file a PR for this as well?
I was going to test in CURRENT, but my first build failed with "utsname undefined" and 1000 page faults on boot (I had to step through to be able to read the text because it scrolled faster than the monitor could draw), which was later fixed. The second build booted into single user, but I got "out of memory" while trying to mount /usr. A tad too unstable for my taste.


----------



## exscape (May 12, 2009)

After reproducing this in a VMware virtual machine, on a completely clean install, I filed a PR. http://www.freebsd.org/cgi/query-pr.cgi?pr=134496 
(Link doesn't work as I post this, but I guess it will shortly.)
All I needed to reproduce was to create a pool, export it, import it, export it... a few times, and then it panics.
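A rough sketch of that bare create/export/import cycle, without the ggate setup. `POOL`, `VDEV`, `CYCLES`, `DRY_RUN`, `run` and `cycle_pool` are placeholder names chosen for illustration, and `da1` is only an assumed spare disk; the post does not say which device was used in the VM. It prints the commands by default and only executes them with `DRY_RUN=0`.

```shell
#!/bin/sh
# Hypothetical minimal cycle; every name below is an illustrative assumption.
POOL=${POOL:-crashpool}
VDEV=${VDEV:-da1}          # assumed spare disk; substitute your own
CYCLES=${CYCLES:-10}       # number of export/import rounds
DRY_RUN=${DRY_RUN:-1}      # 1 = only print commands, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create the pool once, then export and import it CYCLES times.
cycle_pool() {
    run zpool create "$POOL" "$VDEV" || return 1
    n=0
    while [ "$n" -lt "$CYCLES" ]; do
        run zpool export "$POOL" || return 1
        run zpool import "$POOL" || return 1
        n=$((n + 1))
    done
    # Leave the pool exported, as the original script does.
    run zpool export "$POOL"
}

cycle_pool
```

Note that `zpool create` will overwrite whatever is on the device named in `VDEV`, so check it twice before switching off the dry run.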


----------



## trev (May 17, 2009)

The PR is now waiting on your feedback.


----------



## graudeejs (May 17, 2009)

I get this panic (vrele: negative ref cnt) when I shut down or reboot.

```
FreeBSD 192.168.128.100 7.2-RELEASE FreeBSD 7.2-RELEASE #0: Mon May  4 12:48:32 EEST 2009     killasmurf86@127.0.0.1:/usr/obj/usr/src/sys/killabsd  i386
```

I have 2 ZFS pools, one per disk.


----------



## exscape (May 21, 2009)

I can't provide feedback, sorry.  
Running 8-CURRENT now, and this bug isn't there anymore.


----------



## graudeejs (May 21, 2009)

Yup, I'm running 8 as well, without this problem


----------

