# Problematic FreeBSD 9.3-10.1 update



## gessel (Feb 16, 2015)

I tried to update my 9.3 system to 10.1 and had it go somewhat sideways as I was leaving the country.  Foolish timing, but it was one of the tasks I intended to accomplish while I was home. 

As I run a custom kernel, I followed the custom kernel instructions and the methods I have used for `buildworld` and `installworld` up through 9.3.  Sadly, those are on a wiki in a jail on the system which is now in a less than optimal state, so I can't enumerate them from memory, but I got through the update, reboot, and mergemaster, and then went to update ports and couldn't.  I might have been hit by this bug, which looks like this one too, but I didn't know it at the time.

I also switched to pkg install (like the poor guy above) because I was getting the same errors for ports.  I tried to package install GCC and Clang, but that didn't seem to clear up the library problems.

I also tried to switch to `freebsd-update`, which may have been a suboptimal move after trying `buildworld`.

As I flew out of SFO to Iraq, I had one Mosh session still alive and SSH wasn't restarting because of library problems, and then the client died at the airport.  Sad moment. `tmux`...

But the jails were still running fine so I intended to wait until I got home again before risking any more inexpert meddling and also hoped that better advice might appear with time.  Everything was fine until a power failure killed the system today and it won't reboot.

I'm on ZFS, and I'm pretty sure ZFS won't load because the libraries are munged.

```
trying to mount root from zfs:zroot failed with error 2 unknown file system
```

I was smart enough to stick a FreeBSD 10.1 DVD into the drive before leaving and from that I've been able to mount the system, restart networking, chroot into the dead system and cause further problems.

`# uname -a`
returns:

```
FreeBSD  10.1-RELEASE FreeBSD 10.1-RELEASE #0 r274401: Tue Nov 11 21:02:49 UTC 2014  root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
```
but now `# freebsd-update upgrade -r 10.1-RELEASE` gives me:

```
freebsd-update: Cannot upgrade from 10.1-RELEASE to itself
```
and, likely part of the root problem of not rebooting, `# zfs` yields:

```
internal error: failed to initialize ZFS library
```
My inclination is to try overwriting (as AntumDeluge seemed to) per these instructions.

Does this sound like a reasonable plan or have better strategies emerged for recovering from this?


----------



## gessel (Feb 16, 2015)

I have a strategy which should work, and I'm documenting it as I go in case others end up in a similar conundrum.  I suspect there was a short window in which the FreeBSD 9.x to 10.1 upgrade process had a serious flaw which resulted in a compromised system.  I do not know how to "fix" that problem yet, but I don't think it is happening any more.  Because I left the system compromised but operating until a power failure hit it 2 months later, I may be the last person to have to deal with this.  But just in case:

*Problem:*
My specific details are above, but I started to get errors like:

```
/usr/local/llvm34/bin/clang -c -O2 -pipe -fno-strict-aliasing -march=amdfam10 -std=c99 -Wall -Wredundant-decls -Wnested-externs -Wstrict-prototypes -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual -Wundef -Wno-pointer-sign -fformat-extensions -Wmissing-include-dirs -fdiagnostics-show-option -Wno-error-tautological-compare -Wno-error-empty-body -Wno-error-parentheses-equality -Wno-error-unused-function -nostdinc -I. -I/usr/src/sys -I/usr/src/sys/contrib/altq -I/usr/src/sys/contrib/ipfilter -I/usr/src/sys/dev/ath -I/usr/src/sys/dev/ath/ath_hal -I/usr/src/sys/contrib/dev/ath/ath_hal -I/usr/src/sys/contrib/ngatm -I/usr/src/sys/dev/twa -I/usr/src/sys/dev/cxgb -I/usr/src/sys/dev/cxgbe -I/usr/src/sys/contrib/libfdt -D_KERNEL -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -mno-aes -mno-avx -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fstack-protector /usr/src/sys
/amd64/amd64/genassym.c
clang: error: unknown argument: '-fformat-extensions'
*** Error code 1
```

Basically, no ports would build, very much like the issues AntumDeluge, shuxuef, and gustopn reported (although gustopn's problems look a little different.)

I left the system, as noted above, and after a power failure, it would not boot.  I got

```
trying to mount root from zfs:zroot failed with error 2 unknown file system
<mountroot>
```

*Repairs:*
Repairs are ongoing as of this update, but I had a few things going for me which I don't think I could have done without:

- FreeBSD 10.1 bootable DVD in the system's drive,
- Remote console connection (KVM over IP: I'm doing this from Iraq to a system in California),
- System behind a firewall with OpenVPN set up to provide access to the local LAN,
- A large, accessible NAS device operational on the LAN (FreeNAS) with an NFS server running,
- My system is configured with all the significant customized stuff running in jails.

The next useful thing I did was to reboot (just pressing Enter at the `<mountroot>` prompt will reboot), catch the BIOS with `F12`, and select the CD as the boot device to boot from the FreeBSD 10.1 DVD.  I selected the LiveCD option at the prompt, booted into the default multi-user mode and got a console on the LiveCD.

*Step 1: Mount the dead system*
I found this reference useful and executed the following:
`# zpool import -f -R /mnt zroot`

I had to use the -f flag to force the import, as the pool wasn't cleanly exported when the power went out.  If this works, and you can navigate to your file system at /mnt, then your system can be salvaged. Yay?

*Step 2: Get SSH running.*
I didn't originally do it in this order, but getting SSH running makes life much simpler. I found this handy guide very useful:

GET A WRITEABLE DIRECTORY STRUCTURE FOR CONFIG FILES

```
# mkdir /tmp/etc
# mount_unionfs /tmp/etc /etc
# mkdir /tmp/root
# mount_unionfs /tmp/root /root
```

SET A PASSWORD FOR ROOT BEFORE ENABLING NETWORKING

```
# passwd
```

NETWORKING WORKING

```
# ifconfig bce0 inet [IP.ON.YOUR.NET] netmask [YOUR.NETS.NET.MASK]
# route add default [IP.OF.YOUR.GATEWAY]
# ee /etc/resolv.conf
-> nameserver [A.LOCAL.NAME.SERVER]
-> nameserver 8.8.8.8
```

CONFIGURE AND START SSHD

```
# ee /etc/ssh/sshd_config
-> PermitRootLogin yes
# /etc/rc.d/sshd onestart
```

At this point `ping www.google.com` should yield the expected results and you should be able to SSH into the 10.1 DVD LiveCD.  Yay?

*Step 3: chroot into the dead system*
I suspect you don't really need to do this, but as I'm going to use ezjail-admin to collect my data, and ezjail-admin isn't installed on the DVD, it is helpful to be able to execute what you can from the installed OS.  Plus it gives some useful diagnostic hints.
`# chroot /mnt`
should put you into your host system.

At this point I tried a few ZFS commands and got errors like this:

```
# zpool import -f bootpool
link_elf_obj: symbol sdt_probe_func undefined
linker_load_file: Unsupported file type
internal error: failed to initialize ZFS library
```


```
# zfs mount -a
link_elf_obj: symbol sdt_probe_func undefined
linker_load_file: Unsupported file type
internal error: failed to initialize ZFS library
```

That ain't right.  I went through a few different variations of things like `freebsd-update rollback`, using svn to repopulate /usr/src and /usr/ports, and a few other trials that were slightly above my skill level and messed things up even more than they were.  Now `portsnap fetch` and `freebsd-update` are reporting signature errors of various sorts.  I'm not helping.

*Step 4: Acceptance.  Backup what you have.*
At this point I have a badly borked system I rely on for network services, and borked in a way that my more FreeBSD-inclined friends say "uh oh, I'm not sure" about.  If they don't have an "oh, just ..." response, I'm in trouble.  Time to do a system install and overwrite the mistakes under a pristine layer of fresh data. I meant to do that.  Really.

Fortunately I had some disk space available and I had previously configured my NAS system to serve NFS volumes to various mountpoints for access by the system.  There's some configuration on the host to do to get this to work, and this step won't work unless you get an NFS client working on your system (see this helpful guide).  If so,  the following will use `ezjail-admin` to archive directly to the NFS mountpoint.  You could archive locally and SCP or FTP the jail archives off manually.

`# mount -v [IP.OF.NAS.BOX]:/mnt/Tank/backup /mnt`

I did this as sort of a belt & suspenders backup, which probably wasted a lot of time, but _data_.
`# tar -cjf /mnt/jails_backup.tar.bz2 /usr/jails`
The -j flag uses bz2, which is really slow and probably a total waste of time, but this is going to be a big archive.  It's also redundant and can probably be skipped, because the useful step, following this fine guide, is:
`# ezjail-admin archive -A -d /mnt/ezjail_archives/`
(note, because of the mess here, jails are quite off already).

There seems to be a limitation in the cpio format (or tar format) used by ezjail-admin, possibly a file name length limit of 100 characters.  I'm getting errors like

```
pax: Cpio header field is too small to store file
```
So I'm glad my ssh session is logging to a file (and scrollback is set to 2000 lines) so I can parse the error messages into a tar file name list to call with the -T option later and then restore into the restored jail following this guide.
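
One way to turn the logged errors into a list for `tar -T` is a small filter. This is only a sketch: it assumes each error line carries the affected path right after the message text (the path is not shown in the excerpt above), and the function name `extract_missed_files` is made up for illustration.

```shell
#!/bin/sh
# Sketch: pull file paths out of logged pax errors so tar -T can pick them up.
# Assumption: each error line looks like
#   pax: Cpio header field is too small to store file <path>
extract_missed_files() {
    # $1 = saved session log; prints each affected path once, sorted
    sed -n 's/^pax: Cpio header field is too small to store file //p' "$1" | sort -u
}
```

Something like `extract_missed_files session.log > /mnt/ezjail_archives/ezjail_backup_missed.txt` would then produce the list; check it by eye before feeding it to tar, since the log format is a guess.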


And now I wait while this backs up... in the meantime, I will back up all the customized bits of the host system (host of the jails, the nomenclature here is a little unclear to me).  The list of the files that I think are important will be the next update...


----------



## gessel (Feb 16, 2015)

*Backups...*
To archive specific files for later recovery, I used the -T option from this guide.  My first step was to archive the 28 or so files that cpio couldn't handle during the `ezjail-admin archive` operation. I copied their names out of the ssh scrollback, edited them into a file name list, then FTP'd that list back to the NAS to call from tar like so:
`# tar -c -v -j -T /mnt/ezjail_archives/ezjail_backup_missed.txt -f /mnt/ezjail_archives/missed_mail_jail-201502152136.38.tar.bz2`
(note it was all mail files from 2002 and probably corrupt files in this case, unlikely to be a tragedy if lost)

Also note that very large files (in my case, backups stored within the file structure) are too large to back up with the cpio format.  A jail with backups or media files in it is likely to run into this issue.  The above command is a relatively convenient way to back these up if they're needed.

Some directories are pretty important.
Below, the command substitution ``(date +"%Y%m%d")`` appends the current date, which matches the `ezjail-admin archive` format and makes it a bit easier to find things.

I'm hoping to sort out the versioning with mergemaster following a buildworld, to come later.


```
# tar -cJf /mnt/etc-`(date +"%Y%m%d")`.tar.xz /etc
# tar -cJf /mnt/home-`(date +"%Y%m%d")`.tar.xz /usr/home
# tar -cJf /mnt/local-`(date +"%Y%m%d")`.tar.xz /usr/local
# tar -cJf /mnt/root-`(date +"%Y%m%d")`.tar.xz /root
# tar -cJf /mnt/openssl-`(date +"%Y%m%d")`.tar.xz /usr/local/openssl
# tar -cJf /mnt/var_mail-`(date +"%Y%m%d")`.tar.xz /var/mail
# tar -cJf /mnt/var_cron_tabs-`(date +"%Y%m%d")`.tar.xz /var/cron/tabs
```

I'm not sure _everything_ in /var has to be restored, but better to have it and not need it...

```
# tar -cJf /mnt/var-`(date +"%Y%m%d")`.tar.xz /var
```
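
The repeated commands above could be wrapped in a small helper. This is a minimal sketch, not something from the original session: `backup_dir` is a hypothetical name, it expects an absolute source path, and it writes an uncompressed archive (add `-J` to match the commands above if you want xz).

```shell
#!/bin/sh
# Sketch: archive one directory into a destination with a date-stamped name.
# backup_dir is a hypothetical helper, not part of FreeBSD or ezjail.
backup_dir() {
    src=$1      # absolute path of the directory to archive, e.g. /usr/local
    dest=$2     # directory that receives the archive, e.g. the NFS mount
    name=$3     # label used in the archive file name, e.g. local
    stamp=$(date +"%Y%m%d")
    out="$dest/$name-$stamp.tar"
    # -C / with a relative path stores members without a leading slash
    tar -cf "$out" -C / "${src#/}" && echo "$out"
}
```

For example, `backup_dir /usr/local /mnt local` would write `/mnt/local-YYYYMMDD.tar` and print that path.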

I created a local file with the following:

`# ee /root/key_configs`



```
/boot/loader.conf
/etc/rc.conf
/etc/login.conf
/etc/fstab
/etc/aliases
/etc/aliases.db
/etc/group
/etc/hosts
/etc/motd
/etc/named.conf
/etc/master.passwd
/etc/passwd
/etc/periodic.conf
/etc/make.conf
/etc/newsyslog.conf
/etc/resolv.conf
/etc/sysctl
/etc/sysctl.conf
/etc/syslog.conf
/etc/nsswitch.conf
/etc/profile
/etc/resolv.conf
/etc/src.conf
/etc/mail/aliases
/etc/mail/aliases.db
```

and then ran:
``# tar -cJv -T /root/key_configs -f /mnt/key-configs-`(date +"%Y%m%d")`.tar.xz``
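
A hand-typed list like this tends to pick up duplicates and paths that don't exist on a given system (this one lists /etc/resolv.conf twice). A small filter can tidy it up before tar reads it. This is a sketch only, and `existing_paths` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the unique entries of a path list that actually exist.
# existing_paths is a hypothetical helper name.
existing_paths() {
    # $1 = file with one path per line
    sort -u "$1" | while IFS= read -r path; do
        if [ -e "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```

Redirect its output to a new file, for example `existing_paths /root/key_configs > /root/key_configs.clean`, and pass that file to `tar -T` instead.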

The rest basically follows the same process as FreeBSD From Scratch.

*What packages have I installed?*
Over the years, packages get installed... Maybe you're diligent about removing any unused package like you should be, or maybe not so much... either way I'm going to back up the packages list with `` # pkg version >> /mnt/pkg_version-`(date +"%Y%m%d")` ``
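
To reuse that list later, the version suffixes have to come off. The sketch below assumes each saved line has the form `<name>-<version> <status>` (for example `bash-4.3.33   =`), which is how I understand `pkg version` output; `pkg_names` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: reduce saved `pkg version` output to bare package names.
# Assumption: each line is "<name>-<version> <status>", e.g. "bash-4.3.33   =".
pkg_names() {
    # $1 = saved list; take the first column, drop the trailing "-<version>"
    awk 'NF { print $1 }' "$1" | sed 's/-[^-]*$//' | sort -u
}
```

The resulting names could be handed to `pkg install` or used as a checklist when rebuilding from ports. Note that the list includes dependencies as well as the packages you asked for.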

There are automated ways to reinstall all of these (poudriere, for example), but I'll just add them back by hand.

The next step is terrifying.  Reinstall FreeBSD from scratch.

*IS EVERYTHING BACKED UP?*

Next...

1. Wipe a lifetime's work off the disks
2. Install FreeBSD 10.1 from the installation media
3. Boot into the new, empty, sadly vacant FreeBSD install
4. Reinstall the packages
5. Restore config files from backup
6. Reboot and make sure that didn't screw everything up again
7. Restore jails from backup
8. Start jails to restore services
9. Do a buildworld/installworld/mergemaster cycle to update the config files
10. Hope future updates aren't such a nightmare
11. Seriously consider the FreeBSD From Scratch parallel system concept


----------



## wblock@ (Feb 16, 2015)

That article is fairly old (2008) and might require adjustment to work with current releases of FreeBSD.

I do major version upgrades like this:


1. Back up the old system.  Do not skip this step.
2. Remove the old disk, install a new blank disk.
3. Install the new version of FreeBSD on the new disk.
4. Connect the old disk as a secondary drive.  (Optional: `rsync` the important parts of the old drive into a temporary directory on the new drive, then disconnect the old drive.)
5. Copy data from the old drive to the new.
6. Update programs and configurations.
7. Detach the old drive and put it somewhere safe as another backup.  (This does not replace the main backup, it's additional.)
8. Run the new system.  If any data or configurations are missing, copy them from the old drive.


----------



## gessel (Feb 17, 2015)

*Recovering:*
I restarted to the install DVD and followed roughly the steps outlined here to wipe my disks and reinstall FreeBSD, then rebooted into my new, empty system.

Next, from the command prompt, I edited /etc/ssh/sshd_config to permit root login so I could continue over SSH in relative comfort. I'll fix that later, but for now, I'm just getting things running.

`# service sshd restart` enables the new config.

I like compression because disk space, and LZ4 doesn't have much (any?) visible impact on my system performance, but saving 30% of disk space does, so I turn it on following this guide:


```
# zpool set feature@lz4_compress=enabled zroot
# zfs list
# zfs set compression=lz4 zroot/usr
etc... (the rest of the mountpoints you want to compress)
```
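
For more than a couple of datasets, the per-dataset commands can be generated rather than typed. This is a sketch under my own assumptions: `lz4_commands` is a hypothetical helper that only prints the commands for review and does not touch ZFS itself.

```shell
#!/bin/sh
# Sketch: print (not run) the compression command for each dataset given.
# Dataset names are supplied by the caller; nothing here queries ZFS.
lz4_commands() {
    for dataset in "$@"; do
        printf 'zfs set compression=lz4 %s\n' "$dataset"
    done
}
```

For example, `lz4_commands zroot/usr zroot/var` prints two `zfs set` lines that can be checked and then pasted or piped to `sh`. Since compression is an inherited property, setting it once on a parent dataset may be all that is needed.
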
*Restoring files:*
The next step is remounting your files and restoring some key config files from backup using mergemaster and a temp directory. My backup was on an NFS share, so I enabled NFS client features:


```
# ee /etc/rc.conf
-> nfs_client_enable="YES" # This host is an NFS client (or NO).
-> nfs_client_flags="-n 4" # Flags to nfsiod (if enabled).
# nfsiod -n 4
```

and mounted my share with
`# mount -v [nfs.sys.ip.num]:/mnt/Tank/backup /mnt`

`# ls /mnt` shows the old files.  Yay.

Back up the new files before messing them up:

``# tar -cJf /mnt/etc-10.1-`(date +"%Y%m%d")`.tar.xz /etc``

Next is a cool trick from this fine site: using mergemaster as if we had meant things to go this way all along.


```
# mkdir /var/tmp/temproot
# tar -xvpf /mnt/backupfile.tar.xz -C /var/tmp/temproot
# mergemaster -ir
```

If you follow these directions, the old files you backed up take the place of mergemaster's temporary files: 'd' deletes the temporary (old) copy and keeps the freshly installed file, while 'i' installs your old file over it.

This is tedious, but you'd probably be doing it with mergemaster anyway if the update hadn't barfed.


----------



## gessel (Feb 18, 2015)

A few follow up notes - this is almost done (I had a huge backup file in one of my jails that got backed up and that redundant backup is consuming many hours of transfer time, alas)

*Jail Restoration*
One can merely tar and restore the entire jails directory, and it works fine, but you don't get the benefits of ezjail's full ZFS integration (creating mountpoints on a per-jail basis, at least, allowing per-jail ZFS optimization).  I just untarred and I'm not going back, but if it happens again, the restore process here seems far better.  You do have to rebuild your base jail and I'm not sure how to restore "flavors," but maybe that's better recreated anyway.

Once a jail is copied back (and /etc and /usr/local/etc have been mergemastered into the system), the jails seem to just start as normal.  I'll declare success when they've all copied back and I can make sure my database jail restarts and connects. 

One bonus over the jail archive/restore process I linked to is that, by merging rc.conf back, IPs are just where you left them (at least so far).

*User Restoration*
Restoring the /usr/home directory pretty much recreates users as long as the necessary files in /etc are properly restored/merged.  This guide has some useful advice.  For example, to rebuild the password database:


```
# cd /etc
# pwd_mkdb /etc/master.passwd
# rehash
```

The home directory structure isn't built by FreeBSD unless a non-root user is added.  Things like SSH login by public key won't work right (at least for me) until that structure is built.  Fortunately, it is easy.

`# adduser temp` creates the directory structure along with a sacrificial user. `# rmuser temp` collects that sacrifice and leaves the /home directories usable.

*Packages:*
I just rebuilt the host system packages from ports from the list of installed ports.  The only thing that has caused a hiccup so far is Ruby.

*Minor factoid:*
I customized make.conf and have a Barcelona (AMD)-based system.  I was declaring `CPUTYPE?=barcelona`, but didn't see it in the 10.1 list of recognized CPU types.  This reference seems to validate that declaring `CPUTYPE?=amdfam10` is the right way to optimize code for this platform and informs compilers that care about sse4.  Everything seems to be compiling properly.

Edit: midnight, Wednesday.
*Everything is back! Yay!*

The only edit is that using bz2 wasn't smart.  Storage is compressed with ZFS anyway, and it added a lot of overhead and a lot of time to the recovery.  Also, the ezjail archives are much smaller and more efficient than saving the whole directory, and are probably the way to go.

The system was offline for almost exactly 72 hours.  Approaching it immediately as a crash-rebuild and using ezjail's archive function could have cut that down to about 8-10 hours total.  Next time...


----------

