# Segmentation fault while upgrading from 10.0-RELEASE to 10.1-RELEASE



## spiky (Nov 14, 2014)

Hi,

While upgrading to 10.1-RELEASE, everything was working okay for the first two commands:

```
[root@beasty ~]# freebsd-update -r 10.1-RELEASE upgrade
...
[root@beasty ~]# freebsd-update install
...
```

After rebooting successfully, I ran `freebsd-update install` once again and here's what I got:

```
[root@beasty ~]# freebsd-update install
Installing updates...Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
...
```

Eventually I had no choice but to Ctrl-C the whole process, since it kept printing the same error message over and over. After that, almost every command resulted in a segfault. For now I have done a `zfs rollback`, but I would obviously like to upgrade to 10.1 someday. I don't know where to start with this.


----------



## pvoigt (Nov 15, 2014)

That's exactly what I observed today when trying to upgrade from 10.0-RELEASE to 10.1-RELEASE. After the reboot I got the same segmentation faults, and I couldn't stop them with Ctrl-C either. My system became completely unusable: no SSH and no serial console. Hastily attaching a USB keyboard and a monitor revealed that even login dumped core. All I could do with my UFS root partition was to reformat it and restore from a fresh dump of my root file system, which I luckily had available.

My system is up with 10.0-RELEASE again but I would like to upgrade. After my bad experience I would greatly appreciate any help on how to proceed.

My system is:

```
# uname -a
FreeBSD spock 10.0-RELEASE-p12 FreeBSD 10.0-RELEASE-p12 #0: Tue Nov  4 05:07:17 UTC 2014  root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64
```

BTW: I tested the upgrade procedure beforehand in a virtual machine without any issue. The test system, however, does not carry the same number of ports and services. It is only a minimal system.

Regards,
Peter


----------



## talsamon (Nov 16, 2014)

Someone has filed PR 195061.


----------



## Juanitou (Nov 16, 2014)

Maybe this is related to some files missing from freebsd-update mirrors? (as it seems pvoigt is not using ZFS).


----------



## pvoigt (Nov 16, 2014)

Indeed, I am not using ZFS at all, just pure UFS and no RAID. Besides this I am using a GELI encrypted UFS home but it is not needed during the upgrade process.


----------



## wildtollwut (Nov 16, 2014)

I faced the same problem today when upgrading from 10.0-RELEASE to 10.1-RELEASE. I am using ZFS and a GELI encrypted root. Almost every command segfaults; `ls` and `mount` still work, however. Currently I'm desperately trying to recover the system.

Edit: Simple recovery failed, now the system is unbootable and even ls segfaults.

Any news regarding this?


----------



## arnov (Nov 16, 2014)

I had the same problem. I use LDAP in /etc/nsswitch.conf and Kerberos for authentication. When I removed these from nsswitch.conf (I had to use `cat >nsswitch.conf` to do that, since no editor would work) and from the pam.d files, a lot of commands worked again. ZFS still crashed. However, `freebsd-update IDS` showed that most files had not been upgraded. Unfortunately there is no `freebsd-update reinstall` or anything like that.

Because it was easier to just backup the configuration and reinstall 10.1 from scratch I did not look further. I am now reluctant to update my other systems from 9.3 to 10.1 before I know what went wrong. Does it have to do with LDAP and/or Kerberos?
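
To gauge how far a half-applied upgrade got, the `freebsd-update IDS` report can be summarised roughly as sketched below. This assumes the report lines have the form `<path> has SHA256 hash <x>, but should have <y>`, which is worth checking against your own output. The function names are made up for illustration, and paths containing spaces are not handled.

```sh
#!/bin/sh
# Hypothetical helper: read saved `freebsd-update IDS` output on stdin and
# print the path of every file reported with an unexpected SHA256 hash.
# Assumes lines like: "/bin/ls has SHA256 hash <x>, but should have <y>".
ids_mismatched_paths() {
    awk '/ has SHA256 hash / { print $1 }'
}

# Hypothetical helper: count the mismatched files per top-level directory,
# largest count first (for example "2 /lib").
ids_mismatch_summary() {
    ids_mismatched_paths |
        awk -F/ '{ n["/" $2]++ } END { for (d in n) print n[d], d }' |
        sort -rn
}
```

Usage would be something like `freebsd-update IDS > /tmp/ids.txt` followed by `ids_mismatch_summary < /tmp/ids.txt`, provided enough of the system still runs to produce the report.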


----------



## wildtollwut (Nov 16, 2014)

My system seems to be running again (provisionally, at least). I booted from USB and replaced /bin, /lib and /libexec. Somehow freebsd-update must have corrupted at least one of these directories.

Another run of freebsd-update breaks the system again. I have no idea what could be the cause. Currently I'm unable to update.

I'm fairly certain that this is not related to LDAP and Kerberos as I'm not running either of those.

Update: apparently, only /lib is corrupted. It suffices to replace the files in it by valid ones e.g. from a bootable ISO. However, applications in /usr like vim still segfault at termination. This only happens if /usr/local is mounted/present. I suspect that e.g. /usr/local/lib is also affected by the faulty update procedure.
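
To see which files actually differ from a known-good copy before replacing a whole directory, a comparison along these lines may help. It is only a sketch: the function name is hypothetical, and the reference tree is assumed to come from media of the same release the installed system is supposed to be running (otherwise nearly everything will differ).

```sh
#!/bin/sh
# Hypothetical helper: list regular files under a reference directory that are
# missing from, or differ byte-for-byte in, the installed directory.
# usage: list_changed_files /path/to/reference/lib /mnt/lib
list_changed_files() {
    ref=$1
    installed=$2
    # Walk the reference tree in a stable order and compare each file.
    ( cd "$ref" && find . -type f ) | sort | while IFS= read -r f; do
        if [ ! -e "$installed/$f" ]; then
            echo "missing: ${f#./}"
        elif ! cmp -s "$ref/$f" "$installed/$f"; then
            echo "differs: ${f#./}"
        fi
    done
}
```

Running it from the live environment, with the damaged system mounted under /mnt, avoids depending on the possibly broken libraries.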


----------



## pvoigt (Nov 16, 2014)

Some people reported similar errors on IRC, but I do not yet know why exactly freebsd-update fails. I can only suppose that it might be a combination of errors, such as incomplete mirrors and a bug in freebsd-update, but I do not know for sure. It is hard to find reliable information. I have been advised to build the base system and the kernel from source. This method is said to be more reliable than using freebsd-update.

Right now `make buildworld` and `make buildkernel` have just finished. Tomorrow I am going to do the rest of the upgrade process.

Regards,
Peter


----------



## talsamon (Nov 16, 2014)

http://blog.gmane.org/gmane.os.freebsd.stable



> The problems with running freebsd-update on 10.1-RC3 (including using
> freebsd-update to upgrade to -RC4 or -RELEASE) should now be fixed. The
> problem was due to some files being missing from freebsd-update mirrors
> and resulted from -RC3 going out at the same time as patches were being
> ...


----------



## pvoigt (Nov 17, 2014)

Thanks, talsamon, for pointing to the relevant portion of the above link. But I am still not sure if it fully applies because all people in this thread are upgrading from 10.0-RELEASE and not from 10.1-RC3. Or do I not understand you correctly?

Regards,
Peter


----------



## talsamon (Nov 17, 2014)

Yes, I think there are more problems than the ones mentioned in this statement. But I found it and thought I should post it. (I haven't seen anyone else post a link to it in the other threads.)


----------



## dR3b (Nov 17, 2014)

I had the same problem! The upgrade from 9.3-RELEASE-p5 ends with

```
Segmentation fault (core dumped)
```

A further test with another VM (ESXi) and 10.0-RELEASE-p12 caused no problems.


----------



## wildtollwut (Nov 17, 2014)

My system is running again after I extracted base.txz from 10.1 to / (not overwriting /usr, /etc, /var and the like). Still, vim was segfaulting when closing it. I could trace this back to a faulty libtspi.so (used by gnutls, for example). As long as it resides in /usr/local/lib, it causes segfaults in various programs.

Edit: I just built libtspi.so, i.e. security/trousers, from ports and installed it. The behavior is the same.

Another edit: I copied libtspi* from a working 10.1-RELEASE to my system while `freebsd-update IDS` was running. Just as the copying was finished I got lots of

```
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
Segmentation fault (core dumped)
...
```
from `freebsd-update`. This must be somehow related.


----------



## spiky (Nov 18, 2014)

Has anybody tried with the manual method (`make buildworld`)? As for me, I've decided to do a clean install and restore my configurations and data afterwards. Then, I upgraded my 10.0 jails to 10.1 with success using the manual method:


```
[root@beasty /usr/src]# make buildworld
...
[root@beasty /usr/src]# mergemaster -p -D /jails/cranky
...
[root@beasty /usr/src]# make -DBATCH_DELETE_OLD_FILES installworld delete-old delete-old-libs DESTDIR=/jails/cranky
...
[root@beasty /usr/src]# mergemaster -i -U -D /jails/cranky
```


----------



## pvoigt (Nov 18, 2014)

Yeah, I have been convinced by people on IRC to do `make buildworld`. I am a bit disappointed that there is no reliable information about why freebsd-update is failing. There should at least be some kind of official warning to hold off using freebsd-update until the cause of the failure is found. My picture of a rock-stable FreeBSD is somewhat shaken by my extremely bad experience with freebsd-update. I have never had such a harsh crash before, with a completely unresponsive system. At least not with a Unix system.

At first I did not dare to try the `make buildworld` process because of my lack of experience. But the whole process was straightforward and went very smoothly.

I finally decided on `make buildworld` because I cannot afford a longer server downtime, and reinstalling more than 900 ports, including reconfiguring them from scratch, would have taken too long. Using `make buildworld`, my effective server downtime was no more than two times two minutes, i.e. the two reboots.

Though not really necessary I am currently rebuilding all ports. Compared to building and installing the new base system and the new kernel this is even more time consuming. This is not only regarding the pure build time but mainly because some of the ports are not building at all. At least one has had an open PR for several months. I am currently skipping them and will investigate the details later.

Regards,
Peter


----------



## spiky (Nov 18, 2014)

I agree with you pvoigt. I'm also a bit disappointed with that.

By the way, as for you rebuilding the ports, have you looked at PKGNG? Using this in combination with the ports (only for packages which require custom compiling options) is a very effective way of managing packages. I think the official word is to use only one method but my experience so far using both is very good.


----------



## jb_fvwm2 (Nov 18, 2014)

Checking the Makefile, freebsd-update appeared about 2006.  I used /usr/src/UPDATING prior to that to do the `buildworld` cycle and kind of thought that the former would be used by those already experienced with the latter, seeing as how things can go awry with either. The latter would be a backup to the former.


----------



## dR3b (Nov 18, 2014)

Delete the following directories:

```
/boot/kernel.old
/boot/kernel.generic
```
After that `freebsd-update` runs without any errors.


----------



## mxms (Nov 18, 2014)

dR3b said:


> Delete the following folder:
> 
> ```
> /boot/kernel.old
> ...


It didn't work for me. Also, I can't execute `make buildworld` because that also ends in a segmentation fault.


----------



## gavin@ (Nov 18, 2014)

For anybody seeing the "Segmentation fault (core dumped)" messages still, can you provide the output of `dmesg | grep -A 8 ^CPU` and also show the content of your /etc/nsswitch.conf file please?

Also, if you `mv /etc/nsswitch.conf /etc/nsswitch.conf.o` do things start working again without having to do anything else?
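
For those who can still run basic commands, the requested information could be gathered into a single file with something like the sketch below. The function name and the default output path are hypothetical; the commands are the ones asked for above, plus `uname -a`.

```sh
#!/bin/sh
# Hypothetical helper: gather the information requested above into one file.
# usage: collect_segfault_report [nsswitch.conf path] [output file]
collect_segfault_report() {
    conf=${1:-/etc/nsswitch.conf}
    out=${2:-/tmp/segfault-report.txt}
    {
        echo "== uname -a =="
        uname -a
        echo "== CPU lines from dmesg =="
        # Falls back to a note if dmesg is unavailable or has no CPU lines.
        dmesg 2>/dev/null | grep -A 8 '^CPU' || echo "(no CPU lines found)"
        echo "== $conf =="
        cat "$conf" 2>/dev/null || echo "(not readable)"
    } > "$out"
    # Print the report location so it can be attached to a reply or PR.
    echo "$out"
}
```

If the kernel message buffer has already wrapped, /var/run/dmesg.boot is another place to look for the CPU lines.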


----------



## mxms (Nov 18, 2014)

```
# dmesg | grep -A 8 ^CPU
CPU: Intel(R) Atom(TM) CPU D525   @ 1.80GHz (1800.11-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x106ca  Family = 0x6  Model = 0x1c  Stepping = 10
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x40e31d<SSE3,DTES64,MON,DS_CPL,TM2,SSSE3,CX16,xTPR,PDCM,MOVBE>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 4294967296 (4096 MB)
avail memory = 4093018112 (3903 MB)
# cat /etc/nsswitch.conf
#
# nsswitch.conf(5) - name service switch configuration file
# $FreeBSD: release/10.0.0/etc/nsswitch.conf 224765 2011-08-10 20:52:02Z dougb $
#
group: files winbind
group_compat: nis
hosts: files dns
networks: files
passwd: files winbind
passwd_compat: nis
shells: files
services: compat
services_compat: nis
protocols: files
rpc: files
```
Removing /etc/nsswitch.conf didn't work.


----------



## spiky (Nov 18, 2014)

The following is from my clean install of 10.1, but I've restored nsswitch.conf as it was on 10.0.


```
[root@beasty ~]# cat /var/run/dmesg.boot | grep -A 8 ^CPU
CPU: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz (3292.60-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x306a9  Family = 0x6  Model = 0x3a  Stepping = 9
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7fbae3ff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  Structured Extended Features=0x281<FSGSBASE,SMEP,ERMS>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
[root@beasty ~]#
[root@beasty ~]# cat /etc/nsswitch.conf
#
# nsswitch.conf(5) - name service switch configuration file
# $FreeBSD: release/10.0.0/etc/nsswitch.conf 224765 2011-08-10 20:52:02Z dougb $
#
passwd: files ldap
##passwd: files winbind
group: files ldap
##group: files winbind
#group: compat
#group_compat: nis
##hosts: files dns mdns
##hosts: files mdns4_minimal [NOTFOUND=return] dns
hosts: files mdns dns
networks: files
#passwd: compat
#passwd_compat: nis
shells: files
services: compat
services_compat: nis
protocols: files
rpc: files
[root@beasty ~]#
```


----------



## wildtollwut (Nov 18, 2014)

Very interesting: for me, removing /etc/nsswitch.conf works (even if /usr/local/lib/libtspi.so is present) and the system doesn't segfault anymore.

My /etc/nsswitch.conf:

```
group: compat
group_compat: nis
hosts: files wins dns
networks: files
passwd: compat
passwd_compat: nis
shells: files
services: compat
services_compat: nis
protocols: files
rpc: files
```


```
CPU: Intel(R) Celeron(R) CPU  N2820  @ 2.13GHz (2133.47-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x30673  Family = 0x6  Model = 0x37  Stepping = 3
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x41d8e3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,SSE4.1,SSE4.2,MOVBE,POPCNT,TSCDLT,RDRAND>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x101<LAHF,Prefetch>
  Structured Extended Features=0x2282<TSCADJ,SMEP,ERMS>
  VT-x: PAT,HLT,MTF,PAUSE,EPT,UG,VPID
  TSC: P-state invariant, performance statistics
--
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
cpu0 (BSP): APIC ID:  0
cpu1 (AP): APIC ID:  2
```

Still, to get to this point, I had to replace /lib with a version from a 10.1-RELEASE image.


----------



## mxms (Nov 18, 2014)

wildtollwut said:


> Still, to get to this point, I had to replace /lib with a version from a 10.1-RELEASE image.


Tried it too but without success.


----------



## arnov (Nov 18, 2014)

```
# dmesg | grep -A 8 ^CPU

CPU: Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz (2392.56-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x6fb  Family = 0x6  Model = 0xf  Stepping = 11
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  VT-x: HLT,PAUSE
  TSC: P-state invariant, performance statistics
real memory  = 3221225472 (3072 MB)
# cat /etc/nsswitch.conf
#
# nsswitch.conf(5) - name service switch configuration file
# $FreeBSD: release/10.0.0/etc/nsswitch.conf 224765 2011-08-10 20:52:02Z dougb $
#
#group: compat
group: files ldap
#group_compat: nis
hosts: files dns
networks: files
#passwd: compat
passwd: files ldap
#passwd_compat: nis
shells: files
services: compat
#services_compat: nis
protocols: files
rpc: files
```
Symptoms: after a reboot I could not log in. In single user mode `ls` worked but `ls -l` would segfault. This made me assume that it had something to do with /etc/nsswitch.conf. I used `cat >nsswitch.conf` to create a minimal /etc/nsswitch.conf:

```
passwd: files
group: files
```
After that `ls -l` worked again. I still could not log in. I commented out Kerberos in /etc/pam.d/system and in /etc/pam.d/other after which I could log in again.

`freebsd-update IDS` showed that most files were not updated. Since it was a test system I decided to back up the configuration and install 10.1-RELEASE from scratch. I have not tried to run without /etc/nsswitch.conf.

Update: After rereading my post I realized that I did not mention that several commands still did work after my last change. `vi` worked after my change to /etc/nsswitch.conf. I could log in after my changes to the  pam.d files but `zfs` still segfaulted.
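
The nsswitch.conf change described above can be sketched as a small function that keeps a copy of the original file first. The function name and the `.orig` suffix are hypothetical; the two-line configuration is the one quoted above. As noted, this alone may not be enough (PAM and `zfs` were still affected here).

```sh
#!/bin/sh
# Hypothetical helper: keep a copy of the current nsswitch.conf and replace it
# with the files-only configuration quoted above.
# usage: write_minimal_nsswitch [path]   (defaults to /etc/nsswitch.conf)
write_minimal_nsswitch() {
    conf=${1:-/etc/nsswitch.conf}
    # Preserve the original once, so the LDAP entries can be restored later.
    if [ -f "$conf" ] && [ ! -e "$conf.orig" ]; then
        cp -p "$conf" "$conf.orig" || return 1
    fi
    # A here-document needs no editor, like the `cat >nsswitch.conf` trick.
    cat > "$conf" <<'EOF'
passwd: files
group: files
EOF
}
```

Calling `write_minimal_nsswitch` with no argument acts on /etc/nsswitch.conf; passing a path such as /mnt/etc/nsswitch.conf would apply it to a system mounted from rescue media.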


----------



## dR3b (Nov 18, 2014)

This is my /etc/nsswitch.conf:

```
#
# nsswitch.conf(5) - name service switch configuration file
# $FreeBSD: releng/9.3/etc/nsswitch.conf 224765 2011-08-10 20:52:02Z dougb $
#
#
#group: compat
group: files nis winbind
group_compat: nis
hosts: files dns
networks: files
#passwd: compat
passwd: files nis winbind
passwd_compat: nis
shells: files
services: compat
services_compat: nis
protocols: files
rpc: files
```


----------



## wildtollwut (Nov 18, 2014)

mxms said:


> Tried it too but without success.


Have you tried replacing also /libexec, /usr/lib and /usr/libexec?


----------



## mxms (Nov 18, 2014)

wildtollwut said:


> Have you tried replacing also /libexec, /usr/lib and /usr/libexec?


Thanks. Segmentation fault error seems to be gone now. I will check non-system software.


----------



## arnov (Nov 21, 2014)

Any news on this? I still have to upgrade two systems. Will `mv /etc/nsswitch.conf /etc/nsswitch.conf.o` before doing `freebsd-update -r 10.1-RELEASE upgrade` prevent it?


----------



## pvoigt (Nov 21, 2014)

arnov, I am regularly scanning IRC and the mailing list, but I have not heard about any reliable solution for `freebsd-update` yet. On the other hand, I have been successful with the `buildworld` process and can thus recommend going this way.

Regards,
Peter


----------



## wildtollwut (Nov 21, 2014)

I'd really like to know what's causing the segmentation fault. On my system, if I remove wins from the hosts line in /etc/nsswitch.conf, everything works normally. Apparently even when doing `ls` or `vi` some kind of host lookup is performed which mysteriously fails with the new kernel or parts of the userland.

I may have been premature by discounting the LDAP connection. Turns out I am also using LDAP via the installed samba4. After reinstalling databases/ldb samba works nicely except for the system-wide WINS host lookup.


----------



## mxms (Nov 21, 2014)

In my case I restored system libraries with a FreeBSD 10.1-RELEASE memory stick using a similar procedure.

- Boot from USB stick and exit to Live CD.
- Mount the damaged FreeBSD installation on /mnt (/, /usr, /var).
- Back up manually modified files from /mnt/etc to the USB stick.


```
# cd /usr/freebsd-dist; for file in base.txz lib32.txz kernel.txz src.txz ; do (cat $file | tar --unlink -xvpJf - -C /mnt); done
```

- Reboot into the restored system.
- Mount the USB stick and restore the backup to /etc.
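
The backup step in the procedure above could look roughly like this. It is a sketch only: the function name and the archive path in the usage line are hypothetical, and it archives all of etc rather than just the hand-edited files.

```sh
#!/bin/sh
# Hypothetical helper: archive the etc directory of a mounted installation.
# usage: backup_etc /mnt /path/on/usb/etc-backup.tar
backup_etc() {
    root=$1
    archive=$2
    # -p keeps permissions; -C makes the archived paths relative ("etc/...").
    tar -cpf "$archive" -C "$root" etc
}
```

When restoring, extracting the archive into a scratch directory and copying back only the files that were modified by hand avoids overwriting the freshly extracted 10.1 defaults wholesale.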


----------



## Daniel Santos (Dec 12, 2014)

Same problem here and I had some important data on this server. The data is on different zpools apart from the root pool. In this case, if I just reinstall the system, will I be able to import these pools?


----------



## Remington (Dec 19, 2014)

Have you tried doing a `buildworld` on a clean machine or VM, then copying a tarball of /usr/src and /usr/obj to the segfaulting machine and running `installworld` there without `buildworld`?


----------



## mix_room (Jan 22, 2015)

I ran into this today as well. Upgraded from 9.2 to 10.1-RELEASE.
Solved it by re-extracting base.txz. It really would have been nice to know what went wrong.


----------



## David Chisnall (Feb 6, 2015)

It appears that this problem is caused by a change in the ABI of some base system library, which breaks things that use ports for any of the nss stuff. There is a bug tracking it: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=197366

Unfortunately, this was not part of the normal prerelease testing.  We are also investigating how to incorporate this into regression testing for future issues.
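
To check whether a system is likely to be exposed, the sources named in /etc/nsswitch.conf that do not come from the base system can be listed with a sketch like the following. The function name is hypothetical, and the list of base sources (`files`, `db`, `dns`, `nis`, `compat`, `cache`) is an assumption that should be checked against nsswitch.conf(5) for your release.

```sh
#!/bin/sh
# Hypothetical helper: print "database source" for every nsswitch.conf source
# that is not in the list of base-system sources assumed below.
# usage: list_nonbase_nss_sources [path]   (defaults to /etc/nsswitch.conf)
list_nonbase_nss_sources() {
    awk '
        BEGIN {
            # Assumption: these sources ship with the base system.
            n = split("files db dns nis compat cache", b, " ")
            for (i = 1; i <= n; i++) base[b[i]] = 1
        }
        /^[ \t]*#/ { next }                 # skip comment lines
        $1 ~ /:$/ {
            db = $1
            sub(/:$/, "", db)
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # trailing comment
                # skip criteria such as [NOTFOUND=return]
                if (index($i, "[") || index($i, "]") || index($i, "=")) continue
                if (!($i in base)) print db, $i
            }
        }
    ' "${1:-/etc/nsswitch.conf}"
}
```

An empty result suggests that only base-system sources are in use; entries such as `ldap`, `winbind`, `wins` or `mdns`, all of which appear in configurations posted in this thread, would be printed.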


----------



## segfault (Feb 6, 2015)

Not sure if this is helpful but I successfully upgraded from stock 10.0-RELEASE (no added patches) to 10.1-RELEASE just this past Monday without problem.

FreeBSD mybox 10.1-RELEASE-p5 FreeBSD 10.1-RELEASE-p5 #0: Tue Jan 27 08:55:07 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64


----------



## quamenzullo (Feb 13, 2015)

Well, I need to upgrade from 10.0-RELEASE-p12 to 10.1-RELEASE and I don't want to run into the problems discussed in this thread. This needs to be done soon, since 10.0-RELEASE reaches its end-of-life in two weeks.
Has anyone successfully upgraded from 10.0-RELEASE-p12 to 10.1-RELEASE?
(Would it be a good idea to roll back to 10.0-RELEASE before attempting the upgrade?)


----------



## dR3b (Feb 16, 2015)

/etc/nsswitch.conf is the problem! If you haven't changed anything in that file it should be OK.


----------



## quamenzullo (Feb 17, 2015)

Seems it worked, so far...


----------



## patpro (Aug 12, 2015)

This is a very odd problem.
I upgraded two 9.3-RELEASE systems to 10.1-RELEASE earlier in July: no problem at all. Both use a modified nsswitch.conf because those systems are bound to an LDAP server.

I've got a third server, almost identical (all 3 are mail servers), bound to LDAP. I've tried to upgrade it from 9.3-RELEASE to 10.1-RELEASE this morning: epic failures.

Firstly, it took me ages to fetch upgrades, freebsd-update(8) was failing over and over on bad file checksums. I've had to change update servers, and it finally worked. First time I'm seeing this kind of problem.
Then I've experienced the same segfault anomalies you guys are reporting.

Thanks to my VMware snapshot, I rolled back, replaced my nsswitch.conf with a non-LDAP version, and went through the whole upgrade process again. It worked well.

I really don't understand why this bug would occur on this third server, but not in the first 2. The only differences are:

- first 2 were upgraded the first week of July, the third was upgraded today (12th of August)
- first 2 were installed as 8.x-RELEASE, upgraded to 9.x-RELEASE a year or so ago, the third one was installed as 9.x-RELEASE from scratch.

I find this a little bit concerning...


----------

