# No boot device after zfs-upgrade



## jbo (Mar 18, 2021)

On a FreeBSD 12.2 machine, I noticed the following message when running `zpool status`:

```
status: Some supported features are not enabled on the pool. The pool can
    still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
    the pool may no longer be accessible by software that does not support
    the features. See zpool-features(5) for details.
```
After reading the relevant documentation, I concluded that running these two commands would be the way to go (review, then apply the changes):

```
zfs upgrade -v
zfs upgrade -a
```
The upgrade appeared to work successfully.
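For anyone retracing these steps, note that the `zpool status` action text refers to `zpool upgrade`, which enables pool *features*, while `zfs upgrade` bumps the dataset filesystem version; a sketch of the two (review flags first, apply flags second):

```shell
# Pool-level feature upgrade (what the status message refers to):
zpool upgrade          # list pools with features not yet enabled
zpool upgrade -v       # describe the available features (see zpool-features(5))
zpool upgrade -a       # enable all features on all pools

# Dataset-level counterpart (filesystem version):
zfs upgrade -v
zfs upgrade -a
```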

The host has two zpools:
- `zroot`: 2x 256GB NVMe mirror
- `storage`: 4x 10TB raidz2

After upgrading, `zfs upgrade -a` informed me that I should update the boot code on all bootable devices. In my case this would be the two NVMe drives in the `zroot` pool (`nvd0` and `nvd1`).
In an attempt to be helpful, `zfs upgrade -a` also mentioned an example of using `gpart bootcode` to upgrade in a scenario where GPT is used.

Unfortunately, I was foolish and forgot to check whether this particular host actually uses GPT legacy boot code before running the command on both disks. However, this is an EFI system.
Now I am stuck with a machine that doesn't boot. After POST, the BIOS informs me that I should select an appropriate boot device and try again.

Before I start messing with the system further, I would like to gather some information on what exactly needs to happen to recover from this issue. I did some research and found a few forum topics, but given the delicate nature of the operation I'd like to get some more input for my particular case.

Right now I have booted FreeBSD in live mode from a USB drive on the affected host. This allows me to access the disks and make the necessary modifications.
Here's the output of `gpart show` for both disks in the `zroot` pool:




Where should I go from here? Is there a certain set of diagnostics I should run before I start re-writing the EFI boot code?

As far as I understand, I would need to do this to restore the EFI boot code:

```
gpart bootcode -p /boot/boot1.efi -i 1 nvd0
gpart bootcode -p /boot/boot1.efi -i 1 nvd1
```
Is this correct? Is there something else I should watch out for?

Is there a chance that this is beyond repair?
I'd be thankful for any kind of input on this.


----------



## Argentum (Mar 18, 2021)

joel.bodenmann said:


> Unfortunately, I was foolish and forgot to check whether this particular host is actually using GPT and ran the command on both disks. However, this is an EFI system.
> Now I am stuck with a machine that doesn't boot. After the BIOS I am being informed that I should select an appropriate boot device and try again.
> 
> Before I start further messing with the system, I would like to inquire some information of what exactly should happen to recover from this issue. I did some research and found a few forum topics but given the delicate nature of the operation I'd like to get some more input for my particular case.
> ...


Should not be a big problem. Reinstall the EFI partition. EFI boot can even coexist with legacy boot in separate partitions, but if your BIOS EFI boot works OK, there is no need for legacy boot. Your data is most probably safe. Read the "UEFI" page on the FreeBSD Wiki.
This should do the work:

```
mount -t msdosfs /dev/nvd0p1 /mnt
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt
```


----------



## Argentum (Mar 18, 2021)

joel.bodenmann said:


> On a FreeBSD 12.2 machine, I noticed the following message(s) when running `zpool status`:
> 
> ```
> status: Some supported features are not enabled on the pool. The pool can
> ...


... but I have another question here - *how did you get that message*? The pool features come with kernel updates. You are using the *base ZFS*, right? I hope that you did not upgrade the pool using *OpenZFS*. If that is the case, reinstalling the boot code will not help. But even in that case, *there is still hope*...


----------



## jbo (Mar 18, 2021)

Argentum said:


> .. but I have another question here - how did you get that message? The pool features come with kernel updates. You are using the base ZFS, right? Hope that you did not upgrade the pool using OpenZFS. If this is the case, reinstalling the boot code does not help. But even in this case, there is still hope...


This is a machine that was previously running FreeBSD 12.1 and was at one point upgraded to FreeBSD 12.2.
I am using the built-in ZFS - nothing special there.
Upgrading from 12.1 to 12.2 also brought a kernel update, so this should match expectations?

Happy to hear that there is still hope even if this would be a problem.





Argentum said:


> This should do the work:
> 
> ```
> mount -t msdosfs /dev/nvd0p1 /mnt
> ...



I am unable to mount the partition:

```
# mount -t msdosfs /dev/nvd0p1 /mnt
mount_msdosfs: /dev/nvd0p1: No such file or directory
```
What's the problem here? Corrupt partition table? I assume then `gpart show` would not list them properly?


----------



## SirDice (Mar 18, 2021)

joel.bodenmann said:


> What's the problem here? Corrupt partition table?


No, you overwrote that partition with the contents of boot1.efi, so it's not a FAT filesystem anymore.

This isn't the correct way (the efifat file is going to disappear in 13.0), but it'll do for now.


```
gpart bootcode -p /boot/boot1.efifat -i 1 nvd0
gpart bootcode -p /boot/boot1.efifat -i 1 nvd1
```


----------



## jbo (Mar 18, 2021)

SirDice said:


> This isn't the correct way (this efifat file is going to disappear in 13.0) but it'll do for now.
> 
> 
> ```
> ...


Both commands executed successfully. However, I am still unable to boot: the system doesn't find a bootable device.

Unfortunately, this exceeds my experience with FreeBSD. Given the good documentation and stability, I have never run into an issue like this so far.

Where to go from here? How do I figure out what's broken and, eventually, how do I repair it?

Thank you for your help, guys - it's greatly appreciated.


----------



## Argentum (Mar 18, 2021)

joel.bodenmann said:


> I am unable to mount the partition:


Then you should format it first and create a FAT filesystem:
`newfs_msdos -F 32 -c 1 /dev/da0p1`

Then mount it and create the EFI folder.


```
mount -t msdosfs /dev/da0p1 /mnt
mkdir -p /mnt/EFI/BOOT
```

After that, copy the loader.

Also, please show us your /boot/loader.conf. Are you able to import 'zroot' into the rescue system (booted from the stick)?
The whole story looks very much like you had OpenZFS and upgraded the pool under FreeBSD 12.2. If that is the case, you have at least two options. Everything can be repaired, but it is important to find the root of the problem.


----------



## jbo (Mar 18, 2021)

Here are the steps I performed:

```
newfs_msdos -F 32 -c 1 /dev/nvd0p1
mount -t msdosfs /dev/nvd0p1 /mnt
mkdir -p /mnt/EFI/BOOT
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt
gpart bootcode -p /boot/boot1.efifat -i 1 nvd0
```
I repeated the same steps for the second drive in the pool (`nvd1`).
Unfortunately, I am still unable to boot.

I booted back into the live system and imported the pool:

```
# zpool import -f -R /mnt zroot
# zfs mount zroot/ROOT/default
```
This allowed me to successfully import the pool and mount the filesystem.
I had to use `-f` as this pool was of course last used on a different system. I hope this will not lead to issues down the road when attempting to boot from it later?
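One common way to avoid the "pool was last accessed by another system" state (a sketch, using the dataset names from this thread): export the pool cleanly from the live environment before rebooting, which clears the in-use marker so `-f` is not needed on the next import.

```shell
# After finishing repairs in the live environment:
zfs umount zroot/ROOT/default   # unmount the root dataset
zpool export zroot              # cleanly export; clears the "in use" state
# The next import (or boot) should then not require -f.
```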

Here's the contents of /boot/loader.conf as requested:

```
kern.geom.label.disk_ident.enable="0"
kern.geom.label.gptid.enable="0"
opensolaris_load="YES"
zfs_load="YES"
pf_load="YES"
kern.racct.enable=1
```

As far as my notes/documentation on this host go, there is no reason to believe that non-stock ZFS was ever used. This host was installed with FreeBSD 11.2 and then gradually upgraded to 12.1 and 12.2.
However, I am not sure what `opensolaris_load="YES"` is doing in the boot loader config file...

Where to go from here? :/


----------



## Emrion (Mar 18, 2021)

Redo what you have done, but without `gpart bootcode -p /boot/boot1.efifat -i 1 nvd0` and `gpart bootcode -p /boot/boot1.efifat -i 1 nvd1`.

By the way, I didn't update the content of the EFI partition when I upgraded to 12.2-RELEASE and it worked (on a system that boots via EFI).


----------



## Argentum (Mar 18, 2021)

joel.bodenmann said:


> Where to go from here? :/


You can try `zpool get bootfs` after importing the pool.
What is the MB/BIOS model? Some brands have difficulties (non-standard behavior) with UEFI boot. In that case you can install the legacy boot code instead.

You can also try replacing /boot/loader.efi with the latest version, the 13.0-RC2 loader.

I can see that your NVMe drives are not huge. The next option is to transfer the whole pool to a new disk. For example, take a cheap 1TB (or smaller) rotating drive, partition it manually and install a fresh FreeBSD rescue system on it. See if it boots. When installing the rescue system, use another pool name instead of 'zroot'. Now you can create a third pool on that disk.
Import your original pool into the rescue system, create a recursive snapshot and, using `zfs send` / `zfs receive`, transfer your old pool to the freshly created empty pool. Set it bootable with `zpool set bootfs`.


```
zfs snapshot -r source_pool@replica

zfs send -R source_pool@replica | zfs receive -F dest_pool

zpool set bootfs=dest_pool/ROOT/default dest_pool
```

This procedure works, and if you suspect the pool version, you can create dest_pool with a lower feature level and transfer into it. I have even downgraded an OpenZFS-upgraded (and no longer bootable) pool to the 12.2 base pool and made it bootable again.
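One way to keep dest_pool readable by an older loader (a sketch; the feature list and device name are illustrative, see zpool-features(5) for the full set) is to create it with all feature flags disabled and enable only selected ones:

```shell
# -d creates the pool with every feature flag disabled;
# individual features can then be enabled explicitly.
zpool create -d \
    -o feature@lz4_compress=enabled \
    -o feature@bookmarks=enabled \
    dest_pool /dev/ada0p2
```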


----------



## Argentum (Mar 18, 2021)

joel.bodenmann said:


> Here are the steps I performed:
> 
> ```
> newfs_msdos -F 32 -c 1 /dev/nvd0p1
> ...


The last command here overwrites the EFI partition; after you unmount, you are done. boot1.efifat is deprecated today. You should only copy the loader into that folder on the EFI partition. The whole EFI partition is not tightly coupled to the system - it is relatively easy and safe to handle and modify. The loader.efi you copy should be good; maybe it is corrupted for some reason, so you may try taking it from another source. The loader implements simple standalone ZFS functionality to access bootable files on ZFS pools, and if it is old, it may not recognize the new pool version. As a good practice, I always compile the loader from source - it can even be built independently of the system. As I wrote before, you may also try a new loader from 13.0.


----------



## Argentum (Mar 18, 2021)

Emrion said:


> By the way, I didn't updated the content of the efi partition when I upgraded to 12.2-RELEASE and it worked (on a system which boots on efi).


But nobody knows what version of /boot/loader.efi he has. The logic behind that upgrade message is clear - an old loader may not be able to recognize the latest pool. That is exactly the case with 12.2 and OpenZFS: it can be used, but if you upgrade the pool, the loader does not recognize it any more.

I did some experiments, rebuilt the loader and inserted some debug printouts. This is all understandable - the loader has a simple, standalone ZFS implementation in it, and if the pool version is much newer than the loader version, it simply fails.

The process is relatively simple and straightforward - the BIOS starts the loader, which scans for bootable storage pools and datasets. There are only two possibilities - either the loader does not start for some reason, or it does not find the bootable dataset. As I wrote already, the loader can be rebuilt with debug printouts; that way, at least, one can verify that the loader in fact starts.


----------



## Emrion (Mar 18, 2021)

I also upgraded the pool. The upgrade message isn't clear and should at least be adapted to the current boot method. Or better, the upgrade process could execute a script that does the right thing, with the approval of the user.


----------



## Argentum (Mar 18, 2021)

Emrion said:


> I also upgraded the pool. The upgrade message isn't clear and should be adapted to the current boot method, at least. Or better, the upgrade process could execute a script that does correctly the thing with the approval of the user.


I think it comes from ZFS upstream. But I agree, it could be better. The logic is still understandable - the loader must be able to handle the pool, and it cannot be upward compatible.


----------



## jbo (Mar 18, 2021)

Thank you for the details and information provided.

I think that at this point it might indeed be easiest to just push the pool to a different host using `zfs send`, re-install the OS and pull back the relevant files.

If this is still helpful for you guys, though: I copied the loader from a FreeBSD 12.2 live instance.


----------



## Eric A. Borisch (Mar 18, 2021)

Can you set your BIOS to *EFI only*, and not *legacy/EFI*? I'm wondering if it's getting tripped up by the protective MBR into looking for a freebsd-boot partition.


----------



## Argentum (Mar 19, 2021)

joel.bodenmann said:


> If this is still helpful for you guys tho: I copied the loader from a FreeBSD 12.2 live instance.


If you have FreeBSD source, you can build a new loader with debug messages and try it.

To build the loader:


```
cd /usr/src/stand/
make install clean
```

Just tried it on my desktop - it is not time consuming. Building the whole of _*stand*_ took 1.8 minutes.

The loader source is in /usr/src/stand/efi/.
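After the build, the new loader still has to be copied onto the ESP by hand; a sketch, assuming the thread's device layout (`nvd0p1` as the ESP) and that `make install` placed the loader under /boot:

```shell
# Copy the freshly built EFI loader onto the EFI system partition.
mount -t msdosfs /dev/nvd0p1 /mnt
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt
```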


----------



## Argentum (Mar 19, 2021)

joel.bodenmann said:


> If this is still helpful for you guys tho: I copied the loader from a FreeBSD 12.2 live instance.


*More good news for you* - I just did an experiment on a 12.2 ZFS machine, installing the 13.0 loader on a spare drive where I created an EFI partition for it. As expected, *it works*!
The loader (or the EFI partition) does not need to be on the same drive where your ZFS pool is located. You can have multiple EFI partitions. The loader is backward compatible, so with the 13.0 loader you can start a 12.2 system.


----------



## Eric A. Borisch (Mar 19, 2021)

Eric A. Borisch said:


> Can you set your bios to *EFI only*, and not *legacy/EFI*. I'm wondering if it's getting tripped by the protective MBR into looking for a freebsd-boot partition.



I still think this is worth checking; assuming you (as you say, incorrectly for EFI) ran `gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 nvd0` after the upgrade, you've _*still got the /boot/pmbr master boot record installed*_ even after you restore the EFI boot partition (`gpart bootcode -p /boot/boot1.efifat -i 1 nvd0`).

When booting on a "legacy" (non-EFI) system, this boot record will advertise that it knows how to boot, but when it actually attempts to, it will look for a traditional freebsd-boot (see gpart(8) _bootstrapping_ section) partition which, using EFI, you don't have. A BIOS that is in legacy/UEFI mode may very well attempt legacy boot first, think it has succeeded (with "I found a drive with a master boot record; my job here is done") and leave you in the unable-to-boot state.

You can either disable legacy boot (only EFI boot), or try wiping out the protective MBR: `dd if=/dev/zero of=/dev/nvd0 bs=512 count=1` (and also for nvd1).

NB: Be very careful whenever using dd(1); make sure you understand what the command is doing before proceeding.
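A slightly safer variant of the same wipe (a sketch; the backup path is an arbitrary choice): save the first sector to a file first, so the operation can be undone.

```shell
# Back up the protective MBR (sector 0) before zeroing it.
dd if=/dev/nvd0 of=/root/nvd0-sector0.bak bs=512 count=1
dd if=/dev/zero of=/dev/nvd0 bs=512 count=1
# To undo: dd if=/root/nvd0-sector0.bak of=/dev/nvd0 bs=512 count=1
```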


----------



## jbo (Mar 20, 2021)

The boot mode was set to EFI only so far. This morning I attempted to boot after changing it to EFI/Legacy and afterwards to just Legacy. Overall, no difference: the system still doesn't boot.
I changed the setting back to EFI-only boot as I don't intend to use anything else after fixing this.

I'll attempt wiping the MBR later today - not something I want to do while in a rush.



Argentum said:


> *More good news for you* - I did just an experiment on 12.2 ZFS machine, installing 13.0 loader on a spare drive where I created an EFI partition for that. As expected, *it works*!
> The loader or EFI partition need not to be on the same drive where your ZFS pool is located. You can have multiple EFI partitions. The loader is backward compatible, so with 13.0 loader you can start the 12.2 system.


Hmm... this is indeed interesting to know.
How would this be different (in essence) from what I've been doing so far? I take it that my system should be able to boot from the ZFS pool with both the 12.2 loader and the 13.0 one, right? Do you have any reason to believe that performing the previously done steps (copying the loader) from a FreeBSD 13.0 system is gonna result in anything other than what we experienced with the 12.2 loader?

Is there anything special that I have to do/consider when trying this on a separate drive? Or would I just create the EFI partition, copy the loader and tell my system to boot from that drive instead? No magic involved?
Where/how would I tell the system to boot from the `zroot` pool located on the currently present drives when setting up an EFI partition on a different drive?

Thank you very much for your efforts!


----------



## Argentum (Mar 20, 2021)

joel.bodenmann said:


> Hmm... this is indeed interesting to know.
> How would this be different (in essence) from what I've been doing so far? I take it that my system should be able to boot from the ZFS pool both on the 12.2 loader and the 13.0 one, right? Do you have any reason to believe that performing the previously done steps (copying the loader) from a FreeBSD 13.0 system is gonna result in anything other than what we have experienced when doing it with a 12.2 loader?


The 13.0 loader is a major upgrade. *You can give it a try.* My personal takeaway here is that when upgrading, it is a good idea to upgrade the loader first and the system after that. So, I am writing this message on a 12.2 machine, but booted it with the new 13.0 loader this morning. When upgrading one day, I can be sure that the loader is already good.


----------



## jbo (Mar 20, 2021)

Eric A. Borisch said:


> You can either disable legacy boot (only EFI boot), or try wiping out the protective MBR: `dd if=/dev/zero of=/dev/nvd0 bs=512 count=1` (and also for nvd1).
> 
> NB: Be very careful whenever using dd(1); make sure you understand what the command is doing before proceeding.



This would not only delete the MBR but also partition table, right?
Is that a non-issue here because the EFI partition would provide its own means to find/locate the partition(s) I want to boot from or am I missing something obvious here?


----------



## Argentum (Mar 20, 2021)

joel.bodenmann said:


> This would not only delete the MBR but also partition table, right?
> Is that a non-issue here because the EFI partition would provide its own means to find/locate the partition(s) I want to boot from or am I missing something obvious here?


This does not seem to destroy the partition table, just the MBR. But it is *always a good idea* to write down your partition tables with `gpart backup nvd0`. The output is just text - write it down. You can always restore the partition table if you know the block boundaries.
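The backup-and-restore round trip with gpart(8) could look like this (a sketch; the backup path is an arbitrary choice):

```shell
# Save the partition table as plain text.
gpart backup nvd0 > /root/nvd0.gpart
# If the table is ever lost, recreate it from the backup:
# gpart restore -F nvd0 < /root/nvd0.gpart
```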


----------



## jbo (Mar 20, 2021)

Alright, after backing up the partition table I nuked the MBR of both `nvd0` and `nvd1`, but the system still refuses to boot.

Given that I am able to import the `zroot` pool on a live system on the same host I agree with you guys that this should be fixable and I'd certainly prefer fixing it over just re-installing and migrating.

I am wondering why the system does not boot after copying the boot loader from a running FreeBSD 12.2 live system. Just to be sure: the location of the loader is `/EFI/BOOT/BOOTX64.efi` - that is correct, right?
Is there anything else that needs to happen for the system to recognize this? The system itself is set up to boot only from EFI (and this worked well for over a year on this exact system prior to the `zfs upgrade`). Does the EFI partition need any special flags or other magic to be considered worth checking by the system when booting?
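For what it's worth, an ESP needs no special flags beyond the `efi` partition type; a quick sanity check could look like this (a sketch, device names as used in this thread):

```shell
# The first partition should show type "efi".
gpart show -p nvd0
# Check that the loader is where the firmware looks for it:
mount -t msdosfs /dev/nvd0p1 /mnt
ls -l /mnt/EFI/BOOT/BOOTX64.efi
file /mnt/EFI/BOOT/BOOTX64.efi    # should report an EFI application (PE32+)
umount /mnt
```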

Taking into account everything you guys have mentioned so far, the following measures are available to continue from here:
- Compile the boot loader with debug output enabled and run that to potentially figure out where things go wrong.
- Boot a FreeBSD 13.0 live system instead and copy that boot loader
- Lastly, `zfs send|recv` the pool to a different device, re-install the host and migrate back

If it's of any help, here is the output of `gpart backup` prior to overwriting the MBR with zeros:


----------



## jbo (Mar 20, 2021)

I just tried the FreeBSD 13.0 RC3 loader and the issue remains - still unable to boot from the pool.

Steps I performed:
  1. Boot into FreeBSD 13.0 RC3 on the affected host
  2. Mount the EFI partition of the first device in the `zroot` mirror pool to `/mnt`
  3. `cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi`
  4. Repeat for the second device in the mirrored pool
  5. Reboot and hope for the best

No tears were shed.


----------



## Eric A. Borisch (Mar 20, 2021)

Is the zpool’s bootfs property set? What is the exact message when the system doesn’t boot?


----------



## Argentum (Mar 20, 2021)

joel.bodenmann said:


> I just tried the FreeBSD 13.0 RC3 loader and the issue remains - still unable to boot from the pool.


After importing the pool, what do you get by running `zpool get bootfs`?


----------



## jbo (Mar 20, 2021)

Eric A. Borisch said:


> What is the exact message when the system doesn’t boot?


The message reads:

```
Reboot and Select proper Boot device
Please press the <DEL> key into BIOS setup menu.
or Press the <CTRL-ALT-DEL> key to reboot the system.
```




Eric A. Borisch said:


> Is the zpool’s bootfs property set?


I booted into FreeBSD 13.0 RC3 on the affected host, imported the `zroot` pool and inspected the properties using `zpool get`. The `bootfs` is set to `YES`.


----------



## Eric A. Borisch (Mar 20, 2021)

It needs to be set to the actual boot (root) filesystem:

```
$ zpool get bootfs newsys
NAME    PROPERTY  VALUE                   SOURCE
newsys  bootfs    newsys/ROOT/13.0-RC2.2  local
```

```
bootfs=(unset)|pool/dataset
             Identifies the default bootable dataset for the root pool. This
             property is expected to be set mainly by the installation and
             upgrade programs.  Not all Linux distribution boot processes use
             the bootfs property.
```


----------



## jbo (Mar 20, 2021)

Argentum said:


> After importing the pool, what do you get by running `zpool get bootfs`?



Here we go:

```
root@:~ # zpool get bootfs
NAME   PROPERTY  VALUE               SOURCE
zroot  bootfs    zroot/ROOT/default  local
```



Eric A. Borisch said:


> It needs to be set to the actual boot (root) filesystem:


Looking at the output provided above - I take it that no further action is needed here?


----------



## Argentum (Mar 20, 2021)

joel.bodenmann said:


> If it's of any help, here is the output of `gpart backup` prior to overwriting the MBR with zero:


Not for us right now, but for yourself in case you accidentally destroy your partition table.


----------



## jbo (Mar 20, 2021)

Considering the information currently present on the situation at hand, where would you guys recommend me to go from here?


----------



## Argentum (Mar 20, 2021)

joel.bodenmann said:


> The message reads:
> 
> ```
> Reboot and Select proper Boot device
> ...


This does not look like a loader message; it looks more like a BIOS message. That may mean the loader has not been started.

In the BIOS settings you should see all the UEFI boot devices (depending on the BIOS, of course). Try changing the boot order in the BIOS. Can you try booting on another motherboard?

In the desktop machine I am writing this message on, I have 4 drives and all have UEFI partitions (with 2 different loaders). When I go into the BIOS, I can see all of them and select the boot device.

Also, once you have booted from another drive, you can try `efibootmgr -v`. See efibootmgr(8)
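If a firmware boot entry has gone missing, FreeBSD's efibootmgr(8) can also recreate it from the live system; a sketch (the label and mount point are assumptions for this thread's layout):

```shell
# List current firmware boot entries verbosely.
efibootmgr -v
# Create and activate a new entry pointing at the loader on the ESP.
mount -t msdosfs /dev/nvd0p1 /mnt
efibootmgr -c -a -L "FreeBSD zroot" -l /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt
```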


----------



## jbo (Mar 20, 2021)

Argentum said:


> This does not look like loader message. This is more like BIOS message. That may mean that the loader has not been started.


Yes, that is correct - hence the topic title.

Interesting... I have no idea why I didn't check this earlier, but the BIOS doesn't list any bootable devices (other than the USB stick, if I have it inserted to boot into the FreeBSD live system).



Argentum said:


> Can you try booting on another motherboard?


Technically I can - just gonna take some time for me to get there.
When pursuing this, do I need to hook up both drives to the "spare host", or is just one drive enough?
I assume I would want both disks there so that, if the boot works, the contents of the two disks remain the same - but is it technically possible to boot from a mirrored ZFS pool with just one device present, without any user intervention?



Argentum said:


> Also, when you have booted from another drive, you can try `efibootmgr -v`.


As far as I can tell, the boot drives (the two NVMe drives that make up the mirrored `zroot` pool) do not show up there.
It lists:
- Built-in EFI Shell
- Network card
- Hard drive (which I believe is the 4x 10TB storage disks)
- The USB key containing the live system

What possible reasons could lead to the system not detecting the NVMe drives? The hardware remained unchanged between the prior working state of the system and the current state after `zfs upgrade`.


----------



## Eric A. Borisch (Mar 20, 2021)

I concur with Argentum - we're not getting to the loader at all. I would go through your BIOS settings, or perhaps reset the BIOS settings to defaults and see if that shakes things loose. How are the NVMe devices attached? On the motherboard? A PCIe card?

Is there a “Select Boot Device” or similar boot option you can activate (typically a function key) during the boot process? This will usually still list other devices even if they aren’t in the “active” boot priority list.

As a last resort, you could make a USB boot stick that has the EFI loader (in an efi boot partition, similar to what you have on your nvd devices) and nothing else; it (the loader) will scan other devices after it doesn’t find a bootable filesystem on the device.
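A minimal sketch of such a loader-only stick, assuming the USB stick appears as `da0` (all data on it is destroyed) and that a loader.efi is available on the live system:

```shell
# Create a GPT with a single EFI system partition on the USB stick.
gpart destroy -F da0
gpart create -s gpt da0
gpart add -t efi -s 200M da0
newfs_msdos -F 32 /dev/da0p1
# Put the loader where the firmware expects it.
mount -t msdosfs /dev/da0p1 /mnt
mkdir -p /mnt/EFI/BOOT
cp /boot/loader.efi /mnt/EFI/BOOT/BOOTX64.efi
umount /mnt
```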


----------



## Mjölnir (Mar 20, 2021)

Check your UEFI/BIOS settings and the docs of your MB. Some systems cannot boot off an NVMe device. Then you can perform any _voodoo_ you like; it just won't boot.
EDIT: E.g. zeRusski ran into that issue. Search for his thread & the solution.


----------



## jbo (Mar 20, 2021)

Eric A. Borisch said:


> How are the nvme devices attached? On the motherboard? PCIe card?


Those are two M.2 form-factor NVMe drives attached to PCIe adapters. These are purely physical adapters and don't contain any sort of controller chips. Each of the two NVMe drives has its own PCIe adapter.



Eric A. Borisch said:


> Is there a “Select Boot Device” or similar boot option you can activate (typically a function key) during the boot process? This will usually still list other devices even if they aren’t in the “active” boot priority list.


Yes there is. In this case it's F11. The boot list is empty unless the USB stick with the FreeBSD system is plugged in.

In the meantime I updated the BIOS of the mainboard. It's a Supermicro X10SRW-F. Up until now it was running BIOS 3.0a; now it is upgraded to 3.3.
However, the symptoms remain: the two NVMe drives still do not show up as boot drives.



Mjölnir said:


> Check your UEFI/BIOS and the docs of your MB. Some systems can not boot off a NVMe device. Then you can perform any _vodoo_, it just won't boot.


This system has been running for more than two years without any issues. It is perfectly capable of booting from NVMe drives - at least it was until I ran the `zfs upgrade`.



Eric A. Borisch said:


> As a last resort, you could make a USB boot stick that has the EFI loader (in an efi boot partition, similar to what you have on your nvd devices) and nothing else; it (the loader) will scan other devices after it doesn’t find a bootable filesystem on the device.


I think that this is gonna be my next step.


----------



## jbo (Mar 20, 2021)

Eric A. Borisch said:


> As a last resort, you could make a USB boot stick that has the EFI loader (in an efi boot partition, similar to what you have on your nvd devices) and nothing else; it (the loader) will scan other devices after it doesn’t find a bootable filesystem on the device.


I've just done this. I am able to boot from the created USB drive. From there, the loader complains that it cannot find a bootable partition: `ERROR: cannot open /boot/lua/loader.lua: no such file or directory.`

Sorry for the blurriness.
Where would I go from here?


----------



## Eric A. Borisch (Mar 20, 2021)

The error you're getting (failed to find a bootable partition) still points to the NVMe devices not being visible during the boot process; the fact that they are accessible after a full live-image boot suggests this is a boot-time initialization issue rather than a hardware failure.



joel.bodenmann said:


> Those are two m.2 form-factor NVMe drives attached to PCIe adapters. These are pure physical adapters and don't contain any sort of controller chips. Each of the two NVMe drives has its own PCIe adapter.



Hrmm. Perhaps the PCI devices need the option ROM scan turned on?

Perhaps this: https://www.supermicro.com/support/faqs/faq.cfm?faq=25543 ?


----------



## jbo (Mar 20, 2021)

Eric A. Borisch said:


> Hrmm. Perhaps the PCI devices need option rom scan turned on?
> 
> Perhaps this: https://www.supermicro.com/support/faqs/faq.cfm?faq=25543 ?


This has already been set to EFI.
The machine was able to boot in this configuration for more than two years. Literally nothing changed in terms of hardware or BIOS configuration between the working system and the broken system after `zfs upgrade`.

I looked through the BIOS settings several times - I even performed the suggested reset to default settings, tried booting, and modified the settings one by one back to the original configuration - the NVMe drives are still not listed as boot devices.
But as you suggested, this is unlikely to be a hardware failure because I can access the disks without any problems from the booted live environment.

In the meantime I've also booted into the EFI shell, and it shows all devices: the USB key(s), the 4x 10TB drives and the two NVMe drives. So the system certainly recognizes them; it just doesn't consider booting from them!

I am open for more suggestions.


----------



## Argentum (Mar 20, 2021)

joel.bodenmann said:


> My next step will be moving the two NVMe drives to a different host and trying booting from there.
> 
> I am open for more suggestions.


If this is too risky, or too much trouble, you can clone the system using the method I have practiced many times. 

If you have a spare SATA port (or even SATA drive with USB adapter will do), just take whatever cheap SSD or rotating drive and connect this to the computer. Partition it manually in the similar way your present drives are partitioned. The ZFS partition can be bigger, but *not smaller*. After that,* install loader in the EFI partition* and using `zpool attach` connect this new partition to the existing pool. Now you have a 3 way mirror. *Let it resilver* and and shut down after that. Then just physically remove the new drive and bring it to another computer. It should boot right away. This is how I have cloned my FreeBSD laptop. 

Later just `zpool detach` the now non existing drives from mirror. If the new partition is bigger, you can automatically have extra space in pool. This is how I changed the drive in may laptop and also moved the clone of my laptop system to test system.

*Be careful* not to use `zpool add` instead of `zpool attach`, *or you are in trouble*! Read the zpool(8) manual.
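A rough sketch of the whole procedure in commands, assuming the pool is `zroot`, an existing mirror member is `nvd0` (with the ZFS partition on `p3`) and the new disk is `ada0`; the device names and partition numbers are assumptions to adapt to your layout:

```shell
# 1. Copy the partition layout from an existing mirror member
gpart backup nvd0 | gpart restore -F ada0

# 2. Create a FAT filesystem on the new ESP and install the loader
newfs_msdos /dev/ada0p1
mount -t msdosfs /dev/ada0p1 /mnt
mkdir -p /mnt/EFI/BOOT
cp /boot/loader.efi /mnt/EFI/BOOT/bootx64.efi
umount /mnt

# 3. Attach (NOT add!) the new partition to the existing mirror
zpool attach zroot nvd0p3 ada0p3

# 4. Watch the resilver; shut down once it completes
zpool status zroot
```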


----------



## Eric A. Borisch (Mar 20, 2021)

In the BIOS, if you select “Add boot option”, do the NVMe drives show up as an option to add?

From the EFI shell (where it does see the NVMe devices), are you able to select them and boot?


----------



## jbo (Mar 20, 2021)

In the meantime I removed the two NVMe drives and added them to a different host. Same symptoms: they show up as devices, I can boot a live OS, import the pool and browse files, but they don't show up as boot devices. I have done the same on yet another system, ending up with the same symptoms.
Both other systems I tried to boot from those two NVMe drives are similar in design: Supermicro mainboards, Intel CPUs, and both of them boot from NVMe drives themselves in regular operation (I just took them down for this test). The mainboards and CPUs differ slightly between the systems, but the overall architecture is the same and everything used to be able to boot from NVMe drives.
I am really not sure what's going on here. I have more than one host using this exact configuration: two separate NVMe drives (Samsung 970 Pro) in a ZFS mirror pool via an M.2 PCIe adapter. And again: the system in question worked flawlessly for over two years, so... wtf?

@Argentum: Thank you for outlining that procedure. I have plenty of spare drives of all varieties to give this a go. I'll just take a regular old 512GB SATA SSD (my NVMe drives are both 256GB devices).

Eric A. Borisch: I am currently trying to figure out how I can select a drive and boot from it in the EFI shell. The drives show up as `blk1` and `blk2`. I selected (?) one by typing `blk1:` in the shell, then tried `ls`, but the EFI shell reported: `ls/dir: Cannot open current directory - Not found`.
At the moment I am unsure whether this is simply because the device belongs to a ZFS pool and the EFI shell doesn't know how to handle that.
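For what it's worth, my understanding is that the UEFI shell only assigns `fsN:` mappings to filesystems the firmware can read (FAT); raw disks and ZFS members only get `blkN` mappings, which would explain the `ls` failure. A sketch of a UEFI shell session to try, assuming the ESP is intact and gets mapped as `fs0:`:

```shell
map -r                  # refresh and list all device mappings
fs0:                    # switch to the first FAT filesystem, if one exists
ls EFI\BOOT             # check for the fallback loader
EFI\BOOT\bootx64.efi    # launch the loader directly
```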


----------



## jbo (Mar 20, 2021)

I have followed the procedure presented by Argentum (adding a third device, a SATA SSD, to the mirror, resilvering the pool, shutting down, and attaching the SSD to a different machine) and, as expected, the system boots just fine.

So... let's summarize:
  - We have a host that was happy to boot from an NVMe ZFS pool for over two years.
  - The system received a `zfs upgrade`.
  - The system can no longer find the boot device(s).
  - Booting a live system on the affected host allows me to import the `zroot` pool, mount the filesystem(s) and browse them.
  - Adding the NVMe devices to other hosts (which also all boot from 2x NVMe ZFS mirror pools) does not make them show up in the list of bootable devices.
  - Adding a SATA SSD to the pool (imported while running a live system on the original host), resilvering the pool, removing the SATA SSD and adding it to a random old desktop computer allows booting the system as if nothing ever happened to it.

What do you guys make of this?


----------



## Eric A. Borisch (Mar 20, 2021)

joel.bodenmann said:


> In the meantime I removed the two NVMe drives and added them to a different host. Same symptoms: they show up as devices, I can boot a live OS, import the pool and browse files, but they don't show up as boot devices. I have done the same on yet another system, ending up with the same symptoms.
> Both other systems I tried to boot from those two NVMe drives are similar in design: Supermicro mainboards, Intel CPUs, and both of them boot from NVMe drives themselves in regular operation (I just took them down for this test). The mainboards and CPUs differ slightly between the systems, but the overall architecture is the same and everything used to be able to boot from NVMe drives.
> I am really not sure what's going on here. I have more than one host using this exact configuration: two separate NVMe drives (Samsung 970 Pro) in a ZFS mirror pool via an M.2 PCIe adapter. And again: the system in question worked flawlessly for over two years, so... wtf?
> 
> ...


So, just to recap:

nvme*p1 are _msdosfs_ filesystems, with your /boot/loader.efi (or the one from the live USB, etc.) copied to `<fsroot>/EFI/BOOT/bootx64.efi`, such that if nvme*p1 is mounted at /boot/efi (not a necessity, but this is what the installer is going to start doing by default), you see something like this (excepting ada vs. nvme; your sizes are likely different from mine, as I'm running 13-RC3 built from source):

```
$ mount | grep efi
/dev/ada0p1 on /boot/efi (msdosfs, local)
$ ls -l /boot/efi/EFI/BOOT/bootx64.efi /boot/loader.efi
-rwxr-xr-x  1 root  wheel  896512 Mar 11 22:37 /boot/efi/EFI/BOOT/bootx64.efi
-r-xr-xr-x  2 root  wheel  896512 Mar 10 08:53 /boot/loader.efi
```

And the partition type for nvme*p1 is the EFI boot type:

```
$ gpart show -rp | grep p1
         40       4096  ada0p1  c12a7328-f81f-11d2-ba4b-00a0c93ec93b  (2.0M)
```

I see your post about the new SSD you created that boots on another system; clearly the zpool has a usable system installation on it. But as I said earlier, on this system we don't even seem to be getting to the loader (and the fact that your BIOS doesn't show the devices is certainly the issue to keep attacking, in my mind). That's why double-checking the partition type, the filesystem type, and the bootx64.efi placement is what I'm asking for here.


----------



## _martin (Mar 20, 2021)

Is the first picture you shared really the state just after you did the zfs upgrades and nothing else? Didn't you try something else before posting the image?
Because in the layout you shared, the freebsd-boot partition is missing. EFI is just a fancy partition you can jump from to the OS bootloader.


----------



## jbo (Mar 20, 2021)

_martin said:


> Is the first picture you shared really the state just after you did the zfs upgrades and nothing else? Didn't you try something else before posting the image?
> Because in the layout you shared, the freebsd-boot partition is missing. EFI is just a fancy partition you can jump from to the OS bootloader.


Yeah - I believe so, but I can't promise 100%.

Given that I am able to boot from a SATA SSD after performing the following steps:
1. `gpart backup nvd0 | gpart restore -F ada0`
2. Copying `/boot/loader.efi` from a FreeBSD 13.0 RC3 live system to the EFI partition
3. Adding the new SATA SSD to the mirrored zpool `zroot` and resilvering it
4. Removing the new SATA SSD and attaching it to a random desktop machine
5. Booting happily

I'd say that things should be in order there.

I am currently performing a `dd` clone of the SATA SSD so I have a spare if things go wrong (I do have an off-site backup, but re-installing and pulling the backups is exactly what I was trying to avoid). After that I will investigate further as per Eric A. Borisch's last post.
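For reference, a minimal sketch of such a clone, assuming the source SSD is `ada0` and the spare is `ada1` (the device names are assumptions; triple-check them, since `dd` will happily overwrite the wrong disk):

```shell
# Raw block-level copy of the whole disk, 1 MiB at a time, with a
# progress indicator; the target must be at least as large as the source.
dd if=/dev/ada0 of=/dev/ada1 bs=1m status=progress
```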


----------



## _martin (Mar 20, 2021)

I always create the freebsd-boot partition, even on UEFI-only systems. I was doing this manually and it may not actually be needed. I spawned a VM with UEFI only and did a generic bsdinstall with a UEFI-only setup. Indeed, there's no freebsd-boot partition.
I'm doing the upgrade in the VM myself now, as I'm also curious to see why you hit the problem (I have some ideas but I want to confirm them there).


----------



## VladiBG (Mar 20, 2021)

Enter your BIOS, navigate to PCIe/PCI/PnP Configuration, enable Option ROM support and set it to EFI. Save and exit, then enter the BIOS again.
In the BIOS setup, under Boot, verify that "boot mode select" is set to UEFI; if it's not, set it, save and exit, then enter the BIOS again.
Under the boot menu, verify that you see the names of your NVMe disks. If you don't see them as boot devices, you can try manually creating a new boot record or use efibootmgr(8). Also check the content of your startup.nsh in the ESP; it must point to bootx64.efi.
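If the firmware has simply lost its boot entry, efibootmgr(8) can recreate one from a booted FreeBSD system; a sketch, assuming the ESP is `nvd0p1`, it gets mounted at `/mnt`, and the loader sits at the fallback path (the label and paths are assumptions; the `--create`/`--activate`/`--label`/`--loader` flags follow the efibootmgr(8) man page):

```shell
# Show the current firmware boot entries and the boot order
efibootmgr -v

# Create and activate a new entry pointing at the loader on the ESP
mount -t msdosfs /dev/nvd0p1 /mnt
efibootmgr --create --activate --label "FreeBSD" --loader "/mnt/EFI/BOOT/bootx64.efi"
```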


----------



## Eric A. Borisch (Mar 20, 2021)

VladiBG said:


> Enter your BIOS, navigate to PCIe/PCI/PnP Configuration, enable Option ROM support and set it to EFI. Save and exit, then enter the BIOS again.
> In the BIOS setup, under Boot, verify that "boot mode select" is set to UEFI; if it's not, set it, save and exit, then enter the BIOS again.
> Under the boot menu, verify that you see the names of your NVMe disks. If you don't see them as boot devices, you can try manually creating a new boot record or use efibootmgr. Also check the content of your startup.nsh in the ESP; it must point to bootx64.efi.


This; but it's still unclear why it was working and then stopped working after a `zpool upgrade`. If the drives don't show up in the BIOS boot list, I'm not sure how anything else on the disk matters.

I've never used efibootmgr, but I see that the install scripts do; perhaps there is some magic that needs to be re-set.


----------



## _martin (Mar 20, 2021)

So I did an upgrade in a VM, from 12.1 to 12.2. Then I did the zfs upgrade as you did. The expected bootloader message was shown. I did nothing else, rebooted the VM, and I was able to boot. In other words, I was not able to reproduce this.
But when I look at sector 0 on my disk:

```
root@fbsd:~ # hd -n 512 /dev/da0
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001c0  02 00 ee ff ff ff 01 00  00 00 ff ff 7f 02 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
00000200
root@fbsd:~ #
```
I truly see no MBR boot code. You may have pushed some when you were executing `gpart bootcode`. This could potentially confuse a BIOS that has a legacy fallback.
Once you're booted into FreeBSD 12.2 (while mixing versions may work, stick to the version you're trying to boot), try to recreate the EFI partition (as you were trying to do before):

```
# recreate a fresh FAT filesystem on the ESP
newfs_msdos /dev/nvd0p1
mkdir /newefi /curefi
mount -t msdosfs /dev/nvd0p1 /newefi
# attach the stock EFI filesystem image shipped in /boot
mdconfig -a -t vnode -f /boot/boot1.efifat
mount -o ro -t msdosfs /dev/md0 /curefi
# copy its contents onto the new ESP
cp -rp /curefi/* /newefi/
mdconfig -d -u 0
reboot
```


----------



## VladiBG (Mar 20, 2021)

When you install boot code (pMBR) and overwrite the ESP with zfsboot, your BIOS no longer detects those disks as bootable, as they don't have the default .efi and startup.nsh. Then even if you restore the ESP partition, you still need to create a boot record or use the EFI shell to boot from those NVMe disks. The actual zfs upgrade doesn't cause that.


----------



## _martin (Mar 21, 2021)

VladiBG: True. But in all the answers mentioned here, gpart bootcode was executed only on partition 1; sector 0 was not touched. So if the OP didn't do anything else, that could be enough. I also agree that this has nothing to do with the pool upgrade.

One idea I hadn't checked was whether the free space between EFI and freebsd-swap is a placeholder for additional boot code.
EDIT: I tested it; it is not.


----------



## Argentum (Mar 21, 2021)

VladiBG said:


> When you install boot code (pMBR) and overwrite the ESP with zfsboot, your BIOS no longer detects those disks as bootable, as they don't have the default .efi and startup.nsh. Then even if you restore the ESP partition, you still need to create a boot record or use the EFI shell to boot from those NVMe disks. The actual zfs upgrade doesn't cause that.


Agreed. It seems that maybe startup.nsh is missing from the EFI folder.

The pool upgrade itself has nothing to do with this issue.


----------



## _martin (Mar 21, 2021)

I'd recheck these again, though:
- BIOS is using UEFI mode only (you had this initially).
- Make sure your MBR (the start of /dev/nvd0, sector 0 or LBA 0 if you will) defines one partition; the UEFI standard requires it to be defined. It's a fake entry, but it has to be there (my hd output above is an example of such an entry).
- You've created a proper FAT filesystem for partition 1 (nvd0p1). There's no such thing as a 200MB FAT32 filesystem. Some tools allow overriding this, but it breaks the Microsoft standard, and some BIOSes may be sensitive to it (use newfs_msdos, drop all switches, and let the command decide on the parameters).
- Make sure you are able to mount/umount the FAT filesystem in the live CD. Verify the structure and the presence of efi/boot/BOOTx64.efi.
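From the live environment, that checklist translates roughly into the following commands; a sketch assuming the disk is `nvd0` and the ESP is partition 1 (note that `newfs_msdos` recreates the filesystem from scratch, wiping whatever is on the partition):

```shell
# Sector 0: expect the fake protective-MBR entry (type 0xEE)
# and the 55 aa signature at the end
hd -n 512 /dev/nvd0

# Partition 1 should carry the EFI partition type
gpart show -p nvd0

# Recreate the FAT filesystem with default parameters, then verify its contents
newfs_msdos /dev/nvd0p1
mount -t msdosfs /dev/nvd0p1 /mnt
ls -R /mnt              # expect efi/boot/BOOTx64.efi after copying it back
umount /mnt
```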


----------



## jbo (Mar 25, 2021)

Thank you for all your inputs. Unfortunately I am unable to continue working on this system for the next few days.
I'll get back to you guys!


----------



## grahamperrin@ (Apr 20, 2021)

joel.bodenmann said:


> … an example of using `gpart bootcode` …



You probably know this by now, from <https://www.freebsd.org/releases/13.0R/relnotes/#boot>:



> > … To update old ESP partitions, users should stop using the  gpart(8) utility. Instead, …


----------

