# Problem booting 9.0R from raid controller



## dalroi (Oct 20, 2012)

I'm attempting to move my FreeBSD installation from a disk containing an old 7-STABLE to 9.0-RELEASE on an existing disk on my raid controller. I managed to do a manual install on that new disk, but when I attempt to boot it I just get a cursor symbol on an empty screen - it's not even blinking!

Obviously I forgot something or I did something wrong. The question is what, and what to do about it?

The system is an old Tyan Tiger MPX board. The current 7-STABLE installation was done on the (then) usual PATA disks, but a couple of years ago I added an AMCC 3Ware 9550:

```
3ware device driver for 9000 series storage controllers, version: 3.70.05.010
twa0: <3ware 9000 series Storage Controller> port 0x1000-0x103f mem 0xf6000000-0xf7ffffff,0xf4000000-0xf4000fff irq 21 at device 9.0 on pci0
twa0: [ITHREAD]
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SXU-4LP, 4 ports, Firmware FE9X 3.08.02.005, BIOS BE9X 3.08.00.002
```

In the BIOS I put the 3ware controller before the other hard disks.

The new disk is the 2nd unit on that controller (it contains 2 mirrors), and is partitioned using gpart as follows:

```
# gpart show da1
=>        34  1953103805  da1  GPT  (931G)
          34         128    1  freebsd-boot  (64K)
         162     1048576    2  freebsd-ufs  (512M)
     1048738     8388608    3  freebsd-swap  (4.0G)
     9437346     1048576    4  freebsd-ufs  (512M)
    10485922     2097152    5  freebsd-ufs  (1.0G)
    12583074    31457280    6  freebsd-ufs  (15G)
    44040354   104857600    7  freebsd-ufs  (50G)
   148897954  1804205885    8  freebsd-ufs  (860G)
```
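
The layout above can be sanity-checked by hand. A minimal sketch, assuming 512-byte sectors (which the fdisk output further down reports for this unit); `sectors_to_mib` and `next_start` are made-up helper names, not gpart features:

```shell
# Assumption: 512-byte sectors.
# sectors_to_mib COUNT: size of COUNT sectors in MiB (integer division).
sectors_to_mib() {
    echo $(( $1 * 512 / 1048576 ))
}

# next_start START SIZE: first sector after a partition.
next_start() {
    echo $(( $1 + $2 ))
}

sectors_to_mib 1048576            # partition 2, listed as 512M
sectors_to_mib 8388608            # partition 3, listed as 4.0G
next_start 162 1048576            # should match the start of partition 3
next_start 148897954 1804205885   # should match 34 + 1953103805
```

If every `next_start` result equals the start of the following partition, the table has no gaps or overlaps, which appears to be the case here.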

I did write bootcode to the partition on index 1: `# gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 da1`

Naturally I labeled the partitions to be able to do that manual install I mentioned earlier, namely: /, /var, /tmp, /usr, /home and /data respectively. The last partition (p8) has been in use for quite a while now and is nearly full. The other partitions have been unused so far, so I don't mind losing data on those.

I checked with old fdisk that the disk is indeed bootable (output from 7-STABLE):

```
# fdisk /dev/da1
******* Working on device /dev/da1 *******
parameters extracted from in-core disklabel are:
cylinders=121575 heads=255 sectors/track=63 (16065 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=121575 heads=255 sectors/track=63 (16065 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 238 (0xee),(EFI GPT)    start 1, size 1953103871 (953663 Meg), flag 80 (active)
        beg: cyl 1023/ head 255/ sector 63;
        end: cyl 1023/ head 255/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>
```
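
The single `0xee` entry is what a protective MBR on a GPT disk normally looks like, and its size can be cross-checked against the gpart figures. A rough sketch, assuming the default backup GPT of 33 sectors at the end of the disk (32 sectors of partition entries plus one header); `pmbr_size` is a hypothetical helper name:

```shell
# pmbr_size FIRST_USABLE USABLE_COUNT
# Assumption: the backup GPT takes 33 sectors after the last usable one.
pmbr_size() {
    first_usable=$1
    usable=$2
    backup_gpt=33
    total=$(( first_usable + usable + backup_gpt ))
    # The protective entry starts at sector 1 and covers the rest of the disk.
    echo $(( total - 1 ))
}

# Values taken from the "gpart show da1" header line.
pmbr_size 34 1953103805
```

The result can be compared with the `size` field fdisk prints for partition 1; if they agree, the MBR and the GPT describe the same disk.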

Is something in the above incorrect?

Is it possible that the system doesn't know how to boot a GPT partition somehow? In that case, what do I do to "convert" this disk to have at least one bootable MBR partition? I'd prefer to not have to juggle nearly a terabyte of data around that's already on it, but I can if necessary.


----------



## dalroi (Oct 20, 2012)

A couple of things I ought to add:

I did remember to enable the booting BIOS on the 3ware controller.

I intend to remove the PATA disks from the system in the near future (once I've taken any remaining data off of them). They're part mirror and part stripe and relatively small (320GB) and in the process of failing.


----------



## jb_fvwm2 (Oct 20, 2012)

It is unclear to me[1] (I've read your posts twice) how exactly the new setup that is failing to boot was created, its components (hardware and versions), partition layout on each of the disks involved in any way, procedures you did during the attempt to upgrade, etc...  Post it all again?
[1] Not that I could help further though...


----------



## dalroi (Oct 20, 2012)

It's a clean install on a previously partitioned disk, as shown in the gpart output.
The old version (7-STABLE) is on a different set of disks and currently happily buzzing along - I don't think it's relevant to the problem.

I'm not sure where the boot gets stuck, nothing much has been printed to the screen at that point.
On another system running CURRENT, the point where I get stuck on this server looks like the blinking cursor you get just before the BTX loader starts up. Immediately after that, it should start to spin (- / | \ etc). On the server, I just seem to get '-'.
I'm not sure what that's coming from though - could be the motherboard bios, could be the 3ware bios or could be the BTX loader.

I can boot up using the memstick image and coerce it to boot from /dev/da1 by unloading the kernel from the boot loader and replacing it with the one on the disk and by setting some boot-parameter. It was something like:

```
set currdev=disk2p2
```
That way I can boot the new system. It still starts up the bsdinstaller, but it is running the new kernel on the disk, with the disk as root device. So the system is bootable once I circumvent the boot loader stage, it seems.

For the manual installation procedure I used http://www.b0rken.org/wiki/index.php/FreeBSD_9.0_manual_installation as a guide and did the following:

```
# gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 da1
# mount -u /dev/da1p2 /mnt
# cd /mnt
# mkdir var usr tmp home
# mount /dev/da1p4 /var
# mount /dev/da1p5 /tmp
# mount /dev/da1p6 /home
# tar Jxpf /usr/freebsd-dist/base.txz
# tar Jxpf /usr/freebsd-dist/kernel.txz
```
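
For comparison, a manual install usually mounts every target filesystem under the new root before extracting, whereas the commands as posted mount onto /var, /tmp and /home directly and use p6 for /home. A dry-run sketch that only prints the commands, assuming /mnt as the staging directory and the partition-to-mountpoint mapping from the first post (p2 /, p4 /var, p5 /tmp, p6 /usr, p7 /home); `plan_mounts`, `DISK` and `ROOT` are illustrative names:

```shell
# Dry run: print the commands instead of executing them.
# DISK and ROOT are assumptions based on the posts above.
DISK=da1
ROOT=/mnt

plan_mounts() {
    echo "mount /dev/${DISK}p2 ${ROOT}"
    # index:mountpoint pairs for the remaining UFS partitions
    for pair in 4:var 5:tmp 6:usr 7:home; do
        idx=${pair%%:*}
        dir=${pair#*:}
        echo "mkdir -p ${ROOT}/${dir}"
        echo "mount /dev/${DISK}p${idx} ${ROOT}/${dir}"
    done
    # Extract the distribution sets into the new root.
    echo "tar -xJpf /usr/freebsd-dist/base.txz -C ${ROOT}"
    echo "tar -xJpf /usr/freebsd-dist/kernel.txz -C ${ROOT}"
}

plan_mounts
```

Review the printed list before running anything by hand; the mapping is a guess from the thread, not a verified layout.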

As I said, the disk was partitioned like this already. I did that months ago in preparation for this upgrade and I'm afraid I don't recall the details. I think all the details are visible from the results as shown in the gpart output though?


----------



## wblock@ (Oct 20, 2012)

The configuration is confusing, but this may be another instance of graid(8) being a problem.  Why are you connecting the non-RAID disk to the RAID controller?  If there is old RAID metadata on the disk, it can do unexpected things.


----------



## dalroi (Oct 21, 2012)

I'm not using graid, where's that coming from?!? This is a hardware raid configuration.


----------



## dalroi (Oct 21, 2012)

Oh hang on, you're referring to those PATA disks? Those are the 7-STABLE disks. I just mentioned those because I could use those disks to get the system to boot the problem disk unit, but only until I remove them. They're not reliable and not really relevant to the problem.


----------



## dalroi (Oct 21, 2012)

Better question; what is confusing about the configuration? Either I failed to explain something or you're seeing something I don't.


----------



## wblock@ (Oct 21, 2012)

graid(8) is explained in that link.  It's part of the GENERIC kernel now (9.1-RC2), and was not before, so some people are seeing different behavior.

Why the manual installation?  It should not be necessary, at least to get 9.0 installed on this disk.


----------



## dalroi (Oct 21, 2012)

Forget about graid please, it's *not* relevant to my situation. I only use software raid in the *old* situation that I'm migrating away from and which is on *different* disks (PATA) and at a *different* release (7-STABLE). The only reason I mentioned those disks is that the system still can be booted to the old OS, which is probably useful in debugging this problem for example. In fact, I wouldn't be able to answer here if that system weren't running right now - it's my gateway, among other things.

Back to the problem disk: da2.
For completeness' sake: I use the word "disk" for da2, but in fact it's a set of mirrored disks on the raid controller (twa0). To the OS it's a single disk though.

I had to do a manual installation on disk da2, because bsdinstaller bailed out during the process. I Googled the error message (Something like "Can't set uid=0/gid=0 on var/empty"); apparently that's because my disk was partitioned before bsdinstaller ran on it. It may be particular to the memstick image that I use for the installation.

There has never been an OS on disk da2 before. I originally added the disk because I needed more disk space. I partitioned it such that I _could_ put an OS on it later, because I foresaw that those PATA disks (which the 7-STABLE install lives on) would break one day and that they would be impossible to replace on short term.
So, back then I created partitions da2p1 to da2p8. Of those, p1 to p7 have been empty until recently. Partition p8 contains the data that I needed the disk for. The non-booting install problem is about the OS I installed in partitions da2p1 to da2p7.

Come to think of it, there must have been a FS on those partitions or I wouldn't have been able to mount them during the manual install. I think bsdinstaller did get that far before it bailed out, as it was about to extract the base system when it complained about var/empty. It's also possible that I put an FS on those partitions when I prepared da2p8 for usage. At this point I'm not certain which is the case, that happened quite a while ago (one of the disks in the hardware mirror is labeled Dec-2009, the other is brand new as I had to replace its predecessor).


----------



## wblock@ (Oct 21, 2012)

To be clear:

*graid(8) is hardware RAID.  It's a GEOM module that deals with hardware RAID.*  Because that module detects the metadata that hardware RAID controllers write on the disk, and because that module is included in the GENERIC kernel now, it can cause systems to react differently than before.

Things have changed a lot in the long-delayed release of 9.1.  As long as the data is safely backed up, I would suggest trying an install from a 9-STABLE snapshot.  They are available here:  https://pub.allbsd.org/FreeBSD-snapshots/


----------



## dalroi (Oct 21, 2012)

In that case the man page is misleading, as it starts out with:

> The graid utility is used to manage software RAID configurations, supported by the GEOM RAID class.


If it does indeed support hardware RAID though, it doesn't mention 3ware controllers among its supported metadata formats.
It does however go on to say:

> To allow booting from RAID volume, the metadata format should match the RAID BIOS type and its capabilities. To guarantee that these match, it is recommended to create volumes via the RAID BIOS interface


That is exactly what I did, which suggests that it should work.

Thanks for pointing me to the STABLE snapshots, that will save me an upgrade later on and perhaps the bsdinstaller has improved enough that it can handle existing layouts now.

I haven't had an opportunity yet to check what graid says about my disk and the newer memstick image is still downloading, I'll get back on that later.


----------



## wblock@ (Oct 21, 2012)

Sorry, I was thinking of motherboard RAID as hardware.  It's not, really, and you're not using motherboard RAID anyway.

bsdinstall(8) has not changed, but the GEOM subsystem has.  Note also that you can use existing partitions and filesystems in bsdinstall with the Shell option.  It shows where to mount them.


----------



## dalroi (Oct 21, 2012)

Oh I see, bsdinstall can be told where to start. I could probably have used bsdinstall mount & bsdinstall distextract to do the installation. I'll keep that in mind for the next attempt.

In the meantime I figured out that whether my BIOS (or controller BIOS) understands GPT or not (probably not) probably doesn't matter, since I also wrote an MBR (/boot/pmbr) to the disk. Yet, it looks like the BTX loader doesn't load.
I can't tell whether it gets executed at all or whether it hangs somewhere - any clues to figuring out what's going on there? Perhaps the loader doesn't understand its environment or something? The kernel needs the twa driver to see the disks, perhaps something similar goes for the BTX loader? (I assume not, that stage is still mostly controlled by BIOS, isn't it?)


----------



## wblock@ (Oct 22, 2012)

No, I meant to use the Shell option from the partitioning screen.


----------



## phoenix (Oct 22, 2012)

Blinking cursor in the top-left corner of the screen means the BIOS doesn't know which device to boot from, has found a couple different bootable options, and is just spinning its wheel waiting for the cows to come home.  This is most likely due to you having bootable PATA drives installed, a non-bootable RAID array installed, and then the bootable RAID array (the one you want to boot from).

Read your BIOS boot screen.  There will be an option to "select the boot device" or "show bootable devices" or along those lines.  Press that key (usually F3 or F8 or around there).  In that list will be all your detected block devices, including the PATA drives, and the multiple RAID arrays.  Select the second RAID array.  And then the boot process should continue.

If this is the case, then don't worry about it.  Use the manual process above to boot.  Get all the data off the old drives.  Then remove all the old drives and non-bootable RAID arrays.  After that, your BIOS won't be confused, and the boot process will proceed normally.

I have this issue at home with multiple bootable USB sticks plugged in.  I have to manually select the boot device, until I get the data migrated over correctly to only have a single bootable USB stick.


----------



## dalroi (Oct 22, 2012)

Thanks, that's the kind of suggestion I was hoping for!
For example, I just found out that I can't set the bootable option on a volume from the 3DM manager, but that I supposedly can in the 3BM manager - No wonder I couldn't find it! I think you put me on the right track.


----------



## wblock@ (Oct 23, 2012)

Pretty sure I've also seen missing bootcode cause the "not doing anything" startup.
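
One cheap way to rule that out is to look for the 0x55AA signature at the end of sector 0. A minimal sketch, assuming 512-byte sectors and a readable device or image at least one sector long; `has_boot_signature` is a made-up helper name. A match only shows that some boot sector is present, not that it is the right bootcode:

```shell
# has_boot_signature PATH
# Succeeds if the first 512-byte sector of PATH ends in 0x55 0xAA.
# Reads a whole sector, since raw disk devices want sector-sized reads.
has_boot_signature() {
    sig=$(dd if="$1" bs=512 count=1 2>/dev/null \
        | tail -c 2 | od -An -tx1 | tr -d ' \t\n')
    [ "$sig" = "55aa" ]
}

# Example (as root, device name taken from the thread):
#   has_boot_signature /dev/da1 && echo "signature present"
```

If the signature is missing, rerunning the `gpart bootcode` command from the earlier posts would be the obvious next step.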


----------



## dalroi (Nov 4, 2012)

Today I finally managed to continue on this "project", after I created a backup of the contents of the disk (which took a long time over a 100Mbit network - 34 hours! Better than the USB1 that's on the system though).

I recreated the unit in the raid controller, this time using a smallish (20G) boot volume and the rest as a second volume. Per the documentation, that boot volume is not a requirement to be able to boot from the controller, but it seemed worth a try.

I installed a recent STABLE snapshot as per the recommendations. I didn't run into the bsdinstall problems of last time, which made things a bit easier.

Initially I tried using a GPT partitioned boot volume (that 20G volume from the previous paragraph), *which didn't boot*. Then I re-did the install using MBR partitioning on the boot volume (glad I created that separately): *Didn't boot either*. Same problem as I was having before.

So I went back to the user manual of the RAID controller and lo and behold, in the Troubleshooting section it says that the boot unit *needs to be the first unit* in the controller's BIOS!
I'm currently waiting on a restore of my home directory, I'll try the BIOS thing once that's done.

Edit: I had to redo the install as I forgot to add a swap partition and took that opportunity to try GPT again. The system is now booting off the first unit of the RAID controller using a GPT partitioned slice! Hurray!

Thanks for the suggestions, they helped me figure out what wasn't causing my problem, which in this case was just as important as figuring out what was.


----------

