# NVMe passthru on Bhyve findings



## Phishfry (Dec 26, 2019)

I am currently using a SuperMicro X10DRL board for my Bhyve virtualization machine. The OS boots from a 64GB DOM.
I use two Samsung 512GB PM953 drives in the M.2 form factor, mounted on a SuperMicro AOC-SLG3-2M2.
The slot where the card is installed is set to x4x4 in the BIOS for bifurcation.
I am using a `gmirror` of the two NVMe drives for redundancy. I mount the gmirror 'tank' at /vm via fstab and keep all my VM image files on the mirror.
This is an older generation of NVMe with modest performance; slightly more than double my SSD speeds.
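For anyone wanting to reproduce the mirror, the setup would have been roughly along these lines (a sketch only; the device names, the 'tank' label, and the single-UFS-partition layout are assumptions inferred from the diskinfo output below):

```
# Load the mirror module now and at every boot
kldload geom_mirror
sysrc -f /boot/loader.conf geom_mirror_load="YES"

# Create a mirror labeled 'tank' from the two NVMe disks
gmirror label -v tank /dev/nvd0 /dev/nvd1

# Partition, create a filesystem, and mount at /vm via fstab
gpart create -s gpt /dev/mirror/tank
gpart add -t freebsd-ufs /dev/mirror/tank
newfs -U /dev/mirror/tankp1
echo '/dev/mirror/tankp1 /vm ufs rw 2 2' >> /etc/fstab
mount /vm
```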

```
root@virt:~ # diskinfo -t /dev/mirror/tankp1
/dev/mirror/tankp1
    512             # sectorsize
    960197083136    # mediasize in bytes (894G)
    1875384928      # mediasize in sectors
    512             # stripesize
    0               # stripeoffset
    116737          # Cylinders according to firmware.
    255             # Heads according to firmware.
    63              # Sectors according to firmware.
    Yes             # TRIM/UNMAP support
    Unknown         # Rotation rate in RPM

Seek times:
    Full stroke:      250 iter in   0.032662 sec =    0.131 msec
    Half stroke:      250 iter in   0.031611 sec =    0.126 msec
    Quarter stroke:      500 iter in   0.062575 sec =    0.125 msec
    Short forward:      400 iter in   0.048904 sec =    0.122 msec
    Short backward:      400 iter in   0.048604 sec =    0.122 msec
    Seq outer:     2048 iter in   0.088228 sec =    0.043 msec
    Seq inner:     2048 iter in   0.095934 sec =    0.047 msec

Transfer rates:
    outside:       102400 kbytes in   0.097920 sec =  1045752 kbytes/sec
    middle:        102400 kbytes in   0.098955 sec =  1034814 kbytes/sec
    inside:        102400 kbytes in   0.098688 sec =  1037613 kbytes/sec
```

So I read that FreeBSD 12 includes the option to pass through an NVMe drive with the standard convention:
`-s 7:0,nvme,/dev/nda2`
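A full invocation using that emulated-NVMe slot might look something like this (a sketch; the VM name, CPU/memory sizing, and the other slots are hypothetical, not taken from my actual config):

```
# Boot a guest with the host NVMe exposed via bhyve's nvme emulation
bhyve -c 2 -m 4G -H -A -P \
  -s 0:0,hostbridge \
  -s 7:0,nvme,/dev/nda2 \
  -s 31,lpc -l com1,stdio \
  freebsd1
```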
For VM passthru testing I am using an additional NVMe, a Samsung PM983 in the U.2 form factor, while still hosting my VMs on the gmirror.
Below is a test of the newer Samsung PM983 NVMe on the host machine (hypervisor) as a reference point.

```
# diskinfo -t /dev/nda2
/dev/nda2
    512             # sectorsize
    960197124096    # mediasize in bytes (894G)
    1875385008      # mediasize in sectors
    512             # stripesize
    0               # stripeoffset
    SAMSUNG MZQLB960HAJR-000AZ                 # Disk descr.
    S3VKNE0KA06254         # Disk ident.
    Yes             # TRIM/UNMAP support
    0               # Rotation rate in RPM

Seek times:
    Full stroke:      250 iter in   0.017148 sec =    0.069 msec
    Half stroke:      250 iter in   0.016615 sec =    0.066 msec
    Quarter stroke:      500 iter in   0.031590 sec =    0.063 msec
    Short forward:      400 iter in   0.022771 sec =    0.057 msec
    Short backward:      400 iter in   0.022195 sec =    0.055 msec
    Seq outer:     2048 iter in   0.047863 sec =    0.023 msec
    Seq inner:     2048 iter in   0.041782 sec =    0.020 msec

Transfer rates:
    outside:       102400 kbytes in   0.056853 sec =  1801136 kbytes/sec
    middle:        102400 kbytes in   0.052590 sec =  1947138 kbytes/sec
    inside:        102400 kbytes in   0.065681 sec =  1559051 kbytes/sec
```

So as you can see, the speed is almost double that of the earlier PM953 on the host, before passthru.
Unfortunately, after passthru the speed drops off considerably. This test is from inside the VM.


```
root@freebsd1:~ # diskinfo -t /dev/nvd0
/dev/nvd0
        512             # sectorsize
        960197124096    # mediasize in bytes (894G)
        1875385008      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        bhyve-NVMe      # Disk descr.
        NVME-5-0        # Disk ident.
        No              # TRIM/UNMAP support
        0               # Rotation rate in RPM

Seek times:
        Full stroke:      250 iter in   0.060528 sec =    0.242 msec
        Half stroke:      250 iter in   0.058457 sec =    0.234 msec
        Quarter stroke:   500 iter in   0.116592 sec =    0.233 msec
        Short forward:    400 iter in   0.090369 sec =    0.226 msec
        Short backward:   400 iter in   0.089503 sec =    0.224 msec
        Seq outer:       2048 iter in   0.385192 sec =    0.188 msec
        Seq inner:       2048 iter in   0.382445 sec =    0.187 msec

Transfer rates:
        outside:       102400 kbytes in   0.343736 sec =   297903 kbytes/sec
        middle:        102400 kbytes in   0.344296 sec =   297419 kbytes/sec
        inside:        102400 kbytes in   0.343548 sec =   298066 kbytes/sec
```

That is even slower than my VM image files, which live on the host gmirror NVMe and are presented to the guest as ada0:

```
root@freebsd1:~ # diskinfo -t /dev/ada0
/dev/ada0
        512             # sectorsize
        80530636800     # mediasize in bytes (75G)
        157286400       # mediasize in sectors
        32768           # stripesize
        0               # stripeoffset
        38550           # Cylinders according to firmware.
        16              # Heads according to firmware.
        255             # Sectors according to firmware.
        BHYVE SATA DISK # Disk descr.
        BHYVE-9134-7524-5C08    # Disk ident.
        No              # TRIM/UNMAP support
        Unknown         # Rotation rate in RPM
        Not_Zoned       # Zone Mode

Seek times:
        Full stroke:      250 iter in   0.109487 sec =    0.438 msec
        Half stroke:      250 iter in   0.105601 sec =    0.422 msec
        Seq outer:       2048 iter in   0.376304 sec =    0.184 msec
        Seq inner:       2048 iter in   0.401486 sec =    0.196 msec

Transfer rates:
        outside:       102400 kbytes in   0.222996 sec =   459201 kbytes/sec
        middle:        102400 kbytes in   0.253095 sec =   404591 kbytes/sec
        inside:        102400 kbytes in   0.233424 sec =   438687 kbytes/sec
```
As you can see, the image-file route drops to around half the host speed, while direct passthru of the NVMe drops the speed by more than 6x.
I am open to suggestions for better speed.
Originally I had planned to convert all my VMs to NVMe passthru, but there is no sense in that if it is slower.


----------



## aragats (Mar 24, 2020)

Phishfry said:


> FreeBSD 12 includes the option to passthru a NVMe drive with the standard convention: -s 7:0,nvme,/dev/nda2


What's the point? Isn't it easier to pass it through as a PCIe device? It works perfectly here. Unfortunately, I cannot report the actual numbers right now since that laptop died last week, and Dell keeps promising to fix it...


----------



## Phishfry (Mar 24, 2020)

I was hoping not to lose any of the NVMe's speed for a fast VM.
One drawback of the method, besides the lost speed, is that you must dedicate a whole drive to the VM.
Whereas with an AHCI-HD VM image file hosted on the NVMe, you can run multiple VMs from one NVMe.
I was simply disappointed that the speed was reduced so much on direct passthru of the whole drive.

I am setting up my Bhyve graphical-VM experiment box right now, with a Samsung PM983 NVMe hosting the VMs. I stress-tested two VMs by having both simultaneously install Xorg/Xfce from packages, and one VM crashed. So I am still experimenting. I had a `gstat` instance running on the host to see how well the NVMe was handling multiple VMs' I/O requests, and the NVMe never got above 20 Megabytes/sec. So I am not sure why one VM crashed.


```
[6/347] Installing libxcb-1.13.1...
[6/347] Extracting libxcb-1.13.1:  54%

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address    = 0x31
fault code        = supervisor read data, page not present
instruction pointer    = 0x20:0xffffffff80c8785c
stack pointer            = 0x28:0xfffffe004b467370
frame pointer            = 0x28:0xfffffe004b4673d0
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 843 (pkg)
trap number        = 12
panic: page fault
cpuid = 1
time = 1585084335
KDB: stack backtrace:
#0 0xffffffff80c1d297 at kdb_backtrace+0x67
#1 0xffffffff80bd05cd at vpanic+0x19d
#2 0xffffffff80bd0423 at panic+0x43
#3 0xffffffff810a7d2c at trap_fatal+0x39c
#4 0xffffffff810a7d79 at trap_pfault+0x49
#5 0xffffffff810a736f at trap+0x29f
#6 0xffffffff81081a0c at calltrap+0x8
#7 0xffffffff80c862b7 at cache_lookup+0x67
#8 0xffffffff80c8b71c at vfs_cache_lookup+0xac
#9 0xffffffff81229286 at VOP_LOOKUP_APV+0x76
#10 0xffffffff80c94c21 at lookup+0x6d1
#11 0xffffffff80c940f7 at namei+0x437
#12 0xffffffff80cb1520 at vn_open_cred+0xd0
#13 0xffffffff80ca9cd3 at kern_openat+0x213
#14 0xffffffff810a88e4 at amd64_syscall+0x364
#15 0xffffffff81082330 at fast_syscall_common+0x101
Uptime: 50m48s
~
[EOT]
```


----------



## aragats (Mar 24, 2020)

Phishfry said:


> I was simply disappointed that the speed was reduced so much on direct passthru of the whole drive.


Well, it is not direct passthru; the nvme device is another layer on top of PCI. That's why I used PCI passthru.
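For reference, PCI passthru of an NVMe controller is typically done by reserving the device for ppt at boot and then handing its bus/slot/function to bhyve. A sketch, assuming the controller sits at PCI address 5/0/0 (check yours with `pciconf`; the guest name and sizing here are placeholders):

```
# Find the NVMe controller's PCI bus:slot:function on the host
pciconf -lv | grep ^nvme

# Reserve it for passthru at boot, then reboot
echo 'pptdevs="5/0/0"' >> /boot/loader.conf

# Hand the device to the guest; -S wires guest memory,
# which is required for PCI passthru
bhyve -c 2 -m 4G -H -A -P -S \
  -s 0:0,hostbridge \
  -s 4:0,passthru,5/0/0 \
  -s 31,lpc -l com1,stdio \
  freebsd1
```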


----------



## Phishfry (Mar 24, 2020)

I see what you're saying.
Pass through the PCI address of the NVMe for a direct connection.
I will give that a whirl right now.


----------



## Phishfry (Mar 24, 2020)

Well, that gave me a better benchmark speed:
`-s 4:0,passthru,5/0/0`

From the VM:

```
diskinfo -t /dev/nvd0p2
/dev/nvd0p2
        512             # sectorsize
        959925190656    # mediasize in bytes (894G)
        1874853888      # mediasize in sectors
        0               # stripesize
        209735680       # stripeoffset
        116704          # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.
        SAMSUNG MZQLB960HAJR-000AZ      # Disk descr.
        S3VKNE0KA06251  # Disk ident.
        Yes             # TRIM/UNMAP support
        0               # Rotation rate in RPM

Seek times:
        Full stroke:      250 iter in   0.038486 sec =    0.154 msec
        Half stroke:      250 iter in   0.033922 sec =    0.136 msec
        Quarter stroke:   500 iter in   0.062456 sec =    0.125 msec
        Short forward:    400 iter in   0.051672 sec =    0.129 msec
        Short backward:   400 iter in   0.052281 sec =    0.131 msec
        Seq outer:       2048 iter in   0.178363 sec =    0.087 msec
        Seq inner:       2048 iter in   0.225614 sec =    0.110 msec

Transfer rates:
        outside:       102400 kbytes in   0.175364 sec =   583928 kbytes/sec
        middle:        102400 kbytes in   0.166697 sec =   614288 kbytes/sec
        inside:        102400 kbytes in   0.147065 sec =   696291 kbytes/sec
```


----------

