# NFS and iSCSI problems



## Daniel Santos (Oct 19, 2014)

Hello all,

This is my first post here and I am new to FreeBSD (Linux admin, sorry). I am trying to use FreeBSD 10.0 running on a Supermicro box as an iSCSI or NFS storage backend for a small to medium sized XenServer 6.2 pool. The problem is that I have not been able to get either of them working.

For iSCSI, I tried to use ctl. Here is my ctl.conf:


```
portal-group pg00 {
    discovery-auth-group no-authentication
    #discovery-auth-group authgrp00
    listen 192.168.100.10
}

target iqn.2014-10.br.com.arcon:tgxen00 {
    auth-group no-authentication
    portal-group pg00
    lun 0 {
        path /dev/zvol/pool00/iscsi/spo-alg-xen00
        blocksize 512
    }
}
```
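One quick way to check that ctld actually accepts this configuration is to run it in the foreground in debug mode (a sketch; the config path is the default, and the daemon must not already be running):

```shell
# Stop the background daemon, then run ctld attached to the terminal;
# -d enables debug logging to stderr, -f points at the config file
service ctld onestop
ctld -d -f /etc/ctl.conf
```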

The zvol used is running on a simple pool with a couple of SATA disks and two SSDs for caching.


```
# zpool status pool00
  pool: pool00
 state: ONLINE
  scan: none requested
config:

    NAME                                STATE     READ WRITE CKSUM
    pool00                              ONLINE       0     0     0
      mirror-0                          ONLINE       0     0     0
        gpt/c0d0_sata_WMAY01532686      ONLINE       0     0     0
        gpt/c1d4_sata_WMAY01444078      ONLINE       0     0     0
      mirror-1                          ONLINE       0     0     0
        gpt/c0d1_sata_WMAY01054452      ONLINE       0     0     0
        gpt/c1d5_sata_WMAY01480101      ONLINE       0     0     0
    cache
      gpt/c0d7_ssd_CVPO051002RZ160AGN   ONLINE       0     0     0
      gpt/c1d7_ssd_CVPO051002ST160AGN   ONLINE       0     0     0

errors: No known data errors
```

Please do not judge me yet; this is not the production design, just a POC =]. Here is the error I see in /var/log/messages after I try to connect to LUN 0 through Xen (the discovery process completes without a problem):


```
Oct 19 20:44:59 spo-alg-str01 ctld[5080]: 192.168.100.23: read: connection lost
Oct 19 20:44:59 spo-alg-str01 ctld[913]: child process 5080 terminated with exit status 1
Oct 19 20:44:59 spo-alg-str01 ctld[5081]: 192.168.100.23 (iqn.2014-05.com.example:b7a0f07f): read: Connection reset by peer
Oct 19 20:45:11 spo-alg-str01 ctld[913]: child process 5081 terminated with exit status 1
Oct 19 20:45:11 spo-alg-str01 ctld[5084]: 192.168.100.23: read: connection lost
Oct 19 20:45:12 spo-alg-str01 ctld[913]: child process 5084 terminated with exit status 1
Oct 19 20:45:12 spo-alg-str01 ctld[5085]: 192.168.100.23: read: connection lost
Oct 19 20:45:12 spo-alg-str01 ctld[913]: child process 5085 terminated with exit status 1
Oct 19 20:45:12 spo-alg-str01 kernel: cfiscsi_ioctl_handoff: new connection from iqn.2014-05.com.example:b7a0f07f (192.168.100.23) to iqn.2014-10.br.com.arcon:tgxen00
Oct 19 20:45:37 spo-alg-str01 kernel: WARNING: 192.168.100.23 (iqn.2014-05.com.example:b7a0f07f): connection error; dropping connection
```
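One thing that may help narrow this down is logging in to the target locally with FreeBSD's own initiator, to rule out the Xen side entirely (a sketch; the portal address and target name are taken from the ctl.conf above):

```shell
# Start the native iSCSI initiator daemon
service iscsid onestart

# Log in to the target and list active sessions; a successful login
# should also make a new da(4) device show up in dmesg
iscsictl -A -p 192.168.100.10 -t iqn.2014-10.br.com.arcon:tgxen00
iscsictl -L
```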

NFS uses the same pool, and here are the files I considered relevant:

rc.conf:

```
ifconfig_em0="inet 10.11.0.30 netmask 255.255.255.0 netmask 255.255.255.0 media 1000baseTX mediaopt full-duplex mtu 1500"
ifconfig_em1="inet 192.168.100.10 netmask 255.255.255.0 netmask 255.255.255.0 media 1000baseTX mediaopt full-duplex mtu 9000 tso lro rxcsum txcsum"
defaultrouter="10.11.0.1"

...

#NFS
rpcbind_enable="YES"
rpc_statd_enable="YES"
rpc_lockd_enable="YES"
nfs_client_enable="YES"
nfsv4_server_enable="YES"
nfs_server_enable="YES"
nfs_server_flags="-u -t -n 20"
mountd_flags="-r"
mountd_enable="YES"
```

/etc/exports:

```
/pool00/nfs/vms -alldirs -maproot=root -network 192.168.100.0/24
```
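After editing /etc/exports, mountd has to reread the file before clients see the change, and the export list can then be verified locally (a sketch, assuming rpcbind is running as configured in rc.conf above):

```shell
# mountd only rereads /etc/exports on reload (SIGHUP)
service mountd reload

# The export should now show up in the list
showmount -e localhost
```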

This time, Xen is able to mount the share, but when I try to write something to it, the client hangs forever and no error message comes up.

I am using an Intel 82574L Gigabit network card. I have already tried disabling all offloading features and compiling the official driver, but nothing seems to solve the problems I have.


----------



## Sebulon (Oct 20, 2014)

Hi Daniel!

Why do you have the netmask specified twice for each NIC?

There are many options on the NICs that I think are unneeded, maybe even wrong. I would try it just like this instead:

```
ifconfig_em0="inet 10.11.0.30 netmask 255.255.255.0 mtu 1500"
ifconfig_em1="inet 192.168.100.10 netmask 255.255.255.0 mtu 9000"
```

And remember to also configure the switch appropriately for jumbo frames, unless it's just a direct wire of course.

Before trying to use a service like iSCSI or NFS, have you made sure you can ping the client with large sized packets? `# ping -c 1 -s 8192 <client address>`
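One refinement worth mentioning: without the Don't Fragment bit set, an oversized ping may be silently fragmented along the way and still succeed even though the path cannot actually carry jumbo frames. A sketch for FreeBSD, where `-D` sets DF (a payload of 8972 bytes plus the 8-byte ICMP header and 20-byte IP header makes exactly a 9000-byte packet):

```shell
# Fails loudly if any hop on the path cannot pass a 9000-byte frame
ping -c 1 -D -s 8972 <client address>
```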

Have you tried not using NFSv4 by removing the line:

```
nfsv4_server_enable="YES"
```
from rc.conf?
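On the client side the NFS version can also be forced explicitly at mount time, which makes it easy to compare v3 against v4. A sketch for a Linux client such as XenServer's dom0 (the mount options are an assumption; XenServer's SR machinery may need its own configuration):

```shell
mount -t nfs -o vers=3,tcp 192.168.100.10:/pool00/nfs/vms /mnt
```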

/Sebulon


----------



## Daniel Santos (Oct 20, 2014)

Hello Sebulon,

Thanks for the help. It seems that I need to enable jumbo frames at the switch's global configuration level as well as at the interface level. This is probably my problem. Do you know what problems I can have if there are other machines connected to this switch without jumbo frames enabled? I read about some performance issues, but nothing well explained.


----------



## mav@ (Oct 20, 2014)

For kernel iSCSI testing I would recommend updating your system to the soon-to-be-released FreeBSD 10.1-RELEASE, or the already available 10.1-RC2. The CTL and iSCSI subsystems have been under active development in recent months and have received many improvements/fixes (including Xen interoperability) since 10.0-RELEASE.

By the way, the NFS server in FreeBSD 10.1 also got many optimizations compared to 10.0. They should not affect compatibility, but it should work much faster now, especially on large systems.


----------



## mav@ (Oct 20, 2014)

Daniel Santos said:


> Do you know what problems I can have if there are other machines connected to this switch without jumbo frames enabled? I read about some performance issues but nothing well explained.



It depends on the protocol used. TCP has its own mechanism, called MSS, to negotiate the packet size down to the lowest value that both sides support. It may allow things to work, but it does not help in situations where both sides support a large MTU while some router between them doesn't. For other protocols, such as UDP or ICMP, large packets may just get dropped on the receiver, causing multiple retries and finally timeout errors.
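To put numbers on that: MSS is simply the MTU minus the 20-byte IP header and 20-byte TCP header (without options), which is what each side advertises in its SYN. A quick sketch:

```shell
# MSS = MTU - 20 (IP header) - 20 (TCP header), headers without options
echo "standard frames: MSS $((1500 - 40))"   # 1460
echo "jumbo frames:    MSS $((9000 - 40))"   # 8960
```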


----------



## Daniel Santos (Oct 20, 2014)

Hello mav@.
I just switched everyone's frame size back to 1.5 KB. Now Xen is able to discover the LUN, log in to it and even format it for usage, but when I create a virtual disk, attach it to a virtual machine and try to format it, the VM hangs forever. I can see the pool usage increasing bit by bit during the format process, but it seems to be writing only a few bytes/sec. I will go to the DC next week to connect the servers directly and see whether the switch is the problem.

Thanks for all your help people.


----------



## Sebulon (Oct 21, 2014)

mav@ said:


> It depends on the protocol used. TCP has its own mechanism, called MSS, to negotiate the packet size down to the lowest value that both sides support. It may allow things to work, but it does not help in situations where both sides support a large MTU while some router between them doesn't. For other protocols, such as UDP or ICMP, large packets may just get dropped on the receiver, causing multiple retries and finally timeout errors.



I didn't know about MSS. From my own experience, trying to reach a system using jumbo frames from one with regular-sized frames just doesn't work, and it causes performance issues in the switches in between, which try to repackage everything.

Daniel Santos:
Never ever mix jumbo-frame networks with regular networks. If you want to access the machine another way, create a VLAN trunk on the port, with one VLAN for the jumbo storage part and another VLAN for the other network. That way you can configure jumbo frames on the storage VLAN interface while having regular-sized packets on the other network's VLAN interface. Or better yet, do an LACP lagg device with VLAN interfaces on top:

```
ifconfig_em0="mtu 9000 up"
ifconfig_em1="mtu 9000 up"
cloned_interfaces="lagg0 vlan1 vlan10"
ifconfig_lagg0="up laggproto lacp laggport em0 laggport em1"
ifconfig_vlan1="inet 10.11.0.30 netmask 255.255.255.0 vlan 1 vlandev lagg0 mtu 1500"
ifconfig_vlan10="inet 192.168.100.10 netmask 255.255.255.0 vlan 10 vlandev lagg0 mtu 9000"
```

This way you can have as many networks as you want without patching any more cables, and you get failover if a NIC or cable dies. Note that the switch needs to be properly configured and have support for LACP and VLANs (enterprise switches have that).
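Once the lagg is up, it is worth verifying that LACP actually negotiated with the switch before trusting it for storage traffic (a sketch; interface names match the rc.conf fragment above):

```shell
# Each laggport should show ACTIVE,COLLECTING,DISTRIBUTING under LACP
ifconfig lagg0

# The VLAN interface should show the right vlandev parent and MTU
ifconfig vlan10
```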

/Sebulon


----------

