# Lagg issue between FreeBSD box and Windows Seven (working well with Ubuntu)



## milonz (Sep 16, 2012)

Hi,

I've recently acquired an HP ProLiant MicroServer and I've installed FreeBSD 9 for its ZFS support.
The server comes with a Broadcom Corporation NetXtreme BCM5723 Gigabit Ethernet PCIe network card (detected as bge0), and I've added an Intel 1000CT card (82574L, detected as em0). I've loaded the box with a bunch of disks, and ZFS performs like a charm!

To maximize throughput, I've bonded the two network interfaces via the lagg(4) module, and it works well:

- I have failover when I disconnect one of my cables
- I reach 900 Mb/s to 1200 Mb/s when doing iperf tests (these values will be explained later)

I also have an Ubuntu box with a Realtek gigabit card on board (RTL8111/8168B), and all the iperf bandwidth tests I've run give me 900 Mb/s. (I plan to bond this box with another interface too, as soon as I receive another Intel 1000CT card, in order to get above 1 Gb/s, because it's my second NAS box and I'll back up my data there.)

However, I have a third PC, running Windows 7 (nobody's perfect ...). It has the same network chip on board as the Ubuntu box, I believe: an RTL8111E.

The bandwidth with this one is abnormal: I get 150 Mb/s from the FreeBSD box to Windows, and 350 Mb/s from Windows to FreeBSD.

Of course, I've run several tests, such as swapping cables, to make sure it's not a physical issue. I've done all my tests with iperf, which does not depend on the hard disk (and anyway, Windows 7 runs on an SSD). I've also done some tweaking on the Windows side, like enabling jumbo frames.

But it did not help.

My original FreeBSD setup was:

```
cloned_interfaces="lagg0"
ifconfig_bge0="up"
ifconfig_em0="up"
ifconfig_lagg0="laggproto roundrobin laggport bge0 laggport em0"
ipv4_addrs_lagg0="192.168.0.3/24"
```

So I've disabled roundrobin and set the IP address on only one card, as follows (I've switched em0 and bge0 during my tests, to make sure the flaw wasn't coming from one of the cards):


```
cloned_interfaces="lagg0"
ifconfig_bge0="up"
ifconfig_em0="up"
#ifconfig_lagg0="laggproto roundrobin laggport em0 laggport bge0"
#ipv4_addrs_lagg0="192.168.0.3/24"
ifconfig_bge0=" inet 192.168.0.3 netmask 255.255.255.0"
#ifconfig_em0=" inet 192.168.0.3 netmask 255.255.255.0"
```

With roundrobin disabled, I get very generous bandwidth between the FreeBSD box and Windows (up to 900 Mb/s with iperf, in both directions, with bge0 and em0). Accordingly, my FTP transfers reach 100 MB/s, as a real-life example.

But when I re-enable roundrobin (the original setup), the bandwidth is poor again (150 Mb/s upload and 350 Mb/s download).

I've run iperf as a TCP server on the FreeBSD box, with the Ubuntu and Windows boxes as simultaneous TCP clients, and I reach 1.2 Gb/s (900+300) on the FreeBSD side, so I know roundrobin is really working.
My goal is to have a "comfortable" bandwidth between the FreeBSD and the Ubuntu servers, because I'll have 6 TB backups going through the wires (I'll receive the missing card soon), so I need roundrobin on the FreeBSD side too.

But as the Windows box is my leisure PC, I can't leave things as they are (30-40 MB/s is too slow for some uses, such as reading big ISO files). ZFS is soooo good, I don't want to switch the ProLiant server from FreeBSD to a Linux distribution with RAID =/ And besides, I'm a sysadmin, and I'm stubborn: I'll keep searching 'til I find a solution ...

For now, all the tests I've run lead me to believe that lagg(4)'s roundrobin is somehow "incompatible" with Microsoft's IP stack. But I don't know what to do.

- I have a proper physical network
- I've looked for dropped packets, but iperf reports none (in UDP mode, it reports how many packets were lost, and there are none)
- The network cards on Ubuntu and Windows are the same

If somebody has a solution (tweaking the FreeBSD box or Windows 7), let me know.
It really matters to me.

Thanks!


----------



## milonz (Sep 16, 2012)

My ifconfig output, if it helps!


```
em0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC>
        ether 68:05:ca:0a:0c:26
        inet6 fe80::6a05:caff:fe0a:c26%em0 prefixlen 64 scopeid 0x1
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=c019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
        ether 68:05:ca:0a:0c:26
        inet6 fe80::ea39:35ff:fe2d:f1cd%bge0 prefixlen 64 scopeid 0x2
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=3<RXCSUM,TXCSUM>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x9
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=19b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4>
        ether 68:05:ca:0a:0c:26
        inet 192.168.0.3 netmask 0xffffff00 broadcast 192.168.0.255
        nd6 options=29<PERFORMNUD,IFDISABLED,AUTO_LINKLOCAL>
        media: Ethernet autoselect
        status: active
        laggproto roundrobin
        laggport: bge0 flags=4<ACTIVE>
        laggport: em0 flags=4<ACTIVE>
```


----------



## milonz (Sep 16, 2012)

I've run some additional tests this morning: when switching the FreeBSD box from "roundrobin" to "failover", I once more get a solid 900 Mb/s iperf throughput (110 MB/s with real-life FTP transfers). So it's only the roundrobin method which causes issues. Too bad it's the one I need =/

Any thoughts? (The roundrobin method is, hmmm, lightly documented in the lagg(4) manual.)
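
A possible reason, offered only as a hypothesis: roundrobin sends successive packets of the same TCP connection over different ports, so they can reach the receiver out of order. As far as I know, recent versions of lagg(4) warn about exactly this. The toy shell sketch below illustrates the effect; the six-packet stream and the two-tick delay on the second link are made-up numbers, not measurements from this setup, and `reorder_demo` is a hypothetical name.

```sh
# Toy model (all numbers hypothetical): packets 1..6 are sent one per tick,
# alternating between two links. Odd packets use a link with no extra delay,
# even packets use a link that adds 2 ticks. The receiver sees them sorted
# by arrival tick.
reorder_demo() {
    awk 'BEGIN {
        for (seq = 1; seq <= 6; seq++) {
            delay = (seq % 2 == 0) ? 2 : 0
            print seq + delay, seq
        }
    }' |
    sort -n |
    awk '{ out = out (NR > 1 ? " " : "") $2 } END { print out }'
}

reorder_demo
```

It prints `1 3 2 5 4 6`: every packet arrives, but not in sequence. That would be consistent with iperf reporting no UDP loss while TCP throughput drops, although how badly a receiver copes with such reordering depends on its TCP stack and NIC.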


----------



## Savagedlight (Sep 16, 2012)

Can you try setting up an FTP server on the Windows 7 box, and have the FreeBSD server fetch a file from it to test performance in that direction?

I believe this may help shed some light on where the problem is.


----------



## wblock@ (Sep 16, 2012)

Do you have an Intel card to try in the Windows system?


----------



## milonz (Sep 16, 2012)

> Do you have an Intel card to try in the Windows system?


Not yet, but I'll have one in a week, as soon as it's shipped (in France, Chinese PC hardware shops only sell Realtek).


----------



## milonz (Sep 16, 2012)

Hello Savagedlight, I've run the tests you asked for, and I must admit I wasn't expecting these results.
The FTP connections were initiated from the BSD side; the reads/writes are made on the ZFS pool.
I've installed the latest FileZilla Server on Windows and configured the user's home directory on my SSD, to make sure disk I/O wasn't the bottleneck. Firewall and antivirus disabled.

Sorry, for now I can't run this kind of test on the Ubuntu box: its HDD is really old and will certainly be the bottleneck (I have an SSD in stock, but I'll have to reinstall the OS on it).

Failover:

Put:

```
229 Entering Extended Passive Mode (|||50546|)
150 Connection accepted
100% |************************************************************************************************************************************************************************************************|  8324 MiB   48.90 MiB/s    --:-- ETA
226 Transfer OK
8728466265 bytes sent in 02:50 (48.90 MiB/s)
```

Get:

```
229 Entering Extended Passive Mode (|||50595|)
150 Connection accepted
100% |************************************************************************************************************************************************************************************************|  8324 MiB   50.89 MiB/s    00:00 ETA
226 Transfer OK
8728466265 bytes received in 02:43 (50.89 MiB/s)
```

Roundrobin:

Get:

```
local: Assassins Creed II - Revelations.iso remote: Assassins Creed II - Revelations.iso
229 Entering Extended Passive Mode (|||52793|)
150 Connection accepted
100% |************************************************************************************************************************************************************************************************|  8323 MiB   40.55 MiB/s    00:00 ETA
226 Transfer OK
8728033449 bytes received in 03:25 (40.55 MiB/s)
```


Put:


```
229 Entering Extended Passive Mode (|||52832|)
150 Connection accepted
100% |************************************************************************************************************************************************************************************************|  8323 MiB   19.57 MiB/s    --:-- ETA
226 Transfer OK
8727791721 bytes sent in 07:05 (19.57 MiB/s)
```

One interface only mode:

Get:

```
229 Entering Extended Passive Mode (|||53590|)
150 Connection accepted
100% |************************************************************************************************************************************************************************************************|  8323 MiB   51.46 MiB/s    00:00 ETA
226 Transfer OK
8728033453 bytes received in 02:41 (51.46 MiB/s)
```

Put:

```
229 Entering Extended Passive Mode (|||54535|)
150 Connection accepted
100% |************************************************************************************************************************************************************************************************|  8323 MiB   54.04 MiB/s    --:-- ETA
226 Transfer OK
8727791725 bytes sent in 02:34 (54.04 MiB/s)
```

What could it mean? I get transfers reaching an excellent 100 MB/s when failover or only one interface is enabled on the BSD box and Windows initiates the connections (FTP, NFS, pure TCP via iperf), but poor transfer rates both ways when the BSD box initiates connections to the Windows box, no matter how the BSD interfaces are configured (and that's not even true for iperf, which reaches 900 Mb/s when roundrobin is not set).
I'm quite lost ... More Windows optimizations?
I'd really like to run the same tests with the Ubuntu box for these results to really make sense, but it will have to wait a couple of days, I think =/
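
iperf reports megabits per second while ftp reports MiB/s, so the two sets of numbers are easier to compare once converted. A small helper for that, assuming 1 MiB = 1,048,576 bytes and 1 Mb = 10^6 bits (`mib_to_mbit` is a name made up for this sketch):

```sh
# Convert a rate in MiB/s (as printed by ftp) to Mb/s, rounded to an integer.
mib_to_mbit() {
    awk -v mib="$1" 'BEGIN { printf "%.0f\n", mib * 8 * 1048576 / 1000000 }'
}

mib_to_mbit 19.57   # roundrobin put (FreeBSD to Windows)
mib_to_mbit 40.55   # roundrobin get (Windows to FreeBSD)
mib_to_mbit 51.46   # single-interface get
```

This prints 164, 340 and 432. The first two are close to the 150 and 350 Mb/s that iperf showed with roundrobin, so in that mode the FTP and iperf results at least agree with each other.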


----------



## milonz (Sep 17, 2012)

One last word for tonight: I've installed the SSD in the Ubuntu box, and I'm using ProFTPD on this one. I've run some FTP transfers from the BSD box (still on ZFS, roundrobin mode) connected to the Ubuntu FTP server (on SSD). As I was hoping, the transfer rates did improve (my previous HDD was REALLY too old).
It gives me nice rates:
Get:

```
ftp> get sr-acii.iso
local: sr-acii.iso remote: sr-acii.iso
229 Entering Extended Passive Mode (|||23155|)
150 Opening BINARY mode data connection for sr-acii.iso (6810935296 bytes)
100% |************************************************************************************************************************************************************************************************|  6495 MiB   90.24 MiB/s    00:00 ETA
226 Téléchargement terminé
6810935296 bytes received in 01:11 (90.24 MiB/s)
```

Put:

```
ftp> put sr-acii.iso
local: sr-acii.iso remote: sr-acii.iso
229 Entering Extended Passive Mode (|||51774|)
150 Ouverture d'une connexion de données en mode BINARY pour sr-acii.iso
100% |************************************************************************************************************************************************************************************************|  6495 MiB   97.33 MiB/s    00:00 ETA
226 Téléchargement terminé
6810935296 bytes sent in 01:08 (95.43 MiB/s)
```

So the issue is Windows-centric (whether it comes from Windows or FreeBSD)


----------



## milonz (Sep 17, 2012)

Please forgive my last sentence, I'm tired ;-)
I'll run more tests tomorrow between the Ubuntu and Windows boxes before drawing conclusions lol


----------



## milonz (Sep 19, 2012)

I ran an Ubuntu live session on my Windows box, and the transfer rates seem as bad as on Windows, so I guess there's something broken on my integrated NIC. Though I can't figure out why they vary so much (from excellent to bad) with different lagg settings on the FreeBSD box =/
I'll receive a new Intel 1000CT NIC and a bunch of Cat 6 cables tomorrow or Friday.
I guess I'll figure out what's going on then ... or not lol

I'll keep you updated!


----------



## milonz (Nov 26, 2012)

Hello,

To update my case: I haven't been able to solve the issue (I suspect it comes from the Windows IP stack and its ability to reorder IP packets sent with the roundrobin method?!).
As I recently acquired a managed switch with EtherChannel/LACP for my ESX server, I've bonded all my computers with LACP, including the FreeBSD, Linux and even Windows boxes, and it works fine.
Solid throughput on all computers, including the Windows workstation (with the RTL8111E chip only, then with two 82574L cards I managed to spare from another computer).
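
For anyone landing here with the same problem, a sketch of what the FreeBSD side of such an LACP setup could look like, reusing the interface names and address from the first post. This is an assumption, not the poster's actual file, and the two switch ports must be configured as one LACP group as well.

```sh
# /etc/rc.conf (sketch only): same ports and address as the original setup,
# with the aggregation protocol changed from roundrobin to lacp.
cloned_interfaces="lagg0"
ifconfig_bge0="up"
ifconfig_em0="up"
# lacp: negotiates the aggregate with the switch; the switch ports for
# bge0 and em0 must belong to the same LACP group.
ifconfig_lagg0="laggproto lacp laggport bge0 laggport em0"
ipv4_addrs_lagg0="192.168.0.3/24"
```

Unlike roundrobin, LACP picks the outgoing port from a hash of the packet headers, so a given connection stays on one link and keeps its packet order. The trade-off is that a single TCP stream cannot exceed the speed of one link.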

Case closed, thanks


----------



## milonz (Nov 26, 2012)

PS: in fact I haven't retried transfers on the RTL8111E; I moved directly to the Intel cards on the Windows box.


----------

