# Network performance problem



## lartza (Apr 1, 2010)

Hello, I have the following setup.

Fileserver with an integrated gigabit NIC (re0) bridged (bridge0) with another 100 Mbit PCI NIC, which is connected to a 100 Mbit switch.

Atom Ion HTPC with an integrated gigabit NIC (re0), connected directly with a cat5e cable to the integrated gigabit NIC on the fileserver. It PXE-boots from the fileserver and uses NFS as its root filesystem.

Both machines are running FreeBSD 8.0-RELEASE with software compiled from ports.

The setup works like a charm, except there's something fishy about network throughput that I can't figure out. I've been googling and reading forums and mailing lists, and have tried pretty much every tweak I can find.

iperf gives a steady throughput of 400-500 Mbit, peaking even at 600 Mbit. This is confirmed using iftop -i bridge0. Not quite gigabit, but close enough for cheap Realtek chips.

Both ends agree on media: Ethernet autoselect (1000baseT <full-duplex>), and forcing it doesn't change a thing.

The odd part is all the other protocols:
NFS: maximum throughput around 200 Mbit (dd and iftop both agree on this)
SMB: maximum throughput around 100 Mbit (Samba plus iftop confirms it)

Practically everything runs at about the same speed it did on the 100 Mbit network (or at most twice that). I've read that gigabit should give over 50 MB/s easily (some claim over 100 MB/s, which is near the theoretical maximum of GbE).

Disk I/O on the server is fine, around 70-80 MB/s as expected.

Everything on the server and client idles while NFS and/or Samba are in use. However, with iperf I did max out the server CPU (interrupts at 80%) when polling was disabled.

Currently I have the defaults in sysctl and ifconfig (except polling enabled on the server). AIO is enabled in smb.conf, and NFS is as it ships with FreeBSD. I've tried rsize and wsize tuning, but all the fine-tuning parameters I can find give only marginal gains (10-20%) compared to the 2x-4x speedup I'm missing here. Every combination shows the same pattern: iperf is 2x-4x faster than anything else.

So the question is: *why on earth* is iperf at 500 Mbit while NFS, Samba and the rest aren't? Some people blame the crappy Realtek chips, and I'd tolerate that explanation if iperf weren't so much faster than every other "real" protocol. Even if it were Realtek's fault, I'd like some technical insight into why iperf performance is so much higher.


----------



## phoenix (Apr 1, 2010)

There are a lot of tuning settings you can use with NFS and Samba.  The defaults are not set for pure speed/raw throughput; you need to tweak them yourself.  The same goes for the default sysctls for TCP and UDP.

NFS and Samba are also dependent on disk I/O speed, whereas iperf runs entirely in RAM.


----------



## lartza (Apr 1, 2010)

I know iperf uses RAM and NFS/Samba/the others use the disk.

Observing `systat -vmstat` while dd'ing over NFS shows the disks are not very busy while pushing around 200 Mbit to the network, whereas dd'ing locally on the server to /dev/null keeps the disks at a good busy %, so the disks should be fine.

Observing `iftop -i bridge0` with iperf shows the network can do 500 Mbit, yet it carries only 200 Mbit when dd'ing over NFS.

Disk I/O isn't the limiting factor, and neither is raw network throughput, so what is? Where should I look? What needs tweaking?


----------



## SirDice (Apr 1, 2010)

Somebody (I forget who) posted a few sysctls some time ago; they seem to help for me. I had some issues with NFS and Samba too.


```
dice@molly:~>cat sysctl.sh 
#!/bin/sh

sysctl kern.ipc.maxsockbuf=2097152

sysctl net.inet.tcp.recvspace=262144
sysctl net.inet.tcp.sendspace=262144
sysctl net.inet.tcp.mssdflt=1452

sysctl net.inet.udp.recvspace=65535
sysctl net.inet.udp.maxdgram=65535

sysctl net.local.stream.recvspace=65535
sysctl net.local.stream.sendspace=65535
```

Just run the script and see if it helps. If the values work out for you, you can add them to /etc/sysctl.conf.


----------



## lartza (Apr 2, 2010)

Nope, these sysctl tunables don't seem to help much. iperf still peaks well over 500 Mbit in iftop, while Samba/NFS/etc. peak a bit over 200 Mbit.

I can read ~13 MB/s with Samba and ~22 MB/s over NFS.


----------



## Savagedlight (Apr 3, 2010)

Quoting my own post here, as I think it will solve your issue with Samba. Take special note of the changes to the Samba config towards the bottom.


			
Savagedlight said:

> I've managed to cap my gigabit network using samba (115-126MB/s file transfers, according to the file copy dialog), and managed to get 990mbit/s through iperf.
> 
> This is a copypaste of my relevant settings in /etc/sysctl.conf
> 
> ...



I'd start with the jumbo frames and samba socket options.
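
The quoted settings themselves are elided above, so as a generic illustration only (not necessarily what Savagedlight used, and values worth benchmarking rather than trusting): jumbo frames are set per interface in /etc/rc.conf, and Samba socket tuning goes in the [global] section of smb.conf.

```
# /etc/rc.conf -- illustrative: enable jumbo frames on re0,
# only if the driver/NIC actually supports an MTU this large
ifconfig_re0="up mtu 9000"
```

```
# smb.conf [global] -- illustrative socket tuning; the buffer
# sizes here are assumptions, not known-good values
socket options = TCP_NODELAY SO_RCVBUF=65536 SO_SNDBUF=65536
```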


----------



## lartza (Apr 4, 2010)

Nope, no good yet. 

I've tried enabling jumbo frames, but the Realtek chip on the Atom Ion box refuses. Could the hardware be so utterly crappy that it can only do a bit over 100 Mbit even though it's marketed as gigabit? Every protocol (HTTP, scp, Samba, ...) transfers a bit over 10 MB/s, which sounds like plain 100 Mbit, even though the media reported by ifconfig is 1000baseT.

I can't change the MTU to 7422 on the Atom Ion box; the highest MTU I can set is 1504. Could it be a driver issue?

It's still very puzzling how iperf gets so much higher throughput. All the tweaks I know of lead to the same situation. Even if the hardware is broken by default, what would happen if I piped all the traffic through iperf?


----------



## jalla (Apr 4, 2010)

lartza said:

> I can't change the MTU to 7422 on the Atom Ion box; the highest MTU I can set is 1504.



Is that the same if you "unbridge" the interface?


----------



## lartza (Apr 4, 2010)

jalla said:

> Is that the same if you "unbridge" the interface?



Yes.

The MTU on the server (where re0 is bridged) can be changed to 7422 even while it's a bridge member.

Also, "unbridging" the server interfaces and using a direct re0<->re0 link isn't any better. The MTU limit on the Atom box is still 1504, and the speed is 100-200 Mbit instead of full gigabit.

In fact, the real speed of every protocol matches 100 Mbit a bit too well. Could they be selling a motherboard with "gigabit" that is actually only 100 Mbit and lies at the Ethernet level about being gigabit?

It's still a big mystery why NFS gets a 20 MB/s transfer rate (instead of the 10-13 MB/s everything else gets) while iperf successfully claims over 500 Mbit.
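
As a sanity check on that hunch, the line-rate arithmetic does line up: a nominal link speed in Mbit/s divides by 8 to give the MB/s ceiling (before protocol overhead). A quick sketch in plain POSIX shell:

```shell
# Nominal link speed (Mbit/s) divided by 8 gives the rough MB/s ceiling.
# 100 Mbit/s caps out around 12 MB/s -- right where the observed
# 10-13 MB/s transfers land; true gigabit should allow ~125 MB/s.
for mbit in 100 1000; do
  echo "${mbit} Mbit/s ~= $((mbit / 8)) MB/s"
done
```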


----------



## lartza (Apr 4, 2010)

I booted Arch Linux from a USB stick, mounted the NFS share, and dd gives 32 MB/s. The same goes for scp and all the rest. Should I blame the FreeBSD Realtek driver?


----------



## lartza (Mar 9, 2011)

I came across a couple of Intel gigabit NICs and decided to try this out again. I'm pretty sure I've found something meaningful, I just can't put the last pieces together.

The server has been updated to FreeBSD 8.2-RELEASE. I removed most of the tweaks, since they didn't help with this particular problem.

When I tried local disk throughput with dd again, I noticed that with bs=1M (as I used earlier) it reads/writes fast (about 80 MB/s), BUT without the bs=1M parameter (i.e. using the default 512-byte block size) read/write speeds are very similar to Samba, scp, HTTP and NFS! Writing without bs=1M is bursty (100-200 MB bursts roughly every 1-3 seconds), totalling something very close to the "poor" 20-30 MB/s writes over NFS.

So, might it be a ZFS problem after all? Is there a tunable for ZFS (or NFS/Apache/Samba/ssh) that would adjust the default block size of 512 to something higher, or is that default specific to dd? NFS rsize and wsize are already 32768, yet dd with bs=32k is a lot faster than dd with bs=512 (and dd with bs=512 behaves almost exactly like NFS: reading about 20-30 MB/s and writing in bursts every few seconds).

I've already googled this and searched the forum, and tried fiddling with vfs.zfs.txg.* and some ZFS-related /boot/loader.conf tunables, without getting any further.
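
One way to see why the block size matters so much: the number of read/write syscalls per gigabyte explodes as bs shrinks, and each one is a round trip through the kernel. A quick sketch in plain POSIX shell (the 1 GiB figure is just for illustration):

```shell
# read()/write() calls needed to move 1 GiB at different dd block sizes
total=$((1024 * 1024 * 1024))   # 1 GiB in bytes

for bs in 512 32768 1048576; do
  echo "bs=${bs}: $((total / bs)) calls"
done
# bs=512 needs 2097152 calls per direction; bs=1M needs only 1024
```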


----------



## Alt (Mar 9, 2011)

You could try this: http://nfs.sourceforge.net/nfs-howto/ar01s05.html


----------



## phoenix (Mar 9, 2011)

If you are using ZFS as the backing store for NFS, then you *really* need to look into adding an SSD as a separate ZIL device.  NFS writes are *sync* writes, which means ZFS first writes them out to the ZIL, waits for the write to complete, then signals the NFS client that the write is complete (data is still stored in the ARC).  Later, ZFS writes the data out from the ARC to the pool.  Thus, every *sync* write is done twice.

By default, the ZIL is part of the pool.  Thus, NFS writes are done twice to the same pool (once to internal ZIL, once to pool).

If you cannot afford a separate SSD for the ZIL, or don't have the room for it in the case, you can disable the ZIL via /boot/loader.conf.  It's generally not recommended to do so, but it can provide increased performance in certain situations.
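
For reference, a sketch of what that looks like on FreeBSD 8.x (the tunable name is from that era's ZFS port, so verify it exists on your version; disabling the ZIL risks losing recent sync writes if the box loses power):

```
# /boot/loader.conf -- disable the ZIL entirely (FreeBSD 8.x-era tunable)
vfs.zfs.zil_disable="1"
```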

Samba performance issues with ZFS usually come from trying to use *sendfile* options in smb.conf.  ZFS and sendfile do not play well together in FreeBSD 8.x.  There are patches available for 8-STABLE to remedy this.
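
If patching isn't an option, the usual workaround is simply to keep Samba from using sendfile at all; e.g. in the [global] section of smb.conf:

```
# smb.conf [global] -- work around the ZFS+sendfile issue on FreeBSD 8.x
use sendfile = no
```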


----------



## lartza (Mar 10, 2011)

Hmm. It's probably not a ZFS-related issue then, because *reading* over the network maxes out at 30 MB/s, while reading locally gives over 80 MB/s.

With the new Intel NICs and iperf I get 300 Mbit in one direction (server to client) and 600 Mbit in the other (client to server). Exactly the same with Intel-Realtek or Realtek-Realtek, with three different cables, and I tested with a MacBook as the client too.

Something is dragging down the network performance of the FreeBSD server, probably at the software level (since different hardware shows the same problem). Or maybe there's too much cosmic radiation in my closet.

I've already tried every software-level fix I know of for this kind of problem, from sysctl TCP tunables to device polling, with no luck. I was so sure it was the Realtek chips causing the problems...


----------



## Savagedlight (Mar 11, 2011)

I've found that I have to max out the MTU to get the most out of my NICs. With an MTU of 9000, I get very close to 1 Gbit/s in both directions with Intel NICs, and close to 600 Mbit/s with Realtek ones.


----------



## lartza (Mar 14, 2011)

I borrowed a MacBook, tested a bit further, and got the following results:


Atom ION integrated Realtek <-> MacBook: ~900 Mbit/s both directions
Atom ION + PCI Intel NIC <-> MacBook: ~900 Mbit/s both directions

All good!

Atom ION integrated Realtek <-> MSI Neo K9N V2 integrated Realtek (FreeBSD):
iperf server on Atom, client on FreeBSD: 300 Mbit
iperf client on Atom, server on FreeBSD: 700 Mbit

Atom ION + PCI Intel NIC <-> MSI Neo K9N V2 + PCI Intel NIC (FreeBSD):
iperf server on Atom, client on FreeBSD: 300 Mbit
iperf client on Atom, server on FreeBSD: 700 Mbit

MacBook <-> MSI Neo K9N V2 + PCI Intel NIC (FreeBSD):
iperf server on MacBook, client on FreeBSD: 300 Mbit
iperf client on MacBook, server on FreeBSD: 700 Mbit

Probably a hardware issue on the home server's motherboard (MSI)? Since the integrated Realtek NIC hangs off the PCI bus, it could cause the same slowdown on both NICs. Or maybe some kind of compatibility issue between FreeBSD and the motherboard?

So... I probably have to rebuild the server on a different motherboard to get a real 1 Gbit link. Damn.


----------

