# Why good network speed + good disk speed => low disk speed?



## littlesandra88 (Apr 8, 2013)

Dear all

I have a weird problem.

If I do
`slave# time dd if=/dev/zero of=/tank3/fs5/test bs=1M count=10000`

then I get a write speed of ~525MB/s.

If I do

```
slave# nc -l 8023 > /dev/null
master# time dd if=/dev/zero bs=1M count=10000 | nc 10.10.10.11 8023
```
then I get a network transfer speed of ~350MB/s with MTU 4000 on a 10Gbit connection.

And if I combine the two to simulate ZFS replication

```
slave# nc -l 8023 > /tank3/fs5/test
master# time dd if=/dev/zero bs=1M count=10000 | nc 10.10.10.11 8023
```
then I only get ~120MB/s.

Going from MTU 1500 to 4000 didn't change much, and setting it higher makes nc exit right away.

Using mbuffer for only 10GB of data makes the total time longer because of the burn-in.

**Questions**

- What could be the reason that I can't set the MTU higher?
- What could be the reason that I get such bad write performance when transferring over the network?


----------



## cpm@ (Apr 8, 2013)

littlesandra88 said:
			
		

> What could be the reason that I can't set the MTU higher?
> What could be the reason that I get such bad write performance when transferring over the network?



Test using nc6(1) which  supports MTU. Install net/nc6 port.


> --mtu=BYTES
> Set the Maximum Transmission Unit for the remote  endpoint
> (network transmits).  This is only really useful for data-
> gram protocols like UDP.  For TCP the MTU is determined by
> ...


----------



## littlesandra88 (Apr 8, 2013)

@cpu82

This is exceedingly interesting! I hope I can ask a few questions about nc6 =)

When using --mtu=BYTES, should I leave the MTU of the NIC at its default or change it to the same value as --mtu=BYTES?

What would the disadvantages be of using UDP in nc6 instead of TCP?

Reading about --buffer-size=BYTES, it sounds a lot like mbuffer. Can nc6 replace mbuffer -s 128k -m 1G?

Though I must say I don't see any impact from -s in mbuffer, but I guess it is nice that it is the same as the ZFS block size? =)


```
--buffer-size=BYTES
                    Set the buffer size for the local  and  remote  endpoints.
                    netcat6  does all reads into these buffers, so they should
                    be large enough  to  minimize  excessive  reads  from  the
                    socket  and  in  UDP  mode  it  should  be large enough to
                    receive  an  entire  datagram  (also  see  '--nru').    By
                    default,  the  buffer  size is 8 kilobytes for TCP connections
                    and 128 kilobytes for UDP.
```


----------



## cpm@ (Apr 9, 2013)

littlesandra88 said:
			
		

> When using --mtu=BYTES, should I leave the MTU of the NIC at its default or change it to the same value as --mtu=BYTES?


Not necessary, given that for TCP(4) the MTU is determined by the kernel. The TCP window controls the flow of data and is negotiated at the start of a TCP connection. Using too small a size will result in slowness, since TCP can only use the smaller of the two end systems' capabilities. While a larger MTU may be useful when connecting two hosts directly together, it becomes less useful when connecting through a switch that doesn't support larger MTUs. To squeeze the most out of your NIC, you can tune these sysctl variables:


- kern.ipc.maxsockbuf
- net.inet.tcp.sendspace
- net.inet.tcp.recvspace
- net.inet.tcp.rfc1323
- kern.ipc.nmbclusters
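
As a rough guide for sizing the socket buffers, the bandwidth-delay product (link rate times round-trip time) gives the amount of data that has to be in flight to keep the link full. A minimal sketch, assuming a 10 Gbit/s link and a hypothetical round-trip time of 0.2 ms (measure the real value with ping); the function name is only illustrative:

```shell
# Bandwidth-delay product in bytes.
# $1 = link rate in bits per second, $2 = round-trip time in microseconds
bdp_bytes() {
    echo $(( $1 / 8 * $2 / 1000000 ))
}

# 10 Gbit/s link; the 200 us RTT is an assumed value, not a measurement.
bdp_bytes 10000000000 200
```

With these assumed numbers the result is 250000 bytes, so a TCP window much smaller than that cannot keep the link busy.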



			
littlesandra88 said:

> What would the disadvantages be of using UDP in nc6 instead of TCP?



UDP(4) stands for User Datagram Protocol. It does not acknowledge or retransmit data (it only carries a checksum) and is known more for speed than for guaranteed data arrival. UDP is connectionless: no connection is needed in order to send data, so it can be sent at any time. Applications that do not require reliable delivery or flow control prefer UDP. Because it has less overhead, UDP can be faster than TCP, although how much faster depends on the workload.

Summarizing, the disadvantages of UDP are:

- No guaranteed delivery: data may get lost and is not resent.
- No ordering of data or packets.



			
littlesandra88 said:

> Reading about --buffer-size=BYTES, it sounds a lot like mbuffer. Can nc6 replace mbuffer -s 128k -m 1G?



That's correct, but in reverse: mbuffer(1) is a replacement for buffer(1) and offers more options. Also read ZFS send/receive accross different transport mechanisms for test commands.


----------



## wblock@ (Apr 9, 2013)

Just a reminder: changing the MTU on a network card does not change the MTU on an existing route.


----------



## littlesandra88 (Apr 9, 2013)

@cpu82,

I must have had a bad fiber connection. After moving the socket a bit, I now have 1.1 Gbyte/s, just using mbuffer from your last link.  Good stuff =)

And mbuffer + writing to disk is now ~400MB/s.

This is with MTU 4000. If I set it higher, I get

```
mbuffer: warning: error connecting to 10.10.10.11:8023: Network is down
mbuffer: error: unable to connect to 10.10.10.11:8023
mbuffer: fatal: no output left - nothing to do
```

By tuning the sysctl variables you posted, how much do you think I could gain in the best case? A lot, or diminishing returns?

I saw about 525MB/s when writing locally to disk. Should I be able to do the same over the network, now that I can transfer 1.1GB/s?

The two hosts are connected directly with a 10Gbit fiber. What could the cause be that I can't set the MTU higher than 4000?


----------



## littlesandra88 (Apr 9, 2013)

wblock@ said:
			
		

> Just a reminder: changing the MTU on a network card does not change the MTU on an existing route.



I am not quite sure what an existing route means. What I did was `# ifconfig ix1 10.10.10.11 mtu 4000` and then `# ifconfig ix1` to verify that the MTU was changed.

I did this with different IPs on each host.


----------



## wblock@ (Apr 9, 2013)

Try `route show default` or `netstat -rW`.


----------



## littlesandra88 (Apr 9, 2013)

@wblock@,

ix0 is the public NIC and ix1 is only for replication.


```
# route show default
   route to: default
destination: default
       mask: default
    gateway: example.com
  interface: ix0
      flags: <UP,GATEWAY,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1500         1         0 
# netstat -rW
Routing tables

Internet:
Destination           Gateway            Flags    Refs      Use    Mtu    Netif Expire
default               example.com        UGS         0    11234   1500      ix0
10.0.0.0              link#2             U           0  4197197   4000      ix1
10.10.10.10           link#2             UHS         0        0  16384      lo0
localhost             link#12            UH          0      318  16384      lo0
xxx.xxx.xxx.0/22      link#1             U           0 16549538   1500      ix0
nas.example.com       link#1             UHS         0        0  16384      lo0

Internet6:
Destination                  Gateway                      Flags    Refs      Use    Mtu    Netif Expire
::                           localhost                    UGRS        0        0  16384      lo0
localhost                    link#12                      UH          0        4  16384      lo0
::ffff:0.0.0.0               localhost                    UGRS        0        0  16384      lo0
fe80::                       localhost                    UGRS        0        0  16384      lo0
fe80::%ix1                   link#2                       U           0        0   1500      ix1
fe80::225:90ff:fe96:65a7%ix1 link#2                       UHS         0        0  16384      lo0
fe80::%lo0                   link#12                      U           0        0  16384      lo0
fe80::1%lo0                  link#12                      UHS         0        0  16384      lo0
ff01::%ix1                   fe80::225:90ff:fe96:65a7%ix1 U           0        0   1500      ix1
ff01::%lo0                   localhost                    U           0        0  16384      lo0
ff02::                       localhost                    UGRS        0        0  16384      lo0
ff02::%ix1                   fe80::225:90ff:fe96:65a7%ix1 U           0       59   1500      ix1
ff02::%lo0                   localhost                    U           0      236  16384      lo0
#
```


----------



## cpm@ (Apr 9, 2013)

littlesandra88 said:
			
		

> This is with MTU 4000. If I set it higher, I get
> 
> ```
> mbuffer: warning: error connecting to 10.10.10.11:8023: Network is down
> ...



That depends on whether your NIC supports an MTU above 4000. Consult your NIC's datasheet. Those errors are by design and to be expected; they are documented in mbuffer.c and network.c (source code).

Post your `sysctl` dump so we can see what can be done.


----------



## littlesandra88 (Apr 9, 2013)

@cpu82,

It is a "Supermicro 2-port 10Gb Standard LP NIC" which has an Intel 82599ES controller. I searched the datasheet, but it doesn't mention a maximum MTU size.

In what seems to be a README for the Linux driver, they write:


> The maximum MTU setting for Jumbo Frames is 9710.  This value coincides
> with the maximum Jumbo Frames size of 9728. This driver will attempt to
> use multiple page sized buffers to receive each jumbo packet.  This
> should help to avoid buffer starvation issues when allocating receive
> packets.



What can you conclude from this? =)


```
# sysctl -a | grep "ix.1"     
dev.ix.1.%desc: Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.4.8
dev.ix.1.%driver: ix
dev.ix.1.%location: slot=0 function=1
dev.ix.1.%pnpinfo: vendor=0x8086 device=0x10fb subvendor=0x15d9 subdevice=0x0611 class=0x020000
dev.ix.1.%parent: pci4
dev.ix.1.fc: 3
dev.ix.1.enable_aim: 1
dev.ix.1.advertise_speed: 0
dev.ix.1.rx_processing_limit: 128
dev.ix.1.dropped: 0
dev.ix.1.mbuf_defrag_failed: 0
dev.ix.1.no_tx_dma_setup: 0
dev.ix.1.watchdog_events: 0
dev.ix.1.tso_tx: 82053968
dev.ix.1.link_irq: 42
dev.ix.1.queue0.interrupt_rate: 83333
dev.ix.1.queue0.irqs: 3466429
dev.ix.1.queue0.txd_head: 2
dev.ix.1.queue0.txd_tail: 2
dev.ix.1.queue0.no_desc_avail: 0
dev.ix.1.queue0.tx_packets: 16126743
dev.ix.1.queue0.rxd_head: 1
dev.ix.1.queue0.rxd_tail: 0
dev.ix.1.queue0.rx_packets: 21574362
dev.ix.1.queue0.rx_bytes: 60
dev.ix.1.queue0.lro_queued: 0
dev.ix.1.queue0.lro_flushed: 0
dev.ix.1.queue1.interrupt_rate: 100000
dev.ix.1.queue1.irqs: 3025206
dev.ix.1.queue1.txd_head: 1002
dev.ix.1.queue1.txd_tail: 1002
dev.ix.1.queue1.no_desc_avail: 0
dev.ix.1.queue1.tx_packets: 8390514
dev.ix.1.queue1.rxd_head: 817
dev.ix.1.queue1.rxd_tail: 816
dev.ix.1.queue1.rx_packets: 12089399
dev.ix.1.queue1.rx_bytes: 123679358
dev.ix.1.queue1.lro_queued: 0
dev.ix.1.queue1.lro_flushed: 0
dev.ix.1.queue2.interrupt_rate: 31250
dev.ix.1.queue2.irqs: 2994068
dev.ix.1.queue2.txd_head: 0
dev.ix.1.queue2.txd_tail: 0
dev.ix.1.queue2.no_desc_avail: 0
dev.ix.1.queue2.tx_packets: 5168536
dev.ix.1.queue2.rxd_head: 0
dev.ix.1.queue2.rxd_tail: 1023
dev.ix.1.queue2.rx_packets: 6034567
dev.ix.1.queue2.rx_bytes: 0
dev.ix.1.queue2.lro_queued: 0
dev.ix.1.queue2.lro_flushed: 0
dev.ix.1.queue3.interrupt_rate: 83333
dev.ix.1.queue3.irqs: 4420287
dev.ix.1.queue3.txd_head: 999
dev.ix.1.queue3.txd_tail: 999
dev.ix.1.queue3.no_desc_avail: 0
dev.ix.1.queue3.tx_packets: 13517943
dev.ix.1.queue3.rxd_head: 847
dev.ix.1.queue3.rxd_tail: 846
dev.ix.1.queue3.rx_packets: 32761311
dev.ix.1.queue3.rx_bytes: 119678090
dev.ix.1.queue3.lro_queued: 0
dev.ix.1.queue3.lro_flushed: 0
dev.ix.1.queue4.interrupt_rate: 100000
dev.ix.1.queue4.irqs: 3633735
dev.ix.1.queue4.txd_head: 1429
dev.ix.1.queue4.txd_tail: 1429
dev.ix.1.queue4.no_desc_avail: 0
dev.ix.1.queue4.tx_packets: 24932991
dev.ix.1.queue4.rxd_head: 33
dev.ix.1.queue4.rxd_tail: 32
dev.ix.1.queue4.rx_packets: 34301802
dev.ix.1.queue4.rx_bytes: 122605550
dev.ix.1.queue4.lro_queued: 0
dev.ix.1.queue4.lro_flushed: 0
dev.ix.1.queue5.interrupt_rate: 100000
dev.ix.1.queue5.irqs: 2897678
dev.ix.1.queue5.txd_head: 1257
dev.ix.1.queue5.txd_tail: 1257
dev.ix.1.queue5.no_desc_avail: 0
dev.ix.1.queue5.tx_packets: 1429566
dev.ix.1.queue5.rxd_head: 847
dev.ix.1.queue5.rxd_tail: 846
dev.ix.1.queue5.rx_packets: 1920686
dev.ix.1.queue5.rx_bytes: 127875146
dev.ix.1.queue5.lro_queued: 0
dev.ix.1.queue5.lro_flushed: 0
dev.ix.1.queue6.interrupt_rate: 83333
dev.ix.1.queue6.irqs: 3321952
dev.ix.1.queue6.txd_head: 2
dev.ix.1.queue6.txd_tail: 2
dev.ix.1.queue6.no_desc_avail: 0
dev.ix.1.queue6.tx_packets: 12027743
dev.ix.1.queue6.rxd_head: 0
dev.ix.1.queue6.rxd_tail: 1023
dev.ix.1.queue6.rx_packets: 19633672
dev.ix.1.queue6.rx_bytes: 0
dev.ix.1.queue6.lro_queued: 0
dev.ix.1.queue6.lro_flushed: 0
dev.ix.1.queue7.interrupt_rate: 31250
dev.ix.1.queue7.irqs: 3247609
dev.ix.1.queue7.txd_head: 0
dev.ix.1.queue7.txd_tail: 0
dev.ix.1.queue7.no_desc_avail: 0
dev.ix.1.queue7.tx_packets: 13576627
dev.ix.1.queue7.rxd_head: 0
dev.ix.1.queue7.rxd_tail: 1023
dev.ix.1.queue7.rx_packets: 20004072
dev.ix.1.queue7.rx_bytes: 0
dev.ix.1.queue7.lro_queued: 0
dev.ix.1.queue7.lro_flushed: 0
dev.ix.1.mac_stats.crc_errs: 0
dev.ix.1.mac_stats.ill_errs: 0
dev.ix.1.mac_stats.byte_errs: 0
dev.ix.1.mac_stats.short_discards: 0
dev.ix.1.mac_stats.local_faults: 32
dev.ix.1.mac_stats.remote_faults: 7
dev.ix.1.mac_stats.rec_len_errs: 0
dev.ix.1.mac_stats.link_xon_txd: 0
dev.ix.1.mac_stats.link_xon_rcvd: 0
dev.ix.1.mac_stats.link_xoff_txd: 0
dev.ix.1.mac_stats.link_xoff_rcvd: 0
dev.ix.1.mac_stats.total_octets_rcvd: 10433044007
dev.ix.1.mac_stats.good_octets_rcvd: 10433044007
dev.ix.1.mac_stats.total_pkts_rcvd: 148319871
dev.ix.1.mac_stats.good_pkts_rcvd: 148319871
dev.ix.1.mac_stats.mcast_pkts_rcvd: 0
dev.ix.1.mac_stats.bcast_pkts_rcvd: 668
dev.ix.1.mac_stats.rx_frames_64: 177
dev.ix.1.mac_stats.rx_frames_65_127: 148308725
dev.ix.1.mac_stats.rx_frames_128_255: 7088
dev.ix.1.mac_stats.rx_frames_256_511: 1316
dev.ix.1.mac_stats.rx_frames_512_1023: 1291
dev.ix.1.mac_stats.rx_frames_1024_1522: 1274
dev.ix.1.mac_stats.recv_undersized: 0
dev.ix.1.mac_stats.recv_fragmented: 0
dev.ix.1.mac_stats.recv_oversized: 0
dev.ix.1.mac_stats.recv_jabberd: 0
dev.ix.1.mac_stats.management_pkts_rcvd: 0
dev.ix.1.mac_stats.management_pkts_drpd: 0
dev.ix.1.mac_stats.checksum_errs: 0
dev.ix.1.mac_stats.good_octets_txd: 368347329667
dev.ix.1.mac_stats.total_pkts_txd: 232978359
dev.ix.1.mac_stats.good_pkts_txd: 232978359
dev.ix.1.mac_stats.bcast_pkts_txd: 149
dev.ix.1.mac_stats.mcast_pkts_txd: 0
dev.ix.1.mac_stats.management_pkts_txd: 0
dev.ix.1.mac_stats.tx_frames_64: 60
dev.ix.1.mac_stats.tx_frames_65_127: 1414870
dev.ix.1.mac_stats.tx_frames_128_255: 1826551
dev.ix.1.mac_stats.tx_frames_256_511: 1268771
dev.ix.1.mac_stats.tx_frames_512_1023: 47120219
dev.ix.1.mac_stats.tx_frames_1024_1522: 181347888
dev.ix.1.mac_stats.fc_crc: 0
dev.ix.1.mac_stats.fc_last: 0
dev.ix.1.mac_stats.fc_drpd: 0
dev.ix.1.mac_stats.fc_pkts_rcvd: 0
dev.ix.1.mac_stats.fc_pkts_txd: 0
dev.ix.1.mac_stats.fc_dword_rcvd: 0
dev.ix.1.mac_stats.fc_dword_txd: 0
```


----------



## cpm@ (Apr 9, 2013)

Your NIC supports jumbo frames, so set the MTU to 9000:
`# ifconfig ix1 10.10.10.11 mtu 9000`

Alternatively, you can use the route(8) command to set the MTU:
`# route change 10.10.10.11 -mtu 9000`

Modify interface ix1 in /etc/rc.conf as follows:

```
ifconfig_ix1="inet x.x.x.x netmask y.y.y.y mtu 9000"
```

Restart networking: 
`# /etc/rc.d/netif restart`
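
To check that the larger MTU works end to end, a ping with the Don't Fragment bit set and a payload sized to fill exactly one packet is a common test. A small sketch of the payload arithmetic, assuming IPv4 without options (20-byte IP header plus 8-byte ICMP header); the function name is only illustrative:

```shell
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
icmp_payload() {
    echo $(( $1 - 20 - 8 ))
}

mtu=9000
payload=$(icmp_payload "$mtu")
echo "payload for MTU $mtu: $payload bytes"

# On FreeBSD, ping -D sets the Don't Fragment bit. Run by hand on one host:
#   ping -D -c 3 -s "$payload" 10.10.10.11
```

If that ping fails while a smaller payload succeeds, some part of the path is not passing the configured MTU.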

Please post the values of the following TCP-related sysctl variables:

```
kern.ipc.maxsockbuf
net.inet.tcp.sendspace
net.inet.tcp.recvspace
net.inet.tcp.rfc1323
kern.ipc.nmbclusters
net.inet.tcp.sendbuf_max
net.inet.tcp.recvbuf_max
```


----------



## littlesandra88 (Apr 9, 2013)

Excellent =)


```
kern.ipc.maxsockbuf: 2097152
net.inet.tcp.sendspace: 32768
net.inet.tcp.recvspace: 65536
net.inet.tcp.rfc1323: 1
kern.ipc.nmbclusters: 25600
net.inet.tcp.sendbuf_max: 2097152
net.inet.tcp.recvbuf_max: 2097152
```

The entire dump is here.


----------



## cpm@ (Apr 9, 2013)

For TCP performance tuning, change these values in /etc/sysctl.conf:

```
kern.ipc.maxsockbuf=16777216
net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144
net.inet.tcp.rfc1323=1
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
```

And modify this in /boot/loader.conf:

```
kern.ipc.nmbclusters="32768"
```


----------



## littlesandra88 (Apr 10, 2013)

@cpu82,

It is very weird. I see extreme variations in speed/transfer time when testing with


```
slave# mbuffer -4 -s 128k -m 1G -I 8023 > /tank3/fs5/test
master# dd if=/dev/zero bs=1M count=10000 | mbuffer -4 -s 128k -m 1G -O 10.10.10.11:8023
```

Everything from an average of ~200MB/s to ~500MB/s with and without the tweaks.

Is it because my test is flawed and doesn't simulate a ZFS replication very well?


----------



## cpm@ (Apr 10, 2013)

@littlesandra88,

As I said, you first need to enable super jumbo frames (SJFs) on FreeBSD. This article should help. Furthermore, take a look at network testing with dd and netcat under Linux, which uses taskset/nice for improvements, click here.

Note that taskset is a standard Linux utility that sets the CPU affinity of a process to one or more particular cores. On FreeBSD, use cpuset(1) and nice(1) instead.


----------



## wblock@ (Apr 10, 2013)

But make sure the switch supports jumbo frames larger than 9000 bytes.  Many do not (like all the cheap ones I have).


----------



## littlesandra88 (Apr 11, 2013)

@cpu82,

Ahh, yes. I had completely forgotten about setting the MTU.

Now I have run some primitive benchmarks using your excellent tips. I am really learning a lot from this forum =)

Both hosts and tests have MTU 9000 and the tweaks from your previous post.


```
TEST 1

sender# cat /tank3/fs/test | mbuffer -4 -s 128k -m 1G -O 10.10.10.11:8023
receiver# mbuffer -4 -s 128k -m 1G -I 8023 > /tank3/fs5/test

summary: 1e+04 Byte in 34.3 sec - average of  292 MiB/s
summary: 1e+04 Byte in 26.1 sec - average of  383 MiB/s
summary: 1e+04 Byte in 24.4 sec - average of  410 MiB/s
summary: 1e+04 Byte in 24.0 sec - average of  417 MiB/s
summary: 1e+04 Byte in 24.8 sec - average of  403 MiB/s
summary: 1e+04 Byte in 26.3 sec - average of  380 MiB/s
summary: 1e+04 Byte in 27.1 sec - average of  369 MiB/s
summary: 1e+04 Byte in 28.1 sec - average of  355 MiB/s
summary: 1e+04 Byte in 27.3 sec - average of  367 MiB/s
summary: 1e+04 Byte in 29.6 sec - average of  338 MiB/s
summary: 1e+04 Byte in 25.8 sec - average of  387 MiB/s

echo "(292+383+410+417+403+380+369+355+367+338+387)/10" | bc -l
410.10000000000000000000

TEST 2

sender# nice -n -20 cpuset -l 4 -s 0 cat /tank3/fs/test | nice -n -20 cpuset -l 5 -s 0 mbuffer -4 -s 128k -m 2G -O 10.10.10.11:8023
receiver# nice -n -20 cpuset -l 4 -s 0 mbuffer -4 -s 128k -m 2G -I 8023 > /tank3/fs5/test

summary: 1e+04 Byte in 31.1 sec - average of  322 MiB/s
summary: 1e+04 Byte in 27.5 sec - average of  363 MiB/s
summary: 1e+04 Byte in 25.3 sec - average of  395 MiB/s
summary: 1e+04 Byte in 25.5 sec - average of  392 MiB/s
summary: 1e+04 Byte in 30.2 sec - average of  331 MiB/s
summary: 1e+04 Byte in 28.4 sec - average of  352 MiB/s
summary: 1e+04 Byte in 28.4 sec - average of  352 MiB/s
summary: 1e+04 Byte in 27.7 sec - average of  361 MiB/s
summary: 1e+04 Byte in 27.7 sec - average of  361 MiB/s
summary: 1e+04 Byte in 25.1 sec - average of  399 MiB/s
summary: 1e+04 Byte in 26.0 sec - average of  385 MiB/s

echo "(322+363+395+392+331+352+352+361+361+399+385)/10" | bc -l
401.30000000000000000000
```
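
Each list above has 11 samples, while the divisor in the bc expression is 10, which inflates the result. A minimal sketch of the mean over however many samples are given, assuming awk is available (the function name is only illustrative):

```shell
# Arithmetic mean of all arguments, printed with one decimal place.
mean() {
    printf '%s\n' "$@" |
        awk '{ sum += $1 } END { if (NR > 0) printf "%.1f\n", sum / NR }'
}

# Test 1 samples (MiB/s)
mean 292 383 410 417 403 380 369 355 367 338 387
# Test 2 samples (MiB/s)
mean 322 363 395 392 331 352 352 361 361 399 385
```

Dividing by 11 gives roughly 373 MiB/s for test 1 and 365 MiB/s for test 2.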

I bet I would get about the same average if I used 30 samples in each. There is also a bit of burn-in when using mbuffer, and test 2 is not completely fair, as I used a 2GB mbuffer instead of 1GB as in test 1.

I used the single threaded method as the ZFS replication is that by nature.

Given that I can write ~525MB/s locally, ~1GB/s over just the NICs, is ~400MB/s then acceptable, or should I be able to get the same as the local write speed?


----------



## littlesandra88 (Apr 11, 2013)

@wblock@,

I have just checked, and the NICs are directly connected to each other. No switch is involved. That could have been fun, if that was teasing me all this time =)


----------



## cpm@ (Apr 11, 2013)

You should distinguish local performance from remote (over the network) performance. In theory, performance over the network should be lower because TCP overhead is involved. See also, if you are curious, this explanatory article about goodput and overhead.

Anyway, try tweaking the vfs.zfs.write_limit_override value. For further information, read the ZFSTuningGuide.

IMHO, you have got decent speeds.


----------



## littlesandra88 (Apr 12, 2013)

@cpu82,

I understand that there is an overhead from TCP and such, so the equation is

`(goodput + tcp_overhead) / time  >  file_size / time`

where I measure the right side, but in fact more data is being sent than what I enter in my speed calculation.
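
To get a feel for how large that overhead is, the share of the wire rate left for payload can be estimated from the frame layout. A rough sketch, assuming IPv4 and TCP headers without options (40 bytes) and 38 bytes of Ethernet framing per packet (preamble, header, FCS and inter-frame gap); the function name is only illustrative:

```shell
# Percentage of the Ethernet wire rate that carries TCP payload.
# $1 = MTU in bytes
tcp_efficiency() {
    awk -v mtu="$1" 'BEGIN {
        payload = mtu - 40   # minus IPv4 (20) and TCP (20) headers, no options
        on_wire = mtu + 38   # plus preamble 8, Ethernet header 14, FCS 4, gap 12
        printf "%.2f\n", 100 * payload / on_wire
    }'
}

for mtu in 1500 4000 9000; do
    echo "MTU $mtu: $(tcp_efficiency "$mtu")% payload"
done
```

By this estimate, header overhead costs about 5% at MTU 1500 and under 1% at MTU 9000, so on its own it seems too small to account for a drop from ~525MB/s to ~400MB/s.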

What I don't understand is, why is this 3x faster

```
receiver# mbuffer -4 -s 128k -m 2G -I 8023 > /dev/null
sender# dd if=/dev/zero bs=1M count=10000 | mbuffer -4 -s 128k -m 2G -O 10.10.10.11:8023
```
than

```
sender# cat /tank3/fs/test | mbuffer -4 -s 128k -m 2G -O 10.10.10.11:8023
receiver# mbuffer -4 -s 128k -m 2G -I 8023 > /tank3/fs5/test
```

?

One bottleneck is the local write to disk speed which is ~525MB/s, so since I am sending the same amount of data over the NICs in both tests, I would have expected ~525MB/s and not ~400MB/s.

Or is there something I am not accounting for?

But yes, 400MB/s is not bad =) I just don't get why I am not getting 525MB/s, so now it is just a theoretical question more than anything =)

Very nice ZFS Tuning Guide. I'll read that right away.


----------



## littlesandra88 (Apr 12, 2013)

@cpu82,

Maybe this reply from @phoenix is the answer to my question?

In my local write-to-disk test, I used `dd if=/dev/zero ...`


----------



## cpm@ (Apr 12, 2013)

Summarizing about /dev/null and /dev/zero:

- A read from /dev/null always returns end-of-file (zero bytes read), not data.
- A write to /dev/null is discarded: the data is read from the source, but nothing is actually written.

```
# cat /dev/null
#
# truss cat /dev/null
<snip>
open("/dev/null",O_RDONLY,027757764600)		 = 3 (0x3)
fstat(1,{ mode=crw--w---- ,inode=143,size=0,blksize=4096 }) = 0 (0x0)
__sysctl(0xbfbfe8b8,0x2,0xbfbfe8c4,0xbfbfe8c8,0x0,0x0) = 0 (0x0)
__sysctl(0xbfbfe8b8,0x2,0xbfbfe8c4,0xbfbfe8c8,0x0,0x0) = 0 (0x0)
read(3,0x28405000,4096)				 = 0 (0x0)
close(3)					 = 0 (0x0)
close(1)					 = 0 (0x0)
<snip>
process exit, rval = 0
#
```

- A read from /dev/zero always returns NUL bytes (0x00, not the digit zero).
- A write to /dev/zero is discarded: the data is read from the source, but nothing is actually written.

```
# cat /dev/zero
^C
# truss cat /dev/zero
<snip>
open("/dev/zero",O_RDONLY,027757764600)		 = 3 (0x3)
fstat(1,{ mode=crw--w---- ,inode=143,size=0,blksize=4096 }) = 0 (0x0)
__sysctl(0xbfbfe8b8,0x2,0xbfbfe8c4,0xbfbfe8c8,0x0,0x0) = 0 (0x0)
__sysctl(0xbfbfe8b8,0x2,0xbfbfe8c4,0xbfbfe8c8,0x0,0x0) = 0 (0x0)
read(3,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(1,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
read(3,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(1,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
read(3,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
<snip>
```
...and on it goes until we (or another process) interrupt it.

Try using /dev/urandom. This way compression cannot skew the results, and you get a more realistic test of your network.

```
receiver# nc -l 8023 | dd of=/dev/null
sender# dd if=/dev/urandom bs=1M count=10000 | nc 10.10.10.11 8023
```
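
If generating random data on the fly turns out to be slower than the link, one option is to create the incompressible test data once and then send the file. A minimal sketch, with a hypothetical scratch path and a small size chosen only for illustration:

```shell
# Create a file of incompressible data.
# $1 = output path, $2 = size in MiB
make_random_file() {
    dd if=/dev/urandom of="$1" bs=1048576 count="$2" 2>/dev/null
}

testfile="${TMPDIR:-/tmp}/random.test"   # hypothetical scratch location
make_random_file "$testfile" 4
ls -l "$testfile"

# The file can then be sent the same way as before, for example:
#   cat "$testfile" | nc 10.10.10.11 8023
```

For a real run the file would need to be large enough to outlast any buffering, as with the 10000 MiB used in the tests above.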

Take a look at benchmarks/iperf and benchmarks/netperf.


----------

