# Preferred way to copy 4TB / 15M files over NFS?



## littlesandra88 (Apr 29, 2013)

Hi =)

Four days ago I started an rsync over NFS to copy 4TB in 15 million files. The first ~100GB was transferred in ~12 hours, but now after four days, only 600GB have been copied. So I am wondering, what is slowing the copying down?
Is it rsync that is slow with this many files, as it e.g. needs to check source and destination each time before a file is copied?
Could it have something to do with these tuning settings, which are aimed at performing well on large sequential transfers?


```
echo 'kern.ipc.nmbclusters="32768"' >> /boot/loader.conf

echo 'kern.ipc.maxsockbuf=16777216' >> /etc/sysctl.conf
echo 'net.inet.tcp.sendspace=262144' >> /etc/sysctl.conf
echo 'net.inet.tcp.recvspace=262144' >> /etc/sysctl.conf
echo 'net.inet.tcp.rfc1323=1' >> /etc/sysctl.conf
echo 'net.inet.tcp.sendbuf_max=16777216' >> /etc/sysctl.conf
echo 'net.inet.tcp.recvbuf_max=16777216' >> /etc/sysctl.conf
```
The MTU for the NIC that is used for NFS is 1500.

Transferring this many files only has to be done once per file system, to seed the host. After that it will serve as an NFS/CIFS host.

Has anyone experienced something similar to this?

Hugs,
Sandra =)


----------



## cpm@ (Apr 29, 2013)

Why not try cpio(1)?


----------



## Uniballer (Apr 29, 2013)

You didn't say: are you copying to a local file system from an NFS filesystem?  Or from one NFS filesystem to another?  Or what?

How have you structured things as far as directories go?  It would make a huge difference if you were trying to put 15M files in one directory, or if you built a tree with about 100 files per directory.  I don't know where the optimal tradeoff is, but it is neither all files in one directory nor one file per directory in a really bushy tree.

What is happening to your network and/or disks?  Are you approaching saturation somewhere, or does it just seem to be slow from waiting a lot?

One problem may be that your copy operation is quite synchronous (i.e. locate a file from the NFS volume, then create a file on the local volume to copy it to, then read and write data, then close, etc.).  If, for example, you were to pipe a copy of tar (or cpio) reading the source files to another one writing the destination files then you might get more I/O overlap.
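The piped-copy idea above can be sketched like this (paths are hypothetical): two tar processes run concurrently, so reads on the source can overlap with writes on the destination instead of strictly alternating.

```
# Hypothetical source tree and destination.
mkdir -p /tmp/tarpipe_demo/src/dir /tmp/tarpipe_demo/dst
echo "data" > /tmp/tarpipe_demo/src/dir/a.txt

# The first tar archives to stdout, the second extracts from stdin;
# -C changes directory before archiving/extracting, -p preserves permissions.
tar cf - -C /tmp/tarpipe_demo/src . | tar xpf - -C /tmp/tarpipe_demo/dst
```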

Tell us more and maybe we can help.


----------



## littlesandra88 (May 2, 2013)

@cpu82, @Uniballer

The source NAS is mounted on the FreeBSD box, where the data should end up. I didn't know about cpio, and it boosted the performance a lot! Reading about rsync, many people report that even 1M files is a lot for it, and that performance decreases over time. So I will see if I can write a script that divides the directory structure into _n_ pieces, `cpio`s each piece, and then finalizes with rsync.


----------



## pboehmer (May 3, 2013)

One other idea might be to run multiple rsyncs simultaneously.  We did something like this for a migration to a new RAID system a while back.  Something like:


```
rsync -a --delete /home/[a-d]* system2:/home/ &
rsync -a --delete /home/[e-h]* system2:/home/ &
```

...and so on. This is just an example; we had logging redirects, etc. It might also help to use the rsyncd server instead of ssh as the transport (assuming you don't need encryption).
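For the daemon-transport variant, a minimal `rsyncd.conf` on the receiving side might look like this (the module name `seed` and its path are made up for illustration):

```
# /etc/rsyncd.conf on system2 -- module name and path are hypothetical
[seed]
    path = /tank/seed
    read only = false
    use chroot = yes
```

Clients then write with `rsync -a --delete /home/[a-d]* system2::seed/`, skipping the ssh encryption overhead entirely.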


----------



## Crest (May 3, 2013)

Unless your connection is unreliable: `tar cf - $src_path | buffer -s 128k -m 128m | ssh -c arcfour $dst_host "buffer -s 128k -m 128m | tar xpf - -C $dst_path"`


----------

