# Poor performance with AESNI GELI



## Faldaani (Jul 27, 2012)

Hello

I've got an E3-1240, a 3.4 GHz quad-core Xeon with the AES-NI instruction set.
I'm getting no more than 180 MB/sec encryption with GELI, without a filesystem involved.

In this test I enable the geom_zero device, encrypt it, read from it and dump the encrypted data to /dev/null.


```
# kldload aesni
# kldload geom_eli
# kldload geom_zero
# geli onetime -s 4096 gzero
# sysctl kern.geom.zero.clear=0

# geli list gzero.eli
Geom name: gzero.eli
State: ACTIVE
EncryptionAlgorithm: AES-XTS
KeyLength: 128
Crypto: hardware
Flags: ONETIME
KeysAllocated: 2
KeysTotal: 268435456
Providers:
1. Name: gzero.eli
   Mediasize: 1152921504606846976 (1.0E)
   Sectorsize: 4096
   Mode: r0w0e0
Consumers:
1. Name: gzero
   Mediasize: 1152921504606846976 (1.0E)
   Sectorsize: 512
   Mode: r1w1e1

# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4096+0 records in
4096+0 records out
4294967296 bytes transferred in 24.186710 secs (177575508 bytes/sec)
```

I also tried encrypting an SSD with ZFS on and got about the same speed as above.

For comparison's sake, using DiskCryptor on the same machine in Windows I get up to 5.3 GIGABYTE / second in benchmarking tests using AES-XTS, and the absolute slowest speed is

I'm obviously missing something here... guessing that DiskCryptor is misrepresenting the encryption speed somewhat, but... looking around on Google I see 2.4 GB/sec in TrueCrypt.

So... am I benchmarking the wrong way in FreeBSD? Any ideas on how to speed this up?
170 MB/sec isn't going to cut it when split over 20 disks...
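
For reference, the `dd` rate above can be converted into the units used loosely in this thread. This is only a sketch using `awk` arithmetic; the helper names `to_mb` and `to_mib` are made up for illustration, and the even split across 20 disks is the worst case feared in the post, not a measurement.

```shell
# Hypothetical helpers: convert a dd bytes/sec figure to MB/s and MiB/s.
to_mb()  { awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1000000 }'; }
to_mib() { awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'; }

rate=177575508                 # bytes/sec reported by dd above
echo "MB/s:  $(to_mb "$rate")"
echo "MiB/s: $(to_mib "$rate")"

# Assumption: the single-provider rate is shared evenly by 20 disks.
echo "MB/s per disk: $(awk -v b="$rate" 'BEGIN { printf "%.1f\n", b / 1000000 / 20 }')"
```

The "170" and "180" figures quoted in the post are therefore the same measurement, read as MiB/s and MB/s respectively.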


----------



## SirDice (Jul 27, 2012)

Not sure if it's the issue but I noticed gzero has a sectorsize of 512 while gzero.eli has 4096. What happens if both have the same sectorsize?


----------



## Faldaani (Jul 27, 2012)

Unfortunately, it makes no difference with a 512-byte sector size.

This is a freshly installed system (my first time with FreeBSD ever), and I realized that I didn't have the /dev/crypto device in the kernel (although I guess GELI doesn't use it?).

I recompiled the kernel and ran the following:

```
# cd /usr/src/tools/tools/crypto/
# make clean install
# ./cryptotest -a aes256 4096 100000
   1.170 sec,    8192 aes256 crypts,  100000 bytes, 700293043 byte/sec,  5342.8 Mb/sec
```

That is 5.3gb/sec, almost exactly the performance of DiskCryptor in Windows.
But I'm still getting the same poor performance from GELI.

I noticed that GELI only uses one thread for the encryption per device.
The machine has 4 cores with two parallel pipelines each, for a total of 8.

Assuming that in a real world usage scenario GELI will use all 8 pipelines (will it?) then 175 * 8 = 1400, which is better, but still a far cry from 5.3gb...
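
A quick unit check on the `cryptotest` line may be worthwhile. The two rates it prints only agree with each other if `Mb/sec` is read as megabits (with a 2^20 divisor); that reading is an inference from the numbers, not something confirmed against the tool's source here. Under that assumption the result is roughly 700 MB/s rather than 5.3 GB/s:

```shell
# Figure copied from the cryptotest output above.
bytes_per_sec=700293043

# Assumption: "Mb/sec" = bits per second / 2^20.
mbit=$(awk -v b="$bytes_per_sec" 'BEGIN { printf "%.1f\n", b * 8 / 1048576 }')
mbyte=$(awk -v b="$bytes_per_sec" 'BEGIN { printf "%.1f\n", b / 1000000 }')

echo "Mbit/s: $mbit"    # reproduces the 5342.8 printed by cryptotest
echo "MB/s:   $mbyte"
```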


----------



## vermaden (Jul 27, 2012)

By default GELI spawns a thread for every available core. For example, I have a dual-core box, so by default it uses 2 threads:


```
g_eli[1] ada0p3
g_eli[0] ada0p3
```

But you can control that with this OID:

```
% sysctl -d kern.geom.eli.threads
kern.geom.eli.threads: Number of threads doing crypto work
```

A value of 0 means one thread per core.
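
A minimal sketch of changing that OID for the `gzero` test follows. It assumes the value is read when a provider is attached, so an existing provider would have to be detached and created again before a new setting shows up; the helper name `set_eli_threads` and the value 4 are only examples. Commands are printed rather than executed unless `RUN` is set to an empty string.

```shell
# Dry run by default: print each command instead of executing it.
# Run as root with `RUN= sh script.sh` to execute for real.
RUN=${RUN-echo}

# Hypothetical helper: set the thread count, then re-create the gzero test provider.
set_eli_threads() {
    $RUN sysctl kern.geom.eli.threads="$1"
    # Assumption: worker threads are created at attach time.
    $RUN geli detach /dev/gzero.eli
    $RUN geli onetime -s 4096 /dev/gzero
}

set_eli_threads 4   # example value, not a recommendation
```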


----------



## wblock@ (Jul 27, 2012)

Isn't it AES-CBC that's supposed to be able to achieve wire speed with AES-NI?  There was a commit recently-ish by phk (I think) that had some notes about it.  Which I can't find now, of course.


----------



## Faldaani (Jul 27, 2012)

Okay, my bad about the threads. I tried changing it to "2", but it makes no difference.


----------



## Faldaani (Jul 27, 2012)

Interesting, with aes-cbc and 128 keylength I get ~300 MB/sec:
4294967296 bytes transferred in 14.069708 secs (305263426 bytes/sec)

That's double... any other suggestions for making it a bit faster?
It is a bit odd, since DiskCryptor on Windows was using AES-XTS.


----------



## mmoll (Jul 27, 2012)

Hi,
if you use 9.x, have a look at the following patches, which were committed to 10.x but not MFCed:
http://www.secnetix.de/olli/FreeBSD/svnews/index.py?r=226837
http://www.secnetix.de/olli/FreeBSD/svnews/index.py?r=226840


----------



## wblock@ (Jul 27, 2012)

Faldaani said:

> Interesting, with aes-cbc and 128 keylength I get ~300 MB/sec:
> 4294967296 bytes transferred in 14.069708 secs (305263426 bytes/sec)
> 
> That's double... any other suggestions for making it a bit faster?
> It is a bit odd, since DiskCryptor on Windows was using AES-XTS.



No experience with DiskCryptor; could it have been delayed encryption?  (Write the raw data, encrypt while the processor is idle.)


----------



## Faldaani (Jul 28, 2012)

Those fixes seem interesting... will have to get those... somehow...
Just out of curiosity, which one is "better", AES-CBC or AES-XTS? My googling says XTS should be used for random IO?

I need to pick an algorithm soon.. it's a bit of a pain to change it later.
Figure I'll pick the one that is the most likely to have performance improvements later.

I did some additional benchmarks with Windows and it turns out that the performance isn't that good when doing actual encryption (compared to its benchmarks), basically equivalent to GELI. Oops.

Just wish I understood why encryption of IO isn't as fast as the benchmarks. Due to a roundtrip through the CPU? But those buses are crazy fast... so I doubt it?


----------



## Faldaani (Jul 28, 2012)

Hmm.. I suddenly get 750mb/s with AES-CBS 128 and 550mb/s with AES-CBS 256... for some reason. I haven't changed anything except my network adapter and performed a reboot, which should be totally unrelated.

AES-CBS, 128 key length:
4294967296 bytes transferred in 5.770889 secs (744247107 bytes/sec)

AES-CBS, 256 key length:
4294967296 bytes transferred in 7.311982 secs (587387571 bytes/sec)

AES-XTS, 128 key length:
4294967296 bytes transferred in 24.125382 secs (178026914 bytes/sec)

AES-XTS, 256 key length:
4294967296 bytes transferred in 26.799398 secs (160263574 bytes/sec)


----------



## Faldaani (Jul 28, 2012)

... I meant CBC, not CBS in the last post. And I can't edit it.
That's what I get for posting at 4am.


----------



## lockdoc (Jul 28, 2012)

In the commit he also says


> As a side-note, GELI with AES-NI using *AES-CBC* can achive native disk speed.
> 
> MFC after:      3 days



This seems to match your benchmark.

I wonder why there is such a big difference between XTS and CBC. Does anyone know the security difference between the two?

Btw:
As far as I understand, the wiki says that you need a 512-bit key for AES-XTS-256 and a 256-bit key for AES-XTS-128.
Can anyone confirm whether I need the specified key length if I want to encrypt my disks with geli aes-xts-128?
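
On the key-length question: XTS is defined with two AES keys of equal size, one for the data and one for the tweak, which is where the doubled figures come from. The sketch below just does that arithmetic. It appears that `geli`'s `-l` value for AES-XTS is the per-key size (128 or 256) and that a key file is hashed rather than used directly, so its size would not need to match; treat both points as assumptions to check against geli(8). The function name is made up for illustration.

```shell
# Hypothetical helper: total key material consumed by AES-XTS for a given AES key size.
xts_key_bits() { echo $(( 2 * $1 )); }

for keylen in 128 256; do
    echo "AES-XTS with -l $keylen: $(xts_key_bits "$keylen") bits of key material"
done
```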


----------



## Faldaani (Jul 28, 2012)

Yeah, unless you have a really fast RAID array.
But I'm more than happy with 700 MB/s, just wish I understood why it suddenly improved.

I'd also be interested to know about the key length.
Currently I'm generating a 2 KB random key file and passing -l 256 as keylen for AES-XTS-256, should be safe, right?


----------



## lockdoc (Jul 28, 2012)

I thought you were using CBC and not XTS?


----------



## Faldaani (Jul 28, 2012)

Hmm.. it appears that AES-CBC performance is related to the sector size of the GELI device.
So is XTS, but not as much.

AES-CBC, 128 key length, 512b sector
# geli detach /dev/gzero.eli; geli onetime -l 128 -e aes-cbc -s 512 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 18.744036 secs (229137807 bytes/sec)

AES-CBC, 128 key length, 2048b sector
# geli detach /dev/gzero.eli; geli onetime -l 128 -e aes-cbc -s 2048 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 7.621870 secs (563505711 bytes/sec)

AES-CBC, 128 key length, 4096b sector
# geli detach /dev/gzero.eli; geli onetime -l 128 -e aes-cbc -s 2048 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 5.792496 secs (741470917 bytes/sec)

AES-XTS, 128 key length, 512b sector
# geli detach /dev/gzero.eli ; geli onetime -l 128 -e aes-xts -s 512 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 32.427407 secs (132448682 bytes/sec)

AES-XTS, 128 key length, 2048b sector
# geli detach /dev/gzero.eli ; geli onetime -l 128 -e aes-xts -s 2048 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 25.264700 secs (169998746 bytes/sec)

AES-XTS, 128 key length, 4096b sector
# geli detach /dev/gzero.eli ; geli onetime -l 128 -e aes-xts -s 2048 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 23.995813 secs (178988196 bytes/sec)
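
The manual runs above could be scripted as a sweep over algorithm and sector size. This sketch reuses only the commands already shown in this post and assumes `aesni`, `geom_eli` and `geom_zero` are loaded and that no `gzero.eli` provider is attached when it starts. The function name `sweep` is invented here. It prints the commands by default; running it with `RUN` set to an empty string executes them.

```shell
# Dry run by default: print each command instead of executing it.
# Run as root with `RUN= sh sweep.sh` to execute for real.
RUN=${RUN-echo}

sweep() {
    for ealgo in aes-cbc aes-xts; do
        for ss in 512 2048 4096; do
            echo "== $ealgo, 128-bit key, ${ss}-byte sectors =="
            $RUN geli onetime -l 128 -e "$ealgo" -s "$ss" /dev/gzero
            $RUN dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
            $RUN geli detach /dev/gzero.eli
        done
    done
}

sweep
```

Scripting it this way keeps the `-s` value and the label in sync, which avoids mislabelled runs when commands are copied by hand.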


----------



## Faldaani (Jul 28, 2012)

Sigh, copy paste errors.
The last geli onetime command for xts/cbc is sector size 4096 (not 2048 as it says), I just copied it wrong <.<


----------



## lockdoc (Jul 28, 2012)

4096 is the default for geli. Anything smaller shouldn't really be used.
Of course 4096b will be faster than smaller ones. So use 4096b.

Another interesting point is the drive itself. Is yours already a 4k (Advanced Format) disk?


----------



## lockdoc (Jul 28, 2012)

Btw here are mine

```
# geli onetime -l 128 -e aes-cbc -s 4096 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 6.837752 secs (628125644 bytes/sec)


# geli onetime -l 128 -e aes-xts -s 4096 /dev/gzero
# dd if=/dev/gzero.eli of=/dev/null bs=1m count=4096
4294967296 bytes transferred in 32.004749 secs (134197812 bytes/sec)
```

It really is much faster. I should have known that before encrypting 8 disks with XTS :-(


----------



## Faldaani (Jul 28, 2012)

I've got a mix of 512 and 4k drives. Right now I haven't even tested with the drives (they're in use, going to nuke them later)... just staying with memory based testing for now.

I guess I'll go with AES-CBC 256 then.. unless someone else has a good reason for using XTS over CBC?

I don't really understand the difference.


----------



## lockdoc (Jul 28, 2012)

An argument against CBC could be this


> Unlike XTS, CBC must read the previos cypher block to encrypt the next, and...
> in CBC (with IV's), if you need to change some data on block 1, then, you will
> need to recypher subsequent blocks.
> 
> ...


http://seclists.org/basics/2009/May/253


----------



## Faldaani (Jul 28, 2012)

Yeah.. not sure I understand its implications.

What is a cypher block in the context of GELI? I'm guessing that each HDD sector has one or more cypher blocks? If that is the case it shouldn't matter since the whole sector is rewritten anyway (when data is written to disk)?

I don't know enough about how it works internally in GELI.
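
On the "cypher block" question: AES always works on 16-byte blocks regardless of key length, so one sector is made up of many cipher blocks. GELI is understood to encrypt each sector separately with its own IV, so a CBC chain would restart at every sector boundary; that point is an assumption worth checking against the GELI source. The helper name is invented for illustration.

```shell
# AES block size in bytes (fixed by the cipher, independent of key length).
AES_BLOCK=16

# Hypothetical helper: number of AES blocks chained together in one sector.
blocks_per_sector() { echo $(( $1 / AES_BLOCK )); }

for ss in 512 2048 4096; do
    echo "$ss-byte sector: $(blocks_per_sector "$ss") cipher blocks"
done
```

Under that assumption, changing data in a sector only forces re-encryption of the blocks inside that same sector, which the disk rewrites as a whole anyway.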


----------



## lockdoc (Jul 28, 2012)

There is really not much information out there. Most Google searches link me back to this thread.


----------



## Sebulon (Jul 29, 2012)

I've also done some research on the subject:
GELI Benchmarks

/Sebulon


----------



## lockdoc (Jul 29, 2012)

Thanks Sebulon.

Do you know whether the key length (not the encryption bit size) affects performance as well?


----------



## Sebulon (Jul 29, 2012)

@lockdoc

Not that big of a difference, but measurable. The numbers are all there.

/Sebulon


----------



## lockdoc (Jul 29, 2012)

Sebulon said:

> @lockdoc
> Not that big of a difference, but measurable. The numbers are all there.
> /Sebulon



I meant the keyfile. You have only used one.

Sebulon said:

> dd if=/dev/random of=/boot/geli/disks.key bs=64 count=1


----------



## Sebulon (Jul 29, 2012)

@lockdoc

Ahh, OK, now I get it. Yes, I only used the same key, with the length described in the Handbook. In case it affected performance, the results wouldn't have been comparable.

/Sebulon


----------



## lockdoc (Aug 4, 2012)

Does anyone know what the security difference between AES-XTS and AES-CBC is?
I mean, as seen from the benchmarks above, I would like to migrate to AES-CBC, but only if it is as secure as XTS.


----------



## vermaden (Aug 4, 2012)

@lockdoc

It's already described here in this thread:
http://forums.freebsd.org/showpost.php?p=185291&postcount=21


----------



## lockdoc (Aug 5, 2012)

vermaden said:

> @lockdoc
> 
> It's already described here in this thread:
> http://forums.freebsd.org/showpost.php?p=185291&postcount=21


Yes I posted that. But again, if you read this


> ...XTS will be more fast since you can do parallel operations.


If you then compare that to the benchmarks users have done in this forum, it doesn't make sense, as CBC seems to be faster.



> ...XTS have some strong design on some attacks...


I cannot extract security related information from that. So the question is still open.


----------



## vermaden (Aug 5, 2012)

I have read that both CBC and XTS are considered secure; it's like the AES vs. Blowfish debate, but I am not into cryptography enough to tell you the exact differences or which one is more secure.

Below are real world benchmark results from some user of these forums, all WITH aesni(4), one without:


```
ALGORITHM     BIT  MB/s
NONE           -   146
AES-XTS       128   70
AES-CBC       128  114 (65 without AESNI)
Blowfish-CBC  128   28
Camellia-CBC  128   43
3DES-CBC      192   14
AES-XTS       256   68
AES-CBC       256  106
Blowfish-CBC  256   28
Camellia-CBC  256   37
```


----------



## nterupt (May 28, 2013)

mmoll said:

> Hi,
> if you use 9.x, have a look at the following patches, which were committed to 10.x but not MFCed:
> http://www.secnetix.de/olli/FreeBSD/svnews/index.py?r=226837
> http://www.secnetix.de/olli/FreeBSD/svnews/index.py?r=226840



I am currently using FreeBSD 9.1 x64 and AES-XTS 256. I am having some performance issues and was wondering 1) how I can tell whether this has made it into 9.1 and 2) if not, what would be the best way for me to pull this change into my system.


----------



## mmoll (May 28, 2013)

Hi,

nterupt said:

> 1) how I can tell whether this has made it into 9.1


They haven't.

nterupt said:

> 2) if not, what would be the best way for me to pull this change into my system.


From the links above, you can get diffs/patches, which you can apply to your 9.x sources and rebuild the kernel.


----------



## jmg@ (Nov 18, 2013)

Faldaani said:

> Hello
> So... am I benchmarking the wrong way in FreeBSD? Any ideas on how to speed this up?
> 170mb/sec isn't going to cut it when split over 20 disks...



FreeBSD previously wasn't pipelining the AES-NI instructions. I have made updates to HEAD that do, and they will be in 10.0-RELEASE when it comes out (I do plan on backporting to stable9 some time soon). With them, GELI on gzero performs in excess of 400 MB/sec on my machine.

The main commit is in http://svnweb.freebsd.org/changeset/base/255187.

Though there are other fixes (like properly dealing with unaligned accesses) that should be integrated too.

It is also fast enough that I use a (modified and not as fast) patch on my 9.1 system with 8 drives running ZFS on GELI and get good performance, though my performance is also impacted by using SHA256 as an authentication layer for the encryption.


----------



## lakhindr (Jan 9, 2014)

FreeBSD 10.0 improvements will help here: https://wiki.freebsd.org/WhatsNew/FreeBSD10



> Support for AES-NI instruction and intrinsics has been added to gcc. The aesni module has been improved to use pipelining when possible. This results in a significant speed up for AES-XTS and AES-CBC decrypt.


----------

