# SSD performance and partition alignment



## aragon (Nov 6, 2010)

Hi,

I recently bought an OCZ Vertex 2 90GB SSD.  I will be using it as a system boot drive and plan to partition it with MBR style partitions.  I decided to do some experimenting with partition boundaries before I put the drive to use.  I'm going to dive right into the benchmarks I've done first, then explain my partitioning method afterwards.

The benchmarks are just sequential writes with dd.  I know this is simplistic, but it does demonstrate the point of this post, and I'm happy to take suggestions on further benchmarks before I put this SSD to use.

**Preparation**
First I set up a tmpfs(5) file system and wrote a 512 MiB file filled with random data.  I'll be using this file to write to the SSD.

`# mkdir /tmp/tst`
`# kldload tmpfs`
`# mount -t tmpfs tmpfs /tmp/tst`
`# dd if=/dev/urandom of=/tmp/tst/rand bs=1m count=512`


**Initial partitions**
I create the first partition at the usual 63-sector offset from the start of the disk (track 1), which is _unaligned_ with the SSD erase block.  The second partition is set to start at sector 21030912 (10767826944 bytes), which is _aligned_ with the SSD erase block.

`# gpart create -s MBR ada0`
`# gpart add -s 10g -t freebsd ada0`
`# gpart add -b 21030912 -s 10g -t freebsd ada0`

```
# gpart show ada0
=>       63  175836465  ada0  MBR  (84G)
         63   20971503     1  freebsd  (10G)
   20971566      59346        - free -  (29M)
   21030912   20971503     2  freebsd  (10G)
   42002415  133834113        - free -  (64G)
```
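As a quick sanity check, the chosen start sector can be tested against both constraints with plain shell arithmetic (assuming the 1 MiB erase block figure discussed later in this thread):

```shell
# a start sector should be divisible by 63 (track boundary)
# and by 2048 (1 MiB expressed in 512-byte sectors)
echo $(( 21030912 % 63 ))    # 0 -> falls on a track boundary
echo $(( 21030912 % 2048 ))  # 0 -> falls on a 1 MiB erase block boundary
echo $(( 63 % 2048 ))        # 63 -> the default offset is NOT erase-aligned
```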


**BSD labels and UFS file systems**
I create the UFS file systems by calling newfs(8) with the -E parameter so that the SSD is erased using TRIM commands.

`# gpart create -s BSD ada0s1`
`# gpart create -s BSD ada0s2`
`# gpart add -s 1g -t freebsd-ufs ada0s1`
`# gpart add -s 1g -t freebsd-ufs ada0s2`

```
# gpart show ada0s1
=>       0  20971503  ada0s1  BSD  (10G)
         0   2097152       1  freebsd-ufs  (1.0G)
   2097152  18874351          - free -  (9.0G)

# gpart show ada0s2
=>       0  20971503  ada0s2  BSD  (10G)
         0   2097152       1  freebsd-ufs  (1.0G)
   2097152  18874351          - free -  (9.0G)

# newfs -E ada0s1a
/dev/ada0s1a: 1024.0MB (2097152 sectors) block size 16384, fragment size 2048
	using 6 cylinder groups of 183.72MB, 11758 blks, 23552 inodes.
Erasing sectors [128...2097151]
super-block backups (for fsck -b #) at:
 160, 376416, 752672, 1128928, 1505184, 1881440
# newfs -E ada0s2a
/dev/ada0s2a: 1024.0MB (2097152 sectors) block size 16384, fragment size 2048
	using 6 cylinder groups of 183.72MB, 11758 blks, 23552 inodes.
Erasing sectors [128...2097151]
super-block backups (for fsck -b #) at:
 160, 376416, 752672, 1128928, 1505184, 1881440
```


**Mount points**
`# mount -t ufs /dev/ada0s1a /mnt/1`
`# mount -t ufs /dev/ada0s2a /mnt/2`


**First tests**

```
# dd if=/tmp/tst/rand of=/mnt/1/rand bs=1m 
512+0 records in
512+0 records out
536870912 bytes transferred in 5.721152 secs (93839651 bytes/sec)
# dd if=/tmp/tst/rand of=/mnt/2/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 4.318652 secs (124314467 bytes/sec)

# dd if=/tmp/tst/rand of=/mnt/1/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 6.489263 secs (82732185 bytes/sec)
# dd if=/tmp/tst/rand of=/mnt/2/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 4.088808 secs (131302547 bytes/sec)

# dd if=/tmp/tst/rand of=/mnt/1/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 6.750464 secs (79530965 bytes/sec)
# dd if=/tmp/tst/rand of=/mnt/2/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 4.091864 secs (131204490 bytes/sec)
```

The unaligned partition performs about 25% slower on sequential writes than the aligned partition.  It also loses speed with subsequent writes, which doesn't happen on the aligned partition.
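The offsets explain the gap: partition 1 begins 32256 bytes into the disk, which is not a multiple of the (assumed) 1 MiB erase block, so file system blocks straddle erase block boundaries and the SSD has to do extra read-modify-write work.  Partition 2 begins on an exact 1 MiB multiple:

```shell
# byte offset of each partition modulo the assumed 1 MiB erase block
echo $(( 63 * 512 % 1048576 ))        # 32256 -> partition 1 is misaligned
echo $(( 21030912 * 512 % 1048576 ))  # 0     -> partition 2 is aligned
```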


**Refresh file systems**
I wanted to see if TRIM could restore the speed of the first partition.

`# umount /mnt/1 /mnt/2`
`# newfs -E ada0s1a`
`# newfs -E ada0s2a`
`# mount -t ufs /dev/ada0s1a /mnt/1`
`# mount -t ufs /dev/ada0s2a /mnt/2`

```
# dd if=/tmp/tst/rand of=/mnt/1/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 5.906114 secs (90900874 bytes/sec)
# dd if=/tmp/tst/rand of=/mnt/2/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 4.088698 secs (131306084 bytes/sec)
```

Looks like a TRIM on the unaligned partition brought it back up to speed...


**Align unaligned partition**
So let's destroy the first partition and adjust its starting point so that it is aligned with the SSD erase block boundaries.  I recreate it to start at sector 129024 (66060288 bytes), which is aligned.

`# gpart delete -i 1 ada0s1`
`# gpart destroy ada0s1`
`# gpart delete -i 1 ada0`
`# gpart add -t freebsd -b 129024 -s 20842479 -i 1 ada0`

```
# gpart show ada0
=>       63  175836465  ada0  MBR  (84G)
         63     128961        - free -  (63M)
     129024   20842479     1  freebsd  (9.9G)
   20971503      59409        - free -  (29M)
   21030912   20971503     2  freebsd  (10G)
   42002415  133834113        - free -  (64G)
```

`# gpart create -s BSD ada0s1`
`# gpart add -t freebsd-ufs -s 1g ada0s1`
`# newfs -E ada0s1a`
`# mount -t ufs /dev/ada0s1a /mnt/1`


```
# dd if=/tmp/tst/rand of=/mnt/1/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 4.075008 secs (131747207 bytes/sec)
# dd if=/tmp/tst/rand of=/mnt/1/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 4.077147 secs (131678085 bytes/sec)
# dd if=/tmp/tst/rand of=/mnt/1/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 4.073861 secs (131784294 bytes/sec)
```

Voilà.

So it seems pretty clear from these basic tests that correct partition alignment improves SSD throughput.  Anyone want me to do more extensive tests?


----------



## aragon (Nov 6, 2010)

In case anyone doesn't know, SSDs benefit from careful partition alignment so as to ensure partitions start on the erase block boundaries of the SSD in question.  Google can fill you in on all the details, but the basic constraints are as follows:

- SSD erase block boundaries vary from manufacturer to manufacturer, but 1 MiB (1048576 bytes) should be a safe number to assume.
- Old BIOSes and partitioning tools require partitions to fall on track boundaries.  Sector size is typically 512 bytes and track size is typically 63 sectors; in other words, partitions can start every 63 sectors (32256 bytes).  This is very different from what an SSD wants: 1048576 bytes, or 2048 sectors.

Apparently BIOSes stopped needing track boundaries around 2001; however, there are still enough partitioning tools that adhere strictly to the 63 sectors/track boundary requirement to make life difficult if one tries to ignore it.  So what's the least disruptive workaround?

Calculate the boundaries that satisfy the 63 sectors/track requirement and also fall on the 1 MiB erase block boundaries of SSDs.  All that one loses is a bit of granularity when partitioning disks.

I use the following script to calculate the boundaries:


```
#!/bin/sh

# erase boundary in 512 byte blocks
# eg. for a 128KiB erase boundary:
# 131072 / 512 = 256
ERASEB=2048
TRACKB=63     # 63 sectors/track
#TRACKB=16065  # 255 tracks/cylinder, 63 sectors/track (linux wants cylinder boundaries)
PARTB=$(( ${ERASEB} * ${TRACKB} ))

echo "enter start offset of partition in bytes (append m/g for MiB/GiB)"
read boffset

# strip an optional g/m suffix and convert to bytes;
# ${boffset##*g} is empty only when ${boffset} ends in "g"
case "" in
${boffset##*g})
	boffset=$(( ${boffset%*g} * 1073741824 ))
	;;
${boffset##*m})
	boffset=$(( ${boffset%*m} * 1048576 ))
	;;
esac

if [ -z "${boffset}" ]; then exit 1; fi

sectors=$(( ( ${boffset} - ( ${boffset} % 512 ) ) / 512 ))
npb=$(( ${sectors} / ${PARTB} ))
ssdlbahigh=$(( (${npb} + 1) * ${PARTB} ))
ssdlbalow=$(( ${npb} * ${PARTB} ))

echo
echo "Desired offset: ${boffset} bytes"
echo "Corrected offsets:"
echo "High: ${ssdlbahigh} blocks, $(( ${ssdlbahigh} / ${TRACKB} )) tracks @ ${TRACKB} s/t ($(( ${ssdlbahigh} * 512 )) bytes)"
echo "Low: ${ssdlbalow} blocks, $(( ${ssdlbalow} / ${TRACKB} )) tracks @ ${TRACKB} s/t ($(( ${ssdlbalow} * 512 )) bytes)"
```

Probably the easiest way to keep things aligned is to ensure the first partition is aligned by running the above script and entering "0" as the start offset.  Use the "High" corrected offset as the starting point for the first partition, then ensure all following partitions (if any) are multiples of 1 MiB in size, e.g.


```
# echo 0 |ssdlba.sh 
enter start offset of partition in bytes (append m/g for MiB/GiB)

Desired offset: 0 bytes
Corrected offsets:
High: 129024 blocks, 2048 tracks @ 63 s/t (66060288 bytes)
Low: 0 blocks, 0 tracks @ 63 s/t (0 bytes)

# gpart add -b 129024 -t freebsd -s 1g ada0
# gpart add -t freebsd -s 10g ada0
```

*Correction*: I've noticed that gpart doesn't necessarily preserve MiB alignment when one specifies sizes with a "g" or "m" suffix.  If you look at my benchmarks above, it created partitions that are 10737409536 bytes in size, but 10 GiB is 10737418240 bytes.  On reflection this makes sense: gpart adjusts the size so that the partition ends on a track boundary, which is to be expected.  In that case, use my script to calculate your partition end boundaries too.  It's not really crucial that a partition ends on an SSD erase block boundary, but if the end of one partition marks the beginning of another partition, then it is important.
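One way to pick such a size (a sketch, using the same 63 sectors/track and 1 MiB assumptions as the script above): make the partition length a multiple of 129024 sectors, the smallest count satisfying both constraints, by rounding the desired size down:

```shell
ALIGN=129024                      # 63 * 2048: track- and erase-aligned
WANT=$(( 10 * 1024 * 2048 ))      # desired size: 10 GiB in 512-byte sectors
SIZE=$(( WANT / ALIGN * ALIGN ))  # round down to the nearest aligned multiple
echo ${SIZE}                      # 20901888 sectors (~9.97 GiB)
```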

Another option is to not bother with aligning the MBR partitions, but to do something similar inside your BSD label.  I have not benchmarked this method though...


----------



## aragon (Nov 6, 2010)

I've done some quick testing with doing the alignment inside the BSD label.  I recreated the first partition with the default start offset of sector 63 (unaligned).


```
# gpart show ada0
=>       63  175836465  ada0  MBR  (84G)
         63   20971503     1  freebsd  (10G)
   20971566      59346        - free -  (29M)
   21030912   20971503     2  freebsd  (10G)
   42002415  133834113        - free -  (64G)
```

So our partition starts at sector 63, or byte 32256, into the disk.  Our data should begin at byte 1048576 (1 MiB), so our first BSD partition should start 1048576-32256 = 1016320 bytes (sector 1985) into the slice:


```
# gpart show ada0s1
=>       0  20971503  ada0s1  BSD  (10G)
         0      1985          - free -  (993K)
      1985   2097152       1  freebsd-ufs  (1.0G)
   2099137  18872366          - free -  (9.0G)
```
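The arithmetic checks out: the slice offset (63 sectors) plus the label offset (1985 sectors) comes to exactly 2048 sectors, so the file system begins precisely 1 MiB into the disk:

```shell
# absolute byte offset of the 'a' partition from the start of the disk
echo $(( (63 + 1985) * 512 ))            # 1048576 bytes = exactly 1 MiB
echo $(( (63 + 1985) * 512 % 1048576 ))  # 0 -> erase-block aligned
```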

This is a little bit more space efficient than aligning MBR partitions.  Speeds are also good:


```
# dd if=/tmp/tst/rand of=/mnt/1/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 4.061751 secs (132177213 bytes/sec)
# dd if=/tmp/tst/rand of=/mnt/1/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 4.067801 secs (131980623 bytes/sec)
# dd if=/tmp/tst/rand of=/mnt/1/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 4.059658 secs (132245353 bytes/sec)
```

Just to confirm that the above wasn't some fluke, I tried recreating the BSD partition unaligned and reran the test:


```
# gpart show ada0s1
=>       0  20971503  ada0s1  BSD  (10G)
         0      1984          - free -  (992K)
      1984   2097152       1  freebsd-ufs  (1.0G)
   2099136  18872367          - free -  (9.0G)

# dd if=/tmp/tst/rand of=/mnt/1/rand bs=1m
512+0 records in
512+0 records out
536870912 bytes transferred in 5.954004 secs (90169729 bytes/sec)
```

So, I will probably do my own alignment inside the BSD label.


----------



## mav@ (Nov 6, 2010)

Nice numbers, thank you.  The only thing to add is that

```
diskinfo -v /dev/ada0s1a
```

reports the partition's offset from the beginning of the stripe (if present) or from the beginning of the disk (usually).  It can be used for checking.
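For example, the stripeoffset field could be extracted and checked against an assumed 1 MiB erase block (a sketch; the awk pattern assumes diskinfo's usual "value # label" output format, and a result of 0 means aligned):

```shell
# pull the stripeoffset (in bytes) out of diskinfo's verbose output
OFFSET=$(diskinfo -v /dev/ada0s1a | awk '/stripeoffset/ { print $1 }')
echo $(( OFFSET % 1048576 ))  # 0 -> aligned to a 1 MiB erase block
```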


----------



## steveh (May 29, 2012)

Tried this on a Corsair F60 disk and couldn't repeat the results.  Throughput in both cases is around 62 MB/s.


```
# diskinfo -v /dev/ada0s1a
/dev/ada0s1a
        512             # sectorsize
        1073741824      # mediasize in bytes (1.0G)
        2097152         # mediasize in sectors
        0               # stripesize
        32256           # stripeoffset
        2080            # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        1112650900000999009E    # Disk ident.
```


----------



## steveh (May 29, 2012)

Hmm no edit, ho hum...

```
# diskinfo -v /dev/ada0s2a
/dev/ada0s2a
        512             # sectorsize
        1073741824      # mediasize in bytes (1.0G)
        2097152         # mediasize in sectors
        0               # stripesize
        2177892352      # stripeoffset
        2080            # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        1112650900000999009E    # Disk ident.
```


----------



## steveh (May 30, 2012)

Some more interesting results.  It seems that while a 4K sector size makes no difference for incompressible data on Corsair F60s, there is a marked difference with compressible data.

In my test here, 512b sectors on ZFS gave a maximum of ~170 MB/s when copying zeros; switching the pool to 4k sectors using the process below raised that to ~270 MB/s.  Using a backup of real data instead of zeros, the change wasn't as marked, but throughput still increased from ~90 MB/s to ~105 MB/s.

Process used for 4k sectors:

```
gnop create -S 4096 ada0
zpool create ssd ada0.nop
zpool export ssd
gnop destroy ada0.nop
zpool import ssd
```
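To confirm the pool actually kept the 4 KiB allocation size after the gnop device was destroyed, the vdev configuration can be inspected (a sketch, assuming zdb(8) can read the pool's cached configuration; ashift=12 corresponds to 2^12 = 4096-byte sectors):

```shell
# an ashift of 12 confirms the pool allocates in 4 KiB units
zdb -C ssd | grep ashift
```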


----------



## papelboyl1 (Jun 2, 2012)

Hi Aragon. How did you get the 210xxxxx number in the initial partition? Thank you.


----------



## aragon (Jun 2, 2012)

papelboyl1 said:

> Hi Aragon. How did you get the 210xxxxx number in the initial partition? Thank you


Are you asking how I calculated it?  I used the shell script in post #2.

But if you want to make sense of it, the breakdown is:


```
21030912 sectors of 512 bytes each
21030912 * 512 = 10767826944 bytes
10767826944 / 1024 / 1024 = 10269 MiB
```

And at the same time, it also falls on a track boundary:

```
Given the "standard" 63 sectors/track figure
21030912 sectors
21030912/63 = 333824 tracks
```

So both ideals are met: MBR track boundaries of 63 sectors, and SSD erase boundaries of 1 MiB.
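Since 2048 is a power of two and 63 is odd, the two constraints share no common factor, so the common granularity is simply their product, and 21030912 is an exact multiple of it:

```shell
echo $(( 63 * 2048 ))          # 129024 sectors = 63 MiB granularity
echo $(( 21030912 / 129024 ))  # 163 -> the 163rd such boundary
echo $(( 21030912 % 129024 ))  # 0   -> exactly on a boundary
```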

P.S. I haven't tested it, but I think gpart(8)'s new "-a" (alignment) argument makes all this much easier now.


----------

