# Geom stripe bottleneck



## frabron (Jun 2, 2014)

Hi all,

I have a geom stripe (RAID0) set up for my database's data. I am currently applying some large updates to the data, and I think the performance of my stripe could be better. But I am uncertain, so I thought I'd ask the community for some help interpreting the numbers 

The stripe consists of two disks (WD VelociRaptors at 10,000 rpm):

```
>diskinfo -v ada2
ada2
        512             # sectorsize
        600127266816    # mediasize in bytes (558G)
        1172123568      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        1162821         # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        WD-WXH1E61ASNX9 # Disk ident.
>diskinfo -v ada3
ada3
        512             # sectorsize
        600127266816    # mediasize in bytes (558G)
        1172123568      # mediasize in sectors
        0               # stripesize
        0               # stripeoffset
        1162821         # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.
        WD-WXL1E61PWAL2 # Disk ident.
```

and /var/log/dmesg.boot

```
# snip
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <WDC WD6000HLHX-01JJPV0 04.05G04> ATA-8 SATA 3.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 572325MB (1172123568 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad8
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD6000HLHX-01JJPV0 04.05G04> ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 572325MB (1172123568 512 byte sectors: 16H 63S/T 16383C)
ada3: Previously was known as ad10
#snap
```

And here's some example output from `iostat -d -w 10 ada0 ada1 ada2 ada3`:

```
#snip
           ada0             ada1             ada2             ada3 
  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s 
  0.00   0  0.00   0.00   0  0.00  19.33 176  3.32  19.33 176  3.32 
 16.25   0  0.01  16.25   0  0.01  16.87 133  2.20  16.87 133  2.20 
  0.00   0  0.00   0.00   0  0.00  16.77 146  2.40  16.77 147  2.40 
  0.00   0  0.00   0.00   0  0.00  19.46 170  3.24  19.45 170  3.23 
 21.50   0  0.01  21.50   0  0.01  17.00 125  2.08  17.00 125  2.08 
  0.50   0  0.00   0.50   0  0.00  16.88 145  2.38  16.88 145  2.38 
  0.00   0  0.00   0.00   0  0.00  16.96 125  2.07  16.97 125  2.07 
  0.00   0  0.00   0.00   0  0.00  19.82 158  3.06  19.81 158  3.07 
 28.77   1  0.03  28.77   1  0.03  16.83 133  2.19  16.82 133  2.19
#snap
```

I think the MB/s output is rather low for such a disk. To gain further insight I started `gstat`:

```
# gstat 
dT: 1.001s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0     27      0      0    0.0     27   2226    4.8    7.0| ada0
    0     28      1     32   23.9     27   2226    1.3    3.9| ada1
    2    120    115   1838    6.4      5     96    0.2   74.3| ada2
    2    121    116   1854    6.3      5     96    0.4   72.9| ada3
    0     28      1     32   24.0     27   2226    5.0    8.7| mirror/gm
    2    121    116   3708    7.9      5    192    0.4   92.2| stripe/gs
    0     28      1     32   24.0     27   2226    5.0    8.7| mirror/gms1
    0     12      0      0    0.0     12   1343    9.1    6.9| mirror/gms1a
    0      0      0      0    0.0      0      0    0.0    0.0| mirror/gms1b
    0      0      0      0    0.0      0      0    0.0    0.0| mirror/gms1d
    0      0      0      0    0.0      0      0    0.0    0.0| mirror/gms1e
    0     16      1     32   24.0     15    883    1.7    2.9| mirror/gms1f
```

What bothers me here is that stripe/gs is 92% busy while the disks themselves are only at 74/72%. This led me to post here and seek some advice, since I don't know enough about the mechanics to really find the problem, if there is one at all. Btw, here's the output from `geom stripe list`:

```
# geom stripe list
Geom name: gs
State: UP
Status: Total=2, Online=2
Type: AUTOMATIC
Stripesize: 8192
ID: 1042782665
Providers:
1. Name: stripe/gs
   Mediasize: 1200254517248 (1.1T)
   Sectorsize: 512
   Stripesize: 8192
   Stripeoffset: 0
   Mode: r1w1e1
Consumers:
1. Name: ada2
   Mediasize: 600127266816 (558G)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 0
2. Name: ada3
   Mediasize: 600127266816 (558G)
   Sectorsize: 512
   Mode: r1w1e2
   Number: 1
```

So, is there some bottleneck, or am I just worrying about nothing?

Many thanks,

Frank


----------



## worldi (Jun 3, 2014)

In the output above, your drives have reached their maximum IOPS (aka _tps_, aka _ops/s_).

If I had to guess I'd say that the records read from the database are larger than _Stripesize_ and each read operation hits both disks (note that the IOPS of the `gstripe` match the IOPS of a single disk).
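The arithmetic behind that guess can be sketched in shell, assuming the 8192-byte _Stripesize_ reported by `geom stripe list` and the ~19 KB/t average transfer size from the iostat output:

```
#!/bin/sh
# How many 8 KiB stripe units does one average-sized read touch?
# (stripesize taken from `geom stripe list`; the request size is
# approximated from the ~19 KB/t column in the iostat output above)
stripesize=8192
request=$((19 * 1024))
units=$(( (request + stripesize - 1) / stripesize ))
echo "stripe units touched: $units"   # -> 3
```

Three consecutive stripe units on a two-disk stripe necessarily land on both drives, so one logical read costs an operation on each disk — consistent with ada2 and ada3 showing identical tps.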


----------



## SirDice (Jun 3, 2014)

Are you sure it's not the database itself that needs tweaking? A lot can be achieved using caching and tuning the database server itself. It may just be bad indexes and a really small cache that's causing a lot of disk I/O.
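For reference, a few of the knobs I mean live in postgresql.conf; the values below are only illustrative starting points for a bulk-load machine, not settings taken from this thread:

```
# postgresql.conf -- illustrative values only, tune to your RAM and version
shared_buffers = 2GB             # database cache; often ~25% of RAM
checkpoint_segments = 32         # fewer, larger checkpoints during bulk loads (pre-9.5 setting)
maintenance_work_mem = 1GB       # speeds up the CREATE INDEX phase
```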


----------



## frabron (Jun 4, 2014)

Yes, my internet searches and an inquiry to the geom mailing list also led me to the conclusion that the drives are performing as fast as they can.

Concerning a non-optimal DB configuration: yeah, probably. I used Greg Smith's book _PostgreSQL High Performance_ for configuring the server, but this is not my daily kind of work, and the database version is also different, so there's likely room for improvement. I am importing OpenStreetMap geographic data covering the whole world into the database; that is simply a lot of data, and a lot of I/O is to be expected. The resulting database will probably be around 300GB, and the server is on the lower end for that kind of size and usage.


----------



## mav@ (Jun 5, 2014)

To completely saturate modern disks, especially in RAID0, you should send them as many concurrent requests as possible (for your setup of two SATA disks, up to 64 requests, i.e. an NCQ depth of 32 per drive). You should check your database configuration and SQL request flow to make sure the load they generate is parallel enough. If you are executing SQL requests in a single stream and the database is configured to process them synchronously, that may not provide enough parallelism for your disk setup.
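One way to see whether the array benefits from deeper queues is to throw several concurrent readers at it and watch L(q) and %busy in `gstat`. A rough, read-only sketch (the device path is taken from the thread; region sizes are arbitrary assumptions):

```
#!/bin/sh
# Read eight non-overlapping 256 MiB regions of the stripe in parallel;
# run gstat in another terminal to see what queue depth it sustains.
dev=/dev/stripe/gs
for i in 0 1 2 3 4 5 6 7; do
    dd if="$dev" of=/dev/null bs=64k count=4096 skip=$((i * 4096)) &
done
wait
```

If MB/s scales up with the number of concurrent readers, the single-stream load was the limit, not the disks.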


----------



## frabron (Jun 6, 2014)

Thanks, I will investigate my toolchain (converters/osm2pgsql and databases/postgresql93-server) and see if I can improve the import process. From what I can see, osm2pgsql parses the raw data into a format PostgreSQL accepts and uses SQL `COPY FROM STDIN` to load it into the database.
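For what it's worth, osm2pgsql has a couple of flags that affect import caching and parallelism. An illustrative invocation (the file name, database name, and values are assumptions, not taken from this thread):

```
#!/bin/sh
# Illustrative only: --slim keeps intermediate node data in the database,
# -C sets the in-memory node cache in MB, and --number-processes
# parallelises parts of the import.
osm2pgsql --slim -C 4096 --number-processes 4 \
    -d gis planet-latest.osm.pbf
```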


----------

