# Bad performance on FreeBSD 7.1 RELEASE AMD64



## randux (Jan 11, 2009)

Hi,

I've been using all of the major BSDs off and on for about 3 or 4 years. I just got some new boxes and needed a good desktop, so I tried all the latest releases and had just about settled on FreeBSD AMD64 for this machine. But after using it for a full day, the performance feels really bad.

I know that saying the performance feels bad doesn't mean much, so I ran some programs to try to quantify it.

I used rarcrack to brute-force the password of a rar archive with a long password. To me it's a good example of a CPU-bound workload that doesn't use much memory. On the same box running openSUSE 11.1, which is a fairly slow Linux, I can test about 750 passwords per second. Against the same archive on the same box running FreeBSD AMD64, it only processes 22 passwords per second. I don't understand how this can be happening.

I built ubench from ports and the results were:

```
Ubench CPU:   761518
Ubench MEM:   255535
--------------------
Ubench AVG:   508526
```

According to the list published by PhysTech, these numbers look pretty good. But the system feels extremely sluggish (applications take forever to load), and there are other performance problems: most of my downloads while building ports die in the middle, so it took me ages to get things built.

I don't have much disk space left on this box, but I left a free primary partition on one of the drives, so I may try installing i386 again and running ubench on that architecture to see if it makes any difference.

Any ideas, fellas?

Cheers,
Randall


----------



## hitest (Jan 11, 2009)

Have you tried running:

`# portsclean -C`

Is your /usr directory maxed out? What does `df -h` show?


----------



## kamikaze (Jan 12, 2009)

I suspect your hard disk's transfer mode is misdetected.

Run `atacontrol mode <dev>` to check the detected mode. If the mode is wrong, you can force a different one with the same command. On my system the output looks like this:

```
# atacontrol mode ad4
current mode = SATA150
```

I have this line in my /etc/rc.local file because my DVD burner is wrongly detected as *PIO4*:

```
/sbin/atacontrol mode acd0 WDMA2
```

The atacontrol(8) manual page states the available modes (apart from the SATA modes).
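The rc.local line above forces the mode unconditionally. A slightly more defensive sketch only forces it when atacontrol reports something else. The device `acd0` and mode `WDMA2` come from the post; the helper names `current_mode` and `ensure_mode` are hypothetical, and the parsing assumes the `current mode = ...` output format shown earlier.

```sh
#!/bin/sh
# Sketch for /etc/rc.local: force an ATA transfer mode only if it was misdetected.
# current_mode and ensure_mode are hypothetical helper names.

# Extract the mode from atacontrol output such as "current mode = SATA150".
current_mode() {
    sed -n 's/^current mode = //p'
}

# ensure_mode DEVICE WANTED: query DEVICE, then force WANTED if it differs.
ensure_mode() {
    dev=$1
    wanted=$2
    have=$(/sbin/atacontrol mode "$dev" | current_mode)
    if [ "$have" != "$wanted" ]; then
        echo "$dev: detected $have, forcing $wanted"
        /sbin/atacontrol mode "$dev" "$wanted"
    fi
}

# Only act on a system that actually has atacontrol(8), e.g. FreeBSD 7.x.
if [ -x /sbin/atacontrol ]; then
    ensure_mode acd0 WDMA2
fi
```

The check keeps a correctly detected drive untouched and logs a line to the console only when it has to intervene.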


----------



## randux (Jan 13, 2009)

hitest said:

> Have you tried running:
> 
> `# portsclean -C`
> 
> Is your /usr directory maxed out? What does `df -h` show?



It's a new install with PLENTY of disk space.


----------



## randux (Jan 13, 2009)

kamikaze said:

> I suspect your hard disk's transfer mode is misdetected.
> 
> Run `atacontrol mode <dev>` to check the detected mode. If the mode is wrong, you can force a different one with the same command. On my system the output looks like this:
> 
> ...



Hi, thanks for the idea; I think you're on the right track. ubench shows very high numbers, but the system still feels very slow and doesn't give much throughput on rarcrack.

I checked, and it's running in SATA150 mode.

I tried new installs of both i386 and AMD64, with and without softdep, and I can still only test 22-25 passwords per second.

Anything else to check, guys?


----------



## kamikaze (Jan 13, 2009)

`# dd bs=1m if=/dev/zero of=test count=1024`
`# dd bs=1m if=test of=/dev/null`

You can check your file system's read and write performance to get a clue whether this is a hard disk problem. The read command (the second one) will read from the cache, so you might want to reboot and run it again to get your read speed for uncached data.
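To get an uncached read figure without rebooting, one option (an assumption on my part, not something stated above) is to read the raw disk device instead of a file: FreeBSD disk devices are not served from the buffer cache, so this times the drive itself. `ad4` is the disk from the atacontrol output earlier in the thread; `dd_mbps` and `raw_read` are hypothetical helpers, and the parser assumes FreeBSD dd's `bytes transferred in ... secs` summary line. The command only reads from the disk, but double-check `if=` and `of=` before running anything against a device node.

```sh
#!/bin/sh
# Sketch: uncached sequential read straight from the disk device (FreeBSD).
# dd_mbps and raw_read are hypothetical helper names.

# dd_mbps: read FreeBSD dd's summary on stdin, e.g.
#   "1073741824 bytes transferred in 8.000000 secs (134217728 bytes/sec)"
# and print the throughput in MB/s.
dd_mbps() {
    awk '/bytes transferred in/ { printf "%.1f MB/s\n", $1 / $5 / 1048576 }'
}

# raw_read DEVICE COUNT: read COUNT 1 MB blocks from DEVICE (read-only).
# dd writes its summary to stderr, so send stderr into the pipe.
raw_read() {
    dd if="$1" of=/dev/null bs=1m count="$2" 2>&1 >/dev/null | dd_mbps
}

# Example, as root: raw_read /dev/ad4 1024
```

Comparing this figure with the file-based read above separates the drive's own speed from the file system and cache.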


----------



## randux (Jan 13, 2009)

It's certainly not a hardware problem; it's contained within FreeBSD. I documented the performance difference running openSUSE on the same box.

Not sure where to look next.


----------



## randux (Jan 13, 2009)

I will look in ports/benchmarks to see if there's some filesystem benchmarking.

Interestingly, and unrelated: Ubench runs significantly slower on i386 on the same box.


----------



## richardpl (Jan 13, 2009)

Can you provide more details?
`vmstat -i`, `uptime`, `top`, ...


----------



## SaveTheRbtz (Jan 13, 2009)

Maybe you could try the old-school 4BSD scheduler?

P.S. How did you build rarcrack? From ports?


----------



## randux (Jan 13, 2009)

Yes, rarcrack from ports. Is it a bad test because of the threading in BSD?


----------



## randux (Jan 13, 2009)

richardpl said:

> Can you provide more details?
> `vmstat -i`, `uptime`, `top`, ...



I'm not sure what you are asking for here with uptime and top. I'll post vmstat -i in a few minutes, running benchmarks now.


----------



## randux (Jan 13, 2009)

This is an Intel E8400 Core 2 Duo box on an MSI motherboard, with 4 GB RAM and Seagate Barracuda 7200.11 drives.

Some benchmarks:

Bonnie 2.0.6 from ports


```
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
          100 112685 37.4 95438  7.4 177180 17.3 298364 97.5 3253065 117.3 328569.1 188.1
```

UnixBench 4.1


```
   #    #  #    #  #  #    #          #####   ######  #    #   ####   #    #
   #    #  ##   #  #   #  #           #    #  #       ##   #  #    #  #    #
   #    #  # #  #  #    ##            #####   #####   # #  #  #       ######
   #    #  #  # #  #    ##            #    #  #       #  # #  #       #    #
   #    #  #   ##  #   #  #           #    #  #       #   ##  #    #  #    #
    ####   #    #  #  #    #          #####   ######  #    #   ####   #    #

                 4        1           Based on the Byte Magazine Unix Benchmark
                44       11
   v   v       4 4        1
    v v       44444       1           v4.1 revisions mostly by David C. Niemi,
     v           4   o   111          Reston, VA, USA  <niemi@tux.org>
 


Dhrystone 2 using register variables  1 2 3 4 5 6 7 8 9 10

Double-Precision Whetstone  1 2 3 4 5 6 7 8 9 10

System Call Overhead  1 2 3 4 5 6 7 8 9 10

Pipe Throughput  1 2 3 4 5 6 7 8 9 10

Pipe-based Context Switching  1 2 3 4 5 6 7 8 9 10

Process Creation  1 2 3

Execl Throughput  1 2 3

Filesystem Throughput 1024 bufsize 2000 maxblocks  1 2 3

Filesystem Throughput 256 bufsize 500 maxblocks  1 2 3

Filesystem Throughput 4096 bufsize 8000 maxblocks  1 2 3

Shell Scripts (1 concurrent)  1 2 3
Shell Scripts (8 concurrent)  1 2 3
Shell Scripts (16 concurrent)  1 2 3

Arithmetic Test (type = short)  1 2 3

Arithmetic Test (type = int)  1 2 3

Arithmetic Test (type = long)  1 2 3

Arithmetic Test (type = float)  1 2 3

Arithmetic Test (type = double)  1 2 3

Arithoh  1 2 3

C Compiler Throughput  1 2 3

Dc: sqrt(2) to 99 decimal places  1 2 3

Recursion Test--Tower of Hanoi  1 2 3

==============================================================

  BYTE UNIX Benchmarks (Version 4.1.0)
  System -- localhost.invalid.org
  Start Benchmark Run: Tue Jan 13 16:43:08 UTC 2009
   2 interactive users.
   4:43PM  up  1:13, 2 users, load averages: 0.00, 0.08, 0.40
  -r-xr-xr-x  1 root  wheel  132064 Jan  1 07:48 /bin/sh
  /bin/sh: ELF 64-bit LSB executable, x86-64, version 1 (FreeBSD), for FreeBSD 7.1, dynamically linked (uses shared libs), FreeBSD-style, stripped
  /dev/ad4s1d    16244334 4116138 10828650    28%    /usr
Dhrystone 2 using register variables     17392567.4 lps   (10.0 secs, 10 samples)
Double-Precision Whetstone                 3985.7 MWIPS (9.8 secs, 10 samples)
System Call Overhead                     1139565.4 lps   (10.0 secs, 10 samples)
Pipe Throughput                          1415830.9 lps   (10.0 secs, 10 samples)
Pipe-based Context Switching             279496.7 lps   (10.0 secs, 10 samples)
Process Creation                          11042.3 lps   (30.0 secs, 3 samples)
Execl Throughput                           3052.1 lps   (29.8 secs, 3 samples)
File Read 1024 bufsize 2000 maxblocks    1115016.0 KBps  (30.0 secs, 3 samples)
File Write 1024 bufsize 2000 maxblocks    75095.0 KBps  (30.0 secs, 3 samples)
File Copy 1024 bufsize 2000 maxblocks     76621.0 KBps  (30.0 secs, 3 samples)
File Read 256 bufsize 500 maxblocks      300892.0 KBps  (30.0 secs, 3 samples)
File Write 256 bufsize 500 maxblocks     112166.0 KBps  (30.0 secs, 3 samples)
File Copy 256 bufsize 500 maxblocks      113295.0 KBps  (30.0 secs, 3 samples)
File Read 4096 bufsize 8000 maxblocks    2511317.0 KBps  (30.0 secs, 3 samples)
File Write 4096 bufsize 8000 maxblocks   103730.0 KBps  (30.0 secs, 3 samples)
File Copy 4096 bufsize 8000 maxblocks     90873.0 KBps  (30.0 secs, 3 samples)
Shell Scripts (1 concurrent)               4462.8 lpm   (59.5 secs, 3 samples)
Shell Scripts (8 concurrent)                802.3 lpm   (59.5 secs, 3 samples)
Shell Scripts (16 concurrent)               421.2 lpm   (59.5 secs, 3 samples)
Arithmetic Test (type = short)           2875767.9 lps   (10.0 secs, 3 samples)
Arithmetic Test (type = int)             2909285.8 lps   (10.0 secs, 3 samples)
Arithmetic Test (type = long)            809862.3 lps   (10.0 secs, 3 samples)
Arithmetic Test (type = float)           2428436.8 lps   (10.0 secs, 3 samples)
Arithmetic Test (type = double)          1483392.5 lps   (10.0 secs, 3 samples)
Arithoh                                  428084233.6 lps   (10.0 secs, 3 samples)
C Compiler Throughput                      2163.1 lpm   (59.8 secs, 3 samples)
Dc: sqrt(2) to 99 decimal places         238913.3 lpm   (30.0 secs, 3 samples)
Recursion Test--Tower of Hanoi           186968.7 lps   (20.0 secs, 3 samples)


                     INDEX VALUES            
TEST                                        BASELINE     RESULT      INDEX

Dhrystone 2 using register variables        116700.0 17392567.4     1490.4
Double-Precision Whetstone                      55.0     3985.7      724.7
Execl Throughput                                43.0     3052.1      709.8
File Copy 1024 bufsize 2000 maxblocks         3960.0    76621.0      193.5
File Copy 256 bufsize 500 maxblocks           1655.0   113295.0      684.6
File Copy 4096 bufsize 8000 maxblocks         5800.0    90873.0      156.7
Pipe Throughput                              12440.0  1415830.9     1138.1
Pipe-based Context Switching                  4000.0   279496.7      698.7
Process Creation                               126.0    11042.3      876.4
Shell Scripts (8 concurrent)                     6.0      802.3     1337.2
System Call Overhead                         15000.0  1139565.4      759.7
                                                                 =========
     FINAL SCORE                                                     665.1
```

vmstat -i 


```
interrupt                          total       rate
irq1: atkbd0                        3249          0
irq6: fdc0                            14          0
irq12: psm0                        67835          8
irq18: re0 uhci2                     309          0
irq19: uhci1+                    3325168        432
cpu0: timer                     15478820       2014
cpu1: timer                     15478379       2014
Total                           34353774       4471
```

Ubench


```
Unix Benchmark Utility v.0.3
Copyright (C) July, 1999 PhysTech, Inc.
Author: Sergei Viznyuk <sv-obfuscated-mailaddr@phystech.com>
http://www.phystech.com/download/ubench.html
FreeBSD 7.1-RELEASE FreeBSD 7.1-RELEASE #0: Thu Jan  1 08:58:24 UTC 2009     root-obfuscated-mailaddr@driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
Ubench CPU:   761053
Ubench MEM:   254555
--------------------
Ubench AVG:   507804
```


----------



## richardpl (Jan 13, 2009)

Ah, my results are worse: I got 4-5 pwds/sec and 3 pwds/sec with only one thread.


----------



## Anonymous (Jan 14, 2009)

It won't help you, but my experience with 7.1 on an i386 system is that the machine is slower than on 7.0. I don't know whether the problem is that I have /usr and /var on gjournal, or something else.
For example, on 7.0 installing OpenOffice 3.0 from ports took about 6-7 hours; on 7.1 it took 12! Before, if I was compiling something and working in KDE at the same time, it was not a problem; on 7.1 it is masochism. I have the same configuration and settings as I had on 7.0.


----------



## trasz@ (Jan 14, 2009)

@randux: Could you please paste the output of the following two commands, each running in its own terminal, while the problematic test is running? I.e., run "vmstat 10" and "iostat 10", then switch to another terminal, wait 20 seconds, run the problematic program, and leave it running for a few minutes. Then paste the vmstat and iostat output here. Commands:

vmstat 10

iostat 10
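The same procedure can be scripted so that the timing is repeatable. A minimal sketch, assuming a POSIX sh and the stock FreeBSD `vmstat` and `iostat`; `run_monitored`, `INTERVAL`, `SETTLE` and the log file names are hypothetical, and instead of a third terminal the workload runs in the background for a fixed number of seconds before everything is stopped.

```sh
#!/bin/sh
# Sketch: record vmstat and iostat while a workload runs, as described above.
# run_monitored, INTERVAL and SETTLE are hypothetical names, not standard tools.
INTERVAL=${INTERVAL:-10}   # seconds between samples ("vmstat 10", "iostat 10")
SETTLE=${SETTLE:-20}       # idle baseline before the workload starts

# run_monitored LOGDIR DURATION COMMAND [ARGS...]
# Starts both monitors, waits SETTLE seconds, runs COMMAND in the background
# for DURATION seconds, then stops everything. All output goes to LOGDIR.
run_monitored() {
    logdir=$1
    duration=$2
    shift 2
    mkdir -p "$logdir" || return 1
    vmstat "$INTERVAL" > "$logdir/vmstat.log" 2>&1 &
    vm_pid=$!
    iostat "$INTERVAL" > "$logdir/iostat.log" 2>&1 &
    io_pid=$!
    sleep "$SETTLE"
    "$@" > "$logdir/workload.log" 2>&1 &
    job_pid=$!
    sleep "$duration"
    # The workload may already have exited; ignore kill/wait errors.
    kill "$job_pid" "$vm_pid" "$io_pid" 2>/dev/null || true
    wait "$job_pid" "$vm_pid" "$io_pid" 2>/dev/null || true
    return 0
}
```

For example, `run_monitored /tmp/rarcrack-stats 180 rarcrack archive.rar` (the archive name is a placeholder) would record an idle baseline plus three minutes of the workload and leave vmstat.log and iostat.log ready to paste.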


----------



## randux (Jan 14, 2009)

lumiwa said:

> It won't help you, but my experience with 7.1 on an i386 system is that the machine is slower than on 7.0. I don't know whether the problem is that I have /usr and /var on gjournal, or something else.
> For example, on 7.0 installing OpenOffice 3.0 from ports took about 6-7 hours; on 7.1 it took 12! Before, if I was compiling something and working in KDE at the same time, it was not a problem; on 7.1 it is masochism. I have the same configuration and settings as I had on 7.0.



That's really an incredible difference. I hope the devs will look at all these posts and fix the problem. Thanks for your post.


----------



## randux (Jan 14, 2009)

trasz@ said:

> @randux: Could you please paste the output of the following two commands, each running in its own terminal, while the problematic test is running? I.e., run "vmstat 10" and "iostat 10", then switch to another terminal, wait 20 seconds, run the problematic program, and leave it running for a few minutes. Then paste the vmstat and iostat output here. Commands:
> 
> vmstat 10
> 
> iostat 10



Hi, here is the info:

vmstat @ http://randux.pastebin.com/m44c1235d
iostat @ http://randux.pastebin.com/m697766a8

Thank you.


----------



## randux (Jan 14, 2009)

SaveTheRbtz said:

> Maybe you could try the old-school 4BSD scheduler?
> 
> P.S. How did you build rarcrack? From ports?



Can you revert to the 4BSD scheduler without rebuilding the kernel? How do you do it?


----------



## cajunman4life (Jan 14, 2009)

randux said:

> Can you revert to the 4BSD scheduler without rebuilding the kernel? How do you do it?



Nope, unfortunately you'll have to set the option in the kernel config file and re-build the kernel.
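As a rough guide to what that involves, here is a sketch that writes a small kernel config on top of GENERIC, dropping ULE (the 7.1 default on amd64) and enabling 4BSD. It assumes `config(8)` on 7.x accepts `include` and `nooptions`; if it doesn't, copy GENERIC and replace `options SCHED_ULE` with `options SCHED_4BSD` by hand. The config name `MYKERNEL4BSD` and the helper `write_kernconf` are hypothetical.

```sh
#!/bin/sh
# Sketch: create a custom kernel config that swaps SCHED_ULE for SCHED_4BSD.
# write_kernconf and MYKERNEL4BSD are hypothetical names.

# write_kernconf CONFDIR NAME: write CONFDIR/NAME, based on GENERIC.
write_kernconf() {
    confdir=$1
    name=$2
    cat > "$confdir/$name" <<EOF
include   GENERIC
ident     $name
nooptions SCHED_ULE
options   SCHED_4BSD
EOF
}

# On the real system, as root and with /usr/src installed:
#   write_kernconf /usr/src/sys/amd64/conf MYKERNEL4BSD
#   cd /usr/src && make buildkernel KERNCONF=MYKERNEL4BSD
#   make installkernel KERNCONF=MYKERNEL4BSD
#   shutdown -r now
```

After rebooting into the new kernel, `sysctl kern.sched.name` should report `4BSD` if the change took effect.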


----------



## randux (Jan 14, 2009)

Thanks for the info. I may have to pull down -STABLE from source anyway to fix the lack of direct rendering for my chipset, so maybe I'll get my hands dirty and try to learn a little FreeBSD.


----------



## richardpl (Jan 14, 2009)

I ran rarcrack under truss and also looked at its source code.
Most of the time it is vforking unrar and waiting for the results, allocating and freeing memory all the time.


----------



## randux (Jan 14, 2009)

I just installed 7.0-RELEASE-AMD64 and I get the same poor rarcrack performance so I think we can rule out the scheduler changes in 7.1.

Could it be waiting on FreeBSD because of a performance problem in unrar? Or are fork or malloc/free slow on FreeBSD?

On openSUSE on the same box it runs 33x faster. There is something wrong here.
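One way to test the fork hypothesis directly, since rarcrack spends most of its time spawning unrar: time a loop that launches a trivial external program over and over, once under openSUSE and once under FreeBSD on the same box. A minimal sketch in POSIX sh; `spawn_rate` is a hypothetical helper, and because it measures fork+exec+wait from the shell rather than rarcrack's own vfork path, treat the result as a rough comparison only.

```sh
#!/bin/sh
# Sketch: measure process spawns per second (fork + exec + wait).
# spawn_rate is a hypothetical helper, not part of rarcrack or the base system.

# spawn_rate COUNT [PROGRAM]: run PROGRAM COUNT times, print spawns per second.
# Pick COUNT large enough that the loop takes several seconds, because
# date +%s only has 1-second resolution.
spawn_rate() {
    count=${1:-5000}
    prog=${2:-/usr/bin/true}
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        # An external command, so the shell has to fork, exec and wait each time.
        "$prog" </dev/null >/dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s)
    elapsed=$((end - start))
    [ "$elapsed" -gt 0 ] || elapsed=1
    echo $((count / elapsed))
}
```

Running `spawn_rate 20000` on each OS would give a rough comparison: a gap similar to the 750 vs 22 passwords per second would point at process creation, while similar numbers would point back at unrar or the allocator.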


----------



## SaveTheRbtz (Jan 17, 2009)

Kinda crazy, but can you try to compile rarcrack / unrar with tcmalloc?


----------



## randux (Jan 17, 2009)

Maybe, if I knew more about what you're suggesting.

Do I just global change all occurrences of malloc to tcmalloc?
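As far as I know, no source changes should be needed: tcmalloc is meant to replace malloc/free by being linked in (`-ltcmalloc`) or preloaded at run time. A minimal sketch of the preload route, assuming the google-perftools port installed `libtcmalloc.so` under `/usr/local/lib` (check the real path) and that FreeBSD's run-time linker honours `LD_PRELOAD`. `with_preload` and the archive name are hypothetical. Since the variable is placed in the environment, the unrar processes that rarcrack spawns would inherit it too.

```sh
#!/bin/sh
# Sketch: run a program with tcmalloc preloaded instead of relinking it.
# The library path is an assumption; check where the port installed it.
TCMALLOC=${TCMALLOC:-/usr/local/lib/libtcmalloc.so}

# with_preload LIB COMMAND [ARGS...]: run COMMAND with LIB preloaded if it exists.
with_preload() {
    lib=$1
    shift
    if [ -f "$lib" ]; then
        LD_PRELOAD=$lib "$@"   # inherited by children, e.g. spawned unrar processes
    else
        echo "warning: $lib not found, running without it" >&2
        "$@"
    fi
}

# Example: with_preload "$TCMALLOC" rarcrack archive.rar
```

If the preloaded run is no faster, the allocator is probably not the bottleneck.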


----------



## randux (Jan 23, 2009)

I tested rarcrack on OpenBSD 4.4 AMD64 and I get almost exactly the same bad performance as I do on FreeBSD 7.1. Now I'm more confused than ever.

SaveTheRbtz can you explain your last post a bit more please?


----------

