# FreeBSD 8.2 gets slower over time.



## olav (Mar 1, 2011)

After upgrading to FreeBSD 8.2, I'm seeing strange behaviour: after a few days of uptime the system starts throttling and eventually stops responding.

While it's throttling I can use the system, though it takes like 10 seconds before what I type will show on screen. I can't find any process which is under heavy load, nor are there any memory leaks. I can't find anything in the logs either.

The really strange thing, though, is that when I connect a screen directly to the server I see the flying chuck screensaver, which runs smoothly. The system responds fast, right up until the moment I try to log in. After I've typed in the username and hit enter, the system stops responding.

Has anyone had a similar problem before? What can I do?


----------



## Zare (Mar 1, 2011)

It's a hardware issue, probably. Keep in mind that you get disk I/O while trying to login. Have you checked SMART attributes of your HDD?


----------



## olav (Mar 1, 2011)

Yeah, they seem fine. I've even mirrored them.


----------



## User23 (Mar 1, 2011)

This sounds a little bit like a deadlock. The strange thing is that the system slows down gradually. Maybe you should build a kernel with debugging options to figure it out.


----------



## oliverh (Mar 1, 2011)

This sounds like ... almost no data. Dmesg, logs (if possible), config ... etc. pp. Otherwise it's just wild guessing.


----------



## zeissoctopus (Mar 1, 2011)

How did you upgrade your base system? Using buildworld or freebsd-update? Are you sure any system scripts and configuration files in /etc are up-to-date?


----------



## nekoexmachina (Mar 1, 2011)

Hello, olav!
I've had a similar problem on my desktop with a Radeon (X1950: it's r500, if I remember correctly) in KDE4 (both with and without compositing), but not in KDE3 or non-DE WMs.


----------



## phoenix (Mar 1, 2011)

If you don't have a monitor connected to the system, disable the console screen savers.  All they are doing is wasting CPU/RAM/video resources.  No point, if you can't see them.  Just use blank_saver.ko if you really need one; or simply configure the BIOS to turn off the video output after 15 minutes or whatever.

To help diagnose this, you should connect a monitor to the system, disable all screen savers and power saving, then log in on separate virtual consoles and leave these running, one per console:

- nothing; this console is there to catch console messages
- top(1)
- gstat(8)
- net-mgmt/iftop
- tail(1) -f on logs like /var/log/messages
- misc/gnu-watch running *vmstat -i* every 10-15 seconds
- anything else that may be helpful

That way, when things slow down, you can just flip through the virtual consoles (ALT+F1 through ALT+F7) to get a snapshot of how the system is running, without having to log in.
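
A minimal sketch of the same idea for when no monitor is attached: append timestamped snapshots to a log file so they can be read after a reboot. The `snapshot` function name, the log path, and the 15-second interval are illustrative assumptions, and the `top` batch flags in the commented loop should be checked against top(1) on your release.

```sh
#!/bin/sh
# Append a timestamped snapshot of each command's output to a log file.
# Usage: snapshot LOGFILE 'command 1' 'command 2' ...
snapshot() {
    log=$1
    shift
    for cmd in "$@"; do
        printf '=== %s === %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$cmd" >> "$log"
        # Run through sh -c so each argument may carry its own options and pipes.
        sh -c "$cmd" >> "$log" 2>&1
    done
}

# Example loop (left commented out; log path and interval are assumptions):
# while :; do
#     snapshot /var/log/slowdown.log 'vmstat -i' 'swapinfo -k' 'top -b -o size 10'
#     sleep 15
# done
```

Because every entry carries a timestamp, the log can later be lined up against cron or periodic(8) run times.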


----------



## olav (Mar 2, 2011)

I used freebsd-update. I don't think there is any special configuration in /etc causing this.
It is a pure server, with no X server.

Hey, I like the flying chuck screen saver. Every time I see him, I feel proud as a FreeBSD user.
My server is mostly idle and that screen saver doesn't steal that many CPU cycles.

I've configured different virtual consoles as you suggested and will come back with more info when it happens again.


----------



## jb_fvwm2 (Mar 2, 2011)

Maybe not relevant, but if that server motherboard has onboard graphics, there is a *slight* chance the situation will improve if you put in an aftermarket video card.


----------



## olav (Mar 2, 2011)

OK, it happened again just now. gstat showed me that the two mirrored OS disks were at 100% load. I rebooted and now it's fine again. What could be causing this? gmirror status said that the mirror was OK.


----------



## aragon (Mar 3, 2011)

Flaky disks and/or controller?

I guess doing some SMART self tests with sysutils/smartmontools is a start.


----------



## olav (Mar 3, 2011)

I don't believe so, as I use two different controllers and the SMART tests don't report anything.


----------



## Pushrod (Mar 3, 2011)

It's not fsck running, or another disk thrasher, is it?


----------



## olav (Mar 4, 2011)

I have no idea, how can I check that? Wouldn't fsck show in the log?


----------



## _martin (Mar 4, 2011)

As @phoenix mentioned: what did gstat report when you hit 100% disk utilization (which FS was busy)? What did the top output say during that time? Did you check the time when this started (maybe cron or periodic related)?

You can use:
`$ ps ax | grep fsck`
to verify if fsck is running.


----------



## olav (Mar 5, 2011)

It's the swap partition which is causing this problem. Should I try to disable it?


----------



## _martin (Mar 5, 2011)

I would not do that if I were you. Rather check what is actually using your swap. Sort the top output by size: 

`# top -o size` 
and check what is eating so much memory.

You can use `# ps auxwww | awk '$8 ~ /.W.*/ { print $0}'` to check for swapped processes (I once found this command on the FreeBSD mailing lists).
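
The awk filter above keys on the eighth column of `ps aux` output (STAT), where a `W` flag marks a swapped-out process. A small sketch that wraps it in a function and prints only the PID and command name; `swapped_procs` is a hypothetical helper name, and the column positions assume the standard `ps aux` layout.

```sh
#!/bin/sh
# Read `ps aux`-style output on stdin and print "PID COMMAND" for every
# process whose STAT column (field 8) carries the W (swapped out) flag.
swapped_procs() {
    awk '$8 ~ /.W/ { printf "%s %s\n", $2, $11 }'
}

# Example (left commented out):
# ps auxwww | swapped_procs
```

A process listed here is idle and swapped out; on its own that does not mean it is the one generating swap traffic.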


----------



## Pushrod (Mar 5, 2011)

Is the swap partition being used heavily? If so, you have something (or many things) using more memory than you have in the machine. You will definitely notice a slowdown if so.

What does this machine do all day?


----------



## olav (Mar 6, 2011)

The thing is, top shows no activity. There are no visible processes causing the swap partition to overload. The server is mostly idle; it runs a few jails: DNS, LDAP, SSH. Only the DNS and SSH jails are exposed to the internet. It also acts as a fileserver with ZFS. The server has 6GB of RAM, and I've set vm.kmem_size="9G" in /boot/loader.conf.

I get this output when I check for swapped processes:

```
[olav@zpool ~]$ ps auxwww | awk '$8 ~ /.W.*/ { print $0}'
root    124  0.0  0.0  2804     0  ??  IWs  -         0:00.00 adjkerntz -i
root   1044  0.0  0.0 16652     0  ??  IW   -         0:00.00 /usr/local/sbin/smartd -p /var/run/smartd.pid -c /usr/local/etc/smartd.conf
root   1591  0.0  0.0 38228     0  ??  IWs  -         0:00.00 sshd: olav [priv] (sshd)
smmsp  2602  0.0  0.0 12192     0  ??  IWs  -         0:00.00 sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail)
root   2609  0.0  0.0  8012     0  ??  SWs  -         0:00.00 /usr/sbin/cron -s
```

`top -o size` shows this:

```
last pid: 18456;  load averages:  0.05,  0.01,  0.00  up 0+14:25:52  11:11:44
36 processes:  1 running, 35 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
Mem: 2228K Active, 56K Inact, 1164M Wired, 8640K Cache, 623M Buf, 144M Free
Swap: 4096M Total, 15M Used, 4081M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1600 olav          1  44    0 38228K   548K select  1   0:00  0.00% sshd
 1591 root          1  44    0 38228K     0K sbwait  0   0:00  0.00% <sshd>
17370 root          1  44    0 34332K   704K select  0   0:01  0.00% smbd
 1068 root          1  44    0 34112K   160K select  0   0:00  0.00% smbd
 1072 root          1  44    0 34112K   116K select  0   0:00  0.00% smbd
 2756 root          1  44    0 26336K   132K select  1   0:00  0.00% winbindd
 1114 root          1  44    0 26308K   120K select  1   0:00  0.00% winbindd
18441 root          1  59    0 26260K  1004K select  0   0:00  0.00% sshd
 1073 root          1  44    0 26208K   176K select  0   0:00  0.00% winbindd
 2757 root          1  44    0 26196K   120K select  0   0:00  0.00% winbindd
 1062 root          1  44    0 24108K   608K select  0   0:02  0.00% nmbd
 1044 root          1  44    0 16652K     0K nanslp  0   0:00  0.00% <smartd>
 1601 olav          1  47    0 13356K     0K wait    0   0:00  0.00% <bash>
 2596 root          1  44    0 12192K   540K select  0   0:01  0.00% sendmail
 2602 smmsp         1  44    0 12192K     0K pause   0   0:00  0.00% <sendmail>
18454 olav          1  44    0  9408K   968K CPU0    0   0:00  0.00% top
  888 root          1  44    0  8012K   112K select  1   0:00  0.00% rpcbind
 2609 root          1  53    0  8012K     0K nanslp  0   0:00  0.00% <cron>
  866 root          1  44    0  7084K   156K select  0   0:00  0.00% syslogd
 1195 root          1  76    0  7020K    56K select  1   0:00  0.00% rsync
 1003 root          1  44    0  6952K    72K select  0   0:00  0.00% mountd
 2681 root          1  76    0  6952K    72K ttyin   0   0:00  0.00% getty
```

This is the information available when the system starts throttling.
I should also mention that I've now noticed that the /usr partition shows some activity as well when the system overuses the swap partition.

After a reboot, top shows something interesting:

```
last pid:  3277;  load averages:  0.05,  0.01,  0.00   up 0+00:39:05  12:21:15
80 processes:  1 running, 79 sleeping
CPU:  0.0% user,  0.0% nice,  0.4% system,  0.8% interrupt, 98.9% idle
Mem: 71M Active, 40M Inact, 1558M Wired, 428K Cache, 30M Buf, 4187M Free
Swap: 4096M Total, 4096M Free
```


----------



## aragon (Mar 6, 2011)

Well, something is truly strange.  You have 6 GB of RAM, but your first top output doesn't indicate more than about 2 GB...
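
To make that comparison repeatable, here is a rough sketch that adds up the categories on top's `Mem:` line. The helper name `mem_total_mb` is hypothetical, and it assumes that `Buf` overlaps the other categories and so should not be added again; that assumption is worth checking against top(1).

```sh
#!/bin/sh
# Sum the memory categories on a top(1) "Mem:" line, in megabytes.
# Reads top output on stdin; skips "Buf" (assumed to overlap the others).
mem_total_mb() {
    awk '/^Mem:/ {
        total = 0
        # Fields come in pairs: "<number><unit>" followed by "<label>,"
        for (i = 2; i < NF; i += 2) {
            label = $(i + 1)
            sub(/,$/, "", label)
            if (label == "Buf") continue
            value = $i + 0                    # numeric prefix, e.g. 2228 from "2228K"
            unit = substr($i, length($i))     # trailing K, M or G
            if (unit == "K") value /= 1024
            else if (unit == "G") value *= 1024
            total += value
        }
        printf "%.0f\n", total
    }'
}

# Example (left commented out):
# top -b | mem_total_mb
```

Fed the two `Mem:` lines posted above, this gives 1319 for the throttled system and 5856 after the reboot. Even with the 623M of `Buf` added in, the first figure stays under 2 GB, far below the 6 GB installed.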


----------



## _martin (Mar 6, 2011)

Indeed it seems you've "lost" some memory between reboots. I bet you have a bloody lot of swapping due to ZFS and very low memory.
Check if your system detects memory correctly each time: 

`#  grep -i "real memory" /var/log/dmesg.*`

You can also use sysutils/dmidecode from ports to check how the system sees the memory banks and modules.

e.g. you can use:
`# dmidecode --type=16,17`
to list the memory banks (Physical Memory Array) and their modules (Memory Device).

You should reseat the memory modules and run a memtest86+ check to verify you have no (further) HW problem.


----------



## Galactic_Dominator (Mar 7, 2011)

aragon said:

> Well, something is truly strange.  You have 6 GB of RAM, but your first top output doesn't indicate more than about 2 GB...


Yes, finally data that was asked for so long ago.

Usually this type of symptom can be resolved by a BIOS update.


----------



## aragon (Mar 7, 2011)

Considering the OP is using ZFS and Samba on 8.2, could the problem be (1b) on this?


----------



## Galactic_Dominator (Mar 7, 2011)

Well, if they haven't disabled sendfile, it's a guarantee.  And that patch doesn't resolve all ZFS sendfile issues; sendfile should still be disabled.  There was a recent thread on stable@ for anyone interested.  However, that would have nothing to do with the limited amount of RAM made available to the system, which is a separate problem, pretty common on re-purposed Dells but not limited to them.

That's why, when dmesg is requested and not given, it greatly extends the time to resolution.
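
On the Samba side, sendfile is controlled by the `use sendfile` parameter in smb.conf. A small sketch that reports the current setting; the helper name `sendfile_setting` and the path `/usr/local/etc/smb.conf` are assumptions, and the parser is deliberately naive (it ignores `include` directives and per-share sections).

```sh
#!/bin/sh
# Print the value of "use sendfile" from an smb.conf-style file,
# or "unset" if the parameter does not appear. Usage: sendfile_setting FILE
sendfile_setting() {
    awk -F= '
        /^[ \t]*[#;]/ { next }                  # skip comment lines
        tolower($1) ~ /^[ \t]*use[ \t]+sendfile[ \t]*$/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $2)     # trim whitespace around the value
            value = $2
        }
        END { print (value == "" ? "unset" : value) }
    ' "$1"
}

# Example (path is an assumption; left commented out):
# sendfile_setting /usr/local/etc/smb.conf
```

If this prints anything other than `no`, adding `use sendfile = no` under `[global]` and restarting Samba is the change being suggested.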


----------



## olav (Mar 7, 2011)

Yes, the real memory is detected each time.
`# grep -i "real memory" /var/log/dmesg.*`

```
/var/log/dmesg.today:real memory  = 6442450944 (6144 MB)
/var/log/dmesg.yesterday:real memory  = 6442450944 (6144 MB)
/var/log/dmesg.yesterday:real memory  = 6442450944 (6144 MB)
/var/log/dmesg.yesterday:real memory  = 6442450944 (6144 MB)
```

dmidecode:

```
# dmidecode 2.10
SMBIOS 2.4 present.

Handle 0x001B, DMI type 16, 15 bytes
Physical Memory Array
        Location: System Board Or Motherboard
        Use: System Memory
        Error Correction Type: None
        Maximum Capacity: 4 GB
        Error Information Handle: Not Provided
        Number Of Devices: 4

Handle 0x001C, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x001B
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 1024 MB
        Form Factor: DIMM
        Set: None
        Locator: A0
        Bank Locator: Bank0/1
        Type: Unknown
        Type Detail: None
        Speed: 667 MHz
        Manufacturer:  
        Serial Number:  
        Asset Tag:  
        Part Number:  

Handle 0x001D, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x001B
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 2048 MB
        Form Factor: DIMM
        Set: None
        Locator: A1
        Bank Locator: Bank2/3
        Type: Unknown
        Type Detail: None
        Speed: 800 MHz
        Manufacturer:  
        Serial Number:  
        Asset Tag:  
        Part Number:  

Handle 0x001E, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x001B
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 1024 MB
        Form Factor: DIMM
        Set: None
        Locator: A2
        Bank Locator: Bank4/5
        Type: Unknown
        Type Detail: None
        Speed: 667 MHz
        Manufacturer:  
        Serial Number:  
        Asset Tag:  
        Part Number:  

Handle 0x001F, DMI type 17, 27 bytes
Memory Device
        Array Handle: 0x001B
        Error Information Handle: Not Provided
        Total Width: 64 bits
        Data Width: 64 bits
        Size: 2048 MB
        Form Factor: DIMM
        Set: None
        Locator: A3
        Bank Locator: Bank6/7
        Type: Unknown
        Type Detail: None
        Speed: 800 MHz
        Manufacturer:  
        Serial Number:  
        Asset Tag:  
        Part Number:
```

The system always starts with 6GB of memory available in top after a reboot, but over time it disappears. Perhaps I should just try upgrading to STABLE.
The BIOS is the latest version.
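
One way to see when the memory goes missing is to log the kernel's page counters over time. A rough sketch: `pages_to_mb` is a hypothetical helper, and the log path and 60-second interval in the commented loop are assumptions; the sysctl names are the standard FreeBSD VM counters.

```sh
#!/bin/sh
# Convert a page count to whole megabytes. Usage: pages_to_mb PAGES PAGESIZE
pages_to_mb() {
    echo $(( $1 * $2 / 1048576 ))
}

# Example loop (FreeBSD only, left commented out):
# pagesize=$(sysctl -n hw.pagesize)
# while :; do
#     free=$(sysctl -n vm.stats.vm.v_free_count)
#     wired=$(sysctl -n vm.stats.vm.v_wire_count)
#     echo "$(date '+%F %T') free=$(pages_to_mb "$free" "$pagesize")M wired=$(pages_to_mb "$wired" "$pagesize")M"
#     sleep 60
# done >> /var/tmp/memlog.txt
```

If free and wired both shrink while nothing else grows, the pages are going somewhere top does not account for, which is the pattern described in this post.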


----------



## _martin (Mar 7, 2011)

I see you are using modules with different speeds: 667MHz and 800 MHz. This can be a problem - depending on your BIOS settings (what speed did you configure for all modules?). 

It is also interesting that BIOS reports maximum size to be 4GB (4x 1GB I guess).
What is the make/model of your motherboard? You can use dmidecode for that too:

`# dmidecode --type=2`

Strange that 6GB is reported upon boot (so one would assume this amount is supported).
Did you see any memory related issues in syslog/dmesg?


----------



## olav (Mar 7, 2011)

I have this motherboard http://gigabyte.com/products/product-page.aspx?pid=2457#dl
The memory speed is 667 MHz; nothing is overclocked.


----------



## olav (Mar 9, 2011)

I upgraded to FreeBSD 8-STABLE yesterday. My system now behaves completely differently and I always have plenty of memory available.

I'm pretty sure I had problems with the disappearing memory issue -> http://blog.vx.sk/archives/24-Backported-patches-for-FreeBSD-82-RELEASE.html


----------



## bestwc (Mar 10, 2011)

olav said:

> I upgraded to FreeBSD 8-STABLE yesterday. My system now acts completely different and I always have plenty of memory available
> 
> I'm pretty sure I had problems with the disappearing memory issue -> http://blog.vx.sk/archives/24-Backported-patches-for-FreeBSD-82-RELEASE.html



Did you apply those patches?


----------



## olav (Mar 11, 2011)

They're already in FreeBSD 8.2-STABLE, if I'm not mistaken?
Anyway, the server is still going strong and everything is fine again.


----------



## chrcol (Mar 15, 2011)

I don't understand the logic behind deliberately not fixing known bugs in a release, and then putting them in the code after.  Basically they released it with known bugs. Standards dropping again.

If it meant delaying 8.2 for another couple of weeks, then do it.  I now find myself probably following STABLE again instead of RELEASE, as I suspect these won't be patched in via ERRATA.


----------



## wblock@ (Mar 15, 2011)

chrcol said:

> I don't understand the logic behind deliberately not fixing known bugs in a release, and then putting them in the code after.  Basically they released it with known bugs. Standards dropping again.



The whole point of a release is that it's a known state.  It's been tested as a unit, everything is synced, it's a fixed point in time.  The standards haven't dropped, it's the same as it was.  (And "again" is unfair to FreeBSD.)

Of course there will be bugs discovered shortly after release.  Or minor bugs might be discovered shortly before release.  There isn't any software that is bug-free.

If you want ongoing bug fixes, follow -STABLE.  There are a lot of people using it and problems are found and fixed quickly.


----------



## Galactic_Dominator (Mar 15, 2011)

chrcol said:

> I don't understand the logic behind deliberately not fixing known bugs in a release, and then putting them in the code after.  Basically they released it with known bugs. Standards dropping again.


Actually it's you that doesn't even have a standard for accuracy.  Not fixing bugs found after the tree enters its slush phase has been SOP for this OS, and pretty much every other as well.

Of course, if you're so dissatisfied with it, you could help with testing earlier and provide some patches.


----------



## chrcol (Mar 21, 2011)

So why haven't the patches that were provided been submitted and made into an ERRATA for the release? Same question, really, for release 8.1, which shipped with a ZFS bug that stopped booting from the 2nd HDD in a mirror when the first failed. These are nasty bugs not being fixed in release code.  Years ago they were the sort of things that used to get patched in.


----------



## wblock@ (Mar 22, 2011)

Things don't happen magically, with FreeBSD or any other software.  So when "Somebody ought to fix this!" is said, I say "How about you?  You're somebody!"  So if there are problems, like missing errata, submit a PR.  It's one of the great opportunities of FreeBSD that fixing a problem you're having also helps lots of other people.


----------



## sossego (Mar 22, 2011)

Wblock is right; RELEASE and SNAPSHOT are for testing and building. 
If you want a stable system from such, then the mailing lists are where to look.
Volunteering, if you are able, is a good suggestion from wblock.


----------

