# FreeBSD 9 on ESXi 5, clock stops



## dvdmandt (May 9, 2012)

Hi, I have a FreeBSD 9.0 machine with a GENERIC kernel on which the clock stops ticking after a seemingly random amount of time (minutes, days, weeks...). The system doesn't hang, and most systems and services appear to work fine; it's just that the clock doesn't move forward. Things like cron therefore stop working. Running *date* just returns the same string over and over. I also can't reboot; it just hangs when I try.

I've disabled ntpd, both on the host and guest. There's another VM running on the host, a Windows server, and it does not have this issue.

A quick google suggested other people also having this problem, but I didn't see a solution (or a cause for that matter).

Does anyone have any ideas?


----------



## npl (May 10, 2012)

I am also having this problem, on different types of hardware, different types of storage, on different clusters. 

I am also seeing this problem on 8.1, 8.2 and 8.3 so it is not limited to 9.0.


----------



## bach (May 10, 2012)

Hi,

I have the same issue. My system is 9.0-STABLE, last updated on Mar 13. I found a suggestion to set the sysctl variable kern.eventtimer.periodic=1 and tried it. Unfortunately, it didn't solve the problem.


----------



## xzkto (May 10, 2012)

Hi, 

I can also confirm this problem with FreeBSD 8.2-RELEASE. 
The only thing I found relevant to the time freeze is the following lines from the VMware log (they appear before each time freeze):


```
2012-04-20T06:08:53.612Z| vmx| GuestRpcSendTimedOut: message to toolbox timed out.
2012-04-20T06:09:08.612Z| vmx| GuestRpcSendTimedOut: message to toolbox timed out.
2012-04-20T06:09:08.612Z| vmx| GuestRpc: app toolbox's second ping timeout; assuming app is down
2012-04-20T06:09:08.613Z| vmx| GuestRpc: Reinitializing Channel 0(toolbox)
2012-04-20T06:09:08.613Z| vmx| GuestMsg: Channel 0, Cannot unpost because the previous post is already completed
2012-04-20T06:09:08.613Z| vmx| GuestRpc: Channel 0 reinitialized.
2012-04-20T06:09:08.613Z| vmx| GuestRpc: Channel 0 reinitialized.
2012-04-20T06:12:08.616Z| vmx| GuestRpcSendTimedOut: message to toolbox timed out.
2012-04-20T06:12:08.616Z| vmx| Vix: [4537798 guestCommands.c:2194]: Error VIX_E_TOOLS_NOT_RUNNING in VMAutomationTranslateGuestRpcError(): VMware Tools are not running in the guest
```


----------



## bach (May 10, 2012)

Btw, ntpd is not running. I've enabled timesync with ESXi via open-vm-tools (to be clear, that also did not solve the problem).


----------



## k-nike (May 11, 2012)

Try:
`echo 'kern.hz="100"' >> /boot/loader.conf`


----------



## bach (May 12, 2012)

Sorry, forgot to mention. Already tried this.


----------



## Savagedlight (May 13, 2012)

Try setting up a test VM, verify the problem exists in this VM, then recompile the kernel with the 4BSD scheduler and see if it helps.

The reason I'm suggesting this is that I've had issues with random processes not being assigned any CPU cycles when running FreeBSD 9 under ESXi 5 with the default kernel (ULE scheduler); recompiling with the 4BSD scheduler solved this issue.

/usr/src/sys/<arch>/conf/VMWARETEST

```
include         GENERIC
ident           VMWARETEST

nooptions       SCHED_ULE
options         SCHED_4BSD
```


----------



## joel@ (May 13, 2012)

FWIW, I have a large set of virtual machines running FreeBSD 8.2 amd64 on ESX 4.1 and I'm not seeing this.

Every VM is configured with 1 vCPU and 3GB RAM. ntpd is running. The official VMware Tools package is installed (no open-vm tools). Kernel is GENERIC, no special sysctls or kern.hz configuration.


----------



## npl (May 13, 2012)

joel@ said:
			
		

> FWIW, I have a large set of virtual machines running FreeBSD 8.2 amd64 on ESX 4.1 and I'm not seeing this.
> 
> Every VM is configured with 1 vCPU and 3GB RAM. ntpd is running. The official VMware Tools package is installed (no open-vm tools). Kernel is GENERIC, no special sysctls or kern.hz configuration.



I can confirm the same: the problem only seems to occur on ESXi 5.0 and 5.0U1, and is easy to reproduce.


----------



## npl (May 14, 2012)

I can also confirm the problem occurs on FreeBSD 7.2 amd64.

The triggers seem to be heavy CPU & disk I/O, and/or creating snapshots under VMware.


----------



## joel@ (May 14, 2012)

Has anyone been able to trigger this on ESX 4? I'd like to be sure that the problem only affects ESXi 5.


----------



## xzkto (May 14, 2012)

npl said:
			
		

> I can also confirm the problem occurs on FreeBSD 7.2 amd64.
> 
> The triggers seem to be heavy CPU & disk I/O, and/or creating snapshots under VMware.



I have just tried to recreate that bug with a simple test: about 200 bash threads to load disk (simple dd, 100 with bs=1 that copied small files constantly, and 100 with default bs that copied big files) and about 200 bash threads to load CPU (simple infinite loops) that ran for a few hours - no success, clock still ticks. Does anyone have any reliable method to recreate this bug?


----------



## Zare (May 14, 2012)

I can confirm this doesn't happen in 4.0 / 4.1.


----------



## duncan2386 (May 14, 2012)

We are also experiencing this behaviour on ESXi 5 update 1 with FreeBSD 8.2-RELEASE.  This occurs with both SCHED_ULE and SCHED_4BSD, across several VMs on two different ESXi hosts.  The VMware VM logs are showing the same errors that xzkto [post 4] mentioned.

When we were still on ESXi 4.1 we never witnessed this behaviour.

Some VMs are running Squid/Dansguardian and others Postfix/MailScanner - both setups have exhibited this issue but it is more prevalent on those running Squid.


----------



## joel@ (May 15, 2012)

My suggestion would be for people with valid support contracts to contact VMware support and file a bug report.


----------



## frijsdijk (May 15, 2012)

Weird. I have never seen this issue! Running several 9.0 machines in ESXi 5.0. No specific configurations done to the OS.


----------



## ixdwhite (May 15, 2012)

For those of you experiencing the problem, please provide:


1. A description of the VMware host, including system manufacturer and model (if applicable), CPU model and count, HyperThreading/SMT state, physical RAM configuration, and a summary of the storage configuration;
2. Version of VMware ESXi installed, including the build number (e.g., 5.0.0 623860);
3. Summary of the VM configuration(s) that have experienced lockups, including CPU/core count, memory size, VM version, and OS type selected in the configuration;
4. If VMware Tools or open-vm-tools is installed and running in the VM experiencing lockups;
5. If vMotion is deployed at your site;
6. Any special tuning applied to the FreeBSD VMs having problems, which includes anything in loader.conf and sysctl.conf and any special kernel options if not running GENERIC.

I have the ability to run test cases but I need to know if my test hardware is representative of the systems having issues.

Thanks for any information you can provide.


----------



## bach (May 16, 2012)

Hi,

Fujitsu BX920 S2. 2 x Intel Xeon X5650. HyperThreading/SMT -  Active. 128G RAM. Storage configuration: LSI 1064E + NetApp v3240 NAS via NFSv4.
ESXi 5.0.0 515841
8 cpus (2 virtual sockets with 4 cores per socket). 32G RAM, VMVersion - 8. OS type selected - FreeBSD 64bit.
open-vm-tools-nox11-471268_1 is installed and running (we've experienced problems with and without open-vm-tools).
vMotion is deployed.


```
/boot/loader.conf:
zfs_load="YES"
vfs.root.mountfrom="zfs:zroot"
tmpfs_load="YES"
kern.hz="100"
vfs.zfs.txg.timeout="5"
vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="32"
vfs.zfs.vdev.cache.size="64m"
vfs.zfs.cache_flush_disable="1"
vfs.zfs.arc_max="1G"

/etc/sysctl.conf:
kern.ipc.somaxconn=1024
net.inet.ip.intr_queue_maxlen=1000
kern.maxvnodes=250000
kern.maxfiles=65536
kern.eventtimer.periodic=1

Kernel options:
options         VFS_AIO
options         ZERO_COPY_SOCKETS
options         DIRECTIO
```

Let me know, if you need more information.
Thanks.


----------



## xzkto (May 16, 2012)

ixdwhite said:
			
		

> For those of you experiencing the problem, please provide:
> ....



Hi,

I'm not sure about some of the things you asked, but here is some info:

Supermicro X8DTL, 2 x Intel Xeon E5620 2.4GHz (4 cores per socket), HyperThreading - Active (I don't know how to check SMT), 32 GB RAM, Storage configuration: local HDD 1 TB SATA-II 300 Western Digital RE3 WD1002FB with VMFS 5.54.
ESXi 5.0.0 469512
8 vCPUs (4 virtual sockets with 2 cores per socket). 20 GB RAM, VMVersion - 8. OS type selected - FreeBSD 64bit.
No.
No.
`cat /boot/loader.conf`

```
vmxnet_load="YES"
vmxnet3_load="YES"
```

`cat /etc/sysctl.conf`

```
net.inet.carp.allow=1
net.inet.carp.preempt=1
net.inet.carp.log=1
net.inet.carp.arpbalance=1
```

`diff GENERIC PF_CARP_CONFIG`

```
device carp
device pf
device pflog
device pfsync

options         ALTQ
options         ALTQ_CBQ
options         ALTQ_RED
options         ALTQ_RIO
options         ALTQ_HFSC
options         ALTQ_PRIQ
options         ALTQ_NOPCC
```

If you need any more information - feel free to ask.


----------



## xzkto (May 18, 2012)

Time has just stopped again on one of our virtual servers; other virtual servers with the same configuration are still working, even on the same physical server. This time (no pun intended) I decided to do some experiments instead of restarting.

I checked the timecounters in sysctl and found out that kern.timecounter.tc.HPET.counter is not changing anymore. So I changed kern.timecounter.hardware from HPET (default) to ACPI-safe, and time went on again.

I searched all logs that I could find (/var/log*, dmesg and more) and found nothing interesting, except that vmware-tools didn't really die (contrary to the VMware logs I posted before); it just went to sleep. When time started ticking again vmware-tools started working again, with nothing interesting in the debug log.

Now I have a virtual server with one frozen timer (HPET), other timers (i8254, ACPI-safe, TSC) are still working (most are disabled, but their counters are changing). Does anyone know if it is safe to do as I did? Can this lead to a more serious crash than one more time freeze? 

P.S. If anyone wants to run some tests on this frozen timer - post your suggestions here. I will try to do it but I will not do anything dangerous - it is a production server.


----------



## ixdwhite (May 18, 2012)

Changing the timecounter at runtime is safe. Good to know that it seems to be related to HPET. That probably means the problems people are having with other FreeBSD releases are likely a different problem since 8.x does not typically use HPET as a timecounter.

Odd that ACPI-safe was a choice for you. On my VMs it is showing ACPI-fast as the next preferred clock. Is your VM server particularly busy?

Here is the kern.timecounter.choice on my test VMs for this issue:

kern.timecounter.choice: TSC(-100) i8254(0) ACPI-fast(900) HPET(950) dummy(-1000000)

The kernel chooses the highest-scored choice at boot unless overridden.
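As a rough illustration of that selection rule, here is a minimal sh sketch that parses a `kern.timecounter.choice` string and prints the highest-scored entry, optionally skipping one name. The function name `best_timecounter` is made up for this example; it only mimics the scoring rule described above and is not how the kernel itself is implemented.

```sh
# best_timecounter CHOICES [EXCLUDE]  (hypothetical helper, not a system tool)
# CHOICES is the value of kern.timecounter.choice, for example:
#   "TSC(-100) i8254(0) ACPI-fast(900) HPET(950) dummy(-1000000)"
# Prints the name with the highest score, skipping EXCLUDE if given.
best_timecounter() {
    printf '%s\n' "$1" | tr ' ' '\n' | awk -v skip="${2:-}" '
        {
            p = index($0, "(")
            if (p == 0) next                       # not a NAME(SCORE) entry
            name = substr($0, 1, p - 1)
            score = substr($0, p + 1, length($0) - p - 1) + 0
            if (name == skip) next
            if (!found || score > best) { best = score; pick = name; found = 1 }
        }
        END { if (found) print pick }'
}
```

On an affected guest, something like `sysctl kern.timecounter.hardware="$(best_timecounter "$(sysctl -n kern.timecounter.choice)" HPET)"` would then select the best non-HPET counter.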

Are you (or anyone else experiencing the problem) using pmcstat(8) or similar tools at runtime? These use the HPET hardware as well and could be conflicting or triggering a bug somewhere.


----------



## xzkto (May 18, 2012)

ixdwhite said:
			
		

> That probably means the problems people are having with other FreeBSD releases are likely a different problem since 8.x does not typically use HPET as a timecounter.



I didn't understand that - we are using FreeBSD 8.2-RELEASE with minimal kernel tuning (CARP, PF, etc.) and HPET is default for all our virtual servers.

I have no idea why we have ACPI-safe instead of ACPI-fast - it seems to be default on our hardware, even new default FreeBSD GENERIC install on same server still uses same counters. Our servers are not usually busy but they experience huge spikes a few hours a day. Our counters are: TSC(-100) i8254(0) ACPI-safe(850) HPET(900) dummy(-1000000) if I remember correctly. Sorry, maybe I didn't get this either - I'm not really an administrator, just a programmer.

We are not using pmcstat(8) directly, but we are using Zabbix for monitoring and it looks like something that Zabbix may call to generate some of its statistics. I will check it on Monday, off for the weekend now. 

Thank you for the ideas, I will try to find all software that we are using and check if any of it does low-level access to timecounters but I really doubt it. 

By the way, is it possible that this bug is somehow related to CPU P or S states (ESXi seems to have the P-state option turned on by default)? The last time the timecounter stopped, our server was nearly idle (but maybe I missed some spike just before it did). I have no idea how it could be related; I just remember reading something like it.


----------



## eezzeee (May 18, 2012)

Hi,

We have been experiencing the same problem for a few days now. We upgraded to ESXi 5 just two months ago, and it was kind of odd that two machines suddenly behaved this way. Changing the timer to ACPI-safe worked for us as well. It is interesting to note that 60% of our machines still use VMware virtual Hardware Version 7 and those VMs (we use FreeBSD 7.4 and 8.1) all selected ACPI-safe as default. All VMs with Version 8 Hardware were set to HPET by default. The clock stopped on only two machines, and both were Hardware Version 8 VMs.


----------



## ixdwhite (May 19, 2012)

I spot-checked several 8.x machines on real hardware here and they are all using ACPI-fast with HPET as the second choice. 

It wouldn't surprise me if the ACPI timecounter stability test (which decides whether to use ACPI-fast or ACPI-safe) was confused by heavily loaded systems, thought the method was unstable, and dropped back to ACPI-safe instead. Since ACPI-safe has a low base score, HPET would win out in those instances.

The VMware ESXi 5 HPET doesn't seem to implement actual performance counters, so I suspect the emulation on ESXi 5 is not entirely stable and shifting to another timecounter is the appropriate workaround. Various places in FreeBSD already know whether they're running on VMware; the acpi_hpet driver might need a similar check to drop its score if it's running in that environment.
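Until the driver has such a check, the same idea can be approximated from userland. This is only a sketch under assumptions: the helper names are hypothetical, and it assumes that `kenv smbios.system.product` on a VMware guest returns a string beginning with "VMware" (typically "VMware Virtual Platform"), which is worth verifying on your own VMs.

```sh
# is_vmware_guest PRODUCT  (hypothetical helper)
# PRODUCT is an SMBIOS product string; VMware guests are assumed to
# report one that starts with "VMware".
is_vmware_guest() {
    case "$1" in
        VMware*) return 0 ;;
        *)       return 1 ;;
    esac
}

# timecounter_for_guest PRODUCT CURRENT FALLBACK  (hypothetical helper)
# Prints FALLBACK when the guest is on VMware and currently uses HPET,
# otherwise prints CURRENT unchanged.
timecounter_for_guest() {
    if is_vmware_guest "$1" && [ "$2" = "HPET" ]; then
        printf '%s\n' "$3"
    else
        printf '%s\n' "$2"
    fi
}
```

A boot-time script could call `timecounter_for_guest "$(kenv smbios.system.product)" "$(sysctl -n kern.timecounter.hardware)" ACPI-fast` and pass the result to `sysctl kern.timecounter.hardware=...`, leaving machines on real hardware untouched.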

I ran a buildworld with pmcstat enabled in one of the test VMs and it at least finished, though the counters all returned 0 for 'instructions' which should work on everything. (It did on real hardware.) 

If you have a VMware support contract it would be useful if you could open a bug on it just in case VMware is working on it or doesn't know there is a problem.


----------



## eezzeee (May 19, 2012)

We have opened a ticket for this and the issue has already been passed up the chain to engineering. From what I understand, VMware has been informed of this issue by other customers as well, and they are aware of this thread.


----------



## duncan2386 (May 22, 2012)

We have a support contract, so I raised a ticket. I had this reply today from VMware:



> I just wanted to get in touch with you to let you know that I've reviewed the logs and information you have provided. I've sent the details on to our Engineering team - it appears other customers are experiencing this issue and a case was only opened with Engineering last week regarding this issue. The same workaround you found (manually force the guest OS to use the ACPI-safe source) appears to be working for other customers as well.
> 
> We are in the process of drafting a KB article for this issue while Engineering work on a fix.



They have been very helpful and responsive throughout which is nice!


----------



## bach (May 23, 2012)

Good news. 
Thanks.


----------



## throAU (May 25, 2012)

joel@ said:
			
		

> FWIW, I have a large set of virtual machines running FreeBSD 8.2 amd64 on ESX 4.1 and I'm not seeing this.
> 
> Every VM is configured with 1 vCPU and 3GB RAM. ntpd is running. The official VMware Tools package is installed (no open-vm tools). Kernel is GENERIC, no special sysctls or kern.hz configuration.




I have a few VMs running under ESX 4.1 at the moment - FreeBSD 7.4 x86, FreeBSD 8.2 x64, and FreeBSD 8.1 x64.

None have experienced this issue, and my uptime is generally >180 days or so between reboots.  All kernels on them are GENERIC.

I am following this thread with interest, as I'm likely a month or so off upgrading to vSphere 5 here myself.


edit:
The 7.4 machine is running open-vm-tools, the 8.x machines are running VMware tools.


----------



## frijsdijk (Jun 10, 2012)

I just had the same issue.

ESXi 5.0
FreeBSD 9.0, 64bit, GENERIC kernel. open-vm-tools-471268_1 installed.


```
[root@srv03 /home/admin]# kldstat
Id Refs Address            Size     Name
 1   25 0xffffffff80200000 11cd9b0  kernel
 2    1 0xffffffff813ce000 203d70   zfs.ko
 3    2 0xffffffff815d2000 5c50     opensolaris.ko
 4    1 0xffffffff815d8000 a80      accf_data.ko
 5    1 0xffffffff815d9000 17d8     accf_http.ko
 6    1 0xffffffff81812000 159f     vmmemctl.ko
 7    1 0xffffffff81814000 c16e     ipfw.ko
 8    1 0xffffffff81821000 6dda     ipmi.ko
 9    1 0xffffffff81828000 889      smbus.ko

[root@srv03 /home/admin]# cat /boot/loader.conf
accf_http_load="YES"
accf_data_load="YES"
zfs_load="YES"

[root@srv03 /home/admin]# cat /etc/sysctl.conf
# $FreeBSD: release/9.0.0/etc/sysctl.conf 112200 2003-03-13 18:43:50Z mux $
#
#  This file is read when going to multi-user and its contents piped thru
#  ``sysctl'' to adjust kernel values.  ``man 5 sysctl.conf'' for details.
#

# Uncomment this to prevent users from seeing information about processes that
# are being run under another UID.
#security.bsd.see_other_uids=0
net.inet.ip.fw.dyn_buckets=65536
net.inet.ip.fw.dyn_max=65536
net.inet.ip.fw.dyn_ack_lifetime=120
vm.pmap.shpgperproc=1000
```

The (HPET) clock stopped ticking. I can log in, but the machine is not serving requests. ntpd takes 100% CPU.


```
PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
72566 root        1 102    0 22332K  2308K CPU2    2   4:53 100.00% ntpd
72396 www         1  21    0   297M 49968K select  0   0:02  0.98% httpd
```


```
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:12 CEST 2012
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:12 CEST 2012
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:12 CEST 2012
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:12 CEST 2012
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:12 CEST 2012
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:12 CEST 2012
[root@srv03 /home/admin]# date
```


```
[root@srv03 /home/admin]# sysctl kern.timecounter
kern.timecounter.tick: 1
kern.timecounter.choice: TSC(-100) i8254(0) ACPI-fast(900) HPET(950) dummy(-1000000)
kern.timecounter.hardware: HPET
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.HPET.counter: 1392653989
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.ACPI-fast.counter: 2995577
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 17227
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.TSC.mask: 4294967295
kern.timecounter.tc.TSC.counter: 1427630916
kern.timecounter.tc.TSC.frequency: 2266747000
kern.timecounter.tc.TSC.quality: -100
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 1
[root@srv03 /home/admin]# sysctl kern.timecounter
kern.timecounter.tick: 1
kern.timecounter.choice: TSC(-100) i8254(0) ACPI-fast(900) HPET(950) dummy(-1000000)
kern.timecounter.hardware: HPET
kern.timecounter.stepwarnings: 0
kern.timecounter.tc.HPET.mask: 4294967295
kern.timecounter.tc.HPET.counter: 1392653989
kern.timecounter.tc.HPET.frequency: 14318180
kern.timecounter.tc.HPET.quality: 950
kern.timecounter.tc.ACPI-fast.mask: 16777215
kern.timecounter.tc.ACPI-fast.counter: 8039395
kern.timecounter.tc.ACPI-fast.frequency: 3579545
kern.timecounter.tc.ACPI-fast.quality: 900
kern.timecounter.tc.i8254.mask: 65535
kern.timecounter.tc.i8254.counter: 60099
kern.timecounter.tc.i8254.frequency: 1193182
kern.timecounter.tc.i8254.quality: 0
kern.timecounter.tc.TSC.mask: 4294967295
kern.timecounter.tc.TSC.counter: 326655140
kern.timecounter.tc.TSC.frequency: 2266747000
kern.timecounter.tc.TSC.quality: -100
kern.timecounter.smp_tsc: 0
kern.timecounter.invariant_tsc: 1
```

So all HPET values are stuck.
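The comparison of the two dumps above can be automated. The following is a small sh sketch (the function name `stalled_counters` is invented for this example) that takes two files containing saved `sysctl kern.timecounter` output and prints every timecounter whose `.counter` value is identical in both.

```sh
# stalled_counters BEFORE_FILE AFTER_FILE  (hypothetical helper)
# Each file holds the output of "sysctl kern.timecounter".
# Prints the name of every timecounter whose counter did not change.
stalled_counters() {
    awk '
        # Match lines like "kern.timecounter.tc.HPET.counter: 1392653989".
        /^kern\.timecounter\.tc\..*\.counter: / {
            name = $1
            sub(/^kern\.timecounter\.tc\./, "", name)
            sub(/\.counter:$/, "", name)
            if (FNR == NR)
                first[name] = $2          # still reading BEFORE_FILE
            else if ((name in first) && first[name] == $2)
                print name                # same value in AFTER_FILE
        }' "$1" "$2"
}
```

For example: `sysctl kern.timecounter > /tmp/tc1; sleep 1; sysctl kern.timecounter > /tmp/tc2; stalled_counters /tmp/tc1 /tmp/tc2`. With the two dumps shown above it would report only HPET.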

When I do:


```
[root@srv03 /home/admin]# sysctl kern.timecounter.hardware=ACPI-fast
kern.timecounter.hardware: HPET -> ACPI-fast
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:16 CEST 2012
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:16 CEST 2012
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:16 CEST 2012
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:17 CEST 2012
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:17 CEST 2012
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:18 CEST 2012
[root@srv03 /home/admin]# date
Sun Jun 10 08:44:18 CEST 2012
```

... the clock starts ticking again. I resync the time with NTP, and all is normal (Nagios recovers as well).


----------



## joel@ (Jun 21, 2012)

Any news?


----------



## Riplakish (Jul 20, 2012)

Any update to this? I still have multiple VMs hanging after switching the clock to ACPI-fast...


----------



## xzkto (Jul 23, 2012)

Riplakish said:
			
		

> Any update to this? I still have multiple VMs hanging after switching the clock to ACPI-fast...



Strange. After we changed our timecounter to ACPI (we have ACPI-safe), we have had uptime close to two months without any problems. And our VMs never hung; only time stopped. Are you sure that your problem is with timecounters?


----------



## frijsdijk (Jul 24, 2012)

No problems here either anymore with ACPI-fast.

The most recent VM I deployed (AMD64, 9.0), selected "kern.timecounter.hardware: TSC" by itself.


----------



## bach (Jul 24, 2012)

I can confirm that ACPI-fast has solved this problem for us. We have a huge farm of FreeBSD VMs that has run without any hangs for a long time.


----------



## grahamb413 (Aug 11, 2012)

Setting:
_kern.timecounter.hardware: ACPI-fast_
also fixed this for me. However, after setting it and rebooting, the value goes back to what it was before (HPET). How do I make this a permanent fix?

ESXi 5.0.0 (Dell image)
FreeNAS 8.2.0 Release (Running on FreeBSD)


----------



## joel@ (Aug 12, 2012)

grahamb413 said:
			
		

> Setting:
> _kern.timecounter.hardware: ACPI-fast_
> also fixed this for me, however after setting and rebooting the value goes back to what it was before (HPET) how do I make this a permanent fix?


Add kern.timecounter.hardware=ACPI-fast to /etc/sysctl.conf.
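If you want to script that rather than edit the file by hand, here is a minimal sh sketch. The helper name `persist_sysctl` is made up for this example; it drops any earlier assignment of the key and appends the new one, so running it twice does not leave duplicate lines.

```sh
# persist_sysctl FILE KEY VALUE  (hypothetical helper)
# Ensures FILE contains exactly one "KEY=VALUE" line for KEY.
persist_sysctl() {
    file=$1; key=$2; value=$3
    tmp=$(mktemp) || return 1
    # Escape dots so "kern.hz" is matched literally, not as a regex wildcard.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if [ -f "$file" ]; then
        # Keep every line except earlier assignments of KEY.
        grep -v "^[[:space:]]*${key_re}[[:space:]]*=" "$file" > "$tmp" || true
    fi
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Usage, as root: `persist_sysctl /etc/sysctl.conf kern.timecounter.hardware ACPI-fast`. Other lines in the file, including comments, are left as they are.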


----------



## Bado (Aug 16, 2012)

FWIW, I have been experiencing this issue as well, only not with ESXi 5. I've been having the problem using VMware Fusion on my Mac. This has been happening with FreeBSD 9.0 and 9.1. The 9.1 virtual was a brand new instance that I just created, barely set up and configured yet. It ran for a few days, then the date froze. I never (that I recall) had the issue on Fusion with 8.x instances, but I haven't run any at home for a year or more.

I'm attempting the kern.timecounter.hardware=ACPI-fast now to see if the problem resolves.

Of note, I have FBSD 8.0, 8.1, 8.2, 9.0, and 9.1 instances running in ESXi 4 (or 4.1?) at the office and none of them has ever had this problem (in 1.5 years of running multiple virts here)


----------



## joel@ (Aug 17, 2012)

Bado said:
			
		

> Of note, I have FBSD 8.0, 8.1, 8.2, 9.0, and 9.1 instances running in ESXi 4 (or 4.1?) at the office and none of them has ever had this problem (in 1.5 years of running multiple virts here)


Same here. ESX 4.x works really well with FreeBSD 7-10.


----------



## glocke (Sep 5, 2012)

Any news about that KB article? A quick search revealed nothing (e.g. only the timekeeping pdf for ESX 4.0 and the timekeeping best practices for Linux guests).

edit:
Forgot to mention: I just had a FreeBSD 7.1 guest (yeah, kinda old...) with ACPI-safe as timecounter, open-vm-tools-nox11-148847 installed and timesync enabled, whose clock got delayed by several minutes today. It runs under ESXi 5.0.0, 768111. There is heavy IOPS load on the disk, probably also from other guests on the same SAN volume. ntpd is not running (btw, when did VMware change its opinion on running ntpd in a *NIX guest? Some years ago the recommendation was not to use ntpd but to sync via vmware-tools instead...)


----------



## joel@ (Sep 12, 2012)

Has anyone tried ESXi 5.1 with FreeBSD 9.0 yet? It's the first ESXi release to officially support FreeBSD 9.0.

I'd be really interested in hearing about any test results.


----------



## throAU (Sep 13, 2012)

joel@ said:
			
		

> Has anyone tried ESXi 5.1 with FreeBSD 9.0 yet? It's the first ESXi release to officially support FreeBSD 9.0.
> 
> I'd be really interested in hearing about any test results.



I'm building a test lab at the moment which will likely have ESXi 5.1 on it.

Hopefully get it done tomorrow / early next week - will spin up a VM.


----------



## joel@ (Sep 18, 2012)

Any updates?


----------



## throAU (Sep 19, 2012)

Sorry, test lab has been held up slightly, but I should be able to install this week and leave it running for a bit.

I've had the noob going through our build documentation to build the test lab, and we uncovered a few documentation problems which held that up.


edit:
Just confirmed, our lab was built with ESXi 5.0. I'll get it upgraded to 5.1 soon.


----------



## throAU (Sep 20, 2012)

Upgraded the test lab to 5.1 this morning, going to give FreeBSD 9 a go and see how things are.

Will install VMware tools, as I'm guessing most would run them in their VMs, and turn time sync (NTP and Tools) OFF.


This should replicate the problem, if it exists, yes?


edit:
Installed a VM (FreeBSD 9.0 release, latest VMware tools installed), will check it out in a couple of hours and see if the clock is still running...


----------



## Bado (Sep 21, 2012)

My home virts (the ones having problems) do not have tools installed, and are running with NTP turned on. I install a basic system from ISO, then just install ports here and there as I need them.  

At the office (no problems; FreeBSD 9 on ESXi 4.x), we do have vmware-guestd and vmware-kmod installed from ports.

So, I'd be interested in seeing any issues without tools installed, and NTP on.


----------



## Bado (Sep 21, 2012)

I stand corrected; my home virts also have the VMware tools installed. NTP is on, however.


----------



## throAU (Sep 21, 2012)

Just on the VMware tools - you really want them installed for the balloon driver to work, if nothing else.  Essentially the balloon driver forces the VM to page (as IT sees fit) when the host is under memory pressure (and the host reclaims that real memory), rather than the host swapping bits of the running VM out to disk that may actually be active (potentially causing massive performance issues and page thrashing on the host).


----------



## throAU (Sep 21, 2012)

So, it's been 24 hrs +

GENERIC kernel, FreeBSD 9.0 Release, ESXi 5.1 (downloaded on Tuesday), VMware tools installed, no NTP or VMware time sync turned on.


So far, my clock is still working fine.

Obviously the VM is under no load (vanilla install with no additional services running), but so far so good.

I'm out of the office as of this evening until Thursday next week, but hopefully I'll be able to confirm whether or not it is still ticking when I get back.


----------



## glocke (Sep 24, 2012)

I have a test environment running just now:
ESXi 5.1.0, 799733 with FreeBSD 9.0, official VMware Tools installed, no clock sync enabled, no ntpd running. A small script runs sysbench to simulate some IO:
	
	



```
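#!/bin/sh
# Cycle through the sysbench fileio modes to generate sustained disk I/O.

# Signal handler: report and stop the loop.
bail () {
        echo "caught signal"
        exit 1
}

# KILL cannot be trapped, so it is not listed here.
trap bail HUP INT QUIT ILL TRAP ABRT EMT FPE

while true; do
        for m in seqwr seqrewr seqrd rndrd rndwr rndrw; do
                sysbench --test=fileio prepare
                sysbench --test=fileio --file-test-mode="$m" run
                sysbench --test=fileio cleanup
                sleep 30
        done
done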
```
I will let it run for 24h and report back any issues with the clock.


----------



## scotia (Sep 25, 2012)

Hi all,
I've been running 8.2 on about 10 VMs on ESXi 5 (and then 5.1) for a while now with no issue.  Then I added the vmx3f (VMXNET 3) NIC to a VM, and a few days later (last night) the clock froze.  As discussed, this was fixed with ACPI-safe.
Not sure if it's just a coincidence with the addition of the NIC to the VM.  Just thought I'd posit the thought.
Regards
Scott


----------



## glocke (Sep 25, 2012)

sysbench has now been running continuously for more than 27 hours. I issued an ntpdate after that:


```
25 Sep 15:05:48 ntpdate[66479]: step time server 93.185.134.36 offset 2.512522 sec
```

Definitely not a big difference, though of course you still want to sync your clock in some way. I will try the same test on the cloned host in an ESXi 5.0 environment, but with a lower vHardware version; this one has vmx-09. FWIW, it has an em interface connected to it.


----------



## grahamb413 (Sep 26, 2012)

Unfortunately I upgraded to ESXi 5.1 running FreeBSD 8.2 and had this issue come back around 2 weeks after upgrading ESXi.
ACPI-safe resolved the issue again though.
I'll post more info on the setup, including the NIC types, when I get a free moment.


----------



## kattrap (Sep 29, 2012)

Patch came out yesterday. 
http://kb.vmware.com/kb/2032586

PR887134: Timer stops in FreeBSD 8.x and 9.x as virtual hardware HPET main counter register fails to update due to comparison failure between signed and unsigned integer values.


----------



## scotia (Oct 1, 2012)

Download the patch from: http://www.vmware.com/patchmgr/download.portal
In Search By Product, choose ESXi, 5.0.0, release date 09/27/2012, then Search.
567MB later...


----------



## throAU (Oct 2, 2012)

Have VMware confirmed whether or not this is fixed in 5.1?  I just did a search for the PR on their site to see whether or not there is an equivalent 5.1 patch but no joy.


----------



## joel@ (Oct 2, 2012)

throAU said:
			
		

> Have VMware confirmed whether or not this is fixed in 5.1?  I just did a search for the PR on their site to see whether or not there is an equivalent 5.1 patch but no joy.


I'd be interested in this as well.


----------



## joel@ (Oct 2, 2012)

I've just received confirmation from VMware support that the timer fix is included in 5.1.


----------



## xzkto (Oct 5, 2012)

grahamb413 said:
			
		

> Unfortunately I upgraded to ESXi 5.1 running Freebsd 8.2 and had this issue come back around 2 weeks after upgrading ESXi.
> ACPI-safe resolved the issue again though.
> I'll post more info on the setup including the NIC types when I get a free moment





			
joel@ said:
			
		

> I've just received confirmation from VMware support that the timer fix is included in 5.1.



Did they change the 5.1 upgrade, did I miss something, or does grahamb413 have a different issue?


----------



## joel@ (Oct 5, 2012)

Did you report it to VMware? Do you have a valid support contract?


----------



## grahamb413 (Oct 6, 2012)

xzkto said:
			
		

> Did they change that 5.1 upgrade, I missed something or grahamb413 has some different issue?



Apologies for the confusion.
It turns out that we were upgraded to a newer build and not to 5.1.
We are now on 5.1 and the problem has not yet come back.


----------



## spork (Oct 17, 2012)

Thanks to all for adding all the info to this thread, very much appreciated.

It took me way too long to figure out this was my issue today. For those running firewalls (at least pf), note that the first sign you have hit this bug might be that the box becomes unpingable. When the clock stops, pf has no idea it should be reaping old states out of the state table. One might think one was being DDoS'd or similar, as the state entries keep climbing and climbing and you see the same hosts over and over in pfctl state listings...


----------



## jpierri (Jul 18, 2019)

For those that still need to run VMware 5.0 or 5.1 (due to some compliance rule or something else), this issue still happens with 11.2-RELEASE-p11.

The fix works as it used to:

`sysctl kern.timecounter.hardware=ACPI-fast`
`date HHMM` (put in the current time, just to kick the clock back into action)

Remember to put kern.timecounter.hardware=ACPI-fast in /etc/sysctl.conf to keep it fixed after the next reboot.


----------

