# 7 second delay executing chronyc in 12.2-latest



## aoeu (Feb 14, 2021)

Hello FreeBSD Forums!  First time poster, please be gentle 

I updated my physical server to the latest last night via the usual `freebsd-update fetch install` and `pkg update; pkg upgrade` dance (I have no ports installed, just a few packages, including chrony and telegraf).  All seemed to be well until I realised that telegraf was no longer recording my time stats!  Not the end of the world, but I thought I'd investigate.

```
15:18:14 nick@filer:/home/nick/ uname -a                          
FreeBSD filer.int.aoeu.uk 12.2-RELEASE-p3 FreeBSD 12.2-RELEASE-p3 GENERIC  amd64
15:18:19 [B]nick@filer:/home/nick/[/B] pkg info | grep "telegraf\|chrony"
chrony-4.0                     System clock synchronization client and server
telegraf-1.17.0                Time-series data collection
```
Here's what telegraf has to say:

```
15:15:16 root@filer:/root/ tail -n 2 /var/log/telegraf/telegraf.log 
2021-02-14T15:14:50Z E! [inputs.chrony] Error in plugin: failed to run command /usr/local/bin/chronyc -n tracking: Command timed out. - 506 Cannot talk to daemon
2021-02-14T15:15:05Z E! [inputs.chrony] Error in plugin: failed to run command /usr/local/bin/chronyc -n tracking: Command timed out. - 506 Cannot talk to daemon
```
Interesting!  Well we updated to chrony-4.0 from 3.<something> yesterday, maybe the tracking command syntax has changed?

```
15:10:28 nick@filer:/home/nick/ sudo /usr/local/bin/chronyc -n tracking
Password:
Reference ID    : D8EF2304 (216.239.35.4)
Stratum         : 2
Ref time (UTC)  : Sun Feb 14 15:08:35 2021
System time     : 0.000031922 seconds slow of NTP time
Last offset     : +0.000048788 seconds
RMS offset      : 0.000102199 seconds
Frequency       : 24.948 ppm slow
Residual freq   : +0.002 ppm
Skew            : 0.027 ppm
Root delay      : 0.006732919 seconds
Root dispersion : 0.000752528 seconds
Update interval : 1044.4 seconds
Leap status     : Normal
```
Looks normal enough to me.  Ok, so maybe there's a file permission issue:

```
15:38:38 nick@filer:/home/nick/ sudo ps aux | grep telegraf | grep -v grep
root      6384   0.0  0.0   10844  2292  -  Ss   01:37      0:00.09 daemon: /usr/local/bin/telegraf[6385] (daemon)
telegraf  6385   0.0  0.2 4991656 67228  -  S    01:37      1:30.37 /usr/local/bin/telegraf --quiet --config=/usr/local/etc/telegraf.conf
```
^ so there's a core daemon running as root, and a forked child that collects data running as telegraf.  Let's check the chrony executable:

```
15:18:20 [B]nick@filer:/home/nick/[/B] ls -lah /usr/local/bin/chronyc 
-rwxr-xr-x  1 root  wheel    91K Feb  6 12:44 /usr/local/bin/chronyc
```
Nope, anyone should be able to run it.  Let's try running as telegraf:

```
15:21:21 [B]nick@filer:/home/nick/[/B] sudo -u telegraf /usr/local/bin/chronyc -n tracking | head -n 1
Reference ID    : D8EF2304 (216.239.35.4)
```
Good output, but it took a very long time to run!  Let's compare with various accounts:

```
15:21:32 [B]nick@filer:/home/nick/[/B] time sudo /usr/local/bin/chronyc -n tracking | head -n 1
Reference ID    : D8EF2304 (216.239.35.4)
sudo /usr/local/bin/chronyc -n tracking  0.00s user 0.01s system 93% cpu 0.006 total
head -n 1  0.00s user 0.00s system 13% cpu 0.005 total
```
^ root is fast

```
15:22:21 [B]nick@filer:/home/nick/[/B] time sudo -u telegraf /usr/local/bin/chronyc -n tracking | head -n 1
Reference ID    : D8EF2304 (216.239.35.4)
sudo -u telegraf /usr/local/bin/chronyc -n tracking  0.01s user 0.00s system 0% cpu 7.127 total
head -n 1  0.00s user 0.00s system 0% cpu 7.126 total
```
^ WTF?  7 seconds!

Ok, so maybe there's a problem with sudo - what with the recent vulnerability, maybe there's been some kind of regression?  Let's try as me, without sudo:

```
15:23:10 [B]nick@filer:/home/nick/[/B] time /usr/local/bin/chronyc -n tracking | head -n 1 
Reference ID    : D8EF2304 (216.239.35.4)
/usr/local/bin/chronyc -n tracking  0.00s user 0.00s system 0% cpu 7.076 total
head -n 1  0.00s user 0.00s system 0% cpu 7.076 total
```
^ Still slow, now with no sudo!

Ok, so I think we can rule sudo out.  I ran truss against chronyc, but don't see any output - I guess the client doesn't need to make much in the way of system calls if it's just talking to the local server.  Could be some sort of hostname lookup that's failing somewhere?  tcpdump shows no port 53 traffic at all though.

Is the problem wider than just chronyc?  Seems not, at least for ping:

```
15:28:43 [B]nick@filer:/home/nick/[/B] time sudo -u telegraf /sbin/ping -c 1 1.1.1.1
Password:
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: icmp_seq=0 ttl=55 time=1.696 ms
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.696/1.696/1.696/0.000 ms
sudo -u telegraf /sbin/ping -c 1 1.1.1.1  0.00s user 0.01s system 0% cpu 9.008 total
```
I did hit the issue a few people have mentioned with an account being claimed as going missing during the pkg upgrade process (in my case _tss), but `pwd_mkdb -p /etc/master.passwd` seemed to fix that.  I've not messed with /etc/master.passwd before, maybe that has broken something in pam?  My system is very vanilla, nothing will have been edited in pam, and the only sudo config I have in place is to allow %wheel to run as root with password auth.  I've got a completely untouched /etc/nsswitch.conf in case that is likely to be suspect.

I'm pretty stuck at this point.  Any advice on how to delve deeper into this would be much appreciated!


----------



## aoeu (Feb 14, 2021)

Oops, my ping test was a bit unclear there - typing in my password took a while, it seems   Here's the proper output:

```
15:55:21 nick@filer:/home/nick/ time sudo -u telegraf /sbin/ping -c 1 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: icmp_seq=0 ttl=55 time=1.725 ms
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 1.725/1.725/1.725/0.000 ms
sudo -u telegraf /sbin/ping -c 1 1.1.1.1  0.00s user 0.00s system 66% cpu 0.007 total
```


----------



## SirDice (Feb 14, 2021)

Is chronyd(8) started and running? `sysrc chronyd_enable="YES"`


----------



## aoeu (Feb 14, 2021)

Hi SirDice,

Yes, and it's functioning fine as far as I can tell:

16:23:20 *nick@filer:/home/nick/* grep chrony /etc/rc.conf
chronyd_enable="YES"

16:24:14 *nick@filer:/home/nick/* chronyc -n sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^+ 216.239.35.0                  1  10   377   365   +940us[ +940us] +/- 3552us
^* 216.239.35.4                  1  10   377   413   +553us[ +533us] +/- 3857us
^+ 216.239.35.8                  1  10   377   325   +557us[ +557us] +/- 3930us
^+ 216.239.35.12                 1   9   377   333   +347us[ +347us] +/- 5289us
^+ 2001:4860:4806::              1  10   377   404   -352us[ -352us] +/- 4664us
^+ 2001:4860:4806:4::            1  10   377   367   -213us[ -213us] +/- 4514us
^+ 2001:4860:4806:8::            1  10   377    11  -1041us[-1041us] +/- 5365us
^+ 2001:4860:4806:c::            1  10   377   785   -282us[ -303us] +/- 5584us

The only problem is the 7 second delay before producing any output (including with 'sources', 'sourcestats' or - importantly for telegraf, 'tracking').


----------



## aoeu (Feb 14, 2021)

Also, my other machines, for whom my filer is the timeserver, are happy with it as a time source.  Here's a Debian VM for example:

16:52:45 *nick@stage:/home/nick/* sudo chronyc sources    
210 Number of sources = 1
MS Name/IP address         Stratum Poll Reach LastRx Last sample               
===============================================================================
^* filer.int.aoeu.uk             2  10   377   285    -43us[  +63us] +/- 4794us

This is a very odd problem!


----------



## BjarneB (Feb 14, 2021)

I can confim I see the same issue on several servers, like on this one:

dataman$ time chronyc -n tracking| head -1
Reference ID    : 550AF0FD (85.10.240.253)
7.15 real 0.00 user 0.00 sys

dataman$ sudo time chronyc -n tracking| head -1
Reference ID : 550AF0FD (85.10.240.253)
0.00 real 0.00 user 0.00 sys

All Freebsd 12.2

Nothing obvious and timekeeping is good.
Wierd!
Investigating further....


----------



## BjarneB (Feb 14, 2021)

ok, some debugging.
`truss chronyc -n tracking` provides among other things:


```
socket(PF_LOCAL,SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK,0) = 3 (0x3)
unlink("/var/run/chrony/chronyc.67106.sock")     ERR#13 'Permission denied'
bind(3,{ AF_UNIX "/var/run/chrony/chronyc.67106.sock" },106) ERR#13 'Permission denied'
getsockname(3,{ AF_UNIX "" },0x7fffffffe8bc) = 0 (0x0)
unlink("") ERR#2 'No such file or directory'
close(3) = 0 (0x0)
socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK,0) = 3 (0x3)
connect(3,{ AF_INET 127.0.0.1:323 },16) = 0 (0x0)
gettimeofday({ 1613327669.087299 },0x0) = 0 (0x0)
getrandom("\M--W\^SeAX\M-n\^\\M-~:2\M-eiwi"...,40,0) = 40 (0x28)
mmap(0x0,1104,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34374377472 (0x800df6000)
minherit(0x800df6000,1104,INHERIT_ZERO) = 0 (0x0)
sendto(3,"\^F\^A\0\0\0!\0\0\M-v[2<\0\0\0\0"...,104,0,NULL,0) = 104 (0x68)
select(4,{ 3 },0x0,0x0,{ 1.000000 }) = 0 (0x0)
gettimeofday({ 1613327670.127509 },0x0) = 0 (0x0)
sendto(3,"\^F\^A\0\0\0!\0\^AZ2\M-d\^T\0\0"...,104,0,NULL,0) = 104 (0x68)
```

First chronyc tries to create a socket in /var/run/chrony  - permission denied
then it creates a connection to localhost 127.0.0.1  - good
next it tries to send data via the newly created socket #3, three times   -  this is where the time is spend, about 2secs pr send.

When running as root, the connection is made via socket /var/run/chronyd.sock. 

And just for fun, when running:
`chronyc -6 -n tracking`
there is no delay.

The relevant socket command is:

```
socket(PF_INET6,SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK,0) = 3 (0x3)
setsockopt(3,IPPROTO_IPV6,IPV6_V6ONLY,0x7fffffffe8c0,4) = 0 (0x0)
connect(3,{ AF_INET6 [::1]:323 },28) = 0 (0x0)
gettimeofday({ 1613329516.446837 },0x0) = 0 (0x0)
getrandom("\M-O&\M-V\M-K\M^[&Y\M^J3\^]#\M-d"...,40,0) = 40 (0x28)
mmap(0x0,1104,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34381029376 (0x80144e000)
minherit(0x80144e000,1104,INHERIT_ZERO) = 0 (0x0)
sendto(3,"\^F\^A\0\0\0!\0\0\M^W\M-;E\240\0"...,104,0,NULL,0) = 104 (0x68)
```


By the way, I am using 

```
$ chronyc --version
chronyc (chrony) version 4.0 (+READLINE +SECHASH +IPV6 -DEBUG)
```


----------



## aoeu (Feb 14, 2021)

Mm, I can reproduce this with a plain VM.  Steps:

-Install VM from 12.0-Release ISO, 2 vCPU, 2GB RAM, 16GB disk
-Choose default option for everything in installer
-Upgrade VM to 12.2 via freebsd-update -r
-Run pkg update; pkg upgrade, hit issue with _tss account reporting missing
-Run pwd_mkdb -p /etc/master.passwd
-Re-run pkg update; pkg upgrade
-Install chronyc with pkg
-Configure chrony.conf with an upstream server
-Enable chronyd in rc.conf, start server
-Log in as non-root account, run 'chronyc sources' or any other chronyc command, observe 7 second delay before output starts


----------



## aoeu (Feb 14, 2021)

Reproduced on 12.2-RELEASE iso too (x64 to be clear)

-Install VM from 12.2-Release ISO
-Choose default option for everything in installer (other than setting timezone to UK, which I did in the previous test too - I should have mentioned)
-Update to latest with freebsd-update fetch install; reboot
-pkg update
-pkg upgrade (note lack of issue with _tss account with this method)
-pkg install chrony
-echo 'chronyd_enable="YES"' >> /etc/rc.conf
-vi /usr/local/etc/chrony.conf, set appropriate upstream server (with iburst, in case that matters)
-service chronyd start
-log in as non-root user, run chronyc command, observe 7-second delay

I guess this is a bug in chronyc then?  How can we dig further?


----------



## SirDice (Feb 14, 2021)

BjarneB said:


> First chronyc tries to create a socket in /var/run/chrony - permission denied


This looks relevant: PR 245399.

It there a pid file reference in /usr/local/etc/chrony.conf? The directory should probably be /var/run/chrony with the right ownership on that directory.


----------



## aoeu (Feb 14, 2021)

BjarneB, same re: ipv6.  If I invoke chronyc with -6, I get instant responses.  Interestingly, if I use -4, I get actual failures (after the usual delay):

20:58:26 *nick@filer:/home/nick/* time chronyc -4 -n sources
506 Cannot talk to daemon
chronyc -4 -n sources  0.00s user 0.00s system 0% cpu 7.158 total

Apologies for the disjointed responses, my account is new so my posts are being manually vetted, which takes time.

Would you mind explaining how you got your truss output?  I tried the following, to no avail:

20:59:21 *nick@filer:/home/nick/* truss chronyc -4 -n sources
506 Cannot talk to daemon
truss: Unable to enable LWP events for pid 69150: No such process

Do I need to have root run truss and attach to a newly-started chronyc process running as the test user?  Something like (as root) "truss sudo -u <whoever> chronyc <whatever>" gives me what seems to be sudo's calls?  Apologies, I am new at this 


SirDice Yes, there is a (commented-out) default, but it doesn't reflect reality; my pid file is in /var/run/chrony as you suggest:

20:53:17 *root@filer:/root/* grep -B5 pid /usr/local/etc/chrony.conf
# chronyd writes its process ID to a file.  If you try to start a second
# copy of chronyd, it will detect that the process named in the file is
# still running and bail out.  If you want to change the path to the PID
# file, uncomment this line and edit it.  The default path is shown.

! pidfile /var/run/chronyd.pid
20:53:32 *root@filer:/root/* ls -lah /var/run/chrony                
total 10
drwxr-x---  2 chronyd  chronyd     4B Feb 14 20:44 .
drwxr-xr-x  9 root     wheel      29B Feb 14 03:42 ..
-rw-r--r--  1 root     chronyd     6B Feb 14 20:44 chronyd.pid
srwxr-xr-x  1 chronyd  chronyd     0B Feb 14 20:44 chronyd.sock

I guess the fix is going to be something involving the /var/run/chrony directory (which can't currently be traversed by anyone other than chronyd or root) and the sock file - I presume that chronyc would need write access to that, right?


----------

