NTP & clock skew issues

# How to debug NTP issues

Occasionally, NTP fails to set the correct time on a system. This typically shows up as a clock skew warning from the Ceph MON servers (health check `MON_CLOCK_SKEW`). The procedure to fix it is always the same. First, check that the croit management node has the correct time and is synchronized with external NTP servers. Currently the following servers are configured:

```
root@mgmt-node ~ $ docker exec croit grep '^pool' /etc/ntp.conf
pool 0.debian.pool.ntp.org iburst
pool 1.debian.pool.ntp.org iburst
pool 2.debian.pool.ntp.org iburst
pool 3.debian.pool.ntp.org iburst
```
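
The pool list only shows what is configured. To see whether ntpd on the management node has actually selected a server to synchronize with, `ntpq -p` marks the selected peer with `*` in its first column. The following is a minimal sketch with a hypothetical helper name, `ntp_sys_peer`, assuming `ntpq` is installed in the croit container next to `ntpd`:

```shell
# Hypothetical helper (not part of croit): reads `ntpq -p` output on stdin.
# Prints the address of the selected system peer (the line marked with '*')
# and succeeds; fails without output if no peer has been selected yet.
ntp_sys_peer() {
  awk '/^\*/ { print substr($1, 2); found = 1 } END { exit !found }'
}
```

Usage: `docker exec croit ntpq -pn | ntp_sys_peer || echo "no NTP peer selected yet"`. The `-n` flag shows numeric addresses, which avoids DNS lookups and truncated host names.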

To check the time, connect to the management server itself via SSH (not to the croit Docker container) and check whether the system clock is synchronized:

```
root@mgmt-node ~ $ timedatectl status
System clock synchronized: no
```
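
When checking several hosts, it can help to script this check. A minimal sketch with the hypothetical helper name `clock_synced` that parses the `timedatectl status` output; older systemd releases label this line `NTP synchronized:`, so both forms are accepted:

```shell
# Hypothetical helper: reads `timedatectl status` output on stdin and
# succeeds only if systemd reports the system clock as synchronized.
# Accepts both the current label and the older "NTP synchronized:" label.
clock_synced() {
  grep -Eq '^[[:space:]]*(System clock|NTP) synchronized:[[:space:]]*yes[[:space:]]*$'
}
```

Usage: `timedatectl status | clock_synced || echo "system clock is not synchronized"`.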

If the system clock is not synchronized, updating the time on the Ceph storage servers fails as well. In this case, trigger a manual NTP update. First, stop all NTP services on the server, both on the host and inside the croit container:

```
root@mgmt-node ~ $ timedatectl set-ntp off
root@mgmt-node ~ $ docker exec -it croit pkill -9 ntpd
```

Double-check that no other NTP service is still running. Depending on the operating system, a different NTP implementation (for example chrony or systemd-timesyncd) may be in use, in which case different commands are needed to stop it.
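
Because processes inside containers are visible in the host's process list, one way to double-check is to scan `ps -eo comm=` on the host for the daemon names of common NTP implementations. A minimal sketch with the hypothetical helper `find_ntp_daemons`; the list of names is an assumption and may need extending for other implementations:

```shell
# Hypothetical helper: reads process names (one per line, e.g. from
# `ps -eo comm=`) on stdin and prints those of common NTP daemons.
# Succeeds if at least one was found. The kernel truncates command names
# to 15 characters, so systemd-timesyncd usually shows up as "systemd-timesyn".
find_ntp_daemons() {
  grep -Ex 'ntpd|chronyd|systemd-timesyn(cd)?'
}
```

Usage: `ps -eo comm= | sort -u | find_ntp_daemons && echo "NTP daemon(s) still running"`.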

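To set the system time once from the configured NTP servers, run `ntpd -gq`. The `-g` option allows ntpd to correct an offset larger than its panic threshold (1000 s by default), and `-q` makes it exit as soon as the clock has been set: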

```
root@mgmt-node ~ $ ntpd -gq
14 Sep 11:25:38 ntpd[2592029]: ntpd 4.2.8p12@1.3728-o (1): Starting
14 Sep 11:34:48 ntpd[2592029]: ntpd: time set +542.994624 s
ntpd: time set +542.994624s
```
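
The last line of the output reports the correction: `time set` means ntpd stepped the clock, while `time slew` means it adjusted the clock gradually because the offset was below the step threshold (128 ms by default). To record how far off the clock was, a minimal sketch with the hypothetical helper `ntpd_offset` extracts both values:

```shell
# Hypothetical helper: reads `ntpd -gq` output on stdin and prints the
# action ("set" or "slew") and the offset in seconds taken from the last
# "ntpd: time set|slew <offset> s" line. Fails if no such line exists.
ntpd_offset() {
  sed -nE 's/.*ntpd: time (set|slew) ([+-]?[0-9.]+) ?s[[:space:]]*$/\1 \2/p' |
    tail -n 1 | grep .
}
```

Usage: `ntpd -gq 2>&1 | ntpd_offset`. For the run shown above, this would print `set +542.994624`.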

Restart the croit container to bring its NTP service and the container itself back to a clean state:

```
root@mgmt-node ~ $ docker restart croit
```
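
After the restart, the container's ntpd needs some time before it selects a server again. Instead of checking by hand, a small polling helper can wait for that. This is a sketch in POSIX sh with the hypothetical name `wait_until`:

```shell
# Hypothetical helper: re-runs a check command until it succeeds or the
# attempts are used up, sleeping DELAY seconds between attempts.
# Usage: wait_until ATTEMPTS DELAY_SECONDS CMD [ARGS...]
wait_until() {
  _wu_attempts=$1
  _wu_delay=$2
  shift 2
  while [ "$_wu_attempts" -gt 0 ]; do
    if "$@"; then return 0; fi
    _wu_attempts=$((_wu_attempts - 1))
    # Only sleep if another attempt follows.
    if [ "$_wu_attempts" -gt 0 ]; then sleep "$_wu_delay"; fi
  done
  return 1
}
```

Usage: `wait_until 30 10 sh -c "docker exec croit ntpq -pn | grep -q '^\*'"` waits up to about five minutes for a peer marked with `*`, assuming `ntpq` is available in the container.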

The MON servers should now pick up the corrected time within a few minutes, and the clock skew warning should disappear. To speed this up, force an NTP update on each affected MON host. The following command, run on the management node, connects to the MON host through the croit container:

```
root@mgmt-node ~ $ docker exec -it croit ssh <mon-host> "systemctl stop ntp.service; sleep .1; ntpd -gq; systemctl start ntp"
14 Sep 09:46:04 ntpd[3839354]: ntpd 4.2.8p12@1.3728-o (1): Starting
14 Sep 09:46:11 ntpd[3839354]: ntpd: time slew -0.001482 s
ntpd: time slew -0.001482s
```
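
Ceph raises the clock skew warning when the clock offset between MONs exceeds `mon_clock_drift_allowed`, which defaults to 0.05 s. A small sketch with the hypothetical helper `exceeds_drift_allowed` compares an offset, such as the one printed by `ntpd -gq`, against that limit:

```shell
# Hypothetical helper: succeeds if |OFFSET| (seconds) is larger than LIMIT
# (seconds; defaults to 0.05, Ceph's default mon_clock_drift_allowed).
# Usage: exceeds_drift_allowed OFFSET [LIMIT]
exceeds_drift_allowed() {
  awk -v off="$1" -v lim="${2:-0.05}" 'BEGIN {
    o = off + 0; l = lim + 0   # force numeric interpretation
    if (o < 0) o = -o
    exit !(o > l)
  }'
}
```

Usage: `exceeds_drift_allowed -0.001482 || echo "within Ceph's default tolerance"`. The 1.5 ms slew in the output above is well below 50 ms. If the cluster uses a different `mon_clock_drift_allowed`, pass that value as the second argument.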

The clock skew warning should now disappear shortly.