NTP & clock skew issues
# How to debug NTP issues
Occasionally NTP fails to set the correct time on a system. One symptom is a clock skew error message from the MON servers. The procedure is always the same. First, check that the croit management node has the correct time and is synchronized with external NTP servers. The following servers are currently used:
root@mgmt-node ~ $ docker exec -it croit cat /etc/ntp.conf | grep ^pool
pool 0.debian.pool.ntp.org iburst
pool 1.debian.pool.ntp.org iburst
pool 2.debian.pool.ntp.org iburst
pool 3.debian.pool.ntp.org iburst
To correct the time, connect via SSH to the management server itself, not to the Docker container. Then check whether the system clock is currently synchronized:
root@mgmt-node ~ $ timedatectl status
...
System clock synchronized: no
...
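If this check needs to run from a script, the relevant line of the `timedatectl status` output can be tested directly. The following is a minimal sketch; the function name `clock_is_synced` is hypothetical, and it assumes the `System clock synchronized:` wording shown above (other systemd versions may label this line differently).

```shell
#!/bin/sh
# Hypothetical helper: reads `timedatectl status` output on stdin and
# returns 0 only if it contains the line "System clock synchronized: yes".
clock_is_synced() {
    grep -Eq '^[[:space:]]*System clock synchronized:[[:space:]]*yes[[:space:]]*$'
}

# Usage on the management node (outside the container):
#   if timedatectl status | clock_is_synced; then
#       echo "clock is synchronized"
#   else
#       echo "clock is NOT synchronized"
#   fi
```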
If the system time is not synchronized, the update of the Ceph storage servers fails. In that case, trigger a manual NTP time update. First stop all running NTP services on the management server, both inside and outside the container:
root@mgmt-node ~ $ timedatectl set-ntp off
root@mgmt-node ~ $ docker exec -it croit pkill -9 ntpd
Verify that no other NTP service is still running. Depending on the operating system, a different NTP implementation may be installed, in which case different commands apply.
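One way to verify this is to filter the process list for common time daemons. The sketch below is an assumption, not part of the croit tooling: the helper name `time_daemons` is hypothetical, and the list of daemon names (`ntpd`, `chronyd`, `systemd-timesyncd`) covers only the usual candidates.

```shell
#!/bin/sh
# Hypothetical helper: reads one process name per line on stdin and prints
# those that look like a time synchronization daemon. Returns 0 if any was
# found, 1 otherwise. `ps -o comm=` truncates names to 15 characters, so
# systemd-timesyncd appears as "systemd-timesyn".
time_daemons() {
    grep -E '^(ntpd|chronyd|systemd-timesyn.*)$'
}

# Usage on the management node:
#   if ps -eo comm= | time_daemons; then
#       echo "a time daemon is still running"
#   fi
```

An empty result with exit status 1 means none of the listed daemons is running; any other implementation has to be checked by hand.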
To manually update the system time over the network, execute the following command:
root@mgmt-node ~ $ ntpd -gq
14 Sep 11:25:38 ntpd: ntpd <version> (1): Starting
...
14 Sep 11:34:48 ntpd: ntpd: time set +542.994624 s
ntpd: time set +542.994624s
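The last line of the output states how large the correction was. ntpd reports either `time set` (the clock was stepped, as in the output above) or `time slew` (a small offset that is corrected gradually). To record the applied offset in a script, the value can be extracted as in this minimal sketch; the function name `last_ntpd_offset` is hypothetical, and the pattern assumes the output format shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads `ntpd -gq` output on stdin and prints the last
# reported offset in seconds, including its sign. Prints nothing if no
# "time set" or "time slew" line is present.
last_ntpd_offset() {
    sed -nE 's/.*time (set|slew) ([+-][0-9]+\.[0-9]+) ?s[[:space:]]*$/\2/p' | tail -n 1
}

# Usage on the management node:
#   offset=$(ntpd -gq 2>&1 | last_ntpd_offset)
#   echo "clock corrected by ${offset} seconds"
```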
Restart the Docker container to return both NTP and the container to a clean state:
root@mgmt-node ~ $ docker restart croit
The MON servers should now correct their time within a few minutes, and the clock skew warning should disappear. To speed this up, you can force an NTP update on the affected MON server. From the management node, run the following against that host:
root@mgmt-node ~ $ docker exec -it croit ssh <mon-host> "systemctl stop ntp.service; sleep .1; ntpd -gq; systemctl start ntp"
14 Sep 09:46:04 ntpd: ntpd <version> (1): Starting
...
14 Sep 09:46:11 ntpd: ntpd: time slew -0.001482 s
ntpd: time slew -0.001482s
The error message should now disappear quickly.
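To confirm that the warning is gone, the cluster health output can be searched for the clock skew health check. This is a minimal sketch: the function name `has_clock_skew` is hypothetical, and it assumes the `ceph` CLI is available with sufficient credentials on the host where it runs.

```shell
#!/bin/sh
# Hypothetical helper: reads `ceph health detail` output on stdin and
# returns 0 if a clock skew warning is present, 1 otherwise.
has_clock_skew() {
    grep -Eiq 'MON_CLOCK_SKEW|clock skew detected'
}

# Usage on a host with the ceph CLI and an admin keyring:
#   if ceph health detail | has_clock_skew; then
#       echo "clock skew warning still active"
#   else
#       echo "no clock skew reported"
#   fi
```

The MONs re-check their clocks periodically, so the warning can persist for a short while after the time has been corrected.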