
The story (with pictures)

For some time (years) I configured my systems to synchronize their clocks over the Internet using NTP. They'd talk to stratum two or stratum three servers and were stable to within a small fraction of a second.
When I connected a GPS receiver to my system to use its Pulse Per Second (PPS) signal, I also configured the system to use a GPS-based stratum one server, bigben.cac.washington.edu. These two sources stayed nicely in sync for quite some time. My system synced to within tens of microseconds of the GPS PPS, and to within a few hundred microseconds of bigben.
In late February these two sources appeared to diverge from one another, reaching a peak of nine milliseconds after a week.
Since I couldn't tell which clock was in error, I changed the configuration to call upon many more servers, but this wasn't much comfort. The third GPS-based clock was in close agreement with bigben. The dial-up (ACTS) based servers jumped around quite a bit, but their excursions were correlated with bigben as well. That suggested it was my clock which was in error, but strangely enough, one clock, based in the same city as the NIST time source, was nine milliseconds from the other servers. There was some hope that the cluster of servers would return to agreement with my system.
That was not to be. Instead of coming back into sync, the divergence started to go the other way. I started to wonder if there would be a regular eight-week-long cycle to this divergence.
As I started to look for the next cycle to start, things went badly awry. Although the discrepancy from bigben continued, my system's close synchronization with the GPS PPS was lost.
So I updated to the latest kernel and ntpd versions, and rebooted. (The kernel was several months old because I don't care to reboot without need.) This brought everything (but the one oddball server) back into sync.
FWIW, I've also noted how well my local servers maintain sync with my one GPS-synced system. I had hoped for sub-millisecond agreement for systems on the same LAN, but that wasn't to be. (Actually, they're not all on the same LAN. Some go through a switch at the cost of a 150 microsecond delay. Others go through an additional switch (and a router on campus) which introduces an additional 100 microseconds of delay.) I wonder whether NTP's design, which is meant to cope with large variations in delay, interferes with tighter synchronization when the variations are small in absolute terms but large relative to the delay itself.
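To put rough numbers on that last thought, here is a minimal Python sketch of the standard NTP on-wire calculation, which derives offset and round-trip delay from four timestamps. The calculation assumes the outbound and return paths take equal time, so any asymmetry shows up as an offset error of half the difference. The function names and the 200/50 microsecond split below are hypothetical, chosen only for illustration; they are not measurements from my network.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation (all times in seconds).

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, out_delay, back_delay, server_hold=0.0):
    """Hypothetical helper: build the four timestamps for one exchange.

    true_offset is (server clock - client clock); out_delay and back_delay
    are the one-way path delays, which NTP cannot observe separately.
    """
    t1 = 0.0
    t2 = t1 + out_delay + true_offset      # arrival, read on the server clock
    t3 = t2 + server_hold                  # reply leaves the server
    t4 = t3 - true_offset + back_delay     # arrival, read on the client clock
    return ntp_offset_and_delay(t1, t2, t3, t4)


if __name__ == "__main__":
    # Assumed figures: clocks actually agree, 250 us round trip in both cases.
    for out_us, back_us in ((125, 125), (200, 50)):
        offset, delay = simulate_exchange(0.0, out_us * 1e-6, back_us * 1e-6)
        print(f"out={out_us}us back={back_us}us "
              f"measured offset={offset * 1e6:.1f}us delay={delay * 1e6:.1f}us")
```

With a symmetric path the measured offset matches the true one. With the lopsided split the round-trip delay looks identical, yet the offset is wrong by half the asymmetry, and nothing in the four timestamps reveals it. Whether that is what limits my local servers is only a guess.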


last edited by Randolph Bentson, bentson atsymbol holmsjoen dotsymbol com on 2010-04-11T13:46:57-07:00