Time Drift Monitoring: Troubles of Unsynchronized Servers

A distributed IT infrastructure has many servers operating across multiple regions. Keeping a single, consistent time across those servers ensures that they agree on when events happen, whether they communicate only with each other or are exposed to the internet.

When Microsoft encountered this problem back in 2017, the servers were offset by almost an hour, affecting a number of Windows machines. Reddit's case was different: a "leap second" glitch sent its Java processes into busy loops, which consumed the available CPU and took the site down for a while.

Maintaining consistent and accurate time across servers, and even network devices, is crucial. Otherwise, businesses could run into any of the problems below.

1. Active Directory (AD) replication conflicts: When AD replication takes place, information is updated consistently across all domain controllers (DCs). If different DCs operate with a mismatched time of even a few seconds, concurrent operations can lead to replication conflicts.

2. Authentication failure: If the clock on the server sending an authentication request differs from the clock on the server receiving it, the request can be rejected or time out. Out-of-sync clocks also increase the risk of replay attacks, in which the same authentication is reused multiple times. This is why Kerberos authentication in an AD environment relies on a time stamp.

3. Incorrect logs for troubleshooting: When an issue occurs, businesses rely on logs to identify the root cause of the problem. Having log entries in multiple time zones makes the troubleshooting process tedious, and identifying the exact time the issue occurred can also be difficult.

4. Inconsistent reports: Subtle differences in time lead to inconsistency in data reporting. Maintaining consistent time across servers prevents skewed data in reports, helping IT teams make decisions that won't adversely impact infrastructure operations.
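To make the authentication problem in item 2 concrete, here is a minimal Python sketch of a clock-skew check of the kind a time-stamped protocol performs. The function name `within_skew` is hypothetical, and the five-minute tolerance is an assumption based on the commonly cited Kerberos default; your domain policy may differ.

```python
from datetime import datetime, timedelta, timezone

# Assumption: five minutes is the commonly cited default tolerance for
# Kerberos clock skew. Check your own domain policy for the real value.
MAX_CLOCK_SKEW = timedelta(minutes=5)


def within_skew(client_time, server_time, max_skew=MAX_CLOCK_SKEW):
    """Return True if two timezone-aware timestamps differ by at most max_skew."""
    return abs(server_time - client_time) <= max_skew


server_now = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# A client whose clock is two minutes behind is still accepted.
print(within_skew(server_now - timedelta(minutes=2), server_now))  # True

# A client whose clock is seven minutes ahead would be rejected.
print(within_skew(server_now + timedelta(minutes=7), server_now))  # False
```

Comparing timezone-aware values in UTC, as the sketch does, also sidesteps the mixed-time-zone confusion described in item 3.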

How to synchronize time between servers: The widely accepted solution in the IT industry is to use a Network Time Protocol (NTP) server to synchronize each system's clock. However, clocks synced to NTP servers can still be inconsistent by a few seconds. This is caused by unpredictable latency from network traffic and by connections having to pass through several routers.
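The effect of latency is easier to see with the standard NTP arithmetic. A client records four time stamps per exchange and estimates its clock offset on the assumption that the network delay is the same in both directions. The Python sketch below applies those formulas to made-up time stamps; the function name `ntp_offset_and_delay` and the sample values are illustrative only.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from one NTP exchange.

    t0: client clock when the request is sent
    t1: server clock when the request is received
    t2: server clock when the reply is sent
    t3: client clock when the reply is received
    All values are in seconds.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Illustrative values: the client clock is 0.5 s behind the server.
# Symmetric path (20 ms each way): the estimate is essentially exact.
offset, delay = ntp_offset_and_delay(100.0, 100.52, 100.521, 100.041)
print(round(offset, 3), round(delay, 3))  # 0.5 0.04

# Asymmetric path (30 ms out, 10 ms back): same round trip, but the
# estimate is off by half the difference between the two directions.
offset, delay = ntp_offset_and_delay(100.0, 100.53, 100.531, 100.041)
print(round(offset, 3), round(delay, 3))  # 0.51 0.04
```

In the second exchange the round-trip delay is unchanged, yet the estimated offset is off by 10 ms, which is why routes with uneven latency leave residual error even after a successful sync.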

Site24x7 to the rescue

It's recommended to sync time with a local NTP server and to monitor the time drift as an added layer of assurance. Using Site24x7 as a monitoring system reduces the turnaround time to identify time drift and take corrective action. Site24x7 lets you configure the server to be monitored as the primary server and a public NTP server as the secondary server. This setup will alert IT teams if any server drifts beyond the configured threshold. Keeping the time correct will prove useful when tracking anomalies and troubleshooting during downtime.
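The threshold logic behind such an alert is simple to picture. The Python sketch below is not Site24x7's implementation or API; it is a generic illustration with hypothetical server names and drift values, showing how measured drift can be compared against a configured limit.

```python
def servers_over_threshold(drift_seconds, threshold_seconds):
    """Return the sorted names of servers whose absolute drift exceeds the threshold.

    drift_seconds maps a server name to its measured drift in seconds
    (positive means ahead of the reference clock, negative means behind).
    """
    return sorted(
        name
        for name, drift in drift_seconds.items()
        if abs(drift) > threshold_seconds
    )


# Hypothetical measurements against a reference NTP server.
drifts = {"web-01": 0.12, "db-01": -2.4, "dc-01": 0.9}

print(servers_over_threshold(drifts, threshold_seconds=1.0))  # ['db-01']
```

Using the absolute value matters here, because a clock that runs behind is as much of a problem as one that runs ahead.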

Read our help document for step-by-step instructions to get started, and click here to sign up for a free, 30-day trial of Site24x7.

Guest blog courtesy of Site24x7. Read more Site24x7 blogs here.