Hello all.  Eric here again.  OK, so maybe the title of this blog post isn’t the best, primarily because there could be a ton of reasons why some DC’s might not be able to sync time with the PDCE.  Some obvious examples would be port blockages, connectivity issues, DC’s set to “NoSync”, VM’s syncing to their host, maybe a broken IPSec policy on the DC, and a lot of other reasons.  I ran into what I thought was an interesting scenario today though and wanted to share.

Note:  This environment is all 2008 R2 in a single forest/domain.

So first things first, the problem:

The customer noticed that time kept straying out of sync.  In doing some spot checks around the environment, I simply ran the following command – w32tm /resync

While some DC’s would successfully sync time, others would not.  For the ones that failed, I cycled the time service to try to generate some data in the system log.  When looking at the system log, sure enough there was a new error in relation to time – Event ID 129 with a Source of Time-Service that says:

“NtpClient was unable to set a domain peer to use as a time source because of discovery error. NtpClient will try again in %2 minutes and double the reattempt interval thereafter. The error was: %1”

The interesting thing was that the ones that failed ALL failed with this error.  So that’s one helpful piece of information.  The next thing that I wanted to look at was the time source for the DC’s that were working, and for those that weren’t working.  To do that, I ran the following command – w32tm /query /source

The working DC’s showed the PDCE as the command output, whereas the broken DC’s came back with either “free-running System Clock” or “Local CMOS Clock”.  Those are essentially the same thing.  As a matter of fact, a DC could report “free-running System Clock”, and after cycling the time service it would report “Local CMOS Clock”.  Regardless, there was definitely a trend.  The DC’s that were failing to sync time showed the same time source and had the same event ID.
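If you have more than a handful of DC’s to spot check, that comparison can be scripted.  Below is a minimal Python sketch, assuming you collect the text printed by w32tm /query /source on each DC yourself.  The function names and the example host name are my own and purely illustrative; only the two local-clock strings come from the output described above.

```python
import subprocess

# Source strings that mean "this DC is not syncing from anyone".
# Compared case-insensitively, since the capitalization can vary.
LOCAL_CLOCK_SOURCES = ("local cmos clock", "free-running system clock")


def is_local_clock(source_output: str) -> bool:
    """Return True if the w32tm /query /source text names a local clock."""
    text = source_output.strip().lower()
    return any(text.startswith(name) for name in LOCAL_CLOCK_SOURCES)


def query_local_source() -> str:
    """Run w32tm /query /source on this machine (Windows only)."""
    result = subprocess.run(
        ["w32tm", "/query", "/source"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a DC, is_local_clock(query_local_source()) returning True would flag that box as one of the broken ones, while a working DC would print the name of its time source (here, the PDCE) instead.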

For their time configuration, they had implemented a GPO for time sync and applied it to all of the DC’s, with a security filter in place so that it didn’t apply to the PDCE.  The PDCE itself was set to sync time externally, essentially using the settings found in this article: http://support.microsoft.com/kb/816042.

Rather than show a GPResult or an export of the “DC Time” GPO, it’s easier to show the time configuration, both for the purposes of this blog and for the information that I needed in troubleshooting, with the output of the following command:

w32tm /query /configuration /verbose

[Configuration]

EventLogFlags: 2 (Policy)
AnnounceFlags: 10 (Policy)
TimeJumpAuditOffset: 28800 (Local)
MinPollInterval: 6 (Policy)
MaxPollInterval: 10 (Policy)
MaxNegPhaseCorrection: 172800 (Policy)
MaxPosPhaseCorrection: 172800 (Policy)
MaxAllowedPhaseOffset: 300 (Policy)

FrequencyCorrectRate: 4 (Policy)
PollAdjustFactor: 5 (Policy)
LargePhaseOffset: 50000000 (Policy)
SpikeWatchPeriod: 900 (Policy)
LocalClockDispersion: 10 (Policy)
HoldPeriod: 5 (Policy)
PhaseCorrectRate: 1 (Policy)
UpdateInterval: 100 (Policy)

FileLogName:  (Undefined or NotUsed)
FileLogEntries:  (Undefined or NotUsed)
FileLogSize: 0 (Undefined or NotUsed)
FileLogFlags: 0 (Undefined or NotUsed)

[TimeProviders]

NtpClient (Local)
DllName: C:\Windows\system32\w32time.dll (Local)
Enabled: 1 (Local)
InputProvider: 1 (Local)
CrossSiteSyncFlags: 1 (Policy)
AllowNonstandardModeCombinations: 1 (Local)
ResolvePeerBackoffMinutes: 15 (Policy)
ResolvePeerBackoffMaxTimes: 7 (Policy)
CompatibilityFlags: 2147483648 (Local)
EventLogFlags: 3 (Policy)
LargeSampleSkew: 3 (Local)
SpecialPollInterval: 900 (Policy)
Type: NT5DS (Policy)
NtpServer:  (Undefined or NotUsed)

NtpServer (Local)
DllName: C:\Windows\system32\w32time.dll (Local)
Enabled: 1 (Local)
InputProvider: 0 (Local)
AllowNonstandardModeCombinations: 1 (Local)
EventLogFlags: 0 (Undefined or NotUsed)

VMICTimeProvider (Local)
DllName: C:\Windows\System32\vmictimeprovider.dll (Local)
Enabled: 1 (Local)
InputProvider: 1 (Local)

This shows all of the time related settings on the DC, showing which settings are received from Policy, and which settings are local to the box, not being overridden by a policy.
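Since every setting line follows the same “Name: value (Origin)” shape, the output is easy to turn into data.  Here is a rough Python sketch of a parser, assuming the layout shown above; the function name and the choice to key settings by (section, name) are mine, not anything w32tm provides.

```python
import re

# "Name: value (Origin)", where the value may be empty.
SETTING_RE = re.compile(
    r"^(?P<name>\w+):\s*(?P<value>.*?)\s*\((?P<origin>[^()]*)\)$"
)
# Provider header such as "NtpClient (Local)".
PROVIDER_RE = re.compile(r"^(?P<name>\w+)\s*\((?P<origin>[^()]*)\)$")


def parse_w32tm_config(text: str) -> dict:
    """Map (section, setting) -> (value, origin) from the verbose output."""
    settings = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]          # e.g. "Configuration"
            continue
        match = SETTING_RE.match(line)
        if match:
            settings[(section, match["name"])] = (
                match["value"], match["origin"],
            )
            continue
        provider = PROVIDER_RE.match(line)
        if provider:
            section = provider["name"]    # e.g. "NtpClient"
    return settings
```

Keying by section matters because some names, such as EventLogFlags, appear both under [Configuration] and under individual providers.  Filtering the result on origin == "Policy" gives a quick list of what the GPO is actually controlling.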

Interestingly enough though, that was the exact same output for DC’s that were syncing time and for DC’s that were not syncing time.  The next check that I did was a quick packet capture on one of the DC’s that wasn’t syncing time, to try to rule out connectivity or port related issues.  When I did a packet capture and then cycled the time service however, I saw no NTP traffic leaving the box whatsoever, so it had to be some setting on the DC that was preventing this.

So what I did next was compare the verbose time settings from above to those in my lab, since my lab was in a working state as far as syncing time and was set up in a similar fashion, minus the use of a GPO.  Other than certain intervals being different, the one flag that stood out was CrossSiteSyncFlags.  In their environment it was set to 1, and in mine it was set to 2, which is the default.  So I did a bit of reading up on the values for that flag.
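I did that comparison by eye, but it boils down to a dictionary diff.  A small Python sketch follows, assuming each configuration has already been reduced to a plain setting-to-value mapping.  The function name is mine, and the lab value for SpecialPollInterval in the example is a made-up placeholder; only the CrossSiteSyncFlags values (1 vs. 2) and the customer's 900 come from the real data.

```python
def diff_settings(theirs: dict, mine: dict) -> dict:
    """Return {setting: (their_value, my_value)} for every mismatch.

    A setting that exists on only one side shows up with None
    for the side that lacks it.
    """
    differences = {}
    for key in sorted(set(theirs) | set(mine)):
        their_value = theirs.get(key)
        my_value = mine.get(key)
        if their_value != my_value:
            differences[key] = (their_value, my_value)
    return differences


customer = {"Type": "NT5DS", "CrossSiteSyncFlags": "1",
            "SpecialPollInterval": "900"}
# Hypothetical lab values, for illustration only.
lab = {"Type": "NT5DS", "CrossSiteSyncFlags": "2",
       "SpecialPollInterval": "3600"}

for name, (theirs_value, lab_value) in diff_settings(customer, lab).items():
    print(f"{name}: customer={theirs_value} lab={lab_value}")
```

With noise like poll intervals filtered out, a flag such as CrossSiteSyncFlags is much harder to overlook.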

TechNet (http://technet.microsoft.com/en-us/library/cc781996(WS.10).aspx) showed the following for the values:

  • 0 – The time service cannot synchronize with a partner that is outside the computer’s site.
  • 1 – The time service can synchronize only with the primary domain controller.
  • 2 – The time service can synchronize with a partner that is outside the computer’s site.

In the actual GPO, the explanation of the policy says this:

CrossSiteSyncFlags: This value, expressed as a bitmask, controls how W32time chooses time sources outside its own site. The possible values are 0, 1, and 2. Setting this value to 0 (None) indicates that the time client should not attempt to synchronize time outside its site. Setting this value to 1 (PdcOnly) indicates that only the computers that function as primary domain controller (PDC) emulator operations masters in other domains can be used as synchronization partners when the client has to synchronize time with a partner outside its own site. Setting a value of 2 (All) indicates that any synchronization partner can be used. This value is ignored if the NT5DS value is not set. The default value is 2 decimal (0x02 hexadecimal).
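For scripting purposes, the three documented values can be captured in a small enumeration.  This is only a sketch: the member names mirror the labels in the GPO text (None, PdcOnly, All), the comments paraphrase that text, and the helper function is my own invention rather than anything exposed by W32time.

```python
from enum import IntEnum


class CrossSiteSyncFlags(IntEnum):
    NONE = 0      # do not synchronize outside the client's own site
    PDC_ONLY = 1  # out-of-site partners must be PDC emulators
    ALL = 2       # any synchronization partner may be used


# Per the GPO explanation, the default is 2 decimal (0x02 hexadecimal).
DEFAULT_FLAG = CrossSiteSyncFlags.ALL


def parse_flag(raw: str) -> CrossSiteSyncFlags:
    """Convert text such as '1' or '0x02' to the enum.

    Raises ValueError for anything other than the documented 0, 1, or 2.
    """
    return CrossSiteSyncFlags(int(raw, 0))
```

For example, parse_flag("1") gives CrossSiteSyncFlags.PDC_ONLY, which is what the customer's GPO was pushing, and comparing a parsed value against DEFAULT_FLAG is a quick way to spot DC’s that have drifted from the default.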

So the documentation has somewhat different definitions, but all of them discuss time sync in relation to site boundaries.  After seeing that, I compared the sites of the DC’s that were syncing successfully with the sites of the DC’s that weren’t.  Interestingly enough, all of the working DC’s were in the same site as the PDCE…  So it was looking more and more like that was the problem.  If it was, though, it didn’t match the behavior described in the TechNet documentation, so before making an enterprise-wide change like this, I decided to turn on Windows Time Debug Logging on one of the DC’s that wasn’t able to sync time.  To do that, you can follow the instructions found here:  http://support.microsoft.com/kb/816043

After I did that, I found some interesting data after cycling the time service…  Here is the more relevant data that was collected from the log:

Resolving domain peer
149851 20:59:04.6778750s – ReadConfig: ‘EventLogFlags’=0x00000000 (0)
149851 20:59:04.6778750s – NetLogonGetTimeServiceParentDomain dwErr = 1355 netlogonbits = 0.
149851 20:59:04.6778750s – Query 1 (BACKGROUND): <SITE: SomeRandomSite, DOM: contoso.com, FLAGS: 00026B00>
149851 20:59:04.6778750s – PeerPollingThread: PeerListUpdated
149851 20:59:04.6778750s – PeerPollingThread: waiting forever
149851 20:59:04.6778750s – Query 1: no DC found.
149851 20:59:04.6778750s – Query 2 (BACKGROUND): <SITE: SomeRandomSite, DOM: contoso.com, FLAGS: 00024380> 
149851 20:59:04.7091250s – ListeningThread — DataAvailEvent set for socket 1 (0.0.0.0:123)
149851 20:59:04.7091250s – ListeningThread — response heard from 10.x.x.x:123 <- 214.x.x.x:123
149851 20:59:04.7091250s – Ignoring packet invalid mode combination (in:3 out:0).
149851 20:59:04.7560000s – ListeningThread — DataAvailEvent set for socket 1 (0.0.0.0:123)
149851 20:59:04.7560000s – ListeningThread — response heard from 10.x.x.x:65535 <- 214.x.x.x:123
149851 20:59:04.7560000s – Ignoring packet invalid mode combination (in:3 out:0).
149851 20:59:04.8028750s – ListeningThread — DataAvailEvent set for socket 1 (0.0.0.0:123)
149851 20:59:04.8028750s – ListeningThread — response heard from 10.x.x.x:65535 <- 214.x.x.x:123
149851 20:59:04.8028750s – Ignoring packet invalid mode combination (in:3 out:0).
149851 20:59:04.8653750s – ListeningThread — DataAvailEvent set for socket 1 (0.0.0.0:123)
149851 20:59:04.8653750s – ListeningThread — response heard from 10.x.x.x:65535 <- 214.x.x.x:123
149851 20:59:04.8653750s – Ignoring packet invalid mode combination (in:3 out:0).
149851 20:59:04.9278750s – ListeningThread — DataAvailEvent set for socket 1 (0.0.0.0:123)
149851 20:59:04.9278750s – ListeningThread — response heard from 10.x.x.x:65535 <- 214.x.x.x:123
149851 20:59:04.9278750s – Ignoring packet invalid mode combination (in:3 out:0).
149851 20:59:05.0841250s – Query 2: no DC found.
149851 20:59:05.0841250s – Retrying resolution for domain hierarchy. Retry 1 will be in 15 minutes. 
149851 20:59:05.0841250s – Logging warning: NtpClient was unable to set a domain peer to use as a time source because of discovery error. NtpClient will try again in 15 minutes and double the reattempt interval thereafter. The error was: The entry is not found. (0x800706E1)
149851 20:59:05.0841250s – Logging error: NtpClient has been configured to acquire time from one or more time sources, however none of the sources are currently accessible and no attempt to contact a source will be made for 15 minutes. NTPCLIENT HAS NO SOURCE OF ACCURATE TIME.
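The debug log gets big quickly, so it helps to pull out just the discovery queries and their results.  Here is a hedged Python sketch that does so, based only on the line shapes visible in the excerpt above (“Query N (…): <SITE: …, DOM: …, FLAGS: …>” and “Query N: no DC found.”).  The function name and the returned structure are my own, and other builds of W32time may word these lines differently.

```python
import re

QUERY_RE = re.compile(
    r"Query (\d+) \(\w+\): "
    r"<SITE: ([^,]+), DOM: ([^,]+), FLAGS: ([0-9A-Fa-f]+)>"
)
NO_DC_RE = re.compile(r"Query (\d+): no DC found")


def summarize_queries(log_text: str) -> dict:
    """Map query number -> site, domain, flags, and whether a DC was found.

    found_dc stays None unless the log explicitly reports "no DC found",
    because this sketch does not know what a successful line looks like.
    """
    queries = {}
    for line in log_text.splitlines():
        match = QUERY_RE.search(line)
        if match:
            number, site, domain, flags = match.groups()
            queries[int(number)] = {
                "site": site, "domain": domain,
                "flags": flags, "found_dc": None,
            }
            continue
        match = NO_DC_RE.search(line)
        if match and int(match.group(1)) in queries:
            queries[int(match.group(1))]["found_dc"] = False
    return queries
```

Collecting the "site" values from the result shows at a glance whether discovery ever looked beyond the DC’s own site.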

In previous Windows Time Debug Logs, I’ve seen Query 1, 2, 3, 4, 5, etc. change sites between queries (though not necessarily DCs).  In this case, it only queried for DC’s in the same site, and since none of those DC’s was the PDCE, the time sync attempt failed.  By this time I was pretty certain that the problem was the value of CrossSiteSyncFlags, and then, just to put icing on the cake, shortly after enabling debug logging I found an internal solutions document that described the values of that flag.  They read as follows:

  • With CrossSiteSyncFlags set to 0, Windows XP-based clients or Windows 2003-based
    clients do not try to synchronize outside the site.
  • With CrossSiteSyncFlags set to 1, Windows XP-based clients or Windows 2003-based
    clients, only the PDC can synchronize across sites (not member servers).
  • With CrossSiteSyncFlags set to 2, Windows XP-based clients or Windows 2003-based
    clients can use out-of-site domain controllers for synchronization. This is the
    default value.

Granted, this wasn’t XP or 2003, but from what I was seeing, it still held true, so I recommended changing the value of CrossSiteSyncFlags in the GPO to 2.  Once that was done and the DC’s had refreshed policy, they were all able to sync time.