Channel: Ather Beg's Useful Thoughts

ESXi Time Configuration: Don’t point it towards the Windows Domain name

Recently, a client asked me what happens if ESX(i) servers are pointed towards the Windows domain name in “Time Configuration” and one or more domain controllers are not contactable. While I had a theory, I realised I had never tested what actually happens or how the failure unfolds. As I have seen many environments doing this (perhaps because it’s easy and has therefore become a “de facto” practice), I thought I should test it and document the result. Here is what happened, which confirmed why it is not a good idea.

Before I start, here are some disclaimers:

  • These tests have been done in my home lab,
  • I carried out the tests on a vSphere 5.1 and a 5.5 cluster,
  • I only have two DCs and I could only shut down one, and
  • It’s a flat network with no firewalls between these machines.

I configured the two clusters mentioned above to point towards my Windows domain name, shut down one of the domain controllers and monitored the ESXi hosts that had resolved to that DC for time synchronization, to see what happens. The initial findings were:

  • Pointing an ESXi server to the domain name results in a round-robin DNS resolution that picks one domain controller for the host to use for time. That gives the host “just one entry” to use,
  • Once it has resolved the name, it keeps using that domain controller for the time service for 8 consecutive attempts (or at least it counts up to 8 failures), and
  • The polling interval starts at 64 seconds but gradually grows to 1024 seconds (just over 17 minutes).
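The growth in the polling interval can be sketched in a few lines of Python. This is only an illustration, assuming the interval steps through powers of two between 2^6 and 2^10 seconds, which matches the 64 and 1024 second endpoints observed above; the function name is made up for this example.

```python
def poll_intervals(min_exp=6, max_exp=10):
    """Return poll intervals in seconds, doubling from 2**min_exp to 2**max_exp.

    Assumption: the interval grows in powers of two. Only the 64 s and
    1024 s endpoints were observed in the test described above.
    """
    return [2 ** exp for exp in range(min_exp, max_exp + 1)]


intervals = poll_intervals()
print(intervals)                       # [64, 128, 256, 512, 1024]
print(round(intervals[-1] / 60, 2))    # 17.07 (minutes)
```

At the longest interval, a host that has lost its only time source checks again only about once every 17 minutes.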

Time Query with domain name pointer

You can see in the screenshot above that only one of my two domain controllers is in the configuration, and the host kept trying to contact it after it was taken down. This screenshot was taken after eight failed attempts. You can tell from the “reach” column, which displays the results of the last eight attempts in octal format. If all is good, it shows 377, but after eight or more consecutive unsuccessful attempts it has dropped to 0.
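To make the “reach” value easier to read, here is a small Python sketch that decodes it. It assumes the usual NTP behaviour where reach is an 8-bit shift register printed in octal, with the most recent poll in the lowest bit; the function name is hypothetical.

```python
def decode_reach(reach_octal):
    """Decode an octal 'reach' string into eight poll results, oldest first.

    Assumption: reach is an 8-bit shift register in which the most recent
    poll occupies the least significant bit (1 = reply received).
    """
    value = int(reach_octal, 8) & 0xFF
    return [bool((value >> bit) & 1) for bit in range(7, -1, -1)]


print(decode_reach("377"))  # eight True values: all of the last 8 polls succeeded
print(decode_reach("0"))    # eight False values: all of the last 8 polls failed
print(decode_reach("376"))  # seven True values, then False: only the latest poll failed
```

A value of 377 octal is 11111111 in binary, which is why it indicates eight successful polls in a row.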

Now, I was expecting that after 8 failures the ESXi server would do another lookup. However, that was not the case: none of the ESXi servers tried another lookup, on either cluster! Once I brought the downed DC back after more than 5 hours, the ESXi servers started synchronizing time with it automatically, but only at the next polling interval.

That confirms to me that having a domain name in the configuration (as opposed to domain controller FQDNs or IP addresses) results in a situation where the ESX(i) hosts stop synchronizing their time with the hierarchy if they can’t reach the looked-up DC, and they won’t try another one unless the service is restarted. The DC could be unreachable because of blocked ports or because it is down. A service restart might also be hit and miss, depending on what round-robin returns.

Also, DNS may return any of the DCs that resolve to the domain name, but not all of them may be reachable from a given ESX(i) host, due to port blocks etc., which is quite possible in larger environments. As far as the NTP client is concerned, it’s a valid result, so it won’t try another lookup.

By comparison, having individual DC FQDNs/IPs in there has the following benefits:

  • One can point to specific DCs according to port access policies,
  • As they’re individually mentioned, all of them are active at the same time, and
  • Having a downed DC doesn’t have an impact as there are others to pick up the service.
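The difference between the two approaches can be modelled with a toy Python sketch. It is a simplification of the behaviour observed above, not a description of how the NTP client is implemented, and the host names and function names are hypothetical.

```python
def resolve_once(dns_answer, pick=0):
    """Model the single lookup: keep only one address from the round-robin answer."""
    return [dns_answer[pick % len(dns_answer)]]


def can_sync(sources, reachable):
    """A host can sync if at least one of its configured sources is reachable."""
    return any(source in reachable for source in sources)


# Hypothetical lab names; dc1 has been shut down.
domain_controllers = ["dc1.lab.example", "dc2.lab.example"]
reachable = {"dc2.lab.example"}

via_domain_name = resolve_once(domain_controllers, pick=0)  # lookup happened to return dc1
via_fqdns = list(domain_controllers)                        # both DCs listed individually

print(can_sync(via_domain_name, reachable))  # False
print(can_sync(via_fqdns, reachable))        # True
```

With the domain name, whether the host keeps syncing depends entirely on which DC the one lookup returned.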

Consider the screenshot below:

Time Query with individual DCs

Here I had the two individual FQDN entries of the domain controllers in the time configuration. As a result, both DCs appear in the configuration, which already makes it resilient.

If the environment and access policies allow (e.g. in a lab), one can also point the service to VMware’s entries. Here is what happens when I point my cluster towards the following entries:

0.vmware.pool.ntp.org
1.vmware.pool.ntp.org
2.vmware.pool.ntp.org
3.vmware.pool.ntp.org

Time Query with four individual VMware NTP

As you can see, the four entries resolve to four different external time servers to sync time with. This is certainly more resilient; however, it might not suit all environments.
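In ntpd terms, each individually listed source becomes its own entry. The following Python sketch renders such entries for the four names used above. It assumes the host's NTP configuration file uses the standard ntpd `server` directive, one source per line; the function name is illustrative only.

```python
def render_server_lines(sources):
    """Return one ntpd-style 'server' line per time source.

    Assumption: the target configuration file accepts standard ntpd
    'server <name>' directives, one per line.
    """
    return "\n".join(f"server {name}" for name in sources)


pool_entries = [f"{n}.vmware.pool.ntp.org" for n in range(4)]
print(render_server_lines(pool_entries))
```

The same function works with a list of individual DC FQDNs or IP addresses in place of the pool names.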

So, this little experiment confirms what I’ve always thought: Time (NTP) configuration in a vSphere environment should always point towards individual FQDNs or IP addresses of reliable time sources and not towards a Windows domain name.

