Best Practices for Virtualizing Active Directory Domain Controllers (AD DC), Part II

Topics in this article

Virtualized Active Directory is ready for Primetime, Part II!

In the first of this two-part blog series, I discussed how virtualization-first is the new normal and fully supported; and elaborated on best practices for Active Directory availability, achieving integrity in virtual environments, and making AD confidential and tamper-proof.

In this second installment, I’ll discuss the elements of time in Active Directory, touch on replication, latency and convergence; the preventing and mediating lingering objects, cloning and of much relevance, preparedness for Disaster Recovery.

Proper Time with Virtualized Active Directory Domain Controllers (AD DC)

Time in virtual machines can easily drift if they are not receiving constant and consistent time cycles. Windows operating systems keep time based on interrupt timers set by CPU clock cycles. In a VMware ESXi host with multiple virtual machines, CPU cycles are not allocated to idle virtual machines.

To plan for an Active Directory implementation, you must carefully consider the most effective way of providing accurate time to domain controllers and understand the relationship between the time source used by clients, member servers, and domain controllers.

The Domain Controller with the PDC Emulator role for the forest root domain ultimately becomes the “master” timeserver for the forest – the root time server for synchronizing the clocks of all Windows computers in the forest. You can configure the PDC to use an external source to set its time. By modifying the defaults of this domain controller’s role to synchronize with an alternative external stratum 1 time source, you can ensure that all other DCs and workstations within the domain are accurate.

Why Time Synchronization Is Important in Active Directory

Every domain-joined device is affected by time!

Ideally, all computer clocks in an AD DS domain are synchronized with the time of an authoritative computer. Many factors can affect time synchronization on a network. The following factors often affect the accuracy of synchronization in AD DS:

  • Network conditions
  • The accuracy of the computer’s hardware clock
  • The amount of CPU and network resources available to the Windows Time service

Prior to Windows Server 2016, the W32Time service was not designed to meet time-sensitive application needs. Updates to Windows Server 2016 allow you to implement a solution for 1ms accuracy in your domain.

Figure 1: How Time Synchronization Works in Virtualized Environments

See Microsoft’s How the Windows Time Service Works for more information.

How Synchronization Works in Virtualized Environments

An AD DS forest has a predetermined time synchronization hierarchy. The Windows Time service synchronizes time between computers within the hierarchy, with the most accurate reference clocks at the top. If more than one time source is configured on a computer, Windows Time uses NTP algorithms to select the best time source from the configured sources based on the computer’s ability to synchronize with that time source. The Windows Time service does not support network synchronization from broadcast or multicast peers.

Replication, Latency and Convergence

Eventually, changes must converge in a multi-master replication model…

The Active Directory database is replicated between domain controllers. The data replicated between controllers called ‘data’ are also called ‘naming context.’ Only the changes are replicated, once a domain controller has been established. Active Directory uses a multi-master model; changes can be made on any controller and the changes are sent to all other controllers. The replication path in Active Directory forms a ring which adds reliability to the replication.

Latency is the required time for all updates to be completed throughout all domain controllers on the network domain or forest.

Convergence is the state at which all domain controllers have the same replica contents of the Active Directory database.

Figure 2: How Active Directory Replication Works

For more information on Replication, Latency and Convergence, see Microsoft’s Detecting and Avoiding Replication Latency.”

Preventing and Remediating Lingering Objects

Don’t revert to snapshot or restore backups beyond the TSL.

Lingering objects are objects in Active Directory that have been created, replicated, deleted, and then garbage collected on at least the Domain Controller that originated the deletion but still exist as live objects on one or more DCs in the same forest. Lingering object removal has traditionally required lengthy cleanup sessions using various tools, such as the Lingering Objects Liquidator (LoL).

Dominant Causes of Lingering Objects

  1. Long-term replication failures

While knowledge of creates and modifies are persisted in Active Directory forever, replication partners must inbound replicate knowledge of deleted objects within a rolling Tombstone Lifetime (TSL) # of days (default 60 or 180 days depending on what OS version created your AD forest). For this reason, it’s important to keep your DCs online and replicating all partitions between all partners within a rolling TSL # of days. Tools like REPADMIN /SHOWREPL * /CSV, REPADMIN /REPLSUM and AD Replication Status should be used to continually identify and resolve replication errors in your AD forest.

  1. Time jumps

System time jump more than TSL # of days in the past or future can cause deleted objects to be prematurely garbage collected before all DCs have inbound replicated knowledge of all deletes. The protection against this is to ensure that:

  • The forest root PDC is continually configured with a reference time source (including following FSMO transfers).
  • All other DCs in the forest are configured to use NT5DS hierarchy.
  • Time rollback and roll-forward protection has been enabled via the maxnegphasecorrection and maxposphasecorrection registry settings or their policy-based equivalents.
  • The importance of configuring safeguards can’t be stressed enough.
  1. USN rollbacks

USN rollbacks are caused when the contents of an Active Directory database move back in time via an unsupported restore. Root causes for USN Rollbacks include:

  • Manually copying previous version of the database into place when the DC is offline.
  • P2V conversions in multi-domain forests.
  • Snapshot restores of physical and especially virtual DCs. For virtual environments, both the virtual host environment AND the underlying guest DCs should be compatible with VM Generation ID. Windows Server 2012 or later, and vSphere 5.0 Update 2 or later, support this feature.
  • Events, errors and symptoms that indicate you have lingering objects.
Figure 3: USN Rollbacks – How Snapshots Can Wreak Havoc on Active Directory

Cloning

You should always use a test environment before deploying the clones to your organization’s network.

DC Cloning enables fast, safer Domain Controller provisioning through clone operation.

When you create the first domain controller in your organization, you are also creating the first domain, the first forest, and the first site. It is the domain controller, through group policy, that manages the collection of resources, computers, and user accounts in your organization.

Active Directory Disaster Recovery Plan: It’s a Must

Build, test, and maintain an Active Directory Disaster Recovery Plan!

AD is indisputably one of an organization’s most critical pieces of software plumbing and in the event of a catastrophe – the loss of a domain or forest – its recovery is a monumental task. You can use Site Recovery to create a disaster recovery plan for Active Directory.

Microsoft Active Directory Disaster Recovery Plan is an extensive document; a set of high-level procedures and guidelines that must be extensively customized for your environment and serves as a vital point of reference when determining root cause and how to proceed with recovery with Microsoft Support.

Summary

There are several excellent reasons for virtualizing Windows Active Directory. The release of Windows Server 2012 and its virtualization-safe features and support for rapid domain controller deployment alleviates many of the legitimate concerns that administrators have about virtualizing AD DS. VMware® vSphere® and our recommended best practices also help achieve 100 percent virtualization of AD DS.

Please reach out to your Dell EMC representative or checkout Dell EMC Consulting Services to learn how we can help you with virtualizing AD DS or leave me a comment below and I’ll be happy to respond back to you.

Sources

Virtualizing a Windows Active Directory Domain Infrastructure

Related Blog

Best Practices for Virtualizing Active Directory Domain Controllers (AD DC), Part I

About the Author: Matt Liebowitz

Matt Liebowitz is the Global Multicloud lead for the Dell Technologies Consulting Services Portfolio. He focuses on thought leadership and service development for multicloud, automation and data center related Consulting services. Matt has been named a VMware vExpert every year since 2010 and is a frequent blogger and author on a wide range of cloud related topics. Matt has been a co-author on three virtualization-focused books, including Virtualizing Microsoft Business-critical Applications on VMware vSphere and VMware vSphere Performance. He is also a frequent speaker at the VMware Explore and Dell Technologies World conferences.
Topics in this article