Active Directory during DR Tests

Authored by Mitchell Grande

Because domain controllers play a central role in a Windows environment, Active Directory will nearly always need to be included in a disaster recovery test.  But how do you ensure that the DCs function correctly, and safely, in a DR environment without affecting production?  Let's look at the best practices around Active Directory in a DR test.  These ideas apply both to on-premises replication to a secondary site and to replication to Azure using Azure Site Recovery.

Solution Design

Before performing a disaster recovery test, review the plan for Active Directory to ensure it won't cause any conflicts with the production network.  Note that this information does not apply to DCs in an actual failover, which is a different topic with its own considerations.  Here we are looking at a DR test where all of the servers run in a separate network while the production servers stay online.

When doing a DR test where the production network remains online, a fully isolated test network should be used.  This network can have internet access (inbound and/or outbound, depending on the application requirements), but communication with all other internal networks should be blocked.  With an isolated network, though, how do you provide Active Directory services?  You need to bring a DC into the isolated network, and there are two main options.  First, you can replicate a DC along with the other servers.  This DC would not be used during a true failover, but it would be used in the isolated network of a test failover.  Alternatively, you can clone a DC from the production network into the isolated network as part of the test failover process.

The Isolated Network

Regardless of which method you use to get a DC into the test failover network, it is critically important that the replica or cloned DC cannot communicate with the DCs in the production network.  Because the test failover DC is a copy of a still-running production DC, any communication with the other DCs would cause significant replication issues that would be very difficult to resolve.  Additionally, changes are often made in the test failover environment that you wouldn't want replicated to the main network.  The safest and easiest way to achieve this isolation is to create a separate network that's dedicated to test failovers and never configure it to route back to the main networks.
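
As a rough way to confirm that isolation from a machine inside the test failover network, the following Python sketch tries to open TCP connections to ports that a domain controller commonly listens on (DNS, Kerberos, RPC endpoint mapper, LDAP, and SMB).  The function names and the port list are illustrative assumptions rather than part of any product, and a clean result only shows that these particular ports were unreachable at the time of the check.

```python
import socket

# Ports a domain controller commonly listens on (assumed list, adjust as needed):
# DNS, Kerberos, RPC endpoint mapper, LDAP, SMB.
DC_PORTS = (53, 88, 135, 389, 445)


def reachable_ports(host, ports=DC_PORTS, timeout=2.0):
    """Return the ports on `host` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # Refused, timed out, or unroutable: treat the port as unreachable.
            pass
    return open_ports


def isolation_report(production_dcs, ports=DC_PORTS, timeout=2.0):
    """Map each production DC that is still reachable to its open ports.

    An empty dict means none of the listed hosts answered on any port.
    """
    report = {}
    for host in production_dcs:
        found = reachable_ports(host, ports, timeout)
        if found:
            report[host] = found
    return report
```

Calling isolation_report with the addresses of your production DCs from a server in the test network should return an empty dictionary.  Any entry in the result means traffic can still reach production, and the test DC should not be brought online until that is fixed.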

Common Issues During Test Failover

Because the test failover network is isolated, there are some common domain controller issues to deal with.  After bringing the test failover DC online, seize the FSMO roles to it.  The DCs that hold those roles in production aren't accessible from the test environment, so operations that depend on a role holder can fail if this isn't done.  Additionally, you may have trouble getting Active Directory online on a single DC when it expects the other DCs to be reachable.  To resolve that, it is sometimes necessary to remove all of the other DCs from the directory on the test failover DC and perform a metadata cleanup for each of them.
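
To confirm that the seizure took effect, one option is to capture the output of netdom query fsmo on the test failover DC and check which roles still point elsewhere.  The Python sketch below parses that text.  The role labels and the column layout in the sample are assumptions about what the command prints, the host names are made up, and the function names are hypothetical, so compare against real output from your environment before relying on it.

```python
# Role labels as netdom is assumed to print them; verify against real output.
FSMO_ROLES = (
    "Schema master",
    "Domain naming master",
    "PDC",
    "RID pool manager",
    "Infrastructure master",
)

# Illustrative output only; host names are hypothetical.
SAMPLE_NETDOM_OUTPUT = """\
Schema master               PRODDC01.corp.example
Domain naming master        PRODDC01.corp.example
PDC                         DRDC01.corp.example
RID pool manager            DRDC01.corp.example
Infrastructure master       DRDC01.corp.example
The command completed successfully.
"""


def parse_fsmo_holders(netdom_output):
    """Return a dict mapping each recognized role label to its holder."""
    holders = {}
    for raw_line in netdom_output.splitlines():
        line = raw_line.strip()
        for role in FSMO_ROLES:
            if line.startswith(role):
                holder = line[len(role):].strip()
                # Accept only a single token (a host name) as the holder.
                if len(holder.split()) == 1:
                    holders[role] = holder
                break
    return holders


def roles_not_held_by(netdom_output, test_dc):
    """List the roles that are missing or held by a DC other than `test_dc`."""
    holders = parse_fsmo_holders(netdom_output)
    wanted = test_dc.lower()
    return [role for role in FSMO_ROLES if holders.get(role, "").lower() != wanted]
```

With the sample text above, roles_not_held_by(SAMPLE_NETDOM_OUTPUT, "DRDC01.corp.example") lists the schema master and domain naming master roles, which would indicate that those two still need to be seized.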

The isolated DC may experience other issues, such as problems with SYSVOL replication or time synchronization.  Use dcdiag to review the health of the domain controller and resolve any issues that prevent it from functioning.  If SYSVOL replication is broken, the SYSVOL and NETLOGON shares will not be available, and you may need to do an authoritative restore of SYSVOL.  With FRS, this is done by setting the BurFlags registry value (D4 for an authoritative restore); DFS Replication uses a different process.
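
dcdiag output can be long, so it may help to pull out only the tests that failed.  The Python sketch below scans saved dcdiag text for result lines of the form "<target> passed test <name>" or "<target> failed test <name>".  The sample text, the host name, and the function name are illustrative assumptions, and the pattern may need adjusting for your dcdiag version or display language.

```python
import re

# Matches result lines such as:
#   ......................... DRDC01 passed test Connectivity
RESULT_LINE = re.compile(r"^\s*\.+\s+(\S+)\s+(passed|failed)\s+test\s+(\S+)\s*$")

# Illustrative excerpt only; the host name is hypothetical.
SAMPLE_DCDIAG_OUTPUT = """
      Starting test: Connectivity
         ......................... DRDC01 passed test Connectivity
      Starting test: Advertising
         ......................... DRDC01 failed test Advertising
      Starting test: NetLogons
         ......................... DRDC01 passed test NetLogons
      Starting test: SysVolCheck
         ......................... DRDC01 failed test SysVolCheck
"""


def summarize_dcdiag(dcdiag_output):
    """Group (target, test name) pairs by result, in order of appearance."""
    summary = {"passed": [], "failed": []}
    for line in dcdiag_output.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            target, result, test_name = match.groups()
            summary[result].append((target, test_name))
    return summary
```

For the sample excerpt, the "failed" list contains the Advertising and SysVolCheck tests for DRDC01.  In practice you would redirect dcdiag output to a file on the test failover DC and pass the file contents to summarize_dcdiag.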

Summary

Here is a short list of the best practices for Active Directory and DR testing:

  • Plan the DR test thoroughly to avoid causing issues with production services.
  • Use a fully isolated network for the test failover environment.  If you aren't already sure that it's fully isolated, use ping or other tools to ensure that the production DCs can't be contacted from the network.
  • Bring a clone or replica of a DC into the test failover network.
  • Seize the FSMO roles on the test failover DC.
  • If required, remove all other DCs from the directory on the test failover DC and perform a metadata cleanup for each of them.
  • Use dcdiag to ensure the DC is fully operational.
Author

Mitchell Grande

Systems Engineer
