Because the Active Directory database is distributed across every domain controller, healthy replication is essential to keeping it working correctly. Replication problems can lead to all sorts of issues, including authentication failures, machines falling off the domain, or worse. Let's take a look at some ways to diagnose and troubleshoot basic replication problems.
Replication Health Review
Before doing anything else, you need to determine whether replication is working in the domain. Windows includes the built-in repadmin command-line tool, which can check replication status both on the local domain controller and on other DCs in the network. Additionally, the Directory Service event log will usually record errors for ongoing replication issues.
We've looked at repadmin in a previous entry, but as a quick recap, repadmin has two main switches for reviewing replication status. Running repadmin /showrepl shows the inbound replication status of the local server. That is, it shows any errors replicating into the DC you're running the command on. Running repadmin /replsum, on the other hand, provides a replication summary of the entire domain by reaching out to every DC, collecting its replication status, and compiling the results into a summary table. Here are some examples of good and bad results for each form of the command:
In addition to the repadmin tool, the Directory Service event log can provide insight into replication issues. These examples both show unhealthy replication:
Troubleshooting Replication Issues
Using the tools above, you may find replication issues affecting one or more DCs in the environment. The first troubleshooting step is to identify which DCs are affected and in which direction replication is failing between them. Focus on the server where repadmin /showrepl reports failures or where errors appear in the Directory Service log, because replication is always a pull operation. If DC01 is replicating to DC02, DC02 is actually connecting to DC01 to receive (pull) changes to the AD database. Since DC02 initiates the replication operation, most troubleshooting should start on that server.
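To make the pull direction concrete, here is a minimal Python sketch that reads CSV text of the kind produced by repadmin /showrepl * /csv and lists each failing link as "destination pulls from source". The column names are assumed to match repadmin's CSV header on an English-language system. The sample rows, including the domain name, timestamps, and failure counts, are invented for illustration and were not captured from a real domain.

```python
import csv
import io

# Hypothetical sample in the shape of `repadmin /showrepl * /csv` output.
# All values below are made up for illustration.
SAMPLE_CSV = """showrepl_COLUMNS,Destination DSA Site,Destination DSA,Naming Context,Source DSA Site,Source DSA,Transport Type,Number of Failures,Last Failure Time,Last Success Time,Last Failure Status
showrepl_INFO,Default-First-Site-Name,DC5,"DC=example,DC=com",Default-First-Site-Name,DC4,RPC,12,2024-01-02 10:15:00,2024-01-01 09:00:00,1722
showrepl_INFO,Default-First-Site-Name,DC4,"DC=example,DC=com",Default-First-Site-Name,DC5,RPC,0,0,2024-01-02 10:14:30,0
"""


def failing_links(csv_text):
    """Return (destination, source, naming_context, failures) per failing inbound link."""
    reader = csv.DictReader(io.StringIO(csv_text))
    failures = []
    for row in reader:
        # Treat a missing or empty failure count as zero.
        count = int(row["Number of Failures"] or 0)
        if count > 0:
            failures.append(
                (row["Destination DSA"], row["Source DSA"], row["Naming Context"], count)
            )
    return failures


if __name__ == "__main__":
    for dest, src, nc, count in failing_links(SAMPLE_CSV):
        # The destination DSA is the server doing the pull, so start there.
        print(f"{dest} cannot pull {nc} from {src} ({count} failures); start on {dest}")
```

In this sample the only failing row has DC5 as the destination and DC4 as the source, so DC5 is the server to start troubleshooting on.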
Looking at the first failure example from the previous section, DC5 is attempting to receive changes from DC4, but is unable to. In a situation like this, the following are the key points to check, in this order:
- Does nslookup DC4 return the correct IP address when run on DC5?
  - If not, there's an issue with DNS. Either DC4 isn't registering the correct IP address in DNS, or DC5 is unable to resolve the right IP for some other reason.
- Can DC5 ping DC4 by IP address?
  - If not, there may be no network connectivity between the DCs, or ICMP may be blocked along the path.
- Can you connect to some common AD ports on DC4 from DC5?
  - You can test this with telnet or with these PowerShell commands: Test-NetConnection DC4 -Port 135 (RPC endpoint mapper) and Test-NetConnection DC4 -Port 389 (LDAP).
  - If those fail, there may be a firewall issue between the two servers.
- Are there serious errors in the Directory Service log on DC4?
  - If DC4 is experiencing its own internal errors, it may not respond to replication requests from other DCs.
- Does dcdiag return any significant errors for either DC4 or DC5?
  - Dcdiag can point out other issues that may affect replication.
- If you download and run PortQryUI, using the "Domains and Trusts" test from DC5 to DC4, are there other failed ports?
  - Sometimes some ports are open while others are blocked by the firewall. PortQryUI makes it easy to scan all required ports at once.
If you run through this list, you'll likely find the source of the problem and can begin to resolve the underlying issue. If you get through all of these steps and still haven't found the root cause, a more in-depth investigation is required.
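The DNS and port checks from the list above can also be scripted. The following is a minimal Python sketch, not a replacement for nslookup, Test-NetConnection, or PortQryUI: it resolves the source DC's name and then attempts a TCP connection to ports 135 and 389. The function name check_dc and its parameters are hypothetical, and the ping step is left out because raw ICMP usually requires elevated privileges.

```python
import socket

# Ports taken from the checklist: 135 (RPC endpoint mapper) and 389 (LDAP).
DEFAULT_PORTS = (135, 389)


def check_dc(host, ports=DEFAULT_PORTS, timeout=3.0,
             resolve=socket.gethostbyname, connect=socket.create_connection):
    """Resolve host, then try a TCP connection to each port.

    Returns a dict with the resolved IP (or None) and a port -> bool map.
    `resolve` and `connect` are injectable (an assumption of this sketch)
    so the logic can be exercised without a live domain controller.
    """
    result = {"host": host, "ip": None, "ports": {}}
    try:
        result["ip"] = resolve(host)
    except OSError:
        # Name resolution failed: this points at DNS, so stop here.
        return result
    for port in ports:
        try:
            conn = connect((result["ip"], port), timeout=timeout)
            conn.close()
            result["ports"][port] = True
        except OSError:
            # Refused, timed out, or unreachable: possibly a firewall issue.
            result["ports"][port] = False
    return result


if __name__ == "__main__":
    # "DC4" is the source DC from the example; run this on the destination (DC5).
    print(check_dc("DC4"))
```

An ip value of None points at DNS, while a resolved IP with ports marked False points at connectivity or a firewall between the two servers.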
In addition to repadmin and the event log, Microsoft has a semi-official tool called the Active Directory Replication Status Tool (aka ADREPLSTATUS). You can download and install this tool to help diagnose and resolve replication issues. Other Microsoft and third-party tools for monitoring replication health in a domain also exist, including SCOM and Azure Log Analytics. An external monitoring tool like these can provide proactive alerting and uncover problems before they turn into major issues.