Crash Dump Analysis

Extracting information from a memory dump after a server crash is an important part of root cause analysis.  Crash dump debugging is an advanced topic and often a very complex task, so this article covers only the basics: enough to get started and to debug a simple crash that has a clear cause.
 
Tools Required
WinDbg – WinDbg is Microsoft’s debugger, used both for debugging code and for analyzing crash dumps.  It ships as part of the Windows SDK (Software Development Kit); you can find more information and download it from docs.microsoft.com.
 
Debug Symbols – debug symbols provide additional metadata about the program you’re debugging, such as function names, which makes the contents of the dump much easier to read and understand.  Although symbols can be downloaded manually ahead of time, it’s easiest to let WinDbg download them on demand, as described in the “Analyzing a Dump” section.
 
Obtaining a Memory Dump
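After a Windows server crashes, you should find a “memory.dmp” file in C:\Windows\ (assuming the server uses the default crash dump settings).  This file contains a snapshot of system memory (RAM) from the time of the crash; depending on the configured dump type, it may contain only kernel memory rather than all of RAM.  Copy this file to your workstation so you can analyze it there.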
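If you copy dumps often, the copy step can be scripted.  The following is a minimal Python sketch, not a required part of the process.  It assumes the dump is reachable from your workstation over a file share and that you have the administrative rights needed to read it; the server name, UNC path, and destination folder are hypothetical.

```python
import shutil
from pathlib import Path


def copy_crash_dump(source: str, dest_dir: str) -> Path:
    """Copy a crash dump into a local analysis folder and return the new path."""
    src = Path(source)
    if not src.is_file():
        raise FileNotFoundError(f"No dump file found at {src}")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves the file's modification time, so you can still
    # tell when the dump was written after copying it.
    return Path(shutil.copy2(src, dest / src.name))


# Hypothetical example: the admin share on a server named SERVER01.
# copy_crash_dump(r"\\SERVER01\C$\Windows\MEMORY.DMP", r"C:\Dumps\SERVER01")
```

Keeping one folder per server makes it easier to tell dumps apart once you have collected several.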

Analyzing a Dump
Once you have WinDbg installed and a memory dump file in hand, you can perform the analysis.  First, open WinDbg on your workstation.  Then go to “File” > “Symbol File Path…”, paste the following text into the box, and press OK:

srv*C:\Temp\symbols*https://msdl.microsoft.com/download/symbols

This instructs WinDbg to download debugging symbols from Microsoft’s public symbol server and cache a copy in “C:\Temp\symbols”, so you don’t have to re-download the symbols every time you load a dump file.  You may need to re-enter the symbol path each time you start WinDbg; the local cache just speeds up future debugging.
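The symbol path follows a fixed pattern: “srv*”, then the local cache folder, then “*”, then the symbol server URL.  The small Python sketch below builds that string so the pattern is explicit.  It also sets the _NT_SYMBOL_PATH environment variable, which Microsoft’s debuggers read as a default symbol path.  Setting the variable from a script only affects that script and any debugger it launches; the function name is illustrative.

```python
import os


def build_symbol_path(cache_dir: str,
                      server: str = "https://msdl.microsoft.com/download/symbols") -> str:
    """Return a symbol path of the form srv*<local cache>*<symbol server>."""
    return f"srv*{cache_dir}*{server}"


# Equivalent to the text entered under File > Symbol File Path...,
# using the HTTPS form of the Microsoft symbol server URL.
symbol_path = build_symbol_path(r"C:\Temp\symbols")

# Microsoft's debuggers also read the _NT_SYMBOL_PATH environment variable.
# Setting it here affects only this process and any debugger it launches,
# which avoids retyping the path in scripted sessions.
os.environ["_NT_SYMBOL_PATH"] = symbol_path
print(symbol_path)
```

To make the setting permanent for interactive use, you would set _NT_SYMBOL_PATH as a user or system environment variable instead.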
 
With the symbol path set, go to “File” > “Open Crash Dump…”, browse to the memory.dmp file you copied from the crashed server, and open it.  A new console-style window opens and lists each step as the dump file loads.  When loading finishes, you may see a line beginning with “Probably caused by” that names the likely file or driver behind the crash.  Other times, you’ll have to dig deeper to find the cause.  Either way, the first thing to do is run the “!analyze -v” command: type it into the command bar at the bottom of the window and press Enter, and WinDbg will run a more in-depth analysis of the dump file.
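The same first pass can also be run without the GUI.  The Python sketch below swaps in kd.exe, the console debugger from the same Debugging Tools for Windows package, for the WinDbg window described above.  It assumes kd.exe is on PATH.  The command line uses three debugger options: -z opens the dump, -y sets the symbol path, and -c runs “!analyze -v” followed by “q” to quit.  The helper names and the sample output line are hypothetical.

```python
import subprocess
from typing import List, Optional


def build_kd_command(dump_path: str, symbol_path: str,
                     debugger: str = "kd.exe") -> List[str]:
    """Build a command line that opens a dump, runs !analyze -v, then quits.

    -z opens the crash dump, -y sets the symbol path, and -c runs the given
    commands once the dump is loaded; the trailing "q" ends the session.
    """
    return [debugger, "-z", dump_path, "-y", symbol_path, "-c", "!analyze -v; q"]


def run_analysis(dump_path: str, symbol_path: str) -> str:
    """Run the console debugger non-interactively and return its output.

    Assumes kd.exe from the Debugging Tools for Windows is on PATH.
    """
    result = subprocess.run(build_kd_command(dump_path, symbol_path),
                            capture_output=True, text=True, check=False)
    return result.stdout


def probable_cause(output: str) -> Optional[str]:
    """Return the text after "Probably caused by :" in debugger output, if present."""
    for line in output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Probably caused by"):
            return stripped.partition(":")[2].strip()
    return None
```

For example, `probable_cause(run_analysis(r"C:\Dumps\MEMORY.DMP", symbol_path))` would return the module named on the “Probably caused by” line, or None if the analysis did not produce one.  The dump path is hypothetical, and symbol_path is a symbol path string like the one shown earlier.  The first run can take a while because symbols are downloaded on demand.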

Note: this command’s output also includes some details of your workstation.  In my output, for example, the BUILDOSVER_STR value refers to the Windows 10 PC I used to run WinDbg, not to the crashed server.

The output of that command includes various details of the crash, such as the server name and OS, the crash date and time, and the stack text (the STACK_TEXT section).  The stack text shows the call stack: the chain of function calls that led to the crash.  In this example, the “IntcDAud” driver module appears repeatedly in that chain, which suggests it caused the crash.

A quick web search for “IntcDAud” shows that it is the Intel audio driver.  The next step would be to look up that driver on Intel’s site for known issues and any available updates.  Ultimately, this crash was caused not by the OS itself but by the audio driver.
 
Summary
Using the “!analyze -v” command to analyze a dump file automatically is only the first step in understanding what caused a server to crash.  If you’d like to learn more, online courses are available, or you can get hands-on experience by analyzing real dump files from servers that have crashed.
 
For more information about Crash Dump Analysis, see the official documentation.