Windows 8 has been out for a while, featuring an interface that's as cool as it is annoying . . . until you get the hang of it. But, like any computer operating system, it can fall over. Luckily, there is an easy way to solve the cause of most crashes; just call up WinDbg, the Windows debugger; a free tool to diagnose the most common causes of Windows crashes -- misbehaved third party drivers.
In W8, the Blue Screen of Death/BSOD has been modified to include a large, simple : ( emoticon and a short message in human (if not very informative) language.
Also, Microsoft has made advancements in the dump file creation and management process. While this article focuses on W8, the information applies to both RT and Server 2012. For earlier operating systems, see Solve Windows 7 crashes in minutes or, for XP and 2000, see How to solve Windows crashes in minutes.
About Windows crashes
Operating system crashes are quite different from applications crashes, system hangs or other problems. In most cases, operating systems crash as a protective measure. When the OS discovers that critical devices are failing or that an internal operating system state has been identified as inconsistent because of possible viruses, bad device drivers or even RAM failures, it is generally safer to stop immediately. Otherwise, continuing operations would allow far more serious damage, such as application data corruption or loss.
[HELP IS ON THE WAY: Where to go for help with Windows crashes]
Two out of three system crashes are caused by third party drivers taking inappropriate actions (such as writing to non-existent memory) in Kernel mode where they have direct access to the OS kernel and to the hardware.
In contrast, drivers operating in User Mode, with only indirect access to the OS kernel, cannot directly cause a crash. A small percentage of crashes are caused by hardware issues such as bad memory, even less by faults in the OS itself. And some causes are simply unknown.
Thanks for the memory dump
A memory dump is the ugliest best friend you'll ever have. It is a snapshot of the state of the computer system at the point in time that the operating system stopped. And, of the vast amount of not-very-friendly looking data that a dump file contains, you will usually only need a few items that are easy to grasp and use. With the introduction of Windows 8, the OS now creates four different memory dumps; Complete, Kernel, and Minidumps and the new Automatic memory dump.
1. Automatic memory dump
Size: Hsize of OS kernel
The Automatic memory dump is the default option selected when you install Windows 8. It was created to support the "System Managed" page file configuration which has been updated to reduce the page file size on disk. The Automatic memory dump option produces a Kernel memory dump, the difference is when you select Automatic, it allows the SMSS process to reduce the page file smaller than the size of RAM.
2. Complete memory dump
Size: Hsize of installed RAM plus 1MB
A complete (or full) memory dump is about equal to the amount of installed RAM. With many systems having multiple GBs, this can quickly become a storage issue, especially if you are having more than the occasional crash. Normally I do not advise saving a full memory dump because they take so much space and are generally unneeded. However, there are cases when working with Microsoft (or another vendor) to find the cause of a very complex problem that the full memory dump would be very helpful. Therefore, stick to the automatic dump, but be prepared to switch the setting to generate a full dump on rare occasions.
3. Kernel memory dump
Size: Hsize of physical memory "owned" by kernel-mode components
Kernel dumps are roughly equal in size to the RAM occupied by the Windows 8 kernel. On my test system with 4GB RAM running Windows 8 on a 64-bit processor the kernel dump was about 336MB. Since, on occasion, dump files have to be transported, I compressed it, which brought it down to 80MB. One advantage to a kernel dump is that it contains the binaries which are needed for analysis. The Automatic dump setting creates a kernel dump file by default, saving only the most recent, as well as a minidump for each event.~~
4. Small or minidump
Size: At least 64K on x86 and 128k on x64 (279K on my W8 test PC)
Minidumps include memory pages pointed to them by registers given their values at the point of the fault, as well as the stack of the faulting thread. What makes them small is that they do not contain any of the binary or executable files that were in memory at the time of the failure.
However, those files are critically important for subsequent analysis by the debugger. As long as you are debugging on the machine that created the dump file, WinDbg can find them in the System Root folders (unless the binaries were changed by a system update after the dump file was created). Alternatively the debugger should be able to locate them automatically through SymServ, Microsoft's online store of symbol files. Windows 8 creates and saves a minidump for every crash event, essentially providing a historical record of all events for the life of the system.
Configure W8 to get the right memory dumps
While the default configuration for W8 sets the OS to generate the memory dump format you will most likely need, take a quick look to be sure. From the W8 Style Menu simply type "control panel" (or only the first few letters in many cases) which will auto-magically take you to the Apps page where you should see a white box surrounding "Control Panel"; hitting Enter will take you to that familiar interface.
The path to check Windows 8 Memory Dump Settings, beginning at Control Panel, follows:
Control Panel | System and Security | System | Advanced system settings | Startup and Recovery | Settings
Once at the Startup and Recovery dialogue box (below) ensure that "Automatic memory dump" is checked. You will probably also want to ensure that both "Write an event to the system log" and "Automatically restart" (which should also be on by default) are checked.
To set your PC up for WinDbg-based crash analysis, you will need the following:
" 32-bit or 64-bit Windows 8/R2/Server 2012/Windows 7/Server 2008
Depending on the processor you are running the debugger on, you can use either the 32-bit or the 64-bit debugging tools. Note that it is not important whether the dump file was made on an x86-based or an x64-based platform.
" The Debugging Tools for Windows portion of the Windows SDK for Windows 8, which you can download for free from Microsoft.
" Approximately 103MB of hard disk space (not including storage space for dump files or for symbol files)
" Live Internet connection
First download sdksetup.exe, a small file (969KB) that launches the Web setup, from which you select what components to install.
" Automated download (the download will start on its own):
Ignore the disk space required of 1.2GB; you will only be installing a small portion of the kit. On my test machine the installation process predicted 256.2MB but only needed 103MB according to File Explorer following installation.
Install the Software Development Kit (SDK) to the machine that you will use to view memory dump files.
A. Launch sdksetup.exe.
B. Specify location:
The suggested installation path follows:
C:\Program Files (x86)\Windows Kits\8.0\
If you are downloading to install on a separate computer, choose the second option and set the appropriate path.
C. Accept the License Agreement
D. Remove the check marks for all but Debugging Tools for Windows
What are symbols and why do I need them?
Now that the debugger is installed and before calling up a dump file you have to make sure it has access to the symbol files. Symbol tables are a byproduct of compilation. When a program is compiled, the source code is translated from a high-level language into machine code. At the same time, the compiler creates a symbol file with a list of identifiers, their locations in the program, and their attributes. Since programs don't need this information to execute, it can be taken out and stored in another file. This reduces the size of the final executable so it takes up less disk space and loads faster into memory. But, when a program causes a problem, the OS only knows the hex address at which the problem occurred, not who was there and what the person was doing. Symbol tables, available through the use of SymServe, provide that information.
SymServ (also spelled SymSrv) is a critically important utility provided by Microsoft that manages the identification of the correct symbol tables to be retrieved for use by WinDbg. There is no charge for its use and it functions automatically in the background as long as the debugger is properly configured, and has unfettered access to the symbol store at Microsoft.~~
From the W8 UI, right-click on the version of WinDbg you will use (x64 or x86) then select "Run as administrator" from the bar that pops up from the bottom of the screen. You will then see a singularly unexciting application interface; a block of gray. Before filling it in with data you must tell it where to find the symbol files.
Setting the symbol File Path
There is a massive number of symbol table files for Windows because every build of the operating system, even one-off variants, results in a new file. Using the wrong symbol tables would be like finding your way through San Francisco with a map of Boston. To be sure you are using the correct symbols, at WinDbg's menu bar, select the following:
File | Symbol file path
In the Symbol search path window enter the following address:
Note that the address between the asterisks is where you want the symbols stored for future reference. For example, I store the symbols in a folder called symbols at the root of my c: drive, thus:
Make sure that your firewall allows access to msdl.microsoft.com.
How WinDbg handles symbol files
When opening a memory dump, WinDbg will look at the executable files (.exe, .dll, etc.) and extract version information. It then creates a request to SymServ at Microsoft, which includes this version information and locates the precise symbol tables to draw information from. It won't download all symbols for the specific operating system you are troubleshooting; it will download what it needs.
Space for symbol files
The space needed to store symbols varies. In my W8 test machine, after running numerous crash tests, the folder was about 35MB. On another system, running W7, and on which I opened dump files from several other systems the folder was still under 100MB. Just remember that if you open files from additional machines (with variants of the operating system) your folder can continue to grow in size.
Alternatively, you can opt to download and store the complete symbol file from Microsoft. Before you do, note that - for each symbol package - you should have at least 1GB of disk space free. That's because, in addition to space needed to store the files, you also need space for the required temporary files. Even with the low cost of hard drives these days, the space used is worth noting.
" Each x86 symbol package may require 750 MB or more of hard disk space.
" Each x64 symbol package may require 640 MB or more.
Symbol packages are non-cumulative unless otherwise noted, so if you are using an SP2 Windows release, you will need to install the symbols for the original RTM version and for SP1 before you install the symbols for SP2.
Create a dump file
What if you don't have a memory dump to look at? No worries. You can generate one yourself. There are different ways to do it, but the best way is to use a tool called NotMyFault created by Mark Russinovich.
To get NotMyFault, go to the Windows Internals Book page at SysInternals and scroll down to the Book Tools section where you will see a download link. The tool includes a selection of options that load a misbehaving driver (which requires administrative privileges). After downloading, I created a shortcut from the desktop to simplify access.
Keep in mind that using NotMyFault WILL CREATE A SYSTEM CRASH and while I've never seen a problem using the tool there are no guarantees in life, especially in computers. So, prepare your system and have anyone who needs access to it log off for a few minutes. Save any files that contain information that you might otherwise lose and close all applications. Properly prepared, the machine should go down, reboot and both a minidump and a kernel dump should be created.
Launch NotMyFault and select the High IRQL fault (Kernel-mode) then . . . hit the Crash button. Your Frown-of-Frustration will appear in a second, both a minidump and a kernel dump file will be saved and - if properly configured - your system will restart.
Over the W8 UI will be a band of blue with the message that "Your PC ran into a problem . . . ". If you click the "Send details" button, Microsoft will use WinDbg and the command "!analyze" as part of an automated service to identify the root cause of the problem. The output is combined with a database of known driver bug fixes to help identify the failure.
Launch WinDbg and (often) see the cause of the crash
Launch WinDbg by right-clicking on it from the W8 UI then select "Run as administrator" from the bar that pops up at the bottom of the screen. Once the debugger is running, select the menu option
File | Open Crash Dump
and point it to open the dump file you want to analyze. Note that WinDbg will open any size dump file; a minidump, kernel dump or complete dump file. When offered to Save Workspace Information, say Yes; it will remember where the dump file is.
A command window will open. If this is the first time you are using WinDbg on this system or looking at a dump file from another system you have not loaded files for before, it may take a moment to fill with information. This is because the debugger has to identify the precise release of Windows then go to SymServ at Microsoft and locate the corresponding symbol files and download the ones it needs. In subsequent sessions this step is unneeded because the symbols are saved on the hard drive. Once WinDbg has the symbols it needs it will run an analysis and fill the window with the results. This will include basic information such as the version of WinDbg, the location and name of the dump file opened, the symbol search path being used and even a brief analysis offering, in this case,
Probably caused by : myfault.sys
which, of course, we know to be true (myfault.sys is the name of the driver for NotMyFault).
WinDbg Error Messages
If WinDbg reports a *** WARNING or an *** ERROR, the solution is usually simple. The following lists the common messages, what they mean and how to resolve them.
*** WARNING: Unable to verify timestamp for ntoskrnl.exe
*** ERROR: Module load completed but symbols could not be loaded for ntoskrnl.exe
This is important. When you see these two messages near the beginning of the output from WinDbg, it means that you will not get the analysis that you need. This is confirmed after the "Bugcheck Analysis" is automatically run, and the message
***** Kernel symbols are WRONG. Please fix symbols to do analysis
Likely causes follow:
" No path/wrong path; a path to the symbol files has not been set or the path is incorrect (look for typos such as a blank white space). Check the Symbol Path.
" Failed connection; check your Internet connection to make sure it is working properly.~~
" Access blocked; a firewall blocked access to the symbol files or the files were damaged during retrieval. See that no firewall is blocking access to msdl.microsoft.com (it may only be allowing access to www.microsoft.com).
Note that if a firewall initially blocks WinDbg from downloading a symbol table, it can result in a corrupted file. If unblocking the firewall and attempting to download the symbol file again does not work; the file remains damaged. The quickest fix is to close WinDbg, delete the symbols folder (which you most likely set at c:\symbols), and unblock the firewall. Next, reopen WinDbg and a dump file. The debugger will recreate the folder and re-download the symbols.
Do not go further with your analysis until this is corrected.
If you see the following error, no worries:
*** WARNING: Unable to verify timestamp for myfault.sys
*** ERROR: Module load completed but symbols could not be loaded for myfault.sys
This means that the debugger was looking for information on myfault.sys. However, since it is a third-party driver, there are no symbols for it, since Microsoft does not store all of the third-party drivers. The point is that you can ignore this error message. Vendors do not typically ship drivers with symbol files and they aren't necessary to your work; you can pinpoint the problem driver without them.
So, what caused the crash?
As mentioned above, when you open a dump file with WinDbg it automatically runs a basic analysis that will often nail the culprit without even giving the debugger any direct commands as shown in the screen where it says "Probably caused by : myfault.sys"
Getting a little more information about the crash event and the suspect module is easy. Often, all you need is two commands among the hundreds that the rather powerful debugger offers:
A new way to command WinDbg
Normally, you would type in the commands and parameters you need. Things have changed, however, and Windows too. If you take a good look at the WinDbg interface, just below the "Bugcheck Analysis" box, it says "Use !analyze -v to get detailed debugging information" and that the command is underlined and in blue. Yes, it's a link. Just touch it and the command will be run for you. But, in case you don't have a touch screen, a mouse will work fine or resort to the traditional method of typing the command into the window at the bottom of the interface where you see the prompt "kd>" (which stands for "kernel debugger"). Be sure to do it precisely; this is a case where syntax is key. For instance, note the space between the command and the "-v". The "v" or verbose switch tells WinDbg that you want all the details. You can do the same where you see the link for myfault which will display metadata for the suspect driver.
Output from !analyze -v
The analysis provided by !analyze -v is a combination of English and programmer-speak, but it is nonetheless a great start. In fact, in many cases you will not need to go any further. If you recognize the cause of the crash, you're probably done.
Output from !analyze -v
The !analyze -v provides more detail about the system crash. In this case it accurately describes what the test driver (myfault.sys) was instructed to do; to access an address at an interrupt level that was too high.~~
An attempt was made to access a pageable (or completely invalid) address at an interrupt request level (IRQL) that is too high. This is usually caused by drivers using improper addresses.
Under Debugging Details the report suggests that the problem was a "WIN_8_DRIVER_FAULT" and that it NotMyFault.exe was active.
An important feature of the debugger's output using !analyze -v is the stack text. Whenever looking at a dump file always look at the far right end of the stack for any third-party drivers. In this case we see myfault. Note that the chronologic sequence of events goes from the bottom to the top; as each new task is performed by the system it shows up at the top. In this rather short stack you can see that myfault was active, then a page fault occurred, and the system declared a BugCheck, which is when the system stopped (Blue Screened).
One way to look at this is that when you see a third-party driver active on the stack when the system crashed, it is like walking into a room and finding a body on the floor and someone standing over it with a smoking gun in his hand; it doesn't mean that he is guilty but makes him suspect No.1.
Output from lmvm (or by selecting myfault)
Knowing the name of a suspect is not enough; you need to know where he lives and what he does. That's where lmvm comes in. It provides a range of data from this image path (not all drivers live in %systemroot%\system32\drivers.), time stamp, image size and file type (in this case a driver) to the company that made it, the product it belongs to, version number and description. Some companies even include contact information for technical support. What the debugger reports, though, is solely dependent upon what the developer included, which, in some cases, is very little.
After you find the vendor's name, go to its Web site and check for updates, knowledge base articles, and other supporting information. If such items do not exist or do not resolve the problem, contact them. They may ask you to send along the debugging information (it is easy to copy the output from the debugger into an e-mail or Word document) or they may ask you to send them the memory dump (zip it up first, both to compress it and protect data integrity).
If you have any questions regarding the use of WinDbg, check out the WinDbg help file. It is excellent. And, when reading about a command be sure to look at the information provided about the many parameters such as "-v" which returns more (verbose) information.
The other third
While it's true that, by following the instructions above, you'll likely know the cause of two out of three crashes immediately; that does leave that annoying other third. What do you do then? Well, the list of what could have cause the system failure is not short; it can range from a case fan failing, allowing the system to overheat, to bad memory.
Sometimes it's the hardware
If you have recurring crashes but no clear or consistent reason, it may be a memory problem. Two good ways to check memory are the Windows Memory Diagnostic tool and Memtest86. Go to Control Panel and enter "memory" into its search box then select "Diagnose your computer's memory problems".
This simple diagnostic tool is quick and works great. Many people discount the possibility of a memory problem, because they account for such a small percentage of system crashes. However, they are often the cause that keeps you guessing the longest.
Is Windows the culprit?
In all probability: no. For all the naysayers who are quick to blame Redmond for such events, the fact is that Windows is very seldom the cause of a system failure. But, if ntoskrnl.exe (Windows core) or win32.sys (the driver that is most responsible for the "GUI" layer on Windows) is named as the culprit -- and they often are - don't be too quick to accept it. It is far more likely that some errant third-party device driver called upon a Windows component to perform an operation and passed a bad instruction, such as telling it to write to non-existent memory. So, while the operating system certainly can err, exhaust all other possibilities before you blame Microsoft.
What about my antivirus driver?
Often you may see an antivirus driver named as the culprit but there is a good chance it is not guilty. Here's why: for antivirus code to work it must watch all file openings and closings. To accomplish this, the code sits at a low layer in the OS and is constantly working so that he will often be on the stack of function calls that was active when the crash occurred.
Missing vendor information?
Some driver vendors don't take the time to include sufficient information with their modules. So if lmvm doesn't help, try looking at the subdirectories on the image path (if there is one). Often one of them will be the vendor name or a contraction of it. Another option is to search Google. Type in the driver name and/or folder name. You'll probably find the vendor as well as others who have posted information regarding the driver.
Bear in mind that the time it took you to read this primer and to configure WinDbg on your system is far more effort than you will need to solve two of three crashes. Indeed, most crash analysis efforts will take you less than one minute. And, while the other third can certainly be more challenging, at least you'll have more time to try.
Smith is president of Alexander LAN Inc., a freelance consultant and writer in IT. He can be reached at DirkADSmith@gmail.com.
Read more about software in Network World's Software section.