I get a chuckle out of how many places will see SYMPTOMS of some sort and the first thing they come up with is to _decide_ that the problem is that they need to reboot a DC. If I had to give this form of troubleshooting a name, I would give it something like “Wishful thinking”, “I should be fired”, or “I can’t be bothered to do my job”.
This is especially popular in larger orgs with a mix of decentralized and centralized support resources and domain controllers. Someone will have maybe once seen that rebooting a DC “solved” a problem, and that becomes THE solution. You will get someone out in a remote site or in an application group who can somehow get physical access to a DC (or a server in general) and who will take it as their job to reboot whatever server they think needs to be rebooted, regardless of whether it actually does. Alternately, maybe they will “request” the reboot, and I put the word request in quotes because I don’t mean request; I mean they whine and complain and demand that you reboot because they “know” that will solve the problem.
The proper answer is to try to work out what the problem is, especially if it is recurring. I have walked into environments where I have been told the solution to a problem is to just reboot the DC when it occurs. That is not a solution; it is a bandage to alleviate symptoms. A solution involves actually troubleshooting and working out the real issue. And you can’t do that if the first thing you do is reboot.
This silly type of “troubleshooting” is not just something low-skill admins come up with; unfortunately, it is also something I have heard come from the mouths of MSFT people, more often MCS (Consulting) folks than PSS. Most of the senior folks in PSS are very good in this area; the last thing they want is a reboot, because it often erases all evidence of what is going on if the problem really is on the server being rebooted.
As one quick actual example, there was one company I went into back in 2001 that had some issues. Approximately 80% of the DCs in one domain (so about 80 DCs) weren’t actually working properly, but they had no monitoring and no one proactively looking at things. The troubleshooting mechanism was that if anyone at a site complained, the DC was rebooted. The local site people were actually trained that that was the solution, so their tickets changed from “we see these symptoms” to “reboot this DC”, which the centralized people would simply process. (This is a firing offense in my opinion; trouble tickets are like you telling your doctor your symptoms: he/she is supposed to take that and really work through the problem, not listen to you say what you would like done and then just do it.) When my group took these DCs over we were getting double-digit reboot requests a week, and our immediate response was “no, not going to happen”: you tell us what is going on and what you think is wrong, and we will take it from there. I can’t begin to explain how badly this pissed some people off because they just wanted it done. I had high-level escalations, etc., and thankfully this is an easy battle to win if your management aren’t complete idiots. The argument is “I would like to figure out WHY this needs to be done every X days/months/etc. versus just doing it, and maybe we can remove the need for the reboot entirely and give more availability.”… See how I got that availability keyword in there? Mucho helpful. Anyway, I don’t think there was a single issue we didn’t track down to specific items we were able to correct, and within 3-6 months the environment stabilized dramatically due to the lack of reboots and to actually having everything configured properly.
Don’t get me wrong, sometimes a reboot is the answer, even the correct one. But you need to work to understand WHY it is. Rebooting because you know it will alleviate symptoms is not troubleshooting; don’t pretend that it is. If you do reboot, what other steps are you taking to ascertain what that reboot actually did to solve the issue and then to prevent those things from occurring again?
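If nothing else, capture the DC’s state before the reboot so the evidence isn’t wiped out. Purely as an illustration, here is a minimal Python sketch that dumps dcdiag and repadmin output to time-stamped files first. It assumes Python and the standard Windows support tools are on the box; the snapshot directory name and the particular command list are just placeholder choices of mine, not anyone’s official procedure.

```python
"""
Minimal sketch: save a DC's diagnostic state to time-stamped files
BEFORE any reboot, so there is still evidence to analyze afterward.
Assumes it runs on the DC with dcdiag and repadmin available on the PATH.
"""
import datetime
import pathlib
import subprocess

# Commands whose output we want to preserve; adjust to taste.
SNAPSHOT_COMMANDS = {
    "dcdiag":      ["dcdiag", "/v"],             # overall DC health, verbose
    "showrepl":    ["repadmin", "/showrepl"],    # replication partners and last results
    "replsummary": ["repadmin", "/replsummary"], # replication summary across partners
}

def snapshot(output_root: str = r"C:\DiagSnapshots") -> pathlib.Path:
    """Run each diagnostic command and write stdout/stderr to its own file."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    out_dir = pathlib.Path(output_root) / stamp
    out_dir.mkdir(parents=True, exist_ok=True)

    for name, cmd in SNAPSHOT_COMMANDS.items():
        result = subprocess.run(cmd, capture_output=True, text=True)
        (out_dir / f"{name}.txt").write_text(
            f"command: {' '.join(cmd)}\nreturn code: {result.returncode}\n\n"
            f"{result.stdout}\n--- stderr ---\n{result.stderr}"
        )
    return out_dir

if __name__ == "__main__":
    print(f"Diagnostics saved to {snapshot()}")
```

Having a snapshot like that means that after the reboot “fixes” things, you can actually go back and see what was broken instead of shrugging and waiting for the next occurrence.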
joe