Well I was thinking, I need another tech blog entry up here so I looked through my list of drafts to see which one I could most quickly firm up. My drafts are things that I want to write about because someone asked a question or because I think it is cool or important or simply so I don’t forget something I learned or figured out.
Currently the drafts list sort of looks like
Your Drafts: NTDS Connection Object options attribute values, Setting a password in AD with ADMOD, D-who??? What is this DSID thing and do I care?, Reducing traffic on the wire when using ADSI, My queries get really slow once I use bitwise filters…., Using auxiliary objectClasses – Static VS Dynamic, Modifying lockout policy from the command line… , Changing interval that lastLogonTimeStamp gets updated…, Protecting command line parameters from the command interpreter, Replication metadata in Windows Server 2003 AD and ADAM, Encoding binary values for LDAP Queries….
and there are actually more, but that is enough to give a sample. I think of a lot of stuff and I type fast, but unfortunately I don’t type as fast as I think, and any time I try to type that fast I end up saying 14 things at once with none of it making sense. 🙂
So onto the topic…
D-Who??? What is this DSID thing and do I care?
One of the big issues with many programs is that they give very crappy error information when they have a problem if they choose to tell you they had a problem at all. My programs aren’t exempt from this either, I have some apps that give horrendous error message help. The problem is often that you don’t really know what kind of errors may be encountered or you are just being lazy and don’t check for errors in all of the places you should. This is especially true if you are the type to copy and paste code instead of actually writing code because 99% of the examples out there have absolutely no error checking. This is mostly because the people writing the examples can’t be bothered or don’t actually know how to do it properly but use the very handy excuse of
“For purposes of clarity, full error checking is not provided in the code examples.”
This goes double for using proper security programming practices when opening/creating objects where you have the opportunity to provide a security descriptor for the operation. NULL is much easier to use and looks much cleaner. It can be said that hackers tend to prefer it as well.
So you have this program doing something to/with AD on your behalf and blammo it breaks. It doesn’t work. You don’t get what you want. Snake Eyes. Whatever. More than likely you will get some crappy error like
C:\>repadmin /replsingleobj 2k3dc02 2k3dc04 ou=testou,dc=joe,dc=com
[d:\r2\ds\adam\src\util\repadmin\repdsrep.c, 1266] LDAP error 1 (Operations Error) Win32 Err 110.
or even worse, you don’t get any error and instead see no more output or get “The command did not complete successfully.” or “things have gone pear-shaped dude” or something like that and all you can do is WISH you got the crappy error message above.
So you look at the message and think
“My what a handy error message, let me just go look on my D: drive for the file d:\r2\ds\adam\src\util\repadmin\repdsrep.c so I can look at line 1266 to see what the problem is…”
You peek around and lo and behold, you most likely don’t find it. Ok, so maybe the rest of the error message is helpful…
LDAP error 1, Operations Error… err nope, not much help there.
Ok how about Win32 Err 110…
C:\>net helpmsg 110
The system cannot open the device or file specified.
Don’t know about you, but I didn’t think I was opening any devices or files…
Hmm I guess I have rolled snake eyes, now it is time to start randomly guessing what I have done wrong and try different values until I finally get it to work…..
WRONG!
Instead you start up Ethereal, because you learned long ago that when troubleshooting problems with programs that run across the network, it is handy to actually know what is going across the network. So now Ethereal is running. If you know what kind of traffic may have been generated, you can specify a filter to narrow down the amount of traffic, which I highly recommend. Obviously you don’t always know enough, so you have to spend a week and a half looking at every packet of an entire 1GB trace of a 15 minute period from a chatty Exchange Public Folder server, which sucks, but that is another story. In this story I happen to know that this operation was done over LDAP, so I enter a handy dandy capture filter of “tcp port 389” and tell Ethereal to start capturing packets.
I run the command and lo and behold I capture 36 packets, so I can look through them.
Looking quickly through the packet info fields at the summaries I see one that says
MsgId=22 Modify Result, operationsError
seeing that our previous error said (and I repeat myself)
LDAP error 1 (Operations Error)
there is a good possibility that that packet may have the necessary info I need.
Opening the packet I see
LDAP Message, Modify Result
Message Id: 22
Message Type: Modify Result (0x07)
Message Length: 74
Response To: 29
Time: 0.001056000 seconds
Result Code: operationsError (0x01)
Matched DN: (null)
Error Message: 0000208D: SvcErr: DSID-032107D9, problem 5012 (DIR_ERROR), data 1\n
Interesting. That is more info, but is it any more help? What the heck does it mean?
Well that last line holds a message that is called an extended error message and nearly all AD LDAP query/mod failures have one though rarely do programs actually tell you about them. There are three main pieces of info here. They are
I. Error Message: 0000208D
II. SvcErr: DSID-032107D9,
III. problem 5012 (DIR_ERROR), data 1
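If you end up staring at a lot of these, splitting the string into those pieces is easy to script. Here is a minimal Python sketch; the pattern is guessed purely from the two sample strings quoted in this post, so the field layout and the function name are assumptions, not a documented format.

```python
import re

# Pattern based only on the sample strings quoted in this post; other servers
# or versions may lay the message out differently, so a failed match just
# means "unknown format", not "no error".
EXT_ERR = re.compile(
    r"^(?P<code>[0-9A-Fa-f]{8}):\s*"      # piece I: hex error code
    r"(?P<kind>\w+):\s*"                   # SvcErr, LdapErr, ...
    r"DSID-(?P<dsid>[0-9A-Fa-f]{8}),\s*"   # piece II: the DSID
    r"(?P<detail>.*)$",                    # piece III: everything after it
    re.DOTALL,
)


def split_extended_error(message):
    """Return the pieces of an extended error string as a dict, or None."""
    match = EXT_ERR.match(message.strip())
    if match is None:
        return None
    pieces = match.groupdict()
    # Keep the decimal form too, since some tools want decimal input.
    pieces["code_decimal"] = int(pieces["code"], 16)
    return pieces


sample = "0000208D: SvcErr: DSID-032107D9, problem 5012 (DIR_ERROR), data 1\n"
print(split_extended_error(sample))
```

Anything that does not match comes back as None, so you know to go read the raw string yourself.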
Let me start from the last piece and work back to the first piece because in this case the first piece is actually the most useful but I still want to tell you about the other pieces because you might have wondered about them.
So in the last piece the part I tend to focus on is the value in the parens. In this case it is DIR_ERROR. That really isn’t too entirely helpful; you now know you have a Directory Error. For you non-LDAP folks this isn’t a file directory, it is an LDAP Directory error. Sometimes this last section of info is more helpful, for example I posted in this previous post another extended error of
Extended Error: 000004DC: LdapErr: DSID-0C09062B, comment: In order to perform this operation a successful bind must be completed on the connection., data 0, vece
You will note that the comment: string is much more useful than the message we got in this case. DIR_ERROR is undoubtedly useful to someone, just not the normal admins out there running into this problem.
So on to the next piece, which is the mysterious and beautiful DSID… Specifically DSID-032107D9. Ah, I know EXACTLY what the problem is now! Well no, I lied. Unless you are with MS or otherwise have access to a special little program called DSID.EXE, you won’t know what this number means. It is actually quite a brilliant little item; it is a direct pointer to the actual line of code where the issue was flagged. I told you it was brilliant. You should wish this was in every error message from everything from MS. After you run DSID.EXE with the DSID returned from AD you will see a file name and a line number. I would show what it looks like, but someone somewhere would probably think I should get in trouble for doing that because all sorts of trouble can occur if people get to see a source file name and source line number. Err… Ok, now if you have Microsoft Windows Operating System source code access, you open that file, look for that line number, and you will likely have a much better idea of what is wrong. It isn’t guaranteed, but it can be very useful. A few quick notes about DSID.EXE…
1. There is a KB article (Q249256) that talks about getting DSID.EXE from Microsoft to help troubleshoot intra-site replication failures. Specifically it says
13. You may receive the following error message when you run the previous Repadmin.exe command:
The security context could not be established due to a failure in the requested quality of service.
If you do, turn up internal processing and look for “DSID”s. Contact Microsoft Product Support Services (PSS) for information about how to obtain the Dsid.exe tool. For information about how to contact Microsoft PSS, visit the following Microsoft Web site:
http://support.microsoft.com/
Hopefully from what I said above you realize how fruitless it is to get this program, because most everyone who will be troubleshooting intrasite replication failures will NOT have the requisite source code access to make any kind of use of the data DSID.EXE provides. Asking for the tool is probably just going to be met with blank stares because the PSS person you get on the phone doesn’t know anything about the tool, or will fight with you about how you aren’t going to get it, and all the while you are thinking “MY GOD, IF ONLY I HAD DSID.EXE I COULD TOTALLY FIX MY PROBLEM!!!”. No, don’t bother asking for it. It is not going to help you. At best you get to learn the name of the file your problem is in, and that usually is nowhere near enough to actually figure anything out.
2. If you get a nice DSID message like this and you resort to posting in a public newsgroup or a listserv like ActiveDir.Org or what not, post the entire message as well as the version of AD or ADAM you are using (Windows 2000 SP4, Windows Server 2003, ADAM SP1, and so on). Why? Because the DSID value does not incorporate the version of the binary file the DSID was generated from, and the source code files don’t change their names for every new Service Pack or OS version, so a DSID could be from any of the versions of AD/ADAM. This means that if you hook someone in to help you who does have access to the DSID.EXE tool AND the source code, they have to either start talking to you or guess which version of the source file they should look at. Some of those folks will look at the several most likely candidates and then try to respond; others will ignore your post and move on to another where someone provided enough info that they can fire off a quick response that hopefully won’t generate more questions. I can’t say for sure, but I think there are some folks who filter newsgroup postings and have the ones with DSIDs in them highlighted or something, because they seem to really target those posts. I don’t have a problem with that; the folks that do that are usually on the Dev team or very close to it, know what they are doing, and are very busy, and I am happy to see them helping at all. If focusing just on messages that can be answered fairly authoritatively and quickly helps them to help us, have at it, I am totally in your corner.
Ok so where were we… post the version of AD you are using when posting a DSID and don’t bother asking PSS for DSID.EXE….
Err… what was my last point… oh yeah
3. Do Google for DSIDs. Even though the values do often change between versions of the binaries there is a possibility you can still learn something useful from a given DSID value from various blogs/KB articles/other web pages. This has worked for me on occasion when I didn’t have time to go source surfing and none of the fields of the extended error message gave me enough to be useful. Primarily google the Microsoft newsgroups and the site:support.microsoft.com, those are the two most likely sources of help. If nothing there, open it up to everything and hope fervently for a blog post like this one.
Ok so we will assume you DO NOT have DSID.EXE and more importantly you DO NOT have source code access to the DS branch of the Operating System. For those of you that do, please be off and have a nice time. (wave)
So what is that first chunk of info and how is it useful?
The first piece of info is a hexadecimal error code and can normally be readily converted to something you *may* understand with a snazzy little program called ERR.EXE. Unlike DSID.EXE this is readily available and quite useful to most everyone. I have to give kudos to the Exchange team because that is where it is from. Kudos Exchange Team.
You can get ERR.EXE here and if you don’t have it, you probably should as it can be quite useful for decoding all sorts of errors from all sorts of programs.
Feeding this error code through Err gets us
C:\>err 208D
# for hex 0x208d / decimal 8333 :
ERROR_DS_OBJ_NOT_FOUND winerror.h
# Directory object not found.
# 1 matches found for "208D"
Thankfully, it is only one message. Sometimes you will get 30 different messages from 30 different .h (header) files and you will have to look at each one to ascertain which one applies this time.
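If ERR.EXE isn't handy, the first half of what it did here is just a hex-to-decimal conversion, which is worth having because tools like net helpmsg want the decimal form. A small Python sketch follows; the lookup table is a stand-in holding only the two codes quoted in this post, not a copy of winerror.h, and the function name is made up for illustration.

```python
# Stand-in table with only the two codes quoted in this post. A real lookup
# would need the full set of header definitions that ERR.EXE searches.
KNOWN_CODES = {
    110: "The system cannot open the device or file specified.",
    8333: "ERROR_DS_OBJ_NOT_FOUND: Directory object not found.",
}


def describe(hex_code):
    """Convert a hex error code string to decimal and look it up."""
    value = int(hex_code, 16)  # leading zeros are fine: "0000208D" works
    text = KNOWN_CODES.get(value, "(not in this tiny table)")
    return f"0x{value:x} / decimal {value}: {text}"


print(describe("0000208D"))
```

The decimal value it prints for 0000208D is 8333, matching the ERR.EXE output above.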
Anyway, that is all kinds of more useful than what we had before, we have the beginnings of the understanding of what the problem might be even though the developer gave us no help at all. 🙂
The question is no longer, “What the frack is going on?” and is now “Which frackin object can’t be found??” This is a not-so-subtle and important increase in our level of understanding of the possible issue.
If you have only specified a single object by name via DN, this isn’t too terribly difficult to work out unless you have trouble eating pudding safely. If however you have multiple items involved or you have no clue what items are involved then you may have a bit of a problem. Fortunately, you have a network trace!
You look at the summary line again and it says
MsgId=22 Modify Result, operationsError
This means that you should be able to find a matching Modify Request packet with the same MsgId of 22… And the line
Response To: 29
probably should be a measure of help as well, since that is the frame number of the request in the trace…
Lo and behold, you do find it
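When the trace is bigger than 36 packets, that matching step is worth scripting. A rough Python sketch of pairing requests and results by message ID: the rows are hypothetical hand-typed summaries shaped like the packet list lines in this post (frame numbers taken from the trace fields shown here), and it assumes everything came from a single LDAP connection, since message IDs are only meaningful per connection.

```python
# Hypothetical summary rows typed in by hand; in practice you would export
# these from your capture tool.
rows = [
    {"frame": 29, "msg_id": 22, "summary": "Modify Request"},
    {"frame": 30, "msg_id": 22, "summary": "Modify Result, operationsError"},
]


def pair_by_msg_id(rows):
    """Pair each result row with the earlier request sharing its MsgId."""
    pending = {}  # msg_id -> request row still waiting for its result
    pairs = []
    for row in rows:
        if row["summary"].endswith("Request"):
            pending[row["msg_id"]] = row
        elif row["msg_id"] in pending:
            pairs.append((pending.pop(row["msg_id"]), row))
    return pairs


for request, result in pair_by_msg_id(rows):
    print(f"frame {request['frame']} -> frame {result['frame']}: {result['summary']}")
```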
MsgId=22 Modify Request
Let’s open it up and have a peek shall we
LDAP Message, Modify Request
Message Id: 22
Message Type: Modify Request (0x06)
Message Length: 92
Response In: 30
Distinguished Name: (null)
Replace: replicateSingleObject
Value: <guid=2k3dc04>:ou=testou,dc=joe,dc=com
Ok, so it is a request to replace the value of the attribute replicateSingleObject with the actual value of <guid=2k3dc04>:ou=testou,dc=joe,dc=com.
So anything stick out there? I see two different objects specified and they are separated by a colon. The second object is a full DN, so that is easy to check, and sure enough ou=testou,dc=joe,dc=com does exist… Now about that first item… That doesn’t even look correct… I wonder if I even have a DC called 2k3dc04, and even if I do, would that value even resolve to it? Let me check. Oh, haha ha. No I don’t. Obviously the program tried to look up something for 2k3dc04 and failed, and instead of telling me it failed to look something up, it just went right ahead and shoved 2k3dc04 into the field like it had been decoded fine and shipped it over the wire so the DC could say no, that isn’t correct.
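The "that doesn't even look correct" check can be made mechanical. A small Python sketch, going only on what the trace shows: it assumes the value is laid out as <guid=X>:DN and that X is supposed to be an actual GUID, and both function names are made up for illustration.

```python
import uuid


def split_replicate_value(value):
    """Split '<guid=X>:DN' into (X, DN). Layout assumed from the trace."""
    head, sep, dn = value.partition(">:")
    if not sep or not head.lower().startswith("<guid="):
        raise ValueError("unexpected value layout: " + value)
    return head[len("<guid="):], dn


def looks_like_guid(text):
    """True if text parses as a GUID; uuid.UUID raises ValueError if not."""
    try:
        uuid.UUID(text)
    except ValueError:
        return False
    return True


guid_part, dn = split_replicate_value("<guid=2k3dc04>:ou=testou,dc=joe,dc=com")
print(guid_part, looks_like_guid(guid_part))
print(dn)
```

Here the first item fails the GUID check, which is the same thing my eyeballs flagged.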
Problem solved! And the developer didn’t help us one bit. ;o)
BTW, ADFIND and ADMOD will both output extended error info if you specify -exterr as one of the switches. LDP will also kick that info out for you as well.
joe
Note: At $3 a word, this would have paid about $8100.00. At $1 a word it still would have paid about $2700.
Joe,
Just looking at your ActiveDir posts, I think you are a freakin’ millionaire by now! 😉
Jorge
That is one of the best DS troubleshooting related articles I’ve read :).
Comment to the note: maybe You should consider putting option to donate Your blog 🙂 – I don’t know what rate per word You would get, if any but .. 🙂 it’s an option.