Recently I was searching for some specific OU’s in Active Directory that had a specific substring within the name. i.e. it didn’t start with a specific substring, the substring I wanted was buried in the main string like for example the substring Crab in the string Joe’s Crab Shack which would be represented as *crab* or in this specific case the LDAP filter looked like OU=*something. This is called a medial search.
I knew that although I had more than a half a million objects in the directory I only had about 25 OU’s to search through so even though medial searches can be slower than normal searches, with only a handful of OU’s this really shouldn’t be too much of an issue. Surprise… It was. The query took a considerable amount of time to run and it really really shocked me. I could have output the relatively few OU’s and manually found the ones I wanted faster as it took over a minute to find the OU’s I needed.
This really caught my curiosity so I started digging into it. The very first step is to validate that the OU attribute was indexed, I was pretty confident it was but you need to start somewhere:
G:>adfind -sc s:ou searchflags
AdFind V01.40.00cpp **BETA** Joe Richards (joe@joeware.net) February 2009
Using server: r2dc1.test.loc:389
Directory: Windows Server 2003
Base DN: CN=Schema,CN=Configuration,DC=test,DC=locdn:CN=Organizational-Unit-Name,CN=Schema,CN=Configuration,DC=test,DC=loc
>searchFlags: 1 [INDEX(1)]1 Objects returned
So yes, it was indexed. Next step is to look at the STATS of the query and make sure the index was being used…
G:\>adfind -default -f ou=*something -stats+only
AdFind V01.40.00cpp **BETA** Joe Richards (joe@joeware.net) February 2009
Using server: r2dc1.test.loc:389
Directory: Windows Server 2003
Base DN: DC=test,DC=locStatistics
=================================
Elapsed Time: 89679 (ms)
Returned 0 entries of 714377 visited – (0.00%)Used Filter:
(ou=*something)Used Indices:
DNT_index:485112:NAnalysis
———————————
Hit Rate of 0.00% is InefficientNo dedicated indices used for search, this is inefficient.
Indices used:
Index Name : DNT_index
Record Count: 485112 (estimate)
Index Type : Normal Attribute IndexFilter Breakdown:
(
(ou=*something)
)
WHOA! It isn’t using the OU attribute index, it is using the DNT index which is everything… Look at that 714377 entries visited… Now that doesn’t make much sense… Let’s take a step back, what is used if we use OU=*?
G:\>adfind -default -f ou=* -stats+only
AdFind V01.40.00cpp **BETA** Joe Richards (joe@joeware.net) February 2009
Using server: r2dc1.test.loc:389
Directory: Windows Server 2003
Base DN: DC=test,DC=locStatistics
=================================
Elapsed Time: 150 (ms)
Returned 22 entries of 23 visited – (95.65%)Used Filter:
(ou=*)Used Indices:
idx_ou:23:NAnalysis
———————————
Hit Rate of 95.65% is EfficientIndices used:
Index Name : idx_ou
Record Count: 23 (estimate)
Index Type : Normal Attribute IndexFilter Breakdown:
(
(ou=*)
)
Now that is more like it… 23 entries.
But what is happening, that makes seriously no sense. The logic of the Query Processor should be such that if you are searching for an attribute, even via a medial search, if there is a regular index… Use it, don’t use the DNT index. Even if it only cut it down by 10% that is still a nice reduction, in this case it would cut it from more than half a million objects to 23… That is a serious reduction.
So I tested some more and it seems that any time I tried to run a medial search against an attribute that had a normal index (searchFlags & 1) the Query Processor would fail to utilize the index even if the number of choices that had to be looked at was substantially reduced from the DNT index. The only time this worked ok was when an actual tuple (or medial) index was defined for the attribute like so.
G:\>adfind -default -f ou=*something -stats+only
AdFind V01.40.00cpp **BETA** Joe Richards (joe@joeware.net) February 2009
Using server: r2dc1.test.loc:389
Directory: Windows Server 2003
Base DN: DC=test,DC=locStatistics
=================================
Elapsed Time: 60 (ms)
Returned 0 entries of 0 visited – (0.00%)Used Filter:
(ou=*something)Used Indices:
idx_ou:0:TAnalysis
———————————
Hit Rate of 0.00% is InefficientIndices used:
Index Name : idx_ou
Record Count: 0 (estimate)
Index Type : Tuple Index (medial substring index)Filter Breakdown:
(
(ou=*something)
)
Note the index type… tuple. This of course is a workaround for this problem but you don’t generally want to go around making a bunch of tuple indexes as it breaks up the strings into chunks of 3 characters for comparison which could rather dramatically impact DIT size not to mention insertion of new instances of values for that attribute.
I tested this problem against Windows Server 2003 and Windows Server 2008 and decided to ping some friends of mine in the production groups at Microsoft in Redmond to see what they thought. Rather quickly Dmitri came back to me and said it looked like this was an area of improvement and to get it into the change request system to get it looked at. He gave me specific details and pointed out where I could find it in the source code to see what it was doing as well. Not only that, he gave me a second workaround that can used until such a time that Microsoft corrects the issue.
There are two solutions to this issue.
1. The first is as we saw above, define a tuple index. That requires a schema change and if you want to practice in your lab you can quickly do this like so…
adfind -sc s:ou searchflags -adcsv | admod searchflags::{{.:SET:32}}
and to clear it you use
adfind -sc s:ou searchflags -adcsv | admod searchflags::{{.:CLR:32}}
Then the attribute will look like
G:\>adfind -sc s:ou searchflags
AdFind V01.40.00cpp **BETA** Joe Richards (joe@joeware.net) February 2009
Using server: r2dc1.test.loc:389
Directory: Windows Server 2003
Base DN: CN=Schema,CN=Configuration,DC=test,DC=locdn:CN=Organizational-Unit-Name,CN=Schema,CN=Configuration,DC=test,DC=loc
>searchFlags: 33 [INDEX(1);TUPLE INDEX(32)]1 Objects returned
2. The second workaround is what Dmitri clued me on to… Using what he termed as an “Index Hint”. Basically you add one more piece to the query that forces the Query Processor to look at the indexes again and actually chose correctly. It kind of sucks that it needs to be done but it is nice that it can be done at least. Changing the query to be “&(ou=*something)(ou=*)” will perform as you would expect (ou=*something) should perform all on its own. The (ou=*) is the hint that the Query Processor needs…
G:\>adfind -default -f “&(ou=*something)(ou=*)” -stats+only
AdFind V01.40.00cpp **BETA** Joe Richards (joe@joeware.net) February 2009
Using server: r2dc1.test.loc:389
Directory: Windows Server 2003
Base DN: DC=test,DC=locStatistics
=================================
Elapsed Time: 20 (ms)
Returned 0 entries of 23 visited – (0.00%)Used Filter:
( & (ou=*something) (ou=*) )Used Indices:
idx_ou:23:NAnalysis
———————————
Hit Rate of 0.00% is InefficientIndices used:
Index Name : idx_ou
Record Count: 23 (estimate)
Index Type : Normal Attribute IndexFilter Breakdown:
(
(&
(ou=*something)
(ou=*)
)
)
Still not as efficient as a Tuple index but it doesn’t have the overhead.
I have been working with AD for almost 10 years now, I had no clue about this until just this week when I ran into it. I have always known medial searches to be a bit more slow than your normal searches but never had the time to really dig into it like this and I am glad I did. Definitely keep an eye open on this and if it is something that impacts your company, please be sure to do what I am going to do, report this as an issue to Microsoft for correction. The more people who report it, the more likely it will be fixed in some timely fashion. So call your TAMs up and say, hey I have a bug I want fixed… Here it is… Oh and at the same time, don’t forget to order your TAM (male or female) a nice new Joeware Thong (http://www.cafepress.com/joewarenet). They love em!!! I have had several TAMs email me laughing about having received them. I wouldn’t mind to see it happen more and more. 🙂
BTW, Dmitri Gavrilov is one of the speakers (http://www.tec2009.com/vegas/agenda/directory/speaker_bios.php) at The Experts Conference in Henderson (Green Valley Resort) this year and he will be talking about Query Performance. I intend to be in the room when he is presenting because I am sure there are other things I will learn from him as well. That has been my relationship with Dmitri from the beginning, he speaks and I learn things. 🙂
joe
This is just awesome finding! I hope it could lead to resoolving my sharepoint peoplepicker lazyness. I dont have so much objects as you but i have hundreds of OU.