Yesterday I received an email from one of the faithful joeware customers with an issue that confused me when I first started reading. As I continued, I was reminded of my earlier, eviler, more maverick days… The days when men were men and sheep were scared, at least in New Zealand and parts of Australia and the Netherlands. Of course we are talking about the early years of Windows 2000… But let’s have Art explain a little about what he saw…
…we discovered an interesting issue with ADFIND last week in our SYS AD forest (systems testing). Turns out ADFIND and a number of our scripts stopped working for some reason. We are able to run the tool in two other forests that mirror SYS and ADFIND runs with no issues. When we attempt to perform a query with ADFIND in SYS it returns the below error:
>adfind -b "DC=domain,DC=com" -f "objectcategory=user"
AdFind V01.41.00cpp Joe Richards (joe@joeware.net) February 2010
LDAP_SEARCH_S: 0x4
LDAP_SEARCH_S: Size Limit Exceeded
After I read that I thought… Oh, that can’t be good, but that is AD, not AdFind, as I use paged searches like a good little LDAP dev…
Then I read on through the various levels of testing that Art and his compatriots performed, and I started chuckling when I hit
Debug time … I executed the same query with ADFIND only added the debug switch ‘-d’
>adfind -b "DC=domain,DC=com" -f "objectcategory=user" -d
AdFind V01.41.00cpp Joe Richards (joe@joeware.net) February 2010
DEBUG: Opening TCP connection
DEBUG: In OpenLDAP… Params:
DEBUG: ARecEx: 0
DEBUG: SSL: 0
DEBUG: Port: 389
DEBUG: Ref: 1
DEBUG: V3: 1
DEBUG: Anonymous: 0
DEBUG: userdn:
DEBUG: password:
DEBUG: Simple: 0
DEBUG: Digest Auth: 0
DEBUG: LDAP_OPT_ENCRYPT: 0
DEBUG: Delegation: 0
DEBUG: Req Writeable: 0
DEBUG: Extended Error Info: 0
LDAP_OPTION: Version 3
LDAP_OPTIONS SET
LDAP_BIND: [(null)] Successful
DEBUG: Gathering RootDSE
DEBUG: Entering CRootDSE…
DEBUG: Leaving CRootDSE.
DEBUG: RootDSE Completed
DEBUG: Loading OIDs from schema…
LDAP_SEARCH_S: 0x4
LDAP_SEARCH_S: Size Limit Exceeded
You may think it is odd that I started chuckling, but I did because I knew for sure I was to blame. I had been stupid. Or at least severely myopic. But I continued reading anyway…
Something in the schema is clearly causing the size limit error. Recently we extended the schema in SYS for Exchange 2007 SP2, OCS 2007 R2, and SCCM 2007. These changes have NOT yet been introduced in our 2 other forests and are scheduled for implementation in April and May.
And further along the email…
Lab time. I took this into an isolated AD environment (vanilla install of win2k3 sp2 / default AD query values) and using ADSchemaAnalyzer exported our production AD schema (the one without Exchange 2007 SP2/OCS 2007 R2/SCCM 2007) and imported into my lab. ADFIND works as expected and returns the results for all queries. I then extended the schema for Exchange 2007 SP2. Immediately after the schema extension I am able to reproduce the issue and see the same size limit exceeded error:
and still further
Same failure, immediately after ADFIND loads the OIDs from the schema. I decided to change the MaxPageSize in Active Directory to 10,000, reboot and ADFIND works again. I change it back to 1000, reboot and ADFIND fails. After digging around the ADFIND help file I noticed the tool includes a switch ‘-dloid’:
And finally
ADFIND returns all the user objects in my isolated lab forest. I then ran the same query in our SYS forest and ADFIND and all our ADFIND dependent scripts start working again. Interesting that either adding the -dloid switch or changing the MaxPageSize in AD to 10,000 immediately fixes the issue. Changing the MaxPageSize to 10,000 will not be possible given MS recommendations and Exchange requirements for this value to be set at 1000. For the time being I’m using the DLOID switch.
Any idea what could be causing the failure without the DLOID switch? Is it something specific to the Exchange 2007 SP2 schema upgrade? Or did we exceed some sort of limit somewhere? Let me know if there are any additional logs we can gather for you.
As I mentioned, I knew pretty early on that I was at fault and every paragraph after that re-emphasized that it was my problem. I was happy to see the depth and quality of Art’s debugging of the issue. Great job Art. 🙂
The history behind the issue… back in the wild days of Windows 2000 when I first put AdFind together, I knew of no other way to determine how to decode various attributes than to have a set of mapping structures to identify the attributes when they were encountered. That way I knew which attributes to decode as time fields, which were SIDs, which were GUIDs, which were binary, etc. I would manually update the maps as I found the attributes. This worked ok initially but once I went north of 400 separate requests for people who wanted me to add “their” custom attributes I decided I needed to do something else and I implemented it in V01.09.00 in 2002.
That something else ended up as a routine that tore through various attributes in the schema looking for the types that I cared about and building the attribute maps dynamically. The -dloid switch avoids that process. Even if you use -dloid, there are still some attributes hardcoded into the maps, so some decoding still occurs, just not all of the decoding that could occur.
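To make the idea a bit more concrete, here is a rough Python sketch of building a decode map from schema entries. This is not AdFind's actual code (AdFind is C++ and its internals aren't shown here); the function name, the `load_from_schema` flag standing in for -dloid, the choice of hardcoded entries, and the `customBlob` attribute are all made up for illustration. The only real-world facts assumed are the AD attributeSyntax OIDs for DN (2.5.5.1), octet string (2.5.5.10), and SID (2.5.5.17).

```python
# Illustrative sketch only; names and structure are assumptions, not AdFind's code.

# AD attributeSyntax OIDs for the types mentioned in this post.
SYNTAX_KINDS = {
    "2.5.5.1": "dn",       # Object(DS-DN)
    "2.5.5.10": "binary",  # String(Octet)
    "2.5.5.17": "sid",     # String(Sid)
}

# A few attributes are always known, even when the schema is not read.
# (Hypothetical selection; the real hardcoded list is not given in the post.)
HARDCODED = {"objectGUID": "guid", "objectSid": "sid"}


def build_decode_map(schema_entries, load_from_schema=True):
    """Return {attribute name: decoder kind}.

    load_from_schema=False mimics the effect described for -dloid:
    only the hardcoded entries are used.
    """
    decode_map = dict(HARDCODED)
    if not load_from_schema:
        return decode_map
    for entry in schema_entries:
        kind = SYNTAX_KINDS.get(entry["attributeSyntax"])
        if kind is not None:
            # Hardcoded entries win over the generic syntax-based guess.
            decode_map.setdefault(entry["lDAPDisplayName"], kind)
    return decode_map


sample_schema = [
    {"lDAPDisplayName": "member", "attributeSyntax": "2.5.5.1"},
    {"lDAPDisplayName": "objectGUID", "attributeSyntax": "2.5.5.10"},
    {"lDAPDisplayName": "sIDHistory", "attributeSyntax": "2.5.5.17"},
    {"lDAPDisplayName": "displayName", "attributeSyntax": "2.5.5.12"},
    # Hypothetical custom attribute added by a schema extension.
    {"lDAPDisplayName": "customBlob", "attributeSyntax": "2.5.5.10"},
]

if __name__ == "__main__":
    print(build_decode_map(sample_schema))
    print(build_decode_map(sample_schema, load_from_schema=False))
```

The point of the sketch is that a custom attribute gets picked up without anyone having to ask for it, while skipping the schema read still leaves the hardcoded entries in place.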
Anyway… the stupid assumption I made, and am obviously now quite embarrassed by, is that a non-paged search would be good enough to pull the attributes I wanted from the schema. Back when I wrote it I knew that paged searches were goodness and should be used, but I never really thought it would be an issue for this specific query… I mean seriously, even in a default Windows Server 2003 schema there are only 293 attributes that I pick up for my query, and the failure doesn’t occur until the schema holds more than 1000 attributes of type DN, binary, or SID (1000 being the MaxPageSize value in play here)… Lots of head room, right? Should be safe, right? Wrong… And this very reason is why we tell people to use paged queries when requesting information from Active Directory.
To validate the issue I simply changed my lDAPAdminLimits MaxPageSize value to 10 and sure enough… Kaboom.
F:\Dev\Current\CPP\AdFind\Release>adfind -default -s base
AdFind V01.42.00cpp ***BETA*** Joe Richards (joe@joeware.net) March 2010
LDAP_SEARCH_S: 0x4
LDAP_SEARCH_S: Size Limit Exceeded
So I fixed the issue by changing the section that pulls the info from the schema to use a paged query, just like I should have used in the first place. I ran the new binary against my “hacked” test directory with a MaxPageSize of 10 entries and it worked like a champ. I sent the link to the beta binary to Art, he tested it in his environment, and everything is working A-OK for him again.
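For anyone who wants to see the difference between the two kinds of search without touching a real directory, here is a small Python simulation. It is only a sketch: `FakeDirectory`, `SizeLimitExceeded`, and `load_schema_attributes` are invented names, the "cookie" is a plain integer offset rather than the opaque value a real server hands back, and the real non-paged behavior (a partial result set plus error 0x4) is simplified to just raising the error. No real LDAP library is involved.

```python
# Simulation only; a toy stand-in for a directory that enforces MaxPageSize.

class SizeLimitExceeded(Exception):
    """Stands in for LDAP result code 0x4 (Size Limit Exceeded)."""


class FakeDirectory:
    def __init__(self, entries, max_page_size):
        self.entries = list(entries)
        self.max_page_size = max_page_size

    def search(self, page_size=None, cookie=0):
        """Return (entries, next_cookie); next_cookie is None when finished."""
        if page_size is None:
            # Non-paged search: fails once the result set exceeds MaxPageSize.
            if len(self.entries) > self.max_page_size:
                raise SizeLimitExceeded("0x4 Size Limit Exceeded")
            return list(self.entries), None
        # Paged search: the server caps each page at MaxPageSize.
        size = max(1, min(page_size, self.max_page_size))
        page = self.entries[cookie:cookie + size]
        next_cookie = cookie + size
        if next_cookie >= len(self.entries):
            next_cookie = None
        return page, next_cookie


def load_schema_attributes(directory, page_size=None):
    """Fetch every entry, either in one shot or page by page."""
    if page_size is None:
        entries, _ = directory.search()
        return entries
    results = []
    cookie = 0
    while cookie is not None:
        page, cookie = directory.search(page_size=page_size, cookie=cookie)
        results.extend(page)
    return results


if __name__ == "__main__":
    # 293 matching attributes against a "hacked" MaxPageSize of 10.
    directory = FakeDirectory([f"attr{i}" for i in range(293)], max_page_size=10)
    try:
        load_schema_attributes(directory)
    except SizeLimitExceeded as err:
        print("non-paged:", err)
    print("paged:", len(load_schema_attributes(directory, page_size=1000)), "entries")
```

The paged loop keeps working no matter how small MaxPageSize gets or how large the schema grows, which is the whole reason to page in the first place.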
If anyone else is having the issue and wants to try the beta, you can find the new betas for AdFind and AdMod at
http://www.joeware.net/downloads/beta/adfindmod_beta.zip
The AdMod beta has a rather large fix in it as well, but I am not ready to discuss that yet. However, your data is safe; it was a crash issue that occurred when data was specified in a particular way.
joe