joeware - never stop exploring... :)

Information about joeware mixed with wild and crazy opinions...

Bug in Microsoft LDAP API? – Revisited

by @ 1:01 am on 9/28/2006. Filed under tech

In the blog entry Bug in Microsoft LDAP API? I documented a pain in my side I ran into with ldap_search_init_page. As I mentioned in the comments, I began a conversation with one of the smart Microsoft Dev folks, specifically a Dev who previously did LDAP Client and is now a core dev. I am not sure if I can mention his whole name; either you know who it is or you don’t, and I don’t want to lead anyone new to him. 😉 I knew who he was prior to this email chain, having met him in Redmond. That made the conversation go smoother from my side because I knew who I was dealing with, was confident in his skill level, and knew that what he was saying could be taken pretty seriously. So for the rest of this entry we will choose the completely arbitrary name of Matt for this unknown Dev guy.

After all the conversations with Matt I have to admit that this isn’t a bug though it is certainly still a pain in the butt. 🙂 It slowly became clear from Matt’s careful explanations that I was interpreting the RFC at too high a level; it wasn’t immediately obvious in RFC2696 but once you tied in RFC2251 it became much clearer. If anything, this is a shortcoming in the definition of the paging control in RFC2696.

So the details… RFC2696 says you use the paging control to break up the objects a server returns for a given LDAP search into smaller, bite-size result sets; the example reason given is that the client may not be able to handle the overall result set data size. It then discusses the implementation of the feature. Part of that discussion is that the PageSize should be less than the SizeLimit, which makes sense: if you have a SizeLimit of 10 and a PageSize of 20, only 10 results will be returned anyway, so the paging control could essentially be tossed out (i.e. the page returned is automatically going to be smaller than the PageSize specified). What isn’t spelled out specifically, and where my issue came from, is what the results should be for the overall result set when the SizeLimit exceeds the PageSize.

If you mull that over at a high level, it sounds like the call to ldap_search_init_page generates a result set and the SizeLimit you specify limits the overall result set size and the PageSize will chop that overall result set up into smaller pieces and feed that back. You think that way because you consider that in your code you are only submitting a single query request and then just retrieving pages.

However, if you look deeper at RFC2696 and RFC2251 and at actual network traces, you see that a paged search is really a series of searches. The call to ldap_search_init_page doesn’t actually send anything across the wire; it is the call to ldap_get_next_page that goes across the wire. It takes the info specified in the previous call, generates the query, sends it across, and keeps the cookie passed back so it knows where it left off. The next time you call it, it submits the same query again, but with the cookie to remind the server where it was. Again, in pure LDAP parlance each one of those calls produces its own result set. If you did all that manually (i.e. didn’t use ldap_search_init_page) you could send a different SizeLimit each time. While the system could cope with that, something would have to handle it explicitly.

The long and short of it is that LDAP isn’t looking at the overall Result Set that matches your query, it is looking at the result set of each individual ldap search call. And in that context, if your PageSize is X and your SizeLimit is X+1, SizeLimit is effectively moot because the result set size of each individual call will always be smaller than the SizeLimit because that is what the PageSize does….
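To make that concrete, here is a rough sketch in plain C of a toy "server" that applies SizeLimit to each individual request, the way the paragraphs above describe. Everything in it (MockServer, MockResponse, mock_search, fetch_all, and the use of a simple offset as the cookie) is made up for illustration and is not part of any LDAP API; it is a simplified model, not how Active Directory is implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a directory server; not a real LDAP type. */
typedef struct {
    size_t total_matches;   /* entries that match the filter */
} MockServer;

/* Outcome of ONE search request, i.e. one result set in LDAP terms. */
typedef struct {
    size_t returned;        /* entries sent back for this request */
    size_t next_cookie;     /* resume offset; 0 means no more pages */
    int    size_limit_hit;  /* nonzero if this request hit SizeLimit */
} MockResponse;

/* One request: SizeLimit (0 = none) is checked against this request only. */
MockResponse mock_search(const MockServer *srv, size_t cookie,
                         size_t page_size, size_t size_limit)
{
    MockResponse r = {0, 0, 0};
    assert(srv != NULL && cookie <= srv->total_matches);

    size_t remaining = srv->total_matches - cookie;
    size_t want = remaining < page_size ? remaining : page_size;

    if (size_limit != 0 && want > size_limit) {
        want = size_limit;          /* SizeLimit smaller than the page wins */
        r.size_limit_hit = 1;
    }
    r.returned = want;

    /* Hand back a cookie only if more entries remain and no limit was hit. */
    if (!r.size_limit_hit && cookie + want < srv->total_matches)
        r.next_cookie = cookie + want;
    return r;
}

/* Client side: resend the same query with the cookie until none comes back. */
size_t fetch_all(const MockServer *srv, size_t page_size, size_t size_limit)
{
    size_t total = 0;
    size_t cookie = 0;
    do {
        MockResponse r = mock_search(srv, cookie, page_size, size_limit);
        total += r.returned;
        cookie = r.next_cookie;
    } while (cookie != 0);
    return total;
}
```

In this model, with 5000 matching entries and a PageSize of 1000, a SizeLimit of 1001 never trips because no single request returns more than 1000 entries, so fetch_all hands back all 5000. A SizeLimit of 10, on the other hand, cuts the very first request short.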

So what is really needed is an addendum to the paged control which states that you can send along an overall resultset size which limits the server to sending back x total of the y matching entries regardless of whether it is a single search or a paged search comprised of multiple simple searches.

 

This is a problem on Active Directory obviously because Microsoft limited the PageSize on all queries by default to 1000 objects and they really recommend NOT increasing that value. Not to save the client from taking a beating… but to save the server. Imagine many clients all hitting the server at once with massive queries that have result sets of hundreds of thousands of objects. Without paging, the server is likely going to be hurting in some way shape or form. What shape/way/form that will take, I leave as an exercise to the reader to go discover if they so want. I am not really sure how other LDAP directories prevent this from hurting them other than having maximum resultset size rules which I have seen in several cases, in a couple as low as 10 objects. I guess you could say MSFT is being generous in that they don’t limit the total resultset size, they just limit how much you get at once which certainly makes things easier on several fronts. However if you depend on SizeLimit really working for values greater than the default page size, you are SOL or you are changing the defaults of the system which in turn can also make you SOL.

 

So this all came back to a realistic issue for me… I have this switch in adfind called -maxe which is supposed to be Maximum Entries returned. I initially added that switch as homage to SizeLimit as I thought it was a good thing. Only recently did I learn it didn’t work for values above the page size… So what to do? Just document the shortcoming, or make it work the way I expected and wanted it to work? Well, some of you know me and you know exactly what my only available course of action was… I fixed it…

So here is a basic code snippet of how to do this in a relatively elegant way. I am sure there are better ways, but this is what I came up with in the short time I spent on it… I just needed a solution that wouldn’t make me vomit, not the cure for cancer. 🙂

The basic idea is to change the page size to the max entries you need if the max entries you need is less than the page size you are normally using… Clear as mud??? So… an example… If you have 10,000 entries left to retrieve and your page size is normally 1000, it will ask for 1000 entries. However, if you have only 800 entries left before hitting the maximum specified, it will ask for 800, and then on the next pass it will ask for 0, which tells ldap_get_next_page you are done…

 

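// Assumes pLdap is an already bound session and that the search
// parameters, PageSize and MaxEntries (0 = no limit) are set by the caller.
TempPageSize=PageSize;
EntriesRemaining=MaxEntries;
uErr=LDAP_SUCCESS;

LDAPSearch* pSearch=ldap_search_init_page(pLdap, BaseDN, Scope, Filter, Attrs, bNoValues, pServerControls, pClientControls, Timeout, MaxEntries, pSortKeys);
if (pSearch!=NULL)
 {
  while (uErr==LDAP_SUCCESS)
   {
    // When a maximum is set, never ask for more than is still needed.
    // Once EntriesRemaining reaches 0 this asks for a page of 0 entries.
    if (MaxEntries) TempPageSize=(PageSize>=EntriesRemaining?EntriesRemaining:PageSize);
    uErr=ldap_get_next_page_s(pLdap, pSearch, &sTimeout, TempPageSize, NULL, &pMsg);
    if (uErr==LDAP_SUCCESS)
     {
      ThisObjCnt=ldap_count_entries(pLdap,pMsg);
      EntriesRemaining-=ThisObjCnt;

      // do whatever with the result set

      ldap_msgfree(pMsg);   // free each page once it has been processed
     }
    else
     {
      // handle ldap_get_next_page_s errors here
      // (LDAP_NO_RESULTS_RETURNED normally just means there are no more pages)
     }
   }
  ldap_search_abandon_page(pLdap, pSearch);   // release the search handle
 }
else
 {
  // handle ldap_search_init_page failure here
 }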
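
If the page size arithmetic is still clear as mud, here it is pulled out on its own as a small, plain C sketch with no LDAP calls at all. The names next_page_size and plan_requests are hypothetical helpers invented for this illustration, and the sketch assumes the server always has enough matching entries to return a full page.

```c
#include <assert.h>
#include <stddef.h>

/* Page size to ask for on the next request.
   max_entries == 0 means no overall cap, so the normal page size is used. */
size_t next_page_size(size_t page_size, size_t max_entries,
                      size_t entries_remaining)
{
    if (max_entries == 0)
        return page_size;
    return page_size >= entries_remaining ? entries_remaining : page_size;
}

/* Records the page size of each request made for a capped search.
   Assumes the server always returns a full page (hypothetical).
   Returns the number of requests written to sizes[]. */
size_t plan_requests(size_t page_size, size_t max_entries,
                     size_t *sizes, size_t capacity)
{
    size_t remaining = max_entries;
    size_t n = 0;
    assert(sizes != NULL && page_size > 0 && max_entries > 0);

    while (n < capacity) {
        size_t ask = next_page_size(page_size, max_entries, remaining);
        sizes[n++] = ask;
        if (ask == 0)
            break;              /* the final request asks for 0 entries */
        remaining -= ask;       /* full page assumed to come back */
    }
    return n;
}
```

For a normal page size of 1000 and a maximum of 2800 entries, the requests come out as 1000, 1000, 800 and then 0, which is the same pattern as the 800 example above.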


3 Responses to “Bug in Microsoft LDAP API? – Revisited”

  1. Joe Kaplan says:

    This is a good, detailed explanation. I’m glad this “Matt” guy helped work out the details. I think I may know this “Matt” guy, actually. 🙂

    The way I always thought of it is that when using paging, size limit is just ignored. This is sort of what happens in ADSI, where the paging is all done for you by the system. The results just keep coming while you continue to enumerate. I always just took it as the client’s problem to limit the overall result set size if paging was used.

    I like your solution though. It makes good sense.

  2. joe says:

    Yep, I am sure you know Matt; I believe I saw you speaking to him in one of the MSFT cafes…

    It is interesting how differently we look at the expected outcome when coming from the different APIs.

  3. V says:

    I need to use paging to retrieve the results from AD. I am running into the limit issue as well.
    Can you send me the code to do that? From the snippet above, it was not clear what the values of some of the parameters are going to be.

    Thanks,
    V
