Hi, I just wanted to touch base on the upcoming releases.
I thought I would be releasing new production versions of AdFind and AdMod at the beginning of the month. Obviously that did not happen. I ran into a few bugs I needed to deal with and most recently ran into something I really wanted to dig into…
That thing I wanted to dig into was the –cv / –cva functionality for counting values in multi-valued attributes. Specifically I had a coworker who was having an issue with counting the members of a group that was somewhere between 2-3 million users. AdFind was damn slow doing it.
So those of you who were aware of and actually remember the history, AdFind did not start with CSV output and I kept saying it wouldn’t get it. Then on one of the early Microsoft MVP Summits out in Redmond, Washington in, I think 2005, I was having a chat with fellow MVP Jerold Schulman of the old JSI FAQ web site while we were waiting for a bus and he was pushing on me on how great it would be for AdFind to have CSV, I kept saying no, it isn’t going to happen until I produced an AdFind V2.0 engine and then suddenly, I realized I could actually hack it into the V1.0 engine without completely rewriting the main flow of the engine. It would impact perf but slightly slower overall AdFind functionality (which was already faster than everything else) was a decent tradeoff to get CSV output. I long intended to write that V2.0 engine but I just never really got a chance and with Active Directory being mostly ignored by MSFT now I am unlikely to spend that time doing it now. Anyway, sometime later I came up with the additional –cv hack on top of –csv to count multi-valued attributes. So –cv was a hack on a hack and so extra slow.
Anyway cut to my friend trying to count the members in this large group and AdFind taking a few hours to do it. That isn’t tenable. I had previously written a perl script that could get the count in less than a minute but not everyone has perl installed and I thought, maybe I can optimize some of the code specifically for the –cv functionality.
After digging into it I did in fact find some optimizations and was able to reduce the time for –cv by an order of magnitude.
In testing, against a local LDS instance to get the variability of network traffic out of the way with a group with 1.5 million users I had these perf numbers:
Original V01.54.00
"dn","member"
"CN=largegroup,OU=test,O=FULL","1500000"
Time Elapsed (sec): 7990
Initial optimization tweak to optimize -cv specifically in V01.55.00beta
"dn","member"
"CN=largegroup,OU=test,O=FULL","1500000"
Time Elapsed (sec): 665
So a reduction from 7990 seconds (~2.2 *hours*) down to 665 seconds (11 minutes).
I thought that isn’t bad. But I had noted that when I was watching the processing I realized that all of the time was getting stacked up in some routines that are part of the CSV hack that perform some string manipulations. So I thought, hmmm, I wonder if the std::string functions are maybe not great for performance for large strings (100’s of K to MBs) so I wrote my own string replace function and the processing got SO much faster… I gained another order of magnitude of speed reduction.
Additional optimization by writing my own std::string replace function in V01.55.00beta
"dn","member""CN=largegroup,OU=test,O=FULL","1500000"
Time Elapsed (sec): 44
So yep, that is 44 seconds…. From the original initial performance of over 2 hours!
I looked at it some more and then realized I could reduce it even more, likely to get it down below 10 seconds but that would take some serious additional hacking of the flow to pull the count functionality out of the normal flow completely and leverage some underlying implementation details of AD and how it handled large groups but felt it really wasn’t worth the effort. I am good with less than a minute to count the group members for a group that has 1.5 million members as there aren’t many companies out there that have groups of that scale. Groups with members in the tens of thousands can be counted in milliseconds and most companies don’t even have groups that large.
Anyway, with this deeper string functionality change I am doing a lot more tests to make sure I didn’t make some mistake that could be very painful.
So once again, sorry for the delay, but AdFind is getting very close and I am using the new version of AdMod daily at work as well as allowed several co-workers to get it and use it as well and I have heard a couple “OH THAT IS SO COOL” comments for some of the new functionality.
joe