I seem to field a lot of questions on this, though I am far from an expert here. Still, here is my take on it…
Warming the cache is not limited to x64 with giant DITs. It just happens to be something people like to force up front on x64 with large DITs, because the smaller DITs tend to take care of themselves fairly quickly. That being said, this script will work on x86 or x64. If you are on x86 (K3 or ADAM) and your DIT exceeds about 2.6GB with /3GB enabled, or you don't have enough RAM to hold the DIT, then this isn't going to do much for you, because the whole DIT cannot be cached. Go buy an x64 box and load up Windows Server 2003 x64 or LH x64.
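As a rough back-of-the-envelope check of that rule of thumb, here is a small Perl sketch. The sub name `dit_fully_cacheable` and its inputs are hypothetical; the 2.6GB x86 ceiling is the figure quoted above and assumes /3GB is enabled. It ignores the OS and everything else competing for memory, so treat a "yes" as necessary rather than sufficient.

```perl
use strict;
use warnings;

# Rough x86 ceiling quoted in the post (assumes /3GB is enabled), in GB.
my $X86_CACHE_CEILING_GB = 2.6;

# Hypothetical helper: can the whole DIT fit in the cache at all?
# $arch is "x86" or "x64"; sizes are in GB. Ignores OS and other memory
# consumers, so a true result is necessary, not sufficient.
sub dit_fully_cacheable {
    my ($arch, $dit_gb, $ram_gb) = @_;
    my $limit = $ram_gb;
    if ($arch eq "x86" && $X86_CACHE_CEILING_GB < $limit) {
        $limit = $X86_CACHE_CEILING_GB;
    }
    return $dit_gb <= $limit ? 1 : 0;
}

for my $case (["x86", 4.0, 4], ["x64", 4.0, 16], ["x64", 4.0, 2]) {
    my ($arch, $dit, $ram) = @$case;
    printf "%s, %.1fGB DIT, %dGB RAM: %s\n", $arch, $dit, $ram,
        dit_fully_cacheable($arch, $dit, $ram)
            ? "can be fully cached" : "cannot be fully cached";
}
```

The point of the sketch is simply that on x86 the address-space ceiling, not the installed RAM, is usually what caps the cache.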
Here is a quick and dirty Perl script to defrost the cache.
# This is a perl script. 🙂
#
# Will work against K3 AD and ADAM
#
use strict;
use warnings;

my $dcname=shift;
my $hoststr="";
if ($dcname) {$hoststr="-h $dcname"} else {$dcname="Default DC"};
print "\n";
print "Defrost V01.00.00pl Joe Richards (joe\@joeware.net) August 2006\n\n";
#
# DEFROST NORMAL/DOMAIN NCS
# Expensive query to return all objects on DC (excluding app NCs) through phantom root
# o uses DNT_index
#
print "Defrosting $dcname...\n";
`adfind $hoststr -b -pr -t 0 -f * * replPropertyMetaData;binary replUpToDateVector;binary -list`;
#
# DEFROST APP NCs ON AD
# Expensive query to return all objects on App NCs
# o Cannot do this through phantom root query because app NCs are not exposed to it
# o Don't need this for ADAM as phantom root sees everything
#
# 1.2.840.113556.1.4.1851 is the ADAM capability OID; if it is absent, this is AD
my @out=grep(/1\.2\.840\.113556\.1\.4\.1851/,`adfind $hoststr -rootdse supportedCapabilities -csv`);
if (!@out)
{
  print "Active Directory Detected...\n";
  print "Defrosting APP NCs...\n";
  my %ncs=();
  my @out=`adfind $hoststr -rootdse namingcontexts defaultNamingContext schemaNamingContext configurationNamingContext -csv -nodn -csvdelim "|"`;
  chomp @out;
  $out[1]=~s/"//g;                       # strip CSV quoting from the data row
  my @attribvals=split(/\|/,$out[1]);
  my @ncs=split(/;/,$attribvals[0]);    # every naming context on the DC
  $ncs{$_}=1 foreach @ncs;
  # Remove the domain, schema and config NCs; what remains are the app NCs
  foreach my $this (@attribvals[1..3]) {delete $ncs{$this}};
  foreach my $this (sort keys %ncs)
  {
    print "  $this...\n";
    `adfind $hoststr -b "$this" -t 0 -f * * replPropertyMetaData replUpToDateVector -list`;
  }
}
else {print "ADAM Detected...\n"};
#
# PARTIAL INDEX LOAD
# Queries to partially load attribute indexes
# Skip attributes that will use DNT_index for EXISTS queries (linked and special case)
# Do not use a single OR (|) query instead as it will fall back to DNT_index
# Obviously don't need to return attributes
#
print "Partial index load...\n";
my @attrs=`adfind $hoststr -schema -list -bit -f "&(searchflags:AND:=1)(!linkid=*)" ldapdisplayname -s one -sl`;
chomp @attrs;
foreach my $this (sort @attrs)
{
  next unless $this=~/\w/;
  next if $this eq "cn";
  next if $this eq "objectGUID";
  next if $this eq "name";
  print "  Loading index for $this...\n";
  `adfind $hoststr -b -pr -f $this=* -t 0 -list`;
}
print "Completed.\n";
# This is the end of the perl script.
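The app NC step in the script boils down to two small pieces of logic: checking supportedCapabilities for the ADAM capability OID (1.2.840.113556.1.4.1851), and subtracting the default, schema, and configuration NCs from namingContexts. Below is a minimal sketch of both as pure Perl subs so they can be tried without a DC. The sub names are hypothetical, and the input format (one `-csv -nodn -csvdelim "|"` data row with quoted fields and `;`-separated multi-values) is assumed from the way the script parses adfind's output, not checked against adfind itself.

```perl
use strict;
use warnings;

# ADAM advertises this capability OID in supportedCapabilities; AD does not.
my $ADAM_CAP_OID = "1.2.840.113556.1.4.1851";

# Hypothetical helper: true if any rootDSE output line lists the ADAM OID.
sub is_adam {
    my @lines = @_;
    return scalar grep { index($_, $ADAM_CAP_OID) >= 0 } @lines;
}

# Hypothetical helper: given one CSV data row of
# namingContexts|defaultNamingContext|schemaNamingContext|configurationNamingContext,
# return the remaining (application) NCs, sorted.
sub app_ncs {
    my ($row) = @_;
    (my $clean = $row) =~ s/"//g;          # strip CSV quoting
    my @vals = split /\|/, $clean;
    my %ncs;
    $ncs{$_} = 1 for split /;/, $vals[0];  # every naming context
    delete @ncs{ @vals[1 .. 3] };          # drop domain, schema, config NCs
    return sort keys %ncs;
}

# Made-up example data for a domain called corp.example.
my $row = join "|",
    '"DC=corp,DC=example;CN=Configuration,DC=corp,DC=example;'
  . 'CN=Schema,CN=Configuration,DC=corp,DC=example;DC=DomainDnsZones,DC=corp,DC=example"',
    '"DC=corp,DC=example"',
    '"CN=Schema,CN=Configuration,DC=corp,DC=example"',
    '"CN=Configuration,DC=corp,DC=example"';

my $kind = is_adam("1.2.840.113556.1.4.800;1.2.840.113556.1.4.1670") ? "ADAM" : "AD";
print "$kind\n";
print "  $_\n" for app_ncs($row);
```

Keeping the parsing in a sub like this makes it easy to feed it captured adfind output and confirm that only the application partitions survive the subtraction.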
As long as you understand that there are a great many things that don't get warmed by doing this (like burst LVs that are never requested, such as supplemental credentials), sure.
I would also add: I've never been sold on the idea that all DCs need all data in memory. Some do, but of course some do not. Warming everything may be unnecessary, and along the way it can starve the system of I/O that could be handling concurrent user requests.
Interestingly, I can't seem to get much more than about 50-60% of my DIT size loaded into memory using warming techniques, and that's without subtracting the LSASS footprint. If I leave it a couple of days, more gets loaded (another 30%, if the pre-reboot value is taken as the expected level).
Running the queries in your script loads a good couple of hundred MB, but no more (after a reboot). Check out this output from the following command:
adfind -b -pr -f objectclass=* -stats+only
…
Returned 371253 entries of 874997 visited (42.43%)
…
What accounts for the large difference between the objects actually returned and what was "visited" by the query processor?
The reason I bugged Joe about this script before he posted it is that I'm trying to do some performance tests and baselines against x86 and x64 boxes, as well as current and forthcoming DC models…
The current test environment contains 371253 objects with a DIT of ~2.6GB. I will grow this soon.
(I know I won't see much difference in perf between the processor architectures with such a small DIT, but I am doing these tests piecemeal: first the DIT that fits within 32-bit limits, then a larger one that will more accurately depict our environment.)
P.S. Thanks for the script, Joe!
(How about a VBS one? 😉)
Sorry, yes, I should have posted the limitations here. I have discussed the idea of defrosting caches at length with Eric in the past, and, as Eric said, there are things you just won't get loaded via queries, such as the credential info. But that is also info that isn't used in queries and can't be returned to applications. It is used internally by the DS and should have little impact on perf. Also note the section titled PARTIAL INDEX LOAD: Eric has assured me that it is difficult, if not impossible, to load an entire index into cache with a single query. That is a consequence of the backend implementation; the directory is designed to be able to function while touching as little as possible.
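To make the PARTIAL INDEX LOAD idea concrete: the script asks the schema for attributes with the index bit (value 1) set in searchFlags and no linkID, skips cn, objectGUID, and name, and then issues one `attr=*` presence query per attribute rather than a single OR filter, which would fall back to DNT_index. The Perl sketch below reproduces that selection in memory. The sub name `index_load_filters` and the sample attribute records are made up for illustration, and it only builds the filter strings rather than running adfind.

```perl
use strict;
use warnings;

# Attributes the script skips explicitly.
my %SKIP = (cn => 1, objectGUID => 1, name => 1);

# Hypothetical helper mirroring "&(searchflags:AND:=1)(!linkid=*)" plus the
# skip list. Returns one presence filter per attribute; the script runs one
# query per filter instead of OR-ing them together.
sub index_load_filters {
    my @attrs = @_;    # hashrefs: { name => ..., searchFlags => ..., linkID => ... }
    my @filters;
    for my $attr (sort { $a->{name} cmp $b->{name} } @attrs) {
        next unless ($attr->{searchFlags} || 0) & 1;   # value 1 = indexed
        next if defined $attr->{linkID};                # linked attributes skipped
        next if $SKIP{ $attr->{name} };
        push @filters, "$attr->{name}=*";
    }
    return @filters;
}

# Made-up sample schema entries, for illustration only.
my @sample = (
    { name => "unindexedAttr",    searchFlags => 0 },
    { name => "indexedAttr",      searchFlags => 1 },
    { name => "linkedAttr",       searchFlags => 1, linkID => 2 },
    { name => "cn",               searchFlags => 1 },
    { name => "otherIndexedAttr", searchFlags => 9 },
);

# One query per attribute, as in the script.
for my $filter (index_load_filters(@sample)) {
    print "adfind -b -pr -f $filter -t 0 -list\n";
}
```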
As for how many objects you are touching: keep in mind that when you do an objectclass=* query you are hitting the DNT index, which covers every object on the DC, not every object in a single partition. Even so, if you force a GC search of objectclass=* you will get closer numbers, but because of the backend implementation you still won't see 100% of visited entries returned. Do a DB dump sometime and look at all of the items listed in the DNT table.
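The stats line quoted earlier works out as follows. This Perl sketch pulls the returned and visited counts out of that line and shows how many visited rows were not returned. The sub name `parse_stats` is hypothetical, and the line format is taken only from the single `-stats+only` line shown in the thread, not from adfind documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "Returned N entries of M visited (P%)".
# Returns (returned, visited), or an empty list if the line doesn't match.
sub parse_stats {
    my ($line) = @_;
    my ($returned, $visited) =
        $line =~ /Returned\s+(\d+)\s+entries\s+of\s+(\d+)\s+visited/
        or return;
    return ($returned, $visited);
}

my $line = "Returned 371253 entries of 874997 visited (42.43%)";
my ($returned, $visited) = parse_stats($line);
printf "returned %d of %d visited (%.2f%%); %d visited but not returned\n",
    $returned, $visited, 100 * $returned / $visited, $visited - $returned;
# prints: returned 371253 of 874997 visited (42.43%); 503744 visited but not returned
```

So roughly 500,000 rows were walked in the DNT index without being handed back as results for that search.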
“It is used internally in the DS and should have little impact on perf”
You think creds have little impact on perf? I thought DCs auth users pretty regularly? 🙂
It's possible that's true… if it is more CPU-intensive than I/O-intensive on your particular server. But that's conjecture at best.
Eric: I would expect that the first time they are needed, the creds would be pulled and cached. That being the case, I wouldn't expect a huge impact on perf, considering most of the rest is already pulled into memory. I certainly don't see the cache state of that info having a large impact on query perf.