Recently a friend pinged me to ask what I thought about load balancing Domain Controllers behind some sort of VIP (Virtual IP) hardware, be it from Brocade, Cisco, Barracuda, F5, or whatever vendor.
As a generic statement, I am against it. Period.
Active Directory was designed with load balancing[1] and redundancy built in. The clients just have to be smart enough to use it. This can be done; I have even seen people write UNIX code to do the SRV record lookups, and in fact one implementation I saw was arguably better than MSFT's at chasing after the next best site, and the next best site after that, etc.
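To make "the clients just have to be smart enough" concrete, here is a minimal Python sketch, under stated assumptions, of what such a client does. It builds the DC locator SRV names (the site-specific `_ldap._tcp.<site>._sites.dc._msdcs.<domain>` record first, then the domain-wide `_ldap._tcp.dc._msdcs.<domain>` record) and orders the returned records by priority and weight as RFC 2782 describes. The domain, site, and DC names are hypothetical, the records are hard-coded rather than coming from a real DNS query, and this is neither Microsoft's locator code nor the UNIX implementation mentioned above.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value = contact first
    weight: int    # relative share among records with the same priority
    port: int
    target: str


def locator_queries(domain: str, sites: list[str]) -> list[str]:
    """SRV names to try, most specific first.

    The ordering of `sites` (closest site first) is an assumption supplied by
    the caller; this sketch does not compute site costs.
    """
    names = [f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}" for site in sites]
    names.append(f"_ldap._tcp.dc._msdcs.{domain}")  # any DC in the domain
    return names


def order_srv(records: list[SrvRecord], rng: random.Random) -> list[SrvRecord]:
    """Return records in RFC 2782 contact order (priority, then weighted random)."""
    ordered: list[SrvRecord] = []
    for prio in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == prio]
        # RFC 2782: zero-weight records are placed at the front before selection.
        group.sort(key=lambda r: r.weight != 0)
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)  # inclusive on both ends
            running = 0
            for i, rec in enumerate(group):
                running += rec.weight
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered


if __name__ == "__main__":
    # Hypothetical domain, site, and DC names for illustration only.
    for name in locator_queries("corp.example.com", ["HQ", "Branch"]):
        print(name)
    answers = [
        SrvRecord(0, 100, 389, "dc1.corp.example.com"),
        SrvRecord(0, 100, 389, "dc2.corp.example.com"),
        SrvRecord(10, 100, 389, "dc3.corp.example.com"),
    ]
    # A client tries each target in turn and moves on to the next one (or to
    # the next site's query) when a DC does not answer.
    for rec in order_srv(answers, random.Random()):
        print(rec.target, rec.port)
```

The weighted step is what spreads clients across DCs that share a priority, and walking the ordered list (then the next site's list) is what gives failover, with no VIP in the path.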
But joe… some apps just don't do it right! True, so let's just give them a pass to do things wrong[2]… No. Go back to the vendors and tell them you don't consider them to be AD Integrated and that you will find a different solution. Of course, if someone already bought the product without consulting anyone with the relevant technical skills, you are a little stuck, but I would still avoid load balancing; instead, point the app at the domain DNS record and say have at it. Showing them that you won't crutch every single thing they throw your way may also get you, or someone else with the right insight, involved in checking out the next product before purchase. These special crutching operations cost companies money, both in working out the unique solutions and in supporting them and teaching new support people about them. I can't tell you how much fun it is to explain an environment to someone like "Well, it works this way, with the exception of this, which does that, and that, which does this, etc."
There are issues with this practice beyond the fact that you are helping some company sell a product that should be updated. When I received the initial question, I knew there was a Kerberos issue, but I thought I would ask some of my intelligent friends on an MVP distribution list what they were aware of, both to see whether there was more to it than I was thinking and to find out whether Microsoft had written up any documentation. One of the responses was really good, and I would like to share it here. It is from my friend and fellow Microsoft MVP Joe Kaplan.
-----Original Message-----
From: xxx [mailto:xxx] On Behalf Of Joe Kaplan
Sent: Friday, February 19, 2010 12:02 PM
To: xxx
Subject: Re: [xxx] Hardware Load Balancing Domain Controllers
There is a principle in Kerb that exactly one security account can be
associated with a given SPN. On a DC, various services run as system which
in turn ends up using the domain computer account. This is a different
account on each DC.
However, clients form requests for Kerb service tickets by using the DNS
name of the target service to form the SPN used in the request.
In a load balanced scenario, you run the risk of having a single DNS name
refer to multiple different hosts behind it. If in turn the service being
targeted is a service that accepts Kerb auth and the service is running as a
different user depending on the host (which would be the case for services
that are part of a DC), then you’ll get random Kerb auth failures (the
dreaded KRB_AP_ERR_MODIFIED). This is probably not what you want and
therefore a significant risk in this situation.
You could potentially get away with load balancing LDAP and use an alternate
DNS name that has no Kerb SPN associated with it. You’d get no Kerb auth
(only NTLM) and it might work in most cases as a result (no cross domain
moves because delegation is now broken but other stuff may be fine).
However, it probably isn’t a good idea.
You CAN do this the right way with ADAM by having the ADAM instances all run
as the same fixed domain user service account and creating an SPN to match
the DNS name of the load balancer front end you are using. ADAM makes it
hard on you to get SSL working in this instance by insisting that you use a
wildcard cert, but it can be reasonable. Note that this is typically also
what you do when load balancing web applications that require Kerb auth.
Joe K.
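Joe's point about one account per SPN can be illustrated with a deliberately simplified Python model. This is a toy under stated assumptions, not Kerberos: a "ticket" here is an HMAC over the SPN keyed with the target account's secret, standing in for a real ticket encrypted with that account's long-term key, and the "VIP" hands connections to back ends round-robin. The account names (`DC01$`, `DC02$`, `svc-adam`), host names, and helper functions are all hypothetical. It covers the three cases from the email: one name in front of DCs that each run as their own computer account, an alternate name with no SPN (so the client falls back to NTLM), and ADAM instances that all run as one service account holding the SPN for the load balancer name.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Hypothetical accounts. Each DC's services run as that DC's own computer
# account; the ADAM instances all share one domain service account.
ACCOUNT_KEYS = {name: secrets.token_bytes(32) for name in ("DC01$", "DC02$", "svc-adam")}


def kdc_issue_ticket(spn_map: dict[str, str], spn: str) -> Optional[bytes]:
    """Toy KDC: seal the 'ticket' with the key of the ONE account holding the SPN."""
    account = spn_map.get(spn)
    if account is None:
        return None  # no SPN registered, so no Kerberos ticket
    return hmac.new(ACCOUNT_KEYS[account], spn.encode(), hashlib.sha256).digest()


def service_accepts(ticket: bytes, spn: str, service_account: str) -> bool:
    """Toy service: it can only validate tickets sealed with its own key."""
    expected = hmac.new(ACCOUNT_KEYS[service_account], spn.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(ticket, expected)


def connect_many(spn_map: dict[str, str], vip_name: str,
                 backends: list[str], count: int) -> list[str]:
    """The client builds the SPN from the DNS name it was given; the VIP picks the host."""
    spn = f"ldap/{vip_name}"
    results = []
    for i in range(count):
        backend_account = backends[i % len(backends)]  # round-robin VIP
        ticket = kdc_issue_ticket(spn_map, spn)
        if ticket is None:
            results.append("NTLM")
        elif service_accepts(ticket, spn, backend_account):
            results.append("Kerberos OK")
        else:
            results.append("KRB_AP_ERR_MODIFIED")
    return results


if __name__ == "__main__":
    # 1. VIP name in front of two DCs; the SPN can belong to only one of them.
    print(connect_many({"ldap/dcvip.corp.example.com": "DC01$"},
                       "dcvip.corp.example.com", ["DC01$", "DC02$"], 4))
    # → ['Kerberos OK', 'KRB_AP_ERR_MODIFIED', 'Kerberos OK', 'KRB_AP_ERR_MODIFIED']
    # 2. Alternate name with no SPN at all: no Kerberos, NTLM only.
    print(connect_many({}, "ldapvip.corp.example.com", ["DC01$", "DC02$"], 2))
    # → ['NTLM', 'NTLM']
    # 3. ADAM instances sharing svc-adam, which holds the VIP name's SPN.
    print(connect_many({"ldap/adamvip.corp.example.com": "svc-adam"},
                       "adamvip.corp.example.com", ["svc-adam", "svc-adam"], 4))
    # → ['Kerberos OK', 'Kerberos OK', 'Kerberos OK', 'Kerberos OK']
```

In a real environment the failures in case 1 look random, because the client never knows which DC the VIP picked, which is exactly what makes them so hard to troubleshoot.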
That is pretty clear. I asked Joe if he knew of any MSFT docs on the subject, and his next response was even clearer.
From: xxx [xxx] On Behalf Of Joe Kaplan
Sent: Friday, February 19, 2010 1:15 PM
To: xxx
Subject: Re: [xxx] Hardware Load Balancing Domain Controllers
I have no idea on the documentation part. I'm just telling you what I know
based on personal experience and my knowledge of the underlying mechanisms
(which is unfortunately much deeper than I’d probably enjoy due to some of
my personal experiences to date :)).
I’m obviously recommending against doing this which is also the party line.
My opinion is that the benefit here is probably completely overshadowed by
the risk of having lots of things not work in ways that are very difficult
to understand. Getting predictable, positive results from this will likely
be non-trivial.*
Joe K.
* Emphasis is mine… Joe didn’t smack us in the face with his email like that. He is too nice. I am the mean joe… I was nice enough to get Joe’s permission to publish this though.
There are several MVPs whose opinions I will not question, at least not when they are speaking about specific technologies. For example, I won't question Guido Grillenmeier too much on AD Disaster Recovery. I will not question Lee Flight too much on ADAM, AD LDS, and pretty much anything LDAP related. I won't question my bestest buddy Dean Wells on batch commands… well, I would now, he isn't an MVP anymore… ;o) Back to the point, I don't question JoeK on his knowledge of .NET, Windows Auth, ADFS, or Kerberos. If he says something works a certain way, I thank him with great humility for taking the time to respond. More than once his comments in email or in newsgroup posts have helped me work out issues in those areas, and there are changes in AdFind/AdMod that are directly due to some of Joe's comments and emails and even his .NET book, The .NET Developer's Guide to Directory Services Programming. 🙂
joe
[1] Granted, the load balancing isn't that great: keep taking on load until you fall over and don't respond anymore… But that is exactly the same kind of load balancing you get with a VIP in front of a DC. VIP devices do not know how much load the DCs are under; at best they watch the port(s) you told them to worry about and stop handing out that IP if those ports stop responding.
Anyone who has ever watched how Exchange load balances its use of AD in very big, nasty Exchange deployments knows the algorithm is: keep beating on the DCs until they fall over, then find another DC to gang up on. In all actuality Exchange is in a position to do better, because it is one distributed app that could itself keep track of the connections from all of the Exchange servers to all of the DCs. I guess MSFT could add a special operational attribute to the rootDSE of DCs (and ADAM servers) that indicated relative load when you asked for it, and then you could get a VIP (or other app) smart enough to ask for it, but I don't see that happening anytime soon.
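The gap between what a VIP knows and what a load-aware selector would need can be sketched in Python. `port_is_open` is roughly what a basic TCP health monitor does (a connect succeeds or it doesn't); `vip_round_robin` hands out only the DCs whose port answered; `least_loaded` shows the alternative, but its `load` value is hypothetical, standing in for the kind of rootDSE load attribute that does not exist today. The DC names and load figures are made up, and real VIP products offer other monitor types and algorithms that this sketch does not model.

```python
import itertools
import socket
from dataclasses import dataclass


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Basic TCP health check: can we complete a connect to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


@dataclass
class DC:
    name: str
    port_open: bool = True  # result of the last health check
    load: float = 0.0       # HYPOTHETICAL: AD does not publish a load value


def vip_round_robin(dcs: list[DC], count: int) -> list[str]:
    """Port-only view: every DC that answers gets an equal share, busy or not."""
    healthy = [dc for dc in dcs if dc.port_open]  # assumes at least one is healthy
    rotation = itertools.cycle(healthy)
    return [next(rotation).name for _ in range(count)]


def least_loaded(dcs: list[DC]) -> str:
    """Load-aware view, only possible if each DC reported its load."""
    healthy = [dc for dc in dcs if dc.port_open]
    return min(healthy, key=lambda dc: dc.load).name


if __name__ == "__main__":
    dcs = [DC("dc1", load=0.95), DC("dc2", port_open=False), DC("dc3", load=0.10)]
    print(vip_round_robin(dcs, 4))  # → ['dc1', 'dc3', 'dc1', 'dc3']
    print(least_loaded(dcs))        # → dc3
```

The round-robin output keeps sending half the work to `dc1` even though it is nearly saturated, because an open port is all the VIP can see.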
[2] As an example here… Say you have six kids you are babysitting. Four of them listen to you and follow your rules; the other two randomly kick you in the shin and paint on your 52" Big Screen TV… Do you give those two kids a pass? No, you correct their uncivilized behavior without delay. Think of your vendors as kids. They are working for your approval and your $'s. Make them earn it.