joeware - never stop exploring... :)

Information about joeware mixed with wild and crazy opinions...

10/23/2008

Lag Sites++

by @ 1:41 am. Filed under tech

More comments on the Active Directory Services Team blog concerning Lag Sites. My friend Guido, probably one of the top guys in the industry in terms of understanding the backup/recovery solution space for Active Directory, stepped up and commented as well. He didn’t even know I had left a comment and later pinged me and mentioned how similar our responses were.

http://blogs.technet.com/askds/archive/2008/10/20/lag-site-or-hot-site-aka-delayed-replication-for-active-directory-disaster-recovery-support.aspx

 

Here was the response I received if you don’t feel like going to the site…

Hi Joe,

Great comments!

To add a few thoughts here (as Gary is out for a few days; I’ll let him reply in depth when he returns).

The lag site is *not* a fully supported scenario. That is the point of this post. If you call me and my team here and ask for advice on how to best configure a lag site, we will tell you the same. ‘Supported’ has a very specific meaning when you talk to our product group and us – it means we exhaustively test the scenario: this is not done for lag sites. It’s also why if you read our technet documentation you will not find a guide to creating lag sites.

The other main point that Gary was trying to reach is that we have found in Support that many thousands of customers have been using Lag Sites *exclusively*. They don’t use, maintain, or test their systemstate backup systems – then we work tons of cases each year where they thought that their lag sites would save them, and they did not. So this wasn’t directly pulled from Gary’s behind – we have 10 years of 3rd tier support cases evidence to back it up.

And your main point is well taken – you probably will not have good backups or a good disaster recovery strategy if you’re not doing your job as an admin.

(PS: love your webpage, tools, and general AD passion)

– Ned

This was my response

Hey Ned, glad you enjoy the utilities/site/etc. 🙂

So which part of the lag site concept isn’t supported?

My understanding from speaking to various folks around MS within PPS and the PG is that what isn’t supported is that a lag site be used as the sole DR recovery mechanism. Again, I fully agree with that. That is an insane position to put yourself into.

Anyway, let’s break it down to some of the various components that may or may not be used in any given lag site configuration…

* Delayed replication sites are supported.

* Auth restoring objects on any arbitrary DC in a domain is supported.

* Disabling registration of domain SRV record specific DNS entries pointing to a given site is supported.

* Disabling replication entirely (or shutting DCs down) for periods not exceeding the forest TSL on a given DC or every DC in a site is supported.

I have been involved in various situations where PSS has indicated one or more of each of those be done for a given situation. Heck anyone who has been on a call with a customer and PSS in a major accidental deletion incident has likely heard “has the deletion replicated to all DCs in the domain?” and if not that is followed by “stop replication to that DC immediately and let’s restore the objects from there”. I have heard a multitude of stories from the PG that started that way. Every time that is done it is acknowledgement of the concept of the lag site.

Will PSS help someone set up a lag site if someone asks for that specific thing? Sounds like no, and I can understand the reticence to do so unless you have a thorough understanding of the overall DR plan/process for a given customer. Will PSS help a customer set up a site to replicate on a schedule that is measured in days instead of hours or minutes… Absolutely, I have talked to customers who have been walked through the process by PSS. Will PSS help a customer auth restore an object from any arbitrary DC? Absolutely, have seen it with my own eyes. Ditto for the other items.

What seems to be the issue PSS has is the intent behind the uses of these features in the technology, not the use of the features themselves.

The comment that “many thousands of customers” have been using lag sites exclusively scares me. That suggests to me that someone at MSFT isn’t getting the concepts of how backup/restore works in AD out there very well. I am also just surprised to hear that number. I work in a very large services org for my full time job and have dealt with many large customers over the years and have seen very few instances of lag sites that I wasn’t involved in some way in setting up. Smaller companies never seem that interested due to the hardware and OS licensing investment.

Not to bust your chops but I think the 10 years of cases is a bit of an exaggeration Ned. We are on the 7th year of truly popular use of AD (though some of us had it in large scale Fortune x if not Fortune xx production as early as 99 or 2000) and lag sites didn’t really start catching mainstream attention until several years into AD being in production. Some of us picked up on the idea that a latent (non-converged) site (which is what those of us who were publicly discussing it called it initially) could be used for this type of recovery but the people talking about it were people who could work it out on their own and also understood the repercussions. I recall the first time I heard the “lag site” moniker was at one of the DEC conferences four or five years ago at which point the concept started to explode.

Anyway, people do a lot of stupid things in their production ADs. Lag sites are a relatively painless and innocuous item. I am far more worried and have seen far more issues with DC virtualization than lag sites though I do recommend lag sites be running on virtual machines when I recommend lag sites. 😉  And yes, I do officially recommend them to companies. I also give them the caveats of when it is and isn’t good to use and make sure they fully realize it is a mitigator, not a total DR solution.

Let’s face it, setting up a lag site isn’t rocket science. If someone can’t work it out themselves, they likely shouldn’t be doing it for a variety of reasons. Being who I am I would also go as far as to say they probably shouldn’t be running AD at all but that’s just me. No one who has to call PSS to ask how it should be set up should be doing it.

 

  joe


10/22/2008

AdFind compiles successfully under Code Gear Builder 2009!!!

by @ 2:22 am. Filed under tech

Yes! I hit a milestone in AdFind V01.38.00…. I got it to compile under Code Gear C++ Builder 2009. This was quite an accomplishment as there have been some serious changes in that compiler from Borland C++ Builder 6.0 (circa 2002) which was the last compiler that I was using for it. Mostly the issues had to do with various functions not being able to be linked in along with ambiguous references that previously were allowed and now are busted.

The executable, even in Debug mode, has shrunk in size considerably. That is nice. I am hoping the performance has increased as well due to the newer compiler because in the past I saw performance gains when I upgraded from Borland C++ Builder 5 to Borland C++ Builder 6.

So anyway, now to test all of the functionality and, if that pans out ok, time to start making a whole bunch of updates to it that I have collected over the last year and a half and then hopefully get it released before New Year’s Day.

 

    joe


Active Directory Lag Sites

by @ 2:05 am. Filed under tech

Over on the “Ask the Directory Services Team” blog there is a post about Lag Sites. I really disliked what was written and left a nice long comment. I am not sure if it will be posted or not and I also wanted to reach folks with my comments about lag sites that possibly don’t read that blog.

My personal thoughts on lag sites are that they can be a good thing. They are not, and visualize me saying this three times out loud to you, they are not a COMPLETE DISASTER RECOVERY SOLUTION. However they can be PART of an overall DR solution. Even an integral part that can help you meet tight SLA/SLO goals.

Better than a lag site, IMO, would be a tombstone reanimation combined with snapshot data recovery mechanism but that is a Windows Server 2008 kind of thing and if the customers I work with are any indication, it will be a while before most large orgs will be in a position to use that. In the meantime lag sites work with everything all the way back to OEM Windows 2000.

 

Here is the blog post

http://blogs.technet.com/askds/archive/2008/10/20/lag-site-or-hot-site-aka-delayed-replication-for-active-directory-disaster-recovery-support.aspx

 

These are my comments to the post:

All I read here is that you need to know what you are doing and should have a clear design and operational model in mind if you use lag sites. I would argue you should know what you are doing and have a good plan if you are responsible for AD at all.

Everything you list here can be covered if you have knowledgeable informed admins. If you have an unwitting admin, you already have a problem, maybe the lag site will help you catch the problem and eradicate it before something serious happens.

Even the repadmin /force can be stopped dead in its tracks if necessary. The methods may or may not be supported by PSS but it doesn’t mean they don’t work just fine. Lots of things in the real world aren’t supported by PSS… Yet…. that work just fine.

Point: Lag sites are not guaranteed to be intact in a disaster:

CounterPoint: Ditto for backups. You should hopefully (again not guaranteed if your admins are the idiots that couldn’t properly run a lag site) have enough backups over time to go back far enough but if the issue occurred pre-TSL, you are SOL either way. A friend of mine presented to the DS PG an awesome way to poison the backups several years ago that was completely undetectable under normal circumstances. That hole has since been plugged because of that conversation but if MSFT now guarantees that backups will be intact in the event of a disaster, I would like to see that guarantee in writing. If not, this point is moot with the understanding that Lag Sites are not a COMPLETE DR solution, but could be PART of an overall solution.

Point: Replicating from lag site might have unrecoverable consequences

Counterpoint: And restoring from tape is also using out of date data, correct? Do you have the same concern there??? Logic says you should. Also doing a schema update can do the same, should we not do those either? These are the same scare tactics used by folks in the early days of AD to warn people off from doing Schema changes. We quickly learned that if we know what we are doing and use proper precautions and procedures we will be fine. I especially enjoyed the “may have to do a forest recovery…” bit. Had that been presented to me in a meeting with MSFT in front of a customer, I likely would have been unable to control my chuckling.

Point: Lag sites pose security threats to the corporate environment

Counterpoint: This one gave me a good chuckle too. Ever hear of normal slow convergence across a large enterprise? Ever hear of Kerberos Tickets? At what point did Kerberos start validating if a currently unexpired ticket was tied to a disabled or deleted userid? Yes there *could* be additional issues if auth is possible through the lag site, but this is simply a design and operational criterion to take into account for lag sites as well as normal overall convergence of data churn. It could be a bad thing that happens when repl gets plugged or when a site is normally more latent than other sites or with “official” lag sites or if someone adjusts kerb ticket configuration settings. It isn’t an “oh my god the sky is falling don’t do lag sites because of this” item.

Point: Careful consideration must be put in configuring and deploying lag sites:

Counterpoint: Of course. Careful consideration must be put in configuring and deploying ANY site as well as ANY domain or ANY forest or ANY domain controller.

You likely should have stopped with your post after stating that one week is the hard coded upper limit on normal replication schedules. The rest of this was unhelpful and again reminded me of all of the “Schema Updates are bad” scare tactics that went around for the initial years of AD.

If you wanted, you could have stated that Lag Sites need to be properly planned. They need to be properly managed. They aren’t a complete DR plan but they can be part of an overall DR plan that is used for various scenarios along with tombstone reanimation, Snapshot data recovery in Windows Server 2008, and god forbid tape recovery. As a personal point of interest, I would much rather restore objects out of a lag site than from a backup file. I trust the lag sites more than I trust the backup/restore process.

Going forward, please don’t give advice based on misinformation, little information, or just plain “let’s scare em” type scenarios.

The wrap-up is that a lag site is simply a site that replicates on a longer convergence frequency than “normal sites”. Possibly up to a week out of convergence. This is a fully supported configuration by MSFT. It just isn’t supported as your sole Disaster Recovery solution. And it shouldn’t be because it isn’t a full Disaster Recovery solution.
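As a rough illustration of the two time limits that come up in this discussion (the one-week cap on replication schedules and the forest tombstone lifetime, TSL), here is a minimal Python sketch. The function name and the 180-day TSL value are assumptions made up for the example; the real TSL has to be read from your own forest, and nothing here talks to AD.

```python
from datetime import datetime, timedelta

# Placeholder only: look up the real tombstone lifetime for your forest.
ASSUMED_TSL_DAYS = 180
# The one-week upper limit on replication schedules mentioned above.
MAX_SCHEDULE_GAP = timedelta(days=7)


def lag_is_within_limits(last_replicated, now, tsl_days=ASSUMED_TSL_DAYS):
    """Return (within_tsl, within_schedule_cap) for a DC's replication gap.

    within_tsl: the DC has been out of convergence for less than the TSL.
    within_schedule_cap: the gap fits inside a normal one-week schedule.
    """
    gap = now - last_replicated
    return gap < timedelta(days=tsl_days), gap <= MAX_SCHEDULE_GAP


if __name__ == "__main__":
    now = datetime(2008, 10, 22)
    print(lag_is_within_limits(datetime(2008, 10, 18), now))
    print(lag_is_within_limits(datetime(2008, 10, 1), now))
```

A gap that fits the schedule cap is what a normal lag site looks like; a gap past the schedule cap but inside the TSL is the "replication stopped or DC shut down" case; a gap past the TSL is the one to avoid.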


K.I.S.S

by @ 2:04 am. Filed under quotes

Recently I received an email from an ex-coworker. One of our other ex-coworkers had popped off to a Math Seminar/Conference thingy about neural networks and it ended up discussing nodes and hubs, interconnections, etc and the most optimal solutions. This ex-coworker who attended happened to remark that what was being described as the optimal solution reminded him of what we put together in the company we had all worked at for the AD Site Topology and Replication model.

When we designed and built that, we weren’t using advanced math or anything like that, we simply were trying to keep things simple. Anytime I walk into any customer I try to keep things simple or if they have something complex, I try to make it simpler. What you design at 11AM on a Tuesday with 3 cups of Starbucks in you will possibly have to be troubleshot at 3AM on Saturday morning after 1.5 hours of fitful sleep and maybe for some of you, 6 or 12 drinks too many at the Halloween or Christmas or Independence Day party…

So in general I will tell people…

I have always thought there is beauty, truth, and elegance in simplicity.

Or if that fails to grab their attention and get them to correct a complex design I go for…

KISS – Keep It Simple Dumb Ass… 😉


10/7/2008

I code… therefore I am

by @ 10:42 pm. Filed under general

I am finally writing code again… I am happy about that because I was effectively not coding for over a year. Yes, for over a year, almost 15 months actually. That is a long time for me not to write code because it is one of my main creative outlets.

I will be releasing one of the tools because it is an update to GCChk. When I originally wrote GCChk I thought about allowing you to specify the DCs to use to check and then decided that AD is probably better at picking the DCs to use than most admins and didn’t allow you to specify the DCs. Well my good friend Guido ran into an issue that was a perfect example of why you may need to specify a DC to use to do the check. He actually needed to check one GC against the partitions on another GC. Now this normally wouldn’t be a good idea because you should probably check against a writeable partition as it would be considered a bit more authoritative, but Guido was in a position where he didn’t have network access to a writeable DC for the partitions he was checking due to the network configuration. The ONLY way to check for lingering objects would be to daisy chain from the writeable DC to the closest GC to the next closest GC etc. all the way to the end of the WAN. The MSFT lingering object check in repadmin just can’t do it and trying to get that updated to do that would take an OS release, and Guido was on a project that needed a very troubled forest checked out and fixed quickly. I couldn’t help him with correcting the issues, but I could help him out with identifying GCs and their specific issues. And so I updated GCChk to allow specifying GCs for the comparison. Again that will be uploaded in the next few weeks.
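To make the daisy chain idea a little more concrete, here is a small Python sketch of the comparison logic only. This is not how GCChk is implemented; the function name, the server names, and the idea of representing each server's partition as a plain set of object GUIDs are all assumptions for illustration. Each server in the chain is compared against what the previous hop could vouch for, starting from the writeable DC.

```python
def daisy_chain_check(chain):
    """Walk an ordered chain of (server_name, object_guids) pairs.

    The first entry is the writeable DC and is treated as authoritative.
    For every later server, objects it holds that the trusted set does not
    contain are reported as lingering-object candidates. The trusted set
    carried to the next hop is limited to objects this hop shares with it.
    """
    trusted = set(chain[0][1])
    report = {}
    for name, guids in chain[1:]:
        guids = set(guids)
        report[name] = guids - trusted   # candidates only, not proof
        trusted = guids & trusted        # what this hop can vouch for
    return report


if __name__ == "__main__":
    # Hypothetical data: "x" and "y" exist only on the GCs.
    chain = [
        ("dc-writeable", {"a", "b", "c"}),
        ("gc-near", {"a", "b", "c", "x"}),
        ("gc-far", {"a", "b", "x", "y"}),
    ]
    for server, candidates in daisy_chain_check(chain).items():
        print(server, sorted(candidates))
```

The weakness mentioned above shows up in the sketch too: anything a middle hop has not yet received drops out of the trusted set, so objects further down the chain can be flagged that a direct check against the writeable DC would have cleared. That is why the results are only candidates.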

 

And the second tool I worked on was a complete surprise for me…. This one I had to work on for my day job so obviously I will never be releasing it. It is… yes wait for it… a C# utility… Yes I wrote my first .NET program. I will admit it was better than I expected but at the same time it was worse. The ".NET is so intuitive and easy" really wasn’t the case, at least not for me. And System.DirectoryServices has some serious issues and bad assumptions. I know I know Eric and JoeK and BrianD and everyone else who told me to use System.DirectoryServices.Protocols. Next time I play with .NET, I will do so. But I think it was good for me to see what most people who jump into .NET (and PowerShell) are likely going to use for directory access and I feel for them. I will try to write a blog or two on a few things that I found to be really annoying and/or bad.


10/1/2008

Happy October???

by @ 11:47 pm. Filed under general

Can you believe it? Already October again. Sigh. The year really flies when you lose half of it to Carbon Monoxide. On the positive side I don’t have to fear turning on the furnace in a few weeks when it gets a little colder.

So Oct 1 has come and is almost gone (here in the good ol EST TZ) and today I got a very welcome email from the MVP Program… I have indeed been awarded the MVP award this year again. Cheer. 🙂 I was really concerned as I wasn’t able to commit any time to the newsgroups but I guess my listserv work, my web site, my free tools, my blog, my responses to the literally thousands of emails in the last year with questions for help were enough. I am happy, I really like the MVP program and the benefits it brings to me.

Sadly though my friend Dean was not re-awarded, I guess you can’t be both a Microsoft Employee and a Microsoft MVP at the same time. 🙁  This last Monday was his first day to work at Microsoft and the bastard has yet to email me or IM me to tell me how he is doing. So if you are on the MSFT campus and you read this and you are within distance to huck something hefty but not hefty enough to permanently harm Dean at him, could you please do so on my behalf? That would be outstanding. Pictures of requested hucking of semi-hefty items would be appreciated as well. Good luck Dean, you wanker. 😉


9/28/2008

More funny Tina Fey playing Palin…

by @ 11:56 pm. Filed under humour

http://www.nbc.com/Saturday_Night_Live/video/clips/couric-palin-open/704042/


Behavior…

by @ 1:51 pm. Filed under quotes

When you choose behavior, you choose the consequences.

   – Dr. Phil (Chapter 9 – Family First)


Watching…

by @ 1:51 pm. Filed under quotes

Don’t worry that children never listen to you; worry that they are always watching you.

      – Robert Fulghum


Let your life be a living example…

by @ 1:51 pm. Filed under quotes

Your children are profoundly shaped by you, and your actions will resonate, for good or ill, throughout the rest of their lives. Be a parent who lives the qualities, characteristics and values you would like your family to emulate. Let your life be a living example of what you want to see in your children.

    – Dr. Phil (Chapter 7 – Family First)

