joeware - never stop exploring... :)

Information about joeware mixed with wild and crazy opinions...

Kerberos Skew > 5 minutes… You can probably still log on…

by @ 2:43 pm on 9/17/2012. Filed under tech

I had intended, actually strike that, I had thought I had written a blog entry about the urban legends surrounding the dreaded 5 minutes of Kerberos skew. Basically it was to be a story about the fear everyone has of our clocks going outside of 5 minutes of skew and then never being able to log on again yadda yadda yadda booooooo cue squeaky door and ghostly howl and perhaps a moon rising through a dirty multi-paned window. Well apparently I never wrote it, because it isn’t out here… anywhere. What probably happened is that I wrote a version for the folks I was discussing it with inside my employer, since that is where I got goosed to go look into it again, and then never took the time to write up a version to post for everyone else, even though I fully intended to when my hands were elbow deep in Windows Kerberos Client Source Code and I didn’t want to repeat this journey again later. For that I apologize. 🙂

Anyway, I recently had some folks (actually a couple of different groups at the same time, oddly enough) who were rather adamant that authentication is as dead as a doornail when Kerberos skew is > 300 seconds. I said no it isn’t, and of course then I had to back it all up, because so many documents say you will break and, worse, everyone *KNOWS* you will break. You know how that is… Everyone "knows" something that is incorrect, but since everyone "knows" it, it doesn’t matter that everyone is wrong… Well, in this case sort of wrong. Point blank: yes, if you have Kerberos skew greater than 5 minutes, you can almost certainly still interactively log onto your Windows Server and/or Client. That doesn’t mean everything will work perfectly, but you will be able to log on to the machine because that piece does work.

Back Story: I personally knew it worked because I had chatted with a dev at an MVP Summit years ago (Windows 2003 release time frame I would say) about it and asked why they couldn’t fix it, and he looked at me and said it was fixed. You should have no issues logging into a client when the skew is greater than 5 minutes… They fixed it after 2K OEM at some point and in fact, by some accounts, it was always intended to actually work. I immediately shut up, realizing I hadn’t tested it since I had seen the actual error first hand on OEM W2K in April 2000 or so when pushing W2K en masse out to a largish (Fortune 5 at the time) company, and I didn’t want to look any sillier than I already had complaining about something that was apparently already fixed. :)  When I got back to my home lab I set it up and sure enough, the dev was correct: it worked great and the network trace clearly showed it working. The server would return a KRB_AP_ERR_SKEW error containing the server’s time stamp, and the client would resend the request with an updated time stamp. From that point *I* knew it was ok for client auth, but that didn’t help anyone else since I never mentioned it to anyone. 😉
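
To make the retry in that trace a bit more concrete, here is a toy model of the exchange. It is only a sketch under stated assumptions: `ToyKdc`, `KrbError`, `check_timestamp` and `authenticate` are hypothetical names, the KDC does nothing but the skew check, and the 300 second tolerance and error code 37 (KRB_AP_ERR_SKEW in RFC 4120) are hard-coded for illustration. What it does reflect from the real protocol is that a Kerberos error message carries the server’s current time (`stime`), which is what lets the client correct itself and try again. This is not the Windows client code.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed values for this toy model: 300 seconds is the 5 minute default
# tolerance; 37 is the RFC 4120 code for KRB_AP_ERR_SKEW.
MAX_SKEW_SECONDS = 300
KRB_AP_ERR_SKEW = 37


@dataclass
class KrbError:
    error_code: int
    stime: float  # the server's current time, carried in the error message


class ToyKdc:
    """Hypothetical stand-in for a KDC; it only performs the skew check."""

    def __init__(self, clock: float) -> None:
        self.clock = clock

    def check_timestamp(self, client_time: float) -> Optional[KrbError]:
        # Reject a time stamp more than MAX_SKEW_SECONDS from our clock and
        # tell the client what time we think it is.
        if abs(client_time - self.clock) > MAX_SKEW_SECONDS:
            return KrbError(KRB_AP_ERR_SKEW, stime=self.clock)
        return None  # None means "accepted" in this sketch


def authenticate(kdc: ToyKdc, local_clock: float) -> int:
    """Return how many attempts it took; raise if the retry also fails."""
    error = kdc.check_timestamp(local_clock)
    if error is None:
        return 1
    if error.error_code != KRB_AP_ERR_SKEW:
        raise RuntimeError(f"unexpected error {error.error_code}")
    # Work out our offset from the server's time stamp and resend once.
    offset = error.stime - local_clock
    if kdc.check_timestamp(local_clock + offset) is None:
        return 2
    raise RuntimeError("still out of skew after correcting the time stamp")


kdc = ToyKdc(clock=1_000_000.0)
print(authenticate(kdc, 1_000_120.0))  # 1: two minutes off, accepted first try
print(authenticate(kdc, 1_003_600.0))  # 2: an hour off, accepted after the retry
```

The point of the model is that being out of skew costs one extra round trip, not the logon.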

So anyway, after telling these folks that yes, interactive authentication does indeed work if you are out of skew, I started looking at the docs to get the proof they, for some reason <eg>, wanted. Sure enough, pretty much every doc I could find (with one exception that I could only find in the Google cache) said the same thing… >5 minutes skew is bad for your authentication health. That meant I needed to test it again and perform the network trace, and sure enough it all worked as I (but apparently no one else) expected. So I started looking at the source code and found all the nice bits that set up a table of time deltas for dealing with KDCs, so that if the client is off on its time, the error message coming back from the server will include a time stamp and the client should (note the use of the word should and not the word must) use that time to resend a request with a proper time stamp. I chased it all the way back to 2K SP3, which is the furthest back I could look, and it was all there. The fact that it works this way makes perfect sense… Consider the case of multiple forests (or other Kerberos realms) managed by different groups who may or may not use the same time sources and are very possibly out of sync with each other. Do you want two groups of admins to duke it out over time, or do you, as a user, just want to log on?
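
A per-KDC table of time deltas is what makes the multi-forest case work: each KDC gets its own correction, so being "right" for one realm doesn’t make you wrong for another. Below is a minimal sketch of that idea. `KdcSkewTable`, its methods and the `*.example` host names are all hypothetical and made up for illustration; this is not how the Windows client actually structures its table.

```python
from typing import Dict


class KdcSkewTable:
    """Hypothetical per-KDC table of clock offsets (not the Windows code)."""

    def __init__(self) -> None:
        # Maps a KDC name to (server time - local time), in seconds.
        self._offsets: Dict[str, float] = {}

    def record_skew_error(self, kdc: str, server_time: float, local_time: float) -> None:
        # Called when a KDC answers with a skew error carrying its own time.
        self._offsets[kdc] = server_time - local_time

    def timestamp_for(self, kdc: str, local_time: float) -> float:
        # KDCs that never complained about skew get our local time as-is.
        return local_time + self._offsets.get(kdc, 0.0)


table = KdcSkewTable()
# Two forests whose DCs use different time sources.
table.record_skew_error("dc1.forest-a.example", server_time=10_000.0, local_time=9_400.0)
table.record_skew_error("dc1.forest-b.example", server_time=8_800.0, local_time=9_400.0)
print(table.timestamp_for("dc1.forest-a.example", 9_500.0))  # 10100.0
print(table.timestamp_for("dc1.forest-b.example", 9_500.0))  # 8900.0
print(table.timestamp_for("dc1.forest-c.example", 9_500.0))  # 9500.0
```

Here one forest’s clock runs ten minutes ahead of the client and the other ten minutes behind, and the client can still talk to both without anyone changing a time source.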

So armed with that info I decided to ping some really smart DS guys I know inside of MSFT and try to get to the bottom of everything. It was one of my fairly normal emails to the guys there that I have so much respect for (seriously, most of the smartest people I have ever met are within MSFT)… It spoke about the sky falling and the apocalypse etc. that supposedly occur when your time is wrong even though I can log on fine, and then yadda yadda yadda blah blah with some technical chatter, getting more and more specific with the details I had, including, IIRC, discussion about network traces and Kerberos constants. I think I also discussed some lines from the OS source code (I don’t recall now for certain, it was a month or two and about a dozen different technical emergencies ago) about why I don’t think the sky is really falling, but I admitted I don’t know everything and perhaps I was missing something.

The response that came back made total sense. In summary the answer was "It works, but we can’t guarantee it will work everywhere at all times, so it is safer to just say ‘Hey, this is bad!’" Actually I think the beating of drums was mentioned; perhaps that is how the information is spread in areas with no mobile coverage… drums, smoke signals and the occasional carrier pigeon. J/K 😉 In the absence of a guarantee you have to beat a drum to get everyone going in the same general direction in the same way (i.e. stick to less than 5 minutes of skew, because trying to bet on the caveats could skewer you[1]). Also, as I think back to it now, beating the drum was the old fashioned (but still used) way to keep time for certain synced events (like rowing), so perhaps I missed a bit of buried humour there. I need to go back and find that email!

Also my source, a well known resource in the DS community whom we will call Mr. X, said there are certain instances where it really shouldn’t work… say like replication and some other functions. You may be thinking, as I did for a few seconds, but why? Well… if you think about it, and I did, replication should really be working with the right time or at least, and this is more important, CONSISTENT TIME between domain controllers… Why, you ask? Go back and review the AD docs on conflict resolution. One of the conflict resolution data points, thankfully not the first, but one of them, is the time of the update. That alone is worth the price of admission for a Kerberos time skew failure that sticks (rather than being quietly retried away), without even considering other points.
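
To see why the time of the update matters, here is a rough sketch of the conflict resolution order as I understand it is generally documented for AD: when two DCs change the same attribute before replicating, the higher version number wins; if the versions tie, the later originating time stamp wins; and only if those also tie does the originating DC’s GUID break the tie. `AttributeUpdate` and `resolve` are hypothetical names, and ordering the GUIDs by their integer value (how Python’s `uuid.UUID` compares) is an assumption of this sketch, not a claim about how AD orders them.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeUpdate:
    version: int               # higher version wins outright
    originating_time: float    # stamped from the originating DC's own clock
    originating_dc: uuid.UUID  # tie-breaker of last resort
    value: str


def resolve(a: AttributeUpdate, b: AttributeUpdate) -> AttributeUpdate:
    """Pick the winning update: version, then time stamp, then DC GUID."""
    def key(u: AttributeUpdate):
        # Comparing UUIDs by integer value is an assumption of this sketch.
        return (u.version, u.originating_time, u.originating_dc)
    return a if key(a) >= key(b) else b


dc_fast, dc_ok = uuid.UUID(int=1), uuid.UUID(int=2)
# Both DCs changed the attribute independently, so both are at version 3.
# The first change really happened at t=1000, but dc_fast's clock runs
# 600 seconds ahead, so it stamped the change as t=1600.
stale = AttributeUpdate(3, 1_600.0, dc_fast, "written first")
fresh = AttributeUpdate(3, 1_200.0, dc_ok, "written later")
print(resolve(stale, fresh).value)  # written first
```

The change that actually happened first wins purely because of a skewed clock, which is exactly the kind of silent mess you don’t want replication papering over with a retry.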

So I was pointed to KB article 956627 (http://support.microsoft.com/kb/956627), which for some reason I was never able to find previously even though I had searched on the Kerberos constant name for the time skew failure. While the document doesn’t come straight out and tell you this should work, it does explain why you will still get a ticket when the skew is greater than the value defined for "failure".

Ah interesting… I just typed a lot for apparently no reason… I just performed a search on the KERB constant again to get some references and got a hit on

http://www.networksteve.com/?p=7246

which is a rip off of the AskDS blog at http://blogs.technet.com/b/askds/archive/2012/08/24/friday-i-mean-saturday-mail-sack-very-wordy-edition.aspx, and you can now read the other side of the question… ;)  Well, actually no, this should get out there too. Perhaps we will get enough people posting this info that it will at least equal the number of references on the internet that say it will absolutely break. 🙂

   joe

[1] See what I did there…

2 Responses to “Kerberos Skew > 5 minutes… You can probably still log on…”

  1. Martin says:

    Thanks for sharing!

  2. Mike Kline says:

    Network Steve seems to be a total asshole…can’t believe he took the DS blog and plagiarized it word for word.