joeware - never stop exploring... :)

Information about joeware mixed with wild and crazy opinions...

Happy 10th Birthday <insertcompanynamehere>.com!!!

by @ 7:25 pm on 4/12/2010. Filed under tech

I built my very first production Active Directory in an Enterprise environment on this day 10 years ago!

c:\>adfind -config -f "(&(objectclass=crossref)(name=companyname))" -alldc whencreated

AdFind V01.40.00cpp Joe Richards (joe@joeware.net) February 2009
Using server: dcname.city.company.com:389
Directory: Windows Server 2003
Base DN: CN=Configuration,DC=company,DC=com

dn:CN=COMPANYNAME,CN=Partitions,CN=Configuration,DC=company,DC=com
>whenCreated: 2000/04/12-19:25:22 Eastern Daylight Time

1 Objects returned
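Active Directory stores whenCreated as a UTC GeneralizedTime value, and the output above shows it rendered in Eastern Daylight Time. Here is a minimal Python sketch of that conversion. It assumes the raw attribute string was `20000412232522.0Z`, a hypothetical value back-computed from the EDT output and not something AdFind printed. It also hard-codes a fixed UTC-4 offset, which is only valid for dates inside US daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical raw value, back-computed from the EDT time shown above.
RAW_WHEN_CREATED = "20000412232522.0Z"

# Fixed UTC-4 offset: correct for 12 April 2000 (US DST began 2 April 2000),
# but NOT a general-purpose Eastern time zone.
EDT = timezone(timedelta(hours=-4), "EDT")


def parse_generalized_time(value: str) -> datetime:
    """Parse a GeneralizedTime string such as '20000412232522.0Z' as UTC."""
    base = value.split(".")[0].rstrip("Z")  # drop the fractional part and the 'Z'
    return datetime.strptime(base, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


created_utc = parse_generalized_time(RAW_WHEN_CREATED)
created_local = created_utc.astimezone(EDT)
print(created_local.strftime("%Y/%m/%d-%H:%M:%S %Z"))  # 2000/04/12-19:25:22 EDT
```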

That was a scary time for me. We weren’t moving to Windows 2000 because we had spent a lot of time planning for it and it was the next milestone on some five-year timeline somewhere. Nope… We were moving because I found something bad…

At the time, I was working for a company that did outsourcing work. They had hired me away from a contract position with one of the divisions of CompanyX to work on the team running CompanyX’s Data Center Servers and its NT4 Domain Controllers[1] globally. While I was gauging the size of my issues, I looked at the SAM sizes, saw the SAM size issue, and immediately ran (did not pass go, did not collect $200) to the Systems Integration building to tell the Windows guys… umm, we have a bit of a problem. I may or may not have been jumping up and down on a table; I don’t recall exactly. When I explained the problem, some folks were sort of shocked, because in all honesty, when the company started moving to NT there weren’t a lot of people behind it, and I think some folks thought it was a bit of a joke. I mean, we had mainframes and UNIX and Novell for heaven’s sake, who needed Windows? So a plan was hatched to shift the Windows 2000 upgrade project into high gear.

SAM size issue, you ask??? The young ones out there may not recall, but back then there was a published SAM size limit of 40MB. The KB article discussing it didn’t say much beyond… “Don’t get bigger than this, because you will break.” There is a much better article now (probably the same article, updated), http://support.microsoft.com/kb/130914, that talks you through SAM sizing. But trust me, back then I recall it saying something like: if you are approaching 40MB, buy your ticket to Aruba and Cheetos for the flight right now… Three of our five geographic domains were over 40MB; in fact, a couple were in the 100MB range.
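Checking domains against that 40MB ceiling is simple arithmetic. The Python sketch below assumes you can read each domain’s SAM hive file size (on an NT4 DC, typically the SAM file under %SystemRoot%\system32\config) and treats 1MB as 1024 × 1024 bytes. The domain names and sizes are hypothetical, chosen only to mirror the “three of five over the limit” situation described above.

```python
import os

SAM_LIMIT_MB = 40  # the published NT4 SAM size limit discussed above


def sam_size_mb(path: str) -> float:
    """Return the size of a SAM hive file in MB (1MB = 1024 * 1024 bytes).

    The path is an assumption; point it at the SAM file on the DC in question.
    """
    return os.path.getsize(path) / (1024 * 1024)


def domains_over_limit(sizes_mb, limit_mb=SAM_LIMIT_MB):
    """Return the names of domains whose SAM exceeds limit_mb, largest first."""
    over = [(size, name) for name, size in sizes_mb.items() if size > limit_mb]
    return [name for size, name in sorted(over, reverse=True)]


# Hypothetical sizes in MB, not the real figures from this environment.
sizes = {
    "DOMAIN1": 104.0,
    "DOMAIN2": 97.5,
    "DOMAIN3": 52.0,
    "DOMAIN4": 31.0,
    "DOMAIN5": 18.0,
}
print(domains_over_limit(sizes))  # ['DOMAIN1', 'DOMAIN2', 'DOMAIN3']
```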

 

After that, every day I came into work thinking, I wonder if today is the day it all melts down… The Systems Integration folks did amazing work. We had lots of meetings, and they came up with a BRILLIANT “Upgrade in Place as a Fresh Install” process. Basically, an image was pushed down to every DC through Tivoli[2]. Then, using a special program we got from MSFT, we would boot into an alternate, normally unbootable partition and apply the image. The image had an automated build process, and voila, the machine came up as a new Windows 2000 Server and sent us an email saying it was done. We then promoted the DC. We built 400 DCs that way with three people working on it: me full time and a couple of other guys part time as they could. Most of it was done in less than two months.

Then there was a hiatus after I was fired from the company I worked for. I spent the summer rollerblading, and by the end of the summer the company that needed the help brought me in directly as a contractor. We got back into it and finished up the Asia Pacific domain, since the outsourcing company really didn’t get much, if anything, of the migration/upgrade done while I was gone. I also helped re-insource most of the servers that had been outsourced to my previous employer. So that outsourcing company fired me, then lost a contract worth many, many, many millions of dollars, and I was the one brought in to re-insource everything. Kind of silly when you think about it. Three years later, that same company ended up spending five or so months trying to get me to come back…

The number of issues we had moving to Windows 2000 that early on was considerable. FRS was a complete and utter train wreck for us. Once we hit about 25-50 Domain Controllers, FRS was regularly broken, and I was working weekly with Microsoft Alliance Premier Support, getting buddy builds and trying them. I finally got to the point where I told Microsoft I was going to shut off FRS on all DCs and write my own Perl-based replication engine. I mean, honestly, the SYSVOL was tiny: we had a nearly empty default domain policy, a nearly empty default domain controllers policy, and a few scripts in NETLOGON. I could practically hand type the info faster than FRS could replicate it. MSFT got very concerned that I would do what I said (because I would have, and they knew full well I would have) and seemed to get FRS mostly working after that.

We also had a nice issue with the PDC of our North America domain going out to lunch (or perhaps it was brunch) around 7AM every day and not returning until 11AM or so. By this I mean it wasn’t processing the thousands of password changes and other things that needed to happen every morning. MSFT had been working on it for a week or so without getting anywhere, and I was getting beaten up daily about it. So I started running network traces for hours at a time and eventually tracked it down to the NetBIOS node type setting. It seems the PDC was trying to find clients it hadn’t spoken to in ages via broadcast, and it did so in a single-threaded manner, which caused the PDC to get all bunched up and “snowball into hell”. Rebooting only helped for a little while. I asked MSFT if changing the node type from H-Node to P-Node would help. They said no, and I said I would do it anyway. The problem went away and never came back[3] after I did that. MSFT wanted us to switch back to H-Node so we could troubleshoot it some more, and my manager’s manager laughed and told my manager he was stupider than he looked if he allowed that… I guess they didn’t want to impact the 50k or so people who had been hit by the issue any further.
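The node type determines which name resolution methods NetBT uses and in what order. On Windows it is set by the NodeType value under HKLM\SYSTEM\CurrentControlSet\Services\NetBT\Parameters: 1 = B-node, 2 = P-node, 4 = M-node, 8 = H-node. The Python sketch below is a deliberately simplified model, not NetBT itself. It ignores LMHOSTS, DNS fallback, caching, and timeouts, and the function and set names are hypothetical. Its point is that for a stale client that is registered nowhere, an H-node host falls back to broadcasting, while a P-node host stops after asking WINS.

```python
from enum import IntEnum


class NodeType(IntEnum):
    # Values used by the NetBT "NodeType" registry setting.
    B_NODE = 1  # broadcast only
    P_NODE = 2  # point-to-point: query the WINS server only
    M_NODE = 4  # mixed: broadcast first, then WINS
    H_NODE = 8  # hybrid: WINS first, then fall back to broadcast


RESOLUTION_ORDER = {
    NodeType.B_NODE: ["broadcast"],
    NodeType.P_NODE: ["wins"],
    NodeType.M_NODE: ["broadcast", "wins"],
    NodeType.H_NODE: ["wins", "broadcast"],
}


def attempts_for(name, node_type, wins_names, local_names):
    """Model a lookup: return (methods tried, whether the name was found).

    wins_names: names registered in WINS (hypothetical model input).
    local_names: names that would answer a broadcast on the local subnet.
    """
    tried = []
    for method in RESOLUTION_ORDER[node_type]:
        tried.append(method)
        known = wins_names if method == "wins" else local_names
        if name in known:
            return tried, True
    return tried, False


# A long-gone client: not in WINS and not on the local subnet.
print(attempts_for("OLDPC", NodeType.H_NODE, set(), set()))  # (['wins', 'broadcast'], False)
print(attempts_for("OLDPC", NodeType.P_NODE, set(), set()))  # (['wins'], False)
```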

I could go on forever; there are lots of fun and interesting memories of that AD deployment and the Exchange deployment that followed. It was a blast and I had a great time working on it. It was the most technically challenged I have been in the last decade, and I would very much enjoy being back in that environment.

 

   joe

 

[1] Responsible for about 400 NT4 DCs across all time zones, supporting roughly 250,000 IDs, as well as thousands of NT4 servers in the global corporate Data Centers.

[2] In the end, and really the beginning, Tivoli didn’t work well, so the image was instead delivered to all of the DCs via a custom in-house process called FakeTivoli.pl… a Perl script I wrote in about an hour to zip the image, chop it up, deliver the chunks, and reassemble them on the other side. Oh, how I wish I had been paid the millions of dollars IBM received for Tivoli, which I was never able to use effectively on any of my DCs or member servers that had it loaded.

[3] Well, it did come back, but not while I ran the Enterprise. After I was fired by the outsourcing company, they promoted a new machine to PDC and didn’t heed the warnings and info I had sent out in emails when the issue first occurred. They had to email me at home, when I no longer worked for them, to ask how to fix it.
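Footnote [2] describes FakeTivoli.pl as compress, chop into chunks, deliver, and reassemble. The original was Perl and used zip. What follows is a hypothetical Python sketch of the same idea. It swaps in zlib for zip, adds a SHA-256 digest check that the footnote doesn’t mention, and leaves out the delivery transport entirely. Chunks carry sequence numbers so the receiver can reassemble them even if they arrive out of order.

```python
import hashlib
import zlib

CHUNK_SIZE = 1024 * 1024  # hypothetical chunk size (1MB)


def pack(image: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress the image and split it into (sequence, bytes) chunks.

    Returns the chunk list plus a SHA-256 digest of the original image.
    """
    compressed = zlib.compress(image, 9)
    chunks = [
        (seq, compressed[offset:offset + chunk_size])
        for seq, offset in enumerate(range(0, len(compressed), chunk_size))
    ]
    return chunks, hashlib.sha256(image).hexdigest()


def unpack(chunks, expected_digest: str) -> bytes:
    """Reassemble chunks by sequence number, decompress, and verify."""
    ordered = b"".join(data for _, data in sorted(chunks))
    image = zlib.decompress(ordered)
    if hashlib.sha256(image).hexdigest() != expected_digest:
        raise ValueError("reassembled image does not match the original")
    return image


# Usage: chunks can be shipped separately and reassembled in any order.
original = b"pretend this is a Windows 2000 build image " * 1000
parts, digest = pack(original, chunk_size=64)
assert unpack(list(reversed(parts)), digest) == original
```

Sorting the `(sequence, bytes)` tuples restores the original order before decompression. The digest catches a missing or corrupted chunk that still happens to decompress cleanly.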


