joeware - never stop exploring... :)

Information about joeware mixed with wild and crazy opinions...

“GUIDs” or “Having unique in the name doesn’t make it so…”

by @ 1:08 pm on 6/19/2005. Filed under tech

There is a common misconception that GUIDs[1] are guaranteed to be unique. The misconception probably exists because GUID stands for Globally Unique Identifier, or because people misunderstand what it means to be “mathematically guaranteed” unique.

If you look at the documentation for CoCreateGuid, one of the higher-level MS API functions that creates GUIDs, you will find the actual text:

To a very high degree of certainty, this function returns a unique value – no other invocation, on the same or any other system (networked or not), should return the same value.

GUIDs are “mathematically guaranteed” to be unique due to the generation algorithms combined with the sheer number of GUID key values available. GUIDs are 128 bits (32 nibbles, 16 bytes, 4 DWORDs) long. That means, assuming a good creation algorithm, that it is statistically unlikely that you will encounter a duplicate that hasn’t been purposely contrived. Let me write that again: “statistically unlikely”. That is NOT read as “impossible”… If you have a crappy GUID generation algorithm, the chances of duplication can go up significantly. Anyway, it is statistically unlikely because of the size of the key space. The full 128 bits give 2 to the 128th power possible values, roughly 3.4 x 10^38. Even just 2 to the 64th power keys comes out to 18446744073709551616 possible keys. That is over 18 quintillion keys (18 trillions in British)…

Or

18 quintillions
446 quadrillions
744 trillions
073 billions
709 millions
551 thousands
616 ones

Oh, for my British friends this would actually be

18 trillions
446 billiards
744 billions
073 milliards
709 millions
551 thousands
616 ones

That means if there are 6 billion people on the planet Earth (in American), every single one could have something like 3 billion (again in American) unique GUIDs/UUIDs from those 2 to the 64th keys alone. One would wonder what happens, though, when some knucklehead like me writes a program to generate GUIDs to see if I can force a duplicate and generates billions and billions of GUIDs… Does that mean several people in small third world countries will never get their GUIDs? What if lots of people run tools that do that? Does that mean that people living in Chad will never get to have a single unique GUID to themselves[2]?
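To check the arithmetic, here is a small Python sketch. The 6 billion population is the round number used above, and the names (PEOPLE, per_person and so on) are just made up for the illustration. It prints both the 2 to the 64th figure and the full 128-bit space, since a GUID is 128 bits wide.

```python
# Rough key-space arithmetic; the population is the post's round number.
GUID_BITS = 128
PEOPLE = 6_000_000_000  # 6 billion, as assumed in the text

full_space = 2 ** GUID_BITS  # every possible 128-bit value
half_space = 2 ** 64         # the 18-quintillion figure quoted above


def per_person(keys: int, people: int = PEOPLE) -> int:
    """Whole number of keys each person gets if the space is split evenly."""
    return keys // people


if __name__ == "__main__":
    print(f"2^64  = {half_space:,}")
    print(f"2^128 = {full_space:,}")
    print(f"per person from 2^64:  {per_person(half_space):,}")
    print(f"per person from 2^128: {per_person(full_space):.3e}")
```

Split evenly, the 2 to the 64th figure gives each person a bit over 3 billion keys, which is where the number above comes from. The full 128-bit space is that figure squared.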

For normal use, this “guarantee” is alleged to run out around the year 3400. For most folks, me being one of them, this is a pretty good guarantee for normal computing, but strictly speaking, this isn’t a true guarantee. If you had an explosion due to a duplicate GUID, no one is going to rush in and say, “Well, you are covered because that is guaranteed to not happen, have a free copy of Windows to replace the version that broke, with a whole new set of GUIDs for you!” No, you would hear, “Well, it is statistically unlikely to happen, you just happen to have hit the statistical part (do you play Lotto?).” Quite honestly, duplicates could be popping up all of the time; no one is actually checking every GUID/UUID being created. You have to be confident this isn’t the case, though, or else you would be scared, very very scared.
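To put a rough number on “statistically unlikely”, the usual birthday approximation can be sketched in Python. This assumes purely random identifiers, which is an assumption and not a description of any particular Windows function. The 122-bit figure is the number of random bits in a version 4 UUID under RFC 4122; the function name and the sample counts are hypothetical.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance that n uniformly random `bits`-bit values
    contain at least one duplicate.

    Birthday approximation: p ~= 1 - exp(-n * (n - 1) / 2^(bits + 1)).
    """
    exponent = -n * (n - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # Assumption: 122 random bits, as in a version 4 (random) UUID.
    for count in (10 ** 9, 3 * 10 ** 11, 2 ** 61):
        print(f"{count:.3e} GUIDs -> p = {collision_probability(count, 122):.3e}")
```

The probability is never zero for more than one GUID, which is the whole point: tiny, but not a guarantee. It also only holds if the generator really is uniformly random.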

A couple of things that bother me about GUIDs and their generation….

The primary algorithm is, or at least was before Melissa exposed the privacy issue with GUIDs, to use the MAC address of the machine if it had a network card. This is fine and all, but MACs aren’t guaranteed to be unique. If they were, you couldn’t modify them. It is quite easy to have 3 or 10 or 100 computers all with identical MAC addresses; as long as they aren’t on the same network segment using the same router, you are quite fine. In fact, even if they are, chances are things will work, just poorly. “Borrowing” MAC addresses is a pretty common mechanism for stealing wireless access in snotty hotels that don’t offer free wireless. This means that the ability to change a MAC cancels out some of the uniqueness offered by MACs, unless you scope the whole thing to a realm with controlled MACs.

Now who does this MAC address changing who isn’t trying to steal services? Well, lots of people. This is actually fairly common in big businesses that use MAC addresses for identifying machines. If it weren’t, you wouldn’t find it so easy to do. One company I was with used to hard code the MAC address of every single server. The server that would eventually replace that server would get the exact same MAC address, and any time there was a NIC failure, the new NIC was also hardset to the proper MAC address. Now you could say, well, they are enforcing their own uniqueness (hopefully) by not having them out there at the same time. But that isn’t the point. The point is, the MAC address isn’t guaranteed unique to a NIC or a given machine.
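Here is a minimal Python sketch of why a shared MAC matters for time-and-MAC based (RFC 4122 version 1) UUIDs. The helper make_v1 and the values SHARED_MAC, TIMESTAMP and CLOCK_SEQ are hypothetical and exist only for this illustration. The helper packs the fields the way the version 1 layout describes; it is not how any particular Windows function is implemented.

```python
import uuid


def make_v1(timestamp: int, clock_seq: int, node: int) -> uuid.UUID:
    """Build a version 1 UUID from its three inputs.

    timestamp: 60-bit count of 100 ns ticks since 1582-10-15
    clock_seq: 14-bit clock sequence
    node:      48-bit node id, normally the MAC address
    """
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_low = clock_seq & 0xFF
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


# Hypothetical values: two servers hard-set to the same MAC address.
SHARED_MAC = 0x001122334455
TIMESTAMP = 0x1EC9414C232AB00
CLOCK_SEQ = 0x1234

server_a = make_v1(TIMESTAMP, CLOCK_SEQ, SHARED_MAC)
server_b = make_v1(TIMESTAMP, CLOCK_SEQ, SHARED_MAC)

if __name__ == "__main__":
    print(server_a, server_b, server_a == server_b)
```

With the MAC no longer unique, the only things left keeping the two values apart are the timestamp and the clock sequence. If those line up as well, the two machines produce the same identifier.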

The next thing that bothers me is something I ran into by accident when looking into another issue. It seems that MS changed the algorithm for the low level API function UuidCreate in (I believe) Windows 2000 so that a GUID can no longer be traced to a specific computer, nor can it be associated with other GUIDs generated by the same computer. I only found this documented in one place, the documentation for the function call that had been changed; everything else still refers to MAC and time being used.
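The difference between a traceable and an untraceable identifier can be sketched with Python's uuid module, which is not the Windows API but offers both styles: uuid1() (time plus node) and uuid4() (random). Treating uuid4() as a stand-in for the changed function is an assumption made for illustration only. FAKE_NODE is a hypothetical node id.

```python
import uuid

# Hypothetical 48-bit node id standing in for a machine's MAC address.
FAKE_NODE = 0x02AABBCCDDEE

# Time-and-node based: both values carry the same node field,
# so they can be tied to each other and to the machine.
a = uuid.uuid1(node=FAKE_NODE)
b = uuid.uuid1(node=FAKE_NODE)

# Random: nothing in the value identifies the machine.
r1 = uuid.uuid4()
r2 = uuid.uuid4()

if __name__ == "__main__":
    print("v1:", a, "node =", hex(a.node))
    print("v1:", b, "node =", hex(b.node))
    print("v4:", r1)
    print("v4:", r2)
```

In the version 1 values the last 12 hex digits are the node id, so anyone holding two of them can tell they came from the same source. In the version 4 values those same positions are just more random bits.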

This spawns all sorts of questions of “how?”. Embedding the MAC was one of the ways to “guarantee” uniqueness on a single network, since if you had the MAC slapped in there, it would at least take a machine with the exact same MAC or some highly improbable random GUID generation to hit a dupe. Is that level of guarantee still there? Is the MAC interwoven somehow into the GUID still? Does this have a practical impact on its level of uniqueness guarantee in a closed scope environment? For instance, if all machines on a closed network were MS machines using a MAC-based generation mechanism, all machines had unique MACs, and the time was strictly controlled on every machine, you would be as near to a 100% guarantee of non-duplicate GUIDs as is currently possible. This is because you have the same algorithm, and the parts used to help generate uniqueness are themselves controlled to actually be unique. Do you have that same level of guarantee still with the new mechanism? The part that really gets me is that no GUID can be associated with another GUID from the same machine. Anyone have a public document with the new GUID generation mechanism for UuidCreate?

joe

[1] GUIDs are Microsoft’s implementation of the UUID concept from the Open Software Foundation’s Distributed Computing Environment (DCE); UUID stands for Universally Unique Identifier. There are several mechanisms that have been used for creating GUIDs/UUIDs. The generally used components are the MAC address of the machine generating the GUID/UUID, the time, and/or a “cryptographically strong” random number. The MAC and time components have created some stir about the generated GUID/UUID not being opaque (meaning people can determine information from it), so the algorithms have been receiving some work, and MS now has two main low level functions for creating UUIDs: one that uses the MAC, and one that doesn’t.

[2] To protect myself I have created several hundred billion GUIDs[3] that I stuck in a database for a rainy day; who knows, maybe someday I can sell them for $1 apiece. You laugh, but look at IP addresses, no one thought they would get tight either!

[3] Ok maybe I didn’t… what is that like 5 or so Terabytes of info for 300 billion GUIDs? ~67 million GUIDs to a GB… I only have about 3 Terabytes of storage online at home. I wonder what that would zip down to if the GUIDs were properly sorted with proper being dependent on the best way to sort for efficient zipping…
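The storage figures in [3] are easy to check, assuming each GUID is stored as its raw 16 bytes with no database overhead (that assumption, and the variable names, are mine for the illustration):

```python
# Storage arithmetic for footnote [3]; raw 16-byte GUIDs, no overhead assumed.
GUID_BYTES = 16
COUNT = 300_000_000_000  # 300 billion GUIDs

total_bytes = GUID_BYTES * COUNT
guids_per_gib = 2 ** 30 // GUID_BYTES  # GUIDs that fit in one binary gigabyte
total_tb = total_bytes / 10 ** 12      # decimal terabytes
total_tib = total_bytes / 2 ** 40      # binary terabytes

if __name__ == "__main__":
    print(f"GUIDs per GiB: {guids_per_gib:,}")
    print(f"Total: {total_tb:.2f} TB ({total_tib:.2f} TiB)")
```

That lands right around the “5 or so Terabytes” and “~67 million GUIDs to a GB” quoted above.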
