Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Generating a Not-Quite-Globally Unique Identifier

I've found a number of different questions on generating UIDs, but as far as I can tell, my requirements here are somewhat unique (ha).

To summarize: I need to generate a very short ID that's "locally" unique, but does not have to be "globally" or "universally" unique. The constraints are not simply based on aesthetic or space concerns, but due to the fact that this is essentially being used as a hardware tag and is this subject to the hardware's constraints. Here are the specifications:

Hard Requirements

  • The ID must contain only decimal digits (the underlying data is a BCD);
  • The maximum length of the ID is 12 characters (digits).
  • Must be generated offline - a database/web connection is not always available!

Soft Requirements

  • We'd like it to begin with the calendar year and/or month. As this does waste a lot of entropy, I don't mind compromising on this or scrapping it entirely (if necessary).
  • IDs generated from a particular machine should appear sequential.
  • IDs do not have to sort by machine - for example, it's perfectly fine for machine 1 to spit out [123000, 124000, 125000], and machine 2 to spit out [123500, 123600, 124100].
  • However, the more sequential-looking in a collective sense, the better. A set of IDs like [200912000001, 200912000002, 200912000003, ...] would be perfect, although this obviously does not scale across multiple machines.

Usage Scenario:

  • IDs within the scope of this scheme will be generated from 10, maybe 100 different machines at most.
  • There will not be more than a few million IDs generated, total.
  • Concurrency is extremely low. A single machine will not generate IDs more often than every 5 minutes or so. Also, most likely no more than 5 machines at a time will generate IDs within the same hour or even the same day. I expect less than 100 IDs to be generated within one day on a given machine and less than 500 for all machines.
  • A small number of machines (3-5) would most likely be responsible for generating more than 80% of the IDs.

I know that it's possible to encode a timestamp down to 100 ms or even 10 ms precision using less than 12 decimal digits, which is more than enough to guarantee a "unique enough" ID for this application. The reason I am asking this here on SO, is because I would really like to either try to incorporate human-readable year/month in there or encode some piece of information about the source machine, or both.

I'm hoping that someone can either help with a compromise on those soft requirements... or explain why none of them are possible given the other requirements.

(P.S. My "native" language is C# but code in any language or even pseudocode is fine if anybody has any brilliant ideas.)

Update:

Now that I've had the chance to sleep on it, I think what I'm actually going to do is use a timestamp encoding by default, and allow individual installations to switch to a machine-sequential ID by defining their own 2- or 3-digit machine ID. That way, customers who want to mess with the ID and pack in human-readable information can sort out their own method of ensuring uniqueness, and we're not responsible for misuse. Maybe we help out by providing a server utility to handle machine IDs if they happen to be doing all online installations.

like image 393
Aaronaught Avatar asked Dec 23 '09 16:12

Aaronaught


People also ask

How do I make a global unique identifier?

An online generator constructs a unique GUID according to RFC 4122. When creating a GUID, users should note the timestamp, clock sequence and the node ID -- such as a Media Access Control (MAC) address. This image shows an example of a GUID using hexadecimal digits.

How is a UUID generated?

UUID was standardized by the Open Software Foundation (OSF), becoming a part of the Distributed Computing Environment (DCE). Different versions of UUID follow the RFC 4122 specification. UUIDs are generated using an algorithm based on a timestamp and other factors such as the network address.

What does a GUID look like?

What does a GUID look like? A GUID follows a specific structure defined in RFC 4122 and come in a few different versions and variants. All variants follow the same structure xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx where M represents the version and the most significant bits of N represent the variant.

What are the parts of a globally unique identifier?

A globally unique identifier (GUID) is a 128-bit number created by the Windows operating system or another Windows application to uniquely identify specific components, hardware, software, files, user accounts, database entries and other items.


2 Answers

"The reason I am asking this here on SO, is because I would really like to either try to incorporate human-readable year/month in there or encode some piece of information about the source machine, or both."

Let me start by saying I've dealt with this before and attempting to store useful information into a serial number is a BAD idea long term. A device serial number should be meaningless. Just like the primary key of a database record should be meaningless.

The second you start trying to put real data into your serial number, you've just thrown BUSINESS LOGIC into it and you will be forced to maintain it like any other piece of code. Future you will hate past you. Trust me on this. ;o)

If you attempt to store date/time values, then you'll waste numeric space with invalid time/dates. For instance you'll never have anything greater than 12 in the month field.

A straight epoch / unit time counter would be better, but for a machine that only generates a few id's per minute you'll still waste a lot of space.

12 digits is not a lot of space. Look at the VIN page on Wikipedia. Space for only a few manufacturers, only a few thousand cars. They are now reusing VINs because they ran out of space by packing meaning into it.

http://en.wikipedia.org/wiki/VIN

That's not to say ALL meaning in a serial number is bad, just keep it strictly limited to making sure the numbers don't collide.

Something like this...

  • Position 1-3: 999 Machines
  • Position 4-12: Sequential Numbers

That's ALL you need to avoid collisions. If you adding a location digit, then you are screwed when you get to 11 locations.

Sorry if this sounds like a rant. I deal with this a lot manufacturing electronics and various machined parts. It had never ended well long term unless there's LOTS of space available, or a secondary tag (which -wow- provides the necessary id space mentioned before)

like image 50
Great Turtle Avatar answered Oct 25 '22 06:10

Great Turtle


How about yyMMddhhmmID?

yy = two-digit year
MM = two-digit month
dd = two-digit day
hh = two-digit hour (24-hour time)
mm = two-digit minute
ID = machine-specific ID

Example: 0912113201 from machine with ID = 01.

Alternatively (if you don't like two-digit years (Y2K lol)), how about yyyyMMIDxxxx?

yyyy = four-digit year
MM = two-digit month
ID = machine-specific ID
xxxx = sequentially-incremented integer

Example: 200912010001 from machine with ID = 01.

As you said that each machine will only generate one identifier maximum every five minutes, this gives you room for 8,928 (24 * 31 * 60 / 5 = 8928) identifiers per month which will fit in xxxx. Here you could squeeze the year down to a three-digit year yyy (009, e.g.) if you needed an extra digit in the xxxx sequence or the machine ID.

Both of these fit timestamp/machine ID as you requested.

We all like concrete code:

class Machine {
    public int ID { get; private set; }
    public Machine(int id) {
        ID = id;
    }
}

 class IdentifierGenerator {
    readonly Machine machine;
    int seed;
    const int digits = 4;
    readonly int modulus;
    readonly string seedFormat;
    public IdentifierGenerator(Machine machine) {
        this.machine = machine;
        this.modulus = (int)Math.Pow(10, digits);
        this.seedFormat = new string('0', digits);
    }

    public string Generate() {
        string identifier = DateTime.Now.ToString("yyyyMM") 
                                + machine.ID.ToString("00") 
                                + seed.ToString(seedFormat);
        seed = (seed + 1) % modulus;
        return identifier;
    }
}

Machine m = new Machine(1);
IdentifierGenerator gen = new IdentifierGenerator(m);
Console.WriteLine(gen.Generate());
Console.WriteLine(gen.Generate());

Outputs:

200912010000
200912010001
like image 38
jason Avatar answered Oct 25 '22 05:10

jason