Pros and cons of Flake ids and cryptographic Ids

Question

A distributed system can generate unique ids using either Flake ids or cryptographic ids (e.g., 128-bit murmur3).

What are the pros and cons of each method?

Michael Deardeuff · Accepted Answer

I'm going to assume 128-bit ids, kind of like UUIDs. Let's start at a baseline, though.

TL;DR: Use random ids. If, and only if, you have database performance issues, try flake ids.

Auto-increment ids

Auto-increment ids are when your backend system assigns a unique, densely-packed id to each new entity. This is usually done by a database, but not always.

The clear advantage is that the id is guaranteed unique to your system, though 128 bits is probably overkill.

The first disadvantage is that you leak information every time you expose an id. You leak what other ids exist (an attacker can easily guess what to look for). You also leak how busy your system is (your competition now knows how many ids you create in a time period and can infer, say, financial information).
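To make the "how busy your system is" leak concrete, here is a minimal sketch of the arithmetic an outsider could do after seeing two auto-increment ids at two different times. The function name and the sample numbers are hypothetical, chosen only for illustration.

```python
from datetime import datetime

SECONDS_PER_DAY = 86_400


def estimate_creation_rate(id_a, seen_a, id_b, seen_b):
    """Estimate ids created per day from two (id, observation time) pairs.

    Assumes ids are dense and strictly increasing, as auto-increment ids are.
    """
    elapsed_days = (seen_b - seen_a).total_seconds() / SECONDS_PER_DAY
    return (id_b - id_a) / elapsed_days


# Hypothetical observations: two ids seen 30 days apart.
rate = estimate_creation_rate(
    10_000, datetime(2024, 1, 1),
    13_000, datetime(2024, 1, 31),
)
print(rate)  # ids per day, inferred purely from two exposed ids
```

With random ids the same subtraction yields nothing meaningful.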

The second disadvantage is that your backend is no longer as scalable. You are tied to some slow, less scalable id generator that will always be a bottleneck in a large system.

Random ids

Random ids are when you just generate 128 random bits. v4 UUIDs are 122-bit random ids (e.g. 2bbf9d4e-c10c-4104-9fb1-671d6616b213). These are also practically unique.

Random ids get rid of both disadvantages of auto-increment ids: they leak no information, and any node can generate them without coordinating with any other, so id generation is never the bottleneck.
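A minimal sketch of generating such ids with Python's standard library, assuming you want either a raw 128-bit value or a v4 UUID. The helper names are my own; `secrets` and `uuid` are the standard modules.

```python
import secrets
import uuid


def random_id_int():
    """Return a 128-bit id as an integer, drawn from the OS's secure RNG."""
    return secrets.randbits(128)


def random_id_hex():
    """Return a 128-bit id as 32 lowercase hex characters."""
    return secrets.token_hex(16)  # 16 bytes = 128 bits


def random_uuid4():
    """Return a v4 UUID: 122 random bits plus 6 fixed version/variant bits."""
    return uuid.uuid4()


print(random_id_hex())
print(random_uuid4())
```

No node needs to talk to any other node to produce one of these.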

The disadvantage comes when storing ids in b-trees (à la databases) because they randomize the memory/disk pages that the tree accesses. This may be a source of slow-downs to your system.

To me this is still the ideal id scheme, and you should have a good reason to move off of it (e.g., profiler data).
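The b-tree effect described above can be illustrated with a deliberately crude model: treat a sorted list of existing keys as leaf pages of fixed size, and count how many distinct pages a batch of new keys would land on. This is not how any particular database lays out pages; `PAGE_SIZE`, the key counts, and the function name are all assumptions for the sake of the sketch.

```python
import bisect
import random

PAGE_SIZE = 100  # hypothetical number of keys per leaf page


def pages_touched(existing_sorted, new_keys):
    """Count distinct leaf pages that the new keys would be inserted into.

    Simplification: the existing keys are not updated as inserts happen.
    """
    return len({
        bisect.bisect_left(existing_sorted, key) // PAGE_SIZE
        for key in new_keys
    })


rng = random.Random(0)  # seeded only so the demo is repeatable
existing = sorted(rng.getrandbits(128) for _ in range(100_000))

random_batch = [rng.getrandbits(128) for _ in range(1_000)]
increasing_batch = [existing[-1] + i for i in range(1, 1_001)]

print(pages_touched(existing, random_batch))      # hundreds of pages
print(pages_touched(existing, increasing_batch))  # a single page at the end
```

In this model the random batch spreads over most of the tree while the increasing batch stays on one page, which is the locality that a timestamp prefix tries to recover.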

Flake ids

Flake ids are random ids, except that the high k bits are taken from the lower bits of a timestamp. For example, you may get the following three ids in a row, where the top bits are really close together.

2bbfb5baf5a211e78c3f9a214cf093ae
2bbf9d4ec10c41049fb1671d6616b213
2bc6bb66e5964fb59050fcf3beed51b1

While you may leak some information, it isn't much if your k and timestamp granularity are designed well.

But a badly designed flake id can be less than helpful. If the timestamp bits change too infrequently, the b-tree ends up ordering on the random bits below them, which negates the benefit. If they change too frequently, you thrash the database because your updates are scattered across pages again.

Note: By time granularity, I mean how frequently the low bits of the timestamp change. Depending on your data throughput, you probably want this to be hours, tens of minutes, or minutes. It's a balance.
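A minimal sketch of the scheme described above, assuming 128-bit ids. The function name, the `now` parameter (there only to make the sketch testable), and the default values of `k` and `granularity_s` are illustrative assumptions, not recommendations.

```python
import secrets
import time

ID_BITS = 128


def flake_id(k=16, granularity_s=60, now=None):
    """Return a 128-bit id: low k bits of a coarse timestamp, then random bits.

    k             -- number of high bits taken from the timestamp (0 = purely random)
    granularity_s -- seconds per timestamp tick, i.e. how often the prefix changes
    now           -- seconds since the epoch; defaults to the current time
    """
    if not 0 <= k <= ID_BITS:
        raise ValueError("k must be between 0 and 128")
    if now is None:
        now = time.time()
    ticks = int(now // granularity_s)      # timestamp at the chosen granularity
    prefix = ticks & ((1 << k) - 1)        # keep only its low k bits
    random_bits = ID_BITS - k
    suffix = secrets.randbits(random_bits) if random_bits else 0
    return (prefix << random_bits) | suffix


for _ in range(3):
    print(format(flake_id(), "032x"))  # ids made in the same minute share a prefix
```

Because the timestamp wraps after 2^k ticks, the prefix carries little information on its own, and setting k = 0 falls back to a purely random id.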

If you otherwise treat the ids as semantic-less (i.e. never infer anything from the top bits), you can change any of these parameters at any time without interruption, even going back to purely random ids, where k = 0.

Cryptographic ids

I'm assuming by this you mean ids have some semantic information encrypted in them. Maybe like hashids?

Disadvantages abound:

You'll have different-length ids for different data, unless you have a fixed-length protocol.
You'll be tempted to add more and more info to the ids.
The ids look random, so they have the same b-tree problem, but you can't mitigate it by adding flake-like timestamps to the front.
Ids become tied to the system that made them. You may start asking that system for decrypted versions of the id instead of just asking for the data it points to.
Your system burns time decrypting ids to extract data.
You add encryption problems:
- What happens if the secret key is leaked? (Better not put anything too sensitive in there, such as a customer name or, heaven forbid, a credit card number.)
- Coordinating key rotation.
- Small ids like hashids can be brute-forced.

As you can see, I am not a fan of semantic ids in general. There are a few places where I use them, though I call them tokens. These don't get stored as keys in a database (or likely not stored anywhere).

For example I use encryption for pagination tokens: encrypted {last-id / context} of a pagination API. I prefer this over having the client pass the last element of the prior page because we keep the database context hidden from the user. It's simpler for everyone, and the encryption is little more than obfuscation (no sensitive information).
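A rough sketch of the opaque pagination token idea, with one important substitution: Python's standard library has no symmetric cipher, so this version base64-encodes the payload and attaches an HMAC tag instead of encrypting it. That makes the token tamper-evident but not secret; to actually hide the database context as described above, you would encrypt the payload with an authenticated cipher instead. The key, field names, and function names are all hypothetical.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"hypothetical-server-side-secret"  # placeholder, not a real key
TAG_LEN = 32  # size of an HMAC-SHA256 digest in bytes


def encode_page_token(last_id, context):
    """Pack {last_id, context} into an opaque, tamper-evident string."""
    payload = json.dumps(
        {"last_id": last_id, "context": context}, sort_keys=True
    ).encode()
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + payload).decode()


def decode_page_token(token):
    """Verify the tag and return the original dict, or raise ValueError."""
    raw = base64.urlsafe_b64decode(token.encode())  # raises ValueError if malformed
    tag, payload = raw[:TAG_LEN], raw[TAG_LEN:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("invalid page token")
    return json.loads(payload)


token = encode_page_token("2bbf9d4ec10c41049fb1671d6616b213", "orders-by-date")
print(token)
print(decode_page_token(token))
```

The client just echoes the token back to get the next page and never has to know what is inside it.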

Tags:

identity

distributed-system

Justin Lin
