
What are the tradeoffs when generating unique sequence numbers in a distributed and concurrent environment?

I am curious about the constraints and tradeoffs of generating unique sequence numbers in a distributed and concurrent environment.

Imagine this: I have a system whose only job is to return a unique sequence number every time you ask it. Here is an ideal spec for such a system (constraints):

  • Stay up under high-load.
  • Allow as many concurrent connections as possible.
  • Distributed: spread load across multiple machines.
  • Performance: run as fast as possible and have as much throughput as possible.
  • Correctness: numbers generated must:
    1. not repeat.
    2. be unique per request (there must be a way to break ties if two requests arrive at exactly the same time).
    3. be in (increasing) sequential order.
    4. have no gaps between requests: 1,2,3,4... (effectively a counter for total # requests)
  • Fault tolerant: if one, several, or all machines go down, the system can resume from the state it was in before the failure.
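
Taken together, correctness constraints 1 to 4 describe a single counter that every request passes through. Here is a minimal single-process sketch of that serialization point. The class name `GaplessCounter` and the helper `hammer` are hypothetical, and the sketch makes no attempt at distribution or persistence, both of which would need replication or consensus on top of this.

```python
import threading


class GaplessCounter:
    """Single-process counter: each call returns the next integer, no gaps."""

    def __init__(self, start=0):
        self._value = start
        self._lock = threading.Lock()

    def next(self):
        # The lock is the serialization point: only one caller at a time
        # may advance the counter, which rules out duplicates and gaps.
        with self._lock:
            self._value += 1
            return self._value


def hammer(counter, n, out):
    """Request n numbers and record them (list.append is atomic in CPython)."""
    for _ in range(n):
        out.append(counter.next())


counter = GaplessCounter()
results = []
threads = [
    threading.Thread(target=hammer, args=(counter, 1000, results))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every request contends for the same lock, so throughput is bounded by one machine. That is the tension between the correctness constraints and the "distributed" and "performance" constraints.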

Obviously, this is an idealized spec and not all constraints can be satisfied fully (see the CAP theorem). However, I would love to hear your analysis of various relaxations of the constraints: what types of problems are we left with, and what algorithms would we use to solve them? For example, if we drop the counter (no-gaps) constraint, the problem becomes much easier: since gaps are allowed, we can just partition the numeric range and map the partitions onto different machines.
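
The range-partitioning idea can be sketched as follows. This is only an illustration under assumptions: `RangeAllocator` stands in for whatever coordinator hands out blocks, `Node` stands in for one machine, and both names are hypothetical. Each node touches the coordinator only when its block runs out, so most requests need no coordination at all.

```python
class RangeAllocator:
    """Hands out disjoint blocks of IDs; stands in for a shared coordinator."""

    def __init__(self, block_size):
        self.block_size = block_size
        self._next_start = 1

    def lease(self):
        # Return a half-open block [start, end) that no other node will get.
        start = self._next_start
        self._next_start += self.block_size
        return start, start + self.block_size


class Node:
    """One machine serving IDs from its currently leased block."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._cur, self._end = allocator.lease()

    def next_id(self):
        if self._cur >= self._end:
            # Block exhausted: lease a fresh one from the coordinator.
            self._cur, self._end = self._allocator.lease()
        value = self._cur
        self._cur += 1
        return value


allocator = RangeAllocator(block_size=3)
a = Node(allocator)  # leases [1, 4)
b = Node(allocator)  # leases [4, 7)
a_ids = [a.next_id() for _ in range(4)]  # exhausts [1, 4), then leases [7, 10)
b_ids = [b.next_id() for _ in range(2)]
```

The IDs never repeat, but they are ordered only within a node, and any unused part of a block (here, 6 if node `b` stops) becomes a gap.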

Any references (papers, books, code) are welcome. I'd also like to keep a list of existing software (open source or not).


Software:

  • Snowflake: a network service for generating unique ID numbers at high scale with some simple guarantees.
  • keyspace: a publicly accessible, unique 128-bit ID generator whose IDs can be used for any purpose.
  • RFC 4122 implementations exist in many languages. The RFC is probably a really good base: it removes the need for any inter-system coordination, the UUIDs are 128-bit, and IDs from software implementing certain versions of the spec include a timestamp portion that makes sorting possible.
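
To show how a time-based ID can stay unique without coordination, here is a Snowflake-style sketch. It is not Snowflake's actual code. The bit layout (timestamp, then 10 worker bits, then 12 sequence bits) and the custom epoch are assumptions chosen for illustration, and `SnowflakeLikeGenerator` is a hypothetical name.

```python
import threading
import time

EPOCH_MS = 1_262_304_000_000  # 2010-01-01 UTC in ms; an arbitrary assumption
WORKER_BITS = 10              # assumed layout: up to 1024 workers
SEQUENCE_BITS = 12            # assumed layout: up to 4096 IDs per ms per worker
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


class SnowflakeLikeGenerator:
    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock = clock
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # A real service needs a policy here; this sketch just refuses.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & SEQUENCE_MASK
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: spin to the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self._worker_id << SEQUENCE_BITS)
                | self._sequence
            )


gen = SnowflakeLikeGenerator(worker_id=1)
ids = [gen.next_id() for _ in range(10_000)]
```

IDs from one worker are strictly increasing. Across workers they are only roughly time-ordered, to the extent that the machines' clocks agree, and they are never gapless.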
newtonapple asked Jul 08 '10 07:07



People also ask

How does ZooKeeper generate monotonically increasing numbers?

ZooKeeper provides two modes for creating a sequential znode: PERSISTENT_SEQUENTIAL and EPHEMERAL_SEQUENTIAL. In EPHEMERAL_SEQUENTIAL mode, the node exists only for the lifetime of the client session that created it; once the client disconnects, the node is deleted. In both modes, a monotonically increasing number is appended to the node name.


1 Answer

If you must be sequential (per machine) but can drop the gap/counter requirements, look for an implementation of the Version 1 UUID as specified in RFC 4122.
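
As a rough illustration in Python (chosen here only for brevity; the answer itself is language-neutral), the standard `uuid` module can generate Version 1 UUIDs and expose their embedded timestamp. The variable names below are made up for the example.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps count 100 ns intervals since 1582-10-15 (RFC 4122).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

u = uuid.uuid1()
created = GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)
now = datetime.now(timezone.utc)

# Sort on the embedded timestamp, not on the string form: the textual layout
# puts the low-order time bits first, so lexical order is not time order.
batch = [uuid.uuid1() for _ in range(5)]
by_time = sorted(batch, key=lambda x: x.time)
```

The ordering is only as good as each machine's clock, so it is best treated as sequential per machine rather than globally.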

If you're working in .NET and can eliminate the sequential and gap/counter requirements, just use System.Guid. These GUIDs implement RFC 4122 Version 4 and are already unique (with a very low collision probability) across machines and requests. This could easily be exposed as a web service or just used locally.
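
To put a number on "very low collision probability", here is a back-of-the-envelope sketch using the birthday approximation. It assumes the 122 random bits of a Version 4 UUID are uniformly random, and the function name `collision_probability` is hypothetical. Python's `uuid.uuid4()` is used in place of the .NET API only to keep the example self-contained.

```python
import uuid

RANDOM_BITS = 122  # a Version 4 UUID fixes 6 of its 128 bits (version, variant)


def collision_probability(n):
    """Birthday approximation: P(any collision among n IDs) ~ n^2 / (2 * 2^bits)."""
    return n * n / (2 * 2**RANDOM_BITS)


sample = uuid.uuid4()
p_billion = collision_probability(10**9)  # chance of any clash among 1e9 IDs
```

Under that assumption, even a billion IDs collide with a probability far below one in a quintillion.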

JamieSee answered Oct 31 '22 03:10
