Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to partition UUID space into N equal-size partitions?

Take a UUID in its hex representation: '123e4567-e89b-12d3-a456-426655440000'

I have a lot of such UUIDs, and I want to separate them into N buckets, where N is of my choosing, and I want to generate the bounds of these buckets.

I can trivially create 16 buckets with these bounds:

00000000-0000-0000-0000-000000000000
10000000-0000-0000-0000-000000000000
20000000-0000-0000-0000-000000000000
30000000-0000-0000-0000-000000000000
...
e0000000-0000-0000-0000-000000000000
f0000000-0000-0000-0000-000000000000
ffffffff-ffff-ffff-ffff-ffffffffffff

just by iterating over the options for the first hex digit.

Suppose I want 50 equal size buckets(equal in terms of number of UUID possibilities contained within each bucket), or 2000 buckets, or N buckets.

How do I generate such bounds as a function of N?

like image 804
andy Avatar asked Jan 21 '26 23:01

andy


2 Answers

Your UUIDs above are 32 hex digits in length. So that means you have 16^32 ≈ 3.4e38 possible UUIDs. A simple solution would be to use a big int library (or a method of your own) to store these very large values as actual numbers. Then, you can just divide the number of possible UUIDs by N (call that value k), giving you bucket bounds of 0, k, 2*k, ... (N-1)*k, UMAX.

This runs into a problem if N doesn't divide the number of possible UUIDs. Obviously, not every bucket will have the same number of UUIDs, but in this case, they won't even be evenly distributed. For example, if the number of possible UUIDs is 32, and you want 7 buckets, then k would be 4, so you would have buckets of size 4, 4, 4, 4, 4, 4, and 8. This probably isn't ideal. To fix this, you could instead make the bucket bounds at 0, (1*UMAX)/N, (2*UMAX)/N, ... ((N-1)*UMAX)/N, UMAX. Then, in the inconvenient case above, you would end up with bounds at 0, 4, 9, 13, 18, 22, 27, 32 -- giving bucket sizes of 4, 5, 4, 5, 4, 5, 5.

You will probably need a big int library or some other method to store large integers in order to use this method. For comparison, a long long in C++ (in some implementations) can only store up to 2^64 ≈ 1.8e19.

like image 179
Thomas Cohn Avatar answered Jan 23 '26 20:01

Thomas Cohn


If N is a power of 2, then the solution is obvious: you can split on bit boundaries as for 16 buckets in your question.

If N is not a power of 2, the buckets mathematically cannot be of exactly equal size, so the question becomes how unequal are you willing to tolerate in the name of efficiency.

As long as N<2^24 or so, the simplest thing to do is just allocate UUIDs based on the first 32 bits into N buckets each of size 2^32/N. That should be fast enough and equal enough for most applications, and if N needs to be larger than that allows, you could easily double the bits with a smallish penalty.

like image 26
StephenS Avatar answered Jan 23 '26 21:01

StephenS