I've been browsing the net trying to find a solution that will allow us to generate unique IDs in a regionally distributed environment.
I looked at the following options (among others):
SNOWFLAKE (by Twitter)
UUID
AUTOINCREMENT IN RELATIONAL DATABASE LIKE MYSQL
AUTOINCREMENT IN A NON-RELATIONAL DATABASE LIKE COUCHBASE
Lets say that we have clusters consisting of 10 Couchbase Nodes and 10 Application nodes in 5 different regions (Africa, Europe, Asia, America and Oceania). This is to ensure that content is served from a location closest to the user (to boost speed) and to ensure redundancy in case of disasters etc.
Now, the task is to generate IDs that wont collide when the replication (and balancing) occurs and I think this can be achieved in 3 steps:
Step 1
All regions will be assigned integer IDs (unique identifiers):
Step 2
Assign an ID to every Application node that is added to the cluster keeping in mind that there may be up to 99 999 servers in one cluster (even though I doubt: just as a safely precaution). This will look something like this (fake IPs):
Please note that all of these are in the same cluster, so that means you can have node 00001 per region.
Step 3
For every record inserted into the database, an incremented ID will be used to identify it, and this is how it will work:
Couchbase offers an increment feature that we can use to create IDs internally within the cluster. To ensure redundancy, 3 replicas will be created within the cluster. Since these are in the same place, I think it should be safe to assume that unless the whole cluster is down, one of the nodes responsible for this will be available, otherwise a number of replicas can be increased.
Bringing it all together
Say a user is signing up from Europe: The application node serving the request will grab the region code (4 in this case), get its own ID (say 00005) and then get an incremented ID (1) from Couchbase (from the same cluster).
We end up with 3 components: 4, 00005,1
. Now, to create an ID from this, we can just join these components into 4.00005.1
. To make it even better (I'm not too sure about this), we can concatenate (not add them up) the components to end up with: 4000051
.
In code, this will look something like this:
$id = '4'.'00005'.'1';
NB: Not $id = 4+00005+1;
.
Pros
Cons
I know that every solution has flaws, and possibly more that what we see on the surface. Can you spot any issues with this whole approach?
Thank you in advance for your help :-)
EDIT
As @DaveRandom suggested, we can add the 4th step:
Step 4
We can just generate a random number and append it to the ID to prevent predictability. Effectively, you end up with something like this:
4000051357
instead of just 4000051
.
I think this looks pretty solid. Each region maintains consistency, and if you use XDCR there are no collisions. INCR is atomic within a cluster, so you will have no issues there. You don't actually need to have the Machine code part of it. If all the app servers within a region are connected to the same cluster, it's irrelevant to infix the 00001 part of it. If that is useful for you for other reasons (some sort of analytics) then by all means, but it isn't necessary.
So it can simply be '4' . 1' (using your example)
Can you give me an example of what kind of "sorting" you need?
First: One downside of adding entropy (and I am not sure why you would need it), is you cannot iterate over the ID collection as easily.
For Example: If you ID's from 1-100, which you will know from a simple GET query on the Counter key, you could assign tasks by group, this task takes 1-10, the next 11-20 and so on, and workers can execute in parallel. If you add entropy, you will need to use a Map/Reduce View to pull the collections down, so you are losing the benefit of a key-value pattern.
Second: Since you are concerned with readability, it can be valuable to add a document/object type identifier as well, and this can be used in Map/Reduce Views (or you can use a json key to identify that).
Ex: 'u:' . '4' . '1'
If you are referring to ID's externally, you might want to obscure in other ways. If you need an example, let me know and I can append my answer with something you could do.
@scalabl3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With