
Shortening/Rehashing UUIDs

First of all, I'm aware that rehashing is a sensitive topic. Still, I'd like to hear your opinions on which approach you would take here.

I'm building a distributed application, where nodes remotely create entities identified by a UUID. Eventually, all entities should be gathered at a dedicated drain node, which stores all entities by using these UUIDs.

Now I want to create additional identifiers that are handier for human users. Base64-encoding the UUIDs would still produce IDs of 22 characters, which is too long for human use. So I need something like what URL-shortening services do. Bijective functions will not help, because they preserve all 128 bits of information. Of course, I'm aware that I have to lose information in order to shorten the ID, and that any reduction in the number of bits of a hash increases the probability of collisions. Where I'm stuck is choosing the most appropriate way to discard information in order to create shorter IDs for humans.

Here are some prerequisites: I will provide the ability to map {UUID, shortened ID} via my data storage. I'd still prefer a non-centralized solution. I will probably never need more than about a million IDs (~2^20) in total.
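As a rough sanity check on how short an ID can get for about 2^20 entities, the birthday approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) can be sketched as below. The function name is made up for illustration, and the estimate assumes the shortened IDs behave like uniformly random b-bit values.

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate chance that at least two of n_ids random `bits`-bit IDs collide.

    Assumption: IDs are uniformly distributed over the 2**bits possible values.
    """
    pairs = n_ids * (n_ids - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-pairs / 2**bits)


if __name__ == "__main__":
    for bits in (32, 40, 48, 64):
        p = collision_probability(2**20, bits)
        print(f"{bits} bits: p = {p:.3g}")
```

Under that assumption, 40 bits gives a collision chance of roughly 39% at 2^20 IDs, while 64 bits brings it down to the order of 10^-8, so any shortening scheme has to be weighed against the stored {UUID, shortened ID} mapping catching duplicates.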

Here are the thoughts I came up with so far:

  • Auto-incremented IDs: If I used some kind of auto-incremented ID, I could convert it to an obfuscated string and pass that around. This would be the easiest approach, and as long as there are few keys around, the keys would not be very long. However, I'd have to introduce a centralized entity, which I don't really want.
  • Shorten the UUID: I could just take some of the bits of the original 128-bit UUID. In that case I should at least take the version of the UUID into account. Or is there anything else wrong with this?
  • Rehashing the UUID: I could apply a second hashing algorithm to my initial UUID and store the mapping.

Are there any other approaches? Which one is preferable?

Thanks in advance!

b_erb asked Feb 12 '10 17:02



1 Answer

1) To shorten the UUID, you can simply XOR the top half with the bottom half (and repeat until it's short enough for you). This preserves the distribution characteristics. Like any solution that shortens the output, it increases the probability of collision, due to the birthday paradox.

2) XOR amounts to a trivial hash, but since no additional mixing is needed, it's fine. You could use a CRC or noncryptographic hash on your UUID, but I don't believe it's any improvement.
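A minimal sketch of the XOR folding from points 1 and 2, using Python's standard `uuid` module. The helper names `xor_fold` and `short_id` are hypothetical, and the base32 rendering is just one possible way to present the result to humans, not something the approach requires.

```python
import base64
import uuid


def xor_fold(u: uuid.UUID, bits: int = 64) -> int:
    """Fold a 128-bit UUID down to `bits` bits by repeatedly XORing halves."""
    if bits not in (64, 32, 16, 8):
        raise ValueError("bits must be 64, 32, 16 or 8")
    value, width = u.int, 128
    while width > bits:
        width //= 2
        # XOR the top half with the bottom half of the current value.
        value = (value >> width) ^ (value & ((1 << width) - 1))
    return value


def short_id(u: uuid.UUID, bits: int = 64) -> str:
    """Render the folded value as lowercase base32 without padding (illustrative)."""
    raw = xor_fold(u, bits).to_bytes(bits // 8, "big")
    return base64.b32encode(raw).decode("ascii").rstrip("=").lower()


if __name__ == "__main__":
    u = uuid.uuid4()
    print(u, "->", short_id(u, 64), "or", short_id(u, 32))
```

A 64-bit fold renders as 13 base32 characters and a 32-bit fold as 7. With around 2^20 IDs, a 32-bit fold is very likely to collide somewhere, so the stored mapping would still need to detect and resolve duplicates.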

3) If you're willing to accept some central management, it doesn't have to be painful. A central authority can dole out medium-sized blocks of address space to each client, and the client can then iterate through that subrange when assigning IDs. This guarantees that there are no collisions, but also avoids a round-trip for each ID. One way to do it would be to use a 32-bit integer for the ID, doling out a 16-bit block at a time. In other words, the first client gets handed the block 0x0001, which covers the IDs 0x00010000 to 0x0001FFFF.
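The block scheme in point 3 could look roughly like this. `BlockAuthority` and `Client` are hypothetical names, and the authority is an in-process stand-in for what would really be a network call to the central node; the sketch is not thread-safe and does not persist leases.

```python
class BlockAuthority:
    """Hypothetical central authority handing out 16-bit block prefixes."""

    def __init__(self) -> None:
        self._next_block = 1  # the first caller receives block 0x0001

    def lease_block(self) -> int:
        if self._next_block > 0xFFFF:
            raise RuntimeError("32-bit ID space exhausted")
        block = self._next_block
        self._next_block += 1
        return block


class Client:
    """Assigns IDs locally and contacts the authority once per 65536 IDs."""

    def __init__(self, authority: BlockAuthority) -> None:
        self._authority = authority
        self._block = None  # no block leased yet
        self._offset = 0

    def next_id(self) -> int:
        if self._block is None or self._offset > 0xFFFF:
            self._block = self._authority.lease_block()
            self._offset = 0
        # High 16 bits: leased block. Low 16 bits: local counter.
        new_id = (self._block << 16) | self._offset
        self._offset += 1
        return new_id
```

A client only talks to the authority when it starts and when its current block runs out, so the round-trip cost is amortized over 65536 IDs.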

4) You could insert into the database with a UUID, but also have an identity field. This would provide an alternate, more compact unique ID, which can be limited to a 32-bit int.
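Point 4 can be sketched with Python's built-in `sqlite3` module at the drain node. The table and function names are invented for illustration, and SQLite is only an assumption standing in for whatever data storage is actually used. Note that SQLite integers are 64-bit, so a 32-bit limit would have to be enforced separately.

```python
import sqlite3
import uuid


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create a table with the UUID plus a compact auto-assigned identity column."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entity ("
        " short_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " uuid TEXT NOT NULL UNIQUE)"
    )
    return conn


def store(conn: sqlite3.Connection, u: uuid.UUID) -> int:
    """Insert a UUID and return the compact ID the database assigned to it."""
    cur = conn.execute("INSERT INTO entity (uuid) VALUES (?)", (str(u),))
    conn.commit()
    return cur.lastrowid


def lookup(conn: sqlite3.Connection, short_id: int):
    """Map a compact ID back to its UUID, or None if it is unknown."""
    row = conn.execute(
        "SELECT uuid FROM entity WHERE short_id = ?", (short_id,)
    ).fetchone()
    return uuid.UUID(row[0]) if row else None
```

The compact ID only exists once the entity has reached the database, so remote nodes would keep using the UUID until then.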

Steven Sudit answered Sep 21 '22 04:09