
Is there anything wrong with using base62 (Alphanumeric) UUIDs?

Tags: uuid, base62

Standard UUIDs are long, and you can't select the whole thing by double clicking.

e.g. 123e4567-e89b-12d3-a456-426655440000

I like shorter IDs.

I like being able to double click an ID to select it.

My question is: are there any issues with encoding a standard UUID into a base62 alphanumeric string of about 22 characters?

e.g. 71jbvv7LfRKYp19gtRLtkn

EDIT: Added Context
We need the IDs for general data storage in NoSQL services such as DynamoDB. Collisions must not happen, but my understanding is that the collision risk with UUIDs is negligible. Standard UUIDs would suit our needs, so what I'm asking is: is there any difference, extra risk, or unforeseen issue with encoding in base62 that doesn't exist with standard UUIDs?

Thanks.

JeremyTM asked Mar 07 '17 21:03


3 Answers

I think it's a good idea and I'm strongly considering it myself for my current project.

But only for external representation, not for internal storage.

Indeed, UUIDs are fundamentally just 128-bit integers, or equivalently arrays of 16 bytes.
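As a rough illustration of that point, here is a small Python sketch (Python is my choice here, it is not mentioned in the question) that views the example UUID from the question as an integer and as raw bytes, and then works out how many base62 digits a 128-bit value can need. The variable names are made up for the example.

```python
import uuid

# The example UUID from the question.
u = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")

# The same value as an integer and as raw big-endian bytes.
as_int = u.int
as_bytes = u.bytes
print(as_int.bit_length() <= 128)                  # True
print(len(as_bytes))                               # 16
print(int.from_bytes(as_bytes, "big") == as_int)   # True

# Smallest number of base62 digits that can represent every 128-bit value.
digits = 1
while 62 ** digits < 2 ** 128:
    digits += 1
print(digits)  # 22
```

Since 62^21 is smaller than 2^128 and 62^22 is larger, 22 characters are always enough, which matches the "22(ish)" figure in the question.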

For efficient DB storage, they should be stored in their binary form (e.g. a BINARY(16) column in MySQL). This saves space (16 bytes vs. 36 bytes for the usual text representation, or 22 bytes for Base62) and is faster to query and index, because string comparisons depend on collation rules while binary values are compared byte by byte.
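To make the binary-storage idea concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module with a BLOB column as a stand-in for the MySQL BINARY(16) column mentioned above. That substitution, and the table and column names, are assumptions for the sake of a self-contained example.

```python
import sqlite3
import uuid

# In-memory SQLite database; a BLOB column stands in for MySQL's BINARY(16).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id BLOB PRIMARY KEY, name TEXT)")

item_id = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")

# Store the 16 raw bytes, not the 36-character text form.
conn.execute(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    (item_id.bytes, "example"),
)

# Look the row up by its binary key and rebuild the UUID object.
row = conn.execute(
    "SELECT id, name FROM items WHERE id = ?",
    (item_id.bytes,),
).fetchone()
stored_id = uuid.UUID(bytes=row[0])
print(len(row[0]), stored_id)
```

The text form (canonical, Base62, or anything else) is then only produced at the edges of the application, when the ID is shown to a person or sent over a text-based interface.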

The canonical representation is a hexadecimal encoding with the 8-4-4-4-12 grouping, based on the semantic meaning of each group of bytes (a meaning we don't care about in most cases).

But it is just a convention, and not human-friendly at all. So I think a different encoding such as Base62 is totally acceptable where human interaction happens (e.g. in URLs), or for interfaces and storage systems that are text-based anyway (HTTP APIs, for example, or file storage in CSV/JSON/XML...).
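For readers who want to see what such an encoding involves, here is a minimal Python sketch of a UUID to Base62 round trip. The alphabet order (digits, then uppercase, then lowercase) and the function names are assumptions. Libraries differ on the alphabet, so the strings produced here will not necessarily match those of another implementation, including the example string in the question.

```python
import uuid

# Assumed alphabet order: 0-9, A-Z, a-z. Other libraries may order it differently.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def uuid_to_base62(u: uuid.UUID, width: int = 22) -> str:
    """Encode the UUID's 128-bit integer value as a fixed-width Base62 string."""
    n = u.int
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    # Left-pad with the zero digit so every ID has the same length.
    return "".join(reversed(chars)).rjust(width, ALPHABET[0])


def base62_to_uuid(s: str) -> uuid.UUID:
    """Decode a Base62 string back into a UUID.

    Raises ValueError on characters outside the alphabet or on values
    that do not fit in 128 bits.
    """
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return uuid.UUID(int=n)


original = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")
encoded = uuid_to_base62(original)
decoded = base62_to_uuid(encoded)
print(len(encoded), decoded == original)  # 22 True
```

Because the encoding is a lossless change of representation, it carries the same bits as the original UUID, so it adds no collision risk of its own. The one thing to keep in mind is that Base62 strings are case-sensitive, so they should not pass through anything that folds case.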

Internally, your application should use them in binary form. I don't know about PHP, but Java, for example, has the java.util.UUID class.

For Java there's also a really nice library that makes conversion between raw UUID and Base62 text representation very easy:

https://github.com/Devskiller/friendly-id

More about UUIDs:

  • Wikipedia article

  • UUID or GUID as Primary Keys? Be Careful!

Pierre Henry answered Nov 18 '22 02:11


Base62 is not as standardized as base-64, but base-64 adds two extra symbols, which may prevent selecting the whole ID by double clicking.

How about just removing the dashes (-)? That would make it shorter than the original (32 characters instead of 36), and it could be selected with a double click.
Example:
123e4567e89b12d3a456426655440000
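In Python, for instance, the dash-free form is available directly from the standard uuid module, and the parser accepts it back without any extra work. A short sketch:

```python
import uuid

u = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")

# .hex is the 32-character hexadecimal form without dashes.
compact = u.hex
print(compact)  # 123e4567e89b12d3a456426655440000

# uuid.UUID() accepts the dash-free form as input, so the round trip is lossless.
print(uuid.UUID(compact) == u)  # True
```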

Update:
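There are two common alphabets for base-64: the standard one, [a-zA-Z0-9+/], and the URL-safe one, [a-zA-Z0-9_-]. The latter avoids + and /, although its hyphen can still break double-click selection in some applications, just as the dashes in a canonical UUID do.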
On the other hand, I think base-62 is more widely used than I originally thought. Here is a nice blog on the topic of using base-62: http://blog.birdhouse.org/2010/10/24/base62-urls-django/
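For comparison, here is a small Python sketch of the URL-safe base-64 option, using only the standard library. The function names are made up for the example. Stripping the "=" padding is a common convention and is assumed here.

```python
import base64
import uuid


def uuid_to_b64url(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes with the URL-safe alphabet and drop '=' padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def b64url_to_uuid(s: str) -> uuid.UUID:
    """Restore the padding and decode back to a UUID."""
    padded = s + "=" * (-len(s) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(padded))


original = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")
encoded = uuid_to_b64url(original)
print(len(encoded), b64url_to_uuid(encoded) == original)  # 22 True
```

The result is also 22 characters, the same length as Base62, but it may contain "-" and "_", so it is not guaranteed to be purely alphanumeric.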

dabest1 answered Nov 18 '22 01:11


This approach is often called Url62, and some projects follow this convention: they convert a plain UUID to Base62 format.

If you are developing in Java, take a look at the FriendlyId project: https://github.com/Devskiller/friendly-id

More to read about this topic: https://medium.com/@huntie/representing-a-uuid-as-a-base-62-hash-id-for-short-pretty-urls-c30e66bf35f9

MariuszS answered Nov 18 '22 01:11