Standard UUIDs are long, and you can't select the whole thing by double clicking.
e.g. 123e4567-e89b-12d3-a456-426655440000
I like shorter IDs.
I like being able to double click an ID to select it.
My question is: are there any issues with encoding a standard UUID into a 22(ish) character long base62 alphanumeric string?
e.g. 71jbvv7LfRKYp19gtRLtkn
EDIT: Added Context
Our needs are for general data storage in NoSQL data storage services such as DynamoDB. Collision should not happen, but my understanding is that collision risk with UUIDs is negligible. Standard UUIDs would suit our needs, so what I'm asking is... is there any difference, or extra risk or unforeseen issues with encoding in base62 that doesn't exist with standard UUIDs?
Thanks.
I think it's a good idea and I'm strongly considering it myself for my current project.
But only for external representation, not for internal storage.
Indeed, UUIDs are fundamentally just 128-bit integers, or equivalently arrays of 16 bytes.
For efficient DB storage, they should be stored in their binary form (e.g. a BINARY(16) column in MySQL). It will save space (16 bytes vs 36 bytes for the usual text representation, or 22 bytes for Base62), and perform faster when querying or indexing (strings don’t sort as fast as numbers because they rely on collation rules).
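To make the size comparison concrete, here is a small sketch in Python (my choice of language for illustration, the thread itself doesn't use it). It only uses the standard uuid and math modules; the variable names are mine.

```python
import math
import uuid

u = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")

raw = u.bytes        # 16 raw bytes, what a BINARY(16) column would hold
canonical = str(u)   # 36-character hex form with dashes
as_int = u.int       # the same value viewed as a 128-bit integer

# Smallest number of base62 digits that can represent any 128-bit value.
base62_len = math.ceil(128 / math.log2(62))

print(len(raw), len(canonical), base62_len)  # 16 36 22
```

So 22 characters is the minimum fixed width for Base62, since 21 characters cover only about 125 bits.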
The canonical representation is a hexadecimal encoding with the 8-4-4-4-12 grouping, based on the semantic meaning of each group of bytes (a meaning we don't care about in most cases).
But it is just a convention, and not human-friendly at all. So I think a different encoding such as Base62 is totally acceptable, to be exposed where human interaction happens (e.g. in URLs), or for interfaces or storage systems that are text-based anyway (HTTP APIs for example, or file storage in CSV/JSON/XML...).
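As a rough sketch of what such an external representation could look like in Python: the functions below treat the UUID as a 128-bit integer and write it in base 62. The names uuid_to_base62 and base62_to_uuid and the alphabet order (digits, then uppercase, then lowercase) are my own assumptions. There is no single standard Base62 alphabet, so the output will not necessarily match other tools or the example string in the question.

```python
import uuid

# Assumed alphabet order; other implementations may order it differently.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def uuid_to_base62(u: uuid.UUID, width: int = 22) -> str:
    """Encode the UUID's 128-bit integer value as a zero-padded base62 string."""
    n = u.int
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    # Digits were collected least-significant first, so reverse them,
    # then left-pad so every ID has the same length.
    return "".join(reversed(digits)).rjust(width, ALPHABET[0])


def base62_to_uuid(s: str) -> uuid.UUID:
    """Decode a base62 string back into a UUID (ValueError on bad input)."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return uuid.UUID(int=n)


original = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")
short_id = uuid_to_base62(original)
print(short_id, base62_to_uuid(short_id) == original)
```

Because the conversion is reversible, the short form carries exactly the same 128 bits as the original, so it adds no collision risk of its own. One thing to keep in mind is that the result is case-sensitive, unlike the hex form.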
Internally your application should use them in binary form. I don't know about PHP, but Java, for example, has the java.util.UUID class.
For Java there's also a really nice library that makes conversion between raw UUID and Base62 text representation very easy:
https://github.com/Devskiller/friendly-id
More about UUIDs:
Wikipedia article
UUID or GUID as Primary Keys? Be Careful!
Base62 is not as standard as base-64, but then base-64 would have two extra symbols which may not allow selecting the whole thing by double clicking.
How about just removing the dashes (-)? That would make it shorter than original and it would be easily selectable by double clicking a mouse.
Example: 123e4567e89b12d3a456426655440000
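If it helps, in Python this needs nothing beyond the standard uuid module (Python is just my pick for illustration):

```python
import uuid

u = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")

compact = u.hex                # 32 hex characters, no dashes
restored = uuid.UUID(compact)  # the constructor also accepts the dashless form

print(compact)  # 123e4567e89b12d3a456426655440000
```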
Update:
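There are two common alphabets for base-64: [a-zA-Z0-9/+] and the URL-safe [a-zA-Z0-9_-]. The latter avoids '/' and '+', and '_' is usually selected as part of a word, but '-' can still break double-click selection, just as the dashes in a standard UUID do.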
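For comparison, here is a minimal Python sketch of the second (URL-safe) alphabet applied to the 16 raw bytes of a UUID. The helper names uuid_to_b64url and b64url_to_uuid are made up for this example; the base64 calls are from the standard library.

```python
import base64
import uuid


def uuid_to_b64url(u: uuid.UUID) -> str:
    """Encode the 16 raw bytes with the URL-safe alphabet and drop the '==' padding."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def b64url_to_uuid(s: str) -> uuid.UUID:
    """Restore the padding that was stripped and decode back to a UUID."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(s + "=="))


original = uuid.UUID("123e4567-e89b-12d3-a456-426655440000")
token = uuid_to_b64url(original)
print(token, len(token), b64url_to_uuid(token) == original)
```

The result is 22 characters drawn from [a-zA-Z0-9_-], so it is the same length as Base62 but any given ID may contain '-' or '_'.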
On the other hand, I think base-62 is more widely used than I originally thought. Here is a nice blog on the topic of using base-62: http://blog.birdhouse.org/2010/10/24/base62-urls-django/
The solution to your problem is often called Url62, and some projects already use this convention: they convert a plain UUID to Base62 format.
If you are developing in Java, take a look at the FriendlyId project: https://github.com/Devskiller/friendly-id
More to read about this topic: https://medium.com/@huntie/representing-a-uuid-as-a-base-62-hash-id-for-short-pretty-urls-c30e66bf35f9