I have a bunch of items in my database, each assigned a unique ID. I want to shorten this ID and display it on the page, so that if a user needs to contact us (over the phone) about a particular item, they can give us the shortened ID rather than a really big number, similar to the SKU on sites like NCIX. I was thinking about encoding it in base 36. The problem with that, however, is that characters like 1, l and I all look kind of the same, so I was thinking about eliminating the look-alikes. Is this a good idea, or should I just use a really legible font?
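Yes, you should eliminate sources of confusion, because if a mistake can be made, someone will make it. It is very easy to confuse 0 with O, and I with l or 1, so you should not use both members of any such pair. That's easy: since you won't use three characters (I, L and O), write the number in base 36 - 3 = 33, whose digits are 0-9 and A-W (the convention int(SKU, 33) expects), and then swap the confusing letters for the unused X, Y and Z: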
SKU.replace('I','X').replace('L','Y').replace('O','Z')
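Conversely, when you are given such a code, you have to turn X, Y and Z back into the confusing characters before calling int(SKU, 33). Before that, though, if you are given L or I by mistake (as is to be expected), replace it with 1, and if you are given O, replace it with 0. For example, use SKU.translate() with the table below (string.maketrans in Python 2). Note that the second string ends with the letter O, not a zero, so that Z maps back to O:
str.maketrans('LIOXYZ','110ILO')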
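Putting the two directions together, here is a minimal Python 3 sketch of the scheme described above. The function names (to_base33, encode_sku, decode_sku) are made up for illustration, and the sketch assumes the ID is a non-negative integer and that codes are handled in upper case.

```python
import string

# Digits for base 33 in the same order int(text, 33) expects: 0-9 then A-W.
BASE33_DIGITS = (string.digits + string.ascii_uppercase)[:33]

# Typos first (L, I -> 1 and O -> 0), then the substitutes X, Y, Z
# back to the real base-33 digits I, L, O.
DECODE_TABLE = str.maketrans('LIOXYZ', '110ILO')


def to_base33(number):
    """Render a non-negative integer using the digits 0-9 and A-W."""
    if number < 0:
        raise ValueError("only non-negative IDs are supported")
    if number == 0:
        return '0'
    digits = []
    while number:
        number, remainder = divmod(number, 33)
        digits.append(BASE33_DIGITS[remainder])
    return ''.join(reversed(digits))


def encode_sku(item_id):
    """Base-33 code with the look-alike letters I, L, O swapped for X, Y, Z."""
    return to_base33(item_id).replace('I', 'X').replace('L', 'Y').replace('O', 'Z')


def decode_sku(sku):
    """Accept a code as typed or read out by a user and return the numeric ID."""
    cleaned = sku.strip().upper().translate(DECODE_TABLE)
    return int(cleaned, 33)
```

With this, decode_sku(encode_sku(n)) gives back n, and a customer who reads a 1 as "L" or a 0 as "O" still ends up at the same item.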
I'm assuming the original ID is numeric. We've had good results from z-base-32 with a similar scenario. We've been using it since April 2009.
I particularly liked the encoding's goals of minimizing transcription errors, through removing confusing letters from the alphabet, and brevity, as shorter identifiers are easier to use.
The encoding orders the alphabet so that the more commonly occurring characters are those that are easier to read, write, speak and remember. Lower case is used as it's easier to read.
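As a rough illustration of how that alphabet could be applied to a numeric ID, here is a small Python 3 sketch. It is not our production code and not the full z-base-32 algorithm, which is defined over bit strings; it simply writes an integer in base 32 using the z-base-32 character order. The alphabet string is quoted from memory of the z-base-32 description, so check it against the spec before relying on it, and the function names are invented for this example.

```python
# z-base-32 character order (assumed from the spec; it leaves out 0, l, v and 2).
ZB32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"


def zb32_encode_int(number):
    """Write a non-negative integer in base 32 using the z-base-32 characters."""
    if number < 0:
        raise ValueError("only non-negative IDs are supported")
    if number == 0:
        return ZB32_ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, 32)
        chars.append(ZB32_ALPHABET[remainder])
    return ''.join(reversed(chars))


def zb32_decode_int(text):
    """Inverse of zb32_encode_int; raises ValueError on unknown characters."""
    number = 0
    for ch in text.strip().lower():
        number = number * 32 + ZB32_ALPHABET.index(ch)
    return number
```

Because the alphabet is all lower case, the decoder lower-cases its input, so a code read back in capitals still resolves to the same ID.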
I asked this similar question before we decided to use z-base-32.