How would you convert an integer to base 62 (like hexadecimal, but with these digits: '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ').
I have been trying to find a good Python library for it, but they all seems to be occupied with converting strings. The Python base64 module only accepts strings and turns a single digit into four characters. I was looking for something akin to what URL shorteners use.
To get the decimal number from base 62 string, for each character raise the base to the power index and multiply the result by the decimal equivalent of character.
Base62 uses 62 possible ASCII letters, 0 – 9, a – z and A – Z, therefore it is often used to represent large numbers in short length of string. Mainly it has two advantages: A shorter number representation in base62 yields a smaller risk of error entered by human and the number can be typed in faster.
There is no standard module for this, but I have written my own functions to achieve that.
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" def encode(num, alphabet): """Encode a positive number into Base X and return the string. Arguments: - `num`: The number to encode - `alphabet`: The alphabet to use for encoding """ if num == 0: return alphabet[0] arr = [] arr_append = arr.append # Extract bound-method for faster access. _divmod = divmod # Access to locals is faster. base = len(alphabet) while num: num, rem = _divmod(num, base) arr_append(alphabet[rem]) arr.reverse() return ''.join(arr) def decode(string, alphabet=BASE62): """Decode a Base X encoded string into the number Arguments: - `string`: The encoded string - `alphabet`: The alphabet to use for decoding """ base = len(alphabet) strlen = len(string) num = 0 idx = 0 for char in string: power = (strlen - (idx + 1)) num += alphabet.index(char) * (base ** power) idx += 1 return num
Notice the fact that you can give it any alphabet to use for encoding and decoding. If you leave the alphabet
argument out, you are going to get the 62 character alphabet defined on the first line of code, and hence encoding/decoding to/from 62 base.
Hope this helps.
PS - For URL shorteners, I have found that it's better to leave out a few confusing characters like 0Ol1oI etc. Thus I use this alphabet for my URL shortening needs - "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
Have fun.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With