
Base 62 conversion

How would you convert an integer to base 62 (like hexadecimal, but with these digits: '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')?

I have been trying to find a good Python library for it, but they all seem to be occupied with converting strings. The Python base64 module only accepts strings and turns a single digit into four characters. I am looking for something akin to what URL shorteners use.

mikl asked Jul 13 '09 14:07


People also ask

How do you calculate base 62?

To get the decimal number from a base 62 string, multiply the value of each character by 62 raised to the power of that character's position (counting from 0 at the rightmost character), then add the results together.
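As a small worked example of that rule, the sketch below evaluates the string "1Z" step by step, assuming the digit order from the question (0-9, then a-z, then A-Z). The name `ALPHABET` and the sample string are illustrative choices, not part of the original page.

```python
import string

# Assumed digit order: 0-9, then a-z, then A-Z (as in the question).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

text = "1Z"  # illustrative input
total = 0
for position, char in enumerate(reversed(text)):
    digit = ALPHABET.index(char)     # '1' -> 1, 'Z' -> 61
    total += digit * 62 ** position  # weight by 62^position from the right

print(total)  # 61 * 62**0 + 1 * 62**1 = 123
```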

What is Base62 used for?

Base62 uses 62 ASCII characters (0-9, a-z and A-Z), so it is often used to represent large numbers as short strings. This has two main advantages: a shorter representation lowers the risk of typing errors, and it can be typed faster.


1 Answer

There is no standard module for this, but I have written my own functions to achieve that.

```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(num, alphabet=BASE62):
    """Encode a non-negative number into Base X and return the string.

    Arguments:
    - `num`: The number to encode
    - `alphabet`: The alphabet to use for encoding
    """
    if num == 0:
        return alphabet[0]
    arr = []
    arr_append = arr.append  # Extract bound-method for faster access.
    _divmod = divmod  # Access to locals is faster.
    base = len(alphabet)
    while num:
        num, rem = _divmod(num, base)
        arr_append(alphabet[rem])
    arr.reverse()
    return ''.join(arr)

def decode(string, alphabet=BASE62):
    """Decode a Base X encoded string into the number

    Arguments:
    - `string`: The encoded string
    - `alphabet`: The alphabet to use for decoding
    """
    base = len(alphabet)
    strlen = len(string)
    num = 0

    idx = 0
    for char in string:
        power = (strlen - (idx + 1))
        num += alphabet.index(char) * (base ** power)
        idx += 1

    return num
```

Note that you can pass any alphabet for encoding and decoding. If you leave the alphabet argument out, you get the 62-character alphabet defined on the first line of code, and hence encoding to and decoding from base 62.
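A quick round-trip check can make the default-alphabet behaviour concrete. The sketch below restates both functions compactly so that it runs on its own; the names `b62_encode` and `b62_decode`, the Horner-style decoding loop, and the `ValueError` for negative input are assumptions of this sketch rather than part of the answer's code.

```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def b62_encode(num, alphabet=BASE62):
    """Hypothetical compact encoder for non-negative integers."""
    if num < 0:
        # Assumption: reject negatives, since a plain `while num:` loop
        # with divmod would never terminate for them.
        raise ValueError("num must be non-negative")
    if num == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while num:
        num, rem = divmod(num, base)
        digits.append(alphabet[rem])
    return "".join(reversed(digits))


def b62_decode(text, alphabet=BASE62):
    """Hypothetical compact decoder using Horner's method."""
    base = len(alphabet)
    num = 0
    for char in text:
        # Shift the running value one digit left, then add the new digit.
        num = num * base + alphabet.index(char)
    return num


print(b62_encode(61))    # Z
print(b62_encode(62))    # 10
print(b62_decode("1Z"))  # 123
assert b62_decode(b62_encode(1234567890)) == 1234567890
```

Horner's method avoids recomputing a power of the base for every character, though for identifiers as short as those in a URL shortener the difference is likely negligible.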

Hope this helps.

PS - For URL shorteners, I have found that it's better to leave out a few easily confused characters such as 0, O, l, 1, o and I. Thus I use this alphabet for my URL shortening needs - "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
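The reduced alphabet can also be derived rather than typed out. The sketch below, which assumes the excluded set is exactly the six characters 0, O, l, 1, o and I, filters them from the base 62 alphabet and yields the same 56-character string; `CONFUSING`, `SAFE_ALPHABET` and `encode_safe` are hypothetical names.

```python
import string

BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase
CONFUSING = set("0Ol1oI")  # assumed set of look-alike characters

# Keep the original ordering, dropping only the confusing characters.
SAFE_ALPHABET = "".join(c for c in BASE62 if c not in CONFUSING)


def encode_safe(num, alphabet=SAFE_ALPHABET):
    """Hypothetical encoder for non-negative integers in the reduced alphabet."""
    if num == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while num:
        num, rem = divmod(num, base)
        digits.append(alphabet[rem])
    return "".join(reversed(digits))


print(len(SAFE_ALPHABET))  # 56
print(encode_safe(56))     # 32
```

Because this alphabet has no "0", the character "2" plays the role of the zero digit, which is why 56 encodes as "32". The trade-off is that with 56 symbols instead of 62, large numbers may need an extra character.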

Have fun.

Baishampayan Ghose answered Sep 23 '22 12:09