I want the shortest possible way of representing an integer in a URL. For example, 11234 can be shortened to '2be2' using hexadecimal. Since base64 uses is a 64 character encoding, it should be possible to represent an integer in base64 using even less characters than hexadecimal. The problem is I can't figure out the cleanest way to convert an integer to base64 (and back again) using Python.
The base64 module has methods for dealing with bytestrings - so maybe one solution would be to convert an integer to its binary representation as a Python string... but I'm not sure how to do that either.
In Python an integer can be converted into a string using the built-in str() function. The str() function takes in any python data type and converts it into a string.
Use the urllib. parse. urlencode() function (with the doseq parameter set to True ) to convert such dictionaries into query strings.
This answer is similar in spirit to Douglas Leeder's, with the following changes:
Instead of converting the number first to a byte-string (base 256), it converts it directly to base 64, which has the advantage of letting you represent negative numbers using a sign character.
import string ALPHABET = string.ascii_uppercase + string.ascii_lowercase + \ string.digits + '-_' ALPHABET_REVERSE = dict((c, i) for (i, c) in enumerate(ALPHABET)) BASE = len(ALPHABET) SIGN_CHARACTER = '$' def num_encode(n): if n < 0: return SIGN_CHARACTER + num_encode(-n) s = [] while True: n, r = divmod(n, BASE) s.append(ALPHABET[r]) if n == 0: break return ''.join(reversed(s)) def num_decode(s): if s[0] == SIGN_CHARACTER: return -num_decode(s[1:]) n = 0 for c in s: n = n * BASE + ALPHABET_REVERSE[c] return n
>>> num_encode(0) 'A' >>> num_encode(64) 'BA' >>> num_encode(-(64**5-1)) '$_____'
A few side notes:
All the answers given regarding Base64 are very reasonable solutions. But they're technically incorrect. To convert an integer to the shortest URL safe string possible, what you want is base 66 (there are 66 URL safe characters).
That code looks something like this:
from io import StringIO import urllib BASE66_ALPHABET = u"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~" BASE = len(BASE66_ALPHABET) def hexahexacontadecimal_encode_int(n): if n == 0: return BASE66_ALPHABET[0].encode('ascii') r = StringIO() while n: n, t = divmod(n, BASE) r.write(BASE66_ALPHABET[t]) return r.getvalue().encode('ascii')[::-1]
Here's a complete implementation of a scheme like this, ready to go as a pip installable package:
https://github.com/aljungberg/hhc
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With