 

How to convert an integer to the shortest url-safe string in Python?

Tags:

python

url

base64

I want the shortest possible way of representing an integer in a URL. For example, 11234 can be shortened to '2be2' using hexadecimal. Since base64 is a 64-character encoding, it should be possible to represent an integer in base64 using even fewer characters than hexadecimal. The problem is I can't figure out the cleanest way to convert an integer to base64 (and back again) using Python.

The base64 module has methods for dealing with bytestrings - so maybe one solution would be to convert an integer to its binary representation as a Python string... but I'm not sure how to do that either.
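A minimal sketch of the bytestring route the question describes, assuming only non-negative integers: turn the integer into big-endian bytes with `int.to_bytes`, encode them with `base64.urlsafe_b64encode` (which uses `-` and `_` instead of `+` and `/`), and strip the `=` padding. The helper names `int_to_b64` and `b64_to_int` are hypothetical.

```python
import base64


def int_to_b64(n: int) -> str:
    """Encode a non-negative integer as unpadded URL-safe base64 (hypothetical helper)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    # Smallest whole number of bytes that holds n (at least one, so 0 works too).
    length = max(1, (n.bit_length() + 7) // 8)
    raw = n.to_bytes(length, "big")
    # Drop the '=' padding; it can be recomputed from the string length.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64_to_int(s: str) -> int:
    """Invert int_to_b64 by restoring the padding before decoding."""
    padded = s + "=" * (-len(s) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return int.from_bytes(raw, "big")
```

With this, `int_to_b64(11234)` gives `'K-I'`, three characters versus four for `'2be2'`. Because the number is first rounded up to whole bytes, this route can waste a character: 256 needs two bytes and comes out as `'AQA'`, while a direct base-64 digit conversion only needs two symbols.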

Simon Willison, asked Feb 18 '09 15:02



2 Answers

This answer is similar in spirit to Douglas Leeder's, with the following changes:

  • It doesn't use actual Base64, so there are no padding characters
  • Instead of converting the number first to a byte-string (base 256), it converts it directly to base 64, which has the advantage of letting you represent negative numbers using a sign character.

    import string

    ALPHABET = string.ascii_uppercase + string.ascii_lowercase + \
               string.digits + '-_'
    ALPHABET_REVERSE = dict((c, i) for (i, c) in enumerate(ALPHABET))
    BASE = len(ALPHABET)
    SIGN_CHARACTER = '$'

    def num_encode(n):
        if n < 0:
            return SIGN_CHARACTER + num_encode(-n)
        s = []
        while True:
            n, r = divmod(n, BASE)
            s.append(ALPHABET[r])
            if n == 0: break
        return ''.join(reversed(s))

    def num_decode(s):
        if s[0] == SIGN_CHARACTER:
            return -num_decode(s[1:])
        n = 0
        for c in s:
            n = n * BASE + ALPHABET_REVERSE[c]
        return n

    >>> num_encode(0)
    'A'
    >>> num_encode(64)
    'BA'
    >>> num_encode(-(64**5-1))
    '$_____'

A few side notes:

  • You could (marginally) increase the human-readability of the base-64 numbers by putting string.digits first in the alphabet (and making the sign character '-'); I chose the order I did based on Python's urlsafe_b64encode.
  • If you're encoding a lot of negative numbers, you could increase the efficiency by using a sign bit or one's/two's complement instead of a sign character.
  • You should be able to easily adapt this code to different bases by changing the alphabet, either to restrict it to only alphanumeric characters or to add additional "URL-safe" characters.
  • I would recommend against using a representation other than base 10 in URIs in most cases—it adds complexity and makes debugging harder without significant savings compared to the overhead of HTTP—unless you're going for something TinyURL-esque.
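Several of the notes above can be sketched together, under a few assumptions: a hypothetical `make_codec` factory that takes any alphabet, instantiated here as alphanumeric-only base 62 with digits first and `'-'` as the sign character, plus zigzag encoding (the scheme Protocol Buffers uses for its `sint` fields) as one way to fold the sign into the lowest bit instead of spending a sign character. `make_codec`, `zigzag` and `unzigzag` are illustrative names, not part of any library.

```python
import string


def make_codec(alphabet: str, sign: str = "-"):
    """Build (encode, decode) functions for an arbitrary digit alphabet.

    Assumes every character in `alphabet` is unique and that `sign` is not
    one of them; otherwise decoding would be ambiguous.
    """
    if len(set(alphabet)) != len(alphabet) or sign in alphabet:
        raise ValueError("alphabet must be unique and must not contain the sign")
    base = len(alphabet)
    reverse = {c: i for i, c in enumerate(alphabet)}

    def encode(n: int) -> str:
        if n < 0:
            return sign + encode(-n)
        digits = []
        while True:
            n, r = divmod(n, base)
            digits.append(alphabet[r])
            if n == 0:
                break
        return "".join(reversed(digits))

    def decode(s: str) -> int:
        if s.startswith(sign):
            return -decode(s[len(sign):])
        n = 0
        for c in s:
            n = n * base + reverse[c]
        return n

    return encode, decode


# Alphanumeric-only base 62, digits first so small values read like decimal.
b62_encode, b62_decode = make_codec(string.digits + string.ascii_letters)


def zigzag(n: int) -> int:
    """Fold the sign into the lowest bit: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    """Invert zigzag()."""
    return z >> 1 if z % 2 == 0 else -((z + 1) >> 1)
```

For example, `b62_encode(62)` is `'10'` and `b62_encode(-62)` is `'-10'`, while `b62_encode(zigzag(-1))` is just `'1'`, one character shorter than `'-1'`. Zigzag costs roughly one bit for positive numbers, so it mainly pays off when negative values are common.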
Miles, answered Sep 27 '22 21:09


All the answers given regarding Base64 are very reasonable solutions, but they're technically incorrect. To convert an integer to the shortest URL-safe string possible, what you want is base 66: there are 66 URL-safe ("unreserved") characters, namely the 52 letters, the 10 digits, and -_.~.

That code looks something like this:

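    from io import StringIO

    # The 66 unreserved URL characters (RFC 3986): digits, letters, and -_.~
    BASE66_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~"
    BASE = len(BASE66_ALPHABET)

    def hexahexacontadecimal_encode_int(n):
        # Negative input would loop forever in divmod below.
        if n < 0:
            raise ValueError("n must be non-negative")
        if n == 0:
            return BASE66_ALPHABET[0].encode('ascii')

        r = StringIO()
        while n:
            # Emit the least significant base-66 digit first...
            n, t = divmod(n, BASE)
            r.write(BASE66_ALPHABET[t])
        # ...then reverse so the most significant digit comes first.
        return r.getvalue().encode('ascii')[::-1]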
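The snippet above only goes one way. A matching decoder might look like this sketch; the name `hexahexacontadecimal_decode_int` is hypothetical, chosen to mirror the encoder, and it accepts either `str` or the ASCII `bytes` the encoder returns.

```python
BASE66_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~"
BASE = len(BASE66_ALPHABET)
# Map each character back to its digit value.
BASE66_REVERSE = {c: i for i, c in enumerate(BASE66_ALPHABET)}


def hexahexacontadecimal_decode_int(s):
    """Decode a base-66 string (str or ASCII bytes) back to a non-negative integer."""
    if isinstance(s, bytes):
        s = s.decode("ascii")
    if not s:
        raise ValueError("empty string")
    n = 0
    for c in s:
        # Horner's rule: shift left by one base-66 digit, then add the new digit.
        n = n * BASE + BASE66_REVERSE[c]
    return n
```

For instance, 11234 = 2·66² + 38·66 + 14, which the encoder writes as `b'2cE'`, and decoding `'2cE'` recovers 11234.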

Here's a complete implementation of a scheme like this, ready to go as a pip-installable package:

https://github.com/aljungberg/hhc
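To see how much the two extra symbols of base 66 actually buy over base 64, a rough sketch (the helper `digits_needed` is hypothetical) that counts how many digits a non-negative integer needs in a given base:

```python
def digits_needed(n: int, base: int) -> int:
    """Number of base-`base` digits needed to write the non-negative integer n."""
    count = 1
    while n >= base:
        n //= base
        count += 1
    return count
```

For 11234 this gives 5 digits in base 10, 4 in base 16, and 3 in both base 64 and base 66; for 2**64 - 1 it is 11 in both base 64 and base 66. Base 66 only saves a character when n falls in a window such as 64**k <= n < 66**k, e.g. 4096 to 4355 for k = 2.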

Alexander Ljungberg, answered Sep 27 '22 22:09