
Pythonic way to ensure unicode in Python 2 and 3

I'm porting a library so that it is compatible with both Python 2 and 3. The library receives strings or string-like objects from the calling application, and I need to ensure those objects are converted to unicode strings.

In Python 2 I can do:

unicode_x = unicode(x)

In Python 3 I can do:

unicode_x = str(x)

However, the best cross-version solution I have is:

import sys

def ensure_unicode(x):
    # Python 2: str is a byte string, so use the unicode type instead.
    if sys.version_info < (3, 0):
        return unicode(x)
    return str(x)

which certainly doesn't seem great (although it works). Is there a better solution?

I am aware of unicode_literals and the u prefix, but neither works here because the inputs come from clients and are not literals in my library.

Pace asked Mar 23 '15 15:03



1 Answer

Don't re-invent the compatibility layer wheel. Use the six compatibility layer, a small one-file project that can be included with your own:

Six supports every Python version since 2.6. It is contained in only one Python file, so it can be easily copied into your project. (The copyright and license notice must be retained.)

It includes a six.text_type() callable that does exactly this, converting a value to Unicode text:

import six

unicode_x = six.text_type(x)

In the project source code this is defined as:

import sys

PY2 = sys.version_info[0] == 2
PY3 = sys.version_info[0] == 3
# ...

if PY3:
    # ...
    text_type = str
    # ...

else:
    # ...
    text_type = unicode
    # ...
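
One caveat, if the clients may also pass byte strings: calling the text type directly on bytes does not decode them the same way on both versions. On Python 3, str(b"abc") returns the repr "b'abc'", and on Python 2, unicode() decodes with the ASCII codec and fails on non-ASCII bytes. A minimal sketch of a helper that decodes bytes explicitly is below. The name ensure_unicode is taken from the question, the encoding parameter and its utf-8 default are assumptions, and none of this is part of six itself.

```python
import sys

# Same idea as six.text_type, inlined so the sketch has no dependencies.
if sys.version_info[0] >= 3:
    text_type = str
else:
    text_type = unicode  # noqa: F821 (only evaluated on Python 2)


def ensure_unicode(x, encoding="utf-8"):
    """Return x as Unicode text (hypothetical helper, not part of six).

    Byte strings are decoded with `encoding` (assumed utf-8 by default);
    everything else is passed through the text type.
    """
    if isinstance(x, bytes):  # on Python 2, bytes is an alias for str
        return x.decode(encoding)
    return text_type(x)


print(ensure_unicode(b"caf\xc3\xa9") == u"caf\xe9")  # True
print(ensure_unicode(42) == u"42")                   # True
```

If the inputs are always text or arbitrary objects and never bytes, plain six.text_type(x) is enough and the extra branch is unnecessary.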
Martijn Pieters answered Oct 12 '22 01:10
