
Unicode literals that work in python 3 and 2

I have a Python script that I'd like to run on both Python 3.2 and 2.7, just for convenience.

Is there a way to write Unicode literals that work in both? For example:

#coding: utf-8
whatever = 'שלום'

The code above needs a unicode string (u'') in Python 2.x, but in Python 3.x that little u causes a syntax error.

ubershmekel asked Jul 08 '11 14:07


1 Answer

Edit: Since Python 3.3, the u'' literal works again, so the u() function is only needed if you have to support Python 3.0 through 3.2.
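As a quick check of that note, here is a minimal sketch that assumes the interpreter is either Python 2.x or Python 3.3+, where the u prefix is accepted. The name greeting is only for illustration.

```python
# -*- coding: utf-8 -*-
# Assumption: Python 2.x or Python 3.3+. On Python 3.0-3.2 the u prefix
# below is a SyntaxError, which is the case the u() helper is for.

greeting = u'שלום'  # the string from the question, written as a u'' literal

# Four code points either way: a unicode object on 2.x, a str on 3.3+.
assert len(greeting) == 4
assert greeting.encode('utf-8') == b'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
```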

The best option is to write a function that creates unicode objects from string objects in Python 2, but leaves string objects alone in Python 3 (where they are already Unicode).

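import sys

# Compare the numeric major version; comparing sys.version as a string is fragile.
if sys.version_info[0] < 3:
    import codecs
    def u(x):
        # Python 2: decode the backslash escapes in a byte string to unicode.
        return codecs.unicode_escape_decode(x)[0]
else:
    def u(x):
        # Python 3: str is already Unicode, so return it unchanged.
        return x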

You would then use it like so:

>>> print(u('\u00dcnic\u00f6de'))
Ünicöde
>>> print(u('\xdcnic\N{Latin Small Letter O with diaeresis}de'))
Ünicöde
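To see that the helper gives the same text on either major version, the sketch below repeats it in self-contained form and checks the result; the variable name is hypothetical. One caveat, as far as I can tell: on Python 2 the unicode_escape codec treats any non-escape bytes as Latin-1, so non-ASCII characters should be passed to u() as escapes rather than typed directly.

```python
import sys

if sys.version_info[0] < 3:
    import codecs

    def u(x):
        # Python 2: turn '\u00dc'-style escapes in a byte string into unicode.
        return codecs.unicode_escape_decode(x)[0]
else:
    def u(x):
        # Python 3: the literal is already a Unicode str.
        return x

name = u('\u00dcnic\u00f6de')

# Seven code points on both versions, and identical UTF-8 bytes.
assert len(name) == 7
assert name.encode('utf-8') == b'\xc3\x9cnic\xc3\xb6de'
```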
Lennart Regebro answered Oct 09 '22 15:10
