I have a Python script that I'd like to run on both Python 3.2 and 2.7, just for convenience.
Is there a way to write Unicode literals that work in both? For example:
#coding: utf-8
whatever = 'שלום'
In Python 2.x the literal above has to be a unicode string (u'שלום'), but in Python 3.x up to 3.2 that little u causes a syntax error.
In Python 2, the unicode object lets you work with characters, and it has essentially the same methods as the (byte) str object. "Encoding" converts a unicode object to bytes; "decoding" converts bytes back to a unicode object.
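A small sketch of the encode/decode round trip described above, written for Python 3 (where str plays the role of Python 2's unicode). The UTF-8 codec is an illustrative choice; any codec that can represent the characters would do.

```python
# Text spelled with escapes so the source file encoding doesn't matter.
text = '\u00dcnic\u00f6de'        # 'Ünicöde' as Unicode characters

data = text.encode('utf-8')       # encoding: characters -> bytes
back = data.decode('utf-8')       # decoding: bytes -> characters

print(data)          # b'\xc3\x9cnic\xc3\xb6de'
print(back == text)  # True
```

Note that the encoded form is 9 bytes long even though the text is 7 characters, because Ü and ö each take two bytes in UTF-8.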
Since Python 3.0, the language's str type contains Unicode characters, meaning any string created using "unicode rocks!", 'unicode rocks!', or the triple-quoted string syntax is stored as Unicode.
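A quick Python 3 check of that claim: every ordinary literal form yields str, and only a b prefix produces bytes instead.

```python
# Three spellings of the same literal; in Python 3 all of them are str.
literals = ["unicode rocks!", 'unicode rocks!', """unicode rocks!"""]
print([type(s).__name__ for s in literals])  # ['str', 'str', 'str']

# A b prefix is what opts out of Unicode and gives raw bytes.
print(type(b"unicode rocks!").__name__)      # bytes
```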
Edit: Since Python 3.3 (PEP 414), the u'' literal works again, so the u() function below is only needed if you must support Python 3.0 to 3.2.
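As a sketch, assuming you can drop 3.0 to 3.2 and target only Python 2.7 and 3.3+, the question's original line can simply keep its u prefix:

```python
# coding: utf-8
# Assumes Python 2.7 or 3.3+; on 3.0 to 3.2 the u prefix is a SyntaxError.
whatever = u'שלום'

# A unicode object on 2.7, a plain str on 3.3+; four characters either way.
print(len(whatever))  # 4
```

On 3.3+ the u prefix is accepted but has no effect, so u'שלום' and 'שלום' are the same str.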
The best option is to write a function that creates unicode objects from string objects in Python 2, but leaves the string objects alone in Python 3 (as they are already Unicode).
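import sys

if sys.version_info[0] < 3:
    import codecs

    def u(x):
        # Python 2: interpret backslash escapes (\uXXXX, \xXX, \N{...})
        # in the byte string and return a unicode object.
        return codecs.unicode_escape_decode(x)[0]
else:
    def u(x):
        # Python 3: str is already Unicode, so return it unchanged.
        return x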
You would then use it like so:
>>> print(u('\u00dcnic\u00f6de'))
Ünicöde
>>> print(u('\xdcnic\N{Latin Small Letter O with diaeresis}de'))
Ünicöde
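Applied to the question's Hebrew string, a sketch would spell the characters as ASCII-only escapes (U+05E9 U+05DC U+05D5 U+05DD), so the same source line means the same text on 2.7 and 3.2. Passing the raw UTF-8 literal u('שלום') would not work on Python 2, because the unicode_escape codec treats non-escaped bytes as Latin-1 and would produce mojibake. The helper is repeated here only to keep the snippet self-contained.

```python
import sys

if sys.version_info[0] < 3:
    import codecs

    def u(x):
        # Python 2: decode backslash escapes into a unicode object.
        return codecs.unicode_escape_decode(x)[0]
else:
    def u(x):
        # Python 3: str is already Unicode.
        return x

# 'שלום' written with escapes only, so the source stays pure ASCII.
whatever = u('\u05e9\u05dc\u05d5\u05dd')
print(len(whatever))  # 4
```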