I have some python code writen in an older version of python(2.x) and I struggle to make it work. I'm using python 3.4
_eng_word = ur"[a-zA-Z][a-zA-Z0-9'.]*"
(it's part of a tokenizer)
http://bugs.python.org/issue15096
Title: Drop support for the "ur" string prefix
When PEP 414 restored support for explicit Unicode literals in Python 3, the "ur" string prefix was deemed to be a synonym for the "r" prefix.
So, use 'r' instead of 'ur'
Indeed, Python 3.4 only supports u'...'
(to support code that needs to run on both Python 2 and 3) and r'....'
, but not both. That's because the semantics of how ur'..'
works in Python 2 are different from how ur'..'
would work in Python 3 (in Python 2, \uhhhh
and \Uhhhhhhhh
escapes still are processed, in Python 3 a `r'...' string would not).
Note that in this specific case there is no difference between the raw string literal and the regular! You can just use:
_eng_word = u"[a-zA-Z][a-zA-Z0-9'.]*"
and it'll work in both Python 2 and 3.
For cases where a raw string literal does matter, you could decode the raw string from raw_unicode_escape
on Python 2, catching the AttributeError
on Python 3:
_eng_word = r"[a-zA-Z][a-zA-Z0-9'.]*"
try:
# Python 2
_eng_word = _eng_word.decode('raw_unicode_escape')
except AttributeError:
# Python 3
pass
If you are writing Python 3 code only (so it doesn't have to run on Python 2 anymore), just drop the u
entirely:
_eng_word = r"[a-zA-Z][a-zA-Z0-9'.]*"
This table compares (some of) the different string literal prefixes in Python 2(.7) and 3(.4+):
As you can see, in Python 3 there's no way to have a literal that doesn't process escapes, but does process unicode literals. To get such a string with code that works in both Python 2 and 3, use:
br"[a-zA-Z][a-zA-Z0-9'.]*".decode('raw_unicode_escape')
Actually, your example is not very good, since it doesn't have any unicode literals, or escape sequences. A better example would be:
br"[\u03b1-\u03c9\u0391-\u03a9][\t'.]*".decode('raw_unicode_escape')
In python 2:
>>> br"[\u03b1-\u03c9\u0391-\u03a9][\t'.]*".decode('raw_unicode_escape')
u"[\u03b1-\u03c9\u0391-\u03a9][\\t'.]*"
In Python 3:
>>> br"[\u03b1-\u03c9\u0391-\u03a9][\t'.]*".decode('raw_unicode_escape')
"[α-ωΑ-Ω][\\t'.]*"
Which is really the same thing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With