Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python version 3.4 does not support a 'ur' prefix

I have some python code writen in an older version of python(2.x) and I struggle to make it work. I'm using python 3.4

_eng_word = ur"[a-zA-Z][a-zA-Z0-9'.]*"

(it's part of a tokenizer)

like image 594
user_s Avatar asked Sep 26 '14 16:09

user_s


3 Answers

http://bugs.python.org/issue15096

Title: Drop support for the "ur" string prefix
When PEP 414 restored support for explicit Unicode literals in Python 3, the "ur" string prefix was deemed to be a synonym for the "r" prefix.

So, use 'r' instead of 'ur'

like image 55
KevinDTimm Avatar answered Nov 02 '22 01:11

KevinDTimm


Indeed, Python 3.4 only supports u'...' (to support code that needs to run on both Python 2 and 3) and r'....', but not both. That's because the semantics of how ur'..' works in Python 2 are different from how ur'..' would work in Python 3 (in Python 2, \uhhhh and \Uhhhhhhhh escapes still are processed, in Python 3 a `r'...' string would not).

Note that in this specific case there is no difference between the raw string literal and the regular! You can just use:

_eng_word = u"[a-zA-Z][a-zA-Z0-9'.]*"

and it'll work in both Python 2 and 3.

For cases where a raw string literal does matter, you could decode the raw string from raw_unicode_escape on Python 2, catching the AttributeError on Python 3:

_eng_word = r"[a-zA-Z][a-zA-Z0-9'.]*"
try:
    # Python 2
    _eng_word = _eng_word.decode('raw_unicode_escape')
except AttributeError:
    # Python 3
    pass

If you are writing Python 3 code only (so it doesn't have to run on Python 2 anymore), just drop the u entirely:

_eng_word = r"[a-zA-Z][a-zA-Z0-9'.]*"
like image 25
Martijn Pieters Avatar answered Nov 02 '22 01:11

Martijn Pieters


This table compares (some of) the different string literal prefixes in Python 2(.7) and 3(.4+): enter image description here

As you can see, in Python 3 there's no way to have a literal that doesn't process escapes, but does process unicode literals. To get such a string with code that works in both Python 2 and 3, use:

br"[a-zA-Z][a-zA-Z0-9'.]*".decode('raw_unicode_escape')

Actually, your example is not very good, since it doesn't have any unicode literals, or escape sequences. A better example would be:

br"[\u03b1-\u03c9\u0391-\u03a9][\t'.]*".decode('raw_unicode_escape')

In python 2:

>>> br"[\u03b1-\u03c9\u0391-\u03a9][\t'.]*".decode('raw_unicode_escape')
u"[\u03b1-\u03c9\u0391-\u03a9][\\t'.]*"

In Python 3:

>>> br"[\u03b1-\u03c9\u0391-\u03a9][\t'.]*".decode('raw_unicode_escape')
"[α-ωΑ-Ω][\\t'.]*"

Which is really the same thing.

like image 7
itsadok Avatar answered Nov 02 '22 00:11

itsadok