Problem with accented characters [closed]

Question

I'm having a problem on the accentuation of Python regular expression, I'm doing the following attempt:

import re
ER = re.compile(r'\w', re.L)
print(ER.sub('.','Maçã'))

..çã

Even using the re.compile passing the locale as an argument, the accents are not recognized. Has anyone had this problem?

Thanks!

SilentGhost · Accepted Answer

You're better off using re.U unicode flag.

If using Python 2.x you also need to specify the string as unicode, i.e.

print(ER.sub('.', u'Maçã'))

MeanEYE · Answer

From http://www.regular-expressions.info/python.html

By default, Python's regex engine only considers the letters A through Z, the digits 0 through 9, and the underscore as "word characters". Specify the flag re.L or re.LOCALE to make \w match all characters that are considered letters given the current locale settings. Alternatively, you can specify re.U or re.UNICODE to treat all letters from all scripts as word characters. The setting also affects word boundaries.

Try using re.UNICODE.

Problem with accented characters [closed]

Tags:

python

regex

unicode

Michel Andrade

2 Answers

SilentGhost

MeanEYE

Recent Activity

Donate For Us

Problem with accented characters [closed]

Tags:

python

regex

unicode

Michel Andrade

2 Answers

SilentGhost

MeanEYE

Related questions

Recent Activity

Donate For Us