I'm having a problem with accented characters in Python regular expressions. This is what I tried:
import re
ER = re.compile(r'\w', re.L)
print(ER.sub('.','Maçã'))
..çã
Even when I pass the locale flag (re.L) to re.compile, the accented characters are not recognized.
Has anyone had this problem?
Thanks!
You're better off using the re.U (Unicode) flag.
If you are using Python 2.x, you also need to pass the string as a unicode literal, i.e.
print(ER.sub('.', u'Maçã'))
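For reference, here is a minimal sketch of the re.U suggestion, assuming Python 3 and a str pattern. The helper name mask_word_chars is hypothetical and only there for illustration. On Python 3, str patterns already match Unicode word characters by default, so the flag is accepted but redundant there.

```python
import re

# re.U (re.UNICODE) makes \w match letters from all scripts, including ç and ã.
# For Python 3 str patterns this is already the default; the flag is kept to be explicit.
ER = re.compile(r'\w', re.U)

def mask_word_chars(text):
    """Replace every word character in text with a dot (illustrative helper)."""
    return ER.sub('.', text)

print(mask_word_chars('Maçã'))  # prints ....
```

Note that recent Python 3 versions reject re.L on a str pattern altogether, so the original re.compile(r'\w', re.L) call would probably need a bytes pattern there.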
From http://www.regular-expressions.info/python.html
By default, Python's regex engine only considers the letters A through Z, the digits 0 through 9, and the underscore as "word characters". Specify the flag re.L or re.LOCALE to make \w match all characters that are considered letters given the current locale settings. Alternatively, you can specify re.U or re.UNICODE to treat all letters from all scripts as word characters. The setting also affects word boundaries.
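The difference the quote describes can be sketched as follows, assuming Python 3, where re.ASCII reproduces the ASCII-only behaviour and re.UNICODE is the default for str patterns. The sample text is made up for the illustration.

```python
import re

text = 'Maçã verde'  # sample input, chosen only for this illustration

# re.ASCII restricts \w to [a-zA-Z0-9_], the ASCII-only behaviour from the quote.
ascii_words = re.findall(r'\w+', text, re.ASCII)

# re.UNICODE treats letters from all scripts as word characters.
unicode_words = re.findall(r'\w+', text, re.UNICODE)

print(ascii_words)    # ['Ma', 'verde']
print(unicode_words)  # ['Maçã', 'verde']

# Word boundaries follow the same setting: \b sees a boundary
# between 'a' and 'ç' only when 'ç' is not a word character.
boundary_ascii = bool(re.search(r'Ma\b', text, re.ASCII))
boundary_unicode = bool(re.search(r'Ma\b', text, re.UNICODE))

print(boundary_ascii)    # True
print(boundary_unicode)  # False
```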
Try using re.UNICODE.