I am trying to use the re.UNICODE flag to match a string that may contain Unicode characters, but it doesn't seem to be working. For example:
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> r = re.compile(ur"(\w+)", re.UNICODE)
>>> r.findall(u"test test test", re.UNICODE)
[]
It works if I do not specify the Unicode flag, but then obviously it will not work with Unicode strings. What do I need to do to get this working?
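The second argument to r.findall is not flags but pos, the index in the string at which the search starts. re.UNICODE is just an integer constant (32), so your call tells findall to start searching at position 32. That is past the end of the 14-character string, so nothing is found. You don't need to specify the flags again when you have already specified them in compile: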
>>> r = re.compile(ur"(\w+)", re.UNICODE)
>>> r.findall(u'test test test')
[u'test', u'test', u'test']
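The mix-up comes from two similar-looking signatures: the compiled pattern's method is findall(string[, pos[, endpos]]), while the module-level function is re.findall(pattern, string, flags). The script below is a minimal sketch contrasting them. It is written to run on Python 3 (it avoids ur"" literals, so it should also run on 2.7), and the sample strings are made up for illustration.

```python
import re

text = u"test test test"

# Flags are given once, at compile time; the compiled pattern remembers them.
pattern = re.compile(r"(\w+)", re.UNICODE)

# Bug from the question: re.UNICODE (an int, 32) lands in the `pos` slot,
# so the search starts past the end of the 14-character string.
wrong = pattern.findall(text, re.UNICODE)

# Fix: pass only the string.
right = pattern.findall(text)

# The second argument really is a start position: begin at index 5.
partial = pattern.findall(text, 5)

# Module-level function: here the third positional argument IS flags.
accented = u"caf\u00e9 na\u00efve stra\u00dfe"
words = re.findall(r"\w+", accented, re.UNICODE)

# Equivalent using a compiled pattern (no flags argument to findall).
same = re.compile(r"\w+", re.UNICODE).findall(accented)

print(wrong)
print(right)
print(partial)
print(words == same)
```

With the flag in place, \w matches the accented letters, so words contains the three whole words rather than fragments split at é, ï and ß. On Python 3, str patterns are Unicode-aware by default and re.UNICODE is redundant, but passing it is harmless.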