
What does (?ui) mean in a Python regex?

Tags: python, regex

The Python Fuzzy Wuzzy library includes the following regex:

regex = re.compile(r"(?ui)\W")
return regex.sub(u" ", a_string)

(https://github.com/seatgeek/fuzzywuzzy/blob/master/fuzzywuzzy/string_processing.py#L17)

This replaces every non-word character in a_string (anything other than a letter, digit, or underscore) with a space.

What does the (?ui) bit do though? It seems to work fine without it.

Thanks

alan asked May 04 '26 05:05


1 Answer

The u is the Unicode flag (re.UNICODE) and the i is the ignore-case flag (re.IGNORECASE).

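In Python 2, the unicode flag makes \w, \W, \b, \B, \d, \D, \s and \S dependent on the Unicode character properties database. (In Python 3, str patterns already match this way by default, so the flag is redundant there.) For example, in Python 2: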

>>> re.findall(r'\d+', u'The answer is \u0664\u0662')         # No flag
[]

>>> re.findall(r'(?u)\d+', u'The answer is \u0664\u0662')     # With unicode flag
[u'\u0664\u0662']
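
For comparison, here is a minimal Python 3 sketch of the same contrast. It assumes Python 3, where str patterns are Unicode-aware by default, so the ASCII-only behaviour has to be requested explicitly with re.ASCII. The variable names are illustrative only.

```python
import re

text = 'The answer is \u0664\u0662'  # Arabic-Indic digits four and two

# Python 3 default for str patterns: \d matches any Unicode decimal digit.
unicode_digits = re.findall(r'\d+', text)

# (?u) is accepted but changes nothing for a str pattern in Python 3.
explicit_unicode = re.findall(r'(?u)\d+', text)

# re.ASCII restricts \d to [0-9], like the "No flag" case above.
ascii_digits = re.findall(r'\d+', text, re.ASCII)

print(unicode_digits, explicit_unicode, ascii_digits)
```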

The ignore case flag performs case-insensitive matching. Expressions like [A-Z] will match lowercase letters as well. This is not affected by the current locale. For example:

>>> re.findall(r'[a-z]+', 'HELLO world')         # No flag
['world']

>>> re.findall(r'(?i)[a-z]+', 'HELLO world')     # With ignore case flag
['HELLO', 'world']
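
Applied to the pattern from the question, the two flags can be compared directly. The sketch below assumes Python 3; the sample string and the helper name strip_non_word are made up for illustration and are not part of fuzzywuzzy. The i flag makes no difference here because \W contains no cased letters, while Unicode versus ASCII matching decides whether accented letters count as word characters.

```python
import re

def strip_non_word(pattern, a_string):
    # Hypothetical helper: replace every match of pattern with a space.
    return re.compile(pattern).sub(" ", a_string)

sample = "caf\u00e9-cr\u00e8me & co."

with_ui = strip_non_word(r"(?ui)\W", sample)     # pattern from the question
with_u = strip_non_word(r"(?u)\W", sample)       # same pattern without the i flag
ascii_only = strip_non_word(r"(?a)\W", sample)   # ASCII-only matching, for contrast

print(repr(with_ui))
print(repr(with_u))
print(repr(ascii_only))
```

With Unicode matching the accented letters survive; with ASCII-only matching they are treated as non-word characters and replaced by spaces.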
Raymond Hettinger answered May 06 '26 21:05