
How do I match all unicode lowercase characters in Python with a regular expression?

I am trying to write a regular expression that would match Unicode lowercase characters in Python 3. I'm using the re library. For example, re.findall(some_pattern, 'u∏ñKθ') should return ['u', 'ñ', 'θ'].

In Sublime Text, I could simply type [[:lower:]] to find these characters.

I'm aware that Python can match any Unicode letter with re.compile(r'[^\W\d_]'), but I specifically need to differentiate between uppercase and lowercase. I'm also aware that re.compile('[a-z]') would match any ASCII lowercase letter, but my data is UTF-8, and I checked that it includes lots of non-ASCII characters.

Is this possible with regular expressions in Python 3, or will I need to take an alternative approach? I know other ways to do it. I was just hoping to use regex.

asked Nov 04 '25 16:11 by Nik


1 Answer

You can use the third-party regex module (pip install regex), which, unlike the standard re module, supports POSIX character classes:

>>> import regex
>>> regex.findall('[[:lower:]]', 'u∏ñKθ')
['u', 'ñ', 'θ']

Or, still with the regex module, use the Unicode general category \p{Ll} (equivalently \p{Lowercase_Letter}):

>>> regex.findall(r'\p{Ll}', 'u∏ñKθ')
['u', 'ñ', 'θ']

Or skip regular expressions and use Python's built-in str.islower():

>>> [c for c in 'u∏ñKθ' if c.islower()]
['u', 'ñ', 'θ']
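If installing the regex module is not an option, one possible workaround is to build an explicit character class for the standard re module from the Unicode database. The following is only a sketch: the helper name build_lowercase_pattern is made up for illustration, and it mirrors the Ll general category (like \p{Ll}), which is not guaranteed to agree with str.islower() for every character.

```python
import re
import sys
import unicodedata


def build_lowercase_pattern():
    """Compile a stdlib `re` character class covering every code point in category Ll.

    Hypothetical helper, not part of any library.
    """
    parts = []
    start = None  # first code point of the current run of Ll characters
    # One extra iteration past sys.maxunicode flushes a run that reaches the end.
    for cp in range(sys.maxunicode + 2):
        is_lower = cp <= sys.maxunicode and unicodedata.category(chr(cp)) == "Ll"
        if is_lower and start is None:
            start = cp
        elif not is_lower and start is not None:
            end = cp - 1
            if start == end:
                parts.append(re.escape(chr(start)))
            else:
                # Collapse consecutive code points into a range such as a-z.
                parts.append(re.escape(chr(start)) + "-" + re.escape(chr(end)))
            start = None
    return re.compile("[" + "".join(parts) + "]")


# Scanning the whole code space takes a moment, so build the pattern once and reuse it.
LOWERCASE = build_lowercase_pattern()

print(LOWERCASE.findall('u∏ñKθ'))  # ['u', 'ñ', 'θ']
```

The resulting pattern depends on the Unicode version bundled with the running Python interpreter, so the exact set of matched characters can differ slightly between Python releases.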

Whichever approach you use, beware of strings such as this:

>>> s2='\u0061\u0300\u00E0'
>>> s2
'àà'

The first 'à' consists of two code points: 'a' (U+0061) followed by the combining grave accent (U+0300). The second 'à' is the single precomposed code point U+00E0. A character class matches one code point at a time, so here it matches the bare 'a' and leaves the combining accent behind:

>>> regex.findall('[[:lower:]]', s2)
['a', 'à']
>>> [c for c in s2 if c.islower()]
['a', 'à']
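To see why the accent is dropped, it can help to list the code points in s2 along with their general categories. This is a small inspection sketch using only the standard library; the variable name categories is just for illustration.

```python
import unicodedata

s2 = '\u0061\u0300\u00E0'

# Print each code point with its general category and official name.
for ch in s2:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
# U+0061 Ll LATIN SMALL LETTER A
# U+0300 Mn COMBINING GRAVE ACCENT
# U+00E0 Ll LATIN SMALL LETTER A WITH GRAVE

categories = [unicodedata.category(ch) for ch in s2]
```

The combining accent has category Mn (nonspacing mark), not Ll, so neither a lowercase character class nor islower() selects it on its own.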

To handle decomposed characters like this, either account for combining marks in a more complicated regex pattern, or normalize the string to its composed form (NFC) first:

>>> import unicodedata
>>> regex.findall('[[:lower:]]', unicodedata.normalize('NFC', s2))
['à', 'à']

Or loop through the string grapheme by grapheme, using \X to match each grapheme cluster:

>>> [c for c in regex.findall(r'\X', s2) if c.islower()]
['à', 'à']
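Normalization alone does not cover every case, because some base-plus-accent sequences have no precomposed code point for NFC to produce. Without the regex module, a rough stand-in for \X is to attach each combining mark to the character before it. The sketch below assumes that simplification is acceptable; base_plus_marks is a hypothetical helper and does not implement the full grapheme cluster rules that \X follows (it ignores emoji sequences, Hangul jamo, and so on).

```python
import unicodedata


def base_plus_marks(text):
    """Group each character with the combining marks that follow it.

    Hypothetical helper: a coarse approximation of grapheme clusters.
    """
    clusters = []
    for ch in text:
        # combining() returns a nonzero class for combining marks such as U+0300.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


# 'q' + combining grave has no precomposed form, so NFC leaves it as two code points.
s3 = 'q\u0300K\u00E0'
nfc = unicodedata.normalize('NFC', s3)
print(len(nfc))  # 4

lower_clusters = [c for c in base_plus_marks(s3) if c.islower()]
print(lower_clusters)  # ['q̀', 'à']
```

Here islower() is applied to the whole cluster, which works because the combining mark is uncased and so does not affect the result.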

answered Nov 07 '25 11:11 by dawg


