In .NET you can use \p{L} to match any letter. How can I do the same in Python? Specifically, I want to match any uppercase, lowercase, or accented letter.


Match any unicode letter?

2 Answers

Python's re module doesn't support Unicode properties yet. But you can compile your regex using the re.UNICODE flag, and then the character class shorthand \w will match Unicode letters, too. (In Python 3, str patterns are Unicode-aware by default, so the flag is only needed in Python 2.)

Since \w also matches digits, you then need to subtract those from your character class, along with the underscore:

[^\W\d_]

will match any Unicode letter.

>>> import re
>>> r = re.compile(r'[^\W\d_]', re.U)
>>> r.match('x')
<_sre.SRE_Match object at 0x0000000001DBCF38>
>>> r.match(u'é')
<_sre.SRE_Match object at 0x0000000002253030>
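
As a minimal Python 3 sketch of the same idea, the class can be wrapped in a small helper that pulls out runs of letters. The names LETTER_RUN and letter_chunks are made up for illustration and are not part of the re module; the sample string is borrowed from this page with an extra x_1 added to show that the underscore and the digit are excluded.

```python
import re

# [^\W\d_] reads as "a word character that is neither a digit nor an underscore".
# In Python 3, str patterns are Unicode-aware by default, so re.UNICODE is optional.
LETTER_RUN = re.compile(r'[^\W\d_]+')

def letter_chunks(text):
    """Return the runs of consecutive letters found in text."""
    return LETTER_RUN.findall(text)

print(letter_chunks('Abc-++-Абв. x_1 Łąć!'))
# => ['Abc', 'Абв', 'x', 'Łąć']
```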


answered Oct 25 '22 15:10

Tim Pietzcker

The regex module from PyPI supports the \p{L} Unicode property class and many more; see the "Unicode codepoint properties, including scripts and blocks" section in its documentation and the full list at http://www.unicode.org/Public/UNIDATA/PropList.txt. Using the regex module is convenient because you get consistent results across Python versions (keep in mind that the Unicode standard keeps evolving and the number of supported letters grows).

Install the library with pip install regex (or pip3 install regex) and use these patterns:

\p{L}        # To match any Unicode letter
\p{Lu}       # To match any uppercase Unicode letter
\p{Ll}       # To match any lowercase Unicode letter
\p{L}\p{M}*  # To match any Unicode letter and any amount of diacritics after it

See some usage examples below:

import regex
text = r'Abc-++-Абв. It’s “Łąć”!'
# Removing letters:
print( regex.sub(r'\p{L}+', '', text) ) # => -++-. ’ “”!
# Extracting letter chunks:
print( regex.findall(r'\p{L}+', text) ) # => ['Abc', 'Абв', 'It', 's', 'Łąć']
# Removing all but letters:
print( regex.sub(r'\P{L}+', '', text) ) # => AbcАбвItsŁąć
# Removing all letters but ASCII letters:
print( regex.sub(r'[^\P{L}a-zA-Z]+', '', text) ) # => Abc-++-. It’s “”!
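
If installing a third-party package is not an option, the general categories that \p{L}, \p{Lu}, \p{Ll} and \p{M} refer to can be inspected with the standard library's unicodedata module. This is a different technique from the regex patterns above, not a replacement for them: it checks one character at a time. The helper names is_letter and letters_only are hypothetical, chosen only for this sketch.

```python
import unicodedata

def is_letter(ch):
    # General categories starting with "L" (Lu, Ll, Lt, Lm, Lo) are
    # the ones a \p{L} pattern is meant to cover.
    return unicodedata.category(ch).startswith('L')

def letters_only(text):
    # Keep only the characters whose general category is a letter category.
    return ''.join(ch for ch in text if is_letter(ch))

print(unicodedata.category('A'))        # => Lu (uppercase letter)
print(unicodedata.category('ą'))        # => Ll (lowercase letter)
print(unicodedata.category('\u0301'))   # => Mn (combining acute accent, a mark)
print(letters_only('Abc-++-Абв. It’s “Łąć”!'))  # => AbcАбвItsŁąć
```

The results depend on the Unicode database bundled with the running interpreter, so newly added letters may be classified differently on older Python versions.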


answered Oct 25 '22 16:10

Wiktor Stribiżew


Tags:

python

regex

character-properties

mpen
