<p>Perl and some other current regex engines support Unicode properties, such as the category, in a regex. E.g. in Perl you can use <code>\p{Ll}</code> to match an arbitrary lower-case letter, or <code>p{Zs}</code> for any space separator. I don't see support for this in either the 2.x nor 3.x lines of Python (with due regrets). Is anybody aware of a good strategy to get a similar effect? Homegrown solutions are welcome.</p>

<p>The regex module (an alternative to the standard <code>re</code> module) supports Unicode codepoint properties with the <code>\p{}</code> syntax.</p>

Python regex matching Unicode properties

Tags:

Perl and some other current regex engines support Unicode properties, such as the category, in a regex. E.g. in Perl you can use \p{Ll} to match an arbitrary lower-case letter, or p{Zs} for any space separator. I don't see support for this in either the 2.x nor 3.x lines of Python (with due regrets). Is anybody aware of a good strategy to get a similar effect? Homegrown solutions are welcome.

887

asked Dec 02 '09 13:12

ThomasH

1 Answers

The regex module (an alternative to the standard re module) supports Unicode codepoint properties with the \p{} syntax.

answered Sep 21 '22 23:09

ronnix

Related questions
                            
                                Interactive matplotlib plot with two sliders
                            
                                How to check if a string is a valid regex in Python?
                            
                                How do I write a null (no-op) contextmanager in Python?
                            
                                How to return custom JSON in Django REST Framework
                            
                                How to yield results from a nested generator function?
                            
                                What exactly are "containers" in python? (And what are all the python container types?)
                            
                                Skip over a value in the range function in python
                            
                                Python: Wait on all of `concurrent.futures.ThreadPoolExecutor`'s futures
                            
                                How to make an object properly hashable?
                            
                                A better way for a Python 'for' loop
                            
                                Add another tuple to a tuple of tuples
                            
                                In Django models.py, what's the difference between default, null, and blank?
                            
                                open file in "w" mode: IOError: [Errno 2] No such file or directory
                            
                                How does python find a module file if the import statement only contains the filename?
                            
                                multiprocessing: map vs map_async
                            
                                How is HDF5 different from a folder with files?
                            
                                Why does Python assignment not return a value?
                            
                                How to install R packages that are not available in "R-essentials"?
                            
                                Pycharm: "scanning files to index" is taking forever
                            
                                Zipped Python generators with 2nd one being shorter: how to retrieve element that is silently consumed

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Python regex matching Unicode properties

Tags:

python

regex

unicode

character-properties

ucd

ThomasH

People also ask

1 Answers

ronnix

Recent Activity

Donate For Us