Matching case-sensitive Unicode strings with regular expressions in Python

Tags:

python

regex

case-insensitive

unicode

character-properties

Suppose I want to match a lowercase letter followed by an uppercase letter. I could do something like

re.compile(r"[a-z][A-Z]")

Now I want to do the same thing for Unicode strings, i.e. match something like 'aÅ' or 'yÜ'.

I tried

re.compile(r"[a-z][A-Z]", re.UNICODE)

but that does not work.

Any clues?

asked Sep 13 '11 06:09

repoman

1 Answer

This is hard to do with Python regex because the current implementation doesn't support Unicode property shortcuts like \p{Lu} and \p{Ll}.

[A-Za-z] will of course only match ASCII letters, regardless of whether the Unicode option is set or not.
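A quick check of this, assuming Python 3 (where str patterns are already Unicode-aware, so re.UNICODE changes nothing here): the ranges in the character classes are literal code-point ranges, so only the pure-ASCII pair is found.

```python
import re

# [a-z] and [A-Z] are plain code-point ranges (U+0061-U+007A and U+0041-U+005A),
# so re.UNICODE does not widen them to non-ASCII letters such as Å or Ü.
ascii_only = re.compile(r"[a-z][A-Z]", re.UNICODE)

print(ascii_only.findall("aÅ yÜ xY"))  # ['xY']
```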

So until the re module is updated (or unless you install the regex package, which is currently in development), you need to either do it programmatically (iterate through the string and call char.islower()/char.isupper() on each character) or specify all the Unicode code points manually, which probably isn't worth the effort.
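A minimal sketch of both workarounds, assuming Python 3. The names find_lower_upper_pairs, build_lower_upper_pattern and LOWER_UPPER are hypothetical and were chosen for this example. The second function does not type the code points by hand. It generates the two character classes from str.islower()/str.isupper(), so both approaches agree on what counts as a lowercase or uppercase letter.

```python
import re
import sys


def find_lower_upper_pairs(text):
    """Option 1: walk the string and test each adjacent pair of characters."""
    # zip(text, text[1:]) yields every adjacent (current, next) pair
    return [a + b for a, b in zip(text, text[1:]) if a.islower() and b.isupper()]


def build_lower_upper_pattern():
    """Option 2: generate the character classes instead of listing them by hand."""
    lower, upper = [], []
    # Scan every code point once; this takes a noticeable fraction of a second
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if ch.islower():
            lower.append(re.escape(ch))
        elif ch.isupper():
            upper.append(re.escape(ch))
    # str patterns are Unicode-aware in Python 3, so no flag is needed here
    return re.compile("[" + "".join(lower) + "][" + "".join(upper) + "]")


# Build once and reuse; regenerating the classes on every call would be slow
LOWER_UPPER = build_lower_upper_pattern()

print(find_lower_upper_pairs("aÅ yÜ ab AB"))  # ['aÅ', 'yÜ']
print(LOWER_UPPER.findall("aÅ yÜ ab AB"))     # ['aÅ', 'yÜ']
```

The generated pattern is just an ordinary re character class, so it can be combined with other regex syntax. Titlecase letters such as ǅ are neither islower() nor isupper(), so neither approach should match them.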

answered Oct 19 '22 08:10

Tim Pietzcker
