Matching Unicode characters in Python regular expressions

Tags:

python

regex

unicode

character-properties

non-ascii-characters

I have read through the other questions on Stack Overflow, but I'm still no closer. Sorry if this has already been answered, but I couldn't get anything proposed there to work.

    >>> import re
    >>> m = re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', '/by_tag/xmas/xmas1.jpg')
    >>> print m.groupdict()
    {'tag': 'xmas', 'filename': 'xmas1.jpg'}

All is well. Then I try something with Norwegian characters in it (or something more Unicode-like):

    >>> m = re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', '/by_tag/påske/øyfjell.jpg')
    >>> print m.groupdict()
    Traceback (most recent call last):
      File "<interactive input>", line 1, in <module>
    AttributeError: 'NoneType' object has no attribute 'groupdict'

How can I match typical Unicode characters, like øæå? I'd like to be able to match those characters as well, both in the tag group above and in the filename group.

962

asked Feb 17 '11 12:02

Weholt

1 Answer

You need to specify the re.UNICODE flag, and input your string as a Unicode string by using the u prefix:

    >>> re.match(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$', u'/by_tag/påske/øyfjell.jpg', re.UNICODE).groupdict()
    {'tag': u'p\xe5ske', 'filename': u'\xf8yfjell.jpg'}

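This is in Python 2. In Python 3 the u prefix is unnecessary because all strings are Unicode (Python 3.0 to 3.2 reject it as a syntax error; 3.3 and later accept it as a no-op), and re.UNICODE is already the default for str patterns, so the flag can be dropped too.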
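For the Python 3 case, a minimal sketch might look like the following. It reuses the pattern from the question unchanged; the helper name `parse_path` is made up for illustration. It also returns None instead of raising when nothing matches, which avoids the AttributeError shown in the question.

```python
import re

# Pattern taken from the question. In Python 3, \w in a str pattern
# matches Unicode word characters by default, so no flag is needed.
PATTERN = re.compile(r'^/by_tag/(?P<tag>\w+)/(?P<filename>(\w|[.,!#%{}()@])+)$')


def parse_path(path):
    """Hypothetical helper: return the named groups, or None if there is no match."""
    m = PATTERN.match(path)
    return m.groupdict() if m else None


print(parse_path('/by_tag/påske/øyfjell.jpg'))
# {'tag': 'påske', 'filename': 'øyfjell.jpg'}

# For contrast: re.ASCII restricts \w to [a-zA-Z0-9_], so the same path no longer matches.
ascii_match = re.match(PATTERN.pattern, '/by_tag/påske/øyfjell.jpg', re.ASCII)
print(ascii_match)
# None
```

Compiling the pattern once is only a convenience here; calling `re.match` directly with the same pattern string should behave the same way.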

200

answered Sep 30 '22 16:09

Thomas
