I need the solutions to this question, except for Python! I've tried installing the regex library for Python, since it apparently enables POSIX character classes in Python's regexes, but it does not seem to include Unicode characters in the [:alpha:] class. E.g.:
>>> re.search(r'[[:alpha:] ]+','Please work blåbær and NOW stop 123').group(0)
'Please work bl'
When I want it to match Please work blåbær and NOW stop
EDIT: I am using Python 2.7
EDIT 2: I tried the following:
>>> re.search(re.compile('[\w ]+', re.UNICODE),'Please work blåbær and NOW stop 123').group(0)
'Please work bl\xc3'
Not quite what I wanted (I want to match the part after the first non-ASCII character too), but at least it matched one character more than before. What should I be doing here to get it to match the rest of what I want?
EDIT 3: I don't want to match any non-"word" characters; by "word" I mean a-z, A-Z, space, and any accented variations of word characters. I hope I got my idea across; in a phrase like
lets match força, but stop before that comma
I want to match only lets match força
EDIT 4: So I tried to use Python 3 just for this one script:
>>> re.search(re.compile('[\w ]+', re.UNICODE),'lets match força, but stop before that comma').group(0)
'lets match força'
I guess it works for the most part in Python 3, except that it also matches numbers (which I definitely don't want) and underscores. Any way to fix this, in Python 2 or 3?
It's not clear which Python version you are using. If you use 2.x, then you may have a Unicode issue. See this post for further pointers, and feel free to update your question to elaborate further.
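To illustrate the kind of Unicode issue meant here, this is a minimal sketch written for Python 3, where the split between bytes and text is explicit. The variable names are only illustrative, and it assumes that a plain '...' literal in Python 2 (typed in a UTF-8 terminal) holds UTF-8 bytes, much like raw below:

```python
import re

text = 'Please work blåbær and NOW stop 123'
raw = text.encode('utf-8')  # assumption: roughly what a Python 2 byte string holds

# On bytes, \w only covers ASCII, so the match stops at the first byte of 'å'.
bytes_match = re.search(rb'[\w ]+', raw).group(0)

# On decoded text, \w is Unicode-aware, so the match runs through 'blåbær'.
text_match = re.search(r'[\w ]+', raw.decode('utf-8')).group(0)

print(bytes_match)
print(text_match)
```

So decoding the input to text before matching is probably the first thing to try; in Python 2 that would mean working with unicode objects rather than str.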
I'm quite surprised that I can't convert the accented character to a proper Unicode representation...
but there are workarounds:
re.search(re.compile('((\w+\s)|(\w+\W+\w+\s))+', re.UNICODE), ur'Please work blåbær and NOW stop 123').group(0)
or
re.search(re.compile('\D+', re.UNICODE), ur'Please work blåbær and NOW stop 123').group(0)
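For the stricter requirement in EDIT 3 (letters and spaces only, no digits or underscores), one possible approach in Python 3 is the character class [^\W\d_], which reads as "word characters, minus digits, minus underscore". This is only a sketch: leading_phrase is a hypothetical helper name, and the class may still let through a few unusual numeric characters (such as superscript digits) that \w accepts but \d does not.

```python
import re

# [^\W\d_] = any \w character that is neither a digit nor an underscore,
# which leaves (roughly) the Unicode letters, accented ones included.
# Words may be separated by runs of spaces; the match never ends on a space.
PHRASE = re.compile(r'[^\W\d_]+(?: +[^\W\d_]+)*')

def leading_phrase(text):
    """Return the first run of letter-only words in text, or None."""
    match = PHRASE.search(text)
    return match.group(0) if match else None

print(leading_phrase('Please work blåbær and NOW stop 123'))
print(leading_phrase('lets match força, but stop before that comma'))
```

In Python 2 the same pattern should need a unicode pattern and unicode input together with re.UNICODE, so that \w and \d are interpreted with Unicode semantics.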