Python regex exclude Underscore

Question

I need to find all two-character symbols in Unicode, except the underscore. My current solution is:

pattern = re.compile(ur'(?:\s*)(\w{2})(?:\s*)', re.UNICODE | re.MULTILINE | re.DOTALL)
print pattern.findall('a b c ab cd vs sd a a_ _r')
['ab', 'cd', 'vs', 'sd', 'a_', '_r']

I need to exclude the underscore _ from the regex so that a_ and _r are not found. The problem is that my characters can be in any language, so I can't use a regex like [^a-zA-Z]. For example, in Russian:

print pattern.findall(u'ф_')
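To make the constraint concrete, here is a small sketch of the two options that do not work. It assumes Python 3 (the question uses Python 2 syntax), where str patterns are Unicode-aware by default; the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text mixing Latin and Cyrillic pairs.
text = 'ab фы a_ ф_'

# An ASCII-only class drops the underscore but also misses Cyrillic letters.
ascii_pairs = re.findall(r'[a-zA-Z]{2}', text)

# \w handles any script but counts the underscore as a word character.
word_pairs = re.findall(r'\w{2}', text)

print(ascii_pairs)  # ['ab']
print(word_pairs)   # ['ab', 'фы', 'a_', 'ф_']
```

Neither result is what the question asks for: the first loses 'фы', the second keeps 'a_' and 'ф_'.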

Ioan Alexandru Cucu · Accepted Answer

Use a negated character class that excludes both non-word characters and the underscore:

[^\W_]

instead of

\w
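As a rough sketch of how this substitution plays out on the question's input, the snippet below swaps \w for [^\W_] in the original pattern. It assumes Python 3, where str patterns match Unicode by default, so the flags from the question are left out; the helper name find_pairs and the extra Cyrillic sample are illustrative only.

```python
import re

# The question's pattern with \w replaced by [^\W_]:
# "not a non-word character and not an underscore",
# i.e. any Unicode letter or digit.
pattern = re.compile(r'(?:\s*)([^\W_]{2})(?:\s*)')


def find_pairs(text):
    """Return all two-character matches that contain no underscore."""
    return pattern.findall(text)


print(find_pairs('a b c ab cd vs sd a a_ _r'))  # ['ab', 'cd', 'vs', 'sd']
print(find_pairs('фы ф_'))                      # ['фы']
```

The double negation works because \W is the complement of \w, so negating a class that holds \W and _ leaves exactly the word characters minus the underscore.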

Martijn Pieters · Answer

Your best bet would be to use the new regex module instead. One of its features is that it can remove characters from a character set:

import regex as re

pattern = re.compile(ur'(?:\s*)([\w--_]{2})(?:\s*)', re.UNICODE | re.MULTILINE | re.DOTALL)

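The [\w--_] syntax creates a character set that is the same as \w with the underscore character removed. Note that the regex module documents set operations such as -- as part of its version 1 behaviour, so depending on the installed version the pattern may need the regex.V1 flag (or an inline (?V1)) to work as intended.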

Tags:

python

regex

artyomboyko
