Special sequences (character classes) in Python RegEx are escapes like \w or \d that match a set of characters.
In my case, I need to be able to match all alphanumeric characters except digits. That is, \w minus \d.
I need to use the special sequence \w because I'm dealing with non-ASCII characters and need to match symbols like "Æ" and "Ø".
One would think I could use this expression: [\w^\d], but it doesn't seem to match anything and I'm not sure why.
So in short, how can I mix (add/subtract) special sequences in Python Regular Expressions?
EDIT: I accidentally used [\W^\d] instead of [\w^\d]. The latter does indeed match something, including parentheses and commas, which are not alphanumeric characters as far as I'm concerned.
In the context of regular expressions, a character class is a set of characters enclosed within square brackets. It specifies the characters that will successfully match a single character from a given input string.
With a “character class”, also called “character set”, you can tell the regex engine to match only one out of several characters. Simply place the characters you want to match between square brackets. If you want to match an a or an e, use [ae].
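As a small illustration of the [ae] class described above, the sketch below embeds it in a longer pattern. The pattern gr[ae]y and the sample string are assumptions chosen for the demonstration, not part of the original text.

```python
import re

# [ae] matches exactly one character, which must be either "a" or "e".
matches = re.findall(r"gr[ae]y", "gray, grey, groy")
print(matches)  # ['gray', 'grey']
```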
I made this to find all matches for multiple regular expressions:

    import re

    your_string = "your string here"
    regex1 = r"your regex here"
    regex2 = r"your regex here"
    regex3 = r"your regex here"
    regex_list = [regex1, regex2, regex3]
    found_regex_list = []
    for pattern in regex_list:
        # findall returns every non-overlapping match of this pattern
        for match in re.findall(pattern, your_string):
            found_regex_list.append(match)
A "character class", or a "character set", is a set of characters put in square brackets. The regex engine matches only one out of several characters in the character class or character set. We place the characters we want to match between square brackets.
You can use r"[^\W\d]", i.e. invert the union of non-alphanumerics and digits.
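A minimal sketch of this negated class with the standard re module follows. The sample string is made up for illustration. Note that \w also covers the underscore, so the variant [^\W\d_] is shown as one way to exclude it as well, assuming underscores are unwanted.

```python
import re

# [^\W\d] reads as "not (non-word or digit)": a word character that is not a digit.
LETTERS = re.compile(r"[^\W\d]+")
# Same idea, but the underscore is excluded too.
LETTERS_NO_UNDERSCORE = re.compile(r"[^\W\d_]+")

sample = "Æble 42, Øl (3 stk.) og x_1"

letter_runs = LETTERS.findall(sample)
strict_runs = LETTERS_NO_UNDERSCORE.findall(sample)

print(letter_runs)  # ['Æble', 'Øl', 'stk', 'og', 'x_']
print(strict_runs)  # ['Æble', 'Øl', 'stk', 'og', 'x']
```

In Python 3, str patterns are Unicode-aware by default, which is why "Æ" and "Ø" are matched without any extra flag.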
You cannot subtract character classes, no.
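The reason [\w^\d] does not subtract anything is that, in the re module, a caret only negates a class when it is the first character inside the brackets; anywhere else it is a literal "^". The sketch below, using a made-up sample string, shows what the two classes from the question actually match.

```python
import re

sample = "a1^(,)Æ"

# Union of word characters, a literal "^", and digits.
word_or_caret = re.findall(r"[\w^\d]", sample)
# Union of non-word characters, a literal "^", and digits.
nonword_or_digit = re.findall(r"[\W^\d]", sample)

print(word_or_caret)     # ['a', '1', '^', 'Æ']
print(nonword_or_digit)  # ['1', '^', '(', ',', ')']
```

The second result shows that it is [\W^\d] which picks up parentheses and commas.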
Your best bet is to use the third-party regex module, which was proposed as a replacement for the current re module in Python but remains a separate package. It supports character classes based on Unicode properties:
\p{IsAlphabetic}
This will match any character that the Unicode specification states is an alphabetic character.
Even better, regex does support character class subtraction; it views such classes as sets and allows you to create a difference with the -- operator:
[\w--\d] matches everything in \w except anything that also matches \d.
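A hedged sketch of the set difference with the regex package follows. It assumes the package is installed (it is not part of the standard library), and it passes the regex.V1 flag because, as far as I know, set operations such as -- are only enabled in the module's version 1 behaviour. The helper name find_non_digit_word_runs and the sample string are hypothetical.

```python
try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None


def find_non_digit_word_runs(text):
    # Returns runs of characters that are in \w but not in \d,
    # or None when the regex package is not available.
    if regex is None:
        return None
    # regex.V1 switches on nested sets and set operations such as "--".
    return regex.findall(r"[\w--\d]+", text, flags=regex.V1)


result = find_non_digit_word_runs("abc123ÆØ_x")
print(result)
```

As with the re-based class [^\W\d], the underscore still counts as a word character here.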