How to exclude a character from a regex group?

Tags:

python

regex

In Python, I want to strip all non-alphanumeric characters except the hyphen from a string. How can I change this regular expression so that it matches any non-alphanumeric character other than the hyphen?

re.compile(r'[\W_]')

Thanks.

asked Nov 05 '10 17:11 by atp



1 Answer

You could just use a negated character class instead:

re.compile(r"[^a-zA-Z0-9-]") 

This matches any character that is neither an ASCII letter or digit nor a hyphen. The underscore is matched as well, just as with your current regex.
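One difference from the original `[\W_]` is worth illustrating: in Python 3, `\W` is Unicode-aware for `str` patterns, while the explicit `a-zA-Z0-9` ranges cover ASCII only, so accented letters are stripped too. The sketch below compares the two; the `unicode_aware` pattern and the sample string are illustrative assumptions, not part of the original answer.

```python
import re

# ASCII-only negated class, as in the answer above.
ascii_only = re.compile(r"[^a-zA-Z0-9-]")

# Hypothetical Unicode-aware variant: match anything that is not a
# word character or a hyphen, plus the underscore (which \w includes).
unicode_aware = re.compile(r"[^\w-]|_")

sample = "caf\u00e9_au-lait!"  # "café_au-lait!"

print(ascii_only.sub("", sample))     # the accented letter is removed
print(unicode_aware.sub("", sample))  # the accented letter is kept
```

If the input is known to be plain ASCII, the two patterns behave the same and the explicit ranges are arguably easier to read.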

>>> import re
>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens  *#"
>>> r.sub("", s)
'sometextwith----5hy-phens'

Note that this also removes spaces (which may well be what you want).
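If spaces should survive, one option is to add a literal space to the negated class. This is a minimal sketch under that assumption; the name `keep_spaces` is made up for illustration, and the hyphen stays last in the class so it is read as a literal rather than a range.

```python
import re

# Same negated class as above, with a space added to the allowed characters.
# The hyphen is kept at the end so it is treated as a literal character.
keep_spaces = re.compile(r"[^a-zA-Z0-9 -]")

s = "some#%te_xt&with--##%--5 hy-phens  *#"
cleaned = keep_spaces.sub("", s)

print(repr(cleaned))  # 'sometextwith----5 hy-phens  '
```

The two trailing spaces in the result come from the original string; call `cleaned.strip()` if they are unwanted.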


Edit: SilentGhost has suggested that it may be cheaper for the engine to process the input with a quantifier, in which case you can simply use:

re.compile(r"[^a-zA-Z0-9-]+") 

The + makes each run of consecutive matching characters match (and be replaced) as a single unit rather than one character at a time.
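With an empty replacement, both patterns give the same result, so the quantifier only affects how many substitutions are made. The difference becomes visible once the replacement is non-empty, as this small sketch shows (the names `single` and `runs` and the sample text are illustrative):

```python
import re

single = re.compile(r"[^a-zA-Z0-9-]")   # one match per unwanted character
runs = re.compile(r"[^a-zA-Z0-9-]+")    # one match per run of unwanted characters

text = "a##b  c-d"

# Empty replacement: identical output either way.
print(single.sub("", text))   # abc-d
print(runs.sub("", text))     # abc-d

# Non-empty replacement: each run collapses to a single underscore with +.
print(single.sub("_", text))  # a__b__c-d
print(runs.sub("_", text))    # a_b_c-d
```

So when stripping characters the choice is purely about performance, but when replacing them with a separator it changes the output.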

answered Sep 29 '22 04:09 by eldarerathis