I want to strip all non-alphanumeric characters EXCEPT the hyphen from a string (python). How can I change this regular expression to match any non-alphanumeric char except the hyphen?
re.compile(r'[\W_]')
Thanks.
To match any character except a list of excluded characters, put the excluded characters between [^ and ]. The caret ^ must immediately follow the [, or else it stands for just itself.
[] denotes a character class. () denotes a capturing group. [a-z0-9] matches one character that is in the range a-z OR 0-9.
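To see what the position of the caret does inside a character class, here is a small sketch. The sample string and variable names are made up for illustration.

```python
import re

text = "a^bd"  # hypothetical sample input

# Caret first: a negated class, matching any character that is NOT a, b, or c.
negated = re.findall(r"[^abc]", text)

# Caret anywhere else: a literal '^' that is simply one more member of the class.
literal = re.findall(r"[a^bc]", text)

print(negated)  # ['^', 'd']
print(literal)  # ['a', '^', 'b']
```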
(?!n) is a negative lookahead, not a quantifier: it succeeds at any position that is not immediately followed by the pattern n, and it consumes no characters.
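A lookahead can also be applied to the question directly: keep the original [\W_] class and guard it so it never matches a hyphen. This is only a sketch of one possible approach, and the variable names are illustrative.

```python
import re

# (?!-) asserts that the next character is not a hyphen, consuming nothing;
# [\W_] then matches one non-alphanumeric character (including underscore).
pattern = re.compile(r"(?!-)[\W_]")

sample = "some#%te_xt&with--##%--5 hy-phens *#"
cleaned = pattern.sub("", sample)

print(cleaned)  # sometextwith----5hy-phens
```

Because \W is Unicode-aware for str patterns in Python 3, this version leaves accented letters alone, unlike explicit a-z ranges.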
You could just use a negated character class instead:
re.compile(r"[^a-zA-Z0-9-]")
This will match anything that is not in the alphanumeric ranges or a hyphen. It also matches the underscore, as per your current regex.
>>> import re
>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens *#"
>>> r.sub("", s)
'sometextwith----5hy-phens'
Notice that this also removes spaces (which may well be what you want).
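One caveat worth illustrating: the explicit a-zA-Z0-9 ranges are ASCII-only, so accented letters get stripped as well. If that is not wanted, a Unicode-aware variant built on \w is one option. The sketch below assumes Python 3 str patterns, and the sample string is made up.

```python
import re

ascii_only = re.compile(r"[^a-zA-Z0-9-]")
# [^\w-] matches anything that is neither a word character nor a hyphen;
# the extra |_ also removes the underscore, which \w would otherwise keep.
unicode_aware = re.compile(r"[^\w-]|_")

sample = "caf\u00e9-au_lait!"  # "café-au_lait!"

ascii_result = ascii_only.sub("", sample)      # the accented letter is dropped
unicode_result = unicode_aware.sub("", sample) # the accented letter is kept

print(ascii_result)    # caf-aulait
print(unicode_result)  # café-aulait
```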
Edit: SilentGhost has suggested it is likely to be cheaper for the engine to process with a quantifier, in which case you can simply use:
re.compile(r"[^a-zA-Z0-9-]+")
The + will simply cause any runs of consecutively matched characters to all match (and be replaced) at the same time.
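To make the difference concrete, re.subn reports how many substitutions were performed. In this sketch (variable names are illustrative) both patterns produce the same string, but the quantified one does it in fewer replacements. Whether that is actually faster would need measuring.

```python
import re

single = re.compile(r"[^a-zA-Z0-9-]")
runs = re.compile(r"[^a-zA-Z0-9-]+")

s = "some#%te_xt&with--##%--5 hy-phens *#"

# subn returns a (new_string, number_of_substitutions) tuple.
single_result, single_count = single.subn("", s)
runs_result, runs_count = runs.subn("", s)

print(single_result == runs_result)  # True
print(single_count, runs_count)      # 11 6
```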