I want to replace all \W (non-letter) characters, with the exception of the dash -, with spaces, i.e.:
black-white will give black-white
black#white will give black white
I know regular expressions very well, but I have no idea how to deal with this.
Note that I want to use Unicode, so \w is not just [a-zA-Z] as it is in English only.
I prefer Python re syntax, but I can read other suggestions.
Use a negated character class (\W is equivalent to [^\w], so [^-\w] means \W except -):
>>> import re
>>> re.sub(r'[^-\w]', ' ', 'black-white')
'black-white'
>>> re.sub(r'[^-\w]', ' ', 'black#white')
'black white'
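Since the question asks about Unicode, here is a minimal sketch of the same pattern on non-ASCII input. It assumes Python 3, where str patterns are Unicode-aware by default (in Python 2 you would need unicode strings and the re.UNICODE flag). The helper name replace_non_word and the sample strings are only for illustration. Note that \w also covers digits and the underscore, so those are kept as well.

```python
import re

def replace_non_word(text):
    # Replace every character that is neither a word character
    # (letter, digit, underscore) nor a hyphen with a single space.
    return re.sub(r'[^-\w]', ' ', text)

# Sample inputs chosen for illustration only.
print(replace_non_word('schwarz-weiß'))   # hyphen and ß are kept
print(replace_non_word('черный#белый'))   # '#' becomes a space
```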
If you use the regex package, you can use nested sets and set operations:
>>> import regex
>>> print(regex.sub(r'(?V1)[\W--[-]]', ' ', 'black-white'))
black-white
>>> print(regex.sub(r'(?V1)[\W--[-]]', ' ', 'black#white'))
black white
I would use a negative lookahead, like below:
>>> re.sub(r'(?!-)\W', r' ', 'black-white')
'black-white'
>>> re.sub(r'(?!-)\W', r' ', 'black#white')
'black white'
(?!-)\W: the negative lookahead at the start asserts that the character we are about to match is any character from \W (the non-word characters) except the hyphen -. It's a kind of subtraction: \W minus the character inside the negative lookahead (i.e. the hyphen).
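To check that the lookahead really behaves like "\W minus the hyphen", here is a small sketch that compares it against the negated character class [^-\w] on a few inputs. It assumes Python 3; the constant names and the sample strings are made up for the comparison.

```python
import re

NEGATED_CLASS = re.compile(r'[^-\w]')  # anything except '-' and word characters
LOOKAHEAD = re.compile(r'(?!-)\W')     # \W, but only where the next character is not '-'

# Sample inputs chosen for illustration only.
samples = ['black-white', 'black#white', 'a-b c.d_e', 'noir & blanc-été']

for s in samples:
    by_class = NEGATED_CLASS.sub(' ', s)
    by_lookahead = LOOKAHEAD.sub(' ', s)
    # Both patterns should select exactly the same characters.
    assert by_class == by_lookahead
    print(repr(s), '->', repr(by_lookahead))
```

Both forms match one character at a time, so picking between them is mostly a matter of readability.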