I'm trying to remove all characters from a string except for #, @, :) and :(.
Example:
this is, a placeholder text. I wanna remove symbols like ! and ? but keep @ & # & :)
should result in (after removing the matched results):
this is a placeholder text I wanna remove symbols like and but keep @ # :)
I tried:
(?! |#|@|:\)|:\()\W
It works, but in the case of :) and :(, : is still being matched.
I know that it's matching because it's checking every character and the previous ones, e.g. :) matches only : but :)) matches :).
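A quick way to see what the attempted pattern does is to run it on a short sample. The sketch below assumes Python's re module (the question does not name a language), and the sample string is made up for illustration.

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r'(?! |#|@|:\)|:\()\W')

# Hypothetical sample containing both emoticons.
sample = 'keep :) and :('

# The negative lookahead is evaluated separately at every position.
# At the colon it sees ":)" or ":(" and blocks the match, but at the
# bracket itself nothing in the lookahead applies, so \W consumes it.
result = attempt.sub('', sample)
print(result)  # keep : and :
```

In this run the colon is protected by the lookahead, while the bracket that follows it is tested on its own and removed, so each emoticon is reduced to a bare colon. A lookahead alone cannot protect the second character of a two-character token; that needs a lookbehind or a different approach.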
This is a tricky question, because you want to remove all symbols except for a certain whitelist. In addition, some of the symbols on the whitelist actually consist of two characters:
:)
:(
To handle this, we can first spare both the colon : and the parentheses, then selectively remove either one if it is not part of a smiley or frowning face:
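import re
# 'text' is used instead of 'input' to avoid shadowing the built-in input()
text = "this is, a (placeholder text). I wanna remove symbols like: ! and ? but keep @ & # & :)"
# remove non-whitelisted symbols, lone colons, and lone parentheses
output = re.sub(r'[^\w\s:()@&#]|:(?![()])|(?<!:)[()]', '', text)
print(output)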
this is a placeholder text I wanna remove symbols like and but keep @ & # & :)
The regex character class I used was:
[^\w\s:()@&#]
This matches any character that is neither a word character nor whitespace, while sparing the whitelist characters (including the colon and the parentheses) from the replacement. The other two parts of the alternation then override this logic: :(?![()]) removes a colon that is not followed by a parenthesis, and (?<!:)[()] removes a parenthesis that is not preceded by a colon.
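To make the parts of the alternation easier to read, the same pattern can be laid out with re.VERBOSE, one branch per line. This is only a sketch: the names CLEANUP and strip_symbols are hypothetical, and the sample strings are made up for illustration.

```python
import re

# Same pattern as above, split into its three branches.
# A '#' inside a character class is literal even in verbose mode.
CLEANUP = re.compile(r"""
    [^\w\s:()@&#]   # any character outside the spared set
  | :(?![()])       # a colon that is not followed by ( or )
  | (?<!:)[()]      # a parenthesis that is not preceded by a colon
""", re.VERBOSE)

def strip_symbols(text):
    """Remove every symbol except @, &, #, :) and :(."""
    return CLEANUP.sub('', text)

print(strip_symbols('time: 5 (approx) :) :('))  # time 5 approx :) :(
print(strip_symbols(':))'))                     # :)
```

The lone colon and the ordinary parentheses are dropped, while both emoticons survive. In the second call only the first closing parenthesis is kept, because the second one is preceded by a parenthesis rather than a colon.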
As others have shown, it is possible to write a regex that will succeed the way you have framed the problem. But this is a case where it's much simpler to write a regex to match what you want to keep. Then just join those parts together.
import re
rgx = re.compile(r'\w|\s|@|&|#|:\)|:\(')
orig = 'Blah!! Blah.... ### .... #@:):):) @@ Blah! Blah??? :):)#'
new = ''.join(rgx.findall(orig))
print(new)
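The keep-list pattern can also be written more compactly with a character class for the single characters and one branch for the two emoticons. The sketch below is an equivalent variant, not the answer's original code; the names KEEP and keep_allowed are hypothetical.

```python
import re

# Single allowed characters go in one class; ":)" and ":(" share a branch.
KEEP = re.compile(r'[\w\s@&#]|:[()]')

def keep_allowed(text):
    """Join every allowed character or emoticon, dropping the rest."""
    return ''.join(KEEP.findall(text))

print(keep_allowed('hi!! :) ok? :( #tag @me'))  # hi :) ok :( #tag @me
```

A colon only ever matches together with the parenthesis after it, so a lone colon or a lone parenthesis is never collected, and no lookarounds are needed.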