
RegEx for matching all chars except some special chars and ":)"

I'm trying to remove all symbols from a string except for #, @, :) and :(. Example:

this is, a placeholder text. I wanna remove symbols like ! and ? but keep @ & # & :)

should result in (after removing the matched results):

this is a placeholder text I wanna remove symbols like  and  but keep @  #  :)

I tried:

(?! |#|@|:\)|:\()\W

It works, but in the case of :) and :(, the : is still being matched. I know it matches because the regex checks every character together with the ones before it, e.g. :) matches only : but :)) matches :).

mahmoudafer asked May 11 '19 15:05


2 Answers

This is a tricky question, because you want to remove all symbols except for a certain whitelist. In addition, some of the symbols on the whitelist actually consist of two characters:

:)
:(

To handle this, we can first spare the colon : and both parentheses from removal, then selectively remove any of them that is not part of a smiley or frowny face:

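import re

text = "this is, a (placeholder text). I wanna remove symbols like: ! and ? but keep @ & # & :)"
# Remove non-whitelisted symbols, colons not followed by ( or ), and parentheses not preceded by a colon
output = re.sub(r'[^\w\s:()@&#]|:(?![()])|(?<!:)[()]', '', text)
print(output)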

this is a placeholder text I wanna remove symbols like  and  but keep @ & # & :)

The regex character class I used was:

[^\w\s:()@&#]

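This matches any character that is neither a word character, nor whitespace, nor one of the whitelisted symbols (:, (, ), @, &, #), so the whitelist is spared from the replacement. The other two parts of the alternation then override this for the colon and the parentheses: :(?![()]) removes a colon that is not followed by a parenthesis, and (?<!:)[()] removes a parenthesis that is not preceded by a colon. Only colons and parentheses that form :) or :( survive.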
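To make the three branches of the alternation easier to check, here is a minimal sketch that wraps the same pattern in a function and runs it on a few edge cases. The names SYMBOLS_TO_DROP and strip_symbols are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical names; the pattern itself is the one from the answer above.
SYMBOLS_TO_DROP = re.compile(
    r"[^\w\s:()@&#]"   # any symbol outside the whitelist
    r"|:(?![()])"      # a colon that is not followed by ( or )
    r"|(?<!:)[()]"     # a parenthesis that is not preceded by a colon
)

def strip_symbols(text):
    """Remove every symbol except @, &, # and the emoticons :) and :(."""
    return SYMBOLS_TO_DROP.sub("", text)

print(strip_symbols("a:b (c) :( :)"))  # ab c :( :)
print(strip_symbols(":))"))            # :)
```

In :)) the second parenthesis is preceded by ) rather than by a colon, so it is removed and a single :) remains. Note that this whitelist also keeps &; dropping & from the character class gives the output shown in the question.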

Tim Biegeleisen answered Sep 30 '22 14:09


As others have shown, it is possible to write a regex that will succeed the way you have framed the problem. But this is a case where it's much simpler to write a regex to match what you want to keep. Then just join those parts together.

import re

rgx = re.compile(r'\w|\s|@|&|#|:\)|:\(')
orig = 'Blah!! Blah.... ### .... #@:):):) @@ Blah! Blah??? :):)#'
new = ''.join(rgx.findall(orig))
print(new)
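
As a variation on the same keep-and-join idea, the sketch below folds the single-character alternatives into one character class and matches the two emoticons as two-character units. The names KEEP and keep_allowed are hypothetical, chosen only for this example.

```python
import re

# Hypothetical names. Each alternative is something to keep:
# a word character, whitespace, @, & or #, or a whole :) / :( emoticon.
KEEP = re.compile(r"[\w\s@&#]|:[()]")

def keep_allowed(text):
    """Join only the pieces of text that match the keep-list."""
    return "".join(KEEP.findall(text))

print(keep_allowed("Blah!! Blah.... ### .... #@:):):) @@ Blah! Blah??? :):)#"))
# Blah Blah ###  #@:):):) @@ Blah Blah :):)#
```

Because a lone colon or parenthesis matches none of the alternatives, it is simply never collected, so no lookahead or lookbehind is needed.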
FMc answered Sep 30 '22 12:09