
Regex: remove all non-alphanumeric characters except emoticons

I need to remove all non-alphanumeric characters except spaces and allowed emoticons.

The allowed emoticons are :), :(, :P, etc. (the most popular ones).

I have a string:

$string = 'Hi! Glad # to _ see : you :)';

so I need to process this string and get the following:

$string = 'Hi Glad to see you :)';

Also, please note that emoticons can contain spaces,

e.g.

:     ) instead of :)

or

:     P instead of :P

Does anyone have a function to do this?

If someone helped me it would be so great :)

UPDATE

Thank you very much for your help.

buckley offered a ready-made solution,

but if the string contains emoticons with spaces,

e.g. Hi! Glad # to _ see : you :   )

the result is Hi Glad to see you

As you can see, the emoticon :   ) was cut off.

asked Jan 07 '23 18:01 by xyz


2 Answers

I don't "speak" PHP ;) but this does it in JS. Maybe you can convert it.

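var sIn = 'Hi! Glad # to _ see : you :)',
    sOut;

// Keep letters, digits and whitespace, plus :) :( :P with any number of spaces after the colon.
// [a-zA-Z0-9] is used instead of \w because \w would also keep "_".
sOut = (sIn.match(/[a-zA-Z0-9\s]|: *\)|: *\(|: *P/g) || []).join('');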

It works the other way around from your attempt: it finds all "legal" characters and combinations and joins them together.

Regards

Edit: Updated regex to handle optional spaces in emoticons (as commented earlier).
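
For reuse, the same whitelist idea can be wrapped in a function. This is only a minimal sketch and not the exact code from the answer: the name keepAllowed is made up, and compacting ":   )" to ":)", collapsing leftover whitespace, and requiring a word boundary after the P (so ": Please" is not read as an emoticon) are my assumptions about what is wanted.

```javascript
// Hypothetical helper: keeps letters, digits, whitespace and a small emoticon whitelist.
function keepAllowed(input) {
  // The emoticon alternative comes first, so ":" survives only as part of an emoticon.
  // ": *" allows any number of spaces between the colon and the second character.
  const allowed = /: *(?:[)(]|P\b)|[a-zA-Z0-9\s]/g;

  // match() returns null when nothing matches, so fall back to an empty list.
  const tokens = input.match(allowed) || [];

  return tokens
    .map(t => (t[0] === ':' ? t.replace(/ +/g, '') : t)) // assumption: ":   )" becomes ":)"
    .join('')
    .replace(/\s+/g, ' ') // collapse the gaps left by removed characters
    .trim();
}
```

With this sketch, both 'Hi! Glad # to _ see : you :)' and 'Hi! Glad # to _ see : you :   )' come out as 'Hi Glad to see you :)'. To allow more emoticons, extend the first alternative of the pattern.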

answered Jan 14 '23 11:01 by SamWhan


Ha! This one was interesting

Replace

(?!(:\)|:\(|:P))[^a-zA-Z0-9 ](?<!(:\)|:\(|:P))

With nothing

The idea is to sandwich the illegal character between the same emoticon pattern, once as a negative lookahead and once as a negative lookbehind, so a character that starts or ends an allowed emoticon is never matched.

The result will have consecutive spaces in it. AFAIK a single replace can't fix this in one sweep, because each replacement only sees its own match and not the neighbouring ones.

To eliminate the consecutive spaces, replace \s+ with a single space in a second pass.
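
The two passes can be sketched in JS like this (the question is about PHP, but the pattern is the same one as above, just without the capturing groups). The function names are made up, and compactEmoticons is an assumption of mine, not part of this answer: it is an optional pre-pass for the spaced emoticons mentioned in the question's update. Lookbehind needs a reasonably recent JS engine.

```javascript
// Hypothetical helper: the two replace passes described in the answer.
function stripDisallowed(input) {
  // A character is removed only if it neither starts (lookahead)
  // nor ends (lookbehind) one of the allowed emoticons.
  const illegal = /(?!:\)|:\(|:P)[^a-zA-Z0-9 ](?<!:\)|:\(|:P)/g;
  return input
    .replace(illegal, '')   // pass 1: drop the illegal characters
    .replace(/\s+/g, ' ');  // pass 2: collapse runs of whitespace into one space
}

// Assumption, not part of the answer: turn ":   )" into ":)" before stripping,
// so that spaced emoticons survive pass 1. "P\b" avoids treating ": Please" as ":P".
function compactEmoticons(input) {
  return input.replace(/: +(?=[)(]|P\b)/g, ':');
}
```

On its own, stripDisallowed('Hi! Glad # to _ see : you :   )') still loses the spaced emoticon, as the question's update points out. Running stripDisallowed(compactEmoticons(...)) on the same string gives 'Hi Glad to see you :)'.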

answered Jan 14 '23 11:01 by buckley