
How to remove special characters from the end of every word in a string?

Tags: python, regex

I want it to match only at the end of every word.

Example:

"i am test-ing., i am test.ing-, i am_, test_ing," 

The output should be:

"i am test-ing i am test.ing i am test_ing"

killown asked Nov 25 '25 17:11


1 Answer

>>> import re
>>> test = "i am test-ing., i am test.ing-, i am_, test_ing,"
>>> re.sub(r'([^\w\s]|_)+(?=\s|$)', '', test)
'i am test-ing i am test.ing i am test_ing'

The pattern matches one or more punctuation characters or underscores, ([^\w\s]|_)+, that are followed by either whitespace (\s) or the end of the string ($). The (?= ) construct is a lookahead assertion: it checks for the following whitespace without including it in the match, so the whitespace doesn't get replaced; only the ([^\w\s]|_)+ part does.
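To see what the lookahead buys you, here is a small sketch that runs the same substitution with and without it. The variable names are only illustrative, and the second pattern is a deliberately altered variant for comparison, not something the answer recommends.

```python
import re

text = "i am test-ing., i am test.ing-, i am_, test_ing,"

# With the lookahead, the whitespace after each word is only checked,
# so it stays in the string.
with_lookahead = re.sub(r'([^\w\s]|_)+(?=\s|$)', '', text)

# Without the lookahead, the trailing whitespace is part of the match
# and is removed along with the punctuation, gluing words together.
without_lookahead = re.sub(r'([^\w\s]|_)+(\s|$)', '', text)

print(with_lookahead)     # i am test-ing i am test.ing i am test_ing
print(without_lookahead)  # i am test-ingi am test.ingi amtest_ing
```

Punctuation inside a word (the "-" in "test-ing") is left alone in both cases, because it is followed by a letter rather than whitespace or the end of the string.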

Okay, but why [^\w\s]|_, you ask? The first part, [^\w\s], matches any character that is neither a word character (\w: a letter, digit, or underscore) nor whitespace (\s), i.e. punctuation characters. Except we do want to eliminate underscores, which \w counts as word characters, so we then include those with |_.
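A quick way to check this is to run each piece of the character class over a short sample with re.findall. The sample string and variable names below are made up for illustration, and [\W_] is included only as a comparison to show why whitespace has to be excluded explicitly.

```python
import re

sample = "a-b_c d.e"

# Neither a word character nor whitespace: punctuation only.
punct_only = re.findall(r'[^\w\s]', sample)

# Adding |_ picks up underscores, which \w would otherwise protect.
punct_and_underscore = re.findall(r'[^\w\s]|_', sample)

# [\W_] also matches the space, because whitespace is a non-word character.
non_word_or_underscore = re.findall(r'[\W_]', sample)

print(punct_only)              # ['-', '.']
print(punct_and_underscore)    # ['-', '_', '.']
print(non_word_or_underscore)  # ['-', '_', ' ', '.']
```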

John Kugelman answered Nov 28 '25 08:11


