I am trying to remove words that are shorter than 2 characters, and any word that consists only of digits. For example:
s = " This is a test 1212 test2"
The desired output is:
" This is test test2"
I tried \w{2,}, which takes care of the words whose length is below 2. When I added \D+, it removed all the digits, but I don't want to lose the 2 in test2.
The \W* at the start lets you remove both the word and the preceding non-word characters, so that the rest of the sentence still lines up. Note that \W includes punctuation; use \s instead if you only want to remove preceding whitespace.
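To make the \W versus \s difference concrete, here is a minimal sketch. The pattern being described is not shown above, so the word pattern below (a single word character or a run of digits) is an assumption, and the function names are made up for illustration.

```python
import re

# Assumed pattern: the text above does not show the regex it describes,
# so "one word character or a run of digits" is used for illustration.
WORD = r"\b(?:\d+|\w)\b"

def strip_with_nonword(text):
    # \W* also consumes punctuation that precedes the removed word
    return re.sub(r"\W*" + WORD, "", text)

def strip_with_space(text):
    # \s* consumes only the whitespace that precedes the removed word
    return re.sub(r"\s*" + WORD, "", text)

print(strip_with_nonword("foo, a bar"))  # foo bar
print(strip_with_space("foo, a bar"))    # foo, bar
```

With \W* the comma disappears along with the removed word; with \s* it is kept.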
Using str.replace(), we can replace a specific character. To remove that character, replace it with an empty string. The str.replace() method replaces all occurrences of the character.
You can remove a character from a Python string using replace() or translate(). Both these methods replace a character or string with a given value. If an empty string is specified, the character or string you select is removed from the string without a replacement.
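As a rough illustration of both methods on the string from the question (the variable names are only for this sketch), note that neither one knows about word boundaries, so on their own they also strip the 2 from test2:

```python
s = " This is a test 1212 test2"

# replace() removes every occurrence of the substring, including the
# "2" inside "test2", so it cannot target whole words on its own.
replaced = s.replace("2", "")
print(repr(replaced))        # ' This is a test 11 test'

# translate() with a deletion table removes a set of characters at once.
digits_removed = s.translate(str.maketrans("", "", "0123456789"))
print(repr(digits_removed))  # ' This is a test  test'
```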
You may use:
s = re.sub(r'\b(?:\d+|\w)\b\s*', '', s)
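As a runnable sketch of that substitution on the string from the question (the function name is hypothetical, chosen just for this example):

```python
import re

def remove_short_and_numeric(text):
    # Drop words that are a single word character or all digits,
    # together with the whitespace that follows them.
    return re.sub(r"\b(?:\d+|\w)\b\s*", "", text)

s = " This is a test 1212 test2"
print(repr(remove_short_and_numeric(s)))  # ' This is test test2'
```

Because the trailing \s* is consumed with each removed word, no double spaces are left behind. If the removed word is the last one in the string, the space before it remains, so a final .rstrip() may be wanted.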
Pattern Details:
\b : Match word boundary
(?:\d+|\w) : Match a single word character or 1+ digits
\b : Match word boundary
\s* : Match 0 or more whitespaces

You can make use of word boundaries '\b' and remove anything that is 1 character long inside boundaries: number or letter, doesn't matter.
Also remove anything between boundaries that is just numbers:
import re
s = " This is a test 1212 test2"
print(re.sub(r"\b([^ ]|\d+)\b", "", s))
Output (the spaces on either side of each removed word are left in place):
 This is  test  test2
Explanation:
\b     word boundary
(      start of a group
[^ ]   anything that is not a space (1 character)
|      or
\d+    one or more digits
)      end of the group
\b     followed by another word boundary
Each match is replaced with "" by re.sub(pattern, replaceBy, source).
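A short sketch of what this second pattern leaves behind, plus one possible cleanup. The split/join step is not part of the answer above; it is just one way to collapse the leftover spaces, and it also drops the leading space from the original string.

```python
import re

s = " This is a test 1212 test2"

# The pattern removes only the words, not the whitespace around them.
raw = re.sub(r"\b([^ ]|\d+)\b", "", s)
print(repr(raw))   # ' This is  test  test2'

# Collapse whitespace runs; split() with no arguments also drops
# the leading space.
tidy = " ".join(raw.split())
print(repr(tidy))  # 'This is test test2'

# [^ ] also matches punctuation that sits between two word characters.
hyphenated = re.sub(r"\b([^ ]|\d+)\b", "", "well-known")
print(hyphenated)  # wellknown
```

The last line shows a side effect of [^ ]: a hyphen between two letters has a word boundary on each side, so it is removed as well. Using \w in place of [^ ] avoids that.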