Is it possible to use a regex to remove small words from a text? For example, I have the following string (text):
anytext = " in the echo chamber from Ontario duo "
I would like to remove all words that are 3 characters or less. The result should be:
"echo chamber from Ontario"
Is it possible to do that using a regular expression or any other Python function?
Thanks.
The \W* at the start lets you remove both the word and the preceding non-word characters, so that the rest of the sentence still matches up. Note that \W includes punctuation; use \s instead if you only want to remove preceding whitespace.
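The pattern this answer describes is not shown here, so the following is only a sketch of what it might look like. The word part `\b\w{1,3}\b`, the function name `remove_short_words`, and the `max_len` parameter are assumptions made for illustration; only the leading `\W*` comes from the text above.

```python
import re

def remove_short_words(text, max_len=3):
    """Remove words of max_len characters or fewer (hypothetical helper)."""
    # \W* consumes the non-word characters in front of the short word,
    # \b...\b makes sure only whole words of 1..max_len characters match.
    shortword = re.compile(r'\W*\b\w{1,%d}\b' % max_len)
    # Leftover whitespace at either end is stripped afterwards.
    return shortword.sub('', text).strip()

anytext = " in the echo chamber from Ontario duo "
print(remove_short_words(anytext))  # echo chamber from Ontario
```

Replacing `\W*` with `\s*` in the pattern would leave punctuation in front of a removed word untouched.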
Using Python's NLTK library: to remove stop words from a sentence, you can split your text into words and then drop each word that exists in the list of stop words provided by NLTK. The stopwords collection is imported from the nltk.corpus module.
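The NLTK script itself is not shown here, so this is a rough sketch of the same filtering idea rather than that script. To keep it runnable without extra downloads, a small hand-written set stands in for NLTK's list; `STOP_WORDS` and `remove_stop_words` are hypothetical names.

```python
# Hand-written stand-in for a stop word list (an assumption for this sketch).
# With NLTK installed and the corpus fetched via nltk.download("stopwords"),
# set(nltk.corpus.stopwords.words("english")) could be used instead.
STOP_WORDS = {"in", "the", "from", "a", "an", "of", "and", "to"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Split text into words and keep only those not in stop_words."""
    words = text.split()
    # Compare in lowercase so "The" and "the" are treated alike.
    kept = [word for word in words if word.lower() not in stop_words]
    return " ".join(kept)

anytext = " in the echo chamber from Ontario duo "
print(remove_stop_words(anytext))  # echo chamber Ontario duo
```

Note that this filters by meaning rather than length: "from" is dropped and "duo" is kept, which differs from the result the question asks for.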
Given a string and a word, the task is to remove that word from the string. Approach: in Java, this can be done with the String replaceAll method, by replacing the given word with a blank space.
I don't think you need a regex for this simple example anyway ...
' '.join(word for word in anytext.split() if len(word)>3)