 

Remove small words using Python


Is it possible to use a regex to remove short words from a text? For example, I have the following string (text):

anytext = " in the echo chamber from Ontario duo " 

I would like to remove all words that are 3 characters or fewer. The result should be:

"echo chamber from Ontario" 

Is it possible to do that using a regular expression or any other Python function?

Thanks.

Thomas asked Sep 27 '12 19:09



People also ask

How do you remove short words in Python?

With a regular expression, a \W* at the start of the pattern lets you remove both the short word and the non-word characters preceding it, so that the rest of the sentence still lines up. Note that punctuation is included in \W; use \s instead if you only want to remove the preceding whitespace.
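A minimal sketch of that regex approach, assuming a pattern of the form \W*\b\w{1,3}\b (the exact pattern is an assumption; the text above only names the \W* prefix). It also shows the \s variant, which leaves punctuation in place. Keep in mind that \w matches letters, digits and underscores, so short numbers are removed too.

```python
import re

# \W* swallows the non-word characters (spaces, punctuation) in front of a
# short word; \b\w{1,3}\b then matches a whole word of 1 to 3 word characters.
SHORT_WORD = re.compile(r"\W*\b\w{1,3}\b")
# Variant that only swallows preceding whitespace, so punctuation survives.
SHORT_WORD_WS = re.compile(r"\s*\b\w{1,3}\b")

anytext = " in the echo chamber from Ontario duo "
# strip() drops the leading/trailing spaces the substitution leaves behind.
print(SHORT_WORD.sub("", anytext).strip())       # echo chamber from Ontario

print(SHORT_WORD.sub("", "friend, the end"))     # friend
print(SHORT_WORD_WS.sub("", "friend, the end"))  # friend,
```

With \W*, the comma before "the" is consumed along with the word; with \s*, only the space is, so the comma stays attached to "friend".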

How do you remove common words in Python?

Using Python's NLTK library: to remove stop words from a sentence, you can split your text into words and then drop each word that exists in the list of stop words provided by NLTK. In such a script, you first import the stopwords collection from the nltk.corpus module.
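The filtering step can be sketched without installing anything. NLTK's real list comes from stopwords.words("english") after nltk.download("stopwords"); here a small hand-written set stands in for it (an assumption made so the example runs on its own), and remove_stop_words is a hypothetical helper name.

```python
# Hypothetical stand-in for NLTK's English stop word list; the real list
# (nltk.corpus.stopwords.words("english")) is much longer.
STOP_WORDS = frozenset({"in", "the", "from", "a", "an", "and", "of", "to"})


def remove_stop_words(text: str, stop_words: frozenset = STOP_WORDS) -> str:
    # Split on whitespace, compare case-insensitively, keep non-stop words.
    return " ".join(w for w in text.split() if w.lower() not in stop_words)


anytext = " in the echo chamber from Ontario duo "
print(remove_stop_words(anytext))  # echo chamber Ontario duo
```

Unlike the length-based filter in the question, this keeps "duo" (it is short but not a stop word) and drops "from" (it is long enough but is a stop word).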

How do you remove words from a string?

Given a string and a word, the task is to remove that word from the string. In Java, this can be done with the String replaceAll method, replacing the given word with a blank space.
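Since the question is about Python, here is a rough Python analogue: re.sub with \b word boundaries plays the role of Java's replaceAll. remove_word is a hypothetical helper, and it assumes the word to remove starts and ends with a word character (otherwise \b will not anchor where you expect).

```python
import re


def remove_word(text: str, word: str) -> str:
    # \b keeps "the" from matching inside "theory"; re.escape guards
    # against regex metacharacters in the word itself.
    pattern = r"\b" + re.escape(word) + r"\b"
    removed = re.sub(pattern, "", text)
    # Collapse the runs of spaces left behind by the removal.
    return " ".join(removed.split())


print(remove_word("the echo chamber from the Ontario duo", "the"))
# echo chamber from Ontario duo
print(remove_word("the theory", "the"))  # theory
```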


1 Answer

I don't think you need a regex for this simple example anyway ...

' '.join(word for word in anytext.split() if len(word)>3) 
mgilson answered Oct 04 '22 05:10

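For reference, the split/join one-liner from the answer above can be wrapped in a small function with the length threshold as a parameter. remove_short_words and max_len are hypothetical names, not part of the original answer.

```python
def remove_short_words(text: str, max_len: int = 3) -> str:
    # split() with no argument also discards leading/trailing whitespace,
    # so no extra strip() is needed.
    return " ".join(word for word in text.split() if len(word) > max_len)


anytext = " in the echo chamber from Ontario duo "
print(remove_short_words(anytext))  # echo chamber from Ontario
```

One design difference from the regex version: len() counts any punctuation attached to a token, so "duo," (4 characters) would be kept here, while the \W* regex would remove it.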