I want to remove the stop words from my column "tweets". How do I iterate over each row and each item?
    import pandas as pd
    from nltk.corpus import stopwords

    pos_tweets = [('I love this car', 'positive'),
                  ('This view is amazing', 'positive'),
                  ('I feel great this morning', 'positive'),
                  ('I am so excited about the concert', 'positive'),
                  ('He is my best friend', 'positive')]

    test = pd.DataFrame(pos_tweets)
    test.columns = ["tweet", "class"]
    test["tweet"] = test["tweet"].str.lower().str.split()

    stop = stopwords.words('english')
We use pandas apply with a lambda function and a list comprehension to remove the stop words defined in NLTK.
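For the tokenized column from the question, where each row holds a list of lowercase words, the idea might be sketched as follows. The small `stop` set is an illustrative stand-in (an assumption) for `stopwords.words('english')`, so the snippet runs without downloading the NLTK corpus, and the two-row frame is a shortened copy of the question's data.

```python
import pandas as pd

# Illustrative stand-in for stopwords.words('english'); an assumption so the
# example runs without the NLTK corpus.
stop = {"i", "this", "is", "my", "he", "am", "so", "about", "the"}

test = pd.DataFrame(
    [("I love this car", "positive"), ("He is my best friend", "positive")],
    columns=["tweet", "class"],
)

# As in the question: lowercase each tweet and split it into a list of words.
test["tweet"] = test["tweet"].str.lower().str.split()

# apply visits each row; the list comprehension visits each word in that row.
test["tweet"] = test["tweet"].apply(
    lambda words: [word for word in words if word not in stop]
)

print(test["tweet"].tolist())
```

Because the tweets were lowercased before splitting, "I" and "He" are compared as "i" and "he" and are removed along with the other stop words.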
To remove stop words from a sentence, you can divide your text into words and then remove each word that exists in the list of stop words provided by NLTK. In the script above, we first import the stopwords collection from the nltk.corpus module. Next, we import the word_tokenize() method from the nltk.
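The split-then-filter approach described above could look roughly like this for a single sentence. This is a minimal sketch: whitespace splitting stands in for `word_tokenize()`, and `remove_stop_words` and `sample_stop_words` are hypothetical names, with the short list standing in for the NLTK stop word list.

```python
def remove_stop_words(sentence, stop_words):
    """Return the sentence with every word found in stop_words removed."""
    stop_set = set(stop_words)  # set membership checks are fast
    words = sentence.split()    # simple whitespace tokenization
    # Compare in lowercase so that "This" matches the entry "this".
    kept = [word for word in words if word.lower() not in stop_set]
    return " ".join(kept)


# Illustrative stand-in for stopwords.words('english'); an assumption.
sample_stop_words = ["i", "this", "is", "am", "so", "about", "the"]

print(remove_stop_words("This view is amazing", sample_stop_words))
print(remove_stop_words("I am so excited about the concert", sample_stop_words))
```

Lowercasing only for the comparison keeps the original capitalization of the words that remain.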
With the replace() method, we can create a new string in which a specified value is replaced by another specified value. To remove a given word from a string, call replace() and pass an empty string as the replacement value.
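A short sketch of word removal with `str.replace()`, including one caveat worth knowing: the method matches substrings, not whole words. The sample sentence is taken from the question's data; the variable names are illustrative.

```python
tweet = "This view is amazing"

# Naive removal: replace() matches substrings, so the "is" inside "This"
# is removed as well, and a double space is left behind.
naive = tweet.replace("is", "")
print(repr(naive))

# Padding the word with spaces restricts the match to a standalone word
# in the middle of the string.
padded = tweet.replace(" is ", " ")
print(padded)
```

The padded form misses a word at the very start or end of the string, which is one reason a word-boundary regular expression is often preferred for this task.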
We can import stopwords from nltk.corpus as below. We then exclude the stop words with a Python list comprehension and pandas.Series.apply.
    import pandas as pd

    # Import stopwords with nltk.
    from nltk.corpus import stopwords
    stop = stopwords.words('english')

    pos_tweets = [('I love this car', 'positive'),
                  ('This view is amazing', 'positive'),
                  ('I feel great this morning', 'positive'),
                  ('I am so excited about the concert', 'positive'),
                  ('He is my best friend', 'positive')]

    test = pd.DataFrame(pos_tweets)
    test.columns = ["tweet", "class"]

    # Exclude stopwords with a list comprehension inside apply.
    test['tweet_without_stopwords'] = test['tweet'].apply(
        lambda x: ' '.join([word for word in x.split() if word not in stop]))
    print(test)
    # Out[40]:
    #                                tweet     class  tweet_without_stopwords
    # 0                    I love this car  positive               I love car
    # 1               This view is amazing  positive        This view amazing
    # 2          I feel great this morning  positive     I feel great morning
    # 3  I am so excited about the concert  positive        I excited concert
    # 4               He is my best friend  positive           He best friend
Stop words can also be removed with pandas.Series.str.replace.
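    # Build one pattern that matches any stop word as a whole word.
    pat = r'\b(?:{})\b'.format('|'.join(stop))
    # regex=True is needed in pandas 2.0 and later, where str.replace
    # treats the pattern as a literal string by default.
    test['tweet_without_stopwords'] = test['tweet'].str.replace(pat, '', regex=True)
    # Collapse the runs of spaces left behind by the removed words.
    test['tweet_without_stopwords'] = test['tweet_without_stopwords'].str.replace(r'\s+', ' ', regex=True)
    # Same results.
    # 0              I love car
    # 1       This view amazing
    # 2    I feel great morning
    # 3       I excited concert
    # 4          He best friend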
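To see what this pattern does on a single string, it can be tried outside pandas with the standard `re` module. The sketch below assumes a short illustrative `stop` list in place of the NLTK one; `strip_stop_words` is a hypothetical helper, and the `re.escape` call is an added precaution that the code above does not use.

```python
import re

# Illustrative stand-in for stopwords.words('english'); an assumption.
stop = ["this", "is", "my", "am", "so", "about", "the"]

# \b anchors each alternative at word boundaries, so "am" does not match
# inside "amazing". re.escape guards against regex metacharacters in a word.
pat = r"\b(?:{})\b".format("|".join(re.escape(word) for word in stop))


def strip_stop_words(text):
    without = re.sub(pat, "", text)              # drop the stop words
    return re.sub(r"\s+", " ", without).strip()  # collapse leftover spaces


print(strip_stop_words("I love this car"))
print(strip_stop_words("This view is amazing"))
print(strip_stop_words("He is my best friend"))
```

The match is case-sensitive, so the capitalized "This" survives, which agrees with the results shown above.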
If you cannot import stopwords, download the corpus as follows.
    import nltk
    nltk.download('stopwords')
Another option is to import text.ENGLISH_STOP_WORDS from sklearn.feature_extraction.
    # Import stopwords with scikit-learn
    from sklearn.feature_extraction import text
    stop = text.ENGLISH_STOP_WORDS
Note that the scikit-learn and NLTK stop word lists contain different numbers of words, so the two can give different results.