
Adding custom stopwords in R tm

I have a corpus in R built with the tm package, and I am applying the removeWords function to remove stop words:

tm_map(abs, removeWords, stopwords("english")) 

Is there a way to add my own custom stop words to this list?

asked Aug 26 '13 at 14:08 by Brian


People also ask

Which R package provides stop words?

stopwords is an R package that provides easy access to stop words in more than 50 languages from the Stopwords ISO library. It is meant to be used in conjunction with packages such as quanteda to perform text analysis in many different languages.

How do I remove stop words from a text file in R?

If your text is in a tidy format with one word per row, you can use filter() from dplyr with a negated %in% when the stop words are a vector, or anti_join() from dplyr when the stop words are in a tibble().
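The negated %in% filter can be sketched in base R without dplyr (the dplyr form would be filter(tidy_words, !word %in% my_stops)). The data frame tidy_words and the stop list my_stops below are hypothetical stand-ins, not part of any package.

```r
# Hypothetical tidy text: one word per row, tagged with its document id
tidy_words <- data.frame(
  doc  = c(1, 1, 1, 1, 2, 2),
  word = c("the", "cat", "is", "happy", "my", "dog"),
  stringsAsFactors = FALSE
)

# Hypothetical stop list: a few default-style words plus a custom one ("my")
my_stops <- c("the", "is", "and", "my")

# Keep only rows whose word is NOT in the stop list
# (!x %in% y parses as !(x %in% y) because %in% binds tighter than !)
filtered <- tidy_words[!tidy_words$word %in% my_stops, ]
print(filtered$word)
```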

What are considered stop words?

Stop words are a set of commonly used words in any language. For example, in English, “the”, “is” and “and” would easily qualify as stop words. In NLP and text mining applications, stop words are removed because they carry little information on their own, allowing applications to focus on the more informative words instead.


2 Answers

stopwords() just returns a character vector of words, so you can simply combine your own words with it:

tm_map(abs, removeWords, c(stopwords("english"),"my","custom","words")) 
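Because the list is just a character vector, removeWords accepts the extended vector exactly as it accepts the default one. The base-R sketch below approximates the whole-word regular-expression replacement that removeWords performs (an assumption based on reading the tm source, not its exact code); short_stops is a hypothetical stand-in for stopwords("english") so the example runs without tm installed.

```r
# Hypothetical stand-in for stopwords("english")
short_stops <- c("i", "me", "my", "the", "a", "is")

# Extend the list the same way as in the answer: c(default, custom words)
all_stops <- c(short_stops, "custom", "words")

# Approximation of removeWords: delete every listed word, whole words only
remove_words_sketch <- function(x, words) {
  # \b anchors ensure "i" does not match inside "in", "is" not inside "list"
  pattern <- sprintf("\\b(%s)\\b",
                     paste(sort(words, decreasing = TRUE), collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

out <- remove_words_sketch("the custom words in my corpus", all_stops)
print(out)
```

Removed words leave their surrounding spaces behind, which is why stripWhitespace usually follows removeWords.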
answered Oct 10 '22 at 09:10 by James


Save your custom stop words in a CSV file (e.g. word.csv), one word per line with no header.

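library(tm)
# Read the header-less CSV; the words end up in column V1
stopwords <- read.csv("word.csv", header = FALSE)
stopwords <- as.character(stopwords$V1)
# Append tm's default English list; stopwords() still calls tm's function,
# because R skips non-function objects when resolving a function call
stopwords <- c(stopwords, stopwords())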
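Because removeWords matches words exactly as written, it can help to normalize the custom list (trim spaces, lowercase, drop duplicates) before combining it. This base-R sketch writes a hypothetical word.csv to a temporary file so it runs on its own, and uses base_stops as a hypothetical stand-in for tm's stopwords().

```r
# Write a hypothetical header-less word.csv (one word per line) to a temp file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Foo", " bar ", "foo", "baz"), csv_path)

# Read it the same way as the answer: no header, words land in column V1
custom <- read.csv(csv_path, header = FALSE, stringsAsFactors = FALSE)
custom <- as.character(custom$V1)

# Normalize: trim surrounding spaces, lowercase, drop duplicates and empties
custom <- unique(tolower(trimws(custom)))
custom <- custom[nzchar(custom)]

# Hypothetical stand-in for tm's default English stopwords()
base_stops <- c("the", "is", "and")
all_stops <- unique(c(custom, base_stops))
print(all_stops)
```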

Then apply the combined list to your text:

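# `text` is assumed to be a character vector with one element per document
text <- VectorSource(text)
text <- VCorpus(text)
# Lowercase first: removeWords matches words case-sensitively
text <- tm_map(text, content_transformer(tolower))
text <- tm_map(text, removeWords, stopwords)
# Collapse the extra spaces left behind by the removed words
text <- tm_map(text, stripWhitespace)

text[[1]]$content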
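The order of the steps matters: lowercasing runs before removal because matching is case-sensitive, and whitespace stripping runs after it to collapse the gaps that removed words leave. The base-R sketch below mimics that order on a plain character vector; the helper names and the small stop list are hypothetical, and the regex only approximates what tm does internally.

```r
# Hypothetical stop list (a few default-style words plus a custom one)
stops <- c("the", "is", "my", "custom")

# Rough base-R equivalents of the three tm steps
lower_step  <- function(x) tolower(x)
remove_step <- function(x, words) {
  gsub(sprintf("\\b(%s)\\b", paste(words, collapse = "|")), "", x, perl = TRUE)
}
# Collapses runs of whitespace; does not trim the ends (wrap in trimws() if needed)
strip_step  <- function(x) gsub("[[:space:]]+", " ", x)

docs <- c("The Custom stop list is MY choice", "the dog")

# Wrong order: "The", "Custom" and "MY" survive because matching is case-sensitive
wrong <- strip_step(lower_step(remove_step(docs, stops)))
# Order used in the answer: lowercase, remove, then collapse whitespace
right <- strip_step(remove_step(lower_step(docs), stops))
print(wrong)
print(right)
```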
answered Oct 10 '22 at 10:10 by Reza Rahimi