I have a Corpus in R using the tm package. I am applying the removeWords function to remove stopwords:
tm_map(abs, removeWords, stopwords("english"))
Is there a way to add my own custom stop words to this list?
stopwords is an R package that provides easy access to stop word lists for more than 50 languages from the Stopwords ISO library. It is meant to be used in conjunction with packages such as quanteda to perform text analysis in many different languages.
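For instance, here is a short sketch that assumes the stopwords package is installed. The choice of German and the custom entry are arbitrary and only for illustration:

```r
# Namespace-qualified call avoids clashing with tm::stopwords()
de_words <- stopwords::stopwords("de", source = "stopwords-iso")

# The result is a plain character vector, so custom words can be appended with c()
de_custom <- c(de_words, "beispielwort")  # "beispielwort" is a made-up custom entry
```

Because the result is an ordinary character vector, it can be passed to removeWords in the same way as the list from tm.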
3.1.1 Stop word removal in R
If you have your text in a tidy format with one word per row, you can use filter() from dplyr with a negated %in% if you have the stop words as a vector, or you can use anti_join() from dplyr if the stop words are in a tibble().
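Both approaches can be sketched as follows, assuming dplyr is installed. The tidy_words data and the custom_stop_words vector are made up for illustration:

```r
library(dplyr)

# Hypothetical tidy text: one word per row
tidy_words <- tibble(word = c("the", "cat", "is", "on", "the", "mat"))

# Hypothetical stop word list as a plain character vector
custom_stop_words <- c("the", "is", "on")

# Approach 1: filter() with a negated %in% (stop words as a vector)
kept_filter <- tidy_words %>%
  filter(!word %in% custom_stop_words)

# Approach 2: anti_join() (stop words as a tibble with a matching column)
stop_tbl <- tibble(word = custom_stop_words)
kept_join <- tidy_words %>%
  anti_join(stop_tbl, by = "word")
```

Both results should keep the same rows. anti_join() tends to be more convenient when the stop word list already lives in a data frame with extra columns.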
Stop words are a set of commonly used words in any language. For example, in English, “the”, “is” and “and” would easily qualify as stop words. In NLP and text mining applications, stop word lists are used to filter out these unimportant words, allowing applications to focus on the important words instead.
stopwords("english") just returns a character vector of words, so simply combine your own words with it using c():
tm_map(abs, removeWords, c(stopwords("english"),"my","custom","words"))
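To check a combined list quickly, removeWords can also be applied directly to a plain character vector. This is a small sketch that assumes tm is installed. The sample sentence and the variable names are only illustrative:

```r
library(tm)

# Default English list plus a few illustrative custom words
my_stopwords <- c(stopwords("english"), "my", "custom", "words")

sample_text <- "my corpus has custom words and real content"

# removeWords leaves gaps where words were; stripWhitespace collapses them
cleaned <- stripWhitespace(removeWords(sample_text, my_stopwords))
```

Note that removeWords matches case-sensitively, so the text is usually lowercased before this step.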
Save your custom stop words in a CSV file (e.g. word.csv), one word per row, then read them in and append the default list:
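library(tm)
# Read the custom words; with header = FALSE the first column is named V1
stopwords <- read.csv("word.csv", header = FALSE)
stopwords <- as.character(stopwords$V1)
# Append tm's default English list. The variable shares its name with
# tm's stopwords() function, so call the function with its namespace.
stopwords <- c(stopwords, tm::stopwords("english"))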
Then you can apply the combined stop word list to your text:
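# Build a corpus from the character vector
text <- VectorSource(text)
text <- VCorpus(text)
# Lowercase first, because removeWords matches case-sensitively
text <- tm_map(text, content_transformer(tolower))
text <- tm_map(text, removeWords, stopwords)
# Collapse the gaps left behind by the removed words
text <- tm_map(text, stripWhitespace)
# Inspect the cleaned first document
text[[1]]$content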
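For a version that runs without an existing word.csv, the following sketch writes a temporary CSV first. It assumes tm is installed. The file contents, the sample sentence, and the variable names are hypothetical. It also lowercases the custom words, since the text is lowercased before removal and matching is case-sensitive:

```r
library(tm)

# Write a throwaway CSV so the example is self-contained (stand-in for word.csv)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Foo", "bar"), csv_path)

custom <- read.csv(csv_path, header = FALSE, stringsAsFactors = FALSE)$V1
# Lowercase the custom words too, so they match the lowercased text
all_stopwords <- c(tolower(custom), stopwords("english"))

docs <- VCorpus(VectorSource("Foo went to the BAR with a friend"))
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeWords, all_stopwords)
docs <- tm_map(docs, stripWhitespace)

result <- content(docs[[1]])
```

Using separate names such as custom and all_stopwords keeps the stopwords() function from being shadowed by a variable.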