I have a character vector of stopwords in R:
stopwords = c("a" ,
"able" ,
"about" ,
"above" ,
"abst" ,
"accordance" ,
...
"yourself" ,
"yourselves" ,
"you've" ,
"z" ,
"zero")
Let's say I have the string:
str <- c("I have zero a accordance")
How can I remove my defined stopwords from str? I think gsub or another grep tool could be a good candidate to pull this off, although other recommendations are welcome.
Try this:
str <- c("I have zero a accordance")
stopwords = c("a", "able", "about", "above", "abst", "accordance", "yourself",
"yourselves", "you've", "z", "zero")
x <- unlist(strsplit(str, " "))
x <- x[!x %in% stopwords]
paste(x, collapse = " ")
# [1] "I have"
Addition: writing a "removeWords" function is simple, so there is no need to load an external package for this purpose:
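removeWords <- function(str, stopwords) {
  # Split the input on single spaces into individual words
  x <- unlist(strsplit(str, " "))
  # Keep only the words that are not stopwords and glue them back together
  paste(x[!x %in% stopwords], collapse = " ")
}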
removeWords(str, stopwords)
# [1] "I have"
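Since the question mentions gsub, here is a minimal sketch of a regex-based variant in base R. The function name remove_stopwords_regex is hypothetical (it does not come from the answers above), and the sketch assumes the stopwords contain no regex metacharacters. Unlike the strsplit version, which flattens a multi-element input into a single string, this one processes each element of str separately.

```r
stopwords <- c("a", "able", "about", "above", "abst", "accordance", "yourself",
               "yourselves", "you've", "z", "zero")
str <- c("I have zero a accordance")

# Hypothetical helper: removes whole-word matches of any stopword.
# Assumes the stopwords contain no regex special characters.
remove_stopwords_regex <- function(str, stopwords) {
  # One alternation wrapped in word boundaries: \b(a|able|...)\b
  pattern <- paste0("\\b(", paste(stopwords, collapse = "|"), ")\\b")
  out <- gsub(pattern, "", str, perl = TRUE)
  # Collapse the whitespace left behind and trim both ends
  trimws(gsub("\\s+", " ", out))
}

result <- remove_stopwords_regex(str, stopwords)
result
# [1] "I have"
```

Two caveats: \b treats an apostrophe as a word boundary, so a stopword such as "you" would also match inside "you've", and the match is case-sensitive unless ignore.case = TRUE is passed to gsub.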
You could use the tm package for this:
library(tm)
removeWords(str, stopwords)
#[1] "I have "
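Note that the tm result keeps the whitespace that surrounded the removed words. If that matters, it can be cleaned up afterwards. The following is a sketch only: it assumes tm is installed (it skips the work otherwise), and the variable name cleaned is just for illustration.

```r
stopwords <- c("a", "able", "about", "above", "abst", "accordance", "yourself",
               "yourselves", "you've", "z", "zero")
str <- c("I have zero a accordance")

cleaned <- NULL
# Only run if tm is available; nothing is attached to the search path
if (requireNamespace("tm", quietly = TRUE)) {
  res <- tm::removeWords(str, stopwords)
  # Collapse repeated whitespace to one space, then trim both ends
  cleaned <- trimws(gsub("\\s+", " ", res))
  print(cleaned)
}
```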