
Removing words featured in character vector from string

Tags:

r

I have a character vector of stopwords in R:

stopwords = c("a" ,
            "able" ,
            "about" ,
            "above" ,
            "abst" ,
            "accordance" ,
            ...
            "yourself" ,
            "yourselves" ,
            "you've" ,
            "z" ,
            "zero")

Let's say I have the string:

str <- c("I have zero a accordance")

How can I remove my defined stopwords from str?

I think gsub or another grep tool could be a good candidate to pull this off, although other recommendations are welcome.

zthomas.nc asked Mar 04 '16 07:03



2 Answers

Try this:

str <- c("I have zero a accordance")

stopwords = c("a", "able", "about", "above", "abst", "accordance", "yourself",
"yourselves", "you've", "z", "zero")

x <- unlist(strsplit(str, " "))

x <- x[!x %in% stopwords]

paste(x, collapse = " ")

# [1] "I have"

Addition: writing your own "removeWords" function is simple, so there is no need to load an external package for this:

removeWords <- function(str, stopwords) {
  # Split the (single) string into words on spaces
  x <- unlist(strsplit(str, " "))
  # Keep only the words that are not stopwords and glue them back together
  paste(x[!x %in% stopwords], collapse = " ")
}

removeWords(str, stopwords)
# [1] "I have"
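
The function above treats its input as one string, so a character vector with several elements would be merged into a single result. As a rough sketch only (the name `remove_words_vec` and the `ignore_case` argument are made up for illustration, and the case-insensitive mode assumes the stopwords are stored in lower case), the same split-and-filter idea can be applied element by element:

```r
# Hypothetical helper, not part of the answer above: applies the same
# split-and-filter idea to each element of a character vector.
remove_words_vec <- function(strings, stopwords, ignore_case = FALSE) {
  vapply(strings, function(s) {
    # Split one string into words on single spaces
    tokens <- strsplit(s, " ", fixed = TRUE)[[1]]
    # Optionally compare in lower case (assumes lower-case stopwords)
    keys <- if (ignore_case) tolower(tokens) else tokens
    paste(tokens[!keys %in% stopwords], collapse = " ")
  }, character(1), USE.NAMES = FALSE)
}

stopwords <- c("a", "accordance", "zero")
result <- remove_words_vec(c("I have zero a accordance", "Zero is a number"),
                           stopwords, ignore_case = TRUE)
result
# "I have" "is number"
```

Like the original, this only splits on single spaces, so punctuation attached to a word (for example "zero,") would not match a stopword.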
Mikko answered Sep 30 '22 19:09



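You could use the tm package for this:

library(tm)
removeWords(str, stopwords)
# [1] "I have   "

Note that the spaces around the removed words are left in place, so the result ends with trailing whitespace.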
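
Since the question mentions gsub, here is a minimal base-R sketch of a regex approach that also tidies up the leftover spaces. It is only an illustration: the variable names `pattern`, `stripped` and `cleaned` are my own, and it assumes the stopwords contain no regex metacharacters (otherwise they would need escaping first).

```r
str <- "I have zero a accordance"
stopwords <- c("a", "able", "about", "above", "abst", "accordance",
               "yourself", "yourselves", "you've", "z", "zero")

# Build one alternation wrapped in word boundaries; longer words go first
# so that e.g. "yourselves" is tried before "yourself".
by_length <- stopwords[order(nchar(stopwords), decreasing = TRUE)]
pattern <- paste0("\\b(", paste(by_length, collapse = "|"), ")\\b")

# Remove the stopwords, then collapse runs of whitespace and trim the ends
stripped <- gsub(pattern, "", str, perl = TRUE)
cleaned <- trimws(gsub("\\s+", " ", stripped))
cleaned
# [1] "I have"
```

The word boundaries keep "a" from being removed inside words such as "have", and the final `gsub`/`trimws` step removes the gaps left where words were deleted.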
RHertel answered Sep 30 '22 20:09
