Suppose I have the following string:
str_input <- c("Mellanox,Asia, China, India, JAVA, United States, APIs")
I used the following gsub call to remove my specific stopwords:
gsub(paste0("\\b(",paste(location_sw, collapse="|"),")\\b"), "", str_input)
where location_sw is my vector of stopwords:
location_sw <- c('Rose', 'Java', 'JAVA', 'Mellanox', 'Microsoft', '144GiB', 'West',
'Amazon', 'Channel Asia', 'jClarity', 'APIs')
Running the gsub call above gives this output:
",Asia, China, India, , United States, "
However, I would like the following result:
"Asia, China, India, United States"
In other words, I would like to also remove the commas left behind once the stopwords are removed. Any input would be really helpful.
Another approach is to strsplit the string into a character vector and then take the setdiff with respect to location_sw:
out <- setdiff(strsplit(str_input, split = ",\\s*")[[1]], location_sw)
out
#> [1] "Asia" "China" "India" "United States"
If necessary, we can paste it back into a single string:
paste(out, collapse = ", ")
#> [1] "Asia, China, India, United States"
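One caveat: setdiff treats its arguments as sets, so it also drops repeated items, and it only removes exact, case-sensitive matches. If repeated locations should be kept, or several strings need cleaning at once, the same split, filter, and paste steps can be wrapped in a small function. The sketch below is only one way to do that; the name remove_stopwords and its sep argument are made up for illustration and are not part of base R.

```r
# Hypothetical helper (not from the original answer): remove stopwords from
# comma-separated strings and rebuild each string.
remove_stopwords <- function(x, stopwords, sep = ", ") {
  # Split each string on commas, swallowing any whitespace around them
  parts <- strsplit(x, split = "\\s*,\\s*")
  vapply(parts, function(p) {
    p <- trimws(p)
    # Keep non-empty items that are not exact (case-sensitive) stopword matches.
    # Unlike setdiff(), %in% does not drop repeated items.
    keep <- p[nzchar(p) & !(p %in% stopwords)]
    paste(keep, collapse = sep)
  }, character(1))
}

str_input <- c("Mellanox,Asia, China, India, JAVA, United States, APIs")
location_sw <- c('Rose', 'Java', 'JAVA', 'Mellanox', 'Microsoft', '144GiB', 'West',
                 'Amazon', 'Channel Asia', 'jClarity', 'APIs')

remove_stopwords(str_input, location_sw)
#> [1] "Asia, China, India, United States"
```

Because the function is vectorized over x, it can be applied directly to a whole column of such strings.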