
Remove elements of a vector that are substrings of another

Tags: string, r

Is there a better way to achieve this? I'd like to remove from this vector every string that is a substring of another element.

words = c("please can you", 
  "please can", 
  "can you", 
  "how did you", 
  "did you",
  "have you")
> words
[1] "please can you" "please can"     "can you"        "how did you"    "did you"        "have you"

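library(data.table)
library(stringr)
# Build all ordered pairs (word1, word2) of elements
dt = setDT(expand.grid(word1 = words, word2 = words, stringsAsFactors = FALSE))
# TRUE where word2 occurs inside word1 (str_detect treats word2 as a regex)
dt[, found := str_detect(word1, word2)]
# Drop every word2 that was found inside a different word1
setdiff(words, dt[found == TRUE & word1 != word2, word2])
[1] "please can you" "how did you"    "have you" 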

This works, but it seems like overkill and I'm interested to know a more elegant way of doing it.

Akhil Nair asked Oct 18 '15 19:10



1 Answer

Search for each element of words within words, keeping those that occur only once (that is, those found only in themselves):

words[colSums(sapply(words, grepl, words, fixed = TRUE)) == 1]

giving:

[1] "please can you" "how did you"    "have you"   
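
To make the one-liner easier to follow, here is a step-by-step sketch of the same idea. The helper `drop_substrings` is a hypothetical name introduced only for illustration; it additionally assumes that duplicated elements should be collapsed first, since two identical strings would otherwise each be found twice and both be dropped.

```r
words <- c("please can you", "please can", "can you",
           "how did you", "did you", "have you")

# m[i, j] is TRUE when words[j] occurs literally (fixed = TRUE, no regex)
# inside words[i]; sapply() passes each element to grepl() as the pattern.
m <- sapply(words, grepl, words, fixed = TRUE)

# Column sums count how many elements contain each word. Every word
# contains itself, so a count of 1 means "not a substring of any other".
counts <- colSums(m)
kept <- words[counts == 1]
kept
# [1] "please can you" "how did you"    "have you"

# Hypothetical helper (not from the answer): same logic, but it collapses
# duplicates first and uses vapply() so a length-1 input also works.
drop_substrings <- function(x) {
  x <- unique(x)
  n <- vapply(x, function(p) sum(grepl(p, x, fixed = TRUE)), integer(1))
  x[n == 1]
}

drop_substrings(c("a b", "a b", "b"))
# [1] "a b"
```

Note that this compares every element with every other, so the work grows quadratically with the length of the vector, the same as the expand.grid approach in the question.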
G. Grothendieck answered Sep 22 '22 18:09
