 

Get the minimal shared part between the elements of a string vector

Given a vector of strings:

xx <- c("concord wanderer basic set air snug beige",
        "concord wanderer basic set air snug black noir",
        "concord wanderer basic set air snug blue bleu",
        "concord wanderer basic set air snug brown marron",
        "concord wanderer basic set air snug green vert",
        "concord wanderer basic set air snug grey gris",
        "concord wanderer basic set air snug red rouge",
        "concord wanderer basic set air snug rose")

I am trying to get the minimal shared part between the elements of the vector. For example, here I should get:

"concord wanderer basic set air snug"

xx is the result of a previous process, so I am sure that the elements share a common part. However, the removed part is not always at the end of the strings.

Using `strsplit` and `table`, I get this partial solution, but it is a little tricky and I lose the original word order:

table_x <- table(unlist(strsplit(xx,' ')))
paste(names(table_x[table_x==max(table_x)]),collapse=' ')
[1] "air basic concord set snug wanderer"
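The counting idea can keep the original word order if the counts are used to filter the words of the first string rather than taking the table's (alphabetically sorted) names. A minimal sketch, assuming words are separated by single spaces; `words`, `counts`, `first` and `shared` are illustrative names. Each string's words go through `unique()` first so that a word repeated inside one string is not counted twice, and a word is kept only if its count equals the number of strings.

```r
xx <- c("concord wanderer basic set air snug beige",
        "concord wanderer basic set air snug black noir",
        "concord wanderer basic set air snug blue bleu",
        "concord wanderer basic set air snug brown marron",
        "concord wanderer basic set air snug green vert",
        "concord wanderer basic set air snug grey gris",
        "concord wanderer basic set air snug red rouge",
        "concord wanderer basic set air snug rose")

# Split each string into its words
words <- strsplit(xx, " ")

# Count in how many strings each word occurs
# (unique() avoids double counting a word repeated within one string)
counts <- table(unlist(lapply(words, unique)))

# Keep the words of the first string that occur in every string,
# in the order they appear in that first string
first <- words[[1]]
shared <- first[counts[first] == length(xx)]
paste(shared, collapse = " ")
#[1] "concord wanderer basic set air snug"
```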

I am pretty sure there is a better solution. I tried `agrep` and `adist`, but without much success.

agstudy asked Mar 21 '23 04:03


1 Answer

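You could use `intersect` with `Reduce` to get the output you want: `strsplit` turns each string into a vector of words, and `Reduce` applies `intersect` across those vectors in turn, so only the words present in every string are kept.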

paste(Reduce(intersect, strsplit(xx, " ")), collapse=" ")
#[1] "concord wanderer basic set air snug"
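Because `intersect` keeps the order of its first argument, the words come out in the order they appear in the first string, and it does not matter where in each string the differing words sit. A minimal sketch wrapping the one-liner, assuming words are separated by runs of whitespace; `shared_words` and the vector `yy` are hypothetical names for illustration. Note that it returns the words common to all strings, not necessarily a contiguous substring, and a word repeated in the first string appears only once.

```r
# Hypothetical helper: words shared by every string, in the order of the first string
shared_words <- function(x) {
  words <- strsplit(x, "\\s+")   # split each string on runs of whitespace
  paste(Reduce(intersect, words), collapse = " ")
}

# The differing words sit at the start, in the middle and at the end
yy <- c("red concord wanderer snug",
        "concord blue wanderer snug",
        "concord wanderer snug green")
shared_words(yy)
#[1] "concord wanderer snug"
```

Applied to `xx` from the question, `shared_words(xx)` gives the same result as the one-liner above.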
GSee answered Apr 30 '23 03:04