
How to find identical strings across columns of a data frame

Tags:

r

I am trying to find the strings that are shared between columns, for as many column combinations as possible. For instance, I have data like this:

df<-structure(list(first = c("SNTM1", "STTTT2", "STOLA", "STOMQ", 
"STR2", "SUPTY1", "TBNHSG", "TEYAH", "TMEIL1", "TMEIL2", "TMEIL3", 
"TNIL", "TREUK", "TTRK", "TRRFK", "UBA52", "YIPF1", NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA), second = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, "SNTLK", "STTTFSG", "STOIU", "STOMQ", "STR25", 
"SUPYHGS", "TBHYDG", "TEHDYG", "TMEIL1", "YIPF1", NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA), second2 = c(NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, "SNTLKM", "STTTFSGTT", "GFD", "STOMQ", 
"TRS", "BRsts", "TMHS", "RSEST", "TRSF", "YIPF1")), class = "data.frame", row.names = c(NA, 
-37L))

It has 3 columns. I want to find which strings are shared between column 1 and column 2, then 2 and 3, then 1 and 3, and finally 1, 2 and 3 together. So the answer looks like this:

C1-C2   C2-C3   C1-C3   C1-C2-C3
STOMQ   STOMQ   STOMQ   STOMQ
TMEIL1  YIPF1   YIPF1   YIPF1
YIPF1

This means that C1 (column 1) and C2 (column 2) share only the following identical strings:

STOMQ
TMEIL1
YIPF1

The same applies to the other column combinations.

nik asked Oct 20 '25 17:10


1 Answer

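# unname() drops the column names so do.call() passes each pair of columns to intersect() positionally
a <- combn(unname(df), 2, do.call, what = intersect, simplify = FALSE)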

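a now contains the pairwise intersections of columns 1 and 2, 1 and 3, and 2 and 3, in that order (the order combn generates the pairs). To append the intersection of all three columns, intersect the first two results: whatever is shared by columns 1 and 2 and also by columns 1 and 3 is shared by all three. NA appears in every result because every column contains NA.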

c(a, list(intersect(a[[1]],a[[2]])))


[[1]]
[1] "STOMQ"  "TMEIL1" "YIPF1"  NA      

[[2]]
[1] "STOMQ" "YIPF1" NA     

[[3]]
[1] NA      "STOMQ" "YIPF1"

[[4]]
[1] "STOMQ" "YIPF1" NA     
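
If the NA entries and the unnamed list elements get in the way, one possible variation is to drop NA before intersecting and to label each result by the columns it compares. The following is only a sketch: shared_strings is a hypothetical helper name, the C1-C2 style labels are borrowed from the question's table, toy is made-up data standing in for df, and the function assumes at least two columns.

```r
# Made-up toy data standing in for df: three character columns padded with NA.
toy <- data.frame(
  first   = c("A", "B", "C", NA),
  second  = c("B", "C", "D", NA),
  second2 = c("C", "D", "E", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: shared non-NA strings for every combination of
# 2, 3, ..., ncol(data) columns. Assumes data has at least two columns.
shared_strings <- function(data) {
  # Drop NA (and duplicates) from each column up front.
  cols <- lapply(data, function(x) unique(x[!is.na(x)]))
  out <- list()
  for (m in seq(2, length(cols))) {
    for (idx in combn(length(cols), m, simplify = FALSE)) {
      # Label such as "C1-C2" or "C1-C2-C3", built from the column positions.
      label <- paste0("C", idx, collapse = "-")
      # Reduce() chains intersect() over all selected columns.
      out[[label]] <- Reduce(intersect, cols[idx])
    }
  }
  out
}

res <- shared_strings(toy)
print(res)
```

Calling shared_strings(df) on the data from the question should work the same way, since it only relies on df being a data frame of character columns.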
KU99 answered Oct 23 '25 05:10

