
R: pairwise extraction of common elements between multiple character lists

Tags:

r

I have several lists with gene names like this:

List1:

XLOC_012482 
XLOC_019357 
XLOC_014642 
XLOC_010021 
XLOC_013282 

List2:

XLOC_012482 
XLOC_019357 
XLOC_004860 
XLOC_004022 
XLOC_002278 

List3:

XLOC_004860 
XLOC_004022 
XLOC_006292 
XLOC_006616 
XLOC_013802 

I want to extract the elements common to each pair of lists. I tried using intersect but could not get it to work on my character data, and I also don't know how to apply it to all pairwise combinations.

asked Jun 28 '16 20:06 by Jon



2 Answers

You can put your lists into a single list li and then use combn on the list with intersect as the function parameter:

combn(li, 2, function(x) intersect(x[[1]], x[[2]]), simplify = FALSE)
# [[1]]
# [1] "XLOC_012482" "XLOC_019357"
# 
# [[2]]
# character(0)
# 
# [[3]]
# [1] "XLOC_004860" "XLOC_004022"

Data:

li <- list(c("XLOC_012482", "XLOC_019357", "XLOC_014642", "XLOC_010021", 
"XLOC_013282"), c("XLOC_012482", "XLOC_019357", "XLOC_004860", 
"XLOC_004022", "XLOC_002278"), c("XLOC_004860", "XLOC_004022", 
"XLOC_006292", "XLOC_006616", "XLOC_013802"))
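
With an unnamed list, the results come back in the order combn generates the pairs (1-2, 1-3, 2-3), which gets hard to track with many lists. As a minimal sketch, assuming the lists are given the hypothetical names List1, List2 and List3 (the original li is unnamed), each intersection can be labelled with the pair it came from:

```r
# Same data as above, but with hypothetical names added for labelling.
li <- list(
  List1 = c("XLOC_012482", "XLOC_019357", "XLOC_014642", "XLOC_010021", "XLOC_013282"),
  List2 = c("XLOC_012482", "XLOC_019357", "XLOC_004860", "XLOC_004022", "XLOC_002278"),
  List3 = c("XLOC_004860", "XLOC_004022", "XLOC_006292", "XLOC_006616", "XLOC_013802")
)

# All pairs of list names, one pair per column of a 2-row character matrix.
pairs <- combn(names(li), 2)

# Intersect the two lists named in each column.
common <- lapply(seq_len(ncol(pairs)), function(i) {
  intersect(li[[pairs[1, i]]], li[[pairs[2, i]]])
})

# Label each result with the pair it belongs to, e.g. "List1_vs_List2".
names(common) <- paste(pairs[1, ], pairs[2, ], sep = "_vs_")

common$List1_vs_List2
# [1] "XLOC_012482" "XLOC_019357"
```

The "_vs_" separator is an arbitrary choice; any label built from the two names would do.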
answered Nov 03 '22 07:11 by Psidom


Another option is table (using the same li list as in @Psidom's answer):

tb <- table(unlist(li))

This gives each gene name along with its total count across all lists:

# XLOC_002278 XLOC_004022 XLOC_004860 XLOC_006292 XLOC_006616 XLOC_010021 XLOC_012482
#           1           2           2           1           1           1           2
# XLOC_013282 XLOC_013802 XLOC_014642 XLOC_019357
#           1           1           1           2

To keep only the names that appear more than once (this shows which names are shared, but not which pair of lists shares them):

tb[tb > 1]

# XLOC_004022 XLOC_004860 XLOC_012482 XLOC_019357
#           2           2           2           2
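
The counts above do not say which lists a shared name belongs to. One possible extension, sketched here under the assumptions that the lists carry the hypothetical names List1 to List3 and that no name is repeated within a single list, is to tabulate name against list and then take the cross-product to count shared names per pair:

```r
# Same data as in the first answer, with hypothetical names added.
li <- list(
  List1 = c("XLOC_012482", "XLOC_019357", "XLOC_014642", "XLOC_010021", "XLOC_013282"),
  List2 = c("XLOC_012482", "XLOC_019357", "XLOC_004860", "XLOC_004022", "XLOC_002278"),
  List3 = c("XLOC_004860", "XLOC_004022", "XLOC_006292", "XLOC_006616", "XLOC_013802")
)

# Gene-by-list membership table: 1 if the gene is in the list, 0 otherwise
# (assumes no gene is repeated within one list).
membership <- table(
  gene = unlist(li, use.names = FALSE),
  list = rep(names(li), lengths(li))
)

# Keep only genes that occur in more than one list.
shared <- membership[rowSums(membership > 0) > 1, , drop = FALSE]

# List-by-list matrix: off-diagonal entries count the genes a pair shares,
# diagonal entries are the list sizes.
pair_counts <- crossprod(membership)
```

Here shared shows, row by row, which lists contain each shared name, and pair_counts["List1", "List2"] is the number of names List1 and List2 have in common.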
answered Nov 03 '22 08:11 by 989