Select pairs that are repeated in a list of dataframes

Tags:

r

This is my code. It's a list of data frames. My actual list is much bigger: this one has 3 entries, mine has more than 1,000. It's just an example:

w=list(structure(list(Col1 = structure(1:6, .Label = c("A", "B", 
"C", "D", "E", "F"), class = "factor"), Col2 = structure(c(1L, 
2L, 3L, 2L, 4L, 5L), .Label = c("B", "C", "D", "F", "G"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L)), structure(list(Col1 = structure(c(1L, 4L, 5L, 6L, 2L, 
3L), .Label = c("A", "E", "H", "M", "N", "P"), class = "factor"), 
Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B", 
"C", "D", "F", "G"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L)), structure(list(Col1 = structure(c(1L, 4L, 6L, 5L, 2L, 
3L), .Label = c("A", "W", "H", "M", "T", "U"), class = "factor"), 
Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B", 
"C", "D", "S", "G"), class = "factor")), class = "data.frame", row.names = c(NA, 
-6L)))

I need to identify the pairs that are repeated across entries. That is, start with the first entry and go through all of its pairs, then go to the second entry and check whether any of its pairs repeats one from the first entry.

For the third entry, do the same search, but check the first and second entries AT THE SAME TIME: the pair in the third entry must be present in both the first and the second entries.

A pair that appears only in the first and third entries does not interest me, and neither does one that appears only in the second and third.

** That is, the result should be the pairs A B and E F. **

Notice that the pair H G repeats only in the second and third entries, so this pair does not interest me.

The order matters, and it is critical that the pairs in the result belong to the first entry. The best result is a pair that belongs to all entries, which is the case for A B. A pair like E F would be the second-best option.

I'd like to be able to save them in a vector of text elements (a character vector).

What function could be used to express this idea? Any suggestions?

Laura asked Aug 21 '18 20:08


1 Answer

Reduce(f = dplyr::intersect, x = w)
#   Col1 Col2
# 1    A    B
# 2    E    F
# Warning messages:
# 1: Column `Col1` joining factors with different levels, coercing to character vector 
# 2: Column `Col1` joining character vector and factor, coercing into character vector 
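
Since the question asks for a ranked result stored as text, here is a minimal base R sketch of a running intersection. It assumes the pairs can be compared as plain strings and rebuilds the three sample entries with character columns; the names `pairs`, `running`, `depth` and `ranked` are made up for illustration. With the sample data as posted, E F appears in the first two entries but not in the third, so an intersection across all three would be expected to keep only A B, while the running version exposes both levels.

```r
# Sample entries rebuilt as character columns (same pairs as the `w` in the question)
w <- list(
  data.frame(Col1 = c("A", "B", "C", "D", "E", "F"),
             Col2 = c("B", "C", "D", "C", "F", "G"), stringsAsFactors = FALSE),
  data.frame(Col1 = c("A", "M", "N", "P", "E", "H"),
             Col2 = c("B", "C", "D", "C", "F", "G"), stringsAsFactors = FALSE),
  data.frame(Col1 = c("A", "M", "U", "T", "W", "H"),
             Col2 = c("B", "C", "D", "C", "S", "G"), stringsAsFactors = FALSE)
)

# One text element per row, e.g. "A B"
pairs <- lapply(w, function(d) paste(d$Col1, d$Col2))

# Running intersection: element k holds the pairs present in ALL of entries 1..k
running <- Reduce(intersect, pairs, accumulate = TRUE)

# For each pair of the first entry, count how many leading entries contain it
depth <- vapply(pairs[[1]],
                function(p) sum(vapply(running, function(r) p %in% r, logical(1))),
                integer(1))

# Keep pairs shared with at least the second entry, best (deepest) first
ranked <- sort(depth[depth >= 2], decreasing = TRUE)
result <- names(ranked)   # character vector of pairs
print(result)
```

Because every step starts from the first entry, a pair that only shows up in the second and third entries (such as H G) can never enter the result. `paste()` also converts factor columns to their labels, so the same calls should work on the original factor-based list.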

I would expect the Reduce() call to run faster if the list holds data.tables instead of data.frames and you use fintersect in place of dplyr::intersect. If you are reading from files, lapply(your_files, fread) will create the data.tables very quickly, and it also avoids the factor issue (which dplyr handles with a warning), since fread reads strings as character columns by default.
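
A rough sketch of the data.table variant, assuming the data.table package is installed and that the columns are character (as they would be after fread). The list `w_dt` is a hypothetical stand-in for the real data, rebuilt from the sample pairs in the question; no timing is claimed here.

```r
library(data.table)

# Hypothetical stand-in for lapply(your_files, fread): same pairs as the sample list
w_dt <- list(
  data.table(Col1 = c("A", "B", "C", "D", "E", "F"),
             Col2 = c("B", "C", "D", "C", "F", "G")),
  data.table(Col1 = c("A", "M", "N", "P", "E", "H"),
             Col2 = c("B", "C", "D", "C", "F", "G")),
  data.table(Col1 = c("A", "M", "U", "T", "W", "H"),
             Col2 = c("B", "C", "D", "C", "S", "G"))
)

# Rows present in both the first and the second entry
common_first_two <- fintersect(w_dt[[1]], w_dt[[2]])

# Rows present in every entry of the list
common_all <- Reduce(fintersect, w_dt)

# Store the result as a character vector of pairs
as_text <- paste(common_all$Col1, common_all$Col2)
print(common_first_two)
print(as_text)
```

Passing `accumulate = TRUE` to `Reduce()` would keep the intermediate intersections too, which is one way to tell pairs shared by all entries from those shared only by the first few.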

Gregor Thomas answered Nov 03 '22 05:11