This is my code. It is a list of data frames. My actual list is much bigger: this example has 3 entries, while mine has more than 1,000.
w=list(structure(list(Col1 = structure(1:6, .Label = c("A", "B",
"C", "D", "E", "F"), class = "factor"), Col2 = structure(c(1L,
2L, 3L, 2L, 4L, 5L), .Label = c("B", "C", "D", "F", "G"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L)), structure(list(Col1 = structure(c(1L, 4L, 5L, 6L, 2L,
3L), .Label = c("A", "E", "H", "M", "N", "P"), class = "factor"),
Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B",
"C", "D", "F", "G"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L)), structure(list(Col1 = structure(c(1L, 4L, 6L, 5L, 2L,
3L), .Label = c("A", "W", "H", "M", "T", "U"), class = "factor"),
Col2 = structure(c(1L, 2L, 3L, 2L, 4L, 5L), .Label = c("B",
"C", "D", "S", "G"), class = "factor")), class = "data.frame", row.names = c(NA,
-6L)))
What I need is to identify the pairs that are repeated across entries. That is, it takes the first entry and goes through all its pairs, then goes to the second entry and checks whether any of those pairs is repeated there.
In the third entry it does the same search, looking at the first and second entries AT THE SAME TIME and checking whether a pair in the 3rd entry is also present in both the 1st and 2nd entries.
A pair that appears only in the first and third entries does not interest me, and neither does one that appears only in the second and third.
**That is, it should give me the pairs A B and E F.**
Notice that the pair H G is repeated only in the second and third entries, so this pair does not interest me.
The order matters, and it is critical that the pairs in the result belong to the first entry. The best result would be pairs that belong to all entries, which is the case for A B. The case of E F would be the second-best option.
I'd like to be able to save them in a character vector.
What function could be used to express this idea? Any suggestions?
Reduce(f = dplyr::intersect, x = w)
# Col1 Col2
# 1 A B
# 2 E F
# Warning messages:
# 1: Column `Col1` joining factors with different levels, coercing to character vector
# 2: Column `Col1` joining character vector and factor, coercing into character vector
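An intersection over the whole list only keeps pairs found in every entry, so on its own it does not rank the first entry's pairs by how far they carry through the list, which is what the question asks for. Here is a rough base-R sketch of that ranking. It assumes the sample data decodes to the character values shown, and the names pair_keys, running, depth and ranked are made up for illustration.

```r
# Sample data from the question, decoded to plain character columns (assumption).
w <- list(
  data.frame(Col1 = c("A", "B", "C", "D", "E", "F"),
             Col2 = c("B", "C", "D", "C", "F", "G")),
  data.frame(Col1 = c("A", "M", "N", "P", "E", "H"),
             Col2 = c("B", "C", "D", "C", "F", "G")),
  data.frame(Col1 = c("A", "M", "U", "T", "W", "H"),
             Col2 = c("B", "C", "D", "C", "S", "G"))
)

# One character vector of "Col1 Col2" labels per entry.
pair_keys <- lapply(w, function(d) {
  paste(as.character(d$Col1), as.character(d$Col2))
})

# Running intersection: element k holds the pairs present in entries 1..k.
running <- Reduce(intersect, pair_keys, accumulate = TRUE)

# For each pair of the first entry, count how many leading entries contain it.
depth <- vapply(pair_keys[[1]], function(p) {
  sum(vapply(running, function(r) p %in% r, logical(1)))
}, integer(1))

# Keep pairs shared with at least the second entry, deepest first.
shared <- depth[depth >= 2]
ranked <- names(sort(shared, decreasing = TRUE))
ranked
# "A B" "E F"
```

ranked is a plain character vector. A pair whose depth equals length(w) is present in every entry, and lower depths mark the second-best, third-best options and so on.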
I would assume the intersection will go faster if you use data.tables instead of data.frames in your list and then use fintersect. If you are reading from files, lapply(your_files, fread) will create data.tables very quickly (and also avoid the factor issues that get taken care of with a warning).
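A minimal sketch of the data.table variant might look like the following. It assumes the data.table package is installed and that the columns are plain character (as fread would produce); the names dts, common_all and common_first_two are only illustrative, and no speed-up has been measured here.

```r
library(data.table)

# Sample data from the question, decoded to character columns (assumption).
w <- list(
  data.frame(Col1 = c("A", "B", "C", "D", "E", "F"),
             Col2 = c("B", "C", "D", "C", "F", "G")),
  data.frame(Col1 = c("A", "M", "N", "P", "E", "H"),
             Col2 = c("B", "C", "D", "C", "F", "G")),
  data.frame(Col1 = c("A", "M", "U", "T", "W", "H"),
             Col2 = c("B", "C", "D", "C", "S", "G"))
)

# Convert every entry to a data.table.
dts <- lapply(w, as.data.table)

# Rows (pairs) present in every entry of the list.
common_all <- Reduce(fintersect, dts)

# Rows shared by the first two entries only.
common_first_two <- fintersect(dts[[1]], dts[[2]])

# Collapse the result into a character vector of pair labels.
pairs_vec <- paste(common_first_two$Col1, common_first_two$Col2)
```

Passing accumulate = TRUE to Reduce would keep the intermediate intersections as well, which is one way to see how far each pair carries through the list.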