I have a list like this:
[[1]]
[1] 7
[[2]]
[1] 10 11 12 211 446 469
[[3]]
[1] 10 11 12 13
[[4]]
[1] 11 12 13 215
[[5]]
[1] 15 16
[[6]]
[1] 15 17 216 225
I want to merge list slices that have common elements, and record which list slices were merged. My desired output is below.
$`1`
[1] 7
$`2`, `3`, `4`
[1] 10 11 12 13 211 215 446 469
$`5`,`6`
[1] 15 16 17 216 225
(I've put the original list slice indices as new list names, but any form of output is fine.)
Reproducible data:
mylist <- list(7, c(10, 11, 12, 211, 446, 469), c(10, 11, 12, 13), c(11,
12, 13, 215), c(15, 16), c(15, 17, 216, 225))
Here is another approach using the "Matrix" and "igraph" packages.
First, we need to extract the information about which elements are connected. Using sparse matrices can potentially save a lot of memory:
library(Matrix)
i = rep(1:length(mylist), lengths(mylist))
j = factor(unlist(mylist))
tab = sparseMatrix(i = i, j = as.integer(j), x = TRUE, dimnames = list(NULL, levels(j)))
#as.matrix(tab) ## just to print colnames
# 7 10 11 12 13 15 16 17 211 215 216 225 446 469
#[1,] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[2,] FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE TRUE
#[3,] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[4,] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
#[5,] FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[6,] FALSE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE TRUE TRUE FALSE FALSE
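For readers without "Matrix" installed, here is a rough dense equivalent of "tab" in base R. It is only a sketch to show what the incidence matrix holds (rows are positions in "mylist", columns are the distinct values); "dense_tab" is a name I made up, and a dense matrix will not scale the way the sparse one does.

```r
# Same data as in the question
mylist <- list(7, c(10, 11, 12, 211, 446, 469), c(10, 11, 12, 13),
               c(11, 12, 13, 215), c(15, 16), c(15, 17, 216, 225))

# i: which list entry each value came from; j: the values themselves
i <- rep(seq_along(mylist), lengths(mylist))
j <- unlist(mylist)

# Cross-tabulate and turn the counts into TRUE/FALSE.
# 'dense_tab' is a hypothetical name, not part of the original answer.
dense_tab <- unclass(table(i, j)) > 0

dim(dense_tab)          # 6 entries by 14 distinct values
dense_tab["2", "211"]   # does entry 2 contain the value 211?
```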
Next, find which elements of "mylist" are connected to each other, i.e. share at least one value:
connects = tcrossprod(tab, boolArith = TRUE)
#connects
#6 x 6 sparse Matrix of class "lsCMatrix"
#
#[1,] | . . . . .
#[2,] . | | | . .
#[3,] . | | | . .
#[4,] . | | | . .
#[5,] . . . . | |
#[6,] . . . . | |
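To make the meaning of "connects" concrete: entry [a, b] is TRUE when list elements a and b have at least one value in common. The brute-force check below is only an illustration of that definition in base R (the name "shares" is mine), not a replacement for the sparse cross-product.

```r
# Same data as in the question
mylist <- list(7, c(10, 11, 12, 211, 446, 469), c(10, 11, 12, 13),
               c(11, 12, 13, 215), c(15, 16), c(15, 17, 216, 225))

n <- length(mylist)
shares <- matrix(FALSE, n, n)   # hypothetical name for the pairwise result

# shares[a, b] is TRUE when entries a and b have a non-empty intersection
for (a in seq_len(n)) {
  for (b in seq_len(n)) {
    shares[a, b] <- length(intersect(mylist[[a]], mylist[[b]])) > 0
  }
}

shares[2, 4]   # entries 2 and 4 both contain 11 and 12
shares[1, 2]   # entry 1 (just 7) shares nothing with entry 2
```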
Then, using graphs, we can group the indices of "mylist":
library(igraph)
# 'graph_from_adjacency_matrix' seems to not work with the "connects" object directly.
# An alternative to coercing "connects" here would be to build it as 'tcrossprod(tab) > 0'
group = clusters(graph_from_adjacency_matrix(as(connects, "lsCMatrix")))$membership
#group
#[1] 1 2 2 2 3 3
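If you would rather not depend on "igraph" for this step, the grouping can be sketched in base R by repeatedly letting each entry take the smallest label among the entries it is connected to, until nothing changes. This is a simple label-propagation idea that I am swapping in for illustration; it is not what "igraph" does internally, and "adj", "label" and "group_base" are names assumed here.

```r
# Same data as in the question
mylist <- list(7, c(10, 11, 12, 211, 446, 469), c(10, 11, 12, 13),
               c(11, 12, 13, 215), c(15, 16), c(15, 17, 216, 225))
n <- length(mylist)

# Dense adjacency: TRUE where two entries share at least one value
adj <- sapply(mylist, function(x) sapply(mylist, function(y) any(x %in% y)))

# Start with every entry in its own group, then propagate the minimum label
label <- seq_len(n)
repeat {
  new_label <- vapply(seq_len(n), function(k) min(label[adj[k, ]]), integer(1))
  if (all(new_label == label)) break   # no label changed: groups are stable
  label <- new_label
}

# Renumber the surviving labels as 1, 2, 3, ...
group_base <- match(label, unique(label))
group_base
```

On the example data this should give the same membership vector as the "igraph" call above.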
And, finally, concatenate:
tapply(mylist, group, function(x) sort(unique(unlist(x))))
#$`1`
#[1] 7
#
#$`2`
#[1] 10 11 12 13 211 215 446 469
#
#$`3`
#[1] 15 16 17 216 225
tapply(1:length(mylist), group, toString)
# 1 2 3
# "1" "2, 3, 4" "5, 6"
I'm not happy with this solution, but I think it gives the right answer. There is still scope for improvement:
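# 'lst' is defined under "data" below.
# lapply() always returns a list; sapply() could simplify to a matrix
# if every merged vector happened to have the same length.
unique(lapply(lst, function(x)
  unique(unlist(lst[sapply(lst, function(y)
    any(x %in% y))]))))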
#[[1]]
#[1] 7
#[[2]]
#[1] 10 11 12 211 446 469 13 215
#[[3]]
#[1] 15 16 17 216 225
This is basically a double loop that checks whether any value of one list element is present in any other list element. Elements that share a value are merged together, keeping only the unique values.
Data:
lst <- list(7, c(10 ,11 ,12, 211, 446, 469), c(10, 11, 12, 13),c(11 ,12, 13 ,215),
c(15, 16), c(15, 17 ,216 ,225))
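One thing to watch with the double loop: a single pass only merges elements that overlap directly. If two elements are linked only through a third one, a single pass leaves them in separate, partially merged groups. A possible fix, sketched below, is to repeat the pass until the result stops changing. The helper names "merge_once" and "merge_all" and the "chain" example are mine, made up for illustration.

```r
# Hypothetical example: the 1st and 3rd elements overlap only via the 2nd
chain <- list(c(1, 2), c(2, 3), c(3, 4))

# One pass of the double loop, with values sorted so results are comparable
merge_once <- function(l) {
  unique(lapply(l, function(x) {
    overlaps <- vapply(l, function(y) any(x %in% y), logical(1))
    sort(unique(unlist(l[overlaps])))
  }))
}

# Repeat the pass until nothing changes any more
merge_all <- function(l) {
  repeat {
    merged <- merge_once(l)
    if (identical(merged, l)) return(merged)
    l <- merged
  }
}

length(merge_once(chain))   # a single pass still leaves 3 overlapping groups
merge_all(chain)            # repeating collapses them into one group
```

On the data from the question, one pass is already enough, so "merge_all(lst)" should return the same three groups as above (with the values sorted).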