Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Grouping Ids based on at least one common values

I have a list whose elements are integers and I would like to accumulate these elements if only they share at least one value. With regard to those elements that don't share any values with the rest I would like them to stay as they are. Here is my sample date:

x <- list(c(1, 2), c(1, 2, 3), c(2, 3, 4), c(3, 4, 5), c(4, 5, 8), c(6, 9, 7), 7, c(5, 8), 10, 11)

And here is my desired output:

desired_reult <- list(c(1, 2, 3, 4, 5, 8), 
                      c(6, 9, 7), 
                      10, 
                      11)

I would like to do it first with reduce or accumulate functions from purrr but any other tidyverse solution would be welcomed. I have tried this solution so far but it only gave me one union and apparently abandons the rest:

x %>% 
  reduce(~ if(any(.x %in% .y)) union(.x, .y) else .x)

[1] 1 2 3 4 5 8

In general I am looking for a way of grouping integers (ids) with common values like a sort of clustering but so far my efforts have been in vain unfortunately.

Thank you very much indeed for your help in advance.

like image 774
Anoushiravan R Avatar asked Jun 14 '21 13:06

Anoushiravan R


1 Answers

I suspect there's a set covering solution to be had, but in the interim here's a graph approach:

First, let's convert the integer vectors to an edge list so it can be made into a graph. We can use expand.grid.

library(igraph)
edgelist <- do.call(rbind,lapply(x,\(x)expand.grid(x,x))) #R version >= 4.1.0

Now we have a two column data.frame showing the connections between all the integers (a set of edges).

igraph::graph.data.frame can conveniently make a graph from this.

From there we can use igraph::components to extract the connected components.

g <- graph.data.frame(edgelist)
split(names(components(g)$membership),components(g)$membership)
#$`1`
#[1] "1" "2" "3" "4" "5" "8"
#$`2`
#[1] "6" "9" "7"
#$`3`
#[1] "10"
#$`4`
#[1] "11"

Or with Tidyverse:

library(dplyr); library(purrr)
map_dfr(x, ~expand.grid(.x,.x)) %>%
  graph.data.frame() %>%
  components() %>% 
  pluck(membership) %>%
  stack() %>%
  {split(as.numeric(as.character(.[,2])),.[,1])}

$`1`
[1] 1 2 3 4 5 8

$`2`
[1] 6 9 7

$`3`
[1] 10

$`4`
[1] 11
like image 97
Ian Campbell Avatar answered Oct 05 '22 12:10

Ian Campbell