
How to do network analysis on three fields simultaneously in R

How can I do a network analysis on three fields simultaneously in R? Below is sample data, with the desired output in the last column.

df <- data.frame(
  stringsAsFactors = FALSE,
                    id_1 = c("ABC","ABC","BCD",
                             "CDE","DEF","EFG","GHI","HIJ","IJK","JKL",
                             "GHI","KLM","LMN","MNO","NOP"),
                    id_2 = c("1A","2A","3A",
                             "1A","4A","5A","6A","8A","9A","10A","7A",
                             "12A","13A","14A","15A"),
                    id_3 = c("Z3","Z2","Z1",
                             "Z4","Z1","Z5","Z5","Z6","Z7","Z8","Z6","Z8",
                             "Z9","Z9","Z1"),
                    Name = c("StackOverflow1",
                             "StackOverflow2","StackOverflow3","StackOverflow4",
                             "StackOverflow5","StackOverflow6",
                             "StackOverflow7","StackOverflow8","StackOverflow9",
                             "StackOverflow10","StackOverflow11","StackOverflow12",
                             "StackOverflow13","StackOverflow14","StackOverflow15"),
          desired_output = c(1L,1L,2L,1L,2L,
                             3L,3L,3L,4L,5L,3L,5L,6L,6L,2L)
      )
df
#>    id_1 id_2 id_3            Name desired_output
#> 1   ABC   1A   Z3  StackOverflow1              1
#> 2   ABC   2A   Z2  StackOverflow2              1
#> 3   BCD   3A   Z1  StackOverflow3              2
#> 4   CDE   1A   Z4  StackOverflow4              1
#> 5   DEF   4A   Z1  StackOverflow5              2
#> 6   EFG   5A   Z5  StackOverflow6              3
#> 7   GHI   6A   Z5  StackOverflow7              3
#> 8   HIJ   8A   Z6  StackOverflow8              3
#> 9   IJK   9A   Z7  StackOverflow9              4
#> 10  JKL  10A   Z8 StackOverflow10              5
#> 11  GHI   7A   Z6 StackOverflow11              3
#> 12  KLM  12A   Z8 StackOverflow12              5
#> 13  LMN  13A   Z9 StackOverflow13              6
#> 14  MNO  14A   Z9 StackOverflow14              6
#> 15  NOP  15A   Z1 StackOverflow15              2

I can already perform network analysis on 2 fields simultaneously using igraph, as described in my own answer here, but I am unable to do it on 3 fields.

Please help.

My present approach (2 iterations), which I have a feeling can be optimised:

library(igraph)
library(tidyverse)

graph.data.frame(df) %>%
  components() %>%
  pluck(membership) %>%
  stack() %>%
  set_names(c('GRP', 'id_1')) %>%
  right_join(df %>% mutate(id_1 = as.factor(id_1)), by = c('id_1')) %>%
  select(GRP, id_3) %>%
  graph.data.frame() %>% 
  components() %>%
  pluck(membership) %>%
  stack() %>%
  set_names(c('GRP', 'id_3')) %>%
  right_join(df %>% mutate(id_3 = as.factor(id_3)), by = c('id_3'))
#>    GRP id_3 id_1 id_2            Name desired_output
#> 1    1   Z3  ABC   1A  StackOverflow1              1
#> 2    1   Z2  ABC   2A  StackOverflow2              1
#> 3    2   Z1  BCD   3A  StackOverflow3              2
#> 4    2   Z1  DEF   4A  StackOverflow5              2
#> 5    2   Z1  NOP  15A StackOverflow15              2
#> 6    1   Z4  CDE   1A  StackOverflow4              1
#> 7    3   Z5  EFG   5A  StackOverflow6              3
#> 8    3   Z5  GHI   6A  StackOverflow7              3
#> 9    3   Z6  HIJ   8A  StackOverflow8              3
#> 10   3   Z6  GHI   7A StackOverflow11              3
#> 11   4   Z7  IJK   9A  StackOverflow9              4
#> 12   5   Z8  JKL  10A StackOverflow10              5
#> 13   5   Z8  KLM  12A StackOverflow12              5
#> 14   6   Z9  LMN  13A StackOverflow13              6
#> 15   6   Z9  MNO  14A StackOverflow14              6

Created on 2021-11-15 by the reprex package (v2.0.1)

AnilGoyal asked Jan 24 '23 05:01


1 Answer

Create a list of all connections between the vertices defined by the id columns and the row number (function f). In the end, you are interested only in the connections between rows.

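# Build the edges for one row: vec holds the row's ids followed by the row number
f <- function(vec){
  
  i <- last(vec)          # row number (last element)
  vec <- head(vec, -1)    # the ids of that row
  
  c(
    # edges between consecutive ids of the row
    seq_len(length(vec) - 1) %>% map(~vec[.x:(.x+1)]),
    # edges between the row vertex and each of its ids
    vec %>% map(~c(i, .x))
  ) 
}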

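df$desired_output <- df %>%
  select(matches("^id_[0-9]+$")) %>%   # keep only the id columns
  mutate(row = row_number()) %>%       # row number becomes the last column
  pmap(~f(c(...))) %>%                 # edges for every row
  flatten() %>%
  reduce(rbind) %>%                    # two-column edge matrix
  igraph::graph_from_edgelist() %>% 
  components() %>%
  membership() %>%
  .[as.character(seq_len(nrow(df)))]   # keep membership of the row vertices only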
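
As a variation on the code above (not the answer's exact method), the row-to-id edges alone are enough to link rows that share an id, so the id-to-id edges can be left out. The sketch below assumes the three id columns from the question. The `row_` and column-name prefixes are hypothetical additions that keep a row number from clashing with an id, and keep equal strings in different columns apart.

```r
library(igraph)

# Sample ids from the question (three id columns, 15 rows)
ids <- data.frame(
  id_1 = c("ABC","ABC","BCD","CDE","DEF","EFG","GHI","HIJ","IJK","JKL",
           "GHI","KLM","LMN","MNO","NOP"),
  id_2 = c("1A","2A","3A","1A","4A","5A","6A","8A","9A","10A","7A",
           "12A","13A","14A","15A"),
  id_3 = c("Z3","Z2","Z1","Z4","Z1","Z5","Z5","Z6","Z7","Z8","Z6","Z8",
           "Z9","Z9","Z1"),
  stringsAsFactors = FALSE
)

# One vertex per row; the "row_" prefix is an assumption of this sketch
row_v <- paste0("row_", seq_len(nrow(ids)))

# One edge per (row, id) pair, with ids prefixed by their column name
edges <- do.call(rbind, lapply(names(ids), function(col) {
  cbind(row_v, paste(col, ids[[col]], sep = ":"))
}))

g <- graph_from_edgelist(edges, directed = FALSE)

# Membership of the row vertices only
grp <- components(g)$membership[row_v]

# Renumber groups by first appearance so the labels do not depend on vertex order
group <- unname(match(grp, unique(grp)))
group
```

On the sample data this grouping is expected to agree with the desired_output column. If ids are meant to match across columns as well, drop the column-name prefix.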

edit

Imagine the connections between ids. What you are interested in, though, is connections between rows. For that you need to add a vertex for each row, connected to all ids in that row.

Example for the 6th row:

6  EFG   5A   Z5

We are interested in connections between ids (first part of c in function f):

[[1]]
[1] "EFG" "5A" 

[[2]]
[1] "5A" "Z5"

and in connections between the row and its ids (second part of c in f):

[[1]]
[1] "6"   "EFG"

[[2]]
[1] "6"  "5A"

[[3]]
[1] "6"  "Z5"

When you create the graph that way, you end up with:

[image: the resulting graph]

and you are interested in which row vertices are connected.
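
Which rows end up connected can also be cross-checked without building a graph. The sketch below swaps in a plain union-find over row indices in base R. It is not the igraph approach from this answer, and it assumes that ids are only compared within their own column.

```r
# Sample ids from the question
ids <- data.frame(
  id_1 = c("ABC","ABC","BCD","CDE","DEF","EFG","GHI","HIJ","IJK","JKL",
           "GHI","KLM","LMN","MNO","NOP"),
  id_2 = c("1A","2A","3A","1A","4A","5A","6A","8A","9A","10A","7A",
           "12A","13A","14A","15A"),
  id_3 = c("Z3","Z2","Z1","Z4","Z1","Z5","Z5","Z6","Z7","Z8","Z6","Z8",
           "Z9","Z9","Z1"),
  stringsAsFactors = FALSE
)

n <- nrow(ids)
parent <- seq_len(n)   # every row starts as the root of its own group

# Follow parent links until reaching a row that is its own parent
find_root <- function(i) {
  while (parent[i] != i) i <- parent[i]
  i
}

for (col in names(ids)) {
  # Rows sharing a value in this column are merged into one group
  for (rows in split(seq_len(n), ids[[col]])) {
    root <- find_root(rows[1])
    for (r in rows[-1]) parent[find_root(r)] <- root
  }
}

roots <- vapply(seq_len(n), find_root, integer(1))

# Label groups 1, 2, 3, ... in order of first appearance
group <- match(roots, unique(roots))
group
```

For data of this size the lack of path compression does not matter. On large data the igraph version is likely the better choice.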

note

You can pass directed = FALSE when creating the graph to get this result, or use mode = "strong" in components() if strongly connected components are what you are interested in.
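
A small sketch of the difference, using only the edges for rows 6 and 7 of the sample (they share Z5). The variable names are illustrative. The point is that weak components on the default directed graph ignore edge direction, so they give the same grouping as an undirected graph, while strong components require a directed path in both directions.

```r
library(igraph)

# Row-to-id edges for rows 6 and 7, which share the id Z5
el <- rbind(
  c("6", "EFG"), c("6", "5A"), c("6", "Z5"),
  c("7", "GHI"), c("7", "6A"), c("7", "Z5")
)

g_dir   <- graph_from_edgelist(el)                    # directed by default
g_undir <- graph_from_edgelist(el, directed = FALSE)

weak_dir   <- components(g_dir, mode = "weak")$membership
strong_dir <- components(g_dir, mode = "strong")$membership
undir      <- components(g_undir)$membership

# Are row vertices 6 and 7 in the same component?
same_weak   <- weak_dir[["6"]] == weak_dir[["7"]]      # direction ignored
same_undir  <- undir[["6"]] == undir[["7"]]
same_strong <- strong_dir[["6"]] == strong_dir[["7"]]  # needs paths both ways

c(weak = same_weak, undirected = same_undir, strong = same_strong)
```

Because every edge here points from a row vertex to an id, no two vertices can reach each other in both directions, so mode = "strong" would not group rows 6 and 7 together.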

det answered Feb 10 '23 03:02