How can I do a network analysis on three fields simultaneously in R? Below is sample data, with the desired output in the last column.
df <- data.frame(
stringsAsFactors = FALSE,
id_1 = c("ABC","ABC","BCD",
"CDE","DEF","EFG","GHI","HIJ","IJK","JKL",
"GHI","KLM","LMN","MNO","NOP"),
id_2 = c("1A","2A","3A",
"1A","4A","5A","6A","8A","9A","10A","7A",
"12A","13A","14A","15A"),
id_3 = c("Z3","Z2","Z1",
"Z4","Z1","Z5","Z5","Z6","Z7","Z8","Z6","Z8",
"Z9","Z9","Z1"),
Name = c("StackOverflow1",
"StackOverflow2","StackOverflow3","StackOverflow4",
"StackOverflow5","StackOverflow6",
"StackOverflow7","StackOverflow8","StackOverflow9",
"StackOverflow10","StackOverflow11","StackOverflow12",
"StackOverflow13","StackOverflow14","StackOverflow15"),
desired_output = c(1L,1L,2L,1L,2L,
3L,3L,3L,4L,5L,3L,5L,6L,6L,2L)
)
df
#> id_1 id_2 id_3 Name desired_output
#> 1 ABC 1A Z3 StackOverflow1 1
#> 2 ABC 2A Z2 StackOverflow2 1
#> 3 BCD 3A Z1 StackOverflow3 2
#> 4 CDE 1A Z4 StackOverflow4 1
#> 5 DEF 4A Z1 StackOverflow5 2
#> 6 EFG 5A Z5 StackOverflow6 3
#> 7 GHI 6A Z5 StackOverflow7 3
#> 8 HIJ 8A Z6 StackOverflow8 3
#> 9 IJK 9A Z7 StackOverflow9 4
#> 10 JKL 10A Z8 StackOverflow10 5
#> 11 GHI 7A Z6 StackOverflow11 3
#> 12 KLM 12A Z8 StackOverflow12 5
#> 13 LMN 13A Z9 StackOverflow13 6
#> 14 MNO 14A Z9 StackOverflow14 6
#> 15 NOP 15A Z1 StackOverflow15 2
Actually, I can already perform network analysis on 2 fields simultaneously using igraph, as described in my own answer here, but I am unable to do it on 3 fields.
Please help.
My present approach takes 2 iterations, which I have a feeling can be optimised:
library(igraph)
library(tidyverse)
graph.data.frame(df) %>%
components() %>%
pluck(membership) %>%
stack() %>%
set_names(c('GRP', 'id_1')) %>%
right_join(df %>% mutate(id_1 = as.factor(id_1)), by = c('id_1')) %>%
select(GRP, id_3) %>%
graph.data.frame() %>%
components() %>%
pluck(membership) %>%
stack() %>%
set_names(c('GRP', 'id_3')) %>%
right_join(df %>% mutate(id_3 = as.factor(id_3)), by = c('id_3'))
#> GRP id_3 id_1 id_2 Name desired_output
#> 1 1 Z3 ABC 1A StackOverflow1 1
#> 2 1 Z2 ABC 2A StackOverflow2 1
#> 3 2 Z1 BCD 3A StackOverflow3 2
#> 4 2 Z1 DEF 4A StackOverflow5 2
#> 5 2 Z1 NOP 15A StackOverflow15 2
#> 6 1 Z4 CDE 1A StackOverflow4 1
#> 7 3 Z5 EFG 5A StackOverflow6 3
#> 8 3 Z5 GHI 6A StackOverflow7 3
#> 9 3 Z6 HIJ 8A StackOverflow8 3
#> 10 3 Z6 GHI 7A StackOverflow11 3
#> 11 4 Z7 IJK 9A StackOverflow9 4
#> 12 5 Z8 JKL 10A StackOverflow10 5
#> 13 5 Z8 KLM 12A StackOverflow12 5
#> 14 6 Z9 LMN 13A StackOverflow13 6
#> 15 6 Z9 MNO 14A StackOverflow14 6
Created on 2021-11-15 by the reprex package (v2.0.1)
Create a list of all connections between the vertices defined by the id columns and the row number (function f). In the end you are only interested in connections between rows.
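library(igraph)
library(tidyverse)

# vec = c(id_1, id_2, ..., row): the id values of one row followed by its row number
f <- function(vec){
  i <- last(vec)          # row number, used as the row vertex name
  vec <- head(vec, -1)    # the id values of that row
  c(
    # edges between neighbouring ids: id_1-id_2, id_2-id_3, ...
    seq_len(length(vec) - 1) %>% map(~vec[.x:(.x+1)]),
    # edges between the row vertex and every id in that row
    vec %>% map(~c(i, .x))
  )
}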
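df$desired_output <- df %>%
  select(matches("^id_[0-9]+$")) %>%      # all id columns, however many there are
  mutate(row = row_number()) %>%          # append the row number as the last value
  pmap(~f(c(...))) %>%                    # list of edges for each row
  flatten() %>%                           # one list of 2-element edges
  reduce(rbind) %>%                       # two-column character edge matrix
  igraph::graph_from_edgelist() %>%
  components() %>%                        # weakly connected components
  membership() %>%
  .[as.character(seq_len(nrow(df)))]      # keep only the row vertices, in row order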
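If you want to check the result against the question's expected groups before overwriting desired_output, you can write it to a separate column instead. A minimal sketch, assuming dplyr, purrr and igraph are installed; the column name grp is hypothetical, and only the grouping is compared, not the numeric labels.

```r
library(igraph)
library(dplyr)
library(purrr)

# the question's sample data (Name column omitted)
df <- data.frame(
  stringsAsFactors = FALSE,
  id_1 = c("ABC","ABC","BCD","CDE","DEF","EFG","GHI","HIJ","IJK","JKL",
           "GHI","KLM","LMN","MNO","NOP"),
  id_2 = c("1A","2A","3A","1A","4A","5A","6A","8A","9A","10A","7A",
           "12A","13A","14A","15A"),
  id_3 = c("Z3","Z2","Z1","Z4","Z1","Z5","Z5","Z6","Z7","Z8","Z6","Z8",
           "Z9","Z9","Z1"),
  desired_output = c(1L,1L,2L,1L,2L,3L,3L,3L,4L,5L,3L,5L,6L,6L,2L)
)

# f as defined in the answer
f <- function(vec){
  i <- last(vec)
  vec <- head(vec, -1)
  c(
    seq_len(length(vec) - 1) %>% map(~vec[.x:(.x+1)]),
    vec %>% map(~c(i, .x))
  )
}

# same pipeline, written to a new column so desired_output stays intact
df$grp <- df %>%
  select(matches("^id_[0-9]+$")) %>%
  mutate(row = row_number()) %>%
  pmap(~f(c(...))) %>%
  flatten() %>%
  reduce(rbind) %>%
  graph_from_edgelist() %>%
  components() %>%
  membership() %>%
  .[as.character(seq_len(nrow(df)))]

# cross-tabulate: each grp value should pair with exactly one desired_output value
table(as.integer(df$grp), df$desired_output)
```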
Edit:
Think of the ids as vertices that are connected to each other. What you are really interested in are connections between rows, so you add one vertex per row and connect it to every id in that row.
Example for the 6th row:
6 EFG 5A Z5
we are interested in connections between ids (the first part of c() in function f):
[[1]]
[1] "EFG" "5A"
[[2]]
[1] "5A" "Z5"
and connections between the row and its ids (the second part of c() in f):
[[1]]
[1] "6" "EFG"
[[2]]
[1] "6" "5A"
[[3]]
[1] "6" "Z5"
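To reproduce the two lists above for a single row, you can call f directly on that row's values. A minimal sketch, assuming dplyr (for last() and %>%) and purrr (for map()) are installed; f is copied unchanged from the answer, and edges_row6 is just an illustrative name.

```r
library(dplyr)   # last(), %>%
library(purrr)   # map()

# f as defined in the answer
f <- function(vec){
  i <- last(vec)
  vec <- head(vec, -1)
  c(
    seq_len(length(vec) - 1) %>% map(~vec[.x:(.x+1)]),
    vec %>% map(~c(i, .x))
  )
}

# row 6 of the sample data: id_1, id_2, id_3, then the row number
edges_row6 <- f(c("EFG", "5A", "Z5", "6"))
str(edges_row6)
```

The first two elements are the id-id edges and the last three are the row-id edges.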
When you create the graph that way, it contains both the id vertices and the row vertices, and you are interested in which row vertices end up in the same connected component.
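Because every id in a row is already linked to the others through that row's vertex, the id-id edges from the first part of f do not change which rows end up together; the row-id edges alone are enough. The sketch below is my own simplified variant, not the answer's code: it uses only base R plus igraph, assumes (as the answer does) that equal values in different id columns are the same vertex, and gives row vertices a hypothetical "row_" prefix so a row number can never collide with an id value.

```r
library(igraph)

# the question's sample data (Name column omitted)
df <- data.frame(
  stringsAsFactors = FALSE,
  id_1 = c("ABC","ABC","BCD","CDE","DEF","EFG","GHI","HIJ","IJK","JKL",
           "GHI","KLM","LMN","MNO","NOP"),
  id_2 = c("1A","2A","3A","1A","4A","5A","6A","8A","9A","10A","7A",
           "12A","13A","14A","15A"),
  id_3 = c("Z3","Z2","Z1","Z4","Z1","Z5","Z5","Z6","Z7","Z8","Z6","Z8",
           "Z9","Z9","Z1"),
  desired_output = c(1L,1L,2L,1L,2L,3L,3L,3L,4L,5L,3L,5L,6L,6L,2L)
)

id_cols <- grep("^id_[0-9]+$", names(df), value = TRUE)
row_vertex <- paste0("row_", seq_len(nrow(df)))   # hypothetical "row_" prefix

# one edge per (row, id) pair: unlist() stacks the id columns one after
# another, so the row labels are repeated once per id column to match
edges <- cbind(
  rep(row_vertex, times = length(id_cols)),
  unlist(df[id_cols], use.names = FALSE)
)

g <- graph_from_edgelist(edges, directed = FALSE)
memb <- components(g)$membership

# component of each row vertex, in row order
df$grp <- memb[match(row_vertex, V(g)$name)]
df[, c(id_cols, "grp", "desired_output")]
```

With the sample data, grp splits the rows into the same six groups as desired_output, and adding more id columns only requires that their names match the ^id_[0-9]+$ pattern.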
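Note: components() uses mode = "weak" by default, which ignores edge direction, so you get the same result if you create the graph with directed = FALSE. Use mode = "strong" in components() only if you are interested in components that respect edge direction.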
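A tiny illustration of the note, using a made-up edge list of two row vertices that share one id, oriented row-to-id the way f orients them: the default weak mode and an undirected graph both see a single component, while mode = "strong" puts every vertex in its own component because these edges never form a directed cycle.

```r
library(igraph)

# two row vertices pointing at the same id
el <- rbind(c("1", "ABC"), c("2", "ABC"))

g_dir <- graph_from_edgelist(el)                    # directed (the default)
g_und <- graph_from_edgelist(el, directed = FALSE)

n_weak   <- components(g_dir)$no                    # mode = "weak" is the default
n_undir  <- components(g_und)$no
n_strong <- components(g_dir, mode = "strong")$no

c(weak = n_weak, undirected = n_undir, strong = n_strong)
```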