I am trying to create a data frame for creating network charts using igraph package. I have sample data "mydata_data" and I want to create "expected_data".
I can easily calculate number of customers visited a particular store, but how do I calculate common set of customers who go to store x1 & store x2 etc.
I have 500+ stores, so I don't want to create columns manually. Sample data for reproducible purpose given below:
mydata_data<-data.frame(
Customer_Name=c("A","A","C","D","D","B"),
Store_Name=c("x1","x2","x2","x2","x3","x1"))
expected_data<-data.frame(
Store_Name=c("x1","x2","x3","x1_x2","x2_x3","x1_x3"),
Customers_Visited=c(2,3,1,1,1,0))
Another possible solution via dplyr is to create a list with all the combos for each customer, unnest that list, count and merge with a data frame with all the combinations, i.e.
library(tidyverse)
df %>%
group_by(Customer_Name) %>%
summarise(combos = list(unique(c(unique(Store_Name), paste(unique(Store_Name), collapse = '_'))))) %>%
unnest() %>%
group_by(combos) %>%
count() %>%
right_join(data.frame(combos = c(unique(df$Store_Name), combn(unique(df$Store_Name), 2, paste, collapse = '_'))))
which gives,
# A tibble: 6 x 2 # Groups: combos [?] combos n <chr> <int> 1 x1 2 2 x2 3 3 x3 1 4 x1_x2 1 5 x1_x3 NA 6 x2_x3 1
NOTE: Make sure that your Store_Name variable is a character NOT factor, otherwise the combn() will fail
Here's an igraph approach:
A <- as.matrix(as_adj(graph_from_edgelist(as.matrix(mydata_data), directed = FALSE)))
stores <- as.character(unique(mydata_data$Store_Name))
storeCombs <- t(combn(stores, 2))
data.frame(Store_Name = c(stores, apply(storeCombs, 1, paste, collapse = "_")),
Customers_Visited = c(colSums(A)[stores], (A %*% A)[storeCombs]))
# Store_Name Customers_Visited
# 1 x1 2
# 2 x2 3
# 3 x3 1
# 4 x1_x2 1
# 5 x1_x3 0
# 6 x2_x3 1
Explanation: A is the adjacency matrix of the corresponding undirected graph. stores is simply
stores
# [1] "x1" "x2" "x3"
while
storeCombs
# [,1] [,2]
# [1,] "x1" "x2"
# [2,] "x1" "x3"
# [3,] "x2" "x3"
The main trick then is how to obtain Customers_Visited: the first three numbers are just the corresponding numbers of neighbours of stores, while the common customers we get from the common graph neighbours (which we get from the square of A).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With