
How to calculate common values across different groups?

Tags: r, dplyr, igraph

I am trying to build a data frame for network charts with the igraph package. I have the sample data mydata_data and want to produce expected_data.

I can easily calculate the number of customers who visited a particular store, but how do I calculate the number of customers common to a pair of stores, e.g. those who go to both store x1 and store x2?

I have 500+ stores, so I don't want to create the columns manually. Reproducible sample data is given below:

mydata_data<-data.frame(
  Customer_Name=c("A","A","C","D","D","B"),
  Store_Name=c("x1","x2","x2","x2","x3","x1"))

expected_data<-data.frame(
 Store_Name=c("x1","x2","x3","x1_x2","x2_x3","x1_x3"), 
 Customers_Visited=c(2,3,1,1,1,0))
Yogesh Kumar asked Dec 13 '18 06:12


2 Answers

Another possible solution via dplyr: for each customer, build a list holding every store visited plus the combined label of those stores, unnest that list, count each label, and join the counts onto a data frame containing all single stores and all store pairs, i.e.

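library(tidyverse)

df <- mydata_data  # the sample data from the question

df %>%
    group_by(Customer_Name) %>%
    # per customer: each store visited plus the joined label, e.g. "x1_x2"
    summarise(combos = list(unique(c(unique(Store_Name), paste(unique(Store_Name), collapse = '_'))))) %>%
    unnest(combos) %>%
    group_by(combos) %>%
    count() %>%
    # bring in every single store and store pair; unmatched ones get NA
    right_join(data.frame(combos = c(unique(df$Store_Name), combn(unique(df$Store_Name), 2, paste, collapse = '_'))),
               by = "combos")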

which gives,

# A tibble: 6 x 2
# Groups:   combos [?]
  combos     n
  <chr>  <int>
1 x1         2
2 x2         3
3 x3         1
4 x1_x2      1
5 x1_x3     NA
6 x2_x3      1

NOTE: Make sure that your Store_Name variable is a character vector, NOT a factor; otherwise combn() will fail.
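One caveat with the pipeline above: paste(..., collapse = '_') joins all of a customer's stores into a single label, so a customer who visits three stores would produce "x1_x2_x3" rather than the three pairs. The sample data has no such customer, so the result shown is unaffected. The following is a minimal base-R sketch of a pairwise variant, not part of the original answer; store_labels is a hypothetical helper name, and sorting store names alphabetically to get consistent pair labels is an assumption.

```r
mydata_data <- data.frame(
  Customer_Name = c("A", "A", "C", "D", "D", "B"),
  Store_Name    = c("x1", "x2", "x2", "x2", "x3", "x1"))

# Hypothetical helper: the single stores of one customer plus every pair of them
store_labels <- function(stores) {
  s <- sort(unique(as.character(stores)))
  if (length(s) < 2) return(s)  # combn() needs at least two elements
  c(s, combn(s, 2, paste, collapse = "_"))
}

# Labels contributed by each customer, flattened into one vector
labels <- unlist(lapply(split(mydata_data$Store_Name, mydata_data$Customer_Name),
                        store_labels))

# All single stores and all pairs, so unvisited pairs are counted as 0
all_stores <- sort(unique(as.character(mydata_data$Store_Name)))
all_labels <- c(all_stores, combn(all_stores, 2, paste, collapse = "_"))

counts <- table(factor(labels, levels = all_labels))
counts
```

Using a factor with explicit levels means pairs without a common customer appear with a count of 0 instead of NA.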

Sotos answered Oct 12 '22 07:10


Here's an igraph approach:

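library(igraph)

# Dense adjacency matrix of the undirected customer-store graph
g <- graph_from_edgelist(as.matrix(mydata_data), directed = FALSE)
A <- as_adjacency_matrix(g, sparse = FALSE)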
stores <- as.character(unique(mydata_data$Store_Name))
storeCombs <- t(combn(stores, 2))

data.frame(Store_Name = c(stores, apply(storeCombs, 1, paste, collapse = "_")),
           Customers_Visited = c(colSums(A)[stores], (A %*% A)[storeCombs]))
#   Store_Name Customers_Visited
# 1         x1                 2
# 2         x2                 3
# 3         x3                 1
# 4      x1_x2                 1
# 5      x1_x3                 0
# 6      x2_x3                 1

Explanation: A is the adjacency matrix of the corresponding undirected graph. stores is simply

stores
# [1] "x1" "x2" "x3"

while

storeCombs
#      [,1] [,2]
# [1,] "x1" "x2"
# [2,] "x1" "x3"
# [3,] "x2" "x3"

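The main trick then is how to obtain Customers_Visited. The first three numbers are just the numbers of neighbours of each store, i.e. the column sums of A. The common customers of two stores are their common graph neighbours, which we read from the square of A: entry (i, j) of A %*% A counts the walks of length two from i to j, and between two stores every such walk passes through a customer adjacent to both.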
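To see why the square of A yields the shared customers, it can help to look at the customer-by-store incidence matrix M directly. For this bipartite graph the store-by-store block of A %*% A equals t(M) %*% M. The following is a minimal base-R sketch of that view, not part of the original answer; the names M, co and pairs are chosen here for illustration, and unique() is an assumption to guard against repeated visits by the same customer.

```r
mydata_data <- data.frame(
  Customer_Name = c("A", "A", "C", "D", "D", "B"),
  Store_Name    = c("x1", "x2", "x2", "x2", "x3", "x1"))

# Customer-by-store incidence matrix (1 = customer visited store)
M <- unclass(table(unique(mydata_data)))

# t(M) %*% M: diagonal = customers per store, off-diagonal = shared customers
co <- crossprod(M)

stores <- colnames(co)
pairs  <- t(combn(stores, 2))  # one row per store pair

result <- data.frame(
  Store_Name        = c(stores, paste(pairs[, 1], pairs[, 2], sep = "_")),
  Customers_Visited = c(unname(diag(co)), co[pairs]))
result
```

Indexing co with the two-column character matrix pairs picks one entry per row of pairs, the same matrix-indexing idiom that (A %*% A)[storeCombs] relies on.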

Julius Vainora answered Oct 12 '22 06:10