Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count unique values of a column by pairwise combinations of another column and group by third column in R

Quite a complicated task, to be honest. It's basically an extension of a question I asked earlier - Count unique values of a column by pairwise combinations of another column in R

Let's say this time around, I have the following data frame in R:

data.frame(Reg.ID = c(1,1,2,2,2,3,3), Location = c("X","X","Y","Y","Y","X","X"), Product = c("A","B","A","B","C","B","A"))

The data looks like this -

      Reg.ID Location Product
1      1        X       A
2      1        X       B
3      2        Y       A
4      2        Y       B
5      2        Y       C
6      3        X       B
7      3        X       A

I would like to count unique values of the column "Reg.ID" by pairwise combinations of the values in column "Product", grouped by the column "Location". The result should look like this -

  Location Prod.Comb Count
1        X       A,B     2
2        Y       A,B     1
3        Y       A,C     1
4        Y       B,C     1

I tried getting the output using base R functions, but didn't get any success. I'm guessing there's a fairly simple solution using data.tablepackage in R?

Any help would be greatly appreciated. Thanks!

like image 233
sharmanas Avatar asked Feb 13 '17 22:02

sharmanas


2 Answers

Not much tested idea, but this is what comes to mind first with data.table:

library(data.table)
dt <- data.table(Reg.ID = c(1,1,2,2,2,3,3), Location = c("X","X","Y","Y","Y","X","X"), Product = c("A","B","A","B","C","B","A"))
dt.cj <- merge(dt, dt, by ="Location", all = T, allow.cartesian = T)
dt.res <- dt.cj[Product.x < Product.y, .(cnt = length(unique(Reg.ID.x))),by = .(Location, Product.x, Product.y)]


#    Location Product.x Product.y cnt
# 1:        X         A         B  2
# 2:        Y         A         B  1
# 3:        Y         A         C  1
# 4:        Y         B         C  1
like image 140
Bulat Avatar answered Sep 28 '22 01:09

Bulat


A dplyr solution, plagiarized from the question you mention:

library(dplyr)

df <- data.frame(Reg.ID = c(1,1,2,2,2,3,3), 
                 Location = c("X","X","Y","Y","Y","X","X"), 
                 Product = c("A","B","A","B","C","B","A"),
                 stringsAsFactors = FALSE)

df %>%
  full_join(df, by="Location") %>%
  filter(Product.x < Product.y) %>%
  group_by(Location, Product.x, Product.y) %>%
  summarise(Count = length(unique(Reg.ID.x))) %>%
  mutate(Prod.Comb = paste(Product.x, Product.y, sep=",")) %>%
  ungroup %>%
  select(Location, Prod.Comb, Count) %>%
  arrange(Location, Prod.Comb)

# # A tibble: 4 × 3
#   Location Prod.Comb Count
#      <chr>     <chr> <int>
# 1        X       A,B     2
# 2        Y       A,B     1
# 3        Y       A,C     1
# 4        Y       B,C     1
like image 24
Scarabee Avatar answered Sep 28 '22 01:09

Scarabee