Create table with all pairs of values from one column in R, counting unique values [duplicate]

Tags: r, reshape

I have data that shows which customers have purchased which items. A customer can purchase the same item multiple times. What I need is a table that shows all possible pairwise combinations of items, along with the number of unique customers who have purchased each combination (the diagonal of the table will just be the number of unique customers purchasing each item).

Here is an example:

item <- c("h","h","h","j","j")
customer <- c("a","a","b","b","b")
test.data <- data.frame(item,customer)

Here is the test.data:

item customer
h    a
h    a
h    b
j    b
j    b

Result needed: a table with the items as row and column names and, in each cell, the number of unique customers who purchased that pair. Here, 2 customers purchased item h, 1 purchased both h and j, and 1 purchased item j.

item   h    j
h      2    1
j      1    1

I have tried the table function, melt/cast, etc., but nothing gives me the counts I need within the table. My first step is using unique() to remove duplicate rows.
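Continuing from that unique() step, one base-R option (a sketch, not part of the original question or answer) is to build a customer-by-item 0/1 incidence matrix with table() and take its crossprod(). Entry [i, k] of t(M) %*% M counts the customers who bought both item i and item k, and the diagonal counts the customers who bought each item. The variable names dedup, incidence and pair.counts are illustrative only.

```r
item <- c("h","h","h","j","j")
customer <- c("a","a","b","b","b")
test.data <- data.frame(item, customer, stringsAsFactors = FALSE)

# Drop repeat purchases so each customer counts at most once per item
dedup <- unique(test.data)

# Customer-by-item incidence matrix: 1 if the customer bought the item, else 0
incidence <- table(dedup$customer, dedup$item)

# crossprod(M) is t(M) %*% M: [i, k] = number of customers who bought both i and k;
# the diagonal is the number of unique customers per item
pair.counts <- crossprod(incidence)
print(pair.counts)
```

Because the matrix is built from deduplicated rows, repeat purchases never inflate a count, and the result is symmetric, so h x j and j x h hold the same value.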

asked Oct 06 '15 18:10 by user1228982




1 Answer

Using data.table and the gtools package, we can generate every ordered pair of items (repeats allowed) for each customer:

library(data.table)
library(gtools)

item <- c("h","h","h","j","j")
customer <- c("a","a","b","b","b")
test.data <- data.table(item, customer)

# unique() is used because multiple purchases of the same item should not count twice
DT <- unique(test.data)

# All ordered pairs of a customer's items, including each item paired with itself
tuples <- function(x) {
  data.frame(permutations(length(x), 2, x, repeats.allowed = TRUE, set = FALSE),
             stringsAsFactors = FALSE)
}

DO <- DT[, tuples(item), by = customer]

This gives:

   customer X1 X2
1:        a  h  h
2:        b  h  h
3:        b  h  j
4:        b  j  h
5:        b  j  j

This is a list of every ordered item pair for each customer. As in your example, h x j is counted separately from j x h. We can now get the frequency of each pair with the table function:

table(DO$X1, DO$X2)
    h j
  h 2 1
  j 1 1
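If you would rather avoid the data.table and gtools dependencies, the same ordered-pair idea can be sketched in base R (this is an alternative, not the answer's code): split the deduplicated items by customer, use expand.grid() to form every ordered pair within each customer (the counterpart of permutations(..., repeats.allowed = TRUE) above), stack the pairs, and tabulate. The names pairs.by.customer and result are illustrative.

```r
item <- c("h","h","h","j","j")
customer <- c("a","a","b","b","b")
test.data <- data.frame(item, customer, stringsAsFactors = FALSE)
dedup <- unique(test.data)  # one row per (item, customer)

# For each customer, all ordered pairs of their items, self-pairs included
pairs.by.customer <- lapply(split(dedup$item, dedup$customer), function(x)
  expand.grid(X1 = x, X2 = x, stringsAsFactors = FALSE))

# Stack the per-customer pairs into one data frame (5 rows for this data)
DO <- do.call(rbind, pairs.by.customer)

# Count how many customers have each ordered pair
result <- table(DO$X1, DO$X2)
print(result)
```

Each customer contributes each ordered pair at most once, so every cell is a count of unique customers.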
answered Oct 12 '22 15:10 by Chris