 

Contingency table based on third variable (numeric)

Tags:

r

Some time ago I asked a question about creating market basket data. Now I would like to create a similar data.frame, but based on a third variable, and I ran into problems doing so. Previous question: Effecient way to create market basket matrix in R
@shadow and @SimonO101 gave me good answers, but I was not able to adapt them correctly. I have the following data:

Customer <- as.factor(c(1000001,1000001,1000001,1000001,1000001,1000001,1000002,1000002,1000002,1000003,1000003,1000003))
Product <- as.factor(c(100001,100001,100001,100004,100004,100002,100003,100003,100003,100002,100003,100008))
input <- data.frame(Customer,Product)

I can now create a contingency table as follows:

input_df <- as.data.frame.matrix(table(input))

However, I have a third (numeric) variable whose values I want as the cell entries of the table.

Number <- c(3,1,-4,1,1,1,1,1,1,1,1,1) 
input <- data.frame(Customer,Product,Number)

With three variables, the code above no longer works. The result I am looking for has the unique Customer values as row names and the unique Product values as column names, with the summed Number as the cell value (or 0 if the combination is not present). These sums could be calculated by:

input_agg <- aggregate( Number ~ Customer + Product, data = input, sum)
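To make the gap concrete, here is a small self-contained check using the data from above. It is only a sketch of what the aggregate call returns: a long-format data.frame with one row per Customer/Product pair that actually occurs, so the absent pairs (which should become 0 in the wanted table) are simply missing.

```r
# Data copied from the question
Customer <- as.factor(c(1000001,1000001,1000001,1000001,1000001,1000001,1000002,1000002,1000002,1000003,1000003,1000003))
Product <- as.factor(c(100001,100001,100001,100004,100004,100002,100003,100003,100003,100002,100003,100008))
Number <- c(3,1,-4,1,1,1,1,1,1,1,1,1)
input <- data.frame(Customer, Product, Number)

# Sum Number within each Customer/Product pair (long format)
input_agg <- aggregate(Number ~ Customer + Product, data = input, sum)

# Rows returned versus cells the wide table needs
n_pairs_present <- nrow(input_agg)
n_cells_wanted <- nlevels(input$Customer) * nlevels(input$Product)
print(input_agg)
print(c(present = n_pairs_present, wanted = n_cells_wanted))
```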

I hope my question is clear; please comment if anything is not.

Freddy asked Dec 26 '22 18:12


1 Answer

You can use xtabs for that:

R> xtabs(Number~Customer+Product, data=input)

         Product
Customer  100001 100002 100003 100004 100008
  1000001      0      1      0      2      0
  1000002      0      0      3      0      0
  1000003      0      1      1      0      1
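Since the question asks for a data.frame rather than a table, one possible follow-up is to pass the xtabs result through as.data.frame.matrix, the same conversion the question already applies to table(input). The sketch below rebuilds the question's data so it runs on its own; the name tab is only illustrative.

```r
# Data copied from the question
Customer <- as.factor(c(1000001,1000001,1000001,1000001,1000001,1000001,1000002,1000002,1000002,1000003,1000003,1000003))
Product <- as.factor(c(100001,100001,100001,100004,100004,100002,100003,100003,100003,100002,100003,100008))
Number <- c(3,1,-4,1,1,1,1,1,1,1,1,1)
input <- data.frame(Customer, Product, Number)

# Cross-tabulate, summing Number per Customer/Product cell (absent cells are 0)
tab <- xtabs(Number ~ Customer + Product, data = input)

# Convert the 2-d table to a data.frame: customers as row names, products as columns
input_df <- as.data.frame.matrix(tab)
print(input_df)
```

Individual cells can then be looked up by name, for example input_df["1000001", "100004"].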
juba answered Dec 28 '22 08:12