
R generating a sparse matrix

I have a large file in the following format, which I read into x:

userid,productid,freq
293994,8,3
293994,5,3
949859,2,1
949859,1,1
123234,1,1
123234,3,1
123234,4,1
...

It gives the products a given user bought and how often. I'm trying to turn it into a matrix with the productids as columns, the userids as rows, and the frequency as the entry. So the expected output is

       1 2 3 4 5 8
293994 0 0 0 0 3 3
949859 1 1 0 0 0 0
123234 1 0 1 1 0 0

It is a sparse matrix. I tried table(x[[1]],x[[2]]), which works for small files, but beyond a certain size table gives an error:

Error in table(x[[1]], x[[2]]) : 
 attempt to make a table with >= 2^31 elements
Execution halted

Is there a way to get this to work? I'm on R-3.1.0, which is supposed to support 2^51 sized vectors, so I'm confused why it can't handle the file size. I have 40 million lines, with a total file size of 741 MB. Thanks in advance.

broccoli asked Oct 01 '22 11:10


2 Answers

One data.table way of doing it is:

library(data.table)
library(reshape2)

# value.var names the column that fills the cells; adjust fun.aggregate as
# necessary (the question does not make clear what is wanted)
dcast.data.table(your_data_table, userid ~ productid, value.var = "freq", fill = 0L)

You can check if that works for your data.
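As a rough illustration of the call above, here is a minimal sketch on the seven sample rows from the question. The names dt and wide are made up for the example, and it assumes a data.table version recent enough to export dcast directly (so reshape2 is not loaded). The result is a dense table, so memory grows with users × products and it may not scale to the full file.

```r
library(data.table)

# Sample rows copied from the question; dt is an illustrative name.
dt <- data.table(
  userid    = c(293994, 293994, 949859, 949859, 123234, 123234, 123234),
  productid = c(8, 5, 2, 1, 1, 3, 4),
  freq      = c(3, 3, 1, 1, 1, 1, 1)
)

# One row per userid, one column per productid, 0 where no purchase exists.
wide <- dcast(dt, userid ~ productid, value.var = "freq", fill = 0)
print(wide)
```

If a (userid, productid) pair can appear more than once in the real file, passing something like fun.aggregate = sum would be needed to combine the rows.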

eddi answered Oct 03 '22 16:10


This is old, but it is worth noting that sparseMatrix() from the Matrix package builds the object directly, without reshaping.

    userid <- c(293994,293994,949859,949859,123234,123234,123234)
    productid <- c(8,5,2,1,1,3,4)
    freq <- c(3,3,1,1,1,1,1)

    library(Matrix)

The resulting dgCMatrix is a fraction of the size of a dense matrix and builds much faster than reshaping once the data gets large.

    x <- sparseMatrix(i=as.integer(as.factor(userid)),
                      j=as.integer(as.factor(productid)),
                      dimnames = list(as.character(levels(as.factor(userid))),
                                   as.character(levels(as.factor(productid)))
                                   ),
                      x=freq)


It is easily converted to an ordinary (dense) matrix, though that is only practical when the data is small:
    x <- as.matrix(x)

I learned this the hard way while using recommenderlab (a package built on top of Matrix) to build a binary matrix, so I'm posting it in case it helps someone else.
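For data that starts out as a data frame, as in the question, the same idea might look like the sketch below. The data frame x here is a stand-in built from the question's sample rows, and u, p, m and nnz are illustrative names. Each id column is converted to a factor once, so the integer codes and the dimnames come from the same levels.

```r
library(Matrix)

# Stand-in for the data frame x read from the file (rows from the question).
x <- data.frame(
  userid    = c(293994, 293994, 949859, 949859, 123234, 123234, 123234),
  productid = c(8, 5, 2, 1, 1, 3, 4),
  freq      = c(3, 3, 1, 1, 1, 1, 1)
)

# Convert each id column to a factor once; the integer codes become indices.
u <- factor(x$userid)
p <- factor(x$productid)

m <- sparseMatrix(i = as.integer(u), j = as.integer(p), x = x$freq,
                  dims = c(nlevels(u), nlevels(p)),
                  dimnames = list(levels(u), levels(p)))

# Only the non-zero cells are stored.
print(m)
nnz <- length(m@x)
```

Rows and columns come out in sorted level order, so individual cells can be looked up by name, for example m["293994", "8"].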
David Lucey answered Oct 03 '22 16:10