I have a large file in the following format, which I read in as x:
userid,productid,freq
293994,8,3
293994,5,3
949859,2,1
949859,1,1
123234,1,1
123234,3,1
123234,4,1
...
Each row gives a product that a given user bought and how often they bought it. I'm trying to turn this into a matrix with the userids as rows, the productids as columns, and the frequency as the entry. So the expected output is:
        1 2 3 4 5 8
293994  0 0 0 0 3 3
949859  1 1 0 0 0 0
123234  1 0 1 1 0 0
It is a sparse matrix. I tried table(x[[1]], x[[2]]), which works for small files, but beyond a certain size table gives an error:
Error in table(x[[1]], x[[2]]) :
attempt to make a table with >= 2^31 elements
Execution halted
Is there a way to get this to work? I'm on R-3.1.0, and it's supposed to support 2^51 sized vectors, so I'm confused why it can't handle the file size. I have 40MM lines with a total file size of 741M. Thanks in advance.
One data.table way of doing it is:
library(data.table)
library(reshape2)
# adjust fun.aggregate as necessary - not very clear what you want from OP
dcast.data.table(your_data_table, userid ~ productid, value.var = "freq", fill = 0L)
You can check if that works for your data.
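For anyone who wants to try this on the sample rows from the question first, here is a minimal sketch. It assumes a recent data.table, where dcast() works on a data.table without reshape2, and it assumes that summing freq is the right way to combine repeated userid/productid pairs (the question does not say). The names dt and wide are just placeholders.

```r
library(data.table)

# Sample rows from the question.
dt <- data.table(
  userid    = c(293994, 293994, 949859, 949859, 123234, 123234, 123234),
  productid = c(8, 5, 2, 1, 1, 3, 4),
  freq      = c(3, 3, 1, 1, 1, 1, 1)
)

# One row per userid, one column per productid.
# value.var names the column that fills the cells;
# fun.aggregate = sum is an assumption for repeated pairs;
# fill = 0 is used for combinations that never occur.
wide <- dcast(dt, userid ~ productid,
              value.var = "freq", fun.aggregate = sum, fill = 0)

print(wide)
```

Note that the result is still a dense table, so with many users and many products it can run out of memory even when the call itself is accepted.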
# This is old, but worth noting: sparseMatrix() from the Matrix package builds the object directly, without reshaping.
userid <- c(293994,293994,949859,949859,123234,123234,123234)
productid <- c(8,5,2,1,1,3,4)
freq <- c(3,3,1,1,1,1,1)
library(Matrix)
# The dgCMatrix returned by sparseMatrix() is a fraction of the size and builds much faster than reshaping when the data gets large
x <- sparseMatrix(i = as.integer(as.factor(userid)),
                  j = as.integer(as.factor(productid)),
                  dimnames = list(as.character(levels(as.factor(userid))),
                                  as.character(levels(as.factor(productid)))),
                  x = freq)
# Easily converted to a regular (dense) matrix, provided the dense result fits in memory.
x <- as.matrix(x)
# I learned this the hard way while using recommenderlab (a package built on top of Matrix) to build a binary matrix, so I'm posting it in case it helps someone else.
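A variant of the same idea that starts from the data frame in the question, as a rough sketch. The error message suggests table() fails because it allocates one cell for every userid/productid combination, and that count reaches 2^31 long before the number of non-zero entries does. The sketch computes that count and then builds the sparse matrix, converting each id column to a factor only once. The names u, p, dense_cells and m are placeholders, not anything from the original answers.

```r
library(Matrix)

# Sample rows from the question, standing in for the large file.
x <- data.frame(
  userid    = c(293994, 293994, 949859, 949859, 123234, 123234, 123234),
  productid = c(8, 5, 2, 1, 1, 3, 4),
  freq      = c(3, 3, 1, 1, 1, 1, 1)
)

# Convert each id column to a factor once and reuse its codes and levels.
u <- factor(x$userid)
p <- factor(x$productid)

# Number of cells a dense users-by-products table would need.
# as.numeric() avoids integer overflow when both counts are large.
dense_cells <- as.numeric(nlevels(u)) * nlevels(p)

# Sparse matrix: only the observed (user, product) pairs are stored.
# Repeated (i, j) pairs have their x values added together.
m <- sparseMatrix(
  i = as.integer(u),
  j = as.integer(p),
  x = x$freq,
  dims = c(nlevels(u), nlevels(p)),
  dimnames = list(levels(u), levels(p))
)

# Entries can be looked up by name without a dense conversion.
print(m["293994", "8"])
print(nnzero(m))
```

Staying with the sparse object (for example using rowSums(m) or subsetting by name) avoids the dense conversion altogether, which matters when dense_cells is in the billions.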