Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Create Sparse Matrix from a data frame

I m doing an assignment where I am trying to build a collaborative filtering model for the Netflix prize data. The data that I am using is in a CSV file which I easily imported into a data frame. Now what I need to do is create a sparse matrix consisting of the Users as the rows and Movies as the columns and each cell is filled up by the corresponding rating value. When I try to map out the values in the data frame I need to run a loop for each row in the data frame, which is taking a lot of time in R, please can anyone suggest a better approach. Here is the sample code and data:

buildUserMovieMatrix <- function(trainingData)
{
  UIMatrix <- Matrix(0, nrow = max(trainingData$UserID), ncol = max(trainingData$MovieID), sparse = T);
  for(i in 1:nrow(trainingData))
  {
    UIMatrix[trainingData$UserID[i], trainingData$MovieID[i]] = trainingData$Rating[i];
  }
  return(UIMatrix);
}

Sample of data in the dataframe from which the sparse matrix is being created:

    MovieID UserID  Rating
1       1      2       3
2       2      3       3
3       2      4       4
4       2      6       3
5       2      7       3

So in the end I want something like this: The columns are the movie IDs and the rows are the user IDs

    1   2   3   4   5   6   7
1   0   0   0   0   0   0   0
2   3   0   0   0   0   0   0
3   0   3   0   0   0   0   0
4   0   4   0   0   0   0   0
5   0   0   0   0   0   0   0
6   0   3   0   0   0   0   0
7   0   3   0   0   0   0   0

So the interpretation is something like this: user 2 rated movie 1 as 3 star, user 3 rated the movie 2 as 3 star and so on for the other users and movies. There are about 8500000 rows in my data frame for which my code takes just about 30-45 mins to create this user item matrix, i would like to get any suggestions

like image 354
user37940 Avatar asked Oct 05 '14 22:10

user37940


People also ask

How do you create a sparse data frame?

Use DataFrame. sparse. from_spmatrix() to create a DataFrame with sparse values from a sparse matrix.

How do you construct a sparse matrix?

S = sparse( A ) converts a full matrix into sparse form by squeezing out any zero elements. If a matrix contains many zeros, converting the matrix to sparse storage saves memory. S = sparse( m,n ) generates an m -by- n all zero sparse matrix.

How do you convert a DataFrame to a CSR matrix?

To convert a DataFrame to a CSR matrix, you first need to create indices for users and movies. Then, you can perform conversion with the sparse. csr_matrix function. It is a bit faster to convert via a coordinate (COO) matrix.


1 Answers

The Matrix package has a constructor made especially for your type of data:

library(Matrix)
UIMatrix <- sparseMatrix(i = trainingData$UserID,
                         j = trainingData$MovieID,
                         x = trainingData$Rating)

Otherwise, you might like knowing about that cool feature of the [ function known as matrix indexing. Your could have tried:

buildUserMovieMatrix <- function(trainingData) {
  UIMatrix <- Matrix(0, nrow = max(trainingData$UserID),
                        ncol = max(trainingData$MovieID), sparse = TRUE);
  UIMatrix[cbind(trainingData$UserID,
                 trainingData$MovieID)] <- trainingData$Rating;
  return(UIMatrix);
}

(but I would definitely recommend the sparseMatrix approach over this.)

like image 181
flodel Avatar answered Oct 20 '22 08:10

flodel