 

convert dictionary to sparse matrix

I have a dictionary whose keys are user_ids and whose values are lists of the movie_ids liked by that user, with 573000 unique users and 16000 unique movies.

{1: [51, 379, 552, 2333, 2335, 4089, 4484], 2: [51, 379, 552, 1674, 1688, 2333, 3650, 4089, 4296, 4484], 5: [783, 909, 1052, 1138, 1147, 2676], 7: [171, 321, 959], 9: [3193], 10: [959], 11: [131,567,897,923],..........}

Now I want to convert this into a matrix with user_ids as rows and movie_ids as columns, holding a 1 wherever the user liked the movie, i.e. a 573000 x 16000 matrix.

Ultimately, I need to multiply this matrix by its transpose (X'X) to get a co-occurrence matrix of shape (#unique_movies, #unique_movies).
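Written out entrywise, with X the users-by-movies 0/1 matrix described above, each entry of X'X counts the users who liked both movies, and each diagonal entry counts how many users liked that movie:

```latex
(X^\top X)_{ij} = \sum_{u} X_{ui}\, X_{uj}
               = \#\{\, u : \text{user } u \text{ liked both movie } i \text{ and movie } j \,\},
\qquad
(X^\top X)_{ii} = \#\{\, u : \text{user } u \text{ liked movie } i \,\}
```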

Also, what is the time complexity of the X'*X operation when X has shape (500000, 12000)?
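One rough way to reason about the cost: a naive dense product X'X for X of shape (n, m) takes n·m² multiply-adds, whereas a row-by-row sparse-sparse product (the Gustavson-style approach used for CSR matrices) does work roughly proportional to the sum over users of k², where k is the number of movies that user liked. The helper names `sparse_cost` and `dense_cost` below are hypothetical, and the small dict is toy data, so the numbers are illustrative estimates rather than benchmarks.

```python
# Rough operation-count estimate for the co-occurrence product X'X.
# Assumption: X is a binary users-by-movies matrix given as {user_id: [movie_ids]}
# with no duplicate movie_ids per user.

def sparse_cost(likes):
    """Approximate multiply-adds for a sparse X'X: a user with k likes contributes k*k."""
    return sum(len(movies) ** 2 for movies in likes.values())

def dense_cost(n_users, n_movies):
    """Multiply-adds for a naive dense X'X producing an (n_movies, n_movies) result."""
    return n_users * n_movies * n_movies

likes = {1: [51, 379, 552], 2: [51, 379], 7: [171, 321, 959]}  # toy data
print(sparse_cost(likes))         # 9 + 4 + 9 = 22
print(dense_cost(500000, 12000))  # 72000000000000, i.e. 7.2e13
```

With only a handful of likes per user, the sparse estimate is many orders of magnitude below the dense one, which is why keeping X sparse matters here.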

Asked by chirag yadav on Jun 16 '16 at 14:06


People also ask

How do you convert to sparse matrix?

In MATLAB, S = sparse(A) converts a full matrix into sparse form by squeezing out any zero elements. If a matrix contains many zeros, converting it to sparse storage saves memory. S = sparse(m,n) generates an m-by-n all-zero sparse matrix.

How do you convert dense to sparse matrix?

A dense matrix stored in a NumPy array can be converted into a sparse matrix in CSR (compressed sparse row) format by passing it to scipy.sparse.csr_matrix().
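A minimal sketch of that conversion, using a small made-up 0/1 users-by-movies array:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small dense 0/1 matrix: rows are users, columns are movies (toy data).
dense = np.array([[1, 0, 0, 1],
                  [0, 0, 1, 0],
                  [1, 1, 0, 0]], dtype=np.int8)

sparse = csr_matrix(dense)  # zeros are dropped; only the nonzero entries are stored
print(sparse.nnz)           # 5
print(sparse.toarray())     # converts back to the original dense array
```

For a matrix as large as 573000 x 16000 this route is impractical, since the dense array would have to exist in memory first; building the sparse matrix directly from the coordinates avoids that.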

What is a sparse matrix Python?

A sparse matrix is a matrix in which most of the elements are zero, so it is far more efficient to store only the nonzero entries and their positions.
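To put numbers on that for the matrix in the question, here is a back-of-the-envelope memory comparison. The total number of likes (`nnz`) is a hypothetical placeholder, not a figure from the question; the real value would be `sum(len(v) for v in d.values())`. The CSR estimate assumes int8 values with int32 column indices and row pointers.

```python
# Rough memory comparison for the 573000 x 16000 user-by-movie matrix.
n_users, n_movies = 573_000, 16_000
nnz = 50_000_000  # hypothetical total number of (user, movie) likes

dense_bytes = n_users * n_movies * 1  # dense int8: one byte per cell

# CSR: one int8 value plus one int32 column index per nonzero,
# plus (n_users + 1) int32 row pointers.
csr_bytes = nnz * (1 + 4) + (n_users + 1) * 4

print(f"dense int8: {dense_bytes / 1e9:.2f} GB")  # dense int8: 9.17 GB
print(f"CSR:        {csr_bytes / 1e9:.2f} GB")    # CSR:        0.25 GB
```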


1 Answer

I think you can construct an empty dok_matrix and fill in the values. Then transpose it and convert it to a csr_matrix for efficient matrix multiplication.

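import numpy as np
import scipy.sparse as sp

# Sample of the likes dict: {user_id: [movie_ids liked by that user]}
d = {1: [51, 379, 552, 2333, 2335, 4089, 4484], 2: [51, 379, 552, 1674, 1688, 2333, 3650, 4089, 4296, 4484], 5: [783, 909, 1052, 1138, 1147, 2676], 7: [171, 321, 959], 9: [3193], 10: [959], 11: [131, 567, 897, 923]}

# Empty users x movies matrix in DOK format, which supports cheap item assignment.
# Every user_id must be < 573000 and every movie_id < 16000.
mat = sp.dok_matrix((573000, 16000), dtype=np.int8)

# Mark each (user, movie) pair the user liked with a 1.
for user_id, movie_ids in d.items():
    mat[user_id, movie_ids] = 1

# Transpose to movies x users and convert to CSR for fast matrix products.
mat = mat.transpose().tocsr()
print(mat.shape)  # (16000, 573000)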
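An alternative to filling a dok_matrix entry by entry, which can be slow for 573000 users, is to collect all (row, column) coordinates first and build the matrix in one call with `scipy.sparse.coo_matrix`. The sketch below uses toy data and assumes integer ids that fit inside the stated shape. It also computes the co-occurrence product the question asks for. Upcasting to int32 before multiplying is a precaution added here: with int8 data the counts would wrap around once more than 127 users share a pair of movies.

```python
import numpy as np
import scipy.sparse as sp

# Toy likes dict; the real one has ~573000 users and ~16000 movies.
# Assumption: ids are non-negative ints below the shape bounds, no duplicates per user
# (coo_matrix sums duplicate coordinates, so a repeated movie_id would become a 2).
d = {1: [51, 379, 552], 2: [51, 379, 1674], 7: [171, 321, 959]}
n_users, n_movies = 573000, 16000

rows, cols = [], []
for user_id, movie_ids in d.items():
    rows.extend([user_id] * len(movie_ids))
    cols.extend(movie_ids)

# Build the users x movies matrix in COO form, then convert to CSR.
data = np.ones(len(rows), dtype=np.int8)
X = sp.coo_matrix((data, (rows, cols)), shape=(n_users, n_movies)).tocsr()

# Upcast so co-occurrence counts cannot overflow int8, then form X'X.
Xi = X.astype(np.int32)
cooc = (Xi.T @ Xi).tocsr()  # movies x movies co-occurrence counts

print(cooc.shape)     # (16000, 16000)
print(cooc[51, 379])  # 2: users 1 and 2 both liked movies 51 and 379
print(cooc[51, 51])   # 2: the diagonal counts the likes of each movie
```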
Answered by Zichen Wang on Sep 28 '22 at 00:09