I have a dictionary whose keys are user_ids and whose values are lists of the movie_ids liked by that user, with #unique_users = 573000 and #unique_movies = 16000.
{1: [51, 379, 552, 2333, 2335, 4089, 4484], 2: [51, 379, 552, 1674, 1688, 2333, 3650, 4089, 4296, 4484], 5: [783, 909, 1052, 1138, 1147, 2676], 7: [171, 321, 959], 9: [3193], 10: [959], 11: [131,567,897,923],..........}
Now I want to convert this into a matrix with user_ids as rows and movie_ids as columns, with the value 1 for each movie the user has liked, i.e. it will be 573000*16000.
Ultimately I have to multiply this matrix by its transpose to get a co-occurrence matrix with dim (#unique_movies, #unique_movies).
Also, what will be the time complexity of the X'*X operation where X has shape (500000,12000)?
In MATLAB, S = sparse(A) converts a full matrix into sparse form by squeezing out any zero elements. If a matrix contains many zeros, converting it to sparse storage saves memory. S = sparse(m,n) generates an m-by-n all-zero sparse matrix.
A dense matrix stored in a NumPy array can be converted into a sparse matrix using the CSR representation by calling the csr_matrix() function.
A sparse matrix is a matrix in which most of the elements have zero value and thus efficient ways of storing such matrices are required.
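As a small illustration of the csr_matrix() conversion mentioned above, here is a minimal sketch on a toy array. The array `dense` and its values are made up for the example. Note that this route needs the dense array in memory first; for a 573000 x 16000 matrix that would be roughly 9 GB even at one byte per entry, so it is probably not practical for the data in the question.

```python
import numpy as np
import scipy.sparse as sp

# Toy dense 0/1 matrix (hypothetical values, for illustration only)
dense = np.array([[1, 0, 0, 1],
                  [0, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=np.int8)

# Convert to Compressed Sparse Row format; only nonzero entries are stored
S = sp.csr_matrix(dense)

print(S.shape)  # same logical shape as the dense array
print(S.nnz)    # number of stored (nonzero) entries
```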
I think you can construct an empty dok_matrix and fill the values. Then transpose it and convert it to csr_matrix for efficient matrix multiplications.
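import numpy as np
import scipy.sparse as sp
d = {1: [51, 379, 552, 2333, 2335, 4089, 4484], 2: [51, 379, 552, 1674, 1688, 2333, 3650, 4089, 4296, 4484], 5: [783, 909, 1052, 1138, 1147, 2676], 7: [171, 321, 959], 9: [3193], 10: [959], 11: [131,567,897,923]}
# Rows are user_ids, columns are movie_ids; both dimensions must be larger than the largest id used
mat = sp.dok_matrix((573000,16000), dtype=np.int8)
for user_id, movie_ids in d.items():
    # Set every movie liked by this user to 1 in the user's row
    mat[user_id, movie_ids] = 1
# Transpose to (movies, users) and convert to CSR for fast multiplication
mat = mat.transpose().tocsr()
print(mat.shape)  # (16000, 573000)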
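As an alternative to filling a dok_matrix entry by entry, the matrix can probably be built in one shot from coordinate arrays and then multiplied by its transpose. The following is only a sketch under some assumptions: the small dictionary `d` is a cut-down sample, the names `user_index`, `movie_index` and `cooc` are introduced here for illustration, ids are remapped to contiguous indices, and no user lists the same movie twice (csr_matrix sums duplicate coordinates). It uses int32 rather than int8 so that co-occurrence counts above 127 do not overflow.

```python
import numpy as np
import scipy.sparse as sp

# Cut-down sample of the dictionary from the question
d = {1: [51, 379, 552], 2: [51, 379, 1674], 7: [171, 321, 959], 10: [959]}

# Map raw ids to contiguous row/column indices (hypothetical helper dicts)
user_index = {u: i for i, u in enumerate(sorted(d))}
movie_ids = sorted({m for movies in d.values() for m in movies})
movie_index = {m: j for j, m in enumerate(movie_ids)}

# One (row, col) pair per (user, liked movie)
rows = np.array([user_index[u] for u, movies in d.items() for m in movies])
cols = np.array([movie_index[m] for movies in d.values() for m in movies])
data = np.ones(len(rows), dtype=np.int32)

# Build the user x movie matrix directly in CSR format
X = sp.csr_matrix((data, (rows, cols)),
                  shape=(len(user_index), len(movie_index)))

# Movie x movie co-occurrence: entry (i, j) counts users who liked both
cooc = (X.T @ X).tocsr()

print(X.shape, cooc.shape)
print(cooc[movie_index[51], movie_index[379]])
```

On the complexity question: with dense storage, X'*X for an m x n matrix costs on the order of m*n^2 operations. With sparse storage the work depends on the nonzeros instead; a rough estimate is the sum over users of (number of movies that user liked)^2, since each user contributes one count for every pair of movies they liked. The diagonal of `cooc` holds the number of users who liked each movie.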