I have a CSV file in this format:
userId movieId rating timestamp
1 31 2.5 1260759144
2 10 4 835355493
3 1197 5 1298932770
4 10 4 949810645
I want to construct a sparse matrix with userId as rows and movieId as columns. I have stored all the data in a dictionary named "column", where column['user'] contains the user IDs, column['movie'] the movie IDs, and column['rating'] the ratings, as follows:
import csv

f = open('ratings.csv','rb')
reader = csv.reader(f)
headers = ['user','movie','rating','timestamp']
# One list per column name
column = {}
for h in headers:
    column[h] = []
# Every field, including the IDs, is stored as a float
for row in reader:
    for h, v in zip(headers, row):
        column[h].append(float(v))
When I call the sparse matrix constructor as:
mat = scipy.sparse.csr_matrix((column['rating'],(column['user'],column['movie'])))
I get "TypeError: invalid shape"
Please help
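scipy.sparse.csr_matrix([column['rating'],column['user'],column['movie']])
You passed a tuple consisting of a 1×n list and a 2×n list, which did not work here. Note that the call above stores the three lists as the rows of a 3×n matrix; it does not place each rating at its (user, movie) position.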
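If the goal is a user-by-movie matrix, here is a minimal sketch using the (data, (row, col)) form from the question with integer index arrays. It assumes the "invalid shape" error comes from the IDs having been stored as floats, which is a guess based on the loading code and not something confirmed in the thread. The sample values are copied from the CSV rows in the question.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Sample values taken from the four CSV rows shown in the question,
# stored as floats the same way the question's loading loop does.
column = {
    'user': [1.0, 2.0, 3.0, 4.0],
    'movie': [31.0, 10.0, 1197.0, 10.0],
    'rating': [2.5, 4.0, 5.0, 4.0],
}

# Assumption: the row/column indices need to be integers; ratings stay floats.
users = np.asarray(column['user'], dtype=np.int64)
movies = np.asarray(column['movie'], dtype=np.int64)
ratings = np.asarray(column['rating'], dtype=np.float64)

# (data, (row, col)) form: mat[users[k], movies[k]] = ratings[k].
# The shape is inferred as (max user ID + 1, max movie ID + 1).
mat = csr_matrix((ratings, (users, movies)))

print(mat.shape)
print(mat[1, 31])
```

Since the raw IDs are used directly as indices, row 0 and every unused movie ID become empty rows and columns. That costs little in CSR format, but keep it in mind when indexing.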
P.S.: For reading the data, you should try Pandas :-) (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html). Minimal example:
import pandas as pd
# Set up a DataFrame from the CSV and make it sparse
df = pd.read_csv('ratings.csv')
# DataFrame.to_sparse was removed in pandas 1.0; astype with a SparseDtype replaces it
df = df.astype(pd.SparseDtype(float, fill_value=0))
print(df.head())
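To get from the DataFrame to a SciPy user-by-movie matrix, something like the following might work. It is only a sketch: the inline CSV text stands in for ratings.csv and assumes a comma-separated file with the header row shown in the question, and the use of pd.factorize to remap IDs to compact 0-based indices is my own choice, not something the thread prescribes.

```python
import io

import pandas as pd
from scipy.sparse import csr_matrix

# Stand-in for ratings.csv (assumed comma-separated, with a header row).
csv_text = """userId,movieId,rating,timestamp
1,31,2.5,1260759144
2,10,4,835355493
3,1197,5,1298932770
4,10,4,949810645
"""
df = pd.read_csv(io.StringIO(csv_text))

# Map raw IDs to compact 0-based integer codes, in order of first appearance,
# so the matrix has no empty rows or columns.
user_codes, user_ids = pd.factorize(df['userId'])
movie_codes, movie_ids = pd.factorize(df['movieId'])

# (data, (row, col)) form with an explicit shape.
mat = csr_matrix(
    (df['rating'].to_numpy(dtype=float), (user_codes, movie_codes)),
    shape=(len(user_ids), len(movie_ids)),
)

print(mat.shape)
print(mat.toarray())
```

user_ids and movie_ids keep the original IDs, so movie_ids[j] tells you which movie column j refers to.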