I use the pd.pivot_table() method to create a user-item matrix by pivoting the user-item activity data. However, the dataframe is so large that I get this error:
Unstacked DataFrame is too big, causing int32 overflow
Any suggestions on solving this problem? Thanks!
r_matrix = df.pivot_table(values='rating', index='userId', columns='movieId')
Some Solutions:
You can use groupby instead. Try this code:
df.groupby(['userId','movieId'])['rating'].max().unstack()
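The following is a minimal illustration on made-up data. The ratings are hypothetical, not from the question. Note that `.max()` collapses duplicate (userId, movieId) pairs to their highest rating, whereas `pivot_table` averages duplicates by default, so swap in `.mean()` if you want the two to agree.

```python
import pandas as pd

# Hypothetical toy ratings standing in for the question's data
df = pd.DataFrame({
    "userId":  [1, 1, 2, 3, 3],
    "movieId": [10, 20, 10, 20, 30],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5],
})

# Group on both keys, reduce each (userId, movieId) pair to one rating,
# then move movieId from the row index into the columns
r_matrix = df.groupby(["userId", "movieId"])["rating"].max().unstack()
print(r_matrix)
```

Users without a rating for a given movie get NaN in that cell.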
There is not much you can do about an integer overflow inside library code. You have basically three options:
You don't provide much code, so I cannot tell which solution is best for you.
If you want movieId as your columns, first sort the dataframe by movieId.
Then split the dataframe in half so that all the ratings for any particular movie fall into the same subset.
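df = df.sort_values('movieId')
# n is a row position chosen so that no movieId spans both subsets
subset1 = df.iloc[:n]
subset2 = df.iloc[n:]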
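The split position `n` is not specified above. One way to pick it, sketched under the assumption that `df` has already been sorted by `movieId`, is to start at the midpoint and step forward until the `movieId` changes, so no movie's ratings straddle the two subsets. The helper name `split_point` and the toy data are hypothetical. If a single movie holds more than half the rows, the split will be lopsided.

```python
import pandas as pd

def split_point(sorted_df: pd.DataFrame, key: str = "movieId") -> int:
    """Return a row position near the middle at which `key` changes value.

    Assumes sorted_df is already sorted by `key`.
    """
    keys = sorted_df[key].to_numpy()
    n = len(keys) // 2
    # Step forward while the row at n still has the same key as the row
    # before it, so that each key value lands entirely in one subset
    while 0 < n < len(keys) and keys[n] == keys[n - 1]:
        n += 1
    return n

# Hypothetical toy data; the midpoint (row 3) falls inside movie 20
df = pd.DataFrame({
    "userId":  [1, 2, 1, 2, 3, 1],
    "movieId": [10, 10, 20, 20, 20, 30],
    "rating":  [4.0, 3.0, 2.5, 5.0, 3.5, 1.0],
})
df = df.sort_values("movieId")
n = split_point(df)
subset1 = df.iloc[:n]
subset2 = df.iloc[n:]
print(n)  # 5: movies 10 and 20 in subset1, movie 30 in subset2
```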
Now apply pivot_table to each subset:
matrix1 = subset1.pivot_table(values='rating', index='userId', columns='movieId')
matrix2 = subset2.pivot_table(values='rating', index='userId', columns='movieId')
Finally, join matrix1 and matrix2 with an outer join, so that users who appear in only one subset are kept:
complete_matrix = matrix1.join(matrix2, how='outer')
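Putting the steps together on a small made-up dataset (hypothetical values, assuming pandas is installed) shows why the join type matters. `DataFrame.join` defaults to a left join, which would silently drop any user who only rated movies in the second half.

```python
import pandas as pd

# Hypothetical ratings; user 3 only rated movie 30
df = pd.DataFrame({
    "userId":  [1, 2, 1, 2, 3],
    "movieId": [10, 10, 20, 20, 30],
    "rating":  [4.0, 3.0, 2.5, 5.0, 1.0],
})

df = df.sort_values("movieId")
n = 4  # rows 0-3 hold movies 10 and 20; row 4 holds movie 30
subset1 = df.iloc[:n]
subset2 = df.iloc[n:]

matrix1 = subset1.pivot_table(values="rating", index="userId", columns="movieId")
matrix2 = subset2.pivot_table(values="rating", index="userId", columns="movieId")

# The movieId columns are disjoint, so join only has to align on userId;
# how="outer" keeps users that appear in only one of the halves
complete_matrix = matrix1.join(matrix2, how="outer")
print(complete_matrix)
```

With the default left join, the row for user 3 would be missing from the result.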
On the other hand, if you want userId as your columns, sort the dataframe using userId as the key and repeat the above process.
Be sure to delete subset1, subset2, matrix1 and matrix2 once you are done, or you may run into a MemoryError.
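A possible generalization splits into `n_chunks` pieces instead of two halves. This is a sketch under stated assumptions: `chunked_pivot` is a hypothetical helper, and it selects each piece with `isin` on a group of column keys rather than by sorting and slicing. Each raw subset is rebound on the next loop iteration, so only one is alive at a time, which covers part of the manual cleanup. Swapping `index` and `columns` gives the userId-as-columns layout.

```python
import numpy as np
import pandas as pd

def chunked_pivot(df, index="userId", columns="movieId", values="rating", n_chunks=2):
    """Pivot df piece by piece; each piece holds every row for a group of `columns` keys."""
    keys = np.sort(df[columns].unique())
    pieces = []
    for key_group in np.array_split(keys, n_chunks):
        if len(key_group) == 0:
            continue  # more chunks than distinct keys
        # `subset` is rebound each iteration, so the previous one can be freed
        subset = df[df[columns].isin(key_group)]
        pieces.append(subset.pivot_table(values=values, index=index, columns=columns))
    # axis=1 concatenation aligns on the row index with an outer join by default
    return pd.concat(pieces, axis=1)

# Hypothetical toy ratings
df = pd.DataFrame({
    "userId":  [1, 2, 1, 2, 3],
    "movieId": [10, 10, 20, 20, 30],
    "rating":  [4.0, 3.0, 2.5, 5.0, 1.0],
})
r_matrix = chunked_pivot(df, n_chunks=2)
by_user = chunked_pivot(df, index="movieId", columns="userId", n_chunks=2)
print(r_matrix)
```

The final concatenated matrix still has to fit in memory, so this only helps with the size of each individual pivot.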