I have a numpy array of ratings given by users to movies. A rating is between 1 and 5, while 0 means that the user has not rated the movie. I want to calculate the average rating of each movie and the average rating of each user. In other words, I want the mean of the non-zero elements of each column and of each row.
Is there an efficient numpy function to handle this case? I know that manually iterating over the columns or rows would solve the problem.
Thanks in advance!
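np.count_nonzero counts the number of non-zero values in the array a. The word “non-zero” is in reference to the Python 2.x built-in method __nonzero__() (renamed __bool__() in Python 3.x), which tests an object's truthiness.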
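Since the values to discard are 0, they do not contribute to the sum, so you can compute the mean manually by taking the sum along an axis and then dividing by the number of non-zero elements along the same axis:
import numpy as np
a = np.array([[8., 9, 7, 0], [0, 0, 5, 6]])
# row sums divided by the number of non-zero entries in each row
a.sum(axis=1) / (a != 0).sum(axis=1)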
results in:
array([ 8. , 5.5])
As you can see, the zeros are not counted in the mean. Note that a row made up entirely of zeros would lead to a division by zero.
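Applied to the ratings matrix from the question, the same sum-and-count idea could be sketched as below. This is only an illustration: the helper name `nonzero_mean` is made up for this example, and it assumes NumPy 1.12 or later, where `np.count_nonzero` accepts an `axis` argument. Movies or users with no ratings get NaN rather than triggering a division by zero.

```python
import numpy as np

# Ratings matrix from the question: rows are users, columns are movies, 0 = not rated.
ratings = np.array([[1, 4, 5, 0],
                    [2, 0, 3, 0],
                    [4, 0, 0, 0]], dtype=float)

def nonzero_mean(a, axis):
    """Mean of the non-zero entries along `axis`; NaN where there are none.

    Hypothetical helper for illustration, not part of NumPy.
    """
    totals = a.sum(axis=axis)                # zeros add nothing to the sum
    counts = np.count_nonzero(a, axis=axis)  # number of actual ratings
    # Pre-fill with NaN and divide only where at least one rating exists.
    out = np.full(totals.shape, np.nan)
    np.divide(totals, counts, out=out, where=counts > 0)
    return out

movie_means = nonzero_mean(ratings, axis=0)  # one value per column (movie)
user_means = nonzero_mean(ratings, axis=1)   # one value per row (user)
```

The fourth movie has no ratings at all, so its entry in `movie_means` stays NaN.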
You could make use of np.nanmean, after converting all 0 values to np.nan. Note that np.nanmean is only available in numpy 1.8 and later.
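import numpy as np

# np.float was removed in NumPy 1.24; the builtin float is equivalent
ratings = np.array([[1, 4, 5, 0],
                    [2, 0, 3, 0],
                    [4, 0, 0, 0]], dtype=float)

def get_means(ratings):
    # Replace the 0 placeholders with NaN in a copy, so the caller's array is not modified
    ratings = np.where(ratings == 0, np.nan, ratings)
    user_means = np.nanmean(ratings, axis=1)   # mean over each row (user)
    movie_means = np.nanmean(ratings, axis=0)  # mean over each column (movie)
    return {'user_means': user_means, 'movie_means': movie_means}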
Result:
>>> get_means(ratings)
{'movie_means': array([ 2.33333333, 4. , 4. , nan]),
'user_means': array([ 3.33333333, 2.5 , 4. ])}
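Another option, not mentioned in the answers above, is a NumPy masked array: the zeros are masked out instead of being replaced by NaN. The sketch below assumes the same ratings matrix as the question and uses `np.ma.masked_equal`; a movie with no ratings comes back masked, and `filled(np.nan)` turns that into NaN.

```python
import numpy as np

# Ratings matrix from the question: rows are users, columns are movies, 0 = not rated.
ratings = np.array([[1, 4, 5, 0],
                    [2, 0, 3, 0],
                    [4, 0, 0, 0]], dtype=float)

# Mask every entry equal to 0; masked_equal copies by default, so `ratings` is unchanged.
masked = np.ma.masked_equal(ratings, 0)

# Masked entries are ignored by mean(); fully masked columns stay masked,
# and filled(np.nan) converts them to NaN in a plain ndarray.
movie_means = masked.mean(axis=0).filled(np.nan)
user_means = masked.mean(axis=1).filled(np.nan)
```

Compared with np.nanmean, this route should not emit the "Mean of empty slice" RuntimeWarning for the all-zero column, at the cost of an extra mask array.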