
An efficient way to calculate the mean of each column or row of non-zero elements

I have a numpy array of ratings that users have given to movies. Ratings range from 1 to 5, and 0 means that the user has not rated the movie. I want to calculate the average rating of each movie and the average rating of each user. In other words, I want the mean of the non-zero elements in each column and each row.

Is there an efficient numpy function to handle this case? I know that manually iterating over the rows or columns would solve the problem.

Thanks in advance!

GarudaReiga asked Jan 11 '14 01:01



2 Answers

Since the values to discard are 0, they add nothing to the sum, so you can compute the mean manually: sum along an axis, then divide by the number of non-zero elements along the same axis:

import numpy as np

a = np.array([[8., 9, 7, 0], [0, 0, 5, 6]])
# row sums divided by the count of non-zero entries in each row
a.sum(1) / (a != 0).sum(1)

results in:

array([ 8. ,  5.5])

As you can see, the zeros are not counted in the mean. Use axis 0 instead of 1 to get the per-column means.
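One edge case worth guarding against: a row or column with no ratings at all has a non-zero count of 0, so the plain division evaluates 0/0 and produces nan together with a RuntimeWarning. Below is a minimal sketch of the same sum-and-count idea that skips the division for those slices. The helper name `nonzero_mean` and the choice of NaN as the "no ratings" value are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

def nonzero_mean(ratings, axis):
    """Mean of the non-zero entries along `axis` (hypothetical helper).

    Slices with no non-zero entries yield NaN instead of a 0/0 warning.
    """
    ratings = np.asarray(ratings, dtype=float)
    totals = ratings.sum(axis=axis)            # zeros add nothing to the sum
    counts = (ratings != 0).sum(axis=axis)     # number of actual ratings
    out = np.full(totals.shape, np.nan)        # default for "no ratings"
    # divide only where there is at least one rating
    np.divide(totals, counts, out=out, where=counts > 0)
    return out

ratings = np.array([[1, 4, 5, 0],
                    [2, 0, 3, 0],
                    [4, 0, 0, 0]])

user_means = nonzero_mean(ratings, axis=1)   # one value per row (user)
movie_means = nonzero_mean(ratings, axis=0)  # one value per column (movie)
```

Here the last movie has no ratings, so its entry in `movie_means` is NaN while the other entries are ordinary means.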

user2304916 answered Nov 03 '22 00:11


You could make use of np.nanmean, after converting all 0 values to np.nan. Note that np.nanmean is only available in numpy 1.8 and later.

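import numpy as np

# dtype must be float so the array can hold NaN (np.float was removed in NumPy 1.24)
ratings = np.array([[1, 4, 5, 0],
                    [2, 0, 3, 0],
                    [4, 0, 0, 0]], dtype=float)


def get_means(ratings):
    # swap the 0 placeholders for NaN in a new array, leaving the caller's array unchanged
    ratings = np.where(ratings == 0, np.nan, ratings)

    user_means = np.nanmean(ratings, axis=1)   # mean of each row, ignoring NaN
    movie_means = np.nanmean(ratings, axis=0)  # mean of each column, ignoring NaN

    return {'user_means': user_means, 'movie_means': movie_means}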

Result:

>>> get_means(ratings)
{'movie_means': array([ 2.33333333,  4.        ,  4.        ,         nan]),
 'user_means': array([ 3.33333333,  2.5       ,  4.        ])}
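A related option, not taken from the answer above, is NumPy's masked-array module: instead of overwriting the zeros with NaN, you mask them and let the masked mean skip them. The following is a minimal sketch under that assumption; the variable names ending in `_arr` are made up for illustration.

```python
import numpy as np

ratings = np.array([[1, 4, 5, 0],
                    [2, 0, 3, 0],
                    [4, 0, 0, 0]], dtype=float)

# mask every entry equal to 0; the original array is left untouched
masked = np.ma.masked_equal(ratings, 0)

user_means = masked.mean(axis=1)    # masked array, one value per row
movie_means = masked.mean(axis=0)   # the all-zero column comes back masked

# convert to plain ndarrays, using NaN for slices that had no ratings
user_means_arr = user_means.filled(np.nan)
movie_means_arr = movie_means.filled(np.nan)
```

The masked results can be used directly, or converted with `.filled(np.nan)` as shown when downstream code expects ordinary arrays.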
Akavall answered Nov 03 '22 01:11