An efficient way to calculate the mean of each column or row of non-zero elements

Tags: python, arrays, numpy

I have a numpy array of ratings given by users to movies. Ratings range from 1 to 5, and 0 means that the user has not rated the movie. I want to calculate the average rating of each movie and the average rating of each user. In other words, I want the mean of the non-zero elements in each column and in each row.

Is there an efficient numpy function for this case? I know I could solve the problem by manually iterating over the rows or columns.

Thanks in advance!


asked Jan 11 '14 01:01

GarudaReiga

2 Answers

Since the values to discard are 0, you can compute the mean manually by summing along an axis and then dividing by the number of non-zero elements along the same axis:

import numpy as np

a = np.array([[8.,9,7,0], [0,0,5,6]])
a.sum(1)/(a != 0).sum(1)  # row sums divided by the count of non-zero entries per row

results in:

array([ 8. ,  5.5])

As you can see, the zeros are not counted in the mean.
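One caveat with the sum-and-count approach: a row or column with no ratings at all has a count of 0, so the division becomes 0/0. A minimal sketch of one way to guard against that is below. The helper name `nonzero_mean` is hypothetical, and returning NaN for empty rows or columns is an assumption about what you want in that case.

```python
import numpy as np

def nonzero_mean(a, axis):
    """Mean of the non-zero entries along `axis`; NaN where there are none.

    Hypothetical helper for illustration, not a NumPy function.
    """
    a = np.asarray(a, dtype=float)
    totals = a.sum(axis=axis)            # zeros add nothing to the sum
    counts = (a != 0).sum(axis=axis)     # number of actual ratings
    out = np.full(totals.shape, np.nan)  # default for rows/columns with no ratings
    # Divide only where at least one rating exists, so 0/0 is never evaluated.
    np.divide(totals, counts, out=out, where=counts > 0)
    return out

a = np.array([[8., 9, 7, 0],
              [0, 0, 5, 6],
              [0, 0, 0, 0]])             # last row has no ratings

row_means = nonzero_mean(a, axis=1)      # one value per row
col_means = nonzero_mean(a, axis=0)      # one value per column
```

Passing `axis=1` gives the per-row (user) means and `axis=0` the per-column (movie) means, so the same helper covers both halves of the question.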

answered Nov 03 '22 00:11

user2304916

You could make use of np.nanmean after converting all 0 values to np.nan. Note that np.nanmean is only available in numpy 1.8 and later.

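import numpy as np

ratings = np.array([[1,4,5,0],
                    [2,0,3,0],
                    [4,0,0,0]], dtype=float)  # np.float was removed in NumPy 1.24


def get_means(ratings):
    # Replace zeros (no rating) with NaN in a copy, leaving the caller's array unchanged.
    masked = np.where(ratings == 0, np.nan, ratings)

    user_means = np.nanmean(masked, axis=1)   # one mean per row (user)
    movie_means = np.nanmean(masked, axis=0)  # one mean per column (movie)

    return {'user_means' : user_means, 'movie_means' : movie_means}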

Result:

>>> get_means(ratings)
{'movie_means': array([ 2.33333333,  4.        ,  4.        ,         nan]),
 'user_means': array([ 3.33333333,  2.5       ,  4.        ])}
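The last movie has no ratings, so its mean comes out as nan, and np.nanmean emits a RuntimeWarning for that all-NaN column. If you would rather not convert the zeros to NaN at all, numpy's masked arrays are another option. The following is a sketch of that alternative rather than part of the answer above: `np.ma.masked_equal` hides the zeros, and `.filled(np.nan)` turns any fully masked result back into NaN.

```python
import numpy as np

ratings = np.array([[1, 4, 5, 0],
                    [2, 0, 3, 0],
                    [4, 0, 0, 0]], dtype=float)

# Mask every entry equal to 0 so it is ignored by reductions such as mean().
masked = np.ma.masked_equal(ratings, 0)

# Means over the unmasked entries only; fully masked rows/columns become NaN.
user_means = masked.mean(axis=1).filled(np.nan)
movie_means = masked.mean(axis=0).filled(np.nan)
```

The original `ratings` array is left untouched here, since the mask lives on a separate masked array.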

answered Nov 03 '22 01:11

Akavall

