Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

np.mean() vs np.average() in Python NumPy?

People also ask

Is mean the same as average in Python?

mean() function can be used to calculate mean/average of a given list of numbers. It returns mean of the data set passed as parameters. Arithmetic mean is the sum of data divided by the number of data-points.

What does NP mean do in numpy?

mean() in Python. The sum of elements, along with an axis divided by the number of elements, is known as arithmetic mean. The numpy. mean() function is used to compute the arithmetic mean along the specified axis.

Does numpy have an average function?

The numpy. average() function computes the weighted average of elements in an array according to their respective weight given in another array. The function can have an axis parameter. If the axis is not specified, the array is flattened.

What is NP average in Python?

average() in Python. The numpy module of Python provides a function called numpy. average(), used for calculating the weighted average along the specified axis.


np.average takes an optional weight parameter. If it is not supplied they are equivalent. Take a look at the source code: Mean, Average

np.mean:

try:
    mean = a.mean
except AttributeError:
    return _wrapit(a, 'mean', axis, dtype, out)
return mean(axis, dtype, out)

np.average:

...
if weights is None :
    avg = a.mean(axis)
    scl = avg.dtype.type(a.size/avg.size)
else:
    #code that does weighted mean here

if returned: #returned is another optional argument
    scl = np.multiply(avg, 0) + scl
    return avg, scl
else:
    return avg
...

np.mean always computes an arithmetic mean, and has some additional options for input and output (e.g. what datatypes to use, where to place the result).

np.average can compute a weighted average if the weights parameter is supplied.


In some version of numpy there is another imporant difference that you must be aware:

average do not take in account masks, so compute the average over the whole set of data.

mean takes in account masks, so compute the mean only over unmasked values.

g = [1,2,3,55,66,77]
f = np.ma.masked_greater(g,5)

np.average(f)
Out: 34.0

np.mean(f)
Out: 2.0

In addition to the differences already noted, there's another extremely important difference that I just now discovered the hard way: unlike np.mean, np.average doesn't allow the dtype keyword, which is essential for getting correct results in some cases. I have a very large single-precision array that is accessed from an h5 file. If I take the mean along axes 0 and 1, I get wildly incorrect results unless I specify dtype='float64':

>T.shape
(4096, 4096, 720)
>T.dtype
dtype('<f4')

m1 = np.average(T, axis=(0,1))                #  garbage
m2 = np.mean(T, axis=(0,1))                   #  the same garbage
m3 = np.mean(T, axis=(0,1), dtype='float64')  # correct results

Unfortunately, unless you know what to look for, you can't necessarily tell your results are wrong. I will never use np.average again for this reason but will always use np.mean(.., dtype='float64') on any large array. If I want a weighted average, I'll compute it explicitly using the product of the weight vector and the target array and then either np.sum or np.mean, as appropriate (with appropriate precision as well).