mean() function can be used to calculate mean/average of a given list of numbers. It returns mean of the data set passed as parameters. Arithmetic mean is the sum of data divided by the number of data-points.
mean() in Python. The sum of elements, along with an axis divided by the number of elements, is known as arithmetic mean. The numpy. mean() function is used to compute the arithmetic mean along the specified axis.
The numpy. average() function computes the weighted average of elements in an array according to their respective weight given in another array. The function can have an axis parameter. If the axis is not specified, the array is flattened.
average() in Python. The numpy module of Python provides a function called numpy. average(), used for calculating the weighted average along the specified axis.
np.average takes an optional weight parameter. If it is not supplied they are equivalent. Take a look at the source code: Mean, Average
np.mean:
try:
mean = a.mean
except AttributeError:
return _wrapit(a, 'mean', axis, dtype, out)
return mean(axis, dtype, out)
np.average:
...
if weights is None :
avg = a.mean(axis)
scl = avg.dtype.type(a.size/avg.size)
else:
#code that does weighted mean here
if returned: #returned is another optional argument
scl = np.multiply(avg, 0) + scl
return avg, scl
else:
return avg
...
np.mean
always computes an arithmetic mean, and has some additional options for input and output (e.g. what datatypes to use, where to place the result).
np.average
can compute a weighted average if the weights
parameter is supplied.
In some version of numpy there is another imporant difference that you must be aware:
average
do not take in account masks, so compute the average over the whole set of data.
mean
takes in account masks, so compute the mean only over unmasked values.
g = [1,2,3,55,66,77]
f = np.ma.masked_greater(g,5)
np.average(f)
Out: 34.0
np.mean(f)
Out: 2.0
In addition to the differences already noted, there's another extremely important difference that I just now discovered the hard way: unlike np.mean
, np.average
doesn't allow the dtype
keyword, which is essential for getting correct results in some cases. I have a very large single-precision array that is accessed from an h5
file. If I take the mean along axes 0 and 1, I get wildly incorrect results unless I specify dtype='float64'
:
>T.shape
(4096, 4096, 720)
>T.dtype
dtype('<f4')
m1 = np.average(T, axis=(0,1)) # garbage
m2 = np.mean(T, axis=(0,1)) # the same garbage
m3 = np.mean(T, axis=(0,1), dtype='float64') # correct results
Unfortunately, unless you know what to look for, you can't necessarily tell your results are wrong. I will never use np.average
again for this reason but will always use np.mean(.., dtype='float64')
on any large array. If I want a weighted average, I'll compute it explicitly using the product of the weight vector and the target array and then either np.sum
or np.mean
, as appropriate (with appropriate precision as well).
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With