I have a very large masked NumPy array (originalArray) with many rows and two columns. I want to take the average of every two rows in originalArray and build a newArray in which each row is the average of two rows in originalArray (so newArray has half as many rows as originalArray). This should be a simple thing to do, but the script below is EXTREMELY slow. Any advice from the community would be greatly appreciated.
newList = []
for i in range(0, originalArray.shape[0], 2):
    r = originalArray[i:i+2,:].mean(axis=0)
    newList.append(r)
newArray = np.asarray(newList)
There must be a more elegant way of doing this. Many thanks!
To calculate the average of each column of a 2D array, call numpy.average(array, axis=0). With axis=0 the result holds one mean value per column.
With NumPy you can calculate the average of all elements of an array, the average along a given axis, or a weighted average. The function for this is numpy.average().
np.mean always computes an arithmetic mean, and has some additional options for input and output (e.g. what datatypes to use, where to place the result). np.average can compute a weighted average if the weights parameter is supplied.
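As a small illustration of that difference, the sketch below compares the two functions on a made-up 3x2 array. The data and the weights are assumptions chosen only for the example.

```python
import numpy as np

# Illustrative data: three rows, two columns.
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0]])

# Plain arithmetic mean of each column.
col_mean = np.mean(data, axis=0)

# Without weights, np.average gives the same per-column result.
col_avg = np.average(data, axis=0)

# With one weight per row, np.average computes a weighted mean instead.
weights = np.array([1.0, 1.0, 2.0])  # assumed weights, the last row counts double
col_weighted = np.average(data, axis=0, weights=weights)

print(col_mean, col_avg, col_weighted)
```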
Use mean() to calculate mean values along a dimension of an array. Call numpy.ndarray.mean(axis=x) with x as 0 for the mean of each column, and with x as 1 for the mean of each row.
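A quick sketch of the axis argument on a small example array (the array itself is just an assumption for illustration):

```python
import numpy as np

# 3 rows, 2 columns: [[0, 1], [2, 3], [4, 5]]
m = np.arange(6, dtype=float).reshape(3, 2)

col_means = m.mean(axis=0)  # collapses the rows: one value per column
row_means = m.mean(axis=1)  # collapses the columns: one value per row

print(col_means, row_means)
```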
Your problem (average of every two rows with two columns):
>>> a = np.reshape(np.arange(12),(6,2))
>>> a
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
>>> a.transpose().reshape(-1,2).mean(1).reshape(2,-1).transpose()
array([[ 1.,  2.],
       [ 5.,  6.],
       [ 9., 10.]])
Other dimensions (average of every four rows with three columns):
>>> a = np.reshape(np.arange(24),(8,3))
>>> a
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23]])
>>> a.transpose().reshape(-1,4).mean(1).reshape(3,-1).transpose()
array([[ 4.5,  5.5,  6.5],
       [16.5, 17.5, 18.5]])
General formula for taking the average of r rows for a 2D array a with c columns:
a.transpose().reshape(-1,r).mean(1).reshape(c,-1).transpose()
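The general formula can be wrapped in a function. The sketch below is one way to do that. The helper names block_mean_transpose and block_mean_3d are hypothetical, and the second variant (reshaping to three dimensions and averaging over the middle axis) is an alternative added here for comparison, not part of the answer above. Both assume the number of rows is a multiple of r.

```python
import numpy as np

def block_mean_transpose(a, r):
    """Average every r rows using the transpose/reshape formula (hypothetical helper)."""
    n_rows, c = a.shape
    if n_rows % r != 0:
        raise ValueError("number of rows must be a multiple of r")
    return a.transpose().reshape(-1, r).mean(1).reshape(c, -1).transpose()

def block_mean_3d(a, r):
    """Same result via a (groups, r, c) reshape; an alternative, not the answer's formula."""
    n_rows, c = a.shape
    if n_rows % r != 0:
        raise ValueError("number of rows must be a multiple of r")
    return a.reshape(-1, r, c).mean(axis=1)

a = np.arange(12).reshape(6, 2)
res_t = block_mean_transpose(a, 2)
res_3d = block_mean_3d(a, 2)

# Masked input, as in the question: the masked 3.0 is left out of its group's mean.
masked = np.ma.masked_array(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
    mask=[[False, False], [True, False], [False, False], [False, False]],
)
res_masked = block_mean_3d(masked, 2)
```

Since reshape and mean are also available on masked arrays, the same call works for the masked originalArray in the question, and masked entries are skipped when each group is averaged.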
The mean of two values a and b is 0.5*(a+b). Therefore you can do it like this:
newArray = 0.5*(originalArray[0::2] + originalArray[1::2])
This sums each pair of consecutive rows and then multiplies every element by 0.5. It requires originalArray to have an even number of rows, because both slices must have the same shape.
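One point worth checking with masked arrays, as in the question: adding two masked arrays masks the result wherever either operand is masked, while mean() skips masked entries. The sketch below uses a small made-up masked array to show that the two approaches can differ. The reshape-based comparison is an assumption added here for contrast.

```python
import numpy as np

# Illustrative masked array: the value 3.0 (row 1, column 0) is masked.
original = np.ma.masked_array(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
    mask=[[False, False], [True, False], [False, False], [False, False]],
)

# Slice-and-add approach: the first output entry is masked,
# because one of its two inputs is masked.
pair_sum = 0.5 * (original[0::2] + original[1::2])

# mean() over each pair: the masked value is ignored,
# so the first output entry is simply 1.0.
pair_mean = original.reshape(-1, 2, 2).mean(axis=1)

print(pair_sum)
print(pair_mean)
```

Which behaviour is wanted depends on the data. If a pair with one missing value should count as missing, the slice-and-add form does that directly.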
Since in the title you are asking for avg over N rows, here is a more general solution:
def groupedAvg(myArray, N=2):
    result = np.cumsum(myArray, 0)[N-1::N]/float(N)
    result[1:] = result[1:] - result[:-1]
    return result
The general form of the average over n elements is sum([x1,x2,...,xn])/n. The sum of elements m to m+n in vector v is the same as subtracting the (m-1)th element from the (m+n)th element of cumsum(v), unless m is 0, in which case you don't subtract anything (result[0]).
That is what we take advantage of here. Also, since everything is linear, it is not important where we divide by N, so we do it right at the beginning, but that is just a matter of taste.
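The cumulative-sum identity can be checked on a small vector. The values of v, m, n and N below are assumptions picked only for the demonstration.

```python
import numpy as np

v = np.array([3, 1, 4, 1, 5, 9, 2, 6])
cum = np.cumsum(v)                      # running totals of v

# Sum of elements m .. m+n (inclusive), computed two ways.
m, n = 2, 3
window_sum = v[m:m + n + 1].sum()       # direct sum of v[m] .. v[m+n]
via_cumsum = cum[m + n] - cum[m - 1]    # same value from two cumsum entries

# Applied to consecutive groups of N elements, as in groupedAvg.
N = 2
group_sums = cum[N - 1::N].copy()       # cumsum at the end of each group
group_sums[1:] -= cum[N - 1::N][:-1]    # subtract the previous group's end
group_means = group_sums / N

print(window_sum, via_cumsum, group_means)
```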
If the last group has fewer than N elements, it will be ignored completely. If you don't want to ignore it, you have to treat the last group specially:
def avg(myArray, N=2):
    cum = np.cumsum(myArray,0)
    result = cum[N-1::N]/float(N)
    result[1:] = result[1:] - result[:-1]
    remainder = myArray.shape[0] % N
    if remainder != 0:
        if remainder < myArray.shape[0]:
            lastAvg = (cum[-1]-cum[-1-remainder])/float(remainder)
        else:
            lastAvg = cum[-1]/float(remainder)
        result = np.vstack([result, lastAvg])
    return result
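For comparison, a partial last group can also be handled with np.add.reduceat, which sums the rows between consecutive start indices. This is a different technique from the cumsum approach above, sketched under the assumption of a plain (unmasked) non-empty 2D array. The name grouped_avg_reduceat is hypothetical.

```python
import numpy as np

def grouped_avg_reduceat(a, n=2):
    """Average every n rows; a shorter last group is averaged over its own length."""
    # Assumption: plain 2D ndarray. np.asarray would drop the mask of a masked array.
    a = np.asarray(a, dtype=float)
    starts = np.arange(0, a.shape[0], n)             # first row index of each group
    sums = np.add.reduceat(a, starts, axis=0)        # per-group column sums
    counts = np.diff(np.append(starts, a.shape[0]))  # rows in each group
    return sums / counts[:, None]

# Five rows with n=2: two full groups and one leftover row.
example = np.arange(10).reshape(5, 2)
out = grouped_avg_reduceat(example, 2)
print(out)
```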