 

Fast way to take average of every N rows in a .npy array

Tags:

python

numpy

I have a very large masked NumPy array (originalArray) with many rows and two columns. I want to take the average of every two rows in originalArray and build a newArray in which each row is the average of two rows in originalArray (so newArray has half as many rows as originalArray). This should be a simple thing to do, but the script below is EXTREMELY slow. Any advice from the community would be greatly appreciated.

import numpy as np

newList = []
for i in range(0, originalArray.shape[0], 2):
    r = originalArray[i:i+2,:].mean(axis=0)
    newList.append(r)
newArray = np.asarray(newList)

There must be a more elegant way of doing this. Many thanks!

asked May 21 '15 at 16:05 by Emily



People also ask

How do you average a column in a NumPy array?

To calculate the average of each column of a 2-D array individually, call numpy.average(array, axis=0). Setting axis=0 averages down the rows, so it returns one mean per column rather than a single mean for the whole matrix.

How do you find the average of a NumPy array?

With NumPy you can calculate the average of all elements of an array, the average along some axis, or a weighted average. To find the average of a NumPy array, you can use the numpy.average() statistical function.

How is NP mean () different from NP average () in NumPy?

np.mean always computes an arithmetic mean, and has some additional options for input and output (e.g. which datatype to use, where to place the result). np.average can compute a weighted average if the weights parameter is supplied.
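A small illustration of that difference, assuming a made-up 2×2 array: without weights the two functions agree, and the `weights` argument (here giving the second row three times the weight of the first) only exists on np.average.

```python
import numpy as np

# Toy data chosen only for this example
a = np.array([[1.0, 2.0],
              [3.0, 4.0]])

plain_mean = np.mean(a)        # arithmetic mean of all four elements: 2.5
plain_avg = np.average(a)      # no weights, so the same result: 2.5

# Weighted mean down each column (axis=0); weights has one entry per row
weighted = np.average(a, axis=0, weights=[1, 3])   # (1*1 + 3*3)/4, (2*1 + 4*3)/4 -> [2.5, 3.5]
```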

How do you calculate the mean of each row in NumPy?

Use ndarray.mean(axis=x) to calculate mean values along one dimension of an array: call it with x = 0 to get the mean of each column, and with x = 1 to get the mean of each row.
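A minimal sketch on a made-up 2×3 array, showing which axis gives which result:

```python
import numpy as np

# Toy data: 2 rows, 3 columns
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

col_means = a.mean(axis=0)   # collapse the rows: one mean per column -> [2.5, 3.5, 4.5]
row_means = a.mean(axis=1)   # collapse the columns: one mean per row -> [2.0, 5.0]
```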


2 Answers

Your problem (average of every two rows with two columns):

>>> a = np.reshape(np.arange(12),(6,2))
>>> a
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
>>> a.transpose().reshape(-1,2).mean(1).reshape(2,-1).transpose()
array([[  1.,   2.],
       [  5.,   6.],
       [  9.,  10.]])

Other dimensions (average of every four rows with three columns):

>>> a = np.reshape(np.arange(24),(8,3))
>>> a
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23]])
>>> a.transpose().reshape(-1,4).mean(1).reshape(3,-1).transpose()
array([[  4.5,   5.5,   6.5],
       [ 16.5,  17.5,  18.5]])

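General formula for averaging every r rows of a 2D array a with c columns (the number of rows must be a multiple of r; otherwise a group can straddle two columns and the result is silently wrong):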

a.transpose().reshape(-1,r).mean(1).reshape(c,-1).transpose()
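As a sketch of an alternative (not part of this answer), the rows can also be grouped by reshaping straight to three dimensions, (n // r, r, c), and averaging over the middle axis. `block_mean` is a hypothetical helper name. Because nothing is transposed, a row count that is not a multiple of r makes the reshape raise an error instead of mixing columns. The masked part uses small made-up data to show that MaskedArray.mean skips masked entries, which is what the loop in the question does.

```python
import numpy as np

def block_mean(a, r):
    """Average every r consecutive rows of the 2-D array a.

    Hypothetical helper for illustration; assumes a.shape[0] is a multiple of r.
    """
    n, c = a.shape
    # Group the rows into n // r blocks of r rows each: shape (n // r, r, c)
    blocks = a.reshape(n // r, r, c)
    # Average inside each block, i.e. over the middle axis
    return blocks.mean(axis=1)

# Same data as the second example above
a = np.reshape(np.arange(24), (8, 3))
plain = block_mean(a, 4)
formula = a.transpose().reshape(-1, 4).mean(1).reshape(3, -1).transpose()

# Masked input (made-up): the value 2.0 at row 1, column 0 is masked
m = np.ma.masked_array(np.arange(8.0).reshape(4, 2),
                       mask=[[False, False], [True, False],
                             [False, False], [False, False]])
# MaskedArray.mean ignores the masked value, so block 0, column 0 averages only 0.0
masked = block_mean(m, 2)
```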
answered Nov 14 '22 at 21:11 by Jona



The mean of two values a and b is 0.5*(a+b), so you can do it like this:

newArray = 0.5*(originalArray[0::2] + originalArray[1::2])

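This adds each even-indexed row to the odd-indexed row that follows it and then multiplies every element by 0.5. It assumes originalArray has an even number of rows; with an odd number, the two slices differ in length, so their shapes no longer match.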
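Since the question's array is masked, one behavioral difference may matter: numpy.ma arithmetic masks a result element whenever either operand is masked, whereas the question's loop uses .mean(axis=0), which averages only the unmasked values. A sketch on made-up data (np.ma.vstack is used here so the looped result stays a masked array):

```python
import numpy as np

# Made-up masked array: the value 2.0 (row 1, column 0) is masked
m = np.ma.masked_array([[0.0, 1.0],
                        [2.0, 3.0],
                        [4.0, 5.0],
                        [6.0, 7.0]],
                       mask=[[False, False],
                             [True,  False],
                             [False, False],
                             [False, False]])

# Slicing trick: a masked operand masks the sum, so pairwise[0, 0] is masked
pairwise = 0.5 * (m[0::2] + m[1::2])

# Loop from the question: .mean(axis=0) skips masked values,
# so looped[0, 0] is 0.0 (the mean of the one unmasked value)
looped = np.ma.vstack([m[i:i + 2].mean(axis=0) for i in range(0, m.shape[0], 2)])
```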

Since the title asks for the average over N rows, here is a more general solution:

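import numpy as np

def groupedAvg(myArray, N=2):
    # running totals at the end of each complete group, already divided by N
    result = np.cumsum(myArray, 0)[N-1::N]/float(N)
    # subtract the previous group's running total to get each group's own mean
    result[1:] = result[1:] - result[:-1]
    return result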

The general form of the average over n elements is sum([x1,x2,...,xn])/n. The sum of elements v[m] through v[m+n-1] of a vector v equals cumsum(v)[m+n-1] minus cumsum(v)[m-1], unless m is 0; in that case you don't subtract anything (result[0]).
That is what we take advantage of here. Also, since everything is linear, it does not matter where we divide by N, so we do it right at the beginning, but that is just a matter of taste.
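A quick sanity check of the cumsum identity and of groupedAvg, assuming random made-up data and a plain loop over the complete groups as the reference:

```python
import numpy as np

def groupedAvg(myArray, N=2):
    # Same logic as groupedAvg in this answer
    result = np.cumsum(myArray, 0)[N-1::N] / float(N)
    result[1:] = result[1:] - result[:-1]
    return result

rng = np.random.default_rng(0)     # made-up test data
x = rng.random((10, 2))
S = np.cumsum(x, axis=0)           # S[k] = x[0] + ... + x[k]

# The identity: rows m .. m+n-1 sum to S[m+n-1] - S[m-1] (for m > 0)
m, n = 3, 4
identity_holds = np.allclose(x[m:m + n].sum(axis=0), S[m + n - 1] - S[m - 1])

# groupedAvg vs. a plain loop over complete groups (for N=3, row 9 is dropped)
N = 3
loop = np.array([x[i:i + N].mean(axis=0) for i in range(0, x.shape[0] - N + 1, N)])
matches_loop = np.allclose(groupedAvg(x, N), loop)
```

One design caveat: for very long arrays, especially float32, the running totals grow large, and subtracting them can lose precision compared with summing each group directly.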

If the last group has fewer than N rows, it is ignored completely. If you don't want to ignore it, you have to treat the last group specially:

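import numpy as np

def avg(myArray, N=2):
    cum = np.cumsum(myArray,0)
    # means of the complete groups, as in groupedAvg
    result = cum[N-1::N]/float(N)
    result[1:] = result[1:] - result[:-1]

    # rows left over after the last complete group
    remainder = myArray.shape[0] % N
    if remainder != 0:
        if remainder < myArray.shape[0]:
            # sum of the last `remainder` rows, divided by their count
            lastAvg = (cum[-1]-cum[-1-remainder])/float(remainder)
        else:
            # fewer than N rows in total: average all of them
            lastAvg = cum[-1]/float(remainder)
        result = np.vstack([result, lastAvg])

    return result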
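A different technique that also keeps the short final group is np.add.reduceat, which sums each slice between consecutive start indices directly instead of subtracting running totals. A sketch, assuming a non-empty, plain (unmasked) ndarray; `grouped_avg_reduceat` is a hypothetical name:

```python
import numpy as np

def grouped_avg_reduceat(arr, N=2):
    """Average every N rows of arr, keeping a shorter final group.

    Hypothetical helper; assumes a non-empty, plain (unmasked) ndarray.
    """
    n = arr.shape[0]
    starts = np.arange(0, n, N)                   # first row of each group
    # Sum arr[starts[i]:starts[i+1]] for each i; the last group runs to the end
    sums = np.add.reduceat(arr, starts, axis=0)
    counts = np.diff(np.append(starts, n))        # rows per group; the last may be < N
    # Give counts trailing length-1 axes so it broadcasts over the columns
    return sums / counts.reshape((-1,) + (1,) * (arr.ndim - 1))

a = np.arange(10).reshape(5, 2)
print(grouped_avg_reduceat(a, 2))
```

For this input, avg(a, 2) from the answer gives the same [[1, 2], [5, 6], [8, 9]]: two full pairs plus the leftover last row on its own.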
answered Nov 14 '22 at 23:11 by swenzel
