Fast way to take average of every N rows in a .npy array

Tags: python, numpy

I have a very large masked NumPy array (originalArray) with many rows and two columns. I want to take the average of every two rows in originalArray and build a newArray in which each row is the average of two rows in originalArray (so newArray has half as many rows as originalArray). This should be a simple thing to do, but the script below is EXTREMELY slow. Any advice from the community would be greatly appreciated.

newList = []
for i in range(0, originalArray.shape[0], 2):
    r = originalArray[i:i+2,:].mean(axis=0)
    newList.append(r)
newArray = np.asarray(newList)

There must be a more elegant way of doing this. Many thanks!

asked May 21 '15 16:05

Emily

2 Answers

Your problem (average of every two rows with two columns):

>>> a = np.reshape(np.arange(12),(6,2))
>>> a
array([[ 0,  1],
       [ 2,  3],
       [ 4,  5],
       [ 6,  7],
       [ 8,  9],
       [10, 11]])
>>> a.transpose().reshape(-1,2).mean(1).reshape(2,-1).transpose()
array([[  1.,   2.],
       [  5.,   6.],
       [  9.,  10.]])

Other dimensions (average of every four rows with three columns):

>>> a = np.reshape(np.arange(24),(8,3))
>>> a
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17],
       [18, 19, 20],
       [21, 22, 23]])
>>> a.transpose().reshape(-1,4).mean(1).reshape(3,-1).transpose()
array([[  4.5,   5.5,   6.5],
       [ 16.5,  17.5,  18.5]])

General formula for taking the average of r rows for a 2D array a with c columns:

a.transpose().reshape(-1,r).mean(1).reshape(c,-1).transpose()
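
As a possible alternative to the double transpose, the same result can be obtained by reshaping into a 3D array of blocks and averaging along the block axis. This is only a sketch: the helper name block_mean is made up for illustration, and it assumes the number of rows is an exact multiple of r.

```python
import numpy as np

def block_mean(a, r):
    """Average every r consecutive rows of a 2D array (hypothetical helper)."""
    n_rows, c = a.shape
    if n_rows % r != 0:
        raise ValueError("number of rows must be a multiple of r")
    # View the rows as n_rows // r blocks of shape (r, c),
    # then average within each block (axis 1).
    return a.reshape(n_rows // r, r, c).mean(axis=1)

a = np.arange(24).reshape(8, 3)
result = block_mean(a, 4)
print(result)
# [[ 4.5  5.5  6.5]
#  [16.5 17.5 18.5]]
```

For a C-contiguous input the reshape does not copy the data, so the only real work is the mean itself.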

answered Nov 14 '22 21:11

Jona

The mean of two values a and b is 0.5*(a+b).
Therefore, you can do it like this:

newArray = 0.5*(originalArray[0::2] + originalArray[1::2])

This adds each pair of consecutive rows and then multiplies every element by 0.5. It assumes the array has an even number of rows.
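
Since the question mentions a masked array, it may be worth checking how masked entries are treated. In the small made-up example below, adding two slices masks the result wherever either operand is masked, whereas a grouped mean (as in the question's loop) skips masked entries. The reshape-based grouping is an illustration, not part of the answer above.

```python
import numpy as np

# Small masked array: the first column of row 1 is masked.
a = np.ma.masked_array(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]],
    mask=[[False, False], [True, False], [False, False], [False, False]],
)

# Pairwise sum: a masked operand makes the corresponding result masked.
pair_avg = 0.5 * (a[0::2] + a[1::2])

# Grouped mean over blocks of two rows: masked entries are ignored,
# so the first element is the mean of the single unmasked value 1.0.
group_avg = a.reshape(-1, 2, a.shape[1]).mean(axis=1)

print(pair_avg)
print(group_avg)
```

Which behaviour is wanted depends on the data; the two agree whenever no entry in a pair is masked.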

Since in the title you are asking for avg over N rows, here is a more general solution:

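import numpy as np

def groupedAvg(myArray, N=2):
    # Running column sums sampled at the last row of each group, divided by N
    result = np.cumsum(myArray, 0)[N-1::N]/float(N)
    # Subtracting the previous entry leaves each group's own average
    result[1:] = result[1:] - result[:-1]
    return result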

The general form of the average over n elements is sum([x1,x2,...,xn])/n. The sum of the n elements m through m+n-1 of a vector v equals the (m+n-1)th element of cumsum(v) minus its (m-1)th element. When m is 0 there is nothing to subtract (this is result[0]).
That is what we take advantage of here. Since everything is linear, it does not matter where we divide by N, so we do it right at the beginning, but that is just a matter of taste.
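
The cumulative-sum identity can be checked numerically. The values of v, m and n below are arbitrary choices for illustration; the window covers the n elements v[m] through v[m+n-1].

```python
import numpy as np

v = np.arange(10, 20)          # example data: 10, 11, ..., 19
c = np.cumsum(v)

m, n = 3, 4                    # window of n elements starting at index m
direct = v[m:m + n].sum()      # 13 + 14 + 15 + 16
via_cumsum = c[m + n - 1] - c[m - 1]
print(direct, via_cumsum)      # 58 58

# For m == 0 there is nothing to subtract.
first_window = c[n - 1]
print(first_window == v[:n].sum())   # True
```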

If the last group has fewer than N elements, it will be ignored completely. If you don't want to ignore it, you have to treat the last group specially:

def avg(myArray, N=2):
    cum = np.cumsum(myArray,0)
    result = cum[N-1::N]/float(N)
    result[1:] = result[1:] - result[:-1]

    remainder = myArray.shape[0] % N
    if remainder != 0:
        if remainder < myArray.shape[0]:
            lastAvg = (cum[-1]-cum[-1-remainder])/float(remainder)
        else:
            lastAvg = cum[-1]/float(remainder)
        result = np.vstack([result, lastAvg])

    return result
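
Another way to handle a shorter last group, sketched here as an alternative rather than as part of the answer above, is np.add.reduceat, which sums the rows between consecutive start indices (the last group runs to the end of the array). The function name grouped_mean_reduceat is made up, and the sketch assumes a plain 2D ndarray: np.asarray discards any mask, so it is not suitable for masked input as written.

```python
import numpy as np

def grouped_mean_reduceat(a, N=2):
    """Mean of every N rows; a shorter last group is averaged over its own size."""
    a = np.asarray(a, dtype=float)                    # assumption: no mask needed
    starts = np.arange(0, a.shape[0], N)              # first row index of each group
    sums = np.add.reduceat(a, starts, axis=0)         # per-group column sums
    counts = np.diff(np.append(starts, a.shape[0]))   # number of rows in each group
    return sums / counts[:, None]

a = np.arange(14).reshape(7, 2)
tail_result = grouped_mean_reduceat(a, 3)
print(tail_result)
# [[ 2.  3.]
#  [ 8.  9.]
#  [12. 13.]]
```

Here the seventh row forms a group of its own, so its "average" is just the row itself.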

answered Nov 14 '22 23:11

swenzel
