
Python numpy: sum every 3 rows (converting monthly to quarterly)

Tags: python, numpy

I have a set of one-dimensional numpy arrays with monthly data. I need to aggregate them by quarter, creating a new array where the first item is the sum of the first 3 items of the old array, etc.

I am using this function, with x = 3:

def sumeveryxrows(myarray, x):
    # Integer division gives the number of complete groups of x items
    return [sum(myarray[x*n:x*n+x]) for n in range(len(myarray) // x)]

It works, but can you think of a faster way? I profiled it, and 97% of the time is spent in __getitem__.

Pythonista anonymous asked Jul 09 '15 13:07


2 Answers

You could use reshape (assuming the length of your array is a multiple of x):

sumeveryxrows = lambda myarray, x: myarray.reshape((myarray.shape[0] // x, x)).sum(1)

The above takes less than 0.3 s on an array with 30,000,000 values:

>>> a = numpy.random.rand(30000000)
>>> cProfile.run('sumeveryxrows(a, 3)')
         8 function calls in 0.263 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.258    0.258 <stdin>:1(<lambda>)
        1    0.005    0.005    0.263    0.263 <string>:1(<module>)
        1    0.000    0.000    0.258    0.258 _methods.py:31(_sum)
        1    0.000    0.000    0.263    0.263 {built-in method exec}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.258    0.258    0.258    0.258 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
        1    0.000    0.000    0.258    0.258 {method 'sum' of 'numpy.ndarray' objects}
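
The reshape approach above needs the array length to be an exact multiple of x. When that is not guaranteed, one possible alternative is `numpy.add.reduceat`, which sums between explicit start indices and lets the last group be shorter. This is a minimal sketch rather than part of the original answer; the name `sum_every_x` and the sample data are made up for illustration.

```python
import numpy as np


def sum_every_x(arr, x):
    """Sum consecutive groups of x items; the last group may be shorter."""
    arr = np.asarray(arr)
    # Start index of each group: 0, x, 2x, ...
    starts = np.arange(0, arr.shape[0], x)
    # reduceat sums arr[starts[i]:starts[i+1]]; the final group runs to the end
    return np.add.reduceat(arr, starts)


# Hypothetical data: 14 months, i.e. four full quarters plus two extra months
monthly = np.arange(1, 15)
quarterly = sum_every_x(monthly, 3)
print(quarterly)  # [ 6 15 24 33 27]
```

Note that this keeps the trailing partial group (13 + 14 = 27 here), whereas the function in the question silently drops it.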
Holt answered Oct 17 '22 17:10


Another solution may be:

def sumeveryxrows(myarray, x):
  return [sum(myarray[n: n+x]) for n in xrange(0, len(myarray), x)]

This is for Python 2.x. If you're using Python 3, replace xrange with range. xrange produces its values lazily rather than building an entire list, and it accepts a step argument, which removes the need for the multiplication.
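
For reference, the Python 3 form of the list comprehension above would look roughly like this. It is the same function with `range` swapped in; the sample list is made up for illustration.

```python
def sumeveryxrows(myarray, x):
    # range with a step yields the start index of each group: 0, x, 2x, ...
    return [sum(myarray[n:n + x]) for n in range(0, len(myarray), x)]


# Seven items: two full groups of three plus one leftover item
print(sumeveryxrows([1, 2, 3, 4, 5, 6, 7], 3))  # [6, 15, 7]
```

Because the slice at the end is simply shorter, a trailing partial group is summed as well instead of being dropped.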

Then of course there is always the non-Pythonic way to do it (specifically for x = 3).

def sumevery3rows(a):
  i = 0
  ret = []
  stop = len(a) - 2
  while i < stop:
    ret.append(a[i] + a[i+1] + a[i+2])
    i += 3
  if i != len(a):
    ret.append(sum(a[i:len(a)]))
  return ret

I don't know how well this performs, and an implementation for variable x would probably erase any benefit of this approach.
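
One way to find out how these approaches compare is to time them with `timeit`. The sketch below is only a starting point: the function names (`sum_by_slices`, `sum_by_unrolled_loop`, `sum_by_reshape`) and the array size are assumptions chosen for illustration, the reshape variant uses `reshape(-1, x)` instead of computing the row count by hand, and the numbers it prints will vary by machine and NumPy version.

```python
import timeit

import numpy as np


def sum_by_slices(a, x):
    # List-comprehension approach: one Python-level slice and sum per group.
    return [sum(a[n:n + x]) for n in range(0, len(a), x)]


def sum_by_unrolled_loop(a):
    # Hand-unrolled loop for groups of exactly 3, as in the answer above.
    i = 0
    ret = []
    stop = len(a) - 2
    while i < stop:
        ret.append(a[i] + a[i + 1] + a[i + 2])
        i += 3
    if i != len(a):
        ret.append(sum(a[i:]))
    return ret


def sum_by_reshape(a, x):
    # Vectorized approach; requires len(a) to be a multiple of x.
    return a.reshape(-1, x).sum(axis=1)


# Assumed test data: 30,000 random values (a multiple of 3), kept small so
# the pure-Python versions finish quickly.
a = np.random.rand(30_000)

candidates = {
    "slices": lambda: sum_by_slices(a, 3),
    "unrolled loop": lambda: sum_by_unrolled_loop(a),
    "reshape": lambda: sum_by_reshape(a, 3),
}

for name, fn in candidates.items():
    # Total wall-clock time for 5 calls; absolute numbers depend on the machine.
    seconds = timeit.timeit(fn, number=5)
    print(f"{name}: {seconds:.4f} s for 5 runs")
```

All three functions should return the same sums for an input whose length is a multiple of 3, so any difference in the printed times reflects the approach rather than the result.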

Timothy Murphy answered Oct 17 '22 15:10