Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculating local means in a 1D numpy array

I have 1D NumPy array as follows:

import numpy as np
d = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20])

I want to calculate means of (1,2,6,7), (3,4,8,9), and so on. This involves mean of 4 elements: Two consecutive elements and two consecutive elements 5 positions after.

I tried the following:

>> import scipy.ndimage.filters as filt
>> res = filt.uniform_filter(d,size=4)
>> print res
[ 1  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

This unfortunately does not give me the desired results. How can I do it?

like image 610
Borys Avatar asked Jul 18 '15 14:07

Borys


2 Answers

Instead of indexing, you can approach this with a signal processing perspective. You are basically performing a discrete convolution of your input signal with a 7-tap kernel where the three centre coefficients are 0 while the extremities are 1, and since you want to compute the average, you need to multiply all of the values by (1/4). However, you're not computing the convolution of all of the elements but we will address that later. One way is to use scipy.ndimage.filters.convolve1d for that:

import numpy as np
from scipy.ndimage import filters
d = np.arange(1, 21, dtype=np.float)
ker = (1.0/4.0)*np.array([1,1,0,0,0,1,1], dtype=np.float)
out = filters.convolve1d(d, ker)[3:-3:2]

Because you're using a 7 tap kernel, convolution will extend the output by 3 to the left and 3 to the right, so you need to make sure to crop out the first and last three elements. You also want to skip every other element because convolution involves a sliding window, but you want to discard every other element so that you get the result you want.

We get this for out:

In [47]: out
Out[47]: array([  4.,   6.,   8.,  10.,  12.,  14.,  16.])

To double-check to see if we have the right result, try some sample calculations for each element. The first element is equal to (1+2+6+7)/4 = 4. The second element is equal to (3+4+8+9)/4 = 6, and so on.


For a solution with less headaches, try numpy.convolve with the mode=valid flag. This avoids the cutting out of the extra padding to the left and right, but you will still need to skip every other element though:

import numpy as np
d = np.arange(1, 21, dtype=np.float)
ker = (1.0/4.0)*np.array([1,1,0,0,0,1,1], dtype=np.float)
out = np.convolve(d, ker, mode='valid')[::2]

We also get:

In [59]: out
Out[59]: array([  4.,   6.,   8.,  10.,  12.,  14.,  16.])

Finally if you want indexing, something like this may suffice:

length = len(d[6::2])
out = np.array([(a+b+c+e)/4.0 for (a,b,c,e) in zip(d[::2][:length], d[1::2][:length], d[5::2][:length], d[6::2])])

We get:

In [69]: out
Out[69]: array([  4.,   6.,   8.,  10.,  12.,  14.,  16.])

This is really ugly, but it works. The total length of your signal is governed by the fact that the end of each window is at the 7th index. The length of this array that contains these indices dictates the final length of your signal. Also, note that for an element in a window, its next element can found by skipping every other element until the end of the array. There are 4 of these sequences in total and we simply zip over these 4 sequences where each sequence skips every other element, but there is an offset that we start at. The first sequence starts at offset 0, the next at 1, the next at 5 and the next at 6. We collect these four elements and average them, then skip over every one in the array until we finish.

BTW, I still like convolution better.

like image 86
rayryeng Avatar answered Sep 27 '22 16:09

rayryeng


You can use numpy.lib.stride_tricks.as_strided() to obtain a grouping array applicable for a more generic case:

import numpy as np
from numpy.lib.stride_tricks import as_strided

d = np.arange(1, 21)

consec = 2
offset = 5
nsub = 2
pace = 2

s = d.strides[0]
ngroups= (d.shape[0] - (consec + (nsub-1)*offset - 1))//pace
a = as_strided(d, shape=(ngroups, nsub, consec),
               strides=(pace*s, offset*s, 1*s))

Where:

  • consec is the number of consecutive numbers in the sub-group
  • offset the offset between the first entry in each sub-group
  • nsub the number of sub-groups (1, 2 is one sub-group, separated from the second sub-group 6, 7 by offset
  • pace indicates the stride between the first entry of two groups, which in your case is pace=consec, but could be different in a more general case

In your case (using the given values) a would be:

array([[[ 1,  2],
        [ 6,  7]],

       [[ 3,  4],
        [ 8,  9]],

       [[ 5,  6],
        [10, 11]],

       [[ 7,  8],
        [12, 13]],

       [[ 9, 10],
        [14, 15]],

       [[11, 12],
        [16, 17]],

       [[13, 14],
        [18, 19]]])

From where it is quite ready to obtain the desired average by doing:

a.mean(axis=-1).mean(axis=-1)

#array([  4.,   6.,   8.,  10.,  12.,  14.,  16.])
like image 36
Saullo G. P. Castro Avatar answered Sep 27 '22 16:09

Saullo G. P. Castro