I'm trying to vectorize a sliding-window operation. For the 1-D case, a helpful example would go along these lines:
import numpy as np

x = np.vstack((np.array([range(10)]), np.array([range(10)])))
x[1, :] = np.where((x[0, :] < 5) & (x[0, :] > 0), x[1, x[0, :] + 1], x[1, :])
The idea is to take the n+1 value for each current value, for indices < 5. But I get this error:
x[1, :] = np.where((x[0, :] < 5) & (x[0, :] > 0), x[1, x[0, :] + 1], x[1, :])
IndexError: index (10) out of range (0<=index<9) in dimension 1
Curiously, I don't get this error for the n-1 value, which would mean indices smaller than 0. It doesn't seem to mind:
x[1, :] = np.where((x[0, :] < 5) & (x[0, :] > 0), x[1, x[0, :] - 1], x[1, :])
print(x)
[[0 1 2 3 4 5 6 7 8 9]
 [0 0 1 2 3 5 6 7 8 9]]
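(I assume this happens because the fancy-index expression x[1, x[0, :] + 1] is evaluated for the whole row before np.where picks from it, and because NumPy treats negative indices as counting from the end of the axis, so -1 silently wraps around to the last element while 10 is simply out of bounds:)
print(x[0, -1])  # 9: wraps around to the last element, no IndexError
print(x[0, 10])  # IndexError: out of bounds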
Is there any way around this? Is my approach totally wrong? Any comments would be appreciated.
EDIT:
This is what I would like to achieve: I flatten a matrix to a NumPy array, on which I want to calculate the mean of the 3x3 neighborhood of each cell (the 8 surrounding values):
matriz = np.array([[1, 2, 3, 4, 5],
                   [6, 5, 4, 3, 2],
                   [1, 1, 2, 2, 3],
                   [3, 3, 2, 2, 1],
                   [3, 2, 1, 3, 2],
                   [1, 2, 3, 1, 2]])
# matrix to vector
vector2 = np.ndarray.flatten(matriz)
ncols = int(np.shape(matriz)[1])
nrows = int(np.shape(matriz)[0])
vector = np.zeros(nrows * ncols, dtype='float64')
# Interior pixels (i is the index of the current cell in the flattened array)
if ((i % ncols) != 0 and (i + 1) % ncols != 0 and i > ncols and i < ncols * (nrows - 1)):
    vector[i] = np.mean(np.array([vector2[i-ncols-1], vector2[i-ncols], vector2[i-ncols+1],
                                  vector2[i-1], vector2[i+1],
                                  vector2[i+ncols-1], vector2[i+ncols], vector2[i+ncols+1]]))
If I understand the problem correctly, you would like to take the mean of all the numbers one step around each index, neglecting the index itself.
I have patched your function to work; I believe you were going for something like this:
def original(matriz):
    vector2 = np.ndarray.flatten(matriz)
    nrows, ncols = matriz.shape
    vector = np.zeros(nrows * ncols, dtype='float64')
    # Interior pixels
    for i in range(vector.shape[0]):
        if ((i % ncols) != 0 and (i + 1) % ncols != 0 and i > ncols and i < ncols * (nrows - 1)):
            vector[i] = np.mean(np.array([vector2[i-ncols-1], vector2[i-ncols],
                                          vector2[i-ncols+1], vector2[i-1], vector2[i+1],
                                          vector2[i+ncols-1], vector2[i+ncols],
                                          vector2[i+ncols+1]]))
    return vector.reshape(nrows, ncols)  # back to 2-D so it can be compared below
I rewrote this using slicing and views; each shifted slice is a view onto the original array, so the eight additions are fully vectorized:
def mean_around(arr):
    arr = arr.astype(np.float64)
    out = np.copy(arr[:-2, :-2])   # Top left corner
    out += arr[:-2, 2:]            # Top right corner
    out += arr[:-2, 1:-1]          # Top center
    out += arr[2:, :-2]            # etc
    out += arr[2:, 2:]
    out += arr[2:, 1:-1]
    out += arr[1:-1, 2:]
    out += arr[1:-1, :-2]
    out /= 8.0                     # Divide by # of elements to obtain mean
    cout = np.empty_like(arr)      # Create output array
    cout[1:-1, 1:-1] = out         # Fill with out values
    cout[0, :] = 0; cout[-1, :] = 0; cout[:, 0] = 0; cout[:, -1] = 0  # Set edges equal to zero
    return cout
Using np.empty_like and then filling the edges seemed slightly faster than np.zeros_like. First, let's double-check that they give the same thing using your matriz array.
print np.allclose(mean_around(matriz),original(matriz))
True
print mean_around(matriz)
[[ 0.     0.     0.     0.     0.   ]
 [ 0.     2.5    2.75   3.125  0.   ]
 [ 0.     3.25   2.75   2.375  0.   ]
 [ 0.     1.875  2.     2.     0.   ]
 [ 0.     2.25   2.25   1.75   0.   ]
 [ 0.     0.     0.     0.     0.   ]]
Some timings:
a=np.random.rand(500,500)
print np.allclose(original(a),mean_around(a))
True
%timeit mean_around(a)
100 loops, best of 3: 4.4 ms per loop
%timeit original(a)
1 loops, best of 3: 6.6 s per loop
Roughly ~1500x speedup.
Looks like a good place to use numba:
from numba import autojit

def mean_numba(arr):
    out = np.zeros_like(arr)
    col, rows = arr.shape
    for x in xrange(1, col - 1):
        for y in xrange(1, rows - 1):
            out[x, y] = (arr[x-1, y+1] + arr[x-1, y] + arr[x-1, y-1] + arr[x, y+1] +
                         arr[x, y-1] + arr[x+1, y+1] + arr[x+1, y] + arr[x+1, y-1]) / 8.
    return out

nmean = autojit(mean_numba)
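(A side note: autojit comes from an older numba release and has since been removed; the rough modern equivalent, assuming Python 3 where the xrange calls above become range, would be something like:)
from numba import njit

nmean = njit(mean_numba)  # nopython-mode JIT, the modern replacement for autojit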
Now let's compare against all of the presented methods.
a=np.random.rand(5000,5000)
%timeit mean_around(a)
1 loops, best of 3: 729 ms per loop
%timeit nmean(a)
10 loops, best of 3: 169 ms per loop
#CT Zhu's answer
%timeit it_mean(a)
1 loops, best of 3: 36.7 s per loop
#Ali_m's answer
%timeit fast_local_mean(a,(3,3))
1 loops, best of 3: 4.7 s per loop
#lmjohns3's answer
%timeit scipy_conv(a)
1 loops, best of 3: 3.72 s per loop
A 4x speedup with numba is fairly modest, indicating that the numpy code is about as good as it's going to get. I pulled in the other code as presented, although I did have to change @CTZhu's answer to include different array sizes.
It sounds like you're trying to compute a 2D convolution. If you are able to use scipy, I would suggest trying scipy.signal.convolve2d:
import scipy.signal

matriz = np.random.randn(10, 10)
# to average a 3x3 neighborhood
kernel = np.ones((3, 3), float)
# to compute the mean, divide by size of neighborhood
kernel /= kernel.sum()
average = scipy.signal.convolve2d(matriz, kernel)
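(One caveat: by default convolve2d uses mode='full', so average here is larger than matriz. To get one output cell per input cell, aligned with the input, pass mode='same'; the boundary and fillvalue arguments control how the edges are padded:)
average = scipy.signal.convolve2d(matriz, kernel, mode='same')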
The reason this computes the mean of all 3x3 neighborhoods can be seen if you "unroll" convolve2d into its constituent loops. Effectively (and ignoring what happens at the edges of the source and kernel arrays), it is computing:
X, Y = kernel.shape
for i in range(matriz.shape[0]):
    for j in range(matriz.shape[1]):
        for ii in range(X):
            for jj in range(Y):
                average[i, j] += kernel[ii, jj] * matriz[i+ii, j+jj]
So if every value in your kernel is 1/(1+1+1+1+1+1+1+1+1) == 1/9, you can rewrite the code above as:
for i in range(matriz.shape[0]):
    for j in range(matriz.shape[1]):
        average[i, j] = 1./9 * matriz[i:i+X, j:j+Y].sum()
This is exactly the same as computing the average of the values in matriz over a 3x3 area starting at i, j.
One advantage of doing things this way is that you can easily change the weights associated with your neighborhood by setting the values in your kernel appropriately. So, for example, if you wanted to give the center value in each neighborhood twice as much weight as the others, you could build your kernel like this:
kernel = np.ones((3, 3), float)
kernel[1, 1] = 2.
kernel /= kernel.sum()
and the convolution code would remain the same, but the computation would yield a different type of average (a "center-weighted" one). There are a lot of possibilities here; hopefully this provides a nice abstraction for the task you're doing.
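As one more illustration (the weights here are arbitrary, purely to show the pattern), a kernel that down-weights the diagonal neighbors relative to the orthogonal ones:
kernel = np.array([[0.5, 1.0, 0.5],
                   [1.0, 2.0, 1.0],
                   [0.5, 1.0, 0.5]])
kernel /= kernel.sum()
average = scipy.signal.convolve2d(matriz, kernel, mode='same')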