I am trying to group a numpy array into smaller size by taking average of the elements. Such as take average foreach 5x5 sub-arrays in a 100x100 array to create a 20x20 size array. As I have a huge data need to manipulate, is that an efficient way to do that?
To calculate the average of all values in a 2 dimensional NumPy array called matrix, use the numpy. average(matrix) function. The output will display a numpy array that has three average values, one per column of the input given array.
Using Numpy, you can calculate average of elements of total Numpy Array, or along some axis, or you can also calculate weighted average of elements. To find the average of an numpy array, you can use numpy. average() statistical function.
Having a data type (dtype) is one of the key features that distinguishes NumPy arrays from lists. In lists, the types of elements can be mixed.
I have tried this for smaller array, so test it with yours:
import numpy as np nbig = 100 nsmall = 20 big = np.arange(nbig * nbig).reshape([nbig, nbig]) # 100x100 small = big.reshape([nsmall, nbig//nsmall, nsmall, nbig//nsmall]).mean(3).mean(1)
An example with 6x6 -> 3x3:
nbig = 6 nsmall = 3 big = np.arange(36).reshape([6,6]) array([[ 0, 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17], [18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34, 35]]) small = big.reshape([nsmall, nbig//nsmall, nsmall, nbig//nsmall]).mean(3).mean(1) array([[ 3.5, 5.5, 7.5], [ 15.5, 17.5, 19.5], [ 27.5, 29.5, 31.5]])
This is pretty straightforward, although I feel like it could be faster:
from __future__ import division
import numpy as np
Norig = 100
Ndown = 20
step = Norig//Ndown
assert step == Norig/Ndown # ensure Ndown is an integer factor of Norig
x = np.arange(Norig*Norig).reshape((Norig,Norig)) #for testing
y = np.empty((Ndown,Ndown)) # for testing
for yr,xr in enumerate(np.arange(0,Norig,step)):
for yc,xc in enumerate(np.arange(0,Norig,step)):
y[yr,yc] = np.mean(x[xr:xr+step,xc:xc+step])
You might also find scipy.signal.decimate interesting. It applies a more sophisticated low-pass filter than simple averaging before downsampling the data, although you'd have to decimate one axis, then the other.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With