I have a 2D data array and I'm trying to compute a profile of its values about its center efficiently. The output should be two one-dimensional arrays: one with the distances from the center, the other with the mean of all the values in the original 2D array that lie at each distance.
Each index has a non-integer distance from the center, which prevents me from using some already known solutions for the problem. Allow me to explain.
Consider these matrices:

    import numpy as np

    data = np.random.randn(5,5)
    L = 2
    x = np.arange(-L,L+1,1)*2.5
    y = np.arange(-L,L+1,1)*2.5
    xx, yy = np.meshgrid(x, y)
    r = np.sqrt(xx**2. + yy**2.)
So the matrices are

    In [30]: r
    Out[30]:
    array([[ 7.07106781,  5.59016994,  5.        ,  5.59016994,  7.07106781],
           [ 5.59016994,  3.53553391,  2.5       ,  3.53553391,  5.59016994],
           [ 5.        ,  2.5       ,  0.        ,  2.5       ,  5.        ],
           [ 5.59016994,  3.53553391,  2.5       ,  3.53553391,  5.59016994],
           [ 7.07106781,  5.59016994,  5.        ,  5.59016994,  7.07106781]])

    In [31]: data
    Out[31]:
    array([[ 1.27603322,  1.33635284,  1.93093228,  0.76229675, -0.00956535],
           [ 0.69556071, -1.70829753,  1.19615919, -1.32868665,  0.29679494],
           [ 0.13097791, -1.33302719,  1.48226442, -0.76672223, -1.01836614],
           [ 0.51334771, -0.83863115, -0.41541794,  0.34743342,  0.1199237 ],
           [-1.02042539,  0.90739383, -2.4858624 , -0.07417987,  0.90748933]])
For this case the expected output is array([ 0. , 2.5 , 3.53553391, 5. , 5.59016994, 7.07106781]) for the distances, and a second array of the same length with the mean of all the values at each of those distances: array([ 1.48226442, -0.32975204, -0.88204548, -0.36057959, 0.56968633, 0.28838295]).
This answer has a very nice function to compute the profile about any arbitrary point. However, the problem with that approach is that it approximates the distance r by the index distance. So its r for my case would be this:
array([[2, 2, 2, 2, 2],
[2, 1, 1, 1, 2],
[2, 1, 0, 1, 2],
[2, 1, 1, 1, 2],
[2, 2, 2, 2, 2]])
which is a pretty big difference for me, since I'm working with small matrices. This approximation, however, is what allows the use of np.bincount, which is pretty handy (but won't work for me).
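For reference, the integer-binning idea can be sketched roughly as below. This is a reconstruction under assumptions, not the linked answer's exact code: the name radial_profile_int and the (x, y) order of center are made up for illustration, and the distance is truncated to an integer with astype(int), which reproduces the index matrix shown above.

```python
import numpy as np

def radial_profile_int(data, center):
    # Hypothetical helper: bin every pixel by its truncated index distance
    y, x = np.indices(data.shape)
    r = np.sqrt((x - center[0])**2 + (y - center[1])**2).astype(int)
    # Sum of the values in each integer bin, and the number of pixels per bin
    sums = np.bincount(r.ravel(), weights=data.ravel())
    counts = np.bincount(r.ravel())
    return r, sums / counts

rng = np.random.default_rng(0)
data = rng.standard_normal((5, 5))
r_int, profile = radial_profile_int(data, (2, 2))
print(r_int)    # integer distance assigned to each pixel
print(profile)  # one mean per integer distance
```

With a 5x5 array this collapses six distinct float distances into only three bins, which is the loss of resolution described above.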
I've been trying to expand this for float distance, like my version r, but so far no luck. bincount doesn't work with floats and histogram needs equally-spaced bins, which is not the case. Any suggestion?
Approach #1
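    def radial_profile_app1(data, r):
        mid = data.shape[0]//2
        # r[mid-1,mid] is the grid spacing, so r**2/spacing**2 is the integer i**2 + j**2
        ids = np.rint((r**2)/r[mid-1,mid]**2).astype(int).ravel()
        count = np.bincount(ids)

        R = data.shape[0]//2 # Radial profile radius
        R0 = R+1
        # Unique distances, taken from one octant of r
        dists = np.unique(r[:R0,:R0][np.tril(np.ones((R0,R0),dtype=bool))])

        # Per-bin sum divided by per-bin count, dropping the empty bins
        mean_data = (np.bincount(ids, data.ravel())/count)[count!=0]
        return dists, mean_data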
For the given sample data -
In [475]: radial_profile_app1(data, r)
Out[475]:
(array([ 0. , 2.5 , 3.53553391, 5. , 5.59016994,
7.07106781]),
array([ 1.48226442 , -0.3297520425, -0.8820454775, -0.3605795875,
0.5696863263, 0.2883829525]))
Approach #2
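    def radial_profile_app2(data, r):
        R = data.shape[0]//2 # Radial profile radius
        range_arr = np.arange(-R,R+1)
        # Integer bin ids i**2 + j**2, built directly from the index offsets
        ids = (range_arr[:,None]**2 + range_arr**2).ravel()
        count = np.bincount(ids)

        R0 = R+1
        # Unique distances, taken from one octant of r
        dists = np.unique(r[:R0,:R0][np.tril(np.ones((R0,R0),dtype=bool))])

        # Per-bin sum divided by per-bin count, dropping the empty bins
        mean_data = (np.bincount(ids, data.ravel())/count)[count!=0]
        return dists, mean_data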
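Both approaches work because the grid is uniformly spaced: dividing r**2 by the squared spacing gives the integer i**2 + j**2, which np.bincount accepts as a bin id. A slower cross-check that does not rely on that is sketched below. It groups pixels by the float distance itself via np.unique. The helper name radial_profile_ref and the rounding tolerance decimals are assumptions for illustration, not part of the approaches above.

```python
import numpy as np

def radial_profile_ref(data, r, decimals=8):
    # Hypothetical reference helper: group pixels by their rounded float distance
    dists, inv = np.unique(np.round(r.ravel(), decimals), return_inverse=True)
    inv = inv.ravel()  # make sure the group labels are 1D
    sums = np.bincount(inv, weights=data.ravel())
    counts = np.bincount(inv)
    return dists, sums / counts

# The 5x5 sample from the question
data = np.array([
    [ 1.27603322,  1.33635284,  1.93093228,  0.76229675, -0.00956535],
    [ 0.69556071, -1.70829753,  1.19615919, -1.32868665,  0.29679494],
    [ 0.13097791, -1.33302719,  1.48226442, -0.76672223, -1.01836614],
    [ 0.51334771, -0.83863115, -0.41541794,  0.34743342,  0.1199237 ],
    [-1.02042539,  0.90739383, -2.4858624 , -0.07417987,  0.90748933]])
x = np.arange(-2, 3) * 2.5
xx, yy = np.meshgrid(x, x)
r = np.sqrt(xx**2 + yy**2)

dists, means = radial_profile_ref(data, r)
print(dists)
print(means)
```

On the 5x5 sample this should agree with the output shown for Approach #1. It sorts the flattened distances, so it is likely slower than the bincount-only versions on large arrays.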
Runtime test -
In [562]: # Setup inputs
...: N = 2001
...: data = np.random.randn(N,N)
...: L = (N-1)//2
...: x = np.arange(-L,L+1,1)*2.5
...: y = np.arange(-L,L+1,1)*2.5
...: xx, yy = np.meshgrid(x, y)
...: r = np.sqrt(xx**2. + yy**2.)
...:
In [563]: out01, out02 = radial_profile_app1(data, r)
...: out11, out12 = radial_profile_app2(data, r)
...:
...: print(np.allclose(out01, out11))
...: print(np.allclose(out02, out12))
...:
True
True
In [566]: %timeit radial_profile_app1(data, r)
...: %timeit radial_profile_app2(data, r)
...:
10 loops, best of 3: 114 ms per loop
10 loops, best of 3: 91.2 ms per loop