I have a large NumPy 3D array of shape (10000, 3, 3) in which I would like to find the center coordinates of each region (a cluster of cells with the same number). Each subarray can have 1, 2, 3 or 4 regions.
A subset of my array is:
import numpy as np
largearray = np.array([[[1, 0, 0],
                        [0, 0, 2],
                        [3, 0, 2]],
                       [[0, 0, 4],
                        [0, 0, 4],
                        [0, 0, 4]],
                       [[5, 0, 0],
                        [5, 0, 6],
                        [0, 6, 6]],
                       [[7, 0, 8],
                        [0, 0, 0],
                        [9, 0, 10]]])
The output I would like is, for each region, the index of its subarray followed by the x (row) and y (column) coordinates of its center:
# output:
array([[ 0.,  0.,          0.        ],
       [ 0.,  1.5,         2.        ],
       [ 0.,  2.,          0.        ],
       [ 1.,  1.,          2.        ],
       [ 2.,  0.5,         0.        ],
       [ 2.,  1.66666667,  1.66666667],
       [ 3.,  0.,          0.        ],
       [ 3.,  0.,          2.        ],
       [ 3.,  2.,          0.        ],
       [ 3.,  2.,          2.        ]])
I am open to other outputs, but something like this would be awesome!
Thanks in advance!
Using functionality from the numpy_indexed package (disclaimer: I am its author), one can construct a fully vectorized solution (that is, no for-loops):
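import numpy as np
import numpy_indexed as npi
# one row per axis: the (subarray, row, column) index of every element
idx = np.indices(largearray.shape).reshape(largearray.ndim, largearray.size)
# group all elements by label; the mean index per label is its center
# (label 0, the background, is grouped as well)
label, mean = npi.group_by(largearray, axis=None).mean(idx, axis=1)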
For large input, this should be a lot more efficient than looping over the subarrays in Python.
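For comparison, the kind of explicit Python loop that a vectorized solution avoids could look like the sketch below. It is only a plain-NumPy reference, not part of numpy_indexed; it assumes, as in the question's example, that 0 marks background cells, and the function name `centers_loop` is hypothetical.

```python
import numpy as np

largearray = np.array([[[1, 0, 0], [0, 0, 2], [3, 0, 2]],
                       [[0, 0, 4], [0, 0, 4], [0, 0, 4]],
                       [[5, 0, 0], [5, 0, 6], [0, 6, 6]],
                       [[7, 0, 8], [0, 0, 0], [9, 0, 10]]])


def centers_loop(arr):
    """Hypothetical reference loop: one row [subarray, mean_row, mean_col] per nonzero label."""
    rows = []
    for k, sub in enumerate(arr):
        for lab in np.unique(sub):  # labels present in this subarray, sorted
            if lab == 0:  # assumption: 0 is background, as in the example
                continue
            r, c = np.nonzero(sub == lab)  # row/column indices of this region
            rows.append([k, r.mean(), c.mean()])
    return np.array(rows)


centers = centers_loop(largearray)
```

It visits every subarray and every label separately, which is why it scales poorly to 10000 subarrays, but it is handy for checking a vectorized result.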
Note that if the labels are not globally unique (that is, the same label may occur in more than one subarray; in your example every label appears in only one subarray, but this is not explicitly stated), and you still want to take the mean per subarray only, you could simply write this:
(label, subarr), mean = npi.group_by((largearray.flatten(), idx[0])).mean(idx[1:], axis=1)
That is, a grouping by unique tuples of subarray-index and label.
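If numpy_indexed is not available, the same grouping by (subarray index, label) tuples can be sketched with plain NumPy, swapping in `np.unique(..., axis=0, return_inverse=True)` to number the distinct pairs and `np.bincount` to sum the coordinates per pair. This is an alternative technique, not how numpy_indexed works internally, and it again assumes that label 0 is background.

```python
import numpy as np

largearray = np.array([[[1, 0, 0], [0, 0, 2], [3, 0, 2]],
                       [[0, 0, 4], [0, 0, 4], [0, 0, 4]],
                       [[5, 0, 0], [5, 0, 6], [0, 6, 6]],
                       [[7, 0, 8], [0, 0, 0], [9, 0, 10]]])

# Coordinates of every element: idx[0] = subarray, idx[1] = row, idx[2] = column
idx = np.indices(largearray.shape).reshape(largearray.ndim, largearray.size)
labels = largearray.ravel()

# Group by the (subarray, label) pair rather than by label alone, so a label
# that recurs in several subarrays still gets one center per subarray.
keys = np.stack([idx[0], labels], axis=1)
uniq, inverse = np.unique(keys, axis=0, return_inverse=True)
inverse = inverse.ravel()  # the inverse's shape with axis= differs across NumPy versions

counts = np.bincount(inverse)  # cells per (subarray, label) pair
sums = np.stack([np.bincount(inverse, weights=w) for w in idx[1:]])  # row / column sums
centers = np.column_stack([uniq, (sums / counts).T])  # columns: subarray, label, row, col

# Drop the background label 0 (assumption, as in the question's example).
centers = centers[centers[:, 1] != 0]
```

Because `np.unique` sorts the pairs lexicographically, the rows come out ordered by subarray and then by label.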
You might also want to check out the numpy-groupies package, which deals with problems related to this [disclaimer: I am a co-author]. It should be faster than numpy-indexed (a package mentioned in another answer), as it uses bincount rather than argsort and reduceat.
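To illustrate the difference mentioned above, here is a small sketch of both strategies for per-group sums in plain NumPy (the random labels and values are made up for the demonstration): a sort-based version with `argsort` and `np.add.reduceat`, and a counting-based version with a single `np.bincount` call. This is not code taken from either package.

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=20)  # hypothetical group labels
values = rng.random(20)               # hypothetical values to sum per group

# Sort-based grouping: sort by label, then reduce each contiguous run of equal labels.
order = np.argsort(labels, kind="stable")
sorted_labels = labels[order]
starts = np.flatnonzero(np.r_[True, sorted_labels[1:] != sorted_labels[:-1]])
sums_sorted = np.add.reduceat(values[order], starts)
present = sorted_labels[starts]       # labels that actually occur, in sorted order

# Counting-based grouping: one O(n) pass, output indexed directly by label.
sums_bincount = np.bincount(labels, weights=values)
```

The bincount version skips the O(n log n) sort, but its output has one slot for every possible label from 0 up to the maximum, so it suits small non-negative integer labels such as the ones in the question.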
However, your task here is simple enough that you could use bincount directly:
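import numpy as np

a = largearray  # labels must be non-negative integers for np.bincount
s0, s1, s2 = a.shape
group_counts = np.bincount(a.ravel())  # number of cells per label
# per-label sum of the subarray index (axis 0)
idx = np.broadcast_to(np.arange(s0).reshape([s0, 1, 1]), [s0, s1, s2])
group_sum_0 = np.bincount(a.ravel(), idx.ravel())
# per-label sum of the row index (axis 1)
idx = np.broadcast_to(np.arange(s1).reshape([1, s1, 1]), [s0, s1, s2])
group_sum_1 = np.bincount(a.ravel(), idx.ravel())
# per-label sum of the column index (axis 2)
idx = np.broadcast_to(np.arange(s2).reshape([1, 1, s2]), [s0, s1, s2])
group_sum_2 = np.bincount(a.ravel(), idx.ravel())
# mean = sum / count; a label that never occurs gives 0/0 = nan
group_mean = np.vstack((group_sum_0, group_sum_1, group_sum_2)) / group_counts
group_mean.T[1:]  # drop label 0 (background): this is the output you show in the question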
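The three almost identical bincount calls above can be folded into one loop over `np.indices`, which also works for arrays of any dimension. A minimal sketch, assuming the labels are non-negative integers; the function name `label_centers` is hypothetical:

```python
import numpy as np

a = np.array([[[1, 0, 0], [0, 0, 2], [3, 0, 2]],
              [[0, 0, 4], [0, 0, 4], [0, 0, 4]],
              [[5, 0, 0], [5, 0, 6], [0, 6, 6]],
              [[7, 0, 8], [0, 0, 0], [9, 0, 10]]])


def label_centers(a):
    """Hypothetical helper: mean index along each axis for every label in a."""
    labels = a.ravel()
    counts = np.bincount(labels)
    # np.indices(a.shape)[d] holds each element's coordinate along axis d,
    # so a weighted bincount gives the per-label sum of that coordinate.
    sums = np.stack([np.bincount(labels, weights=coord.ravel())
                     for coord in np.indices(a.shape)])
    with np.errstate(invalid="ignore", divide="ignore"):
        return (sums / counts).T  # row k: center of label k; nan if k is absent


centers = label_centers(a)[1:]  # skip background label 0
```

`np.indices` materialises one integer array per axis, i.e. 3 × 90000 values for a (10000, 3, 3) input, which is small; the `errstate` block only silences the 0/0 warning for labels that never occur.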
Or, if you want to "cheat", you could just use one of the functions in scipy.ndimage (formerly exposed as ndimage.measurements), such as center_of_mass.
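For the scipy route, a sketch using `scipy.ndimage.center_of_mass`: with an all-ones weight array, each label's center of mass is just the mean index of its cells. This assumes, as in the example, that labels are unique across the whole 3D array (otherwise regions sharing a label in different subarrays would be merged) and that 0 is background.

```python
import numpy as np
from scipy import ndimage

a = np.array([[[1, 0, 0], [0, 0, 2], [3, 0, 2]],
              [[0, 0, 4], [0, 0, 4], [0, 0, 4]],
              [[5, 0, 0], [5, 0, 6], [0, 6, 6]],
              [[7, 0, 8], [0, 0, 0], [9, 0, 10]]])

wanted = np.arange(1, a.max() + 1)  # every nonzero label
# Uniform weights turn the center of mass into the plain mean of the indices
# of each label's cells; each result is a (subarray, row, column) tuple.
centers = np.array(ndimage.center_of_mass(np.ones_like(a), labels=a, index=wanted))
```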