Find center coordinates of regions in a 3d numpy array

I have a large 3d numpy array of shape (10000, 3, 3), in which I would like to find the center coordinates of each region (clusters of cells with the same number). Each sub-array can have 1, 2, 3 or 4 regions.

A subset of my array is:

import numpy as np

largearray = np.array([[[1, 0, 0],
                        [0, 0, 2],
                        [3, 0, 2]],

                       [[0, 0, 4],
                        [0, 0, 4],
                        [0, 0, 4]],

                       [[5, 0, 0],
                        [5, 0, 6],
                        [0, 6, 6]],

                       [[7, 0, 8],
                        [0, 0, 0],
                        [9, 0, 10]]])

The output I would like is, for each region, the index of the sub-array it belongs to followed by the x and y coordinates of its center:

# output:
array([[ 0.        ,  0.        ,  0.        ],
       [ 0.        ,  1.5       ,  2.        ],
       [ 0.        ,  2.        ,  0.        ],
       [ 1.        ,  1.        ,  2.        ],
       [ 2.        ,  0.5       ,  0.        ],
       [ 2.        ,  1.66666667,  1.66666667],
       [ 3.        ,  0.        ,  0.        ],
       [ 3.        ,  0.        ,  2.        ],
       [ 3.        ,  2.        ,  0.        ],
       [ 3.        ,  2.        ,  2.        ]])

I am open to other outputs, but something like this would be awesome!

Thanks in advance!

asked by Wilmar van Ommeren


2 Answers

Using functionality from the numpy_indexed package (disclaimer: I am its author), one can construct a fully vectorized solution (that is, no for-loops):

import numpy as np
import numpy_indexed as npi

# coordinates of every cell, as an (ndim, size) index array
idx = np.indices(largearray.shape).reshape(largearray.ndim, largearray.size)
# group the coordinates by label and take the per-label mean
label, mean = npi.group_by(largearray, axis=None).mean(idx, axis=1)

For large input, this should be a lot more efficient than looping over the labels in Python.
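If mean is laid out with one column per label (an assumption here, not spelled out above), the exact output requested in the question can be recovered by transposing and dropping the background label 0:

centers = mean.T[label != 0]  # assumes mean has shape (ndim, n_labels); label 0 is the background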

Note that if the same label can occur in more than one sub-array (in your example each label seems to appear in only one sub-array, but this is not explicitly stated) and you still want to take the mean per sub-array only, you could simply write this:

(label, subarr), mean = npi.group_by((largearray.flatten(), idx[0])).mean(idx[1:], axis=1)

That is, a grouping by unique tuples of subarray-index and label.

answered by Eelco Hoogendoorn


You might also want to check out the numpy-groupies package, which deals with problems related to this (disclaimer: I am a co-author). It should be faster than numpy_indexed (the package mentioned in the other answer) because it uses bincount rather than argsort and reduceat.
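As a rough sketch (assuming numpy_groupies' aggregate function with func='mean'; the exact call is illustrative and not taken from the answer above), the grouping could look like this:

import numpy as np
import numpy_groupies as npg

# one aggregate call per coordinate axis, grouped by the label of each cell
labels = largearray.ravel()
coords = np.indices(largearray.shape).reshape(largearray.ndim, -1)
centers = np.stack([npg.aggregate(labels, c, func='mean') for c in coords], axis=1)
centers[1:]  # drop label 0 (background); one row per region, as in the question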

However, your task here is simple enough that you could use bincount directly:

import numpy as np

a = largearray  # the (10000, 3, 3) label array from the question
s0, s1, s2 = a.shape

# number of cells in each labelled region
group_counts = np.bincount(a.ravel())

# per-label sum of the coordinates along each axis, via bincount's weights argument
idx = np.broadcast_to(np.arange(s0).reshape([s0, 1, 1]), [s0, s1, s2])
group_sum_0 = np.bincount(a.ravel(), idx.ravel())

idx = np.broadcast_to(np.arange(s1).reshape([1, s1, 1]), [s0, s1, s2])
group_sum_1 = np.bincount(a.ravel(), idx.ravel())

idx = np.broadcast_to(np.arange(s2).reshape([1, 1, s2]), [s0, s1, s2])
group_sum_2 = np.bincount(a.ravel(), idx.ravel())

# mean coordinate per label = coordinate sums / cell counts
group_mean = np.vstack((group_sum_0, group_sum_1, group_sum_2)) / group_counts

group_mean.T[1:]  # this is the output you show in the question (label 0 dropped)

Or, if you want to "cheat", you could just use one of the functions in scipy.ndimage.measurements, such as center_of_mass.
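For example, a minimal sketch with scipy.ndimage.center_of_mass and uniform weights (an illustration of that suggestion, not code from the answer above):

import numpy as np
from scipy import ndimage

# with uniform weights, the center of mass of each labelled region is its mean coordinate
labels = np.arange(1, largearray.max() + 1)
centers = ndimage.center_of_mass(np.ones_like(largearray), labels=largearray, index=labels)
# centers is a list of (subarray, row, column) tuples, one per label 1..10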

answered by dan-man