Find center coordinates of regions in a 3d numpy array

Question

I have a large 3D numpy array of shape (10000, 3, 3), in which I would like to find the center coordinates of each region (a cluster of cells with the same number). Each sub-array can have 1, 2, 3 or 4 regions.

A subset of my array is:

largearray= array([[[1, 0, 0],
    [0, 0, 2],
    [3, 0, 2]],

   [[0, 0, 4],
    [0, 0, 4],
    [0, 0, 4]],

   [[5, 0, 0],
    [5, 0, 6],
    [0, 6, 6]],

   [[7, 0, 8],
    [0, 0, 0],
    [9, 0,10]]])

The output I would like is the index of the subarray, followed by the x and y (row and column) coordinates of each region's center:

#output:
array([[0., 0.        , 0.        ],
       [0., 1.5       , 2.        ],
       [0., 2.        , 0.        ],
       [1., 1.        , 2.        ],
       [2., 0.5       , 0.        ],
       [2., 1.66666667, 1.66666667],
       [3., 0.        , 0.        ],
       [3., 0.        , 2.        ],
       [3., 2.        , 0.        ],
       [3., 2.        , 2.        ]])

I am open to other outputs, but something like this would be awesome!

Thanks in advance!

Eelco Hoogendoorn · Accepted Answer

Using functionality from the numpy_indexed package (disclaimer: I am its author), one can construct a fully vectorized solution (that is, no for-loops):

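import numpy as np
import numpy_indexed as npi
# one row of indices per axis: (subarray, row, column) for every cell
idx = np.indices(largearray.shape).reshape(largearray.ndim, largearray.size)
# group all cells by label and average their indices
label, mean = npi.group_by(largearray, axis=None).mean(idx, axis=1)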

For large inputs, this should be a lot more efficient than looping over the subarrays in Python.

Note that if the labels are not unique within each subarray (they appear to be in your example, but this is not explicitly stated), but you still want to take the mean per subarray only, you could simply write this:

(label, subarr), mean = npi.group_by((largearray.flatten(), idx[0])).mean(idx[1:], axis=1)

That is, a grouping by unique tuples of subarray-index and label.
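
For readers without numpy_indexed installed, the same "group by (subarray index, label)" idea can be sketched in plain NumPy. This is not the package's implementation, only an illustration: the function name centers_per_subarray is hypothetical, and the sketch assumes non-negative integer labels, 0 as background, and at least one non-zero cell.

```python
import numpy as np

largearray = np.array([
    [[1, 0, 0], [0, 0, 2], [3, 0, 2]],
    [[0, 0, 4], [0, 0, 4], [0, 0, 4]],
    [[5, 0, 0], [5, 0, 6], [0, 6, 6]],
    [[7, 0, 8], [0, 0, 0], [9, 0, 10]],
])

def centers_per_subarray(arr):
    """Rows of (subarray index, mean row, mean column), one per
    (subarray, label) pair; the background label 0 is ignored.
    Hypothetical helper, assumes at least one non-zero label."""
    sub, row, col = np.nonzero(arr)        # coordinates of labelled cells
    labels = arr[sub, row, col]
    base = labels.max() + 1
    # Encode each (subarray, label) tuple as a single integer key.
    keys = sub * base + labels
    uniq, inverse, counts = np.unique(
        keys, return_inverse=True, return_counts=True)
    # Sum the coordinates per group, then divide by the group size.
    mean_row = np.bincount(inverse, weights=row) / counts
    mean_col = np.bincount(inverse, weights=col) / counts
    return np.column_stack((uniq // base, mean_row, mean_col))

print(centers_per_subarray(largearray))
```

Because the keys sort by subarray first and label second, the rows come out in the same order as the output requested in the question.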

dan-man · Answer

You might also want to check out the numpy-groupies package, which deals with problems related to this [disclaimer: I am a co-author]. It should be faster than numpy-indexed (the package mentioned in another answer), as it uses bincount rather than argsort and reduceat.

However, your task here is simple enough that you could use bincount directly:

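import numpy as np

a = largearray  # the labelled array from the question
s0, s1, s2 = a.shape

# number of cells carrying each label (index 0 is the background label)
group_counts = np.bincount(a.ravel())

# per-label sum of the subarray index (axis 0)
idx = np.broadcast_to(np.arange(s0).reshape([s0, 1, 1]), [s0, s1, s2])
group_sum_0 = np.bincount(a.ravel(), idx.ravel())

# per-label sum of the row index (axis 1)
idx = np.broadcast_to(np.arange(s1).reshape([1, s1, 1]), [s0, s1, s2])
group_sum_1 = np.bincount(a.ravel(), idx.ravel())

# per-label sum of the column index (axis 2)
idx = np.broadcast_to(np.arange(s2).reshape([1, 1, s2]), [s0, s1, s2])
group_sum_2 = np.bincount(a.ravel(), idx.ravel())

# sum / count gives the mean index along each axis, one column per label
group_mean = np.vstack((group_sum_0, group_sum_1, group_sum_2)) / group_counts

group_mean.T[1:] # this is the output you show in the question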
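
One caveat: the recipe divides by group_counts for every integer from 0 up to the largest label, so a label value that never occurs yields a 0/0 entry. Below is a minimal sketch of the same bincount idea that keeps only the labels actually present. The function name label_centers and the use of np.indices to build the index grids are illustrative choices, not part of the original recipe.

```python
import numpy as np

def label_centers(a):
    """Mean index along every axis for each non-zero label in ``a``.
    Hypothetical helper; assumes non-negative integer labels."""
    flat = a.ravel()
    counts = np.bincount(flat)
    present = np.flatnonzero(counts)     # labels that actually occur
    present = present[present != 0]      # drop the background label 0
    # np.indices yields one integer index grid per axis of ``a``.
    sums = np.stack([np.bincount(flat, weights=grid.ravel())
                     for grid in np.indices(a.shape)])
    return present, (sums[:, present] / counts[present]).T

# Labels 2, 3 and 4 are absent here, so they are simply skipped.
a = np.array([[[1, 0, 0],
               [0, 0, 5],
               [0, 0, 5]]])
labels, centers = label_centers(a)
print(labels)
print(centers)
```

As in the recipe above, the first column is the mean subarray index of each label, so it equals the subarray index only when a label is confined to a single subarray.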

Or if you want to "cheat", you could just use one of the functions in scipy.ndimage (exposed as ndimage.measurements in older SciPy releases), such as center_of_mass.
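
A sketch of the SciPy route, assuming scipy.ndimage.center_of_mass is the kind of function meant here. Passing an array of ones as the input weights every cell equally, so the center of mass of each label is just the mean of its indices. As above, this assumes each label occurs in one subarray only.

```python
import numpy as np
from scipy import ndimage

largearray = np.array([
    [[1, 0, 0], [0, 0, 2], [3, 0, 2]],
    [[0, 0, 4], [0, 0, 4], [0, 0, 4]],
    [[5, 0, 0], [5, 0, 6], [0, 6, 6]],
    [[7, 0, 8], [0, 0, 0], [9, 0, 10]],
])

# Non-zero labels that occur in the array (0 is treated as background).
labels = np.unique(largearray)
labels = labels[labels != 0]

# One (subarray, row, column) tuple per label, with uniform weights.
centers = np.array(ndimage.center_of_mass(
    np.ones_like(largearray), labels=largearray, index=labels))
print(centers)
```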

Tags: python, arrays, pandas, numpy, scipy

Wilmar van Ommeren
