Say I've labeled an image with scipy.ndimage.measurements.label like so:
[[0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 3, 0],
[2, 2, 0, 0, 0, 0],
[2, 2, 0, 0, 0, 0]]
What's a fast way to collect the coordinates belonging to each label? I.e. something like:
{ 1: [[0, 1], [1, 1], [2, 1]],
2: [[4, 0], [4, 1], [5, 0], [5, 1]],
3: [[3, 4]] }
I'm working with images that are ~15,000 x 5000 pixels in size, and roughly half of each image's pixels are labeled (i.e. non-zero).
Rather than iterating through the entire image with nditer
, would it be faster to do something like np.where(img == label)
for each label?
EDIT:
Which algorithm is fastest depends on how big the labeled image is as compared to how many labels it has. Warren Weckesser and Salvador Dali / BHAT IRSHAD's methods (which are based on np.nonzero
and np.where
) all seem to scale linearly with the number of labels, whereas iterating through each image element with nditer
obviously scales linearly with the size of labeled image.
The results of a small test:
size: 1000 x 1000, num_labels: 10
weckesser ... 0.214357852936s
dali ... 0.650229930878s
nditer ... 6.53645992279s
size: 1000 x 1000, num_labels: 100
weckesser ... 0.936990022659s
dali ... 1.33582305908s
nditer ... 6.81486487389s
size: 1000 x 1000, num_labels: 1000
weckesser ... 8.43906402588s
dali ... 9.81333303452s
nditer ... 7.47897100449s
size: 1000 x 1000, num_labels: 10000
weckesser ... 100.405524015s
dali ... 118.17239809s
nditer ... 9.14583897591s
So the question becomes more specific:
For labeled images in which the number of labels is on the order of sqrt(size(image))
is there an algorithm to gather label coordinates that is faster than iterating through every image element (i.e. with nditer
)?
Here's a possibility:
import numpy as np
a = np.array([[0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 3, 0],
[2, 2, 0, 0, 0, 0],
[2, 2, 0, 0, 0, 0]])
# If the array was computed using scipy.ndimage.measurements.label, you
# already know how many labels there are.
num_labels = 3
nz = np.nonzero(a)
coords = np.column_stack(nz)
nzvals = a[nz[0], nz[1]]
res = {k:coords[nzvals == k] for k in range(1, num_labels + 1)}
I called this script get_label_indices.py
. Here's a sample run:
In [97]: import pprint
In [98]: run get_label_indices.py
In [99]: pprint.pprint(res)
{1: array([[0, 1],
[1, 1],
[2, 1]]),
2: array([[4, 0],
[4, 1],
[5, 0],
[5, 1]]),
3: array([[3, 4]])}
You can do something like this (let img
is your original nd.array)
res = {}
for i in np.unique(img)[1:]:
x, y = np.where(a == i)
res[i] = zip(list(x), list(y))
which will give you what you want:
{
1: [(0, 1), (1, 1), (2, 1)],
2: [(4, 0), (4, 1), (5, 0), (5, 1)],
3: [(3, 4)]
}
Whether it will be faster - is up to the benchmark to determine.
Per Warren's suggestion, I do not need to use unique and can just do
res = {}
for i in range(1, num_labels + 1)
x, y = np.where(a == i)
res[i] = zip(list(x), list(y))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With