Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast way to turn a labeled image into a dictionary of { label : [coordinates] }

Say I've labeled an image with scipy.ndimage.measurements.label like so:

[[0, 1, 0, 0, 0, 0],
 [0, 1, 0, 0, 0, 0],
 [0, 1, 0, 0, 0, 0],
 [0, 0, 0, 0, 3, 0],
 [2, 2, 0, 0, 0, 0],
 [2, 2, 0, 0, 0, 0]]

What's a fast way to collect the coordinates belonging to each label? I.e. something like:

{ 1: [[0, 1], [1, 1], [2, 1]],
  2: [[4, 0], [4, 1], [5, 0], [5, 1]],
  3: [[3, 4]] }

I'm working with images that are ~15,000 x 5000 pixels in size, and roughly half of each image's pixels are labeled (i.e. non-zero).

Rather than iterating through the entire image with nditer, would it be faster to do something like np.where(img == label) for each label?

EDIT:

Which algorithm is fastest depends on how big the labeled image is as compared to how many labels it has. Warren Weckesser and Salvador Dali / BHAT IRSHAD's methods (which are based on np.nonzero and np.where) all seem to scale linearly with the number of labels, whereas iterating through each image element with nditer obviously scales linearly with the size of labeled image.

The results of a small test:

size: 1000 x 1000, num_labels: 10
weckesser ... 0.214357852936s 
dali ... 0.650229930878s 
nditer ... 6.53645992279s 


size: 1000 x 1000, num_labels: 100
weckesser ... 0.936990022659s 
dali ... 1.33582305908s 
nditer ... 6.81486487389s 


size: 1000 x 1000, num_labels: 1000
weckesser ... 8.43906402588s 
dali ... 9.81333303452s 
nditer ... 7.47897100449s 


size: 1000 x 1000, num_labels: 10000
weckesser ... 100.405524015s 
dali ... 118.17239809s 
nditer ... 9.14583897591s

So the question becomes more specific:

For labeled images in which the number of labels is on the order of sqrt(size(image)) is there an algorithm to gather label coordinates that is faster than iterating through every image element (i.e. with nditer)?

like image 455
bennlich Avatar asked Sep 23 '15 20:09

bennlich


2 Answers

Here's a possibility:

import numpy as np

a = np.array([[0, 1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 3, 0],
              [2, 2, 0, 0, 0, 0],
              [2, 2, 0, 0, 0, 0]])

# If the array was computed using scipy.ndimage.measurements.label, you
# already know how many labels there are.
num_labels = 3

nz = np.nonzero(a)
coords = np.column_stack(nz)
nzvals = a[nz[0], nz[1]]
res = {k:coords[nzvals == k] for k in range(1, num_labels + 1)}

I called this script get_label_indices.py. Here's a sample run:

In [97]: import pprint

In [98]: run get_label_indices.py

In [99]: pprint.pprint(res)
{1: array([[0, 1],
       [1, 1],
       [2, 1]]),
 2: array([[4, 0],
       [4, 1],
       [5, 0],
       [5, 1]]),
 3: array([[3, 4]])}
like image 184
Warren Weckesser Avatar answered Nov 20 '22 01:11

Warren Weckesser


You can do something like this (let img is your original nd.array)

res = {}
for i in np.unique(img)[1:]:
  x, y = np.where(a == i)
  res[i] = zip(list(x), list(y))

which will give you what you want:

{
 1: [(0, 1), (1, 1), (2, 1)],
 2: [(4, 0), (4, 1), (5, 0), (5, 1)],
 3: [(3, 4)]
}

Whether it will be faster - is up to the benchmark to determine.

Per Warren's suggestion, I do not need to use unique and can just do

res = {}
for i in range(1, num_labels + 1)
    x, y = np.where(a == i)
    res[i] = zip(list(x), list(y))
like image 2
Salvador Dali Avatar answered Nov 20 '22 00:11

Salvador Dali