Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Minimum Distance Algorithm using GDAL and Python

I'm trying to implement the Minimum Distance Algorithm for image classification using GDAL and Python. After calculating the mean pixel-value of the sample areas and storing them into a list of arrays ("sample_array"), I read the image into an array called "values". With the following code I loop through this array:

values = valBD.ReadAsArray()

# loop through pixel columns
for X in range(0,XSize):

    # loop thorugh pixel lines
    for Y in range (0, YSize):

        # initialize variables
        minDist = 9999
        # get minimum distance
        for iSample in range (0, sample_count):
            # dist = calc_distance(values[jPixel, iPixel], sample_array[iSample])

            # computing minimum distance
            iPixelVal = values[Y, X]
            mean = sample_array[iSample]
            dist = math.sqrt((iPixelVal - mean) * (iPixelVal - mean)) # only for testing

            if dist < minDist:
                minDist = dist
                values[Y, X] = iSample

classBD.WriteArray(values, xoff=0, yoff=0)

This procedure takes very long for big images. That's why I want to ask if somebody knows a faster method. I don't know much about access-speed of different variables in python. Or maybe someone knows a libary I could use. Thanks in advance, Mario

like image 553
Mario Härtwig Avatar asked May 05 '11 20:05

Mario Härtwig


2 Answers

You should definitely be using NumPy. I work with some pretty large raster datasets and NumPy burns through them. On my machine, with the code below there's no noticeable delay for a 1000 x 1000 array. An explanation of how this works follows the code.

import numpy as np
from scipy.spatial.distance import cdist

# some starter data
dim = (1000,1000)
values = np.random.randint(0, 10, dim)

# cdist will want 'samples' as a 2-d array
samples = np.array([1, 2, 3]).reshape(-1, 1)

# this could be a one-liner
# 'values' must have the same number of columns as 'samples'
mins = cdist(values.reshape(-1, 1), samples)
outvalues = mins.argmin(axis=1).reshape(dim)

cdist() calculates the "distance" from each element in values to each of the elements in samples. This generates a 1,000,000 x 3 array, where each row n has the distance from pixel nin the original array to each of the sample values [1, 2, 3]. argmin(axis=1) gives you the index of the minimum value along each row, which is what you want. A quick reshape gives you the rectangular format you'd expect for an image.

like image 129
Robin Kraft Avatar answered Oct 03 '22 09:10

Robin Kraft


Agree with Thomas K: use PIL, or else write a C-function and wrap it using e.g. ctypes, or at very least use some numPy matrix operations. Or else use pypy on your existing code (JIT-compiled code can be 100x faster, on image code). Try pypy and tell us what speedup you got.

Bottom line: never do stuff pixel-wise like this natively in cPython, the interpreting and memory-mgt overhead will kill you.

like image 37
smci Avatar answered Oct 03 '22 09:10

smci