Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently implementing erode/dilate

So normally and very inefficiently min/max filter is implemented by using four for loops.

for( index1 < dy ) { // y loop
    for( index2 < dx ) { // x loop
        for( index3 < StructuringElement.dy() ) { // kernel y
            for( index4 < StructuringElement.dx() ) { // kernel x
                pixel = src(index3+index4);
                val = (pixel > val) ? pixel : val; // max
            }
        }
        dst(index2, index1) = val;
    }
}

However this approach is damn inefficient since it checks again previously checked values. So I am wondering what methods are there to implement this with using previously checked values on next iteration?

Any assumptions regarding structuring element size/point of origin can be made.

Update: I am especially keen to know any insights of this or kind of implementation: http://dl.acm.org/citation.cfm?id=2114689

like image 623
ckain Avatar asked Feb 18 '14 12:02

ckain


2 Answers

I have been following this question for some time, hoping someone would write a fleshed-out answer, since I am pondering the same problem.

Here is my own attempt so far; I have not tested this, but I think you can do repeated dilation and erosion with any structuring element, by only accessing each pixel twice:

Assumptions: Assume the structuring element/kernel is a KxL rectangle and the image is a NxM rectangle. Assume that K and L are odd.

The basic approach you outlined has four for loops and takes O(K*L*N*M) time to complete.

Often you want to dilate repeatedly with the same kernel, so the time is again multiplied by the desired number of dilations.

I have three basic ideas for speeding up the dilation:

  1. dilation by a KxL kernel is equal to dilation by a Kx1 kernel followed by dilation by a 1xL kernel. You can do both of these dilations with only three for loops, in O(KNM) and O(LNM)

  2. However you can do a dilation with a Kx1 kernel much faster: You only need to access each pixel once. For this you need a particular data structure, explained below. This allows you to do a single dilation in O(N*M), regardless of the kernel size

  3. repeated dilation by a Kx1 kernel is equal to a single dilation by a larger kernel. If you dilate P times with a Kx1 kernel, this is equal to a single dilation with a ((K-1)*P + 1) x 1 kernel. So you can do repeated dilation with any kernel size in a single pass, in O(N*M) time.


Now for a detailed description of step 2.
You need a queue with the following properties:

  • push an element to the back of the queue in constant time.
  • pop an element from the front of the queue in constant time.
  • query the current smallest or largest element in the queue in constant time.

How to build such a queue is described in this stackoverflow answer: Implement a queue in which push_rear(), pop_front() and get_min() are all constant time operations. Unfortunately not much pseudocode, but the basic idea seems sound.

Using such a queue, you can calculate a Kx1 dilation in a single pass:

Assert(StructuringElement.dy()==1);
int kernel_half = (StructuringElement.dx()-1) /2;

for( y < dy ) { // y loop

    for( x <= kernel_half ) { // initialize the queue 
        queue.Push(src(x, y));
    }

    for( x < dx ) { // x loop

        // get the current maximum of all values in the queue
         dst(x, y) = queue.GetMaximum();

        // remove the first pixel from the queue
        if (x > kernel_half)
            queue.Pop();

        // add the next pixel to the queue
        if (x < dx - kernel_half)
            queue.Push(src(x + kernel_half, y));
    }
}
like image 99
HugoRune Avatar answered Sep 28 '22 01:09

HugoRune


The only approach I can think of is to buffer the maximum pixel values and the rows in which they are found so that you only have to do the full iteration over a kernel sized row/column when the maximum is no longer under it.
In the following C-like pseudo code, I have assumed signed integers, 2d row-major arrays for the source and destination and a rectangular kernel over [±dx, ±dy].

//initialise the maxima and their row positions
for(x=0; x < nx; ++x)
{
  row[x] = -1;
  buf[x] = 0;
}

for(sy=0; sy < ny; ++sy)
{
  //update the maxima and their row positions
  for(x=0; x < nx; ++x)
  {
    if(row[x] < max(sy-dy, 0))
    {
      //maximum out of scope, search column
      row[x] = max(sy-dy, 0);
      buf[x] = src[row[x]][x];
      for(y=row[x]+1; y <= min(sy+dy, ny-1); ++y)
      {
        if(src[y][x]>=buf[x])
        {
          row[x] = y;
          buf[x] = src[y][x];
        }
      }
    }
    else
    {
      //maximum in scope, check latest value
      y = min(sy+dy, ny-1);
      if(src[y][x] >= buf[x])
      {
        row[x] = y;
        buf[x] = src[y][x];
      }
    }
  }

  //initialise maximum column position
  col = -1;

  for(sx=0; sx < nx; ++sx)
  {
    //update maximum column position
    if(col<max(sx-dx, 0))
    {
      //maximum out of scope, search buffer
      col = max(sx-dx, 0);
      for(x=col+1; x <= min(sx+dx, nx-1); ++x)
      {
        if(buf[x] >= buf[col]) col = x;
      }
    }
    else
    {
      //maximum in scope, check latest value
      x = min(sx+dx, nx-1);
      if(buf[x] >= buf[col]) col = x;
    }

    //assign maximum to destination
    dest[sy][sx] = buf[col];
  }
}

The worst case performance occurs when the source goes smoothly from a maximum at the top left to a minimum at the bottom right, forcing a full row or column scan at each step (although it's still more efficient than the original nested loops).
I would expect average case performance to be much better though, since regions containing increasing values (both row and column wise) will update the maximum before a scan is required.
That said, not having actually tested it I'd recommend that you run a few benchmarks rather than trust my gut feeling!

like image 26
thus spake a.k. Avatar answered Sep 28 '22 00:09

thus spake a.k.