Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding Contiguous Areas of Bits in 2D Bit Array

The Problem

I have a bit array which represents a 2-dimensional "map" of "tiles". This image provides a graphical example of the bits in the bit array:

Bit Array Example

I need to determine how many contiguous "areas" of bits exist in the array. In the example above, there are two such contiguous "areas", as illustrated here:

Contiguous Areas

Tiles must be located directly N, S, E or W of a tile to be considered "contiguous". Diagonally-touching tiles do not count.

What I've Got So Far

Because these bit arrays can become relatively large (several MB in size), I have intentionally avoided using any sort of recursion in my algorithm.

The pseudo-code is as follows:

LET S BE SOURCE DATA ARRAY LET C BE ARRAY OF IDENTICAL LENGTH TO SOURCE DATA USED TO TRACK "CHECKED" BITS FOREACH INDEX I IN S     IF C[I] THEN          CONTINUE      ELSE         SET C[I]         IF S[I] THEN             EXTRACT_AREA(S, C, I)  EXTRACT_AREA(S, C, I):     LET T BE TARGET DATA ARRAY FOR STORING BITS OF THE AREA WE'RE EXTRACTING     LET F BE STACK OF TILES TO SEARCH NEXT     PUSH I UNTO F     SET T[I]     WHILE F IS NOT EMPTY         LET X = POP FROM F         IF C[X] THEN              CONTINUE         ELSE             SET C[X]             IF S[X] THEN                 PUSH TILE NORTH OF X TO F                 PUSH TILE SOUTH OF X TO F                 PUSH TILE WEST OF X TO F                 PUSH TILE EAST OF X TO F                 SET T[X]     RETURN T 

Animation of My Algorithm

What I Don't Like About My Solution

  • Just to run, it requires two times the memory of the bitmap array it's processing.
  • While extracting an "area", it uses three times the memory of the bitmap array.
  • Duplicates exist in the "tiles to check" stack - which seems ugly, but not worth avoiding the way I have things now.

What I'd Like To See

  • Better memory profile
  • Faster handling of large areas

Solution / Follow-Up

I re-wrote the solution to explore the edges only (per @hatchet 's suggestion).

This was very simple to implement - and eliminated the need to keep track of "visited tiles" completely.

Based on three simple rules, I can traverse the edges, track min/max x & y values, and complete when I've arrived at the start again.

Here's the demo with the three rules I used:

Solution

like image 702
Steve Avatar asked Jul 31 '13 18:07

Steve


1 Answers

One approach would be a perimeter walk. Given a starting point anywhere along the edge of the shape, remember that point.

Start the bounding box as just that point.

Walk the perimeter using a clockwise rule set - if the point used to get to the current point was above, then first look right, then down, then left to find the next point on the shape perimeter. This is kind of like the simple strategy of solving a maze where you continuously follow a wall and always bear to the right.

Each time you visit a new perimeter point, expand the bounding box if the new point is outside it (i.e. keep track of the min and max x and y.

Continue until the starting point is reached.

Cons: if the shape has lots of single pixel 'filaments', you'll be revisiting them as the walk comes back.

Pros: if the shape has large expanses of internal occupied space, you never have to visit them or record them like you would if you were recording visited pixels in a flood fill.

So, conserves space, but in some cases at the expense of time.

Edit

As is so often the case, this problem is known, named, and has multiple algorithmic solutions. The problem you described is called Minimum Bounding Rectangle. One way to solve this is using Contour Tracing. The method I described above is in that class, and is called Moore-Neighbor Tracing or Radial Sweep. The links I've included for them discuss them in detail and point out a problem I hadn't caught. Sometimes you'll revisit the start point before you have traversed the entire perimeter. If your start point is for instance somewhere along a single pixel 'filament', you will revisit it before you're done, and unless you consider this possibility, you'll stop prematurely. The website I linked to talks about ways to address this stopping problem. Other pages at that website also talk about two other algorithms: Square Tracing, and Theo Pavlidis's Algorithm. One thing to note is that these consider diagonals as contiguous, whereas you don't, but that should just something that can be handled with minor modifications to the basic algorithms.

An alternative approach to your problem is Connected-component labeling. For your needs though, this may be a more time expensive solution than you require.

Additional resource:

Moore Neighbor Contour Tracing Algorithm in C++

like image 113
hatchet - done with SOverflow Avatar answered Sep 25 '22 21:09

hatchet - done with SOverflow