Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

can counting contiguous regions in a bitmap be improved over O(r * c)?

You are given an image of a surface photographed by a satellite.The image is a bitmap where water is marked by '.' and land is marked by '*'. Adjacent group of '*'s form an island. (Two '*' are adjacent if they are horizontal, vertical or diagonal neighbours). Your task is to print the number of islands in the bitmap.

Example Input:-

.........**
**......***
...........
...*.......
*........*.
*.........*

Output:- 5

Here, is my implementation which takes O(r * c) space and O(r * c) space where r is total no. of rows and c is total no of cols.

#include <stdio.h>
#define COLS 12

void markVisted(char map[][COLS], int visited[][COLS], int row, int col, int rowCount)
{
    if((row < 0) || (row >= rowCount) || (col < 0) || (col >= COLS) || (map[row][col] != '*') || (visited[row][col] == 1)) return;

    visited[row][col] = 1;

    //calling neighbours
    markVisted(map, visited, row+1, col, rowCount);
    markVisted(map, visited, row, col+1, rowCount);
    markVisted(map, visited, row-1, col, rowCount);
    markVisted(map, visited, row, col-1, rowCount);
    markVisted(map, visited, row+1, col+1, rowCount);
    markVisted(map, visited, row-1, col-1, rowCount);
    markVisted(map, visited, row-1, col+1, rowCount);
    markVisted(map, visited, row+1, col-1, rowCount);
}
int countIslands(char map[][COLS], int visited[][COLS], int rowCount)
{
    int i, j, count = 0;
    for(i=0; i<rowCount; ++i){
        for(j=0; j<COLS; ++j){

            if((map[i][j] == '*') && (visited[i][j] == 0)){
                ++count;
                markVisted(map, visited, i, j, rowCount);
            }
        }
    }
    return count;
}

int main()
{
    char map[][COLS] = {
                    "*..........",
                    "**........*",
                    "...........",
                    "...*.......",
                    "*........*.",
                    "..........*"               
                    };
    int rows = sizeof(map)/sizeof(map[0]);
    int visited[rows][COLS], i, j;  

    for(i=0; i<rows; ++i){
        for(j=0; j<COLS; ++j) visited[i][j] = 0;
    }

    printf("No. of islands = %d\n", countIslands(map, visited, rows));


    return 0;
}

please suggest some better logic for this problem
also, suggestions to improve my solution is welcomed.

like image 311
Eight Avatar asked Aug 13 '12 14:08

Eight


4 Answers

I think the confusion here is that your algorithm does actually run in linear time, not quadratic time.

When using big-O notation, n stands for the input size. Your input here is not just r or just c, but rather, r * c, as it is a grid of nodes. Your algorithm runs in O(r * c), as you said in your question... thus your algorithm runs in O(n) which is linear time.

It seems to me that any algorithm that solves this problem will have to read each input cell once in the worst case. Thus the best running time you can hope for is O(n). As your algorithm runs in O(n) you can't have any algorithm that runs of a faster order, in the worst case, than the algorithm you proposed.

I can think of some clever tricks. For example, if you have a block of *s, you could only check the diagonals, in certain cases. That is, if you have

......
.****.
.****.
.****.
.****.
......

it won't matter if you only read these cells:

......
.*.*..
..*.*.
.*.*..
..*.*.
......

unless for example you have something in the bottom-left-most corner, in which case you would need to read that bottom-left-most *. So maybe in certain cases your algorithm can run more quickly, but for the worst case (which is what O measures), it will have to be O(n).

EDIT: Also even in that case where you only read half the nodes, the run-time would be O(n/2) which is still of the same order (O(n)).

like image 54
Claudiu Avatar answered Nov 11 '22 03:11

Claudiu


This is highly related to connected component labeling. The number of connected component is just a byproduct of the labeling. Algorithm described in the linked wikipedia article works in linear time.

like image 40
Nicolas Barbey Avatar answered Nov 11 '22 02:11

Nicolas Barbey


  1. Create a undirected graph, where each island node connects to its neighboor island nodes.

  2. While there are unvisited nodes:

    • pick an unvisited node, do a depth-first traversal and mark every node visited, increase number_of_islands.
  3. Done.

Both (1) and (2) takes O(n) time.

like image 1
Ali Ferhat Avatar answered Nov 11 '22 01:11

Ali Ferhat


Asymptotically your approach is the best O(n).

However, I noticed a couple of things:

First:

inside the function markVisited you check a cells neighbors in the order:

down, right, up, left, down-right, up-left, up-right, down-left

A better order would be:

left, right, up-left, up, up-right, down-left, down, down-right

This will play nicer in the cache since it is starting by reading values directly next to the current position, and sequentially in a given row.

(note that starting with the diagonals would mark visited to a larger number of locations, but since checking if a cell was visited is only the last check it wouldn't save any time).

Secondly:

In the worst case of a satellite image that contains only land, your algorithm will visit each cell multiple times, (something like once for each neighbor the cell has).

This means you are approximately doing eight times more array accesses than possibly needed.

I believe that you can solve this problem with a more or less linear pass over the data.

I'm currently thinking of an approach, if it works I'll add the code as an edit.

like image 1
Xantix Avatar answered Nov 11 '22 03:11

Xantix