
How can we find the largest contiguous region of a graph with two different "ID"s?

I've recently learned about the Flood-Fill Algorithm, an algorithm that takes a graph and assigns each node a component number in O(N) time, where N is the number of nodes.

For example, a common problem that can be solved efficiently with the Flood-Fill Algorithm is finding the largest region in an N*N board, where every node in the region is adjacent (directly up, down, left, or right) to another node in the region with the same ID.

[Image: an example N*N board of single-digit IDs]

In this board, the largest regions are both of size 3: one made up of 1s, the other of 9s.
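For reference, here's a minimal sketch of that single-ID version as I understand it (iterative rather than recursive, to avoid deep recursion on big boards; all names here are my own, illustrative ones):

#include <vector>
#include <utility>
#include <algorithm>

using namespace std;

// returns the size of the largest region whose squares all share one ID
int largest_single_id_region(const vector<vector<int>>& board){
    int N = board.size(), best = 0;
    vector<vector<bool>> seen(N, vector<bool>(N, false));
    const int dx[] = {-1, 1, 0, 0}, dy[] = {0, 0, -1, 1};

    for(int i = 0; i < N; ++i){
        for(int j = 0; j < N; ++j){
            if(seen[i][j]){continue;}

            int id = board[i][j], size = 0;
            vector<pair<int,int>> stack = {{i, j}};
            seen[i][j] = true;

            while(!stack.empty()){
                auto [x, y] = stack.back(); stack.pop_back();
                ++size;
                for(int d = 0; d < 4; ++d){
                    int nx = x + dx[d], ny = y + dy[d];
                    if(nx < 0 || nx >= N || ny < 0 || ny >= N){continue;}
                    if(seen[nx][ny] || board[nx][ny] != id){continue;}
                    seen[nx][ny] = true;
                    stack.push_back({nx, ny});
                }
            }
            best = max(best, size);
        }
    }
    return best;
}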

However, I recently started wondering if we could extend this problem: specifically, whether we could find the largest region in a graph such that every node in the region is adjacent to another node in the region whose ID is one of two allowed values. In the above board, the largest such region is made up of the 1s and 9s together, and has a size of 7.

Here was my thought process in trying to solve this problem:

Thought 1: O(N^4) Algorithm

We can solve this in O(N^4) time using a basic flood-fill algorithm. We do this by testing all O(N^2) pairs of horizontally or vertically adjacent squares. For every pair of squares, if they have different IDs, then we run a flood-fill from one of the two squares.

Then, by modifying the flood-fill algorithm so that it travels to squares with either of the two IDs, we can test each pair in O(N^2) time: O(N^2) pairs * O(N^2) per flood fill = an O(N^4) algorithm.
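A rough sketch of that baseline (untested, names are illustrative; the only change from the single-ID fill is the relaxed visit condition):

#include <vector>
#include <utility>
#include <algorithm>

using namespace std;

// flood fill from (si, sj), stepping on squares whose ID is id1 or id2
int fill_two_ids(const vector<vector<int>>& board, int si, int sj, int id1, int id2){
    int N = board.size(), size = 0;
    vector<vector<bool>> seen(N, vector<bool>(N, false));
    vector<pair<int,int>> stack = {{si, sj}};
    seen[si][sj] = true;
    const int dx[] = {-1, 1, 0, 0}, dy[] = {0, 0, -1, 1};

    while(!stack.empty()){
        auto [x, y] = stack.back(); stack.pop_back();
        ++size;
        for(int d = 0; d < 4; ++d){
            int nx = x + dx[d], ny = y + dy[d];
            if(nx < 0 || nx >= N || ny < 0 || ny >= N || seen[nx][ny]){continue;}
            if(board[nx][ny] != id1 && board[nx][ny] != id2){continue;} // the key change
            seen[nx][ny] = true;
            stack.push_back({nx, ny});
        }
    }
    return size;
}

// O(N^2) adjacent pairs, each tested with an O(N^2) fill
int largest_two_id_region_naive(const vector<vector<int>>& board){
    int N = board.size(), best = 0;
    for(int i = 0; i < N; ++i){
        for(int j = 0; j < N; ++j){
            if(j+1 < N && board[i][j] != board[i][j+1]){
                best = max(best, fill_two_ids(board, i, j, board[i][j], board[i][j+1]));
            }
            if(i+1 < N && board[i][j] != board[i+1][j]){
                best = max(best, fill_two_ids(board, i, j, board[i][j], board[i+1][j]));
            }
        }
    }
    return best;
}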

Then, I had an insight: A Possibly O(N^2) Algorithm

First, we run a regular flood-fill through the board and separate the board into a "component graph" (where each component in the original graph is reduced to a single node).

[Image: the component graph of the example board]

Now, we do a flood-fill through the edges of the component graph instead of the nodes. We mark each edge with the pair of IDs contained in the two components it connects, then flood-fill through the edges as if they themselves were nodes.

I believe that this, if implemented correctly, would result in an O(N^2) algorithm, because an upper bound for the number of edges in an N*N board is 4*N*N.
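To make the idea concrete, here is a rough sketch of that edge flood fill, assuming the component graph has already been built by the first pass (comp_id, comp_size, edges, and incident are my own illustrative names). Two edges count as adjacent when they share an endpoint component and carry the same pair of IDs:

#include <vector>
#include <utility>
#include <algorithm>

using namespace std;

// comp_id[c]   = board ID stored in component c
// comp_size[c] = number of squares in component c
// edges        = component-graph edges {u, v}, with comp_id[u] != comp_id[v]
// incident[c]  = indices into edges of the edges touching component c
long long largest_two_id_region(const vector<int>& comp_id,
                                const vector<long long>& comp_size,
                                const vector<pair<int,int>>& edges,
                                const vector<vector<int>>& incident){
    int E = edges.size();
    vector<char> edge_seen(E, 0), comp_seen(comp_id.size(), 0);
    long long best = 0;

    for(int s = 0; s < E; ++s){
        if(edge_seen[s]){continue;}

        // the two IDs this run is allowed to use
        int idA = comp_id[edges[s].first], idB = comp_id[edges[s].second];
        long long size = 0;
        vector<int> stack = {s}, touched; // touched components, for resetting
        edge_seen[s] = 1;

        while(!stack.empty()){
            int e0 = stack.back(); stack.pop_back();
            for(int c : {edges[e0].first, edges[e0].second}){
                if(comp_seen[c]){continue;}
                comp_seen[c] = 1;
                touched.push_back(c);
                size += comp_size[c];
                // neighboring edges that share endpoint c and the same ID pair
                for(int e : incident[c]){
                    int other = (edges[e].first == c) ? edges[e].second : edges[e].first;
                    int oid = comp_id[other];
                    if(!edge_seen[e] && (oid == idA || oid == idB)){
                        edge_seen[e] = 1;
                        stack.push_back(e);
                    }
                }
            }
        }
        best = max(best, size);
        for(int c : touched){comp_seen[c] = 0;}
    }
    return best;
}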

Now, my question is, is my thought process logically sound? If not, can somebody suggest another algorithm to solve this problem?

asked Nov 26 '20 by Anthony Smith


2 Answers

Here is the algorithm that I wrote to solve your problem. It expands on your idea to flood-fill through the edges (great idea, by the way) and is able to output the correct answer for a 250*250 grid in less than 300ms, with less than 30 megabytes of memory allocated.

Here is a problem that I managed to find online which matches your question exactly, and it is also where I tested the validity of my algorithm: USACO Problem

Note that the USACO problem requires us to find the largest single-ID component before finding the largest double-ID component. That first step is necessary in my algorithm anyway, since it is what reduces the whole board into a component graph.

Here's my commented C++ Code:

#include <iostream>
#include <fstream>
#include <cmath>
#include <algorithm>
#include <vector>
#include <unordered_set>

using namespace std;

// board to hold square ids and comp[][] to mark component numbers
vector <vector<int>> board, comp;
vector <int> comp_size = {-1}; // size of those components
vector <int> comp_id = {-1}; // id contained within those components
vector <unordered_set <int>> adj = {{}}; // component graph adjacency list
vector <bool> visited; // component graph visited array

void dfs(int x, int y, int N, int id, int curr_comp){
    if(x < 0 || x >= N || y < 0 || y >= N){return;}
    else if(board[x][y] != id){
        if(comp[x][y] == 0){return;}

        // construct component graph adjacency list during the first flood-fill
        adj[comp[x][y]].insert(curr_comp);
        adj[curr_comp].insert(comp[x][y]);
        // this is why we use an unordered_set: it automatically
        // de-duplicates repeated edges between the same two components

        return;
    }
    else if(comp[x][y]){return;}

    ++comp_size[curr_comp];
    comp[x][y] = curr_comp;

    dfs(x-1, y, N, id, curr_comp);
    dfs(x+1, y, N, id, curr_comp);
    dfs(x, y-1, N, id, curr_comp);
    dfs(x, y+1, N, id, curr_comp);
}

void dfs2(int curr, int id1, int id2, int &size){
    visited[curr] = true;

    // recurse from all valid and adjacent components to curr
    vector <int> to_erase;
    for(int item : adj[curr]){
        if(visited[item]){continue;}
        if(comp_id[item] == id1 || comp_id[item] == id2){
            to_erase.push_back(item);
            size += comp_size[item];
            dfs2(item, id1, id2, size);
        }
    }

    // we erase all edges connecting the current component AT THE SAME TIME to
    // prevent std::unordered_set iterators from being invalidated, which would
    // happen if we erased items as we iterated through adj[curr]
    for(int item : to_erase){
        adj[curr].erase(item);
        adj[item].erase(curr);
    }

    return;
}

int main()
{
    ifstream fin("multimoo.in");
    ofstream fout("multimoo.out");

    int N;
    fin >> N;

    board = vector <vector<int>> (N, vector <int> (N));
    for(int i = 0; i < N; ++i){
        for(int j = 0; j < N; ++j){
            fin >> board[i][j];
        }
    }
    // Input Done

    comp = vector <vector<int>> (N, vector <int> (N, 0)); // note that comp[i][j] = 0 means not visited yet

    // regular flood-fill through all the nodes
    for(int i = 0, curr_comp = 1; i < N; ++i){
        for(int j = 0; j < N; ++j){
            if(comp[i][j]){continue;}

            // add information about the current component
            comp_size.push_back(0);
            comp_id.push_back(board[i][j]);
            adj.push_back({});

            dfs(i, j, N, board[i][j], curr_comp++);
        }
    }

    fout << *max_element(comp_size.begin(), comp_size.end()) << endl;

    int ANS = 0;
    for(unsigned int i = 1; i < comp_size.size(); ++i){
        // no range-for loop here as we erase elements while iterating, which
        // may invalidate unordered_set iterators; instead, we use a while-loop
        while(!adj[i].empty()){
            int size = comp_size[i], curr = *(adj[i].begin());
            visited = vector <bool> (comp_size.size(), false); // reset visited
            dfs2(i, comp_id[i], comp_id[curr], size);
            ANS = max(ANS, size);
        }
    }

    fout << ANS << endl;

    return 0;
}

As for the time complexity, I'm not sure myself. If somebody could help analyze this algorithm to determine its complexity, I'd greatly appreciate it!

answered Nov 18 '22 by Telescope


Your algorithm works...

As far as I can tell, flood filling over your induced graph indeed gives all possible components, after which it's simple to find the largest one.

...but I'm not sure about the runtime

You correctly say that there are O(N^2) edges in the original graph, and therefore O(N^2) nodes in the induced graph. However, these nodes are no longer guaranteed to lie in a nice grid, so the induced graph may have more than O(N^2) edges.

For example, consider the large "1-block" in your example. This block has 6 edges, which become a complete graph on 6 vertices (and therefore 15 induced edges), since all of these edges-turned-vertices are connected to one another through that block. This can give an induced graph with more than O(N^2) edges, making it impossible to find its components in O(N^2) time.

Therefore, I believe that the algorithm will not run in O(N^2) time, though I'm unsure of the actual runtime, as it depends on exactly what the algorithm does at this point. The question only mentions flood fill, but I don't think it anticipated this situation.

Consider the following 9x9 grid:

232323232
311111113
212313212
313212313
212313212
313212313
212313212
313212313
212313212

The idea is simple: it's a single large component designed to border as many small components as possible. The induced graph here would be a single almost-complete graph with O(N^2) vertices and O(N^4) edges. Alternatively, if we only link the (1,2) edges with other (1,2) edges, and likewise the (1,3) edges with other (1,3) edges, we get a slightly less-connected graph, but it still consists of two components, each with O(N^4) edges, albeit with a lower constant.
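To make this concrete, here is a sketch of a generator for this family of grids, assuming N is a multiple of 3 so the pattern tiles cleanly (N = 9 reproduces the grid above). The 1s form one comb-shaped component, and every 2 and 3 is a singleton, almost all of them bordering it:

#include <iostream>
#include <string>

using namespace std;

int main(){
    int N = 9; // any multiple of 3

    // row 0: alternating 2s and 3s, all singletons
    string row0;
    for(int c = 0; c < N; ++c){row0 += (c % 2 == 0 ? '2' : '3');}
    cout << row0 << '\n';

    // row 1: the horizontal spine of the single big 1-component
    cout << '3' + string(N - 2, '1') + '3' << '\n';

    // rows 2..N-1: vertical teeth of 1s at columns 1, 4, 7, ... hang off the
    // spine; the remaining cells alternate between 2 and 3 so that no two of
    // them ever merge into a larger component
    for(int r = 2; r < N; ++r){
        string row;
        for(int c = 0; c < N; ++c){
            if(c % 3 == 1){row += '1';}
            else{row += ((r + c / 3) % 2 == 0 ? '2' : '3');}
        }
        cout << row << '\n';
    }
    return 0;
}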

Creating this graph would therefore take at least O(N^4) time, as would traversing it. That is the time I would argue the algorithm takes, but I cannot prove that no optimization improves upon it.

answered Nov 18 '22 by ADdV