
Creating sets of similar elements in a 2D array

I am trying to solve a problem based on a 2D array. The array contains elements of three possible kinds; let's call them X, Y and Z.

The array looks something like this. Note that the real array is always completely filled; the diagram is only for illustration.

7 | | | | | | |
6 | | | | | | |
5 | | | | | | |
4 | |X|Z|Y|X| |
3 | |Y|X|Y|Y|X|
2 |Y|Y|X|Z|Z|X|
1 |X|X|Y| |X|X|
0 | | | |Z| | |
   0 1 2 3 4 5

I am trying to create sets of elements of the same kind that are adjacent to each other. For example, set1 may consist of the elements of type X located at (0,1), (1,1), (2,2), (2,3), (1,4). Similarly, set2 may consist of the elements of type Y located at (3,4), (3,3), (4,3).

Problem: Starting from any point in the array, the algorithm must add every element to the appropriate set and ensure that no element belongs to two sets. Note that a set is only created if more than 2 adjacent elements of the same kind are encountered.

Moreover, when a subset of elements is removed, new elements are added to replace the removed ones. The array must then be iterated over again to create new sets or modify the existing ones.

Solution: I implemented a recursive solution such that it would iterate over all the adjacent elements of, for example, element X (0,1). Then, while iterating over the 8 possible adjacent elements, it would call itself recursively whenever a type X occurred.

This kind of solution is too brute-force and inefficient, especially when some elements are replaced with new ones of possibly different types. In that case, almost the whole array has to be iterated over again to create or modify sets and to ensure that no element exists in more than one set.

Is there an algorithm that deals efficiently with this kind of problem? I would appreciate ideas, suggestions or pseudocode.

Rafay asked Jul 22 '13 19:07



2 Answers

[EDIT 5/8/2013: Fixed time complexity. (O(a(n)) is essentially constant time!)]

In the following, by "connected component" I mean the set of all positions that are reachable from each other by a path that allows only horizontal, vertical or diagonal moves between neighbouring positions having the same kind of element. E.g. the set {(0,1), (1,1), (2,2), (2,3), (1,4)} from your question is a connected component of your example input. Each position belongs to exactly one connected component.

We will build a union/find data structure that gives every position (x, y) a numeric "label", with the property that two positions (x, y) and (x', y') have the same label if and only if they belong to the same component. In particular, this data structure supports three operations:

  • set(x, y, i) will set the label for position (x, y) to i.
  • find(x, y) will return the label assigned to the position (x, y).
  • union(Z), for some set of labels Z, will combine all labels in Z into a single label k, in the sense that future calls to find(x, y) on any position (x, y) that previously had a label in Z will now return k. (In general k will be one of the labels already in Z, though this is not actually important.) union(Z) also returns the new "master" label, k.

If there are n = width * height positions in total, labelling every position this way takes O(n*a(n)) time, where a() is the extremely slow-growing inverse Ackermann function. For all practical input sizes, this is the same as O(n).
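As a rough illustration, the three operations could be sketched in Python as below. This is only one possible layout, not a definitive implementation: the class name LabelUnionFind, the helper new_label() and the internal fields are assumptions made for this sketch. It uses path compression and union by size, the usual combination behind the O(a(n)) amortised bound.

```python
class LabelUnionFind:
    """Union/find over integer labels plus a position -> label map (sketch)."""

    def __init__(self):
        self.parent = []    # parent[i] is the parent of label i; a root is its own parent
        self.size = []      # size[i] is the number of labels under root i
        self.label_at = {}  # (x, y) -> label passed to set()

    def new_label(self):
        # Hypothetical helper: allocate a fresh label that is its own root.
        self.parent.append(len(self.parent))
        self.size.append(1)
        return len(self.parent) - 1

    def _root(self, i):
        # Walk up to the root, then point every visited label straight at it
        # (path compression).
        root = i
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[i] != root:
            nxt = self.parent[i]
            self.parent[i] = root
            i = nxt
        return root

    def set(self, x, y, label):
        self.label_at[(x, y)] = label

    def find(self, x, y):
        return self._root(self.label_at[(x, y)])

    def union(self, labels):
        # Merge all given labels and return the surviving "master" label k.
        roots = {self._root(label) for label in labels}
        k = max(roots, key=lambda r: self.size[r])  # union by size
        for r in roots:
            if r != k:
                self.parent[r] = k
                self.size[k] += self.size[r]
        return k
```

Note that set() never has to be called again for a position after a union(): find() follows the parent links, so old labels automatically resolve to the master label.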

Notice that whenever two positions are adjacent to each other, exactly one of four cases applies:

  1. One is above the other (connected by a vertical edge)
  2. One is to the left of the other (connected by a horizontal edge)
  3. One is above and to the left of the other (connected by a \ diagonal edge)
  4. One is above and to the right of the other (connected by a / diagonal edge)
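A consequence of these four cases is that, when positions are visited row by row, it is enough to look back at the W, NW, N and NE neighbours of each position. The Python sketch below checks this on a small grid: it lists every pair reached through those four offsets and compares the result with all 8-neighbour pairs. The function names are made up for this sketch, and "N" is assumed to mean the previously visited row (y - 1).

```python
# Offsets (dx, dy) to the neighbours that were already visited when scanning
# rows in increasing y and columns in increasing x: W, NW, N, NE.
EARLIER_NEIGHBOURS = [(-1, 0), (-1, -1), (0, -1), (1, -1)]


def in_bounds(x, y, width, height):
    return 0 <= x < width and 0 <= y < height


def earlier_pairs(width, height):
    """List every adjacent pair found by looking only at the four offsets."""
    pairs = []
    for y in range(height):
        for x in range(width):
            for dx, dy in EARLIER_NEIGHBOURS:
                nx, ny = x + dx, y + dy
                if in_bounds(nx, ny, width, height):
                    pairs.append(frozenset([(x, y), (nx, ny)]))
    return pairs


def all_adjacent_pairs(width, height):
    """Set of every unordered pair of 8-connected neighbours."""
    pairs = set()
    for y in range(height):
        for x in range(width):
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nx, ny = x + dx, y + dy
                    if (dx, dy) != (0, 0) and in_bounds(nx, ny, width, height):
                        pairs.add(frozenset([(x, y), (nx, ny)]))
    return pairs
```

For any grid size, earlier_pairs() should contain no duplicates and should cover exactly the pairs in all_adjacent_pairs(), which is why the other four neighbours never need to be examined.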

We can use the following pass to determine labels for each position (x, y):

  • Set nextLabel to 0.
  • For each row y in increasing order:
    • For each column x in increasing order:
      • Examine the W, NW, N and NE neighbours of (x, y). Let Z be the subset of these 4 neighbours that are of the same kind as (x, y).
      • If Z is the empty set, then we tentatively suppose that (x, y) starts a brand new component, so call set(x, y, nextLabel) and increment nextLabel.
      • Otherwise, call find(Z[i]) on each element of Z to find their labels, and call union() on this set of labels to combine them together. Assign the new label (the result of this union() call) to k, and then also call set(x, y, k) to add (x, y) to this component.
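The pass above could look roughly like this in Python. It is a simplified sketch under stated assumptions: the grid is a list of equal-length rows indexed as grid[y][x], the union/find forest is inlined as a plain list instead of the set/find/union interface, and it uses path halving without union by size, so it does not strictly achieve the O(a(n)) bound. The function name label_components is made up for this sketch.

```python
def label_components(grid):
    """Return labels[y][x]; equal labels mean the same 8-connected component."""
    height, width = len(grid), len(grid[0])
    parent = []  # union/find forest over labels; a root is its own parent

    def root(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    label = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Root labels of the W, NW, N and NE neighbours of the same kind.
            roots = set()
            for dx, dy in ((-1, 0), (-1, -1), (0, -1), (1, -1)):
                nx, ny = x + dx, y + dy
                if (0 <= nx < width and 0 <= ny < height
                        and grid[ny][nx] == grid[y][x]):
                    roots.add(root(label[ny][nx]))
            if not roots:
                # No matching neighbour: tentatively start a new component.
                parent.append(len(parent))
                label[y][x] = len(parent) - 1
            else:
                # Merge all neighbouring components and join the result.
                k = min(roots)
                for r in roots:
                    parent[r] = k
                label[y][x] = k

    # Resolve every position to its final (root) label.
    return [[root(label[y][x]) for x in range(width)] for y in range(height)]
```

Calling label_components(["XXY", "YXY", "XYY"]) gives all X positions one label and all Y positions another, even though the Y at the start of the second row only joins the other Y positions once the last row is scanned.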

After this, calling find(x, y) on any position (x, y) effectively tells you which component it belongs to. If you want to be able to quickly answer queries of the form "Which positions belong to the connected component containing position (x, y)?" then create a hashtable of lists posInComp and make a second pass over the input array, appending each (x, y) to the list posInComp[find(x, y)]. This can all be done in linear time and space. Now to answer a query for some given position (x, y), simply call lab = find(x, y) to find that position's label, and then list the positions in posInComp[lab].

To deal with "too-small" components, just look at the size of posInComp[lab]. If it's 1 or 2, then (x, y) does not belong to any "large-enough" component.
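A minimal Python sketch of the second pass and the size check follows. It assumes that labels[y][x] already holds the result of find(x, y) for every position; the function names and the min_size parameter are made up for this sketch.

```python
from collections import defaultdict


def group_positions(labels):
    """Build posInComp: map each label to the list of (x, y) positions with it."""
    pos_in_comp = defaultdict(list)
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            pos_in_comp[lab].append((x, y))
    return pos_in_comp


def component_of(labels, pos_in_comp, x, y, min_size=3):
    """Return the positions in the component of (x, y), or None if too small."""
    members = pos_in_comp[labels[y][x]]
    return members if len(members) >= min_size else None
```

With min_size=3, components of size 1 or 2 are reported as None, which matches the question's rule that a set needs more than 2 adjacent elements.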

Finally, all this work effectively takes linear time, so it will be lightning fast unless your input array is huge. So it's perfectly reasonable to recompute it from scratch after modifying the input array.

j_random_hacker answered Oct 23 '22 22:10



In your situation, I would rely on at least two different arrays:

Array1 (sets) -> all the sets and the associated list of points. Main indices: set names.
Array2 (setsDef) -> type of each set ("X", "Y" or "Z"). Main indices: type names.

It might be possible to create more supporting arrays like, for example, one including the minimum/maximum X/Y values for each set to speed up the analysis (although it would be pretty quick anyway, as shown below).
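As a rough idea of what such a min/max supporting array could do, here is a short Python sketch (Python only for brevity). It keeps a bounding box per set and uses it to skip sets that cannot possibly touch a new point. The function names and the tuple layout are assumptions made for this sketch.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a non-empty list of (x, y)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def may_touch(box, point):
    """False means no point of the set can be adjacent to `point`.

    True only means the set is worth scanning point by point.
    """
    min_x, min_y, max_x, max_y = box
    x, y = point
    return min_x - 1 <= x <= max_x + 1 and min_y - 1 <= y <= max_y + 1
```

The box has to be updated whenever a point is added to a set, and recomputed when points are removed.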

You don't mention a programming language, but I include sample C# code because it is the best way to explain the point. Please don't take it as a suggestion of the best way to proceed (personally, I don't like Dictionaries/Lists too much, although I think they provide a clear way to show an algorithm, even to inexperienced C# users). This code is only intended to show a data storage/retrieval approach; the best way to achieve optimal performance depends on the target language and on further issues (e.g., dataset size), and is something you will have to take care of.

//Requires: using System.Collections.Generic; Point is assumed to be System.Drawing.Point
Dictionary<string, List<Point>> sets = new Dictionary<string, List<Point>>(); //All sets and the associated list of points
Dictionary<string, List<string>> setsDef = new Dictionary<string, List<string>>(); //For each type (X, Y or Z), the names of the sets of that type

List<Point> temp0 = new List<Point>();
temp0.Add(new Point(0, 0));
temp0.Add(new Point(0, 1));
sets.Add("Set1", temp0);
List<String> tempX = new List<string>();
tempX.Add("Set1");

temp0 = new List<Point>();
temp0.Add(new Point(0, 2));
temp0.Add(new Point(1, 2));
sets.Add("Set2", temp0);
List<String> tempY = new List<string>();
tempY.Add("Set2");

setsDef.Add("X", tempX);
setsDef.Add("Y", tempY);


//-------- TEST
//I have a new Y value which is 2,2
Point targetPoint = new Point(2, 2);
string targetSet = "Y";

//I go through all the Y sets
List<string> targetSets = setsDef[targetSet];

bool alreadyThere = false;
Point candidatePoint;
string foundSet = "";
foreach (string set in targetSets) //Going through all the set names stored in setsDef for targetSet
{
    List<Point> curPoints = sets[set];
    foreach (Point point in curPoints) //Going through all the points in the given set
    {
        if (point == targetPoint)
        {
            //Already-stored point and thus the analysis will be stopped
            alreadyThere = true;
            break;
        }
        else if (isSurroundingPoint(point, targetPoint))
        {
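            //A close point was found, so its set is where targetPoint has to be stored
            //Note: only the first matching set is used; if targetPoint touches several sets, they are not merged here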
            candidatePoint = point;
            foundSet = set;
            break;
        }
    }
    if (alreadyThere || foundSet != "")
    {
        break;
    }
}

if (!alreadyThere)
{
    if (foundSet != "")
    {
        //Point added to an existing set
        List<Point> curPoints = sets[foundSet];
        curPoints.Add(targetPoint);
        sets[foundSet] = curPoints;
    }
    else
    {
        //A new set has to be created
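        string newName = "Set" + (sets.Count + 1); //Unique name as long as sets are never removed (a fixed name would fail on the second Add)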
        temp0 = new List<Point>();
        temp0.Add(targetPoint);
        sets.Add(newName, temp0);

        targetSets.Add(newName);
        setsDef[targetSet] = targetSets;
    }
}

Here, isSurroundingPoint is a function that checks whether the two points are adjacent to each other:

private bool isSurroundingPoint(Point point1, Point point2)
{
    //True when the points are at most one step apart horizontally, vertically or diagonally
    //(also true for identical points). Math.Abs requires: using System;
    return Math.Abs(point1.X - point2.X) <= 1 && Math.Abs(point1.Y - point2.Y) <= 1;
}
varocarbas answered Oct 24 '22 00:10
