Suppose I have the following graph (arrows indicate direction of the connection) and I want to count the size of the cluster of black nodes:
which is organized in the memory as a list of nodes, such that each node has a list of its neighbors nodes. I want to count, starting from any node, how many nodes have the node[i].State == 1
, if the given node is also with state 1. Thus, I implemented a method Node.GetClusterSize()
, in which I count the cluster size (it is based in Depth-First Search algorithm):
public class Node
{
public Int32 State { get; private set; } // 0 = white; 1 = black;
public Boolean Visited { get; private set; }
public List<Node> InputNeigh { get; private set; } // list of references to
// neighbors nodes
public Int32 GetClusterSize()
{
this.Visited = true;
if (this.State == 1)
{
Int32 s = 1, i = 0;
while (i < this.InputNeigh.Count)
{
if (!this.InputNeigh[i].Visited)
{
s += this.InputNeigh[i].GetClusterSize();
}
i++;
}
this.Visited = false; // this is an important step, I'll explain why
return s;
}
else
{
return 0;
}
}
public void Evolve() { /* doesn't matter for this question */ }
}
Now, I need to mark nodes as not visited because I count the cluster size for every node at every timestep of the main simulation (the state of the nodes is evolving with time, so clusters may change size in the next timestep).
This issue could be easily fixed if, instead of a flag in the Node
objects, I have an external list of Boolean, which a given element i
correspond to node i
: List<Boolean> nodeStatus
, and passing this list as reference to the function Node.GetClusterSize()
. But then, I would have to reset this list at every timestep, slowing down the code (performance matters!).
The failure of the above code is exactly marking the node as not visited after iterating through its neighbors. This situation is better visualized with the following tree (visited from left to right and supposing I first call node[0].GetClusterSize()
):
The Depth-First Search iterates across the blue path in the above tree and, when it reaches the node 3
, it knows that all its neighbors have already been visited, marks 3
as not visited and returns s = 1
. As 3
is the next neighbor of 2
to be visited, and 3
is marked as not visited (although it has already been visited), it checks again and the algorithm goes into a StackOverflow
exception or, at best, returns the wrong size of the cluster.
Thus, I came up with 2 ideas, although I still don't know how to implement them:
1) Implement a Breadth-First Search algorithm; Although I do not know how to apply this concept to the presented situation.
2) Implement Depth-First Search in a sequential way (not recursively). Nevertheless, I can't imagine how it is possible.
Do you have any idea to override this issue? Any suggestion?
PS: The graph can be way larger than this example and there may be more than one black cluster in the network at the same time, disconnected from each other. Thus, it is not just a matter of counting black elements.
Do not mutate an object that you are attempting to query; that way lies madness because as you note, you then have to unmutate the object.
Look at it this way. You have a relation defined. A black node is related to another black node if there is any edge directly between them. When given a black node you wish to compute the size of the reflexive and transitive closure of this relation.
In your example the relation appears to also be symmetric, so the closure would then define an equivalence class, and your question is then "given a member, find the size of its equivalence class.
So let's solve the more general problem.
What is a relation? As a commenter points out, a relation is properly a set of ordered pairs. But it is convenient to think of your relation as a function which when given an element, gives you a sequence of all the elements that are related to it. In this case: given a black node, the relation function gives you a sequence of all the neighbouring black nodes.
Here we have a non-recursive method which when given an item and a relation computes the transitive closure of that relation:
static HashSet<T> TransitiveClosure<T>(
Func<T, IEnumerable<T>> relation,
T item)
{
var closure = new HashSet<T>();
var stack = new Stack<T>();
stack.Push(item);
while(stack.Count > 0)
{
T current = stack.Pop();
foreach(T newItem in relation(current))
{
if (!closure.Contains(newItem))
{
closure.Add(newItem);
stack.Push(newItem);
}
}
}
return closure;
}
Notice that this is a non-recursive depth-first traversal with cycle detection.
Exercise: What simple changes could you make to this implementation to turn it into a non-recursive breadth-first traversal with cycle detection?
We create the reflexive and transitive closure easily enough:
static HashSet<T> TransitiveAndReflexiveClosure<T>(
Func<T, IEnumerable<T>> relation,
T item)
{
var closure = TransitiveClosure(relation, item);
closure.Add(item);
return closure;
}
Exercise: Your relationship is symmetric, which means that when we start with a node X and visit a neighbor Y, then when we process Y it will put X back on the stack, and eventually in the closure. Therefore it is unnecessary to take the reflexive closure.
The previous argument is incorrect; it is necessary to take the reflexive closure. Which sentence in that argument contained the first mistake?
Now you have a method which you can call very easily:
var cluster = TransitiveAndReflexiveClosure<Node>(
node => from n in node.InputNeigh where n.State == node.State select n,
someNode);
And now you can simply ask the cluster for its size if that's what you want.
(And please change the name of InputNeigh
. Abbrevs are totes uncool, yo, unless you are 13 years old.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With