Iterative connected components algorithm

Tags:

I have a bipartite graph and I'm looking for the most efficient iterative way to divide it into connected components. My recursive version has started overflowing the stack on large data sets. I'm willing to port from any language/pseudocode but for completeness I'll be coding in C#.

My existing code is specialized for my data types. One partition is proteins, the other is spectra. Map and Set are C++ stdlib workalikes.

void recursivelyAssignProteinToCluster (long proteinId,
                                        long clusterId,
                                        Set<long> spectrumSet,
                                        Map<long, Set<long>> spectrumSetByProteinId,
                                        Map<long, Set<long>> proteinSetBySpectrumId,
                                        Map<long, long> clusterByProteinId)
{
    // try to assign the protein to the current cluster
    var insertResult = clusterByProteinId.Insert(proteinId, clusterId);
    if (!insertResult.WasInserted)
        return;

    // recursively add all "cousin" proteins to the current cluster
    foreach (long spectrumId in spectrumSet)
        foreach (var cousinProteinId in proteinSetBySpectrumId[spectrumId])
        {
            if (proteinId != cousinProteinId)
            {
                Set<long> cousinSpectrumSet = spectrumSetByProteinId[cousinProteinId];
                recursivelyAssignProteinToCluster(cousinProteinId,
                                                  clusterId,
                                                  cousinSpectrumSet,
                                                  spectrumSetByProteinId,
                                                  proteinSetBySpectrumId,
                                                  clusterByProteinId);
            }
        }
}

Map<long, long> calculateProteinClusters (NHibernate.ISession session)
{
    var spectrumSetByProteinId = new Map<long, Set<long>>();
    var proteinSetBySpectrumId = new Map<long, Set<long>>();

    var query = session.CreateQuery("SELECT pi.Protein.id, psm.Spectrum.id " + GetFilteredQueryString(FromProtein, ProteinToPeptideSpectrumMatch));

    foreach (var queryRow in query.List<object[]>())
    {
        long proteinId = (long) queryRow[0];
        long spectrumId = (long) queryRow[1];

        spectrumSetByProteinId[proteinId].Add(spectrumId);
        proteinSetBySpectrumId[spectrumId].Add(proteinId);
    }

    var clusterByProteinId = new Map<long, long>();
    int clusterId = 0;

    foreach (var pair in spectrumSetByProteinId)
    {
        long proteinId = pair.Key;

        // for each protein without a cluster assignment, make a new cluster
        if (!clusterByProteinId.Contains(proteinId))
        {
            ++clusterId;

            recursivelyAssignProteinToCluster(proteinId,
                                              clusterId,
                                              pair.Value,
                                              spectrumSetByProteinId,
                                              proteinSetBySpectrumId,
                                              clusterByProteinId);
        }
    }

    return clusterByProteinId;
}

As ShinTakezou suggested I refactored to put the stack on the heap and it works great. I used the DepthFirstSearch approach from digEmAll's example.

var clusterByProteinId = new Map<long, long>();
int clusterId = 0;
var clusterStack = new Stack<KeyValuePair<long, Set<long>>>();

foreach (var pair in spectrumSetByProteinId)
{
    long proteinId = pair.Key;

    if (clusterByProteinId.Contains(proteinId))
        continue;

    // for each protein without a cluster assignment, make a new cluster
    ++clusterId;
    clusterStack.Push(new KeyValuePair<long, Set<long>>(proteinId, spectrumSetByProteinId[proteinId]));
    while (clusterStack.Count > 0)
    {
        var kvp = clusterStack.Pop();

        // try to assign the protein to the current cluster
        var insertResult = clusterByProteinId.Insert(kvp.Key, clusterId);
        if (!insertResult.WasInserted)
            continue;

        // add all "cousin" proteins to the current cluster
        foreach (long spectrumId in kvp.Value)
            foreach (var cousinProteinId in proteinSetBySpectrumId[spectrumId])
                if (!clusterByProteinId.Contains(cousinProteinId))
                    clusterStack.Push(new KeyValuePair<long, Set<long>>(cousinProteinId, spectrumSetByProteinId[cousinProteinId]));
    }
}

432

asked Apr 05 '12 17:04

Matt Chambers

1 Answers

Here's an example of an helper class that holds an undirected Graph and allows to get the connected components of it (iteratively):

public class Graph<T>
{
    public Dictionary<T, HashSet<T>> nodesNeighbors;
    public IEnumerable<T> Nodes
    {
        get { return nodesNeighbors.Keys; }
    }
    public Graph()
    {
        this.nodesNeighbors = new Dictionary<T, HashSet<T>>();
    }
    public void AddNode(T node)
    {
        this.nodesNeighbors.Add(node, new HashSet<T>());
    }
    public void AddNodes(IEnumerable<T> nodes)
    {
        foreach (var n in nodes)
            this.AddNode(n);
    }
    public void AddArc(T from, T to)
    {
        this.nodesNeighbors[from].Add(to);
        this.nodesNeighbors[to].Add(from);
    }
    public bool ContainsNode(T node)
    {
        return this.nodesNeighbors.ContainsKey(node);
    }
    public IEnumerable<T> GetNeighbors(T node)
    {
        return nodesNeighbors[node];
    }
    public IEnumerable<T> DepthFirstSearch(T nodeStart)
    {
        var stack = new Stack<T>();
        var visitedNodes = new HashSet<T>();
        stack.Push(nodeStart);
        while (stack.Count > 0)
        {
            var curr = stack.Pop();
            if (!visitedNodes.Contains(curr))
            {
                visitedNodes.Add(curr);
                yield return curr;
                foreach (var next in this.GetNeighbors(curr))
                {
                    if (!visitedNodes.Contains(next))
                        stack.Push(next);
                }
            }
        }
    }
    public Graph<T> GetSubGraph(IEnumerable<T> nodes)
    {
        Graph<T> g = new Graph<T>();
        g.AddNodes(nodes);
        foreach (var n in g.Nodes.ToList())
        {
            foreach (var neigh in this.GetNeighbors(n))
                g.AddArc(n, neigh);
        }
        return g;
    }

    public IEnumerable<Graph<T>> GetConnectedComponents()
    {
        var visitedNodes = new HashSet<T>();
        var components = new List<Graph<T>>();

        foreach (var node in this.Nodes)
        {
            if (!visitedNodes.Contains(node))
            {
                var subGraph = GetSubGraph(this.DepthFirstSearch(node));
                components.Add(subGraph);
                visitedNodes.UnionWith(subGraph.Nodes);
            }
        }
        return components;
    }
}

Usage:

static void Main(string[] args)
{
    var g = new Graph<long>();
    g.AddNodes(new long[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 });
    g.AddArc(1, 2);
    g.AddArc(1, 3);

    g.AddArc(9, 6);
    g.AddArc(6, 7);
    g.AddArc(6, 8);

    g.AddArc(4, 5);

    var subGraphs = g.GetConnectedComponents();

}

You could use the Graph<> class instead of your maps, or if you want to stick with your maps have a look at the code that is quite easy to understand (inside the class it is used a Dictionary<T,HashSet<T>> to hold nodes and arcs, so is very similar to your approach)

answered Oct 04 '22 19:10

digEmAll

Related questions
                            
                                Fading sound in/out
                            
                                Dynamics CRM 2011 Online, CrmSvcUtil, Proxy server authentication failing
                            
                                Architecture for MVC web project / usage of different types of model
                            
                                asp.net CompositeDataBoundControl for a single object
                            
                                ASP.NET Web Api in Web forms
                            
                                Submit trial version of WP7 application
                            
                                c# ASP.Net Dynamically populated DropDownList loses selected index on submit button
                            
                                MS Chart: How can you change the color of each label on the Axis of a Bar Chart?
                            
                                Edit MSMQ messages in a queue
                            
                                what is the difference between AppDomain.CreateInstance and Activator.CreateInstance?
                            
                                Include PNG in reciept on a LP2844Z (Zebra Printer) [duplicate]
                            
                                Send FINS command from C# to PLC
                            
                                Testing ViewModel PropertyChanged Events
                            
                                Upload file from Html form (multipart/form-data) to WCF REST service as a stream without streaming the whole form's inputs?
                            
                                Why isn't `Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(x))==x`
                            
                                .NET equivalent of curl to upload a file to REST API?
                            
                                Pass list of ints from JavaScript to C# -- I get the list, but it is empty; form data isn't structured properly?
                            
                                What do time fields in CounterSample mean?
                            
                                Is there a better design option?
                            
                                Can a single schema.ini definition cover multiple files

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Iterative connected components algorithm

Tags:

c#

iteration

recursion

graph-theory

Matt Chambers

People also ask

1 Answers

digEmAll

Recent Activity

Donate For Us