Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to find disconnected graph from sets

Goal: Want to efficiently find all the disconnected graphs from a large collection of sets

For example, I have a data file like the following:

A, B, C
C, D, E
A, F, Z
G, J
...

each entry represents a set of element. First entries A, B, C = {A, B, C} This also indicate that there is a edge between A and B, A and C, B and C.

The algorithm I initially came up with was the following

1.parse all the entries into a list:
[
{A,B,C}
{C,D,E}
...
]
2.start with the first element/set of the list can called start_entry, {A,B,C} in this case
3.traverse other element in the list and do the following:
     if the intersection of the element and start_entry is not empty
          start_entry = start_entry union with the element
          remove element from the list
4.with the updated start_entry, traverse the list again until there is not new update

The algorithm above should return a list of vertex of connected graph. Nevertheless, I ran into the runtime problem due to the dataset size. There is ~100000 entries. So I just wonder if anyone knows there is more efficient way to find connected graph.

The data structure could also be altered into (if this is easier) A,B B,C E,F ... with each entry represent an edge of graph.

like image 288
Junwei su Avatar asked Jun 16 '17 19:06

Junwei su


People also ask

Which algorithm is used for disconnected graph?

Prim's Algorithm and disconnected graph.

How do you find a disconnected graph?

Disconnected Graph A graph is disconnected if at least two vertices of the graph are not connected by a path. If a graph G is disconnected, then every maximal connected subgraph of G is called a connected component of the graph G.

Does Kruskal algorithm work for disconnected graphs?

Kruskal will run just fine on your disconnected graph; it will find a minimum spanning tree for each connected component. Alternatively, you could run Prim's on each subgraph that contains only connected components.

Can Prim's algorithm be used for disconnected graphs?

Explanation: Prim's algorithm iterates from one node to another, so it can not be applied for disconnected graph.


1 Answers

This looks like an ideal case for using a disjoint set data structure.

This lets you join together sets in almost linear time.

Example Python code

from collections import defaultdict

data=["A","B","C"],["C","D","E"],["F","G"]

# Prepare mapping from data element to index
S = {}
for a in data:
    for x in a:
        if x not in S:
            S[x] = len(S)

N = len(S)
rank=[0]*N
parent=range(N)

def Find(x):
    """Find representative of connected component"""
    if  parent[x] != x:
        parent[x] = Find(parent[x])
    return parent[x]

def Union(x,y):
    """Merge sets containing elements x and y"""
    x = Find(x)
    y = Find(y)
    if x == y:
        return
    if rank[x]<rank[y]:
        parent[x] = y
    elif rank[x]>rank[y]:
        parent[y] = x
    else:
        parent[y] = x
        rank[x] += 1

# Merge all sets
for a in data:
    x = a[0]
    for y in a[1:]:
        Union(S[x],S[y])

# Report disconnected graphs
V=defaultdict(list)
for x in S:
    V[Find(S[x])].append(x)

print V.values()

prints

[['A', 'C', 'B', 'E', 'D'], ['G', 'F']]
like image 184
Peter de Rivaz Avatar answered Oct 22 '22 08:10

Peter de Rivaz