Goal: Want to efficiently find all the disconnected graphs from a large collection of sets
For example, I have a data file like the following:
A, B, C
C, D, E
A, F, Z
G, J
...
each entry represents a set of element. First entries A, B, C = {A, B, C} This also indicate that there is a edge between A and B, A and C, B and C.
The algorithm I initially came up with was the following
1.parse all the entries into a list:
[
{A,B,C}
{C,D,E}
...
]
2.start with the first element/set of the list can called start_entry, {A,B,C} in this case
3.traverse other element in the list and do the following:
if the intersection of the element and start_entry is not empty
start_entry = start_entry union with the element
remove element from the list
4.with the updated start_entry, traverse the list again until there is not new update
The algorithm above should return a list of vertex of connected graph. Nevertheless, I ran into the runtime problem due to the dataset size. There is ~100000 entries. So I just wonder if anyone knows there is more efficient way to find connected graph.
The data structure could also be altered into (if this is easier) A,B B,C E,F ... with each entry represent an edge of graph.
Prim's Algorithm and disconnected graph.
Disconnected Graph A graph is disconnected if at least two vertices of the graph are not connected by a path. If a graph G is disconnected, then every maximal connected subgraph of G is called a connected component of the graph G.
Kruskal will run just fine on your disconnected graph; it will find a minimum spanning tree for each connected component. Alternatively, you could run Prim's on each subgraph that contains only connected components.
Explanation: Prim's algorithm iterates from one node to another, so it can not be applied for disconnected graph.
This looks like an ideal case for using a disjoint set data structure.
This lets you join together sets in almost linear time.
from collections import defaultdict
data=["A","B","C"],["C","D","E"],["F","G"]
# Prepare mapping from data element to index
S = {}
for a in data:
for x in a:
if x not in S:
S[x] = len(S)
N = len(S)
rank=[0]*N
parent=range(N)
def Find(x):
"""Find representative of connected component"""
if parent[x] != x:
parent[x] = Find(parent[x])
return parent[x]
def Union(x,y):
"""Merge sets containing elements x and y"""
x = Find(x)
y = Find(y)
if x == y:
return
if rank[x]<rank[y]:
parent[x] = y
elif rank[x]>rank[y]:
parent[y] = x
else:
parent[y] = x
rank[x] += 1
# Merge all sets
for a in data:
x = a[0]
for y in a[1:]:
Union(S[x],S[y])
# Report disconnected graphs
V=defaultdict(list)
for x in S:
V[Find(S[x])].append(x)
print V.values()
prints
[['A', 'C', 'B', 'E', 'D'], ['G', 'F']]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With