Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find islands in directed graph

I'm working in an obscure language with poor dependency management. To help out 14000 file codebase, I've written some parsing tools (in Java) and have generated a dependency graph.

I wrote my own graph and BFS classes, and they work just fine. With them, I have such methods as getParents() and getChildren().

Now I'm trying to find the 'islands' in this graph; that is to say, I'm trying to find what sections of our codebase are not dependent on each other, in the hopes of gathering them into isolated modules.

Later, I also plan to analyze the individual islands to see if there's any weak points in them where we can establish a module barrier and define that module's interface, but that's down the road.

Right now, I'm thinking of doing this:

Map<DependencyEntry, Set<DependencyEntry>> allChildren = new ...;
for(DependencyEntry entry : allFiles) allChildren.put(entry,getAllChildren(entry));
Set<DependencyEntry> visited = new ...;
Set<DependencyEntry> largest = new HashSet<DependencyEntry>(); // size 0
// slightly more expensive but more readable
for(DependencyEntry entry : allChildren.keySet()) {
    Set<DependencyEntry> set = allChildren.get(key);
    if(set.size() > largest.size()) largest = set;
}
visited.addAll(largest);

This should give me the largest island. From there, I can go through and exclude any sets that contain any visited nodes, and then run it again to get the next largest island, and so forth.

Is this an accurate algorithm? Is there a better way to solve this problem that I'm not seeing?

like image 691
corsiKa Avatar asked Aug 25 '11 16:08

corsiKa


2 Answers

Sounds like you want to build a collection of connected components found in the dependency graph. From there you could iterate the collection and identify the largest component (or any other interesting metric).

like image 196
PaulF Avatar answered Sep 18 '22 16:09

PaulF


it would have been simpler to use a graph library, because these kind of things are built-in. typically they let you store arbitrary information in nodes, edges, so you can still use your own classes for the data, but the library provides the support and standard algorithms.

the algorithm as you describe it seems a bit unclear on what are root nodes. if you don't start at a root then you won't find the "entire island" - just the parts below (children of) where you started. you can fix this by following parents as well as children. apart from that, it sounds ok - it is doing as PaulF's answer says, as far as i can tell, and finding connected components.

see also Good Java graph algorithm library?

like image 26
andrew cooke Avatar answered Sep 20 '22 16:09

andrew cooke