Python implementation of a graph-similarity-grading algorithm

Tags:

I am looking for a Python implementation of an algorithm which performs the following task:

Given two directed graphs, that may contain cycles, and their roots, produce a score to the two graphs' similarity.

(The way that Python's difflib can perform for two sequences)

Hopefully, such an implementation exists. Otherwise, I'll try and implement an algorithm myself. In which case, what's the preferable algorithm to implement (with regard to simplicity).

The way the algorithm works is of no importance to me, though its' complexity is. Also, an algorithm which works with a different data-structure is also acceptable, as long as a graph, such as I described, can be represented with this DS.

I'll emphasize, an implemetation would be much nicer.

Edit:
It seems an isomorphism algortihm is not relevant. It was suggested that graph edit distance is more to the point, which narrows down my search to a solution that either executes graph edit distance or reduces a graph to a tree and then performs tree edit distance.
The nodes themseleves consist of a few lines of assembly code each.

218

asked Aug 25 '12 12:08

YaronK

2 Answers

Another method is to use what is called Eigenvector Similarity. Basically, you calculate the Laplacian eigenvalues for the adjacency matrices of each of the graphs. For each graph, find the smallest k such that the sum of the k largest eigenvalues constitutes at least 90% of the sum of all of the eigenvalues. If the values of k are different between the two graphs, then use the smaller one. The similarity metric is then the sum of the squared differences between the largest k eigenvalues between the graphs. This will produce a similarity metric in the range [0, ∞), where values closer to zero are more similar.

For example, if using networkx:

def select_k(spectrum, minimum_energy = 0.9):     running_total = 0.0     total = sum(spectrum)     if total == 0.0:         return len(spectrum)     for i in range(len(spectrum)):         running_total += spectrum[i]         if running_total / total >= minimum_energy:             return i + 1     return len(spectrum)  laplacian1 = nx.spectrum.laplacian_spectrum(graph1) laplacian2 = nx.spectrum.laplacian_spectrum(graph2)  k1 = select_k(laplacian1) k2 = select_k(laplacian2) k = min(k1, k2)  similarity = sum((laplacian1[:k] - laplacian2[:k])**2)

114

answered Sep 18 '22 16:09

ESultanik

What we ended up doing is implementing an algorithm described in: "Heuristics for Chemical Compound Matching".

We're using NetworkX for representing the grap and for finding the max clique.

Edit:

Basically, you create a new graph, each node (v) representing a possible pairing of a node from graph A (a) to a node from graph B (b).

If in your application the two nodes (a,b) are either similar or not, you remove nodes (v) from the new graph which correspond to dissimilar pairings (a,b). You connect two nodes with an edge if they don't contradict each other. For example, the pairings (a,b) and (a,c) contradict each other (see article for a formal definition). You then find a clique in the new graph which has the maximum amount of nodes.

If in your application the two nodes' similarity is not binary, you give the new nodes weights within a range (say (0,1)). You can remove, heuristically, remove new nodes with similarity grades lower than a predefined threshold. You then find a clique in the new graph which has the maximum weight (the sum of the nodes' assigned weights).

Either way, you finish up by generating the similarity grade: the size / total weight of the clique divided by a function of the attributes of the original graphs (maximum/minimum/average of the sizes / weights of A and B ).

A nice feature is that the you can deduce the "source" of the similarity from the clique you found - the "stronger" pairings.

Further clarification: The constraints are application dependent. We used the approach to compare pairs of function control-flow graphs. Generally, the approach finds a matching of some nodes in the first graph to some nodes in the second graph (subgraph to subgraph). Each node in the association graph symbolizes a possible matching of a single node from the first graph to a single node in the second graph. Since eventually a clique (a subset of the nodes) is selected, an edge means two matchings don't contradict each other. To apply for a different application, you should ask what are the criteria for possible pairings (or what nodes do I create), and how does selecting one pairing influence the selection of another pairing (or how do I connect the nodes with edges).

answered Sep 17 '22 16:09

YaronK

Related questions
                            
                                Java Factory Pattern With Generics
                            
                                Putty likes to print itself sometimes in my command line
                            
                                Python "setup.py develop": is it possible to create ".egg-info" folder not in source code folder?
                            
                                Setting an httponly cookie with javax.servlet 2.5
                            
                                Sending http post request in Ruby by Net::HTTP
                            
                                Event handlers, closures and garbage collection in Javascript
                            
                                Session management with Firebase?
                            
                                How can I stop cURL from using 100 Continue?
                            
                                Writing to HDFS from Java, getting "could only be replicated to 0 nodes instead of minReplication"
                            
                                Does the finally block execute if the thread running the function is interrupted?
                            
                                Multithreading in JavaFX hangs the UI
                            
                                I want to change actionbar icon size

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Python implementation of a graph-similarity-grading algorithm

Tags:

YaronK

People also ask

2 Answers

ESultanik

YaronK

Recent Activity

Donate For Us