Rejecting isomorphisms from collection of graphs

Tags:

I have a collection of 15M (Million) DAGs (directed acyclic graphs - directed hypercubes actually) that I would like to remove isomorphisms from. What is the common algorithm for this? Each graph is fairly small, a hybercube of dimension N where N is 3 to 6 (for now) resulting in graphs of 64 nodes each for N=6 case.

Using networkx and python, I implemented it like this which works for small sets like 300k (Thousand) just fine (runs in a few days time).

def isIsomorphicDuplicate(hcL, hc):
    """checks if hc is an isomorphism of any of the hc's in hcL
    Returns True if hcL contains an isomorphism of hc
    Returns False if it is not found"""
    #for each cube in hcL, check if hc could be isomorphic
    #if it could be isomorphic, then check if it is
    #if it is isomorphic, then return True
    #if all comparisons have been made already, then it is not an isomorphism and return False

    for saved_hc in hcL:
        if nx.faster_could_be_isomorphic(saved_hc, hc):
            if nx.fast_could_be_isomorphic(saved_hc, hc):
                if nx.is_isomorphic(saved_hc, hc):
                    return True
    return False

One better way to do it would be to convert each graph to its canonical ordering, sort the collection, then remove the duplicates. This bypasses checking each of the 15M graphs in a binary is_isomophic() test, I believe the above implementation is something like O(N!N) (not taking isomorphic time into account) whereas a clean convert all to canonical ordering and sort should take O(N) for the conversion + O(log(N)N) for the search + O(N) for the removal of duplicates. O(N!N) >> O(log(N)N)

I found this paper on Canonical graph labeling, but it is very tersely described with mathematical equations, no pseudocode: "McKay's Canonical Graph Labeling Algorithm" - http://www.math.unl.edu/~aradcliffe1/Papers/Canonical.pdf

tldr: I have an impossibly large number of graphs to check via binary isomorphism checking. I believe the common way this is done is via canonical ordering. Do any packaged algorithms or published straightforward to implement algorithms (i.e. have pseudocode) exist?

318

asked Nov 12 '14 17:11

SwimBikeRun

1 Answers

Here is a breakdown of McKay ’ s Canonical Graph Labeling Algorithm, as presented in the paper by Hartke and Radcliffe [link to paper].

I should start by pointing out that an open source implementation is available here: nauty and Traces source code.

Ok, let's do this! Unfortunately this algorithm is heavy in graph theory, so we need some terms. First I will start by defining isomorphic and automorphic.

Isomorphism:

Two graphs are isomorphic if they are the same, except that the vertices are labelled differently. The following two graphs are isomorphic.

Isomorphic graphs

Automorphic:

Two graphs are automorphic if they are completely the same, including the vertex labeling. The following two graphs are automorphic. This seems trivial, but turns out to be important for technical reasons.

Automorphic graphs

Graph Hashing:

The core idea of this whole thing is to have a way to hash a graph into a string, then for a given graph you compute the hash strings for all graphs which are isomorphic to it. The isomorphic hash string which is alphabetically (technically lexicographically) largest is called the "Canonical Hash", and the graph which produced it is called the "Canonical Isomorph", or "Canonical Labelling".

With this, to check if any two graphs are isomorphic you just need to check if their canonical isomporphs (or canonical labellings) are equal (ie are automorphs of each other). Wow jargon! Unfortuntately this is even more confusing without the jargon :-(

The hash function we are going to use is called i(G) for a graph G: build a binary string by looking at every pair of vertices in G (in order of vertex label) and put a "1" if there is an edge between those two vertices, a "0" if not. This way the j-th bit in i(G) represents the presense of absence of that edge in the graph.

McKay ’ s Canonical Graph Labeling Algorithm

The problem is that for a graph on n vertices, there are O( n! ) possible isomorphic hash strings based on how you label the vertices, and many many more if we have to compute the same string multiple times (ie automorphs). In general we have to compute every isomorph hash string in order to find the biggest one, there's no magic sort-cut. McKay's algorithm is a search algorithm to find this canonical isomoprh faster by pruning all the automorphs out of the search tree, forcing the vertices in the canonical isomoprh to be labelled in increasing degree order, and a few other tricks that reduce the number of isomorphs we have to hash.

(1) Sect 4: the first step of McKay's is to sort vertices according to degree, which prunes out the majority of isomoprhs to search, but is not guaranteed to be a unique ordering since there may be more than one vertex of a given degree. For example, the following graph has 6 vertices; verts {1,2,3} have degree 1, verts {4,5} have degree 2 and vert {6} has degree 3. It's partial ordering according to vertex degree is {1,2,3|4,5|6}.

vertex degree example

(2) Sect 5: Impose artificial symmetry on the vertices which were not distinguished by vertex degree; basically we take one of the groups of vertices with the same degree, and in turn pick one at a time to come first in the total ordering (fig. 2 in the paper), so in our example above, the node {1,2,3|4,5|6} would have children { {1|2,3|4,5|6}, {2|1,3|4,5|6}}, {3|1,2|4,5|6}} } by expanding the group {1,2,3} and also children { {1,2,3|4|5|6}, {1,2,3|5|4|6} } by expanding the group {4,5}. This splitting can be done all the way down to the leaf nodes which are total orderings like {1|2|3|4|5|6} which describe a full isomorph of G. This allows us to to take the partial ordering by vertex degree from (1), {1,2,3|4,5|6}, and build a tree listing all candidates for the canonical isomorph -- which is already a WAY fewer than n! combinations since, for example, vertex 6 will never come first. Note that McKay evaluates the children in a depth-first way, starting with the smallest group first, this leads to a deeper but narrower tree which is better for online pruning in the next step. Also note that each total ordering leaf node may appear in more than one subtree, there's where the pruning comes in!

(3) Sect. 6: While searching the tree, look for automorphisms and use that to prune the tree. The math here is a bit above me, but I think the idea is that if you discover that two nodes in the tree are automorphisms of each other then you can safely prune one of their subtrees because you know that they will both yield the same leaf nodes.

I have only given a high-level description of McKay's, the paper goes into a lot more depth in the math, and building an implementation will require an understanding of this math. Hopefully I've given you enough context to either go back and re-read the paper, or read the source code of the implementation.

answered Oct 26 '22 04:10

Mike Ounsworth

Related questions
                            
                                random colors for circles in d3.js graph
                            
                                Graph/lattice simplification
                            
                                Finding a shortest path that passes through some arbitrary sequence of nodes?
                            
                                How can I build the post-dominator tree of a function with an endless loop?
                            
                                What is the easiest way to generate a Control Flow-Graph for a method in Python?
                            
                                Find all subtrees of size N in an undirected graph
                            
                                Graph Theory: Splitting a Graph
                            
                                Plotting a cumulative graph of python datetimes
                            
                                Creating a facet_wrap plot with ggplot2 with different annotations in each plot
                            
                                How to create a bidirectional edge between two vertices using gremlin?
                            
                                How to create ternary contour plot in Python?
                            
                                missing value in highcharts line graph results in no line, just points
                            
                                Use neo4j with R
                            
                                How to have actual values in matplotlib Pie Chart displayed
                            
                                What's the fastest force-directed network graph engine for large data sets?
                            
                                dealing with highcharts bar chart with really long category names
                            
                                Number of shortest paths in a graph
                            
                                Colouring edges by weight in networkx
                            
                                Differences between Minimum Spanning Tree and Shortest Path Tree
                            
                                Finding a cycle in an undirected graph vs finding one in a directed graph

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Rejecting isomorphisms from collection of graphs

Tags:

graph

isomorphism

canonicalization

SwimBikeRun

People also ask

1 Answers

Mike Ounsworth

Recent Activity

Donate For Us