Comparing a large number of graphs for isomorphism

I am comparing a large set of networkx graphs for isomorphism, where most of the graphs should not be isomorphic (let's say 0-20% of them are isomorphic to something else in the list).

I have tried the following approach.

graphs = []  # A list of networkx graphs
unique = []  # A list of unique graphs (each wrapped in a list)

for new in graphs:
    for old in unique:
        if nx.is_isomorphic(new, old[0]):
            break
    else:
        # Only reached if the inner loop never breaks, i.e. `new` is not
        # isomorphic to any graph collected so far.
        unique.append([new])

This gets me the reduced set much faster, but it is still too slow for ideal use. Is there a faster algorithm for this type of problem (pairwise comparison under a relation that is symmetric and transitive), or a way to extend this algorithm to a multicore setup (I am running on a 20-core machine)?

I am already filtering these sets of data based on the number of nodes/edges, so assume that the nx.is_isomorphic call itself cannot be made any faster by filtering operations of that kind. I also cannot change tools easily right now, so using a compiled package is not an option.

Additional Information:

Graphs tend to have roughly 16-20 nodes and 24-48 edges in total; there is a lot of interconnection, so each node has roughly 8 edges. Each edge is labeled as well, but only 2-3 types of edges are ever used.

asked Oct 29 '17 by Tristan Maxson


3 Answers

Can you use nauty (http://users.cecs.anu.edu.au/~bdm/nauty/, available in Linux distributions)? It has a fast canonical-labeling algorithm that might work for your problem. A canonical labeling makes isomorphic graphs identical (canonization), so two graphs are isomorphic exactly when their canonical forms are equal. For example, using graph6-format output from a set of random graphs gives the following count of isomorphic graphs:

$ cat g6.py
import networkx as nx
for i in range(100000):
    print(nx.generate_graph6(nx.fast_gnp_random_graph(4,0.2),header=False))


$ python g6.py  |nauty-labelg  |sort |uniq -c 
>A labelg
>Z 100000 graphs labelled from stdin to stdout in 0.21 sec.
   4898 C`
    167 C^
     10 C~
  26408 C?
  39392 C@
  19684 CB
   1575 CF
   1608 CJ
   1170 CN
    288 Cr
   4800 CR

Those are the 11 graphs on 4 nodes:

$ cat atlas.py
import networkx as nx
for g in nx.atlas.graph_atlas_g()[8:19]:
    print(nx.generate_graph6(g,header=False))
$ python atlas.py  |nauty-labelg  |sort |uniq -c 
>A labelg
>Z 11 graphs labelled from stdin to stdout in 0.00 sec.
      1 C`
      1 C^
      1 C~
      1 C?
      1 C@
      1 CB
      1 CF
      1 CJ
      1 CN
      1 Cr
      1 CR

It would be pretty easy to parallelize this approach if it turns out to be too slow.
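Here is a rough sketch of that parallelization, assuming nauty-labelg is on the PATH; the helper names (canonical_labels, unique_up_to_isomorphism) and the chunk size are my own choices, not part of nauty or networkx. Note that graph6 does not encode edge labels, so this groups graphs by their underlying unlabeled structure only:

import subprocess
from multiprocessing import Pool

import networkx as nx


def canonical_labels(graph6_lines):
    # Pipe a chunk of graph6 strings through nauty-labelg and return
    # the canonical form of each graph, in the same order.
    out = subprocess.check_output(
        ["nauty-labelg"],
        input="\n".join(graph6_lines) + "\n",
        universal_newlines=True,
        stderr=subprocess.DEVNULL)  # hide the ">A labelg" banner
    return out.split()


def unique_up_to_isomorphism(graphs, processes=20, chunk_size=5000):
    # Keep one representative graph per canonical label (= per isomorphism class).
    g6 = [nx.generate_graph6(g, header=False) for g in graphs]
    chunks = [g6[i:i + chunk_size] for i in range(0, len(g6), chunk_size)]
    with Pool(processes) as pool:
        labels = [label
                  for chunk_labels in pool.map(canonical_labels, chunks)
                  for label in chunk_labels]
    representatives = {}
    for g, label in zip(graphs, labels):
        representatives.setdefault(label, g)
    return list(representatives.values())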

answered by Aric


As others have mentioned, if you want to stay in Python + NetworkX, you can use could_be_isomorphic to filter your graphs.

The problem is that this method expects 2 graphs as input, not millions of them. Comparing every pair of graphs with this method would take an awfully long time.

Looking at the source code of could_be_isomorphic, it compares the degree, triangle, and number-of-cliques sequences of the two graphs. If they're not equal, the graphs cannot be isomorphic.

You could pack this fingerprint into a function, sort your graphs according to it, and group them with itertools.groupby. The vast majority of graphs will have a unique fingerprint, and the few graphs that share one can then be checked for isomorphism.

Using a list of 100 000 random graphs:

many_graphs = [nx.fast_gnp_random_graph(random.randint(16, 22), 0.2) for _ in range(100000)]

There were only 500 fingerprints shared by at least 2 graphs. If you add edge-type information, there will be even fewer common fingerprints.
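As a hedged sketch of how the TODO in the graph_fingerprint function below could be filled in, assuming each edge stores its type under a "label" attribute (the attribute name is an assumption, not something given in the question), the per-type edge counts can simply be appended to the fingerprint:

from collections import Counter

def edge_type_counts(g, attr="label"):
    # Sorted (edge type, count) pairs; "label" is an assumed attribute name.
    counts = Counter(data.get(attr, "") for _, _, data in g.edges(data=True))
    return sorted(counts.items())

# In graph_fingerprint, return (props, edge_type_counts(g)) instead of just props.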

Here's an example with 3000 graphs, each having between 10 and 14 nodes:

import networkx as nx
from itertools import groupby
import random

many_graphs = [nx.fast_gnp_random_graph(
    random.randint(10, 14), 0.3) for _ in range(3000)]


def graph_fingerprint(g):
    # Sorted per-node [degree, triangle count, number of maximal cliques],
    # the same properties that could_be_isomorphic compares.
    d = g.degree()
    t = nx.triangles(g)
    c = nx.number_of_cliques(g)
    props = [[d[v], t[v], c[v]] for v in g]
    props.sort()
    # TODO: Add count of edge types.
    return props


sorted_graphs = sorted(many_graphs, key=graph_fingerprint)

for f, g in groupby(sorted_graphs, key=graph_fingerprint):
    similar_graphs = list(g)
    n = len(similar_graphs)
    if n > 1:
        print("Found %d graphs which could be isomorphic." % n)
        for i in range(n):
            for j in range(i + 1, n):
                g1, g2 = similar_graphs[i], similar_graphs[j]
                if g1 != g2 and nx.is_isomorphic(g1, g2):
                    print(" %s and %s are isomorphic!" %
                          (nx.generate_graph6(g1,header=False), nx.generate_graph6(g2,header=False)))

It finds 4 isomorphic pairs in less than 1s:

Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
 Id?OGG_C? and IaKO_@?@? are isomorphic!
Found 6 graphs which could be isomorphic.
 I?OWcGG?G and I?OCSa?@_ are isomorphic!
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
 I_uI???JG and II??QDNA? are isomorphic!
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
 IDOCCY@GG and IOGC@`dS? are isomorphic!
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 3 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.
Found 2 graphs which could be isomorphic.

Here are the last 2 isomorphic graphs. "IDOCCY@GG":

[image: the graph "IDOCCY@GG"]

and "IOGC@\dS?":

[image: the graph "IOGC@`dS?"]

Here are 2 graphs which have the same fingerprint but aren't isomorphic:

[images: two graphs with the same fingerprint that are not isomorphic]

The fingerprinting could be done in parallel. The sorting and grouping have to happen on a single CPU, but the isomorphism check for each group could be done on a separate CPU.
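A minimal sketch of that split, reusing graph_fingerprint and many_graphs from the snippet above (the pool size of 20 and the helper name isomorphic_pairs are my own choices):

from itertools import groupby
from multiprocessing import Pool

import networkx as nx


def isomorphic_pairs(similar_graphs):
    # Exhaustively compare one group of graphs that share a fingerprint.
    pairs = []
    for i in range(len(similar_graphs)):
        for j in range(i + 1, len(similar_graphs)):
            if nx.is_isomorphic(similar_graphs[i], similar_graphs[j]):
                pairs.append((similar_graphs[i], similar_graphs[j]))
    return pairs


if __name__ == "__main__":
    with Pool(20) as pool:
        # 1. Fingerprint all graphs in parallel.
        fingerprints = pool.map(graph_fingerprint, many_graphs)
        # 2. Sort and group on a single core.
        ordered = sorted(zip(fingerprints, many_graphs), key=lambda fg: fg[0])
        groups = [[g for _, g in group]
                  for _, group in groupby(ordered, key=lambda fg: fg[0])]
        candidates = [group for group in groups if len(group) > 1]
        # 3. Run the expensive isomorphism checks per group in parallel.
        for pairs in pool.map(isomorphic_pairs, candidates):
            for g1, g2 in pairs:
                print("%s and %s are isomorphic!" % (
                    nx.generate_graph6(g1, header=False),
                    nx.generate_graph6(g2, header=False)))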

answered by Eric Duminil


You can try your code on PyPy, which provides just-in-time compilation for pure Python code. As for the possible performance boost, they say it...

...depends greatly on the type of task being performed. The geometric average of all benchmarks is 0.13 or 7.5 times faster than CPython

If your workload is CPU-bound (which it seems to be) and the Python process is long-running (so JIT compilation can kick in), the performance boost can be significant. NetworkX is pure Python (it has optional dependencies like numpy, but they are only needed for extra functionality), and so in particular is the isomorphism module. I tried PyPy 5.7.1 and networkx/algorithms/isomorphism/tests/test_isomorphism.py passes. The test suite in general has a few failures:

Ran 2952 tests in 51.311s

FAILED (failures=3, skipped=54)
Test failed: <unittest.runner.TextTestResult run=2952 errors=0 failures=3>

On Python 2.7.12 it's:

Ran 2952 tests in 88.915s

OK (skipped=54)
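For reference, a typical way to try this on your own workload is to create a PyPy virtualenv and install networkx into it, which works because networkx is pure Python (the environment and script names below are placeholders):

$ virtualenv -p pypy pypy-env
$ pypy-env/bin/pip install networkx
$ pypy-env/bin/python compare_graphs.py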
answered by saaj