Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is there a Python equivalent to dereferencing in Perl?

I'm currently porting a code base, I initially implemented in Perl, to Python. The following short piece of code takes up about 90% of the significant runtime when I run on the whole dataset.

    def equate():
        for i in range(row):
            for j in range(row):
                 if adj_matrix[i][j] != adj_matrix[mapping[i]][mapping[j]]:
                 return False
        return True

Where equate is a closure inside of another method, row is an integer, adj_matrix is a list of lists representing a matrix and mapping is a list representing a vector.

The equivalent Perl code is as follows:

sub equate
{
    for ( 0..$row)
    {
        my ($smrow, $omrow) = ($$adj_matrix[$_], $$adj_matrix[$$mapping[$_]]); #DEREF LINE
        for (0..$row)
        {
            return 0 if $$smrow[$_] != $$omrow[$$mapping[$_]];
        }
    }
    return 1;
 }

This is encapsulated as a sub ref in the outer subroutine, so I don't have to pass variables to the subroutine.

In short, the Perl version is much much faster and my testing indicates that it is due to the dereferencing in "DEREF LINE". I have tried what I believed was the equivalent in Python:

    def equate():
        for i in range(row):
            row1 = adj_matrix[i]
            row2 = adj_matrix[mapping[i]]
            for j in range(row):
                 if row1[j] != row2[mapping[j]]:
                 return False
        return True

But this was an insignificant improvement. Additionally, I tried using a NumPy matrix to represent adj_matrix, but again this was a small improvement probably because adj_matrix is typically a small matrix so the overhead from NumPy is much greater, and I'm not really doing any matrix math operations.

I welcome any suggestion to improve the runtime of the Python equate method and an explanation why my "improved" Python equate method is not much better. While I consider myself a competent Perl programmer, I am a Python novice.


ADDITIONAL DETAILS:

I am using Python 3.4, although similar behavior was observed when I initially implemented it in 2.7. I switched to 3.4 since the lab I work in uses 3.4.

As for the contents of the vectors, allow me to provide some background so the following details make sense. This is part of a algorithm to identify subgraph isomorphisms between two chemical compounds (a and b) represented by the graphs A and B respectively, where each atom is a node and each bond an edge. The above code is for the simplified case where A = B, so I am looking for symmetrical transformations of the compound (planes of symmetry), and the size of A in number of atoms is N. Each atom is assigned a unique index beginning at zero.

Mapping is a 1D vector of dimensions 1xN where each element in a mapping is an integer. mapping[i] = j, represents that atom with index i (will refer to as atom i or generically atom 'index') is currently mapped to atom j. The absence of a mapping is indicated by j = -1.

Adj_matrix is a 2D matrix of dimensions NxN where each element adj_matrix[i][j] = k is a natural number and represents the presence and order of an edge between atoms i and j in compound A. If k = 0, there is no such edge (AKA no bond between i and j) else k > 0 and k represents the order of the bond between atoms i and j.

When A != B, there are two different adj_matrices that are compared in equate and the size of a and b in atoms is Na and Nb. Na does not have to equal Nb, but Na =< Nb. I only mention this as optimizations are possible for the special case that are not valid in the general case, but any advice would be helpful.

like image 414
Joshua Mitchell Avatar asked Dec 27 '14 20:12

Joshua Mitchell


1 Answers

With numpy you could vectorize your whole code as follows, assuming adj_matrix and mapping are numpy arrays:

def equate():
    row1 = adj_matrix[:row]
    row2 = adj_matrix[mapping[:row]]
    return np.all(row1 == row2)

It doesn't break out early of the loop if it finds a mismatch, but unless your arrays are huge, the speed of NumPy is going to dominate.

like image 146
Jaime Avatar answered Oct 31 '22 18:10

Jaime