Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to speed up python loop

I had a look at several dicussions on several sites and none of them gave me a solution. This piece of code takes more than 5 seconds to run :

for i in xrange(100000000):
  pass

I'm working on an integer optimization problem and I have to use an O(n log n) algorithm edit : an O(n²/4) algorithm, where n stands for all the matrix' items, ie, in the following code, n * m = 10000. So, for a matrix 100 * 100 with 10000 elements, it will result in nearly 25000000 iterations .... Its code can be summed up like this :

m = 100
n = 100
for i in xrange(m):
  for j in xrange(n):
    for i2 in xrange(i + 1, m):
      for j2 in xrange(j + 1, n):
        if myarray[i][j] == myarray[i2][j2] and myarray[i2][j] == myarray[i][j2]:
          return [i, j], [i2, j2]

Should I give up with Python and go back to Java or to C ?

I work with Python 2.7 and Psyco isn't available. PyPy doesn't support Tkinter out of the box and I'm using Tkinter.

So, would they improve the looping speed ? Are there other solutions ?

like image 342
s_xavier Avatar asked Jan 07 '12 15:01

s_xavier


People also ask

Is Python for loop fast?

As you can see, here we are using nearly only built-in features, which are always faster than pure Python. This is why the for-each loop is so much faster than its counterparts because the heavy load is handled by the Python interpreter itself in an optimized way.


2 Answers

Not the prettiest coding style, but desperate times call for desperate coding. Try turning your nested nested loops into one big generator expression:

try:
    i,j,i2,j2 = ((i,j,i2,j2)
        for i in xrange(m)
          for j in xrange(n)
            for i2 in xrange(i + 1, m)
              for j2 in xrange(j + 1, n)
                if myarray[i][j] == myarray[i2][j2] and 
                    myarray[i2][j] == myarray[i][j2]).next()
    return [i,j],[i2,j2]
except StopIteration:
    return None

Updated to use builtins next and product, and Py3 range instead of xrange:

from itertools import product
match = next(((i,j,i2,j2)
    for i, j in product(range(m), range(n))
        for i2, j2 in product(range(i+1, m), range(j+1, n))
            if myarray[i][j] == myarray[i2][j2] and 
                myarray[i2][j] == myarray[i][j2]), None)
if match is not None:
    i,j,i2,j2 = match
    return [i,j],[i2,j2]
else:
    return None
like image 87
PaulMcG Avatar answered Oct 08 '22 01:10

PaulMcG


You can still use the python notation, and have the speed of C, using the Cython project. A first step would be to convert the loop in C loop: it's automatically done by typing all the variables used in the loop:

cdef int m = 100
cdef int n = 100
cdef int i, j, i2, j2
for i in xrange(m):
  for j in xrange(n):
    for i2 in xrange(i + 1, m):
      for j2 in xrange(j + 1, n):

For The next part, it would be even better if that myarray would be pure C array, then no rich python comparaison nor array access. With your python array, you can remove the rich comparaison by doing native comparaison (if you have double in your array):

        cdef double a, b, c, d
        a = myarray[i][j]
        b = myarray[i2][j2]
        c = myarray[i2][j]
        d = myarray[i][j2]

        if a == b and c == d:
          return [i, j], [i2, j2]

You can see how optimization are going by running cython -a yourfile.pyx, then open the yourfile.html generate. You'll see how Cython have optimized your code, and removed Python overhead.

like image 38
tito Avatar answered Oct 08 '22 02:10

tito