Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

I've mangled Cython badly, it's performing worse than pure Python. Why?

I'm rather new to Python and absolutely ignorant of C (unfortunately) so I am struggling to properly understand some aspects of working with Cython.

After profiling a Python program and discovering that it was just a couple of loops that were hogging most of the time, I decided to look into dumping them into Cython. Initially, I just let Cython interpret the Python as it was, and the result was a (remarkable!) ~2x speed boost. Cool!

From the Python main, I pass the function two 2-D arrays ("a" and "b") and a float, "d", and it returns a list, "newlist". As examples:

a =[[12.7, 13.5, 1.0],[23.4, 43.1, 1.0],...]
b =[[0.46,0.95,0],[4.56,0.92,0],...]
d = 0.1

Here is the original code, with just the cdefs added for Cython:

def loop(a, b, d):

    cdef int i, j
    cdef double x, y

    newlist = []

    for i in range(len(a)):
        if b[i][2] != 1:
            for j in range(i+1,len(a)):
                if a[i] == a[j] and b[j][2] != 1:
                    x = b[i][0]+b[j][0]
                    y = b[i][1]+b[j][1]
                    b[i][2],b[j][2] = 1,1

                    if abs(y)/abs(x) > d:
                        if y > 0: newlist.append([a[i][0],a[i][1],y])

    return newlist

In "pure Python", this ran (with several ten-thousand loops) in ~12.5s. In Cython it ran in ~6.3s. Great progress with near-zero work done!

However, with a little reading, it was clear that much, much more could be done, so I set about trying to apply some type changes to get things going even faster, following the Cython docs, here (also referenced in the comments).

Here are the collected modifications, meant to mimic the Cython docs:

import numpy as np
cimport numpy as np

DtypeA = np.float
DtypeB = np.int

ctypedef np.float_t DtypeA_t
ctypedef np.int_t DtypeB_t

def loop(np.ndarray[DtypeA_t, ndim=2] A,
         np.ndarray[DtypeA_t, ndim=2] B,
         np.ndarray[DtypeB_t, ndim=1] C,
         float D):

    cdef Py_ssize_t i, j
    cdef float x, y

    cdef np.ndarray[DtypeA_t, ndim=2] NEW_ARRAY = np.zeros((len(C),3), dtype=DtypeA)

    for i in range(len(C)):
        if C[i] != 1:
            for j in range(i+1,len(C)):
                if A[i][0]==A[j][0] and A[i][1]==A[j][1] and C[j]!= 1:
                    x = B[i][0]+B[j][0]
                    y = B[i][1]+B[j][1]
                    C[i],C[j] = 1,1

                    if abs(y)/abs(x) > D:
                        if y > 0: NEW_ARRAY[i]=([A[i][0],A[i][1],y])

    return NEW_ARRAY

Among other things, I split the previous array "b" into two different input arrays "B" and "C", because each row of "b" contained 2 float elements and an integer that just acted as a flag. So I removed the flag integers and put them in a separate 1-D array, "C". So, the inputs now looked liked this:

A =[[12.7, 13.5, 1.0],[23.4, 43.1, 1.0],...]
B =[[0.46,0.95],[4.56,0.92],...]
C =[0,0,...]
D = 0.1

Ideally, this should go much faster with all the variables now being typed(?)...but obviously I'm doing something very wrong, because the function now comes in at a time of 35.3s...way WORSE than the "pure Python"!!

What am I botching so badly? Thanks for reading!

like image 636
Nordlendingen Avatar asked Dec 11 '22 23:12

Nordlendingen


1 Answers

I believe the use of the indexing notation b[j][0] may be throwing Cython off, making it impossible for it to use fast indexing operations behind the scenes. Incidentally, even in pure Python code this style is not idiomatic and may lead to slower code.

Try instead to use the notation b[j,0] throughout and see if it improves your performance.

like image 110
cfh Avatar answered Apr 06 '23 12:04

cfh