
Speeding up datetime comparison with Cython

I am trying to use Cython to speed up datetime comparisons over a NumPy array of datetimes (or of the fields needed to construct them). As a baseline, I first measured how much Cython speeds up comparisons between integers.

import numpy as np

testArrayInt = np.load("testArray.npy")

Python Method:

def processInt(array):
    compareSuccess = 0  # number of elements smaller than testValue
    testValue = 1  # value to compare against
    for counter in range(array.shape[0]):
        if testValue > array[counter]:
            compareSuccess += 1
    print(compareSuccess)

Cython Method:

cimport numpy as np

def processInt(np.ndarray[np.int_t, ndim=1] array):
    cdef int rows = array.shape[0]
    cdef int counter = 0
    cdef int compareSuccess = 0
    cdef int testValue = 1  # value to compare against
    for counter in range(rows):
        if testValue > array[counter]:
            compareSuccess = compareSuccess + 1
    print(compareSuccess)

Timings for a NumPy array of 1,000,000 rows:

Python: 0.204969 seconds
Cython: 0.000826 seconds
Speedup: 250 times approx.

I then repeated the same exercise with datetimes. Since Cython wouldn't accept an array of datetime objects, I split each date into year, month and day and passed an array of those columns to both methods.

testArrayDateTime = np.load("testArrayDateTime.npy")

Python Code:

from datetime import datetime

def processDateTime(array):
    compareSuccess = 0
    d = datetime(2009, 1, 1)  # reference datetime to compare against
    rows = array.shape[0]
    for counter in range(rows):
        dTest = datetime(array[counter][0], array[counter][1], array[counter][2])
        if d > dTest:
            compareSuccess += 1
    print(compareSuccess)

Cython Code:

from cpython.datetime cimport date
cimport numpy as np

def processDateTime(np.ndarray[np.int_t, ndim=2] array):
    cdef int compareSuccess = 0
    cdef int rows = array.shape[0]
    cdef int counter = 0
    d = date(2009, 1, 1)  # reference date to compare against
    for counter in range(rows):
        dTest = date(array[counter, 0], array[counter, 1], array[counter, 2])
        if d > dTest:  # same test as the Python version
            compareSuccess = compareSuccess + 1
    print(compareSuccess)

Performance:

Python: 0.865261 seconds
Cython: 0.162297 seconds
Speedup: 5 times approx. 

Why is the speedup so low, and how can I increase it?

statBeginner asked Oct 16 '15 06:10



1 Answer

You are creating a date object for every row. This takes more time, both because memory has to be allocated and deallocated for each object and because the constructor runs various checks on its arguments to ensure that they form a valid date.
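
A minimal sketch of the argument checking mentioned above. The helper name is_valid_date is hypothetical and not part of the question's code; it only shows that the constructor validates its arguments on every call, which is work the plain integer loop never does.

```python
from datetime import date

def is_valid_date(year, month, day):
    """Return True if date(year, month, day) can be constructed.

    Hypothetical helper for illustration: the date constructor raises
    ValueError when the fields do not form a real calendar date.
    """
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True
```

For example, is_valid_date(2009, 2, 30) is False because February has no 30th day, whereas an integer comparison would accept those values without any check.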

For a faster comparison, either compare the np.datetime64 array using integer comparison or compare the year, month and day columns separately as integers.
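
Both options can be sketched in plain NumPy as follows. This is an illustration under assumptions, not a drop-in replacement for the Cython function: the function names are hypothetical, the input is assumed to be either an (n, 3) integer array of [year, month, day] rows or an array convertible to datetime64[D], and the cutoff of 2009-01-01 is taken from the question.

```python
import numpy as np

def count_before_columns(ymd, cutoff=(2009, 1, 1)):
    """Count rows of an (n, 3) [year, month, day] array strictly before cutoff,
    using only integer comparisons on the three columns."""
    ymd = np.asarray(ymd)
    y, m, d = ymd[:, 0], ymd[:, 1], ymd[:, 2]
    cy, cm, cd = cutoff
    # Lexicographic comparison: earlier year, or same year and earlier month,
    # or same year and month and earlier day.
    earlier = (y < cy) | ((y == cy) & ((m < cm) | ((m == cm) & (d < cd))))
    return int(np.count_nonzero(earlier))

def count_before_datetime64(dates, cutoff="2009-01-01"):
    """Count entries of a datetime64 array strictly before cutoff.

    datetime64 values are stored as integers internally, so this comparison
    does not create a Python date object per element.
    """
    dates = np.asarray(dates, dtype="datetime64[D]")
    return int(np.count_nonzero(dates < np.datetime64(cutoff, "D")))
```

Neither function builds a date object per row, so neither pays the allocation or validation cost. The column version also does not check that the fields form a real date, so any validation has to happen elsewhere if it matters.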

user7813790 answered Sep 28 '22 05:09
