I am trying to speed up comparison between datetimes using Cython, when passed a numpy array of datetimes (or details sufficient to create datetimes). To start, I tried to see how Cython would speed up comparison between integers.
testArrayInt = np.load("testArray.npy")
Python Method:
    def processInt(array):
        compareSuccess = 0  # number of elements that testValue is greater than
        testValue = 1  # value to compare against
        for counter in range(array.shape[0]):
            if testValue > array[counter]:
                compareSuccess += 1
        print(compareSuccess)
Cython Method:
    cimport numpy as np

    def processInt(np.ndarray[np.int_t, ndim=1] array):
        cdef int rows = array.shape[0]
        cdef int counter = 0
        cdef int compareSuccess = 0
        cdef int testValue = 1  # value to compare against
        for counter in range(rows):
            if testValue > array[counter]:
                compareSuccess = compareSuccess + 1
        print(compareSuccess)
Timing with a numpy array of 1,000,000 rows:
Python: 0.204969 seconds
Cython: 0.000826 seconds
Speedup: 250 times approx.
Repeating the same exercise with datetimes: since Cython wouldn't accept an array of datetimes, I split each date into year, month and day and passed the resulting array to both methods.
testArrayDateTime = np.load("testArrayDateTime.npy")
Python Code:
    from datetime import datetime

    def processDateTime(array):
        compareSuccess = 0
        d = datetime(2009, 1, 1)  # test datetime used to compare
        rows = array.shape[0]
        for counter in range(rows):
            dTest = datetime(array[counter][0], array[counter][1], array[counter][2])
            if d > dTest:
                compareSuccess += 1
        print(compareSuccess)
Cython Code:
    from cpython.datetime cimport date
    cimport numpy as np

    def processDateTime(np.ndarray[np.int_t, ndim=2] array):
        cdef int compareSuccess = 0
        cdef int rows = array.shape[0]
        cdef int counter = 0
        d = date(2009, 1, 1)  # test date used to compare
        for counter in range(rows):
            dTest = date(array[counter, 0], array[counter, 1], array[counter, 2])
            if d > dTest:
                compareSuccess = compareSuccess + 1
        print(compareSuccess)
Performance:
Python: 0.865261 seconds
Cython: 0.162297 seconds
Speedup: 5 times approx.
Why is the speedup so low? And what is a possible way to increase this?
You are creating a date object for every row. This takes more time, both because memory has to be allocated and deallocated for each object and because the constructor runs various checks on the arguments to ensure that they form a valid date.
For a faster comparison, either compare the np.datetime64 array using integer comparison or compare the year, month and day columns separately as integers.
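Both suggestions can be sketched in plain NumPy, without creating any date objects. This is only an illustration under assumptions: the function names count_before_ymd and count_before_datetime64 and the sample data are made up for this example, and the datetime64 version assumes day resolution. The same integer logic could be moved into a typed Cython loop.

```python
import numpy as np

def count_before_ymd(ymd, year, month, day):
    """Count rows of an (n, 3) integer array [year, month, day]
    that fall strictly before the given date, using only integer comparisons."""
    y, m, d = ymd[:, 0], ymd[:, 1], ymd[:, 2]
    # Lexicographic comparison: year first, then month, then day.
    earlier = (y < year) | ((y == year) & ((m < month) | ((m == month) & (d < day))))
    return int(np.count_nonzero(earlier))

def count_before_datetime64(dates, cutoff):
    """Same count using the int64 view of a datetime64[D] array
    (days since 1970-01-01), so the comparison is a plain integer one."""
    days = dates.astype("datetime64[D]").view(np.int64)
    cutoff_days = np.datetime64(cutoff, "D").astype(np.int64)
    return int(np.count_nonzero(days < cutoff_days))

# Made-up sample data, not the arrays from the question.
ymd = np.array([[2008, 12, 31], [2009, 1, 1], [2009, 1, 2], [2007, 6, 15]])
dates = np.array(["2008-12-31", "2009-01-01", "2009-01-02", "2007-06-15"],
                 dtype="datetime64[D]")

n_ymd = count_before_ymd(ymd, 2009, 1, 1)
n_dt = count_before_datetime64(dates, "2009-01-01")
print(n_ymd, n_dt)
```

Both functions count the dates strictly before 2009-01-01, which matches the d > dTest test in the question.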