
Is there a bad interaction, in numpy 1.7.1, between datetime64 and vectorize?

Tags:

python

numpy

I want to convert a pandas DatetimeIndex to Excel dates (the number of days since 12/30/1899). I tried to use numpy.vectorize on a function that takes a datetime64 and returns an Excel date, and I was surprised by how numpy.vectorize behaves. On the first call, a test call made to determine the return type, vectorize passes in the datetime64 as provided. On subsequent calls, it passes in the internal storage type of the datetime64, in my case a long. Internally, _get_ufunc_and_otypes calls:

inputs = [asarray(_a).flat[0] for _a in args]
outputs = func(*inputs)

While _vectorize_call does the following:

inputs = [array(_a, copy=False, subok=True, dtype=object) 
                  for _a in args]            

outputs = ufunc(*inputs)
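The difference between these two code paths can be reproduced without vectorize. The following is a minimal sketch, assuming a recent NumPy under Python 3; the variable names are illustrative, and `astype(object)` stands in for the `dtype=object` conversion quoted above. It suggests why the function sees a datetime64 once and an integer afterwards.

```python
import datetime
import numpy as np

# Illustrative data: two timestamps at nanosecond resolution.
stamps = np.array(['2013-05-13', '2013-05-14'], dtype='datetime64[ns]')

# Trial-call path: taking the first element keeps a datetime64 scalar.
first = np.asarray(stamps).flat[0]
print(type(first))

# Real-call path: converting to an object array. At nanosecond resolution
# the elements become plain integers (nanoseconds since the Unix epoch),
# because datetime.datetime cannot represent nanoseconds.
as_objects = stamps.astype(object)
print(type(as_objects[0]))

# At day resolution the same conversion yields datetime.date objects instead.
as_dates = stamps.astype('datetime64[D]').astype(object)
print(type(as_dates[0]))
```

So the type change appears to come from the object conversion of nanosecond data rather than from the wrapped function itself.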

As it turns out, I could just as easily use numpy's built-in array arithmetic to do it: (x - day0) / 1 day. But this behavior seems strange (the argument type changes when a function is vectorized).
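For reference, the array arithmetic mentioned above can be sketched as follows. This is only an illustration, assuming a recent NumPy; the names `DAY_ZERO`, `ONE_DAY` and `excel_days` are made up for the example.

```python
import numpy as np

DAY_ZERO = np.datetime64('1899-12-30')  # Excel's day 0
ONE_DAY = np.timedelta64(1, 'D')

# Build a datetime64[ns] array from Excel day numbers (sample data).
excel_days = np.array([41407, 41408, 41409, 41410, 41411, 41414])
datetimes = (DAY_ZERO + excel_days.astype('timedelta64[D]')).astype('datetime64[ns]')

# Subtracting gives a timedelta64 array; dividing by one day gives floats.
excel_dates = (datetimes - DAY_ZERO) / ONE_DAY
print(excel_dates)
```

No vectorize wrapper is involved here, so the elements never pass through an object array.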

Here's my sample code:

import numpy

DATETIME64_ONE_DAY   = numpy.timedelta64(1,'D')
DATETIME64_DATE_ZERO = numpy.datetime64('1899-12-30T00:00:00.000000000')

def excelDateToDatetime64(x):
   return DATETIME64_DATE_ZERO + numpy.timedelta64(int(x),'D')

def datetime64ToExcelDate(x):
   print(type(x))
   return (x - DATETIME64_DATE_ZERO) / DATETIME64_ONE_DAY

excelDateToDatetime64_Array = numpy.vectorize(excelDateToDatetime64)
datetime64ToExcelDate_Array = numpy.vectorize(datetime64ToExcelDate)

excelDates = numpy.array([ 41407.0, 41408.0, 41409.0, 41410.0, 41411.0, 41414.0 ])
datetimes  = excelDateToDatetime64_Array(excelDates)
excelDates2 = datetime64ToExcelDate(datetimes)


print(excelDates2)  # Works fine

# TypeError: ufunc subtract cannot use operands with types dtype('int64') and dtype('<M8[ns]')
# You can see from the print that the type coming in is inconsistent
excelDates2 = datetime64ToExcelDate_Array(datetimes) 
DaveBlob asked Dec 20 '25 06:12


1 Answer

Datetimes and timedeltas need to be handled using their underlying data. These are np.int64 values, which you get with arr.view('i8').

Define your constants in terms of their underlying values

In [94]: DATETIME_DATE_ZERO_VIEW = DATETIME64_DATE_ZERO.view('i8')

In [95]: DATETIME_DATE_ZERO_VIEW
Out[95]: -2209161600000000000

In [96]: DATETIME64_ONE_DAY_VALUE = DATETIME64_ONE_DAY.astype('m8[ns]').item()

In [97]: DATETIME64_ONE_DAY_VALUE
Out[97]: 86400000000000L

In [106]: def vect(x):
   .....:     return (x-DATETIME_DATE_ZERO_VIEW)/DATETIME64_ONE_DAY_VALUE
   .....: 

In [107]: f = np.vectorize(vect)

Pass in a view of the underlying np.int64

In [109]: f(datetimes.view('i8'))
Out[109]: array([41407, 41408, 41409, 41410, 41411, 41414])
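The session above is from Python 2, where `/` on integers truncates. A self-contained sketch of the same idea for Python 3 might look like this; it assumes a recent NumPy, uses `//` to keep integer results, and the names `DAY_ZERO_NS`, `ONE_DAY_NS` and `ns_to_excel_day` are invented for the example.

```python
import numpy as np

# Constants expressed as their underlying int64 nanosecond values.
DAY_ZERO_NS = np.datetime64('1899-12-30', 'ns').astype('int64')
ONE_DAY_NS = np.timedelta64(1, 'D').astype('timedelta64[ns]').astype('int64')

def ns_to_excel_day(ns):
    # ns is an integer count of nanoseconds since the Unix epoch.
    return (ns - DAY_ZERO_NS) // ONE_DAY_NS

to_excel_days = np.vectorize(ns_to_excel_day)

# Sample data; view('i8') reinterprets the same memory as int64.
datetimes = np.array(['2013-05-13', '2013-05-14', '2013-05-20'],
                     dtype='datetime64[ns]')
excel_days = to_excel_days(datetimes.view('i8'))
print(excel_days)
```

Because the function only ever receives integers, it behaves the same on the trial call and on the real call.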

Pandas way

In [98]: Series(datetimes).apply(lambda x: (x.value-DATETIME_DATE_ZERO_VIEW)/DATETIME64_ONE_DAY_VALUE)
Out[98]: 
0    41407
1    41408
2    41409
3    41410
4    41411
5    41414
dtype: int64
Jeff answered Dec 23 '25 00:12


