I want to pass a datetime array to a Numba function (which cannot be vectorised and would otherwise be very slow). I understand Numba supports numpy.datetime64. However, it seems to support datetime64[D] (day precision) but not datetime64[ns] (nanosecond precision). I learnt this the hard way: is it documented?
I tried to convert from datetime64[ns] to datetime64[D], but can't seem to find a way! Any ideas?
I have summarised my problem with the minimal code below. If you run testf(mydates), which is datetime64[D], it works fine. If you run testf(dates_input), which is datetime64[ns], it doesn't. Note that this example simply passes the dates to the Numba function, which doesn't (yet) do anything with them. I try to convert dates_input to datetime64[D], but the conversion doesn't work. In my original code I read from a SQL table into a pandas dataframe, and need a column which changes the day of each date to the 15th.
import numba
import numpy as np
import pandas as pd
import datetime

mydates = np.array(['2010-01-01', '2011-01-02']).astype('datetime64[D]')
df = pd.DataFrame()
df["rawdate"] = mydates
df["month_15"] = df["rawdate"].apply(lambda r: datetime.date(r.year, r.month, 15))
dates_input = df["month_15"].astype('datetime64[D]')
print(dates_input.dtype)  # Why datetime64[ns] and not datetime64[D] ??

@numba.jit(nopython=True)
def testf(dates):
    return 1

print(testf(mydates))
The error I get if I run testf(dates_input) is:
numba.typeinfer.TypingError: Failed at nopython (nopython frontend) Var 'dates' unified to object: dates := {pyobject}
Series.astype converts all date-like objects to datetime64[ns]. To convert to datetime64[D], use values to obtain a NumPy array before calling astype:
dates_input = df["month_15"].values.astype('datetime64[D]')
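As a rough, self-contained sketch of that conversion (the sample dates are taken from the question; the column is built with pd.to_datetime here purely for illustration), the cast below is performed by NumPy on the array returned by values, so pandas never gets a chance to change the unit:

```python
import datetime

import numpy as np
import pandas as pd

# Illustrative data mirroring the question's setup.
df = pd.DataFrame({"rawdate": pd.to_datetime(["2010-01-01", "2011-01-02"])})
df["month_15"] = df["rawdate"].apply(lambda r: datetime.date(r.year, r.month, 15))

# .values returns a plain NumPy array, so astype here is NumPy's cast, not pandas'.
dates_input = df["month_15"].values.astype("datetime64[D]")
print(dates_input.dtype)
```

The result is an ordinary NumPy array, so it should be kept outside the DataFrame if the day unit needs to be preserved.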
Note that NDFrames (such as Series and DataFrames) can only hold datetime-like objects as objects of dtype datetime64[ns]. The automatic conversion of all datetime-likes to a common dtype simplifies subsequent date computations. But it makes it impossible to store, say, datetime64[s] objects in a DataFrame column. Pandas core developer Jeff Reback explains:
"We don't allow direct conversions because its simply too complicated to keep anything other than datetime64[ns] internally (nor necessary at all)."
Also note that even though df['month_15'].astype('datetime64[D]') has dtype datetime64[ns]:
In [29]: df['month_15'].astype('datetime64[D]').dtype
Out[29]: dtype('<M8[ns]')
when you iterate through the items in the Series, you get pandas Timestamps, not datetime64[ns]s.
In [28]: df['month_15'].astype('datetime64[D]').tolist()
Out[28]: [Timestamp('2010-01-15 00:00:00'), Timestamp('2011-01-15 00:00:00')]
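The difference between the two element types can be checked directly. This is a small sketch with made-up sample dates; it only compares what you get from the Series itself with what you get from its underlying NumPy array:

```python
import numpy as np
import pandas as pd

# Illustrative Series of datetimes (not the asker's real data).
s = pd.Series(pd.to_datetime(["2010-01-15", "2011-01-15"]))

from_series = s.tolist()     # items taken from the pandas Series
from_array = list(s.values)  # items taken from the underlying NumPy array

print(type(from_series[0]))  # a pandas Timestamp
print(type(from_array[0]))   # a numpy.datetime64 scalar
```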
Therefore, it is not clear that Numba actually has a problem with datetime64[ns]; it might just have a problem with Timestamps. Sorry, I can't check this -- I don't have Numba installed.
However, it might be useful for you to try
testf(df['month_15'].astype('datetime64[D]').values)
since df['month_15'].astype('datetime64[D]').values is truly a NumPy array of dtype datetime64[ns]:
In [31]: df['month_15'].astype('datetime64[D]').values.dtype
Out[31]: dtype('<M8[ns]')
If that works, then you don't have to convert everything to datetime64[D]; you just have to pass NumPy arrays -- not Pandas Series -- to testf.
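A minimal sketch of that experiment follows. The function name count_dates and the sample dates are hypothetical, and the try/except fallback to plain Python is an assumption added only so the snippet still runs where Numba is not installed; with the fallback active it says nothing about Numba's behaviour. The function only reads the array's shape, much like the placeholder in the question:

```python
import numpy as np
import pandas as pd

try:
    import numba
    jit = numba.njit  # nopython-mode compilation when Numba is available
except ImportError:
    # Assumption for this sketch only: run as plain Python if Numba is missing.
    def jit(func):
        return func


@jit
def count_dates(dates):
    # Hypothetical placeholder: only inspects the shape of the datetime array.
    return dates.shape[0]


s = pd.Series(pd.to_datetime(["2010-01-15", "2011-01-15"]))

# Pass the underlying NumPy array, not the pandas Series.
n = count_dates(s.values)
print(n)
```

If this works with Numba installed but count_dates(s) does not, that would suggest the Series, rather than the datetime unit, is what Numba cannot type.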