Pandas: Difference of two datetime64 objects yields NaT rather than correct timedelta value

This question "gets asked a lot" - but after looking carefully at the other answers I haven't found a solution that works in my case. It's a shame this is still such a sticking point.

I have a pandas dataframe with a column datetime and I simply want to calculate the time range covered by the data, in seconds (say).

from datetime import datetime
import pandas as pd

# You can create fake datetime entries any way you like, e.g.
df = pd.DataFrame({'datetime': pd.date_range('10/1/2001 10:00:00',
    periods=3, freq='10H'), 'B': [4, 5, 6]})

# (a) This yields NaT:
timespan_a=df['datetime'][-1:]-df['datetime'][:1]
print(timespan_a)
# 0   NaT
# 2   NaT
# Name: datetime, dtype: timedelta64[ns]

# (b) This does work - but why?
timespan_b=df['datetime'][-1:].values.astype("timedelta64")-\
    df['datetime'][:1].values.astype("timedelta64")
print(timespan_b)
# [72000000000000]

  1. Why doesn't (a) work?

  2. Why is (b) required instead? (It also gives a one-element numpy array rather than a timedelta object.)

My pandas is at version 0.20.3, which rules out an earlier known bug.

Is this a dynamic-range issue?

jtlz2 asked Oct 16 '25 03:10


1 Answer

The problem is the differing indexes: the two one-item Series carry the labels 2 and 0, so they cannot be aligned and the subtraction returns NaT for both labels.
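The alignment behaviour can be reproduced in a small sketch. It writes out the question's three timestamps explicitly (an assumption made here so the example does not depend on a frequency alias) and uses .iloc to take the same positional slices as [-1:] and [:1] in the question.

```python
import pandas as pd

# Same three timestamps as in the question, written out explicitly.
df = pd.DataFrame({
    'datetime': pd.to_datetime(['2001-10-01 10:00:00',
                                '2001-10-01 20:00:00',
                                '2001-10-02 06:00:00']),
    'B': [4, 5, 6],
})

last = df['datetime'].iloc[-1:]   # one-item Series, index label 2
first = df['datetime'].iloc[:1]   # one-item Series, index label 0

# Arithmetic between two Series matches rows by index label, not by position.
# Labels 0 and 2 each exist on one side only, so both results are missing.
diff = last - first

print(list(diff.index))   # [0, 2]
print(diff.isna().all())  # True
```

The result has one row per label found on either side, which is why the question's output shows two NaT rows rather than one.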

One solution is to convert either operand to a NumPy array with .values, which drops its index:

timespan_a = df['datetime'][-1:]-df['datetime'][:1].values
print (timespan_a)
2   20:00:00
Name: datetime, dtype: timedelta64[ns]

Or give both Series the same index:

a = df['datetime'][-1:]
b = df['datetime'][:1]
print (a)
2   2001-10-02 06:00:00
Name: datetime, dtype: datetime64[ns]

a.index = b.index
print (a)
0   2001-10-02 06:00:00
Name: datetime, dtype: datetime64[ns]
print (b)
0   2001-10-01 10:00:00
Name: datetime, dtype: datetime64[ns]

timespan_a = a - b
print (timespan_a)
0   20:00:00
Name: datetime, dtype: timedelta64[ns]
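A variation on the same idea, sketched here as an assumption rather than as part of the original answer, is to reset both indexes with reset_index(drop=True) instead of assigning one index to the other. Both one-item Series then carry the label 0 and align as intended.

```python
import pandas as pd

# Same three timestamps as in the question, written out explicitly.
df = pd.DataFrame({
    'datetime': pd.to_datetime(['2001-10-01 10:00:00',
                                '2001-10-01 20:00:00',
                                '2001-10-02 06:00:00']),
    'B': [4, 5, 6],
})

# drop=True discards the old labels, so each Series is relabelled from 0.
a = df['datetime'].iloc[-1:].reset_index(drop=True)
b = df['datetime'].iloc[:1].reset_index(drop=True)

# Both sides now share the single label 0, so the subtraction aligns.
timespan = a - b
print(timespan)
```

reset_index returns a new Series, so neither slice is modified in place.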

If you want to work with scalars instead:

a = df.loc[df.index[-1], 'datetime']
b = df.loc[0, 'datetime']
print (a)
2001-10-02 06:00:00

print (b)
2001-10-01 10:00:00

timespan_a = a - b
print (timespan_a)
0 days 20:00:00
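Since the question asks for the span in seconds, the scalar result can be converted with Timedelta.total_seconds(). The sketch below uses .iloc for positional access and, as an alternative that does not assume the rows are sorted by time, max() minus min(); the variable names are illustrative.

```python
import pandas as pd

# Same three timestamps as in the question, written out explicitly.
df = pd.DataFrame({
    'datetime': pd.to_datetime(['2001-10-01 10:00:00',
                                '2001-10-01 20:00:00',
                                '2001-10-02 06:00:00']),
    'B': [4, 5, 6],
})

# .iloc picks by position, so the index labels play no part here.
span = df['datetime'].iloc[-1] - df['datetime'].iloc[0]

# If the rows might not be sorted by time, max() - min() gives the range.
span_unsorted_safe = df['datetime'].max() - df['datetime'].min()

seconds = span.total_seconds()
print(span)     # 0 days 20:00:00
print(seconds)  # 72000.0
```

Both operands are scalar Timestamp objects, so no index alignment takes place and the result is a single Timedelta.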

Another solution, thanks to Anton vBR:

timespan_a = df.get_value(len(df)-1, 'datetime') - df.get_value(0, 'datetime')
print (timespan_a)
0 days 20:00:00
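Note that get_value was deprecated in pandas 0.21 and removed in 1.0, so it only works on older versions such as the 0.20.3 mentioned in the question. A minimal sketch of the same scalar lookup with the .at accessor, which should work on both old and current versions:

```python
import pandas as pd

# Same three timestamps as in the question, written out explicitly.
df = pd.DataFrame({
    'datetime': pd.to_datetime(['2001-10-01 10:00:00',
                                '2001-10-01 20:00:00',
                                '2001-10-02 06:00:00']),
    'B': [4, 5, 6],
})

# .at reads a single cell by row label and column label.
last = df.at[df.index[-1], 'datetime']
first = df.at[df.index[0], 'datetime']

timespan = last - first
print(timespan)  # 0 days 20:00:00
```

Taking the labels from df.index rather than hard-coding 0 and len(df)-1 keeps the lookup valid when the index is not the default 0..n-1 range.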
jezrael answered Oct 17 '25 15:10


