
Day delta for dates >292 years apart

I am trying to compute day deltas over a wide range of pandas dates. However, for time deltas greater than 292 years I get negative values. For example:

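import pandas as pd
# Monthly dates starting in 1700, spanning roughly 375 years
dates = pd.Series(pd.date_range('1700-01-01', periods=4500, freq='m'))
# Day deltas relative to the earliest date; deltas past ~292 years come out negative
days_delta = (dates-dates.min()).astype('timedelta64[D]')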

However, starting from a DatetimeIndex, I can get the result I want:

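import pandas as pd
import numpy as np
dates = pd.date_range('1700-01-01', periods=4500, freq='m')
# Extract the .days attribute from each datetime.timedelta element
days_fun = np.vectorize(lambda x: x.days)
# dates.date is an array of datetime.date objects, so the subtraction is done in pure Python
days_delta = days_fun(dates.date - dates.date.min())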

The question, then, is: how do I obtain the correct days_delta for Series objects?

MMCM_ asked Mar 05 '16 12:03



1 Answer

The pandas documentation addresses Timedelta limitations specifically:

Pandas represents Timedeltas in nanosecond resolution using 64 bit integers. As such, the 64 bit integer limits determine the Timedelta limits.
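The arithmetic behind that limit can be checked directly. The following is a minimal sketch, assuming nanosecond resolution as described in the quote; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Largest signed 64-bit integer, read as a count of nanoseconds
max_ns = np.iinfo(np.int64).max

# Nanoseconds in one day
ns_per_day = 24 * 60 * 60 * 10**9

# Whole days that fit into a 64-bit nanosecond count
max_days = max_ns // ns_per_day

# Rough conversion to years (365.25 days per year)
max_years = max_days / 365.25

print(max_days, round(max_years, 1))
# pandas exposes the same bound as Timedelta.max
print(pd.Timedelta.max.days)
```

The result is a little over 292 years, which is where the deltas in the question start to go wrong.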

Incidentally, this is the same limitation that the docs describe for Timestamps in pandas:

Since pandas represents timestamps in nanosecond resolution, the timespan that can be represented using a 64-bit integer is limited to approximately 584 years

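A signed 64-bit nanosecond count spans roughly 584 years in total, or about 292 years in either direction, which matches the threshold in the question. This suggests that the workaround the docs recommend for the Timestamp limitation can also be applied to Timedeltas. The docs describe that workaround as follows: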

If you have data that is outside of the Timestamp bounds, see Timestamp limitations, then you can use a PeriodIndex and/or Series of Periods to do computations.
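Applied to the data in the question, that recommendation might look like the sketch below. It is one possible approach rather than a definitive implementation: it assumes that month-end days are what is wanted, and it takes the difference of daily Period ordinals (plain integers) so that no nanosecond Timedelta is ever created. The variable names are illustrative.

```python
import pandas as pd

# 4500 monthly periods starting in January 1700
months = pd.period_range('1700-01', periods=4500, freq='M')

# Convert each month to the daily Period of its last day
days = pd.Series(months.asfreq('D', how='E'))

# Period.ordinal is an integer day count for daily periods,
# so the subtraction below is ordinary integer arithmetic
ordinals = days.map(lambda p: p.ordinal)
days_delta = ordinals - ordinals.min()

print(days_delta.head())
print(days_delta.iloc[-1])
```

Because the deltas are ordinary integers, they stay positive across the whole 375-year range.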

Projski answered Oct 16 '22 04:10
