How to create a linearspace of timestamps in python?

Is there any way to create a series of equally spaced date-time objects, given the start/stop epochs and the desired number of intervening elements?

import dateutil.parser
import pandas

t0 = dateutil.parser.parse("23-FEB-2015 23:09:19.445506")
tf = dateutil.parser.parse("24-FEB-2015 01:09:22.404973")
n = 10**4
series = pandas.period_range(start=t0, end=tf, periods=n)

This example fails. Is pandas perhaps not intended to produce date ranges with frequencies shorter than a day?

I could estimate a frequency manually, i.e. (tf-t0)/n, but I'm concerned that naively adding this timedelta to the start epoch over and over will accumulate significant rounding error as I approach the end epoch.
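One way to sidestep the accumulation worry, sketched here with only the standard library: compute each point directly from the start and the full span instead of adding a step repeatedly. The helper name datetime_linspace is hypothetical (it is not part of any library), and the endpoints are written out as datetime literals equivalent to the parsed strings. Because every element is derived independently, the rounding error stays within a microsecond per element rather than growing along the series.

```python
from datetime import datetime


def datetime_linspace(start, stop, num):
    """Return `num` evenly spaced datetimes from start to stop, inclusive.

    Hypothetical helper for illustration only.
    """
    if num < 2:
        raise ValueError("num must be at least 2")
    span = stop - start
    # Scale the whole span for each index instead of accumulating a step.
    # timedelta * int is exact; the division rounds to the nearest microsecond.
    return [start + span * i / (num - 1) for i in range(num)]


t0 = datetime(2015, 2, 23, 23, 9, 19, 445506)
tf = datetime(2015, 2, 24, 1, 9, 22, 404973)
series = datetime_linspace(t0, tf, 10**4)
```

The result is a plain list of datetime objects, so the special data type is kept throughout; the last element equals tf exactly because the final index scales the span by (num - 1) / (num - 1).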

I could resort to working exclusively with floats instead of datetime objects. (For example, subtract the start epoch from the end epoch, divide the timedelta by some unit such as a second, then simply apply numpy linspace.) But casting everything to floats, and converting back to dates only when needed, sacrifices the advantages of special data types (such as simpler code and easier debugging). Is this the best solution?

asked Jun 03 '16 03:06 by benjimin


1 Answer

A workaround* is to use numpy's linspace:

In [11]: np.linspace(pd.Timestamp("23-FEB-2015 23:09:19.445506").value, pd.Timestamp("24-FEB-2015 01:09:22.404973").value, 50, dtype=np.int64)
Out[11]:
array([1424732959445506048, 1424733106444678912, 1424733253443851520,
       1424733400443024384, 1424733547442197248, 1424733694441370112,
       1424733841440542720, 1424733988439715584, 1424734135438888448,
       1424734282438061312, 1424734429437233920, 1424734576436406784,
       ...
       1424739133410763520, 1424739280409936384, 1424739427409108992,
       1424739574408281856, 1424739721407454720, 1424739868406627584,
       1424740015405800192, 1424740162404973056])

In [12]: pd.DatetimeIndex(np.linspace(pd.Timestamp("23-FEB-2015 23:09:19.445506").value, pd.Timestamp("24-FEB-2015 01:09:22.404973").value, 50, dtype=np.int64))
Out[12]:
DatetimeIndex(['2015-02-23 23:09:19.445506048',
               '2015-02-23 23:11:46.444678912',
               '2015-02-23 23:14:13.443851520',
               '2015-02-23 23:16:40.443024384',
               ...
               '2015-02-24 01:04:28.406627584',
               '2015-02-24 01:06:55.405800192',
               '2015-02-24 01:09:22.404973056'],
              dtype='datetime64[ns]', freq=None)
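Note that the values above end in ...506048 rather than ...506000: linspace does its arithmetic in float64, which cannot represent nanosecond epoch values of this size exactly. A possible refinement, offered only as a sketch, is to space the offsets from the start (which are small enough for float64 to hold exactly) and add them back to the integer start value. The function name timestamp_linspace is hypothetical, and the timestamps are written in ISO form for convenience.

```python
import numpy as np
import pandas as pd


def timestamp_linspace(start, stop, num):
    """Return a DatetimeIndex of `num` evenly spaced timestamps (hypothetical helper)."""
    start_ns = pd.Timestamp(start).value  # nanoseconds since the epoch, as int
    stop_ns = pd.Timestamp(stop).value
    # Interpolate the offsets from start rather than the raw epoch values,
    # so the float64 arithmetic works on numbers it can represent exactly.
    offsets = np.linspace(0, stop_ns - start_ns, num).round().astype(np.int64)
    return pd.DatetimeIndex(start_ns + offsets)


idx = timestamp_linspace("2015-02-23 23:09:19.445506",
                         "2015-02-24 01:09:22.404973", 50)
```

With this variant the first and last entries match the requested start and stop to the nanosecond, and intermediate points are rounded to the nearest nanosecond.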

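*A workaround is needed because, in the pandas version used here, passing start, end, and periods to date_range directly raises an error: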

In [21]: pd.date_range("23-FEB-2015 23:09:19.445506", "24-FEB-2015 01:09:22.404973", periods=10**4)
...
ValueError: Must specify two of start, end, or periods
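For readers on a newer pandas: to my knowledge, releases from 0.23 onward accept start, end, and periods together in date_range and return linearly spaced timestamps, so the error above should no longer appear there. The snippet below is a sketch under that assumption, not something the original answer tested; the variable names are illustrative.

```python
import pandas as pd

start = pd.Timestamp("2015-02-23 23:09:19.445506")
end = pd.Timestamp("2015-02-24 01:09:22.404973")

# Assumes a pandas version where date_range supports start + end + periods
# (believed to be 0.23 and later); older versions raise ValueError instead.
spaced = pd.date_range(start=start, end=end, periods=10**4)
```

If the installed version is older, the linspace workaround remains the fallback.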
answered Sep 28 '22 03:09 by Andy Hayden