
pandas out of bounds nanosecond timestamp after offset rollforward plus adding a month offset

I am confused about how pandas went out of bounds for datetime objects with these lines:

import pandas as pd

BOMoffset = pd.tseries.offsets.MonthBegin()
# here some code sets the all_treatments dataframe and the newrowix, micolix, mocolix counters
all_treatments.iloc[newrowix,micolix] = BOMoffset.rollforward(all_treatments.iloc[i,micolix] + pd.tseries.offsets.DateOffset(months = x))
all_treatments.iloc[newrowix,mocolix] = BOMoffset.rollforward(all_treatments.iloc[newrowix,micolix]+ pd.tseries.offsets.DateOffset(months = 1))

Here all_treatments.iloc[i,micolix] is a datetime set by pd.to_datetime(all_treatments['INDATUMA'], errors='coerce',format='%Y%m%d'), and INDATUMA is date information in the format 20070125.

This logic works on mock data (no errors, and the dates make sense), so at the moment I cannot reproduce the problem on a small example. It fails on my full data set with the following error:

pandas.tslib.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2262-05-01 00:00:00 
László asked Oct 01 '15 12:10




2 Answers

Since pandas represents timestamps in nanosecond resolution, the timespan that can be represented using a 64-bit integer is limited to approximately 584 years:

In [54]: pd.Timestamp.min
Out[54]: Timestamp('1677-09-22 00:12:43.145225')

In [55]: pd.Timestamp.max
Out[55]: Timestamp('2262-04-11 23:47:16.854775807')

Your value, 2262-05-01 00:00:00, falls outside this range, hence the out-of-bounds error.

Straight out of: http://pandas-docs.github.io/pandas-docs-travis/user_guide/timeseries.html#timeseries-timestamp-limits
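To see where the 584-year figure and the upper limit come from, here is a minimal sketch that redoes the arithmetic with the standard library only. It assumes the usual representation of a timestamp as a signed 64-bit count of nanoseconds since 1970-01-01; the names INT64_MAX, NS_PER_YEAR, span_years and latest are made up for this illustration.

```python
from datetime import datetime, timedelta

INT64_MAX = 2**63 - 1                      # largest signed 64-bit integer
NS_PER_YEAR = 365.25 * 24 * 3600 * 10**9   # average year length in nanoseconds

# Total span covered by all 2**64 possible integer values, in years.
span_years = 2**64 / NS_PER_YEAR

# Latest representable instant, truncated to microseconds because the
# standard-library datetime type has no nanosecond field.
epoch = datetime(1970, 1, 1)
latest = epoch + timedelta(microseconds=INT64_MAX // 1000)

print(round(span_years, 1))
print(latest)
```

The date printed for latest is 2262-04-11, the same day as pd.Timestamp.max above, so any month start after April 2262 (such as 2262-05-01) cannot be stored at nanosecond resolution.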

Workaround:

This forces dates that fall outside the bounds to NaT:

pd.to_datetime(date_col_to_force, errors = 'coerce')

Shankar ARUL answered Sep 19 '22 04:09



Setting the errors parameter in pd.to_datetime to 'coerce' causes replacement of out of bounds values with NaT. Quoting the docs:

If ‘coerce’, then invalid parsing will be set as NaT

E.g.:

datetime_variable = pd.to_datetime(datetime_variable, errors = 'coerce') 

This does not fix the data (obviously), but still allows processing the non-NaT data points.
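As a rough sketch of processing only the usable rows, the example below parses a small made-up series in the question's YYYYMMDD format, sets aside rows that are NaT or implausibly late, and applies the question's month-begin logic to the rest. The sample values, the cutoff date and the names raw, parsed, usable and month_in are assumptions for illustration and do not come from the original data.

```python
import pandas as pd

# Hypothetical sample: a normal date, a valid date just below
# pd.Timestamp.max, and an impossible calendar date (Feb 31).
raw = pd.Series(["20070125", "22620405", "20070231"])
parsed = pd.to_datetime(raw, errors="coerce", format="%Y%m%d")

# Assumed sanity cutoff; choose whatever is plausible for the data set.
cutoff = pd.Timestamp("2100-01-01")

# Comparisons against NaT evaluate to False, so coerced rows drop out too.
usable = parsed.notna() & (parsed < cutoff)

bom = pd.tseries.offsets.MonthBegin()
month_in = parsed[usable].apply(
    lambda ts: bom.rollforward(ts + pd.DateOffset(months=1))
)

print(raw[~usable].tolist())  # raw values to inspect by hand
print(month_in.tolist())      # shifted dates for the usable rows
```

Printing the raw values of the excluded rows is a quick way to find which records in the full data set sit close enough to the upper bound to overflow once a month offset is added.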

Pawel Kranzberg answered Sep 18 '22 04:09
