I am confused about how pandas ran out of bounds for datetime objects with these lines:
import pandas as pd

BOMoffset = pd.tseries.offsets.MonthBegin()
# here some code sets the all_treatments dataframe and the newrowix, micolix, mocolix counters
all_treatments.iloc[newrowix, micolix] = BOMoffset.rollforward(
    all_treatments.iloc[i, micolix] + pd.tseries.offsets.DateOffset(months=x)
)
all_treatments.iloc[newrowix, mocolix] = BOMoffset.rollforward(
    all_treatments.iloc[newrowix, micolix] + pd.tseries.offsets.DateOffset(months=1)
)
Here all_treatments.iloc[i,micolix] is a datetime set by pd.to_datetime(all_treatments['INDATUMA'], errors='coerce',format='%Y%m%d'), and INDATUMA is date information in the format 20070125.
This logic seems to work on mock data (no errors, and the dates make sense), so at the moment I cannot provide a reproducible example, but it fails on my full data set with the following error:
pandas.tslib.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2262-05-01 00:00:00
Since pandas represents timestamps in nanosecond resolution, the timespan that can be represented using a 64-bit integer is limited to approximately 584 years:
In [54]: pd.Timestamp.min
Out[54]: Timestamp('1677-09-22 00:12:43.145225')
In [55]: pd.Timestamp.max
Out[55]: Timestamp('2262-04-11 23:47:16.854775807')
Your value, 2262-05-01 00:00:00, falls outside this range, hence the OutOfBoundsDatetime error.
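To check the arithmetic behind the 584-year figure, here is a minimal sketch that uses only the Python standard library rather than pandas itself. It assumes timestamps are counted as signed 64-bit nanoseconds from the Unix epoch, and it uses an average year of 365.25 days, which is an approximation. The variable names are illustrative only.

```python
from datetime import datetime, timedelta

NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # average year length, an approximation

# A 64-bit integer has 2**64 distinct values, one per nanosecond.
span_years = 2**64 / NS_PER_SECOND / SECONDS_PER_YEAR

# Largest positive count: 2**63 - 1 nanoseconds after the Unix epoch.
# datetime only has microsecond precision, so the last three digits are dropped.
epoch = datetime(1970, 1, 1)
upper_bound = epoch + timedelta(microseconds=(2**63 - 1) // 1000)

# The date from the error message in the question.
failing_date = datetime(2262, 5, 1)

print(round(span_years, 1))        # total span in years
print(upper_bound)                 # latest representable instant, truncated to microseconds
print(failing_date > upper_bound)  # whether the failing date lies past that bound
```

The computed upper bound agrees with pd.Timestamp.max above up to microsecond precision, and the date from the error message lies a few weeks past it.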
Straight out of: http://pandas-docs.github.io/pandas-docs-travis/user_guide/timeseries.html#timeseries-timestamp-limits
Workaround:
This will force dates that are outside the bounds to NaT:
pd.to_datetime(date_col_to_force, errors = 'coerce')
Setting the errors parameter in pd.to_datetime to 'coerce' causes replacement of out of bounds values with NaT. Quoting the docs:
If ‘coerce’, then invalid parsing will be set as NaT
E.g.:
datetime_variable = pd.to_datetime(datetime_variable, errors = 'coerce')
This does not fix the data (obviously), but still allows processing the non-NaT data points.
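Since coercion silently turns bad values into NaT, it can help to list which raw entries were affected before carrying on. The following is a rough sketch under assumptions: the sample values are made up, and an entry with month 13 stands in for any value that pd.to_datetime cannot turn into a valid timestamp. Only the column name INDATUMA and the format string come from the question.

```python
import pandas as pd

# Hypothetical sample in the question's YYYYMMDD format; the middle entry
# has month 13, so it cannot be parsed as a date.
raw = pd.Series(["20070125", "20071301", "20081231"], name="INDATUMA")

# Unparseable entries become NaT instead of raising an error.
parsed = pd.to_datetime(raw, format="%Y%m%d", errors="coerce")

# Raw values that were turned into NaT, so they can be inspected or fixed.
dropped = raw[parsed.isna()]
print(dropped)

# Continue processing with the rows that still hold a usable timestamp.
usable = parsed.dropna()
print(usable)
```

Inspecting dropped first shows how much data is lost to coercion, which may point to the rows responsible for the original error.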