
How to work around Out of bounds nanosecond [duplicate]

Tags:

python

pandas

 LastLogin                         LastPurchased              
2018-08-21 00:28:04.081677         0001-01-01 00:00:00
2018-08-21 00:28:58.209522         2018-08-20 00:28:58.209522    

I need the difference in days, (df['LastLogin'] - df['LastPurchased']).dt.days, but some rows have '0001-01-01 00:00:00' in LastPurchased. Anything I try in order to change 0001-01-01 to a date within the pandas bounds results in "Out of bounds nanosecond timestamp: 1-01-01 00:00:00". Is there any other way? Desired output:

     LastLogin                         LastPurchased              Days
2018-08-21 00:28:04.081677         1999-01-01 00:00:00            6935
2018-08-21 00:28:58.209522         2018-08-20 00:28:58.209522      1
IDontKnowAnything asked Jan 15 '19 16:01



1 Answer

Pandas requires that the year in your datetime fall roughly between 1677 and 2262 (see pandas/_libs/tslibs/src/datetime/np_datetime.c for the exact bounds). Otherwise, the given date is outside the range that can be represented by nanosecond-resolution 64-bit integers:

>>> pd.Timestamp.max
Timestamp('2262-04-11 23:47:16.854775807')
>>> pd.Timestamp.min
Timestamp('1677-09-21 00:12:43.145225')
>>> pd.Timestamp.max - pd.Timestamp.min
datetime.timedelta(213503, 84873, 709550)
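As a rough sanity check of where those bounds come from, here is a minimal sketch that reconstructs them with the standard library only. It assumes timestamps are stored as a signed 64-bit count of nanoseconds since 1970-01-01; the names NS_MAX, EPOCH, upper and lower are illustrative, and the values are truncated to microseconds because datetime cannot hold nanoseconds.

```python
from datetime import datetime, timedelta

# Largest value a signed 64-bit integer can hold, read as nanoseconds.
NS_MAX = 2**63 - 1

# Assumed reference point: the Unix epoch.
EPOCH = datetime(1970, 1, 1)

# datetime only resolves microseconds, so drop the last three digits.
span = timedelta(microseconds=NS_MAX // 1000)

upper = EPOCH + span  # latest representable instant (to the microsecond)
lower = EPOCH - span  # earliest representable instant (to the microsecond)

print(lower, upper)
print(span.days / 365.25)  # roughly 292 years on either side of 1970
```

The year 0001 is about 1,700 years before the lower bound, which is why it cannot be represented at nanosecond resolution.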

It's up to you how you want to handle this. Consider what you are ultimately trying to indicate by subtracting the date 0001-01-01. I'll assume that means a user has logged in but never purchased.

To coerce LastPurchased to either a valid Pandas Timestamp or pd.NaT ("not a time"), you can use

df['LastPurchased'] = pd.to_datetime(df['LastPurchased'], errors='coerce')

This will give NaT as the difference in those spots:

>>> pd.Timestamp(2018, 1, 1) - pd.NaT
NaT

You can then treat NaT as a "sentinel" and check for it with pd.isna() (or Series.isna() on a whole column).
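Putting the pieces together, here is a minimal end-to-end sketch on data shaped like the question's. It assumes both columns arrive as strings and that "0001-01-01 00:00:00" is the only "never purchased" marker; the SENTINEL constant and the never_purchased name are illustrative. The sketch masks the sentinel explicitly before coercing, a small variation on the coerce-only call above, so the result does not depend on whether the installed pandas version rejects year 1 (pandas 2.0 and later also support coarser, non-nanosecond resolutions).

```python
import pandas as pd

# Hypothetical sample data mirroring the question (string columns assumed).
df = pd.DataFrame({
    "LastLogin": ["2018-08-21 00:28:04.081677", "2018-08-21 00:28:58.209522"],
    "LastPurchased": ["0001-01-01 00:00:00", "2018-08-20 00:28:58.209522"],
})

# Assumed placeholder meaning "never purchased".
SENTINEL = "0001-01-01 00:00:00"

df["LastLogin"] = pd.to_datetime(df["LastLogin"])

# Blank out the sentinel first, then coerce anything else unparseable to NaT.
purchased = df["LastPurchased"].mask(df["LastPurchased"] == SENTINEL)
df["LastPurchased"] = pd.to_datetime(purchased, errors="coerce")

# NaT propagates through the subtraction, so Days is missing for those rows.
df["Days"] = (df["LastLogin"] - df["LastPurchased"]).dt.days

# Boolean mask of users who logged in but never purchased.
never_purchased = df["LastPurchased"].isna()

print(df)
```

Rows flagged by never_purchased can then be dropped, filled with a default, or reported separately, depending on what "never purchased" should mean downstream.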

Brad Solomon answered Sep 21 '22 15:09
