LastLogin LastPurchased
2018-08-21 00:28:04.081677 0001-01-01 00:00:00
2018-08-21 00:28:58.209522 2018-08-20 00:28:58.209522
I need the difference in days, (df['LastLogin'] - df['LastPurchased']).dt.days, but there are some '0001-01-01 00:00:00' values in LastPurchased. Anything I try to do to change 1-01-01 to a date within the pandas bounds results in Out of bounds nanosecond timestamp: 1-01-01 00:00:00. Is there any other way?
LastLogin LastPurchased Days
2018-08-21 00:28:04.081677 1999-01-01 00:00:00 6935
2018-08-21 00:28:58.209522 2018-08-20 00:28:58.209522 1
Pandas requires that the year in your datetime be greater than 1677 and less than 2262 (approximately - see pandas/_libs/tslibs/src/datetime/np_datetime.c for the exact bounds). Otherwise, the given date is outside the range that can be represented by nanosecond-resolution 64-bit integers:
>>> pd.Timestamp.max
Timestamp('2262-04-11 23:47:16.854775807')
>>> pd.Timestamp.min
Timestamp('1677-09-21 00:12:43.145225')
>>> pd.Timestamp.max - pd.Timestamp.min
datetime.timedelta(213503, 84873, 709550)
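As a rough check of where those bounds come from, the sketch below rebuilds them with the standard library only. It assumes the timestamps are signed 64-bit nanosecond counts from the 1970-01-01 epoch and truncates to microseconds, since datetime cannot hold nanoseconds; the variable names are mine.

```python
from datetime import datetime, timedelta

# Assumption: timestamps are signed 64-bit counts of nanoseconds since 1970-01-01.
epoch = datetime(1970, 1, 1)

# Largest representable offset, truncated to whole microseconds
# because datetime.timedelta has no nanosecond field.
span = timedelta(microseconds=(2**63 - 1) // 1000)

lower = epoch - span  # earliest representable instant (microsecond precision)
upper = epoch + span  # latest representable instant (microsecond precision)

print(lower)
print(upper)
```

The year 0001 lies far below the lower bound, which is why it cannot be stored as a nanosecond timestamp.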
It's up to you how you want to handle this. Consider what you are ultimately trying to indicate by subtracting the date 0001-01-01. I'll assume that means a user has logged in but never purchased.
To coerce LastPurchased to either a valid Pandas Timestamp or pd.NaT ("not a time"), you can use
df['LastPurchased'] = pd.to_datetime(df['LastPurchased'], errors='coerce')
This will give NaT as the difference in those spots:
>>> pd.Timestamp(2018, 1, 1) - pd.NaT
NaT
You can use NaT as a "sentinel" and check for it with pd.isna().
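Putting this together on the sample data from the question, a minimal sketch might look like the following. It assumes both columns start out as strings, and it marks the 0001-01-01 placeholder as missing explicitly before parsing (my own addition, so the result does not depend on how the out-of-range string gets parsed). The names PLACEHOLDER, purchased and never_purchased are only illustrative.

```python
import pandas as pd

# Sample data from the question, assumed to be stored as strings.
df = pd.DataFrame({
    "LastLogin": ["2018-08-21 00:28:04.081677", "2018-08-21 00:28:58.209522"],
    "LastPurchased": ["0001-01-01 00:00:00", "2018-08-20 00:28:58.209522"],
})

# Hypothetical constant: the value that stands for "never purchased".
PLACEHOLDER = "0001-01-01 00:00:00"

# Blank out the placeholder so it is treated as missing when parsed.
purchased = df["LastPurchased"].mask(df["LastPurchased"] == PLACEHOLDER)

df["LastLogin"] = pd.to_datetime(df["LastLogin"])
# errors="coerce" turns anything else that cannot be parsed into NaT as well.
df["LastPurchased"] = pd.to_datetime(purchased, errors="coerce")

# NaT propagates through the subtraction, so Days is missing for those rows.
df["Days"] = (df["LastLogin"] - df["LastPurchased"]).dt.days

# Boolean mask of users who logged in but never purchased.
never_purchased = df["LastPurchased"].isna()

print(df)
```

Rows where never_purchased is True can then be filtered out, or given whatever default makes sense for your analysis.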