I want to take a column of datetime objects and return a column of integers that are "days from that datetime until today". I can do it in an ugly way, but I'm looking for a prettier (and faster) way.
So suppose I have a dataframe with a datetime column like so:
11 2014-03-04 17:16:26+00:00
12 2014-03-10 01:35:56+00:00
13 2014-03-15 02:35:51+00:00
14 2014-03-20 05:55:47+00:00
15 2014-03-26 04:56:33+00:00
Name: datetime, dtype: object
And each element looks like:
datetime.datetime(2014, 3, 4, 17, 16, 26, tzinfo=<UTC>)
Suppose I want to calculate how many days ago each observation occurred, and return that as a simple integer. I know I can just use apply twice, but is there a vectorized/cleaner way to do it?
import datetime
today = datetime.datetime.today().date()
df_dates = df['datetime'].apply(lambda x: x.date())
days_ago = today - df_dates
This gives a timedelta64[ns] Series:
11 56 days, 00:00:00
12 50 days, 00:00:00
13 45 days, 00:00:00
14 40 days, 00:00:00
15 34 days, 00:00:00
Name: datetime, dtype: timedelta64[ns]
And then finally if I want it as an integer:
days_ago_as_int = days_ago.apply(lambda x: x.item().days)
days_ago_as_int
11 56
12 50
13 45
14 40
15 34
Name: datetime, dtype: int64
Any thoughts?
Related questions that didn't quite get at what I was asking:
Pandas Python- can datetime be used with vectorized inputs
Pandas add one day to column
Trying Karl D's answer, I'm successfully able to get today's date and the date column as desired, but something goes awry in the subtraction (different datetimes than in the original example, but shouldn't matter, right?):
converted_dates = df['date'].values.astype('datetime64[D]')
today_date = np.datetime64(dt.date.today())
print(converted_dates)
print(today_date)
print(today_date - converted_dates)
[2014-01-16 00:00:00
2014-01-19 00:00:00
2014-01-22 00:00:00
2014-01-26 00:00:00
2014-01-29 00:00:00]
2014-04-30 00:00:00
[16189 days, 0:08:20.637994
16189 days, 0:08:20.637991
16189 days, 0:08:20.637988
16189 days, 0:08:20.637984
16189 days, 0:08:20.637981]
How about this (for a column named date)?
import datetime as dt
import numpy as np
# Truncate each timestamp to day resolution, then subtract from today's date.
df['foo'] = (np.datetime64(dt.date.today())
             - df['date'].values.astype('datetime64[D]'))
print(df)
date foo
0 2014-03-04 17:16:26 56 days
1 2014-03-10 01:35:56 50 days
2 2014-03-15 02:35:51 45 days
3 2014-03-20 05:55:47 40 days
4 2014-03-26 04:56:33 34 days
Or if you wanted it as an int:
# Same subtraction, with the timedelta64[D] result cast to plain integers.
df['foo'] = (np.datetime64(dt.date.today())
             - df['date'].values.astype('datetime64[D]')).astype(int)
print(df)
date foo
0 2014-03-04 17:16:26 56
1 2014-03-10 01:35:56 50
2 2014-03-15 02:35:51 45
3 2014-03-20 05:55:47 40
4 2014-03-26 04:56:33 34
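To make the day truncation and integer cast easier to check, here is a minimal NumPy-only sketch of the same idea. It assumes a fixed stand-in date (`reference_day`, a hypothetical name) in place of `dt.date.today()` so the result is reproducible, and it rebuilds the sample timestamps by hand rather than reading them from `df`.

```python
import numpy as np

# Hypothetical sample data matching the timestamps shown above.
stamps = np.array(
    [
        "2014-03-04T17:16:26",
        "2014-03-10T01:35:56",
        "2014-03-15T02:35:51",
        "2014-03-20T05:55:47",
        "2014-03-26T04:56:33",
    ],
    dtype="datetime64[ns]",
)

# Assumption: a fixed stand-in for "today" so the output does not change.
reference_day = np.datetime64("2014-04-29", "D")

# Casting to day resolution drops the time-of-day component.
days = stamps.astype("datetime64[D]")

# The difference is timedelta64[D]; casting to int yields whole-day counts.
days_ago = (reference_day - days).astype(int)
print(days_ago.tolist())
```

With that reference date the counts match the table above (56, 50, 45, 40, 34). Both operands are at day resolution before the subtraction, which keeps the result in whole days.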
Or, if it were the index:
print(np.datetime64(dt.date.today()) - df.index.values.astype('datetime64[D]'))
[56 50 45 40 34]
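For the index case, a possible alternative that stays within pandas is sketched below. It is not the approach from the answer above: it assumes the index is a `DatetimeIndex`, uses `normalize()` to reset each timestamp to midnight, and uses a fixed `reference_day` (a hypothetical stand-in for today's date) so the output is stable.

```python
import pandas as pd

# Hypothetical frame whose index holds the sample timestamps.
idx = pd.DatetimeIndex(
    [
        "2014-03-04 17:16:26",
        "2014-03-10 01:35:56",
        "2014-03-15 02:35:51",
        "2014-03-20 05:55:47",
        "2014-03-26 04:56:33",
    ]
)
df = pd.DataFrame({"value": list(range(5))}, index=idx)

# Assumption: a fixed stand-in for "today".
reference_day = pd.Timestamp("2014-04-29")

# normalize() sets the time to midnight; .days extracts whole days
# from the resulting TimedeltaIndex.
days_ago = (reference_day - df.index.normalize()).days
print(list(days_ago))
```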
Much later edit: How about this for a workaround?
>>> print(df)
date
0 2014-03-04 17:16:26
1 2014-03-10 01:35:56
2 2014-03-15 02:35:51
3 2014-03-20 05:55:47
4 2014-03-26 04:56:33
Try assigning today's date to a column so it gets converted to a datetime64 column by pandas and then do the arithmetic:
>>> df['today'] = dt.date.today()
>>> df['foo'] = (df['today'].values.astype('datetime64[D]')
...              - df['date'].values.astype('datetime64[D]'))
>>> print(df)
date today foo
0 2014-03-04 17:16:26 2014-05-14 71 days
1 2014-03-10 01:35:56 2014-05-14 65 days
2 2014-03-15 02:35:51 2014-05-14 60 days
3 2014-03-20 05:55:47 2014-05-14 55 days
4 2014-03-26 04:56:33 2014-05-14 49 days
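In more recent pandas versions, the same result can probably be reached without dropping to NumPy arrays, using the `.dt` accessor. The sketch below is an alternative, not the workaround above. It assumes the column has been converted with `pd.to_datetime(..., utc=True)` (which should also handle an object column of timezone-aware `datetime.datetime` values like the one in the question) and uses a fixed `reference_day` (a hypothetical name) instead of today's date.

```python
import pandas as pd

# Hypothetical frame with the sample timestamps, parsed as UTC.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2014-03-04 17:16:26",
                "2014-03-10 01:35:56",
                "2014-03-15 02:35:51",
                "2014-03-20 05:55:47",
                "2014-03-26 04:56:33",
            ],
            utc=True,
        )
    }
)

# Assumption: a fixed stand-in for "today", in the same timezone as the column.
reference_day = pd.Timestamp("2014-05-14", tz="UTC")

# dt.normalize() resets each timestamp to midnight; dt.days returns integers.
df["foo"] = (reference_day - df["date"].dt.normalize()).dt.days
print(df["foo"].tolist())
```

With that reference date the values agree with the `foo` column above (71, 65, 60, 55, 49). To use the actual current date, `pd.Timestamp.now(tz="UTC").normalize()` could replace the fixed timestamp.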