
Truncate `Timestamp` column to hour precision in pandas `DataFrame`

I have a pandas.DataFrame called df which has an automatically generated index, with a column dt:

df['dt'].dtype, df['dt'][0] # (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45')) 

What I'd like to do is create a new column truncated to hour precision. I'm currently using:

df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour)) 

This works, so that's fine. However, I've an inkling there's some nice way using pandas.tseries.offsets or creating a DatetimeIndex or similar.

So if possible, is there some pandas wizardry to do this?

Jon Clements asked Feb 27 '15 20:02



2 Answers

In pandas 0.18.0 and later, there are datetime floor, ceil and round methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:

>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
                   dt                 dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00
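For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the three sample timestamps shown above and compares floor with its siblings ceil and round; the column names floor_h, ceil_h and round_h are made up for the illustration.

```python
import pandas as pd

# Sample frame rebuilt from the three timestamps shown above.
df = pd.DataFrame({"dt": pd.to_datetime([
    "2014-10-01 10:02:45",
    "2014-10-01 13:08:17",
    "2014-10-01 17:39:24",
])})

# Round down, up, and to the nearest hour (illustrative column names).
df["floor_h"] = df["dt"].dt.floor("h")
df["ceil_h"] = df["dt"].dt.ceil("h")
df["round_h"] = df["dt"].dt.round("h")

print(df)
```

Only floor truncates; round sends 17:39:24 to 18:00:00 because it is past the half hour.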

Here's another alternative to truncate the timestamps. Unlike floor, which accepts only fixed frequencies, it supports truncating to a calendar unit such as year or month.

You can temporarily adjust the precision unit of the underlying NumPy datetime64 datatype, changing it from [ns] to [h]:

df['dt'].values.astype('<M8[h]') 

This truncates everything to hour precision. For example:

>>> df
                   dt
0 2014-10-01 10:02:45
1 2014-10-01 13:08:17
2 2014-10-01 17:39:24

>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                   dt                 dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00

>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]

The same method should work for any other unit: months 'M', minutes 'm', and so on:

  • Keep up to year: '<M8[Y]'
  • Keep up to month: '<M8[M]'
  • Keep up to day: '<M8[D]'
  • Keep up to minute: '<M8[m]'
  • Keep up to second: '<M8[s]'
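As a rough sketch of the calendar-unit case, the snippet below truncates the same sample timestamps to month and year. The explicit cast back to '<M8[ns]' is my own addition, not part of the method above: depending on the pandas version, a column assigned from a coarser-unit array may not come back as datetime64[ns], so casting makes the result dtype explicit. The column names month and year are illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the timestamps used in this answer.
df = pd.DataFrame({"dt": pd.to_datetime([
    "2014-10-01 10:02:45",
    "2014-10-01 13:08:17",
    "2014-10-01 17:39:24",
])})

raw = df["dt"].values  # underlying NumPy datetime64 array

# Drop everything below the month / year, then cast back to
# nanoseconds (assumption: we want a datetime64[ns] column).
df["month"] = raw.astype("<M8[M]").astype("<M8[ns]")
df["year"] = raw.astype("<M8[Y]").astype("<M8[ns]")

print(df)
print(df.dtypes)
```

Note that .values drops any timezone information, so this sketch assumes a timezone-naive column like the one in the question.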

Alex Riley answered Sep 28 '22 17:09


A method I've used in the past to accomplish this goal was the following (quite similar to what you're already doing, but thought I'd throw it out there anyway):

df['dt2'] = df['dt'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))
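
One thing to watch with replace is that it resets only the fields you name. A small sketch, using a made-up timestamp with fractional seconds (not from the question's data), shows the difference and checks the fully reset version against dt.floor:

```python
import pandas as pd

# Hypothetical sample value with a sub-second component.
s = pd.Series(pd.to_datetime(["2014-10-01 10:02:45.123456"]))

# Resetting only minute and second leaves the microseconds in place.
partial = s.apply(lambda x: x.replace(minute=0, second=0))

# Resetting every field below the hour gives a true truncation.
full = s.apply(
    lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0)
)

print(partial[0])
print(full[0])
print((full == s.dt.floor("h")).all())
```

If the column only ever holds whole seconds, as in the question, both versions give the same result.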

David Hagan answered Sep 28 '22 17:09