Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

Tags: python, datetime, pandas, dataframe

I have a pandas.DataFrame called df which has an automatically generated index, with a column dt:

df['dt'].dtype, df['dt'][0] # (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))

What I'd like to do is create a new column truncated to hour precision. I'm currently using:

df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))

This works, so that's fine. However, I've an inkling there's some nice way using pandas.tseries.offsets or creating a DatetimeIndex or similar.

So if possible, is there some pandas wizardry to do this?

309

asked Feb 27 '15 20:02

Jon Clements

2 Answers

In pandas 0.18.0 and later, there are datetime floor, ceil and round methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:

>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
                   dt                 dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00
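For comparison, here is a minimal self-contained sketch of all three methods side by side. The sample frame is rebuilt from the timestamps shown above, and the column names `floor_h`, `ceil_h` and `round_h` are only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the timestamps in the example above.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# Each method takes a fixed frequency string; 'h' means hourly.
df["floor_h"] = df["dt"].dt.floor("h")  # round down to the hour
df["ceil_h"] = df["dt"].dt.ceil("h")    # round up to the hour
df["round_h"] = df["dt"].dt.round("h")  # round to the nearest hour

print(df)
```

Only `floor` truncates; `round` sends 17:39:24 up to 18:00:00, which is usually not what you want when bucketing by hour.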

Here's another alternative to truncate the timestamps. Unlike floor, it supports truncating to a precision such as year or month.

You can temporarily adjust the precision unit of the underlying NumPy datetime64 datatype, changing it from [ns] to [h]:

df['dt'].values.astype('<M8[h]')

This truncates everything to hour precision. For example:

>>> df
                   dt
0 2014-10-01 10:02:45
1 2014-10-01 13:08:17
2 2014-10-01 17:39:24

>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                   dt                 dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00

>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]

The same method should work for any other unit: months 'M', minutes 'm', and so on:

Keep up to year: '<M8[Y]'
Keep up to month: '<M8[M]'
Keep up to day: '<M8[D]'
Keep up to minute: '<M8[m]'
Keep up to second: '<M8[s]'
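As a rough sketch of how the unit codes above could be wrapped up, the helper below truncates to any NumPy unit and casts back to nanoseconds explicitly. The name `truncate` is hypothetical (it is not a pandas function), the sample dates are made up for illustration, and the column is assumed to be timezone-naive:

```python
import pandas as pd

# Made-up sample spanning several months and years.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-11-15 13:08:17",
        "2015-03-31 17:39:24",
    ])
})


def truncate(series, unit):
    """Truncate a tz-naive datetime64 Series to a NumPy unit code ('Y', 'M', 'D', 'h', ...)."""
    # Casting to a coarser unit drops everything below that unit.
    coarse = series.to_numpy().astype(f"datetime64[{unit}]")
    # Cast back to nanoseconds so the result dtype is explicit.
    return pd.Series(coarse.astype("datetime64[ns]"), index=series.index)


df["month"] = truncate(df["dt"], "M")
df["year"] = truncate(df["dt"], "Y")
print(df)
```

The explicit cast back to `datetime64[ns]` is a design choice here: it keeps the resulting dtype from depending on how a given pandas version handles coarser NumPy units on assignment.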

195

answered Sep 28 '22 17:09

Alex Riley

A method I've used in the past to accomplish this goal was the following (quite similar to what you're already doing, but thought I'd throw it out there anyway):

df['dt2'] = df['dt'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))
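A quick check, using made-up timestamps with fractional seconds, that the per-row `replace` approach agrees with the vectorized `floor('h')` once the sub-second fields are zeroed as well. The variable names are only illustrative:

```python
import pandas as pd

# Made-up timestamps that include fractional seconds.
s = pd.Series(pd.to_datetime([
    "2014-10-01 10:02:45.500000",
    "2014-10-01 13:08:17.250000",
]))

# Per-row approach: zero every field below the hour.
via_replace = s.apply(
    lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0)
)

# Vectorized approach from the other answer.
via_floor = s.dt.floor("h")

print((via_replace == via_floor).all())
```

Leaving out `microsecond=0` would keep the fractional part (for example 10:00:00.500000), so the two results would no longer match for data like this.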

answered Sep 28 '22 17:09

David Hagan
