
How to apply ceiling to pandas DateTime

Suppose I have a pandas dataframe with a column whose values are datetime64[ns].

Out[204]: 
0   2015-03-20 00:00:28
1   2015-03-20 00:01:44
2   2015-03-20 00:02:55
3   2015-03-20 00:03:39
4   2015-03-20 00:04:32
5   2015-03-20 00:05:52
6   2015-03-20 00:06:36
7   2015-03-20 00:07:44
8   2015-03-20 00:08:56
9   2015-03-20 00:09:47
Name: DateTime, dtype: datetime64[ns]

Is there an easy way to round them up to the next minute? I.e., I want the following:

Out[204]: 
0   2015-03-20 00:01:00
1   2015-03-20 00:02:00
2   2015-03-20 00:03:00
3   2015-03-20 00:04:00
4   2015-03-20 00:05:00
5   2015-03-20 00:06:00
6   2015-03-20 00:07:00
7   2015-03-20 00:08:00
8   2015-03-20 00:09:00
9   2015-03-20 00:10:00
Name: DateTime, dtype: datetime64[ns]

I wrote some complicated code that first converts the values to strings and extracts the three parts of, e.g., 00:09:47 as integers. Unless the last part (seconds) is already 00, it sets the seconds to 00 and adds 1 to the middle part (minutes), except when the minutes are already 59, in which case it adds 1 to the first part (hours) instead. It then recombines the new integers into a string and reconstructs the DateTime.

But I was thinking there might already be a simpler solution. Does anyone have any suggestions?

* EDIT *

@Jeff, @unutbu, thanks for your answers. I can only select one answer in SO, but both work.

uday asked Mar 21 '15 00:03


2 Answers

Here's one way. Add the seconds remaining until the next minute, computed as -second % 60 (which is 0 when the timestamp is already on a minute boundary). This is vectorized.

In [46]: df.date+pd.to_timedelta(-df.date.dt.second % 60,unit='s')
Out[46]: 
0   2015-03-20 00:01:00
1   2015-03-20 00:02:00
2   2015-03-20 00:03:00
3   2015-03-20 00:04:00
4   2015-03-20 00:05:00
5   2015-03-20 00:06:00
6   2015-03-20 00:07:00
7   2015-03-20 00:08:00
8   2015-03-20 00:09:00
9   2015-03-20 00:10:00
dtype: datetime64[ns]
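
To see why adding -second % 60 gives a ceiling, here is a minimal plain-Python sketch of the same arithmetic on a single value. The helper name ceil_to_minute is made up for illustration and is not part of the answer; it also assumes the timestamp has no sub-second component, as in the question's data.

```python
from datetime import datetime, timedelta

def ceil_to_minute(ts):
    """Round ts up to the next whole minute; exact minutes are unchanged.

    Hypothetical helper; assumes ts.microsecond == 0.
    """
    # In Python, -s % 60 parses as (-s) % 60:
    # 0 when s == 0, otherwise 60 - s.
    return ts + timedelta(seconds=-ts.second % 60)

print(ceil_to_minute(datetime(2015, 3, 20, 0, 0, 28)))  # 2015-03-20 00:01:00
print(ceil_to_minute(datetime(2015, 3, 20, 0, 9, 0)))   # 2015-03-20 00:09:00
```

Because the offset is a timedelta, the minute and hour rollover (e.g. 00:59:47 to 01:00:00) is handled by the datetime arithmetic itself.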

Here's another way. Converting a timestamp to a Period of a coarser frequency floors it, so adding one period moves it to the start of the next minute. (Note that this is a bit clunky at the moment because Periods are not full-fledged as a column type.) This is vectorized.

In [48]: pd.Series(pd.PeriodIndex(df.date.dt.to_period('T')+1).to_timestamp())
Out[48]: 
0   2015-03-20 00:01:00
1   2015-03-20 00:02:00
2   2015-03-20 00:03:00
3   2015-03-20 00:04:00
4   2015-03-20 00:05:00
5   2015-03-20 00:06:00
6   2015-03-20 00:07:00
7   2015-03-20 00:08:00
8   2015-03-20 00:09:00
9   2015-03-20 00:10:00
dtype: datetime64[ns]

This last method always rounds up, because it increments the floored period; a timestamp that is already on a minute boundary is therefore moved to the next minute as well.
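
The difference between the two methods only shows up on an exact minute boundary. The sketch below is not from the answer; it assumes a recent pandas in which a Period column supports + 1 and .dt.to_timestamp(), and it uses the 'min' alias rather than 'T'. The variable names are illustrative.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2015-03-20 00:00:00", "2015-03-20 00:00:28"]))

# Floor to the minute via Period, then step forward one period.
period_up = (s.dt.to_period("min") + 1).dt.to_timestamp()

# Add the seconds remaining to the next minute (0 if already on a boundary).
modulo_up = s + pd.to_timedelta(-s.dt.second % 60, unit="s")

print(pd.DataFrame({"original": s, "period_up": period_up, "modulo_up": modulo_up}))
```

Under these assumptions, 00:00:00 becomes 00:01:00 with the Period method but stays 00:00:00 with the modulo method, while 00:00:28 becomes 00:01:00 with both.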

Jeff answered Oct 01 '22 10:10


Given a DataFrame with a column of dtype datetime64[ns], you could use

df['date'] += np.array(-df['date'].dt.second % 60, dtype='<m8[s]')

to add the appropriate number of seconds to obtain the ceiling.


For example,

import io
import sys
import numpy as np
import pandas as pd
StringIO = io.BytesIO if sys.version < '3' else io.StringIO

df = '''\
2015-03-20 00:00:00
2015-03-20 00:00:28
2015-03-20 00:01:44
2015-03-20 00:02:55
2015-03-20 00:03:39
2015-03-20 00:04:32
2015-03-20 00:05:52
2015-03-20 00:06:36
2015-03-20 00:07:44
2015-03-20 00:08:56
2015-03-20 00:09:47'''

df = pd.read_table(StringIO(df), sep=r'\s{2,}', engine='python',
                   header=None, parse_dates=[0], names=['date'])

df['date'] += np.array(-df['date'].dt.second % 60, dtype='<m8[s]')
print(df)

yields

                  date
0  2015-03-20 00:00:00
1  2015-03-20 00:01:00
2  2015-03-20 00:02:00
3  2015-03-20 00:03:00
4  2015-03-20 00:04:00
5  2015-03-20 00:05:00
6  2015-03-20 00:06:00
7  2015-03-20 00:07:00
8  2015-03-20 00:08:00
9  2015-03-20 00:09:00
10 2015-03-20 00:10:00

unutbu answered Oct 01 '22 12:10
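
Neither answer above mentions it, but pandas versions that provide Series.dt.ceil can likely do this in one call. The following is a minimal sketch under that assumption; the sample values are made up, and the third one includes a sub-second component, which the seconds-based arithmetic in the answers does not account for.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.Series([
        pd.Timestamp("2015-03-20 00:00:00"),      # already on a minute boundary
        pd.Timestamp("2015-03-20 00:00:28"),
        pd.Timestamp("2015-03-20 00:09:47.500"),  # has a fractional second
    ])
})

# Round each timestamp up to the next whole minute;
# values already on a boundary are left unchanged.
df["date"] = df["date"].dt.ceil("min")
print(df)
```

Passing a different frequency string (for example "h" for hours, if the installed pandas accepts that alias) would apply the same ceiling at another resolution.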