Remove 'seconds' and 'minutes' from a Pandas dataframe column

Given a dataframe like:

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Date': pd.date_range('1/1/2011', periods=5, freq='3675s'),
     'Num': np.random.rand(5)})
                 Date       Num
0 2011-01-01 00:00:00  0.580997
1 2011-01-01 01:01:15  0.407332
2 2011-01-01 02:02:30  0.786035
3 2011-01-01 03:03:45  0.821792
4 2011-01-01 04:05:00  0.807869

I would like to remove the 'minutes' and 'seconds' information.

The following (mostly stolen from: How to remove the 'seconds' of Pandas dataframe index?) works okay,

df = df.assign(Date = lambda x: pd.to_datetime(x['Date'].dt.strftime('%Y-%m-%d %H')))
                 Date       Num
0 2011-01-01 00:00:00  0.580997
1 2011-01-01 01:00:00  0.407332
2 2011-01-01 02:00:00  0.786035
3 2011-01-01 03:00:00  0.821792
4 2011-01-01 04:00:00  0.807869

but it feels strange to convert a datetime to a string then back to a datetime. Is there a way to do this more directly?

asked Apr 13 '17 19:04 by Dustin Helliwell



2 Answers

dt.round

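This is how it should be done: use dt.round. Note that round goes to the nearest hour, so 02:45:00 would become 03:00:00. To drop the minutes and seconds exactly as the strftime version does, use dt.floor('h') instead.

df.assign(Date=df.Date.dt.round('h'))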

                 Date       Num
0 2011-01-01 00:00:00  0.577957
1 2011-01-01 01:00:00  0.995748
2 2011-01-01 02:00:00  0.864013
3 2011-01-01 03:00:00  0.468762
4 2011-01-01 04:00:00  0.866827
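To see where round and floor differ, here is a small sketch that extends the question's timestamps with two hypothetical ones late in the hour and compares both methods against the question's string round trip. The explicit format argument to pd.to_datetime is an addition so that parsing does not depend on format inference. On the question's own five rows all three agree; only floor also matches the string approach at 02:45:00 and 05:59:59.

```python
import pandas as pd

# The question's timestamps plus two hypothetical ones late in the hour,
# where rounding and truncating give different results.
dates = pd.Series(pd.date_range('1/1/2011', periods=5, freq='3675s'))
late = pd.Series(pd.to_datetime(['2011-01-01 02:45:00', '2011-01-01 05:59:59']))
s = pd.concat([dates, late], ignore_index=True)

# The question's string round trip, with the format spelled out explicitly.
via_string = pd.to_datetime(s.dt.strftime('%Y-%m-%d %H'), format='%Y-%m-%d %H')

rounded = s.dt.round('h')  # nearest hour: 02:45:00 -> 03:00:00
floored = s.dt.floor('h')  # truncate:     02:45:00 -> 02:00:00

print(pd.DataFrame({'orig': s, 'round': rounded, 'floor': floored}))
```

In short, floor is the direct, string-free equivalent of the strftime version, while round only coincides with it when every timestamp falls in the first half of its hour.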

OLD ANSWER

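One approach is to set the index and use resample. This matches the desired output here only because each hour holds exactly one row: resample aggregates rows that fall in the same hour (last() keeps the final one) and inserts NaN rows for hours that have no data.

df.set_index('Date').resample('h').last().reset_index()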

                 Date       Num
0 2011-01-01 00:00:00  0.577957
1 2011-01-01 01:00:00  0.995748
2 2011-01-01 02:00:00  0.864013
3 2011-01-01 03:00:00  0.468762
4 2011-01-01 04:00:00  0.866827
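A quick sketch of that caveat, using a hypothetical frame with three readings in the 01:00 hour and none in the 02:00 hour: resample collapses the three rows into one and adds an empty 02:00 row, whereas flooring keeps every original row.

```python
import pandas as pd

# Hypothetical data: three readings in hour 01, none in hour 02.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2011-01-01 00:10', '2011-01-01 01:05',
                            '2011-01-01 01:30', '2011-01-01 01:50',
                            '2011-01-01 03:20']),
    'Num': [1.0, 2.0, 2.5, 3.0, 4.0],
})

# One row per hourly bin: hour 01 keeps only its last value (3.0),
# and the empty hour 02 shows up as a NaN row.
resampled = df.set_index('Date').resample('h').last().reset_index()

# Flooring only truncates the timestamps; all five rows survive.
floored = df.assign(Date=df['Date'].dt.floor('h'))

print(resampled)
print(floored)
```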

Another alternative is to extract the date and hour components and add them back together

df.assign(
    Date=pd.to_datetime(df.Date.dt.date) +
         pd.to_timedelta(df.Date.dt.hour, unit='h'))

                 Date       Num
0 2011-01-01 00:00:00  0.577957
1 2011-01-01 01:00:00  0.995748
2 2011-01-01 02:00:00  0.864013
3 2011-01-01 03:00:00  0.468762
4 2011-01-01 04:00:00  0.866827
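A variant of the same idea (not part of the original answer) swaps dt.date for dt.normalize(), which sets the time of day to midnight while keeping the datetime64 dtype, so there is no detour through Python date objects. For the naive timestamps in this example the result is the same as flooring to the hour.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': pd.date_range('1/1/2011', periods=5, freq='3675s'),
    'Num': np.random.rand(5),
})

# Midnight of each day, still datetime64, plus the hour as a timedelta.
midnight = df['Date'].dt.normalize()
rebuilt = midnight + pd.to_timedelta(df['Date'].dt.hour, unit='h')

# Check against flooring to the hour.
print((rebuilt == df['Date'].dt.floor('h')).all())
```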
answered Oct 19 '22 23:10 by piRSquared


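Another solution is to rebuild each timestamp from its year, month, day, and hour. It calls Python once per row, so it is slower than the vectorized dt methods on large frames:

from datetime import datetime

df['Date'] = pd.to_datetime(df['Date'])
df['Date'] = df['Date'].apply(lambda x: datetime(x.year, x.month, x.day, x.hour))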
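A closely related per-row sketch uses pandas' Timestamp.replace to zero out the sub-hour fields instead of constructing a new datetime. Unlike datetime(x.year, ...), which always produces a naive value, replace keeps any timezone information on the timestamp. It is still a row-by-row apply, so the vectorized dt.floor('h') remains the faster choice.

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.date_range('1/1/2011', periods=5, freq='3675s')})

# Zero the minute, second, and sub-second fields of each Timestamp;
# year, month, day, hour, and tzinfo are left untouched.
df['Date'] = df['Date'].apply(
    lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))

print(df)
```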
answered Oct 20 '22 00:10 by Kerem Tatlıcı