
Python Pandas remove date from timestamp

Tags: python, pandas

I have a large data set like this:

                         user  category
time                                   
2014-01-01 00:00:00  21155349         2                                 
2014-01-01 00:00:00  56347479         6                                      
2014-01-01 00:00:00  68429517        13
2014-01-01 00:00:00  39055685         4
2014-01-01 00:00:00    521325        13

I want to make it look like this:

              user  category
time
00:00:00  21155349         2
00:00:00  56347479         6
00:00:00  68429517        13
00:00:00  39055685         4
00:00:00    521325        13

How do I do this using pandas?

milinda49 asked Feb 22 '26 07:02



1 Answer

If you want to mutate a series (column) in pandas, the pattern is to apply a function to it (one that updates one element of the series at a time), and then to assign the resulting series back into the dataframe:

import io

import pandas

# load data
data = '''date,user,category
2014-01-01 00:00:00,  21155349,         2
2014-01-01 00:00:00,  56347479,         6
2014-01-01 00:00:00,  68429517,        13
2014-01-01 00:00:00,  39055685,         4
2014-01-01 00:00:00,    521325,        13'''
df = pandas.read_csv(io.StringIO(data), skipinitialspace=True)
df['date'] = pandas.to_datetime(df['date'])

# make the required change: keep only the time-of-day part
without_date = df['date'].apply(lambda d: d.time())
df['date'] = without_date

# display results
print(df)

If the problem is that the date is the index, you've got a few more hoops to jump through. Note that set_index returns a new dataframe, so the result has to be assigned back:

df = pandas.read_csv(io.StringIO(data), index_col='date', skipinitialspace=True)
ser = pandas.to_datetime(df.index).to_series()
df = df.set_index(ser.apply(lambda d: d.time()))

As suggested by @DSM, if you have pandas later than 0.15.2, you can use the .dt accessor on the series to do fast updates:

df = pandas.read_csv(io.StringIO(data), index_col='date', skipinitialspace=True)
ser = pandas.to_datetime(df.index).to_series()
df = df.set_index(ser.dt.time)
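
A self-contained sketch of that apply-and-assign pattern, written for Python 3 and a recent pandas (the trimmed sample and the variable names here are illustrative, not part of the original answer). It also shows what you end up with: each value becomes a plain datetime.time object, so the column is stored with object dtype rather than a pandas datetime dtype.

```python
import datetime
import io

import pandas as pd

# Sample rows taken from the question (trimmed to three rows)
data = """date,user,category
2014-01-01 00:00:00,21155349,2
2014-01-01 00:00:00,56347479,6
2014-01-01 00:00:00,68429517,13"""

df = pd.read_csv(io.StringIO(data))
df["date"] = pd.to_datetime(df["date"])

# Apply a function element by element, then assign the result back
df["date"] = df["date"].apply(lambda d: d.time())

# Each value is now a datetime.time held in an object-dtype column
first_value = df["date"].iloc[0]
print(type(first_value), df["date"].dtype)
```

Because the column no longer has a datetime dtype, datetime-specific operations on it will not be available afterwards; it may be worth keeping the original column around if you still need them.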
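
For the index case, a possible alternative (not the approach from the answer above, just a sketch assuming Python 3 and a recent pandas) is to skip the intermediate series and read the time-of-day values straight from the DatetimeIndex via its time attribute. The index name "time" is taken from the question; the other names are illustrative.

```python
import datetime
import io

import pandas as pd

# Sample rows taken from the question (trimmed to three rows)
data = """date,user,category
2014-01-01 00:00:00,21155349,2
2014-01-01 00:00:00,56347479,6
2014-01-01 00:00:00,68429517,13"""

df = pd.read_csv(io.StringIO(data), index_col="date")

# Convert the string index to a DatetimeIndex first
dt_index = pd.to_datetime(df.index)

# DatetimeIndex.time gives an array of datetime.time objects;
# wrap it in an Index and assign it directly, so nothing is discarded
df.index = pd.Index(dt_index.time, name="time")

print(df)
```

Assigning to df.index modifies the dataframe directly, which sidesteps the need to capture the return value of set_index.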
Andrew Walker answered Feb 24 '26 22:02
