Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas drops timestamp columns in resample

I have a DataFrame which has two Timestamp columns, one of which gets used as an index for resampling, the other remains as a column. If I resample the DataFrame, the one remaining as a column gets dropped. I would like it to be resampled as if it were a numeric column (which it is really):

import numpy as np
import pandas as pd
init_time=pd.date_range(start='2010-01-01 00:00', end='2010-01-05 00:00', freq='12H')
valid_time=pd.date_range(start='2010-01-01 00:00', end='2010-01-05 00:00', freq='12H')
data = np.random.random(len(valid_time))
frame = pd.DataFrame(index=valid_time, data=data)
frame['init_time'] = init_time

DataFrame has an index, 1 numeric column, and one Timestamp column

                     0          init_time
2010-01-01 00:00:00  0.869667   2010-01-01 00:00:00
2010-01-01 12:00:00  0.225805   2010-01-01 12:00:00
2010-01-02 00:00:00  0.348080   2010-01-02 00:00:00
2010-01-02 12:00:00  0.761399   2010-01-02 12:00:00
2010-01-03 00:00:00  0.645069   2010-01-03 00:00:00
2010-01-03 12:00:00  0.133111   2010-01-03 12:00:00
2010-01-04 00:00:00  0.314302   2010-01-04 00:00:00
2010-01-04 12:00:00  0.130491   2010-01-04 12:00:00
2010-01-05 00:00:00  0.621703   2010-01-05 00:00:00

Now resample to daily:

daily = frame.resample('D', how='mean')

             0
2010-01-01   0.547736
2010-01-02   0.554740
2010-01-03   0.389090
2010-01-04   0.222396
2010-01-05   0.621703

My init_time column has been dropped. I could convert it to a raw integer column first, but is there an easier way?

like image 935
samwise Avatar asked Nov 10 '22 12:11

samwise


1 Answers

This is not implemented for datelike (mean), but you can do it this way:

Get the numeric results

In [48]: numeric = frame.resample('D',how='mean')

Get the datelike results. First get the columns which are datelike

In [49]: datelike = frame.loc[:,frame.dtypes.isin([np.dtype('datetime64[ns]')])]

Then resample them; you need to view them as integers to compute on them; Timestamp will handle this float input (and essentially round to the nearest nanosecond)

In [50]: datelike = datelike.resample('D',
         how=lambda x: Timestamp(x.view('i8').mean()))

Put them back together

In [51]: concat([numeric,datelike],axis=1)
Out[51]: 
                   0           init_time
2010-01-01  0.798880 2010-01-01 06:00:00
2010-01-02  0.859781 2010-01-02 06:00:00
2010-01-03  0.515503 2010-01-03 06:00:00
2010-01-04  0.505557 2010-01-04 06:00:00
2010-01-05  0.979835 2010-01-05 00:00:00

[5 rows x 2 columns]
like image 153
Jeff Avatar answered Nov 14 '22 21:11

Jeff