pandas: changing the start and end date of a resampled time series

I have a time series that I resampled into the dataframe df shown below.

My data runs from 6 June to 28 June, and I want to extend it to cover 1 June to 30 June. The count column should be 0 throughout the extended period and keep my real values from the 6th to the 28th.

Out[123]: 
                         count
Timestamp                    
2009-06-07 02:00:00         1
2009-06-07 03:00:00         0
2009-06-07 04:00:00         0
2009-06-07 05:00:00         0
2009-06-07 06:00:00         0

I need to set the

start date: 2009-06-01 00:00:00

end date: 2009-06-30 23:00:00

so that the data would look something like this:

                         count
Timestamp                    
2009-06-01 01:00:00         0
2009-06-01 02:00:00         0
2009-06-01 03:00:00         0

Is there an efficient way to do this? The only way I can think of is not very efficient. I have been trying since yesterday. Please help.

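import numpy as np
import pandas as pd

# hourly index covering the whole of June 2009
index = pd.date_range('2009-06-01 00:00:00', '2009-06-30 23:00:00', freq='h')
# placeholder frame of zeros on the full index
zeros = pd.DataFrame(np.zeros((len(index), 1)), index=index, columns=['zeros'])
# df is the resampled frame shown above; align it with the full index
result = pd.concat([df, zeros], axis=1)
# hours missing from df become NaN in count, so fill them with 0
result = result.fillna(0)
# drop the placeholder column
del result['zeros']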
sparktime12 asked Aug 27 '17 18:08


1 Answer

You can create a new index with the desired start and end date/times, resample the time series data into hourly bins and aggregate by count, then reindex the result to the new index and fill the missing hours with 0.

import pandas as pd

# create the index with the start and end times you want
t_index = pd.DatetimeIndex(start='2009-06-01', end='2009-06-30 23:00:00', freq='1h')

# create the data frame
df = pd.DataFrame([['2009-06-07 02:07:42'],
                   ['2009-06-11 17:25:28'],
                   ['2009-06-11 17:50:42'],
                   ['2009-06-11 17:59:18']], columns=['daytime'])
df['daytime'] = pd.to_datetime(df['daytime'])

# resample the data to 1 hour, aggregate by counts,
# then reindex to t_index and fill the NaNs with 0
df2 = df.resample('1h', on='daytime').count().reindex(t_index).fillna(0)

UPDATE:

The pd.DatetimeIndex(start=..., end=..., freq=...) call used in the original answer has since been deprecated, so the first line of code needs to change, as suggested by @toni-penya-alba, to:

t_index = pd.DatetimeIndex(pd.date_range(start='2009-06-01', end='2009-06-30 23:00:00', freq="1h"))
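On recent pandas versions, a sketch along the following lines should give the same counts with pd.date_range on its own. It assumes the events can be held in a Series of ones, so that sum() acts as a count. The names events, s, hourly and full_index are illustrative and not part of the original answer. It also passes fill_value=0 to reindex instead of calling fillna(0), which keeps the counts as integers rather than floats.

```python
import pandas as pd

# sample event timestamps (same values as in the answer above)
events = pd.to_datetime(['2009-06-07 02:07:42',
                         '2009-06-11 17:25:28',
                         '2009-06-11 17:50:42',
                         '2009-06-11 17:59:18'])

# one row per event; the value 1 lets sum() act as a count
s = pd.Series(1, index=events, name='count')

# hourly counts over the observed range only
hourly = s.resample('h').sum()

# full hourly index for June 2009
full_index = pd.date_range(start='2009-06-01', end='2009-06-30 23:00:00', freq='h')

# extend to the full month; hours outside the data get 0
result = hourly.reindex(full_index, fill_value=0).to_frame()
```

If the data is already a resampled dataframe like the one in the question, the last step alone, df.reindex(full_index, fill_value=0), should be enough.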
James answered Oct 01 '22 21:10