Converting PANDAS dataframe from monthly to daily

Tags: python, pandas

I have a data frame with monthly data for 2014 for a series of 317 stock tickers (317 tickers x 12 months = 3,804 rows in DF). I would like to convert it to a daily dataframe (317 tickers x 365 days = 115,705 rows). So, I believe I need to upsample or reindex while spreading the monthly values over every day in the month, but I can't get it to work properly.

The dataframe is currently in this format:

>>> df
month    ticker   b    c
2014-1   AAU      10   .04     #different values every month for each ticker
2014-2   AAU      20   .03
2014-3   AAU      13   .06
.
2014-12  AAU      11   .03
.
.
.
2014-1   ZZY      11   .11
2014-2   ZZY      6    .03
.
2014-12  ZZY      17   .09

And this is what I'd like:

>>> df
day          ticker   b    c
2014-01-01   AAU      10   .04  #same values every day in month for each ticker
2014-01-02   AAU      10   .04
2014-01-03   AAU      10   .04
.
2014-01-31   AAU      10   .04
2014-02-01   AAU      20   .03
2014-02-02   AAU      20   .03
.
2014-02-28   AAU      20   .03
.
.
.
2014-12-30   ZZY      17   .09 
2014-12-31   ZZY      17   .09 

I have tried doing a groupby combined with resampling by day, but the updated dataframe will start with the date '2014-01-13' rather than January 1st, and end with '2014-12-01' rather than December 31st. I have also tried to change the month values from, for instance, '2014-1' to '2014-01-01', etc., but the resampled dataframe still ends on '2014-01-01'. There has to be an easier way to go about this, so I'd appreciate any help. I've been going around in circles all day on this.

Asked Apr 13 '15 at 18:04 by Gregory Saxton


1 Answer

First, parse the month strings into pandas timestamps:

df['month'] = pd.to_datetime(df['month'], format='%Y-%m')
#        month ticker   b     c
# 0 2014-01-01    AAU  10  0.04
# 1 2014-02-01    AAU  20  0.03
# 2 2014-03-01    AAU  13  0.06
# 3 2014-12-01    AAU  11  0.03
# 4 2014-01-01    ZZY  11  0.11
# 5 2014-02-01    ZZY   6  0.03
# 6 2014-12-01    ZZY  17  0.09

Next, pivot the DataFrame, using the month as the index and the ticker as a column level:

df = df.pivot(index='month', columns='ticker')
#              b         c      
# ticker     AAU ZZY   AAU   ZZY
# month                         
# 2014-01-01  10  11  0.04  0.11
# 2014-02-01  20   6  0.03  0.03
# 2014-03-01  13 NaN  0.06   NaN
# 2014-12-01  11  17  0.03  0.09

By pivoting now, we will be able to forward-fill each column more easily later.
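As a small illustration of what the pivot produces, here is a sketch on a made-up two-ticker, two-month frame (the data and the names long_df and wide are assumptions for the example, not the asker's real data). Each (value, ticker) pair becomes its own column, so a later fill along the index never mixes tickers:

```python
import pandas as pd

# Hypothetical miniature of the monthly data: one row per (month, ticker).
long_df = pd.DataFrame({
    "month": pd.to_datetime(["2014-01-01", "2014-02-01",
                             "2014-01-01", "2014-02-01"]),
    "ticker": ["AAU", "AAU", "ZZY", "ZZY"],
    "b": [10, 20, 11, 6],
    "c": [0.04, 0.03, 0.11, 0.03],
})

# With no `values` argument, every remaining column is pivoted and the
# result gets two column levels: (value name, ticker).
wide = long_df.pivot(index="month", columns="ticker")

print(wide.columns.tolist())
print(wide[("b", "ZZY")].tolist())
```

Note that pivot raises a ValueError if the same (month, ticker) pair appears more than once, so duplicates would need to be resolved first.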

Now find the start and end dates:

start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)

Note that adding pd.DateOffset(day=31) does not always produce a date that falls on day 31. The day is clipped to the length of the month, so for a date in February it returns the last day of February:

In [130]: pd.Timestamp('2014-2-28') + pd.DateOffset(day=31)
Out[130]: Timestamp('2014-02-28 00:00:00')

That's nice, since that means adding pd.DateOffset(day=31) will always give us the last valid day in the month.
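To check the clipping behaviour on a few more months, here is a short sketch (the sample dates and the names month_end and results are chosen only for illustration):

```python
import pandas as pd

# day=31 replaces the day-of-month and is clipped to the month's length.
month_end = pd.DateOffset(day=31)

results = {}
for first in ["2014-02-01", "2014-04-01", "2014-12-01", "2016-02-01"]:
    results[first] = pd.Timestamp(first) + month_end
    print(first, "->", results[first].date())
```

The leap-year case (2016-02-01) lands on February 29, so the same expression works without special-casing month lengths.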

Now we can reindex and forward-fill the DataFrame:

dates = pd.date_range(start_date, end_date, freq='D')
dates.name = 'date'
df = df.reindex(dates, method='ffill')

which yields

In [160]: df.head()
Out[160]: 
             b         c      
ticker     AAU ZZY   AAU   ZZY
date                          
2014-01-01  10  11  0.04  0.11
2014-01-02  10  11  0.04  0.11
2014-01-03  10  11  0.04  0.11
2014-01-04  10  11  0.04  0.11
2014-01-05  10  11  0.04  0.11

In [161]: df.tail()
Out[161]: 
             b         c      
ticker     AAU ZZY   AAU   ZZY
date                          
2014-12-27  11  17  0.03  0.09
2014-12-28  11  17  0.03  0.09
2014-12-29  11  17  0.03  0.09
2014-12-30  11  17  0.03  0.09
2014-12-31  11  17  0.03  0.09
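The reindex step can be seen in isolation on a tiny single-column frame. This is only a sketch with made-up values; monthly, days and daily are names introduced for the example:

```python
import pandas as pd

# Two monthly observations, each stamped on the first day of its month.
monthly = pd.DataFrame(
    {"b": [10, 20]},
    index=pd.to_datetime(["2014-01-01", "2014-02-01"]),
)

# The target index covers every day from Jan 1 through Feb 28.
days = pd.date_range("2014-01-01", "2014-02-28", freq="D")

# Each new day takes the value of the most recent earlier label.
daily = monthly.reindex(days, method="ffill")

print(len(daily))
print(daily.loc[pd.Timestamp("2014-01-31"), "b"])
print(daily.loc[pd.Timestamp("2014-02-01"), "b"])
```

This is also why the end date has to be pushed out to the last day of the final month: without it, the daily index would stop at the first day of that month.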

To move the ticker out of the column index and back into a column:

df = df.stack('ticker')
df = df.sort_index(level=1)  # sortlevel was removed from pandas; sort_index(level=...) replaces it
df = df.reset_index()
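Here is a sketch of that reshaping on a two-day wide frame, so the effect of each call is visible. The data is invented, and the example uses sort_index(level=1) on the assumption that the installed pandas no longer provides sortlevel:

```python
import pandas as pd

# Hypothetical wide frame: dates down the index, (value, ticker) across columns.
wide = pd.DataFrame(
    [[10, 11, 0.04, 0.11],
     [20, 6, 0.03, 0.03]],
    index=pd.DatetimeIndex(["2014-01-01", "2014-01-02"], name="date"),
    columns=pd.MultiIndex.from_product(
        [["b", "c"], ["AAU", "ZZY"]], names=[None, "ticker"]
    ),
)

long_df = (
    wide.stack("ticker")      # move the ticker level from columns into the index
        .sort_index(level=1)  # order by ticker first, then by date
        .reset_index()        # turn date and ticker back into ordinary columns
)

print(long_df)
```

Recent pandas releases may emit a FutureWarning about the stack implementation here; the resulting layout for this example is the same either way.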

So putting it all together:

import pandas as pd
df = pd.read_table('data', sep=r'\s+')  # raw string avoids an invalid-escape warning
df['month'] = pd.to_datetime(df['month'], format='%Y-%m')
df = df.pivot(index='month', columns='ticker')

start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)
dates = pd.date_range(start_date, end_date, freq='D')
dates.name = 'date'
df = df.reindex(dates, method='ffill')

df = df.stack('ticker')
df = df.sort_index(level=1)
df = df.reset_index()

yields

In [163]: df.head()
Out[163]: 
        date ticker   b     c
0 2014-01-01    AAU  10  0.04
1 2014-01-02    AAU  10  0.04
2 2014-01-03    AAU  10  0.04
3 2014-01-04    AAU  10  0.04
4 2014-01-05    AAU  10  0.04

In [164]: df.tail()
Out[164]: 
          date ticker   b     c
450 2014-12-27    ZZY  17  0.09
451 2014-12-28    ZZY  17  0.09
452 2014-12-29    ZZY  17  0.09
453 2014-12-30    ZZY  17  0.09
454 2014-12-31    ZZY  17  0.09
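For comparison, a different route that skips the pivot is to build a daily calendar and merge it onto the monthly rows by month. This is not the method used above, just a sketch under assumptions: the sample data, the zero-padded month strings, and the names monthly, days and daily are all made up for the example.

```python
import pandas as pd

# Hypothetical monthly data in long form.
monthly = pd.DataFrame({
    "month": ["2014-01", "2014-02", "2014-01", "2014-02"],
    "ticker": ["AAU", "AAU", "ZZY", "ZZY"],
    "b": [10, 20, 11, 6],
    "c": [0.04, 0.03, 0.11, 0.03],
})

# Represent each month as a monthly Period so it can serve as a join key.
monthly["month"] = (
    pd.to_datetime(monthly["month"], format="%Y-%m").dt.to_period("M")
)

# Daily calendar from the first day of the first month
# to the last day of the last month.
first = monthly["month"].min().start_time
last = monthly["month"].max().end_time.normalize()
days = pd.DataFrame({"date": pd.date_range(first, last, freq="D")})
days["month"] = days["date"].dt.to_period("M")

# Every monthly row is repeated once per day of its month.
daily = (
    monthly.merge(days, on="month")
           .sort_values(["ticker", "date"])
           .reset_index(drop=True)
)[["date", "ticker", "b", "c"]]

print(daily.head())
print(daily.tail())
```

One difference in behaviour: a ticker with no row for some month simply gets no days for that month here, whereas the reindex approach carries the last row forward across missing months.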

Answered Oct 21 '22 at 11:10 by unutbu