Converting PANDAS dataframe from monthly to daily

Tags: python, pandas

I have a data frame with monthly data for 2014 for a series of 317 stock tickers (317 tickers x 12 months = 3,804 rows in DF). I would like to convert it to a daily dataframe (317 tickers x 365 days = 115,705 rows). So, I believe I need to upsample or reindex while spreading the monthly values over every day in the month, but I can't get it to work properly.

The dataframe is currently in this format:

>>> df
month    ticker   b    c
2014-1   AAU      10   .04     #different values every month for each ticker
2014-2   AAU      20   .03
2014-3   AAU      13   .06
.
2014-12  AAU      11   .03
.
.
.
2014-1   ZZY      11   .11
2014-2   ZZY      6    .03
.
2014-12  ZZY      17   .09

And this is what I'd like:

>>> df
day          ticker   b    c
2014-01-01   AAU      10   .04  #same values every day in month for each ticker
2014-01-02   AAU      10   .04
2014-01-03   AAU      10   .04
.
2014-01-31   AAU      10   .04
2014-02-01   AAU      20   .03
2014-02-02   AAU      20   .03
.
2014-02-28   AAU      20   .03
.
.
.
2014-12-30   ZZY      17   .09 
2014-12-31   ZZY      17   .09

I have tried doing a groupby combined with resampling by day, but the updated dataframe will start with the date '2014-01-13' rather than January 1st, and end with '2014-12-01' rather than December 31st. I have also tried to change the month values from, for instance, '2014-1' to '2014-01-01', etc., but the resampled dataframe still ends on '2014-01-01'. There has to be an easier way to go about this, so I'd appreciate any help. I've been going around in circles all day on this.


asked Apr 13 '15 18:04

Gregory Saxton

1 Answer

First, parse the month strings into pandas timestamps:

df['month'] = pd.to_datetime(df['month'], format='%Y-%m')
#        month ticker   b     c
# 0 2014-01-01    AAU  10  0.04
# 1 2014-02-01    AAU  20  0.03
# 2 2014-03-01    AAU  13  0.06
# 3 2014-12-01    AAU  11  0.03
# 4 2014-01-01    ZZY  11  0.11
# 5 2014-02-01    ZZY   6  0.03
# 6 2014-12-01    ZZY  17  0.09

Next, pivot the DataFrame, using the month as the index and the ticker as a column level:

df = df.pivot(index='month', columns='ticker')
#              b         c      
# ticker     AAU ZZY   AAU   ZZY
# month                         
# 2014-01-01  10  11  0.04  0.11
# 2014-02-01  20   6  0.03  0.03
# 2014-03-01  13 NaN  0.06   NaN
# 2014-12-01  11  17  0.03  0.09

By pivoting now, we will be able to forward-fill each column more easily later.

Now find the start and end dates:

start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)

Note that adding pd.DateOffset(day=31) does not always produce a date that falls on day 31. The offset sets the day of the month and clamps it to the last valid day, so if the month is February, adding pd.DateOffset(day=31) returns the last day in February:

In [130]: pd.Timestamp('2014-2-28') + pd.DateOffset(day=31)
Out[130]: Timestamp('2014-02-28 00:00:00')

That's nice, since that means adding pd.DateOffset(day=31) will always give us the last valid day in the month.
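As a quick check of that behaviour, here is a minimal sketch that applies the offset to a few month starts, including a leap-year February. The sample dates are my own, and pd.offsets.MonthEnd(0) is shown only as a more explicit alternative, not as part of the original answer.

```python
import pandas as pd

# DateOffset(day=...) replaces the day-of-month rather than shifting by days;
# a value past the end of the month is clamped to the last valid day.
month_starts = pd.to_datetime(
    ['2014-01-01', '2014-02-01', '2014-04-01', '2016-02-01']
)
month_ends = [ts + pd.DateOffset(day=31) for ts in month_starts]

for start, end in zip(month_starts, month_ends):
    print(start.date(), '->', end.date())

# Alternative (an assumption about what reads more clearly, not the answer's
# method): MonthEnd(0) rolls a date forward to the end of its own month.
explicit_ends = [ts + pd.offsets.MonthEnd(0) for ts in month_starts]
```

Both spellings should agree for any date, so which one to use is mostly a matter of readability.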

Now we can reindex and forward-fill the DataFrame:

dates = pd.date_range(start_date, end_date, freq='D')
dates.name = 'date'
df = df.reindex(dates, method='ffill')

which yields

In [160]: df.head()
Out[160]: 
             b         c      
ticker     AAU ZZY   AAU   ZZY
date                          
2014-01-01  10  11  0.04  0.11
2014-01-02  10  11  0.04  0.11
2014-01-03  10  11  0.04  0.11
2014-01-04  10  11  0.04  0.11
2014-01-05  10  11  0.04  0.11

In [161]: df.tail()
Out[161]: 
             b         c      
ticker     AAU ZZY   AAU   ZZY
date                          
2014-12-27  11  17  0.03  0.09
2014-12-28  11  17  0.03  0.09
2014-12-29  11  17  0.03  0.09
2014-12-30  11  17  0.03  0.09
2014-12-31  11  17  0.03  0.09
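One caveat worth illustrating: reindex(method='ffill') decides what to copy by comparing index labels only, so a NaN already stored at an existing label (such as ZZY in March in the sample above) is copied forward rather than filled. The sketch below uses a small made-up frame to show this, plus an optional follow-up .ffill() for cases where carrying the last known value forward is actually what you want. Whether that is appropriate depends on your data.

```python
import numpy as np
import pandas as pd

# Toy monthly frame (illustrative values only); ZZY has no March observation.
monthly = pd.DataFrame(
    {'AAU': [10.0, 20.0, 13.0], 'ZZY': [11.0, 6.0, np.nan]},
    index=pd.to_datetime(['2014-01-01', '2014-02-01', '2014-03-01']),
)
days = pd.date_range('2014-01-01', '2014-03-31', freq='D')

# Each new date takes the row of the most recent original label, so the NaN
# stored for ZZY at 2014-03-01 is repeated for every day in March.
daily = monthly.reindex(days, method='ffill')

# Optional: fill remaining NaNs from the previous row (February's value here).
daily_filled = daily.ffill()

print(daily.loc['2014-03-15'])
print(daily_filled.loc['2014-03-15'])
```

If every ticker has a row for every month, as in the question, the two results are identical and the extra step is unnecessary.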

To move the ticker out of the column index and back into a column:

df = df.stack('ticker')
df = df.sort_index(level=1)  # sortlevel() was removed in pandas 1.0
df = df.reset_index()
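To see what those three calls do in isolation, here is a small sketch on a hand-built two-day frame with the same column layout. The frame and the names wide and long are illustrative, and it assumes a recent pandas where sort_index(level=...) is the way to sort by one level of a MultiIndex.

```python
import pandas as pd

# Two days of wide data: the outer column level is the measure (b, c),
# the inner level is the ticker.
wide = pd.DataFrame(
    [[10, 11, 0.04, 0.11], [10, 11, 0.04, 0.11]],
    index=pd.DatetimeIndex(['2014-01-01', '2014-01-02'], name='date'),
    columns=pd.MultiIndex.from_product(
        [['b', 'c'], ['AAU', 'ZZY']], names=[None, 'ticker']
    ),
)

# stack() moves the 'ticker' column level into the row index,
# giving a (date, ticker) MultiIndex with plain columns b and c.
long = wide.stack('ticker')

# Sort by ticker first, then by date within each ticker.
long = long.sort_index(level='ticker')

# Turn both index levels back into ordinary columns.
long = long.reset_index()
print(long)
```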

So putting it all together:

import pandas as pd
df = pd.read_table('data', sep=r'\s+')  # raw string avoids an invalid-escape warning
df['month'] = pd.to_datetime(df['month'], format='%Y-%m')
df = df.pivot(index='month', columns='ticker')

start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)
dates = pd.date_range(start_date, end_date, freq='D')
dates.name = 'date'
df = df.reindex(dates, method='ffill')

df = df.stack('ticker')
df = df.sort_index(level=1)  # sortlevel() was removed in pandas 1.0
df = df.reset_index()

yields

In [163]: df.head()
Out[163]: 
        date ticker   b     c
0 2014-01-01    AAU  10  0.04
1 2014-01-02    AAU  10  0.04
2 2014-01-03    AAU  10  0.04
3 2014-01-04    AAU  10  0.04
4 2014-01-05    AAU  10  0.04

In [164]: df.tail()
Out[164]: 
          date ticker   b     c
450 2014-12-27    ZZY  17  0.09
451 2014-12-28    ZZY  17  0.09
452 2014-12-29    ZZY  17  0.09
453 2014-12-30    ZZY  17  0.09
454 2014-12-31    ZZY  17  0.09
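For readers who want to run this without a data file, the following is a self-contained sketch of the same pipeline wrapped in a function. The function name monthly_to_daily, the inline SAMPLE text, and the explicit dropna(how='all') are my own additions: the dropna is there because newer pandas versions may keep all-NaN rows when stacking, whereas the output above reflects them being dropped.

```python
import io

import pandas as pd

# Abbreviated sample in the question's layout (not the real 317-ticker data).
SAMPLE = """\
month ticker b c
2014-1 AAU 10 .04
2014-2 AAU 20 .03
2014-3 AAU 13 .06
2014-12 AAU 11 .03
2014-1 ZZY 11 .11
2014-2 ZZY 6 .03
2014-12 ZZY 17 .09
"""


def monthly_to_daily(monthly):
    """Hypothetical helper: repeat each monthly row for every day it covers."""
    wide = monthly.copy()
    wide['month'] = pd.to_datetime(wide['month'], format='%Y-%m')
    wide = wide.pivot(index='month', columns='ticker')

    # Parsing with '%Y-%m' already yields the first day of each month.
    start = wide.index.min()
    end = wide.index.max() + pd.DateOffset(day=31)  # last day of final month
    days = pd.date_range(start, end, freq='D', name='date')
    wide = wide.reindex(days, method='ffill')

    long = wide.stack('ticker')
    long = long.dropna(how='all')  # drop (date, ticker) pairs with no data
    long = long.sort_index(level='ticker')
    return long.reset_index()


df = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s+')
daily = monthly_to_daily(df)
print(daily.head())
print(daily.tail())
```

Because pivoting introduces NaN for missing ticker-month pairs, the b column comes back as float rather than int; cast it afterwards if integer values matter.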

answered Oct 21 '22 11:10

unutbu
