I have a data frame with monthly data for 2014 for a series of 317 stock tickers (317 tickers x 12 months = 3,804 rows in DF). I would like to convert it to a daily dataframe (317 tickers x 365 days = 115,705 rows). So, I believe I need to upsample or reindex while spreading the monthly values over every day in the month, but I can't get it to work properly.
The dataframe is currently in this format:
>>> df
month ticker b c
2014-1 AAU 10 .04 #different values every month for each ticker
2014-2 AAU 20 .03
2014-3 AAU 13 .06
.
2014-12 AAU 11 .03
.
.
.
2014-1 ZZY 11 .11
2014-2 ZZY 6 .03
.
2014-12 ZZY 17 .09
And this is what I'd like:
>>> df
day ticker b c
2014-01-01 AAU 10 .04 #same values every day in month for each ticker
2014-01-02 AAU 10 .04
2014-01-03 AAU 10 .04
.
2014-01-31 AAU 10 .04
2014-02-01 AAU 20 .03
2014-02-02 AAU 20 .03
.
2014-02-28 AAU 20 .03
.
.
.
2014-12-30 ZZY 17 .09
2014-12-31 ZZY 17 .09
I have tried doing a groupby combined with resampling by day, but the updated dataframe will start with the date '2014-01-13' rather than January 1st, and end with '2014-12-01' rather than December 31st. I have also tried to change the month values from, for instance, '2014-1' to '2014-01-01', etc., but the resampled dataframe still ends on '2014-01-01'. There has to be an easier way to go about this, so I'd appreciate any help. I've been going around in circles all day on this.
You can also convert to month just by using “m” instead of “w”. For Eg.:df. resample(“m”). mean() .
Resample Hourly Data to Daily Dataresample() method. To aggregate or temporal resample the data for a time period, you can take all of the values for each day and summarize them. In this case, you want total daily rainfall, so you will use the resample() method together with . sum() .
Hence Pandas provides a method called to_datetime() to convert strings into Timestamp objects. Once we convert a date in string format into a date time object, it is easy to get the day of the week using the method day_name() on the Timestamp object created.
To find the quarter for each monthly period, simply use the following formula: =ROUNDUP(Month/3,0).
First, parse the month-datestrings into Pandas timestamps:
df['month'] = pd.to_datetime(df['month'], format='%Y-%m')
# month ticker b c
# 0 2014-01-01 AAU 10 0.04
# 1 2014-02-01 AAU 20 0.03
# 2 2014-03-01 AAU 13 0.06
# 3 2014-12-01 AAU 11 0.03
# 4 2014-01-01 ZZY 11 0.11
# 5 2014-02-01 ZZY 6 0.03
# 6 2014-12-01 ZZY 17 0.09
Next, pivot the DataFrame, using the month as the index and the ticker as a column level:
df = df.pivot(index='month', columns='ticker')
# b c
# ticker AAU ZZY AAU ZZY
# month
# 2014-01-01 10 11 0.04 0.11
# 2014-02-01 20 6 0.03 0.03
# 2014-03-01 13 NaN 0.06 NaN
# 2014-12-01 11 17 0.03 0.09
By pivoting now, we will be able to forward-fill each column more easily later.
Now find the start and end dates:
start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)
Interestingly, note that adding pd.DateOffset(day=31)
will not always result in a date that ends on day 31. If the month is February, adding pd.DateOffset(day=31)
returns the last day in February:
In [130]: pd.Timestamp('2014-2-28') + pd.DateOffset(day=31)
Out[130]: Timestamp('2014-02-28 00:00:00')
That's nice, since that means adding pd.DateOffset(day=31)
will always give us the last valid day in the month.
Now we can reindex and forward-fill the DataFrame:
dates = pd.date_range(start_date, end_date, freq='D')
dates.name = 'date'
df = df.reindex(dates, method='ffill')
which yields
In [160]: df.head()
Out[160]:
b c
ticker AAU ZZY AAU ZZY
date
2014-01-01 10 11 0.04 0.11
2014-01-02 10 11 0.04 0.11
2014-01-03 10 11 0.04 0.11
2014-01-04 10 11 0.04 0.11
2014-01-05 10 11 0.04 0.11
In [161]: df.tail()
Out[161]:
b c
ticker AAU ZZY AAU ZZY
date
2014-12-27 11 17 0.03 0.09
2014-12-28 11 17 0.03 0.09
2014-12-29 11 17 0.03 0.09
2014-12-30 11 17 0.03 0.09
2014-12-31 11 17 0.03 0.09
To move the ticker out of the column index and back into a column:
df = df.stack('ticker')
df = df.sortlevel(level=1)
df = df.reset_index()
So putting it all together:
import pandas as pd
df = pd.read_table('data', sep='\s+')
df['month'] = pd.to_datetime(df['month'], format='%Y-%m')
df = df.pivot(index='month', columns='ticker')
start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)
dates = pd.date_range(start_date, end_date, freq='D')
dates.name = 'date'
df = df.reindex(dates, method='ffill')
df = df.stack('ticker')
df = df.sortlevel(level=1)
df = df.reset_index()
yields
In [163]: df.head()
Out[163]:
date ticker b c
0 2014-01-01 AAU 10 0.04
1 2014-01-02 AAU 10 0.04
2 2014-01-03 AAU 10 0.04
3 2014-01-04 AAU 10 0.04
4 2014-01-05 AAU 10 0.04
In [164]: df.tail()
Out[164]:
date ticker b c
450 2014-12-27 ZZY 17 0.09
451 2014-12-28 ZZY 17 0.09
452 2014-12-29 ZZY 17 0.09
453 2014-12-30 ZZY 17 0.09
454 2014-12-31 ZZY 17 0.09
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With