I know that it is possible to offset with the periods argument, but how would one go about computing monthly returns from daily price data that is spread throughout a month (trading days, for example)?
Example data is:
In [1]: df.AAPL
2009-01-02 16:00:00 90.36
2009-01-05 16:00:00 94.18
2009-01-06 16:00:00 92.62
2009-01-07 16:00:00 90.62
2009-01-08 16:00:00 92.30
2009-01-09 16:00:00 90.19
2009-01-12 16:00:00 88.28
2009-01-13 16:00:00 87.34
2009-01-14 16:00:00 84.97
2009-01-15 16:00:00 83.02
2009-01-16 16:00:00 81.98
2009-01-20 16:00:00 77.87
2009-01-21 16:00:00 82.48
2009-01-22 16:00:00 87.98
2009-01-23 16:00:00 87.98
...
2009-12-10 16:00:00 195.59
2009-12-11 16:00:00 193.84
2009-12-14 16:00:00 196.14
2009-12-15 16:00:00 193.34
2009-12-16 16:00:00 194.20
2009-12-17 16:00:00 191.04
2009-12-18 16:00:00 194.59
2009-12-21 16:00:00 197.38
2009-12-22 16:00:00 199.50
2009-12-23 16:00:00 201.24
2009-12-24 16:00:00 208.15
2009-12-28 16:00:00 210.71
2009-12-29 16:00:00 208.21
2009-12-30 16:00:00 210.74
2009-12-31 16:00:00 209.83
Name: AAPL, Length: 252
As you can see, simply offsetting by 30 would not produce correct results, as there are gaps in the timestamp data, not every month is 30 days, etc. I know there must be an easy way to do this using pandas.
The pct_change() method returns the percentage change between each value and a prior one. By default it compares each row with the immediately preceding row; the periods parameter specifies which earlier row to compare with instead.
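As a small illustration of the periods parameter, here is a sketch using the first four prices from the question. The manual_one and manual_two names are only there for comparison and are not part of pandas; they spell out the same calculation with shift().

```python
import pandas as pd

# First four closing prices from the question's sample
prices = pd.Series(
    [90.36, 94.18, 92.62, 90.62],
    index=pd.to_datetime(["2009-01-02", "2009-01-05", "2009-01-06", "2009-01-07"]),
)

# Default: change relative to the immediately preceding row
one_row = prices.pct_change()

# periods=2: change relative to the value two rows earlier
two_rows = prices.pct_change(periods=2)

# The same quantities written out with shift() (illustrative names)
manual_one = prices / prices.shift(1) - 1
manual_two = prices / prices.shift(2) - 1

print(one_row)
print(two_rows)
```

Note that periods counts rows, not calendar days, which is why it cannot express "one month" on trading-day data.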
You can resample the data to business month. If you don't want the mean price (which is the default in resample) you can use a custom resample method using the keyword argument how:
In [31]: from pandas.io import data as web
# read some example data, note that this is not exactly your data!
In [32]: s = web.get_data_yahoo('AAPL', start='2009-01-02',
... end='2009-12-31')['Adj Close']
# resample to business month and return the last value in the period
In [34]: monthly = s.resample('BM', how=lambda x: x[-1])
In [35]: monthly
Out[35]:
Date
2009-01-30 89.34
2009-02-27 88.52
2009-03-31 104.19
...
2009-10-30 186.84
2009-11-30 198.15
2009-12-31 208.88
Freq: BM
In [36]: monthly.pct_change()
Out[36]:
Date
2009-01-30 NaN
2009-02-27 -0.009178
2009-03-31 0.177022
...
2009-10-30 0.016982
2009-11-30 0.060533
2009-12-31 0.054151
Freq: BM
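The session above reflects the pandas API at the time it was written. In more recent pandas releases the how keyword of resample is no longer available and the aggregation is chained as a method instead, and pandas.io.data has moved to the separate pandas-datareader package. A minimal sketch of the same idea, assuming a reasonably recent pandas and using made-up prices rather than real AAPL data:

```python
import pandas as pd

# Synthetic daily prices on business days in 2009 (made-up values, not AAPL data)
idx = pd.bdate_range("2009-01-01", "2009-12-31")
prices = pd.Series([90.0 + 0.5 * i for i in range(len(idx))], index=idx)

# Last observed price in each business month; passing the offset object
# avoids depending on a particular string alias for the frequency
monthly = prices.resample(pd.offsets.BMonthEnd()).last()

# Month-over-month returns; the first entry is NaN because there is no prior month
monthly_returns = monthly.pct_change()

print(monthly_returns)
```

Taking the last price of each month and then calling pct_change() gives close-to-close monthly returns regardless of how many trading days each month contains.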
I stumbled on this error as well while using the pct_change function, and would like to offer my two cents on this question.
The freq argument of the pct_change function seems to accept only fixed-period time offsets, such as "2D" and "3D". However, "M" is not a fixed period: a month can last anywhere between 28 and 31 days. So that's where the errors come from.
pct_change operates similarly to the rolling() function, and using the "M" time offset with rolling() would produce the same error.
Here is a working example using the freq argument of pct_change:
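import pandas_datareader.data as web
# `returns` is assumed to be a daily Series loaded beforehand (loading step not shown);
# the name `return` cannot be used because it is a reserved word in Python
returns.pct_change(periods=1, freq='2D')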
Date
2008-03-26 NaN
2008-03-27 NaN
2008-03-28 -0.010342
2008-03-31 NaN
2008-04-01 NaN
...
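To see why most rows in that output are NaN, here is a small sketch on made-up prices over the same five dates (the values are invented for illustration). With freq='2D', each row is compared with the value stamped exactly two calendar days earlier, so only rows whose counterpart exists in the index get a number.

```python
import pandas as pd

# Five business days starting Wednesday 2008-03-26 (the weekend is skipped)
idx = pd.bdate_range("2008-03-26", periods=5)
prices = pd.Series([100.0, 101.0, 102.0, 103.0, 104.0], index=idx)

# Compare each value with the one dated two calendar days before it
change_2d = prices.pct_change(periods=1, freq="2D")

print(change_2d)
```

Only 2008-03-28 has a value two calendar days earlier (2008-03-26) in the index, which matches the pattern of the output above. For monthly returns on trading-day data, resampling to month end first is therefore the more practical route.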