I have a dataframe with monthly financial data:
In [89]: vfiax_monthly.head()
Out[89]:
            year  month  day       d   open  close   high    low  volume  aclose
2003-01-31  2003      1   31  731246  64.95  64.95  64.95  64.95       0   64.95
2003-02-28  2003      2   28  731274  63.98  63.98  63.98  63.98       0   63.98
2003-03-31  2003      3   31  731305  64.59  64.59  64.59  64.59       0   64.59
2003-04-30  2003      4   30  731335  69.93  69.93  69.93  69.93       0   69.93
2003-05-30  2003      5   30  731365  73.61  73.61  73.61  73.61       0   73.61
I'm trying to calculate the returns like this:
In [90]: returns = (vfiax_monthly.open[1:] - vfiax_monthly.open[:-1])/vfiax_monthly.open[1:]
But I'm getting only zeroes:
In [91]: returns.head()
Out[91]:
2003-01-31   NaN
2003-02-28     0
2003-03-31     0
2003-04-30     0
2003-05-30     0
Freq: BM, Name: open
I think that's because the arithmetic operations get aligned on the index, and that makes the [1:] and [:-1] useless.
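To illustrate the alignment behaviour described above, here is a minimal sketch. The series `open_` is a hypothetical stand-in built from the first few `open` values shown earlier; it is not the real `vfiax_monthly` frame.

```python
import pandas as pd

# Hypothetical sample mirroring the first few 'open' values from the question.
idx = pd.to_datetime(["2003-01-31", "2003-02-28", "2003-03-31", "2003-04-30"])
open_ = pd.Series([64.95, 63.98, 64.59, 69.93], index=idx)

# Both slices keep their original date labels, so the subtraction pairs
# each date with itself. Dates present in only one slice become NaN.
aligned_diff = open_.iloc[1:] - open_.iloc[:-1]

# Dropping the labels (plain NumPy arrays) pairs the values by position instead.
positional_diff = open_.iloc[1:].to_numpy() - open_.iloc[:-1].to_numpy()

print(aligned_diff)
print(positional_diff)
```

The aligned result contains only zeros and NaN, which matches the symptom in the question, while the positional result holds the actual month-to-month differences.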
My workaround is:
In [103]: returns = (vfiax_monthly.open[1:].values - vfiax_monthly.open[:-1].values)/vfiax_monthly.open[1:].values
In [104]: returns = pd.Series(returns, index=vfiax_monthly.index[1:])
In [105]: returns.head()
Out[105]:
2003-02-28   -0.015161
2003-03-31    0.009444
2003-04-30    0.076362
2003-05-30    0.049993
2003-06-30    0.012477
Freq: BM
Is there a better way to calculate the returns? I don't like the conversion to array and then back to Series.
Instead of slicing, use .shift to move the index position of values in a DataFrame/Series. For example:
returns = (vfiax_monthly.open - vfiax_monthly.open.shift(1))/vfiax_monthly.open.shift(1)
This is what pct_change is doing under the bonnet. You can also use shift in other calculations, e.g.:
(3*vfiax_monthly.open + 2*vfiax_monthly.open.shift(1))/5
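As a quick check of the two expressions above, the following sketch runs them on a small hypothetical series (`open_`, built from the first few `open` values in the question) and compares the shift-based returns with `pct_change`. Note that this formula divides by the previous value, whereas the workaround in the question divides by the current one, so the numbers differ slightly.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; stands in for vfiax_monthly.open.
idx = pd.to_datetime(["2003-01-31", "2003-02-28", "2003-03-31", "2003-04-30"])
open_ = pd.Series([64.95, 63.98, 64.59, 69.93], index=idx)

# shift(1) moves each value one row down, so prev holds the prior month's
# value under the current month's label (the first row becomes NaN).
prev = open_.shift(1)
returns = (open_ - prev) / prev

# The shift-based returns agree with pct_change up to floating-point error.
matches = np.allclose(returns.dropna(), open_.pct_change().dropna())
print(matches)

# The weighted combination from the text: 3 parts current, 2 parts previous.
weighted = (3 * open_ + 2 * prev) / 5
print(weighted)
```

Because both operands share the same index, the arithmetic lines up row by row without converting to arrays.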
You might also want to look into the rolling and window functions for other types of analysis of financial data.
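As one possible illustration of the rolling functions, the sketch below computes a 3-month rolling mean and standard deviation of returns. The data and the window length of 3 are assumptions chosen only for the example.

```python
import pandas as pd

# Hypothetical sample; stands in for vfiax_monthly.open.
idx = pd.to_datetime(
    ["2003-01-31", "2003-02-28", "2003-03-31", "2003-04-30", "2003-05-30"]
)
open_ = pd.Series([64.95, 63.98, 64.59, 69.93, 73.61], index=idx)

returns = open_.pct_change()

# A window of 3 rows; by default a result is produced only once the
# window holds 3 non-NaN observations.
rolling_mean = returns.rolling(window=3).mean()
rolling_std = returns.rolling(window=3).std()

print(rolling_mean)
print(rolling_std)
```

The first return is NaN, so the first complete window ends at the fourth row.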
The easiest way to do this is to use the DataFrame.pct_change() method.
Here is a quick example:
In[1]: aapl = get_data_yahoo('aapl', start='11/1/2012', end='11/13/2012')
In[2]: aapl
Out[2]:
              Open    High     Low   Close    Volume  Adj Close
Date
2012-11-01  598.22  603.00  594.17  596.54  12903500     593.83
2012-11-02  595.89  596.95  574.75  576.80  21406200     574.18
2012-11-05  583.52  587.77  577.60  584.62  18897700     581.96
2012-11-06  590.23  590.74  580.09  582.85  13389900     580.20
2012-11-07  573.84  574.54  555.75  558.00  28344600     558.00
2012-11-08  560.63  562.23  535.29  537.75  37719500     537.75
2012-11-09  540.42  554.88  533.72  547.06  33211200     547.06
2012-11-12  554.15  554.50  538.65  542.83  18421500     542.83
2012-11-13  538.91  550.48  536.36  542.90  19033900     542.90
In[3]: aapl.pct_change()
Out[3]:
                Open      High       Low     Close    Volume  Adj Close
Date
2012-11-01       NaN       NaN       NaN       NaN       NaN        NaN
2012-11-02 -0.003895 -0.010033 -0.032684 -0.033091  0.658945  -0.033090
2012-11-05 -0.020759 -0.015378  0.004959  0.013558 -0.117186   0.013550
2012-11-06  0.011499  0.005053  0.004311 -0.003028 -0.291453  -0.003024
2012-11-07 -0.027769 -0.027423 -0.041959 -0.042635  1.116864  -0.038263
2012-11-08 -0.023020 -0.021426 -0.036815 -0.036290  0.330747  -0.036290
2012-11-09 -0.036049 -0.013073 -0.002933  0.017313 -0.119522   0.017313
2012-11-12  0.025406 -0.000685  0.009237 -0.007732 -0.445323  -0.007732
2012-11-13 -0.027502 -0.007250 -0.004251  0.000129  0.033244   0.000129
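The session above needs a live data source, so here is an offline sketch of the same call. It assumes a small frame typed in by hand from the first three rows of the Open and Close columns shown above; the variable name `prices` is hypothetical.

```python
import pandas as pd

# First three rows of Open and Close, copied from the output above.
prices = pd.DataFrame(
    {
        "Open": [598.22, 595.89, 583.52],
        "Close": [596.54, 576.80, 584.62],
    },
    index=pd.to_datetime(["2012-11-01", "2012-11-02", "2012-11-05"]),
)

# pct_change works column by column: each row is divided by the row
# before it, minus 1. The first row has no predecessor, so it is NaN.
changes = prices.pct_change()
print(changes)
```

The same method is available on a single column, e.g. `prices["Open"].pct_change()`.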