I have a dataframe such as the following:
Index Return
2008-11-21 0.153419
2008-11-24 0.037421
2008-11-25 0.077500
What's the best way to calculate a cumulative return across all columns on the last row?
Following is the intended result:
Index Return
2008-11-21 0.153419
2008-11-24 0.037421
2008-11-25 0.077500
Cumulative 0.289316
Where cumulative return calculated as follows:
cumulative = (1 + return1) * (1 + return2) * (1 + return3) - 1
What is the best way to perform this in pandas?
there is a pandas cumprod()
method for that. this will work for every columns.
df.ix["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]
this would be about 2 time faster than other solutions on large dataset:
In[106]: %timeit df.ix["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]
10 loops, best of 3: 18.4 ms per loop
In[107]: %timeit df.ix['Cummulative'] = df.apply(lambda x: (x+1).prod()-1)
10 loops, best of 3: 32.9 ms per loop
In[110]: %timeit df.append(df.iloc[:,1:].apply(lambda col: (col + 1).prod() - 1), ignore_index=True)
10 loops, best of 3: 37.1 ms per loop
In[113]: %timeit df.append(df.apply(lambda col: prod([(1+c) for c in col]) - 1), ignore_index=True)
1 loop, best of 3: 262 ms per loop
I would suggest to never use apply if you can find a built-in method since apply is looping over the dataframe which makes it slow. Bult-in method are highly efficient and normally there is no way you are going to get faster than them using apply.
Another solution:
df.ix["Cumulative"] = (df['Return']+1).prod() - 1
This will add 1 to the df['Return']
column, multiply all the rows together, and then subtract one from the result. This will result in a simple float value. The result will then be placed at the index "Cumulative". Since that index doesn't exist yet, it will be appended to the end of the DataFrame:
Return
2008-11-21 0.153419
2008-11-25 0.077500
2008-11-24 0.037421
Cummulative 0.289316
If you want to apply this across multiple columns:
df.ix['Cummulative'] = df.apply(lambda x: (x+1).prod()-1)
This would output the following (I made a second column called "Return2" that is a copy of "Return"):
Return Return2
2008-11-21 0.153419 0.153419
2008-11-25 0.077500 0.077500
2008-11-24 0.037421 0.037421
Cummulative 0.289316 0.289316
With pandas
, you can use the prod()
method:
df.append(df.iloc[:,1:].apply(lambda col: (col + 1).prod() - 1), ignore_index=True)
# Index Return
#0 2008-11-21 0.153419
#1 2008-11-24 0.037421
#2 2008-11-25 0.077500
#3 NaN 0.289316
Or as @Randy C commented, this can be further simplified to:
df.append((df.iloc[:,1:] + 1).prod() - 1, ignore_index=True)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With