Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: Cumulative return function

I have a dataframe such as the following:

  Index      Return
2008-11-21   0.153419
2008-11-24   0.037421
2008-11-25   0.077500

What's the best way to calculate a cumulative return across all columns on the last row?

Following is the intended result:

  Index      Return
2008-11-21   0.153419
2008-11-24   0.037421
2008-11-25   0.077500
Cumulative   0.289316

Where cumulative return calculated as follows:

cumulative = (1 + return1) * (1 + return2) * (1 + return3) - 1 

What is the best way to perform this in pandas?

like image 655
Kelaref Avatar asked Nov 25 '16 19:11

Kelaref


3 Answers

there is a pandas cumprod() method for that. this will work for every columns.

df.ix["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]

this would be about 2 time faster than other solutions on large dataset:

In[106]: %timeit df.ix["Cumulative"] = ((df+1).cumprod()-1).iloc[-1]
10 loops, best of 3: 18.4 ms per loop
In[107]: %timeit df.ix['Cummulative'] = df.apply(lambda x: (x+1).prod()-1)
10 loops, best of 3: 32.9 ms per loop
In[110]: %timeit df.append(df.iloc[:,1:].apply(lambda col: (col + 1).prod() - 1), ignore_index=True)
10 loops, best of 3: 37.1 ms per loop
In[113]: %timeit df.append(df.apply(lambda col: prod([(1+c) for c in col]) - 1), ignore_index=True)
1 loop, best of 3: 262 ms per loop

I would suggest to never use apply if you can find a built-in method since apply is looping over the dataframe which makes it slow. Bult-in method are highly efficient and normally there is no way you are going to get faster than them using apply.

like image 115
Steven G Avatar answered Sep 28 '22 00:09

Steven G


Another solution:

df.ix["Cumulative"] = (df['Return']+1).prod() - 1

This will add 1 to the df['Return'] column, multiply all the rows together, and then subtract one from the result. This will result in a simple float value. The result will then be placed at the index "Cumulative". Since that index doesn't exist yet, it will be appended to the end of the DataFrame:

               Return
2008-11-21   0.153419
2008-11-25   0.077500
2008-11-24   0.037421
Cummulative  0.289316

If you want to apply this across multiple columns:

df.ix['Cummulative'] = df.apply(lambda x: (x+1).prod()-1)

This would output the following (I made a second column called "Return2" that is a copy of "Return"):

               Return   Return2
2008-11-21   0.153419  0.153419
2008-11-25   0.077500  0.077500
2008-11-24   0.037421  0.037421
Cummulative  0.289316  0.289316
like image 28
TheF1rstPancake Avatar answered Sep 28 '22 01:09

TheF1rstPancake


With pandas, you can use the prod() method:

df.append(df.iloc[:,1:].apply(lambda col: (col + 1).prod() - 1), ignore_index=True)

#        Index    Return
#0  2008-11-21  0.153419
#1  2008-11-24  0.037421
#2  2008-11-25  0.077500
#3         NaN  0.289316

Or as @Randy C commented, this can be further simplified to:

df.append((df.iloc[:,1:] + 1).prod() - 1, ignore_index=True)
like image 31
Psidom Avatar answered Sep 28 '22 02:09

Psidom