Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

DataFrame calculating by group for log return of each stock

As an example, I have created a dataframe that looks like this:

         date  price ticker  volume
0  2018-01-01  1.323     AI    2000
1  2018-01-02  1.525     AI    1500
2  2018-01-03  1.045     AI     500
3  2018-01-01  2.110    BOC    3201
4  2018-01-02  2.150    BOC    5200
5  2018-01-03  2.810    BOC    1980
6  2018-01-01  5.199    CAT    2000
7  2018-01-02  4.980    CAT     450
8  2018-01-03  4.990    CAT    3000

So there are 3 stocks and spanning over three days. I want to calculate the daily log returns for each stock between 2018-01-01 and 2018-01-03.

My current code is:

df["logret"] = df.groupby("ticker").apply(np.log(df.price) - np.log(df.price.shift(1)))

But it's throwing me an error message that the Series objects are mutable, thus they cannot be hashed.

Can someone please explain to me what this error is pointing towards pls? And how to solve it to be able to compute the log returns by ticker names for each stock?

like image 890
cqstack Avatar asked Apr 03 '18 11:04

cqstack


2 Answers

You can perform this calculation via a vectorised approach:

res = df.sort_values(['ticker', 'date'])

res.loc[res['ticker'] == res['ticker'].shift(), 'logret'] = \
np.log(df['price']) - np.log(df['price'].shift())

Result

         date  price ticker  volume    logret
0  2018-01-01  1.323     AI    2000       NaN
1  2018-01-02  1.525     AI    1500  0.142093
2  2018-01-03  1.045     AI     500 -0.377978
3  2018-01-01  2.110    BOC    3201       NaN
4  2018-01-02  2.150    BOC    5200  0.018780
5  2018-01-03  2.810    BOC    1980  0.267717
6  2018-01-01  5.199    CAT    2000       NaN
7  2018-01-02  4.980    CAT     450 -0.043036
8  2018-01-03  4.990    CAT    3000  0.002006

Explanation

  • First sort your dataframe by ticker and date.
  • Then apply your calculation when consecutive rows have the same ticker.
  • Vectorising is more efficient than calculating results one at a time via lambda.
like image 63
jpp Avatar answered Sep 17 '22 18:09

jpp


I think need lambda function:

df["logret"] = df.groupby("ticker")['price'].apply(lambda x: np.log(x) - np.log(x.shift()))
print (df)
         date  price ticker  volume    logret
0  2018-01-01  1.323     AI    2000       NaN
1  2018-01-02  1.525     AI    1500  0.142093
2  2018-01-03  1.045     AI     500 -0.377978
3  2018-01-01  2.110    BOC    3201       NaN
4  2018-01-02  2.150    BOC    5200  0.018780
5  2018-01-03  2.810    BOC    1980  0.267717
6  2018-01-01  5.199    CAT    2000       NaN
7  2018-01-02  4.980    CAT     450 -0.043036
8  2018-01-03  4.990    CAT    3000  0.002006
like image 29
jezrael Avatar answered Sep 19 '22 18:09

jezrael