Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Pandas: assigning columns with multiple conditions and date thresholds

Edited:

I have a financial portfolio in a pandas dataframe df, where the index is the date and I have multiple financial stocks per date.

Eg dataframe:

Date    Stock   Weight  Percentile  Final weight
1/1/2000    Apple   0.010   0.75    0.010
1/1/2000    IBM    0.011    0.4     0
1/1/2000    Google  0.012   0.45    0
1/1/2000    Nokia   0.022   0.81    0.022
2/1/2000    Apple   0.014   0.56    0
2/1/2000    Google  0.015   0.45    0
2/1/2000    Nokia   0.016   0.55    0
3/1/2000    Apple   0.020   0.52    0
3/1/2000    Google  0.030   0.51    0
3/1/2000    Nokia   0.040   0.47    0

I created Final_weight by doing assigning values of Weight whenever Percentile is greater than 0.7

Now I want this to be a bit more sophisticated, I still want Weight to be assigned to Final_weight when Percentile is > 0.7, however after this date (at any point in the future), rather than become 0 when a stocks Percentile is not >0.7, we would still get a weight as long as the Stocks Percentile is above 0.5 (ie holding the position for longer than just one day).

Then if the stock goes below 0.5 (in the near future) then Final_weight would become 0.

Eg modified dataframe from above:

Date    Stock   Weight  Percentile  Final weight
1/1/2000    Apple   0.010   0.75    0.010
1/1/2000    IBM     0.011   0.4     0
1/1/2000    Google  0.012   0.45    0
1/1/2000    Nokia   0.022   0.81    0.022
2/1/2000    Apple   0.014   0.56    0.014
2/1/2000    Google  0.015   0.45    0
2/1/2000    Nokia   0.016   0.55    0.016
3/1/2000    Apple   0.020   0.52    0.020
3/1/2000    Google  0.030   0.51    0
3/1/2000    Nokia   0.040   0.47    0

Everyday the portfolios are different not always have the same stock from the day before.

like image 742
MysterioProgrammer91 Avatar asked May 04 '17 20:05

MysterioProgrammer91


Video Answer


2 Answers

This solution is more explicit and less pandas-esque, but it involves only a single pass through all rows without creating tons of temporary columns, and is therefore possibly faster. It needs an additional state variable, which I wrapped it into a closure for not having to make a class.

def closure():
    cur_weight = {}
    def func(x):
        if x["Percentile"] > 0.7:
            next_weight = x["Weight"]
        elif x["Percentile"] < 0.5 :
            next_weight = 0
        else:
            next_weight = x["Weight"] if cur_weight.get(x["Stock"], 0) > 0 else 0
        cur_weight[x["Stock"]] = next_weight
        return next_weight
    return func

df["FinalWeight"] = df.apply(closure(), axis=1)
like image 172
cronos Avatar answered Sep 27 '22 22:09

cronos


  • I'd first put 'Stock' into the index
  • Then unstack to put them into the columns
  • I'd then split w for weights and p for percentiles
  • Then manipulate with a series of where

d1 = df.set_index('Stock', append=True)

d2 = d1.unstack()

w, p = d2.Weight, d2.Percentile

d1.join(w.where(p > .7, w.where((p.shift() > .7) & (p > .5), 0)).stack().rename('Final Weight'))

                   Weight  Percentile  Final Weight
Date       Stock                                   
2000-01-01 Apple    0.010        0.75         0.010
           IBM      0.011        0.40         0.000
           Google   0.012        0.45         0.000
           Nokia    0.022        0.81         0.022
2000-02-01 Apple    0.014        0.56         0.014
           Google   0.015        0.45         0.000
           Nokia    0.016        0.55         0.016
like image 36
piRSquared Avatar answered Sep 27 '22 22:09

piRSquared