 

Pandas rolling second-highest value based on another column

For the sample data below:

import pandas as pd

data={'Person':['a','a','a','a','a','b','b','b','b','b','b'],
     'Sales':['50','60','90','30','33','100','600','80','90','400','550'],
     'Price':['10','12','8','10','12','10','13','16','14','12','10']}
data=pd.DataFrame(data)

For each person (group), I would like the Price that belongs to the second-highest Sales seen so far, computed on a rolling basis with a separate window for each group. The result should look like:

result={'Person':['a','a','a','a','a','b','b','b','b','b','b'],
     'Sales':['50','60','90','30','33','100','600','80','90','400','550'],
     'Price':['10','12','8','10','12','10','13','16','14','12','10'],
     'Second_Highest_Price':['','10','12','12','12','','10','10','10','12','10']}

I tried using nlargest(2), but I'm not sure how to make it work on a rolling basis.
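A minimal sketch of nlargest(2) applied inside a moving window, using only person a's Sales (converted to numbers). The expected result above matches an expanding window (every row of the person so far) rather than a fixed-size rolling window; the window size of 3 below is an arbitrary choice made only to show the difference, which appears on the last row.

```python
import pandas as pd

# Person 'a' Sales from the question, already numeric
sales_a = pd.Series([50, 60, 90, 30, 33], dtype=float)

# Expanding window: each row sees all earlier rows plus itself.
# min_periods=2 leaves the first row NaN because there is no second value yet.
second_highest = sales_a.expanding(min_periods=2).apply(
    lambda window: window.nlargest(2).iloc[-1]
)
print(second_highest.tolist())  # [nan, 50.0, 60.0, 60.0, 60.0]

# Fixed-size rolling window of 3 rows, for comparison only
fixed_window = sales_a.rolling(window=3, min_periods=2).apply(
    lambda window: window.nlargest(2).iloc[-1]
)
print(fixed_window.tolist())  # [nan, 50.0, 60.0, 60.0, 33.0]
```

On the last row the fixed window only sees 90, 30 and 33, so it returns 33, while the expanding window still remembers 60, which is what the expected Price of 12 corresponds to.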

Ksh asked Jul 07 '21 22:07



1 Answer

It is not the most elegant solution, but I would do the following:

1- Load the dataset and convert Sales to numbers:

import numpy as np
import pandas as pd

data={'Person':['a','a','a','a','a','b','b','b','b','b','b'],
     'Sales':['50','60','90','30','33','100','600','80','90','400','550'],
     'Price':['10','12','8','10','12','10','13','16','14','12','10']}

data=pd.DataFrame(data)

# Sales holds strings, which compare lexically ('90' > '600'), so make it numeric
data['Sales'] = data['Sales'].astype(float)
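Why the conversion in step 1 matters: with the question's string Sales, comparisons are character by character. A small check on the first four Sales values of person b:

```python
import pandas as pd

# First four Sales values of person 'b', still stored as strings
sales_as_text = pd.Series(['100', '600', '80', '90'])

# Strings compare character by character, so '90' ranks above '600'
print(max(sales_as_text))  # 90

# pd.to_numeric (or astype(float), as in step 1) makes the ordering numeric
sales = pd.to_numeric(sales_as_text)
print(sales.nlargest(2).tolist())  # [600, 100]
```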

2- Use groupby and expanding together to get each person's running second-highest Sales:

# Expanding window per person; min_periods=2 leaves the first row of each person as NaN
data['2nd_sales'] = (data.groupby('Person')['Sales']
                         .expanding(min_periods=2)
                         .apply(lambda x: x.nlargest(2).iloc[-1])
                         .values)
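The trailing `.values` in step 2 assigns by position. `groupby(...).expanding(...)` returns its results group by group, indexed by (Person, original row label), so positional assignment only lines up because the sample rows are already sorted by Person. A sketch on a hypothetical frame with interleaved people (`df`, `rolled` and `positional` are illustrative names) shows the mismatch and one label-aligned alternative, dropping the Person level with `reset_index(level=0, drop=True)`:

```python
import pandas as pd

# Hypothetical unsorted frame: rows of 'a' and 'b' are interleaved
df = pd.DataFrame({'Person': ['a', 'b', 'a', 'b', 'a'],
                   'Sales': [50.0, 100.0, 60.0, 600.0, 90.0]})

rolled = (df.groupby('Person')['Sales']
            .expanding(min_periods=2)
            .apply(lambda x: x.nlargest(2).iloc[-1]))

# Positional (what .values does): group 'a' results land on rows 0-2, which is wrong here
positional = rolled.to_numpy()

# Label-aligned: drop the Person level so the index is the original row labels again
df['2nd_sales'] = rolled.reset_index(level=0, drop=True)
print(df)
```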

3- Look up the Price that belongs to the second-highest Sales, then forward-fill it within each person:

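# If the previous row's Sales equals the running second-highest, take that row's Price;
# otherwise, if the current row's Sales equals it, take the current Price; else leave NaN.
data['Second_Highest_Price'] = np.where(data['Sales'].shift() == data['2nd_sales'],
                                        data['Price'].shift(),
                                        np.where(data['Sales'] == data['2nd_sales'],
                                                 data['Price'], np.nan))

# Carry the last matched Price forward without crossing into the next person
data['Second_Highest_Price'] = data.groupby('Person')['Second_Highest_Price'].ffill()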
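Step 3 only compares the running second-highest Sales against the current and the previous row, so it can miss the case where that value comes from an older row. For example, for one person with Sales 100, 50, 200, the second-highest on the third row is 100 (first row), yet step 3 forward-fills the Price of the 50 row. A sketch of a more direct alternative, under the assumption that one pass per person is fast enough: keep the two largest (Sales, Price) pairs seen so far and read the Price of the second one. `second_highest_price` is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd


def second_highest_price(sales, prices):
    """Hypothetical helper: expanding second-highest Sales within one group,
    returning the matching Price for each row (NaN until two rows are seen)."""
    top_two = []  # at most two (sales, price) pairs, largest Sales first
    out = []
    for s, p in zip(sales, prices):
        top_two.append((s, p))
        # list.sort is stable even with reverse=True, so on equal Sales the earlier row stays ahead
        top_two.sort(key=lambda pair: pair[0], reverse=True)
        top_two = top_two[:2]
        out.append(top_two[1][1] if len(top_two) == 2 else np.nan)
    return out


data = pd.DataFrame({'Person': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b', 'b'],
                     'Sales': ['50', '60', '90', '30', '33', '100', '600', '80', '90', '400', '550'],
                     'Price': ['10', '12', '8', '10', '12', '10', '13', '16', '14', '12', '10']})
data['Sales'] = data['Sales'].astype(float)

result = pd.Series(np.nan, index=data.index, dtype=object)
for _, group in data.groupby('Person', sort=False):
    # Label-based assignment, so the row order of the frame does not matter
    result.loc[group.index] = second_highest_price(group['Sales'], group['Price'])
data['Second_Highest_Price'] = result
print(data['Second_Highest_Price'].tolist())
```

On the question's data this gives the same column as the expected result. For the counterexample, `second_highest_price([100, 50, 200], ['x', 'y', 'z'])` returns `[nan, 'y', 'x']`.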

Output:

data['Second_Highest_Price'].values

array([nan, '10', '12', '12', '12', nan, '10', '10', '10', '12', '10'],
      dtype=object)
William answered Oct 22 '22 17:10
