I have a relatively tricky iteration question that I am having trouble implementing.
I have a dataframe with the first 6 columns seen below. I am trying to write a function that iterates within groups -- specifically grouping rows by Category and Level -- and then generates a new variable if two conditions are met for that row vs. any other row in the group. I'd like to generate the Opportunity? binary indicator below, where it equals 1 if it matches the condition. The Reason column just provides an explanation of the result I want to generate.
Logic: For each id_group, if ((Metric_LHS[entity] > Metric_RHS[other entity in group]) & (Metric_LHS[entity] > Baseline[entity])), then Opportunity? = 1.
So in my example, the Opportunity? column equals 1 for Jim because Metric_LHS(Jim) > Metric_RHS(Jack) and Metric_LHS(Jim) > Baseline(Jim). Meanwhile, Rick is a 0 because the first condition is not met against the only other person in his group, Joe.
See below for some pieces of the code and logic that I have written. My question is the following: How do I iterate within each row of each group and compare that row with every other row in that group?
id_group=df.groupby(['Category','Level'])
for row in id_group:
    df['Opportunity?'](([df[metric_LHS][row]>df[Metric_RHS][row+1]) &\
                        (df[metric_LHS][row]>df[Baseline][row])) = 1
***How to iterate to next row in group?***
When iterating this way over a groupby object, each item returned is a tuple (index, group), where index is the group key (here a (Category, Level) pair) and group is the sub-DataFrame for that key.
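As a quick illustration of that tuple structure, here is a minimal sketch on a made-up three-row frame. The `toy` frame, its values, and the `group_sizes` dict are hypothetical and only serve the demonstration; they are not the asker's data.

```python
import pandas as pd

# Hypothetical toy data, only to show what iterating a groupby yields.
toy = pd.DataFrame({
    'Category': ['South', 'South', 'North'],
    'Level': [1, 1, 2],
    'Metric_LHS': [100, 80, 70],
})

group_sizes = {}
for key, group in toy.groupby(['Category', 'Level']):
    # key is the (Category, Level) pair; group is the matching sub-DataFrame
    print(key, group.index.tolist())
    group_sizes[key] = len(group)
```

Each `group` keeps the row labels of the original frame, which is why those labels can later be used with `df.loc` to write results back.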
To iterate over the rows of each group, you could use DataFrame.iterrows.
Something like this:
id_group = df.groupby(['Category', 'Level'])
for g_idx, group in id_group:
    for r_idx, row in group.iterrows():
        if (((row['Metric_LHS'] > group['Metric_RHS']).any())
                & (row['Metric_LHS'] > row['Baseline'])):
            df.loc[r_idx, 'Opportunity?'] = 1
Full example with sample data:

import numpy as np
import pandas as pd

# Sample data; 'Opportunity?' starts out empty (NaN)
df = pd.DataFrame({'Name': ['Jim', 'Jack', 'Greg', 'Alex', 'Steve', 'Jack', 'Rick', 'Joe', 'Bill', 'Dave', 'Dan'],
                   'Category': ['South']*3 + ['North']*3 + ['West']*3 + ['East']*2,
                   'Level': [1, 1, 2, 2.5, 2.5, 2.5, 3, 3, 3.25, 4, 4],
                   'Metric_LHS': [100, 80, 70, 110, 90, 105, 110, 111, 90, 87, 83],
                   'Metric_RHS': [120, 90, 75, 115, 95, 110, 112, 113, 95, 90, 85],
                   'Baseline': [95, np.nan, 73, 112, 85, 103, 105, 112, 93, 75, 81],
                   'Opportunity?': [np.nan]*11})

id_group = df.groupby(['Category', 'Level'])
for g_idx, group in id_group:
    for r_idx, row in group.iterrows():
        # Flag the row if its Metric_LHS exceeds any Metric_RHS in the group
        # and also exceeds its own Baseline (a NaN Baseline compares as False)
        if (((row['Metric_LHS'] > group['Metric_RHS']).any())
                & (row['Metric_LHS'] > row['Baseline'])):
            df.loc[r_idx, 'Opportunity?'] = 1

print(df)
Output:

     Name  Category  Level  Metric_LHS  Metric_RHS  Baseline  Opportunity?
0     Jim     South   1.00         100         120      95.0           1.0
1    Jack     South   1.00          80          90       NaN           NaN
2    Greg     South   2.00          70          75      73.0           NaN
3    Alex     North   2.50         110         115     112.0           NaN
4   Steve     North   2.50          90          95      85.0           NaN
5    Jack     North   2.50         105         110     103.0           1.0
6    Rick      West   3.00         110         112     105.0           NaN
7     Joe      West   3.00         111         113     112.0           NaN
8    Bill      West   3.25          90          95      93.0           NaN
9    Dave      East   4.00          87          90      75.0           1.0
10    Dan      East   4.00          83          85      81.0           NaN
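For larger frames, the same flags can probably be computed without explicit loops. The sketch below is an alternative, not the approach given in the answer: it assumes that "Metric_LHS exceeds at least one Metric_RHS in the group" can be restated as "Metric_LHS exceeds the group's minimum Metric_RHS". The names `group_min_rhs` and `mask` are illustrative. Like the loop, it compares each row against every row in its group, including itself, and it writes 0 rather than NaN for rows that do not qualify.

```python
import numpy as np
import pandas as pd

# Same sample data as in the loop-based example
df = pd.DataFrame({
    'Name': ['Jim', 'Jack', 'Greg', 'Alex', 'Steve', 'Jack',
             'Rick', 'Joe', 'Bill', 'Dave', 'Dan'],
    'Category': ['South']*3 + ['North']*3 + ['West']*3 + ['East']*2,
    'Level': [1, 1, 2, 2.5, 2.5, 2.5, 3, 3, 3.25, 4, 4],
    'Metric_LHS': [100, 80, 70, 110, 90, 105, 110, 111, 90, 87, 83],
    'Metric_RHS': [120, 90, 75, 115, 95, 110, 112, 113, 95, 90, 85],
    'Baseline': [95, np.nan, 73, 112, 85, 103, 105, 112, 93, 75, 81],
})

# Smallest Metric_RHS in each (Category, Level) group, aligned to df's rows
group_min_rhs = df.groupby(['Category', 'Level'])['Metric_RHS'].transform('min')

# Both conditions at once; comparisons against a NaN Baseline evaluate to False
mask = (df['Metric_LHS'] > group_min_rhs) & (df['Metric_LHS'] > df['Baseline'])
df['Opportunity?'] = mask.astype(int)

print(df[['Name', 'Category', 'Level', 'Opportunity?']])
```

If a row must be compared strictly against the other rows in its group, excluding itself, this shortcut would need adjusting; on the sample data the distinction makes no difference because every row's Metric_RHS is larger than its own Metric_LHS.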