I want to create an indicator variable that will propagate to all rows with the same customer-period value pair as the indicator. Specifically, if baz
is yes
, I want all rows of that same customer and period email to show my indicator.
df
Customer Period Question Score
A 1 foo 2
A 1 bar 3
A 1 baz yes
A 1 biz 1
B 1 bar 2
B 1 baz no
B 1 qux 3
A 2 foo 5
A 2 baz yes
B 2 baz yes
B 2 biz 2
I've tried
df['Indicator'] = np.where(
(df.Question.str.contains('baz') & (df.Score == 'yes')),
1, 0)
which returns
Customer Period Question Score Indicator
A 1 foo 2 0
A 1 bar 3 0
A 1 baz yes 1
A 1 biz 1 0
B 1 bar 2 0
B 1 baz no 0
B 1 qux 3 0
A 2 foo 5 0
A 2 baz yes 1
B 2 baz yes 1
B 2 biz 2 0
But this is the desired output:
Customer Period Question Score Indicator
A 1 foo 2 1
A 1 bar 3 1
A 1 baz yes 1
A 1 biz 1 1
B 1 bar 2 0
B 1 baz no 0
B 1 qux 3 0
A 2 foo 5 1
A 2 baz yes 1
B 2 baz yes 1
B 2 biz 2 1
I'm not sure how to go about getting what I want. Maybe groupby with ffill and another with bfill?
You can use
In [954]: df['Indicator'] = (df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
.groupby(['Customer', 'Period'])['eq']
.transform('any').astype(int))
In [955]: df
Out[955]:
Customer Period Question Score Indicator
0 A 1 foo 2 1
1 A 1 bar 3 1
2 A 1 baz yes 1
3 A 1 biz 1 1
4 B 1 bar 2 0
5 B 1 baz no 0
6 B 1 qux 3 0
7 A 2 foo 5 1
8 A 2 baz yes 1
9 B 2 baz yes 1
10 B 2 biz 2 1
Details
In [956]: df.Question.eq('baz') & df.Score.eq('yes')
Out[956]:
0 False
1 False
2 True
3 False
4 False
5 False
6 False
7 False
8 True
9 True
10 False
dtype: bool
In [957]: df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
Out[957]:
Customer Period Question Score Indicator eq
0 A 1 foo 2 1 False
1 A 1 bar 3 1 False
2 A 1 baz yes 1 True
3 A 1 biz 1 1 False
4 B 1 bar 2 0 False
5 B 1 baz no 0 False
6 B 1 qux 3 0 False
7 A 2 foo 5 1 False
8 A 2 baz yes 1 True
9 B 2 baz yes 1 True
10 B 2 biz 2 1 False
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With