Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

propagate conditional column value in pandas

Tags:

python

pandas

I want to create an indicator variable that will propagate to all rows with the same customer-period value pair as the indicator. Specifically, if baz is yes, I want all rows of that same customer and period email to show my indicator.

df
  Customer  Period Question Score
        A       1      foo     2
        A       1      bar     3
        A       1      baz   yes
        A       1      biz     1
        B       1      bar     2
        B       1      baz    no
        B       1      qux     3
        A       2      foo     5
        A       2      baz   yes
        B       2      baz   yes          
        B       2      biz     2          

I've tried

df['Indicator'] = np.where(
                 (df.Question.str.contains('baz') & (df.Score == 'yes')),            
                 1, 0)

which returns

   Customer  Period Question Score  Indicator
         A       1      foo     2          0
         A       1      bar     3          0
         A       1      baz   yes          1
         A       1      biz     1          0
         B       1      bar     2          0
         B       1      baz    no          0
         B       1      qux     3          0
         A       2      foo     5          0
         A       2      baz   yes          1
         B       2      baz   yes          1
         B       2      biz     2          0

But this is the desired output:

   Customer  Period Question Score  Indicator
         A       1      foo     2          1
         A       1      bar     3          1
         A       1      baz   yes          1
         A       1      biz     1          1
         B       1      bar     2          0
         B       1      baz    no          0
         B       1      qux     3          0
         A       2      foo     5          1
         A       2      baz   yes          1
         B       2      baz   yes          1
         B       2      biz     2          1

I'm not sure how to go about getting what I want. Maybe groupby with ffill and another with bfill?

like image 454
Andrew Avatar asked Aug 21 '18 14:08

Andrew


1 Answers

You can use

In [954]: df['Indicator'] = (df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
                               .groupby(['Customer', 'Period'])['eq']
                               .transform('any').astype(int))

In [955]: df
Out[955]:
   Customer  Period Question Score  Indicator
0         A       1      foo     2          1
1         A       1      bar     3          1
2         A       1      baz   yes          1
3         A       1      biz     1          1
4         B       1      bar     2          0
5         B       1      baz    no          0
6         B       1      qux     3          0
7         A       2      foo     5          1
8         A       2      baz   yes          1
9         B       2      baz   yes          1
10        B       2      biz     2          1

Details

In [956]: df.Question.eq('baz') & df.Score.eq('yes')
Out[956]:
0     False
1     False
2      True
3     False
4     False
5     False
6     False
7     False
8      True
9      True
10    False
dtype: bool

In [957]: df.assign(eq=df.Question.eq('baz') & df.Score.eq('yes'))
Out[957]:
   Customer  Period Question Score  Indicator     eq
0         A       1      foo     2          1  False
1         A       1      bar     3          1  False
2         A       1      baz   yes          1   True
3         A       1      biz     1          1  False
4         B       1      bar     2          0  False
5         B       1      baz    no          0  False
6         B       1      qux     3          0  False
7         A       2      foo     5          1  False
8         A       2      baz   yes          1   True
9         B       2      baz   yes          1   True
10        B       2      biz     2          1  False
like image 171
Zero Avatar answered Oct 09 '22 21:10

Zero