I have two dataframes of boolean values.
The first one looks like this:
b1=pd.DataFrame([[ True, False, False, False, False],
       [False, False,  True, False, False],
       [False,  True, False, False, False],
       [False, False, False, False, False]])
b1
Out[88]: 
       0      1      2      3      4
0   True  False  False  False  False
1  False  False   True  False  False
2  False   True  False  False  False
3  False  False  False  False  False
If I am just interested in whether each row has any True value I can use the any method:
b1.any(1)
Out[89]: 
0    True
1    True
2    True
3    False
dtype: bool
However, I want to have an added constraint based on a second dataframe that looks like the following:
b2 = pd.DataFrame([[ True, False,  True, False, False],
       [False, False,  True,  True,  True],
       [ True,  True, False, False, False],
       [ True,  True,  True, False, False]])
b2
Out[91]: 
       0      1      2      3      4
0   True  False   True  False  False
1  False  False   True   True   True
2   True   True  False  False  False
3   True   True   True  False  False
I want to identify rows that have a True value in the first dataframe ONLY if it is the first True value in a row of the second dataframe.
For example, this would exclude row 2 because although it has a True value in the first dataframe, it is the 2nd true value in the second dataframe. In contrast, rows 1 and 2 have a true value in dataframe 1 that is also the first true value in dataframe 2. The output should be the following:
0    True
1    True
2    False
3    False
dtype: bool
                One way would be to use cumsum to help find the first:
In [123]: (b1 & b2 & (b2.cumsum(axis=1) == 1)).any(axis=1)
Out[123]: 
0     True
1     True
2    False
3    False
dtype: bool
This works because b2.cumsum(axis=1) gives us the cumulative number of Trues seen, and cases where that number is 1 and b2 itself is True must be the first one.
In [124]: b2.cumsum(axis=1)
Out[124]: 
   0  1  2  3  4
0  1  1  2  2  2
1  0  0  1  2  3
2  1  2  2  2  2
3  1  2  3  3  3
                        As a variation to @DSM's clever answer, this approach seemed a little more intuitive to me.  The first part should be pretty self-explanatory, and the second part finds the first column number (w/ axis = 1) that is true for each dataframe and compares.
(b1.any(axis = 1) & (b1.idxmax(axis = 1) == b2.idxmax(axis = 1))
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With