I have two dataframes of boolean values.
The first one looks like this:
b1=pd.DataFrame([[ True, False, False, False, False],
[False, False, True, False, False],
[False, True, False, False, False],
[False, False, False, False, False]])
b1
Out[88]:
0 1 2 3 4
0 True False False False False
1 False False True False False
2 False True False False False
3 False False False False False
If I am just interested in whether each row has any True value I can use the any
method:
b1.any(1)
Out[89]:
0 True
1 True
2 True
3 False
dtype: bool
However, I want to have an added constraint based on a second dataframe that looks like the following:
b2 = pd.DataFrame([[ True, False, True, False, False],
[False, False, True, True, True],
[ True, True, False, False, False],
[ True, True, True, False, False]])
b2
Out[91]:
0 1 2 3 4
0 True False True False False
1 False False True True True
2 True True False False False
3 True True True False False
I want to identify rows that have a True value in the first dataframe ONLY if it is the first True value in a row of the second dataframe.
For example, this would exclude row 2 because although it has a True value in the first dataframe, it is the 2nd true value in the second dataframe. In contrast, rows 1 and 2 have a true value in dataframe 1 that is also the first true value in dataframe 2. The output should be the following:
0 True
1 True
2 False
3 False
dtype: bool
One way would be to use cumsum
to help find the first:
In [123]: (b1 & b2 & (b2.cumsum(axis=1) == 1)).any(axis=1)
Out[123]:
0 True
1 True
2 False
3 False
dtype: bool
This works because b2.cumsum(axis=1)
gives us the cumulative number of Trues seen, and cases where that number is 1 and b2 itself is True must be the first one.
In [124]: b2.cumsum(axis=1)
Out[124]:
0 1 2 3 4
0 1 1 2 2 2
1 0 0 1 2 3
2 1 2 2 2 2
3 1 2 3 3 3
As a variation to @DSM's clever answer, this approach seemed a little more intuitive to me. The first part should be pretty self-explanatory, and the second part finds the first column number (w/ axis = 1
) that is true for each dataframe and compares.
(b1.any(axis = 1) & (b1.idxmax(axis = 1) == b2.idxmax(axis = 1))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With