Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find first true value in a row of Pandas dataframe

Tags:

python

pandas

I have two dataframes of boolean values.

The first one looks like this:

b1=pd.DataFrame([[ True, False, False, False, False],
       [False, False,  True, False, False],
       [False,  True, False, False, False],
       [False, False, False, False, False]])

b1
Out[88]: 
       0      1      2      3      4
0   True  False  False  False  False
1  False  False   True  False  False
2  False   True  False  False  False
3  False  False  False  False  False

If I am just interested in whether each row has any True value I can use the any method:

b1.any(1)
Out[89]: 
0    True
1    True
2    True
3    False
dtype: bool

However, I want to have an added constraint based on a second dataframe that looks like the following:

b2 = pd.DataFrame([[ True, False,  True, False, False],
       [False, False,  True,  True,  True],
       [ True,  True, False, False, False],
       [ True,  True,  True, False, False]])

b2
Out[91]: 
       0      1      2      3      4
0   True  False   True  False  False
1  False  False   True   True   True
2   True   True  False  False  False
3   True   True   True  False  False

I want to identify rows that have a True value in the first dataframe ONLY if it is the first True value in a row of the second dataframe.

For example, this would exclude row 2 because although it has a True value in the first dataframe, it is the 2nd true value in the second dataframe. In contrast, rows 1 and 2 have a true value in dataframe 1 that is also the first true value in dataframe 2. The output should be the following:

0    True
1    True
2    False
3    False
dtype: bool
like image 475
AJG519 Avatar asked Dec 23 '22 18:12

AJG519


2 Answers

One way would be to use cumsum to help find the first:

In [123]: (b1 & b2 & (b2.cumsum(axis=1) == 1)).any(axis=1)
Out[123]: 
0     True
1     True
2    False
3    False
dtype: bool

This works because b2.cumsum(axis=1) gives us the cumulative number of Trues seen, and cases where that number is 1 and b2 itself is True must be the first one.

In [124]: b2.cumsum(axis=1)
Out[124]: 
   0  1  2  3  4
0  1  1  2  2  2
1  0  0  1  2  3
2  1  2  2  2  2
3  1  2  3  3  3
like image 154
DSM Avatar answered Jan 06 '23 17:01

DSM


As a variation to @DSM's clever answer, this approach seemed a little more intuitive to me. The first part should be pretty self-explanatory, and the second part finds the first column number (w/ axis = 1) that is true for each dataframe and compares.

(b1.any(axis = 1) & (b1.idxmax(axis = 1) == b2.idxmax(axis = 1))
like image 45
elPastor Avatar answered Jan 06 '23 17:01

elPastor