 

Pandas count different combinations of 2 columns with nan

I have a DataFrame similar to this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 3, np.nan, 4], 'B': [np.nan, 1, np.nan, 2, 3, np.nan]})
df
     A    B
0  1.0  NaN
1  NaN  1.0
2  2.0  NaN
3  3.0  2.0
4  NaN  3.0
5  4.0  NaN

How do I count the rows where A is NaN but B is not, where A is not NaN but B is, and where neither A nor B is NaN?

I tried df.groupby(['A', 'B']).count(), but it ignores the rows that contain NaN.
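A minimal sketch of why that attempt misses rows, assuming pandas 1.1 or later (needed for the dropna parameter of groupby). By default groupby drops any group whose key contains NaN, and even with dropna=False it groups by the actual values rather than by "is missing":

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 3, np.nan, 4],
                   'B': [np.nan, 1, np.nan, 2, 3, np.nan]})

# Default dropna=True: every key containing NaN is dropped,
# so only the row (3.0, 2.0) forms a group.
default_sizes = df.groupby(['A', 'B']).size()

# dropna=False keeps NaN keys, but each distinct (A, B) value pair is its
# own group, so this still does not count "A missing" vs "B missing".
value_sizes = df.groupby(['A', 'B'], dropna=False).size()

print(len(default_sizes))
print(len(value_sizes))
```

Grouping by the boolean mask from isna/isnull instead of by the values is what the answers below do.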

asked Nov 29 '22 21:11 by A1122


2 Answers

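Use isnull to turn each value into True/False, then group by the two mask columns and count the size of each group: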

df.isnull().groupby(['A','B']).size()

A      B    
False  False    1
       True     3
True   False    2
dtype: int64
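A closely related variant, assuming pandas 1.1 or later for DataFrame.value_counts: counting the distinct rows of the boolean mask gives the same (A is NaN, B is NaN) counts, sorted by frequency:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 3, np.nan, 4],
                   'B': [np.nan, 1, np.nan, 2, 3, np.nan]})

# Each row of the mask is an (A is NaN, B is NaN) pair;
# value_counts counts how often each distinct pair occurs.
counts = df.isna().value_counts()
print(counts)
```

Like the groupby version, a combination that never occurs (here both NaN) simply does not appear in the result.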
answered Dec 04 '22 01:12 by BENY


You can use DataFrame.isna to build a boolean mask and crosstab to count each combination of True/False values:

df1 = df.isna()
df2 = pd.crosstab(df1.A, df1.B)
print (df2)
B      False  True 
A                  
False      1      3
True       2      0

To get a single count as a scalar, select it by label:

print (df2.loc[False, False])
1
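If you only need the three counts from the question, an alternative sketch skips the table and sums boolean masks directly (summing a boolean Series counts its True values):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 3, np.nan, 4],
                   'B': [np.nan, 1, np.nan, 2, 3, np.nan]})

a_missing = df['A'].isna()
b_missing = df['B'].isna()

only_a_missing = int((a_missing & ~b_missing).sum())   # A is NaN, B is not
only_b_missing = int((~a_missing & b_missing).sum())   # A is not NaN, B is
none_missing = int((~a_missing & ~b_missing).sum())    # neither is NaN

print(only_a_missing, only_b_missing, none_missing)  # 2 3 1
```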

For more readable labels, prefix the column and index labels:

df2 = pd.crosstab(df1.A, df1.B).add_prefix('B_').rename(lambda x: 'A_' + str(x))
print (df2)
B        B_False  B_True
A                       
A_False        1       3
A_True         2       0

Then select a scalar with the prefixed labels:

print (df2.loc['A_False', 'B_False'])
1
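crosstab only creates rows and columns for labels that actually occur. In the data above both True and False appear in each mask column, but if, for example, B never contained NaN, the True column would be absent and df2.loc[False, True] would raise KeyError. A minimal sketch, using hypothetical data, that reindexes so all four combinations always exist:

```python
import numpy as np
import pandas as pd

# Hypothetical data in which B is never missing, so crosstab
# produces only a False column for B.
df = pd.DataFrame({'A': [1, np.nan, 2], 'B': [1, 2, 3]})
mask = df.isna()

table = pd.crosstab(mask['A'], mask['B'])

# reindex adds any missing True/False row or column, filled with 0.
full = table.reindex(index=[False, True], columns=[False, True], fill_value=0)
print(full)
```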

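Another solution is to use DataFrame.dot with the column names, which builds a label for each row from the names of its NaN columns (an empty label means no NaN), followed by Series.replace and Series.value_counts. This example adds a row where both A and B are NaN: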

df = pd.DataFrame({'A': [1, np.nan,2,3, np.nan,4, np.nan], 
                   'B': [np.nan, 1,np.nan,2, 3, np.nan, np.nan]})

s = df.isna().dot(df.columns).replace({'':'no match'}).value_counts()
print (s)

B           3
A           2
no match    1
AB          1
dtype: int64
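In that output, B means only B is NaN, A means only A is NaN, AB means both are NaN, and no match means neither is. If you prefer descriptive labels, a sketch using np.select instead of the dot trick (a swapped-in technique) on the same seven-row data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 2, 3, np.nan, 4, np.nan],
                   'B': [np.nan, 1, np.nan, 2, 3, np.nan, np.nan]})

a_missing = df['A'].isna()
b_missing = df['B'].isna()

# np.select picks the first matching condition for each row,
# so "both NaN" must be tested before the single-column cases.
labels = np.select(
    [a_missing & b_missing, a_missing, b_missing],
    ['both NaN', 'only A NaN', 'only B NaN'],
    default='no NaN',
)
counts = pd.Series(labels).value_counts()
print(counts)
```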
answered Dec 04 '22 01:12 by jezrael