
Summing Booleans in a Dataframe

I have a non-indexed Pandas dataframe where each row consists of numeric and boolean values with some NaNs. An example row in my dataframe might look like this (with variables above):

X_1  X_2  X_3 X_4   X_5  X_6 X_7  X_8  X_9   X_10  X_11  X_12
24.4 True 5.1 False 22.4 55  33.4 True 18.04 False NaN   NaN

I would like to add a new variable to my dataframe, call it X_13, which is the number of True values in each row. So in the above case, I would like to obtain:

X_1  X_2  X_3 X_4   X_5  X_6 X_7  X_8  X_9   X_10  X_11  X_12 X_13
24.4 True 5.1 False 22.4 55  33.4 True 18.04 False NaN   NaN  2

I have tried df['X_13'] = df['X_2'] + df['X_4'] + df['X_8'] + df['X_10'], which gives me what I want unless the row contains a NaN in a location where a Boolean is expected. For those rows, X_13 has the value NaN.

Sorry -- this feels like it should be absurdly simple. Any suggestions?

hms asked Dec 18 '22 14:12


1 Answer

Select boolean columns and then sum:

df.select_dtypes(include=['bool']).sum(axis=1)

If the boolean columns contain NaNs, fill them with False first:

df.fillna(False).select_dtypes(include=['bool']).sum(axis=1)

Consider this DataFrame:

df
Out: 
       a      b  c     d
0   True  False  1  True
1  False   True  2   NaN

Comparing the whole frame with df == True is not a safe alternative: since 1 == True in Python, it returns True for (0, c) as well:

df == True
Out: 
       a      b      c      d
0   True  False   True   True
1  False   True  False  False

So if you sum that result along row 0, you will get 3 instead of 2. Another important point is that a column of bool dtype cannot contain NaNs, so a boolean column with missing values is stored as object. If you check the dtypes, you will see:

df.dtypes
Out: 
a      bool
b      bool
c     int64
d    object
dtype: object
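
To try this yourself, the example frame can be rebuilt roughly as follows. This is only a sketch: it assumes the frame was built from plain Python lists with np.nan as the missing entry, which is not shown in the answer, and the name naive_counts is made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame shown above.
df = pd.DataFrame({
    "a": [True, False],
    "b": [False, True],
    "c": [1, 2],
    "d": [True, np.nan],  # the NaN forces this column to object dtype
})

# 1 == True in Python, so the integer 1 in column c is counted too.
naive_counts = (df == True).sum(axis=1)

print(df.dtypes)
print(naive_counts.tolist())
```

Row 0 comes out as 3 rather than 2 here, which is the overcount described above.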

By filling the NaNs with False, column d can become a boolean column:

df.fillna(False).dtypes
Out: 
a     bool
b     bool
c    int64
d     bool
dtype: object

Now you can safely sum by selecting the boolean columns.

df.fillna(False).select_dtypes(include=['bool']).sum(axis=1)
Out: 
0    2
1    1
dtype: int64
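
The select_dtypes approach relies on fillna turning column d from object into bool. As far as I know, newer pandas releases deprecate that automatic downcast, in which case d may stay object and be skipped. A minimal sketch of an alternative that does not depend on dtype inference, assuming you already know which columns are meant to be boolean (the names bool_cols and n_true are hypothetical):

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the example frame from this answer.
df = pd.DataFrame({
    "a": [True, False],
    "b": [False, True],
    "c": [1, 2],
    "d": [True, np.nan],
})

# Hypothetical list of the columns that are supposed to hold booleans.
bool_cols = ["a", "b", "d"]

# eq(True) marks cells equal to True; NaN compares as False, so no fillna is needed.
n_true = df[bool_cols].eq(True).sum(axis=1)

# Store the per-row count as a new column.
df["n_true"] = n_true
print(df)
```

Restricting the comparison to bool_cols also keeps numeric columns such as c out of the count, so the 1 == True problem does not arise.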
ayhan answered Dec 21 '22 04:12