I have a non-indexed Pandas dataframe where each row consists of numeric and boolean values with some NaNs. An example row in my dataframe might look like this (with variables above):
X_1 X_2 X_3 X_4 X_5 X_6 X_7 X_8 X_9 X_10 X_11 X_12
24.4 True 5.1 False 22.4 55 33.4 True 18.04 False NaN NaN
I would like to add a new variable to my dataframe, call it X_13, which is the number of True values in each row. So in the above case, I would like to obtain:
X_1 X_2 X_3 X_4 X_5 X_6 X_7 X_8 X_9 X_10 X_11 X_12 X_13
24.4 True 5.1 False 22.4 55 33.4 True 18.04 False NaN NaN 2
I have tried df['X_13'] = df['X_2'] + df['X_4'] + df['X_8'] + df['X_10'], and that gives me what I want unless the row contains a NaN in a location where a Boolean is expected. For those rows, X_13 has the value NaN.
Sorry -- this feels like it should be absurdly simple. Any suggestions?
Select boolean columns and then sum:
df.select_dtypes(include=['bool']).sum(axis=1)
If you have NaNs, fill them with False first:
df.fillna(False).select_dtypes(include=['bool']).sum(axis=1)
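Applied to data shaped like the question's, this could look like the sketch below. The two-row frame and the flag_cols list are made up for illustration, and the sketch assumes the flag columns are known by name. The explicit astype(bool) is a guard in case fillna leaves a column as object, which may depend on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame shaped like the question's data (subset of columns).
df = pd.DataFrame({
    "X_1": [24.4, 1.0],
    "X_2": [True, np.nan],
    "X_4": [False, True],
    "X_8": [True, True],
    "X_10": [False, np.nan],
})

# Assumption: the flag columns are known by name.
flag_cols = ["X_2", "X_4", "X_8", "X_10"]

# Treat missing flags as False, force a bool dtype, then count True per row.
df["X_13"] = df[flag_cols].fillna(False).astype(bool).sum(axis=1)
print(df["X_13"].tolist())  # [2, 2]
```

Naming the columns also keeps numeric columns such as X_1 out of the count, even when they happen to contain 1.0.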
Consider this DataFrame:
df
Out:
a b c d
0 True False 1 True
1 False True 2 NaN
Comparing the whole frame with True, as in df == True, also returns True for (0, c), because 1 == True in Python:
df == True
Out:
a b c d
0 True False True True
1 False True False False
So if you take the sum, you will get 3 instead of 2 for the first row. Another important point is that a boolean column cannot hold NaNs: a column that mixes True/False with NaN is stored as object instead. So if you check the dtypes, you will see:
df.dtypes
Out:
a bool
b bool
c int64
d object
dtype: object
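Both observations can be reproduced in a few lines. This is a minimal sketch; the constructor call is a reconstruction of the frame shown above, not something taken from the original answer.

```python
import numpy as np
import pandas as pd

# The same small frame as shown above, rebuilt by hand.
df = pd.DataFrame({
    "a": [True, False],
    "b": [False, True],
    "c": [1, 2],
    "d": [True, np.nan],
})

# In Python, 1 == True, so the integer 1 in column c matches as well.
naive = (df == True).sum(axis=1)
print(naive.tolist())  # [3, 1]

# Column d mixes True and NaN, so pandas stores it as object, not bool.
print(df["d"].dtype)  # object
```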
By filling the NaNs with False, you can get a boolean column:
df.fillna(False).dtypes
Out:
a bool
b bool
c int64
d bool
dtype: object
Now you can safely sum by selecting the boolean columns:
df.fillna(False).select_dtypes(include=['bool']).sum(axis=1)
Out:
0 2
1 1
dtype: int64
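If you would rather not depend on fillna turning an object column into bool (this may vary between pandas versions), one alternative is to test each value directly. The count_true helper below is hypothetical, not a pandas function, and is only a sketch: it looks at bool and object columns and counts values that are real booleans equal to True.

```python
import numpy as np
import pandas as pd

def count_true(frame):
    """Count genuine True values per row (hypothetical helper)."""
    # bool columns, plus object columns that may mix True/False with NaN.
    candidates = frame.select_dtypes(include=["bool", "object"])
    # Only real booleans equal to True count; NaN, 1 and strings do not.
    is_true = candidates.apply(
        lambda col: col.map(lambda v: isinstance(v, (bool, np.bool_)) and bool(v))
    )
    return is_true.sum(axis=1)

df = pd.DataFrame({
    "a": [True, False],
    "b": [False, True],
    "c": [1, 2],
    "d": [True, np.nan],
})
print(count_true(df).tolist())  # [2, 1]
```

This is slower than the vectorised fillna approach on large frames, since it calls a Python function per value, so it is mainly useful when dtypes are unreliable.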