Probably a simple question but I could not find a simple answer. Let's for example take the following column Status within a dataframe df1:
**Status**
Planned
Unplanned
Missing
Corrected
I would like to count the rows when a cell contains, Planned and Missing. I tried the following:
test1 = df1['Status'].str.contains('Planned|Missing').value_counts()
The column Status is from the type: object. What's wrong with my line of code?
You can just filter the df with your boolean condition and then call len
:
In [155]:
len(df[df['Status'].str.contains('Planned|Missing')])
Out[155]:
2
Or use the index True
from your value_counts
:
In [158]:
df['Status'].str.contains('Planned|Missing').value_counts()[True]
Out[158]:
2
pd.Series.str.contains
when coupled with na=False
guarantees you have a Boolean series. Note also True
/ False
act like 1
/ 0
with numeric computations. You can now use pd.Series.sum
directly:
count = df['Status'].str.contains('Planned|Missing', na=False).sum()
This avoids unnecessary and expensive dataframe indexing operations.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With