I want to count the number of occurrences of each of certain words in a data frame. I currently do it using str.contains:
a = df2[df2['col1'].str.contains("sample")].groupby('col2').size()
n = a.apply(lambda x: 1).sum()
Is there a way to match a regular expression and get the count of occurrences? My dataframe is large and I want to match around 100 strings.
Update: The original answer (further down) only counts the rows that contain a substring.
To count all the occurrences of a substring you can use .str.count:
In [21]: df = pd.DataFrame(['hello', 'world', 'hehe'], columns=['words'])
In [22]: df.words.str.count("he|wo")
Out[22]:
0 1
1 1
2 2
Name: words, dtype: int64
In [23]: df.words.str.count("he|wo").sum()
Out[23]: 4
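Since the question asks for a count per word rather than one combined total, here is a minimal sketch of how .str.count could be applied once per search term. The terms list is a made-up stand-in for the roughly 100 strings mentioned in the question, and re.escape is used on the assumption that the terms should match literally rather than as regex patterns:

```python
import re

import pandas as pd

df = pd.DataFrame({"words": ["hello", "world", "hehe"]})

# Hypothetical search terms; in practice this would be the full list of strings.
terms = ["he", "wo", "l"]

# Count every occurrence of each term across the whole column.
# re.escape makes a term match literally even if it contains regex metacharacters.
counts = pd.Series(
    {term: int(df["words"].str.count(re.escape(term)).sum()) for term in terms}
)

print(counts)
```

This runs one pass over the column per term, which is usually acceptable for around 100 terms. If the terms are meant to be regular expressions, drop the re.escape call.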
The str.contains method accepts a regular expression:
Definition: df.words.str.contains(self, pat, case=True, flags=0, na=nan)
Docstring:
Check whether given pattern is contained in each string in the array
Parameters
----------
pat : string
Character sequence or regular expression
case : boolean, default True
If True, case sensitive
flags : int, default 0 (no flags)
re module flags, e.g. re.IGNORECASE
na : default NaN, fill value for missing values.
For example:
In [11]: df = pd.DataFrame(['hello', 'world'], columns=['words'])
In [12]: df
Out[12]:
words
0 hello
1 world
In [13]: df.words.str.contains(r'[hw]')
Out[13]:
0 True
1 True
Name: words, dtype: bool
In [14]: df.words.str.contains(r'he|wo')
Out[14]:
0 True
1 True
Name: words, dtype: bool
To count the rows that contain a match, you can just sum this boolean Series:
In [15]: df.words.str.contains(r'he|wo').sum()
Out[15]: 2
In [16]: df.words.str.contains(r'he').sum()
Out[16]: 1
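To make the difference between the two approaches concrete, the sketch below puts them side by side on a small made-up Series that also contains a missing value and mixed case. It uses the case and na parameters from the docstring above. The sample data is an assumption for illustration only:

```python
import re

import pandas as pd

# Made-up sample data, including a missing value and a capitalised word.
words = pd.Series(["hello", "World", "hehe", None], name="words")

# Number of rows with at least one match.
# case=False ignores case; na=False treats the missing value as "no match".
rows_matching = int(words.str.contains(r"he|wo", case=False, na=False).sum())

# Total number of matches across all rows.
# str.count yields NaN for the missing value, which sum() skips.
total_matches = int(words.str.count(r"he|wo", flags=re.IGNORECASE).sum())

print(rows_matching, total_matches)
```

Here "hehe" contributes one matching row but two matches, which is why the two numbers differ.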
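You can use the value_counts method, which counts how often each distinct value appears in a column (it compares whole cell values, not substrings):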
import pandas as pd
# URL to .csv file
data_url = 'https://vincentarelbundock.github.io/Rdatasets/csv/carData/Arrests.csv'
# Reading the data
df = pd.read_csv(data_url, index_col=0)

# pandas count distinct values in column
df['sex'].value_counts()
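If only certain values are of interest, the result of value_counts can be narrowed down afterwards. The following is a rough sketch on a small inline frame (made-up data, so it runs without downloading the CSV). The wanted list is hypothetical:

```python
import pandas as pd

# Made-up stand-in for the 'sex' column of the dataset above.
df = pd.DataFrame({"sex": ["Male", "Female", "Male", "Male"]})

# Count how often each distinct value occurs.
counts = df["sex"].value_counts()

# Keep only the values of interest; values that never occur get a count of 0.
wanted = ["Male", "Female", "Other"]
subset = counts.reindex(wanted, fill_value=0)

print(subset)
```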
