
Pandas str.count

Tags: python, pandas

Consider the following DataFrame. I want to count the number of '$' characters that appear in each string, so I use the str.count method in pandas (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.count.html).

>>> import pandas as pd
>>> df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])
>>> df['A'].str.count('$')
0    1
1    1
2    1
Name: A, dtype: int64

I was expecting the result to be [2,2,1]. What am I doing wrong?

In plain Python, the count method of the built-in str type returns the result I expect.

>>> a = "$$$$abcd"
>>> a.count('$')
4
>>> a = '$abcd$dsf$'
>>> a.count('$')
3
user4979733 asked Nov 29 '16 20:11



2 Answers

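$ has a special meaning in regular expressions: it anchors the match to the end of the string, so the unescaped pattern matches only once per string here. str.count treats its argument as a regex, so escape the $ to match a literal dollar sign: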

In [21]: df.A.str.count(r'\$')
Out[21]:
0    2
1    2
2    1
Name: A, dtype: int64
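
To see why the unescaped pattern returns 1, here is a minimal sketch using the standard re module. The variable names (s, end_matches, literal_pattern, counts) are only illustrative. It also shows re.escape as one way to build a literal pattern when the text to count is not known in advance.

```python
import re
import pandas as pd

s = pd.Series(['$$a', '$$b', '$c'], name='A')

# Unescaped, '$' is an anchor: it matches the empty position at the
# end of the string, so there is a single (empty) match.
end_matches = re.findall('$', '$$a')

# re.escape backslash-escapes regex metacharacters, giving a pattern
# that matches the text literally.
literal_pattern = re.escape('$')

# Count literal dollar signs in each element of the Series.
counts = s.str.count(literal_pattern)
print(end_matches, literal_pattern, counts.tolist())
```

Writing r'\$' by hand, as above, is equivalent for a single known character.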
MaxU - stop WAR against UA answered Sep 29 '22 18:09


As the other answer notes, the issue here is that $ denotes the end of the string in a regular expression. If you do not need regular expressions, you may find that using str.count (that is, the method of the built-in type str) on each element is faster than its pandas counterpart:

In [39]: df['A'].apply(lambda x: x.count('$'))
Out[39]: 
0    2
1    2
2    1
Name: A, dtype: int64

In [40]: %timeit df['A'].str.count(r'\$')
1000 loops, best of 3: 243 µs per loop

In [41]: %timeit df['A'].apply(lambda x: x.count('$'))
1000 loops, best of 3: 202 µs per loop
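
One trade-off to keep in mind: the apply approach calls a method on every element, so it assumes every element is a string. The following is a rough sketch, with a made-up Series containing a missing value, of how the two approaches might be compared in that case. The isinstance guard is an addition for illustration, not part of the timing above.

```python
import pandas as pd

# Hypothetical data with a missing entry.
s = pd.Series(['$$a', None, '$c'])

# The pandas string accessor skips missing values and leaves them missing.
regex_counts = s.str.count(r'\$')

# The plain-Python version needs a guard; without it, calling .count
# on the missing value would raise an AttributeError.
plain_counts = s.apply(lambda x: x.count('$') if isinstance(x, str) else x)

print(regex_counts.tolist())
print(plain_counts.tolist())
```

If the column is known to contain only strings, the guard is unnecessary.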
fuglede answered Sep 29 '22 17:09