Pandas str.count

Tags:

python

pandas

Consider the following DataFrame. I want to count how many times '$' appears in each string, so I use the str.count function in pandas (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.count.html).

>>> import pandas as pd
>>> df = pd.DataFrame(['$$a', '$$b', '$c'], columns=['A'])
>>> df['A'].str.count('$')
0    1
1    1
2    1
Name: A, dtype: int64

I was expecting the result to be [2,2,1]. What am I doing wrong?

In plain Python, the count method of the built-in str type returns the result I expect.

>>> a = "$$$$abcd"
>>> a.count('$')
4
>>> a = '$abcd$dsf$'
>>> a.count('$')
3

asked Nov 29 '16 20:11

user4979733

2 Answers

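$ has a special meaning in regular expressions: it is an anchor that matches the end of the line. Series.str.count treats its argument as a regex pattern, so an unescaped '$' matches once, at the end of each string here, instead of counting dollar signs. Escape it to count the literal character: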

In [21]: df.A.str.count(r'\$')
Out[21]:
0    2
1    2
2    1
Name: A, dtype: int64
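
To see why the unescaped pattern gives one match per string, here is a minimal sketch using the standard re module alongside pandas. The variable names are illustrative only, and re.escape and the '[$]' character class are offered as alternatives to writing the backslash by hand, not as something the answer above prescribes.

```python
import re

import pandas as pd

# Same strings as in the question, as a Series for brevity.
s = pd.Series(['$$a', '$$b', '$c'], name='A')

# Unescaped, `$` is a zero-width anchor: the only match is the empty
# string at the end of the input, so the count is 1.
anchor_matches = re.findall('$', '$$a')

# re.escape builds a pattern that matches the given text literally.
literal_pattern = re.escape('$')
literal_counts = s.str.count(literal_pattern).tolist()

# Inside a character class, `$` is also treated as a literal character.
class_counts = s.str.count('[$]').tolist()

print(anchor_matches, literal_counts, class_counts)
```

re.escape is convenient when the text to count comes from a variable and may contain other regex metacharacters such as '.' or '+'.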

answered Sep 29 '22 18:09

MaxU - stop WAR against UA

As the other answer notes, the issue here is that $ denotes the end of the line. If you do not need regular expressions, you may find that str.count (that is, the method of the built-in type str) is faster than its pandas counterpart, at least on this small example:

In [39]: df['A'].apply(lambda x: x.count('$'))
Out[39]: 
0    2
1    2
2    1
Name: A, dtype: int64

In [40]: %timeit df['A'].str.count(r'\$')
1000 loops, best of 3: 243 µs per loop

In [41]: %timeit df['A'].apply(lambda x: x.count('$'))
1000 loops, best of 3: 202 µs per loop
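
The timings above come from a three-row frame, so the gap may look different on real data. Below is a rough sketch for repeating the comparison outside IPython with the standard timeit module. The repeated input, the helper names count_with_regex and count_with_builtin, and the number of runs are all assumptions made for illustration; no particular result is implied.

```python
import timeit

import pandas as pd

# Hypothetical larger input: the three strings from the question, repeated.
s = pd.Series(['$$a', '$$b', '$c'] * 10_000, name='A')


def count_with_regex(series):
    # Series.str.count interprets the pattern as a regular expression.
    return series.str.count(r'\$')


def count_with_builtin(series):
    # The built-in str.count does a plain substring count, no regex involved.
    return series.apply(lambda x: x.count('$'))


# Check that both approaches agree before comparing their speed.
same_result = count_with_regex(s).tolist() == count_with_builtin(s).tolist()

# Total wall-clock seconds for 10 runs of each approach.
regex_seconds = timeit.timeit(lambda: count_with_regex(s), number=10)
builtin_seconds = timeit.timeit(lambda: count_with_builtin(s), number=10)
print(f"same result: {same_result}")
print(f"regex: {regex_seconds:.4f}s  builtin: {builtin_seconds:.4f}s")
```

The numbers depend on the pandas version, the string lengths, and the machine, so it is worth measuring on your own data before choosing one approach for speed alone.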

answered Sep 29 '22 17:09

fuglede
