I am trying to use df['column_name'].str.count("+") in Python pandas, but I receive "error: nothing to repeat". With regular characters the method works, e.g. df['column_name'].str.count("a") works fine.
Also, there is a problem with the "^" sign. If I use df['column_name'].str.contains("^"), the result is incorrect: it looks like "^" gets interpreted as " " (empty space).
Surprisingly, if I use .count("+") and .contains("^") on a regular, non-pandas string, they work perfectly fine.
Simple working example:
df = pd.DataFrame({'column1': ['Nighthawks+', 'Dragoons'], 'column2': ['1st', '2nd']}, columns = ['column1', 'column2'])
When applying df["column1"].str.contains("^") one gets "True, True", but it should be "False, False".
And when applying df["column1"].str.count("+") one gets "error: nothing to repeat".
But then, outside of pandas, "bla++".count("+") correctly gives the result "2".
Any solutions? Thanks
You need to escape the plus sign:
In[10]:
df = pd.DataFrame({'a':['dsa^', '^++', '+++','asdasads']})
df
Out[10]:
a
0 dsa^
1 ^++
2 +++
3 asdasads
In[11]:
df['a'].str.count(r"\+")
Out[11]:
0 0
1 2
2 3
3 0
Name: a, dtype: int64
Also, when you do df['a'].str.count('^'), this just returns 1 for all rows, because an unescaped ^ is the regex start-of-string anchor and matches exactly once per string:
In[12]:
df['a'].str.count('^')
Out[12]:
0 1
1 1
2 1
3 1
Name: a, dtype: int64
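Both symptoms can be reproduced with the standard re module alone, which is roughly what happens per element when the pattern is treated as a regex. This is only a small sketch; the variable names caret_matches and plus_error are made up for illustration.

```python
import re

# '^' is a zero-width anchor: it matches once, at the start of the string,
# so a regex-based count reports 1 even though there is no caret character.
caret_matches = re.findall('^', 'asdasads')
print(caret_matches)  # ['']

# A bare '+' is a quantifier with nothing in front of it to repeat,
# so the pattern cannot even be compiled.
try:
    re.compile('+')
    plus_error = None
except re.error as exc:
    plus_error = str(exc)
print(plus_error)
```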
Again you need to escape the pattern:
In[16]:
df['a'].str.count(r'\^')
Out[16]:
0 1
1 1
2 0
3 0
Name: a, dtype: int64
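For the contains case from the question, escaping is not the only option: str.contains also accepts regex=False, which treats the pattern as a plain substring. A minimal sketch using the question's column1 data (has_caret and has_plus are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({'column1': ['Nighthawks+', 'Dragoons']})

# regex=False switches to a literal substring test, so '^' and '+'
# need no escaping here.
has_caret = df['column1'].str.contains('^', regex=False)
has_plus = df['column1'].str.contains('+', regex=False)

print(has_caret.tolist())  # [False, False]
print(has_plus.tolist())   # [True, False]
```

Note that str.count has no such switch, so for counting the pattern still has to be escaped.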
EDIT
Regarding the semantic difference between count on a normal string and on a Series: count on a Python str counts non-overlapping occurrences of a literal substring, but Series.str.count takes a regex pattern. ^ and + are special characters in a regex, so they need to be escaped with a backslash if you are searching for those literal characters.
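If the text to search for is not known in advance, re.escape from the standard library can add the backslashes for you. The sketch below wraps that in a small helper; count_literal is a hypothetical name, not a pandas function.

```python
import re
import pandas as pd

s = pd.Series(['dsa^', '^++', '+++', 'asdasads'])

def count_literal(series, text):
    """Count literal occurrences of text by escaping regex metacharacters."""
    return series.str.count(re.escape(text))

plus_counts = count_literal(s, '+')
caret_counts = count_literal(s, '^')

print(plus_counts.tolist())   # [0, 2, 3, 0]
print(caret_counts.tolist())  # [1, 1, 0, 0]

# Plain str.count is already literal, so no escaping is needed there.
print('bla++'.count('+'))     # 2
```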