How to replace '..' and '?.' with single periods and question marks in pandas? df['column'].str.replace not working

Tags:

python

pandas

This is a follow-up to this SO post, which gives a solution for replacing text in a string column:

How to replace text in a column of a Pandas dataframe?

df['range'] = df['range'].str.replace(',','-')

However, this doesn't seem to work with double periods or with a question mark followed by a period:

import pandas as pd

testList = ['this is a.. test stence', 'for which is ?. was a time']
testDf = pd.DataFrame(testList, columns=['strings'])
testDf['strings'].str.replace('..', '.').head()

results in

0     ...........e
1    .............
Name: strings, dtype: object

and

testDf['strings'].str.replace('?.', '?').head()

results in

error: nothing to repeat at position 0
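Both results come from the pattern being treated as a regular expression rather than as literal text. A minimal sketch with Python's standard re module reproduces both symptoms and shows that escaping the pattern fixes them (assumption: the regex path of str.replace behaves like re.sub for these simple patterns; the variable names are illustrative):

```python
import re

s1 = "this is a.. test stence"
s2 = "for which is ?. was a time"

# As a regex, '..' means "any two characters", so every non-overlapping
# pair of characters collapses to a single '.'
collapsed = re.sub("..", ".", s1)
print(collapsed)  # ...........e

# '?' is a quantifier ("previous token is optional"); at position 0 there
# is no previous token, so compiling '?.' fails
try:
    re.compile("?.")
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)  # nothing to repeat at position 0

# re.escape turns both patterns into matches for the literal characters
fixed1 = re.sub(re.escape(".."), ".", s1)
fixed2 = re.sub(re.escape("?."), "?", s2)
print(fixed1)  # this is a. test stence
print(fixed2)  # for which is ? was a time
```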
SantoshGupta7 asked Jan 25 '23 20:01


2 Answers

Add the regex=False parameter because, as the docs show, regex defaults to True (in pandas versions before 2.0; pandas 2.0 changed the default to False):

regex : bool, default True

Determines if the passed-in pattern is a regular expression: If True, assumes the passed-in pattern is a regular expression. If False, treats the pattern as a literal string.

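Both ? and . are special characters in regular expressions: . matches any single character, and ? makes the preceding token optional, so a pattern that starts with ? has nothing to repeat.
So one way to do it without regex is to chain two literal replacements: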

testDf['strings'].str.replace('..', '.', regex=False).str.replace('?.', '?', regex=False)

Output:

                     strings
0     this is a. test stence
1  for which is ? was a time
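A self-contained sketch of this answer on the question's sample data, plus an equivalent variant that keeps regex=True but escapes the patterns with re.escape. Passing regex explicitly in both cases avoids relying on the default, which changed from True to False in pandas 2.0. The variable names are illustrative.

```python
import re

import pandas as pd

test_list = ["this is a.. test stence", "for which is ?. was a time"]
test_df = pd.DataFrame(test_list, columns=["strings"])

# regex=False treats each pattern as a literal substring, so '.' and '?'
# lose their special regex meaning
cleaned = (
    test_df["strings"]
    .str.replace("..", ".", regex=False)
    .str.replace("?.", "?", regex=False)
)
print(cleaned.tolist())
# ['this is a. test stence', 'for which is ? was a time']

# Equivalent alternative: keep regex=True but escape the patterns first
escaped = (
    test_df["strings"]
    .str.replace(re.escape(".."), ".", regex=True)
    .str.replace(re.escape("?."), "?", regex=True)
)
print(escaped.tolist())
```

The order of the two calls matters only for inputs such as '?..', where the first call produces '?.' and the second then reduces it to '?'.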
MrNobody33 answered Jan 27 '23 10:01


Replace using a regular expression: in this case, remove any literal '.' that is immediately followed by whitespace. This is a bit convoluted; I advise you to go with @Mark Reed's answer.

testDf.replace(regex=r'([.](?=\s))', value=r'')


                  strings
0     this is a. test stence
1  for which is ? was a time
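For comparison, a sketch of the same lookahead idea written with Series.str.replace and an explicit regex=True. The pattern r"\.(?=\s)" matches a period only when the next character is whitespace, without consuming the whitespace. Note that it also strips an ordinary sentence-ending period followed by a space, which the literal replacements in the other answer do not; the "End. Next" sample is a hypothetical input added to show this.

```python
import pandas as pd

test_df = pd.DataFrame(
    ["this is a.. test stence", "for which is ?. was a time"],
    columns=["strings"],
)

# r"\.(?=\s)": a literal '.' followed by whitespace (lookahead, not consumed)
cleaned = test_df["strings"].str.replace(r"\.(?=\s)", "", regex=True)
print(cleaned.tolist())
# ['this is a. test stence', 'for which is ? was a time']

# Caveat: an ordinary sentence-ending period before a space is removed too
sample = pd.Series(["End. Next"])
side_effect = sample.str.replace(r"\.(?=\s)", "", regex=True)
print(side_effect.tolist())
# ['End Next']
```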
wwnde answered Jan 27 '23 10:01