I have the following pandas dataframe. Say it has two columns, id and search_term:
id search_term
37651 inline switch
I do:
train['search_term'] = train['search_term'].str.replace("in."," in. ")
expecting that the dataset above is unaffected, but I get in return for this dataset:
id search_term
37651 in. in. switch
which means inl is replaced by in. and ine is replaced by in. , as if I were using a regular expression, where dot means any character.
How do I rewrite the first command so that, literally, in. is replaced by in. but any in not followed by a dot is untouched, as in:
a = 'inline switch'
a = a.replace('in.','in. ')
a
>>> 'inline switch'
To replace text using a regular expression in Python, use re.sub() from the standard-library re module (remember to import re). It searches the string for every match of the pattern and returns a new string with each match replaced by the given replacement.
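As a minimal sketch of the point above, the following uses only the standard-library re module. The sample strings come from this thread; the variable names are illustrative. It shows an unescaped dot matching any character, while an escaped dot matches only a literal dot.

```python
import re

text = "inline switch"

# Unescaped dot: "in." matches "inl" and then "ine", so both get replaced.
unescaped = re.sub("in.", " in. ", text)

# Escaped dot: r"in\." only matches a literal "in.", so the text is unchanged.
escaped = re.sub(r"in\.", " in. ", text)

# A string that really contains "in." is still rewritten.
literal_hit = re.sub(r"in\.", "in. ", "in.here")

print(repr(unescaped))
print(repr(escaped))
print(repr(literal_hit))
```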
regex: Whether to interpret to_replace and/or value as regular expressions. If this is True, then to_replace must be a string. Alternatively, regex itself can be a regular expression or a list, dict, or array of regular expressions, in which case to_replace must be None.
A regular expression (regex) is a sequence of characters that defines a search pattern. To filter rows in pandas by regex, we can use the str.match() method.
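A plain-Python sketch of the same filtering idea, assuming a small made-up list of search terms rather than a real dataframe. Like str.match(), re.match() only tests for a match at the start of each string.

```python
import re

# Hypothetical sample data, not taken from the original dataset.
search_terms = ["inline switch", "in.here", "wall switch"]

# Keep only the terms that start with a literal "in." (the dot is escaped).
pattern = re.compile(r"in\.")
starts_with_in_dot = [term for term in search_terms if pattern.match(term)]

print(starts_with_in_dot)
```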
In pandas 0.23 or newer, str.replace() has a regex option that switches regular-expression matching on or off.
The following simply turns it off:
df.search_term.str.replace('in.', 'in. ', regex=False)
This results in:
0 inline switch
1 in. here
Name: search_term, dtype: object
and here is the answer: regular expression to match a dot.
str.replace() in pandas does treat the pattern as a regex (this was the default before pandas 2.0), so that:
df['a'] = df['a'].str.replace('in.', ' in. ')
is not comparable to:
a.replace('in.', ' in. ')
The latter does not use regex. So use '\.' instead of '.' in a regex pattern if you really mean a literal dot and not any character.
Regular Expression to match a dot
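If the search text comes from a variable and may contain other special characters, one option is to let re.escape() do the escaping instead of writing the backslashes by hand. This is a standard-library sketch, not pandas code, and the helper name replace_literal is made up for illustration.

```python
import re

def replace_literal(text: str, old: str, new: str) -> str:
    """Replace every literal occurrence of `old`, going through the regex engine."""
    # re.escape() backslash-escapes regex metacharacters such as ".".
    # Passing `new` through a function keeps any backslashes in it literal too.
    return re.sub(re.escape(old), lambda _match: new, text)

unchanged = replace_literal("inline switch", "in.", " in. ")
changed = replace_literal("in.here", "in.", "in. ")

print(repr(unchanged))
print(repr(changed))
```

With the pattern escaped this way, the result should agree with what the built-in str.replace() gives for the same arguments.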
Try escaping the dot:
import pandas as pd
df = pd.DataFrame({'search_term': ['inline switch', 'in.here']})
>>> df.search_term.str.replace('in\\.', 'in. ', regex=True)
0 inline switch
1 in. here
Name: search_term, dtype: object