Avoid pandas str.replace using a regex

I have the following pandas dataframe. Say it has two columns: id and search_term:

id       search_term
37651    inline switch

I do:

train['search_term'] = train['search_term'].str.replace("in."," in. ")

expecting that the dataset above is unaffected, but I get in return for this dataset:

id       search_term
37651    in.  in.  switch

which means inl is replaced by in. and ine is replaced by in., as if I were using a regular expression, where a dot means any character.

How do I rewrite the first command so that in. is matched literally and replaced by in. padded with spaces, while any in not followed by a dot is left untouched, as in:

a = 'inline switch'
a = a.replace('in.','in. ')

a
>>> 'inline switch'
Alejandro Simkievich asked Mar 29 '16 23:03


3 Answers

In pandas 0.23 or newer, str.replace() has a regex option for switching regex matching on or off. The following simply turns it off:

df.search_term.str.replace('in.', 'in. ', regex=False)

This results in:

0    inline switch
1         in. here
Name: search_term, dtype: object
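
A minimal sketch comparing the two modes, assuming pandas 0.23 or newer; the two-row frame is made up for illustration. Passing regex explicitly also avoids relying on the default, which changed from True to False in pandas 2.0.

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration.
df = pd.DataFrame({'search_term': ['inline switch', 'in.here']})

# regex=True: '.' matches any character, so 'inl' and 'ine' are replaced too.
as_regex = df['search_term'].str.replace('in.', 'in. ', regex=True)

# regex=False: 'in.' is matched literally, so only 'in.here' changes.
as_literal = df['search_term'].str.replace('in.', 'in. ', regex=False)

print(as_regex.tolist())
print(as_literal.tolist())
```

Only the regex=False call leaves 'inline switch' unchanged.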
daisukelab answered Sep 28 '22 05:09


Here is the answer: use a regular expression that matches a literal dot.

str.replace() in pandas does use regex (by default, before pandas 2.0), so that:

df['a'] = df['a'].str.replace('in.', ' in. ')

is not equivalent to:

a.replace('in.', ' in. ')

The latter does not use regex. So, in a statement that uses regex, write '\.' instead of '.' if you really mean a dot and not any character.

Regular Expression to match a dot
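
When the literal text is not known in advance, the escaping can be left to re.escape from the standard library. A minimal sketch using plain re rather than pandas; the variable names and the sample strings are illustrative only:

```python
import re

literal = 'in.'
# re.escape backslash-escapes the dot, so the pattern matches only a real '.'
pattern = re.escape(literal)

# Unescaped: '.' matches any character, so 'inl' and 'ine' are both replaced.
unescaped = re.sub('in.', ' in. ', 'inline switch')

# Escaped: nothing in 'inline switch' matches, so it is returned unchanged.
escaped_no_match = re.sub(pattern, ' in. ', 'inline switch')

# Escaped: a real 'in.' is still replaced.
escaped_match = re.sub(pattern, ' in. ', 'plug in.here')

print(repr(unescaped))
print(repr(escaped_no_match))
print(repr(escaped_match))
```

The same escaped pattern can be passed to the pandas str.replace() call when regex matching is enabled.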

Alejandro Simkievich answered Sep 28 '22 05:09


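Try escaping the dot:

>>> import pandas as pd
>>> df = pd.DataFrame({'search_term': ['inline switch', 'in.here']})
>>> # regex=True is the default before pandas 2.0; from 2.0 on it must be passed explicitly
>>> df.search_term.str.replace(r'in\.', 'in. ', regex=True)
0    inline switch
1         in. here
Name: search_term, dtype: object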
Ami Tavory answered Sep 28 '22 05:09
