
Replace multiple substrings in a Pandas series with a value

All,

To replace one string in one particular column I have done this and it worked fine:

dataUS['sec_type'].str.strip().str.replace("LOCAL","CORP")

I would now like to replace multiple strings with a single string, e.g. replace ["LOCAL", "FOREIGN", "HELLO"] with "CORP".

How can I make this work? The code below didn't work:

dataUS['sec_type'].str.strip().str.replace(["LOCAL", "FOREIGN", "HELLO"], "CORP")
SBad asked Mar 21 '18 17:03



1 Answer

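You can do this by joining the substrings into a single |-separated pattern. This works because pd.Series.str.replace accepts a regular expression (pass regex=True explicitly, since the default has been regex=False from pandas 2.0 onward):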

Replace occurrences of pattern/regex in the Series/Index with some other string. Equivalent to str.replace() or re.sub().

This avoids the need to create a dictionary.

import pandas as pd

df = pd.DataFrame({'A': ['LOCAL TEST', 'TEST FOREIGN', 'ANOTHER HELLO', 'NOTHING']})

pattern = '|'.join(['LOCAL', 'FOREIGN', 'HELLO'])

# regex=True is needed on pandas >= 2.0, where the default is regex=False
df['A'] = df['A'].str.replace(pattern, 'CORP', regex=True)

#               A
# 0     CORP TEST
# 1     TEST CORP
# 2  ANOTHER CORP
# 3       NOTHING
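
If the substrings might contain regex metacharacters (such as "." or "+"), or if you only want to replace whole words, a slightly more defensive pattern may help. The following is a minimal sketch, not part of the original answer: the extra 'LOCALLY' row and the names targets and result are made up for illustration, and it assumes whole-word matching is what you want.

```python
import re
import pandas as pd

# Hypothetical sample data; 'LOCALLY' is included to show that
# a longer word containing a target is left untouched.
df = pd.DataFrame({'A': ['LOCAL TEST', 'TEST FOREIGN', 'ANOTHER HELLO',
                         'NOTHING', 'LOCALLY']})

targets = ['LOCAL', 'FOREIGN', 'HELLO']

# re.escape makes each target match literally; \b...\b restricts
# the match to whole words; (?:...) groups the alternatives.
pattern = r'\b(?:' + '|'.join(re.escape(t) for t in targets) + r')\b'

# Strip surrounding whitespace first (as in the question), then replace.
result = df['A'].str.strip().str.replace(pattern, 'CORP', regex=True)

print(result.tolist())
```

Without the \b anchors, 'LOCALLY' would become 'CORPLY', so whether to include them depends on whether partial matches should be replaced.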
jpp answered Sep 26 '22 08:09