
How to replace words in a DataFrame using another DataFrame in pandas (Python)

I have two data frames:

df:

id   string_data
1    My name is Jeff
2    Hello, I am John
3    I like Brad he is cool.

Another data frame named allnames contains a list of names like this:

id  name
1   Jeff
2   Brad
3   John
4   Emily
5   Ross

I want to replace every word in df['string_data'] that appears in allnames['name'] with "Firstname".

Expected output:

id   string_data
1    My name is Firstname
2    Hello, I am Firstname
3    I like Firstname he is cool.

I tried this:

nameList = '|'.join(allnames['name'])
df['string_data'].str.replace(nameList, "FirstName", case = False)

But it replaces almost 99% of the words.

asked May 09 '19 08:05 by John Doe



1 Answer

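Your solution should work if you add word boundaries (\b) around each name in the pattern passed to Series.str.replace. Without them, a name also matches when it occurs inside a longer word:

# Wrap each name in \b so it only matches as a whole word.
nameList = '|'.join(r"\b{}\b".format(x) for x in allnames['name'])
# regex=True is required on pandas >= 2.0, where str.replace defaults to literal matching.
df['string_data'] = df['string_data'].str.replace(nameList, "FirstName", case=False, regex=True)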
print (df)
   id                   string_data
0   1          My name is FirstName
1   2         Hello, I am FirstName
2   3  I like FirstName he is cool.
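
For reference, the word-boundary approach can be reproduced end to end with the sample data from the question. This is a minimal sketch, not the only way to build the pattern: it assumes the names may contain regex metacharacters, so each one is passed through re.escape, and it groups the alternation so a single pair of \b covers all names. The variable name pattern is introduced here for illustration.

```python
import re
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "string_data": [
        "My name is Jeff",
        "Hello, I am John",
        "I like Brad he is cool.",
    ],
})
allnames = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Jeff", "Brad", "John", "Emily", "Ross"],
})

# Escape each name so it is matched literally, then require a word
# boundary on both sides of the whole alternation.
pattern = r"\b(?:" + "|".join(re.escape(n) for n in allnames["name"]) + r")\b"

df["string_data"] = df["string_data"].str.replace(
    pattern, "Firstname", case=False, regex=True
)
print(df)
```

Because of the \b anchors, a name is only replaced when it stands alone as a word, not when it occurs inside a longer one.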

Alternatively, split each string into words, look each word up in a dictionary with get (falling back to the word itself), and join the result:

d = dict.fromkeys(allnames['name'], 'Firstname')
f = lambda x: ' '.join(d.get(y, y) for y in x.split())
df['string_data'] = df['string_data'].apply(f)
print (df)
   id                   string_data
0   1          My name is Firstname
1   2         Hello, I am Firstname
2   3  I like Firstname he is cool.
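
One thing to keep in mind with the split-based lookup is that it compares whole whitespace-separated tokens, so a name with punctuation attached is not found in the dictionary. The sketch below illustrates this on plain strings; the list names is a stand-in for allnames['name'], and the second input sentence is made up for the demonstration.

```python
# Stand-in for allnames['name'] from the question.
names = ["Jeff", "Brad", "John", "Emily", "Ross"]
d = dict.fromkeys(names, "Firstname")

def f(x):
    # Replace each whitespace-separated token that is a key of d.
    return " ".join(d.get(y, y) for y in x.split())

print(f("I like Brad he is cool."))   # I like Firstname he is cool.
print(f("I like Brad, he is cool."))  # I like Brad, he is cool.  ("Brad," is not a key)
```

Note also that split() followed by ' '.join collapses any run of whitespace into a single space.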

EDIT: To make the lookup case-insensitive, convert both the dictionary keys and each word to lowercase with lower:

d = dict.fromkeys([x.lower() for x in allnames['name']], 'Firstname')
f = lambda x: ' '.join(d.get(y.lower(), y) for y in x.split())
df['string_data'] = df['string_data'].apply(f)
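
If the text may contain names followed by punctuation, a possible variant is to let a regular expression find the words and do the dictionary-style lookup in a callback. This is a sketch under assumptions rather than part of the original answer: the helper replace_names and its parameters are hypothetical, the set names stands in for the lowercased allnames['name'], and every name is assumed to be a single word made of \w characters.

```python
import re

# Lowercased stand-in for allnames['name'].
names = {"jeff", "brad", "john", "emily", "ross"}

def replace_names(text, names=names, placeholder="Firstname"):
    # \w+ matches runs of word characters, so punctuation and spacing
    # are left untouched; only matching words are swapped out.
    def swap(match):
        word = match.group(0)
        return placeholder if word.lower() in names else word
    return re.sub(r"\w+", swap, text)

print(replace_names("I like Brad, he is cool."))  # I like Firstname, he is cool.
print(replace_names("Hello, I am JOHN!"))         # Hello, I am Firstname!
```

The function works on a single string, so it can be passed to df['string_data'].apply in the same way as f above.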
answered Nov 05 '22 13:11 by jezrael