
Replace values in a column via a VLOOKUP-style lookup in another DataFrame, only where the value exists

I want to overwrite the values in df1.Name using a mapping table in df2 (df2.Name1 -> df2.Name2). However, not all values in df1.Name exist in df2.Name1.

df1:

Name
Alex
Maria 
Marias
Pandas
Coala

df2:

Name1   Name2
Alex    Alexs
Marias  Maria
Coala   Coalas

Expected Result:

Name
Alexs
Maria
Maria
Pandas
Coalas

I have tried several solutions found online, such as using map. After turning df2 into a dictionary, I use df1.Name = df1.Name.map(Dictionary), but this results in NaN for every value that is not in df2, as shown below.

Name
Alexs
NaN
Maria
NaN
Coalas
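
For reference, the behaviour described above can be reproduced with a minimal sketch like the one below. The frames are rebuilt from the tables in the question, and the variable names `mapping` and `mapped` are only illustrative. Note that with this exact data the plain "Maria" in row 1 is not a key in df2.Name1 either, so it also comes back as NaN.

```python
import pandas as pd

# Frames rebuilt from the tables shown in the question.
df1 = pd.DataFrame({"Name": ["Alex", "Maria", "Marias", "Pandas", "Coala"]})
df2 = pd.DataFrame({
    "Name1": ["Alex", "Marias", "Coala"],
    "Name2": ["Alexs", "Maria", "Coalas"],
})

# Turn the mapping table into a plain dict: Name1 -> Name2.
mapping = dict(zip(df2["Name1"], df2["Name2"]))

# Series.map returns NaN for every value that is not a key in the dict.
mapped = df1["Name"].map(mapping)
print(mapped)
```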

I am not sure how to use an if statement to replace only the values that exist in df2 and keep the rest as they are in df1. I also tried writing a function with if statements, but it failed.

How could I approach this problem?

A.Papa asked Jan 15 '18 19:01



2 Answers

Use replace with a dictionary built from df2. Values that are not keys in the dictionary are left unchanged:

df1.Name.replace(df2.set_index('Name1').Name2.to_dict())
Out[437]: 
0     Alexs
1     Maria
2     Maria
3    Pandas
4    Coalas
Name: Name, dtype: object
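
The one-liner above returns a new Series and does not modify df1. A self-contained sketch of the same idea, assuming the frames are built as in the question and using `mapping` as an illustrative name, assigns the result back to the column:

```python
import pandas as pd

# Frames rebuilt from the tables shown in the question.
df1 = pd.DataFrame({"Name": ["Alex", "Maria", "Marias", "Pandas", "Coala"]})
df2 = pd.DataFrame({
    "Name1": ["Alex", "Marias", "Coala"],
    "Name2": ["Alexs", "Maria", "Coalas"],
})

# Build a {Name1: Name2} dict from the mapping table.
mapping = df2.set_index("Name1")["Name2"].to_dict()

# replace only touches values that are keys in the dict;
# "Maria" and "Pandas" are not keys, so they pass through unchanged.
df1["Name"] = df1["Name"].replace(mapping)
print(df1["Name"].tolist())
# ['Alexs', 'Maria', 'Maria', 'Pandas', 'Coalas']
```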
BENY answered Oct 19 '22 03:10


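Let's use a pandas solution with map and combine_first. map leaves NaN wherever there is no match in df2, and combine_first then fills those gaps with the original values from df1: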

df1['Name'].map(df2.set_index('Name1')['Name2']).combine_first(df1['Name'])

Output:

0     Alexs
1     Maria
2     Maria
3    Pandas
4    Coalas
Name: Name, dtype: object
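
As a runnable sketch of this approach, assuming the frames are built as in the question (`lookup`, `filled` and `alt` are illustrative names), the result can be assigned back to df1. Mapping with a Series relies on its index, so this assumes the values in df2.Name1 are unique. Filling the gaps with fillna is shown as an equivalent alternative for this data.

```python
import pandas as pd

# Frames rebuilt from the tables shown in the question.
df1 = pd.DataFrame({"Name": ["Alex", "Maria", "Marias", "Pandas", "Coala"]})
df2 = pd.DataFrame({
    "Name1": ["Alex", "Marias", "Coala"],
    "Name2": ["Alexs", "Maria", "Coalas"],
})

# Lookup Series indexed by Name1 (assumed unique), with Name2 as values.
lookup = df2.set_index("Name1")["Name2"]

# map yields NaN where there is no match; combine_first fills
# those positions with the original names.
filled = df1["Name"].map(lookup).combine_first(df1["Name"])

# Alternative for the same data: fill the NaN gaps with fillna.
alt = df1["Name"].map(lookup).fillna(df1["Name"])

df1["Name"] = filled
print(df1["Name"].tolist())
# ['Alexs', 'Maria', 'Maria', 'Pandas', 'Coalas']
```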
Scott Boston answered Oct 19 '22 03:10