
Pandas: update and add rows in one dataframe using a key column from another dataframe

Tags: python, pandas

I have 2 data frames with identical columns. Column 'key' will have unique values.

Data frame 1:-

A B key C    
0 1 k1  2    
1 2 k2  3    
2 3 k3  5

Data frame 2:-

A B key C    
4 5 k1  2    
1 2 k2  3
2 3 k4  5

I would like to update rows in Dataframe 1 with the values from Dataframe 2 wherever a key in Dataframe 2 matches a key in Dataframe 1. If a key in Dataframe 2 is new, the entire row should be added to Dataframe 1.

The final output dataframe should look like this, with the same columns:

A B key C
4 5 k1  2   --> update
1 2 k2  3   --> no changes
2 3 k3  5   --> no changes
2 3 k4  5   --> new row

I have tried the code below, but I need only the 4 columns 'A', 'B', 'key', 'C', without any suffixes after the merge.

>>> df3 = df1.merge(df2, on='key', how='outer')
>>> df3
   A_x  B_x key  C_x  A_y  B_y  C_y
0  0.0  1.0  k1  2.0  4.0  5.0  2.0
1  1.0  2.0  k2  3.0  1.0  2.0  3.0
2  2.0  3.0  k3  5.0  NaN  NaN  NaN
3  NaN  NaN  k4  NaN  2.0  3.0  5.0
Chinmay Hegde asked Dec 16 '17 10:12





3 Answers

It seems like you're looking for combine_first.

a = df2.set_index('key')
b = df1.set_index('key')

(a.combine_first(b)
  .reset_index()
  .reindex(columns=df1.columns))

     A    B key    C
0  4.0  5.0  k1  2.0
1  1.0  2.0  k2  3.0
2  2.0  3.0  k3  5.0
3  2.0  3.0  k4  5.0
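For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the two frames from the question and applies the same combine_first call. The final astype cast is an addition of mine, not part of the answer above: it assumes the combined frame contains no missing values, since combine_first may upcast integer columns to float.

```python
import pandas as pd

# Sample data taken from the question
df1 = pd.DataFrame({'A': [0, 1, 2], 'B': [1, 2, 3],
                    'key': ['k1', 'k2', 'k3'], 'C': [2, 3, 5]})
df2 = pd.DataFrame({'A': [4, 1, 2], 'B': [5, 2, 3],
                    'key': ['k1', 'k2', 'k4'], 'C': [2, 3, 5]})

a = df2.set_index('key')  # frame whose values take priority
b = df1.set_index('key')  # frame used to fill keys missing from df2

result = (a.combine_first(b)
           .reset_index()
           .reindex(columns=df1.columns))

# Assumption: no NaN remains, so casting back to int is safe
result = result.astype({'A': int, 'B': int, 'C': int})
print(result)
```

Because df2 is the caller, its values win for keys present in both frames, and df1 only supplies the rows (here k3) that df2 lacks.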
cs95 answered Oct 08 '22 00:10



Try this:

import pandas as pd

df1 = pd.DataFrame({'key': ['k1', 'k2', 'k3'], 'A': [0, 1, 2], 'B': [1, 2, 3], 'C': [2, 3, 5]})
print(df1)
df2 = pd.DataFrame({'key': ['k1', 'k2', 'k4'], 'A': [4, 1, 2], 'B': [5, 2, 3], 'C': [2, 3, 5]})
print(df2)
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
df3 = pd.concat([df1, df2])
# For duplicated keys keep the last occurrence, i.e. the row coming from df2
df3 = df3.drop_duplicates(subset=['key'], keep='last')
df3 = df3.sort_values(by=['key'], ascending=True)
print(df3)
Joe answered Oct 07 '22 23:10



First, set 'key' as the index of both dataframes:

df1.set_index('key', inplace=True)
df2.set_index('key', inplace=True)

Then combine the dataframes so that all index keys are present. Note that this does not update the existing df1 values, because combine_first only fills in what df1 is missing (see the combine_first documentation):

df1 = df1.combine_first(df2)

The last step is to update the values in df1 with those from df2 and reset the index:

df1.update(df2)
df1.reset_index(inplace=True)
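The three steps above can be sketched as one function. This is only an illustration: upsert_by_key is a hypothetical helper name, the sample data is taken from the question, and unlike the snippets above it works on copies instead of modifying df1 and df2 in place. The last reindex, which restores the question's column order, is also my addition.

```python
import pandas as pd

def upsert_by_key(target, source, key='key'):
    """Hypothetical helper: update matching rows of target from source, add new ones."""
    t = target.set_index(key)
    s = source.set_index(key)
    # Step 1: take the union of keys; for shared keys target's values still win
    combined = t.combine_first(s)
    # Step 2: overwrite shared keys with source's non-NaN values (in place)
    combined.update(s)
    # Step 3: bring the key back as a column, in target's column order
    return combined.reset_index().reindex(columns=target.columns)

df1 = pd.DataFrame({'A': [0, 1, 2], 'B': [1, 2, 3],
                    'key': ['k1', 'k2', 'k3'], 'C': [2, 3, 5]})
df2 = pd.DataFrame({'A': [4, 1, 2], 'B': [5, 2, 3],
                    'key': ['k1', 'k2', 'k4'], 'C': [2, 3, 5]})

out = upsert_by_key(df1, df2)
print(out)
```

One caveat worth keeping in mind: update skips NaN values in the source, so a missing value in df2 will not overwrite an existing value in df1.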
Nimov answered Oct 07 '22 22:10
