Merge Pandas Dataframe: how to add columns and replace values

Question

I have a dataframe df1 and want to merge other (many) dataframes df2 so that:

Merge happens on the matching (multi-) indexes
New columns are created if missing
Values are replaced if the column already exists

enter image description here

What is the correct pandas operation to use and with what arguments? I looked into concat/join/merge/assign/append but did not find it yet.

Code for dataframes:

df1 = pd.DataFrame({'A':['A1', 'A2', 'A3', 'A4'],
                    'B':['B1', 'B2' ,'B3', 'B4'],
                    'C':['C1' ,'C2', 'C3', 'C4']},
                  index = [1,2,3,4])

df2 = pd.DataFrame({'C':['NewC'], 'D':['NewD']},
                  index=[3])

Scott Boston · Accepted Answer

One way is to use combine_first:

df2.combine_first(df1)

Output:

    A   B     C     D
1  A1  B1    C1   NaN
2  A2  B2    C2   NaN
3  A3  B3  NewC  NewD
4  A4  B4    C4   NaN

Another way is to use join with fillna:

df1[['A','B']].join(df2).fillna(df1)

Output:

    A   B     C     D
1  A1  B1    C1   NaN
2  A2  B2    C2   NaN
3  A3  B3  NewC  NewD
4  A4  B4    C4   NaN

A third way,

df1a = df1.reindex(df1.columns.union(df2.columns), axis=1)
df1a.update(df2)
df1a

Timings:

%%timeit pd.concat((df1,df2),sort=False).groupby(level=0).last()

4.56 ms ± 947 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit 
df1a = df1.reindex(df1.columns.union(df2.columns), axis=1)  
df1a.update(df2)
df1a

2.93 ms ± 133 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df1[['A','B']].join(df2).fillna(df1)

5.2 ms ± 89.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df2.combine_first(df1)

5.37 ms ± 127 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

anky · Answer

combine first is the correct way here, another alternative is concat and get last from group level=0 (index):

pd.concat((df1,df2),sort=False).groupby(level=0).last()

Or with df.assign

df1.assign(**df2).fillna(df1)

    A   B     C     D
1  A1  B1    C1   NaN
2  A2  B2    C2   NaN
3  A3  B3  NewC  NewD
4  A4  B4    C4   NaN

Merge Pandas Dataframe: how to add columns and replace values

Tags:

merge

join

concatenation

pandas

Philipp

2 Answers

Timings:

Scott Boston

anky

Recent Activity

Donate For Us

Merge Pandas Dataframe: how to add columns and replace values

Tags:

merge

join

concatenation

pandas

Philipp

2 Answers

Timings:

Scott Boston

anky

Related questions

Recent Activity

Donate For Us