
Pandas: merge dataframes without creating new columns

I've got 2 dataframes with identical columns:

import pandas as pd

df1 = pd.DataFrame([['Abe','1','True'],['Ben','2','True'],['Charlie','3','True']], columns=['Name','Number','Other'])
df2 = pd.DataFrame([['Derek','4','False'],['Ben','5','False'],['Erik','6','False']], columns=['Name','Number','Other'])

which give:

     Name Number Other
0      Abe      1  True
1      Ben      2  True
2  Charlie      3  True


    Name Number  Other
0  Derek      4  False
1    Ben      5  False
2   Erik      6  False

I want an output dataframe that is an intersection of the two based on "Name":

output_df = 
        Name Number  Other
    0    Ben      2  True
    1    Ben      5  False

I've tried a basic pandas merge, but the result is not what I want:

pd.merge(df1,df2,how='inner',on='Name') = 
 Name Number_x Other_x Number_y Other_y
0  Ben        2    True        5   False
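
The suffixed columns are what merge is designed to produce: it is a horizontal (column-wise) join, so each matching key yields one combined row, and non-key columns that exist in both frames are kept twice, distinguished by the default suffixes ('_x' for the left frame, '_y' for the right). A minimal sketch reproducing the result above with the sample frames:

```python
import pandas as pd

df1 = pd.DataFrame([['Abe', '1', 'True'], ['Ben', '2', 'True'], ['Charlie', '3', 'True']],
                   columns=['Name', 'Number', 'Other'])
df2 = pd.DataFrame([['Derek', '4', 'False'], ['Ben', '5', 'False'], ['Erik', '6', 'False']],
                   columns=['Name', 'Number', 'Other'])

# merge pairs matching rows side by side; 'Number' and 'Other' exist in both
# frames, so each appears twice with the default '_x' / '_y' suffixes.
merged = pd.merge(df1, df2, how='inner', on='Name')
print(list(merged.columns))
# ['Name', 'Number_x', 'Other_x', 'Number_y', 'Other_y']
```

Since the desired output stacks the matching rows vertically instead, a row-wise operation such as concat is a closer fit than merge.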

These dataframes are quite large, so I'd prefer to use some pandas magic to keep things quick.

asked Dec 21 '16 at 12:12 by RedM



1 Answer

You can stack the frames with concat and then filter them with boolean indexing, using isin against the common names returned by numpy.intersect1d:

import numpy as np
import pandas as pd

val = np.intersect1d(df1.Name, df2.Name)
print (val)
['Ben']

df = pd.concat([df1,df2], ignore_index=True)
print (df[df.Name.isin(val)])
  Name Number  Other
1  Ben      2   True
4  Ben      5  False
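
A variant that is not part of the original answer: isin accepts any list-like, including another Series, so each frame can be filtered directly against the other frame's Name column, which skips computing val altogether. For two frames it should keep the same rows in the same order (df1's matches first, then df2's); whether it is actually faster on large data is an assumption worth timing.

```python
import pandas as pd

df1 = pd.DataFrame([['Abe', '1', 'True'], ['Ben', '2', 'True'], ['Charlie', '3', 'True']],
                   columns=['Name', 'Number', 'Other'])
df2 = pd.DataFrame([['Derek', '4', 'False'], ['Ben', '5', 'False'], ['Erik', '6', 'False']],
                   columns=['Name', 'Number', 'Other'])

# Keep the rows of each frame whose Name also occurs in the other frame,
# then stack them; ignore_index gives the result a fresh 0-based index.
out = pd.concat(
    [df1[df1.Name.isin(df2.Name)], df2[df2.Name.isin(df1.Name)]],
    ignore_index=True,
)
print(out)
```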

Another way to compute val is as an intersection of Python sets:

val = set(df1.Name).intersection(set(df2.Name))
print (val)
{'Ben'}

The index can then be reset to a default 0-based range:

df = pd.concat([df1,df2])
print (df[df.Name.isin(val)].reset_index(drop=True))
  Name Number  Other
0  Ben      2   True
1  Ben      5  False
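
The same concat + isin + reset_index idea extends to more than two frames. The sketch below wraps it in a hypothetical helper, intersect_rows (not a pandas function), which uses functools.reduce with numpy.intersect1d to find the names present in every frame; the third frame df3 is likewise made-up data used only to show the n-frame case.

```python
from functools import reduce

import numpy as np
import pandas as pd


def intersect_rows(frames, key):
    """Hypothetical helper: stack `frames` vertically and keep only the rows
    whose `key` value occurs in every frame."""
    # Values of `key` shared by all frames (np.intersect1d returns sorted uniques)
    common = reduce(np.intersect1d, (f[key] for f in frames))
    stacked = pd.concat(frames, ignore_index=True)
    # Boolean indexing with isin, then renumber the surviving rows from 0
    return stacked[stacked[key].isin(common)].reset_index(drop=True)


df1 = pd.DataFrame([['Abe', '1', 'True'], ['Ben', '2', 'True'], ['Charlie', '3', 'True']],
                   columns=['Name', 'Number', 'Other'])
df2 = pd.DataFrame([['Derek', '4', 'False'], ['Ben', '5', 'False'], ['Erik', '6', 'False']],
                   columns=['Name', 'Number', 'Other'])
# Hypothetical third frame
df3 = pd.DataFrame([['Ben', '7', 'True'], ['Zoe', '8', 'False']],
                   columns=['Name', 'Number', 'Other'])

two = intersect_rows([df1, df2], 'Name')
three = intersect_rows([df1, df2, df3], 'Name')
print(two)
print(three)
```

With only df1 and df2 this returns the same two Ben rows as the answer above; adding df3 keeps every Ben row because Ben is the only name common to all three frames.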
answered Nov 04 '22 at 16:11 by jezrael
