I am trying to join two data frames. They look like this:
DF1:
ID  COUNTRY  YEAR  V1  V2  V3  V4
12  USA      2012  x   y   z   a
13  USA      2013  x   y   z   a
14  RUSSIA   2012  x   y   z   a

DF2:
ID  COUNTRY  YEAR  TRACT
9   USA      2000  A
13  USA      2013  B
The desired end result is:

DF3:
ID  COUNTRY  YEAR  V1  V2  V3  V4  TRACT
9   USA      2000                  A
12  USA      2012  x   y   z   a
13  USA      2013  x   y   z   a   B
14  RUSSIA   2012  x   y   z   a
I've been trying to use pd.merge and the .join method with the how='outer' setting, without success:
df3 = pd.merge(df1,df2,how='outer',left_on=['ID','Country','Year'],right_on=['ID',"Country","Year"])
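A minimal sketch of why this call fails, assuming the sample frames are rebuilt by hand from the tables above (the x/y/z/a values are just the placeholders from the question). pandas column labels are case-sensitive, so 'Country' and 'Year' do not exist in either frame and merge raises a KeyError instead of joining.

```python
import pandas as pd

# Rebuild the sample frames from the question (values are the placeholders shown there).
df1 = pd.DataFrame({
    "ID": [12, 13, 14],
    "COUNTRY": ["USA", "USA", "RUSSIA"],
    "YEAR": [2012, 2013, 2012],
    "V1": ["x", "x", "x"],
    "V2": ["y", "y", "y"],
    "V3": ["z", "z", "z"],
    "V4": ["a", "a", "a"],
})
df2 = pd.DataFrame({
    "ID": [9, 13],
    "COUNTRY": ["USA", "USA"],
    "YEAR": [2000, 2013],
    "TRACT": ["A", "B"],
})

# Column labels are case-sensitive: "Country" and "Year" are not columns of either frame.
try:
    pd.merge(df1, df2, how="outer",
             left_on=["ID", "Country", "Year"],
             right_on=["ID", "Country", "Year"])
    merge_error = None
except KeyError as err:
    merge_error = err

print(repr(merge_error))
```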
In this approach, to avoid duplicated columns when joining the two data frames, simply use the pd.merge() function and pass the column names to join on from the left and right data frames. Keep in mind that the default inner join keeps only rows whose keys appear in both frames, so reproducing DF3 above requires how='outer'.
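A small sketch of the duplicated-column issue, assuming the same hand-built sample frames. Joining on ID alone leaves COUNTRY and YEAR in both frames, so pandas keeps two suffixed copies (_x and _y, its default suffixes); listing every shared column in on= keeps a single copy of each. Comparing inner and outer row counts also shows why the inner join cannot produce DF3.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [12, 13, 14],
    "COUNTRY": ["USA", "USA", "RUSSIA"],
    "YEAR": [2012, 2013, 2012],
    "V1": ["x"] * 3, "V2": ["y"] * 3, "V3": ["z"] * 3, "V4": ["a"] * 3,
})
df2 = pd.DataFrame({
    "ID": [9, 13],
    "COUNTRY": ["USA", "USA"],
    "YEAR": [2000, 2013],
    "TRACT": ["A", "B"],
})

keys = ["ID", "COUNTRY", "YEAR"]

# Joining on ID alone: the other shared columns are duplicated with _x/_y suffixes.
partial = pd.merge(df1, df2, on="ID")

# Joining on every shared column yields one copy of each key column.
inner = pd.merge(df1, df2, on=keys)               # default how="inner": only ID 13 matches
outer = pd.merge(df1, df2, on=keys, how="outer")  # union of keys from both frames

print(partial.columns.tolist())
print(len(inner), len(outer))
```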
To drop duplicate columns from a pandas DataFrame, use df.T.drop_duplicates().T; this removes every column whose data repeats an earlier column's data, regardless of column names.
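A minimal sketch on a hypothetical frame where column "b" repeats column "a" under another name. One caveat worth knowing: transposing a frame with mixed dtypes (strings and integers, as in the question) turns the values into object dtype, so the second variant, which only uses the transpose to find duplicates and then selects columns from the original frame, may be preferable.

```python
import pandas as pd

# Hypothetical frame: column "b" holds the same data as column "a".
df = pd.DataFrame({"a": [1, 2], "b": [1, 2], "c": [3, 4]})

# Transpose so columns become rows, drop duplicate rows (keeping the first), transpose back.
deduped = df.T.drop_duplicates().T

# Use the transpose only to detect duplicates, then select columns from the original df,
# so each kept column keeps its original dtype.
deduped_mask = df.loc[:, ~df.T.duplicated()]

print(deduped.columns.tolist())
print(deduped_mask.columns.tolist())
```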
Try this:
df1.merge(df2, how='outer', left_on=['ID','COUNTRY','YEAR'], right_on=['ID','COUNTRY','YEAR'])
(The column names must be in uppercase to match your input tables.)
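A self-contained sketch of the corrected outer merge, assuming the sample frames are rebuilt by hand from the tables in the question. Because the key names are identical in both frames, on= is used here as a shorter spelling of the matching left_on/right_on pair; the sort_values/reset_index step is an added assumption, only there to match the row order shown in DF3. Cells with no match come out as NaN.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [12, 13, 14],
    "COUNTRY": ["USA", "USA", "RUSSIA"],
    "YEAR": [2012, 2013, 2012],
    "V1": ["x"] * 3, "V2": ["y"] * 3, "V3": ["z"] * 3, "V4": ["a"] * 3,
})
df2 = pd.DataFrame({
    "ID": [9, 13],
    "COUNTRY": ["USA", "USA"],
    "YEAR": [2000, 2013],
    "TRACT": ["A", "B"],
})

keys = ["ID", "COUNTRY", "YEAR"]

# Full outer join: keep every key combination found in either frame.
df3 = (
    df1.merge(df2, how="outer", on=keys)
       .sort_values("ID")        # order rows by ID, as in the desired DF3
       .reset_index(drop=True)
)
print(df3)
```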
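Have you tried DataFrame.join? It matches rows on the index, so a bare df1.join(df2) raises a ValueError here because the columns ID, COUNTRY and YEAR exist in both frames and no suffix is given. Set the key columns as the index of both frames first:
df1.set_index(['ID','COUNTRY','YEAR']).join(df2.set_index(['ID','COUNTRY','YEAR']), how='outer').reset_index()
You can add further parameters later.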
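A minimal sketch of the index-based join, assuming the same hand-built sample frames. DataFrame.join aligns on the index, so the shared key columns are moved into the index of both frames before joining with how="outer"; the sketch also checks that the bare df1.join(df2) call fails because of the overlapping column names. The final sort is only there to match the row order of DF3.

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": [12, 13, 14],
    "COUNTRY": ["USA", "USA", "RUSSIA"],
    "YEAR": [2012, 2013, 2012],
    "V1": ["x"] * 3, "V2": ["y"] * 3, "V3": ["z"] * 3, "V4": ["a"] * 3,
})
df2 = pd.DataFrame({
    "ID": [9, 13],
    "COUNTRY": ["USA", "USA"],
    "YEAR": [2000, 2013],
    "TRACT": ["A", "B"],
})

keys = ["ID", "COUNTRY", "YEAR"]

# join aligns on the index, so move the shared key columns into the index of both frames.
left = df1.set_index(keys)
right = df2.set_index(keys)

# how="outer" keeps keys that appear in only one frame; missing cells become NaN.
df3 = (
    left.join(right, how="outer")
        .reset_index()           # turn ID, COUNTRY, YEAR back into columns
        .sort_values("ID")
        .reset_index(drop=True)
)

# A bare df1.join(df2) fails: ID, COUNTRY and YEAR are columns of both frames.
try:
    df1.join(df2)
    bare_join_error = None
except ValueError as err:
    bare_join_error = err

print(df3)
```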