We can merge two Pandas DataFrames on certain columns using the merge function by simply specifying the certain columns for merge. Example1: Let's create a Dataframe and then merge them into a single dataframe. Creating a Dataframe: Python3.
In you want to join on multiple columns instead of a single column, then you can pass a list of column names to Dataframe. merge() instead of single column name. Also, as we didn't specified the value of 'how' argument, therefore by default Dataframe. merge() uses inner join.
merge() for combining data on common columns or indices. . join() for combining data on a key column or an index. concat() for combining DataFrames across rows or columns.
To merge two pandas DataFrames on multiple columns use pandas. merge() method. merge() is considered more versatile and flexible and we also have the same method in DataFrame.
You want to use TWO brackets, so if you are doing a VLOOKUP sort of action:
df = pd.merge(df,df2[['Key_Column','Target_Column']],on='Key_Column', how='left')
This will give you everything in the original df + add that one corresponding column in df2 that you want to join.
You could merge the sub-DataFrame (with just those columns):
df2[list('xab')] # df2 but only with columns x, a, and b
df1.merge(df2[list('xab')])
If you want to drop column(s) from the target data frame, but the column(s) are required for the join, you can do the following:
df1 = df1.merge(df2[['a', 'b', 'key1']], how = 'left',
left_on = 'key2', right_on = 'key1').drop(columns= ['key1'])
The .drop('key1')
part will prevent 'key1' from being kept in the resulting data frame, despite it being required to join in the first place.
You can use .loc
to select the specific columns with all rows and then pull that. An example is below:
pandas.merge(dataframe1, dataframe2.iloc[:, [0:5]], how='left', on='key')
In this example, you are merging dataframe1 and dataframe2. You have chosen to do an outer left join on 'key'. However, for dataframe2 you have specified .iloc
which allows you to specific the rows and columns you want in a numerical format. Using :
, your selecting all rows, but [0:5]
selects the first 5 columns. You could use .loc
to specify by name, but if your dealing with long column names, then .iloc
may be better.
This is to merge selected columns from two tables.
If table_1
contains t1_a,t1_b,t1_c..,id,..t1_z
columns,
and table_2
contains t2_a, t2_b, t2_c..., id,..t2_z
columns,
and only t1_a, id, t2_a are required in the final table, then
mergedCSV = table_1[['t1_a','id']].merge(table_2[['t2_a','id']], on = 'id',how = 'left')
# save resulting output file
mergedCSV.to_csv('output.csv',index = False)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With