
Python Pandas merge only certain columns

People also ask

How do I merge only selected columns in pandas?

You can merge two pandas DataFrames on specific columns by passing those column names to the merge function.

How do I merge two Dataframes on a specific column?

If you want to join on multiple columns instead of a single column, you can pass a list of column names to DataFrame.merge() instead of a single column name. If you do not specify the 'how' argument, DataFrame.merge() uses an inner join by default.
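As a minimal sketch of the multi-column case, the snippet below merges two small frames on two keys at once. The frames and their column names (store, year, revenue, cost) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
sales = pd.DataFrame({
    "store": ["A", "A", "B"],
    "year": [2020, 2021, 2020],
    "revenue": [100, 120, 90],
})
costs = pd.DataFrame({
    "store": ["A", "B", "B"],
    "year": [2020, 2020, 2021],
    "cost": [60, 50, 55],
})

# Passing a list to `on` joins on both columns at once.
# `how` is omitted, so pandas performs an inner join:
# only (store, year) pairs present in both frames survive.
merged = sales.merge(costs, on=["store", "year"])
print(merged)
```

Only the (A, 2020) and (B, 2020) pairs exist in both frames, so the result has two rows. Pass how='left' or how='outer' if unmatched rows should be kept.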

How do I merge columns in pandas Python?

pandas offers three main tools: merge() for combining data on common columns or indices, .join() for combining data on a key column or an index, and concat() for combining DataFrames across rows or columns.
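A rough side-by-side of the three functions, using two tiny made-up frames (left and right are hypothetical names), might look like this:

```python
import pandas as pd

# Hypothetical sample data for illustration.
left = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"key": [2, 3, 4], "b": [20, 30, 40]})

# merge(): match rows on a shared column (inner join by default).
by_merge = left.merge(right, on="key")

# join(): match rows on the index (left join by default),
# so the key column is moved into the index first.
by_join = left.set_index("key").join(right.set_index("key"))

# concat(): stack frames along an axis without matching keys.
stacked = pd.concat([left, right], axis=0, ignore_index=True)

print(by_merge, by_join, stacked, sep="\n\n")
```

Here merge() keeps only keys 2 and 3, join() keeps all three left keys and fills the missing b value with NaN, and concat() simply stacks all six rows.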

How do I merge 3 columns in pandas?

To merge two pandas DataFrames on multiple columns, use the pandas.merge() function. It is versatile and flexible, and the same operation is also available as the DataFrame.merge() method.


You want to use TWO brackets (df2[[...]]) to select a subset of columns as a DataFrame. So if you are doing a VLOOKUP sort of action:

df = pd.merge(df,df2[['Key_Column','Target_Column']],on='Key_Column', how='left')

This will give you everything in the original df and add the one corresponding column from df2 that you want to join.
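To see the VLOOKUP-style merge run end to end, here is a small sketch. The data, and the extra columns amount and Other_Column, are invented for the example.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"Key_Column": ["k1", "k2", "k3"], "amount": [10, 20, 30]})
df2 = pd.DataFrame({
    "Key_Column": ["k1", "k2"],
    "Target_Column": ["red", "blue"],
    "Other_Column": [1.5, 2.5],  # not wanted in the result
})

# Double brackets select a DataFrame containing only the listed columns,
# so Other_Column never enters the merge.
result = pd.merge(df, df2[["Key_Column", "Target_Column"]],
                  on="Key_Column", how="left")
print(result)
```

Every row of df is kept. The row for k3 has no match in df2, so its Target_Column is NaN. Unlike VLOOKUP, a left merge returns one row per match, so duplicate keys in df2 would duplicate rows in the result.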


You could merge the sub-DataFrame (with just those columns):

df2[list('xab')]  # df2 but only with columns x, a, and b

df1.merge(df2[list('xab')])
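A quick sketch of that call on made-up data is below. Since no 'on' argument is given, merge() joins on the column names the two frames share, which here is just x. The extra columns y and c are assumptions for the example.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df1 = pd.DataFrame({"x": [1, 2, 3], "y": ["p", "q", "r"]})
df2 = pd.DataFrame({
    "x": [1, 2, 3],
    "a": [10, 20, 30],
    "b": [0.1, 0.2, 0.3],
    "c": ["u", "v", "w"],  # excluded by the column selection
})

cols = list("xab")  # same as ['x', 'a', 'b']

# With no `on`, merge() uses the columns common to both frames ('x').
merged = df1.merge(df2[cols])
print(merged)
```

Note that list('xab') only works because the column names are single characters. For longer names, write the list out explicitly.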

If you want to drop column(s) from the target data frame, but the column(s) are required for the join, you can do the following:

df1 = df1.merge(df2[['a', 'b', 'key1']], how = 'left',
                left_on = 'key2', right_on = 'key1').drop(columns= ['key1'])

The .drop(columns=['key1']) part removes 'key1' from the resulting data frame, even though it was required for the join in the first place.
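Here is a small sketch showing why the drop is needed: when the key columns have different names, both end up in the merged frame. The data and the extra columns v and c are hypothetical.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df1 = pd.DataFrame({"key2": [1, 2, 3], "v": ["p", "q", "r"]})
df2 = pd.DataFrame({
    "key1": [1, 2],
    "a": [10, 20],
    "b": [0.1, 0.2],
    "c": ["u", "v"],  # excluded by the column selection
})

# left_on/right_on keep both key columns in the output.
merged = df1.merge(df2[["a", "b", "key1"]], how="left",
                   left_on="key2", right_on="key1")

# Drop the redundant right-hand key once the join is done.
cleaned = merged.drop(columns=["key1"])
print(cleaned)
```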


You can use .iloc to select specific columns by position, keeping all rows, and then merge that. An example is below:

pandas.merge(dataframe1, dataframe2.iloc[:, 0:5], how='left', on='key')

In this example, you are merging dataframe1 and dataframe2 with a left join on 'key'. For dataframe2, .iloc lets you specify the rows and columns you want by integer position. The : selects all rows, and 0:5 selects the first 5 columns. The 'key' column has to be among those selected columns, otherwise the merge fails. You could use .loc to select by name instead, but if you're dealing with long column names, .iloc may be more convenient.
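For comparison, a sketch with both the positional and the by-name selection is below. The frames and the column names c1 through c6 are made up, and 'key' is deliberately the first column of dataframe2 so that the positional slice includes it.

```python
import pandas as pd

# Hypothetical sample data for illustration.
dataframe1 = pd.DataFrame({"key": [1, 2, 3], "left_val": ["p", "q", "r"]})
dataframe2 = pd.DataFrame({
    "key": [1, 2],
    "c1": [10, 20], "c2": [11, 21], "c3": [12, 22],
    "c4": [13, 23], "c5": [14, 24], "c6": [15, 25],
})

# Positional: all rows, first five columns (key, c1, c2, c3, c4).
by_position = pd.merge(dataframe1, dataframe2.iloc[:, 0:5],
                       how="left", on="key")

# By name: the same five columns, spelled out.
by_name = pd.merge(dataframe1,
                   dataframe2.loc[:, ["key", "c1", "c2", "c3", "c4"]],
                   how="left", on="key")

print(by_position.equals(by_name))
```

Positional selection is shorter, but it silently picks different columns if the column order of dataframe2 ever changes, so selecting by name is the safer choice when the layout is not fixed.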


This is to merge selected columns from two tables.

If table_1 contains t1_a,t1_b,t1_c..,id,..t1_z columns, and table_2 contains t2_a, t2_b, t2_c..., id,..t2_z columns, and only t1_a, id, t2_a are required in the final table, then

mergedCSV = table_1[['t1_a','id']].merge(table_2[['t2_a','id']], on = 'id',how = 'left')
# save resulting output file    
mergedCSV.to_csv('output.csv',index = False)