
In pandas, how to concatenate horizontally and then remove the redundant columns

Tags:

python

pandas

Say I have two dataframes.

DF1: col1, col2, col3

DF2: col2, col4, col5

How do I concatenate the two dataframes horizontally so that the result has col1, col2, col3, col4, and col5? Right now I am using pd.concat([DF1, DF2], axis = 1), but the result ends up with two col2 columns. Assuming the values in both col2 columns are identical, I want to keep only one of them.

Jun Jang asked Jun 14 '17 13:06


People also ask

How do you concatenate horizontally in pandas?

To concatenate DataFrames horizontally in pandas, use the pd.concat() function with axis=1.
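As a minimal sketch of that call (the frames df1 and df2 and their values are made up here, mirroring the column names in the question), note that axis=1 keeps every column from both inputs, including repeated names:

```python
import pandas as pd

# Hypothetical example frames; only the column names come from the question.
df1 = pd.DataFrame({"col1": [1, 2], "col2": [10, 20], "col3": [5, 6]})
df2 = pd.DataFrame({"col2": [10, 20], "col4": [7, 8], "col5": [9, 0]})

# axis=1 places the frames side by side, aligning rows on the index.
wide = pd.concat([df1, df2], axis=1)

print(list(wide.columns))
# ['col1', 'col2', 'col3', 'col2', 'col4', 'col5']
```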

Does concat remove duplicates pandas?

No. By default, when you concatenate two DataFrames that contain duplicate records, pandas simply stacks them and does not remove the duplicate rows.

How do I remove duplicate columns in pandas?

To drop duplicate columns from a pandas DataFrame, use df.T.drop_duplicates().T. This removes all columns that hold the same data, regardless of their column names.
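A small illustration of the difference between "same data" and "same name" may help. The frame below is invented for the example, and the name-based filter using columns.duplicated() is shown only as a contrast, not as something this page recommends:

```python
import pandas as pd

# Hypothetical frame: "a" and "b" have different names but identical values.
df = pd.DataFrame({"a": [1, 2], "b": [1, 2], "c": [3, 4]})

# Data-based: transpose, drop duplicate rows, transpose back.
by_data = df.T.drop_duplicates().T
print(list(by_data.columns))  # ['a', 'c']

# Name-based: keep the first column for each label; nothing is dropped here
# because all three labels are distinct.
by_name = df.loc[:, ~df.columns.duplicated()]
print(list(by_name.columns))  # ['a', 'b', 'c']
```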

How do I merge DataFrames without duplicates?

To concatenate DataFrames, use pd.concat(); to discard duplicate rows afterwards, call drop_duplicates() on the result.
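For the row-wise case, a short sketch with two made-up frames (the names top and bottom and the column x are assumptions for illustration) could look like this:

```python
import pandas as pd

# Hypothetical frames that share one record (x == 2).
top = pd.DataFrame({"x": [1, 2]})
bottom = pd.DataFrame({"x": [2, 3]})

# concat stacks the rows and keeps the duplicate.
stacked = pd.concat([top, bottom], ignore_index=True)
print(len(stacked))  # 4

# drop_duplicates removes repeated rows; reset_index renumbers what is left.
unique_rows = stacked.drop_duplicates().reset_index(drop=True)
print(unique_rows["x"].tolist())  # [1, 2, 3]
```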


1 Answer

Dropping duplicates should work. Because drop_duplicates compares rows rather than columns, we need to transpose the DataFrame so that columns become rows, drop the duplicates, and then transpose it back.

pd.concat([DF1, DF2], axis = 1).T.drop_duplicates().T
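Here is a runnable sketch of that one-liner on stand-in data (the values are invented; only the column names come from the question). Since transposing can change dtypes when column types are mixed and can be slow on large frames, the sketch also shows one possible alternative that is not part of this answer: dropping the shared columns from DF2 before concatenating.

```python
import pandas as pd

# Hypothetical stand-ins for DF1 and DF2 from the question.
DF1 = pd.DataFrame({"col1": [1, 2], "col2": [10, 20], "col3": [5, 6]})
DF2 = pd.DataFrame({"col2": [10, 20], "col4": [7, 8], "col5": [9, 0]})

# The answer's approach: transpose, drop duplicate rows, transpose back.
result = pd.concat([DF1, DF2], axis=1).T.drop_duplicates().T
print(list(result.columns))  # ['col1', 'col2', 'col3', 'col4', 'col5']

# Alternative (an assumption, not from the answer): remove the columns DF2
# shares with DF1 first, so no transpose is needed.
shared = [c for c in DF2.columns if c in DF1.columns]
alt = pd.concat([DF1, DF2.drop(columns=shared)], axis=1)
print(list(alt.columns))  # ['col1', 'col2', 'col3', 'col4', 'col5']
```

The alternative relies on the question's assumption that the shared columns hold identical values in both frames; it does not check this.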
Allen answered Oct 11 '22 04:10