Fast method for removing duplicate columns in pandas.DataFrame

Question

By using

df_ab = pd.concat([df_a, df_b], axis=1, join='inner')

I get a DataFrame that looks like this:

    A    A    B    B
0   5    5   10   10
1   6    6   19   19

and I want to remove the duplicate columns, so that each column name appears only once:

    A     B
0   5    10
1   6    19

Because df_a and df_b are subsets of the same DataFrame, I know that columns with the same name hold identical values in every row. I have a working solution:

df_ab = df_ab.T.drop_duplicates().T

but I have a large number of rows, so this is very slow. Does someone have a faster solution? I would prefer one that doesn't require explicit knowledge of the column names.

unutbu · Accepted Answer

Perhaps you would be better off avoiding the problem altogether, by using pd.merge instead of pd.concat:

df_ab = pd.merge(df_a, df_b, how='inner')

With no on argument, this merges df_a and df_b on all the columns they have in common, so each shared column appears only once in the result.
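A minimal sketch of the merge approach, using made-up frames (the column C and the values are assumptions for illustration, not from the question). Note that pd.merge joins on the values of the shared columns rather than on the index, so the result gets a fresh 0..n-1 index, and repeated key rows would produce extra rows.

```python
import pandas as pd

# Hypothetical parent frame; df_a and df_b are overlapping column subsets of it.
df = pd.DataFrame({"A": [5, 6], "B": [10, 19], "C": [1, 2]})
df_a = df[["A", "B"]]
df_b = df[["A", "B", "C"]]

# With no `on` argument, merge uses the intersection of column names (A and B)
# as join keys, so each shared column appears once in the result.
merged = pd.merge(df_a, df_b, how="inner")

print(list(merged.columns))
print(merged)
```

If the two frames share no columns at all, pd.merge raises an error instead of concatenating, so this only fits the case where the overlap is known to exist.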

Prayson W. Daniel · Answer

The easiest way is:

df = df.loc[:,~df.columns.duplicated()]

One line of code can change everything
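To see what that line does, here is a small sketch with frames shaped like the ones in the question (the exact construction of df_a and df_b is an assumption). columns.duplicated() flags every repeat of a column name after its first occurrence, and ~ inverts the mask so only the first occurrences are kept. It compares names only, not values, so it relies on the guarantee from the question that same-named columns are identical.

```python
import pandas as pd

# Assumed inputs: two frames with the same column names and values.
df_a = pd.DataFrame({"A": [5, 6], "B": [10, 19]})
df_b = pd.DataFrame({"A": [5, 6], "B": [10, 19]})

# Concatenating side by side repeats each column name.
df_ab = pd.concat([df_a, df_b], axis=1, join="inner")

# Boolean mask: True for every column whose name was already seen.
mask = df_ab.columns.duplicated()

# Keep only the first occurrence of each column name.
deduped = df_ab.loc[:, ~mask]

print(list(df_ab.columns))
print(list(deduped.columns))
```

Because only the column labels are inspected, the cost does not grow with the number of rows, unlike the transpose-and-drop_duplicates approach.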
