By using
df_ab = pd.concat([df_a, df_b], axis=1, join='inner')
I get a DataFrame that looks like this:
   A  A   B   B
0  5  5  10  10
1  6  6  19  19
and I want to remove the duplicated columns, so that it looks like this:
   A   B
0  5  10
1  6  19
Because df_a and df_b are subsets of the same DataFrame, I know that columns with the same name hold the same values in every row. I have a working solution:
df_ab = df_ab.T.drop_duplicates().T
but I have a large number of rows, so this is very slow. Does anyone have a faster solution? I would prefer one that does not require explicit knowledge of the column names.
Perhaps you would be better off avoiding the problem altogether by using pd.merge instead of pd.concat:
df_ab = pd.merge(df_a, df_b, how='inner')
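This will merge df_a and df_b on all the columns they share. Note that, unlike pd.concat with axis=1, this matches rows on the values in those columns rather than on the index.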
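As a rough sketch of the merge approach, assuming two subsets that overlap in a single column (the frame df and the column C are made up here for illustration and are not from the question):

```python
import pandas as pd

# Hypothetical parent frame; column "C" is an assumption for illustration.
df = pd.DataFrame({"A": [5, 6], "B": [10, 19], "C": [1, 2]})

# Two subsets of the same frame that share column "B".
df_a = df[["A", "B"]]
df_b = df[["B", "C"]]

# With no `on` argument, pd.merge joins on the columns common to both frames,
# so the shared column "B" appears only once in the result.
df_ab = pd.merge(df_a, df_b, how="inner")
print(df_ab)
```

Keep in mind that this only behaves like a column-wise concat when the shared columns identify each row uniquely. If their values repeat, the merge will produce every matching combination of rows.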
The easiest way is:
df = df.loc[:, ~df.columns.duplicated()]
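To see what that one-liner does, here is a small sketch that rebuilds a frame with repeated column names (the sample data is assumed, modelled on the question) and filters it with the boolean mask:

```python
import pandas as pd

# Assumed sample data modelled on the question: both frames carry A and B.
df_a = pd.DataFrame({"A": [5, 6], "B": [10, 19]})
df_b = pd.DataFrame({"A": [5, 6], "B": [10, 19]})

# Column-wise concat keeps both copies of each column name.
df_ab = pd.concat([df_a, df_b], axis=1, join="inner")

# Index.duplicated() marks every repeat of a label after its first occurrence;
# negating it gives a mask that keeps only the first column for each name.
mask = ~df_ab.columns.duplicated()
deduped = df_ab.loc[:, mask]
print(deduped)
```

This only looks at the column labels, never at the cell values, so it avoids the two transposes and the row-by-row comparison in the T.drop_duplicates().T version. It does rely on same-named columns really holding the same data, which the question says is the case.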