 

Find equal columns between two dataframes

I have two pandas data frames, a and b:

a1   a2   a3   a4   a5   a6   a7
1    3    4    5    3    4    5
0    2    0    3    0    2    1
2    5    6    5    2    1    2

and

b1   b2   b3   b4   b5   b6   b7
3    5    4    5    1    4    3
0    1    2    3    0    0    2
2    2    1    5    2    6    5
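For anyone who wants to experiment, the two frames above can be rebuilt in pandas roughly as follows. This is a sketch that assumes the values are plain integers and keeps the names a and b from the question. The last step checks the claim that both frames hold the same columns, only in a different order.

```python
import pandas as pd

# Rows copied from the tables above; column labels a1..a7 and b1..b7.
a = pd.DataFrame(
    [[1, 3, 4, 5, 3, 4, 5],
     [0, 2, 0, 3, 0, 2, 1],
     [2, 5, 6, 5, 2, 1, 2]],
    columns=[f"a{k}" for k in range(1, 8)],
)
b = pd.DataFrame(
    [[3, 5, 4, 5, 1, 4, 3],
     [0, 1, 2, 3, 0, 0, 2],
     [2, 2, 1, 5, 2, 6, 5]],
    columns=[f"b{k}" for k in range(1, 8)],
)

# Treat each column as a tuple of its values; sorting ignores order and labels.
same_columns = sorted(map(tuple, a.to_numpy().T.tolist())) == sorted(
    map(tuple, b.to_numpy().T.tolist())
)
print(same_columns)  # True
```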

The two data frames contain exactly the same data, but in a different order and with different column names. Based on the numbers in the two data frames, I would like to match each column name in a to the corresponding column name in b.

It is not as easy as simply comparing the first row of a with the first row of b, because there are duplicated values. For example, a4 and a7 both have the value 5 in the first row, so it is not possible to immediately match them to either b2 or b4.
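A quick sketch makes the ambiguity visible: for each column of a, list the columns of b whose first-row value is the same. The frames are rebuilt from the tables above so the snippet runs on its own; the name candidates is purely illustrative.

```python
import pandas as pd

a = pd.DataFrame(
    [[1, 3, 4, 5, 3, 4, 5], [0, 2, 0, 3, 0, 2, 1], [2, 5, 6, 5, 2, 1, 2]],
    columns=[f"a{k}" for k in range(1, 8)],
)
b = pd.DataFrame(
    [[3, 5, 4, 5, 1, 4, 3], [0, 1, 2, 3, 0, 0, 2], [2, 2, 1, 5, 2, 6, 5]],
    columns=[f"b{k}" for k in range(1, 8)],
)

# For every column of a, collect the b columns whose value in row 0 is equal.
candidates = {
    col: [other for other in b.columns if b.at[0, other] == a.at[0, col]]
    for col in a.columns
}
print(candidates)
# Only a1 has a single candidate (b5); a4 and a7 both get ['b2', 'b4'].
```

Looking at one row only narrows the choice; it takes all rows together to make each match unique.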

What is the best way to do this?

asked Jan 13 '20 17:01 by OD1995


People also ask

How do you check if two data frames have the same columns in Python?

DataFrame.equals(): this method lets two Series or DataFrames be compared against each other to see whether they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type, but the elements within the columns must have the same dtype.
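The rules above can be checked with a few tiny frames. This sketch follows the same idea as the example in the pandas documentation for DataFrame.equals; the frame names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({1: [10], 2: [20]})

# Float labels 1.0/2.0 instead of int labels 1/2: header type may differ.
different_column_type = pd.DataFrame({1.0: [10], 2.0: [20]})
# Same labels, but float values instead of int values: dtypes must match.
different_data_type = pd.DataFrame({1: [10.0], 2: [20.0]})

print(df.equals(different_column_type))  # True
print(df.equals(different_data_type))    # False

# NaNs in the same position are treated as equal.
print(pd.Series([1.0, np.nan]).equals(pd.Series([1.0, np.nan])))  # True
```

Note that equals() compares labels as well as values, so on its own it cannot say which column of one frame corresponds to which column of another.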

How do I compare two DataFrames based on a column?

The compare method in pandas shows the differences between two DataFrames. It compares them row-wise and column-wise and presents the differences side by side. It can only compare DataFrames that have the same shape and identical row and column labels.
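A short sketch of both situations, assuming pandas 1.1 or later (the version that added DataFrame.compare): two identically labelled frames that differ in one cell, and two frames with the same values but different column labels, like a1 and b5 in the question.

```python
import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
df2 = pd.DataFrame({"x": [1, 2, 0], "y": [4, 5, 6]})

# By default only differing rows/columns are kept: row 2 of "x", as self/other.
diff = df1.compare(df2)
print(diff)

# Same values, different column labels: compare raises instead of matching them.
left = pd.DataFrame({"a1": [1, 0, 2]})
right = pd.DataFrame({"b5": [1, 0, 2]})
try:
    left.compare(right)
except ValueError as exc:
    print("ValueError:", exc)
    raised = True
else:
    raised = False
```

This is why compare does not help with the question as asked: it needs the labels to line up already.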


1 Answer

Here's one way: use broadcasting to compare every column of one dataframe with every column of the other, then call .all() along the row axis to find the column pairs that match in every row. The indexing arrays for both dataframes' column names can then be obtained from the result of np.where (with @piR's contribution):

import numpy as np

i, j = np.where((a.to_numpy()[:, None] == b.to_numpy()[:, :, None]).all(axis=0))
dict(zip(a.columns[j], b.columns[i]))
# {'a5': 'b1', 'a7': 'b2', 'a6': 'b3', 'a4': 'b4', 'a1': 'b5', 'a3': 'b6', 'a2': 'b7'}
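Broken into named steps, the one-liner above looks like the following sketch. The intermediate names A, B, eq and match are illustrative, and the frames are rebuilt from the question so it runs standalone. The printed shapes show how broadcasting produces one boolean per (row, b column, a column) triple.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(
    [[1, 3, 4, 5, 3, 4, 5], [0, 2, 0, 3, 0, 2, 1], [2, 5, 6, 5, 2, 1, 2]],
    columns=[f"a{k}" for k in range(1, 8)],
)
b = pd.DataFrame(
    [[3, 5, 4, 5, 1, 4, 3], [0, 1, 2, 3, 0, 0, 2], [2, 2, 1, 5, 2, 6, 5]],
    columns=[f"b{k}" for k in range(1, 8)],
)

A = a.to_numpy()  # shape (3, 7): rows x columns of a
B = b.to_numpy()  # shape (3, 7): rows x columns of b

# A[:, None] is (3, 1, 7) and B[:, :, None] is (3, 7, 1); they broadcast to
# (3, 7, 7) with eq[r, p, q] == (B[r, p] == A[r, q]).
eq = A[:, None] == B[:, :, None]
# Reduce over the row axis: match[p, q] is True when b's column p equals a's column q.
match = eq.all(axis=0)
i, j = np.where(match)  # i: positions in b.columns, j: positions in a.columns
mapping = dict(zip(a.columns[j], b.columns[i]))
print(eq.shape, match.shape)  # (3, 7, 7) (7, 7)
print(mapping)
```

Because the comparison is all-pairs, the intermediate array has rows × (columns of b) × (columns of a) elements, which is trivial here but worth keeping in mind for very wide frames.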
answered Sep 28 '22 09:09 by yatu
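An alternative that avoids the all-pairs comparison is to index one frame's columns by their values: turn each column of b into a tuple and look up each column of a in that dictionary. This is a different technique from the broadcasting answer, sketched here with a hypothetical helper match_columns. It assumes the columns of b are pairwise distinct; if two were identical, the later one would overwrite the earlier one in the lookup. The last line checks the result by renaming and reordering a and comparing it with b via DataFrame.equals.

```python
import pandas as pd

a = pd.DataFrame(
    [[1, 3, 4, 5, 3, 4, 5], [0, 2, 0, 3, 0, 2, 1], [2, 5, 6, 5, 2, 1, 2]],
    columns=[f"a{k}" for k in range(1, 8)],
)
b = pd.DataFrame(
    [[3, 5, 4, 5, 1, 4, 3], [0, 1, 2, 3, 0, 0, 2], [2, 2, 1, 5, 2, 6, 5]],
    columns=[f"b{k}" for k in range(1, 8)],
)


def match_columns(left, right):
    """Hypothetical helper: map each column of left to the right column with identical values.

    Assumes right's columns are pairwise distinct; unmatched columns map to None.
    """
    lookup = {tuple(right[col].tolist()): col for col in right.columns}
    return {col: lookup.get(tuple(left[col].tolist())) for col in left.columns}


mapping = match_columns(a, b)
print(mapping)
# {'a1': 'b5', 'a2': 'b7', 'a3': 'b6', 'a4': 'b4', 'a5': 'b1', 'a6': 'b3', 'a7': 'b2'}

# Renaming a's columns and putting them in b's order should reproduce b exactly.
print(a.rename(columns=mapping)[b.columns].equals(b))  # True
```

The dictionary lookup is roughly linear in the number of cells rather than quadratic in the number of columns, at the cost of building one tuple per column.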