I have two pandas
data frames, a
and b
:
a1 a2 a3 a4 a5 a6 a7
1 3 4 5 3 4 5
0 2 0 3 0 2 1
2 5 6 5 2 1 2
and
b1 b2 b3 b4 b5 b6 b7
3 5 4 5 1 4 3
0 1 2 3 0 0 2
2 2 1 5 2 6 5
The two data frames contain exactly the same data, but in a different order and with different column names. Based on the numbers in the two data frames, I would like to be able to match each column name in a
to each column name in b
.
It is not as easy as simply comparing the first row of a
with the first row of b
as there are duplicated values, for example both a4
and a7
have the value 5
so it is not possible to immediately match them to either b2
or b4
.
What is the best way to do this?
DataFrame - equals() function This function allows two Series or DataFrames to be compared against each other to see if they have the same shape and elements. NaNs in the same location are considered equal. The column headers do not need to have the same type, but the elements within the columns must be the same dtype.
The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side. The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.
Here's one way leveraging broadcasting to check for equality between both dataframes and taking all
on the result to check where all rows match. Then we can obtain indexing arrays for both dataframe's column names from the result of np.where
(with @piR's contribution):
i, j = np.where((a.values[:,None] == b.values[:,:,None]).all(axis=0))
dict(zip(a.columns[j], b.columns[i]))
# {'a7': 'b2', 'a6': 'b3', 'a4': 'b4', 'a2': 'b7'}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With