I am trying to compare the headers of two pandas dataframes and filter the columns that match. df1 is my big dataframe with two headers, df2 is sort of a dictionary where I have saved every column header I will need from df1.
So if df1 is something like this:
A B C D
a b c d
0.469112 -0.282863 -1.509059 -1.135632
1.212112 -0.173215 0.119209 -1.044236
-0.861849 -2.104569 -0.494929 1.071804
0.721555 -0.706771 -1.039575 0.271860
-0.424972 0.567020 0.276232 -1.087401
-0.673690 0.113648 -1.478427 0.524988
and df2 is something like this:
B D E
I need to get the output:
B D
-0.282863 -1.135632
-0.173215 -1.044236
-2.104569 1.071804
-0.706771 0.271860
0.567020 -1.087401
0.113648 0.524988
and also a list of the header elements that were not matching:
A C
as well as elements missing from df1:
E
So far I have tried the iloc command and a lot of different suggestions here on stackoverflow for comparing rows. Since I am comparing the headers though it was not possible.
EDIT: I have tried
df1.columns.intersection(df2.columns)
but the result is:
MultiIndex(levels=[[], []],
labels=[[], []])
Is this because of the multiple headers?
Assuming your don't have duplicate column names, which is never a good idea in pandas , and "same" doesn't care about the position they occur in the Index, it suffices to check if the length of the columns index is the same as the length of the set intersection between two DataFrame indices.
The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side. The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.
To find the positions of two matching columns, we first initialize a pandas dataframe with two columns of city names. Then we use where() of numpy to compare the values of two columns. This returns an array that represents the indices where the two columns have the same value.
Here's are couple of methods, for given df1
and df2
In [1041]: df1.columns
Out[1041]: Index([u'A', u'B', u'C', u'D'], dtype='object')
In [1042]: df2.columns
Out[1042]: Index([u'B', u'D', u'E'], dtype='object')
Columns in both df1
and df2
In [1046]: df1.columns.intersection(df2.columns)
Out[1046]: Index([u'B', u'D'], dtype='object')
Columns in df1
not in df2
In [1047]: df1.columns.difference(df2.columns)
Out[1047]: Index([u'A', u'C'], dtype='object')
Columns in df2
not in df1
In [1048]: df2.columns.difference(df1.columns)
Out[1048]: Index([u'E'], dtype='object')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With