Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare headers of dataframes in pandas

I am trying to compare the headers of two pandas dataframes and filter the columns that match. df1 is my big dataframe with two headers, df2 is sort of a dictionary where I have saved every column header I will need from df1.

So if df1 is something like this:

    A         B         C         D
    a         b         c         d
 0.469112 -0.282863 -1.509059 -1.135632
 1.212112 -0.173215  0.119209 -1.044236
-0.861849 -2.104569 -0.494929  1.071804
 0.721555 -0.706771 -1.039575  0.271860
-0.424972  0.567020  0.276232 -1.087401
-0.673690  0.113648 -1.478427  0.524988

and df2 is something like this:

   B         D         E

I need to get the output:

     B          D
 -0.282863  -1.135632
 -0.173215  -1.044236
 -2.104569   1.071804
 -0.706771   0.271860
  0.567020  -1.087401
  0.113648   0.524988

and also a list of the header elements that were not matching:

A      C

as well as elements missing from df1:

E

So far I have tried the iloc command and a lot of different suggestions here on stackoverflow for comparing rows. Since I am comparing the headers though it was not possible.

EDIT: I have tried

df1.columns.intersection(df2.columns)

but the result is:

MultiIndex(levels=[[], []],
           labels=[[], []])

Is this because of the multiple headers?

like image 585
Moiraine24 Avatar asked Aug 03 '17 11:08

Moiraine24


People also ask

How can you tell if two DataFrames have the same column names?

Assuming your don't have duplicate column names, which is never a good idea in pandas , and "same" doesn't care about the position they occur in the Index, it suffices to check if the length of the columns index is the same as the length of the set intersection between two DataFrame indices.

How do you compare multiple data frames?

The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side. The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.

How do I compare two column values in pandas?

To find the positions of two matching columns, we first initialize a pandas dataframe with two columns of city names. Then we use where() of numpy to compare the values of two columns. This returns an array that represents the indices where the two columns have the same value.


1 Answers

Here's are couple of methods, for given df1 and df2

In [1041]: df1.columns
Out[1041]: Index([u'A', u'B', u'C', u'D'], dtype='object')

In [1042]: df2.columns
Out[1042]: Index([u'B', u'D', u'E'], dtype='object')

Columns in both df1 and df2

In [1046]: df1.columns.intersection(df2.columns)
Out[1046]: Index([u'B', u'D'], dtype='object')

Columns in df1 not in df2

In [1047]: df1.columns.difference(df2.columns)
Out[1047]: Index([u'A', u'C'], dtype='object')

Columns in df2 not in df1

In [1048]: df2.columns.difference(df1.columns)
Out[1048]: Index([u'E'], dtype='object')
like image 145
Zero Avatar answered Nov 15 '22 22:11

Zero