Compare headers of dataframes in pandas

Tags: python, python-3.x, pandas

I am trying to compare the headers of two pandas dataframes and filter the columns that match. df1 is my big dataframe with two headers, df2 is sort of a dictionary where I have saved every column header I will need from df1.

So if df1 is something like this:

        A         B         C         D
        a         b         c         d
 0.469112 -0.282863 -1.509059 -1.135632
 1.212112 -0.173215  0.119209 -1.044236
-0.861849 -2.104569 -0.494929  1.071804
 0.721555 -0.706771 -1.039575  0.271860
-0.424972  0.567020  0.276232 -1.087401
-0.673690  0.113648 -1.478427  0.524988

and df2 is something like this:

   B         D         E

I need to get the output:

     B          D
 -0.282863  -1.135632
 -0.173215  -1.044236
 -2.104569   1.071804
 -0.706771   0.271860
  0.567020  -1.087401
  0.113648   0.524988

and also a list of the header elements that were not matching:

A      C

as well as the elements of df2 that are missing from df1:

E

So far I have tried iloc and many of the suggestions here on Stack Overflow for comparing rows. None of them worked, because I am comparing headers rather than rows.

EDIT: I have tried

df1.columns.intersection(df2.columns)

but the result is:

MultiIndex(levels=[[], []],
           labels=[[], []])

Is this because of the multiple headers?

asked Aug 03 '17 11:08

Moiraine24

1 Answer

Here are a couple of methods, given df1 and df2 with single-level column headers:

In [1041]: df1.columns
Out[1041]: Index([u'A', u'B', u'C', u'D'], dtype='object')

In [1042]: df2.columns
Out[1042]: Index([u'B', u'D', u'E'], dtype='object')

Columns in both df1 and df2

In [1046]: df1.columns.intersection(df2.columns)
Out[1046]: Index([u'B', u'D'], dtype='object')
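The question also asks for the matching columns themselves, not just their names. A minimal sketch of that step, assuming single-level headers as in the outputs above; the two frames built here are hypothetical stand-ins for the asker's data, and `common` and `filtered` are illustrative names:

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames (single-level headers).
df1 = pd.DataFrame(
    [[0.469112, -0.282863, -1.509059, -1.135632],
     [1.212112, -0.173215, 0.119209, -1.044236]],
    columns=["A", "B", "C", "D"],
)
df2 = pd.DataFrame(columns=["B", "D", "E"])  # only the header matters here

# Names present in both headers, then use them to select from df1.
common = df1.columns.intersection(df2.columns)
filtered = df1[common]  # keeps only the columns found in both frames

print(filtered)
```

Indexing df1 with the intersected Index returns a new DataFrame and leaves df1 unchanged.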

Columns in df1 not in df2

In [1047]: df1.columns.difference(df2.columns)
Out[1047]: Index([u'A', u'C'], dtype='object')

Columns in df2 not in df1

In [1048]: df2.columns.difference(df1.columns)
Out[1048]: Index([u'E'], dtype='object')
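The outputs above are for single-level headers. The question's df1 has two header rows, so its columns form a MultiIndex, which is likely why the intersection in the question's edit came back empty. One possible workaround, sketched here under the assumption that df2's names should be matched against the top header level only, is to compare against that level. The frames and the names `top`, `common`, `only_df1`, `only_df2` and `filtered` are hypothetical:

```python
import pandas as pd

# Hypothetical two-level header like the question's df1: (A, a), (B, b), ...
cols = pd.MultiIndex.from_tuples(
    [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d")]
)
df1 = pd.DataFrame([[0.469112, -0.282863, -1.509059, -1.135632]], columns=cols)
df2 = pd.DataFrame(columns=["B", "D", "E"])

# Flatten to the top header level so it can be compared with df2's flat header.
top = df1.columns.get_level_values(0)

common = top.intersection(df2.columns)    # in both
only_df1 = top.difference(df2.columns)    # in df1 but not in df2
only_df2 = df2.columns.difference(top)    # in df2 but not in df1

# A boolean mask over df1's columns keeps both header levels in the result.
filtered = df1.loc[:, top.isin(df2.columns)]

print(list(common), list(only_df1), list(only_df2))
print(filtered)
```

If the second header row should be dropped from the result instead, `filtered.columns.get_level_values(0)` can be assigned back to `filtered.columns`.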

answered Nov 15 '22 22:11

Zero
