Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to compare two dataframes and find matches from columns (pandas)

let's say we have the following code example where we create two basic dataframes:

import pandas as pd
 
# Creating Dataframes
a = [{'Name': 'abc', 'Age': 8, 'Grade': 3},
     {'Name': 'xyz', 'Age': 9, 'Grade': 3}]
 
df1 = pd.DataFrame(a)
b = [{'ID': 1,'Name': 'abc', 'Age': 8},
     {'ID': 2,'Name': 'xyz', 'Age': 9}]
 
df2 = pd.DataFrame(b)
 
# Printing Dataframes
display(df1)
display(df2)

We get the following datasets:

    Name   Age  Grade
0   abc    8    3
1   xyz    9    3


    ID   Name   Age
0   1    abc    8
1   2    xyz    9

How can I find the list of columns that are not repeated in these frames when they are intersected? That is, as a result, I want to get the names of the following columns: ['Grade', 'ID']

like image 674
kostya ivanov Avatar asked Sep 12 '25 15:09

kostya ivanov


1 Answers

Use symmetric_difference

res = df2.columns.symmetric_difference(df1.columns)
print(res)

Output

Index(['Grade', 'ID'], dtype='object')

Or as an alternative, use set.symmetric_difference

res = set(df2.columns).symmetric_difference(df1.columns)
print(res)

Output

{'Grade', 'ID'}

A third alternative, suggested by @SashSinha, is to use the shortcut:

res = df2.columns ^ df1.columns

but as of pandas 1.4.3 this issue a warning:

FutureWarning: Index.xor operating as a set operation is deprecated, in the future this will be a logical operation matching Series.xor. Use index.symmetric_difference(other) instead. res = df2.columns ^ df1.columns

like image 164
Dani Mesejo Avatar answered Sep 14 '25 03:09

Dani Mesejo