Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Compare pandas dataframes for common rows in two dataframes

Tags:

python

pandas

I have two data frames df-1 and df-2 like this,

import pandas as pd

raw_data = {'company': ['comp1', 'comp1', 'comp1', 'comp1', 'comp2', 'comp2', 'comp2', 'comp2', 'comp3', 'comp3', 'comp3', 'comp3'], 
        'region': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], 
        'name': ['John', 'Jake', 'Alice', 'Mathew', 'Mark', 'Jacon', 'Ryan', 'Sone', 'Steve', 'Rooke', 'Rani', 'Alice'], 
        'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df1 = pd.DataFrame(raw_data, columns = ['company', 'region', 'name', 'preTestScore'])
print df1


raw_data = {'company': [ 'comp1', 'comp1', 'comp2', 'comp2', 'comp2', 'comp2', 'comp3', 'comp3', 'comp3'], 
        'region': [ '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd'], 
        'name': [ 'Alice', 'Mathew', 'Mark', 'Jacon', 'Ryan', 'Sone', 'Steve', 'Rooke', 'Rani', ], 
        'status': [ 'great', 'average', 'average', 'average', 'good', 'great', 'average', 'average', 'average']}
df2 = pd.DataFrame(raw_data, columns = ['company', 'region', 'name', 'status'])
print df2

How to find the rows of company, region and name in df-1 which is same as df-2. In other words, how to find the inner join with combination of all three columns.

like image 734
Jake Avatar asked Jul 23 '16 14:07

Jake


People also ask

How do you find common rows in two DataFrames Pandas?

Python Pandas – Find the common rows between two DataFrames with merge() To find the common rows between two DataFrames with merge(), use the parameter “how” as “inner” since it works like SQL Inner Join and this is what we want to achieve.

How can you tell if two DataFrames have the same rows?

Finding the common rows between two DataFrames We can use either merge() function or concat() function. The merge() function serves as the entry point for all standard database join operations between DataFrame objects. Merge function is similar to SQL inner join, we find the common rows between two dataframes.

How do I compare two data frames in Pandas?

The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side. The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.


1 Answers

It depends what you mean by

rows in df-1 which is same as df-2.

since the columns are not identical.

If you mean rows that have the same value for the intersection of columns, you can perform an inner join user merge:

In [13]: pd.merge(df1, df2, how='inner')
Out[13]: 
  company region    name  preTestScore   status
0   comp1    2nd   Alice            31    great
1   comp1    2nd  Mathew             2  average
2   comp2    1st    Mark             3  average
3   comp2    1st   Jacon             4  average
4   comp2    2nd    Ryan            24     good
5   comp2    2nd    Sone            31    great
6   comp3    1st   Steve             2  average
7   comp3    1st   Rooke             3  average
8   comp3    2nd    Rani             2  average

Edit

If you'd like greater control for the join columns, you can use the on, or left_on and right_on parameters of the merge function. If you don't, pandas will assume you mean the intersection of columns of the two dataframes.

like image 115
Ami Tavory Avatar answered Oct 17 '22 03:10

Ami Tavory