I have two data frames df-1 and df-2 like this,
import pandas as pd
raw_data = {'company': ['comp1', 'comp1', 'comp1', 'comp1', 'comp2', 'comp2', 'comp2', 'comp2', 'comp3', 'comp3', 'comp3', 'comp3'],
'region': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
'name': ['John', 'Jake', 'Alice', 'Mathew', 'Mark', 'Jacon', 'Ryan', 'Sone', 'Steve', 'Rooke', 'Rani', 'Alice'],
'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df1 = pd.DataFrame(raw_data, columns = ['company', 'region', 'name', 'preTestScore'])
print df1
raw_data = {'company': [ 'comp1', 'comp1', 'comp2', 'comp2', 'comp2', 'comp2', 'comp3', 'comp3', 'comp3'],
'region': [ '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd'],
'name': [ 'Alice', 'Mathew', 'Mark', 'Jacon', 'Ryan', 'Sone', 'Steve', 'Rooke', 'Rani', ],
'status': [ 'great', 'average', 'average', 'average', 'good', 'great', 'average', 'average', 'average']}
df2 = pd.DataFrame(raw_data, columns = ['company', 'region', 'name', 'status'])
print df2
How to find the rows of company, region and name in df-1 which is same as df-2. In other words, how to find the inner join with combination of all three columns.
Python Pandas – Find the common rows between two DataFrames with merge() To find the common rows between two DataFrames with merge(), use the parameter “how” as “inner” since it works like SQL Inner Join and this is what we want to achieve.
Finding the common rows between two DataFrames We can use either merge() function or concat() function. The merge() function serves as the entry point for all standard database join operations between DataFrame objects. Merge function is similar to SQL inner join, we find the common rows between two dataframes.
The compare method in pandas shows the differences between two DataFrames. It compares two data frames, row-wise and column-wise, and presents the differences side by side. The compare method can only compare DataFrames of the same shape, with exact dimensions and identical row and column labels.
It depends what you mean by
rows in df-1 which is same as df-2.
since the columns are not identical.
If you mean rows that have the same value for the intersection of columns, you can perform an inner join user merge
:
In [13]: pd.merge(df1, df2, how='inner')
Out[13]:
company region name preTestScore status
0 comp1 2nd Alice 31 great
1 comp1 2nd Mathew 2 average
2 comp2 1st Mark 3 average
3 comp2 1st Jacon 4 average
4 comp2 2nd Ryan 24 good
5 comp2 2nd Sone 31 great
6 comp3 1st Steve 2 average
7 comp3 1st Rooke 3 average
8 comp3 2nd Rani 2 average
Edit
If you'd like greater control for the join columns, you can use the on
, or left_on
and right_on
parameters of the merge
function. If you don't, pandas will assume you mean the intersection of columns of the two dataframes.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With