
Compare two pandas DataFrames of different sizes

I have one massive pandas dataframe with this structure:

df1:
    A   B
0   0  12
1   0  15
2   0  17
3   0  18
4   1  45
5   1  78
6   1  96
7   1  32
8   2  45
9   2  78
10  2  44
11  2  10

And a second, smaller one like this:

df2:
   G   H
0  0  15
1  1  45
2  2  31

I want to add a column to my first dataframe following this rule: column df1.C = df2.H when df1.A == df2.G

I managed to do it with for loops, but the dataset is massive and the code runs really slowly, so I am looking for a pandas or NumPy way to do it.

Many thanks,

Boris

boris asked Jun 07 '17 14:06



2 Answers

If you only want to pull in values for rows that appear in both dataframes, use a left merge:

import pandas as pd

df1 = pd.DataFrame({'Name':['Sara'],'Special ability':['Walk on water']})
df1    
   Name Special ability
0  Sara   Walk on water

df2 = pd.DataFrame({'Name':['Sara', 'Gustaf', 'Patrik'],'Age':[4,12,11]})
df2
     Name  Age
0    Sara    4
1  Gustaf   12
2  Patrik   11

df = df2.merge(df1, left_on='Name', right_on='Name', how='left')
df
     Name  Age Special ability
0    Sara    4   Walk on water
1  Gustaf   12             NaN
2  Patrik   11             NaN

This can also be done with more than one matching key (in this example, Patrik from df1 has no match in df2 because the ages differ, so his row does not merge):

df1 = pd.DataFrame({'Name':['Sara','Patrik'],'Special ability':['Walk on water','FireBalls'],'Age':[12,83]})

df1
     Name Special ability  Age
0    Sara   Walk on water   12
1  Patrik       FireBalls   83

df2 = pd.DataFrame({'Name':['Sara', 'Gustaf', 'Patrik'],'Age':[12,12,11]})
df2
     Name  Age
0    Sara   12
1  Gustaf   12
2  Patrik   11

df = df2.merge(df1,left_on=['Name','Age'],right_on=['Name','Age'],how='left')
df
     Name  Age Special ability
0    Sara   12   Walk on water
1  Gustaf   12             NaN
2  Patrik   11             NaN
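
To check which rows actually found a partner in a left merge, one option is the indicator parameter of merge. The sketch below reuses the data from the first example; the variable names checked and unmatched_names are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Sara'], 'Special ability': ['Walk on water']})
df2 = pd.DataFrame({'Name': ['Sara', 'Gustaf', 'Patrik'], 'Age': [4, 12, 11]})

# indicator=True adds a "_merge" column: "both" for rows matched in df1,
# "left_only" for rows that exist only in df2
checked = df2.merge(df1, on='Name', how='left', indicator=True)

# Names from df2 that had no matching row in df1
unmatched_names = checked.loc[checked['_merge'] == 'left_only', 'Name'].tolist()
print(unmatched_names)  # ['Gustaf', 'Patrik']
```

Rows marked left_only are the ones whose merged columns come back as NaN.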
Frans Sjöström answered Sep 18 '22 00:09


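You probably want to use a merge:

df = df1.merge(df2, left_on="A", right_on="G", how="left")

This gives you a dataframe with four columns (A, B, G and H), because both key columns are kept when their names differ. Passing how="left" keeps every row of df1, even if its A value has no match in df2.

df = df.drop(columns="G").rename(columns={"H": "C"})

will then drop the duplicate key column and give you the column names you want.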
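
As an alternative to merging, the same lookup can be sketched with Series.map, which adds the new column to df1 in place without creating extra key columns. This is a minimal sketch on the data from the question; it assumes the values in df2.G are unique, and the name lookup is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    "B": [12, 15, 17, 18, 45, 78, 96, 32, 45, 78, 44, 10],
})
df2 = pd.DataFrame({"G": [0, 1, 2], "H": [15, 45, 31]})

# Build a Series indexed by G so that each key maps to its H value
# (assumption: G contains no duplicate keys)
lookup = df2.set_index("G")["H"]

# For every row of df1, look up its A value in the G -> H mapping
df1["C"] = df1["A"].map(lookup)
print(df1)
```

Values of A that do not appear in G would get NaN in C, which matches the behaviour of a left merge.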

WNG answered Sep 21 '22 00:09