Compare columns in different pandas dataframes

I have two dataframes, one with daily info starting in 1990 and one with daily info starting in 2000. Both dataframes contain information ending in 2016.

df1:

   Date       A     B     C 
1990-01-01   3.0  40.0  70.0  
1990-01-02  20.0  50.0  80.0  
1990-01-03  30.0  60.0  90.0  
1990-01-04   2.0   1.0   1.0 
1990-01-05   1.0   8.0   3.0  

df2:

   Date       A     B     C 
2000-01-01   NaN   NaN   NaN  
2000-01-02   5.0   NaN   NaN  
2000-01-03   1.0   NaN   5.0  
2000-01-04   2.0   4.0   8.0 
2000-01-05   1.0   3.0   4.0 

I need to compare the columns in df1 and df2 that share a name. That wouldn't usually be too complicated, but each comparison has to start from the point at which data is available in both dataframes for that column (e.g. going by df2, from 2000-01-02 for column 'A' and from 2000-01-04 for column 'B'). The result should be True if the two columns are the same from that point on and False if they differ. I have started by merging, which gives me:

df2.merge(df1, how = 'left', on = 'Date')


   Date      A_x   B_x   C_x   A_y   B_y   C_y
2000-01-01   NaN   NaN   NaN   3.0   4.0   5.0
2000-01-02   5.0   NaN   NaN   5.0   9.0   2.0
2000-01-03   1.0   NaN   5.0   1.0   6.0   5.0
2000-01-04   2.0   4.0   8.0   2.0   4.0   1.0
2000-01-05   1.0   3.0   4.0   1.0   3.0   3.0

I have figured out how to find the common date, but am stuck as to how to do the same/different comparison. Can anyone help me compare the columns from the point at which there is a common value? A dictionary comes to mind as a useful output format, but wouldn't be essential:

comparison_dict = {
    "A" : True,
    "B" : True,
    "C" : False
}

Many thanks.

asked Aug 31 '18 16:08 by poiter


2 Answers

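Assuming the Date column is the index:

  1. Stack each dataframe into a Series; stacking will drop NaN by default
  2. Align the two Series with 'inner' logic, so only (date, column) pairs present in both are kept
  3. Check equality
  4. Group by column name and check that all values are True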

pd.Series.eq(*df1.stack().align(df2.stack(), 'inner')).groupby(level=1).all()

If Date is not the index:

pd.Series.eq(
    *df1.set_index('Date').stack().align(
        df2.set_index('Date').stack(), 'inner'
    )
).groupby(level=1).all()
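
As a rough, self-contained sketch of the same steps, here is the approach run on made-up sample data and converted to the dictionary format the question mentions. The frames below are hypothetical (df1 is given one extra earlier row to stand in for the longer history), and `dropna()` is called explicitly as a safeguard, on the assumption that some pandas versions may keep NaN when stacking.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, indexed by Date. df1 starts one day earlier.
df1 = pd.DataFrame(
    {
        "A": [7.0, 3.0, 5.0, 1.0, 2.0, 1.0],
        "B": [8.0, 4.0, 9.0, 6.0, 4.0, 3.0],
        "C": [9.0, 5.0, 2.0, 5.0, 1.0, 3.0],
    },
    index=pd.date_range("1999-12-31", periods=6),
)
df2 = pd.DataFrame(
    {
        "A": [np.nan, 5.0, 1.0, 2.0, 1.0],
        "B": [np.nan, np.nan, np.nan, 4.0, 3.0],
        "C": [np.nan, np.nan, 5.0, 8.0, 4.0],
    },
    index=pd.date_range("2000-01-01", periods=5),
)

# 1. Stack into Series indexed by (date, column) and drop missing values.
s1 = df1.stack().dropna()
s2 = df2.stack().dropna()

# 2. Keep only the (date, column) pairs that have data in both frames.
left, right = s1.align(s2, join="inner")

# 3. and 4. Compare element-wise, then require every comparison in a
# column (index level 1) to be True.
result = left.eq(right).groupby(level=1).all()

# Convert to a plain dict of Python bools.
comparison_dict = {col: bool(flag) for col, flag in result.items()}
print(comparison_dict)  # {'A': True, 'B': True, 'C': False}
```

Note that a column with no overlapping data at all would simply be absent from the result rather than reported as True or False.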
answered Oct 05 '22 17:10 by piRSquared


Check with eq and isnull (data from user3483203):

((df1.eq(df2))|df2.isnull()|df1.isnull()).all(0)
Out[22]: 
A     True
B     True
C    False
dtype: bool
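
The one-liner above seems to rely on both frames being indexed by Date and covering the same dates. A minimal sketch of one way to handle the asker's case, where df1 starts earlier, is to restrict df1 to df2's dates first. The sample data and the `df1_common` name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, indexed by Date. df1 starts one day earlier.
df1 = pd.DataFrame(
    {
        "A": [7.0, 3.0, 5.0, 1.0, 2.0, 1.0],
        "B": [8.0, 4.0, 9.0, 6.0, 4.0, 3.0],
        "C": [9.0, 5.0, 2.0, 5.0, 1.0, 3.0],
    },
    index=pd.date_range("1999-12-31", periods=6),
)
df2 = pd.DataFrame(
    {
        "A": [np.nan, 5.0, 1.0, 2.0, 1.0],
        "B": [np.nan, np.nan, np.nan, 4.0, 3.0],
        "C": [np.nan, np.nan, 5.0, 8.0, 4.0],
    },
    index=pd.date_range("2000-01-01", periods=5),
)

# Keep only the rows of df1 whose dates appear in df2, in df2's order.
df1_common = df1.reindex(df2.index)

# A cell counts as a match if the values are equal or either side is missing.
mask = df1_common.eq(df2) | df1_common.isnull() | df2.isnull()

# A column passes only if every row matches.
result = mask.all(axis=0)
print(result)
```

Unlike the stack-and-align approach, a column that is entirely NaN in one frame would come out as True here, since every cell is treated as a match.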
answered Oct 05 '22 18:10 by BENY