 

I need to compare two DataFrames for matches and mismatches, and in the event of a mismatch identify which answer comes from the master DataFrame

Tags:

python

pandas

I have two data frames in Python and want to compare them to find both matches and mismatches. For each mismatch, it is important that I can tell which answer comes from the master answer sheet and which comes from the user's answers.

I decided to use the pandas df.where function for this. It works, except that in the event of a mismatch I cannot tell which answer comes from the master answer sheet and which comes from the user's answers.

# I have a DataFrame called df_master (master answer sheet)

import pandas as pd

df_master = pd.DataFrame({'B0': [1, 0, 0, 0, 0, 1],
                          'B1': [0, 0, 0, 0, 1, 0],
                          'B2': [0, 1, 0, 0, 0, 0],
                          'B3': [0, 0, 1, 0, 0, 0],
                          'B4': [0, 0, 0, 1, 0, 0]})
print(df_master)

#    B0  B1  B2  B3  B4
# 0   1   0   0   0   0
# 1   0   0   1   0   0
# 2   0   0   0   1   0
# 3   0   0   0   0   1
# 4   0   1   0   0   0
# 5   1   0   0   0   0

# I also have a DataFrame called df_answers (the user's answers)

df_answers = pd.DataFrame({'B0': [0, 0, 0, 0, 0, 1],
                           'B1': [1, 0, 0, 0, 1, 0],
                           'B2': [0, 0, 0, 0, 0, 0],
                           'B3': [0, 1, 1, 0, 0, 0],
                           'B4': [0, 0, 0, 1, 0, 0]})

print(df_answers)

#    B0  B1  B2  B3  B4
# 0   0   1   0   0   0
# 1   0   0   0   1   0
# 2   0   0   0   1   0
# 3   0   0   0   0   1
# 4   0   1   0   0   0
# 5   1   0   0   0   0

# When I compare the two DataFrames, every match is kept correctly; where there
# is no match I have used other=2. However, this is a problem because I cannot see
# which one is the correct answer. Is there a way to change the code so that the
# master's answer shows as 3 and the user's incorrect answer stays as 2?

comparison = df_master.where(df_master.values==df_answers.values, other=2)

print(comparison)

# My Results

#    B0  B1  B2  B3  B4
# 0   2   2   0   0   0
# 1   0   0   2   2   0
# 2   0   0   0   1   0
# 3   0   0   0   0   1
# 4   0   1   0   0   0
# 5   1   0   0   0   0

# Expected Results

#    B0  B1  B2  B3  B4
# 0   3   2   0   0   0
# 1   0   0   3   2   0
# 2   0   0   0   1   0
# 3   0   0   0   0   1
# 4   0   1   0   0   0
# 5   1   0   0   0   0
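A minimal sketch of one way to get the expected result while staying with df.where, assuming both frames share the same index and columns and contain only 0/1 values: pass a DataFrame as other instead of the scalar 2. On a mismatch the master value is either 1 (the correct option the user missed) or 0 (the wrong option the user picked), so df_master + 2 yields 3 or 2 respectively.

```python
import pandas as pd

df_master = pd.DataFrame({'B0': [1, 0, 0, 0, 0, 1],
                          'B1': [0, 0, 0, 0, 1, 0],
                          'B2': [0, 1, 0, 0, 0, 0],
                          'B3': [0, 0, 1, 0, 0, 0],
                          'B4': [0, 0, 0, 1, 0, 0]})
df_answers = pd.DataFrame({'B0': [0, 0, 0, 0, 0, 1],
                           'B1': [1, 0, 0, 0, 1, 0],
                           'B2': [0, 0, 0, 0, 0, 0],
                           'B3': [0, 1, 1, 0, 0, 0],
                           'B4': [0, 0, 0, 1, 0, 0]})

# Keep the master value wherever both sheets agree. Where they differ,
# substitute master + 2: a master 1 becomes 3 (correct answer the user missed)
# and a master 0 becomes 2 (wrong option the user selected).
comparison = df_master.where(df_master.eq(df_answers), df_master + 2)
print(comparison)
```

This relies on the 0/1 encoding; if the sheets used other values, an explicit condition per case would be needed instead of the + 2 offset.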
Steve asked Dec 09 '25 05:12


1 Answer

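In your case, concatenate the two frames cell by cell as strings and then use replace on the result. Each code is the user's value followed by the master's value, so '01' means the master selected the option but the user did not, and '10' means the user selected an option the master did not. PS: you can define the mapping yourself, e.g. {'00': 'both blank', '01': 'missed (master only)', ...}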

(df_answers.astype(str)+df_master.astype(str)).replace({'00':0,'01':3,'10':2,'11':1})
Out[129]: 
   B0  B1  B2  B3  B4
0   3   2   0   0   0
1   0   0   3   2   0
2   0   0   0   1   0
3   0   0   0   0   1
4   0   1   0   0   0
5   1   0   0   0   0
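A minimal sketch of the same string-concatenation idea with both numeric codes and the self-defined labels the PS mentions. The label strings are hypothetical examples, not part of the answer. Mapping strings to strings and then calling astype(int) keeps the numeric result as integer columns, since replace on string (object) columns may otherwise leave an object dtype.

```python
import pandas as pd

df_master = pd.DataFrame({'B0': [1, 0, 0, 0, 0, 1],
                          'B1': [0, 0, 0, 0, 1, 0],
                          'B2': [0, 1, 0, 0, 0, 0],
                          'B3': [0, 0, 1, 0, 0, 0],
                          'B4': [0, 0, 0, 1, 0, 0]})
df_answers = pd.DataFrame({'B0': [0, 0, 0, 0, 0, 1],
                           'B1': [1, 0, 0, 0, 1, 0],
                           'B2': [0, 0, 0, 0, 0, 0],
                           'B3': [0, 1, 1, 0, 0, 0],
                           'B4': [0, 0, 0, 1, 0, 0]})

# Concatenate cell by cell: first character = user's answer, second = master's.
codes = df_answers.astype(str) + df_master.astype(str)

# Numeric codes from the answer: 0 = neither selected, 1 = both selected,
# 2 = user selected a wrong option, 3 = correct option the user missed.
numeric = codes.replace({'00': '0', '11': '1', '10': '2', '01': '3'}).astype(int)

# Hypothetical descriptive labels in place of numbers.
labels = codes.replace({'00': 'both blank', '11': 'correct',
                        '10': 'wrong pick (user)', '01': 'missed (master)'})

print(numeric)
print(labels)
```

Swapping the order of the concatenation (master first) would reverse the meaning of '01' and '10', so the mapping has to match whichever order is used.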
BENY answered Dec 10 '25 19:12


