I have a dataframe like this:
right_answer rater1 rater2 rater3 item
1 1 1 2 S01
1 1 2 2 S02
2 1 2 1 S03
2 2 1 2 S04
and I need to get those rows or values in 'items' where at least two out of the three raters gave the wrong answer. I could already check if all the raters agree with each other with this code:
df.where(df[['rater1', 'rater2', 'rater3']].eq(df.iloc[:, 0], axis=0).all(1) == True)
I don't want to calculate a column with a majority voting because maybe I need to adjust the number of raters that have to agree or disagree wih the right answer.
Thanks for help
Use, DataFrame.filter
to filter the dataframe containing columns like rater
, then use DataFrame.ne
along axis=0
to compare the columns containing rater
with the column right_answer
, then use DataFrame.sum
along axis=1
to get number of raters
who have given wrong answer, then use Series.ge
to create a boolean mask, finally filter the dataframe rows using this mask
:
mask = (
df.filter(like='rater')
.ne(df['right_answer'], axis=0).sum(axis=1).ge(2)
)
df = df[mask]
Result:
# print(df)
right_answer rater1 rater2 rater3 item
1 1 1 2 2 S02
2 2 1 2 1 S03
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With