
ifelse assignment in data.table

Tags:

r

data.table

I am a teacher and would like to use the data.table package in R to automatically grade student answers in a log file, i.e. add a column called correct that is 1 if the student's answer to a question is a correct answer to that question, and 0 otherwise. I can do this easily when each question has only one answer, but I am getting tripped up when a question has multiple possible answers (questions and their possible correct answers are stored in another table).

Below is a MWE:

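library(data.table)

# Note: R 3.6.0 changed the default sample() algorithm, so newer versions
# may draw different values than the output shown below.
set.seed(123)
question_table <- data.table(id=c(1,1,2,2,3,4),correct_ans=sample(1:4,6,replace = TRUE))
log <- data.table(student=sample(letters[1:3],10,replace = TRUE),
                  question_id=c(1,1,1,2,2,2,3,3,4,4), 
                  student_answer= c(2,4,1,3,2,4,4,5,2,1))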

My question is: what is the correct data.table way to use ifelse in j, especially when the condition depends on another table?

log[,correct:=ifelse(student_answer %in% 
                          question_table[log$question_id %in% id]$correct_ans,1,0)]

As can be seen below, questions 1 and 2 both have multiple possible correct answers.

> question_table
   id correct_ans
1:  1           2
2:  1           4
3:  2           2
4:  2           4
5:  3           4
6:  4           1

While the correct column is calculated without errors, something isn't right: e.g. when student b answers question 1 with 1 (row 3), he is awarded a correct score, even though he answered incorrectly. Only some entries of the correct column are off, which leads me to believe there is something I am not getting about how variables are scoped.

> log
    student question_id student_answer correct
 1:       b           1              2       1
 2:       c           1              4       1
 3:       b           1              1       1   <- ?
 4:       b           2              3       0
 5:       c           2              2       1
 6:       b           2              4       1
 7:       c           3              4       1
 8:       b           3              5       0
 9:       a           4              2       1   <- ?
10:       c           4              1       1

I considered making a helper column with the correct ans in the log table by joining with question_table, but that does not work since the key is not unique in the latter.

Any and all help would be appreciated. Thanks in advance.

Sameer Bhatnagar asked Oct 28 '16 23:10



1 Answer

You can use a join:

# initialize to zero
log[, correct := 0L ]

# update to 1 if matched
log[question_table, on=c(question_id = "id", student_answer = "correct_ans"),
   correct := 1L ] 

    student question_id student_answer correct
 1:       b           1              2       1
 2:       c           1              4       1
 3:       b           1              1       0
 4:       b           2              3       0
 5:       c           2              2       1
 6:       b           2              4       1
 7:       c           3              4       1
 8:       b           3              5       0
 9:       a           4              2       0
10:       c           4              1       1

How it works. The syntax for an update join is X[Y, on=cols, xvar := z]:

  • If col names differ between X and Y, use on=c(xcol = "ycol", xcol2 = "ycol2") or, in version 1.9.7+, .(xcol = ycol, xcol2 = ycol2).
  • xvar := z will only operate on the rows of X that are matched. Sometimes, it is also useful to use by=.EACHI here, depending on how many rows of X are matched by each row of Y and how complicated the expression for z is.
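
For a version that does not depend on the random seed, the two tables can be written out by hand. This is only a sketch: the values are copied from the printed output above, integer columns are an assumption made for tidiness, and match_counts is a name introduced here purely for illustration.

```r
library(data.table)

# Answer key: a question may list several accepted answers
# (values copied from the printed question_table).
question_table <- data.table(
  id          = c(1L, 1L, 2L, 2L, 3L, 4L),
  correct_ans = c(2L, 4L, 2L, 4L, 4L, 1L)
)

# Student log, copied from the printed log output.
log <- data.table(
  student        = c("b", "c", "b", "b", "c", "b", "c", "b", "a", "c"),
  question_id    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L),
  student_answer = c(2L, 4L, 1L, 3L, 2L, 4L, 4L, 5L, 2L, 1L)
)

# Start from "wrong", then flip only the rows of log that match an
# (id, correct_ans) pair in question_table.
log[, correct := 0L]
log[question_table,
    on = c(question_id = "id", student_answer = "correct_ans"),
    correct := 1L]

# by = .EACHI evaluates j once per row of question_table; here it counts
# how many log rows matched each accepted answer (hypothetical helper table).
match_counts <- log[question_table,
                    on = c(question_id = "id", student_answer = "correct_ans"),
                    .N, by = .EACHI]

print(log)
print(match_counts)
```

Because the join is on both columns at once, a repeated id in question_table is not a problem: each (question, answer) pair is its own key.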

See ?data.table for full documentation on the syntax.
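
As for why the ifelse attempt in the question goes wrong, a likely reading is that the inner condition is evaluated once for the whole column rather than row by row, so each answer ends up being checked against the accepted answers of every question. The sketch below reproduces that effect under this assumption, using hand-typed copies of the tables; any_question_pool, accepted_pairs, naive and paired are illustrative names, not part of data.table.

```r
library(data.table)

question_table <- data.table(
  id          = c(1L, 1L, 2L, 2L, 3L, 4L),
  correct_ans = c(2L, 4L, 2L, 4L, 4L, 1L)
)
log <- data.table(
  question_id    = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L),
  student_answer = c(2L, 4L, 1L, 3L, 2L, 4L, 4L, 5L, 2L, 1L)
)

# Every question_id occurs somewhere in question_table$id, so this test is
# TRUE for all rows and does not narrow the key down to the current question.
print(all(log$question_id %in% question_table$id))

# The lookup then degenerates to "is this answer accepted for ANY question?".
any_question_pool <- unique(question_table$correct_ans)
log[, naive := as.integer(student_answer %in% any_question_pool)]

# A vectorised alternative without a join: compare (question, answer) pairs,
# so an answer only counts for the question it belongs to.
accepted_pairs <- paste(question_table$id, question_table$correct_ans)
log[, paired := as.integer(paste(question_id, student_answer) %in% accepted_pairs)]

print(log)
```

The naive column repeats the two suspicious rows from the question (rows 3 and 9), while paired agrees with the update join.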

Frank answered Sep 21 '22 01:09
