I am a teacher and would like to use the data.table package in R to automatically grade student answers in a log file, i.e. add a column called correct that is 1 if the student's answer to a particular question is a correct answer to that question, and 0 otherwise. I can do this easily if each question has only one answer, but I am getting tripped up when a question has multiple possible answers (questions and their possible correct answers are stored in another table).
Below is a MWE:
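library(data.table)
set.seed(123)
# one row per (question, accepted answer); a question may appear more than once
question_table <- data.table(id=c(1,1,2,2,3,4),correct_ans=sample(1:4,6,replace = TRUE))
# one row per answer submitted by a student
log <- data.table(student=sample(letters[1:3],10,replace = TRUE),
                  question_id=c(1,1,1,2,2,2,3,3,4,4),
                  student_answer= c(2,4,1,3,2,4,4,5,2,1))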
My question is: what is the correct data.table way to use ifelse in j, especially if we depend on another table?
log[,correct:=ifelse(student_answer %in%
question_table[log$question_id %in% id]$correct_ans,1,0)]
As can be seen below, questions 1 and 2 both have multiple possible correct answers.
> question_table
id correct_ans
1: 1 2
2: 1 4
3: 2 2
4: 2 4
5: 3 4
6: 4 1
While the correct column is calculated without errors, something isn't right: e.g. when student b answers question 1 with 1, he is awarded a correct score, even though he answered incorrectly. Only some entries of the correct column are off, which leads me to believe there is something I am not getting about how variables are scoped.
> log
student question_id student_answer correct
1: b 1 2 1
2: c 1 4 1
3: b 1 1 1 <- ?
4: b 2 3 0
5: c 2 2 1
6: b 2 4 1
7: c 3 4 1
8: b 3 5 0
9: a 4 2 1 <- ?
10: c 4 1 1
I considered making a helper column with the correct answer in the log table by joining with question_table, but that does not work since the key is not unique in the latter.
Any and all help would be appreciated. Thanks in advance.
You can use a join:
# initialize to zero
log[, correct := 0L ]
# update to 1 if matched
log[question_table, on=c(question_id = "id", student_answer = "correct_ans"),
correct := 1L ]
student question_id student_answer correct
1: b 1 2 1
2: c 1 4 1
3: b 1 1 0
4: b 2 3 0
5: c 2 2 1
6: b 2 4 1
7: c 3 4 1
8: b 3 5 0
9: a 4 2 0
10: c 4 1 1
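To see why the join gives a different result from the attempt in the question, here is a rough sketch of my reading of the problem: the %in% test there seems to compare each answer against the pooled set of all correct answers, rather than against the answers for that row's question. The tables below are typed in from the printed output in the question instead of being sampled, and the pooled and pairwise vectors (and the paste() key) are only illustrative names and devices, not part of the original code.

```r
library(data.table)

# Values copied from the tables printed in the question (no random sampling).
question_table <- data.table(id = c(1, 1, 2, 2, 3, 4),
                             correct_ans = c(2, 4, 2, 4, 4, 1))
log <- data.table(student = c("b", "c", "b", "b", "c", "b", "c", "b", "a", "c"),
                  question_id = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                  student_answer = c(2, 4, 1, 3, 2, 4, 4, 5, 2, 1))

# Pooled check: is the answer a correct answer to ANY question?
pooled <- as.integer(log$student_answer %in% question_table$correct_ans)

# Pairwise check: is the (question, answer) pair a row of question_table?
# paste() builds a combined key purely for illustration.
pairwise <- as.integer(
  paste(log$question_id, log$student_answer) %in%
    paste(question_table$id, question_table$correct_ans)
)

# Show both checks side by side
print(cbind(log, pooled = pooled, pairwise = pairwise))
```

The pooled column differs from the pairwise one in rows 3 and 9, the two rows flagged in the question; the update join above performs the pairwise match directly, without building a pasted key.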
How it works. The syntax for an update join is X[Y, on=cols, xvar := z]:

When the join columns have different names in X and Y, use on=c(xcol = "ycol", xcol2 = "ycol2") or, in version 1.9.7+, on=.(xcol = ycol, xcol2 = ycol2).

xvar := z will only operate on the rows of X that are matched. Sometimes it is also useful to use by=.EACHI here, depending on how many rows of X are matched by each row in Y and how complicated the expression for z is.

See ?data.table for full documentation on the syntax.
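As a minimal sketch of the update-join syntax on its own, the toy tables X and Y below are hypothetical and use differently named join columns. The i. prefix, which refers to a column of the table in the i position (here Y), is standard data.table syntax but is not used in the answer's code above.

```r
library(data.table)

# Hypothetical toy tables; all names are made up for illustration.
X <- data.table(xcol = c("a", "b", "c"), xvar = NA_real_)
Y <- data.table(ycol = c("a", "c"), yval = c(10, 30))

# Update join: match X$xcol against Y$ycol and, for matched rows of X only,
# copy the value of yval from Y (i.yval) into xvar.
X[Y, on = c(xcol = "ycol"), xvar := i.yval]

print(X)
```

Row "b" has no match in Y, so its xvar is left as NA; this is the same behaviour that leaves correct at 0 for unmatched rows of log.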