Take a look at my data from a task with many trials, each of which consists of 5 questions (the following code will generate a representative subset):
Subject<-c(rep(400,20),rep(401,20))
RT<-sample(x=seq(250:850),size=40)
accuracy<-c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0)
trial<-rep(rep(1:4, each=5),2)
question<-rep(seq(from=0,to=4),8)
data<-data.frame(Subject,trial,question,RT,accuracy)
remove(Subject,RT,accuracy,trial,question)
and will look something like this:
   Subject trial question  RT accuracy
1      400     1        0 131        1
2      400     1        1 768        1
3      400     1        2 300        1
4      400     1        3 130        1
5      400     1        4 168        1
...
36     401     4        0 273        1
37     401     4        1 803        1
38     401     4        2 786        0
39     401     4        3 712        1
40     401     4        4 254        0
The existing accuracy variable gives the accuracy of each question. I'm trying to create a new variable that indicates whether all of the questions in a particular trial are correct (i.e. accuracy = 1 for every question). For subject 400 above, the resulting variable would be c(1,1,1,1,1), indicating that all questions are correct. For subject 401 above, it would be c(0,0,0,0,0), indicating that one or more of the questions were incorrect. To achieve this, I did my best to decode the rather confusing help file and examples for plyr and its variants, and came up with this solution:
Logic: 1) for each subject, consider the questions in each trial separately; 2) look at the accuracy column of the passed data frame; 3) if the accuracies sum to the number of questions in the trial, return a vector of all 1's, otherwise return a vector of all 0's.
This seems to get the job done:
allOK<-function(x) {
  n<-nrow(x) #get number of questions for this trial (avoids shadowing c())
  s<-sum(x$accuracy) #get sum of accuracies
  return(data.frame(rep(as.integer(s==n)))) #return allOK flag: 1 if every question is correct, else 0
}
This is my attempt to apply it to my data:
alloktest<-ddply(.data=data,c("Subject","trial"), .fun=allOK, .progress = "text")
It works, except that alloktest only contains Subject, trial, and a new variable with the results. The results are correct, which is great, but I was hoping for it to return the original data frame with a new variable added (maybe named aok).
How do I achieve that? To be clear, I'm looking for this:
   Subject trial question  RT accuracy aok
1      400     1        0 131        1   1
2      400     1        1 768        1   1
3      400     1        2 300        1   1
4      400     1        3 130        1   1
5      400     1        4 168        1   1
...
36     401     4        0 273        1   0
37     401     4        1 803        1   0
38     401     4        2 786        0   0
39     401     4        3 712        1   0
40     401     4        4 254        0   0
thanks!
The simplest approach I can think of is to use mutate, which is a plyr variation on transform:
alloktest<-ddply(.data=data,c("Subject","trial"), mutate,
aok = sum(accuracy) == length(accuracy))
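This assumes that within every subject & trial combination, there is only 1 row per question. Note that aok comes back as a logical (TRUE/FALSE); wrap the comparison in as.integer() if you need 1/0.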
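If plyr is not available, or you want the 1/0 coding shown in the question, a base R sketch along these lines should give the same column. It swaps ddply/mutate for ave(), which is a different technique from the one above. The data is rebuilt from the question's code with RT left out (my simplification, since RT does not affect aok).

```r
# Rebuild the example data from the question (RT omitted; it does not affect aok).
accuracy <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
              1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0)
data <- data.frame(
  Subject  = c(rep(400, 20), rep(401, 20)),
  trial    = rep(rep(1:4, each = 5), 2),
  question = rep(0:4, 8),
  accuracy = accuracy
)

# ave() applies FUN within each Subject x trial group and returns a vector
# aligned with the original rows, so it can be assigned directly as a new column.
# Each row gets 1 if every question in its trial is correct, otherwise 0.
data$aok <- ave(data$accuracy, data$Subject, data$trial,
                FUN = function(a) as.integer(all(a == 1)))
```

Because ave() keeps the original row order, no merge back onto the original data frame is needed.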