
How do I use ddply with a custom function and return the original data frame along with the result?

Tags: r, plyr

Take a look at my data from a task with many trials, each of which consists of 5 questions (the following code will generate a representative subset):

Subject<-c(rep(400,20),rep(401,20))
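# note: seq(250:850) evaluates to 1:601, so RT is sampled from 1..601; use 250:850 to sample from 250..850
RT<-sample(x=seq(250:850),size=40)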
accuracy<-c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0)
trial<-rep(rep(1:4, each=5),2)
question<-rep(seq(from=0,to=4),8)
data<-data.frame(Subject,trial,question,RT,accuracy)
remove(Subject,RT,accuracy,trial,question)

and will look something like this:

      Subject  trial  question   RT   accuracy
1     400      1      0          131  1
2     400      1      1          768  1
3     400      1      2          300  1
4     400      1      3          130  1
5     400      1      4          168  1
...
36    401      4      0          273  1
37    401      4      1          803  1
38    401      4      2          786  0
39    401      4      3          712  1
40    401      4      4          254  0

The existing accuracy variable refers to the accuracy of each question. I'm trying to create a new variable that essentially indicates whether or not all of the questions in a particular trial are correct (i.e. accuracy = 1). For subject 400 above, the resulting variable would be c(1,1,1,1,1) indicating that all questions are correct. For subject 401 above, the resulting data would be c(0,0,0,0,0) indicating that 1 or more of the questions were incorrect. To achieve this, I did my best to decode the rather confusing help file and examples for Plyr and its variants to come up with this solution:

Logic: 1) for each subject, consider the questions in each trial separately; 2) look at the accuracy column of the passed data frame; 3) if the accuracies sum to the number of questions in the trial, return a vector of all 1's, otherwise return a vector of all 0's.

This seems to get the job done:

allOK<-function(x) {
  n<-nrow(x) #get number of questions for this trial (avoids shadowing base c())
  s<-sum(x$accuracy) #get sum of accuracies
  return ( data.frame(rep(as.integer(s==n), times=n)) ) #return allOK vector, one value per question
}

This is my attempt to apply it to my data:

alloktest<-ddply(.data=data,c("Subject","trial"), .fun=allOK, .progress = "text")

It works, except that alloktest contains only Subject, trial, and a new variable with the results. The results are correct, which is great, but I was hoping it would return the original data frame with a new variable (maybe named aok).

How do I achieve that? To be clear, I'm looking for this:

      Subject  trial  question   RT   accuracy  aok
1     400      1      0          131  1         1
2     400      1      1          768  1         1
3     400      1      2          300  1         1
4     400      1      3          130  1         1
5     400      1      4          168  1         1
...
36    401      4      0          273  1         0
37    401      4      1          803  1         0
38    401      4      2          786  0         0
39    401      4      3          712  1         0
40    401      4      4          254  0         0

thanks!

TSeymour asked Oct 04 '22 16:10


1 Answer

The simplest approach I can think of is to use mutate, which is a plyr variation on transform:

 alloktest<-ddply(.data=data,c("Subject","trial"), mutate,  
     aok = sum(accuracy) == length(accuracy))

This assumes that within every subject & trial combination, there is only 1 row per question.
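Note that the comparison above yields TRUE/FALSE rather than the 1/0 shown in the question. The following is a minimal sketch, not part of the original answer, of two ways to get an integer aok column while keeping every original row. It rebuilds the question's data (with set.seed and 250:850 as assumptions so the example is reproducible), and the helper name add_aok is hypothetical. The first route keeps the custom-function style: the function returns the whole piece it was given with one column added, so ddply stitches the full data frame back together. The second wraps the mutate expression in as.integer.

```r
library(plyr)

# Rebuild the question's data; set.seed and 250:850 are assumptions for reproducibility
set.seed(1)
data <- data.frame(
  Subject  = c(rep(400, 20), rep(401, 20)),
  trial    = rep(rep(1:4, each = 5), 2),
  question = rep(0:4, 8),
  RT       = sample(250:850, size = 40),
  accuracy = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
               1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0)
)

# Route 1: custom function (hypothetical name) that returns the piece itself
# with aok added; the single value is recycled across the rows of the piece
add_aok <- function(piece) {
  piece$aok <- as.integer(all(piece$accuracy == 1))
  piece
}
via_function <- ddply(data, c("Subject", "trial"), add_aok)

# Route 2: the mutate call from the answer, coerced to 1/0
via_mutate <- ddply(data, c("Subject", "trial"), mutate,
                    aok = as.integer(sum(accuracy) == length(accuracy)))

head(via_function)
```

Both routes should give the same aok column. The custom-function form is the one to reach for when the per-group logic is too involved to fit in a single mutate expression.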

mnel answered Oct 22 '22 13:10