
average an unknown number of responses per respondent; R [duplicate]

Tags:

split

r

aggregate

Scenario: I have a data frame, "scores", of multiple users' attempts at passing a test. Each observation is one attempt, with the userID and the score. Some users may pass on their first attempt, some might take several; they get unlimited attempts. I want to find the average score for each user.

For example:

userID = c(1:20, sample(1:20, 10, replace = TRUE))
score = c(rnorm(15, mean = 60, sd = 10), rnorm(8, mean = 70, sd = 5), 
rnorm(7, mean = 90, sd = 2))
scores = data.frame(userID, score)

I need an end result data frame that is just a list of unique userIDs with the average of all of their attempts (whether they attempted once or several times).

Of all the dumb approaches I've tried, my most recent was:

avgScores = aggregate(scores, by=list("userID"), "mean")

and got the following error message: "arguments must have same length." I've also tried sorting and subsetting (the actual data frame has timestamps) and wiggling my nose and tapping my shoes together, but I'm getting nowhere and this noob brain is fried.

THANK YOU

blerg asked Dec 19 '22 06:12


2 Answers

It is more elegant here to use aggregate with the formula interface:

aggregate(score~userID,scores,mean)
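
As a quick check of the formula form, here is a minimal sketch that rebuilds the example data from the question and confirms the result has one row per unique userID. The set.seed call and the name avgScores are my additions for reproducibility, not part of the original question.

```r
set.seed(42)  # assumption: seed added only so the example is reproducible
userID <- c(1:20, sample(1:20, 10, replace = TRUE))
score <- c(rnorm(15, mean = 60, sd = 10), rnorm(8, mean = 70, sd = 5),
           rnorm(7, mean = 90, sd = 2))
scores <- data.frame(userID, score)

# score ~ userID reads as "score, grouped by userID"
avgScores <- aggregate(score ~ userID, data = scores, FUN = mean)

# One row per user, whether they attempted once or several times
nrow(avgScores) == length(unique(scores$userID))
head(avgScores)
```

The result keeps the original column names (userID and score), so it can be used directly as the summary data frame.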

Or use the classic form you tried, passing the column itself rather than its name as a string; note that you get a slightly different result:

aggregate(scores, by = list(scores$userID), mean) ## pass the column itself, not the string "userID"
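
To make the difference concrete, here is a small sketch on a made-up three-row data frame (toy is hypothetical, used only for illustration). With the classic form, the grouping column comes back as Group.1 and every column of the data frame is averaged, including userID itself. Naming the list element and passing only the score column gives the same shape as the formula form. The error in the question comes from list("userID"), which holds a single string rather than a vector with one entry per row.

```r
# toy is a hypothetical data frame used only for illustration
toy <- data.frame(userID = c(1, 1, 2), score = c(50, 70, 90))

# Classic form on the whole data frame: columns Group.1, userID, score
classic <- aggregate(toy, by = list(toy$userID), FUN = mean)
names(classic)

# Name the grouping element and aggregate only the score column
tidy <- aggregate(toy["score"], by = list(userID = toy$userID), FUN = mean)
tidy
```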

Of course, if you have a large data.frame, it is better to use one of the solutions suggested in the other answers.

agstudy answered Jan 26 '23 23:01


# data.table
library(data.table)
DT <- data.table(scores)
DT[, .(mean_score = mean(score)), by = userID]

# dplyr
library(dplyr)
scores %>%
  group_by(userID) %>%
  summarise(mean_score = mean(score))

Metrics answered Jan 27 '23 00:01