Scenario: I have a df, "scores", of multiple users' attempts at passing a test. Each observation is an attempt with the userID and score. Some users may pass on their first attempt, some might take several; they get unlimited attempts. I want to find the average score for each user.
For example:
userID = c(1:20, sample(1:20, 10, replace = TRUE))
score = c(rnorm(15, mean = 60, sd = 10), rnorm(8, mean = 70, sd = 5),
rnorm(7, mean = 90, sd = 2))
scores = data.frame(userID, score)
I need an end result data frame that is just a list of unique userIDs with the average of all of their attempts (whether they attempted once or several times).
Of all the dumb approaches I've tried, my most recent was:
avgScores = aggregate(scores, by=list("userID"), "mean")
and got the following error message: "arguments must have same length". I've also tried sorting and subsetting (the actual data frame has time stamps) and wiggling my nose and tapping my shoes together, but I'm getting nowhere and this noob brain is fried.
THANK YOU
It is better (more elegant) here to use aggregate with the formula form:
aggregate(score~userID,scores,mean)
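Or use the classic form as you tried, passing the userID vector itself rather than the string "userID" (list("userID") has length 1, which is why you get "arguments must have same length"). The result is slightly different: it gains a Group.1 column and also averages the userID column: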
aggregate(scores,by=list(userID),mean) ## using name and not string
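To make the difference between the two calls concrete, here is a minimal sketch on a tiny hand-made data frame. The values and the names res_err, avg_formula and avg_classic are made up for illustration and are not from the question. It assumes only base R, and it uses a slightly tidier variant of the classic form that names the grouping element and aggregates only the score column.

```r
# Tiny illustrative data set (hypothetical values, not the question's data)
scores <- data.frame(userID = c(1, 2, 3, 1, 2, 1),
                     score  = c(50, 60, 70, 70, 90, 90))

# list("userID") is a single string of length 1, not a grouping vector of
# length nrow(scores), so aggregate() stops with an error; capture its message
res_err <- tryCatch(aggregate(scores, by = list("userID"), mean),
                    error = function(e) conditionMessage(e))

# Formula interface: mean of score within each userID
avg_formula <- aggregate(score ~ userID, data = scores, FUN = mean)

# Classic interface: pass the column itself and name the list element,
# so the grouping column is called userID instead of Group.1
avg_classic <- aggregate(scores["score"],
                         by = list(userID = scores$userID),
                         FUN = mean)

print(res_err)
print(avg_formula)
print(avg_classic)
```

Both avg_formula and avg_classic should contain one row per userID with that user's mean score; naming the list element is just a convenience to avoid the Group.1 column.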
Of course, if you have a big data.frame, it is better to use one of the solutions suggested in the other answers:
#data.table
library(data.table)
DT<-data.table(scores)
DT[,.(mean_score=mean(score)),by=userID]
#dplyr
library(dplyr)
scores %>%
group_by(userID)%>%
summarise(mean_score=mean(score))