
Grouped Bar Chart of Means in R

Tags: r, bar-chart

I have a data set (learner) with student test scores (learner$literacy_total), grade level (learner$learner_grade, i.e. grades 1, 2, 3, ..., 12), and gender (learner$gender). I'd like to create a bar plot with grade on the x axis and the average score on the y axis, with two bars for each grade (one for males and one for females), so I can see how boys and girls do in each grade. I can easily plot the overall average for each grade using the following code:

fig.dist <- split(learner$literacy_total, learner$learner_grade)
fig.mean <- sapply(fig.dist, mean, na.rm = TRUE)
barplot(fig.mean)

But how do I group these so that, for each grade, I can see the average test scores for boys and girls separately?

In other questions I've seen code that either groups categories or graphs the means, but I'm struggling with how to put the two together.

Ardyn asked Oct 23 '25 12:10


2 Answers

To extend @detroyejr's answer, consider tapply, which splits a vector by one or more factors and applies a function such as mean to each subset, returning a named vector (one factor) or a matrix (two factors).

However, to match your original barplot of overall means, transpose the tapply result with t() so that the genders become the row names and grades 1-12 become the column names. Then pass beside=TRUE to draw the bars side by side instead of stacked.

# na.rm = TRUE is passed through to mean(), as in your overall-mean code
gender.mean <- t(tapply(learner$literacy_total,
                        list(learner$learner_grade, learner$gender),
                        mean, na.rm = TRUE))

barplot(gender.mean, col=c("darkblue","red"), beside=TRUE, legend.text=rownames(gender.mean))
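To see why the transpose matters, here is a minimal sketch on a hypothetical toy data frame (the name toy and its values are invented for illustration, not taken from the question). barplot draws one group of bars per column of the matrix, so the grades need to be the columns:

```r
# Hypothetical toy data: 3 grades, 2 genders, one score per combination
toy <- data.frame(
  grade  = c(1, 1, 2, 2, 3, 3),
  gender = c("FEMALE", "MALE", "FEMALE", "MALE", "FEMALE", "MALE"),
  score  = c(10, 20, 30, 40, 50, 60)
)

# tapply puts the first factor on the rows: grades x genders
by_grade <- tapply(toy$score, list(toy$grade, toy$gender), mean)
# After transposing: genders x grades
by_gender <- t(by_grade)

dim(by_grade)   # 3 2
dim(by_gender)  # 2 3

# One group per column (grade), one bar per row (gender) within each group
barplot(by_gender, beside = TRUE, legend.text = rownames(by_gender))
```

Plotting by_grade instead would give one group per gender with a bar for each grade, which is the other way round from what the question asks for.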

To demonstrate with random data:

set.seed(888)
learner <- data.frame(
  learner_grade = replicate(50, sample(seq(12), 1, replace=TRUE)),
  gender = replicate(50, sample(c("MALE", "FEMALE"), 1, replace=TRUE)),
  literacy_total = abs(rnorm(50)*100)
)

gender.mean <- t(tapply(learner$literacy_total, 
                        list(learner$learner_grade, learner$gender), mean))

barplot(gender.mean, col=c("darkblue","red"), beside=TRUE, legend=rownames(gender.mean))

Bar Plot Output
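With real or random data, some grade/gender combinations may have no observations, and some scores may be missing. A small sketch of how tapply treats both cases, again on a hypothetical toy data frame (names and values are assumptions for illustration):

```r
# Hypothetical toy data: no grade-2 males, and one missing score
toy <- data.frame(
  grade  = c(1, 1, 1, 2, 2),
  gender = c("FEMALE", "MALE", "MALE", "FEMALE", "FEMALE"),
  score  = c(10, 20, NA, 30, 50)
)

# Without na.rm, a group containing an NA score has an NA mean
m_raw <- t(tapply(toy$score, list(toy$grade, toy$gender), mean))

# Extra arguments to tapply() are passed on to mean()
m_clean <- t(tapply(toy$score, list(toy$grade, toy$gender), mean, na.rm = TRUE))

m_raw["MALE", "1"]      # NA: the missing score propagates
m_clean["MALE", "1"]    # 20
m_clean["FEMALE", "2"]  # 40
m_clean["MALE", "2"]    # NA: no observations in this cell
```

A cell that stays NA even with na.rm = TRUE means there were no rows for that combination at all, so there is nothing to average.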

Parfait answered Oct 25 '25 01:10


You can use tapply (see help(tapply) for more info). So, something like this using your dataset:

tapply(learner[["literacy_total"]], list(learner[["learner_grade"]], learner[["gender"]]), mean)

In this example, tapply splits literacy_total by every combination of learner_grade and gender present in the data and computes the mean within each group. You can see another example using:

tapply(mtcars$mpg, list(mtcars$cyl, mtcars$am), mean)
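As a rough sketch of how this relates to the split/sapply code in the question: with a single grouping factor, tapply gives the same group means, and a second factor simply adds a second dimension. The variable names below are illustrative only; mtcars ships with R.

```r
# One factor: the split() + sapply() approach from the question ...
by_split  <- sapply(split(mtcars$mpg, mtcars$cyl), mean)
# ... and the tapply() equivalent
by_tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)

all.equal(as.vector(by_split), as.vector(by_tapply))  # TRUE
names(by_tapply)                                      # "4" "6" "8"

# Two factors: rows are cyl, columns are am
two_way <- tapply(mtcars$mpg, list(mtcars$cyl, mtcars$am), mean)
dim(two_way)       # 3 2
dimnames(two_way)  # rows "4" "6" "8", columns "0" "1"
```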

It's easier to answer if you provide a reproducible example, but this might get you started.

detroyejr answered Oct 25 '25 01:10