Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ggplot graphing of proportions of observations within categories

Tags:

r

ggplot2

I am looking for advice on better ways to plot the proportion of observations in various categories.

I have a dataframe that looks something like this:

cat1 <- c("high", "low", "high", "high", "high", "low", "low", "low", "high", "low", "low")
cat2 <- c("1-young", "3-old", "2-middle-aged", "3-old", "2-middle-aged", "2-middle-aged", "1-young", "1-young", "3-old", "3-old", "1-young")
df <- as.data.frame(cbind(cat1, cat2))

In the example here, I want to plot the proportion of each age group that have the value "high", and the proportion of each age group that have the value "low". More generally, I want to plot, for each value of category 2, the percent of observations that fall into each of the levels of category 1.

The following code produces the right result, but only by manually counting and dividing before plotting. Is there a good way to do this on the fly within ggplot?

library(plyr)
count1 <- count(df, vars=c("cat1", "cat2"))
count2 <- count(df, "cat2")

count1$totals <- count2$freq
count1$pct <- count1$freq / count1$totals

ggplot(data = count1, aes(x=cat2, y=pct))+
facet_wrap(~cat1)+
geom_bar()

This previous stackoverflow question offers something similar, with the following code:

ggplot(mydataf, aes(x = foo)) + 
geom_bar(aes(y = (..count..)/sum(..count..)))

But I do not want "sum(..count..)" - which gives the sum of the count of all the bins - in the denominator; rather, I want the sum of the count of each of the "cat2" categories. I have also studied the stat_bin documentation.

I would be grateful for any tips and suggestions on how to make this work.

like image 312
user1257313 Avatar asked Jun 14 '12 02:06

user1257313


People also ask

What graph is best for proportions?

Waffle charts are an excellent alternative. While waffle charts are similar to pie charts, they actually encode each level, class or value of a categorical variable as a proportion of squares.

How do you find proportions in R?

To find the proportion using normal distribution in R, we can use pnorm function where we can provide the mean and standard deviation of population along with sample, also the tail position can be set by using lower. tail argument to TRUE or FALSE.

What is the difference between plot and Ggplot?

The base plotting paradigm is "ink on paper" whereas the lattice and ggplot paradigms are basically writing a program that uses the grid -package to accomplish the low-level output to the target graphics devices.


1 Answers

I will understand if this isn't really what you're looking for, but I found your description of what you wanted very confusing until I realized that you were simply trying to visualize your data in a way that seemed very unnatural to me.

If someone asked me to produce a graph with the proportions within each category, I'd probably turn to a segmented bar chart:

ggplot(df,aes(x = cat2,fill = cat1)) + 
    geom_bar(position = "fill")

enter image description here

Note the y axis records proportions, not counts, as you wanted.

like image 175
joran Avatar answered Oct 13 '22 01:10

joran