 

Show the percentage instead of count in histogram using ggplot2 | R


I'm using a histogram to plot data from my three groups. As a histogram does by default, it counts how many observations in each group fall at each value on the x-axis. What I want instead is the percentage of observations in which each value appears/occurs.

I use this standard code to plot the histogram; the resulting figure is shown below:

library(ggplot2)
library(easyGgplot2)  # provides ggplot2.histogram()

ggplot2.histogram(data=dat, xName='dens',
                  groupName='lines', legendPosition="top",
                  alpha=0.1) +
  labs(x="X", y="Count") +
  theme_bw() +  # complete theme first, so the tweaks below are not overwritten
  theme(panel.border = element_rect(colour = "black"),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        legend.title = element_blank())

[Figure: the resulting histogram of counts for the three groups]

Any ideas/suggestions?

asked Oct 07 '18 by LamaMo




2 Answers

We can map the y aesthetic to the relative value of the computed count statistic (each bar's count divided by the total count) and format the y scale as percentages:

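library(ggplot2)
library(easyGgplot2)  # provides ggplot2.histogram()

ggplot2.histogram(data=dat, xName='dens',
                  groupName='lines', legendPosition="top",
                  alpha=0.1) +
  labs(x="X", y="Percentage") +
  theme_bw() +
  theme(panel.border = element_rect(colour = "black"),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        legend.title = element_blank()) +
  # share of all observations (all bins, all groups) in each bar;
  # after_stat() is the current replacement for the older stat() notation
  aes(y = after_stat(count / sum(count))) +
  scale_y_continuous(labels = scales::percent)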
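For readers who don't use easyGgplot2, here is a minimal sketch of the same idea in plain ggplot2. The data frame is a hypothetical, simulated stand-in for the asker's dat (only the column names dens and lines come from the question); position = "identity" overlays the groups, as the semi-transparent original suggests.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's `dat`: simulated values,
# numeric column `dens`, grouping column `lines` with three groups
set.seed(1)
dat <- data.frame(
  dens  = c(rnorm(200, 0), rnorm(200, 1), rnorm(200, 2)),
  lines = rep(c("a", "b", "c"), each = 200)
)

p <- ggplot(dat, aes(x = dens, fill = lines)) +
  # bar height = bin count / total count over every bar in the layer
  geom_histogram(aes(y = after_stat(count / sum(count))),
                 bins = 30, alpha = 0.3, position = "identity") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "X", y = "Percentage of all observations") +
  theme_bw() +
  theme(legend.position = "top", legend.title = element_blank())

print(p)
```

Because the sum runs over every bar in the layer, the percentages add up to 100% across all three groups together, not within each group.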
answered Nov 15 '22 by Moody_Mudskipper


If I understand you correctly, position = "fill" would answer your question.

For instance,

library(ggplot2)
library(dplyr)  # for %>%

mtcars %>%
  ggplot(aes(x = factor(gear), group = factor(cyl), fill = factor(cyl))) +
  geom_bar(position = "fill")

[Figure: one full-height bar per gear level, filled by cyl]

Here the counts are gone: for each value along the x-axis, the bar shows the share (from 0 to 1) of each group (here: cylinders). Note that the y-axis keeps the default label "count" even though it now shows proportions.
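If the y-axis should read as percentages rather than 0 to 1 proportions, a sketch that adds scales::percent labels to the same plot (the axis and legend titles are illustrative choices, not from the original):

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = factor(gear), fill = factor(cyl))) +
  # each bar is stacked and rescaled so that it reaches 1 (= 100%)
  geom_bar(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "gear", y = "Share within gear level", fill = "cyl")

print(p)
```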

If this is not what you want, a general recommendation is to compute the numbers you want to plot first, and then plot them. Many people consider it good practice to keep computation, transformation, and aggregation separate from plotting.


To follow up on my suggestion to separate computation from visualisation, let's consider the mtcars dataset and focus on gear and carb.

with(mtcars, table(gear, carb))
    carb
gear 1 2 3 4 6 8
   3 3 4 3 5 0 0
   4 4 4 0 4 0 0
   5 0 2 0 1 1 1

For instance, 3 of the 32 observations have gear = 3 and carb = 1 (3/32 ≈ 9.4%, a bit less than 10%). Similarly, 4 observations have gear = 4 and carb = 1 (4/32 = 12.5%, a bit more than 10%). Let's get the percentages directly:

with(mtcars, prop.table(table(gear, carb)))
    carb
gear       1       2       3       4       6       8
   3 0.09375 0.12500 0.09375 0.15625 0.00000 0.00000
   4 0.12500 0.12500 0.00000 0.12500 0.00000 0.00000
   5 0.00000 0.06250 0.00000 0.03125 0.03125 0.03125

I have used prop.table here, which also takes a margin argument: if you wanted conditional percentages instead, you could easily adjust this (see below). For now, let's stay with the joint percentages. Having computed the numbers, we can visualize them with a simple call:

with(mtcars, prop.table(table(gear, carb))) %>%
  as.data.frame() %>%
  ggplot(aes(x = factor(carb), y = Freq, group = factor(gear), fill = factor(gear))) +
  geom_bar(stat = "identity")

which would give us:

[Figure: stacked bars of the joint gear-by-carb proportions]
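The same pipeline can label the y-axis in percent, as the question asks. A sketch using geom_col(), which is shorthand for geom_bar(stat = "identity"); the axis and legend titles are illustrative:

```r
library(ggplot2)

# joint proportions; as.data.frame() gives factor columns gear, carb and a Freq column
tab <- as.data.frame(with(mtcars, prop.table(table(gear, carb))))

p <- ggplot(tab, aes(x = carb, y = Freq, fill = gear)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "carb", y = "Share of all cars", fill = "gear")

print(p)
```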

Now suppose you want the conditional version instead, e.g. the distribution of carb within each level of gear:

with(mtcars, prop.table(table(gear, carb), margin = 1))
    carb
gear         1         2         3         4         6         8
   3 0.2000000 0.2666667 0.2000000 0.3333333 0.0000000 0.0000000
   4 0.3333333 0.3333333 0.0000000 0.3333333 0.0000000 0.0000000
   5 0.0000000 0.4000000 0.0000000 0.2000000 0.2000000 0.2000000

Notice how each row sums up to 1. This can be plotted in the same way:

with(mtcars, prop.table(table(gear, carb), margin = 1)) %>%
  as.data.frame() %>%
  ggplot(aes(x = factor(carb), y = Freq, group = factor(gear), fill = factor(gear))) +
  geom_bar(stat = "identity")

[Figure: stacked bars of carb proportions conditional on gear]

Note the similarity to the smoothed version produced by:

mtcars %>%
  ggplot(aes(x = factor(carb), group = factor(gear), fill = factor(gear))) +
  geom_density(alpha = 0.5)

[Figure: overlapping density curves of carb for each gear level]
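Applied to the asker's continuous variable, the compute-then-plot approach could look like the sketch below. The data are simulated and hypothetical (only the column names dens and lines come from the question), and the 20 shared equal-width bins are an assumption. Because lines forms the columns of the table, prop.table(..., margin = 2) makes each group's percentages sum to 100%, i.e. the conditional version discussed above.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's `dat` (simulated values)
set.seed(1)
dat <- data.frame(
  dens  = c(rnorm(200, 0), rnorm(200, 1), rnorm(200, 2)),
  lines = rep(c("a", "b", "c"), each = 200)
)

# 20 equal-width bins shared by all groups; include.lowest keeps the minimum
breaks  <- seq(min(dat$dens), max(dat$dens), length.out = 21)
dat$bin <- cut(dat$dens, breaks = breaks, include.lowest = TRUE)

# share of each group's observations that falls into each bin (columns sum to 1)
pct <- as.data.frame(prop.table(table(bin = dat$bin, lines = dat$lines), margin = 2))

p <- ggplot(pct, aes(x = bin, y = Freq, fill = lines)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "X", y = "Percentage within group") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

print(p)
```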

answered Nov 15 '22 by coffeinjunky