
Plotting histograms for multiple datasets with percentages with ggplot2

I have four datasets, and I would like to plot histograms of all of them on the same plot. I've put all of the data into one data frame, and I can plot the histograms together. However, I'm having trouble plotting percentages rather than counts. The code below computes each percentage relative to the total count across all four datasets, but I would prefer that the percentages be relative to each dataset. Is this possible?

library(ggplot2)

all <- rbind(data.frame(fill = "A", Events = A$Events), 
    data.frame(fill = "B", Events = B$Events), 
    data.frame(fill = "C", Events = C$Events), 
    data.frame(fill = "D", Events = D$Events))
ggplot(all, aes(x = Events, fill = fill)) + 
  geom_histogram(aes(y = ..count../sum(..count..)), position = 'dodge')

Edit

Here is some example data:

fill Events  
1   A   1  
2   A   1  
3   A   3  
4   A   1  
5   A   1  
6   A   6  
7   A   2  
8   A   1  
9   A   1  
10  A   2  
11  A   1  
12  A   1  
13  A   1  
14  A   1  
15  A   5  
16  A   1  
17  A   2  
18  A   2  
19  A   1  
20  A   1  
21  A   1  
22  A   1  
23  A   2  
24  A   1  
25  A   2  
26  A   1  
27  B   2  
28  B   3  
29  B   1  
30  B   3  
31  B   2  
32  B   5  
33  B   1  
34  B   1  
35  B   1  
36  B   2  
37  B   1  
38  B   1  
39  B   1  
40  B   1  
41  B   1  
42  B   1  
43  B   1  
44  B   1  
45  B   1  
46  B   4  
47  B   3  
48  B   3  
49  B   5  
50  B   3  
51  C   1  
52  C   2  
53  C   2  
54  C   3  
55  C   3  
56  C   9  
57  C   8  
58  C   1  
59  C   1  
60  C   2  
61  C   2  
62  C   1  
63  C   2  
64  C  39  
65  C  43  
66  C 194  
67  C 129  
68  C 186  
69  C   1  
70  C   2  
71  C   7  
72  C   4  
73  C   1   
74  D  12  
75  D   3  
76  D   2  
77  D   3  
78  D   8  
79  D  20  
80  D   5  
81  D   1  
82  D   4  
83  D   9  
84  D  51  
85  D  12  
86  D   7  
87  D   6  
88  D   7  
89  D   7  
90  D   9  
91  D  17  
92  D  18  
93  D   8  
94  D   7  
95  D   6  
96  D  10  
97  D  27  
98  D  11  
99  D  21  
100 D  89  
101 D  47  
102 D   1  
Asked by user2167681 on Mar 29 '26 at 20:03


1 Answer

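You were close, but you need to use (..density..)*binwidth rather than ..count../sum(..count..). The density is computed separately for each group, so multiplying it by the bin width gives each bin's share of its own dataset rather than its share of all the data.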

library(ggplot2)

# Your data:
all <- data.frame(fill=rep(LETTERS[1:4],c(26,24,23,29)),
                  Events=c(1,1,3,1,1,6,2,1,1,2,1,1,1,1,5,1,2,2,1,1,1,1,2,1,2,1,2,3,1,3,2,5,1,1,1,2,1,1,1,1,1,1,1,1,1,4,3,3,5,3,1,2,2,3,3,9,8,1,1,2,2,1,2,39,43,194,129,186,1,2,7,4,1,12,3,2,3,8,20,5,1,4,9,51,12,7,6,7,7,9,17,18,8,7,6,10,27,11,21,89,47,1))

bw <- 20 # set the binwidth

# plot
p1<-ggplot(all,aes(x=Events, fill=fill)) + 
  geom_histogram(aes(y=(..density..)*bw), position='dodge', binwidth=bw)
p1

[image: desired output]
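As far as I know, newer ggplot2 releases deprecate the `..density..` notation in favour of `after_stat()`. Here is a minimal sketch of the same idea in that style, assuming a reasonably recent ggplot2 plus the scales package for the axis labels. It uses the built-in mtcars data so that it runs on its own; `Events` and `fill` from the question should drop in the same way. The computed variable `width` holds each bin's width, so multiplying by a separate `bw` is optional here.

```r
library(ggplot2)

bw <- 5  # bin width, in the units of the x variable

# density * width = share of each group's rows that fall in each bin
p2 <- ggplot(mtcars, aes(x = mpg, fill = factor(cyl))) +
  geom_histogram(aes(y = after_stat(density * width)),
                 position = "dodge", binwidth = bw) +
  scale_y_continuous(labels = scales::label_percent()) +
  labs(y = "Share of group", fill = "cyl")
p2
```

The percent labels are only cosmetic; the underlying bar heights are still proportions between 0 and 1.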

Here is a check to make sure the values add to 1:

# layer_data() returns the computed bar data for the first layer of the plot
aggregate(ymax ~ group, data = layer_data(p1), FUN = sum)
#  group ymax
#1     1    1
#2     2    1
#3     3    1
#4     4    1
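To see why the two formulas differ, the same arithmetic can be done in base R without ggplot2. This is only a sketch: the toy data frame `d` and the bin edges are made up for illustration and do not match ggplot2's default bin placement.

```r
# Toy data (hypothetical): two groups of different sizes
d <- data.frame(fill   = rep(c("A", "B"), c(4, 6)),
                Events = c(1, 2, 2, 25, 3, 4, 21, 22, 23, 41))

bw <- 20
# Assign each value to a bin of width bw: [0,20), [20,40), [40,60)
d$bin <- cut(d$Events, breaks = seq(0, 60, by = bw), right = FALSE)

# Rows are groups, columns are bins
counts <- table(d$fill, d$bin)

# Share of ALL rows, which is what count / sum(count) gives
share_of_total <- counts / sum(counts)

# Share within each group, which is what density * binwidth gives
share_of_group <- prop.table(counts, margin = 1)

rowSums(share_of_total)  # A = 0.4, B = 0.6
rowSums(share_of_group)  # A = 1,   B = 1
```

With the first formula a group's bars sum to that group's share of the combined data; with the second they sum to 1 for every group.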

Older answer

Here is an example:

library(ggplot2)

ggplot(mtcars,aes(x=mpg, fill = as.factor(cyl))) +
  geom_histogram(aes(y = ..density..), position = 'dodge', binwidth=5)

As a check, set the binwidth to 100: every mpg value then falls into a single bin, so each group's bar has a density of 0.01 (a proportion of 1, i.e. 100%, divided by a bin width of 100).
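The same relationship between density and bin width can be checked with base R's hist(). This is a small sketch; the break points -50 and 50 are chosen by hand so that one bin of width 100 covers every mpg value.

```r
# One bin of width 100 that contains all 32 rows of mtcars
h <- hist(mtcars$mpg, breaks = c(-50, 50), plot = FALSE)

h$counts                    # 32: every row falls in the single bin
h$density                   # 0.01 = (32 / 32) / 100
h$density * diff(h$breaks)  # 1, i.e. 100% of the data
```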

(Edit) Here is another example, using an oversimplified data set to highlight the result:

library(data.table)
# Calculate the average miles per gallon by number of cylinders
mtcars_avg <- as.data.table(mtcars)[,
                                    list(mpg_avg=mean(mpg)),
                                    by=list(cyl=as.factor(cyl))][order(cyl)]
mtcars_avg
#   cyl  mpg_avg
#1:   4 26.66364
#2:   6 19.74286
#3:   8 15.10000

# OP version, with unwanted results of 33% per color (cyl)
ggplot(mtcars_avg, aes(x=mpg_avg, fill=cyl)) +
  geom_histogram(aes(y = ..count../sum(..count..)), position = 'dodge', binwidth=1)

[image: original]

# ..density.. version, which shows the desired results of 100% per color (cyl)
ggplot(mtcars_avg, aes(x=mpg_avg, fill=cyl)) +
  geom_histogram(aes(y = ..density..), position = 'dodge', binwidth=1)

[image: solution]

You may also want to consider using geom_density instead:

ggplot(mtcars,aes(x=mpg, fill = as.factor(cyl))) + geom_density(alpha=0.5)
Answered by dnlbrky on Apr 02 '26 at 12:04


