
ggplot2 density plotting different size of data in R

Tags:

r

ggplot2

I have two data sets of sizes 500 and 1000, and I want to plot the densities of both in a single plot.
I searched Google and found these threads:

  • r-geom-density-values-in-y-axis
  • ggplot2-plotting-two-or-more-overlapping-density-plots-on-the-same-graph/

The data sets in those threads all have the same size:

df <- data.frame(x = rnorm(1000, 0, 1), y = rnorm(1000, 0, 2), z = rnorm(1000, 2, 1.5))

But if my data sets have different sizes, I assume I should normalize the data first in order to compare the densities between them.

Is it possible to make a density plot from data sets of different sizes in ggplot2?

l0o0 asked Dec 07 '17 04:12



1 Answer

By default, all densities are scaled to unit area. If you have two datasets with different amounts of data, you can plot them together like so:

library(ggplot2)

# Two samples of different sizes
df1 <- data.frame(x = rnorm(1000, 0, 2))
df2 <- data.frame(y = rnorm(500, 1, 1))

# One geom_density layer per data set; each curve is scaled to unit area
ggplot() +
  geom_density(data = df1, aes(x = x),
               fill = "#E69F00", color = "black", alpha = 0.7) +
  geom_density(data = df2, aes(x = y),
               fill = "#56B4E9", color = "black", alpha = 0.7)

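To see why no manual normalization is needed, here is a minimal sketch using base R's density(), which is the estimator ggplot2's density stat builds on. The seed and the area_under helper are illustrative assumptions, not part of the original answer. It integrates each estimate numerically and shows that the area is close to 1 whatever the sample size.

```r
# Sketch: the area under a kernel density estimate is ~1 for any sample size.
set.seed(1)                      # arbitrary seed, only for reproducibility
x_large <- rnorm(1000, 0, 2)
x_small <- rnorm(500, 1, 1)

# Hypothetical helper: trapezoidal area under a density() result
area_under <- function(d) {
  sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
}

area_large <- area_under(density(x_large))
area_small <- area_under(density(x_small))
print(c(area_large, area_small))  # both values should be close to 1
```

Because both curves already integrate to roughly 1, their shapes can be compared directly even though one sample is twice as large as the other.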

However, from your latest comment, I take it that's not what you want. Instead, you want the area under each density curve to be scaled in proportion to the amount of data in its group. You can do that by mapping y to the computed variable ..count.. (written after_stat(count) in newer ggplot2 versions):

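library(ggplot2)

# Stack both samples in one data frame, with a label identifying the source
df1 <- data.frame(x = rnorm(1000, 0, 2), label = rep("df1", 1000))
df2 <- data.frame(x = rnorm(500, 1, 1), label = rep("df2", 500))
df <- rbind(df1, df2)

# y = ..count.. scales each curve by the number of observations in its group
ggplot(df, aes(x, y = ..count.., fill = label)) +
  geom_density(color = "black", alpha = 0.7) +
  scale_fill_manual(values = c("#E69F00", "#56B4E9"))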

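As a rough check of what the count scaling does, the sketch below reproduces it with base R under the assumption that count is simply the density multiplied by the number of observations in the group. The seed and the area helper are illustrative, not from the original answer. The area under each scaled curve then comes out close to the group's sample size.

```r
# Sketch: count-scaled density = density * n, so the area is ~n per group.
set.seed(1)                      # arbitrary seed, only for reproducibility
x1 <- rnorm(1000, 0, 2)
x2 <- rnorm(500, 1, 1)

d1 <- density(x1)
d2 <- density(x2)

# Scale each unit-area density by its group's sample size
count1 <- d1$y * length(x1)
count2 <- d2$y * length(x2)

# Hypothetical helper: trapezoidal area under the points (x, y)
area <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

area1 <- area(d1$x, count1)
area2 <- area(d2$x, count2)
print(c(area1, area2))           # should be close to 1000 and 500
```

With this scaling the larger data set visibly dominates the plot, which is the intended effect when the curves should reflect how much data each group contributes.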

Claus Wilke answered Oct 11 '22 22:10