
How to draw multiple CDF plots of vectors with different number of rows

Tags: r, ggplot2, cdf, ecdf

I want to draw the CDF plots of multiple variables in the same graph. The variables have different lengths. To keep things simple, I use the following example code:

library("ggplot2")

a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3),ggg = gl(3, 1000))
ggplot(df, aes(x, colour = ggg)) + stat_ecdf()+ coord_cartesian(xlim = c(0, 3)) + scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))

As you can see, a3 has length 800, which differs from a1 and a2. When I run the code, I get:

> df <- data.frame(x = c(a1, a2, a3),ggg = gl(3, 1000))
Error in data.frame(x = c(a1, a2, a3), ggg = gl(3, 1000)) : 
arguments imply differing number of rows: 2800, 3000
> ggplot(df, aes(x, colour = ggg)) + stat_ecdf()+ coord_cartesian(xlim = c(0, 3)) +    scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))
Error: ggplot2 doesn't know how to deal with data of class function

So, how can I draw the CDF plots of variables with different lengths in the same graph using ggplot2? Thanks in advance for any help!

Excalibur asked May 17 '14 17:05


1 Answer

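ggplot has no trouble at all dealing with different counts in each group. The problem is with your creation of the factor ggg: gl(3, 1000) always produces 3000 values, but x has only 2800, so data.frame() fails. The second error follows from the first: because df was never created, ggplot receives R's built-in function df instead of a data frame. Use this: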

library(ggplot2)

a1 <- rnorm(1000, 0, 3)
a2 <- rnorm(1000, 1, 4)
a3 <- rnorm(800, 2, 3)

df <- data.frame(x = c(a1, a2, a3), ggg=factor(rep(1:3, c(1000,1000,800))))
ggplot(df, aes(x, colour = ggg)) + 
  stat_ecdf()+
  scale_colour_hue(name="my legend", labels=c('AAA','BBB', 'CCC'))
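
If the group sizes may change, the counts do not have to be typed by hand. The following is only a sketch of one way to derive them from the data; the list name `samples` and the fixed seed are assumptions added for illustration, while the labels AAA, BBB and CCC are the ones from the question.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the example is reproducible

# Hypothetical container: one named element per group
samples <- list(AAA = rnorm(1000, 0, 3),
                BBB = rnorm(1000, 1, 4),
                CCC = rnorm(800, 2, 3))

# gl(3, 1000) always has 3 * 1000 = 3000 elements, whatever the data look like
length(gl(3, 1000))

# lengths() gives the size of each vector, so the labels always match the data
df <- data.frame(
  x   = unlist(samples, use.names = FALSE),
  ggg = factor(rep(names(samples), times = lengths(samples)),
               levels = names(samples))
)

table(df$ggg)  # rows per group

# The factor levels already carry the labels, so only the legend title is set
ggplot(df, aes(x, colour = ggg)) +
  stat_ecdf() +
  scale_colour_hue(name = "my legend")
```

Because the factor is built from `names(samples)`, adding or resizing a vector in the list needs no other change.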

Also, the way you have it set up, coord_cartesian(xlim = c(0, 3)) shows the CDF only on [0, 3], where each curve is more or less a straight line.
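
To see why the curves look so flat in that window, it can help to check how much of each distribution lies inside [0, 3]. This is a rough sketch that uses the theoretical normal CDF (`pnorm`) with the means and standard deviations from the example rather than the simulated samples; the names `params` and `covered` are made up for illustration.

```r
# Means and standard deviations of a1, a2, a3 as given in the example
params <- data.frame(name = c("a1", "a2", "a3"),
                     mean = c(0, 1, 2),
                     sd   = c(3, 4, 3))

lower <- pnorm(0, params$mean, params$sd)  # theoretical CDF at x = 0
upper <- pnorm(3, params$mean, params$sd)  # theoretical CDF at x = 3

# Share of each distribution that falls inside the window [0, 3]
covered <- setNames(upper - lower, params$name)
round(covered, 2)
```

Each window covers only roughly a third of the probability mass, all of it near the middle of the distribution where the CDF is close to linear. Dropping coord_cartesian(), or widening xlim, shows the full S-shaped curves.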

jlhoward answered Sep 19 '22 17:09