There are some posts about plotting cumulative densities in ggplot. I'm currently using the accepted answer from "Easier way to plot the cumulative frequency distribution in ggplot?" for plotting my cumulative counts. But this solution involves pre-calculating the values.
Here I'm looking for a pure ggplot solution. Let's show what I have so far:
x <- data.frame(A=replicate(200,sample(c("a","b","c"),1)),X=rnorm(200))
stat_ecdf
I can use ggplot's stat_ecdf, but it only plots cumulative densities:
ggplot(x,aes(x=X,color=A)) + geom_step(aes(y=..y..),stat="ecdf")
I'd like to do something like the following, but it doesn't work:
ggplot(x,aes(x=X,color=A)) + geom_step(aes(y=..y.. * ..count..),stat="ecdf")
cumsum and stat_bin
I found an idea about using cumsum and stat_bin:
ggplot(x,aes(x=X,color=A)) + stat_bin(aes(y=cumsum(..count..)),geom="step")
But as you can see, the next color doesn't start at y=0, but where the last color ended.
What I'd like to have from best to worst:
Ideally, a simple fix for the non-working
ggplot(x,aes(x=X,color=A)) + geom_step(aes(y=..y.. * ..count..),stat="ecdf")
A more complicated way to use stat_ecdf with counts.
The cumsum approach, as a last resort, since it gives worse (binned) results.
To create a cumulative sum plot in base R, we can simply use the plot function. The variable to be accumulated is passed through the cumsum function first, so that the plot shows its running total.
If we want to convert our histogram to a cumulative histogram, we can use the cumsum function within the geom_histogram function as shown below:
ggplot(data, aes(x)) + # Draw cumulative ggplot2 histogram
  geom_histogram(aes(y = cumsum(..count..)))
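As a minimal sketch of the base R approach described above, the following accumulates a small made-up vector with cumsum and draws the running total as a step plot. The vector v is hypothetical illustration data, not something from the original posts.

```r
# Hypothetical illustration data
v <- c(3, 1, 4, 1, 5)

# Running total of v
cs <- cumsum(v)

# Draw the cumulative sum as a step plot
plot(cs, type = "s", xlab = "Index", ylab = "Cumulative sum")
```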
This does not directly solve the problem with the grouping of the lines, but it works as a workaround.
You can add three calls to stat_bin(), each with the data subset to one level of A.
ggplot(x,aes(x=X,color=A)) +
  stat_bin(data=subset(x,A=="a"),aes(y=cumsum(..count..)),geom="step") +
  stat_bin(data=subset(x,A=="b"),aes(y=cumsum(..count..)),geom="step") +
  stat_bin(data=subset(x,A=="c"),aes(y=cumsum(..count..)),geom="step")
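The three stat_bin() calls restart the sum by hand for each level. A possible single-layer variant, sketched here under the assumption that ggplot2 3.3.0 or later is installed (for after_stat()) and that the computed data exposes a group column next to count, is to let base R's ave() apply cumsum within each group. The result is still binned, so it shares the limitation of the cumsum approach; bins = 30 is just the default made explicit.

```r
library(ggplot2)

set.seed(123)
x <- data.frame(A = sample(c("a", "b", "c"), 200, replace = TRUE),
                X = rnorm(200))

# ave() applies cumsum separately within each group of the computed bins,
# so every color starts again from its own first bin
p <- ggplot(x, aes(x = X, color = A)) +
  stat_bin(aes(y = after_stat(ave(count, group, FUN = cumsum))),
           geom = "step", bins = 30)

print(p)
```

Each line should end at the number of observations in its level of A, which is a quick way to check that the sums were not carried over between colors.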
Another possibility is to multiply the values of ..y.. by the number of observations in each level. The only way I have found so far to get these counts is to precalculate them before plotting and add them to the original data frame. I named this column len. Then, in geom_step(), inside aes(), map the column with len=len so that it is available when y is computed, and define the y values as y=..y.. * len.
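set.seed(123)
x <- data.frame(A=replicate(200,sample(c("a","b","c"),1)),X=rnorm(200))
library(ggplot2)
library(plyr)
# add len: the number of observations in each level of A
df <- ddply(x,.(A),transform,len=length(X))
# scale the ECDF value (..y..) by the group size to get cumulative counts
ggplot(df,aes(x=X,color=A)) + geom_step(aes(len=len,y=..y.. * len),stat="ecdf")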
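The scaling above relies on the fact that an ECDF value is the fraction of observations at or below a point, so multiplying it by the group size gives a count. A small check of that arithmetic in base R, using an illustrative vector v that is not part of the original data:

```r
set.seed(123)
v <- rnorm(10)   # illustrative sample, stands in for one level of A
n <- length(v)

# ECDF evaluated at each observation, scaled by the sample size
cum_count <- ecdf(v)(v) * n

# Number of observations less than or equal to each value
by_rank <- rank(v, ties.method = "max")

# Both routes give the same cumulative counts
stopifnot(isTRUE(all.equal(cum_count, as.numeric(by_rank))))
```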