R ggplot: Weighted CDF

Tags:

r

ggplot2

cdf

I'd like to plot a weighted CDF using ggplot. Some old non-SO discussions (e.g. this one from 2012) suggest this isn't possible, but I thought I'd raise it again.

For example, consider this data:

df <- data.frame(x=sort(runif(100)), w=1:100)

I can show an unweighted CDF with

ggplot(df, aes(x)) + stat_ecdf()


How would I weight this by w? For this example, I'd expect an x^2-looking function, since the larger numbers have higher weight.
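A rough check of that expectation. This assumes an idealization that is not in the data: the sorted uniform draws sit near evenly spaced points, x_i ≈ i/n, with w_i = i and n = 100. The weighted CDF at a is then approximately:

```latex
F(a) = \frac{\sum_{i:\, x_i \le a} w_i}{\sum_{i=1}^{n} w_i}
     \approx \frac{\sum_{i=1}^{\lfloor na \rfloor} i}{\sum_{i=1}^{n} i}
     = \frac{\lfloor na \rfloor \left(\lfloor na \rfloor + 1\right)}{n(n+1)}
     \approx a^2
```

So for large n the weighted curve should grow roughly like a^2 on [0, 1].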

asked Sep 09 '15 at 19:09 by Max Ghenis


1 Answer

There is a mistake in your answer.

This is the right code to compute the weighted ECDF:

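library(ggplot2)
df <- df[order(df$x), ]  # Won't change anything since it was created sorted
df$cum.pct <- with(df, cumsum(w) / sum(w))  # cumulative share of the total weight
ggplot(df, aes(x, cum.pct)) + geom_step()  # an ECDF is a step function, as in stat_ecdf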

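The weighted ECDF is the function F(a) equal to the sum of the weights (probabilities) of the observations with x <= a, divided by the total sum of weights. Sorting by x and taking cumsum(w) / sum(w) evaluates exactly this at each observed x (assuming no tied x values).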
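To make that definition concrete, here is a minimal base-R sketch of a weighted ECDF as a step function. `weighted_ecdf()` is a hypothetical helper written for illustration, not part of base R or ggplot2. It sorts the data, collapses tied x values so each distinct value carries its full cumulative weight, and uses `findInterval()` to count how many distinct values are <= a. With equal weights it should agree with the built-in `stats::ecdf()`.

```r
# Hypothetical helper (not in base R or ggplot2):
# returns a function computing F(a) = sum(w[x <= a]) / sum(w).
weighted_ecdf <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)
  xs <- x[ord]
  cw <- cumsum(w[ord]) / sum(w)  # cumulative share of total weight
  # For tied x values, keep the share at the last copy, so the
  # jump at a tied value includes the weight of every copy.
  last <- !duplicated(xs, fromLast = TRUE)
  knots <- xs[last]     # strictly increasing distinct values
  heights <- cw[last]   # F evaluated at each distinct value
  function(a) {
    # findInterval() returns how many knots are <= a (0 below the first knot)
    idx <- findInterval(a, knots)
    c(0, heights)[idx + 1]
  }
}

# Equal weights with a tie: behaves like stats::ecdf()
Fn <- weighted_ecdf(c(3, 1, 2, 2), rep(1, 4))
Fn(c(0, 1, 2, 2.5, 3))  # 0.00 0.25 0.75 0.75 1.00

# Unequal weights: the point at 2 carries three quarters of the mass
Fn2 <- weighted_ecdf(c(1, 2), c(1, 3))
Fn2(c(1, 2))  # 0.25 1.00
```

Evaluating the returned function on a fine grid of a values and drawing the result with `geom_step()` gives the same curve as the cumsum approach above, without requiring the data to be pre-sorted or free of ties.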

But a more satisfying option is this version of ggplot2's stat_ecdf, which simply modifies its original code: https://github.com/NicolasWoloszko/stat_ecdf_weighted

answered Oct 01 '22 at 09:10 by NicolasWoloszko