I'd like to plot a weighted CDF using ggplot
. Some old non-SO discussions (e.g. this from 2012) suggest this is not possible, but thought I'd reraise.
For example, consider this data:
df <- data.frame(x=sort(runif(100)), w=1:100)
I can show an unweighted CDF with
ggplot(df, aes(x)) + stat_ecdf()
How would I weight this by w
? For this example, I'd expect an x^2
-looking function, since the larger numbers have higher weight.
There is a mistake in your answer.
This is the right code to compute the weighted ECDF:
df <- df[order(df$x), ] # Won't change anything since it was created sorted
df$cum.pct <- with(df, cumsum(w) / sum(w))
ggplot(df, aes(x, cum.pct)) + geom_line()
The ECDF is a function F(a)
equal to the sum of weights (probabilities) of observations where x<a
divided by the total sum of weights.
But here is a more satisfying option that simply modifies the original code of the ggplot2 stat_ecdf: https://github.com/NicolasWoloszko/stat_ecdf_weighted
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With