I have read other posts (such as here) on getting the "reverse" of quantile -- that is, to get the percentile that corresponds to a certain value in a series of values.
However, the answers don't give me the same value as quantile for the same data series.
I have also researched that quantile provides 9 different algorithms to calculate percentile.
So my question: is there a reliable way to get the reverse of the quantile function? ecdf does not take a "type" argument so it doesn't seem that one can make sure they are using the same method.
Reproducible example:
# Simple data
x = 0:10
pcntile = 0.5
# Get value corresponding to a percentile using quantile
(pcntile_value <- quantile(x, pcntile))
# 50%
# 5 # returns 5 as expected for 50% percentile
# Get percentile corresponding to a value using ecdf function
(pcntile_rev <- ecdf(x)(5))
# [1] 0.5454545 #returns 54.54% as the percentile for the value 5
# Not the same answer as quantile produces
The answer in the link is really good, but perhaps it helps, to have a look at ecdf
Just run the following code:
# Simple data
x = 0:10
p0 = 0.5
# Get value corresponding to a percentile using quantile
sapply(c(1:7), function(i) quantile(x, p0, type = i))
# 50% 50% 50% 50% 50% 50% 50%
# 5.0 5.0 5.0 4.5 5.0 5.0 5.0
Thus, it is not a question of type. You can step into the function using debug:
# Get percentile corresponding to a value using ecdf function
debug(ecdf)
my_ecdf <- ecdf(x)
The crucial part is
rval <- approxfun(vals, cumsum(tabulate(match(x, vals)))/n,
method = "constant", yleft = 0, yright = 1, f = 0, ties = "ordered")
After this you can check
data.frame(x = vals, y = round(cumsum(tabulate(match(x, vals)))/n, 3), stringsAsFactors = FALSE)
and as you devide by n=11
the result is not surprising. As said, for theory have a look at the other answer.
By the way, you can also plot the function
plot(my_ecdf)
Concerning your comment. I think it's not a question of reliability but a question of how to define the "inverse distribution function, if it does not exist":
A good reference for generalized inverses: Paul Embrechts, Marius Hofert: "A note on generalized inverses", Math Meth Oper Res (2013) 77:423–432 DOI
ecdf
is giving the result of the formula in the documentation.
x <- 0:10
Fn <- ecdf(x)
Now, the object Fn
is an interpolating step function.
str(Fn)
#function (v)
# - attr(*, "class")= chr [1:3] "ecdf" "stepfun" "function"
# - attr(*, "call")= language ecdf(x)
And it keeps the original x
values and the corresponding y
values.
environment(Fn)$x
# [1] 0 1 2 3 4 5 6 7 8 9 10
environment(Fn)$y
# [1] 0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
# [7] 0.63636364 0.72727273 0.81818182 0.90909091 1.00000000
The latter are exactly the same values as the result of what the documentation says is the formula used to compute them. From help('ecdf')
:
For observations x= (x1,x2, ... xn), Fn is the fraction of
observations less or equal to t, i.e.,Fn(t) = #{xi <= t}/n = 1/n sum(i=1,n) Indicator(xi <= t).
Instead of 1:length(x)
I will use seq_along
.
seq_along(x)/length(x)
# [1] 0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
# [7] 0.63636364 0.72727273 0.81818182 0.90909091 1.00000000
Fn(x)
# [1] 0.09090909 0.18181818 0.27272727 0.36363636 0.45454545 0.54545455
# [7] 0.63636364 0.72727273 0.81818182 0.90909091 1.00000000
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With