The R ppoints function is described as:
Ordinates for Probability Plotting
Description:
Generates the sequence of probability points ‘(1:m - a)/(m +
(1-a)-a)’ where ‘m’ is either ‘n’, if ‘length(n)==1’, or
‘length(n)’.
Usage:
ppoints(n, a = ifelse(n <= 10, 3/8, 1/2))
...
I've been trying to replicate this function in Python and I have a few doubts.
1- The first m in (1:m - a)/(m + (1-a)-a) is always an integer: int(n) (i.e. the integer part of n) if length(n)==1, and length(n) otherwise.
2- The second m in the same equation is NOT an integer if length(n)==1 (it takes the real value of n), and it IS an integer (length(n)) otherwise.
3- The n in a = ifelse(n <= 10, 3/8, 1/2) is the real number n if length(n)==1, and the integer length(n) otherwise.
These points are not made clear at all in the description, and I'd very much appreciate it if someone could confirm that this is the case.
This was initially posted at https://stats.stackexchange.com/ because I was hoping to get input from statisticians who work with the ppoints function. Since it has been migrated here, I'll paste below the function I wrote to replicate ppoints in Python. I've tested it and both seem to give the same results, but it'd be great if someone could clarify the points made above, because the function's description does not make them clear.
def ppoints(vector):
    '''
    Mimics R's function 'ppoints'.
    '''
    m_range = int(vector[0]) if len(vector)==1 else len(vector)
    n = vector[0] if len(vector)==1 else len(vector)
    a = 3./8. if n <= 10 else 1./2
    m_value = n if len(vector)==1 else m_range
    pp_list = [((m+1)-a)/(m_value+(1-a)-a) for m in range(m_range)]
    return pp_list
R's functional style gives users a simple, compact interface for quickly computing probabilities and the common descriptive and inferential statistics needed in a data analysis problem.
ppoints() is used in qqplot and qqnorm to generate the set of probabilities at which to evaluate the inverse distribution.
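As a rough illustration of that use, the sketch below turns the probability points into theoretical standard-normal quantiles, which is what a normal Q-Q plot puts on its x-axis. It uses only Python's standard library. The helper names ppoints_simple and normal_quantiles are made up for this example, n is assumed to be a plain integer count, and this is not a reproduction of R's qqnorm code.

```python
from statistics import NormalDist


def ppoints_simple(n, a=None):
    """Probability points (k - a) / (n + 1 - 2a) for k = 1..n.

    Assumes n is an integer count; the default for `a` follows the
    rule quoted from the R documentation (3/8 if n <= 10, else 1/2).
    """
    if a is None:
        a = 3.0 / 8.0 if n <= 10 else 0.5
    return [(k - a) / (n + 1 - 2 * a) for k in range(1, n + 1)]


def normal_quantiles(n):
    """Evaluate the inverse standard-normal CDF at each probability point."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(p) for p in ppoints_simple(n)]


# Theoretical quantiles to plot against the sorted sample values.
theoretical = normal_quantiles(5)
print(theoretical)
```

The sorted sample would then be plotted against these values; the middle point sits at 0 because the probability points are symmetric around 0.5.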
I would implement this with numpy:
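import numpy as np

def ppoints(n, a):
    """ numpy analogue of `R`'s `ppoints` function
    see details at http://stat.ethz.ch/R-manual/R-patched/library/stats/html/ppoints.html
    :param n: array type or number
    :param a: offset used in (k - a)/(n + 1 - 2*a)"""
    # np.float was removed in NumPy 1.24; the builtin float behaves the same here
    try:
        n = float(len(n))   # sequence: use its length
    except TypeError:
        n = float(n)        # scalar: use its value
    return (np.arange(n) + 1 - a)/(n + 1 - 2*a)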
Sample output:
>>> ppoints(5, 1./2)
array([ 0.1, 0.3, 0.5, 0.7, 0.9])
>>> ppoints(5, 1./4)
array([ 0.13636364, 0.31818182, 0.5 , 0.68181818, 0.86363636])
>>> n = 10
>>> a = 3./8. if n <= 10 else 1./2
>>> ppoints(n, a)
array([ 0.06097561, 0.15853659, 0.25609756, 0.35365854, 0.45121951,
0.54878049, 0.64634146, 0.74390244, 0.84146341, 0.93902439])
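The function above always needs an explicit a and treats any sequence by its length. A possible variant, sketched below, folds in the three points raised in the question: a single value (scalar or one-element sequence) is used as-is, a longer sequence contributes only its length, and a defaults to 3/8 or 1/2. The name ppoints_default is hypothetical, and the handling of non-integer n reflects the behaviour described in the question rather than anything confirmed by the R documentation.

```python
import numpy as np


def ppoints_default(n, a=None):
    """Sketch of ppoints with a default `a` and R-style handling of `n`.

    Assumption (from the question): if n holds a single value, that value
    is used, possibly non-integer; otherwise the length of n is used.
    """
    if np.size(n) == 1:
        m = float(np.ravel(n)[0])   # single value: keep its real value
    else:
        m = float(np.size(n))       # longer sequence: use its length
    if a is None:
        a = 3.0 / 8.0 if m <= 10 else 0.5
    count = int(m)                  # number of points, truncated like 1:m
    return (np.arange(1, count + 1) - a) / (m + 1 - 2 * a)


print(ppoints_default(5, 0.5))      # same points as ppoints(5, 1./2) above
print(ppoints_default([7, 8, 9]))   # three points, since length(n) == 3
print(ppoints_default(5.5))         # five points, denominator uses 5.5
```

Comparing these results against R itself for a few non-integer and vector inputs would be the way to confirm that the assumptions hold.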
One can use R fiddle to test the implementation against R's output.