Imitating 'ppoints' R function in python

Tags: python, r

The R ppoints function is described as:

Ordinates for Probability Plotting

Description:

     Generates the sequence of probability points ‘(1:m - a)/(m +
     (1-a)-a)’ where ‘m’ is either ‘n’, if ‘length(n)==1’, or
     ‘length(n)’.

Usage:

     ppoints(n, a = ifelse(n <= 10, 3/8, 1/2))
...

I've been trying to replicate this function in Python, and I'd like to confirm my reading of three points:

1- The first m in (1:m - a)/(m + (1-a)-a) is always an integer: int(n) (i.e. the integer part of n) if length(n)==1, and length(n) otherwise.

2- The second m in the same equation is NOT an integer if length(n)==1 (it takes the real value of n), and it IS an integer (length(n)) otherwise.

3- The n in a = ifelse(n <= 10, 3/8, 1/2) is the real number n if length(n)==1, and the integer length(n) otherwise.

These points are not made clear in the description, and I'd very much appreciate it if someone could confirm that they are correct.
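To make the three points concrete, here is a small worked example for a scalar input. It is only a sketch of the reading described above, not a confirmed account of R's internals, and the value n = 5.5 is a hypothetical input chosen for illustration.

```python
# Hypothetical scalar input, chosen only for illustration.
n = 5.5

# Point 1: the range 1:m uses the integer part of n.
count = int(n)

# Point 3: the comparison that picks `a` uses the real value of n.
a = 3.0 / 8.0 if n <= 10 else 1.0 / 2.0

# Point 2: the denominator keeps the real value of n.
denom = n + (1 - a) - a

# Probability points (i - a) / (n + 1 - 2a) for i = 1..count.
points = [(i - a) / denom for i in range(1, count + 1)]
print(count, a, denom)
print(points)
```

Under this reading the input 5.5 yields five points, with a = 3/8 and a denominator of 5.75 rather than 5.25.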


Add

This was initially posted at https://stats.stackexchange.com/ because I was hoping to get input from statisticians who work with the ppoints function. Since it has been migrated here, I'll paste below the function I wrote to replicate ppoints in Python. I've tested it and both seem to give the same results, but it'd be great if someone could clarify the points above, because the function's description does not make them clear.

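def ppoints(vector):
    '''
    Mimics R's function 'ppoints'.

    `vector` is a list: a single-element list [n] plays the role of a
    scalar n; a longer list is used only for its length.
    '''

    # Number of points generated: int(n) for a scalar, else len(vector).
    m_range = int(vector[0]) if len(vector) == 1 else len(vector)

    # n keeps its real (possibly non-integer) value for a scalar.
    n = vector[0] if len(vector) == 1 else len(vector)
    a = 3./8. if n <= 10 else 1./2

    # The denominator uses the real value of n for a scalar.
    m_value = n if len(vector) == 1 else m_range
    pp_list = [((m+1)-a)/(m_value+(1-a)-a) for m in range(m_range)]

    return pp_list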
asked Nov 29 '13 19:11 by Gabriel

1 Answer

I would implement this with numpy:

import numpy as np
def ppoints(n, a):
    """ numpy analogue of `R`'s `ppoints` function
        see details at http://stat.ethz.ch/R-manual/R-patched/library/stats/html/ppoints.html
        :param n: array type or number"""
    # `np.float` was removed in recent NumPy releases; the builtin float is equivalent here.
    try:
        n = float(len(n))   # sequence: use its length
    except TypeError:
        n = float(n)        # scalar: use the value itself
    return (np.arange(n) + 1 - a)/(n + 1 - 2*a)

Sample output:

>>> ppoints(5, 1./2)
array([ 0.1,  0.3,  0.5,  0.7,  0.9])
>>> ppoints(5, 1./4)
array([ 0.13636364,  0.31818182,  0.5       ,  0.68181818,  0.86363636])
>>> n = 10
>>> a = 3./8. if n <= 10 else 1./2
>>> ppoints(n, a)
array([ 0.06097561,  0.15853659,  0.25609756,  0.35365854,  0.45121951,
        0.54878049,  0.64634146,  0.74390244,  0.84146341,  0.93902439])

One can use R-Fiddle to test the implementation.
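The function above requires `a` to be passed explicitly. As a possible extension, the sketch below fills in the default from the R usage line (3/8 when n <= 10, otherwise 1/2) and generates int(n) points for a non-integer scalar while keeping the real value in the denominator, following the reading in the question. The name `ppoints_default` is hypothetical, and returning an empty array for any n below 1 is a simplification of this sketch rather than documented R behaviour.

```python
import math
import numpy as np

def ppoints_default(n, a=None):
    """Sketch of a ppoints analogue with R-style default for `a`.

    n: a sequence (its length is used) or a number.
    a: offset; if None, 3/8 when n <= 10, else 1/2.
    """
    try:
        n = len(n)          # sequence: only its length matters
    except TypeError:
        pass                # scalar: keep the (possibly non-integer) value

    if a is None:
        a = 3.0 / 8.0 if n <= 10 else 1.0 / 2.0

    if n < 1:
        # Assumption of this sketch: nothing to generate below 1.
        return np.array([])

    m = int(math.floor(n))  # number of points, as in the range 1:n
    return (np.arange(1, m + 1) - a) / (n + 1 - 2 * a)

print(ppoints_default(5, 1.0 / 2))
print(ppoints_default(10))
print(ppoints_default(5.5))
```

With integer input and an explicit `a`, this computes the same expression as the function above; the two differ only in the default for `a` and in how a non-integer scalar is handled.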

answered Sep 28 '22 17:09 by alko