statistics bootstrap library in Python? [closed]

Is there a statistics bootstrap library in Python?

I would like functionality similar to what R's bootstrap package offers:

http://statistics.ats.ucla.edu/stat/r/library/bootstrap.htm

Searching, I found:

http://mjtokelly.blogspot.com/2006/04/bootstrap-statistics-in-python.html (the link to the code is broken)

http://adorio-research.org/wordpress/?p=9048

https://github.com/cgevans/scikits-bootstrap

but none of these seems to offer all of that functionality (in particular, probability weights).

Any pointers?

Update: weighted sampling has since been added to numpy.random (see numpy.random.choice and its p argument).
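As a minimal sketch of weighted resampling with numpy.random, assuming NumPy 1.17 or later (for default_rng; the older numpy.random.choice takes the same p argument). The data and weights below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

data = np.array([2.1, 3.4, 1.9, 5.0, 4.2])  # made-up observations
p = np.array([0.1, 0.2, 0.4, 0.2, 0.1])     # made-up weights; must sum to 1

# One weighted bootstrap resample: same size as the data, with replacement.
resample = rng.choice(data, size=data.size, replace=True, p=p)

# Many resamples at once: each row is one resample, then take row means.
resamples = rng.choice(data, size=(2000, data.size), replace=True, p=p)
boot_means = resamples.mean(axis=1)

# Percentile interval for the weighted mean.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"weighted bootstrap mean ~ {boot_means.mean():.2f} ({low:.2f}, {high:.2f})")
```

Passing replace=False with the same p gives a weighted draw without replacement.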

Thanks

asked Oct 05 '22 20:10 by gliptak


1 Answer

If you're just looking for a Python version of R's sample function, try this:

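import bisect
import random
from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

def sample(xs, sample_size=None, replace=False, sample_probabilities=None):
    """Mimic R's sample() (see http://statistics.ats.ucla.edu/stat/r/library/bootstrap.htm).

    Draw sample_size items from xs, optionally with replacement and
    optionally weighted by sample_probabilities.
    """

    # A single integer n stands for the population 0..n-1 (R itself uses 1..n).
    if not isinstance(xs, Iterable):
        xs = range(xs)
    # Materialize so that len(), indexing and random.choice() all work.
    xs = list(xs)
    if sample_size is None:
        sample_size = len(xs)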

    if not sample_probabilities:
        if replace:
            return [random.choice(xs) for _ in range(sample_size)]
        else:
            return random.sample(xs, sample_size)
    else:
        if replace:
            total, cdf = 0, []
            for x, p in zip(xs, sample_probabilities):
                total += p
                cdf.append(total)

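            # random.random() is in [0, 1), so the draw stays below total;
            # min() guards against float rounding pushing the index past the end.
            return [xs[min(bisect.bisect(cdf, random.random() * total), len(cdf) - 1)]
                    for _ in range(sample_size)]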
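        else:
            assert len(sample_probabilities) == len(xs)
            if sample_size > len(xs):
                raise ValueError("sample_size exceeds population size")
            xps = list(zip(xs, sample_probabilities))
            total = sum(sample_probabilities)
            result = []
            for _ in range(sample_size):
                # Choose an item according to the remaining weights, then remove
                # it so it cannot be drawn again. This is slow, O(N^2); a tree
                # structure for xps would be better, O(N log N).
                target = random.uniform(0, total)
                current_total = 0
                for index, (x, p) in enumerate(xps):
                    current_total += p
                    if current_total > target:
                        break
                else:
                    # The running sum never exceeded target (float rounding, or
                    # only zero weights remain): fall back to the last item.
                    index = len(xps) - 1
                    x, p = xps[index]
                xps.pop(index)
                result.append(x)
                total -= p
            return result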
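The comment in the weighted, no-replacement branch notes that the loop is quadratic. As a rough sketch of one way around that, here is the key-based method usually attributed to Efraimidis and Spirakis, which is a different technique from the tree structure the comment mentions. The function name and its parameters are hypothetical, chosen only for illustration:

```python
import heapq
import math
import random

def weighted_sample_without_replacement(xs, weights, k, rng=random):
    """Draw k distinct items, favouring larger weights (hypothetical helper).

    Each item with weight w > 0 gets the key log(u) / w for a fresh u in
    (0, 1]; the k items with the largest keys form the sample. Items with
    zero weight are never drawn.
    """
    keyed = []
    for x, w in zip(xs, weights):
        if w > 0:
            u = 1.0 - rng.random()  # in (0, 1], so log(u) is defined
            keyed.append((math.log(u) / w, x))
    if k > len(keyed):
        raise ValueError("k exceeds the number of items with positive weight")
    # Compare on the key only, so the items themselves need not be orderable.
    top = heapq.nlargest(k, keyed, key=lambda pair: pair[0])
    return [x for _, x in top]

print(weighted_sample_without_replacement("abcd", [1, 0, 3, 2], 2))
```

This makes one pass over the data plus a heap selection, roughly O(N log k), at the cost of one logarithm per item.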
answered Oct 13 '22 07:10 by jnnnnn