I need to calculate binomial confidence intervals for large set of data within a script of python. Do you know any function or library of python that can do this? Ideally I would like to have a function like this http://statpages.org/confint.html implemented on python. Thanks for your time.

Just noting because it hasn't been posted elsewhere here that <code>statsmodels.stats.proportion.proportion_confint</code> lets you get a binomial confidence interval with a variety of methods. It only does symmetric intervals, though.

The following gives exact (Clopper-Pearson) interval for binomial distribution in a simple way. <pre class="prettyprint"><code>def binomial_ci(x, n, alpha=0.05): #x is number of successes, n is number of trials from scipy import stats if x==0: c1 = 0 else: c1 = stats.beta.interval(1-alpha, x,n-x+1)[0] if x==n: c2=1 else: c2 = stats.beta.interval(1-alpha, x+1,n-x)[1] return c1, c2 </code></pre> You may check the code by e.g.: <pre class="prettyprint"><code>p1,p2 = binomial_ci(2,7) from scipy import stats assert abs(stats.binom.cdf(1,7,p1)-.975)<1E-5 assert abs(stats.binom.cdf(2,7,p2)-.025)<1E-5 assert abs(binomial_ci(0,7, alpha=.1)[0])<1E-5 assert abs((1-binomial_ci(0,7, alpha=.1)[1])**7-0.05)<1E-5 assert abs(binomial_ci(7,7, alpha=.1)[1]-1)<1E-5 assert abs((binomial_ci(7,7, alpha=.1)[0])**7-0.05)<1E-5 </code></pre> I used the relation between the binomial proportion confidence interval and the regularized incomplete beta function, as described here: https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Clopper%E2%80%93Pearson_interval

Is there any python function/library for calculate binomial confidence intervals?

8 Answers

Just noting because it hasn't been posted elsewhere here that statsmodels.stats.proportion.proportion_confint lets you get a binomial confidence interval with a variety of methods. It only does symmetric intervals, though.

106

answered Oct 01 '22 05:10

Danica

I would say that R (or another stats package) would probably serve you better if you have the option. That said, if you only need the binomial confidence interval you probably don't need an entire library. Here's the function in my most naive translation from javascript.

def binP(N, p, x1, x2):
    p = float(p)
    q = p/(1-p)
    k = 0.0
    v = 1.0
    s = 0.0
    tot = 0.0

    while(k<=N):
            tot += v
            if(k >= x1 and k <= x2):
                    s += v
            if(tot > 10**30):
                    s = s/10**30
                    tot = tot/10**30
                    v = v/10**30
            k += 1
            v = v*q*(N+1-k)/k
    return s/tot

def calcBin(vx, vN, vCL = 95):
    '''
    Calculate the exact confidence interval for a binomial proportion

    Usage:
    >>> calcBin(13,100)    
    (0.07107391357421874, 0.21204372406005856)
    >>> calcBin(4,7)   
    (0.18405151367187494, 0.9010086059570312)
    ''' 
    vx = float(vx)
    vN = float(vN)
    #Set the confidence bounds
    vTU = (100 - float(vCL))/2
    vTL = vTU

    vP = vx/vN
    if(vx==0):
            dl = 0.0
    else:
            v = vP/2
            vsL = 0
            vsH = vP
            p = vTL/100

            while((vsH-vsL) > 10**-5):
                    if(binP(vN, v, vx, vN) > p):
                            vsH = v
                            v = (vsL+v)/2
                    else:
                            vsL = v
                            v = (v+vsH)/2
            dl = v

    if(vx==vN):
            ul = 1.0
    else:
            v = (1+vP)/2
            vsL =vP
            vsH = 1
            p = vTU/100
            while((vsH-vsL) > 10**-5):
                    if(binP(vN, v, 0, vx) < p):
                            vsH = v
                            v = (vsL+v)/2
                    else:
                            vsL = v
                            v = (v+vsH)/2
            ul = v
    return (dl, ul)

answered Oct 02 '22 05:10

Curt

While the scipy.stats module has a method .interval() to compute the equal tails confidence, it lacks a similar method to compute the highest density interval. Here is a rough way to do it using methods found in scipy and numpy.

This solution also assumes you want to use a Beta distribution as a prior. The hyper-parameters a and b are set to 1, so that the default prior is a uniform distribution between 0 and 1.

import numpy
from scipy.stats import beta
from scipy.stats import norm

def binomial_hpdr(n, N, pct, a=1, b=1, n_pbins=1e3):
    """
    Function computes the posterior mode along with the upper and lower bounds of the
    **Highest Posterior Density Region**.

    Parameters
    ----------
    n: number of successes 
    N: sample size 
    pct: the size of the confidence interval (between 0 and 1)
    a: the alpha hyper-parameter for the Beta distribution used as a prior (Default=1)
    b: the beta hyper-parameter for the Beta distribution used as a prior (Default=1)
    n_pbins: the number of bins to segment the p_range into (Default=1e3)

    Returns
    -------
    A tuple that contains the mode as well as the lower and upper bounds of the interval
    (mode, lower, upper)

    """
    # fixed random variable object for posterior Beta distribution
    rv = beta(n+a, N-n+b)
    # determine the mode and standard deviation of the posterior
    stdev = rv.stats('v')**0.5
    mode = (n+a-1.)/(N+a+b-2.)
    # compute the number of sigma that corresponds to this confidence
    # this is used to set the rough range of possible success probabilities
    n_sigma = numpy.ceil(norm.ppf( (1+pct)/2. ))+1
    # set the min and max values for success probability 
    max_p = mode + n_sigma * stdev
    if max_p > 1:
        max_p = 1.
    min_p = mode - n_sigma * stdev
    if min_p > 1:
        min_p = 1.
    # make the range of success probabilities
    p_range = numpy.linspace(min_p, max_p, n_pbins+1)
    # construct the probability mass function over the given range
    if mode > 0.5:
        sf = rv.sf(p_range)
        pmf = sf[:-1] - sf[1:]
    else:
        cdf = rv.cdf(p_range)
        pmf = cdf[1:] - cdf[:-1]
    # find the upper and lower bounds of the interval 
    sorted_idxs = numpy.argsort( pmf )[::-1]
    cumsum = numpy.cumsum( numpy.sort(pmf)[::-1] )
    j = numpy.argmin( numpy.abs(cumsum - pct) )
    upper = p_range[ (sorted_idxs[:j+1]).max()+1 ]
    lower = p_range[ (sorted_idxs[:j+1]).min() ]    

    return (mode, lower, upper)

answered Sep 30 '22 05:09

mtw729

Just been trying this myself. If it helps here's my solution, which takes two lines of code and seems to give equivalent results to that JS page. This is the frequentist one-sided interval, I'm calling the input argument the MLE (maximum likelihood estimate) of the binomial parameter theta. I.e. mle = number of successes/number of trials. I find the upper bound of the one sided interval. The alpha value used here is therefore double the one in the JS page for the upper limit.

from scipy.stats import binom
from scipy.optimize import bisect

def binomial_ci( mle, N, alpha=0.05 ):
    """
    One sided confidence interval for a binomial test.

    If after N trials we obtain mle as the proportion of those
    trials that resulted in success, find c such that

    P(k/N < mle; theta = c) = alpha

    where k/N is the proportion of successes in the set of trials,
    and theta is the success probability for each trial. 
    """


    to_minimise = lambda c: binom.cdf(mle*N,N,c)-alpha
    return bisect(to_minimise,0,1)

To find the two sided interval, call with (1-alpha/2) and alpha/2 as arguments.

answered Sep 28 '22 05:09

James Thorniley

I needed to do this as well. I was using R and wanted to learn a way to work it out for myself. I would not say it is strictly pythonic.

The docstring explains most of it. It assumes you have scipy installed.

def exact_CI(x, N, alpha=0.95):
    """
    Calculate the exact confidence interval of a proportion 
    where there is a wide range in the sample size or the proportion.

    This method avoids the assumption that data are normally distributed. The sample size
    and proportion are desctibed by a beta distribution.

    Parameters
    ----------

    x: the number of cases from which the proportion is calulated as a positive integer.

    N: the sample size as a positive integer.

    alpha : set at 0.95 for 95% confidence intervals.

    Returns
    -------
    The proportion with the lower and upper confidence intervals as a dict.

    """
    from scipy.stats import beta
    x = float(x)
    N = float(N)
    p = round((x/N)*100,2)

    intervals = [round(i,4)*100 for i in beta.interval(alpha,x,N-x+1)]
    intervals.insert(0,p)

    result = {'Proportion': intervals[0], 'Lower CI': intervals[1], 'Upper CI': intervals[2]}

    return result

answered Sep 29 '22 05:09

John

A numpy/scipy-free way of computing the same thing using the Wilson score and an approximation to the normal cumulative density function,

import math

def binconf(p, n, c=0.95):
  '''
  Calculate binomial confidence interval based on the number of positive and
  negative events observed.

  Parameters
  ----------
  p: int
      number of positive events observed
  n: int
      number of negative events observed
  c : optional, [0,1]
      confidence percentage. e.g. 0.95 means 95% confident the probability of
      success lies between the 2 returned values

  Returns
  -------
  theta_low  : float
      lower bound on confidence interval
  theta_high : float
      upper bound on confidence interval
  '''
  p, n = float(p), float(n)
  N    = p + n

  if N == 0.0: return (0.0, 1.0)

  p = p / N
  z = normcdfi(1 - 0.5 * (1-c))

  a1 = 1.0 / (1.0 + z * z / N)
  a2 = p + z * z / (2 * N)
  a3 = z * math.sqrt(p * (1-p) / N + z * z / (4 * N * N))

  return (a1 * (a2 - a3), a1 * (a2 + a3))


def erfi(x):
  """Approximation to inverse error function"""
  a  = 0.147  # MAGIC!!!
  a1 = math.log(1 - x * x)
  a2 = (
    2.0 / (math.pi * a)
    + a1 / 2.0
  )

  return (
    sign(x) *
    math.sqrt( math.sqrt(a2 * a2 - a1 / a) - a2 )
  )


def sign(x):
  if x  < 0: return -1
  if x == 0: return  0
  if x  > 0: return  1


def normcdfi(p, mu=0.0, sigma2=1.0):
  """Inverse CDF of normal distribution"""
  if mu == 0.0 and sigma2 == 1.0:
    return math.sqrt(2) * erfi(2 * p - 1)
  else:
    return mu + math.sqrt(sigma2) * normcdfi(p)

answered Oct 02 '22 05:10

duckworthd

The following gives exact (Clopper-Pearson) interval for binomial distribution in a simple way.

def binomial_ci(x, n, alpha=0.05):
    #x is number of successes, n is number of trials
    from scipy import stats
    if x==0:
        c1 = 0
    else:
        c1 = stats.beta.interval(1-alpha, x,n-x+1)[0]
    if x==n:
        c2=1
    else:
        c2 = stats.beta.interval(1-alpha, x+1,n-x)[1]
    return c1, c2

You may check the code by e.g.:

p1,p2 = binomial_ci(2,7)
from scipy import stats
assert abs(stats.binom.cdf(1,7,p1)-.975)<1E-5
assert abs(stats.binom.cdf(2,7,p2)-.025)<1E-5
assert abs(binomial_ci(0,7, alpha=.1)[0])<1E-5
assert abs((1-binomial_ci(0,7, alpha=.1)[1])**7-0.05)<1E-5
assert abs(binomial_ci(7,7, alpha=.1)[1]-1)<1E-5
assert abs((binomial_ci(7,7, alpha=.1)[0])**7-0.05)<1E-5

I used the relation between the binomial proportion confidence interval and the regularized incomplete beta function, as described here: https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Clopper%E2%80%93Pearson_interval

answered Sep 30 '22 05:09

Dani

Astropy provides such a function (although installing and importing astropy may be a bit excessive): astropy.stats.binom_conf_interval

answered Sep 28 '22 05:09

TheBamf

Related questions
                            
                                Replacing values in a data.frame according to a value from an other data.frame with the same shape (Python)
                            
                                Pandas explode dictionary to rows
                            
                                Cannot use keras models on Mac M1 with BigSur
                            
                                Permission denied with pip install --user -e /home/me/package/
                            
                                Convert result from groupby on multiple columns to list of dictionaries
                            
                                How do I tell sympy that i^2 = -1?
                            
                                Multiple Hidden Imports in Pyinstaller
                            
                                Pandas group by and sum, but create a new row when a certain amount is exceeded
                            
                                Building wheel for cffi (setup.py) ... error while installing the packages from requirements.txt in django
                            
                                Looping tasks in Prefect
                            
                                pip install py-find-1st fails on ubuntu20 & centos with python3.9
                            
                                Create hierarchy column in pandas
                            
                                `multiprocessing.Process` are modifying non-shared variables they should not have access to
                            
                                Troubles with downloading and saving a document in django
                            
                                Find indices for elements in array B best matching those in array A
                            
                                How to rename the first column of a pandas dataframe?
                            
                                Pandas: reading multi-index JSON as pandas data frame
                            
                                How to determine which points are inside of a polygon and which are not (large number of points)?
                            
                                Allowing only single active session per user in Django app
                            
                                Retry on deadlock for MySQL / SQLAlchemy

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is there any python function/library for calculate binomial confidence intervals?

Tags:

python

statistics

Geparada

People also ask