Confidence interval for the difference between two proportions in Python

For example, in an A/B test the A population could have 1000 data points, of which 100 are successes, while B could have 2000 data points and 220 successes. This gives A a success proportion of 0.1 and B 0.11, a delta of 0.01. How can I calculate the confidence interval around this delta in Python?

statsmodels can do this for one sample, but it does not seem to have a function for the difference between two samples, which is what an A/B test needs. (http://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportion_confint.html)

Johnny V asked Nov 30 '17 10:11



2 Answers

I couldn't find a function for this in statsmodels. However, this website goes over the maths for generating the confidence interval and is the source of the function below:

import numpy as np
from scipy import stats


def two_proprotions_confint(success_a, size_a, success_b, size_b, significance=0.05):
    """
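    A/B test for two proportions:
    given the number of successes and the trial size of groups A and B,
    compute the confidence interval for the difference of their proportions;
    the result should match R's prop.test when called with correct = FALSE
    (prop.test applies a continuity correction by default)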

    Parameters
    ----------
    success_a, success_b : int
        Number of successes in each group

    size_a, size_b : int
        Size, or number of observations in each group

    significance : float, default 0.05
        Often denoted as alpha. Governs the chance of a false positive.
        A significance level of 0.05 means that there is a 5% chance of
        a false positive. In other words, our confidence level is
        1 - 0.05 = 0.95

    Returns
    -------
    prop_diff : float
        Difference between the two proportions (B minus A)

    confint : 1d ndarray
        Confidence interval for the difference between the two proportions
    """
    prop_a = success_a / size_a
    prop_b = success_b / size_b
    var = prop_a * (1 - prop_a) / size_a + prop_b * (1 - prop_b) / size_b
    se = np.sqrt(var)

    # two-sided z critical value, e.g. about 1.96 for significance = 0.05
    z = stats.norm.ppf(1 - significance / 2)

    # standard formula for the confidence interval:
    # point estimate +- z * standard error
    prop_diff = prop_b - prop_a
    confint = prop_diff + np.array([-1, 1]) * z * se
    return prop_diff, confint
Johnny V answered Sep 19 '22 03:09


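The sample sizes don't have to be equal. The usual normal-approximation confidence interval for the difference between two proportions is

$$(p_1 - p_2) \pm z_{1-\alpha/2} \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$$

Here p1 and p2 are the observed proportions, computed over their respective samples of sizes n1 and n2, and z is the standard normal critical value for the chosen confidence level (about 1.96 for 95%).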

For more, please see this white paper.
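
As a minimal sketch of the normal-approximation (Wald) interval for p1 - p2, the snippet below applies it to the numbers from the question, using only the Python standard library. The helper name `diff_proportion_ci` and its `confidence` parameter are hypothetical, chosen for illustration; this is not a function from statsmodels or any other library.

```python
from math import sqrt
from statistics import NormalDist


def diff_proportion_ci(success_1, n_1, success_2, n_2, confidence=0.95):
    """Wald (normal-approximation) interval for p1 - p2.

    Hypothetical helper for illustration; the two samples may differ in size.
    """
    p_1 = success_1 / n_1
    p_2 = success_2 / n_2
    # Unpooled standard error of the difference between the two proportions.
    se = sqrt(p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2)
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_1 - p_2
    return diff, (diff - z * se, diff + z * se)


# Numbers from the question: B has 220 successes in 2000, A has 100 in 1000.
diff, (low, high) = diff_proportion_ci(220, 2000, 100, 1000)
print(f"delta = {diff:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

With these inputs the interval comes out at roughly -0.013 to 0.033, so it includes zero. The Wald interval is only an approximation and can behave poorly for small samples or proportions near 0 or 1.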

Igor Urisman answered Sep 17 '22 03:09