How to calculate (statistical) power function vs. sample size in python?

Tags:

How can this be done in python?

Calculate sample size for a given power and alpha?
Calculate power for a given sample size and alpha?

Note: I am totally confused :( with the functions that python gives for (statistical) power function calculation.

Can someone help me to make an order here?

There are two functions under statsmodels:

from statsmodels.stats.power import ttest_power, tt_ind_solve_power()

We have:

tt_ind_solve_power(effect_size=effect_size, alpha=alpha, power=0.8, ratio=1, alternative='two-sided')

And we have also:

ttest_power(0.2, nobs=sampleSize, alpha=alpha, alternative='two-sided')

There is also this peice of code:

import statsmodels.stats.api as sms
es = sms.proportion_effectsize(prop1, prop2, method='normal')
n = sms.NormalIndPower().solve_power(es, power=0.9, alpha=0.05, ratio=2)

I found this example somewhere but it does NOT explain what are prop1 and prop2!

Each one is giving me different values.

thanks

495

asked Nov 15 '17 05:11

Samer Aamar

1 Answers

As an answer to the question above, I wrote this function that calculates power vs. sample size.

When calling the tt_ind_solve_power you need to leave one parameter as None in order to be calculated. In my example below i keep power as None.

I hope that someone will find it useful, any improvements are welcome.

from statsmodels.stats.power import  tt_ind_solve_power
from scipy.interpolate import interp1d
import matplotlib.pyplot as plt

def test_ttest_power_diff(mean, std, sample1_size=None, alpha=0.05, desired_power=0.8, mean_diff_percentages=[0.1, 0.05]):
    '''
    calculates the power function for a given mean and std. the function plots a graph showing the comparison between desired mean differences
    :param mean: the desired mean
    :param std: the std value
    :param sample1_size: if None, it is assumed that both samples (first and second) will have same size. The function then will
    walk through possible sample sizes (up to 100, hardcoded).
    If this value is not None, the function will check different alternatives for sample 2 sizes up to sample 1 size.
    :param alpha: alpha default value is 0.05
    :param desired_power: will use this value in order to mark on the graph
    :param mean_diff_percentages: iterable list of percentages. A line per value will be calculated and plotted.
    :return: None
    '''
    fig, ax = plt.subplots()
    for mean_diff_percent in mean_diff_percentages:
        mean_diff = mean_diff_percent * mean
        effect_size = mean_diff / std

        print('Mean diff: ', mean_diff)
        print('Effect size: ', effect_size)

        powers = []

        max_size  = sample1_size
        if sample1_size is None:
            max_size = 100

        sizes = np.arange(1, max_size, 2)
        for sample2_size in sizes:
            if(sample1_size is None):
                n = tt_ind_solve_power(effect_size=effect_size, nobs1=sample2_size, alpha=alpha, ratio=1.0, alternative='two-sided')
                print('tt_ind_solve_power(alpha=', alpha, 'sample2_size=', sample2_size, '): sample size in *second* group: {:.5f}'.format(n))
            else:
                n = tt_ind_solve_power(effect_size=effect_size, nobs1=sample1_size, alpha=alpha, ratio=(1.0*sample2_size/sample1_size), alternative='two-sided')
                print('tt_ind_solve_power(alpha=', alpha, 'sample2_size=', sample2_size, '): sample size *each* group: {:.5f}'.format(n))

            powers.append(n)

        try: # mark the desired power on the graph
            z1 = interp1d(powers, sizes)
            results = z1(desired_power)

            plt.plot([results], [desired_power], 'gD')
        except Exception as e:
            print("Error: ", e)
            #ignore

        plt.title('Power vs. Sample Size')
        plt.xlabel('Sample Size')
        plt.ylabel('Power')

        plt.plot(sizes, powers, label='diff={:2.0f}%'.format(100*mean_diff_percent)) #, '-gD')

    plt.legend()
    plt.show()

As an example, if you call this function with mean=10 and std=2 you will get this plot:

Power vs. Sample Size (avg=10, std=2)

answered Oct 07 '22 02:10

Samer Aamar

Related questions
                            
                                Replace one part text element in a for loop Python
                            
                                Fast, Real-time plotting of points using pyqtgraph and a LiDAR
                            
                                List comprehension without 'for' [duplicate]
                            
                                Python Pandas: Denormalize data from one data frame into another
                            
                                Pandas Get Day of Week from date type column
                            
                                Group python pandas dataframe per weeks (starting on Monday)
                            
                                Retrieve final hidden activation layer output from sklearn's MLPClassifier
                            
                                PyTest fails on a directory with multiple test files when modularizing fixtures
                            
                                Python timedelta object with negative values
                            
                                Online or batch training by default in tensorflow
                            
                                How to check is numpy 2d array "surrounded" by zeros
                            
                                How to crop an image from the center with certain dimensions?
                            
                                How to retrieve all the css properties of an element using selenium python?
                            
                                Understanding the matrix output of Tfidfvectorizer in Sklearn
                            
                                Celery 4 not auto-discovering tasks
                            
                                Pass a list of strings as parameter of a dependant task in Airflow
                            
                                ParameterVailidation Failed When Sending List to DynamoDB
                            
                                How to use tf.data.Dataset.padded_batch with a nested shape?
                            
                                Pandas Loc select by index as well as boolean condition in single expression
                            
                                How can I use a custom function within an expression using the eval dataframe method?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to calculate (statistical) power function vs. sample size in python?

Tags:

python

statistics

sample-size

t-test

Samer Aamar

People also ask

1 Answers

Samer Aamar

Recent Activity

Donate For Us