
What scipy statistical test do I use to compare sample means?

Assuming sample sizes are not equal, what test do I use to compare sample means under the following circumstances (please correct if any of the following are incorrect):

Normal Distribution = True and Homogeneity of Variance = True

scipy.stats.ttest_ind(sample_1, sample_2)

Normal Distribution = True and Homogeneity of Variance = False

scipy.stats.ttest_ind(sample_1, sample_2, equal_var=False)

Normal Distribution = False and Homogeneity of Variance = True

scipy.stats.mannwhitneyu(sample_1, sample_2)

Normal Distribution = False and Homogeneity of Variance = False

???
blahblahblah asked Jul 31 '14 16:07




1 Answer

Fast answer:

Normal Distribution = False and Homogeneity of Variance = False and sample sizes > 30-50 (in each group)

scipy.stats.ttest_ind(sample1, sample2, equal_var=False)

Good answer:

If you check the Central limit theorem, it says (from Wikipedia): "In probability theory, the central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined (finite) expected value and finite variance, will be approximately normally distributed, regardless of the underlying distribution"
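To see what the quote means in practice, here is a small simulation sketch. The exponential population, the sample sizes 2 and 50, the seed and the helper names `sample_means` and `skewness` are all my own choices for illustration, not anything from the question. It draws many samples from a strongly skewed population and measures how skewed the distribution of their means is.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, picked arbitrarily for repeatability


def sample_means(n, repeats=10_000):
    """Means of `repeats` samples of size n from a skewed (exponential) population."""
    return rng.exponential(scale=1.0, size=(repeats, n)).mean(axis=1)


def skewness(x):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    centered = x - x.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5


skew_small = skewness(sample_means(2))   # means of tiny samples: still clearly skewed
skew_large = skewness(sample_means(50))  # means of larger samples: much closer to symmetric
print(skew_small, skew_large)
```

The skewness of the sample means shrinks as the sample size grows, which is the CLT at work. How fast it shrinks depends on the population, so "30 to 50" is only a rule of thumb.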

So, even if your population is not normally distributed, if each sample is big enough (a common rule of thumb is more than 30 to 50 observations), the sample mean will be approximately normally distributed. So, you can use:

scipy.stats.ttest_ind(sample1, sample2, equal_var=False)

This is a two-sided test for the null hypothesis that two independent samples have identical average (expected) values. With the option equal_var=False it performs Welch’s t-test, which does not assume equal population variances.
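A minimal runnable sketch of that call, assuming made-up data: the two exponential samples, their sizes (80 and 120) and the seed are only placeholders for your own `sample1` and `sample2`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed so the run is repeatable

# Hypothetical data: non-normal, unequal variances, unequal sample sizes.
sample1 = rng.exponential(scale=1.0, size=80)
sample2 = rng.exponential(scale=1.5, size=120)

# Welch's t-test: equal_var=False drops the equal-variance assumption.
result = stats.ttest_ind(sample1, sample2, equal_var=False)
print("t statistic:", result.statistic)
print("p-value:", result.pvalue)
```

A small p-value suggests the two population means differ. With small or heavily skewed samples the normal approximation may be poor, so treat the result with more caution there.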

ivangtorre answered Sep 19 '22 09:09