Assuming sample sizes are not equal, what test do I use to compare sample means under the following circumstances (please correct if any of the following are incorrect):
Normal Distribution = True and Homogeneity of Variance = True
scipy.stats.ttest_ind(sample_1, sample_2)
Normal Distribution = True and Homogeneity of Variance = False
scipy.stats.ttest_ind(sample_1, sample_2, equal_var = False)
Normal Distribution = False and Homogeneity of Variance = True
scipy.stats.mannwhitneyu(sample_1, sample_2)
Normal Distribution = False and Homogeneity of Variance = False
???
A t-test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment has an effect on the population of interest, or whether two groups are different from one another.
To compare the means of more than two groups, the one-way analysis of variance (ANOVA) is the appropriate method instead of the t-test.
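As a rough illustration of the ANOVA case, here is a minimal sketch using scipy.stats.f_oneway. The three groups, their sizes, and their means are made-up values chosen only for the example.

```python
import numpy as np
from scipy import stats

# Made-up data: three independent groups of unequal size (illustrative only).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=2.0, size=25)
group_b = rng.normal(loc=10.5, scale=2.0, size=40)
group_c = rng.normal(loc=13.0, scale=2.0, size=32)

# One-way ANOVA: the null hypothesis is that all group means are equal.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.3f}, p = {p_value:.4g}")
```

A small p-value only says that at least one group mean differs; it does not say which one.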
A paired samples t-test is used to compare the means of two samples when each observation in one sample can be paired with an observation in the other sample.
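A minimal sketch of the paired case, assuming made-up "before" and "after" measurements on the same 20 subjects. It uses scipy.stats.ttest_rel and also shows that the paired test matches a one-sample t-test on the per-pair differences.

```python
import numpy as np
from scipy import stats

# Made-up paired data: the same 20 subjects measured twice (illustrative only).
rng = np.random.default_rng(1)
before = rng.normal(loc=50.0, scale=5.0, size=20)
after = before + rng.normal(loc=2.0, scale=1.0, size=20)

# Paired t-test: the null hypothesis is that the mean difference is zero.
t_stat, p_value = stats.ttest_rel(before, after)
print(f"paired: t = {t_stat:.3f}, p = {p_value:.4g}")

# Equivalent formulation: one-sample t-test on the differences against 0.
t_diff, p_diff = stats.ttest_1samp(before - after, 0.0)
print(f"on differences: t = {t_diff:.3f}, p = {p_diff:.4g}")
```

Both samples must have the same length and the same ordering, since each value in one sample is matched with the value at the same position in the other.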
Normal Distribution = False and Homogeneity of Variance = False and sample sizes > 30-50
scipy.stats.ttest_ind(sample1, sample2, equal_var=False)
If you check the Central limit theorem, it says (from Wikipedia): "In probability theory, the central limit theorem (CLT) states that, given certain conditions, the arithmetic mean of a sufficiently large number of iterates of independent random variables, each with a well-defined (finite) expected value and finite variance, will be approximately normally distributed, regardless of the underlying distribution"
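The quoted statement can be checked numerically. The sketch below is only an illustration under assumed settings: it draws from an exponential population (strongly right-skewed) and measures how the skewness of the sample mean shrinks as the sample size grows. The sample sizes 2, 10, and 50 and the number of repetitions are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_repeats = 20_000  # number of simulated samples per sample size (arbitrary)

# For each sample size n, simulate many samples from an exponential
# population and record the skewness of the resulting sample means.
# A normal distribution has skewness 0.
skew_by_n = {}
for n in (2, 10, 50):
    sample_means = rng.exponential(scale=1.0, size=(n_repeats, n)).mean(axis=1)
    skew_by_n[n] = stats.skew(sample_means)
    print(f"n = {n:>2}: skewness of the sample mean = {skew_by_n[n]:.3f}")
```

The skewness moves toward 0 as n increases, which is the sense in which the sample mean becomes approximately normal. How large n must be depends on how far the population is from normal.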
So, although your population is not normally distributed, if each sample is large enough (a common rule of thumb is more than 30 to 50 observations), then the sample mean will be approximately normally distributed. So, you can use:
scipy.stats.ttest_ind(sample1, sample2, equal_var=False)
This is a two-sided test for the null hypothesis that 2 independent samples have identical average (expected) values. With the option equal_var = False it performs a Welch’s t-test, which does not assume equal population variance.
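A minimal sketch of this call on made-up data: two skewed (exponential) samples with unequal sizes and unequal variances. The sample sizes and scale values are assumptions for illustration. The statistic is also recomputed by hand to show what Welch's test uses as its standard error.

```python
import numpy as np
from scipy import stats

# Made-up, non-normal samples of unequal size and unequal variance.
rng = np.random.default_rng(7)
sample1 = rng.exponential(scale=1.0, size=60)
sample2 = rng.exponential(scale=3.0, size=150)

# Welch's t-test: does not assume equal population variances.
welch = stats.ttest_ind(sample1, sample2, equal_var=False)
print(f"Welch: t = {welch.statistic:.3f}, p = {welch.pvalue:.4g}")

# Same statistic by hand: difference in means divided by the
# unpooled standard error sqrt(s1^2/n1 + s2^2/n2).
se = np.sqrt(sample1.var(ddof=1) / sample1.size
             + sample2.var(ddof=1) / sample2.size)
manual_t = (sample1.mean() - sample2.mean()) / se
print(f"manual t = {manual_t:.3f}")
```

With the default equal_var=True, ttest_ind pools the two variances instead, which can be misleading when both the variances and the sample sizes differ.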