
How to conduct hypothesis testing in Python?

Ideally, I would like to compute the p-value. I come from a statistics background and am fairly new to Python. Are there any packages that let me do this? I am following the book "Data Science from Scratch" and am stuck on hypothesis testing and inference.

rmahesh asked May 16 '26 18:05



1 Answer

The SciPy package has a whole module of statistical tools, including hypothesis tests and built-in distribution functions: scipy.stats

For example, this is how you can test if a random sample is normally distributed using the Kolmogorov-Smirnov test:

from scipy.stats import norm, pareto, kstest

n = 1000  # sample size
sample_norm = norm.rvs(size=n)  # generate normally distributed random sample
sample_pareto = pareto.rvs(1.0, size=n)  # sample from some other distribution for comparison

d_norm, p_norm = kstest(sample_norm, norm.cdf)  # test if the sample_norm is distributed normally (correct hypothesis)
d_pareto, p_pareto = kstest(sample_pareto, norm.cdf)  # test if the sample_pareto is distributed normally (false hypothesis)

print('Statistic values: %.4f, %.4f' % (d_norm, d_pareto))
print('P-values: %.4f, %.4f' % (p_norm, p_pareto))

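As you can see, kstest returns the value of the test statistic and the p-value. norm.cdf is the cumulative distribution function of a standard normal random variable, which kstest compares against the empirical distribution of the sample. A small p-value (as for sample_pareto) is evidence against the hypothesis that the sample is normally distributed.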
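Other tests in scipy.stats follow the same pattern of returning a statistic and a p-value. As a rough sketch (the data, the seed and the 0.05 significance level below are made up for illustration, not taken from the question), a one-sample and a two-sample t-test could look like this:

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical data: two samples drawn with different true means
group_a = rng.normal(loc=0.0, scale=1.0, size=200)
group_b = rng.normal(loc=1.0, scale=1.0, size=200)

# One-sample t-test: H0 says the mean of group_a equals 0
t_one, p_one = ttest_1samp(group_a, popmean=0.0)

# Two-sample Welch t-test: H0 says both groups have the same mean
t_two, p_two = ttest_ind(group_a, group_b, equal_var=False)

alpha = 0.05  # assumed significance level
print('One-sample: t = %.4f, p = %.4f' % (t_one, p_one))
print('Two-sample: t = %.4f, p = %.4f' % (t_two, p_two))
print('Reject H0 of equal means: %s' % (p_two < alpha))
```

Here equal_var=False selects Welch's version of the test, which does not assume the two groups share a variance; whether that is the right choice depends on your data.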

Slippy answered May 21 '26 06:05
