T-test in Pandas

If I want to calculate the mean of two categories in Pandas, I can do it like this:

    from pandas import DataFrame

    data = {'Category': ['cat2','cat1','cat2','cat1','cat2','cat1','cat2','cat1','cat1','cat1','cat2'],
            'values': [1,2,3,1,2,3,1,2,3,5,1]}
    my_data = DataFrame(data)
    my_data.groupby('Category').mean()

    Category:     values:
    cat1     2.666667
    cat2     1.600000

I have a lot of data formatted this way, and now I need to do a t-test to see whether the means of cat1 and cat2 are statistically different. How can I do that?

asked Nov 15 '12 19:11

hirolau

1 Answer

It depends on what sort of t-test you want to do (one-sided or two-sided, dependent or independent), but it should be as simple as:

    from scipy.stats import ttest_ind

    cat1 = my_data[my_data['Category'] == 'cat1']
    cat2 = my_data[my_data['Category'] == 'cat2']

    ttest_ind(cat1['values'], cat2['values'])
    >>> (1.4927289925706944, 0.16970867501294376)

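It returns a tuple with the t-statistic and the p-value. By default, ttest_ind performs a two-sided test on independent samples and assumes equal variances.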
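To make the returned pair easier to work with, you can unpack it and compare the p-value against a significance level. This is only a sketch: the 0.05 threshold and the names `t_stat`, `p_value`, `alpha` and `significant` are my own choices for illustration, not something from the question.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Same data as in the question.
data = {'Category': ['cat2', 'cat1', 'cat2', 'cat1', 'cat2', 'cat1',
                     'cat2', 'cat1', 'cat1', 'cat1', 'cat2'],
        'values': [1, 2, 3, 1, 2, 3, 1, 2, 3, 5, 1]}
my_data = pd.DataFrame(data)

# Select the 'values' column for each category in one step.
cat1 = my_data.loc[my_data['Category'] == 'cat1', 'values']
cat2 = my_data.loc[my_data['Category'] == 'cat2', 'values']

# The result unpacks into the t-statistic and the p-value.
t_stat, p_value = ttest_ind(cat1, cat2)

alpha = 0.05  # assumed significance level, pick whatever suits your analysis
significant = p_value < alpha

print(f"t = {t_stat:.4f}, p = {p_value:.4f}, significant at {alpha}: {significant}")
```

With the sample data the p-value is about 0.17, so at a 0.05 level you would not reject the hypothesis that the two means are equal.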

See here for other t-tests: http://docs.scipy.org/doc/scipy/reference/stats.html
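As a rough sketch of what those other variants can look like: Welch's test drops the equal-variance assumption, `alternative` gives a one-sided test, and `ttest_rel` handles dependent (paired) samples. Note the assumptions here: the `alternative` keyword needs SciPy 1.6 or newer, and the `before`/`after` measurements are made-up numbers purely for illustration, since the question's data is not paired.

```python
import pandas as pd
from scipy.stats import ttest_ind, ttest_rel

# Same data as in the question.
data = {'Category': ['cat2', 'cat1', 'cat2', 'cat1', 'cat2', 'cat1',
                     'cat2', 'cat1', 'cat1', 'cat1', 'cat2'],
        'values': [1, 2, 3, 1, 2, 3, 1, 2, 3, 5, 1]}
my_data = pd.DataFrame(data)
cat1 = my_data.loc[my_data['Category'] == 'cat1', 'values']
cat2 = my_data.loc[my_data['Category'] == 'cat2', 'values']

# Welch's t-test: independent samples, variances not assumed equal.
welch_t, welch_p = ttest_ind(cat1, cat2, equal_var=False)

# One-sided test: is the mean of cat1 greater than the mean of cat2?
# (the 'alternative' keyword requires SciPy >= 1.6)
greater_t, greater_p = ttest_ind(cat1, cat2, alternative='greater')

# Paired (dependent) t-test needs two samples of equal length measured
# on the same subjects. These values are hypothetical.
before = [5.1, 4.8, 6.0, 5.5, 5.9]
after = [5.6, 5.0, 6.4, 5.4, 6.3]
paired_t, paired_p = ttest_rel(before, after)

print("Welch:    ", welch_t, welch_p)
print("One-sided:", greater_t, greater_p)
print("Paired:   ", paired_t, paired_p)
```

Which one is appropriate depends on how the data was collected, so treat this as a menu of options, not a recommendation.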

answered Sep 19 '22 15:09

Gonzalo

Tags: python, pandas, statistics, scipy, hypothesis-test
