I want to use a Pandas DataFrame to break down the variance in one variable.
For example, suppose I have a column called 'Degrees', indexed by date, city, and night vs. day. I want to find out what fraction of the variation in this series comes from cross-sectional variation across cities, how much comes from time-series variation, and how much comes from the night vs. day split.
In Stata I would use fixed effects and look at the R^2. Hopefully my question makes sense.
Basically, what I want to do is find the ANOVA breakdown of "Degrees" by the three other columns.
I set up a direct comparison between R and Python to test them, found that their default assumptions can differ slightly, got a hint from a statistician, and here is an example of ANOVA on a pandas DataFrame that matches R's results:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# R code on R sample dataset
#> anova(with(ChickWeight, lm(weight ~ Time + Diet)))
#Analysis of Variance Table
#
#Response: weight
#           Df  Sum Sq Mean Sq  F value    Pr(>F)
#Time        1 2042344 2042344 1576.460 < 2.2e-16 ***
#Diet        3  129876   43292   33.417 < 2.2e-16 ***
#Residuals 573  742336    1296
#write.csv(file='ChickWeight.csv', x=ChickWeight, row.names=F)

cw = pd.read_csv('ChickWeight.csv')
cw_lm = ols('weight ~ Time + C(Diet)', data=cw).fit()  # Specify C() for categorical
print(sm.stats.anova_lm(cw_lm, typ=2))
#                 sum_sq   df            F         PR(>F)
#C(Diet)   129876.056995    3    33.416570   6.473189e-20
#Time     2016357.148493    1  1556.400956  1.803038e-165
#Residual  742336.119560  573          NaN           NaN
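To get the variance shares the question asks for, divide each row's sum of squares by the total. A sketch using the sum_sq values printed by anova_lm above (hard-coded here so the snippet stands alone without ChickWeight.csv):

```python
import pandas as pd

# sum_sq values taken from the anova_lm output on the ChickWeight data
ss = pd.Series({'C(Diet)': 129876.056995,
                'Time': 2016357.148493,
                'Residual': 742336.119560})

# Each factor's share of total variation; the residual share is what
# remains unexplained (1 - R^2 of the fitted model)
frac = ss / ss.sum()
print(frac.round(3))
```

Here Time accounts for roughly 70% of the variation in weight, Diet about 4.5%, and the rest is residual.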