Python statsmodels: Tukey HSD plot not working

I'm trying to figure out how to calculate Tukey's HSD with statsmodels. I got it working and the results look great, but there's a plot of the differences of the means that I can't get to display. I must be doing something silly.

It's the plot_simultaneous method of the TukeyHSDResults object (see doc).

This is the code I'm using to try:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from scipy import stats
from statsmodels.stats.multicomp import (pairwise_tukeyhsd,
                                         MultiComparison)

red_wine = pd.read_csv(".../winequality-red.csv",
                       sep=';', header=0, index_col=False)
white_wine = pd.read_csv(".../winequality-white.csv",
                         sep=';', header=0, index_col=False)
white1, white2 = train_test_split(white_wine['quality'], test_size=0.5, random_state=1812)

# compute anova
f, p = stats.f_oneway(red_wine['quality'], white1, white2)
print("F value: " + str(f))
print("p value: " + str(p))

# tukey HSD
red = pd.DataFrame(red_wine['quality'], columns=['quality'])
red['wine'] = 'red'  # label every row with its group name
w1 = pd.DataFrame(white1, columns=['quality'])
w1['wine'] = 'white1'
w2 = pd.DataFrame(white2, columns=['quality'])
w2['wine'] = 'white2'
total = pd.concat([red, w1, w2], axis=0)

res = pairwise_tukeyhsd(endog=total['quality'], groups=total['wine'], alpha=0.01)
print(res.summary())
res.plot_simultaneous()
mod = MultiComparison(total['quality'], total['wine'])
results = mod.tukeyhsd(0.01)
## plot does not work!
results.plot_simultaneous()

The csv files are public datasets and can be obtained from here. A bit of explanation of the code: I split the white wines randomly so that I know that these 2 samples come from the same population, and the 3rd sample for red ones is from a different population. Just a simple setup to try the library.

I've tried both PyDev and, just in case, IPython notebook. In PyDev I get no output and no graph, and in the notebook I get only this terse output:

In [3]: results.plot_simultaneous('red')
Out[3]: <matplotlib.figure.Figure at 0x1105b2b90>

I don't have much experience with Python/pandas, but I usually do get to see the plots if I insist long enough.

I've also tried with the example in the documentation (link above):

In [3]:

cylinders = np.array([8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 8, 8, 8, 8, 8, 8, 8, 8, 8, 4, 6, 6, 6, 4, 4, 
                    4, 4, 4, 4, 6, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 6, 6, 6, 6, 4, 4, 4, 4, 6, 6, 
                    6, 6, 4, 4, 4, 4, 4, 8, 4, 6, 6, 8, 8, 8, 8, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 
                    4, 4, 4, 4, 4, 4, 4, 6, 6, 4, 6, 4, 4, 4, 4, 4, 4, 4, 4])
cyl_labels = np.array(['USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'France', 
    'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Japan', 'USA', 'USA', 'USA', 'Japan', 
    'Germany', 'France', 'Germany', 'Sweden', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'Germany', 
    'USA', 'USA', 'France', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'Germany', 
    'Japan', 'USA', 'USA', 'USA', 'USA', 'Germany', 'Japan', 'Japan', 'USA', 'Sweden', 'USA', 'France', 
    'Japan', 'Germany', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 'USA', 
    'Germany', 'Japan', 'Japan', 'USA', 'USA', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'Japan', 'USA', 
    'USA', 'USA', 'USA', 'Japan', 'USA', 'USA', 'USA', 'Germany', 'USA', 'USA', 'USA'])
from statsmodels.stats.multicomp import MultiComparison
cardata = MultiComparison(cylinders, cyl_labels)
results = cardata.tukeyhsd()
results.plot_simultaneous()
Out[3]:
<matplotlib.figure.Figure at 0x10b5bb610>

Same results.

lrnzcig asked Nov 10 '22 16:11


1 Answer

You probably need to tell matplotlib which backend to use.

In ipython notebook try adding a line with %matplotlib inline before importing matplotlib.

In a module that you run outside of notebook try adding:

import matplotlib
matplotlib.use('qtagg')

See here for more information on the backends.
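
As a quick way to check that the plot itself is fine, independent of any display issues, here is a minimal sketch that selects the non-interactive Agg backend and writes the figure to a file. The data values, group labels and output file name are made up for illustration; they are not from the question. It relies on plot_simultaneous returning a matplotlib Figure, which matches the Out[3] lines shown in the question.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders to files, needs no display
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.stats.multicomp import MultiComparison

# Hypothetical data: three groups of five observations each
values = np.array([5.1, 4.9, 5.3, 5.0, 5.2,
                   6.0, 6.2, 5.9, 6.1, 6.3,
                   5.0, 5.2, 4.8, 5.1, 4.9])
labels = np.repeat(["a", "b", "c"], 5)

results = MultiComparison(values, labels).tukeyhsd(alpha=0.01)

# plot_simultaneous builds and returns the Figure; it does not display it
fig = results.plot_simultaneous()

# Hypothetical output location in the system temp directory
out_path = os.path.join(tempfile.gettempdir(), "tukey_hsd.png")
fig.savefig(out_path)
plt.close(fig)
```

If you want a window instead of a file, pick an interactive backend as above and call plt.show() after plot_simultaneous(); in a plain script the figure window normally will not appear without that call.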

Primer answered Nov 14 '22 21:11