
Statistical tests: how do perception, actual results, and outlook interact?

What is the interaction between perception, outcome, and outlook?

I've binned them into categorical variables to [potentially] simplify things.

import pandas as pd
import numpy as np

# Dummy data: numeric perception/age/outcome plus categorical smokes/outlook
high, size = 100, 20
df = pd.DataFrame({'perception': np.random.randint(0, high, size),
                   'age': np.random.randint(0, high, size),
                   'smokes_cat': pd.Categorical(np.tile(['lots', 'little', 'not'],
                                                        size//3+1)[:size]),
                   'outcome': np.random.randint(0, high, size),
                   'outlook_cat': pd.Categorical(np.tile(['positive', 'neutral',
                                                          'negative'],
                                                          size//3+1)[:size])
                  })

# Bin age into ten-year-wide categories
df.insert(2, 'age_cat', pd.Categorical(pd.cut(df.age, range(0, high+5, size//2),
                                              right=False, labels=[
                                               "{0} - {1}".format(i, i + 9)
                                               for i in range(0, high, size//2)])))

# Bin perception and outcome into quartile tiers
def tierify(i):
    if i <= 25:
        return 'lowest'
    elif i <= 50:
        return 'low'
    elif i <= 75:
        return 'med'
    return 'high'

df.insert(1, 'perception_cat', df['perception'].map(tierify))
df.insert(6, 'outcome_cat', df['outcome'].map(tierify))

# Shuffle smokes_cat so it is not tied to row order; assigning the shuffled
# values back is more reliable than np.random.shuffle on a DataFrame column
df['smokes_cat'] = df['smokes_cat'].sample(frac=1).reset_index(drop=True)

Run online: http://ideone.com/fftuSv or https://repl.it/repls/MicroLeftSequences


This is fake data, but it should convey the idea: the individual has a perceived view (perception), is then presented with the actual result (outcome), and from that can decide their outlook.

Using Python (pandas, or anything open-source really), how do I show the probability, and the p-value, of the interaction between these 3 dependent columns (possibly using age and smokes_cat as potential confounders)?

asked Nov 07 '22 by A T

1 Answer

You can use interaction plots for this particular purpose; they fit your case pretty well, and I would use one for your data. I've tried it on the dummy data generated in the question, and you can write your code like the example below. Treat it as pseudo-code, though: you will need to tailor it to your needs.

In its simplest form:

  • If the lines in the plot intersect, or look likely to intersect for other values, then you may assume there is an interaction effect.
  • If the lines are parallel, or unlikely ever to intersect, then you may assume there is no interaction effect.

For a deeper understanding, I have also listed some links below that you can check out.

Code

... # The rest of the code in the question.

# Interaction plot
import matplotlib.pyplot as plt
from statsmodels.graphics.factorplots import interaction_plot

p = interaction_plot(
    x=df['perception'],
    trace=df['outlook_cat'],
    response=df['outcome'],
)
plt.savefig('./my_interaction_plot.png')  # or plt.show()

You can find the documentation of interaction_plot() in the statsmodels documentation. Besides the plot, I also suggest you run an ANOVA.
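For example, here is a minimal ANOVA sketch with statsmodels on the dummy df from the question. The formula is my assumption (interaction between the binned perception and outlook, with age and smokes_cat included as covariates), and with only 20 random rows the estimates will not be meaningful; adapt it to your real data.

import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# OLS model with main effects, the perception x outlook interaction,
# and age / smokes_cat included as potential confounders
model = smf.ols('outcome ~ C(perception_cat) * C(outlook_cat)'
                ' + age + C(smokes_cat)', data=df).fit()

# Type-2 ANOVA table; the "C(perception_cat):C(outlook_cat)" row holds
# the F statistic and p-value (PR(>F)) for the interaction term
print(anova_lm(model, typ=2))

Note that this treats the numeric outcome as the response; the categorical bins are only used as predictors here.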

Further reading

You can check out these links:

  • A paper titled Interaction Effects in ANOVA.
  • A case study in practice.
  • Another case study in practice.
answered Nov 15 '22 by null