Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Visualising a three way interaction between two continuous variables and one categorical variable in R

I have a model in R that includes a significant three-way interaction between two continuous independent variables IVContinuousA, IVContinuousB, IVCategorical and one categorical variable (with two levels: Control and Treatment). The dependent variable is continuous (DV).

model <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical)

You can find the data here

I am trying to find out a way to visualise this in R to ease my interpretation of it (perhaps in ggplot2?).

Somewhat inspired by this blog post I thought that I could dichotomise IVContinuousB into high and low values (so it would be a two-level factor itself:

IVContinuousBHigh <- mean(IVContinuousB) + sd (IVContinuousB) 
IVContinuousBLow <- mean(IVContinuousB) - sd (IVContinuousB)

I then planned to plot the relationship between DV and IV ContinuousA and fit lines representing the slopes of this relationship for different combinations of IVCategorical and my new dichotomised IVContinuousB:

IVCategoricalControl and IVContinuousBHigh
IVCategoricalControl and IVContinuousBLow
IVCategoricalTreatment and IVContinuousBHigh
IVCategoricalTreatment and IVContinuousBLow

My first question is does this sound like a viable solution to producing an interpretable plot of this three-way-interaction? I want to avoid 3D plots if possible as I don't find them intuitive... Or is there another way to go about it? Maybe facet plots for the different combinations above?

If it is an ok solution, my second question is how to I generate the data to predict the fit lines to represent the different combinations above?

Third question - does anyone have any advice as to how to code this up in ggplot2?

I posted a very similar question on Cross Validated but because it is more code related I thought I would try here instead (I will remove the CV post if this one is more relevant to the community :) )

Thanks so much in advance,

Sarah

Note that there are NAs (left as blanks) in the DV column and the design is unbalanced - with slightly different numbers of datapoints in the Control vs Treatment groups of the variable IVCategorical.

FYI I have the code for visaualising a two-way interaction between IVContinuousA and IVCategorical:

A<-ggplot(data=data,aes(x=AOTAverage,y=SciconC,group=MisinfoCondition,shape=MisinfoCondition,col = MisinfoCondition,))+geom_point(size = 2)+geom_smooth(method='lm',formula=y~x)

But what I want is to plot this relationship conditional on IVContinuousB....

like image 549
Sarah Avatar asked Mar 03 '18 16:03

Sarah


2 Answers

Here are a couple of options for visualizing the model output in two dimensions. I'm assuming here that the goal here is to compare Treatment to Control

library(tidyverse)
  theme_set(theme_classic() +
          theme(panel.background=element_rect(colour="grey40", fill=NA))

dat = read_excel("Some Data.xlsx")  # I downloaded your data file

mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data=dat)

# Function to create prediction grid data frame
make_pred_dat = function(data=dat, nA=20, nB=5) {
  nCat = length(unique(data$IVCategorical))
  d = with(data, 
           data.frame(IVContinuousA=rep(seq(min(IVContinuousA), max(IVContinuousA), length=nA), nB*2),
                      IVContinuousB=rep(rep(seq(min(IVContinuousB), max(IVContinuousB), length=nB), each=nA), nCat),
                      IVCategorical=rep(unique(IVCategorical), each=nA*nB)))

  d$DV = predict(mod, newdata=d)

  return(d)
}

IVContinuousA vs. DV by levels of IVContinuousB

The roles of IVContinuousA and IVContinuousB can of course be switched here.

ggplot(make_pred_dat(), aes(x=IVContinuousA, y=DV, colour=IVCategorical)) + 
  geom_line() +
  facet_grid(. ~ round(IVContinuousB,2)) +
  ggtitle("IVContinuousA vs. DV, by Level of IVContinousB") +
  labs(colour="")

enter image description here

You can make a similar plot without faceting, but it gets difficult to interpret as the number of IVContinuousB levels increases:

ggplot(make_pred_dat(nB=3), 
       aes(x=IVContinuousA, y=DV, colour=IVCategorical, linetype=factor(round(IVContinuousB,2)))) + 
  geom_line() +
  #facet_grid(. ~ round(IVContinuousB,2)) +
  ggtitle("IVContinuousA vs. DV, by Level of IVContinousB") +
  labs(colour="", linetype="IVContinuousB") +
  scale_linetype_manual(values=c("1434","11","62")) +
  guides(linetype=guide_legend(reverse=TRUE))

enter image description here

Heat map of the model-predicted difference, DV treatment - DV control on a grid of IVContinuousA and IVContinuousB values

Below, we look at the difference between treatment and control at each pair of IVContinuousA and IVContinuousB.

ggplot(make_pred_dat(nA=100, nB=100) %>% 
         group_by(IVContinuousA, IVContinuousB) %>% 
         arrange(IVCategorical) %>% 
         summarise(DV = diff(DV)), 
       aes(x=IVContinuousA, y=IVContinuousB)) + 
  geom_tile(aes(fill=DV)) +
  scale_fill_gradient2(low="red", mid="white", high="blue") +
  labs(fill=expression(Delta*DV~(Treatment - Control)))

enter image description here

like image 146
eipi10 Avatar answered Sep 22 '22 11:09

eipi10


If you really want to avoid 3-d plotting, you could indeed turn one of the continuous variables into a categorical one for visualization purposes.

For the purpose of the answer, I used the Duncan data set from the package car, as it is of the same form as the one you described.

library(car)
# the data
data("Duncan")

# the fitted model; education and income are continuous, type is categorical
lm0 <- lm(prestige ~ education * income * type, data = Duncan)

# turning education into high and low values (you can extend this to more 
# levels)
edu_high <- mean(Duncan$education)  + sd(Duncan$education)
edu_low <- mean(Duncan$education)  - sd(Duncan$education)

# the values below should be used for predictions, each combination of the 
# categories must be represented:
prediction_mat <- data.frame(income = Duncan$income, 
                         education = rep(c(edu_high, edu_low),each = 
                         nrow(Duncan)),
                         type = rep(levels(Duncan$type), each = 
                         nrow(Duncan)*2))


predicted <- predict(lm0, newdata = prediction_mat)


# rearranging the fitted values and the values used for predictions
df <- data.frame(predicted,
             income = Duncan$income,
             edu_group =rep(c("edu_high", "edu_low"),each = nrow(Duncan)),
             type = rep(levels(Duncan$type), each = nrow(Duncan)*2))


# plotting the fitted regression lines
ggplot(df, aes(x = income, y = predicted, group = type, col = type)) + 
geom_line() + 
facet_grid(. ~ edu_group)

enter image description here

like image 25
Daniel Avatar answered Sep 19 '22 11:09

Daniel