
Random Effects in Longitudinal Multilevel Imputation Models Using MICE

Tags:

r

mixed-models

r-mice

I am trying to impute data in a dataset with a longitudinal design. There are two predictors (experimental group and time) and one outcome variable (score). The clustering variable is id.

Here is the toy data

set.seed(345)
A0 <- rnorm(4,2,.5)
B0 <- rnorm(4,2+3,.5)
A1 <- rnorm(4,6,.5)
B1 <- rnorm(4,6+2,.5)
A2 <- rnorm(4,10,.5)
B2 <- rnorm(4,10+1,.5)
A3 <- rnorm(4,14,.5)
B3 <- rnorm(4,14+0,.5)
score <- c(A0,B0,A1,B1,A2,B2,A3,B3)
id <- rep(1:8,times = 4, length = 32)
time <- rep(0:3, each = 8, length = 32)
group <- rep(c("A","B"), times =2, each = 4, length = 32)
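# store group as a factor; since R 4.0 character columns are no longer converted automatically
df <- data.frame(id = id, group = group, time = time, score = score,
                 stringsAsFactors = TRUE)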

# plots (group means over time)
library(ggplot2)
(ggplot(df, aes(x = time, y = score, group = group)) +
    # 'fun' replaces the deprecated 'fun.y' argument (ggplot2 >= 3.3.0)
    stat_summary(fun = "mean", geom = "line", aes(linetype = group)) +
    stat_summary(fun = "mean", geom = "point", aes(shape = group), size = 3) +
    coord_cartesian(ylim = c(0,18)))

# now place some NAs
df[sample(1:nrow(df), 10, replace = FALSE),"score"] <- NA

df

If I understand this post correctly, in the predictor matrix I should specify the id clustering variable with a -2 and the two fixed predictors time and group with a 1. Like so

library(mice)

(ini <- mice(df, maxit=0))
(pred <- ini$predictorMatrix)
(pred["score",] <- c(-2, 1, 1, 0))
(imp <- mice(df, 
            method = c("", "", "", "2l.pan"),
            pred = pred, 
            maxit = 1, 
            seed = 71152))

What I would like to know is:

1. Is this a longitudinal random intercepts imputation model? Specifying the id variable as -2 designates it as a 'class' variable, but in this mice primer it suggests that for multilevel models you should create a variable of all 1's in the dataframe as a constant, which is then specified as the random intercept via 2 in the predictor matrix. However, this is based on the 2l.norm function rather than the 2l.pan function, so I am not really sure where I am here. Does the 2l.pan function not require this column, or the specification of random effects?
2. Is there any way to specify a longitudinal random-slopes model, and, if so, how?


asked Dec 23 '17 06:12

llewmills

2 Answers

This answer is probably a bit late for you, but it may be able to help some people who read this in the future:

How to work with `2l.pan`

Below are some details about specifying multilevel imputation models with mice. Because the application is longitudinal, I use the term "persons" to refer to units at Level 2. These are the most relevant arguments for 2l.pan as mentioned in the mice documentation:

type

Vector of length ncol(x) identifying random and class variables. Random effects are identified by a 2. The group variable (only one is allowed) is coded as -2. Random effects also include the fixed effect. If for a covariates X1 group means shall be calculated and included as further fixed effects choose 3. In addition to the effects in 3, specification 4 also includes random effects of X1.

There are 5 different codes you can use in the predictor matrix for variables imputed with 2l.pan. The person identifier is coded as -2 (this is different from 2l.norm). To include predictor variables with fixed or random effects, these variables are coded with 1 or 2, respectively. If coded as 2, the corresponding fixed effect is automatically included.

In addition, 2l.pan offers the codes 3 and 4, which have similar meanings as 1 and 2 but will include an additional fixed effect for the person mean of that variable. This is useful if you're trying to model within- and between-person effects of time-varying predictor variables.
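
As a rough illustration of these codes, the sketch below builds a predictor-matrix row by hand in base R. The variable `stress` is a hypothetical time-varying covariate that is not part of the toy data; the row is only meant to show where each code goes, not to recommend a particular model.

```r
# Hypothetical variable set: 'stress' is an assumed time-varying covariate.
vars <- c("id", "group", "time", "stress", "score")

# Start from an all-zero predictor matrix (rows = targets, columns = predictors).
pred <- matrix(0, nrow = length(vars), ncol = length(vars),
               dimnames = list(vars, vars))

# Row for the variable that would be imputed with 2l.pan:
pred["score", "id"]     <- -2  # person identifier (class variable)
pred["score", "group"]  <-  1  # fixed effect only
pred["score", "time"]   <-  2  # fixed effect plus random slope
pred["score", "stress"] <-  3  # fixed effect plus fixed effect of the person mean

pred["score", ]
```

In a real analysis, the matrix would normally be taken from `mice(df, maxit = 0)$predictorMatrix` (as in the question) and only the relevant row overwritten.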

intercept

Logical determining whether the intercept is automatically added.

By default, 2l.pan includes the intercept as both a fixed and a random effect. For this reason, it is not required to include a constant term in the predictor matrix. If one sets intercept=FALSE, this behavior is changed, and the intercept is dropped from the imputation model.

groupcenter.slope

If TRUE, in case of group means (type is 3 or 4) group mean centering for these predictors are conducted before doing imputations. Default is FALSE.

Using this option, it is possible to center predictor variables around the person mean instead of including the predictor variable "as is" (i.e., without centering). This only applies to variables coded as 3 or 4. For predictors coded as 3, this is not very important because the models with and without centering are equivalent: they differ only in how the fixed effects are parameterized.

However, when predictor variables are coded as 4 (i.e., with a random slope), then centering alters the meaning of the random effect so that the random slope no longer applies to the variable "as is" but to the within-person deviation of that variable.
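
The claim that centering does not matter for code 3 can be checked outside of mice with an ordinary regression. The following base R sketch uses simulated data (all variable names and values are made up for illustration) and ignores random effects, so it only demonstrates the reparameterization argument, not what 2l.pan does internally.

```r
set.seed(1)
n_id <- 20
id <- rep(seq_len(n_id), each = 4)
x  <- rnorm(length(id)) + rep(rnorm(n_id), each = 4)  # time-varying predictor
y  <- 1 + 0.5 * x + rnorm(length(id))

x_mean   <- ave(x, id)   # person mean of x
x_within <- x - x_mean   # within-person deviation of x

fit_raw      <- lm(y ~ x + x_mean)         # predictor "as is" plus person mean
fit_centered <- lm(y ~ x_within + x_mean)  # centered predictor plus person mean

# Compare the fitted values of the two specifications.
all.equal(fitted(fit_raw), fitted(fit_centered))

# Only the coefficient of the person mean differs between the two fits:
# in the centered fit it equals the sum of the two coefficients of the raw fit.
coef(fit_raw)
coef(fit_centered)
```

Once a random slope is attached to the predictor (code 4), this equivalence no longer holds, because the random effect then refers to a different variable in each specification.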

In your example, you can include a simple random slope for time as follows:

library(mice)
ini <- mice(df, maxit=0)

# predictor matrix (following 'type')
pred <- ini$predictorMatrix
pred["score",] <- c(-2, 1, 2, 0)

# imputation method
meth <- c("", "", "", "2l.pan")

imp <- mice(df, method=meth, pred=pred, maxit=10, m=10)
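
Written out, the predictor row above corresponds roughly to the following two-level model for person i at occasion j. This is a sketch in generic mixed-model notation (the symbols are not taken from the mice or pan documentation), assuming normally distributed random effects and residuals:

```latex
\begin{aligned}
\text{score}_{ij} &= \beta_0 + \beta_1\,\text{group}_i + \beta_2\,\text{time}_{ij}
                     + u_{0i} + u_{1i}\,\text{time}_{ij} + e_{ij}, \\
(u_{0i},\, u_{1i})^\top &\sim N(\mathbf{0}, \boldsymbol{\Psi}), \qquad
e_{ij} \sim N(0, \sigma^2).
\end{aligned}
```

Here u_{0i} is the random intercept that 2l.pan adds by default, and u_{1i} is the random slope requested by coding time as 2.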

In this example, coding time as 3 or 4 wouldn't make a lot of sense because the person means of time are identical for all persons. However, if you have time-varying covariates that you want to include as predictor variables in the imputation model, 3 and 4 can be useful.
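
That the person means of time carry no information here can be verified directly from the design of the toy data. A small base R check, recreating only the `id` and `time` vectors from the question:

```r
id   <- rep(1:8, times = 4)
time <- rep(0:3, each = 8)

# Person mean of time for each id
person_means <- tapply(time, id, mean)
person_means

# Every person was measured at the same four occasions,
# so the person means do not vary across persons.
length(unique(person_means)) == 1
```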

The additional arguments like intercept and groupcenter.slope can be specified directly in the call to mice(), for example:

imp <- mice(df, ..., groupcenter.slope=TRUE)

Regarding your Questions

So, to answer your questions as stated in the post:

1. Yes, 2l.pan provides a multilevel (or rather two-level) imputation model. The intercept is included as both a fixed and a random effect by default (can be changed with intercept=FALSE) and need not be specified in the predictor matrix (this is in contrast to 2l.norm).
2. Yes, you can specify random slopes with 2l.pan. To do that, predictors with random slopes are coded as 2 or 4 in the predictor matrix. If coded as 2, the random slope is included. If coded as 4, the random slope is included as well as an additional fixed effect for the person means of that variable. If coded as 4, the meaning of the random slope may be altered by making use of groupcenter.slope=TRUE (see above).

This article also includes some worked examples for how to work with 2l.pan and other functions for multilevel imputation: [Link]

answered Sep 19 '22 10:09

SimonG

The pan library doesn't require an intercept term.

You can dig into the function using

library(pan)
?pan

That said, mice uses a wrapper around pan called mice.impute.2l.pan. With the mice library loaded, you can look at the help for that function. It has a parameter called intercept, which is "[a] Logical [and] determin[es] whether the intercept is automatically added." It is TRUE by default, and the intercept is then treated as a random effect. I found this out by browsing the R code of the mice wrapper:

if (intercept) {
  x <- cbind(1, as.matrix(x))
  type <- c(2, type)
}

Here, type is the parameter of mice.impute.2l.pan (not of pan itself) that is described as a "Vector of length ncol(x) identifying random and class variables". The intercept is added by default and defined as a random effect.
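
To make the effect of that snippet concrete, here is a small base R sketch that applies the same two statements to a made-up design matrix. The matrix `x` and the vector `type` are illustrative stand-ins for what mice passes to the imputation function; they are not taken from a real mice call.

```r
# Illustrative inputs: one class variable (-2) and two fixed-effect predictors (1).
x <- cbind(id    = rep(1:2, each = 3),
           group = c(0, 0, 0, 1, 1, 1),
           time  = rep(0:2, times = 2))
type <- c(-2, 1, 1)
intercept <- TRUE

# Same logic as in the wrapper: prepend a constant column and flag it as random (2).
if (intercept) {
  x <- cbind(1, as.matrix(x))
  type <- c(2, type)
}

head(x, 3)
type
```

The constant column therefore never has to appear in the data frame or in the predictor matrix; it is created inside the imputation function.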

They do provide an example like the one you described, with a 1 for "x" in the predictor matrix for fixed effects.

It also states for 2l.norm: "The random intercept is automatically added in mice.impute.2l.norm()."

It has a few examples with descriptions. The CRAN documentation for pan might help you.

answered Sep 21 '22 10:09

Matt L.
