Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Rewriting Mixed effects model formula from R (lme4) to Julia

I would like to get the same results in Julia as using lmer function from lme4 library in R. Please find below and example on a build-in mtcars dataset using R

library(lme4)
data<-mtcars    
data$vs<-as.factor(data$vs)
data$am<-as.factor(data$am)
data$gear<-as.factor(data$gear)
str(data)
model <- lmer(mpg ~ cyl:gear + hp:am + (1|gear:am), data = data)

I have found the lmm() function from MixedModels package for Julia which should be capable of running the same results, however I don't know how to rewrite the formula from the first argument of lmer() function using lmm(). Especially the interaction (:) operator.

I would be grateful for an answer with short example.

like image 498
JoannaW Avatar asked Nov 15 '15 18:11

JoannaW


2 Answers

The correspondence between R and Julia models here does not seem to be exact. There are also issues of different numerical algorithms. But my attempt to recreate the same model using MixedModels is as follows:

using RDatasets
using MixedModels

mtcars = dataset("datasets","mtcars")
mtcars[:AM] = PooledDataArray(mtcars[:AM])
mtcars[:Gear] = PooledDataArray(mtcars[:Gear])
mtcars[:GearAM] = PooledDataArray(collect(zip(mtcars[:Gear],mtcars[:AM])))
m = fit!(lmm(MPG ~ 1 + Gear + AM + Cyl + HP + (1|GearAM),mtcars))

Creating the mixed effect column manually is klunky - perhaps there is a better way. Note the differences in naming the coefficients between R and Julia. There are 6 fixed-effect coefficients in both.

The solution in Julia seems different (on my machine) but achieves a better log-likelihood. The random-effect is expected to be weak, as its variables are already present in the fixed-effects (it only accounts for dependence between Gear and AM), and there are only 32 data points.

Hope this helps and if you come up with a better understanding, it would be nice to add it in another answer or comment.

like image 124
Dan Getz Avatar answered Oct 07 '22 11:10

Dan Getz


Tl;DR

If you use REML=true and @formula(MPG ~ 1 + Cyl & Gear + HP & AM + (1 | Gear & AM) you can reproduce exactly.

Longer answer

Support for (1 | Gear & AM) requires version 3 or higher of MixedModels.jl which as of now (August 2020) is unreleased, so to reproduce this you'll need to install it with pkg"add MixedModels#master". This was run on Julia 1.5.0:

using RDatasets, MixedModels
mtcars = RDatasets.dataset("datasets","mtcars")
categorical!(mtcars, [:AM, :Gear])
m = fit(MixedModel, 
        @formula(MPG ~ Cyl & Gear + HP & AM + (1 | Gear & AM)), 
        mtcars, 
        REML=true)

Produces this output:

Linear mixed model fit by REML
 MPG ~ 1 + Cyl & Gear + HP & AM + (1 | Gear & AM)
 REML criterion at convergence: 168.11435769585046

Variance components:
             Column    Variance  Std.Dev. 
Gear & AM (Intercept)  16.913543 4.1126078
Residual                7.988539 2.8264004
 Number of obs: 32; levels of grouping factors: 4

  Fixed-effects parameters:
──────────────────────────────────────────────────────
                    Coef.  Std. Error      z  Pr(>|z|)
──────────────────────────────────────────────────────
(Intercept)    34.3507      3.50385     9.80    <1e-21
Cyl & Gear: 3  -1.08837     0.781649   -1.39    0.1638
Cyl & Gear: 4  -1.56833     0.875625   -1.79    0.0733
Cyl & Gear: 5  -0.167413    1.42839    -0.12    0.9067
HP & AM: 0     -0.0412826   0.0215381  -1.92    0.0553
HP & AM: 1     -0.0613247   0.0298067  -2.06    0.0396
──────────────────────────────────────────────────────

Compared with the R code you posted:

Linear mixed model fit by REML ['lmerMod']
Formula: mpg ~ cyl:gear + hp:am + (1 | gear:am)
   Data: data

REML criterion at convergence: 168.1

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.42280 -0.54922 -0.07898  0.54892  2.09217 

Random effects:
 Groups   Name        Variance Std.Dev.
 gear:am  (Intercept) 16.914   4.113   
 Residual              7.989   2.826   
Number of obs: 32, groups:  gear:am, 4

Fixed effects:
            Estimate Std. Error t value
(Intercept) 34.35066    3.50385   9.804
cyl:gear3   -1.08837    0.78165  -1.392
cyl:gear4   -1.56833    0.87562  -1.791
cyl:gear5   -0.16741    1.42839  -0.117
hp:am0      -0.04128    0.02154  -1.917
hp:am1      -0.06132    0.02981  -2.057

Without modifying data

You can also control which variables are treated as categorical by providing contrasts for them:

m2 = fit(MixedModel,
         @formula(MPG ~ Cyl & Gear + HP & AM + (1 | Gear & AM)),
         mtcars,
         REML=true,
         contrasts = Dict(:AM => DummyCoding(), :Gear => DummyCoding()))

Explanation

There are two main differences in specifying the regression formula between R and Julia (with StatsModels.jl).

  1. In Julia, you use & to create an interaction between terms, but in R you use :.
  2. In Julia, you have to explicitly say that you're defining a formula with the @formula macro. In R, anything and everything is potentially a macro via the magic of "nonstandard evaluation".

So, the equivalent Julia formula for your model is

@formula(mpg ~ cyl&gear + hp&am + (1|gear&am))

For using categorical! to convert columns in a DataFrame to categorical, see the DataFrames.jl docs

like image 2
Dave Kleinschmidt Avatar answered Oct 07 '22 11:10

Dave Kleinschmidt