
Understanding degrees of freedom in lavaan

Tags:

r

r-lavaan

lavaan makes it possible to constrain parameters across groups. Assume my data contain two groups, and consider the following model:

library(RCurl)
library(lavaan)
x <- getURL("https://gist.githubusercontent.com/aronlindberg/dfa0115f1d80b84ebd48b3ed52f9c5ac/raw/3abf0f280a948d6273a61a75415796cc103f20e7/growth_data.csv")
growth_data <- read.csv(text = x)

model_regressions <- ' i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4 + 1*t5 + 1*t6 + 1*t7 + 1*t8 + 1*t9 + 1*t10 + 1*t11 + 1*t12 + 1*t13+ 1*t14 + 1*t15 + 1*t16 + 1*t17 + 1*t18 + 1*t19 + 1*t20
s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4 + 4*t5 + 5*t6 + 6*t7 + 7*t8 + 8*t9 + 9*t10 + 10*t11 + 11*t12 + 12*t13 + 13*t14 + 14*t15 + 15*t16 + 16*t17 + 17*t18 + 18*t19 + 19*t20

# fixing error-variances
t8 ~~ 0.01*t8
t17 ~~ 0.01*t17
t18 ~~ 0.01*t18
# regressions
s ~ h_index
i ~ h_index'

fit_UNconstrained <- growth(model_regressions, data=growth_data, group = "type")

Then using the following I can constrain the intercepts across the two groups:

fit_constrained_intercepts <- growth(model_regressions, data=growth_data, group = "type", group.equal = c("intercepts"))

However, when I compare this model to the unconstrained model, the difference in both degrees of freedom and chi-square is zero (0). How is this possible?

Further, when I constrain other parameters, such as variance, e.g.:

fit_constrained_variances <- growth(model_regressions, data=growth_data, group = "type", group.equal = c("lv.variances"))

...and compare the constrained model to the unconstrained model, the difference in degrees of freedom is 2, not 1 as I would expect from constraining a single parameter:

fitMeasures(fit_UNconstrained, "df")
fitMeasures(fit_constrained_intercepts, "df")
fitMeasures(fit_constrained_variances, "df")

Hence, my question: how does constraining the various parameters (especially intercepts and variances) affect the degrees of freedom in lavaan?

histelheim asked Oct 22 '18

2 Answers

Why does constraining the intercepts not change the degrees of freedom?

This is caused by the fact that you are modelling growth curves: when you use the growth() function in lavaan, all of the intercepts are automatically constrained to be zero! This is why you are getting an identical output when you compare the "unconstrained" model to the one where you've constrained the intercepts - the models actually are identical.

To explore this a bit further, try fitting your models with sem() instead of growth(). sem() does not automatically enforce any constraints on its own, so it gives a clearer view of how the degrees of freedom change. Let's take a look at the degrees of freedom again:
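A sketch of the refit (this reuses model_regressions and growth_data from the question; meanstructure = TRUE is passed explicitly here as a precaution so that intercepts are estimated, since sem() does not add growth-model constraints for you):

```r
# Refit the same model with sem(), which imposes no automatic
# intercept constraints, so the effect of group.equal is visible
fit_UNconstrained <- sem(model_regressions, data = growth_data,
                         group = "type", meanstructure = TRUE)
fit_constrained_intercepts <- sem(model_regressions, data = growth_data,
                                  group = "type", meanstructure = TRUE,
                                  group.equal = c("intercepts"))
fitMeasures(fit_UNconstrained, "df")
fitMeasures(fit_constrained_intercepts, "df")
```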

> fitMeasures(fit_UNconstrained, "df")
 df 
416 
> fitMeasures(fit_constrained_intercepts, "df")
 df 
434

Note that we gain 18 degrees of freedom by fixing the intercepts. I'll break this down as follows:

Your model has 20 observed variables (t1:t20), so we might expect to gain 20 degrees of freedom by constraining the intercept of each observed variable. However, the intercepts are constrained to be identical within each latent variable (in this case, you have two latent variables, i and s). Instead of fitting 20 intercepts as before, we are now fitting only 2 (one per latent variable), for a net gain of 18 degrees of freedom.

Why does constraining the variances change df by 2?

In your question you mentioned that:

"...the difference in degrees of freedom is 2, not 1 as I would expect from constraining a single parameter..."

Unfortunately, this isn't quite right. In SEM models, the degrees of freedom do not depend on the number of "types" of parameters that we are constraining, but rather they depend on the total number of "free parameters" in your model.

When you use lv.variances, you are constraining the variances of the latent variables. As mentioned above, you have two latent variables, i and s, so you are imposing one equality constraint for each of them, and you therefore gain two degrees of freedom.
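To see the 2-df gain directly, the nested models can be compared with a likelihood-ratio (chi-square difference) test via lavaan's lavTestLRT(); the fitted objects here are the ones from the question:

```r
# Chi-square difference test between the nested models;
# the "Df diff" column should show 2 (one equality constraint
# per latent variance, imposed across the two groups)
lavTestLRT(fit_UNconstrained, fit_constrained_variances)
```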

SEM Degrees of Freedom, Further Explained:

Let's fit a small SEM, and then manually calculate the degrees of freedom. Since you're modelling growth curves, we'll use a simplified version of your own growth model. We're going to use three time points instead of twenty.

model_regressions <- ' i =~ 1*t1 + 1*t2 + 1*t3
s =~ 0*t1 + 1*t2 + 2*t3'

fit_UNconstrained <- growth(model_regressions, data=growth_data, group = "type")
summary(fit_UNconstrained) # note the use of "summary()" here

We can calculate the degrees of freedom directly using this formula:

Degrees of Freedom = (the number of unique observations) - (the number of free parameters)

1. Let's calculate the number of unique observations first:

For your growth models, the formula for the number of unique observations in each group is k(k+1)/2 + k, where k is the number of observed variables you have. This comes from the fact that you have k(k+1)/2 covariances for your observed variables, and k observed means. In this case, you have 3 observed variables, so you have 3(3+1)/2 + 3 = 9 unique observations in each group. You also have two groups, so we actually have (9 * 2) = 18 observations in total.

2. Now onto the free parameters. We are fitting (for each group):

  • 2 intercepts, one for each latent variable (growth() fixes the intercepts of the observed variables themselves to zero)
  • 3 variances for the observed variables
  • 2 variances for the latent variables
  • 1 covariance between the latent variables

This gives us 8 free parameters, but again, you have two groups, so (8 * 2) gives us 16 free parameters in total.
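The two counts can be verified with a few lines of arithmetic:

```r
k <- 3         # observed variables (t1, t2, t3)
n_groups <- 2  # levels of the "type" grouping variable

# unique observations: k(k+1)/2 covariances + k means, per group
obs_per_group <- k * (k + 1) / 2 + k        # 9
total_obs <- obs_per_group * n_groups       # 18

# free parameters per group: 2 latent intercepts + 3 observed
# variances + 2 latent variances + 1 latent covariance
free_per_group <- 2 + 3 + 2 + 1             # 8
total_free <- free_per_group * n_groups     # 16

total_obs - total_free                      # 2 degrees of freedom
```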

Using the formula stated above, 18 - 16 = 2 degrees of freedom. Let's see if lavaan agrees:

> fit_UNconstrained
lavaan 0.6-3 ended normally after 64 iterations

  Optimization method                           NLMINB
  Number of free parameters                         16

  Number of observations per group         
  Exploration                                       87
  Exploitation                                     125

  Estimator                                         ML
  Model Fit Test Statistic                      62.079
  Degrees of freedom                                 2
  P-value (Chi-square)                           0.000

Voila! I hope this makes things clearer for you. Keep in mind that if you add the regressions (s ~ h_index etc.) back in, this will also change your degrees of freedom. In general, use summary() to see how many free parameters you are estimating, and inspect(..., "sampstat") to look at the sample statistics, i.e. your unique observations.

I suggest playing around with some simpler SEM structures to get a better idea for how they work. Good luck, and happy modelling!

Marcus Campbell answered Sep 30 '22

I think the issue might come from how degrees of freedom are determined. A regression model's degrees of freedom are one less than the number of coefficients (also known as "regressors"), not the number of parameters. When you constrain your intercept, you are not altering the number of coefficients/regressors in the model.

ginger_cat answered Sep 30 '22