lavaan offers the ability to constrain parameters across groups. Assume I have two groups in my data, and the following model:
library(RCurl)
library(lavaan)
x <- getURL("https://gist.githubusercontent.com/aronlindberg/dfa0115f1d80b84ebd48b3ed52f9c5ac/raw/3abf0f280a948d6273a61a75415796cc103f20e7/growth_data.csv")
growth_data <- read.csv(text = x)
model_regressions <- ' i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4 + 1*t5 + 1*t6 + 1*t7 + 1*t8 + 1*t9 + 1*t10 + 1*t11 + 1*t12 + 1*t13 + 1*t14 + 1*t15 + 1*t16 + 1*t17 + 1*t18 + 1*t19 + 1*t20
s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4 + 4*t5 + 5*t6 + 6*t7 + 7*t8 + 8*t9 + 9*t10 + 10*t11 + 11*t12 + 12*t13 + 13*t14 + 14*t15 + 15*t16 + 16*t17 + 17*t18 + 18*t19 + 19*t20
# fixing error-variances
t8 ~~ 0.01*t8
t17 ~~ 0.01*t17
t18 ~~ 0.01*t18
# regressions
s ~ h_index
i ~ h_index'
fit_UNconstrained <- growth(model_regressions, data=growth_data, group = "type")
Then using the following I can constrain the intercepts across the two groups:
fit_constrained_intercepts <- growth(model_regressions, data=growth_data, group = "type", group.equal = c("intercepts"))
However, when I compare this model to the unconstrained model, the difference in degrees of freedom and in the chi-square statistic is zero (0). How is this possible?
Further, when I constrain other parameters, such as variance, e.g.:
fit_constrained_variances <- growth(model_regressions, data=growth_data, group = "type", group.equal = c("lv.variances"))
...and compare the constrained model to the unconstrained model, the difference in degrees of freedom is 2, not 1 as I would expect from constraining a single parameter:
fitMeasures(fit_UNconstrained, "df")
fitMeasures(fit_constrained_intercepts, "df")
fitMeasures(fit_constrained_variances, "df")
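For reference, the model comparisons were done along these lines (lavTestLRT() is lavaan's likelihood-ratio / chi-square difference test; anova() on the two fits would work equally well):

```r
# Chi-square difference tests between the nested fits from above
lavTestLRT(fit_UNconstrained, fit_constrained_intercepts)
lavTestLRT(fit_UNconstrained, fit_constrained_variances)
```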
Hence, my question: how does constraining the various parameters (especially intercepts and variances) affect the degrees of freedom in lavaan?
This is caused by the fact that you are modelling growth curves: when you use the growth() function in lavaan, all of the observed-variable intercepts are automatically fixed to zero! This is why you are getting identical output when you compare the "unconstrained" model to the one where you've constrained the intercepts - the two models actually are identical.
To explore this a bit further, try using sem() instead of growth() to run your model fits. We are going to use sem() simply to get a better look at how the degrees of freedom change, as it does not automatically enforce any constraints on its own. Let's take a look at the degrees of freedom again:
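A sketch of those refits (this assumes model_regressions and growth_data from the question are still in scope; meanstructure = TRUE is set explicitly so the intercepts are actually part of the sem() model):

```r
library(lavaan)

# No automatic growth-model constraints here: sem() leaves the
# observed-variable intercepts free in each group.
fit_UNconstrained <- sem(model_regressions, data = growth_data,
                         group = "type", meanstructure = TRUE)

# Same model, but with the observed intercepts equated across groups.
fit_constrained_intercepts <- sem(model_regressions, data = growth_data,
                                  group = "type", meanstructure = TRUE,
                                  group.equal = c("intercepts"))
```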
> fitMeasures(fit_UNconstrained, "df")
df
416
> fitMeasures(fit_constrained_intercepts, "df")
df
434
Note that we gain 18 degrees of freedom by constraining the intercepts. I'll break this down as follows:
Your model has 20 observed variables (t1:t20), so we might expect to gain 20 degrees of freedom by equating the intercept of each observed variable across the two groups. However, whenever the observed intercepts are constrained to be equal across groups, lavaan automatically frees the means of the latent variables (here, i and s) in the second group. Constraining 20 intercepts removes 20 free parameters, while freeing 2 latent means adds 2 back, resulting in a net gain of 18 degrees of freedom.
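The arithmetic, spelled out (a sanity check, not lavaan output):

```r
# 20 observed-variable intercepts are equated across the two groups,
# while 2 parameters (one per latent variable, i and s) remain free.
df_gain <- 20 - 2
df_gain  # 18, matching the 434 - 416 from the fitMeasures() output above
```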
In your question you mentioned that:
"...the difference in degrees of freedom is 2, not 1 as I would expect from constraining a single parameter..."
Unfortunately, this isn't quite right. In SEM models, the degrees of freedom do not depend on the number of "types" of parameters that we are constraining, but rather they depend on the total number of "free parameters" in your model.
When you use lv.variances, you are constraining the variances of the latent variables across groups. As mentioned above, you have two latent variables, i and s, so you are constraining one parameter for each, resulting in you gaining two degrees of freedom.
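If you want to see the constraint directly: equality-constrained parameters share a label in lavaan's parameter table. A quick way to check (assuming fit_constrained_variances from the question is in scope):

```r
pt <- parTable(fit_constrained_variances)
# Latent-variable variances: the same label appearing in both groups
# means those parameters are constrained to be equal.
subset(pt, op == "~~" & lhs == rhs & lhs %in% c("i", "s"),
       select = c(lhs, op, rhs, group, label))
```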
Let's fit a small SEM, and then manually calculate the degrees of freedom. Since you're modelling growth curves, we'll use a simplified version of your own growth model. We're going to use three time points instead of twenty.
model_regressions <- ' i =~ 1*t1 + 1*t2 + 1*t3
s =~ 0*t1 + 1*t2 + 2*t3'
fit_UNconstrained <- growth(model_regressions, data=growth_data, group = "type")
summary(fit_UNconstrained) # note the use of "summary()" here
We can calculate the degrees of freedom directly using this formula:
Degrees of Freedom = (the number of unique observations) - (the number of free parameters)
1. Let's calculate the number of unique observations first:
For your growth models, the formula for the number of unique observations in each group is k(k+1)/2 + k, where k is the number of observed variables you have. This comes from the fact that you have k(k+1)/2 covariances for your observed variables, and k observed means. In this case, you have 3 observed variables, so you have 3(3+1)/2 + 3 = 9 unique observations in each group. You also have two groups, so we actually have (9 * 2) = 18 observations in total.
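That count can be sanity-checked in a couple of lines of R:

```r
k <- 3                            # observed variables per group
per_group <- k * (k + 1) / 2 + k  # covariances + observed means = 9
n_groups <- 2
per_group * n_groups              # 18 unique observations in total
```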
2. Now onto the free parameters. For each group, we are fitting:
- 3 residual variances (t1 ~~ t1, t2 ~~ t2, t3 ~~ t3)
- 2 latent variable variances (i ~~ i, s ~~ s)
- 1 latent covariance (i ~~ s)
- 2 latent means (the means of i and s, which growth() estimates)
This gives us 8 free parameters per group, but again, you have two groups, so (8 * 2) gives us 16 free parameters in total.
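And the same kind of tally for the free parameters (the parameter roles below are those of a standard linear growth model):

```r
resid_vars <- 3  # t1~~t1, t2~~t2, t3~~t3
lv_vars    <- 2  # i~~i, s~~s
lv_cov     <- 1  # i~~s
lv_means   <- 2  # means of i and s
per_group  <- resid_vars + lv_vars + lv_cov + lv_means  # 8
per_group * 2                                           # 16 in total
```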
Using the formula stated above, 18 - 16 = 2 degrees of freedom. Let's see if lavaan agrees:
> fit_UNconstrained
lavaan 0.6-3 ended normally after 64 iterations
Optimization method NLMINB
Number of free parameters 16
Number of observations per group
Exploration 87
Exploitation 125
Estimator ML
Model Fit Test Statistic 62.079
Degrees of freedom 2
P-value (Chi-square) 0.000
Voila! I hope this makes things clearer for you. Please keep in mind that if you choose to fix your regressions using s ~ h_index etc., this will also change your degrees of freedom. In general, you should use summary() to see how many free parameters you are estimating, and you can use inspect(..., "sampstat") to look at the sample statistics, i.e. how many unique observations you have.
I suggest playing around with some simpler SEM structures to get a better idea for how they work. Good luck, and happy modelling!
I think the issue might come from how degrees of freedom are determined. A regression model's degrees of freedom are one less than the number of coefficients (otherwise known as "regressors"), not the number of parameters. When you constrain your intercept, you are not altering the number of coefficients/regressors in the model.