Please note: I am trying to get the code to work with both time & individual fixed effects, and an unbalanced dataset. The sample code below works with a balanced dataset.
See edit below too, please
I am trying to manually calculate the fitted values of a fixed effects model (with both individual and time effects) using the plm
package. This is more of an exercise to confirm I understand the mechanics of the model and the package, I know I can get the fitted values themselves from the plm
object, from the two related questions (here and here).
From the plm
vignette (p.2), the underlying model is:
y_it = alpha + beta_transposed * x_it + (mu_i + lambda_t + epsilon_it)
where mu_i is the individual component of the error term (a.k.a. "individual effect"), and lambda_t is the "time effect".
The fixed effects can be extracted by using fixef()
and I thought I could use them (together with the independent variables) to calculate the fitted values for the model, using (with two independent variables) in this way:
fit_it = alpha + beta_1 * x1 + beta_2 * x2 + mu_i + lambda_t
This is where I fail -- the values I get are nowhere near the fitted values (which I get as the difference between the actual values and the residuals in the model object). For one, I do not see alpha
anywhere. I tried playing with the fixed effects being shown as differences from the first, from the mean, etc., with no success.
What I am missing? It could well be a misunderstanding of the model, or an error in the code, I am afraid... Thanks in advance.
PS: One of the related questions hints that pmodel.response()
should be related to my issue (and the reason there is no plm.fit
function), but its help page does not help me understand what this function actually does, and I cannot find any examples how to interpret the result it produces.
Thanks!
Sample code of what I did:
library(data.table); library(plm)
set.seed(100)
DT <- data.table(CJ(id=c("a","b","c","d"), time=c(1:10)))
DT[, x1:=rnorm(40)]
DT[, x2:=rnorm(40)]
DT[, y:=x1 + 2*x2 + rnorm(40)/10]
DT <- DT[!(id=="a" & time==4)] # just to make it an unbalanced panel
setkey(DT, id, time)
summary(plmFEit <- plm(data=DT, id=c("id","time"), formula=y ~ x1 + x2, model="within", effect="twoways"))
# Extract the fitted values from the plm object
FV <- data.table(plmFEit$model, residuals=as.numeric(plmFEit$residuals))
FV[, y := as.numeric(y)]
FV[, x1 := as.numeric(x1)]
FV[, x2 := as.numeric(x2)]
DT <- merge(x=DT, y=FV, by=c("y","x1","x2"), all=TRUE)
DT[, fitted.plm := as.numeric(y) - as.numeric(residuals)]
FEI <- data.table(as.matrix(fixef(object=plmFEit, effect="individual", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FEI, c("id","fei"))
setkey(FEI, id)
setkey(DT, id)
DT <- DT[FEI] # merge the fei into the data, each id gets a single number for every row
FET <- data.table(as.matrix(fixef(object=plmFEit, effect="time", type="level")), keep.rownames=TRUE) # as.matrix needed to preserve the names?
setnames(FET, c("time","fet"))
FET[, time := as.integer(time)] # fixef returns time as character
setkey(FET, time)
setkey(DT, time)
DT <- DT[FET] # merge the fet into the data, each time gets a single number for every row
# calculate the fitted values (called calc to distinguish from those from plm)
DT[, fitted.calc := as.numeric(coef(plmFEit)[1] * x1 + coef(plmFEit)[2]*x2 + fei + fet)]
DT[, diff := as.numeric(fitted.plm - fitted.calc)]
all.equal(DT$fitted.plm, DT$fitted.calc)
My session is as follows:
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] plm_1.4-0 Formula_1.2-1 RJSONIO_1.3-0 jsonlite_0.9.17 readxl_0.1.0.9000 data.table_1.9.7 bit64_0.9-5 bit_1.1-12 RevoUtilsMath_3.2.2
loaded via a namespace (and not attached):
[1] bdsmatrix_1.3-2 Rcpp_0.12.1 lattice_0.20-33 zoo_1.7-12 MASS_7.3-44 grid_3.2.2 chron_2.3-47 nlme_3.1-122 curl_0.9.3 rstudioapi_0.3.1 sandwich_2.3-4
[12] tools_3.2.2
Edit: (2015-02-22) Since this has attracted some interest, I will try to clarify further. I was trying to fit a "fixed effects" model (a.k.a. "within" or "least squares dummy variables", as the plm package vignette calls it on p.3, top paragraph) -- same slope(s), different intercepts.
This is the same as running an ordinary OLS regression after adding dummies for time
and id
. Using the code below I can duplicate the fitted values from the plm
package using base lm()
. With the dummies, it is explicit that the first elements of both id and time are the group to compare to. What I still cannot do is how to use the facilities of the plm
package to do the same I can easily accomplish using lm()
.
# fit the same with lm() and match the fitted values to those from plm()
lmF <- lm(data = DT, formula = y ~ x1 + x2 + factor(time) + factor(id))
time.lm <- coef(lmF)[grep(x = names(coef(lmF)), pattern = "time", fixed = TRUE)]
time.lm <- c(0, unname(time.lm)) # no need for names, the position index corresponds to time
id.lm <- coef(lmF)[grep(x = names(coef(lmF)), pattern = "id", fixed = TRUE)]
id.lm <- c(0, unname(id.lm))
names(id.lm) <- c("a","b","c","d") # set names so that individual values can be looked up below when generating the fit
DT[, by=list(id, time), fitted.lm := coef(lmF)[["(Intercept)"]] + coef(lmF)[["x1"]] * x1 + coef(lmF)[["x2"]] * x2 + time.lm[[time]] + id.lm[[id]]]
all.equal(DT$fitted.plm, DT$fitted.lm)
Hope this is useful to others who might be interested. The issue might be something about how plm
and fixef
deal with the missing value I intentionally created. I tried playing with the type=
parameter of fixef
but to no effect.
Description. plm is a package for R which intends to make the estimation of linear panel models straightforward. plm provides functions to estimate a wide variety of models and to make (robust) inference.
Product Lifecycle Management (PLM) is an integrated business approach to the collaborative creation, management and dissemination of engineering information throughout the extended enterprise.
Regardless of the sizes of T and N, a very common approach to estimating a linear model is to include both unit fixed effects and time fixed effects in ordinary least squares estimation. The resulting estimator is often called the “two-way fixed effects” (TWFE) estimator.
Time fixed effects are standardly obtained by means of time-dummy variables, which control for all time unit-specific effects. This implies controlling for T-1 time-unit dummy variables in case T time periods are observed in the data.
This works for an unbalanced data with effect="individual"
and time dummies y ~ x +factor(year)
:
fitted <- pmodel.response(plm.model)-residuals(plm.model)
I found this that can help you, since the lm() solution was not working in my case (I got different coefficients comparing to the plm package)
Therefore, it is just about applying the suggestions by the authors of the plm package here http://r.789695.n4.nabble.com/fitted-from-plm-td3003924.html
So what I did is just to apply
plm.object <- plm(y ~ lag(y, 1) + z +z2, data = mdt, model= "within", effect="twoways")
fitted <- as.numeric(plm.object$model[[1]] - plm.object$residuals)
where I need the as.numeric function since I need to use it as a vector to plug in for further manipulations. I also want to point out that if your model has a lagged dependent variable on the right hand side, the solution above with as.numeric provides a vector already NET of the missing values because of the lag. For me this is exactly what I needed to.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With