I have a dataset of state-level approval ratings, and I need to lag one of the variables by two years.
The data are annual and span 1970 to 2008. Obviously, if I lag the data I will lose some observations (e.g., 1970 has no 1968 value to draw on). I'm fine with losing those observations.
However, when I run the lag with diff, I get the following error saying that the replacement does not match the data:
> df$lagvar <- diff(df$var, lag=2)
Error in `$<-.data.frame`(`*tmp*`, "lagvar", value = c(-0.4262501, :
replacement has 230 rows, data has 232
I've searched around, but cannot find a solution. Any ideas on how to get around this?
diff does not pad with leading NA by default. You have to add those yourself:
df$lagvar <- c(NA, NA, diff(df$var, lag=2))
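To see where the 230-versus-232 mismatch comes from, here is a minimal sketch on a toy vector. The values are made up and merely stand in for df$var. It also shows that diff returns the change relative to two positions earlier, not the earlier value itself.

```r
# Toy vector standing in for df$var (values invented for illustration)
x <- c(10, 12, 15, 11, 18, 20)

# diff() with lag = 2 computes x[t] - x[t - 2], so the result
# has two fewer elements than the input
d <- diff(x, lag = 2)
length(x)
length(d)

# Padding the front with two NAs restores the original length,
# which is what lets the result be assigned as a data frame column
padded <- c(NA, NA, d)
length(padded) == length(x)
```

The same arithmetic applies to the real data: 232 rows in, 230 differences out, hence the error.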
You could write a simple wrapper function to do it for you. Something like this, perhaps:
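mydiff <- function(x, ...) {
  # diff() returns fewer elements than x
  d <- diff(x, ...)
  # pad the front with NA so the result has as many elements as x
  c(rep(NA, NROW(x) - NROW(d)), d)
}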
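A short usage sketch of the wrapper follows. The data frame here is invented, and shift_by is a hypothetical helper that is not part of the original answer; it is included only because the question asks for a lag, and a plain shift may be what is wanted if the value from two years earlier is needed rather than the change since then.

```r
# Wrapper from the answer, repeated so this snippet runs on its own
mydiff <- function(x, ...) {
  d <- diff(x, ...)
  c(rep(NA, NROW(x) - NROW(d)), d)
}

# Hypothetical helper (an assumption, not from the answer):
# shift x back by k positions, assuming 0 < k < length(x)
shift_by <- function(x, k) {
  c(rep(NA, k), head(x, -k))
}

# Made-up data for a single state; the real df has more rows and columns
df <- data.frame(year = 1970:1975, var = c(10, 12, 15, 11, 18, 20))

df$diffvar <- mydiff(df$var, lag = 2)  # change relative to two years earlier
df$lagvar  <- shift_by(df$var, 2)      # the value from two years earlier

print(df)
```

Because the data are state-level, the rows for different states probably sit in the same column, in which case either function would need to be applied within each state so that one state's values do not leak into the next. Assuming a grouping column named state (an assumption about the data), something along the lines of ave(df$var, df$state, FUN = function(v) shift_by(v, 2)) would be one way to do that in base R.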