Lag with missing data

I have a dataset of state-level approval ratings. I need to lag one of the variables by two years.

The data is annual and spans 1970 to 2008. Obviously, if I lag the data I will lose some observations (i.e., 1970 won't be able to find the 1968 data). I'm fine with losing those observations.

However, when I run the lag with diff, I get the following error saying the replacement does not match the data:

> df$lagvar <- diff(df$var, lag=2)
Error in `$<-.data.frame`(`*tmp*`, "lagvar", value = c(-0.4262501,  : 
replacement has 230 rows, data has 232

I've searched around, but cannot find a solution. Any ideas on how to get around this?

user2340913 asked May 01 '13 21:05


1 Answer

diff does not pad with leading NA by default. You have to add those yourself.

df$lagvar <- c(NA, NA, diff(df$var, lag=2))
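The mismatch in the error comes from the length of the result: with lag=2, diff returns two fewer values than it was given (230 instead of 232 here). A minimal sketch with a made-up vector, where x is purely illustrative and not the asker's data:

```r
# Toy vector standing in for df$var (hypothetical values)
x <- c(10, 12, 15, 11, 18, 20)

# diff with lag = 2 computes x[i] - x[i - 2] for i = 3..length(x)
d <- diff(x, lag = 2)
length(x)   # 6
length(d)   # 4, two values shorter than the input

# Padding with two leading NAs restores the original length,
# so the result can be assigned as a data frame column
padded <- c(NA, NA, d)
length(padded) == length(x)
```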

You could write a simple wrapper function to do it for you. Something like this, perhaps:

mydiff <- function(x, ...) {
  d <- diff(x, ...)
  c(rep(NA, NROW(x)-NROW(d)), d)
}
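
A usage sketch for the wrapper on a toy data frame (the values are made up). Note that diff returns the change over two rows, x[i] - x[i-2], rather than the value from two rows earlier. If the earlier value itself is what is wanted, something like the hypothetical lag_k helper below (an illustration, not a base R function) would do that instead:

```r
# Wrapper from the answer: pad diff() output with leading NAs
mydiff <- function(x, ...) {
  d <- diff(x, ...)
  c(rep(NA, NROW(x) - NROW(d)), d)
}

# Hypothetical helper: shift x down by k positions, padding with NA
lag_k <- function(x, k = 2) {
  stopifnot(k >= 0, k <= length(x))
  c(rep(NA, k), head(x, length(x) - k))
}

# Toy data standing in for the asker's df (hypothetical values)
df <- data.frame(year = 1970:1975, var = c(10, 12, 15, 11, 18, 20))

df$diffvar <- mydiff(df$var, lag = 2)  # two-year change
df$lagvar  <- lag_k(df$var, k = 2)     # value from two years earlier
print(df)
```

If the rows stack several states on top of each other, either function would presumably need to be applied per state so that values do not cross state boundaries, for example with ave(df$var, df$state, FUN = function(v) mydiff(v, lag = 2)), where state is an assumed column name.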
Joshua Ulrich answered Nov 15 '22 00:11
