How to create a lag variable within each group?

Tags:

I have a data.table:

set.seed(1) data <- data.table(time = c(1:3, 1:4),                    groups = c(rep(c("b", "a"), c(3, 4))),                    value = rnorm(7))  data #    groups time      value # 1:      b    1 -0.6264538 # 2:      b    2  0.1836433 # 3:      b    3 -0.8356286 # 4:      a    1  1.5952808 # 5:      a    2  0.3295078 # 6:      a    3 -0.8204684 # 7:      a    4  0.4874291

I want to compute a lagged version of the "value" column, within each level of "groups".

The result should look like

#   groups time      value  lag.value # 1      a    1  1.5952808         NA # 2      a    2  0.3295078  1.5952808 # 3      a    3 -0.8204684  0.3295078 # 4      a    4  0.4874291 -0.8204684 # 5      b    1 -0.6264538         NA # 6      b    2  0.1836433 -0.6264538 # 7      b    3 -0.8356286  0.1836433

I have tried to use lag directly:

data$lag.value <- lag(data$value)

...which clearly wouldn't work.

I have also tried:

unlist(tapply(data$value, data$groups, lag))  a1         a2         a3         a4         b1         b2         b3   NA -0.1162932  0.4420753  2.1505440         NA  0.5894583 -0.2890288

Which is almost what I want. However the vector generated is ordered differently from the ordering in the data.table which is problematic.

What is the most efficient way to do this in base R, plyr, dplyr, and data.table?

291

asked Oct 10 '14 04:10

2 Answers

You could do this within data.table

 library(data.table)  data[, lag.value:=c(NA, value[-.N]), by=groups]   data  #   time groups       value   lag.value  #1:    1      a  0.02779005          NA  #2:    2      a  0.88029938  0.02779005  #3:    3      a -1.69514201  0.88029938  #4:    1      b -1.27560288          NA  #5:    2      b -0.65976434 -1.27560288  #6:    3      b -1.37804943 -0.65976434  #7:    4      b  0.12041778 -1.37804943

For multiple columns:

nm1 <- grep("^value", colnames(data), value=TRUE) nm2 <- paste("lag", nm1, sep=".") data[, (nm2):=lapply(.SD, function(x) c(NA, x[-.N])), by=groups, .SDcols=nm1]  data #    time groups      value     value1      value2  lag.value lag.value1 #1:    1      b -0.6264538  0.7383247  1.12493092         NA         NA #2:    2      b  0.1836433  0.5757814 -0.04493361 -0.6264538  0.7383247 #3:    3      b -0.8356286 -0.3053884 -0.01619026  0.1836433  0.5757814 #4:    1      a  1.5952808  1.5117812  0.94383621         NA         NA #5:    2      a  0.3295078  0.3898432  0.82122120  1.5952808  1.5117812 #6:    3      a -0.8204684 -0.6212406  0.59390132  0.3295078  0.3898432 #7:    4      a  0.4874291 -2.2146999  0.91897737 -0.8204684 -0.6212406 #    lag.value2 #1:          NA #2:  1.12493092 #3: -0.04493361 #4:          NA #5:  0.94383621 #6:  0.82122120 #7:  0.59390132

Update

From data.table versions >= v1.9.5, we can use shift with type as lag or lead. By default, the type is lag.

data[, (nm2) :=  shift(.SD), by=groups, .SDcols=nm1] #   time groups      value     value1      value2  lag.value lag.value1 #1:    1      b -0.6264538  0.7383247  1.12493092         NA         NA #2:    2      b  0.1836433  0.5757814 -0.04493361 -0.6264538  0.7383247 #3:    3      b -0.8356286 -0.3053884 -0.01619026  0.1836433  0.5757814 #4:    1      a  1.5952808  1.5117812  0.94383621         NA         NA #5:    2      a  0.3295078  0.3898432  0.82122120  1.5952808  1.5117812 #6:    3      a -0.8204684 -0.6212406  0.59390132  0.3295078  0.3898432 #7:    4      a  0.4874291 -2.2146999  0.91897737 -0.8204684 -0.6212406 #    lag.value2 #1:          NA #2:  1.12493092 #3: -0.04493361 #4:          NA #5:  0.94383621 #6:  0.82122120 #7:  0.59390132

If you need the reverse, use type=lead

nm3 <- paste("lead", nm1, sep=".")

Using the original dataset

  data[, (nm3) := shift(.SD, type='lead'), by = groups, .SDcols=nm1]   #  time groups      value     value1      value2 lead.value lead.value1   #1:    1      b -0.6264538  0.7383247  1.12493092  0.1836433   0.5757814   #2:    2      b  0.1836433  0.5757814 -0.04493361 -0.8356286  -0.3053884   #3:    3      b -0.8356286 -0.3053884 -0.01619026         NA          NA   #4:    1      a  1.5952808  1.5117812  0.94383621  0.3295078   0.3898432   #5:    2      a  0.3295078  0.3898432  0.82122120 -0.8204684  -0.6212406   #6:    3      a -0.8204684 -0.6212406  0.59390132  0.4874291  -2.2146999   #7:    4      a  0.4874291 -2.2146999  0.91897737         NA          NA  #   lead.value2  #1: -0.04493361  #2: -0.01619026  #3:          NA  #4:  0.82122120  #5:  0.59390132  #6:  0.91897737  #7:          NA

data

 set.seed(1)  data <- data.table(time =c(1:3,1:4),groups = c(rep(c("b","a"),c(3,4))),              value = rnorm(7), value1=rnorm(7), value2=rnorm(7))

148

answered Nov 06 '22 04:11

akrun

Using package dplyr:

library(dplyr) data <-      data %>%     group_by(groups) %>%     mutate(lag.value = dplyr::lag(value, n = 1, default = NA))

gives

> data Source: local data table [7 x 4] Groups: groups    time groups       value   lag.value 1    1      a  0.07614866          NA 2    2      a -0.02784712  0.07614866 3    3      a  1.88612245 -0.02784712 4    1      b  0.26526825          NA 5    2      b  1.23820506  0.26526825 6    3      b  0.09276648  1.23820506 7    4      b -0.09253594  0.09276648

As noted by @BrianD, this implicitly assumes that value is sorted by group already. If not, either sort it by group, or use the order_by argument in lag. Also note that due to an existing issue with some versions of dplyr, for safety, arguments and the namespace should be explicitly given.

answered Nov 06 '22 04:11

Alex

Related questions
                            
                                How do I extract a single column from a data.frame as a data.frame?
                            
                                duplicate 'row.names' are not allowed error
                            
                                Convert R vector to string vector of 1 element [duplicate]
                            
                                Fitting several regression models with dplyr
                            
                                Fastest way to detect if vector has at least 1 NA?
                            
                                Options for caching / memoization / hashing in R
                            
                                How to remove a directory in R?
                            
                                R pass variable column indices to ggplot2 [duplicate]
                            
                                How to reverse order a vector?
                            
                                Understanding the contents of the Makevars file in R (macros, variables, ~/.R/Makevars and pkg/src/Makevars)
                            
                                Outputting multiple lines of text with renderText() in R shiny
                            
                                How to check if two data frames are equal [duplicate]
                            
                                Cartesian product data frame
                            
                                Change variable name in for loop using R
                            
                                "Non Zero Exit Status" R 3.0.1 'XML' and 'RCurl'
                            
                                Create stacked barplot where each stack is scaled to sum to 100%
                            
                                do-while loop in R
                            
                                Extract the maximum value within each group in a dataframe [duplicate]
                            
                                ggplot2 reverse order of scale_brewer
                            
                                Changing factor levels with dplyr mutate

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

How to create a lag variable within each group?

Tags:

r

data.table

dplyr

plyr

xiaodai

People also ask