
Retain and lag function in R as SAS

Tags: r, data.table

I am looking for a function in R similar to lag1, lag2 and retain functions in SAS which I can use with data.tables.

I know there are functions like embed and lag in R, but they don't return a single value or the previous value. They return a complete set of vectors.

Is there anything in R which I can use with data.table?

More info on the SAS functions:

  • Retain
  • Lag
asked Dec 17 '13 14:12 by user2786962



1 Answer

You have to be aware that R works very differently from the DATA step in SAS. The lag function in SAS is used in the DATA step, within the implicit loop over observations that the step performs. The same goes for the retain statement, which simply keeps a variable's value from one iteration of that loop to the next.

R, on the other hand, is vectorized: functions operate on whole vectors at once. This means that you have to rethink what you want to do, and adapt accordingly.

  • retain is not needed in R for a constant value, as R recycles shorter arguments by default. If you want to do this explicitly, you can use e.g. rep() to construct a vector of constant values with a given length.
  • lag is a matter of using indices to shift the position of all values in a vector. To keep the vector the same length, you pad the start with NA and drop the values that are pushed past the end.
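
A minimal sketch of these two points in base R. The names x, d and year_explicit are illustrative only, and the running total is an assumed example of a typical retain use case (a value accumulated across rows) that the SAS code in this answer does not cover.

```r
x <- c(1, 2, 3, 4, 5, 6)

# Constant column: a length-1 value is recycled to the number of rows,
# so nothing like retain is needed.
d <- data.frame(x = x, year = 2013)

# Explicit version of the same thing with rep().
year_explicit <- rep(2013, length(x))

# Assumed use case: a running total that SAS would accumulate row by row
# in a retained variable is a single vectorized call in R.
d$running_total <- cumsum(d$x)

# Lag by index shifting: prepend one NA and drop the last value.
d$x_lag1 <- c(NA, x[-length(x)])

print(d)
```

Whenever a retain-style accumulator comes up, it is worth checking first whether a cumulative function such as cumsum() or cummax() already expresses it.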

A simple example: This SAS code lags a variable x and adds a variable year that has a constant value:

data one;
   retain year 2013;
   input x @@;
   y=lag1(x);
   z=lag2(x);
   datalines;
1 2 3 4 5 6
;

In R, you could write your own lag function like this:

# Lag x by k positions (k >= 1): pad k NAs in front, drop the last k values
mylag <- function(x, k) c(rep(NA, k), head(x, -k))

This single line prepends k NA values to the vector and drops the last k values from it. The result is a lagged vector like the one given by lag1 etc. in SAS.

This allows something like:

nrs <- 1:6 # equivalent to datalines
one <- data.frame(
   x = nrs,
   y = mylag(nrs,1),
   z = mylag(nrs,2),
   year = 2013  # a single value is recycled to every row, so no extra command needed
)

The result is:

> one
  x  y  z year
1 1 NA NA 2013
2 2  1 NA 2013
3 3  2  1 2013
4 4  3  2 2013
5 5  4  3 2013
6 6  5  4 2013

Exactly the same would work with a data.table object. The important point here is to rethink your strategy: instead of thinking loopwise as you do with the DATA step in SAS, you have to start thinking in terms of vectors and indices when using R.
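
As a sketch of the data.table version, assuming the data.table package is installed: mylag can be used unchanged inside `:=`. Recent data.table releases also provide shift(), which is not part of the original answer and does the same NA padding. The grouping column id is hypothetical and is only there to show lagging within groups.

```r
library(data.table)

# Same helper as in the answer: pad k NAs in front, drop the last k values.
mylag <- function(x, k) c(rep(NA, k), head(x, -k))

one <- data.table(x = 1:6)

# Add the lagged and constant columns by reference.
one[, `:=`(y = mylag(x, 1), z = mylag(x, 2), year = 2013)]

# shift() from data.table lags by n positions and fills with NA by default.
one[, y_shift := shift(x, n = 1)]

# Hypothetical grouping column: lag within each group rather than across them.
one[, id := rep(c("a", "b"), each = 3)]
one[, y_by_id := shift(x, n = 1), by = id]

print(one)
```

With by = id the lag restarts in every group, which in a SAS DATA step would need extra first.-variable logic around lag.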

answered Nov 04 '22 05:11 by Joris Meys