I am looking for a function in R similar to the <code>lag1</code>, <code>lag2</code> and <code>retain</code> functions in SAS that I can use with a data.table. I know there are functions like <code>embed</code> and <code>lag</code> in R, but they don't return a single value or the previous value; they return a complete set of vectors. Is there anything in R that I can use with data.table? More info on the SAS functions: <ul> <li>Retain</li> <li>Lag</li> </ul>


Retain and lag function in R as SAS

1 Answer

You have to be aware that R works very differently from the data step in SAS. The lag function in SAS is used in the data step, within the implicit loop structure of that data step. The same goes for the retain statement, which simply keeps a value constant from one iteration of that loop to the next.

R, on the other hand, is vectorized: functions operate on whole vectors at once rather than row by row. This means that you have to rethink what you want to do, and adapt accordingly.

retain is simply unnecessary in R for this purpose, as R recycles arguments by default. If you want to do this explicitly, you might look at e.g. rep() to construct a vector of constant values with a given length.
lag is a matter of using indices and shifting the position of all values in a vector. To keep the vector the same length, you need to add some NA values at the start and drop the same number of values from the end.
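
Both points can be sketched in a few lines of base R. The vector x and the column name year are illustrative only:

```r
x <- c(10, 20, 30, 40)

# "retain"-style constant: the length-one value 2013 is recycled to every row
df <- data.frame(x = x, year = 2013)

# The explicit equivalent, building the constant vector with rep()
year_explicit <- rep(2013, length(x))

# Lag by one position using indices: prepend NA, drop the last element
x_lag1 <- c(NA, x[-length(x)])
```

Here x_lag1 has the same length as x, with the first position filled by NA.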

A simple example: This SAS code lags a variable x and adds a variable year that has a constant value:

data one;
   retain year 2013;
   input x @@;
   y=lag1(x);
   z=lag2(x);
   datalines;
1 2 3 4 5 6
;

In R, you could write your own lag function like this:

mylag <- function(x, k) c(rep(NA, k), head(x, -k))

This single line prepends k NA values to the vector and drops the last k values from it. The result is a lagged vector of the same length, like the one returned by lag1, lag2, etc. in SAS.
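
One caveat worth knowing: the one-liner assumes 1 <= k <= length(x). With k = 0, head(x, -0) is the same as head(x, 0) and returns an empty vector, and with k larger than length(x) the result is longer than x. A possible variant that covers those cases is sketched below; the name mylag2 is hypothetical and not part of the original answer:

```r
# Original one-liner, repeated here so the block is self-contained
mylag <- function(x, k) c(rep(NA, k), head(x, -k))

# Hypothetical variant: prepend k NAs, then keep only the first length(x) values
mylag2 <- function(x, k) {
  stopifnot(k >= 0)
  c(rep(NA, k), x)[seq_along(x)]
}

mylag2(1:6, 0)  # unchanged vector
mylag2(1:6, 2)  # same result as mylag(1:6, 2)
mylag2(1:3, 5)  # all NA, still length 3
```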

With mylag, you can write something like:

nrs <- 1:6 # equivalent to datalines
one <- data.frame(
   x = nrs,
   y = mylag(nrs,1),
   z = mylag(nrs,2),
   year = 2013  # R recycles the single value to every row, so no extra command needed
)

The result is:

> one
  x  y  z year
1 1 NA NA 2013
2 2  1 NA 2013
3 3  2  1 2013
4 4  3  2 2013
5 5  4  3 2013
6 6  5  4 2013

Exactly the same would work with a data.table object. The important note here is to rethink your strategy: Instead of thinking loopwise as you do with the DATA step in SAS, you have to start thinking in terms of vectors and indices when using R.
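
As a rough sketch of the data.table case, assuming the data.table package is installed and recent enough to provide shift(), the same columns can be added by reference. mylag is the function defined above; the column name y_shift is illustrative only:

```r
library(data.table)

mylag <- function(x, k) c(rep(NA, k), head(x, -k))

one <- data.table(x = 1:6)

# := adds columns by reference; the scalar 2013 is recycled to every row
one[, `:=`(y = mylag(x, 1), z = mylag(x, 2), year = 2013)]

# data.table's own shift() lags by default (type = "lag") and fills with NA
one[, y_shift := shift(x, 1)]
```

Both y and y_shift should hold the same lagged values, so in practice shift() can stand in for a hand-written lag function.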


answered Nov 04 '22 05:11

Joris Meys


Tags:

r

data.table

user2786962
