Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Getting at the previous n-rows in a data frame?

Tags:

r

plyr

I have the following data frame.

date id value
2012-01-01 1 0.3
2012-01-01 2 0.5
2012-01-01 3 0.2
2012-01-01 4 0.8
2012-01-01 5 0.2
2012-01-01 6 0.8
2012-01-01 7 0.1
2012-01-01 8 0.4
2012-01-01 9 0.3
2012-01-01 10 0.2

There are several dates and for each date, I have 10 id values as shown above and a value field. What I would like to do is for every id find the previous n values in the "value" field. For example if n = 3 then I want the output to be as follows.

date id value value1 value2 value3
2012-01-01 1 0.3 NA NA NA
2012-01-01 2 0.5 NA NA NA
2012-01-01 3 0.2 NA NA NA
2012-01-01 4 0.8 0.2 0.5 0.3
2012-01-01 5 0.2 0.8 0.2 0.5
...

Is there an easy way to get to this either through plyr or using mapply? Thanks much in advance.

like image 764
broccoli Avatar asked May 29 '12 06:05

broccoli


People also ask

How do I find the last n rows in a data frame?

tail() to get the last n rows of a DataFrame. Call pandas. DataFrame. tail(n) with n as an integer to select the last n rows of pandas.

How do I find the first n rows of a data frame?

You can use df. head() to get the first N rows in Pandas DataFrame. Alternatively, you can specify a negative number within the brackets to get all the rows, excluding the last N rows.

Which function returns first n rows of a data?

iloc() . Use pandas. DataFrame. iloc() to get the first n rows.

Which function will select the last 3 rows of the data frame?

Get last N rows of a dataframe using tail() It will return the last N rows of dataframe as a dataframe object. Using the tail() function, we fetched the last 3 rows of dataframe as a dataframe and then just printed it.


2 Answers

You can do this quite easily using base functions:

id <- 1:10
value <- c(0.3,0.5,0.2,0.8,0.2,0.8,0.1,0.4,0.3,0.2)
test <- data.frame(id,value)

test$valprev1 <- c(rep(NA,1),head(test$value,-1))
test$valprev2 <- c(rep(NA,2),head(test$value,-2))
test$valprev3 <- c(rep(NA,3),head(test$value,-3))

Result

   id value valprev1 valprev2 valprev3
1   1   0.3       NA       NA       NA
2   2   0.5      0.3       NA       NA
3   3   0.2      0.5      0.3       NA
4   4   0.8      0.2      0.5      0.3
5   5   0.2      0.8      0.2      0.5
6   6   0.8      0.2      0.8      0.2
7   7   0.1      0.8      0.2      0.8
8   8   0.4      0.1      0.8      0.2
9   9   0.3      0.4      0.1      0.8
10 10   0.2      0.3      0.4      0.1

Made a mistake here previously - here is an sapply version in a function:

prevrows <- function(data,n) {sapply(1:n,function(x) c(rep(NA,x),head(data,-x)))}
prevrows(test$value,3)

Which gives just this:

      [,1] [,2] [,3]
 [1,]   NA   NA   NA
 [2,]  0.3   NA   NA
 [3,]  0.5  0.3   NA
 [4,]  0.2  0.5  0.3
 [5,]  0.8  0.2  0.5
 [6,]  0.2  0.8  0.2
 [7,]  0.8  0.2  0.8
 [8,]  0.1  0.8  0.2
 [9,]  0.4  0.1  0.8
[10,]  0.3  0.4  0.1

You could then apply this to each set of dates in your data like this:

result <- tapply(test$value,test$date,prevrows,3)

Which gives a bunch of lists for each date set. You could rowbind these up for adding back to your data set with:

data.frame(test,do.call(rbind,result))
like image 93
thelatemail Avatar answered Oct 12 '22 22:10

thelatemail


Using data.table v1.9.5+ this is as simple as:

library(data.table)
setDT(dt)

lags <- dt[, shift(value, n = c(1,2,3))]

or to append them as additional columns in the same data.table:

dt[, c("lag1", "lag2", "lag3") := shift(value, n = c(1,2,3))]
like image 29
Bar Avatar answered Oct 13 '22 00:10

Bar