How to refer to multiple previous rows in R data.table

Tags: r, data.table

I have a question about data.table in R. I have a dataset like this:

library(data.table)
data <- data.table(a = c(1:7, 12, 32, 13), b = c(1, 5, 6, 7, 8, 3, 2, 5, 1, 4))

     a b
 1:  1 1
 2:  2 5
 3:  3 6
 4:  4 7
 5:  5 8
 6:  6 3
 7:  7 2
 8: 12 5
 9: 32 1
10: 13 4

Now I want to generate a third column c that compares the value of a in each row with all previous values of b and checks whether any of those b values is bigger than a. For example, at row 5, a = 5 and the previous values of b are 1, 5, 6, 7; since 6 and 7 are bigger than 5, c should be 1. If no previous b is bigger than a, c should be 0, and row 1, which has no previous rows, gets NA. The result should look like this:

     a b  c
 1:  1 1 NA
 2:  2 5  0
 3:  3 6  1
 4:  4 7  1
 5:  5 8  1
 6:  6 3  1
 7:  7 2  1
 8: 12 5  0
 9: 32 1  0
10: 13 4  0
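Stated more formally (a restatement of the example above, not part of the original question): some earlier b exceeds a exactly when the largest earlier b exceeds it, so the target column only depends on the running maximum of b.

```latex
\[
c_i =
\begin{cases}
\text{NA}, & i = 1,\\[2pt]
\mathbf{1}\!\left[\exists\, j < i:\ b_j > a_i\right]
  = \mathbf{1}\!\left[\max_{j<i} b_j > a_i\right], & i \ge 2.
\end{cases}
\]
% Row 5: max(1, 5, 6, 7) = 7 > 5              =>  c_5 = 1
% Row 8: max(1, 5, 6, 7, 8, 3, 2) = 8 < 12    =>  c_8 = 0
```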

I tried a for loop, but it takes a very long time. I also tried shift(), but I cannot refer to multiple previous rows with it. Does anyone have a recommendation?

asked Aug 15 '16 06:08 by Thanh Quang
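On the for loop: if a loop rescans all earlier rows for every row, it does quadratic work. A minimal sketch, assuming plain numeric vectors, of a loop that instead carries the largest b seen so far and needs only one pass. The name `flag_previous_bigger` is hypothetical and not from this thread; it is still an interpreted R loop, so vectorized code will generally be faster.

```r
# Hypothetical helper (not from the thread): single pass, O(n),
# carrying the largest b over the rows before the current one.
flag_previous_bigger <- function(a, b) {
  n <- length(a)
  out <- rep(NA_integer_, n)   # row 1 has no earlier rows, so it stays NA
  if (n < 2) return(out)
  running_max <- b[1]          # max of b over rows 1..(i - 1)
  for (i in 2:n) {
    out[i] <- as.integer(running_max > a[i])  # 1 if some earlier b is bigger than a[i]
    running_max <- max(running_max, b[i])     # include row i for the next iteration
  }
  out
}

a <- c(1:7, 12, 32, 13)
b <- c(1, 5, 6, 7, 8, 3, 2, 5, 1, 4)
print(flag_previous_bigger(a, b))
```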


2 Answers

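library(data.table)
data <- data.table(a = c(1:7, 12, 32, 13), b = c(1, 5, 6, 7, 8, 3, 2, 5, 1, 4))
# Some earlier b is bigger than a exactly when the running maximum of the earlier b values is.
# shift() lags cummax(b) by one row (row 1 gets NA); as.integer() turns TRUE/FALSE into 1/0.
data[, c := as.integer(a < shift(cummax(b)))]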
answered Sep 30 '22 18:09 by Dean MacGregor
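The comparison direction only matters when a equals the largest earlier b. The question asks for an earlier b that is strictly bigger than a, which corresponds to `<`; `<=` also flags ties. The two-row table below is made up purely to show a tie and is not from the question; the column names `strict` and `or_tie` are illustrative.

```r
library(data.table)

# Made-up data: in row 2, a equals the only earlier b (5),
# so no earlier b is strictly bigger than a
tie <- data.table(a = c(1, 5), b = c(5, 2))

tie[, strict := as.integer(a < shift(cummax(b)))]   # strictly bigger: row 2 gets 0
tie[, or_tie := as.integer(a <= shift(cummax(b)))]  # also counts ties: row 2 gets 1
print(tie)
```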


This is a base R solution (see the dplyr solution below):

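# Row 1 has no previous rows, so it stays NA
data$c <- NA_integer_
# For each later row x, check whether any earlier b is bigger than a[x] (quadratic work)
data$c[2:nrow(data)] <- sapply(2:nrow(data), function(x) as.integer(any(data$a[x] < data$b[1:(x - 1)])))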

##      a b  c
##  1:  1 1 NA
##  2:  2 5  0
##  3:  3 6  1
##  4:  4 7  1
##  5:  5 8  1
##  6:  6 3  1
##  7:  7 2  1
##  8: 12 5  0
##  9: 32 1  0
## 10: 13 4  0

EDIT

Here is a simpler, vectorized solution using dplyr:

library(dplyr)
### Compare 'a' with the running maximum of the previous 'b' values (lag() shifts cummax(b) down one row) and set c to 1/0.
data %>% mutate(c = ifelse(a < lag(cummax(b)), 1, 0))

##     a b  c
## 1   1 1 NA
## 2   2 5  0
## 3   3 6  1
## 4   4 7  1
## 5   5 8  1
## 6   6 3  1
## 7   7 2  1
## 8  12 5  0
## 9  32 1  0
## 10 13 4  0

### Using data.table's shift() instead of lag() inside mutate() (requires library(data.table))
data %>% mutate(c = ifelse(a < shift(cummax(b)), 1, 0))
answered Sep 30 '22 16:09 by steveb
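The quadratic sapply() in the base R answer can also be replaced by the running-maximum idea without data.table or dplyr. A sketch, assuming a plain data.frame (the name `dat` is illustrative): `head(cummax(b), -1)` is the largest b among all earlier rows, padded with NA for row 1.

```r
dat <- data.frame(a = c(1:7, 12, 32, 13), b = c(1, 5, 6, 7, 8, 3, 2, 5, 1, 4))

# Largest b among the rows before each row; row 1 has none, so NA
prev_max <- c(NA, head(cummax(dat$b), -1))

# 1 if some earlier b is strictly bigger than a, else 0 (NA in row 1)
dat$c <- as.integer(dat$a < prev_max)
print(dat)
```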