dplyr mutate function to evaluate values within columns (current, previous, next) vertically

Question

I have scoured SO for a way to achieve what I need without luck so here it goes. A while back I discovered the package dplyr and its potential. I am thinking this package can do what I want, I just don't know how. This is a small subset of my data, but should be representative of my problem.

    dummy<-structure(list(time = structure(1:20, .Label = c("2015-03-25 12:24:00", 
    "2015-03-25 21:08:00", "2015-03-25 21:13:00", "2015-03-25 21:47:00", 
    "2015-03-26 03:08:00", "2015-04-01 20:30:00", "2015-04-01 20:34:00", 
    "2015-04-01 20:42:00", "2015-04-01 20:45:00", "2015-09-29 18:26:00", 
    "2015-09-29 19:11:00", "2015-09-29 21:21:00", "2015-09-29 22:03:00", 
    "2015-09-29 22:38:00", "2015-09-30 00:48:00", "2015-09-30 01:38:00", 
    "2015-09-30 01:41:00", "2015-09-30 01:45:00", "2015-09-30 01:47:00", 
    "2015-09-30 01:49:00"), class = "factor"), ID = c(1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L), station = c(1L, 1L, 1L, 2L, 3, 
    4L, 4L, 4L, 4L, 5L, 5L, 6L, 
    6L, 5, 5, 5L, 7, 7, 7L, 
    7)), .Names = c("time", "ID", "station"), class = "data.frame", row.names = c(NA, 
    -20L))

I wish to evaluate rows within the time column conditional on the ID and station column. Specifically, I would like the function (dplyr?) to evaluate each time row, and compare the time to the previous time (row-1) and next time (row+1). If the time of current row is within 1 hour of time of previous and/or next row, and the ID and station of current row match that of previous and/or next row, then I would like to add in a new row a 1, otherwise a 0.

How would I achieve this using dplyr?

The expected outcome should be like this:

                  time ID station new.value
1  2015-03-25 12:24:00  1       1         0
2  2015-03-25 21:08:00  1       1         1
3  2015-03-25 21:13:00  1       1         1
4  2015-03-25 21:47:00  1       2         0
5  2015-03-26 03:08:00  1       3         0
6  2015-04-01 20:30:00  1       4         1
7  2015-04-01 20:34:00  1       4         1
8  2015-04-01 20:42:00  1       4         1
9  2015-04-01 20:45:00  1       4         1
10 2015-09-29 18:26:00  2       5         1
11 2015-09-29 19:11:00  2       5         1
12 2015-09-29 21:21:00  2       6         1
13 2015-09-29 22:03:00  2       6         1
14 2015-09-29 22:38:00  2       5         0
15 2015-09-30 00:48:00  2       5         1
16 2015-09-30 01:38:00  2       5         1
17 2015-09-30 01:41:00  2       7         1
18 2015-09-30 01:45:00  2       7         1
19 2015-09-30 01:47:00  2       7         1
20 2015-09-30 01:49:00  2       7         1

Psidom · Accepted Answer

Here is an option using the difftime with dplyr mutate function. Firstly, we use a group_by operation to make sure the comparison is within each unique combination of ID and Station. The difftime can be used to calculate the difference time, here the units will be set as hours for convenience. The lag and lead functions are also from dplyr package which shift the selected column backward or forward. Combining with the vectorised operation of difftime, you can calculate the time difference between the current row and the previous/next row. We use abs to make sure the result is absolute value. The condition of <1 make sure the difference is within an hour. as.integer convert the logical values (T or F) to (1 or 0) correspondingly.

library(dplyr)
dummy %>% group_by(ID, station) %>% 
          mutate(new.value = as.integer(
                 abs(difftime(time, lag(time, default = Inf), units = "hours")) < 1 | 
                 abs(difftime(time, lead(time, default = Inf), units = "hours")) < 1))

Source: local data frame [20 x 4]
Groups: ID, station [7]

                  time    ID station new.value
                (time) (int)   (dbl)     (int)
1  2015-03-25 12:24:00     1       1         0
2  2015-03-25 21:08:00     1       1         1
3  2015-03-25 21:13:00     1       1         1
4  2015-03-25 21:47:00     1       2         0
5  2015-03-26 03:08:00     1       3         0
6  2015-04-01 20:30:00     1       4         1
7  2015-04-01 20:34:00     1       4         1
8  2015-04-01 20:42:00     1       4         1
9  2015-04-01 20:45:00     1       4         1
10 2015-09-29 18:26:00     2       5         1
11 2015-09-29 19:11:00     2       5         1
12 2015-09-29 21:21:00     2       6         1
13 2015-09-29 22:03:00     2       6         1
14 2015-09-29 22:38:00     2       5         0
15 2015-09-30 00:48:00     2       5         1
16 2015-09-30 01:38:00     2       5         1
17 2015-09-30 01:41:00     2       7         1
18 2015-09-30 01:45:00     2       7         1
19 2015-09-30 01:47:00     2       7         1
20 2015-09-30 01:49:00     2       7         1

C8H10N4O2 · Answer

Psidom's answer is great -- here's a data.table approach.

library(data.table)
setDT(dummy)
# you do NOT want a factor for your time variable
dummy[, time := as.POSIXct(time) ]
dummy[, `:=`(lag_diff = c(Inf, diff(as.numeric(time))),
             lead_diff = c(diff(as.numeric(time)), Inf)),
      by = .(ID, station) ]
dummy[, new.value := as.integer(lag_diff < 3600 | lead_diff < 3600) ]
dummy

dplyr mutate function to evaluate values within columns (current, previous, next) vertically

Tags:

r

dplyr

FlyingDutch

2 Answers

Psidom

C8H10N4O2

Recent Activity

Donate For Us

dplyr mutate function to evaluate values within columns (current, previous, next) vertically

Tags:

r

dplyr

FlyingDutch

2 Answers

Psidom

C8H10N4O2

Related questions

Recent Activity

Donate For Us