I have scoured SO for a way to achieve what I need without luck so here it goes. A while back I discovered the package dplyr and its potential. I am thinking this package can do what I want, I just don't know how. This is a small subset of my data, but should be representative of my problem.
dummy<-structure(list(time = structure(1:20, .Label = c("2015-03-25 12:24:00",
"2015-03-25 21:08:00", "2015-03-25 21:13:00", "2015-03-25 21:47:00",
"2015-03-26 03:08:00", "2015-04-01 20:30:00", "2015-04-01 20:34:00",
"2015-04-01 20:42:00", "2015-04-01 20:45:00", "2015-09-29 18:26:00",
"2015-09-29 19:11:00", "2015-09-29 21:21:00", "2015-09-29 22:03:00",
"2015-09-29 22:38:00", "2015-09-30 00:48:00", "2015-09-30 01:38:00",
"2015-09-30 01:41:00", "2015-09-30 01:45:00", "2015-09-30 01:47:00",
"2015-09-30 01:49:00"), class = "factor"), ID = c(1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), station = c(1L, 1L, 1L, 2L, 3,
4L, 4L, 4L, 4L, 5L, 5L, 6L,
6L, 5, 5, 5L, 7, 7, 7L,
7)), .Names = c("time", "ID", "station"), class = "data.frame", row.names = c(NA,
-20L))
I wish to evaluate rows within the time column conditional on the ID and station column. Specifically, I would like the function (dplyr?) to evaluate each time row, and compare the time to the previous time (row-1) and next time (row+1). If the time of current row is within 1 hour of time of previous and/or next row, and the ID and station of current row match that of previous and/or next row, then I would like to add in a new row a 1, otherwise a 0.
How would I achieve this using dplyr?
The expected outcome should be like this:
time ID station new.value
1 2015-03-25 12:24:00 1 1 0
2 2015-03-25 21:08:00 1 1 1
3 2015-03-25 21:13:00 1 1 1
4 2015-03-25 21:47:00 1 2 0
5 2015-03-26 03:08:00 1 3 0
6 2015-04-01 20:30:00 1 4 1
7 2015-04-01 20:34:00 1 4 1
8 2015-04-01 20:42:00 1 4 1
9 2015-04-01 20:45:00 1 4 1
10 2015-09-29 18:26:00 2 5 1
11 2015-09-29 19:11:00 2 5 1
12 2015-09-29 21:21:00 2 6 1
13 2015-09-29 22:03:00 2 6 1
14 2015-09-29 22:38:00 2 5 0
15 2015-09-30 00:48:00 2 5 1
16 2015-09-30 01:38:00 2 5 1
17 2015-09-30 01:41:00 2 7 1
18 2015-09-30 01:45:00 2 7 1
19 2015-09-30 01:47:00 2 7 1
20 2015-09-30 01:49:00 2 7 1
Here is an option using the difftime
with dplyr
mutate function. Firstly, we use a group_by
operation to make sure the comparison is within each unique combination of ID and Station. The difftime
can be used to calculate the difference time, here the units will be set as hours
for convenience. The lag
and lead
functions are also from dplyr
package which shift the selected column backward or forward. Combining with the vectorised operation of difftime, you can calculate the time difference between the current row and the previous/next row. We use abs
to make sure the result is absolute value. The condition of <1
make sure the difference is within an hour. as.integer
convert the logical values (T or F) to (1 or 0) correspondingly.
library(dplyr)
dummy %>% group_by(ID, station) %>%
mutate(new.value = as.integer(
abs(difftime(time, lag(time, default = Inf), units = "hours")) < 1 |
abs(difftime(time, lead(time, default = Inf), units = "hours")) < 1))
Source: local data frame [20 x 4]
Groups: ID, station [7]
time ID station new.value
(time) (int) (dbl) (int)
1 2015-03-25 12:24:00 1 1 0
2 2015-03-25 21:08:00 1 1 1
3 2015-03-25 21:13:00 1 1 1
4 2015-03-25 21:47:00 1 2 0
5 2015-03-26 03:08:00 1 3 0
6 2015-04-01 20:30:00 1 4 1
7 2015-04-01 20:34:00 1 4 1
8 2015-04-01 20:42:00 1 4 1
9 2015-04-01 20:45:00 1 4 1
10 2015-09-29 18:26:00 2 5 1
11 2015-09-29 19:11:00 2 5 1
12 2015-09-29 21:21:00 2 6 1
13 2015-09-29 22:03:00 2 6 1
14 2015-09-29 22:38:00 2 5 0
15 2015-09-30 00:48:00 2 5 1
16 2015-09-30 01:38:00 2 5 1
17 2015-09-30 01:41:00 2 7 1
18 2015-09-30 01:45:00 2 7 1
19 2015-09-30 01:47:00 2 7 1
20 2015-09-30 01:49:00 2 7 1
Psidom's answer is great -- here's a data.table
approach.
library(data.table)
setDT(dummy)
# you do NOT want a factor for your time variable
dummy[, time := as.POSIXct(time) ]
dummy[, `:=`(lag_diff = c(Inf, diff(as.numeric(time))),
lead_diff = c(diff(as.numeric(time)), Inf)),
by = .(ID, station) ]
dummy[, new.value := as.integer(lag_diff < 3600 | lead_diff < 3600) ]
dummy
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With