There is the following data frame
> start_time <- c("11:23","16:21","18:57","22:00")
> end_time <- c("11:33","16:23","19:03","22:05")
> vol <- c("red","green","blue","black")
> x <- data.frame(start_time,end_time,vol)
> x
start_time end_time vol
1 11:23 11:33 red
2 16:21 16:23 green
3 18:57 19:03 blue
4 22:00 22:05 black
I want to convert this data frame as follows.
> time <- c("11:23","11:24","11:25","16:21","16:22","16:23","16:24","18:57","18:58","18:59","19:00","19:01","22:00","22:01","22:02")
> vol <- c("red","red","red","green","green","green","green","blue","blue","blue","blue","blue","black","black","black")
> y <- data.frame(time,vol)
> y
time vol
1 11:23 red
2 11:24 red
3 11:25 red
4 16:21 green
5 16:22 green
6 16:23 green
7 16:24 green
8 18:57 blue
9 18:58 blue
10 18:59 blue
11 19:00 blue
12 19:01 blue
13 22:00 black
14 22:01 black
15 22:02 black
It adds records every 1 minute from start to finish. Is this possible with R? Thank you
You could convert 'time' columns to POSIXct format, create a 1 mins sequence between start_time and end_time and expand the data.
library(dplyr)
x %>%
mutate_at(vars(ends_with('_time')), as.POSIXct, format = '%H:%M') %>%
mutate(time = purrr::map2(start_time, end_time, seq, by = 'min')) %>%
tidyr::unnest(time) %>%
mutate(time = format(time, '%H:%M')) %>%
select(-ends_with('_time'))
# A tibble: 27 x 2
# vol time
# <chr> <chr>
# 1 red 11:23
# 2 red 11:24
# 3 red 11:25
# 4 red 11:26
# 5 red 11:27
# 6 red 11:28
# 7 red 11:29
# 8 red 11:30
# 9 red 11:31
#10 red 11:32
# … with 17 more rows
Using expand.grid row-wise. For this columns with numeric start/end_time (i.e. seconds) are helpful. To avoid daylight saving time issues we choose a neutral date like "1970-01-01".
dat[4:5] <- lapply(dat[1:2], function(x) as.numeric(as.POSIXct(paste("1970-01-01", x))))
Then use expand.grid on sequences of 60 seconds of start/end_time and vol on each row, and rbind. Using transform convert back numeric time into readable time.
res <- transform(
setNames(do.call(rbind, apply(dat, 1, function(x)
expand.grid(seq(x[4], x[5], 60), x[3]))),
c("time", "vol")),
time=format(as.POSIXct(time, origin="1970-01-01"), "%H:%M"))
res
# time vol
# 1 11:23 red
# 2 11:24 red
# 3 11:25 red
# 4 11:26 red
# 5 11:27 red
# 6 11:28 red
# 7 11:29 red
# 8 11:30 red
# 9 11:31 red
# 10 11:32 red
# 11 11:33 red
# 12 16:21 green
# 13 16:22 green
# 14 16:23 green
# 15 18:57 blue
# 16 18:58 blue
# 17 18:59 blue
# 18 19:00 blue
# 19 19:01 blue
# 20 19:02 blue
# 21 19:03 blue
# 22 22:00 black
# 23 22:01 black
# 24 22:02 black
# 25 22:03 black
# 26 22:04 black
# 27 22:05 black
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With