I have time series data in 1-minute increments. I have written code for this, but with the amount of data I have (over 1M rows), looping through each row takes far too long. The data looks something like this:
t0 = as.POSIXlt("2018-12-23 00:01:00")
t0 = t0+seq(60,60*10,60)
p1 = seq(5,5*10,5)
p2 = seq(7,7*10,7)
m0 = cbind(p1,p2)
rownames(m0) = as.character(t0)
It looks something like this:
> head(m0)
p1 p2
2018-12-23 00:02:00 5 7
2018-12-23 00:03:00 10 14
2018-12-23 00:04:00 15 21
2018-12-23 00:05:00 20 28
2018-12-23 00:06:00 25 35
2018-12-23 00:07:00 30 42
I want to turn this into 5-second increments by adding 11 rows (55 seconds) before each minute, each carrying that minute's value backward. So it would look something like:
> new0
p1 p2
2018-12-23 00:01:05 5 7
2018-12-23 00:01:10 5 7
2018-12-23 00:01:15 5 7
2018-12-23 00:01:20 5 7
2018-12-23 00:01:25 5 7
2018-12-23 00:01:30 5 7
2018-12-23 00:01:35 5 7
2018-12-23 00:01:40 5 7
2018-12-23 00:01:45 5 7
2018-12-23 00:01:50 5 7
2018-12-23 00:01:55 5 7
2018-12-23 00:02:00 5 7
2018-12-23 00:02:05 10 14
2018-12-23 00:02:10 10 14
2018-12-23 00:02:15 10 14
2018-12-23 00:02:20 10 14
2018-12-23 00:02:25 10 14
2018-12-23 00:02:30 10 14
2018-12-23 00:02:35 10 14
2018-12-23 00:02:40 10 14
2018-12-23 00:02:45 10 14
2018-12-23 00:02:50 10 14
2018-12-23 00:02:55 10 14
2018-12-23 00:03:00 10 14
I am hoping to find a way to do this without a loop, using the efficient functions in xts and/or data.table, which I am not very familiar with.
I tried using the ave function from base R, but it is not fast enough.
Since you tagged this with data.table:
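library(data.table)
# Convert the matrix, keeping the row names as a column "rn", and parse them as times
dt = as.data.table(m0, keep.rownames = TRUE)[, rn := as.POSIXct(rn)]
# Build 12 target times per minute (t, t - 5, ..., t - 55) and join them to dt;
# roll = -Inf fills each target from the next available row
dt[.(rep(rn, each = 12) - seq(0, 55, 5)), on = 'rn', roll = -Inf][order(rn)]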
# rn p1 p2
# 1: 2018-12-23 00:01:05 5 7
# 2: 2018-12-23 00:01:10 5 7
# 3: 2018-12-23 00:01:15 5 7
# 4: 2018-12-23 00:01:20 5 7
# 5: 2018-12-23 00:01:25 5 7
# ---
#116: 2018-12-23 00:10:40 50 70
#117: 2018-12-23 00:10:45 50 70
#118: 2018-12-23 00:10:50 50 70
#119: 2018-12-23 00:10:55 50 70
#120: 2018-12-23 00:11:00 50 70
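The roll = -Inf argument is what makes this work: each 5-second target time is matched to the first row at or after it, so the value is carried backward. For readers who want to see that lookup without data.table, here is a rough base-R analogue using findInterval() with left.open = TRUE. It is my own sketch, not data.table's implementation, and nocb_lookup is a hypothetical helper name.

```r
# Hypothetical helper: for each target time, return the row index of the
# first original time that is >= the target (next observation carried backward).
# Targets later than the last original time get an out-of-range index (-> NA values).
nocb_lookup <- function(orig_times, target_times) {
  # With left.open = TRUE, findInterval returns i such that orig[i] < x <= orig[i + 1]
  findInterval(as.numeric(target_times), as.numeric(orig_times), left.open = TRUE) + 1L
}

# Two one-minute observations (tz = "UTC" is an assumption to keep the example stable)
orig <- as.POSIXct(c("2018-12-23 00:02:00", "2018-12-23 00:03:00"), tz = "UTC")
vals <- c(5, 10)

# 5-second grid from 55 seconds before the first observation to the last one
grid <- seq(orig[1] - 55, orig[2], by = 5)
idx  <- nocb_lookup(orig, grid)
out  <- data.frame(time = grid, p1 = vals[idx])
head(out, 13)
```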
Here's one way to do it in base R. First, convert your data to a data frame with an explicit column for the timestamps:
m0 <- as.data.frame(m0)
m0$t <- t0
p1 p2 t
1 5 7 2018-12-23 00:02:00
2 10 14 2018-12-23 00:03:00
3 15 21 2018-12-23 00:04:00
4 20 28 2018-12-23 00:05:00
5 25 35 2018-12-23 00:06:00
6 30 42 2018-12-23 00:07:00
7 35 49 2018-12-23 00:08:00
8 40 56 2018-12-23 00:09:00
9 45 63 2018-12-23 00:10:00
10 50 70 2018-12-23 00:11:00
Then merge this data frame with a one-column data frame of time differences (0 to 55 seconds):
m1 <- merge(m0, data.frame(diff = seq(0, 55, 5)))
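With no column names in common, merge() falls back to a Cartesian product, pairing every row of m0 with every offset. A toy check with three made-up timestamps instead of ten shows the resulting row count and row order:

```r
# Toy data with 3 timestamps (illustrative values, not the question's full data)
toy  <- data.frame(p1 = c(5, 10, 15),
                   t  = as.POSIXct("2018-12-23 00:02:00", tz = "UTC") + c(0, 60, 120))
offs <- data.frame(diff = seq(0, 55, 5))

# No shared column names, so merge() returns the Cartesian product (a cross join)
crossed <- merge(toy, offs)
nrow(crossed)          # 3 * 12 = 36 rows
# Rows of the first data frame vary fastest: diff is 0 for rows 1-3, 5 for rows 4-6, ...
head(crossed$diff, 6)
```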
Finally, subtract the difference column from the timestamp column to create the new timestamps:
m1$t2 <- with(m1, t - diff)
> m1[c(1, 20, 40), ]
p1 p2 t diff t2
1 5 7 2018-12-23 00:02:00 0 2018-12-23 00:02:00
20 50 70 2018-12-23 00:11:00 5 2018-12-23 00:10:55
40 50 70 2018-12-23 00:11:00 15 2018-12-23 00:10:45
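The merged data frame is not in time order and still carries the helper columns t and diff. A possible final step, assuming you want ascending timestamps and only t2, p1 and p2 (that column choice is my assumption), rebuilt here from the question's data so it runs on its own:

```r
# Rebuild the question's data: 10 one-minute observations
# (tz = "UTC" is an assumption to avoid daylight-saving surprises)
t0 <- as.POSIXct("2018-12-23 00:01:00", tz = "UTC") + seq(60, 600, 60)
m0 <- data.frame(p1 = seq(5, 50, 5), p2 = seq(7, 70, 7), t = t0)

# Cross join with the offsets and compute the new timestamps
m1 <- merge(m0, data.frame(diff = seq(0, 55, 5)))
m1$t2 <- with(m1, t - diff)

# Sort by the new timestamp and keep only the columns of interest
m1 <- m1[order(m1$t2), c("t2", "p1", "p2")]
rownames(m1) <- NULL
head(m1, 13)
```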
A combination of lubridate, padr and tidyr will get you there. I use lubridate to convert the timestamps so they work well with padr. padr's pad function then adds the missing date-time rows to the data frame, and tidyr's fill function fills in the empty values. Note that by default pad stops once the result would exceed 1 million rows, as a memory safeguard, but you can raise this limit with its break_above argument (given in millions of rows).
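library(lubridate)
library(padr)
library(tidyr)
# Name the time column t0 explicitly; otherwise data.frame() would call it "ymd_hms.t0."
df1 <- data.frame(t0 = ymd_hms(t0), p1, p2)
# Insert the missing 5-second timestamps, starting 55 seconds before the first observation
df1 <- pad(df1, interval = "5 secs", start_val = lubridate::ymd_hms("2018-12-23 00:01:05"))
# Fill each inserted NA with the next non-missing value below it
df1 <- fill(df1, p1, p2, .direction = "up")
head(df1, 15)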
t0 p1 p2
1 2018-12-23 00:01:05 5 7
2 2018-12-23 00:01:10 5 7
3 2018-12-23 00:01:15 5 7
4 2018-12-23 00:01:20 5 7
5 2018-12-23 00:01:25 5 7
6 2018-12-23 00:01:30 5 7
7 2018-12-23 00:01:35 5 7
8 2018-12-23 00:01:40 5 7
9 2018-12-23 00:01:45 5 7
10 2018-12-23 00:01:50 5 7
11 2018-12-23 00:01:55 5 7
12 2018-12-23 00:02:00 5 7
13 2018-12-23 00:02:05 10 14
14 2018-12-23 00:02:10 10 14
15 2018-12-23 00:02:15 10 14
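tidyr's fill() with .direction = "up" replaces each missing value with the next non-missing value below it. If you want the same backward fill without extra packages, a base-R version can use a reversed cumulative minimum over row positions. This is my own sketch (fill_up is a hypothetical name), not tidyr's implementation:

```r
# Hypothetical helper: replace each NA with the next non-NA value below it
# (next observation carried backward). Trailing NAs stay NA.
fill_up <- function(x) {
  # Position of each non-NA value; NA rows get a sentinel larger than any index
  pos <- ifelse(is.na(x), .Machine$integer.max, seq_along(x))
  # Reversed cumulative minimum: for each row, the first non-NA position at or after it
  nxt <- rev(cummin(rev(pos)))
  # Rows with no later non-NA value keep NA
  nxt[nxt == .Machine$integer.max] <- NA_integer_
  x[nxt]
}

fill_up(c(NA, NA, 5, NA, 10, NA))
# returns c(5, 5, 5, 10, 10, NA)
```

To apply it to the value columns of a padded data frame, something like df1[c("p1", "p2")] <- lapply(df1[c("p1", "p2")], fill_up) should work.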
A base R way:
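m0 <- as.data.frame(m0)
# For each timestamp, build 12 times going back in 5-second steps (t, t - 5, ..., t - 55)
time <- lapply(as.POSIXct(rownames(m0)), seq, by = "-5 sec", length.out = 12)
# do.call(c, ...) concatenates in one call; Reduce(c, ...) would re-copy the growing
# vector at every step, which gets slow with a million rows
m1 <- cbind(TIME = do.call(c, time), m0[rep(seq_len(nrow(m0)), each = 12), ])
row.names(m1) <- NULL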
head(m1)
# TIME p1 p2
# 1 2018-12-23 00:02:00 5 7
# 2 2018-12-23 00:01:55 5 7
# 3 2018-12-23 00:01:50 5 7
# 4 2018-12-23 00:01:45 5 7
# 5 2018-12-23 00:01:40 5 7
# 6 2018-12-23 00:01:35 5 7
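Note: within each minute, the TIME values in the output run backward (newest first); use m1[order(m1$TIME), ] if you need them in ascending order.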
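Calling seq() once per row through lapply() is still an R-level loop. A fully vectorized sketch of the same idea builds all new timestamps at once with rep() and recycled offsets, and returns them in ascending order. The function name expand_backward and its nobs/nsec arguments are hypothetical; the sketch assumes rows are sorted by time and spaced at least nobs * nsec seconds apart, so the generated timestamps do not overlap.

```r
# Hypothetical helper: expand each row into nobs + 1 rows spaced nsec seconds
# apart, ending at the row's own timestamp, with the row's values repeated.
expand_backward <- function(df, times, nobs = 11, nsec = 5) {
  n <- nrow(df)
  # Offsets nobs*nsec, ..., nsec, 0 (ascending times within each row's block)
  offsets <- seq(nobs * nsec, 0, by = -nsec)
  # Repeat each timestamp nobs + 1 times and subtract the recycled offsets
  new_time <- rep(times, each = nobs + 1) - rep(offsets, times = n)
  # Repeat each row of values to match
  out <- df[rep(seq_len(n), each = nobs + 1), , drop = FALSE]
  rownames(out) <- NULL
  cbind(TIME = new_time, out)
}

# The question's data (tz = "UTC" is an assumption for reproducibility)
vals  <- data.frame(p1 = seq(5, 50, 5), p2 = seq(7, 70, 7))
times <- as.POSIXct("2018-12-23 00:01:00", tz = "UTC") + seq(60, 600, 60)
expanded <- expand_backward(vals, times)
head(expanded, 13)
```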
Here's a general xts solution that should also work with parameters other than the ones in your question.
library(xts)
# convert m0 to xts (the row names become the time index)
x0 <- as.xts(m0)
# create an empty (zero-width) xts object with observations at all the new time points:
# nobs extra points per row, spaced nsec seconds apart, before each existing time
nobs <- 11
nsec <- 5
y0 <- xts(, index(x0) - rep(seq_len(nobs) * nsec, each = nrow(x0)))
# merge data with desired index observations (the new rows are NA)
new0 <- merge(x0, y0)
# carry the current value backward
new0 <- na.locf(new0, fromLast = TRUE)
head(new0, 20)
# p1 p2
# 2018-12-23 00:01:05 5 7
# 2018-12-23 00:01:10 5 7
# 2018-12-23 00:01:15 5 7
# 2018-12-23 00:01:20 5 7
# 2018-12-23 00:01:25 5 7
# 2018-12-23 00:01:30 5 7
# 2018-12-23 00:01:35 5 7
# 2018-12-23 00:01:40 5 7
# 2018-12-23 00:01:45 5 7
# 2018-12-23 00:01:50 5 7
# 2018-12-23 00:01:55 5 7
# 2018-12-23 00:02:00 5 7
# 2018-12-23 00:02:05 10 14
# 2018-12-23 00:02:10 10 14
# 2018-12-23 00:02:15 10 14
# 2018-12-23 00:02:20 10 14
# 2018-12-23 00:02:25 10 14
# 2018-12-23 00:02:30 10 14
# 2018-12-23 00:02:35 10 14
# 2018-12-23 00:02:40 10 14