I have a data frame containing a time series of monthly data, with some missing values.
dates <- seq(
  as.Date("2010-01-01"), as.Date("2017-12-01"), "1 month"
)
n_dates <- length(dates)
dates <- dates[runif(n_dates) < 0.5]
time_data <- data.frame(
  date = dates,
  value = rnorm(length(dates))
)
## date value
## 1 2010-02-01 1.3625419
## 2 2010-06-01 0.1512481
## etc.
In order to make use of the time series forecasting functionality in, e.g., forecast, I'd like to convert this to a ts object.
The dumb way to do this is to create a regular set of monthly dates over the whole time period, then left join back to the original data.
library(dplyr)
first_date <- min(time_data$date)
last_date <- max(time_data$date)
full_dates <- data.frame(
  date = seq(first_date, last_date, "1 month")
)
extended_time_data <- left_join(full_dates, time_data, by = "date")
## date value
## 1 2010-02-01 1.3625419
## 2 2010-03-01 NA
## etc.
Now I can create the time series using ts().
library(lubridate)
time_series <- ts(
  extended_time_data$value,
  start = c(year(first_date), month(first_date)),
  frequency = 12
)
For such a simple task, this is long-winded and pretty gross.
I also looked into first converting to xts, and using a converter from the timetk package, but nothing jumped out at me as an easier way.
This question is a dupe of How to create time series with missing datetime values, but the answer there was even fuzzier.
How do I create a ts object from a time series with missing values?
Using the input data frame defined in the Note at the end, convert it to a zoo object with an index of class yearmon. Then as.ts will convert it to ts.
library(zoo)
z <- read.zoo(DF, FUN = as.yearmon)
as.ts(z)
## Jan Feb Mar Apr May Jun Jul Aug
## 2000 1 NA NA 2 3 NA 4 5
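As a quick check, the resulting object can be inspected to confirm that it is a regular monthly series and that the months absent from the input became NA. This is only a sketch: it redefines the example DF from the Note so that it runs on its own, and the name tt is introduced here purely for illustration.

```r
library(zoo)

# Example data, copied from the Note so that this block is self-contained
dates <- as.Date(as.yearmon("2000-01") + c(0, 3, 4, 6, 7) / 12)
DF <- data.frame(dates, values = seq_along(dates))

# Convert to zoo with a yearmon index, then to ts
tt <- as.ts(read.zoo(DF, FUN = as.yearmon))

start(tt)         # first period of the series, as c(year, month)
frequency(tt)     # number of observations per year
which(is.na(tt))  # positions of the months that were missing from DF
```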
If you prefer to express it in terms of pipes:
library(magrittr)
library(zoo)
DF %>% read.zoo(FUN = as.yearmon) %>% as.ts
If desired, fill in the missing values in the time series using na.locf (last observation carried forward), na.approx (linear interpolation), na.spline (cubic spline interpolation), na.StructTS (seasonal Kalman filter) or another zoo NA-filling function. For example:
library(forecast)
DF %>% read.zoo(FUN = as.yearmon) %>% as.ts %>% na.spline %>% forecast
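To see how the choice of filling function changes the result, two of the functions mentioned above can be applied to the same series side by side. This is a minimal sketch that again redefines the example DF from the Note; the names tt, filled_locf and filled_approx are illustrative only.

```r
library(zoo)

# Example data, copied from the Note so that this block is self-contained
dates <- as.Date(as.yearmon("2000-01") + c(0, 3, 4, 6, 7) / 12)
DF <- data.frame(dates, values = seq_along(dates))
tt <- as.ts(read.zoo(DF, FUN = as.yearmon))

# Carry the last observed value forward into each gap
filled_locf <- na.locf(tt)

# Interpolate linearly between the neighbouring observed values
filled_approx <- na.approx(tt)

# Compare the original series with both filled versions
cbind(original = tt, locf = filled_locf, approx = filled_approx)
```

Which function is appropriate depends on the data: carrying values forward keeps only observed values, while interpolation introduces new values between them.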
Note: the data in the question is not reproducible because random numbers are used without set.seed and n_dates is undefined. Below we define a data frame DF reproducibly for purposes of example.
library(zoo)
dates <- as.Date(as.yearmon("2000-01") + c(0, 3, 4, 6, 7)/12)
DF <- data.frame(dates, values = seq_along(dates))
giving:
> DF
dates values
1 2000-01-01 1
2 2000-04-01 2
3 2000-05-01 3
4 2000-07-01 4
5 2000-08-01 5