Creating a ts time series with missing values from a data frame

I have a data frame containing a time series of monthly data, with some missing values.

dates <- seq(
  as.Date("2010-01-01"), as.Date("2017-12-01"), "1 month"
)
n_dates <- length(dates)
# Randomly keep about half of the months to simulate missing data
dates <- dates[runif(n_dates) < 0.5]
time_data <- data.frame(
  date = dates,
  value = rnorm(length(dates))
)
##          date      value
## 1  2010-02-01  1.3625419
## 2  2010-06-01  0.1512481
## etc.

To make use of time series forecasting functionality in, e.g., forecast, I'd like to convert this to a ts object.

The dumb way to do this is to create a regular set of monthly dates over the whole time period, then left join back to the original data.

library(dplyr)  # for left_join()

first_date <- min(time_data$date)
last_date <- max(time_data$date)
full_dates <- data.frame(
  date = seq(first_date, last_date, "1 month")
)
# Months absent from time_data get NA in the value column
extended_time_data <- left_join(full_dates, time_data, by = "date")
##          date      value
## 1  2010-02-01  1.3625419
## 2  2010-03-01         NA
## etc.

Now I can create the time series using ts().

library(lubridate)  # for year() and month()

time_series <- ts(
  extended_time_data$value,
  start = c(year(first_date), month(first_date)),
  frequency = 12
)

For such a simple task, this is long-winded and pretty gross.
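For comparison, the same regular-grid idea can be sketched in base R alone, using match() in place of the join. This is only a minimal sketch: the three-row time_data below is made-up example data, and it assumes every date falls on the first of the month.

```r
# Hypothetical example data: monthly observations with gaps.
time_data <- data.frame(
  date  = as.Date(c("2010-02-01", "2010-06-01", "2010-07-01")),
  value = c(1.5, 0.2, -0.7)
)

# Regular monthly grid spanning the observed period.
full_dates <- seq(min(time_data$date), max(time_data$date), by = "1 month")

# match() returns NA for grid months that are absent from the data,
# so indexing with it leaves NA in the gaps.
values <- time_data$value[match(full_dates, time_data$date)]

# Extract year and month of the first date without extra packages.
first <- as.POSIXlt(full_dates[1])
time_series <- ts(
  values,
  start = c(first$year + 1900, first$mon + 1),
  frequency = 12
)
time_series
```

This avoids the dplyr and lubridate dependencies, but it is still the same pad-then-convert approach.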

I also looked into first converting to xts, and using a converter from the timetk package, but nothing jumped out at me as an easier way.

This question is a dupe of How to create time series with missing datetime values, but the answer there was even fuzzier.

How do I create a ts object from a time series with missing values?

1 Answer

Using the input data frame defined in the Note at the end, convert it to a zoo object with index of class yearmon. Then as.ts will convert it to ts.


library(zoo)

z <- read.zoo(DF, FUN = as.yearmon)
as.ts(z)
##      Jan Feb Mar Apr May Jun Jul Aug
## 2000   1  NA  NA   2   3  NA   4   5
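The conversion works because a yearmon index sits on a monthly grid, so as.ts can pad the unobserved months with NA. The sketch below rebuilds the example data frame so that it runs on its own, then inspects the result; the name tsobj is introduced here purely for illustration.

```r
library(zoo)

# Example data: five observed months between Jan and Aug 2000.
dates <- as.Date(as.yearmon("2000-01") + c(0, 3, 4, 6, 7) / 12)
DF <- data.frame(dates, values = seq_along(dates))

# The zoo object holds only the observed months.
z <- read.zoo(DF, FUN = as.yearmon)
length(z)

# Non-strict regularity: gaps are whole multiples of one month.
is.regular(z)

# as.ts() expands to the full monthly grid and fills the gaps with NA.
tsobj <- as.ts(z)
frequency(tsobj)
length(tsobj)
sum(is.na(tsobj))
```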

If you prefer to express it in terms of pipes:


library(magrittr)  # for %>%

DF %>% read.zoo(FUN = as.yearmon) %>% as.ts

If desired, fill in the missing values in the time series using na.locf (last observation carried forward), na.approx (linear interpolation), na.spline (spline interpolation), na.StructTS (seasonal Kalman filter) or another zoo NA-filling function. For example:


library(forecast)

DF %>% read.zoo(FUN = as.yearmon) %>% as.ts %>% na.spline %>% forecast
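To see how two of these fillers differ, here is a small sketch applied to the same eight-month series. The series is typed in directly so the block runs on its own, and the names tsobj, zr, locf and lin are illustrative only. The ts is converted to zoo before filling, which is an extra step taken here for explicitness rather than something the pipeline above requires.

```r
library(zoo)

# Jan-Aug 2000 with three missing months, as in the example output.
tsobj <- ts(c(1, NA, NA, 2, 3, NA, 4, 5), start = c(2000, 1), frequency = 12)

# Work on a zoo copy so the zoo methods of the fillers are used.
zr <- as.zoo(tsobj)

# Carry the last observed value forward into each gap.
locf <- as.ts(na.locf(zr))

# Interpolate linearly between the neighbouring observations.
lin <- as.ts(na.approx(zr))

cbind(original = tsobj, locf = locf, linear = lin)
```

Which filler is appropriate depends on the data; carrying values forward suits step-like series, while interpolation suits smoothly varying ones.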


Note

The data in the question is not reproducible because random numbers are used without set.seed. Below we define a data frame DF reproducibly for the purposes of the example.


library(zoo)  # for as.yearmon()

dates <- as.Date(as.yearmon("2000-01") + c(0, 3, 4, 6, 7)/12)
DF <- data.frame(dates, values = seq_along(dates))


> DF
       dates values
1 2000-01-01      1
2 2000-04-01      2
3 2000-05-01      3
4 2000-07-01      4
5 2000-08-01      5
