
R ts with missing values

Tags: r, time-series

I have a data frame, read from a CSV file, with daily observations:

Date        Value 
2010-01-04  23.4
2010-01-05  12.7
2010-01-06  20.1
2010-01-07  18.2

PROBLEM: Missing data. The forecast package expects a plain ts object without any missing values, but my dataset is missing data on most weekends and at other random points.

So simply converting to ts, as below, won't work:

ts(values, start = c(1997, 1), frequency = 1)

The only solution I can think of is to aggregate the daily data into weekly data, but I'm new to R and there may be better solutions.
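For comparison, the weekly-aggregation idea can be sketched in base R. This is a minimal illustration on hypothetical toy data shaped like the data frame above (columns Date and Value); the names weekStart and weeklyTs and the choice of frequency = 52 are assumptions. Each date is mapped to the Monday of its week and the available values are averaged per week, so missing weekends simply contribute no rows.

```r
# Hypothetical toy data: four weeks of weekday-only observations (weekends missing)
days <- seq(as.Date("2010-01-04"), by = "day", length.out = 28)
days <- days[!(format(days, "%u") %in% c("6", "7"))]  # %u: 1 = Monday ... 7 = Sunday
values <- data.frame(Date = days, Value = seq_along(days))

# Map every date to the Monday that starts its week
weekStart <- values$Date - (as.integer(format(values$Date, "%u")) - 1)

# Average the observations that exist within each week
weekly <- aggregate(values$Value, by = list(weekStart = weekStart), FUN = mean)

# A regular weekly series with no gaps (assumes 52 weeks per year)
weeklyTs <- ts(weekly$x, frequency = 52)
print(weeklyTs)
```

Averaging is only one possible summary; sum or median may suit other data, and a week with few observations gets the same weight as a full week.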

asked Dec 08 '14 22:12 by Gavello



2 Answers

One option is to expand your date index to include the missing observations, and use na.approx from zoo to fill in the missing values via interpolation.

allDates <- seq.Date(
  min(values$Date),
  max(values$Date),
  "day")
##
allValues <- merge(
  x=data.frame(Date=allDates),
  y=values,
  all.x=TRUE)
R> head(allValues,7)
        Date      Value
1 2010-01-05 -0.6041787
2 2010-01-06  0.2274668
3 2010-01-07 -1.2751761
4 2010-01-08 -0.8696818
5 2010-01-09         NA
6 2010-01-10         NA
7 2010-01-11 -0.3486378
##
zooValues <- zoo(allValues$Value,allValues$Date)
R> head(zooValues,7)
2010-01-05 2010-01-06 2010-01-07 2010-01-08 2010-01-09 2010-01-10 2010-01-11 
-0.6041787  0.2274668 -1.2751761 -0.8696818         NA         NA -0.3486378 
##
approxValues <- na.approx(zooValues)
R> head(approxValues,7)
2010-01-05 2010-01-06 2010-01-07 2010-01-08 2010-01-09 2010-01-10 2010-01-11 
-0.6041787  0.2274668 -1.2751761 -0.8696818 -0.6960005 -0.5223192 -0.3486378

Even with missing values, zooValues is still a valid zoo object; for example, plot(zooValues) works (with gaps at the missing values). If you plan to fit some sort of model to the data, however, you will most likely be better off using na.approx to replace the missing values.
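If you would rather not depend on zoo, the same linear interpolation over the date axis can be sketched with base R's approx(), after which the filled vector can be turned into the plain ts object that the forecast package expects. This is a minimal sketch on hypothetical toy data; frequency = 7 (a weekly cycle in daily data) is an assumption, not something the question specifies.

```r
# Hypothetical daily grid: 2010-01-09 and 2010-01-10 (a weekend) have no observations
allDates <- seq(as.Date("2010-01-04"), as.Date("2010-01-11"), by = "day")
value <- c(1, 2, 3, 4, 5, NA, NA, 8)

# Linear interpolation using the dates as x; approx() ignores the NA pairs
# and evaluates the interpolating line at every date in xout
filled <- approx(x = as.numeric(allDates), y = value,
                 xout = as.numeric(allDates))$y

# Plain ts object with no missing values; frequency = 7 assumes a weekly cycle
dailyTs <- ts(filled, frequency = 7)
print(dailyTs)
```

Like na.approx, this fills gaps with straight lines and ignores any weekly pattern in the data.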

Data:

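library(zoo)
##
t0 <- as.Date("2010-01-04")
Dates <- t0 + 1:120
## format(..., "%u") gives the weekday as 1 (Monday) to 7 (Sunday) in any locale,
## whereas weekdays() returns locale-dependent names such as "Saturday"
weekDays <- Dates[!(format(Dates, "%u") %in% c("6", "7"))]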
##
set.seed(123)
values <- data.frame(Date=weekDays,Value=rnorm(length(weekDays)))
answered Oct 19 '22 21:10 by nrussell


You can use the imputeTS, zoo, or forecast packages, all of which offer methods to fill in missing data (a process also called imputation).

imputeTS

na_interpolation(yourdata)
na_seadec(yourdata)
na_kalman(yourdata)
na_ma(yourdata)

zoo

na.approx(yourdata)
na.locf(yourdata)
na.StructTS(yourdata)

forecast

na.interp(yourdata)

These are just some of the functions these packages offer.
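To make the idea behind last observation carried forward (what zoo's na.locf does) concrete, here is a minimal base-R sketch. The helper locf is hypothetical and not part of any package; unlike na.locf's default (na.rm = TRUE), it keeps leading NAs instead of dropping them.

```r
# Hypothetical helper: replace each NA with the most recent non-NA value
locf <- function(x) {
  # position of each observed value, 0 where the value is missing
  idx <- ifelse(is.na(x), 0L, seq_along(x))
  # running maximum gives, at every position, the last observed position so far
  idx <- cummax(idx)
  # positions before the first observation have nothing to carry forward
  idx[idx == 0L] <- NA_integer_
  x[idx]
}

print(locf(c(NA, 1, NA, NA, 4, NA)))
```

LOCF suits series that stay flat until the next recorded change; for smoothly varying data, interpolation (na.approx, na_interpolation) is usually a closer fit.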

answered Oct 19 '22 21:10 by Steffen Moritz