I am using R for some statistical analysis of time series. I have tried Googling around, but I can't seem to find any definitive answers. Can anyone who knows more please point me in the right direction?
Example:
Let's say I want to do a linear regression of two time series. The time series contain daily data, but there might be gaps here and there so the time series are not regular. Naturally I only want to compare data points where both time series have data. This is what I do currently to read the csv files into a data frame:
library(zoo)
apples <- read.csv('/Data/apples.csv', as.is=TRUE)
oranges <- read.csv('/Data/oranges.csv', as.is=TRUE)
apples$date <- as.Date(apples$date, "%d/%m/%Y")
oranges$date <- as.Date(oranges$date, "%d/%m/%Y")
zapples <- zoo(apples$close,apples$date)
zoranges <- zoo(oranges$close,oranges$date)
zdata <- merge(zapples, zoranges, all=FALSE)
data <- as.data.frame(zdata)
Is there a slicker way of doing this?
Also, how can I slice the data, e.g., select the entries in data with dates within a certain period?
AutoRegressive Integrated Moving Average (ARIMA) models are among the most widely used time series forecasting techniques. In an autoregressive model, each forecast is a linear combination of past values of the variable.
Time Series Analysis Models and Techniques: Box-Jenkins ARIMA models are univariate models used to better understand a single time-dependent variable, such as temperature over time, and to predict future data points of that variable. These models assume that the data is stationary, or can be made stationary by differencing.
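As a rough illustration of the autoregressive idea, the sketch below simulates an AR(1) series and fits it with arima() from base R's stats package. The simulated data, the coefficient 0.7, the seed, and the variable names are all assumptions made for the example, not part of the question's data.

```r
# Simulate a stationary AR(1) series (coefficient 0.7 is an arbitrary choice)
set.seed(42)
x <- arima.sim(model = list(ar = 0.7), n = 200)

# Fit an ARIMA(1, 0, 0) model, i.e. a pure autoregressive model of order 1
fit <- arima(x, order = c(1, 0, 0))

# Forecast the next 5 points; each forecast is built from past values
fc <- predict(fit, n.ahead = 5)
print(coef(fit))
print(fc$pred)
```

The order argument is c(p, d, q); setting d above 0 would difference the series first, which is how ARIMA copes with data that is not stationary to begin with.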
R has many useful functions and packages for time series analysis. You'll find pointers to them in the task view for Time Series Analysis.
You can use the SMA() function from the TTR package to smooth time series data. SMA() takes the order (span) of the simple moving average in its parameter n. For example, to calculate a simple moving average of order 5, set n = 5 in the call to SMA().
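A minimal sketch of that call, assuming the TTR package is installed; the input vector is made up for illustration.

```r
library(TTR)

# Toy series; in practice this would be the observed time series
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Simple moving average of order 5: each value is the mean of the
# current point and the 4 points before it
smoothed <- SMA(x, n = 5)
print(smoothed)
```

The first n - 1 entries of the result are NA because a full window of 5 observations is not yet available.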
Try something along these lines. This assumes that the dates are in column 1. The dyn package can be used to transform lm, glm and many similar regression-type functions to ones that accept zoo series. Write dyn$lm in place of lm as shown:
library(dyn) # also loads zoo
fmt <- "%d/%m/%Y"
zapples <- read.zoo('apples.csv', header = TRUE, sep = ",", format = fmt)
zoranges <- read.zoo('oranges.csv', header = TRUE, sep = ",", format = fmt)
zdata <- merge(zapples, zoranges)
dyn$lm(..whatever.., zdata)
You don't need all = FALSE since lm will ignore rows with NAs under the default setting of its na.action argument.
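To see the NA handling in action, here is a small sketch with made-up series that have gaps on different days. It uses plain lm on a data frame rather than dyn$lm so that it only needs zoo; the dates and values are invented for the example.

```r
library(zoo)

d <- as.Date("2024-01-01") + 0:5

# Two toy daily series with different missing days
zapples  <- zoo(c(10, 11, 12, 13, 14), d[c(1, 2, 3, 4, 6)])
zoranges <- zoo(c(20, 22, 23, 27),     d[c(1, 3, 4, 5)])

# Default merge is an outer join: every date appears, gaps become NA
zdata <- merge(zapples, zoranges)
print(nrow(zdata))

# lm drops incomplete rows under the default na.action (na.omit)
fit <- lm(zapples ~ zoranges, data = as.data.frame(zdata))
print(nobs(fit))
```

Only the dates present in both series contribute to the fit, which is the same set of rows that merge(..., all = FALSE) would have kept.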
The window.zoo function can be used to slice data.
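For instance, a minimal sketch of selecting a date range with window(), which dispatches to window.zoo for zoo objects; the series and the dates are made up for illustration.

```r
library(zoo)

# Toy daily series covering 1 to 10 January 2024
z <- zoo(1:10, as.Date("2024-01-01") + 0:9)

# Keep only observations between the two dates (both ends inclusive)
sub <- window(z, start = as.Date("2024-01-03"), end = as.Date("2024-01-06"))
print(sub)
```

The same call works on a merged multi-column zoo object such as zdata, returning all columns for the selected dates.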
Depending on what you want to do you might also want to look at the xts and quantmod packages.