Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the computational complexity of arima() function in R?

If my time series has n members and I want to fit ARIMA(1,0,1) model, what will be the complexity in big O notation?

In the following example I need to know the complexity of second line of code:

series <- arima.sim(list(order=c(1,0,1), ar=.9, ma=-.5), n=1000)
result <- arima(series, order=c(1,0,1))

Thanks.

like image 823
Andriy Kozachuk Avatar asked Feb 07 '23 20:02

Andriy Kozachuk


1 Answers

It is O(n) complexity. A story for this? See below.

As I said in the comment, we can measure it by a regression model. As a toy demonstration, consider the following procedure for data collection and regression.

We first define a function to measure model fitting time of an ARMA(1,1) model (or ARIMA(1,0,1)). (Note that I am using the basic system.time() here. You may consider using microbenchmark() from package microbenchmark. But in the following, I will use reasonably large n, to reduce the sensitivity in time measuring.)

t_arma <- function(N) {
  series <- arima.sim(list(order=c(1,0,1), ar=.9, ma=-.5), n=N)
  as.numeric((system.time(arima(series, order=c(1,0,1)))))[3]
  }

We need to collect, say 100, data. We try 100 increasingly large n, and measure the model fitting time t.

k <- 100; t <- numeric(k)
n <- seq(20000, by = 1000, length = k)  ## start from n = 20000 (big enough)
for (i in 1:k) t[i] <- t_arma(n[i])

Now, if we assume the complexity is: a * (n ^ b), or O(n ^ b), we can estimate a, b by a regression model:

log(t) ~ log(a) + b * log(n)

and we are particularly interested in the slop estimate: b.

So let's call lm()

fit <- lm(log(t) ~ log(n))

#Coefficients:
#(Intercept)       log(n)  
#    -9.2185       0.8646  

We can also sketch the scatter plot of log(n) v.s. log(t), as well as the fitted line:

plot(log(n), log(t), main = "scatter plot")
lines(log(n), fit$fitted, col = 2, lwd = 2)

fitted model

There are some outliers in the beginning, hence the slope is lower than it should be. We now consider removing outliers and refit model for robustness.

Outliers are featured with large residuals. Let's have a look:

plot(fit$resi, main = "residuals")

residuals

We can mark and remove those outliers. Looks like 0.5 is a good enough threshold to filter those outliers.

exclude <- fit$resi > 0.5
n <- n[!exclude]
t <- t[!exclude]

Now we refit the linear model and make plot:

robust_fit <- lm(log(t) ~ log(n))

#Coefficients:
#(Intercept)       log(n)  
#    -11.197        1.039  

plot(log(n), log(t), main = "scatter plot (outliers removed)")
lines(log(n), robust_fit$fitted, col = 2, lwd = 2)

enter image description here

Wow, great, we are golden! The slope estimate is 1. So the O(n) complexity is justified!

like image 149
Zheyuan Li Avatar answered Mar 07 '23 23:03

Zheyuan Li