 

Error in using the predict() function

Tags:

r

predict

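The code below works when ar.ols is given a matrix, but predict() fails when the same data is passed as a data frame. Shall I convert the output of ar.ols to some type that predict can accept?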

y <- rnorm(100, 0, 1)
z <- rnorm(100, 0, 1)
yz <- cbind(y, z)
output <- ar.ols(yz, aic = FALSE, order.max = 2, demean = FALSE, intercept = TRUE)
predict(output, n.ahead = 2, se.fit = FALSE)   # works with a matrix

x <- as.data.frame(yz)   # x is a data frame, otherwise the same as yz
output <- ar.ols(x, aic = FALSE, order.max = 2, demean = FALSE, intercept = TRUE)
predict(output, n.ahead = 2, se.fit = FALSE)
# Error in array(STATS, dims[perm]) : 'dims' cannot be of length 0

Thanks!

Tim asked Apr 24 '14 14:04


1 Answer

So the error is coming from predict.ar. If you run ?predict, you'll see that predict is a generic function which "invokes particular methods which depend on the class of the first argument".

So

class(output)
[1] "ar"

and

methods(predict)
#  [1] predict.ar*                predict.Arima*             predict.arima0*            predict.glm                predict.HoltWinters*      
#  [6] predict.lm                 predict.loess*             predict.mlm*               predict.nls*               predict.poly*             
# [11] predict.ppr*               predict.prcomp*            predict.princomp*          predict.smooth.spline*     predict.smooth.spline.fit*
# [16] predict.StructTS*

#    Non-visible functions are asterisked       

tells you that the method we are looking for is the first one, predict.ar.
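To make the dispatch step concrete, here is a small sketch. The generic describe() and its methods are hypothetical names made up for illustration only; utils::getS3method() is a real way to retrieve a non-visible method such as predict.ar.

```r
# Hypothetical toy generic, for illustration only: UseMethod() picks the
# method whose suffix matches the class of the first argument.
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "default method"
describe.ar <- function(obj, ...) "ar method"

fake_ar <- structure(list(), class = "ar")  # an object that merely claims class "ar"
describe(fake_ar)  # dispatches to describe.ar
describe(1:3)      # no matching method, falls back to describe.default

# The real predict() method for "ar" objects is non-visible in stats,
# but getS3method() can still fetch it for inspection.
predict_ar <- utils::getS3method("predict", "ar")
is.function(predict_ar)
```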

The next step is to look for the error message within that method. The output above told us that predict.ar is a non-visible function, so we need to combine getAnywhere, capture.output and grep to search its source for the error message. Unfortunately, that doesn't find it:

grep("array", capture.output(getAnywhere("predict.ar")))
## integer(0)

That means that the error is coming from some other function which runs within predict.ar.
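Searching the source of predict.ar for the calls it makes is one way to list candidate inner functions. This sketch swaps in deparse() on the method (rather than capture.output(getAnywhere(...))) and looks for sweep, the function that turns up in the traceback.

```r
# Fetch the non-visible method and turn it into lines of text
src <- deparse(utils::getS3method("predict", "ar"))

# Lines of predict.ar that call sweep()
hits <- grep("sweep(", src, fixed = TRUE)
src[hits]
```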

As @hadley mentions, we need to use traceback() to identify the inner function that is causing it:

predict(output, n.ahead = 2, se.fit = F)
# Error in array(STATS, dims[perm]) : 'dims' cannot be of length 0
traceback()
# 6: array(STATS, dims[perm])
# 5: aperm(array(STATS, dims[perm]), order(perm))
# 4: sweep(newdata, 2L, object$x.mean, check.margin = FALSE)
# 3: rbind(sweep(newdata, 2L, object$x.mean, check.margin = FALSE), 
#          matrix(rep.int(0, nser), n.ahead, nser, byrow = TRUE))
# 2: predict.ar(output, n.ahead = 2, se.fit = F)
# 1: predict(output, n.ahead = 2, se.fit = F)

This nicely shows the workflow of our function call: predict is called -> the class of the object is identified and the corresponding method, predict.ar, is called -> predict.ar rbinds the mean-centered data (centered using sweep) with a pre-allocated matrix of zeros (n.ahead rows by ncol(x) columns) -> while mean-centering the data, sweep creates an array and permutes its dimensions with aperm, and it is the array() call that returns the error.
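traceback() is meant for interactive use after an error has reached the top level. As a sketch of getting the same call chain from inside a script, one option is to record sys.calls() from a calling handler at the moment the error is signalled. The helpers run() and strip_and_center() are hypothetical stand-ins for predict() and predict.ar; strip_and_center() drops the class before calling sweep(), mirroring the step discussed further down.

```r
# Hypothetical stand-ins for predict() -> predict.ar() -> sweep()
strip_and_center <- function(d) {
  class(d) <- NULL                             # a data frame becomes a bare list
  sweep(d, 2L, c(0, 0), check.margin = FALSE)  # subtract zero means, like demean = FALSE
}
run <- function(d) strip_and_center(d)

calls <- NULL
try(
  withCallingHandlers(
    run(data.frame(y = 1:3, z = 4:6)),
    # calling handlers run on top of the failing stack, so sys.calls() still sees it
    error = function(e) calls <<- sys.calls()
  ),
  silent = TRUE
)

# Name of the function in each recorded frame, outermost first
fn_names <- vapply(as.list(calls), function(cl) deparse(cl[[1L]])[1L], character(1))
fn_names
```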

So basically all sweep does here is subtract the column means of the data from the data (mean-centering, which could also be done by simply running scale(yz, scale = FALSE), so I'm not sure why sweep is used in the first place; maybe for the demean = FALSE special case?). In your case you specified demean = FALSE, so sweep just subtracts zeros from both columns (which is quite an unnecessary operation and probably should have been avoided in that case). Compare

all.equal(t(t(yz) - colMeans(yz)), sweep(yz, 2L, colMeans(yz)))
## [1] TRUE
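A quick check of both claims, as a sketch: sweep() and scale(..., scale = FALSE) give the same centered values (scale() additionally attaches a "scaled:center" attribute, hence check.attributes = FALSE), and with demean = FALSE the means that ar.ols stores in x.mean, which predict.ar passes to sweep(), are all zero.

```r
set.seed(1)
yz <- cbind(y = rnorm(100), z = rnorm(100))

# Two ways to mean-center the columns of a matrix
centered_sweep <- sweep(yz, 2L, colMeans(yz))
centered_scale <- scale(yz, center = TRUE, scale = FALSE)
all.equal(centered_sweep, centered_scale, check.attributes = FALSE)

# With demean = FALSE the stored means are zeros, so sweep() subtracts nothing
fit <- ar.ols(yz, aic = FALSE, order.max = 2, demean = FALSE, intercept = TRUE)
fit$x.mean
```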

The only problem is that sweep operates on arrays: before subtracting, it builds an array out of the statistics, taking the correct dimensions from the dim attribute of your data, namely something like

dims <- dim(yz)
perm <- c(2L, seq_along(dims)[-2L])
array(colMeans(yz), dims[perm])

That works fine for matrices, because all matrices have a dim attribute by definition.
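As a sketch of what sweep(m, 2L, stats) does with those dims and perm values (the matrix m is just toy data): it expands the statistics into an array shaped like the data, permutes it back with aperm(), and subtracts.

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)  # toy 3 x 2 matrix
stats <- colMeans(m)                        # one statistic per column: 2 and 5

dims <- dim(m)                              # 3 2: always present on a matrix
perm <- c(2L, seq_along(dims)[-2L])         # 2 1: swept margin first
expanded <- aperm(array(stats, dims[perm]), order(perm))  # 3 x 2, every row equals stats

manual <- m - expanded
all.equal(manual, sweep(m, 2L, stats))
```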

Although data.frames don't have a dim attribute, dim() is still smart enough to compute the dimensions of a data.frame itself, so this works perfectly fine

dim(x)
## [1] 100   2
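To see that the dimensions of a data frame are computed rather than stored, compare the raw attribute with what dim() returns. This relies on base R's dim() method for data frames, dim.data.frame.

```r
x <- data.frame(y = rnorm(5), z = rnorm(5))

attr(x, "dim")            # NULL: no dim attribute is stored on a data frame
dim(x)                    # 5 2: computed by the data.frame method
exists("dim.data.frame")  # the method that does the computing
```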

The only problem is that predict.ar strips the class attribute from x somewhere before it reaches sweep, and this is where the difference between a matrix and a data.frame becomes significant:

class(x) <- NULL
dim(x)
## NULL
class(x)
## [1] "list"

While

class(yz) <- NULL
dim(yz)
## [1] 100   2
class(yz)
## [1] "matrix"
## (in R >= 4.0.0 this prints "matrix" "array", since matrices now also inherit from "array")

Notice that x has just become a plain list of its column vectors (keeping only its names and row.names attributes), while the matrix kept its original structure thanks to its dim attribute, so class can still identify it as a matrix. Without the data.frame class, dim has nothing left to compute the dimensions from, so it returns NULL for x.
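A compact way to inspect what is left once the class is gone is unclass(), which returns a copy without the class attribute (the same effect as class(x) <- NULL). A sketch:

```r
x <- data.frame(y = 1:3, z = 4:6)
m <- as.matrix(x)

bare_df <- unclass(x)
names(attributes(bare_df))  # only "names" and "row.names" remain, no "dim"
dim(bare_df)                # NULL: a plain list, no data.frame method to compute it

bare_m <- unclass(m)
dim(bare_m)                 # 3 2: the dim attribute survives, so it is still a matrix
```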

If you want to know how class works and what really happened, see my answer here.

Anyway, while this still works

STATS <- colMeans(yz)
class(yz) <- NULL
dims <- dim(yz)
perm <- c(2L, seq_along(dims)[-2L])
array(STATS, dims[perm])

This now returns the error you saw before:

x <- as.data.frame(yz)
STATS <- colMeans(x)
class(x) <- NULL
dims <- dim(x)
perm <- c(2L, seq_along(dims)[-2L])
array(STATS, dims[perm])
# Error in array(STATS, dims[perm]) : 'dims' cannot be of length 0

I'll leave you the pleasure of digging further down the rabbit hole to better understand how dim works.
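As a sketch of why the type of the input matters: the same sweep() call that fails on a class-stripped data frame succeeds on a class-stripped matrix holding identical values. tryCatch() is used only so the error message can be captured instead of stopping the script.

```r
x <- data.frame(y = rnorm(10), z = rnorm(10))
stats <- colMeans(x)

# Data frame without its class: a bare list with no dim, so sweep() fails
msg <- tryCatch(
  sweep(unclass(x), 2L, stats, check.margin = FALSE),
  error = function(e) conditionMessage(e)
)
msg

# Matrix without its class attribute: dim is kept, so sweep() works
centered <- sweep(unclass(as.matrix(x)), 2L, stats, check.margin = FALSE)
colMeans(centered)  # numerically zero
```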


So, to conclude (as @joran mentioned in the comments): please always start by reading the documentation. If you look closely at ?ar.ols, x is supposed to be "A univariate or multivariate time series". In the examples, x is always an object of class ts and never a data.frame.

So while I agree that, for this specific case where you specified demean = FALSE, this error shouldn't have occurred in the first place, it is still better practice to know what you are doing. In other words, this is a classic XY problem: the question asks how to convert the output of ar.ols, while the real issue is the class of its input.
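Following that advice, a minimal sketch of the fix is to convert the data frame to a multivariate time series with ts() before fitting. The variable name x_ts is chosen for this example only.

```r
set.seed(42)
x <- data.frame(y = rnorm(100), z = rnorm(100))

# ?ar.ols expects "a univariate or multivariate time series", so convert first
x_ts <- ts(x)
output <- ar.ols(x_ts, aic = FALSE, order.max = 2, demean = FALSE, intercept = TRUE)

# Two-step-ahead forecasts: one row per step, one column per series
pred <- predict(output, n.ahead = 2, se.fit = FALSE)
pred
```

Converting with as.matrix(x) instead of ts(x) should work the same way, since a matrix keeps its dim attribute after its class is stripped.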

David Arenburg answered Sep 29 '22 15:09