Forecasting model predict one day ahead - sliding window

I'm struggling with a problem. I'm using SparkR for time series forecasting, but this scenario can also be transferred to a normal R environment. Instead of using an ARIMA model, I want to use regression models such as Random Forest Regression to forecast the load one day ahead. I also read about the sliding window approach for evaluating the performance of different regressors with respect to different parameter combinations. To give a better understanding, this is an example of the structure of my dataset:

Timestamp              UsageCPU     UsageMemory   Indicator  Delay
2014-01-03 21:50:00    3123            1231          1        123
2014-01-03 22:00:00    5123            2355          1        322
2014-01-03 22:10:00    3121            1233          2        321
2014-01-03 22:20:00    2111            1234          2        211
2014-01-03 22:30:00    1000            2222          2         0 
2014-01-03 22:40:00    4754            1599          1         0

To use any kind of regressor, the next step is to extract features and transform them into a readable format, because these regressors cannot read timestamps:

Year   Month  Day  Hour    Minute    UsageCPU   UsageMemory  Indicator Delay
2014   1      3    21       50        3123        1231          1      123
2014   1      3    22       00        5123        2355          1      322
2014   1      3    22       10        3121        1233          2      321
2014   1      3    22       20        2111        1234          2      211
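As a rough illustration of this step in plain R (not SparkR), the timestamp can be parsed once and split into its components. The data frame `df` and its two sample rows are assumptions copied from the table above; in SparkR the column functions `year()`, `month()`, `dayofmonth()`, `hour()` and `minute()` should play the same role, but that is not shown here.

```r
# Hypothetical sample mirroring the first two rows of the table above.
df <- data.frame(
  Timestamp = c("2014-01-03 21:50:00", "2014-01-03 22:00:00"),
  UsageCPU  = c(3123, 5123),
  stringsAsFactors = FALSE
)

# Parse with an explicit time zone so the result does not depend on the machine.
ts <- as.POSIXlt(df$Timestamp, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# POSIXlt stores years since 1900 and months as 0-11, hence the offsets.
features <- data.frame(
  Year     = ts$year + 1900,
  Month    = ts$mon + 1,
  Day      = ts$mday,
  Hour     = ts$hour,
  Minute   = ts$min,
  UsageCPU = df$UsageCPU
)
print(features)
```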

The next step is to create training and test sets for the model.

trainTest <- randomSplit(SparkDF, c(0.7, 0.3), seed = 42)
train <- trainTest[[1]]
test <- trainTest[[2]]

Then it is possible to create the model and the predictions (the randomForest settings are not relevant for now):

model <- spark.randomForest(train, UsageCPU ~ ., type = "regression", maxDepth = 5, maxBins = 16)
predictions <- predict(model, test)

So I know all these steps, and when I plot the predicted data against the actual data it looks quite good. But this regression model is not dynamic, which means I cannot predict one day ahead: features such as UsageCPU, UsageMemory etc. do not exist yet for the next day, so I want to predict the next day from historical values. As mentioned in the beginning, the sliding window approach could work here, but I'm not sure how to apply it (on the whole dataset, or only on the training or test set).
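One common way to make the predictors available at prediction time is to replace same-time features with lagged copies that are at least one forecast horizon old. The sketch below is plain R and only illustrative: `make_lagged`, its arguments and the toy horizon of 2 steps are assumptions, not part of SparkR or of the original setup.

```r
# Build a table in which every predictor is at least `horizon` steps older
# than the target, so it is already known when the forecast is made.
make_lagged <- function(y, lags, horizon) {
  n <- length(y)
  max_shift <- horizon + max(lags) - 1
  stopifnot(n > max_shift)
  rows <- (max_shift + 1):n              # rows that have a complete history
  out <- data.frame(target = y[rows])
  for (l in lags) {
    shift <- horizon + l - 1
    out[[paste0("lag_", shift)]] <- y[rows - shift]
  }
  out
}

# UsageCPU values from the example table.
cpu <- c(3123, 5123, 3121, 2111, 1000, 4754)

# Toy setting: horizon of 2 steps and 2 lags.
lagged <- make_lagged(cpu, lags = 1:2, horizon = 2)
print(lagged)
```

With 10-minute data and a one-day horizon, `horizon` would be 144, so the newest predictor for each target row is the value observed exactly one day earlier.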

This implementation is from shabbychef and mbq:

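slideMean <- function(x, windowsize = 3, slide = 2) {
  # Start index of each window, advancing by `slide`
  idx1 <- seq(1, length(x), by = slide)
  # One past the end of each window, capped at length(x) + 1
  idx2 <- idx1 + windowsize
  idx2[idx2 > (length(x) + 1)] <- length(x) + 1
  # Prefix sums with a leading 0, so cx[j] - cx[i] is sum(x[i:(j - 1)])
  cx <- c(0, cumsum(x))
  # Truncated windows at the end are still divided by windowsize
  (cx[idx2] - cx[idx1]) / windowsize
}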
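To see what this function returns, here is a small worked call on the vector 1..6. The function is repeated so the snippet runs on its own; the input values are made up for illustration.

```r
slideMean <- function(x, windowsize = 3, slide = 2) {
  idx1 <- seq(1, length(x), by = slide)
  idx2 <- idx1 + windowsize
  idx2[idx2 > (length(x) + 1)] <- length(x) + 1
  cx <- c(0, cumsum(x))
  (cx[idx2] - cx[idx1]) / windowsize
}

res <- slideMean(1:6, windowsize = 3, slide = 2)
print(res)
# Windows: (1,2,3) -> 2, (3,4,5) -> 4, and a truncated last window (5,6) -> 11/3,
# because the sum of the two remaining values is still divided by windowsize.
```

Note that this computes window means of a single series (a smoothing step). It does not by itself produce the input/target pairs a regressor needs.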

The last question is about the window size. I want to predict the next day in hours (00, 01, 02, 03, ...), but the timestamps have an interval of 10 min, so by my calculation the size of a window should be 144 (24 * 60 / 10).
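The arithmetic can be checked directly, and if hourly forecasts are the goal, one option is to collapse the 10-minute readings into hourly means first, which shrinks a one-day window from 144 to 24 steps. The sketch below uses plain R; the `usage` vector is a made-up stand-in for one day of readings.

```r
interval_min   <- 10                      # sampling interval from the question
steps_per_hour <- 60 / interval_min       # 6 readings per hour
steps_per_day  <- 24 * steps_per_hour     # 144 readings per day

# Hypothetical one day of 10-minute readings (values 1..144 as stand-ins).
usage <- seq_len(steps_per_day)

# Collapse each block of 6 consecutive readings into one hourly mean.
hour_id <- rep(seq_len(24), each = steps_per_hour)
hourly  <- tapply(usage, hour_id, mean)

print(steps_per_day)
print(length(hourly))
```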

Would be so nice if someone can help me. Thanks!

Daniel asked May 02 '17 11:05


1 Answer

I had the same problem with time-series prediction using neural nets. I implemented many models, and the one that worked best was the sliding window combined with neural nets, which other researchers in the field also confirmed to me. From this we concluded that if you want to predict 1 day ahead (24 horizons) in a single step, training will be demanding for the system. We proceeded as follows:

1. We had a sliding window of 24 hours. As an illustration, let's use a window of three values here:
   x = [1,2,3]
2. Then use the ML model to predict the next value, i.e. use value 4 as the target:
   y = [4]
   We had a function that returns x = [1,2,3] and y = [4] and shifts the window in the next training step.
3. To x = [1,2,3] we can add further features that are important to the model:
   x = [1,2,3,feature_x]
4. Then we minimise the error and shift the window to get:
   x = [2,3,4,feature_x] and y = [5]
5. You could also predict two values ahead, e.g. y = [4,5].
6. Use a list to collect the outputs and plot them.
7. Make the prediction after the training.
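The windowing function described in these steps could look roughly like the following in base R. This is a minimal sketch, not the answerer's actual code: `make_windows` and `width` are illustrative names, and it relies on the base function `embed()`, which returns each window with the newest value first.

```r
# Turn a series into (x, y) pairs: `width` past values predict the next one.
make_windows <- function(series, width) {
  m <- embed(series, width + 1)            # each row: newest value first
  list(
    x = m[, (width + 1):2, drop = FALSE],  # reorder columns oldest -> newest
    y = m[, 1]                             # the value right after the window
  )
}

w <- make_windows(1:6, width = 3)
print(w$x)   # rows: 1 2 3 / 2 3 4 / 3 4 5
print(w$y)   # 4 5 6
```

Extra features (the `feature_x` of step 3) can then be attached with `cbind(w$x, feature_x)`, provided `feature_x` has one value per window row.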
smile answered Oct 05 '22 22:10