
Struggling to understand the parameters of the cross validation function in fbprophet library

Basically, I have 780 daily observations. I would like to train on 80% of the data and use the remaining 20% for cross validation. Therefore, I understand I should use:

df_cv = cross_validation(m, initial='624 days', horizon='156 days')

where initial corresponds to the number of observations at the start that I would like to train on, and horizon to the number of remaining observations I would like to use for cross validation.
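The 80/20 arithmetic can be sketched in Python. split_days is a hypothetical helper (not part of Prophet) that turns an observation count and a training fraction into the day-count strings passed as initial and horizon; it assumes exactly one observation per day with no gaps.

```python
from typing import Tuple


def split_days(n_obs: int, train_frac: float) -> Tuple[str, str]:
    """Hypothetical helper: return (initial, horizon) strings for a single
    train/validation split, assuming one observation per day."""
    n_train = round(n_obs * train_frac)  # size of the training window
    n_valid = n_obs - n_train            # days left over for validation
    return f"{n_train} days", f"{n_valid} days"


initial, horizon = split_days(780, 0.8)
print(initial, horizon)  # 624 days 156 days
```

These two values describe one train/validation split; how cross_validation actually uses them is what the answer addresses.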

I think I am not applying this correctly, since the output contains a cutoff date and I don't really understand what it is for.

How can I achieve my goal of using the first 80% of the observations to train the model and the last 20% for cross validation?

Thank you in advance

JamesHudson81 asked Dec 23 '22 17:12



1 Answer

The cutoff date determines, for each validation iteration, what is in your training data (everything up to the cutoff) and what is forecast (the days after the cutoff). If you want to use 80% of the data for training and still do cross validation, you can't set horizon to 20% of your data, because that leaves room for only one validation. You need a smaller horizon, since it determines how many days are forecast per iteration. For each iteration, Prophet forecasts the days between the cutoff and cutoff + horizon, and the next cutoff is period days later. Here is an example:

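780 days total dataset (as in the question)
initial = 624 == minimum size of the first training window
horizon = 20 == number of days forecast per iteration
period = 10 (default = 1/2 of horizon) == spacing between consecutive cutoff dates

1st Iteration: Train on 1-630, Forecast for 631-650
2nd Iteration: Train on 1-640, Forecast for 641-660
3rd Iteration: Train on 1-650, Forecast for 651-670
...
14th (last) Iteration: Train on 1-760, Forecast for 761-780

Training is an expanding window: each iteration trains on all data up to its cutoff. Prophet also anchors the cutoffs at the end of the data (last date minus horizon) and steps back by period while staying at least initial after the first date, so the first cutoff lands on day 630 rather than exactly on day 624.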

So the last ~20% of the data is still used for validation, spread over several shorter, overlapping forecast windows.
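To check where the cutoffs fall, here is a simplified, stdlib-only sketch of the cutoff placement. It assumes daily data with no gaps and is modeled on the cutoff helper in Prophet's diagnostics module as I understand it (start from the last date minus horizon, step back by period while the cutoff is still at least initial after the first date). The names sketch_cutoffs and folds and the start date are hypothetical; this is for reasoning about the folds, not a replacement for cross_validation.

```python
from datetime import date, timedelta
from typing import Iterator, List, Tuple


def sketch_cutoffs(start: date, end: date, initial: timedelta,
                   horizon: timedelta, period: timedelta) -> List[date]:
    """Simplified model of Prophet's cutoff placement (assumes no gaps).
    Returns cutoffs oldest first."""
    cutoff = end - horizon  # last cutoff leaves exactly one horizon to forecast
    if cutoff < start:
        raise ValueError("Less data than horizon.")
    cutoffs = [cutoff]
    # Step backwards by `period` while still past the initial training window.
    while cutoffs[-1] >= start + initial:
        cutoffs.append(cutoffs[-1] - period)
    cutoffs = cutoffs[:-1]  # the final step fell inside the initial window
    if not cutoffs:
        raise ValueError("Less data than horizon after initial window.")
    return list(reversed(cutoffs))


def folds(start: date, cutoffs: List[date],
          horizon_days: int) -> Iterator[Tuple[int, int, int]]:
    """Yield 1-based (train_end, forecast_start, forecast_end) day numbers.
    Training is an expanding window: always day 1 up to the cutoff."""
    for c in cutoffs:
        day = (c - start).days + 1
        yield day, day + 1, day + horizon_days


start = date(2020, 1, 1)            # hypothetical first date
end = start + timedelta(days=779)   # 780 daily observations
cuts = sketch_cutoffs(start, end, timedelta(days=624),
                      timedelta(days=20), timedelta(days=10))
for train_end, f_start, f_end in folds(start, cuts, 20):
    print(f"Train on 1-{train_end}, Forecast for {f_start}-{f_end}")
```

This prints 14 lines, from "Train on 1-630, Forecast for 631-650" to "Train on 1-760, Forecast for 761-780". With the question's horizon of 156 days there is room for at most one fold; in this sketch, initial of 624 days plus horizon of 156 days on 780 consecutive days is actually one day short (the first and last dates are only 779 days apart), so it raises ValueError. It is worth checking that edge case against your Prophet version.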

Donald S answered Dec 25 '22 07:12
