I noticed a relatively recent addition to the h2o.ai suite: the ability to perform supplementary Platt Scaling to improve the calibration of the output probabilities (see calibrate_model in the h2o manual). However, little guidance is available in the online help docs. In particular, I wonder whether, when Platt Scaling is enabled, the calibration_frame should be the same as the validation_frame or not (from both a computational and a theoretical point of view)?
Thanks in advance
Calibration is a post-processing step that runs after the model finishes training. Therefore it doesn't affect the leaderboard, and it has no effect on the training metrics either. It adds 2 more columns to the scored frame (with calibrated predictions).
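To make that concrete, here's a minimal sketch using the H2O-3 Python API. The dataset (H2O's public prostate demo data) and the split ratio are just placeholders I picked for illustration, not anything from the question:

```python
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()

# Assumed example data: H2O's public prostate dataset, binary target CAPSULE.
df = h2o.import_file(
    "https://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")
df["CAPSULE"] = df["CAPSULE"].asfactor()

# Hold out a small slice of rows to fit the Platt scaling model on.
train, calib = df.split_frame(ratios=[0.9], seed=42)

gbm = H2OGradientBoostingEstimator(
    ntrees=50,
    calibrate_model=True,       # enable Platt scaling as a post-processing step
    calibration_frame=calib,    # data used only to fit the calibration model
)
gbm.train(x=df.columns[2:], y="CAPSULE", training_frame=train)

# The scored frame contains predict, p0, p1 plus two extra calibrated columns.
print(gbm.predict(train).head())
```

The calibrated probabilities show up as two additional columns next to the raw p0 / p1 (named cal_p0 / cal_p1 in the H2O-3 version I tried, if I recall correctly).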
This article provides guidance on how to construct a calibration frame. It also says: "The most important step is to create a separate dataset to perform calibration with."
I think the calibration frame should be used only for calibration, and hence be distinct from the validation frame. The conservative answer is that they should be separate: when you use a validation frame for early stopping or any internal model tuning (e.g. lambda search in H2O GLM), that validation frame becomes an extension of the "training data", so it's off-limits for calibration at that point. However, you could try both versions, directly observe the effect, and then make a decision. Here's some additional guidance from the article:
"How much data to use for calibration will depend on the amount of data you have available. The calibration model will generally only be fitting a small number of parameters (so you do not need a huge volume of data). I would aim for around 10% of your training data, but at a minimum of at least 50 examples."