I'm using Spark 2.0 and the new spark.ml packages. Is there a way to adjust the classification threshold so that I reduce the number of false positives? If it matters, I'm also using the CrossValidator.
I see that RandomForestClassifier and DecisionTreeClassifier both output a probability column (which I could use manually), but GBTClassifier does not.
It sounds like you might be looking for the thresholds parameter:
final val thresholds: DoubleArrayParam
Param for Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.
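To make the quoted rule concrete, here is a small plain-Scala sketch of it: divide each class probability p by its threshold t and predict the class with the largest p/t. This only mirrors the documented behaviour. It is not Spark's source code, and it assumes all thresholds are strictly positive, which sidesteps the division-by-zero case.

```scala
// Plain-Scala sketch of the documented "largest p / t wins" rule.
// Not Spark's implementation; assumes every threshold is > 0.
object ThresholdRule {
  def predict(probability: Array[Double], thresholds: Array[Double]): Int = {
    require(probability.length == thresholds.length,
      "thresholds must have one entry per class")
    require(thresholds.forall(_ > 0.0), "this sketch assumes thresholds > 0")
    // Scale each class probability by its threshold.
    val scaled = probability.zip(thresholds).map { case (p, t) => p / t }
    // Index of the largest scaled score (first one wins on ties).
    scaled.indices.maxBy(i => scaled(i))
  }
}
```

For a binary problem, thresholds of Array(1 - c, c) mean class 1 is predicted only when its probability exceeds c, because p1 / c > p0 / (1 - c) simplifies to p1 > c when p0 + p1 = 1. For example, with probabilities (0.3, 0.7) the default-like thresholds (0.5, 0.5) predict class 1, while (0.25, 0.75) predict class 0. Raising the threshold on the positive class therefore trades recall for fewer false positives.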
You will need to set it by calling setThresholds(value: Array[Double]) on your classifier (for example RandomForestClassifier or DecisionTreeClassifier).
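A minimal sketch of setting it on a RandomForestClassifier, assuming spark-mllib 2.x is on the classpath and that your DataFrame uses the default "label" and "features" column names. The object ThresholdedForest and its helper binaryThresholds are hypothetical names for illustration, not part of Spark.

```scala
import org.apache.spark.ml.classification.RandomForestClassifier

object ThresholdedForest {
  // Hypothetical helper: map a single cutoff c on P(class 1) to the
  // binary thresholds array Array(1 - c, c).
  def binaryThresholds(cutoff: Double): Array[Double] = {
    require(cutoff > 0.0 && cutoff < 1.0, "cutoff must be in (0, 1)")
    Array(1.0 - cutoff, cutoff)
  }

  // Builds the estimator; "label" and "features" are assumed column names.
  def classifier(cutoff: Double): RandomForestClassifier =
    new RandomForestClassifier()
      .setLabelCol("label")
      .setFeaturesCol("features")
      .setThresholds(binaryThresholds(cutoff))
}
```

The thresholds only change the prediction column, not the probability or rawPrediction columns. So if your CrossValidator uses a BinaryClassificationEvaluator (areaUnderROC, computed from rawPrediction), changing the cutoff will not change the cross-validation metric. Fitted probabilistic models also expose setThresholds, so you can adjust the cutoff on the best model afterwards without retraining. As far as I know, GBTClassifier in Spark 2.0 is not a probabilistic classifier, so this setter is not available there.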