
How to Adjust Classification Threshold with a Spark Decision Tree

I'm using Spark 2.0 and the new spark.ml packages. Is there a way to adjust the classification threshold so that I reduce the number of false positives? If it matters, I'm also using the CrossValidator.

I see that RandomForestClassifier and DecisionTreeClassifier both output a probability column (which I could use manually), but GBTClassifier does not.

Jeremy asked Oct 18 '22 02:10



1 Answer

It sounds like you might be looking for the thresholds parameter:

final val thresholds: DoubleArrayParam

Param for Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class' threshold.
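To make the p/t rule concrete, here is a minimal plain-Scala sketch of it, with no Spark dependency. The object name `ThresholdRule` and the helper `predict` are made up for illustration, and the sketch assumes every threshold is strictly positive; it does not reproduce how Spark itself treats a threshold of 0.

```scala
// Illustrative sketch of the p/t decision rule from the quoted documentation.
// ThresholdRule and predict are hypothetical names, not part of Spark.
object ThresholdRule {
  // Returns the index of the class with the largest probability / threshold.
  def predict(probabilities: Array[Double], thresholds: Array[Double]): Int = {
    require(probabilities.length == thresholds.length, "one threshold per class")
    require(thresholds.forall(_ > 0.0), "this sketch assumes positive thresholds")
    val scaled = probabilities.zip(thresholds).map { case (p, t) => p / t }
    scaled.indexOf(scaled.max)
  }

  def main(args: Array[String]): Unit = {
    val probabilities = Array(0.4, 0.6)
    // Equal thresholds reduce to a plain argmax: 0.6 / 0.5 beats 0.4 / 0.5.
    println(predict(probabilities, Array(0.5, 0.5))) // prints 1
    // A higher threshold for class 1: 0.4 / 0.3 beats 0.6 / 0.7.
    println(predict(probabilities, Array(0.3, 0.7))) // prints 0
  }
}
```

For two classes whose probabilities sum to 1, thresholds of (0.3, 0.7) amount to predicting class 1 only when its probability exceeds 0.7.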

You will need to set it by calling setThresholds(value: Array[Double]) on your classifier.
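As a rough sketch of how that might look with a DecisionTreeClassifier and the CrossValidator mentioned in the question: the column names "label" and "features", the threshold values, the choice of evaluator metric, and the assumption that class 1 is the positive class are all illustrative and not taken from the question.

```scala
import org.apache.spark.ml.classification.DecisionTreeClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Assumed binary problem: index 0 = negative class, index 1 = positive class.
// Raising the class-1 threshold makes positive predictions rarer,
// which should reduce false positives at the cost of more false negatives.
val dt = new DecisionTreeClassifier()
  .setLabelCol("label")       // assumed column name
  .setFeaturesCol("features") // assumed column name
  .setThresholds(Array(0.3, 0.7))

// Optionally let cross-validation choose between candidate threshold arrays.
val paramGrid = new ParamGridBuilder()
  .addGrid(dt.thresholds, Seq(Array(0.5, 0.5), Array(0.3, 0.7)))
  .build()

// Use a metric computed from the prediction column, since thresholds only
// change the predicted label, not the raw scores or probabilities.
val evaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setMetricName("weightedPrecision")

val cv = new CrossValidator()
  .setEstimator(dt)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

// val cvModel = cv.fit(trainingData) // trainingData: your own DataFrame
```

Since GBTClassifier does not output a probability column, this approach applies to the probabilistic classifiers such as DecisionTreeClassifier and RandomForestClassifier.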

Chris Dove answered Oct 31 '22 21:10
