I have run binary logistic regression using Spark MLlib. According to the Spark MLlib documentation, rawPrediction holds confidence values, which I assumed were probabilities for the LCL and UCL. However, I am getting negative values for rawPrediction. In what scenarios can raw prediction values be negative?
RawPrediction is a raw confidence score, not necessarily a probability. From the Spark docs: "Raw prediction for each possible label. The meaning of a 'raw' prediction may vary between algorithms, but it intuitively gives a measure of confidence in each possible label (where larger = more confident)."
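To illustrate "larger = more confident": only the relative order of the raw scores matters, so negative scores are perfectly valid. The minimal sketch below assumes the generic rule that the predicted label is the index of the largest raw score (our understanding of Spark's default behaviour when no custom thresholds are set). The function name `predict_from_raw` is hypothetical and not part of the Spark API.

```python
# Hypothetical helper, not a Spark API: pick the label whose raw score is largest.
# Assumption: no custom per-class thresholds, so prediction = argmax(rawPrediction).
from typing import Sequence


def predict_from_raw(raw_prediction: Sequence[float]) -> int:
    """Return the index (label) with the largest raw confidence score."""
    if not raw_prediction:
        raise ValueError("raw_prediction must contain at least one score")
    return max(range(len(raw_prediction)), key=lambda i: raw_prediction[i])


if __name__ == "__main__":
    print(predict_from_raw([-1.3, 1.3]))        # 1
    print(predict_from_raw([0.4, -0.4]))        # 0
    print(predict_from_raw([-5.0, -2.0, -3.0])) # 1, even though every score is negative
```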
PySpark logistic regression is a classification algorithm in the PySpark ML library that models how a categorical (typically binary) label depends on one or more feature columns. It is a fast way to classify data and scales well to large datasets.
Wikipedia states: "In statistics, linear regression is a linear approach to modeling the relationship between a dependent variable and one or more independent variables." Linear regression is a basic and commonly used type of predictive analysis.
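For comparison, here is a minimal sketch of the simplest case, one independent variable fitted by ordinary least squares. The function name and the data are made up for illustration. Note that the fitted line's output is unbounded, which is why logistic regression passes the same kind of linear score through the sigmoid function to obtain a probability.

```python
# Hypothetical example: simple (one-feature) linear regression via ordinary least squares.
from typing import Sequence, Tuple


def fit_simple_linear_regression(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    slope, intercept = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
    print(slope, intercept)  # 2.0 1.0
```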
For classification tasks in Spark, you have logistic regression, naïve Bayes, support vector machines (SVM), decision trees, and random forests at your disposal.
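In the case of binary classification, the raw prediction is the margin for the class in question. For a feature vector $x$ and weight vector $w$, the raw prediction is

$$z = w^\top x, \qquad z \in (-\infty, +\infty)$$

so it can be any real number, including a negative one. The prediction probability is obtained by passing the margin through the logistic (sigmoid) function:

$$f(z) = \frac{1}{1 + e^{-z}}, \qquad f(z) \in (0, 1)$$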
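The formulas above can be sketched in plain Python (no Spark needed). This is a minimal sketch under stated assumptions: `margin` takes an optional intercept, which Spark adds to the dot product (the formula above can be read as folding it into $w$); the raw prediction vector is laid out as `[-z, z]` and the probability vector as `[1 - f(z), f(z)]`, which is our reading of the binary branch of Spark's `LogisticRegressionModel`. The weights and features are made up. Under that layout, one of the two raw scores is negative whenever $z \neq 0$, which explains the negative values in the question.

```python
# Minimal sketch (pure Python, no Spark) of margin -> rawPrediction -> probability
# for a binary logistic regression model. Weights and features are made up.
# Assumption: rawPrediction = [-z, z] and probability = [1 - f(z), f(z)].
import math
from typing import List, Sequence


def margin(weights: Sequence[float], features: Sequence[float], intercept: float = 0.0) -> float:
    """z = w^T x (+ intercept): any real number, negative values included."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * x for w, x in zip(weights, features)) + intercept


def sigmoid(z: float) -> float:
    """f(z) = 1 / (1 + e^{-z}), arranged so exp() never overflows."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def raw_prediction(z: float) -> List[float]:
    """Raw scores for labels 0 and 1; one of them is negative whenever z != 0."""
    return [-z, z]


def probability(z: float) -> List[float]:
    """Probabilities for labels 0 and 1; they always sum to 1."""
    p1 = sigmoid(z)
    return [1.0 - p1, p1]


if __name__ == "__main__":
    z = margin([0.5, -1.0], [1.0, 2.0], intercept=0.2)  # 0.5 - 2.0 + 0.2, about -1.3
    print(raw_prediction(z))  # negative raw score for label 1
    print(probability(z))     # label 0 is the more likely one
```

Both vectors rank the labels the same way because the sigmoid is monotonically increasing; the raw scores simply are not squashed into (0, 1).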
Source code for the raw-prediction calculation: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L973