 

Log Loss function in pyspark

Is there a built-in log loss function in pyspark?

I have a PySpark DataFrame with the columns probability, rawPrediction, and label, and I want to use mean log loss to evaluate these predictions.

asked Feb 28 '18 at 01:02 by Dustin



People also ask

What is the log loss function?

Log loss indicates how close a predicted probability is to the corresponding actual (true) value, which is 0 or 1 in binary classification. The more the predicted probability diverges from the actual value, the higher the log loss.
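A minimal plain-Python sketch that makes the "diverges" claim concrete for a single example whose true label is 1 (the function name binary_log_loss is illustrative, not from any library):

```python
import math

def binary_log_loss(y_true, p):
    """Log loss of one binary example: -[y*log(p) + (1 - y)*log(1 - p)]."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# True label is 1; each prediction is further from it than the last.
for p in (0.9, 0.5, 0.1):
    print(f"p={p}: log loss={binary_log_loss(1, p):.4f}")
# p=0.9: log loss=0.1054
# p=0.5: log loss=0.6931
# p=0.1: log loss=2.3026
```

Note that p must lie strictly between 0 and 1 here; a prediction of exactly 0 or 1 would require log(0).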

What is log loss in Python?

Log loss, also called logistic loss or cross-entropy loss, is the loss function used in (multinomial) logistic regression and in extensions of it such as neural networks. It is defined as the negative log-likelihood of a logistic model that returns y_pred probabilities for its training data y_true.
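A minimal plain-Python sketch of the multinomial case, assuming labels are given as integer class indices and predictions as one probability list per example (the name mean_cross_entropy and the eps clipping value are illustrative choices). With one-hot labels only the predicted probability of the true class contributes, so the mean loss is the average of -log p[true class]; for two classes this reduces to the binary formula.

```python
import math

def mean_cross_entropy(y_true, y_pred, eps=1e-15):
    """Mean multiclass log loss (cross-entropy).

    y_true: list of integer class indices, one per example.
    y_pred: list of per-example probability lists (each should sum to 1).
    The true-class probability is clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = 0.0
    for label, probs in zip(y_true, y_pred):
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)

# Three examples, three classes.
y_true = [0, 1, 2]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
print(round(mean_cross_entropy(y_true, y_pred), 4))  # 0.3635
```

The clipping mirrors what many library implementations do, at the cost of capping the loss of a confidently wrong prediction at -log(eps).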

Is log loss a cost function?

Yes. Log loss, averaged over the training examples, is the cost function minimized in logistic regression.
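Written out for binary labels, the averaged cost looks like this; the symbols (weights $\mathbf{w}$, bias $b$, feature vectors $\mathbf{x}_i$, labels $y_i \in \{0, 1\}$, $N$ training examples) are introduced here for illustration:

```latex
J(\mathbf{w}, b) = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log \hat{p}_i + (1 - y_i) \log\big(1 - \hat{p}_i\big) \Big],
\qquad \hat{p}_i = \sigma\big(\mathbf{w}^\top \mathbf{x}_i + b\big),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```

Each summand is the per-example log loss; training logistic regression means choosing $\mathbf{w}$ and $b$ to minimize $J$.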

What is log loss in logistic regression?

Logistic regression is similar to linear regression but with two significant differences. First, it applies a sigmoid activation function to the output neuron to squash the output into the range 0–1, so the output can be read as a probability. Second, it uses log loss, rather than squared error, as the loss function that measures the error.
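Both differences can be sketched in plain Python. The names sigmoid and log_loss_from_logit are illustrative, and the second function uses the standard numerically stable rewrite of the loss in terms of the raw score z (an implementation choice, not something the text above prescribes):

```python
import math

def sigmoid(z):
    """Squash a raw score (logit) z into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def log_loss_from_logit(y, z):
    """Binary log loss for label y in {0, 1} and raw score z.

    Algebraically equal to -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))],
    but written as max(z, 0) - z*y + log1p(exp(-|z|)) so that large |z|
    neither overflows exp() nor takes log(0).
    """
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

print(sigmoid(0.0))                           # 0.5
print(round(log_loss_from_logit(1, 0.0), 4))  # 0.6931, i.e. ln 2
print(round(log_loss_from_logit(1, 3.0), 4))  # 0.0486: confident and correct, small loss
print(round(log_loss_from_logit(0, 3.0), 4))  # 3.0486: confident and wrong, large loss
```

Computing the loss from the raw score rather than from an already-squashed probability avoids the precision loss that occurs when sigmoid(z) rounds to exactly 0 or 1.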


1 Answer

No such function exists directly, as far as I can tell. But given a PySpark dataframe df with the columns named as in the question, one can explicitly calculate the average log loss:

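import pyspark.sql.functions as f

# Predicted probability of label 1. If 'probability' is the vector column of a
# Spark ML classifier, take element 1 first, e.g. (Spark >= 3.0):
# vector_to_array(f.col('probability'))[1], from pyspark.ml.functions.
p = f.col('probability')

# Clip away from 0 and 1: Spark's log() returns null for non-positive input,
# and mean() would silently skip those rows.
eps = 1e-15
p = f.least(f.greatest(p, f.lit(eps)), f.lit(1. - eps))

# Per-row binary log loss: -[y*log(p) + (1 - y)*log(1 - p)]
df = df.withColumn(
    'logloss',
    -f.col('label') * f.log(p) - (1. - f.col('label')) * f.log(1. - p)
)

# Average over all rows
logloss = df.agg(f.mean('logloss').alias('ll')).collect()[0]['ll']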

I'm assuming here that label is numeric (i.e., 0 or 1), and that probability holds the model's predicted probability that the label is 1. (Not sure what rawPrediction might mean.)

answered Nov 02 '22 at 09:11 by abeboparebop
