Is there a built-in log loss function in pyspark?
I have a pyspark dataframe with columns: probability, rawPrediction, label
and I want to use mean log loss to evaluate these predictions.
Log loss (also known as logistic loss or cross-entropy loss) measures how close a predicted probability is to the corresponding actual/true value (0 or 1 in binary classification): the further the predicted probability diverges from the actual value, the higher the log loss. Formally, it is the negative log-likelihood of a logistic model that returns y_pred probabilities for its training data y_true, and it is the cost function used in (multinomial) logistic regression and in extensions of it such as neural networks. Logistic regression itself is similar to linear regression, with two significant differences: it applies a sigmoid activation to the output neuron to squash the output into the range 0–1 (so it can be read as a probability), and it uses log loss to calculate the error.
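Concretely, for a true label $y \in \{0, 1\}$ and a predicted probability $p$ of the positive class, the per-example loss and its mean over $N$ examples are:

$$
\ell(y, p) = -\bigl[\, y \log p + (1 - y) \log(1 - p) \,\bigr],
\qquad
\text{mean log loss} = \frac{1}{N} \sum_{i=1}^{N} \ell(y_i, p_i)
$$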
No such function exists directly, as far as I can tell (although Spark 3.0 and later do ship a logLoss metric in MulticlassClassificationEvaluator; see the note at the end). But given a PySpark dataframe df with the columns named as in the question, one can explicitly calculate the average log loss:
import pyspark.sql.functions as f

# Per-row binary log loss: -[y*log(p) + (1 - y)*log(1 - p)]
df = df.withColumn(
    'logloss',
    -f.col('label') * f.log(f.col('probability'))
    - (1. - f.col('label')) * f.log(1. - f.col('probability'))
)

# Mean log loss over the whole dataframe
logloss = df.agg(f.mean('logloss').alias('ll')).collect()[0]['ll']
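One caveat the snippet above does not handle: if probability is ever exactly 0 or 1, f.log yields null/negative infinity and poisons the mean. A common guard is to clip the probability to a small epsilon first; a minimal sketch, where the eps value is my own arbitrary choice (sklearn's log_loss clips similarly):

import pyspark.sql.functions as f

eps = 1e-15  # arbitrary clipping constant, not part of the original answer
p = f.least(f.greatest(f.col('probability'), f.lit(eps)), f.lit(1. - eps))
df = df.withColumn(
    'logloss',
    -f.col('label') * f.log(p) - (1. - f.col('label')) * f.log(1. - p)
)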
I'm assuming here that label is numerical (i.e. 0 or 1) and that probability is a scalar column holding the model's predicted probability of the positive class. (rawPrediction in Spark ML is usually the vector of raw margin scores before the probability transform, so it isn't needed for this calculation.)
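If probability is instead the usual Spark ML Vector of per-class probabilities (which is what classifiers such as LogisticRegression emit under that column name), you would first extract the positive-class entry; a minimal sketch, assuming Spark 3.0+ where pyspark.ml.functions.vector_to_array is available:

import pyspark.sql.functions as f
from pyspark.ml.functions import vector_to_array

# Element 1 of the probability vector is P(label = 1) in a binary model
df = df.withColumn('p1', vector_to_array(f.col('probability'))[1])

Then use 'p1' in place of 'probability' in the log-loss expression above.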
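Finally, as promised above: on Spark 3.0 or later, MulticlassClassificationEvaluator supports metricName='logLoss', which computes the mean log loss straight from the probability vector column. A sketch; note that the evaluator expects the standard columns an ML classifier produces, so it is safest to run it on the full output of model.transform():

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Spark 3.0+: built-in mean log loss from the probability vector column
evaluator = MulticlassClassificationEvaluator(
    labelCol='label',
    probabilityCol='probability',
    metricName='logLoss',
)
logloss = evaluator.evaluate(df)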