
How to use `log_loss` in `GridSearchCV` with multi-class labels in Scikit-Learn (sklearn)?

I'm trying to use log_loss in the scoring parameter of GridSearchCV to tune this multi-class (6-class) classifier, but I don't understand how to give it the labels parameter. Even if I passed sklearn.metrics.log_loss directly, the labels present would change on each cross-validation iteration, so how am I supposed to supply them?

I'm using Python v3.6 and Scikit-Learn v0.18.1

How can I use GridSearchCV with log_loss with multi-class model tuning?

My class representation:

1    31
2    18
3    28
4    19
5    17
6    22
Name: encoding, dtype: int64

My code:

param_test = {"criterion": ["friedman_mse", "mse", "mae"]}
gsearch_gbc = GridSearchCV(estimator = GradientBoostingClassifier(n_estimators=10), 
                        param_grid = param_test, scoring="log_loss", n_jobs=1, iid=False, cv=cv_indices)
gsearch_gbc.fit(df_attr, Se_targets)

Here's the tail end of the error (full traceback: https://pastebin.com/1CshpEBN):

ValueError: y_true contains only one label (1). Please provide the true labels explicitly through the labels argument.

UPDATE: Based on @Grr's answer, I just use this to build the scorer:

log_loss_build = lambda y: metrics.make_scorer(metrics.log_loss, greater_is_better=False, needs_proba=True, labels=sorted(np.unique(y)))
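A sketch of how a scorer with a fixed label set can be plugged into GridSearchCV. The data here is hypothetical (make_classification standing in for df_attr / Se_targets), and the scorer is written as a plain callable rather than via make_scorer, because make_scorer's needs_proba argument has been removed in recent scikit-learn releases; the idea is the same: fixing labels up front lets log_loss score a fold even when that fold is missing a class.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical 6-class data standing in for df_attr / Se_targets.
X, y = make_classification(n_samples=135, n_features=10, n_informative=6,
                           n_classes=6, random_state=0)

def log_loss_scorer_build(y):
    """Return a GridSearchCV-compatible scorer with the label set frozen."""
    all_labels = sorted(np.unique(y))
    def scorer(estimator, X_eval, y_eval):
        proba = estimator.predict_proba(X_eval)
        # Negated so that "greater is better", as GridSearchCV maximizes.
        return -metrics.log_loss(y_eval, proba, labels=all_labels)
    return scorer

gs = GridSearchCV(GradientBoostingClassifier(n_estimators=10),
                  param_grid={"max_depth": [2, 3]},
                  scoring=log_loss_scorer_build(y), cv=3)
gs.fit(X, y)
print(gs.best_score_)  # negative log loss, so higher (closer to 0) is better
```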
asked Apr 12 '17 by O.rka

2 Answers

My assumption is that somehow one of your CV splits ends up with only one class label in y_true. While this seems unlikely given the distribution you posted, I guess it is possible. I haven't run into this before, but per [sklearn.metrics.log_loss](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html) the labels argument is required whenever y_true contains a single class, since log_loss cannot infer the full label set from it. The wording of that section of the documentation also suggests the method assumes binary classification when labels is not passed.

So rather than passing sklearn.metrics.log_loss itself, you should wrap it with the labels fixed up front, e.g. metrics.make_scorer(metrics.log_loss, greater_is_better=False, needs_proba=True, labels=your_labels), and pass that as the scoring argument.

answered Nov 14 '22 by Grr

You can simply specify scoring="neg_log_loss" (or "log_loss" in older versions), which uses the built-in negative log loss scorer.
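A minimal sketch of the built-in scorer string, again on hypothetical make_classification data. Note the built-in scorer does not fix labels, but GridSearchCV's default stratified CV keeps every class in each fold, so the single-label error does not arise here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Hypothetical 6-class data.
X, y = make_classification(n_samples=135, n_features=10, n_informative=6,
                           n_classes=6, random_state=0)

# "neg_log_loss" is the built-in scorer string since scikit-learn 0.18;
# GridSearchCV maximizes it, i.e. minimizes the log loss.
gs = GridSearchCV(GradientBoostingClassifier(n_estimators=10),
                  param_grid={"max_depth": [2, 3]},
                  scoring="neg_log_loss", cv=3)
gs.fit(X, y)
print(gs.best_score_)
```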

answered Nov 15 '22 by Andreas Mueller