Unequal misclassification costs in python/sklearn

Tags: python, machine-learning, scikit-learn

I wonder if there is a way to specify a custom cost function in sklearn/Python. My real problem has 7 different classes, but to make it clearer, let's assume that I want to specify different misclassification costs for a problem with 3 classes, where I mainly care that my model properly distinguishes between class 1 and class 3.

If an observation has class 1 and the model predicts class 1, the penalty is 0 (correct classification)
If an observation has class 1 and the model predicts class 2, the penalty is 1
If an observation has class 1 and the model predicts class 3, the penalty is 2

If an observation has class 2 and the model predicts class 2, the penalty is 0 (correct classification)
If an observation has class 2 and the model predicts class 3, the penalty is 1
If an observation has class 2 and the model predicts class 1, the penalty is 1

If an observation has class 3 and the model predicts class 3, the penalty is 0 (correct classification)
If an observation has class 3 and the model predicts class 2, the penalty is 1
If an observation has class 3 and the model predicts class 1, the penalty is 2

So the penalty matrix would look as follows:

                Predicted 1  Predicted 2  Predicted 3
True class 1         0            1            2
True class 2         1            0            1
True class 3         2            1            0

I assume that the class_weight parameter in sklearn does something similar, but it accepts a dictionary rather than a matrix. Passing class_weight={1: 2, 2: 1, 3: 2} would just increase the weight for misclassifying class 1 and class 3. I, however, want my model to get a larger penalty specifically when it predicts class 1 and the true class is class 3, and vice versa.
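
A minimal sketch illustrating that limitation, assuming synthetic data from make_classification with labels shifted to 1, 2, 3, and using {1: 2, 2: 1, 3: 2} as the intended version of the dictionary above. In LogisticRegression, a class_weight dictionary acts like a per-sample weight chosen by the true label alone, so it cannot distinguish a 1 -> 3 mistake from a 1 -> 2 mistake.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy 3-class problem (synthetic data, for illustration only)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)
y = y + 1  # shift labels 0, 1, 2 -> 1, 2, 3

# One weight per *true* class
weights = {1: 2.0, 2: 1.0, 3: 2.0}
clf_cw = LogisticRegression(class_weight=weights, max_iter=1000).fit(X, y)

# The same fit expressed as per-sample weights that depend only on y
sample_weight = np.array([weights[label] for label in y])
clf_sw = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=sample_weight)

# The coefficients should match: the weight never looks at the predicted class
print(np.allclose(clf_cw.coef_, clf_sw.coef_))
```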

Is it possible to do something like this in sklearn? Maybe some other libraries or learning algorithms allow for unequal misclassification costs?

asked Jun 03 '16 14:06

kroonike

1 Answer

First, scikit-learn offers no way to train a model with an arbitrary custom loss: each estimator minimizes its own built-in objective. However, you can implement your own evaluation function and tune your model's hyperparameters to optimize this metric.
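
A minimal sketch of that idea, assuming labels 1, 2, 3 and synthetic data. The function name misclassification_cost, the random-forest model and the parameter grid are illustrative choices, not part of the original answer. The question's penalty matrix is wrapped with make_scorer and used by GridSearchCV to select hyperparameters, including whether to use a class_weight dictionary at all.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import GridSearchCV

LABELS = [1, 2, 3]
# Penalty matrix from the question: rows = true class, columns = predicted class
COST = np.array([[0, 1, 2],
                 [1, 0, 1],
                 [2, 1, 0]])

def misclassification_cost(y_true, y_pred):
    """Average penalty per observation under the COST matrix."""
    cm = confusion_matrix(y_true, y_pred, labels=LABELS)
    return (cm * COST).sum() / len(y_true)

# scikit-learn maximizes scores, so greater_is_better=False negates the cost
cost_scorer = make_scorer(misclassification_cost, greater_is_better=False)

# Synthetic data with labels 1, 2, 3 (illustrative only)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)
y = y + 1

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={
        "max_depth": [2, 4, None],
        "class_weight": [None, {1: 2, 2: 1, 3: 2}],
    },
    scoring=cost_scorer,
    cv=5,
)
search.fit(X, y)
# best_score_ is the negated cost, so flip the sign back for reporting
print(search.best_params_, -search.best_score_)
```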

Second, you can optimize almost any custom loss with neural networks, for example using Keras, as long as the loss is smooth (differentiable), since it is minimized by gradient descent. The first thing that comes to mind is weighted cross-entropy. In this discussion, people experiment with implementations of this function.
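
Below is a framework-free sketch in NumPy. It swaps in a different smooth loss from the weighted cross-entropy mentioned above: the expected misclassification cost, the sum over k of p_k * C[y, k], where p is the softmax output and C is the question's penalty matrix (0-based indices here). The helper names, the toy data and the plain gradient-descent loop are assumptions for illustration; in Keras, the same formula could be written with tensor operations as a custom loss.

```python
import numpy as np

# Question's penalty matrix with 0-based indices: rows = true, cols = predicted
COST = np.array([[0.0, 1.0, 2.0],
                 [1.0, 0.0, 1.0],
                 [2.0, 1.0, 0.0]])

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def expected_cost_loss(logits, y_idx):
    """Mean over samples of sum_k p_k * COST[y, k]."""
    p = softmax(logits)
    return (p * COST[y_idx]).sum(axis=1).mean()

def expected_cost_grad(logits, y_idx):
    """Gradient w.r.t. the logits: p_j * (COST[y, j] - L_i) / n."""
    p = softmax(logits)
    c = COST[y_idx]
    per_sample = (p * c).sum(axis=1, keepdims=True)  # L_i for each sample
    return p * (c - per_sample) / len(y_idx)

# Toy data with an ordered structure (classes 0, 1, 2); purely illustrative
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
score = X @ np.array([1.5, -1.0]) + rng.normal(scale=0.5, size=300)
y_idx = np.digitize(score, [-1.0, 1.0])

# Linear softmax model trained by plain gradient descent on the expected cost
W = np.zeros((2, 3))
b = np.zeros(3)
loss_before = expected_cost_loss(X @ W + b, y_idx)
for _ in range(1000):
    g = expected_cost_grad(X @ W + b, y_idx)  # shape (n, 3)
    W -= 0.1 * X.T @ g
    b -= 0.1 * g.sum(axis=0)
loss_after = expected_cost_loss(X @ W + b, y_idx)
print(loss_before, loss_after)
```

Unlike cross-entropy, this loss is not convex in the logits, so in practice it is sometimes combined with cross-entropy or used to fine-tune a model pre-trained with it.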

Third, the structure of your problem suggests that what really matters is the order of the class labels. If this is the case, you could try ordered logistic regression (an example of its implementation).
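
Ordered logistic regression itself is not part of scikit-learn. As a stand-in, here is a minimal sketch of a simpler ordinal approach (the Frank and Hall decomposition, swapped in here, not the proportional-odds model): one binary LogisticRegression per cut point estimates P(y > k), and class probabilities are recovered by differencing. The class name OrdinalDecomposition is hypothetical, as is the optional cost-based decision rule in predict, which picks the class with the lowest expected cost under a given matrix.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class OrdinalDecomposition:
    """Hypothetical ordinal classifier built from K-1 binary models P(y > c_k)."""

    def __init__(self, **lr_kwargs):
        self.lr_kwargs = lr_kwargs

    def fit(self, X, y):
        self.classes_ = np.sort(np.unique(y))
        # One binary model per cut point: is the label above classes_[k]?
        self.models_ = [
            LogisticRegression(**self.lr_kwargs).fit(X, (y > c).astype(int))
            for c in self.classes_[:-1]
        ]
        return self

    def predict_proba(self, X):
        # P(y > c_k) for each cut point, shape (n, K-1)
        above = np.column_stack([m.predict_proba(X)[:, 1] for m in self.models_])
        # Independently fitted models can cross; force non-increasing values
        above = np.minimum.accumulate(above, axis=1)
        n = X.shape[0]
        cum = np.hstack([np.ones((n, 1)), above, np.zeros((n, 1))])
        # P(y = c_k) = P(y > c_{k-1}) - P(y > c_k)
        return cum[:, :-1] - cum[:, 1:]

    def predict(self, X, cost=None):
        proba = self.predict_proba(X)
        if cost is None:
            return self.classes_[proba.argmax(axis=1)]
        # Expected cost of predicting j: sum over true t of P(t) * cost[t, j]
        return self.classes_[(proba @ cost).argmin(axis=1)]

# Toy ordered data with labels 1, 2, 3 (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
score = X @ np.array([1.5, -1.0]) + rng.normal(scale=0.5, size=300)
y = np.digitize(score, [-1.0, 1.0]) + 1

# |i - j| cost matrix, identical to the question's penalty matrix
COST = np.abs(np.subtract.outer(np.arange(3), np.arange(3)))
model = OrdinalDecomposition(max_iter=1000).fit(X, y)
pred = model.predict(X, cost=COST)
print(np.mean(np.abs(pred - y)))
```

With the |i - j| cost matrix, the minimum-expected-cost class is a median of the predicted distribution, which is one reason this cost pairs naturally with ordinal models.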

Moreover, in your problem the cost is precisely sum(abs(predicted - actual)). So if you don't need probabilistic predictions, you can simply use a regressor that optimizes MAE (e.g. SGDRegressor with loss='epsilon_insensitive' and epsilon=0, or DecisionTreeRegressor with the MAE criterion, called 'absolute_error' since scikit-learn 1.0). After fitting the regression, you only need to find the thresholds that optimize your cost function.
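
A minimal sketch of the regression-plus-thresholds recipe, assuming labels 1, 2, 3 and synthetic data. SGDRegressor with loss='epsilon_insensitive' and epsilon=0.0 minimizes absolute error; the cut points are then tuned on held-out data by a simple coordinate search. The helpers apply_thresholds, total_cost and fit_thresholds are hypothetical names, and the search strategy is just one option.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split

LABELS = np.array([1, 2, 3])

def apply_thresholds(r, thresholds):
    """Map continuous predictions r to LABELS using sorted cut points."""
    return LABELS[np.digitize(r, thresholds)]

def total_cost(r, y, thresholds):
    """Total |predicted - true| cost, i.e. the question's penalty matrix."""
    return np.abs(apply_thresholds(r, thresholds) - y).sum()

def fit_thresholds(r, y, n_grid=50, n_rounds=3):
    """Coordinate search over cut points, starting at midpoints (1.5, 2.5, ...)."""
    t = (LABELS[:-1] + LABELS[1:]) / 2.0
    for _ in range(n_rounds):
        for k in range(len(t)):
            # Search between neighbouring cut points so the cut points stay sorted
            lo = t[k - 1] if k > 0 else min(r.min(), t[k])
            hi = t[k + 1] if k < len(t) - 1 else max(r.max(), t[k])
            # Include the current value so the cost can never increase
            candidates = np.append(np.linspace(lo, hi, n_grid), t[k])
            costs = [total_cost(r, y, np.r_[t[:k], v, t[k + 1:]]) for v in candidates]
            t[k] = candidates[int(np.argmin(costs))]
    return t

# Toy ordered data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 2))
score = X @ np.array([1.5, -1.0]) + rng.normal(scale=0.5, size=600)
y = LABELS[np.digitize(score, [-1.0, 1.0])]
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# epsilon=0 turns the epsilon-insensitive loss into plain absolute error
reg = SGDRegressor(loss="epsilon_insensitive", epsilon=0.0, random_state=0)
reg.fit(X_tr, y_tr)

# Tune cut points on held-out predictions, not on the training data
r_val = reg.predict(X_val)
thresholds = fit_thresholds(r_val, y_val)
print(thresholds, total_cost(r_val, y_val, thresholds))
```

For seven classes, the same code should work with LABELS = np.arange(1, 8), since the search starts from the midpoints between consecutive labels.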

answered Oct 07 '22 06:10

David Dale
