Is there a way to specify a custom cost function in sklearn/Python? My real problem has 7 different classes, but to make it clearer let's assume I want to specify different misclassification costs for a problem with 3 classes, and that I am mainly interested in my model properly distinguishing between class 1 and class 3.
So the penalty matrix would look as follows:
            Class 1   Class 2   Class 3
Class 1        0         1         2
Class 2        1         0         1
Class 3        2         1         0
I assume that the 'class_weight' parameter in sklearn does something similar, but it accepts a dictionary rather than a matrix. Passing class_weight = {1: 2, 2: 1, 3: 2} would just increase the weight for examples whose true class is 1 or 3. I, however, want my model to get a larger penalty specifically when it predicts class 1 and the true class is class 3, and vice versa.
Is it possible to do something like this in sklearn? Maybe some other libraries/learning algorithms allow for unequal misclassification costs?
Most machine learning algorithms assume that all misclassification errors made by a model are equal. This is often not the case for imbalanced classification problems where missing a positive or minority class case is worse than incorrectly classifying an example from the negative or majority class.
You must pass misclassification costs as a square matrix with nonnegative elements. Element C(i, j) of this matrix is the cost of classifying an observation into class j when the true class is i. The diagonal elements C(i, i) of the cost matrix must be 0.
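For the 3-class example in the question, such a matrix could be written like this (0-indexed classes, so "class 1" is row/column 0):

```python
import numpy as np

# Cost matrix: C[i, j] is the cost of predicting class j
# when the true class is i (classes 1-3 -> indices 0-2).
C = np.array([
    [0, 1, 2],   # true class 1
    [1, 0, 1],   # true class 2
    [2, 1, 0],   # true class 3
])

# Diagonal (correct classifications) must be zero.
assert np.all(np.diag(C) == 0)

# Cost of predicting class 1 when the true class is 3:
print(C[2, 0])  # 2
```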
In cost-sensitive learning, instead of each instance being either correctly or incorrectly classified, each class (or instance) is given a misclassification cost. Thus, instead of trying to optimize the accuracy, the problem is to minimize the total misclassification cost.
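Concretely, the quantity to minimize is the sum of C[true, predicted] over all examples. A small sketch with hypothetical labels and the penalty matrix from the question:

```python
import numpy as np

# Penalty matrix from the question (0-indexed classes).
C = np.array([[0, 1, 2],
              [1, 0, 1],
              [2, 1, 0]])

y_true = np.array([0, 2, 2, 1, 0])  # hypothetical true labels
y_pred = np.array([0, 0, 2, 1, 1])  # hypothetical predictions

# Total misclassification cost: sum C[true, pred] over all examples.
total_cost = C[y_true, y_pred].sum()
print(total_cost)  # 2 (class 3 predicted as 1) + 1 (class 1 as 2) = 3
```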
Equal costs are rarely appropriate, even for binary classification problems, especially those that have an imbalanced class distribution. Most classifiers assume that the misclassification costs (false negative and false positive cost) are the same; in most real-world applications, this assumption is not true.
First, sklearn offers no built-in way to train a model against an arbitrary misclassification-cost matrix. However, you can implement your own evaluation function and adjust the hyperparameters of your model to optimize this metric.
Second, you can optimize any custom loss with neural networks, for example, using Keras. But for this purpose, your function should be smooth. The first thing that comes to mind is weighted cross-entropy. In this discussion, people are playing with implementations of this function.
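One simple version of such a weighted cross-entropy, sketched here in NumPy rather than Keras (the choice of weighting each class by the row sums of the cost matrix is an assumption for illustration, not the only option):

```python
import numpy as np

# Penalty matrix from the question (0-indexed classes).
C = np.array([[0, 1, 2],
              [1, 0, 1],
              [2, 1, 0]], dtype=float)

# One simple choice: weight each class by its total misclassification
# cost (row sums of C), so errors on "expensive" classes hurt more.
class_weights = C.sum(axis=1)   # [3., 2., 3.]

def weighted_cross_entropy(y_true, probs):
    """-w[y] * log p[y], averaged over the batch (NumPy sketch)."""
    eps = 1e-12
    p_true = probs[np.arange(len(y_true)), y_true]
    return float(np.mean(class_weights[y_true] * -np.log(p_true + eps)))

# Hypothetical predicted probabilities for two samples:
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
print(weighted_cross_entropy(np.array([0, 2]), probs))
```

In Keras the same formula would be expressed with backend ops and passed as a custom loss; the function stays smooth, so it can be optimized by gradient descent.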
Third, the structure of your problem suggests that the order of the class labels is what really matters. If this is the case, you could try ordered (ordinal) logistic regression (an example of its implementation).
Moreover, in your problem the cost is precisely sum(abs(predicted - fact)). So if you don't need probabilistic predictions, you can simply use a regressor that optimizes MAE (e.g. SGDRegressor with 'epsilon_insensitive' loss and epsilon=0, or DecisionTreeRegressor with criterion='absolute_error', formerly 'mae'). After fitting the regression, you only need to find the thresholds that optimize your cost function.
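A sketch of that regression route, with an illustrative brute-force threshold search (the dataset, the quantile grid, and the relabeling to classes 1-3 are all assumptions for the example):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDRegressor

# Hypothetical 3-class data, relabeled to 1, 2, 3 as in the question.
X, y = make_classification(n_samples=300, n_classes=3,
                           n_informative=4, random_state=0)
y = y + 1

# With loss='epsilon_insensitive' and epsilon=0, SGDRegressor
# minimizes MAE, matching the cost sum(abs(predicted - fact)).
reg = SGDRegressor(loss="epsilon_insensitive", epsilon=0.0,
                   max_iter=2000, random_state=0).fit(X, y)
scores = reg.predict(X)

# Penalty matrix from the question.
C = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]])

def cost(y_true, y_pred):
    return C[y_true - 1, y_pred - 1].sum()

# Brute-force the two cut points that map the continuous score
# back to classes 1/2/3 with minimal total cost.
grid = np.quantile(scores, np.linspace(0.05, 0.95, 19))
best = None
for t1 in grid:
    for t2 in grid:
        if t1 >= t2:
            continue
        y_hat = np.digitize(scores, [t1, t2]) + 1   # -> 1, 2 or 3
        c = cost(y, y_hat)
        if best is None or c < best[0]:
            best = (c, t1, t2)
print("best cost:", best[0])
```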