Model objects in the Python package scikit-learn have methods (e.g. predict_log_proba()) that return a probability/class matrix where classes are "ordered by arithmetical order" (this is how the docs phrase it).
Does anyone have any idea what this means? Is this lexicographic, numeric or something else? Googling this expression finds these same docs as the main hits, so I am guessing this is not the standard naming.
The order is the sorted order of the class labels: if your labels are ["ham", "spam", "eggs"], then they are sorted to produce ['eggs', 'ham', 'spam'] (available in the classes_ attribute).
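To make "sorted" concrete, here is a minimal sketch. It assumes that classes_ matches what np.unique returns on the training labels, which is roughly how most scikit-learn classifiers compute it. The point it shows: string labels sort lexicographically and integer labels sort numerically, so the same values can end up in a different order depending on their dtype.

```python
import numpy as np

# Assumption: classes_ is equivalent to the sorted unique training labels,
# i.e. what np.unique returns for y.
string_labels = np.array(["ham", "spam", "eggs", "ham"])
string_order = np.unique(string_labels).tolist()
print(string_order)  # ['eggs', 'ham', 'spam']

# Integer labels are sorted numerically...
int_labels = np.array([10, 9, 2, 10])
int_order = np.unique(int_labels).tolist()
print(int_order)  # [2, 9, 10]

# ...but the same values stored as strings are sorted lexicographically.
str_num_labels = np.array(["10", "9", "2"])
str_num_order = np.unique(str_num_labels).tolist()
print(str_num_order)  # ['10', '2', '9']
```

So if the order matters, check the dtype of your labels, or simply read classes_ after fitting instead of assuming an order.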
The first column in the output of decision_function, predict_proba and predict_log_proba then corresponds to class eggs, the second to ham and the third to spam. As an exception, when there are only two classes, decision_function returns a single score per sample for the "positive" class, which is the max of classes_ (i.e. classes_[1]); predict_proba and predict_log_proba still return one column per class.
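A short sketch of how to line up output columns with labels, using LogisticRegression on a hypothetical toy dataset. It assumes a reasonably recent scikit-learn. It pairs each column of predict_proba with the matching entry of classes_, then shows the binary case, where decision_function has a single column and a positive score means classes_[1].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical 1-D toy data with three string labels.
X = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]])
y = np.array(["ham", "ham", "spam", "spam", "eggs", "eggs"])

clf = LogisticRegression().fit(X, y)
print(clf.classes_.tolist())  # ['eggs', 'ham', 'spam']

# Column j of predict_proba corresponds to clf.classes_[j].
proba = clf.predict_proba(X)
for label, column in zip(clf.classes_, proba.T):
    print(label, column.round(2))

# Mapping the argmax column back through classes_ recovers the predicted labels.
predicted = clf.classes_[proba.argmax(axis=1)]

# Binary case: same features, two labels.
y_bin = np.array(["ham", "ham", "ham", "spam", "spam", "spam"])
clf_bin = LogisticRegression().fit(X, y_bin)
scores = clf_bin.decision_function(X)      # shape (6,): one score per sample
proba_bin = clf_bin.predict_proba(X)       # shape (6, 2): one column per class

# A positive score means the "positive" class, classes_[1] ('spam' here).
predicted_bin = np.where(scores > 0, clf_bin.classes_[1], clf_bin.classes_[0])
```

Reading classes_ after fitting, rather than hard-coding an order, keeps this correct whatever the label dtype is.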
The formulation "arithmetic order" is a hold-over from the time when class labels had to be integers. I just changed the wording, so the next release will have a clearer description of how this works.