 

Best way to combine probabilistic classifiers in scikit-learn


I have a logistic regression and a random forest and I'd like to combine them (ensemble) for the final classification probability calculation by taking an average.

Is there a built-in way to do this in scikit-learn? Some way where I can use the ensemble of the two as a classifier itself? Or would I need to roll my own classifier?

asked Feb 02 '14 by user1507844

People also ask

How do you combine two classifiers?

The simplest way of combining classifier output is to allow each classifier to make its own prediction and then choose the plurality prediction as the “final” output. This simple voting scheme is easy to implement and easy to understand, but it does not always produce the best possible results.
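The plurality-vote scheme described above can be sketched in a few lines of plain Python (the helper name `plurality_vote` is mine, not from any library):

```python
from collections import Counter

def plurality_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predictions).most_common(1)[0][0]

# Three classifiers vote on one sample: two predict 1, one predicts 0.
print(plurality_vote([1, 0, 1]))  # 1
```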

How do you use a stacking classifier?

A simple way to achieve this is to split your training set in half. Use the first half of your training data to train the level-one classifiers. Then use the trained level-one classifiers to make predictions on the second half of the training data. Those predictions are then used to train the meta-classifier.
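The split-in-half procedure above can be sketched with scikit-learn; the choice of base classifiers and the 50/50 split are illustrative, not prescriptive:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)

# First half trains the level-one classifiers.
X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.5, random_state=0)

level_one = [LogisticRegression(max_iter=1000),
             RandomForestClassifier(random_state=0)]
for clf in level_one:
    clf.fit(X_a, y_a)

# Their predictions on the second half become the meta-classifier's features.
meta_features = np.column_stack(
    [clf.predict_proba(X_b)[:, 1] for clf in level_one])
meta = LogisticRegression().fit(meta_features, y_b)
print(meta_features.shape)  # (100, 2)
```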

Which of the following utility of Sklearn ensemble is used for classification with extra randomness?

An extra-trees classifier. This class implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
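As a minimal illustration, the extra-trees meta estimator is exposed as `sklearn.ensemble.ExtraTreesClassifier` and is used like any other scikit-learn classifier (the dataset here is synthetic):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=100, random_state=0)

# Each of the 50 trees is fit with extra randomness in its split points;
# predictions are averaged across trees.
clf = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X[:5]).shape)  # (5,)
```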

What is a stacking estimator?

Stacking refers to a method to blend estimators. In this strategy, some estimators are individually fitted on some training data while a final estimator is trained using the stacked predictions of these base estimators.
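Recent scikit-learn versions (0.22+) ship this strategy as `StackingClassifier`; a minimal sketch with two arbitrary base estimators might look like:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)

# Base estimators are fit on the training data; the final estimator is
# trained on their stacked (cross-validated) predictions.
stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0))],
    final_estimator=LogisticRegression(),
)
stack.fit(X, y)
print(stack.predict_proba(X[:3]).shape)  # (3, 2)
```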

How to fine-tune classifiers in scikit-learn?

The key to understanding how to fine-tune classifiers in scikit-learn is to understand the methods .predict_proba() and .decision_function(). .predict_proba() returns the estimated probability that a sample belongs to each class, while .decision_function() returns a raw confidence score. This is an important distinction from the absolute class predictions returned by calling the .predict() method.
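A small sketch of the distinction, using a logistic regression on synthetic data (any classifier exposing both methods would do):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

proba = clf.predict_proba(X[:1])      # per-class probabilities; rows sum to 1
score = clf.decision_function(X[:1])  # signed distance from the hyperplane
label = clf.predict(X[:1])            # absolute class prediction
print(proba.shape, score.shape, label.shape)  # (1, 2) (1,) (1,)
```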

Can you compare several classifiers in scikit-learn on synthetic datasets?

A comparison of several classifiers in scikit-learn on synthetic datasets. The point of this example is to illustrate the nature of decision boundaries of different classifiers. This should be taken with a grain of salt, as the intuition conveyed by these examples does not necessarily carry over to real datasets.

What are the different classification algorithms in scikit-learn?

Scikit-Learn provides easy access to numerous different classification algorithms, among them: K-Nearest Neighbors, Support Vector Machines, Decision Tree Classifiers/Random Forests, Naive Bayes, Linear Discriminant Analysis, and Logistic Regression.

How do I use a classifier to make predictions on testing data?

After the classifier model has been trained on the training data, it can make predictions on the testing data. This is easily done by calling the predict method on the classifier and passing it the features from your testing dataset.
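A minimal end-to-end sketch of that train/predict workflow, on a synthetic dataset with an arbitrary choice of classifier:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=150, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = clf.predict(X_test)  # one predicted label per test sample
print(len(y_pred) == len(y_test))  # True
```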


1 Answer

NOTE: scikit-learn's VotingClassifier is probably the best way to do this now
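For the question as asked (averaging the probabilities of a logistic regression and a random forest), a minimal VotingClassifier sketch would be:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)

# voting="soft" averages predict_proba across the base classifiers,
# which is exactly the averaging the question asks about.
ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(random_state=0))],
    voting="soft",
)
ensemble.fit(X, y)
print(ensemble.predict_proba(X[:2]).shape)  # (2, 2)
```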


OLD ANSWER:

For what it's worth I ended up doing this as follows:

import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class EnsembleClassifier(BaseEstimator, ClassifierMixin):
    def __init__(self, classifiers=None):
        self.classifiers = classifiers

    def fit(self, X, y):
        for classifier in self.classifiers:
            classifier.fit(X, y)
        return self  # scikit-learn convention: fit returns self

    def predict_proba(self, X):
        self.predictions_ = list()
        for classifier in self.classifiers:
            self.predictions_.append(classifier.predict_proba(X))
        # Average the per-class probabilities across all classifiers.
        return np.mean(self.predictions_, axis=0)
answered Sep 23 '22 by user1507844