I'm working on a problem that would greatly benefit from an active learning protocol (i.e. given a pool of unlabeled data and an existing model, the algorithm requests that a subset of the unlabeled data be labeled by an 'oracle').
Does anyone have any examples of active learning (using pool-based sampling, query by committee, or otherwise) being implemented with an SVM, preferably in Python?
As Dongrui Wu puts it, active learning is a machine learning approach for reducing the data labeling effort: given a pool of unlabeled samples, it tries to select the most useful ones to label, so that a model built from them achieves the best possible performance.
Stream-based selective sampling is a sequential strategy: data points are drawn one at a time from the underlying distribution, and the learner decides, individually for each incoming instance, whether to query the oracle for its label or to discard it.
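As an illustration only (should_query and its threshold are my own names, not from any library), the per-instance decision for a binary SVM could be: query the oracle whenever the sample falls close to the decision boundary.

import numpy as np

def should_query(clf, x, threshold=1.0):
    # Query the oracle when the incoming sample lies near the decision
    # boundary, i.e. its |decision_function| is below a chosen threshold.
    # threshold=1.0 (the SVM margin) is an assumed default, not a standard.
    margin = np.abs(clf.decision_function(x.reshape(1, -1)))[0]
    return margin < threshold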
Active learning is a subset of machine learning in which the learning algorithm can interactively query a user to label data with the desired outputs. In pool-based active learning, the algorithm proactively selects, from the pool of unlabeled data, the subset of examples to be labeled next.
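A minimal sketch of that loop with a scikit-learn SVM, assuming a binary problem and a hypothetical oracle_label callable standing in for the human annotator:

import numpy as np
from sklearn.svm import SVC

def pool_based_al(X_labeled, y_labeled, X_pool, oracle_label, n_queries=10):
    # Alternate between fitting the model and asking the oracle to label
    # the pool sample the current model is least certain about.
    clf = SVC(kernel="linear")
    for _ in range(n_queries):
        clf.fit(X_labeled, y_labeled)
        idx = np.argmin(np.abs(clf.decision_function(X_pool)))
        X_labeled = np.vstack([X_labeled, X_pool[idx]])
        y_labeled = np.append(y_labeled, oracle_label(X_pool[idx]))
        X_pool = np.delete(X_pool, idx, axis=0)
    return clf.fit(X_labeled, y_labeled)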
Implementing active learning in Python is quite straightforward. In the simplest case you just query the new sample that has the smallest absolute value of decision_function on your trained SVM (simple uncertainty sampling), which is basically a single line long. Assuming a binary classification problem, a trained SVM in clf, and some unlabeled examples in X, you simply select

import numpy as np
sample = X[np.argmin(np.abs(clf.decision_function(X)))]
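If you would rather query a small batch per round than a single sample, the same idea extends with np.argsort (the batch size k below is an arbitrary choice):

k = 5  # number of samples to query per round (assumed; tune as needed)
query_idx = np.argsort(np.abs(clf.decision_function(X)))[:k]
batch = X[query_idx]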
You can find many different implementations on GitHub too, like the one accompanying an active learning paper from last year's ECML: https://github.com/gmum/mlls2015
Two popular query strategies for pool-based sampling are uncertainty sampling and query by committee (the active learning literature reviews these extensively). The following library implements three common uncertainty strategies (least confident, max margin, and entropy) as well as two committee strategies (vote entropy and average KL divergence): https://github.com/davefernig/alp
The library is compatible with scikit-learn and can be used with any classifier. It uses random subsampling as a baseline for measuring the benefit of active learning.
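For intuition, here is a rough sketch of the vote-entropy committee strategy (this is not the library's API; committee and X_pool are placeholder names):

import numpy as np

def vote_entropy_query(committee, X_pool):
    # Each committee member votes for a class on every pool sample; we query
    # the sample whose votes are most evenly split (highest vote entropy).
    votes = np.stack([clf.predict(X_pool) for clf in committee])  # (C, N)
    n_members, n_samples = votes.shape
    entropies = np.zeros(n_samples)
    for i in range(n_samples):
        _, counts = np.unique(votes[:, i], return_counts=True)
        p = counts / n_members
        entropies[i] = -np.sum(p * np.log(p))
    return np.argmax(entropies)

# Hypothetical usage: train each member on a bootstrap of the labeled data,
# then query the pool sample the committee disagrees on most:
# committee = [SVC().fit(Xb, yb) for Xb, yb in bootstraps]
# idx = vote_entropy_query(committee, X_pool)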