 

Active Learning (e.g. Pool Sampling) for SVM in python [closed]

I'm working on a problem that would greatly benefit from an active learning protocol (i.e. given a set of unlabeled data and an existing model, the algorithm requests that a subset of the unlabeled data be labeled by an 'oracle').

Does anyone have any examples of active learning (pool-based sampling, query by committee, or otherwise) being implemented with an SVM, preferably in Python?

asked May 03 '16 by DrTchocky


People also ask

What is pool based active learning?

Active learning is a machine learning approach for reducing the data labeling effort. Given a pool of unlabeled samples, it tries to select the most useful ones to label, so that a model built from them can achieve the best possible performance.

What is stream Based selective sampling?

Stream-based selective sampling is a sequential strategy: data points arrive one at a time from the underlying distribution, and the learner decides for each instance individually whether or not to query the oracle for its label.
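The per-instance decision described above can be sketched with a scikit-learn SVM: examples "arrive" one at a time, and the model only asks for a label when the point falls close to its decision boundary. The dataset, seed size, and 0.5 uncertainty threshold are all illustrative assumptions, and the "oracle" here is just the true label array.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy linearly separable labels

# seed the model with five examples of each class so the first fit is valid
seed = np.concatenate([np.where(y == 0)[0][:5], np.where(y == 1)[0][:5]])
labeled_X = [X[i] for i in seed]
labeled_y = [y[i] for i in seed]
clf = SVC(kernel="linear").fit(labeled_X, labeled_y)

queried = 0
stream = [i for i in range(300) if i not in set(seed)]
for i in stream:                                     # examples arrive one by one
    if abs(clf.decision_function([X[i]])[0]) < 0.5:  # uncertain -> query oracle
        labeled_X.append(X[i])
        labeled_y.append(y[i])                       # the "oracle" is just y here
        queried += 1
        clf.fit(labeled_X, labeled_y)                # retrain with the new label
```

Unlike pool-based sampling, no global ranking over the pool is computed; each query decision is made as the instance streams past.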

What is active learning AI?

Active learning is a subset of machine learning in which a learning algorithm can interactively query a user to label data with the desired outputs. In active learning, the algorithm proactively selects which examples from the pool of unlabeled data should be labeled next.


2 Answers

Implementing active learning in Python is quite straightforward. In the simplest case you just query the sample with the smallest absolute value of decision_function under your trained SVM (simple uncertainty sampling), which is basically a single line. Assuming a binary classification problem, with a trained SVM in clf and some unlabeled examples in X, you simply select

sample = X[np.argmin(np.abs(clf.decision_function(X)))] 
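The one-liner above can be expanded into a full pool-based query loop. The following is a minimal sketch: the dataset, the balanced seed set, and the number of query rounds are made-up assumptions, and the oracle is simulated by the true label array y.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# seed with five labeled examples per class so the first fit is valid
labeled = [int(i) for i in np.where(y == 0)[0][:5]] \
        + [int(i) for i in np.where(y == 1)[0][:5]]
pool = [i for i in range(200) if i not in labeled]

clf = SVC(kernel="linear")
for _ in range(20):                           # 20 query rounds
    clf.fit(X[labeled], y[labeled])
    # distance to the hyperplane: smallest absolute value = most uncertain
    scores = np.abs(clf.decision_function(X[pool]))
    query = pool.pop(int(np.argmin(scores)))  # remove from the pool...
    labeled.append(query)                     # ...and have the oracle label it

print(len(labeled))  # 30 labeled samples after 20 rounds
```

Each round retrains on the labeled set and queries the single most uncertain pool point; batch variants simply take the k smallest scores instead of one.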

You can find many different implementations on GitHub too, such as this one accompanying an active learning paper from last year's ECML: https://github.com/gmum/mlls2015

answered Sep 30 '22 by lejlot


Two popular query strategies for pool-based sampling are uncertainty sampling and query by committee (the active learning survey literature reviews both extensively). The following library implements three common uncertainty strategies (least confident, max margin, and entropy) as well as two committee strategies (vote entropy and average KL divergence): https://github.com/davefernig/alp

The library is compatible with scikit-learn and can be used with any classifier. It uses random subsampling as a baseline for measuring the benefit of active learning.
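As a rough illustration of the vote-entropy committee strategy mentioned above (this is not the alp library's API, just a hand-rolled sketch), a committee can be built from SVMs trained on bootstrap resamples of the labeled set, and the pool point the members disagree on most is queried. All sizes and names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=0)
labeled = np.arange(20)      # pretend the first 20 points are labeled
pool = np.arange(20, 100)

# train 5 committee members on bootstrap resamples of the labeled set
rng = np.random.default_rng(0)
committee = []
for _ in range(5):
    idx = rng.choice(labeled, size=len(labeled), replace=True)
    committee.append(SVC(kernel="linear").fit(X[idx], y[idx]))

votes = np.array([m.predict(X[pool]) for m in committee])  # shape (5, 80)

def vote_entropy(col):
    # entropy of the label vote distribution for one pool point
    _, counts = np.unique(col, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

entropy = np.array([vote_entropy(votes[:, j]) for j in range(votes.shape[1])])
query = pool[int(np.argmax(entropy))]   # the most-disagreed-upon sample
```

Random subsampling (querying a uniformly random pool point) is the natural baseline to compare against, which is what the library above does.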

answered Sep 30 '22 by Vadim Smolyakov