Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multi-Label Document Classification

I have a database in which I store data based upon the following three fields: id, text, {labels}. Note that each text has been assigned to more than one label \ tag \ class. I want to build a model (weka \ rapidminer \ mahout) that will be able to recommend \ classify a bunch of labels \ tags \ classes to a given text.

I have heard about SVM and Naive Bayes Classifier, but not sure whether they support multi-label classification or not. Anything that guides me to the right direction is more than welcome!

like image 402
user2295350 Avatar asked May 21 '13 15:05

user2295350


1 Answers

The basic multilabel classification method is one-vs.-the-rest (OvR), also called binary relevance (BR). The basic idea is that you take an off-the-shelf binary classifier, such as Naive Bayes or an SVM, then create K instances of it to solve K independent classification problems. In Python-like pseudocode:

for each class k:
    learner = SVM(settings)  # for example
    labels = [class_of(x) == k for x in samples]
    learner.learn(samples, labels)

Then at prediction time, you just run each of the binary classifiers on a sample and collect the labels for which they predict positive.

(Both training and prediction can obviously be done in parallel, since the problems are assumed to be independent. See Wikipedia for links to two Java packages that do multi-label classification.)

like image 115
Fred Foo Avatar answered Oct 01 '22 13:10

Fred Foo