Multi-Label Document Classification

Question

I have a database in which I store data in three fields: id, text, {labels}. Note that each text has been assigned more than one label/tag/class. I want to build a model (Weka/RapidMiner/Mahout) that can recommend a set of labels/tags/classes for a given text.

I have heard about SVMs and the Naive Bayes classifier, but I am not sure whether they support multi-label classification. Anything that points me in the right direction is more than welcome!

Fred Foo · Accepted Answer

The basic multilabel classification method is one-vs.-the-rest (OvR), also called binary relevance (BR). The basic idea is that you take an off-the-shelf binary classifier, such as Naive Bayes or an SVM, then create K instances of it to solve K independent classification problems. In Python-like pseudocode:

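classifiers = {}
for k in classes:
    learner = SVM(settings)  # for example; any binary classifier works
    # a sample is a positive example for k if k is among its labels
    targets = [k in labels_of(x) for x in samples]
    learner.learn(samples, targets)
    classifiers[k] = learner  # keep one trained classifier per class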

Then at prediction time, you just run each of the binary classifiers on a sample and collect the labels for which they predict positive.
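To make the training and prediction steps concrete, here is a minimal runnable sketch in plain Python. It is only an illustration under stated assumptions: the `Perceptron` class is a toy stand-in for an SVM or Naive Bayes learner, the names `train_binary_relevance` and `predict_labels` are made up for this example, and the four tiny documents are invented toy data, not anything from the question.

```python
class Perceptron:
    """Toy binary linear classifier over bag-of-words tokens.

    Hypothetical stand-in for an off-the-shelf SVM or Naive Bayes learner.
    """

    def __init__(self, epochs=10):
        self.epochs = epochs
        self.weights = {}
        self.bias = 0.0

    def _score(self, tokens):
        return self.bias + sum(self.weights.get(t, 0.0) for t in tokens)

    def learn(self, samples, targets):
        # Classic perceptron rule: update only on misclassified samples.
        for _ in range(self.epochs):
            for tokens, target in zip(samples, targets):
                if (self._score(tokens) > 0) != target:
                    step = 1.0 if target else -1.0
                    self.bias += step
                    for t in tokens:
                        self.weights[t] = self.weights.get(t, 0.0) + step

    def predict(self, tokens):
        return self._score(tokens) > 0


def train_binary_relevance(samples, label_sets, make_learner=Perceptron):
    """Train one independent binary classifier per label."""
    classes = sorted(set().union(*label_sets))
    classifiers = {}
    for k in classes:
        learner = make_learner()
        # Positive example for k if k is among the sample's labels.
        learner.learn(samples, [k in labels for labels in label_sets])
        classifiers[k] = learner
    return classifiers


def predict_labels(classifiers, tokens):
    """Collect every label whose binary classifier predicts positive."""
    return {k for k, clf in classifiers.items() if clf.predict(tokens)}


# Invented toy data: each text carries more than one label.
docs = [
    "java thread pool executor",
    "java svm text classifier",
    "python svm kernel trick",
    "python thread lock",
]
label_sets = [
    {"java", "concurrency"},
    {"java", "ml"},
    {"python", "ml"},
    {"python", "concurrency"},
]
samples = [d.split() for d in docs]

classifiers = train_binary_relevance(samples, label_sets)
predicted = predict_labels(classifiers, "java svm".split())
print(sorted(predicted))  # ['java', 'ml']
```

Note that the result can also be the empty set if no classifier fires. If you always need at least one tag, a common workaround is to fall back to the label with the highest score.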

(Both training and prediction are easy to run in parallel, since the K problems are assumed to be independent. See Wikipedia for links to two Java packages that do multi-label classification.)
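As a rough sketch of the parallel variant, the per-label training jobs can be handed to a pool of workers, because no job needs anything from another. Everything named here is an assumption for illustration: `TokenVoteLearner` is a deliberately trivial toy learner, `train_one` and `train_parallel` are made-up helper names, and the data is invented. A thread pool is used to keep the example simple. For CPU-bound learners written in pure Python, threads will not give a real speedup because of the interpreter lock, so a process pool would be the more likely choice in practice.

```python
from concurrent.futures import ThreadPoolExecutor


class TokenVoteLearner:
    """Toy binary learner (hypothetical): each token votes +1 per positive
    training sample it appears in and -1 per negative one."""

    def __init__(self):
        self.votes = {}

    def learn(self, samples, targets):
        for tokens, target in zip(samples, targets):
            for t in tokens:
                self.votes[t] = self.votes.get(t, 0) + (1 if target else -1)

    def predict(self, tokens):
        return sum(self.votes.get(t, 0) for t in tokens) > 0


def train_one(k, samples, label_sets):
    """Train the binary classifier for label k; independent of other labels."""
    learner = TokenVoteLearner()
    learner.learn(samples, [k in labels for labels in label_sets])
    return k, learner


def train_parallel(samples, label_sets, max_workers=4):
    """Run one training job per label on a pool of worker threads."""
    classes = sorted(set().union(*label_sets))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        jobs = pool.map(lambda k: train_one(k, samples, label_sets), classes)
        return dict(jobs)


# Invented toy data: each text carries more than one label.
samples = [
    "java thread pool executor".split(),
    "java svm text classifier".split(),
    "python svm kernel trick".split(),
    "python thread lock".split(),
]
label_sets = [
    {"java", "concurrency"},
    {"java", "ml"},
    {"python", "ml"},
    {"python", "concurrency"},
]

parallel_classifiers = train_parallel(samples, label_sets)
query = "java svm".split()
parallel_predicted = {
    k for k, clf in parallel_classifiers.items() if clf.predict(query)
}
print(sorted(parallel_predicted))  # ['java', 'ml']
```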
