Fine Text Classification - what algorithm?

Question

I'm looking to implement a classifier with approximately 150 categories (probably in Java), mostly for tweets (so very small documents). Some of the classes have very similar domains, e.g. 'Companies', 'Competition', 'Consumers', 'International law', 'International organisations', 'International politics and government'. What algorithm or approach is best when such a high resolution is needed? I've tried Naive Bayes (obviously) and so far it hasn't performed very well (although that could just be due to the quality of the training data). The community's thoughts would be very welcome!

Thanks,

Mark

Wesley Baugh · Accepted Answer

It might be worthwhile to come up with a hierarchical classifier built from (potentially many) levels of sub-classifiers (i.e., come up with a taxonomy for your document labels).

Single classifier

[Figure: a single classifier with many possible class labels]

A single classifier can output any of the many possible class labels.

Hierarchical classifier

[Figure: a hierarchical classifier]

A hierarchical classifier groups related class labels together, and performs additional layers of classification until a leaf node is reached (or until the confidence drops below a certain threshold).
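To make the routing concrete, here is a minimal Java sketch of the idea, not a definitive implementation. Everything in it is an assumption for illustration: the class and method names are hypothetical, the parent groups 'International' and 'Business' are invented, and the keyword matcher with fixed confidences (0.9 and 0.3) is only a stand-in for a trained model at each node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Sketch of hierarchical routing. Every name here is hypothetical. */
class HierarchicalClassifier {

    /** A label together with the classifier's confidence in it. */
    static final class Prediction {
        final String label;
        final double confidence;

        Prediction(String label, double confidence) {
            this.label = label;
            this.confidence = confidence;
        }
    }

    /** One taxonomy node: a classifier plus the sub-classifiers for its labels. */
    static final class Node {
        final Function<String, Prediction> classifier;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(Function<String, Prediction> classifier) {
            this.classifier = classifier;
        }
    }

    /**
     * Returns the labels chosen from the root downwards. Stops at a leaf,
     * or as soon as a node's confidence drops below the threshold.
     */
    static List<String> classify(Node root, String text, double threshold) {
        List<String> path = new ArrayList<>();
        Node node = root;
        while (node != null) {
            Prediction p = node.classifier.apply(text);
            if (p.confidence < threshold) {
                break; // not confident enough to commit to a finer label
            }
            path.add(p.label);
            node = node.children.get(p.label); // null once a leaf label is reached
        }
        return path;
    }

    /** Toy stand-in for a trained model: keyword match with fixed confidences. */
    static Function<String, Prediction> keywords(Map<String, String> keywordToLabel, String fallback) {
        return text -> {
            String lower = text.toLowerCase();
            for (Map.Entry<String, String> e : keywordToLabel.entrySet()) {
                if (lower.contains(e.getKey())) {
                    return new Prediction(e.getValue(), 0.9);
                }
            }
            return new Prediction(fallback, 0.3);
        };
    }

    /** Two-level example taxonomy; the parent groups are invented for the demo. */
    static Node demoTaxonomy() {
        Node root = new Node(keywords(
                Map.of("treaty", "International", "shareholder", "Business"), "Business"));
        Node international = new Node(keywords(
                Map.of("court", "International law", "united nations", "International organisations"),
                "International politics and government"));
        root.children.put("International", international);
        return root;
    }
}
```

With a threshold of 0.5, `classify(demoTaxonomy(), "UN court rules on treaty dispute", 0.5)` descends two levels, while a tweet such as "New treaty announced" stops at the parent label because the second-level confidence is too low. In a real system each node's classifier would be a model trained only on the documents that reach that node.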

Intuition

The intuition is that a classifier has an easier time learning discriminative features when it has fewer categories to choose between.

For example, a hierarchical classifier may have an easier time learning that 'player' is a good feature indicative of sports, whereas a single classifier would have a more difficult time if 'player' had only been seen in the training data for one category (basketball) and not another (hockey).
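The effect can be illustrated with a small counting sketch in Java. It is only a toy under stated assumptions: the four training documents are made up, the parent label 'sports' and all class and method names are hypothetical, and counting documents that contain a token is a simplification of what a real classifier learns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy illustration of pooling term evidence under a parent label. Data is made up. */
class LabelPooling {

    /** Counts, per label, how many documents contain the token as a whole word. */
    static Map<String, Integer> docsContaining(String token, List<String[]> labelledDocs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] doc : labelledDocs) {
            String label = doc[0];
            List<String> words = Arrays.asList(doc[1].toLowerCase().split("\\s+"));
            counts.putIfAbsent(label, 0); // list every label, even with zero hits
            if (words.contains(token)) {
                counts.merge(label, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Rewrites each document's leaf label to its parent label, if it has one. */
    static List<String[]> relabel(List<String[]> labelledDocs, Map<String, String> parentOf) {
        List<String[]> out = new ArrayList<>();
        for (String[] doc : labelledDocs) {
            out.add(new String[] {parentOf.getOrDefault(doc[0], doc[0]), doc[1]});
        }
        return out;
    }

    /** Invented training set: {label, text} pairs. */
    static List<String[]> toyDocs() {
        return List.of(
                new String[] {"basketball", "star player scores thirty points"},
                new String[] {"basketball", "player traded before deadline"},
                new String[] {"hockey", "goalie stops forty shots"},
                new String[] {"companies", "quarterly profits beat forecast"});
    }

    /** Invented taxonomy: both sports leaves share one parent. */
    static Map<String, String> toyParents() {
        return Map.of("basketball", "sports", "hockey", "sports");
    }
}
```

On the flat labels, `docsContaining("player", toyDocs())` gives `{basketball=2, companies=0, hockey=0}`, so a single classifier has no evidence linking 'player' to hockey. After `relabel(toyDocs(), toyParents())` the same call gives `{companies=0, sports=2}`: the top-level classifier sees 'player' as evidence for sports as a whole, and a hockey tweet that mentions a player can at least be routed to the right branch.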

Tags: java, algorithm, machine-learning, classification
