
Transformation-Based Part-of-Speech Tagging (Brill Tagging)

What are the weaknesses and strengths of the Brill Tagger? Can you suggest some possible improvements for the tagger?

user239135 asked Feb 26 '10 13:02



1 Answer

The biggest weakness of a Brill tagger is the time needed for the training phase (take a look at the time-stamps for ACOPOST, or try to implement one with NLTK, to get an idea). Remember that you should always consider a Brill tagger as the last tagger to be used in a sequence of tagging systems (for simple tagging I usually train a Brill tagger on the output of an HMM tagger and use it on top of that tagger). Used by itself, a Brill tagger not only takes even longer to train, it also generally produces a very large, normally overlapping and sometimes "incorrect" set of rules (i.e., rules which in "true" tagging contexts break many correct tags).
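To make the training cost concrete, here is a minimal sketch of the greedy transformation-based learning loop in plain Python. It is not the ACOPOST or NLTK implementation: the toy corpus, the single rule template ("change tag A to B if the previous tag is C") and the unigram lexicon standing in for the initial tagger (an HMM tagger in my usual setup) are all assumptions made for illustration. The point to notice is that every candidate rule is scored against the whole corpus on every iteration, which is where the training time goes once you have real data and dozens of templates.

```python
from collections import Counter

# Toy gold-standard corpus (illustrative data only).
GOLD = [
    [("the", "DT"), ("race", "NN"), ("ended", "VBD")],
    [("a", "DT"), ("race", "NN"), ("starts", "VBZ")],
    [("this", "DT"), ("race", "NN"), ("matters", "VBZ")],
    [("I", "PRP"), ("want", "VBP"), ("to", "TO"), ("race", "VB")],
    [("they", "PRP"), ("plan", "VBP"), ("to", "TO"), ("race", "VB")],
]

def baseline_tags(sentences):
    """Stand-in for the initial tagger: most frequent tag per word."""
    counts = {}
    for sent in sentences:
        for word, tag in sent:
            counts.setdefault(word, Counter())[tag] += 1
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return [[lexicon[w] for w, _ in sent] for sent in sentences]

def apply_rule(rule, tags):
    """Apply one rule (from_tag, to_tag, prev_tag) to a list of tags."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Conditions are tested on the tags as they were before this pass.
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out

def count_errors(current, gold):
    return sum(t != g for ts, gs in zip(current, gold) for t, g in zip(ts, gs))

def train(gold_sents, current, max_rules=10):
    """Greedily pick the rule that removes the most errors, then repeat."""
    gold = [[t for _, t in s] for s in gold_sents]
    rules = []
    for _ in range(max_rules):
        # Propose candidate rules only at positions that are currently wrong.
        candidates = {
            (ts[i], gs[i], ts[i - 1])
            for ts, gs in zip(current, gold)
            for i in range(1, len(ts)) if ts[i] != gs[i]
        }
        base = count_errors(current, gold)
        best, best_gain = None, 0
        for rule in sorted(candidates):
            # Expensive step: re-tag the whole corpus for every candidate.
            gain = base - count_errors([apply_rule(rule, ts) for ts in current], gold)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:
            break  # no remaining rule fixes more tags than it breaks
        rules.append(best)
        current = [apply_rule(best, ts) for ts in current]
    return rules, current

rules, corrected = train(GOLD, baseline_tags(GOLD))
for from_tag, to_tag, prev_tag in rules:
    print(f"{from_tag} -> {to_tag} if the previous tag is {prev_tag}")
# prints: NN -> VB if the previous tag is TO
```

The net gain (fixed tags minus broken tags) is what keeps "incorrect" rules out: the better the initial tagger, the fewer errors are left to propose candidates from, and the smaller and cleaner the rule set tends to be.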

The biggest strength of a Brill tagger is that its model makes sense, in particular when you store the rules in a human-readable format, as is generally done. Manually inspecting the model of a statistical tagger is tedious, error-prone and not very useful, while a set of transformation rules can not only be understood and tweaked manually, but this can be done even by people with no previous experience in NLP (I did this years ago, when some undergraduates of a language program evaluated the rules generated on a Brazilian Portuguese corpus). In fact, you can even write the set of rules entirely by yourself.
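As a rough illustration of what "human-readable" can mean, the sketch below parses a handful of hand-written rules from plain text and applies them in order to an already tagged sentence. The rule syntax is hypothetical, invented for this example; it is not the file format of ACOPOST, NLTK or Brill's original tagger, and the tags in the sample sentence are made up to trigger the rules.

```python
import re

# Hypothetical plain-text rule format, one rule per line:
#   FROM -> TO if prev_tag is X
#   FROM -> TO if next_tag is X
#   FROM -> TO if word is X
RULES_TEXT = """
NN -> VB if prev_tag is TO
VBD -> VBN if prev_tag is VBZ
NN -> NNP if word is Monday
"""

RULE_RE = re.compile(r"^(\S+) -> (\S+) if (prev_tag|next_tag|word) is (\S+)$")

def parse_rules(text):
    """Turn the text above into (from_tag, to_tag, feature, value) tuples."""
    rules = []
    for line in text.strip().splitlines():
        match = RULE_RE.match(line.strip())
        if match is None:
            raise ValueError(f"cannot parse rule: {line!r}")
        rules.append(match.groups())
    return rules

def apply_rules(rules, tagged):
    """Apply each rule, in order, to a list of (word, tag) pairs."""
    tagged = list(tagged)
    for from_tag, to_tag, feature, value in rules:
        before = list(tagged)  # conditions see the state before this rule
        for i, (word, tag) in enumerate(before):
            if tag != from_tag:
                continue
            if feature == "prev_tag":
                hit = i > 0 and before[i - 1][1] == value
            elif feature == "next_tag":
                hit = i + 1 < len(before) and before[i + 1][1] == value
            else:  # feature == "word"
                hit = word == value
            if hit:
                tagged[i] = (word, to_tag)
    return tagged

# Output of some earlier tagger, with three deliberate mistakes.
initial = [("She", "PRP"), ("has", "VBZ"), ("decided", "VBD"), ("to", "TO"),
           ("race", "NN"), ("on", "IN"), ("Monday", "NN")]
fixed = apply_rules(parse_rules(RULES_TEXT), initial)
print(fixed)
```

Because the rules are just ordered lines of text, someone without an NLP background can read them, delete a bad one, or add a new one and re-run the tagger to see the effect.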

In short, while a Brill tagger is useful as the last step in a robust system of cascading taggers, in general it is not the best alternative to be used by itself (if you want to use a single tagger, I would suggest going with an HMM one). My suggestion is to train and use a Brill tagger on the tagged output of another tagger, preferably a combined system such as a voting one (i.e., you set up three or four different taggers, use a voting system to select the best tag for each token, and only then feed these results to a Brill tagger that will hopefully correct the most common mistakes of the previous system).
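The voting step can be as simple as a per-token majority vote. The sketch below assumes each tagger has already produced one tag per token for the same sentence; the three tag lists and their names are invented for illustration, and the tie-breaking policy (prefer the tagger listed first) is just one reasonable choice.

```python
from collections import Counter

def vote(tag_sequences):
    """Combine several taggers' outputs for one sentence by majority vote.

    tag_sequences holds one list of tags per tagger, all of the same length.
    Ties are resolved in favour of the tagger that appears first.
    """
    if len({len(seq) for seq in tag_sequences}) != 1:
        raise ValueError("all taggers must tag the same tokens")
    voted = []
    for column in zip(*tag_sequences):
        counts = Counter(column)
        top = max(counts.values())
        # Walk the column in tagger order so ties favour earlier taggers.
        voted.append(next(tag for tag in column if counts[tag] == top))
    return voted

# Hypothetical outputs of three taggers for "I want to race".
hmm_tags     = ["PRP", "VBP", "TO", "NN"]
unigram_tags = ["PRP", "VBP", "TO", "NN"]
other_tags   = ["PRP", "NN",  "TO", "VB"]

voted = vote([hmm_tags, unigram_tags, other_tags])
print(voted)
# prints: ['PRP', 'VBP', 'TO', 'NN']
```

Here the vote still leaves "race" tagged as NN after "to", which is exactly the kind of systematic, context-dependent mistake a Brill tagger trained on the voted output is meant to correct. Listing your most trusted tagger first means it decides the ties.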

Giacomo answered Sep 26 '22 13:09
