
What is a good Java library for Parts-Of-Speech tagging? [closed]

Tags: java, nlp

I'm looking for a good open source POS Tagger in Java. Here's what I have come up with so far.

  • LingPipe
  • Stanford
  • LBJ
  • FastTag

Anybody got any recommendations?

asked Feb 19 '10 02:02 by Glenn



2 Answers

Are you looking to tag POS in a specific domain? Most general-purpose taggers are trained on newswire text, and they typically don't perform well in specific domains (such as biomedical text). There are taggers trained specifically for such domains, for example dTagger (Java) for biomedical text.

For newswire text, Adwait Ratnaparkhi's MXPOST is very good and is the one I would recommend.

Other Java implementations include:

  1. MontyLingua
  2. Berkeley Parser (not really a POS tagger, but full-blown parsers typically include one; search for Java syntactic parsers and you will find many)
  3. QTag
  4. LBJ

OpenNLP and LingPipe, mentioned by the other posters, are also pretty decent.

Information on the state of the art in POS tagging can be found here. As you can see, LTAG-Spinal (also mentioned by another poster) ranks best as of now, but the variation across the taggers is small. I have not used LTAG myself.

Also note that the baseline performance for POS tagging is about 90%. Here, baseline means (a) tagging every known word with its most frequent POS tag from a lexicon, and (b) tagging every unknown word as a noun.
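To make the baseline concrete, here is a minimal sketch of such a tagger in plain Java. It is not taken from any of the libraries above. The class name `BaselineTagger`, the toy training data, and the use of the Penn Treebank tag `NN` for unknown words are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical baseline tagger: most frequent tag per known word, noun for unknown words. */
final class BaselineTagger {
    // Assumption: Penn Treebank "NN" (singular noun) stands in for "noun".
    private static final String UNKNOWN_TAG = "NN";

    // Maps each word seen in training to its most frequent tag.
    private final Map<String, String> lexicon = new HashMap<>();

    /** Builds the lexicon from parallel lists of words and their gold tags. */
    BaselineTagger(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        // Count how often each tag occurs with each word.
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < words.size(); i++) {
            counts.computeIfAbsent(words.get(i), w -> new HashMap<>())
                  .merge(tags.get(i), 1, Integer::sum);
        }
        // Keep only the most frequent tag for each word.
        for (Map.Entry<String, Map<String, Integer>> entry : counts.entrySet()) {
            String bestTag = null;
            int bestCount = -1;
            for (Map.Entry<String, Integer> tagCount : entry.getValue().entrySet()) {
                if (tagCount.getValue() > bestCount) {
                    bestTag = tagCount.getKey();
                    bestCount = tagCount.getValue();
                }
            }
            lexicon.put(entry.getKey(), bestTag);
        }
    }

    /** Tags each token independently; context is ignored entirely. */
    String[] tag(String[] tokens) {
        String[] result = new String[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            result[i] = lexicon.getOrDefault(tokens[i], UNKNOWN_TAG);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy training data, invented for this example.
        BaselineTagger tagger = new BaselineTagger(
            List.of("the", "dog", "can", "run", "the", "can", "can", "rust"),
            List.of("DT", "NN", "MD", "VB", "DT", "NN", "MD", "VB"));
        String[] tags = tagger.tag(new String[] {"the", "can", "barked"});
        System.out.println(String.join(" ", tags));
    }
}
```

Because each token is tagged without looking at its neighbours, an ambiguous word such as "can" always gets the same tag. The taggers recommended above mostly earn their gains over the baseline by resolving those cases from context.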

answered Sep 18 '22 12:09 by hashable


I have used OpenNLP with good results. You can also check out MorphAdorner.

answered Sep 19 '22 12:09 by Shashikant Kore