How to get POS tagging using Stanford Parser

I'm using the Stanford Parser to get the dependency relations between pairs of words, but I also need the POS tag of each word. However, ParserDemo.java only outputs the parse tree. I need each word's tag, like this:

My/PRP$ dog/NN also/RB likes/VBZ eating/VBG bananas/NNS ./.

not like this:

(ROOT
  (S
    (NP (PRP$ My) (NN dog))
    (ADVP (RB also))
    (VP (VBZ likes)
      (S
        (VP (VBG eating)
          (S
            (ADJP (NNS bananas))))))
    (. .)))

Can anyone help? Thanks a lot.

Charlie Epps asked Sep 17 '10 08:09



2 Answers

If you're mainly interested in manipulating the tags in a program, and don't need the TreePrint functionality, you can just get the tagged words as a List:

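// sent holds the tokens of the sentence to parse
LexicalizedParser lp =
  LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
Tree parse = lp.apply(Arrays.asList(sent));
// each TaggedWord pairs a word with its POS tag
List<TaggedWord> taggedWords = parse.taggedYield();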
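
To get the word/TAG line from the question out of such a list, each entry's word and tag can be joined with a slash. The following is only a sketch: it uses a hypothetical stand-in class Tagged rather than the Stanford TaggedWord (whose word() and tag() accessors it mirrors), so that it runs without the parser jar on the classpath. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Joins (word, tag) pairs into a line such as "My/PRP$ dog/NN". */
final class TagFormatSketch {

    /** Hypothetical stand-in for a tagged word; not the Stanford class. */
    static final class Tagged {
        final String word;
        final String tag;

        Tagged(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    /** Returns the pairs as space-separated word/TAG tokens. */
    static String format(List<Tagged> tagged) {
        StringBuilder sb = new StringBuilder();
        for (Tagged t : tagged) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.word).append('/').append(t.tag);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tagged> sentence = new ArrayList<>();
        sentence.add(new Tagged("My", "PRP$"));
        sentence.add(new Tagged("dog", "NN"));
        sentence.add(new Tagged("also", "RB"));
        System.out.println(format(sentence));
    }
}
```

With the real parser, the same loop would run over the list returned by taggedYield(), reading each entry's word() and tag().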
Christopher Manning answered Jan 03 '23 13:01


When running edu.stanford.nlp.parser.lexparser.LexicalizedParser on the command line, you want to use:

-outputFormat "wordsAndTags"

Programmatically, construct a TreePrint with the format string "wordsAndTags" and call printTree, like this:

TreePrint posPrinter = new TreePrint("wordsAndTags", yourPrintWriter);
posPrinter.printTree(yourLexParser.getBestParse());
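
If only the printed tree is at hand (for example, parser output already saved to a file), the word/TAG line can also be recovered from the bracketed text itself, since every leaf appears as "(TAG word)". The sketch below is plain Java and does not use the Stanford classes; the class and method names are hypothetical, and it assumes literal parentheses in the sentence are escaped in the tree (for example as -LRB- and -RRB-), so that a leaf never contains a raw paren.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts word/TAG pairs from a Penn-style bracketed parse tree string. */
final class BracketedTreeTags {

    // A leaf looks like "(TAG word)": two paren-free, whitespace-free tokens
    // inside one pair of parentheses. Phrase nodes such as "(NP (" do not
    // match, because their second token starts with an opening paren.
    private static final Pattern LEAF =
            Pattern.compile("\\(([^\\s()]+)\\s+([^\\s()]+)\\)");

    /** Returns the leaves, left to right, as space-separated word/TAG tokens. */
    static String wordsAndTags(String tree) {
        List<String> tokens = new ArrayList<>();
        Matcher m = LEAF.matcher(tree);
        while (m.find()) {
            tokens.add(m.group(2) + "/" + m.group(1)); // word first, then tag
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        String tree = "(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also))"
                + " (VP (VBZ likes) (S (VP (VBG eating) (S (ADJP (NNS bananas))))))"
                + " (. .)))";
        // prints: My/PRP$ dog/NN also/RB likes/VBZ eating/VBG bananas/NNS ./.
        System.out.println(wordsAndTags(tree));
    }
}
```

This is a text-level workaround; when the Tree object is available, asking the parser for the tags directly is the more robust route.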
msbmsb answered Jan 03 '23 15:01