 

Obtain multiple taggings with Stanford POS Tagger

I'm performing POS tagging with the Stanford POS Tagger. The tagger only returns one possible tagging for the input sentence. For instance, when provided with the input sentence "The clown weeps.", the POS tagger produces the (erroneous) "The_DT clown_NN weeps_NNS ._.".

However, my application will try to parse the result, and may reject a POS tagging because there is no way to parse it. Hence, in this example, it would reject "The_DT clown_NN weeps_NNS ._." but would accept "The_DT clown_NN weeps_VBZ ._.", which I assume the tagger considers a lower-confidence hypothesis.

I would therefore like the POS tagger to provide multiple hypotheses for the tagging of each word, annotated by some kind of confidence value. In this way, my application could choose the POS tagging with highest confidence that achieves a valid parsing for its purposes.

I have found no way to ask the Stanford POS Tagger to produce multiple (n-best) tagging hypotheses for each word (or even for the whole sentence). Is there a way to do this? (Alternatively, I am also OK with using another POS tagger with comparable performance that would have support for this.)

asked May 28 '13 by a3nm



1 Answer

OpenNLP supports retrieving the n-best tag sequences for POS tagging. From its documentation:

Some applications need to retrieve the n-best POS tag sequences and not only the best one. The topKSequences method returns the top sequences and can be called in a similar way to tag.

Sequence topSequences[] = tagger.topKSequences(sent);

Each Sequence object contains one tag sequence. The tags can be retrieved via Sequence.getOutcomes(), and Sequence.getProbs() returns the per-token probability array for that sequence.
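
For illustration, here is a minimal end-to-end sketch of that API. The model path en-pos-maxent.bin and the example sentence are placeholders; substitute your own pre-trained model file.

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.List;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.util.Sequence;

public class NBestTagging {
    public static void main(String[] args) throws Exception {
        // Load a pre-trained POS model; the path is a placeholder.
        try (InputStream in = new FileInputStream("en-pos-maxent.bin")) {
            POSTaggerME tagger = new POSTaggerME(new POSModel(in));

            String[] sent = {"The", "clown", "weeps", "."};

            // n-best tag sequences instead of only the single best one.
            Sequence[] topSequences = tagger.topKSequences(sent);

            for (Sequence seq : topSequences) {
                List<String> tags = seq.getOutcomes(); // one tag per token
                double[] probs = seq.getProbs();       // per-token probabilities
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < sent.length; i++) {
                    line.append(String.format("%s_%s (%.3f) ",
                            sent[i], tags.get(i), probs[i]));
                }
                System.out.println(line);
            }
        }
    }
}

Your application can then walk the sequences in order and keep the first tagging it manages to parse.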

There is also a way to make spaCy do something similar (note that this targets the spaCy v2 API):

import numpy

from spacy.language import Language
from spacy.pipeline import Tagger
from spacy.tokens import Doc, Token

# Store the per-token tag probability matrix on the Doc and expose each
# token's row of scores through a Token extension.
Doc.set_extension('tag_scores', default=None)
Token.set_extension('tag_scores', getter=lambda token: token.doc._.tag_scores[token.i])

class ProbabilityTagger(Tagger):
    def predict(self, docs):
        tokvecs = self.model.tok2vec(docs)
        scores = self.model.softmax(tokvecs)
        guesses = []
        for i, doc_scores in enumerate(scores):
            # Keep the full softmax distribution instead of discarding it.
            docs[i]._.tag_scores = doc_scores
            doc_guesses = doc_scores.argmax(axis=1)

            # On GPU the argmax result is a cupy array; copy it to the host.
            if not isinstance(doc_guesses, numpy.ndarray):
                doc_guesses = doc_guesses.get()
            guesses.append(doc_guesses)
        return guesses, tokvecs


# Register this subclass as the pipeline's tagger factory.
Language.factories['tagger'] = lambda nlp, **cfg: ProbabilityTagger(nlp.vocab, **cfg)

Then each token will have tag_scores with the probabilities for each part of speech from spaCy's tag map.

Source: https://github.com/explosion/spaCy/issues/2087

answered Oct 20 '22 by Anastasiia Iurshina