How to calculate tag-wise precision and recall for POS tagger?

Tags: python, shell, machine-learning, text-processing, nlp

I am using some rule-based and statistical POS taggers to tag a corpus (of around 5000 sentences) with parts of speech (POS). Below is a snippet of my test corpus, in which each word is separated from its POS tag by '/'.

No/RB ,/, it/PRP was/VBD n't/RB Black/NNP Monday/NNP ./.
But/CC while/IN the/DT New/NNP York/NNP Stock/NNP Exchange/NNP did/VBD n't/RB fall/VB apart/RB Friday/NNP as/IN the/DT Dow/NNP Jones/NNP Industrial/NNP Average/NNP plunged/VBD 190.58/CD points/NNS --/: most/JJS of/IN it/PRP in/IN the/DT final/JJ hour/NN --/: it/PRP barely/RB managed/VBD *-2/-NONE- to/TO stay/VB this/DT side/NN of/IN chaos/NN ./.
Some/DT ``/`` circuit/NN breakers/NNS ''/'' installed/VBN */-NONE- after/IN the/DT October/NNP 1987/CD crash/NN failed/VBD their/PRP$ first/JJ test/NN ,/, traders/NNS say/VBP 0/-NONE- *T*-1/-NONE- ,/, *-2/-NONE- unable/JJ *-3/-NONE- to/TO cool/VB the/DT selling/NN panic/NN in/IN both/DT stocks/NNS and/CC futures/NNS ./.

After tagging the corpus, it looks like this:

No/DT ,/, it/PRP was/VBD n't/RB Black/NNP Monday/NNP ./. 
But/CC while/IN the/DT New/NNP York/NNP Stock/NNP Exchange/NNP did/VBD n't/RB fall/VB apart/RB Friday/VB as/IN the/DT Dow/NNP Jones/NNP Industrial/NNP Average/JJ plunged/VBN 190.58/CD points/NNS --/: most/RBS of/IN it/PRP in/IN the/DT final/JJ hour/NN --/: it/PRP barely/RB managed/VBD *-2/-NONE- to/TO stay/VB this/DT side/NN of/IN chaos/NNS ./. 
Some/DT ``/`` circuit/NN breakers/NNS ''/'' installed/VBN */-NONE- after/IN the/DT October/NNP 1987/CD crash/NN failed/VBD their/PRP$ first/JJ test/NN ,/, traders/NNS say/VB 0/-NONE- *T*-1/-NONE- ,/, *-2/-NONE- unable/JJ *-3/-NONE- to/TO cool/VB the/DT selling/VBG panic/NN in/IN both/DT stocks/NNS and/CC futures/NNS ./.

I need to calculate the tagging accuracy (tag-wise recall and precision), so I need to find any tagging error for each word-tag pair.

The approach I am thinking of is to loop through these two text files, store their contents in lists, and then compare the two lists element by element.

This approach seems really crude to me, so I would like you to suggest a better solution to the above problem.

From the Wikipedia page:

In a classification task, the precision for a class is the number of true positives (i.e. the number of items correctly labeled as belonging to the positive class) divided by the total number of elements labeled as belonging to the positive class (i.e. the sum of true positives and false positives, which are items incorrectly labeled as belonging to the class). Recall in this context is defined as the number of true positives divided by the total number of elements that actually belong to the positive class (i.e. the sum of true positives and false negatives, which are items which were not labeled as belonging to the positive class but should have been).
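Written out for a single tag t, the definitions above can be put as formulas. The symbols TP_t, FP_t and FN_t (true positive, false positive and false negative counts for tag t) are my own shorthand, not taken from the Wikipedia page:

```latex
\[
\mathrm{Precision}_t = \frac{TP_t}{TP_t + FP_t},
\qquad
\mathrm{Recall}_t = \frac{TP_t}{TP_t + FN_t}
\]
```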

asked Mar 10 '11 19:03

stressed_geek

1 Answer

Note that since every word has exactly one tag, overall recall and precision scores are meaningless for this task (they'll both just equal the accuracy measure). But it does make sense to ask for recall and precision measures per tag - for example, you could find the recall and precision for the DT tag.
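A short derivation of why the overall scores collapse to accuracy, assuming the counts are summed over all tags before dividing (micro-averaging). The shorthand is mine: N is the number of words, E the number of mistagged words, and TP_t, FP_t, FN_t the per-tag counts. Each mistagged word adds exactly one false negative (for its true tag) and one false positive (for the guessed tag), so:

```latex
\[
\sum_t FP_t = \sum_t FN_t = E,
\qquad
\sum_t TP_t = N - E
\]
\[
\mathrm{Precision}_{\mathrm{overall}}
  = \frac{\sum_t TP_t}{\sum_t TP_t + \sum_t FP_t}
  = \frac{N - E}{N}
  = \frac{\sum_t TP_t}{\sum_t TP_t + \sum_t FN_t}
  = \mathrm{Recall}_{\mathrm{overall}}
\]
```

Here (N - E) / N is the accuracy. Averaging the per-tag scores instead (macro-averaging) will generally give different numbers.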

The most efficient way to do this for all tags at once is similar to the way you suggested, though you can save one pass over the data by skipping the list-making stage. Read in a line of each file, compare the two lines word by word, and repeat until you reach the end of the files. For each word comparison, you probably want to check the words are equal for sanity, rather than assuming the two files are in sync. For each kind of tag, you keep three running totals: true positives, false positives and false negatives. If the two tags for the current word match, increment the true positive total for the tag. If they don't match, you need to increment the false negative total for the true tag and the false positive total for the tag your machine mistakenly chose. At the end, you can calculate recall and precision scores for each tag by following the formula in your Wikipedia excerpt.

I haven't tested this code and my Python's a bit rusty, but this should give you the idea. I'm assuming the two files are already open as testFile and taggedFile; the totals data structure is a dictionary of dictionaries, created on demand by a defaultdict:

from collections import defaultdict

# totals[tag] holds the three running counts for that tag
totals = defaultdict(lambda: {'truePositives': 0,
                              'falsePositives': 0,
                              'falseNegatives': 0})

while True:
    trueLine = testFile.readline()
    taggedLine = taggedFile.readline()
    if not trueLine:  # end of the gold-standard file
        break
    trueTokens = trueLine.split()  # tokenise by whitespace
    taggedTokens = taggedLine.split()
    if len(trueTokens) != len(taggedTokens):
        raise ValueError('Error: files are out of sync.')
    for trueToken, taggedToken in zip(trueTokens, taggedTokens):
        # split on the last '/' only, in case the word itself contains one
        trueWord, trueTag = trueToken.rsplit('/', 1)
        taggedWord, guessedTag = taggedToken.rsplit('/', 1)
        if trueWord != taggedWord:  # the words should match
            raise ValueError('Error: files are out of sync.')
        if trueTag == guessedTag:
            totals[trueTag]['truePositives'] += 1
        else:
            totals[trueTag]['falseNegatives'] += 1
            totals[guessedTag]['falsePositives'] += 1
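To try the idea without any files, here is a minimal sketch that runs on the first sentence of the question's snippet. It is a variation rather than the loop above: instead of three totals per tag, it keeps a Counter of (true tag, guessed tag) pairs, which is my own substitution and has the side benefit of recording which tags get confused with which. The names gold, guess, confusion, scores and errors are illustrative, and returning None for an undefined score is an assumption about how you want to handle a zero denominator.

```python
from collections import Counter

# First sentence of the question's snippet: reference tags, then tagger output
gold = "No/RB ,/, it/PRP was/VBD n't/RB Black/NNP Monday/NNP ./."
guess = "No/DT ,/, it/PRP was/VBD n't/RB Black/NNP Monday/NNP ./."

# confusion[(true_tag, guessed_tag)] = number of words with that pairing
confusion = Counter()
for gold_token, guess_token in zip(gold.split(), guess.split()):
    # split on the last '/' only, so tokens such as ',/,' parse correctly
    gold_word, gold_tag = gold_token.rsplit('/', 1)
    guess_word, guess_tag = guess_token.rsplit('/', 1)
    if gold_word != guess_word:
        raise ValueError('inputs are out of sync')
    confusion[(gold_tag, guess_tag)] += 1


def scores(tag):
    """Return (precision, recall) for one tag; None where undefined."""
    tp = confusion[(tag, tag)]
    fn = sum(n for (g, p), n in confusion.items() if g == tag and p != tag)
    fp = sum(n for (g, p), n in confusion.items() if p == tag and g != tag)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


# Off-diagonal entries are the tagging errors, keyed by (true, guessed)
errors = {pair: n for pair, n in confusion.items() if pair[0] != pair[1]}

for tag in sorted({t for pair in confusion for t in pair}):
    print(tag, scores(tag))
```

On this one sentence the only error is "No", tagged DT instead of RB. RB therefore gets precision 1.0 and recall 0.5, while DT gets precision 0.0 and an undefined recall, because no word in the reference sentence is actually a DT.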

answered Oct 17 '22 06:10

Tommy Herbert
