Are there some common or recommended techniques for using the context of a word to improve the accuracy of part-of-speech tagging?
For example, if I had the sentence:
I played golf on a links.
The word "links" could be either singular (a golf course) or plural. I tried this sentence in several grammar checkers and they all correctly recognized the sentence as valid.
The problem is they also thought that this sentence was valid:
I clicked on a links.
Is there a good way to use the context (clicked vs played golf) to infer the correct part-of-speech?
Thanks!
Determining whether "links" is a "golf course" or "references" is a task called word-sense disambiguation. Here is what Wikipedia's article on Word-sense disambiguation says about the relation to part-of-speech tagging:
In any real test, part-of-speech tagging and sense tagging are very closely related with each potentially making constraints to the other. And the question whether these tasks should be kept together or decoupled is still not unanimously resolved, but recently scientists incline to test these things separately (e.g. in the Senseval/SemEval competitions parts of speech are provided as input for the text to disambiguate). It is instructive to compare the word sense disambiguation problem with the problem of part-of-speech tagging. Both involve disambiguating or tagging with words, be it with senses or parts of speech. However, algorithms used for one do not tend to work well for the other, mainly because the part of speech of a word is primarily determined by the immediately adjacent one to three words, whereas the sense of a word may be determined by words further away. The success rate for part-of-speech tagging algorithms is at present much higher than that for WSD, state-of-the art being around 95% accuracy or better, as compared to less than 75% accuracy in word sense disambiguation with supervised learning. These figures are typical for English, and may be very different from those for other languages.
I am not aware of any work that uses WSD to inform POS tagging (using POS tags to inform WSD, however, is standard). This sounds like a good idea to me, even if the benefit to accuracy would be small because accuracy is already high. It could be implemented as an additional feature in Toutanova's tagger (the Stanford log-linear POS tagger, which is a maximum-entropy model rather than a CRF).
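To make the "implemented as a feature" idea concrete, here is a minimal sketch in Python. It is not the Stanford tagger's API. The `SENSE_CUES` lexicon, the `token_features` function, and the window size are all hypothetical names and values made up for illustration. The idea is to combine the usual local features (the adjacent words) with a crude sense cue taken from a wider window, and to hand the resulting feature dictionary to whatever feature-based tagger you train.

```python
from typing import Dict, List

# Hypothetical cue lexicon (an assumption for this sketch, not a real resource):
# for each ambiguous word, context words that hint at one of its senses.
SENSE_CUES = {
    "links": {
        "golf_course": {"golf", "course", "tee", "fairway"},
        "hyperlinks": {"clicked", "page", "website", "url"},
    }
}


def token_features(tokens: List[str], i: int, sense_window: int = 5) -> Dict[str, object]:
    """Build a feature dict for tokens[i]: local POS-style features plus
    sense-cue features drawn from a wider context window."""
    word = tokens[i].lower()

    # Local features: the word itself, a short suffix, and its neighbours.
    feats: Dict[str, object] = {
        "word": word,
        "suffix2": word[-2:],
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }

    # Wider window: collect nearby words, excluding the target token itself.
    lo = max(0, i - sense_window)
    hi = min(len(tokens), i + sense_window + 1)
    context = {tokens[j].lower() for j in range(lo, hi) if j != i}

    # Fire one binary feature per sense whose cue words appear in the window.
    for sense, cues in SENSE_CUES.get(word, {}).items():
        if context & cues:
            feats["sense_cue=" + sense] = True

    return feats


golf = "I played golf on a links .".split()
web = "I clicked on a links .".split()
print(token_features(golf, golf.index("links")))
print(token_features(web, web.index("links")))
```

In both sentences the local features around "links" are the same (previous word "a", next token "."), so only the sense-cue feature can separate the two cases. A trained tagger could then learn that the golf-course cue is compatible with a singular noun tag while the hyperlink cue is not. A real system would take the cue from an actual WSD component rather than a hand-written word list.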