I am using the nltk module in Python, and I am trying to use it for POS tagging in different languages.
There is a lot of information on how to train your own POS tagger for different languages. Is there a database of robust, well-built and tested NLTK POS taggers for different languages? (It is quite easy to export POS taggers using the pickle module.)
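For reference, exporting a tagger with pickle might look like the sketch below. It assumes a toy, hand-made training set and a `UnigramTagger` with a `DefaultTagger` backoff; both are illustrative choices, not a recommendation for a production tagger.

```python
import pickle

from nltk.tag import DefaultTagger, UnigramTagger

# Tiny hand-made training set (illustrative only; a real tagger
# needs a large annotated corpus for the target language).
train_sents = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

# Words the unigram model cannot resolve fall back to "NN" instead of None.
tagger = UnigramTagger(train_sents, backoff=DefaultTagger("NN"))

# Serialize to bytes; to write to disk, use pickle.dump(tagger, f)
# with a file opened in "wb" mode.
blob = pickle.dumps(tagger)
restored = pickle.loads(blob)

tagged = restored.tag(["the", "cat", "barks"])
print(tagged)
```

Only unpickle tagger files from sources you trust, since loading a pickle can execute arbitrary code.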
POS tagging in NLTK marks up each word in a text with its part of speech, based on the word's definition and context. Examples of NLTK POS tags are CC, CD, EX, JJ, MD, NNP, PDT, PRP$, TO, etc. A POS tagger assigns this grammatical information to each word of a sentence.
IN: preposition/subordinating conjunction
JJ: adjective ('big')
JJR: adjective, comparative ('bigger')
JJS: adjective, superlative ('biggest')
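To make tagger output easier to read, the tags can be mapped back to their descriptions. The sketch below covers only the four tags listed above; `TAG_DESCRIPTIONS` and `describe` are hypothetical names introduced for illustration, and the sample input is hand-written rather than produced by a real tagger.

```python
# Descriptions copied from the tag list above; the full tagset has many more entries.
TAG_DESCRIPTIONS = {
    "IN": "preposition/subordinating conjunction",
    "JJ": "adjective",
    "JJR": "adjective, comparative",
    "JJS": "adjective, superlative",
}


def describe(tagged_words):
    """Return (word, tag, description) triples; tags not in the table get 'unknown'."""
    return [(word, tag, TAG_DESCRIPTIONS.get(tag, "unknown")) for word, tag in tagged_words]


# Hand-written (word, tag) pairs in the same shape a tagger returns.
sample = [("big", "JJ"), ("bigger", "JJR"), ("biggest", "JJS")]
described = describe(sample)
for word, tag, description in described:
    print(f"{word}\t{tag}\t{description}")
```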
Part-of-speech (POS) tagging is a common natural language processing task that assigns each word in a text (corpus) to a part of speech, based on the word's definition and its context.
Figure 1: Example of POS tagging (Image by Author)
You can find robust, well-built and tested NLTK corpora at http://www.nltk.org/nltk_data/
You may find other corpora, but these are the best.
If you are not restricted to NLTK, you can try our robust, language-independent POS tagging toolkit, RDRPOSTagger.
(License: GPLv2; Programming Language: Python & Java)
RDRPOSTagger is fast in both training and tagging. In addition, its accuracy is very competitive with state-of-the-art results.
Updated 18/11/2015: released version 1.2 with improved tagging accuracy, especially on morphologically rich languages. See experimental results, including speed and tagging accuracy, in this paper.
RDRPOSTagger supports pre-trained POS and morphological tagging models for Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese. RDRPOSTagger also supports the pre-trained Universal POS tagging models for 40 languages.