
NLTK other language POS tagger

Tags: python, nlp, nltk

I am using the nltk module in Python, and I am trying to use it for POS tagging in different languages.

There is a lot of information on how to train your own POS tagger for different languages. Is there a repository of robust, well-built, and tested NLTK POS taggers for different languages? (It is quite easy to export POS taggers using the pickle module.)
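As a minimal sketch of the pickle export mentioned above, the snippet below trains a small NLTK `UnigramTagger` and round-trips it through `pickle`. The toy Spanish sentences, the tag names, and the helper names `train_tagger`, `save_tagger`, and `load_tagger` are assumptions made up for illustration; a real tagger would be trained on a full tagged corpus.

```python
import pickle

from nltk.tag import DefaultTagger, UnigramTagger

# Toy training data, invented for illustration only (not a real corpus).
TRAIN_SENTS = [
    [("el", "DET"), ("gato", "NOUN"), ("duerme", "VERB")],
    [("la", "DET"), ("casa", "NOUN"), ("es", "VERB"), ("grande", "ADJ")],
]


def train_tagger(tagged_sents):
    """Train a unigram tagger that falls back to NOUN for unseen words."""
    backoff = DefaultTagger("NOUN")
    return UnigramTagger(tagged_sents, backoff=backoff)


def save_tagger(tagger, path):
    """Write the trained tagger to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(tagger, f)


def load_tagger(path):
    """Read a pickled tagger back from disk."""
    with open(path, "rb") as f:
        return pickle.load(f)


tagger = train_tagger(TRAIN_SENTS)

# Round-trip through pickle in memory; save_tagger/load_tagger do the
# same thing through a file.
restored = pickle.loads(pickle.dumps(tagger))
tagged = restored.tag(["la", "casa", "duerme"])
print(tagged)
```

Only unpickle tagger files from sources you trust, since loading a pickle can execute arbitrary code.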

Parsa asked Dec 22 '14 14:12




2 Answers

You can find robust, well-built, and tested NLTK corpora at http://www.nltk.org/nltk_data/

You may find other corpora elsewhere, but these are the best.

shadab.tughlaq answered Oct 28 '22 23:10



If you are not restricted to NLTK, you can try our robust, language-independent POS tagging toolkit, RDRPOSTagger.

(License: GPLv2; Programming Language: Python & Java)

RDRPOSTagger is fast in both training and tagging, and its accuracy is very competitive with state-of-the-art results.

Update (18/11/2015): version 1.2 was released with improved tagging accuracy, especially on morphologically rich languages. See this paper for experimental results, including tagging speed and accuracy.

RDRPOSTagger provides pre-trained POS and morphological tagging models for Bulgarian, Czech, Dutch, English, French, German, Hindi, Italian, Portuguese, Spanish, Swedish, Thai and Vietnamese. It also provides pre-trained Universal POS tagging models for 40 languages.

NQD answered Oct 29 '22 00:10
