 

How to POS-tag a French sentence?

I'm looking for a way to POS-tag a French sentence, the way the following code does for English sentences:

import nltk

def pos_tagging(sentence):
    # Split the sentence into tokens, then tag each token.
    # nltk.pos_tag uses an English model by default.
    tokenized = nltk.word_tokenize(sentence)
    return nltk.pos_tag(tokenized)
sahraoui asmoun asked Jun 10 '17 00:06



2 Answers

Here is the full source code; it works very well. Download link for the Stanford POS tagger: https://nlp.stanford.edu/software/tagger.shtml#About

import os
from nltk.tag import StanfordPOSTagger

# Paths to the Stanford tagger jar and the French model (adjust to your install)
jar = 'C:/Users/m.ferhat/Desktop/stanford-postagger-full-2016-10-31/stanford-postagger-3.7.0.jar'
model = 'C:/Users/m.ferhat/Desktop/stanford-postagger-full-2016-10-31/models/french.tagger'

# NLTK reads JAVAHOME to locate the Java executable
java_path = "C:/Program Files/Java/jdk1.8.0_121/bin/java.exe"
os.environ['JAVAHOME'] = java_path

pos_tagger = StanfordPOSTagger(model, jar, encoding='utf8')
res = pos_tagger.tag('je suis libre'.split())
print(res)
sahraoui asmoun answered Nov 07 '22 05:11



NLTK doesn't come with pre-built resources for French. I recommend using the Stanford tagger, which comes with a trained French model. This code shows how you might set up NLTK for use with Stanford's French POS tagger. Note that the code is outdated (and for Python 2), but you could use it as a starting point.

Alternatively, NLTK makes it very easy to train your own POS tagger on a tagged corpus and save it for later use. If you have access to a (sufficiently large) French corpus, you can follow the instructions in the NLTK book and simply use your corpus in place of the Brown corpus. You're unlikely to match the performance of the Stanford tagger (unless you can train a tagger for your specific domain), but you won't have to install anything.
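To make the train-and-save route concrete, here is a minimal sketch of what it might look like. The three-sentence corpus, the tag names, the "NOUN" fallback, the helper function names and the pickle file name are all made up for illustration; a usable tagger would need a real, much larger tagged French corpus supplied in the same list-of-(word, tag) shape.

```python
import os
import pickle
import tempfile

from nltk.tag import BigramTagger, DefaultTagger, UnigramTagger

# Hypothetical toy corpus: a list of sentences, each a list of (word, tag) pairs.
# A real tagger needs a large tagged French corpus in this same shape.
TOY_SENTS = [
    [("je", "PRON"), ("suis", "VERB"), ("libre", "ADJ")],
    [("tu", "PRON"), ("es", "VERB"), ("libre", "ADJ")],
    [("le", "DET"), ("chat", "NOUN"), ("dort", "VERB")],
]


def train_tagger(tagged_sents):
    """Train a bigram tagger that backs off to a unigram tagger, then a default tag."""
    default = DefaultTagger("NOUN")  # fallback for unseen words (arbitrary choice)
    unigram = UnigramTagger(tagged_sents, backoff=default)
    return BigramTagger(tagged_sents, backoff=unigram)


def save_tagger(tagger, path):
    """Pickle the trained tagger so it can be reused without retraining."""
    with open(path, "wb") as f:
        pickle.dump(tagger, f)


def load_tagger(path):
    """Load a previously pickled tagger."""
    with open(path, "rb") as f:
        return pickle.load(f)


tagger = train_tagger(TOY_SENTS)
path = os.path.join(tempfile.gettempdir(), "french_tagger.pickle")  # hypothetical file name
save_tagger(tagger, path)

reloaded = load_tagger(path)
print(reloaded.tag("je suis libre".split()))
```

The backoff chain is the same idea the NLTK book uses with the Brown corpus: the bigram tagger handles contexts it has seen, and anything else falls through to the unigram tagger and finally to the default tag. None of this requires downloading NLTK data or installing Java.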

alexis answered Nov 07 '22 06:11
