NLTK French tokenizer in Python not working

Why is the French tokenizer that comes with NLTK not working for me? Am I doing something wrong?

This is what I'm running:

import nltk
content_french = ["Les astronomes amateurs jouent également un rôle important en recherche; les plus sérieux participant couramment au suivi d'étoiles variables, à la découverte de nouveaux astéroïdes et de nouvelles comètes, etc.", 'Séquence vidéo.', "John Richard Bond explique le rôle de l'astronomie."]
tokenizer = nltk.data.load('tokenizers/punkt/PY3/french.pickle')
for i in content_french:
    print(i)
    print(tokenizer.tokenize(i))

But the output is not split into words. Each string comes back as a single item, for example:

John Richard Bond explique le rôle de l'astronomie.
["John Richard Bond explique le rôle de l'astronomie."]

asked Feb 23 '17 23:02

Atirag

1 Answer

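The Punkt tokenizer you loaded is a sentence tokenizer (splitter): tokenizer.tokenize() splits text into sentences, so a string that contains only one sentence comes back as a one-element list. If you want to tokenize into words, use word_tokenize() instead: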

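import nltk
from nltk.tokenize import word_tokenize

# word_tokenize relies on the Punkt models; if they are missing, run
# nltk.download('punkt') once (newer NLTK releases use 'punkt_tab').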

content_french = ["Les astronomes amateurs jouent également un rôle important en recherche; les plus sérieux participant couramment au suivi d'étoiles variables, à la découverte de nouveaux astéroïdes et de nouvelles comètes, etc.", 'Séquence vidéo.', "John Richard Bond explique le rôle de l'astronomie."]
for i in content_french:
    print(i)
    print(word_tokenize(i, language='french'))
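To make the difference between the two kinds of tokenizer concrete, here is a minimal sketch that runs without downloading any NLTK data. It rests on assumptions that are not part of the original answer: it uses an untrained PunktSentenceTokenizer (so it knows no French abbreviations, unlike the pretrained French model) and WordPunctTokenizer, a simple regex-based word splitter, as a stand-in for word_tokenize. The sample text is two of the question's strings joined together.

```python
from nltk.tokenize import PunktSentenceTokenizer, WordPunctTokenizer

text = "Séquence vidéo. John Richard Bond explique le rôle de l'astronomie."

# Sentence tokenizer: returns one string per sentence.
# Untrained here, so it has no French abbreviation list (an assumption
# made only to avoid needing the downloaded Punkt data).
sentence_tokenizer = PunktSentenceTokenizer()
sentences = sentence_tokenizer.tokenize(text)
print(sentences)

# Word tokenizer: returns one string per word or punctuation run.
# WordPunctTokenizer is a regex-based stand-in for word_tokenize.
word_tokenizer = WordPunctTokenizer()
words = [word_tokenizer.tokenize(sentence) for sentence in sentences]
print(words)
```

The first print shows a list of sentences and the second a list of word lists. With only one sentence per input string, as in the question, the sentence tokenizer has nothing to split, which is why each string came back unchanged inside a one-element list.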

answered Oct 05 '22 22:10

Yohanes Gultom

Tags: python, tokenize, nltk