How do I detect what language a text is written in using NLTK?
The examples I've seen use nltk.detect
, but when I've installed it on my mac, I cannot find this package.
First, you import the detect method from langdetect and then pass the text to the method. The method detects the text provided is in the Swahili language ('sw'). You can also find out the probabilities for the top languages by using detect_langs method.
The Natural Language Toolkit (NLTK) is a platform used for building Python programs that work with human language data for applying in statistical natural language processing (NLP). It contains text processing libraries for tokenization, parsing, classification, stemming, tagging and semantic reasoning.
NLTK (Natural Language Toolkit) is the go-to API for NLP (Natural Language Processing) with Python. It is a really powerful tool to preprocess text data for further analysis like with ML models for instance.
Google Translate - If you need to determine the language of an entire web page or an online document, paste the URL of that page in the Google Translate box and choose “Detect Language” as the source language.
Have you come across the following code snippet?
english_vocab = set(w.lower() for w in nltk.corpus.words.words()) text_vocab = set(w.lower() for w in text if w.lower().isalpha()) unusual = text_vocab.difference(english_vocab)
from http://groups.google.com/group/nltk-users/browse_thread/thread/a5f52af2cbc4cfeb?pli=1&safe=active
Or the following demo file?
https://web.archive.org/web/20120202055535/http://code.google.com/p/nltk/source/browse/trunk/nltk_contrib/nltk_contrib/misc/langid.py
This library is not from NLTK either but certainly helps.
$ sudo pip install langdetect
Supported Python versions 2.6, 2.7, 3.x.
>>> from langdetect import detect >>> detect("War doesn't show who's right, just who's left.") 'en' >>> detect("Ein, zwei, drei, vier") 'de'
https://pypi.python.org/pypi/langdetect?
P.S.: Don't expect this to work correctly always:
>>> detect("today is a good day") 'so' >>> detect("today is a good day.") 'so' >>> detect("la vita e bella!") 'it' >>> detect("khoobi? khoshi?") 'so' >>> detect("wow") 'pl' >>> detect("what a day") 'en' >>> detect("yay!") 'so'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With