
Detect if text is in English with Python [closed]

I know this question has been asked many times, but I still couldn't solve my problem with the available solutions. I'm hoping for further ideas or concepts on how to detect whether my sentences are in English using Python. The available solutions I have looked at:

  • Language Detector (written in Ruby, not Python)
  • Google Translate API v2 (no longer free; it costs 20 dollars a month, and I'm doing this project for academic purposes. Courtesy limit: 0 characters/day)
  • Language identification for Python (source code not found; see the link below, automatic-language-identification)
  • Enchant (is it not available for Python 2.7? I'm new to Python; is there a guide? I suspect this is the one I need)
  • WordNet from NLTK (I have no idea why "wordnet.synsets" is missing and only "wordnet.Synset" is available. The sample code in that solution doesn't work for me either; probably a versioning issue again?)
  • Store English words in a list and check whether each word exists (a poor approach here, because the sentences come from Twitter, where spelling is informal)

WORKING SOLUTION

Finally, after a series of attempts, the following are the working solutions (alternatives to the list above):

  • Wiktionary API (using urllib2 to fetch the response and simplejson to parse it. If the page key is -1, the word doesn't exist; otherwise I treat it as English. For Twitter text, you first have to preprocess each word to strip special characters such as @#,?!. For how to find the key, see: Simplejson and random key value)
  • Answer from Dogukan Tufekci (accepted). Weakness: if the sentence is shorter than 20 characters, you have to install PyEnchant, or it returns UNKNOWN. Since PyEnchant does not support Python 2.7, I couldn't install it, so this does not work for sentences shorter than 20 characters.
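
The Wiktionary lookup described above could be sketched roughly as follows. This is a minimal sketch, not the exact code used in the question: it assumes Python 3 (urllib.request and json instead of urllib2 and simplejson), the standard MediaWiki query endpoint on en.wiktionary.org, and the helper names clean_token, page_exists and lookup are made up for illustration. Note that a hit only means Wiktionary has an entry for that title, and English Wiktionary also lists words from other languages.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumed endpoint: the standard MediaWiki API of English Wiktionary.
API_URL = "https://en.wiktionary.org/w/api.php"


def clean_token(token):
    """Strip Twitter-style characters such as @ # , ? ! and lower-case the rest."""
    return re.sub(r"[^A-Za-z']", "", token).lower()


def page_exists(response):
    """Return True unless the API reported the title as missing (page key "-1")."""
    pages = response.get("query", {}).get("pages", {})
    return bool(pages) and "-1" not in pages


def lookup(word):
    """Fetch the query result for one title (hypothetical helper, needs network)."""
    params = urllib.parse.urlencode(
        {"action": "query", "titles": word, "format": "json"}
    )
    request = urllib.request.Request(
        API_URL + "?" + params,
        headers={"User-Agent": "english-check-sketch/0.1"},  # placeholder agent
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

A call such as page_exists(lookup(clean_token("#Hello!"))) would then check one cleaned token; keeping the parsing separate from the network call makes the "-1" check easy to try on a saved response.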

References

  • Detecting whether or not text is English (in bulk)
  • How to check if a word is an English word with Python?
  • How to retrieve Wiktionary word content?

1myb asked Mar 07 '13 00:03



2 Answers

You can try the guess_language library, which I found through Miguel Grinberg's The Flask Mega-Tutorial. It looks like it supports both Python 2 and 3, so it should be OK.
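
A minimal usage sketch, assuming the guess_language-spirit distribution (imported as guess_language, with a guess_language function that returns a language code such as "en", or "UNKNOWN" when it cannot decide). The wrapper is_english and its detect parameter are hypothetical names added here so the logic can be tried without the library installed.

```python
def is_english(text, detect=None):
    """Return True for English, False for another language, None if undecided.

    `detect` is any callable mapping text to a language code; when omitted,
    guess_language is imported lazily (assumes guess_language-spirit is installed).
    """
    if detect is None:
        from guess_language import guess_language as detect
    code = detect(text)
    if code == "UNKNOWN":
        # Short inputs tend to end up here, as noted in the question.
        return None
    return code == "en"
```

With the library installed, is_english("some longer sentence from a tweet") is enough; returning None for the undecided case lets the caller fall back to another check for very short sentences.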

Dogukan Tufekci answered Nov 07 '22 17:11


You might be able to use hidden Markov models to detect languages, since each language has its own characteristics.
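
To give a rough idea of the statistical approach, here is a simplified sketch. It is not a full hidden Markov model: it uses a plain (visible) character-bigram Markov chain per language with add-one smoothing and picks the language whose chain gives the text the highest log-likelihood. The tiny corpora, the class BigramModel, and the functions train and detect are all made up for illustration; a real system would need far more training text.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Lower-case, keep letters and single spaces, and return adjacent character pairs."""
    kept = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    padded = " " + " ".join(kept.split()) + " "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class BigramModel:
    """Character-level Markov chain estimated from one training corpus."""

    def __init__(self, corpus, alphabet_size):
        pairs = char_bigrams(corpus)
        self.pair_counts = Counter(pairs)
        self.first_counts = Counter(pair[0] for pair in pairs)
        self.alphabet_size = alphabet_size

    def log_likelihood(self, text):
        """Sum of log P(next char | current char) with add-one smoothing."""
        total = 0.0
        for pair in char_bigrams(text):
            numerator = self.pair_counts[pair] + 1
            denominator = self.first_counts[pair[0]] + self.alphabet_size
            total += math.log(numerator / denominator)
        return total


def train(corpora):
    """Build one model per language, sharing one alphabet size for smoothing."""
    alphabet = {ch for corpus in corpora.values()
                for pair in char_bigrams(corpus) for ch in pair}
    return {lang: BigramModel(corpus, len(alphabet))
            for lang, corpus in corpora.items()}


def detect(text, models):
    """Return the language whose model scores the text highest."""
    return max(models, key=lambda lang: models[lang].log_likelihood(text))


# Toy training data, invented for this sketch only.
TOY_CORPORA = {
    "en": "the dog eats in the house and the cat sleeps on the table. "
          "we are going to the park because the weather is nice today. "
          "she says that they want to talk with each other this afternoon.",
    "es": "el perro come en la casa y el gato duerme sobre la mesa. "
          "vamos al parque porque hoy hace buen tiempo. "
          "ella dice que ellos quieren hablar entre ellos esta tarde.",
}
MODELS = train(TOY_CORPORA)
```

Calling detect(sentence, MODELS) returns "en" or "es" for this toy setup. Frequent pairs such as "th" in English or "qu" in Spanish carry most of the signal, which is the kind of per-language characteristic the answer refers to.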

Arafangion answered Nov 07 '22 17:11