
How to detect the language of a given text

In my Rails 3 application, users can write messages in a forum. I would like to identify the language of a given message. I'm interested in English, Russian, and Hebrew. Is there a built-in library in Ruby/Rails for this task? If not, any ideas would be appreciated.

Misha Moroshko asked May 05 '11 12:05



3 Answers

Use this: https://github.com/nashby/wtf_lang

"ruby is so awesome!".lang # => "en"
"ruby is so awesome!".full_lang # => "ENGLISH"
Vasiliy Ermolovich answered Nov 18 '22 00:11



You can use the language detection API that Google provides as part of Google Translate.

See the documentation: http://code.google.com/apis/language/translate/v1/using_rest_langdetect.html

Hartator answered Nov 18 '22 01:11



Since you're only concerned with languages that use different scripts, you could look at which character codes predominate in your strings and check whether they fall into the Unicode ranges for Hebrew or Cyrillic characters.
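
Here is a rough sketch of that idea, assuming Ruby 1.9 or later (where regexes support Unicode script properties such as \p{Cyrillic}) and UTF-8 strings. The constant SCRIPT_TO_LANG and the method guess_language are hypothetical names made up for this example, not part of any library. Note that it really detects the script, not the language: any Latin-script text is reported as :en and any Cyrillic text as :ru, which is only acceptable because the question is limited to English, Russian, and Hebrew.

```ruby
# Hypothetical mapping (an assumption for this example): one Unicode
# script per language the question cares about.
SCRIPT_TO_LANG = {
  /\p{Latin}/    => :en,
  /\p{Cyrillic}/ => :ru,
  /\p{Hebrew}/   => :he
}.freeze

# Hypothetical helper: counts the letters belonging to each script and
# returns the language whose script occurs most often, or nil if the
# text contains no letters from any of the three scripts.
def guess_language(text)
  counts = SCRIPT_TO_LANG.map do |pattern, lang|
    [lang, text.scan(pattern).size]
  end
  lang, count = counts.max_by { |_, n| n }
  count.zero? ? nil : lang
end

p guess_language("ruby is so awesome!")
p guess_language("Привет, мир")
p guess_language("שלום עולם")
```

Counting letters rather than testing for a single match means a message that mixes scripts (for example a Russian sentence containing one English word) is still classified by its dominant script.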

digitalWestie answered Nov 18 '22 01:11
