
How to detect language of user entered text? [closed]

I am working on an application that accepts user input in different languages (currently a fixed set of 3 languages). The requirement is that users can enter text without having to select the language via a checkbox provided in the UI.

Is there an existing Java library to detect the language of a text?

I want something like this:

text = "To be or not to be thats the question."
// returns ISO 639 Alpha-2 code
language = detect(text);
print(language);

result:

EN 

I don't want to know how to create a language detector myself (I have seen plenty of blogs trying to do that). The library should provide a simple API and also work completely offline. Open-source or commercial closed-source doesn't matter.

I also found these questions on SO (and a few more):

How to detect language
How to detect language of text?

ManBugra asked Jul 12 '10 10:07



1 Answer

This Language Detection Library for Java should give more than 99% accuracy for 53 languages.

Alternatively, there is Apache Tika, a library for content analysis that offers much more than just language detection.
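To make the shape of the requested API concrete, here is a minimal, self-contained sketch of a detect(text) call that returns an ISO 639-1 code. It does not use either library named above. The class StopwordLanguageGuesser, the choice of English, German and French, and the tiny word lists are all hypothetical and only stand in for a real detector, which would replace the body of detect() in practice.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/**
 * Toy stand-in for a real language detector (hypothetical, for illustration only).
 * It counts common function words per language and picks the best match.
 */
final class StopwordLanguageGuesser {

    // Hand-picked stopwords for three example languages. The question does not
    // say which three languages the application supports; these are assumptions.
    // A TreeMap keeps iteration order fixed, so ties go to the alphabetically first code.
    private static final Map<String, Set<String>> STOPWORDS = new TreeMap<>(Map.of(
        "en", Set.of("the", "be", "to", "of", "and", "is", "that", "not", "or"),
        "de", Set.of("der", "die", "das", "und", "ist", "nicht", "oder", "sein", "zu"),
        "fr", Set.of("le", "la", "les", "et", "est", "ne", "pas", "ou", "que")
    ));

    /** Returns a lowercase ISO 639-1 code, or "und" (undetermined) if nothing matches. */
    static String detect(String text) {
        // Split on anything that is not a letter, after lowercasing.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "und";
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> entry : STOPWORDS.entrySet()) {
            int score = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    score++;
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // prints "en"
        System.out.println(detect("To be or not to be, that's the question."));
    }
}
```

Word counting like this breaks down quickly on short or mixed-language input, which is the reason to rely on one of the libraries above. Keeping the call behind a single method such as detect() makes it straightforward to swap in a real implementation later.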

yvespeirsman answered Sep 21 '22 12:09