I am working on an application that accepts user input in several languages (currently a fixed set of three). The requirement is that users can simply enter text without having to select the language via a checkbox in the UI.
Is there an existing Java library to detect the language of a text?
I want something like this:
text = "To be or not to be thats the question."
// returns ISO 639 Alpha-2 code
language = detect(text);
print(language);
result:
EN
I don't want to know how to build a language detector myself (I have seen plenty of blogs that try to do that). The library should provide a simple API and work completely offline. Open source or commercial closed source, either is fine.
I also found these questions on SO (and a few more):
How to detect language
How to detect language of text?
In Python, you first import the detect function from langdetect and then pass the text to it. In this case it reports that the text provided is in Swahili ('sw'). You can also get the probabilities for the top candidate languages with the detect_langs function.
This Language Detection Library for Java should give more than 99% accuracy for 53 languages.
Alternatively, there is Apache Tika, a library for content analysis that offers much more than just language detection.
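Since the question mentions a fixed set of three languages, it may help to narrow whatever the library reports down to just those. Below is a minimal sketch of such an adapter, not the API of either library mentioned above. The class name SupportedLanguagePicker, the backend function (text in, map of language code to score out) and the fallback code are all hypothetical; in a real application the backend would wrap the probability output of whichever library you pick.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical adapter: narrows any detector's scores to the languages the application supports. */
final class SupportedLanguagePicker {

    private final Set<String> supported;  // lower-case two-letter codes, e.g. "en"
    private final String fallback;        // used when no supported language scores above zero
    private final Function<String, Map<String, Double>> backend;  // assumed: text -> (code -> score)

    SupportedLanguagePicker(Set<String> supported, String fallback,
                            Function<String, Map<String, Double>> backend) {
        this.supported = supported;
        this.fallback = fallback;
        this.backend = backend;
    }

    /** Returns the upper-case code of the best-scoring supported language, or the fallback. */
    String detect(String text) {
        String best = fallback;
        double bestScore = 0.0;
        for (Map.Entry<String, Double> entry : backend.apply(text).entrySet()) {
            String code = entry.getKey().toLowerCase(Locale.ROOT);
            // Ignore languages the application does not support.
            if (supported.contains(code) && entry.getValue() > bestScore) {
                best = code;
                bestScore = entry.getValue();
            }
        }
        return best.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Stub backend standing in for a real library call; the scores are made up.
        Function<String, Map<String, Double>> stub =
                text -> Map.of("en", 0.80, "nl", 0.15, "de", 0.05);
        SupportedLanguagePicker picker =
                new SupportedLanguagePicker(Set.of("en", "de", "fr"), "en", stub);
        System.out.println(picker.detect("To be or not to be thats the question."));
    }
}
```

Keeping the backend behind a plain Function means the rest of the application only depends on detect(text), so swapping one detection library for another later should not touch any calling code.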