I have a list of articles, and each article has its own title and description. Unfortunately, from the sources I am using, there is no way to know what language they are written in.
Furthermore, the text is not written entirely in one language; English words are almost always present.
I reckon I would need dictionary databases stored on my machine, but that feels a bit impractical. What would you suggest I do?
First, import the detect function from langdetect and pass the text to it. In the example, detect reports that the text is in Swahili ('sw'). You can also get the probabilities of the top candidate languages by using the detect_langs function.
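A minimal sketch of that approach applied to the articles from the question, assuming langdetect is installed (pip install langdetect). The helper name guess_languages and the choice to join title and description into one string are illustrative assumptions, not part of the library. Seeding DetectorFactory makes the results repeatable, since langdetect is otherwise non-deterministic on short or ambiguous text.

```python
try:
    # Third-party package: pip install langdetect
    from langdetect import DetectorFactory, detect_langs
    from langdetect.lang_detect_exception import LangDetectException

    DetectorFactory.seed = 0  # make results repeatable between runs
except ImportError:
    detect_langs = None  # the helper below reports this when it is called


def guess_languages(title, description):
    """Hypothetical helper: return [(language_code, probability), ...], best guess first."""
    # Join the non-empty parts so the detector sees as much text as possible.
    parts = (title, description)
    text = " ".join(p.strip() for p in parts if p and p.strip())
    if not text:
        return []
    if detect_langs is None:
        raise RuntimeError("langdetect is not installed")
    try:
        return [(guess.lang, guess.prob) for guess in detect_langs(text)]
    except LangDetectException:
        # Raised when the text has no usable features (for example, only digits).
        return []
```

Because a few English words rarely outweigh the rest of the text, the first entry of the returned list is usually the dominant language; a low top probability can be treated as "unknown" rather than trusted.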
In Outlook 2010, 2013, and 2016 and Word 2010, 2013, and 2016: On the Review tab, in the Language group, click Language. Click Set Proofing Language. In the Language dialog box, select the Detect language automatically check box. Review the languages shown above the double line in the Mark selected text as list.
I'd use the guess-language project.
Edit: Now in Bitbucket
Have you looked into http://ling.unizd.hr/~dcavar/LID/ and http://en.wikipedia.org/wiki/Language_identification?
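For a sense of how little data a rough language identifier needs, here is a sketch of one simple technique, counting very common function words per language. It is not the method used by the tools linked above, and the tiny word lists are illustrative assumptions only; a real system would use far larger lists or character n-gram profiles. Stray English words in otherwise non-English text only add a few points to the English score, so the dominant language still tends to win.

```python
import re
from collections import Counter

# Illustrative, deliberately tiny stopword lists (an assumption, not a real resource).
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "that", "for", "with", "on", "this"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "mit", "ein", "zu", "den"},
    "fr": {"le", "les", "et", "des", "est", "une", "dans", "pour", "que", "pas"},
    "es": {"el", "los", "y", "que", "es", "una", "por", "con", "para", "como"},
}


def stopword_scores(text):
    """Count how many words of the text appear in each language's stopword list."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters, Unicode-aware
    counts = Counter(words)
    return {lang: sum(counts[w] for w in vocab) for lang, vocab in STOPWORDS.items()}


def guess_by_stopwords(text):
    """Return the best-scoring language code, or None if no stopword matched."""
    scores = stopword_scores(text)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Calling guess_by_stopwords on an article's title plus description gives a quick first guess without any dictionary database on disk.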