 

Recognizing language of a short text? [closed]


I have a list of articles, and each article has its own title and description. Unfortunately, from the sources I am using, there is no way to know what language they are written in.

Furthermore, the text is not entirely in one language: English words are almost always mixed in.

I reckon I would need dictionary databases stored on my machine, but it feels a bit impractical. What would you suggest I do?

RadiantHex asked Mar 22 '10 at 17:03

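A full dictionary per language may not be necessary. One lightweight alternative is to score each language by the fraction of tokens that are common function words (stopwords). The sketch below assumes small, hypothetical hand-picked stopword lists (the `STOPWORDS` table is illustrative only, not a real resource). Because every language gets a score instead of requiring all words to match, a few embedded English words lower the dominant language's score rather than breaking the guess.

```python
import re

# Hypothetical, deliberately tiny stopword lists (illustrative only).
# Real lists would be much longer and come from a curated source.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it", "for", "with"},
    "it": {"il", "di", "che", "e", "la", "per", "un", "non", "sono", "una"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "mit", "ein", "zu", "den"},
}


def tokenize(text):
    """Lowercase the text and keep runs of Unicode letters (no digits or underscores)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def stopword_scores(text):
    """Return, per language, the fraction of tokens found in that language's stopword list."""
    tokens = tokenize(text)
    if not tokens:
        return {lang: 0.0 for lang in STOPWORDS}
    return {
        lang: sum(token in words for token in tokens) / len(tokens)
        for lang, words in STOPWORDS.items()
    }


def guess_by_stopwords(text):
    """Pick the language with the highest stopword fraction."""
    scores = stopword_scores(text)
    return max(scores, key=scores.get)


print(guess_by_stopwords("La recensione del nuovo smartphone e il prezzo per il mercato, with the specs"))
```

A real limitation: very short titles may contain no stopwords at all. In that case every score is 0 and `max` simply returns the first language in the table.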




2 Answers

I'd use the guess-language project.

Edit: the project is now hosted on Bitbucket.

Alex Martelli answered Sep 22 '22 at 23:09

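As I understand it, guess-language works with character trigrams. The code below is not its API. It is a self-contained sketch of the general trigram-profile idea, assuming hypothetical one-sentence training samples (`SAMPLES`) and using cosine similarity to compare trigram counts. Real profiles would be built from far more text per language.

```python
import math
from collections import Counter

# Hypothetical toy training samples, one sentence per language.
# Real language profiles need much larger corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then it runs into the forest",
    "it": "la volpe veloce salta sopra il cane pigro e poi corre nella foresta",
    "de": "der schnelle braune fuchs springt über den faulen hund und läuft dann in den wald",
}


def trigrams(text):
    """Count character trigrams of lowercased text, with whitespace collapsed and the ends padded."""
    s = " " + " ".join(text.lower().split()) + " "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count vectors (missing keys count as 0)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def guess_by_trigrams(text):
    """Return the language whose trigram profile is most similar to the text."""
    profile = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))


print(guess_by_trigrams("il cane e la volpe nella foresta"))
```

Working on characters rather than whole words means unseen words still contribute evidence through shared letter sequences, which helps with short titles.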


Have you looked into http://ling.unizd.hr/~dcavar/LID/ and http://en.wikipedia.org/wiki/Language_identification ?

neo answered Sep 19 '22 at 23:09

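One classic technique in the language identification literature is Cavnar and Trenkle's rank-order n-gram method. It ranks a text's most frequent character n-grams, compares those ranks against each language's ranked profile, and sums the "out-of-place" rank differences, charging a fixed maximum penalty for n-grams the language profile lacks. The simplified sketch below assumes n-grams of length 1 to 3 (the published method also used longer ones) and hypothetical toy samples. The names `ngram_ranks`, `out_of_place` and `guess_by_rank` are illustrative.

```python
from collections import Counter

# Hypothetical toy training samples; real profiles need much more text.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then it runs into the forest",
    "it": "la volpe veloce salta sopra il cane pigro e poi corre nella foresta",
    "de": "der schnelle braune fuchs springt über den faulen hund und läuft dann in den wald",
}


def ngram_ranks(text, n_max=3, top=300):
    """Map each of the `top` most frequent character n-grams (lengths 1..n_max) to its rank."""
    counts = Counter()
    for word in text.lower().split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return {gram: rank for rank, (gram, _) in enumerate(counts.most_common(top))}


def out_of_place(doc, lang, max_penalty=300):
    """Sum of rank differences; n-grams absent from the language profile cost max_penalty."""
    return sum(
        abs(rank - lang[gram]) if gram in lang else max_penalty
        for gram, rank in doc.items()
    )


LANG_PROFILES = {lang: ngram_ranks(text) for lang, text in SAMPLES.items()}


def guess_by_rank(text):
    """Return the language whose ranked profile has the smallest out-of-place distance."""
    doc = ngram_ranks(text)
    return min(LANG_PROFILES, key=lambda lang: out_of_place(doc, LANG_PROFILES[lang]))


print(guess_by_rank("il cane e la volpe nella foresta"))
```

Using a constant penalty (rather than, say, the size of each language's profile) keeps the distances comparable across languages whose training samples differ in length.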