
Python website language detection

I am writing a bot that checks thousands of websites to see whether or not they are in English.

I am using Scrapy (a Python 2.7 framework) to crawl the first page of each website.

Can someone suggest the best way to check a website's language?

Any help would be appreciated.

akhter wahab asked Jul 16 '12 15:07


2 Answers

Since you are using Python, you can try out NLTK. More precisely, you can look at NLTK.detect.

More information and the exact code snippet are here: NLTK and language detection
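
To give a rough idea of how text-based detection can work, here is a minimal sketch that counts common stopwords per language and picks the language with the most hits. This is not NLTK's API and not the code from the linked post; the tiny `STOPWORDS` table and the names `guess_language` and `is_english` are made up for illustration, and a real list (for example a full stopword corpus) would be much larger.

```python
import re

# Tiny illustrative stopword sets (assumption: hand-picked for this sketch;
# a real detector would use far larger lists).
STOPWORDS = {
    "english": {"the", "and", "of", "to", "in", "is", "that", "for", "with", "this", "are", "you"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "mit", "von", "auf", "sie", "zu"},
    "french": {"le", "la", "les", "et", "est", "des", "une", "pour", "dans", "que", "pas", "vous"},
    "spanish": {"el", "la", "los", "y", "es", "de", "una", "para", "en", "que", "no", "con"},
}


def guess_language(text):
    """Return the language whose stopwords occur most often, or None."""
    # Split into lowercase runs of letters (no digits or underscores).
    words = re.findall(r"[^\W\d_]+", text.lower(), re.UNICODE)
    if not words:
        return None
    # Count stopword hits for each candidate language.
    scores = {}
    for lang, stopwords in STOPWORDS.items():
        scores[lang] = sum(1 for word in words if word in stopwords)
    best = max(scores, key=scores.get)
    # No stopword from any list was seen: refuse to guess.
    return best if scores[best] > 0 else None


def is_english(text):
    """Hypothetical helper: True when the best guess is English."""
    return guess_language(text) == "english"
```

You would call `is_english` on the visible text extracted from the page, not on the raw HTML, since tag and attribute names are mostly English words. Very short pages give unreliable counts with a heuristic like this.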

Yavar answered Sep 18 '22 17:09


You can use the response headers to find out:

Wikipedia
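
As a sketch of the header approach: HTTP defines a `Content-Language` response header, and HTML pages can declare a `lang` attribute on the `<html>` tag. The snippet below checks the header first and falls back to the attribute. The function names `declared_language` and `declares_english` are invented for this example, and it assumes the headers are passed in as a plain dict of text strings.

```python
import re


def declared_language(headers, html=""):
    """Return the language tag a page declares (lowercased), or None.

    headers: dict of response header names to string values (assumed format).
    html:    page source, used only when no Content-Language header is set.
    """
    # Header names are case-insensitive, so compare in lowercase.
    for name, value in headers.items():
        if name.lower() == "content-language" and value.strip():
            # The header may list several tags ("de-DE, en"); take the first.
            return value.split(",")[0].strip().lower()
    # Fallback: the lang attribute on the <html> tag, e.g. <html lang="en-US">.
    match = re.search(r'<html[^>]*\blang\s*=\s*["\']?([A-Za-z-]+)', html, re.IGNORECASE)
    if match:
        return match.group(1).lower()
    return None


def declares_english(headers, html=""):
    """True when the declared primary language subtag is 'en'."""
    lang = declared_language(headers, html)
    return lang is not None and lang.split("-")[0] == "en"
```

Many sites omit both declarations or set them incorrectly, so a `None` result only means the page makes no claim. Depending on the Scrapy and Python versions, values in `response.headers` may be byte strings that need decoding before being passed in.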

Hedde van der Heide answered Sep 20 '22 17:09