
What's the best way to get a description of the website, in Python?

Suppose I have downloaded a page's HTML and can parse it. How do I get the "best" description of that website if it does not have a meta description tag?

TIMEX asked Jul 26 '10 05:07



2 Answers

You could take the first few sentences returned by something like Readability.

Safari 5 uses it, so it must be alright :)
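To make the "first few sentences" idea concrete, here is a rough sketch using only the standard library. It is not Readability's algorithm, just a crude stand-in: it collects `<p>` text, takes the first paragraph that is long enough to look like body text, and returns its first couple of sentences. The names `ParagraphCollector` and `guess_description`, and the `max_sentences` / `min_chars` thresholds, are made up for this example.

```python
import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> element, ignoring <script>/<style> content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buf = None   # text chunks of the <p> currently open, or None
        self._skip = 0     # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "p":
            self._flush()  # an unclosed <p> ends where the next one starts
            self._buf = []

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buf is not None and not self._skip:
            self._buf.append(data)

    def _flush(self):
        if self._buf is not None:
            # Join the chunks and collapse runs of whitespace.
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
        self._buf = None

    def close(self):
        super().close()
        self._flush()


def guess_description(html, max_sentences=2, min_chars=40):
    """Return the first sentences of the first reasonably long paragraph."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    # Short paragraphs are assumed to be menus, bylines, and similar noise.
    candidates = [p for p in parser.paragraphs if len(p) >= min_chars]
    if not candidates:
        return ""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", candidates[0])
    return " ".join(sentences[:max_sentences])
```

Calling `guess_description(html)` on the downloaded page gives you a short string to use as the description, or an empty string if no paragraph passes the length check. A real content extractor like Readability scores blocks by text density and link ratio, so expect this sketch to do worse on cluttered pages.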

alex answered Oct 05 '22 23:10



To follow up on the "Readability" suggestion above (which itself is inspired by the website Instapaper), they have released the JavaScript: http://code.google.com/p/arc90labs-readability/. What's more, some guy took that and ported it to Python: http://github.com/gfxmonk/python-readability. Rejoice!

loevborg answered Oct 06 '22 01:10
