
Scraping for a "preview" of a webpage - Python

I'm indexing a list of links. The links update quite often, so I'm automating the creation of thumbnails for the sites.

For most sites it's easy, as I just grab the biggest image on the page hoping it describes the content.
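A minimal sketch of that "biggest image" heuristic, using only the standard library. It assumes the page declares numeric `width` and `height` attributes on its `<img>` tags; images without them count as area 0, and measuring real sizes would mean downloading each file. The names `ImageCollector` and `biggest_image` are made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects (declared_area, absolute_url) for every <img> with a src."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return
        try:
            # Assumption: size comes from the declared attributes only.
            area = int(attr.get("width", 0)) * int(attr.get("height", 0))
        except (TypeError, ValueError):
            # Missing values or non-numeric sizes such as "100%".
            area = 0
        self.images.append((area, urljoin(self.base_url, src)))


def biggest_image(html, base_url):
    """Return the URL of the image with the largest declared area, or None."""
    parser = ImageCollector(base_url)
    parser.feed(html)
    if not parser.images:
        return None
    return max(parser.images, key=lambda item: item[0])[1]
```

Given the page source as a string, `biggest_image(html, page_url)` returns an absolute image URL that can then be downloaded and scaled down to a thumbnail.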

But other times the main content of the page is a video.


Does anybody have tips for dealing with this? That would be great!


Regarding using WebKit to create screenshots, I found this

RadiantHex asked Feb 27 '10 18:02



1 Answer

wkhtmltopdf uses an embedded copy of the WebKit rendering engine (also used in Safari, Chrome, etc.) to save a web page as a PDF, including all images (though probably not Flash video, I guess). That could be a starting point for a much more accurate thumbnail.
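As a rough sketch, the tool can be driven from Python with `subprocess`. This assumes the `wkhtmltopdf` binary is installed and on the `PATH`, and relies only on its basic `wkhtmltopdf <url> <output.pdf>` invocation. The helper names `build_command` and `render_pdf` are made up for this illustration, and turning the PDF into an actual thumbnail image is a separate step not shown here.

```python
import subprocess


def build_command(url, output_pdf):
    """Build the argument list for a basic wkhtmltopdf call."""
    return ["wkhtmltopdf", url, output_pdf]


def render_pdf(url, output_pdf, timeout=60):
    """Render `url` to `output_pdf`.

    Raises CalledProcessError if wkhtmltopdf exits with an error, and
    TimeoutExpired if the page takes longer than `timeout` seconds.
    """
    subprocess.run(build_command(url, output_pdf), check=True, timeout=timeout)
    return output_pdf
```

Calling `render_pdf("http://example.com", "preview.pdf")` would write the rendered page to `preview.pdf`. Passing the arguments as a list rather than a shell string avoids quoting problems with URLs that contain `&` or `?`.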

Wim answered Oct 13 '22 00:10
