Create summary from link

Tags:

web-scraping

Many sites (Facebook, Google+, etc.) have a feature that creates a summary with a header, an image and some text from a link. I have tried to find libraries or guidelines for building this kind of feature, but my search results haven't been helpful at all.

I know that I can parse the HTML of a page and extract the elements I'd like, but I think there should be some kind of standard for how to do this (and perhaps also for how to create pages that are friendly to this kind of functionality).

Does anyone have a good link that will point me in the right direction? JavaScript or .NET is my preferred choice, but I can implement it myself too.

Roland asked Nov 14 '22 18:11


1 Answer

For the "perhaps also how to create pages that are friendly to this kind of functionality" part:
You are probably looking for the Open Graph protocol:

<html xmlns:og="http://ogp.me/ns#">
<head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
<meta property="og:image" content="http://ia.media-imdb.com/images/rock.jpg" />
...
</head>
...
</html>

I think this is the first place Facebook will look. When these tags are missing, though, Facebook seems to fall back on its own algorithms to detect the most relevant part of the page.
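
For the consuming side, here is a minimal sketch of reading those tags, written in Python with only the standard library; the same approach should carry over to JavaScript or .NET with any HTML parser. The names `OpenGraphParser` and `summarize` are made up for this illustration, and the fallback to the plain `<title>` when `og:title` is missing is just an assumption for the sketch, not how Facebook actually does it.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects og:* meta properties and the <title> text from an HTML page."""

    def __init__(self):
        super().__init__()
        self.og = {}            # e.g. {"title": "The Rock", "type": "movie"}
        self.title = ""         # text of the regular <title> element
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            prop = attrs.get("property") or ""
            content = attrs.get("content")
            if prop.startswith("og:") and content is not None:
                # Keep the first value if a property appears more than once.
                self.og.setdefault(prop[3:], content)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html):
    """Build a link summary from Open Graph tags (hypothetical helper).

    Assumption: if og:title is missing, fall back to the <title> text.
    """
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.og.get("title") or parser.title.strip() or None,
        "type": parser.og.get("type"),
        "url": parser.og.get("url"),
        "image": parser.og.get("image"),
    }


if __name__ == "__main__":
    page = """<html><head>
    <title>The Rock (1996)</title>
    <meta property="og:title" content="The Rock" />
    <meta property="og:url" content="http://www.imdb.com/title/tt0117500/" />
    </head></html>"""
    print(summarize(page))
```

Fetching the page and handling relative image URLs or character encodings is left out here; a real implementation would need to deal with those as well.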

Fortega answered Feb 25 '23 11:02