I have a couple of Twitter-powered news aggregation websites. I have been planning to add images from the articles that I find on Twitter.
If I download the page and extract images using the <img> tag, I get a bunch of images, not all of them relevant to the article. For example, images of buttons, icons, ads, etc. are captured. How do I extract the image that accompanies the article? I know there is a solution: Facebook's link sharer does this pretty well.
Mithun
Duplicate of: How to find and extract "main" image in website
Download all images from the page and blacklist any image served from an ad server. Then find a heuristic that will pick out the correct image...
I think something like: score each remaining image against a few rules, then take the image with the most points and throw the rest away.
This probably works for the majority of sites (it would require some fiddling with the heuristics, though).
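A minimal sketch of that blacklist-then-score approach in PHP 8+, using the built-in DOM extension. The function name pick_main_image, the ad-host list, and every scoring rule (points for declared size, a penalty for "logo"/"icon"-style file names, a bonus for sitting inside an <article> element) are hypothetical stand-ins chosen for illustration, not rules from this answer; expect to tune them per site.

```php
<?php
// Hypothetical heuristic for picking an article's main image.
// The ad-host blacklist, keyword list and point values are illustrative
// assumptions; adjust them for the sites you actually crawl.
function pick_main_image(string $html, array $adHosts = ['doubleclick.net', 'googlesyndication.com']): ?string
{
    $doc = new DOMDocument();
    // Collect (rather than print) warnings from the malformed HTML real pages contain.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $best = null;
    $bestScore = PHP_INT_MIN;
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src === '') {
            continue;
        }
        // Blacklist: skip anything served from a known ad host.
        $host = (string) parse_url($src, PHP_URL_HOST);
        foreach ($adHosts as $adHost) {
            if ($host !== '' && str_ends_with($host, $adHost)) {
                continue 2; // move on to the next <img>
            }
        }
        $score = 0;
        // Larger declared dimensions suggest a content image rather than a button.
        $w = (int) $img->getAttribute('width');
        $h = (int) $img->getAttribute('height');
        $score += intdiv($w * $h, 10000);
        // File names that hint at page chrome lose points.
        if (preg_match('/logo|icon|button|sprite|avatar/i', $src)) {
            $score -= 50;
        }
        // Images nested inside an <article> element gain points.
        for ($node = $img->parentNode; $node !== null; $node = $node->parentNode) {
            if ($node->nodeName === 'article') {
                $score += 20;
                break;
            }
        }
        // Keep the highest-scoring image seen so far.
        if ($score > $bestScore) {
            $bestScore = $score;
            $best = $src;
        }
    }
    return $best; // null when no usable image was found
}
```

Relying only on declared width/height keeps the sketch offline; a real crawler would probably also fetch the candidates and measure their actual dimensions (e.g. with getimagesize()), since many pages omit those attributes.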
It's been a long time, but this may help someone next time.
You can use this API: https://urlmeta.org/
It's very simple to use, and the result is exactly what we need.
Example of using the API:
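<?php
$url = "http://timesofindia.indiatimes.com/business/india-business/Raghuram-Rajan-not-fit-to-be-RBI-Governor-Subramanian-Swamy/articleshow/52236298.cms";
// URL-encode the target so its own special characters don't leak into the API's query string.
$result = file_get_contents('https://api.urlmeta.org/?url=' . urlencode($url));
if ($result === false) {
    exit("Request to urlmeta failed\n");
}
// Decode the JSON response into an associative array.
$array = json_decode($result, true);
print_r($array['meta']['image']);
?>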
That prints the article's main image URL, which is the result you needed.
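If you'd rather not depend on a third-party service, a sketch of reading the page's Open Graph og:image tag yourself is below; that tag is what Facebook's link sharer reads when building link previews. It assumes the page declares <meta property="og:image" content="...">, which many news sites do but not all; og_image is a hypothetical helper name, and when it returns null you would fall back to a heuristic like the one in the other answer.

```php
<?php
// Sketch: extract the Open Graph image declared by a page.
// Assumes the page includes an og:image meta tag; returns null otherwise.
function og_image(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect (rather than print) warnings from sloppy real-world HTML.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Some sites use name= instead of property=, so accept either attribute.
    $nodes = $xpath->query('//meta[@property="og:image" or @name="og:image"]/@content');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $url = trim($nodes->item(0)->nodeValue);
    return $url === '' ? null : $url;
}

// Usage (network access required):
// $html = file_get_contents($articleUrl);
// echo og_image($html) ?? 'no og:image found', "\n";
```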