I'm trying to build something akin to Facebook's "Share" functionality for my website.
I've gotten to the point where I can accept a URL, scrape it for meta keywords and suitably get titles/descriptions, but I'm a bit stuck as to the best way to determine 'likely' photos the user may want to share.
I currently use SimpleXMLElement to turn the page into a traversable DOM, find all the <img> tags, and turn their sources into absolute URLs. After that, I'm not sure how to go about finding a suitable thumbnail.
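For illustration, here is a minimal sketch of that collection step. It is written in Python rather than PHP, and it swaps in the standard library's `html.parser` for SimpleXMLElement (an XML parser can reject real-world HTML that isn't well-formed). `ImgCollector` and `find_image_urls` are hypothetical names, and a `<base href>` tag in the page is not taken into account.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collects the src of every <img> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> matches too.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # urljoin handles relative, root-relative and already-absolute URLs.
            self.urls.append(urljoin(self.base_url, src))


def find_image_urls(html, base_url):
    parser = ImgCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

For a page at `http://example.com/post/1`, `src="a.png"` becomes `http://example.com/post/a.png` and `src="/b.jpg"` becomes `http://example.com/b.jpg`; `<img>` tags without a `src` are skipped.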
Do I download them all, and go by file size? Do I use some sort of heuristic like, "was encountered in the middle of the page"?
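One possible middle ground between "download everything" and guessing: image dimensions are often more telling than file size, and for some formats they sit in the first few bytes of the file. The sketch below parses only PNG and GIF headers (JPEG keeps its size in a marker at a variable offset, so it is not handled). How you obtain the leading bytes, for example by reading only the start of the response, is left out. The thresholds in `looks_like_photo` are hypothetical values, not tested recommendations.

```python
import struct


def image_size(header):
    """Return (width, height) from the first bytes of a PNG or GIF, else None."""
    # PNG: 8-byte signature, then the IHDR chunk (4-byte length, b"IHDR");
    # width and height are big-endian 32-bit integers at offsets 16 and 20.
    if header[:8] == b"\x89PNG\r\n\x1a\n" and len(header) >= 24:
        return struct.unpack(">II", header[16:24])
    # GIF: 6-byte signature, then little-endian 16-bit width and height.
    if header[:6] in (b"GIF87a", b"GIF89a") and len(header) >= 10:
        return struct.unpack("<HH", header[6:10])
    return None


def looks_like_photo(size, min_side=100, max_ratio=3.0):
    """Hypothetical filter: reject tiny images (icons, spacers) and very
    elongated ones (banners, dividers)."""
    if size is None:
        return False
    w, h = size
    if min(w, h) < min_side:
        return False
    return max(w, h) / min(w, h) <= max_ratio
```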
Does anyone else have any recommendations, suggestions, or tips?
I wrote something similar a while ago to get images from scraped blog posts. My criteria for choosing an image was something along the lines of getting a list of all images on the page then assigning 'priority points':
Then pick the one with the most priority points. It certainly wasn't foolproof or overly scientific but it got something useful far more often than not.
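The original list of criteria isn't available, so the rules below are purely hypothetical stand-ins. This sketch only shows the mechanics: award points per rule, then take the highest-scoring image. `Candidate` and its fields are assumed names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    url: str
    width: int
    height: int
    in_main_content: bool  # e.g. found inside the post body, not the sidebar
    has_alt_text: bool


def priority_points(img):
    # Hypothetical point rules; tune or replace them for real pages.
    points = 0
    if img.width >= 200 and img.height >= 200:
        points += 3
    if img.in_main_content:
        points += 2
    if img.has_alt_text:
        points += 1
    if "logo" in img.url.lower() or "icon" in img.url.lower():
        points -= 2
    return points


def pick_best(candidates):
    # max() returns the first maximal item on ties, i.e. the earliest in the page.
    return max(candidates, key=priority_points, default=None)
```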
I don't have any direct experience doing this so I'm not sure that there is any specific best practice, but in general I think a heuristic approach looking at several factors would make sense because of the variability found in website implementations.
I would look at two sets of items: the properties of the images themselves, and the context of where and how the images are placed on the page.
Image Properties:
Image Context:
I would assign weights to the previous items and then rank the images you find according to how well each image satisfies the rules.
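A minimal sketch of that weighted ranking, assuming each image has already been reduced to a dict of features. The rules, weights and feature keys (`in_content`, `external_link`) are hypothetical placeholders, split into property rules and context rules to mirror the two groups above.

```python
# Each rule is (name, weight, predicate). All values are placeholders to tune.
PROPERTY_RULES = [
    ("large enough", 3.0, lambda img: min(img["width"], img["height"]) >= 150),
    ("not too elongated", 1.5,
     lambda img: max(img["width"], img["height"]) <= 3 * min(img["width"], img["height"])),
]
CONTEXT_RULES = [
    ("inside main content", 2.0, lambda img: img["in_content"]),
    ("not linked off-site", 1.0, lambda img: not img["external_link"]),
]


def score(img, rules=PROPERTY_RULES + CONTEXT_RULES):
    # Sum the weights of every rule the image satisfies.
    return sum(weight for _, weight, rule in rules if rule(img))


def rank(images):
    # sorted() is stable even with reverse=True, so ties keep page order.
    return sorted(images, key=score, reverse=True)
```

Keeping the rules as data makes it easy to adjust a weight, or add a rule, without touching the ranking code.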
Also, note that some pages may use CSS (or Flash, etc.) to display images. These fall outside the set of images your algorithm, as you've defined it, will find; perhaps not a big deal, but something to consider.
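If CSS images matter, a rough sketch like the one below could pull `url(...)` references out of inline `style` attributes or `<style>` blocks. It is regex-based, so it ignores escaped characters and `data:` URIs, and it does nothing for Flash. For an external stylesheet, relative URLs resolve against the stylesheet's own URL, so that URL would be passed as `base_url`. `css_image_urls` is a hypothetical name.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the value.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_image_urls(css_text, base_url):
    # Resolve every url(...) reference to an absolute URL.
    return [urljoin(base_url, m.group(2)) for m in CSS_URL.finditer(css_text)]
```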