I need a way to visually represent a random web page on the internet, in a way that is meaningful and inviting for others to click on. Let's say, for example, this web page.

Currently, these are the standard assets I can use:

I need something like what Facebook does when you share a link: it scrapes the link for images and then creates a beautiful, meaningful tile that is inviting to click on.
Is there any way I can scrape the images and text from websites? I'm primarily interested in an Objective-C/JavaScript combo, but anything that works will do and will be accepted as the answer.
Edit: Rewrote the post and changed the title.
Websites will often provide meta information for user-friendly social media sharing, such as Open Graph protocol tags. In fact, in your own example, the Reddit page has Open Graph tags which make up the information in the link preview (look for meta tags with og: properties).
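For illustration, here is a minimal sketch of extracting those og: tags server-side. It assumes Node 18+ (for the global fetch); the fetchOpenGraph name and the regex-based extraction are my own simplifications, and a real HTML parser (e.g. cheerio) would be more robust:

```javascript
// Minimal sketch: fetch a page and pull out its Open Graph meta tags.
// Assumes Node 18+ (global fetch). Regex parsing of HTML is a
// simplification; a proper parser handles edge cases this misses.
async function fetchOpenGraph(url) {
  const res = await fetch(url);
  const html = await res.text();

  const og = {};
  // Look at every <meta ...> tag and keep the ones with an og: property.
  for (const tag of html.match(/<meta[^>]+>/gi) || []) {
    const prop = tag.match(/property=["']og:([^"']+)["']/i);
    const content = tag.match(/content=["']([^"']*)["']/i);
    if (prop && content) og[prop[1]] = content[1];
  }
  return og; // e.g. { title: "...", image: "...", description: "..." }
}

// Usage: fetchOpenGraph('https://www.reddit.com/r/...').then(console.log);
```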
A fallback approach would be to implement site-specific parsing code for the most popular websites that don't already conform to a standardized format, or to try to generically guess what the most prominent content on a given page is (for example, the biggest image above the fold, the first few sentences of the first paragraph, text in heading elements, etc.).
The problem with the former approach is that you have to maintain the parsers as those websites change and evolve; with the latter, you simply cannot reliably predict what's important on a page, and you can't expect to always find what you're looking for either (images for the thumbnail, for example).
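For illustration only, a crude sketch of that generic-guessing heuristic. The guessPreview name and the specific regexes are my own assumptions; a production version would use a real DOM parser and score candidates by size and position:

```javascript
// Crude fallback: first <h1> (or <title>) as the heading, first
// <img src> as a thumbnail candidate, first <p> as the description.
function guessPreview(html) {
  const pick = (re) => {
    const m = html.match(re);
    return m ? m[1].trim() : null;
  };
  return {
    title: pick(/<h1[^>]*>([\s\S]*?)<\/h1>/i) ||
           pick(/<title[^>]*>([\s\S]*?)<\/title>/i),
    image: pick(/<img[^>]+src=["']([^"']+)["']/i),
    description: pick(/<p[^>]*>([\s\S]*?)<\/p>/i),
  };
}
```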
Since you will never be able to generate meaningful previews for 100% of websites, it boils down to a simple question: what's an acceptable rate of successful link previews? If it's close to what you can get by parsing standard meta information, I'd stick with that and save myself a lot of headache. If not, as an alternative to the libraries shared above, you can also look at paid services/APIs, which will likely cover more use cases than you could on your own.
This is what the Open Graph standard is for. For instance, if you go to the Reddit post in the example, you can view the page information provided by HTML <meta /> tags (all the ones whose property names start with 'og:'), such as og:title, og:image, og:description, and og:url.
However, you cannot fetch that data from inside a web browser: CORS blocks the cross-origin request. In fact, what Facebook appears to do is send the URL to its own servers, have them perform the request to get the required information, and send it back.
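A minimal sketch of that relay pattern, assuming Node 18+ and Express; the /preview route and the inline og: extraction are illustrative choices, not Facebook's actual implementation:

```javascript
const express = require('express');
const app = express();

// The browser asks *our* server for the preview; the server-to-server
// request is not subject to the browser's CORS restrictions.
app.get('/preview', async (req, res) => {
  try {
    const page = await fetch(req.query.url); // Node 18+ global fetch
    const html = await page.text();
    const og = {};
    for (const tag of html.match(/<meta[^>]+>/gi) || []) {
      const p = tag.match(/property=["']og:([^"']+)["']/i);
      const c = tag.match(/content=["']([^"']*)["']/i);
      if (p && c) og[p[1]] = c[1];
    }
    res.json(og);
  } catch {
    res.status(502).json({ error: 'could not fetch preview' });
  }
});

app.listen(3000);
// Client side: fetch('/preview?url=' + encodeURIComponent(link))
```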