
Rules to pull reader-view-like content from a website?

I'm trying to implement my own little reader-view app (an app that does the same thing as Reader mode in Safari), and there are a few things I find myself asking:

  • Is there a technical term for this feature ("reader view" doesn't really cut it)?
  • Is there a standard that websites are supposed to follow to indicate which content they would like shown in reader views?
  • Is there an open-source set of HTML parsing rules for pulling the "readable" content from a website?
  • Is implementing such a thing simply too much work for a single person in a few weeks, and if so, should I opt for a service such as Instaparser?
asked Apr 06 '16 15:04 by Quantaliinuxite



1 Answer

I believe the original was implemented by Arc90, who called it Readability. You can check out their page here.

It has been ported to many languages over time, so you could look at the different implementations to learn how it works.

  • Python readability
  • JReadability
  • JavaScript
  • Ruby

This is just a small sample; there are many more implementations if you want to look further.
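To give a feel for what these ports do, here is a minimal Python sketch that roughly follows the paragraph-scoring idea from Arc90's original readability.js: each `<p>` of at least 25 characters gives its parent element 1 point, plus 1 point per comma, plus 1 point per 100 characters (capped at 3), and its grandparent gets half of that; the highest-scoring element is assumed to be the article. It uses only the standard library's `html.parser`. The names `ParagraphCollector` and `extract_main_text`, and the set of tags treated as page chrome, are hypothetical choices for illustration, not part of any Readability port. Real implementations do much more (link-density penalties, class/id name hints, cleanup of the chosen node).

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never get a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Assumption: paragraphs inside these elements are treated as page chrome.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: records each <p> with the ids of its parent and grandparent."""

    def __init__(self):
        super().__init__()
        self.stack = []        # (tag, node_id) for every currently open element
        self.next_id = 0
        self.skip_depth = 0    # > 0 while inside a SKIP_TAGS element
        self.paragraphs = []   # (parent_id, grandparent_id, text)
        self._buf = None       # text pieces of the <p> being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if tag == "p" and self._buf is not None:
            self.handle_endtag("p")  # a new <p> implicitly closes the previous one
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        if tag == "p":
            self._buf = []
        self.stack.append((tag, self.next_id))
        self.next_id += 1

    def handle_endtag(self, tag):
        if all(open_tag != tag for open_tag, _ in self.stack):
            return  # stray closing tag: ignore it
        # Pop until the matching open tag, which tolerates unclosed children.
        while self.stack:
            open_tag, _ = self.stack.pop()
            if open_tag in SKIP_TAGS:
                self.skip_depth -= 1
            if open_tag == "p" and self._buf is not None:
                self._finish_paragraph()
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def _finish_paragraph(self):
        text = " ".join("".join(self._buf).split())  # collapse whitespace
        self._buf = None
        if self.skip_depth == 0 and self.stack:
            parent = self.stack[-1][1]
            grandparent = self.stack[-2][1] if len(self.stack) > 1 else None
            self.paragraphs.append((parent, grandparent, text))


def extract_main_text(html, min_len=25):
    """Hypothetical helper: returns the paragraphs of the best-scoring container."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()

    scores = defaultdict(float)
    for parent, grandparent, text in collector.paragraphs:
        if len(text) < min_len:
            continue  # too short to count
        # 1 base point + 1 per comma + 1 per 100 characters (at most 3).
        score = 1 + text.count(",") + min(len(text) // 100, 3)
        scores[parent] += score
        if grandparent is not None:
            scores[grandparent] += score / 2
    if not scores:
        return ""

    best = max(scores, key=scores.get)
    return "\n\n".join(text for parent, grandparent, text in collector.paragraphs
                       if best in (parent, grandparent))


if __name__ == "__main__":
    page = """<body>
      <nav><p>Home, About, Contact, Blog, Archive, and a few other links</p></nav>
      <article>
        <h1>Title</h1>
        <p>Reader modes strip navigation, ads, and sidebars, leaving only the article body.</p>
      </article>
    </body>"""
    print(extract_main_text(page))  # prints only the <article> paragraph
```

Only paragraphs directly under the winning element, or one level below it, are returned. That is enough to show the idea, but it is not robust on real-world pages, which is why the mature ports listed above are worth studying.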

Edit: Oops, after some more Googling I found this question, which has an answer that explains it very well.

answered Sep 30 '22 17:09 by bmcculley
