
How can I implement Mozilla's readability.js on my website?

https://github.com/mozilla/readability (readability.js is for creating a reader view for web pages)

How can I apply readability.js to the test web page below? The problem is that readability.js deletes the elements of this page that I want to keep and leaves those that should be removed. Is there any documentation on how to use readability.js? I hope someone can help me. Thank you!

<html><head>
<title>Reader View shows only the browser in reader view</title>
    <script src="https://raw.githack.com/mozilla/readability/master/Readability.js"></script>
</head>
<body>
Everything outside the main div tag vanishes in Reader View<br>
<img class="no-print" src="http://dummyimage.com/1024x100/000/ffffff&text=This+banner+should+vanish+in+print+view">
<div>
   <h1>H1 tags outside of a p tag are hidden in reader view</h1>
   <img class="no-print" src="http://dummyimage.com/1024x100/000/ffffff&text=This+banner+is+resized+in+print+view">
   <p>
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
 123456789 123456
</p>
</div>
    <script>
    var article = new Readability(document).parse();
    </script>
</body>
</html>

Source of the test page: Optimize website to show reader view in Firefox

Marcel asked Jun 10 '20 22:06


2 Answers

You can use DOMPurify and Readability together, as the Readability docs suggest:

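import { Readability } from '@mozilla/readability';
import DOMPurify from 'dompurify';

// Run Readability on a document and return the parsed article (or null).
function readable(doc) {
  const reader = new Readability(doc);
  return reader.parse();
}

// parse() modifies the document it is given, so work on a clone.
const cloneDoc = document.cloneNode(true);
const parsed = readable(cloneDoc);
// parse() returns null when no article could be extracted.
const markup = parsed ? DOMPurify.sanitize(parsed.content) : '';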

markup will be a sanitized HTML string of the readable content. Try console.log(parsed) to see the other available properties, such as title, excerpt, and textContent.
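
As a rough sketch of what you might do with the result, the helper below builds a simple reader-view HTML string from the object returned by parse(). Note that renderReaderView and escapeHtml are hypothetical names, not part of Readability or DOMPurify, and the sketch assumes the content string has already been passed through DOMPurify.sanitize.

```javascript
// Hypothetical helper: escape plain text before placing it inside HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: build reader-view markup from a parse() result.
// `article` is the object returned by Readability's parse() (or null);
// `safeContent` is assumed to be article.content after DOMPurify.sanitize().
function renderReaderView(article, safeContent) {
  if (!article) {
    return '<p>No readable content found.</p>';
  }
  const title = escapeHtml(article.title || 'Untitled');
  const byline = article.byline ? `<p>${escapeHtml(article.byline)}</p>` : '';
  return `<article><h1>${title}</h1>${byline}${safeContent}</article>`;
}
```

In the browser you could then assign the returned string to the innerHTML of whatever container should hold the reader view, for example with renderReaderView(parsed, markup).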

akkhil answered Oct 18 '22 21:10


Did you try this?

From their GitHub page:

"Readability's parse() works by modifying the DOM. This removes some elements in the web page. You could avoid this by passing the clone of the document object while creating a Readability object."

var documentClone = document.cloneNode(true); 
var article = new Readability(documentClone).parse();

This makes a copy of the DOM, so parse() modifies the copy rather than the real page.
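
If you want this in one place, a small wrapper might look like the sketch below. extractArticle is a hypothetical name, and the Readability constructor is passed in as a parameter only so the snippet does not assume how you load the library (script tag or npm import).

```javascript
// Hypothetical wrapper: run Readability on a copy of the document so the
// live page keeps all of its elements.
// `doc` is a Document (e.g. window.document); `ReadabilityCtor` is the
// Readability class, however you loaded it.
function extractArticle(doc, ReadabilityCtor) {
  const clone = doc.cloneNode(true); // deep copy; parse() modifies its input
  const article = new ReadabilityCtor(clone).parse();
  return article; // null when no article could be extracted
}
```

On the test page this would be called as extractArticle(document, Readability), and the banner and heading should stay in place because only the clone is touched.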

bze12 answered Oct 18 '22 19:10