https://github.com/mozilla/readability (Readability.js creates a reader view for web pages)
How can I apply Readability.js to this test web page? The problem is that Readability.js deletes the elements of this website that I want to keep and leaves the ones that should be removed. I hope someone can help me. Thank you! Is there any documentation on how to use Readability.js?
<html><head>
<title>Reader View shows only the browser in reader view</title>
<script src="https://raw.githack.com/mozilla/readability/master/Readability.js"></script>
</head>
<body>
Everything outside the main div tag vanishes in Reader View<br>
<img class="no-print" src="http://dummyimage.com/1024x100/000/ffffff&text=This+banner+should+vanish+in+print+view">
<div>
<h1>H1 tags outside of a p tag are hidden in reader view</h1>
<img class="no-print" src="http://dummyimage.com/1024x100/000/ffffff&text=This+banner+is+resized+in+print+view">
<p>
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
123456789 123456
</p>
</div>
<script>
var article = new Readability(document).parse();
</script>
</body>
</html>
Source of the test page: Optimize website to show reader view in Firefox
You can use DOMPurify and Readability together, as the Readability docs recommend:
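import { Readability } from '@mozilla/readability';
import DOMPurify from 'dompurify';
// Run Readability on a document; parse() returns the article object, or null if nothing readable was found.
function readable(doc) {
  const reader = new Readability(doc);
  return reader.parse();
}
// Parse a clone so that the page itself is not modified.
const cloneDoc = document.cloneNode(true);
const parsed = readable(cloneDoc);
// Sanitize before use: Readability's output comes straight from the page's HTML.
const markup = parsed ? DOMPurify.sanitize(parsed.content) : '';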
markup will be an HTML string of the readable content. Try console.log(parsed) to see the available properties, such as title, byline, excerpt, textContent and length.
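To actually display the result, the sanitized markup can be written into an element on the page. Below is a minimal sketch, not part of the Readability or DOMPurify APIs: showReaderView is a hypothetical helper name, and the sanitizer is passed in as a plain function so the helper does not depend on how DOMPurify is loaded.

```javascript
// Hypothetical helper: render a Readability result into a container element.
// `parsed` is the value returned by Readability#parse(), which may be null.
// `sanitize` is any function that cleans an HTML string (e.g. a DOMPurify wrapper).
function showReaderView(container, parsed, sanitize) {
  if (!parsed) {
    // Nothing readable was found: show a plain-text notice instead.
    container.textContent = 'No readable content found.';
    return false;
  }
  // Only ever insert Readability's output after sanitizing it.
  container.innerHTML = sanitize(parsed.content);
  return true;
}
```

With the code from this answer, it could be called as showReaderView(document.getElementById('reader-view'), parsed, (html) => DOMPurify.sanitize(html)); where reader-view is a placeholder id for an element on your own page.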
Did you try this? From their GitHub page:
"Readability's parse() works by modifying the DOM. This removes some elements in the web page. You could avoid this by passing the clone of the document object while creating a Readability object."
var documentClone = document.cloneNode(true);
var article = new Readability(documentClone).parse();
This makes a copy of the DOM, so parse() modifies the copy instead of the page you are actually viewing.
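Since you are working on a clone anyway, you can also delete elements you don't want in the reader view (such as the no-print banners in the question) before Readability looks at the page. Readability has no built-in notion of a no-print class, so this is a pre-processing step of your own. A minimal sketch, assuming Readability.js is loaded as a global via a script tag as in the question; parseWithout and the selector list are hypothetical names for illustration:

```javascript
// Hypothetical helper: parse a copy of `doc` after removing every element that
// matches one of the given CSS selectors (e.g. ['.no-print']).
// Assumes the Readability constructor is available as a global.
function parseWithout(doc, selectors, options = {}) {
  // Deep clone: parse() modifies the DOM it is given, so never pass the live page.
  const clone = doc.cloneNode(true);
  for (const selector of selectors) {
    // querySelectorAll returns a static NodeList, so removing while iterating is safe.
    clone.querySelectorAll(selector).forEach((el) => el.remove());
  }
  // `options` is forwarded unchanged to the Readability constructor.
  return new Readability(clone, options).parse();
}
```

For the test page this would be var article = parseWithout(document, ['.no-print']); Anything in options goes straight to Readability, for example { keepClasses: true } to keep class attributes in the output.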