Using Puppeteer, how can I run a script in the page context, with the full DOM available, before the in-page JS is executed?
For example, how can I run the following script to remove alt attributes from img elements, before any of the page JS is run?
document.querySelectorAll('img[alt]').forEach(
e => e.removeAttribute('alt')
)
(page.evaluateOnNewDocument looks like it would be useful, but it appears to be executed before the page content is available: at the point at which it runs, the page is blank.)
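I think the way to achieve what you are looking for is to:
1. Turn off JavaScript with page.setJavaScriptEnabled(false)
2. Go to the page
3. Extract the script and the HTML without the script
4. Turn JavaScript back on with page.setJavaScriptEnabled(true)
5. Go to the page with the HTML from step 3, using page.goto(`data:text/html,${HTMLWithoutScript}`)
6. Run your evaluate
7. Add the script from step 3 with page.addScriptTag({ content: script })
Here is a visualization of your problematic example: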
const puppeteer = require('puppeteer');
const html = `
<html>
<head></head>
<body>
<img src="https://picsum.photos/200/300?image=1062" alt="dog ">
<img src="https://picsum.photos/200/300?image=1072" alt="car ">
<div class="alts">List of alts: </div>
<script>
const images = document.querySelectorAll('img');
const altsContainer = document.querySelector('.alts');
images.forEach(image => {
const alt = image.getAttribute('alt') || 'missing alt ';
altsContainer.insertAdjacentHTML('beforeend', alt);
})
</script>
</body>
</html>`;
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(`data:text/html,${html}`);
  await page.evaluate(() => {
    document.querySelectorAll('img[alt]').forEach(
      e => e.removeAttribute('alt')
    )
  });
  await page.screenshot({ path: 'image.png' });
  await browser.close();
})();
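This code produces a screenshot in which the list still shows the original alt text, because the page's inline script has already run by the time page.evaluate executes.
So removing the alts this way does not work here.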
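Since the screenshot is not reproduced here, one way to check the outcome is to read back the text that the in-page script wrote. This is only a minimal sketch: readAltsText is a hypothetical helper name, and it assumes the .alts container from the example HTML. page.$eval is the Puppeteer call that runs a function against the first element matching a selector.

```javascript
// Hypothetical helper (not part of Puppeteer): returns the text inside the
// `.alts` container. `page` is assumed to be a Puppeteer Page that has
// already loaded the example HTML.
async function readAltsText(page) {
  return page.$eval('.alts', el => el.textContent.trim());
}
```

Logging the result of readAltsText(page) right after the page.evaluate call in the example above should still show the alt values if the inline script ran first.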
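Applying the steps above to the same page:
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Steps 1-2: load the page with JavaScript disabled so the inline script does not run.
  await page.setJavaScriptEnabled(false);
  await page.goto(`data:text/html,${html}`);
  // Step 3: pull out the script source and the remaining body markup.
  const { script, HTMLWithoutScript } = await page.evaluate(() => {
    const script = document.querySelector('script').innerHTML;
    document.querySelector('script').innerHTML = '';
    const HTMLWithoutScript = document.body.innerHTML;
    return { script, HTMLWithoutScript }
  });
  // Steps 4-5: re-enable JavaScript and load the script-free markup.
  await page.setJavaScriptEnabled(true);
  await page.goto(`data:text/html,${HTMLWithoutScript}`);
  // Step 6: modify the DOM before any page script has run.
  await page.evaluate(() => {
    document.querySelectorAll('img[alt]').forEach(
      e => e.removeAttribute('alt')
    )
  });
  // Step 7: now run the original page script against the modified DOM.
  await page.addScriptTag({ content: script });
  await page.screenshot({ path: 'image.png' });
  await browser.close();
})();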
This produces the result you expect in the question: the page's script runs only after the alt attributes have been removed.
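The same sequence can be wrapped in a reusable function. What follows is a sketch under stated assumptions, not a definitive implementation: runBeforePageScripts is a hypothetical name, and passing the extracted markup through encodeURIComponent is an addition of mine so that characters such as # or % in the HTML do not cut the data: URL short. Unlike the code above, it collects every script element, replaying external ones by URL and inline ones by content. As in the original approach, the page ends up on a data: URL, so its original origin and anything in head (styles, for example) are lost, and script attributes such as async, defer, or type="module" are ignored.

```javascript
// Hypothetical helper (not part of Puppeteer): load `url` with page scripts
// disabled, run `mutate` against the parsed DOM, then replay the scripts.
// `page` is assumed to be a Puppeteer Page; `mutate` runs in the page context.
async function runBeforePageScripts(page, url, mutate) {
  // Load the page without letting its scripts execute.
  await page.setJavaScriptEnabled(false);
  await page.goto(url);

  // Detach every script, remembering how to re-add it later.
  const { scripts, bodyHtml } = await page.evaluate(() => {
    const scripts = Array.from(document.querySelectorAll('script'), s => {
      const tag = s.src ? { url: s.src } : { content: s.textContent };
      s.remove();
      return tag;
    });
    return { scripts, bodyHtml: document.body.innerHTML };
  });

  // Reload the script-free markup with JavaScript enabled.
  await page.setJavaScriptEnabled(true);
  await page.goto(`data:text/html,${encodeURIComponent(bodyHtml)}`);

  // Apply the DOM change first, then replay the scripts in document order.
  await page.evaluate(mutate);
  for (const tag of scripts) {
    await page.addScriptTag(tag);
  }
}
```

With the example from the question, the call would be runBeforePageScripts(page, url, () => document.querySelectorAll('img[alt]').forEach(e => e.removeAttribute('alt'))), where url is whatever address you would otherwise pass to page.goto.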