We need to capture all outbound routes from a page. Some of them may not be implemented as link elements (<a href="...">) but via JavaScript code or as GET/POST forms.
In PhantomJS we did this using the onNavigationRequested callback. We simply clicked all the elements matched by some selector, used onNavigationRequested to capture the target URL (and, in the case of a form, the method or POST data), and then canceled the navigation event.
I tried request interception, but by the time the request is intercepted the current page is already lost, so I would have to go back.
Is there a way to capture the navigation event while the browser is still on the page that triggered it, and to stop it?
Thank you.
You can do the following.
await page.setRequestInterception(true);
page.on('request', request => {
  if (request.resourceType() === 'image')
    request.abort();
  else
    request.continue();
});
Example here:
https://github.com/GoogleChrome/puppeteer/blob/master/examples/block-images.js
Available resource types are listed here:
https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#requestresourcetype
I finally found a solution that doesn't require a browser extension and therefore works in headless mode, thanks to this comment: https://github.com/GoogleChrome/puppeteer/issues/823#issuecomment-467408640
// Request interception must already be enabled: await page.setRequestInterception(true)
// `url` is assumed to hold the URL of the page currently being crawled.
page.on('request', req => {
  if (req.isNavigationRequest() && req.frame() === page.mainFrame() && req.url() !== url) {
    // no redirect chain means the navigation is caused by setting `location.href`
    req.respond(req.redirectChain().length
      ? { body: '' } // prevent 301/302 redirect
      : { status: 204 } // prevent navigation by js
    )
  } else {
    req.continue()
  }
})
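The handler above stops the navigation but does not keep the target. To get back what onNavigationRequested gave us in PhantomJS (URL, method, POST data), the same check can be wrapped in a small factory that records each request before cancelling it. This is only a sketch: makeNavigationHandler and captured are names I made up, and it assumes interception has already been enabled with page.setRequestInterception(true).

```javascript
// Hypothetical helper, not part of Puppeteer: returns a 'request' handler that
// records navigations away from `startUrl` and then cancels them.
function makeNavigationHandler(page, startUrl, captured) {
  return (req) => {
    const leavesPage =
      req.isNavigationRequest() &&
      req.frame() === page.mainFrame() &&
      req.url() !== startUrl;

    if (!leavesPage) {
      // Sub-resources, iframes and the initial load pass through untouched.
      req.continue();
      return;
    }

    // Record the target before cancelling the navigation.
    captured.push({
      url: req.url(),
      method: req.method(),
      postData: req.postData(), // undefined when the request has no body
    });

    // Same responses as the handler above: an empty body stops a redirect,
    // a 204 keeps the browser on the current page.
    req.respond(req.redirectChain().length ? { body: '' } : { status: 204 });
  };
}

// Wiring, assuming `page` and `url` already exist:
//   const captured = [];
//   await page.setRequestInterception(true);
//   page.on('request', makeNavigationHandler(page, url, captured));
```

After clicking the elements you care about, captured should hold one entry per attempted navigation.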
EDIT: We have added a helper function to the Apify SDK that implements this - https://sdk.apify.com/docs/api/puppeteer#puppeteer.enqueueLinksByClickingElements
Here is the whole source code:
https://github.com/apifytech/apify-js/blob/master/src/enqueue_links/click_elements.js
It's slightly more complicated, as it needs not only to intercept requests but also to catch newly opened windows, etc.
I ran into the same problem. Puppeteer doesn't support this feature right now; actually, it's Chrome DevTools that doesn't support it. But I found another way to solve it, using a Chrome extension. Related issue: https://github.com/GoogleChrome/puppeteer/issues/823
The author of the issue shared a solution here. https://gist.github.com/GuilloOme/2bd651e5154407d2d2165278d5cd7cdb
As the docs say, we can use chrome.webRequest.onBeforeRequest.addListener to intercept all requests from the page and block them if you want to.
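For illustration, a background script along these lines might look like the sketch below. It is not the code from the gist. It assumes a Manifest V2 extension with the "webRequest" and "webRequestBlocking" permissions, and handleBeforeRequest and blockedNavigations are names I made up. As written it cancels every top-level navigation, including the very first page load, so a real extension would need some way to switch blocking on only after the page has loaded.

```javascript
// background.js of a hypothetical Manifest V2 extension.
// `blockedNavigations` is an illustrative name, not a Chrome API.
const blockedNavigations = [];

function handleBeforeRequest(details) {
  // Only top-level navigations matter here; let everything else through.
  if (details.type !== 'main_frame') {
    return {};
  }
  // Remember where the page tried to go, then cancel the request.
  blockedNavigations.push({ url: details.url, method: details.method });
  return { cancel: true };
}

// `chrome` only exists inside the extension, so guard the registration.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener(
    handleBeforeRequest,
    { urls: ['<all_urls>'] },
    ['blocking'] // required so that returning { cancel: true } takes effect
  );
}
```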
Don't forget to add the following arguments to the Puppeteer launch options:
--load-extension=./your_ext/ --disable-extensions-except=./your_ext/
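In code, those two flags go into the args array passed to puppeteer.launch. A minimal sketch, where extensionLaunchOptions is a made-up helper and ./your_ext is the placeholder directory from above. I set headless to false on the assumption that extensions are not loaded in headless Chrome, which matches the other answer's remark about headless mode.

```javascript
const path = require('path');

// Hypothetical helper: builds launch options for an unpacked extension directory.
function extensionLaunchOptions(extensionDir) {
  // Resolve to an absolute path so the flags do not depend on Chrome's working directory.
  const dir = path.resolve(extensionDir);
  return {
    headless: false, // assumption: the extension is not loaded in headless mode
    args: [
      `--disable-extensions-except=${dir}`,
      `--load-extension=${dir}`,
    ],
  };
}

// Usage, with puppeteer installed:
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(extensionLaunchOptions('./your_ext/'));
```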
Use page.setRequestInterception(true).
The documentation has a really thorough example here: https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagesetrequestinterceptionvalue.
Make sure to add some filtering logic, as in that example (and below), where image requests are singled out. In your case you would capture each request and then abort it.
page.on('request', interceptedRequest => {
  // url() is a method on the request object in current Puppeteer versions
  if (interceptedRequest.url().endsWith('.png') ||
      interceptedRequest.url().endsWith('.jpg'))
    interceptedRequest.abort();
  else
    interceptedRequest.continue();
});
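One caveat with the endsWith check: it misses image URLs that carry a query string, such as photo.png?v=2. A slightly more robust variant compares only the path part of the URL. This is just a sketch; hasImageExtension and the extension list are my own choices, not something from the Puppeteer docs.

```javascript
// Illustrative list; extend it as needed.
const IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif'];

// Hypothetical helper: true when the URL's path ends in a known image extension.
function hasImageExtension(requestUrl) {
  let pathname;
  try {
    // Parsing drops the query string and fragment from the comparison.
    pathname = new URL(requestUrl).pathname.toLowerCase();
  } catch (err) {
    return false; // not an absolute URL, so do not treat it as an image
  }
  return IMAGE_EXTENSIONS.some((ext) => pathname.endsWith(ext));
}

// In the handler above, the condition would become:
//   if (hasImageExtension(interceptedRequest.url())) interceptedRequest.abort();
```

Checking request.resourceType() === 'image', as in the first answer, avoids guessing from the URL altogether.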