How to intercept request in Puppeteer before current page is left?

Question

Use case:

We need to capture all outbound routes from a page. Some of them may not be implemented as link elements (<a href="...">) but via JavaScript code or as GET/POST forms.

PhantomJS:

In PhantomJS we did this using the onNavigationRequested callback. We simply clicked all the elements matching a given selector, used onNavigationRequested to capture the target URL (and possibly the method or POST data, in the case of a form), and then canceled that navigation.

Puppeteer:

I tried request interception, but by the time the request is intercepted, the current page is already lost, so I would have to navigate back.

Is there a way to capture the navigation event while the browser is still on the page that triggered it, and to stop it?

Thank you.

Ming C. · Accepted Answer

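You can enable request interception and then decide, for each request, whether to abort it or let it continue. For example, to block images: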

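// Interception must be enabled before requests can be aborted or continued.
await page.setRequestInterception(true);
page.on('request', request => {
  // Block images; let every other request through unchanged.
  if (request.resourceType() === 'image')
    request.abort();
  else
    request.continue();
});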
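The same pattern generalizes to any set of resource types. Below is a minimal sketch; createResourceTypeBlocker is a hypothetical helper name, and the strings 'image', 'font' and 'media' are values that request.resourceType() can return.

```javascript
// Hypothetical helper: builds a 'request' handler that aborts every request
// whose resourceType() is listed in `blockedTypes` and continues the rest.
function createResourceTypeBlocker(blockedTypes) {
  const blocked = new Set(blockedTypes);
  return function onRequest(request) {
    if (blocked.has(request.resourceType())) {
      request.abort();
    } else {
      request.continue();
    }
  };
}

// Usage (inside an async function, with a Puppeteer `page`):
//   await page.setRequestInterception(true);
//   page.on('request', createResourceTypeBlocker(['image', 'font', 'media']));
```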

Example here:

https://github.com/GoogleChrome/puppeteer/blob/master/examples/block-images.js

Available resource types are listed here:

https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#requestresourcetype

Marek Trunkát · Answer

I finally found a solution that doesn't require a browser extension and therefore works in headless mode.

Thanks to this comment: https://github.com/GoogleChrome/puppeteer/issues/823#issuecomment-467408640

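// Interception must be enabled before requests can be answered or continued.
await page.setRequestInterception(true);

// `url` is the URL of the page being processed; its own navigation is allowed.
page.on('request', req => {
  if (req.isNavigationRequest() && req.frame() === page.mainFrame() && req.url() !== url) {
    // no redirect chain means the navigation is caused by setting `location.href`
    req.respond(req.redirectChain().length
      ? { body: '' } // prevent 301/302 redirect
      : { status: 204 } // prevent navigation by js
    )
  } else {
    req.continue()
  }
})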
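For the original use case (collecting every outbound route rather than only suppressing it), the handler above can also record what each cancelled navigation would have requested. This is a minimal sketch, not part of Puppeteer or the Apify SDK: createNavigationRecorder and captured are hypothetical names, and the request object is assumed to be Puppeteer's request with isNavigationRequest(), frame(), url(), method(), postData(), redirectChain(), respond() and continue().

```javascript
// Hypothetical helper: records where a main-frame navigation would have gone,
// then cancels it using the same responses as the snippet above.
// `mainFrame` is page.mainFrame(); `startUrl` is the page's own URL.
function createNavigationRecorder(mainFrame, startUrl) {
  const captured = [];

  function onRequest(req) {
    const isBlockedNavigation =
      req.isNavigationRequest() &&
      req.frame() === mainFrame &&
      req.url() !== startUrl;

    if (!isBlockedNavigation) {
      req.continue();
      return;
    }

    // The details PhantomJS users collected: target URL, HTTP method and,
    // for POST forms, the encoded request body (undefined for GET).
    captured.push({
      url: req.url(),
      method: req.method(),
      postData: req.postData(),
    });

    req.respond(req.redirectChain().length
      ? { body: '' }      // request is part of a 301/302 redirect chain
      : { status: 204 }); // 204 No Content: the current document is kept
  }

  return { onRequest, captured };
}

// Usage (inside an async function, with a Puppeteer `page` already at `url`):
//   await page.setRequestInterception(true);
//   const recorder = createNavigationRecorder(page.mainFrame(), url);
//   page.on('request', recorder.onRequest);
//   await page.click('button.next'); // hypothetical selector
```

The 'request' event can fire after click() resolves, so in practice you may need to wait briefly (or until the page is idle) before reading recorder.captured.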

EDIT: We have added a helper function to the Apify SDK that implements this: https://sdk.apify.com/docs/api/puppeteer#puppeteer.enqueueLinksByClickingElements

Here is the full source code:

https://github.com/apifytech/apify-js/blob/master/src/enqueue_links/click_elements.js

It's slightly more complicated, because it needs not only to intercept requests but also to catch newly opened windows, etc.
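Catching newly opened windows can be sketched with Puppeteer's 'targetcreated' browser event. This is a minimal sketch under assumptions, not the Apify implementation: createPopupRecorder is a hypothetical name, it assumes target.opener() can be compared with page.target() by identity, and the popup's URL may still be about:blank when the target is created, so a real crawler may need to wait for the popup to navigate before reading it.

```javascript
// Hypothetical helper: records the URL of every new tab/window opened by
// `page` (window.open, target="_blank") and closes it, so the crawl stays
// on the original page. Uses Target.type(), opener(), url() and page().
function createPopupRecorder(page) {
  const popups = [];

  async function onTargetCreated(target) {
    // Ignore workers etc. and pages opened by something other than `page`.
    if (target.type() !== 'page' || target.opener() !== page.target()) return;

    popups.push(target.url());

    const popupPage = await target.page();
    if (popupPage) await popupPage.close();
  }

  return { onTargetCreated, popups };
}

// Usage (with a Puppeteer `browser` and `page`):
//   const popupRecorder = createPopupRecorder(page);
//   browser.on('targetcreated', popupRecorder.onTargetCreated);
```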

rawidn · Answer

I ran into the same problem. Puppeteer doesn't support this feature yet; it's actually the Chrome DevTools Protocol that doesn't support it. But I found another way to solve it, using a Chrome extension. Related issue: https://github.com/GoogleChrome/puppeteer/issues/823

The author of the issue shared a solution here: https://gist.github.com/GuilloOme/2bd651e5154407d2d2165278d5cd7cdb

As the documentation describes, you can use chrome.webRequest.onBeforeRequest.addListener to intercept all requests from the page and block them if you want to.
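A background script for such an extension might look roughly like this. It is a sketch assuming a Manifest V2 extension with the "webRequest", "webRequestBlocking" and "<all_urls>" permissions (blocking webRequest listeners are not available to regular Manifest V3 extensions); START_URL and capturedNavigations are hypothetical names, and passing the captured data back to Puppeteer is left out.

```javascript
// background.js (sketch). START_URL is a hypothetical placeholder for the page
// under test; its own load must not be blocked.
const START_URL = 'https://example.com/';
// Hypothetical in-memory store of the navigations that were blocked.
const capturedNavigations = [];

function onBeforeRequest(details) {
  if (details.url === START_URL) return { cancel: false };

  // Record the target URL, method and submitted form fields (if any).
  capturedNavigations.push({
    url: details.url,
    method: details.method,
    formData: details.requestBody ? details.requestBody.formData : undefined,
  });
  // In a 'blocking' listener, { cancel: true } blocks the request.
  return { cancel: true };
}

// Register only when running inside the extension.
if (typeof chrome !== 'undefined' && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener(
    onBeforeRequest,
    { urls: ['<all_urls>'], types: ['main_frame'] }, // top-level navigations
    ['blocking', 'requestBody'],
  );
}
```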

Don't forget to add the following flags to the Puppeteer launch options (args):

--load-extension=./your_ext/ --disable-extensions-except=./your_ext/
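In Puppeteer, these flags go into the args array passed to puppeteer.launch(). A minimal sketch, where extensionLaunchOptions is a hypothetical helper and './your_ext/' is the placeholder path from above; headless is set to false because extensions historically did not load in Chrome's old headless mode.

```javascript
const path = require('path');

// Hypothetical helper: launch options that load one unpacked extension.
// Resolving to an absolute path avoids depending on the working directory.
function extensionLaunchOptions(extensionDir) {
  const extPath = path.resolve(extensionDir);
  return {
    headless: false,
    args: [
      `--disable-extensions-except=${extPath}`,
      `--load-extension=${extPath}`,
    ],
  };
}

// Usage:
//   const browser = await puppeteer.launch(extensionLaunchOptions('./your_ext/'));
```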

Bobby Singh · Answer

Use page.setRequestInterception(true). The documentation has a really thorough example here: https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagesetrequestinterceptionvalue. Make sure to add some logic like the example (and the code below), which blocks image requests: capture each request, then abort or continue it.

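await page.setRequestInterception(true);
page.on('request', interceptedRequest => {
  // url() is a method on the request object in current Puppeteer versions
  if (interceptedRequest.url().endsWith('.png') ||
      interceptedRequest.url().endsWith('.jpg'))
    interceptedRequest.abort();
  else
    interceptedRequest.continue();
});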
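One caveat with endsWith(): a URL such as photo.png?v=2 ends with the query string, not the extension, so it slips through. A minimal sketch that checks only the URL's pathname instead; isImageUrl and the extension list are illustrative, and request.resourceType() === 'image' (as in the accepted answer) is usually the more robust test.

```javascript
// Illustrative, not exhaustive, list of image extensions.
const IMAGE_EXTENSIONS = ['.png', '.jpg', '.jpeg', '.gif', '.webp'];

// Returns true when the URL's path (ignoring query string and fragment)
// ends with one of the extensions above, case-insensitively.
function isImageUrl(url) {
  let pathname;
  try {
    pathname = new URL(url).pathname.toLowerCase();
  } catch {
    return false; // not a valid absolute URL
  }
  return IMAGE_EXTENSIONS.some((ext) => pathname.endsWith(ext));
}

// Usage inside the handler:
//   if (isImageUrl(interceptedRequest.url())) interceptedRequest.abort();
//   else interceptedRequest.continue();
```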

Tags:

puppeteer
