
How to intercept request in Puppeteer before current page is left?

Tags:

puppeteer

Use case:

We need to capture all outbound routes from a page. Some of them may not be implemented as link elements (<a href="...">) but via JavaScript code or as GET/POST forms.

PhantomJS:

In PhantomJS we did this with the onNavigationRequested callback. We clicked all the elements matching a given selector, used onNavigationRequested to capture the target URL (plus the method and POST data in the case of a form), and then canceled that navigation.

Puppeteer:

I tried request interception, but by the time the request is intercepted, the current page is already lost, so I would have to navigate back.


Is there a way to capture the navigation event while the browser is still on the page that triggered it, and to stop it?

Thank you.

Marek Trunkát asked Nov 02 '17 08:11


4 Answers

You can do the following.

await page.setRequestInterception(true);
page.on('request', request => {
  // Abort image requests; let every other request through
  if (request.resourceType() === 'image')
    request.abort();
  else
    request.continue();
});

Example here:

https://github.com/GoogleChrome/puppeteer/blob/master/examples/block-images.js

Available resource types are listed here:

https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#requestresourcetype
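The same pattern extends to any set of resource types. Below is a minimal sketch, assuming you want the block/allow decision as a plain function so it can be tested without launching a browser. `shouldAbort` and `BLOCKED_TYPES` are hypothetical names, and the mock objects only imitate the `resourceType()` method of a Puppeteer request.

```javascript
// Hypothetical set of resource types to block (values as returned by request.resourceType())
const BLOCKED_TYPES = new Set(['image', 'media', 'font']);

// Pure decision function: true if the request should be aborted.
// `request` only needs a resourceType() method, so either a Puppeteer
// request or a simple mock object works.
function shouldAbort(request, blocked = BLOCKED_TYPES) {
  return blocked.has(request.resourceType());
}

// Wiring it into Puppeteer (not executed here):
//   await page.setRequestInterception(true);
//   page.on('request', (req) => (shouldAbort(req) ? req.abort() : req.continue()));
```

Keeping the decision separate from the `abort()`/`continue()` calls makes it easy to guarantee that every intercepted request gets exactly one of the two. Once interception is enabled, a request that receives neither call stays pending.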

Ming C. answered Oct 31 '22 13:10


I finally found a solution that doesn't require a browser extension and therefore works in headless mode:

Thanks to this comment: https://github.com/GoogleChrome/puppeteer/issues/823#issuecomment-467408640

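// Request interception must be enabled before respond()/continue() can be called
await page.setRequestInterception(true);

// `url` is the address of the page being analyzed (the one passed to page.goto())
page.on('request', req => {
  if (req.isNavigationRequest() && req.frame() === page.mainFrame() && req.url() !== url) {
    // A non-empty redirect chain means a server-side 301/302 redirect;
    // an empty one means a fresh navigation, e.g. caused by setting `location.href`
    req.respond(req.redirectChain().length
      ? { body: '' }     // prevent the 301/302 redirect
      : { status: 204 }  // prevent the navigation triggered by JS
    );
  } else {
    req.continue();
  }
});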
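The question also asks for the target URL, method and POST data, so the handler can record those before blocking the navigation. Below is a minimal sketch, assuming the decision is factored into a plain function that can be tested with mock objects. `handleNavigation` and the shape of the recorded entries are hypothetical. `isNavigationRequest()`, `frame()`, `url()`, `method()`, `postData()` and `redirectChain()` are methods of Puppeteer's request object.

```javascript
// Decide what to do with one intercepted request.
// Returns { action: 'respond', options } to block a main-frame navigation
// away from `startUrl`, or { action: 'continue' } for everything else.
// Blocked navigations are appended to `routes` (hypothetical record format).
function handleNavigation(req, mainFrame, startUrl, routes) {
  const leavesPage =
    req.isNavigationRequest() && req.frame() === mainFrame && req.url() !== startUrl;
  if (!leavesPage) return { action: 'continue' };

  // Record the outbound route: target URL, HTTP method and (for POST forms) the body
  routes.push({ url: req.url(), method: req.method(), postData: req.postData() });

  // Same choice as the snippet above: empty body for redirects,
  // 204 No Content for navigations without a redirect chain
  const options = req.redirectChain().length ? { body: '' } : { status: 204 };
  return { action: 'respond', options };
}

// Possible Puppeteer wiring (not executed here):
//   const routes = [];
//   await page.setRequestInterception(true);
//   page.on('request', (req) => {
//     const d = handleNavigation(req, page.mainFrame(), url, routes);
//     if (d.action === 'respond') req.respond(d.options);
//     else req.continue();
//   });
```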

EDIT: We have added a helper function to the Apify SDK that implements this: https://sdk.apify.com/docs/api/puppeteer#puppeteer.enqueueLinksByClickingElements

Here is the full source code:

https://github.com/apifytech/apify-js/blob/master/src/enqueue_links/click_elements.js

It's slightly more complicated, as it needs not only to intercept requests but also to catch newly opened windows, etc.
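One option for the newly-opened-window part is to replace `window.open` in the page before any page script runs. The target URL is then recorded instead of a new window being created. This is not necessarily what the Apify SDK does. Below is a minimal sketch, assuming the trap is injected with Puppeteer's `page.evaluateOnNewDocument()`. `installOpenTrap` and the `__capturedOpens` property are hypothetical names, and the function works on a plain object so it can be tested outside a browser. It only covers script-initiated `window.open()` calls; links with `target="_blank"` open windows without going through it.

```javascript
// Replace win.open so that calls are recorded instead of opening a window.
// Returns null, which is also what window.open returns when a popup is
// blocked, so page scripts should already be prepared for it.
function installOpenTrap(win) {
  win.__capturedOpens = [];
  win.open = function (url = '', target = '_blank') {
    win.__capturedOpens.push({ url: String(url), target: String(target) });
    return null;
  };
}

// Possible Puppeteer wiring (not executed here):
//   await page.evaluateOnNewDocument(installOpenTrap.toString() + '\ninstallOpenTrap(window);');
//   // ...click the elements...
//   const opened = await page.evaluate(() => window.__capturedOpens);
```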

Marek Trunkát answered Oct 31 '22 12:10


I ran into the same problem. Puppeteer doesn't support this feature at the moment; more precisely, it's the underlying Chrome DevTools Protocol that doesn't support it. But I found another way to solve it: using a Chrome extension. Related issue: https://github.com/GoogleChrome/puppeteer/issues/823

The author of the issue shared a solution here: https://gist.github.com/GuilloOme/2bd651e5154407d2d2165278d5cd7cdb

As the documentation describes, you can use chrome.webRequest.onBeforeRequest.addListener to intercept all requests from the page and block them if you want to.
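Below is a minimal sketch of such a listener for the extension's background script. It assumes Manifest V2 with the `webRequest` and `webRequestBlocking` permissions, since Manifest V3 restricts blocking `webRequest`. The listener records top-level navigations (`type === 'main_frame'`) away from a hypothetical `START_URL` and cancels them. The registration is skipped when `chrome` is undefined, so the decision logic can also run in Node. Whether the original page survives a cancelled main-frame request, rather than being replaced by an error page, is worth verifying in your Chrome version.

```javascript
// Hypothetical: the page whose outbound routes we want to collect.
const START_URL = 'https://example.com/';
const captured = [];

// onBeforeRequest listener. With the 'blocking' option, returning
// { cancel: true } cancels the request and { cancel: false } lets it proceed.
function onBeforeRequest(details) {
  if (details.type === 'main_frame' && details.url !== START_URL) {
    // requestBody is only populated when 'requestBody' is requested below
    captured.push({ url: details.url, method: details.method, requestBody: details.requestBody });
    return { cancel: true };
  }
  return { cancel: false };
}

// Register only when running inside an extension (Manifest V2 assumed).
if (typeof chrome !== 'undefined' && chrome.webRequest) {
  chrome.webRequest.onBeforeRequest.addListener(
    onBeforeRequest,
    { urls: ['<all_urls>'] },
    ['blocking', 'requestBody']
  );
}
```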

Don't forget to add the following arguments to the Puppeteer launch options:

--load-extension=./your_ext/ --disable-extensions-except=./your_ext/
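In code, those flags go into the `args` array passed to `puppeteer.launch()`. Below is a minimal sketch, assuming the extension lives in the `./your_ext` directory from the flags above. The helper `extensionLaunchOptions` is hypothetical and only builds the options object, so it runs without Puppeteer installed. It sets `headless: false` because Chrome's older headless mode does not load extensions.

```javascript
const path = require('path');

// Build launch options that load a single unpacked extension.
function extensionLaunchOptions(extDir) {
  const abs = path.resolve(extDir); // absolute path, independent of the current working directory
  return {
    headless: false, // older headless mode does not load extensions
    args: [
      `--disable-extensions-except=${abs}`,
      `--load-extension=${abs}`,
    ],
  };
}

// Usage (requires Puppeteer, not executed here):
//   const browser = await puppeteer.launch(extensionLaunchOptions('./your_ext'));
```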

rawidn answered Oct 31 '22 11:10


Use page.setRequestInterception(true). The documentation has a thorough example here: https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagesetrequestinterceptionvalue. Make sure to add some filtering logic: in the example (and below), image requests are aborted. You capture each request and then abort the ones you don't want.

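await page.setRequestInterception(true);
page.on('request', interceptedRequest => {
  // url() is a method in current Puppeteer versions, not a property
  const url = interceptedRequest.url();
  // Abort .png/.jpg requests and let everything else continue
  if (url.endsWith('.png') || url.endsWith('.jpg'))
    interceptedRequest.abort();
  else
    interceptedRequest.continue();
});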
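Note that `url().endsWith('.png')` misses URLs that carry a query string or fragment, such as `logo.png?v=2`. Below is a minimal sketch of a more tolerant check, assuming you still want to filter by file extension. `hasBlockedExtension` is a hypothetical helper that parses the URL with the WHATWG `URL` class and inspects only the pathname. For image blocking in general, `request.resourceType() === 'image'` from the first answer avoids guessing from extensions.

```javascript
// Extensions to block (hypothetical list; compared case-insensitively).
const BLOCKED_EXTENSIONS = ['.png', '.jpg', '.jpeg'];

// True if the URL's path (ignoring ?query and #fragment) ends with a blocked extension.
function hasBlockedExtension(rawUrl, extensions = BLOCKED_EXTENSIONS) {
  let pathname;
  try {
    pathname = new URL(rawUrl).pathname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL: don't block it
  }
  return extensions.some((ext) => pathname.endsWith(ext));
}

// In a Puppeteer request handler (not executed here):
//   hasBlockedExtension(req.url()) ? req.abort() : req.continue();
```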
Bobby Singh answered Oct 31 '22 12:10