 

How can I capture all network requests and full response data when loading a page in Chrome?

Using Puppeteer, I'd like to load a URL in Chrome and capture the following information:

  • request URL
  • request headers
  • request post data
  • response headers text (including duplicate headers like set-cookie)
  • transferred response size (i.e. compressed size)
  • full response body

Capturing the full response body is what causes problems for me.

Things I've tried:

  • Getting response content with response.buffer() - this does not work if there are redirects at any point, since buffers are wiped on navigation
  • Intercepting requests and using getResponseBodyForInterception - this means I can no longer access the encodedLength, and I also had problems getting the correct request and response headers in some cases (see the protocol sketch after this list)
  • Using a local proxy - this works, but it slowed down page load times significantly (and also changed some behavior, e.g. for certificate errors)
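
For reference, the non-interception DevTools-protocol route looks roughly like this. It is only a minimal sketch (the requests map and the example.com URL are illustrative placeholders): encodedDataLength arrives with Network.loadingFinished, but Network.getResponseBody can still fail once a body has been evicted, e.g. after a navigation, which is the same buffer problem as above.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // talk to the DevTools protocol directly instead of intercepting requests
  const client = await page.target().createCDPSession();
  await client.send('Network.enable');

  const requests = new Map(); // requestId -> collected data

  client.on('Network.requestWillBeSent', event => {
    requests.set(event.requestId, {
      url: event.request.url,
      requestHeaders: event.request.headers,
      postData: event.request.postData,
    });
  });

  client.on('Network.responseReceived', event => {
    const entry = requests.get(event.requestId);
    if (entry) {
      // raw headers text preserves duplicate headers such as set-cookie
      entry.responseHeadersText = event.response.headersText;
    }
  });

  client.on('Network.loadingFinished', async event => {
    const entry = requests.get(event.requestId);
    if (!entry) return;
    entry.encodedDataLength = event.encodedDataLength; // transferred (compressed) size
    try {
      const { body, base64Encoded } = await client.send('Network.getResponseBody', {
        requestId: event.requestId,
      });
      entry.body = base64Encoded ? Buffer.from(body, 'base64') : body;
    } catch (err) {
      // the body may already have been evicted, e.g. after a navigation
    }
  });

  await page.goto('https://example.com/', { waitUntil: 'networkidle0' });
  console.log([...requests.values()]);
  await browser.close();
})();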

Ideally the solution should only have a minor performance impact and have no functional differences from loading a page normally. I would also like to avoid forking Chrome.

Asked Oct 24 '18 by Matt Zeunert



2 Answers

You can enable request interception with page.setRequestInterception(), and then, inside page.on('request'), use the request-promise-native module as a middleman to gather the response data before continuing the request with request.continue() in Puppeteer.

Here's a full working example:

'use strict';

const puppeteer = require('puppeteer');
const request_client = require('request-promise-native');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const result = [];

  await page.setRequestInterception(true);

  page.on('request', request => {
    // replay the request ourselves, forwarding the original method,
    // headers and post data so the server sees the same request
    request_client({
      uri: request.url(),
      method: request.method(),
      headers: request.headers(),
      body: request.postData(),
      gzip: true, // decompress gzipped responses
      resolveWithFullResponse: true,
    }).then(response => {
      const request_url = request.url();
      const request_headers = request.headers();
      const request_post_data = request.postData();
      const response_headers = response.headers;
      const response_size = response_headers['content-length'];
      const response_body = response.body;

      result.push({
        request_url,
        request_headers,
        request_post_data,
        response_headers,
        response_size,
        response_body,
      });

      console.log(result);
      request.continue();
    }).catch(error => {
      console.error(error);
      request.abort();
    });
  });

  await page.goto('https://example.com/', {
    waitUntil: 'networkidle0',
  });

  await browser.close();
})();
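
Note that this approach fetches every resource twice: once via request-promise-native and once by the browser after request.continue(). Forwarding the method, headers and post data (as in the snippet above) keeps the replayed request close to the original, but non-idempotent requests such as POSTs will still hit the server twice, and timing and ordering can differ from a normal page load.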
Answered Nov 07 '22 by Grant Miller

Puppeteer-only solution

This can be done with Puppeteer alone. The problem you describe, that response.buffer is cleared on navigation, can be circumvented by processing the requests one after another.

How it works

The code below uses page.setRequestInterception to intercept all requests. While one request is being processed or waited for, any new requests are put into a queue. Because there are no parallel requests, response.buffer() can then be used without the risk that another request asynchronously wipes the buffer. As soon as the current request/response has been handled, the next request is processed.

Code

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();

    const results = []; // collects all results

    let paused = false;
    let pausedRequests = [];

    const nextRequest = () => { // continue the next request or "unpause"
        if (pausedRequests.length === 0) {
            paused = false;
        } else {
            // continue first request in "queue"
            (pausedRequests.shift())(); // calls the request.continue function
        }
    };

    await page.setRequestInterception(true);
    page.on('request', request => {
        if (paused) {
            pausedRequests.push(() => request.continue());
        } else {
            paused = true; // pause, as we are processing a request now
            request.continue();
        }
    });

    page.on('requestfinished', async (request) => {
        const response = await request.response();

        const responseHeaders = response.headers();
        let responseBody;
        if (request.redirectChain().length === 0) {
            // body can only be accessed for non-redirect responses
            responseBody = await response.buffer();
        }

        const information = {
            url: request.url(),
            requestHeaders: request.headers(),
            requestPostData: request.postData(),
            responseHeaders: responseHeaders,
            responseSize: responseHeaders['content-length'],
            responseBody,
        };
        results.push(information);

        nextRequest(); // continue with next request
    });
    page.on('requestfailed', (request) => {
        // handle failed request
        nextRequest();
    });

    await page.goto('...', { waitUntil: 'networkidle0' });
    console.log(results);

    await browser.close();
})();
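
A note on the trade-off: because requests are handled strictly one after another, only one request is in flight at a time. The buffer can no longer be wiped mid-read, but page load time grows with the number of requests, so this approach favors complete capture over loading speed.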
Answered Nov 07 '22 by Thomas Dondorf