 

How can I get all xhr calls in puppeteer?


I am using Puppeteer to load a web page:

  const puppeteer = require('puppeteer');

  (async () => {
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    await page.setRequestInterception(true);
    page.on('request', (request) => {
      console.log(request.url());
      request.continue();
      // ...
    });
    await page.goto(
      'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
      { waitUntil: 'networkidle2' },
    );
    await browser.close();
  })();

I set request interception to true and log every request URL. However, far fewer requests are logged than when I load the URL in the Chrome browser. For example, the request https://www.onthehouse.com.au/odin/api/compositeSearch shows up in Chrome DevTools but is not logged by the code above.

How can I log all requests?

asked Jun 12 '20 06:06 by Joey Yi Zhao



1 Answer

I benchmarked four variants of this script, and for me the results were the same. Note: I ran multiple tests. Sometimes, due to local network speed, fewer calls were logged, but after 2-3 tries Puppeteer caught all requests.

The https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195 page has some async and defer scripts, so my hypothesis was that they might load differently depending on the Puppeteer settings (networkidle2 vs. networkidle0), or on whether the page.on handler is sync or async.

Note 2: I tested a different page, not the one in the original question, because I needed a VPN to visit this Australian website. That was easy from Chrome, but with Puppeteer it would take more setup. Trust me, the page I tested has a similar amount of analytics and tracking requests.


Baseline from Chrome network: 28 calls

First I visited the xy webpage in Chrome: the Network tab showed 28 calls.

Case 1: Original (sync, networkidle2)

  await page.setRequestInterception(true);
  page.on('request', (request) => {
    console.log(request.url());
    request.continue();
    // ...
  });
  await page.goto(
    'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
    { waitUntil: 'networkidle2' },
  );

Result: 28 calls

Case 2: Async, networkidle2

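The page.on handler is an async function, so we can await request.url(). (request.url() itself is synchronous and returns a string, so the await only changes the handler's timing, not the value.)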

  await page.setRequestInterception(true);
  page.on('request', async (request) => {
    console.log(await request.url());
    request.continue();
    // ...
  });
  await page.goto(
    'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
    { waitUntil: 'networkidle2' },
  );

Result: 28 calls

Case 3: Sync, networkidle0

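Same as the original, but with networkidle0, which waits until there are no network connections for at least 500 ms (networkidle2 tolerates up to 2).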

  await page.setRequestInterception(true);
  page.on('request', (request) => {
    console.log(request.url());
    request.continue();
    // ...
  });
  await page.goto(
    'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
    { waitUntil: 'networkidle0' },
  );

Result: 28 calls

Case 4: Async, networkidle0

Same as Case 2 (an async handler that awaits request.url()), but with networkidle0.

  await page.setRequestInterception(true);
  page.on('request', async (request) => {
    console.log(await request.url());
    request.continue();
    // ...
  });
  await page.goto(
    'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
    { waitUntil: 'networkidle0' },
  );

Result: 28 calls
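The four cases differ only in the waitUntil value and whether the handler is async, so if you want to repeat the comparison yourself, they can be condensed into one counting helper. This is a hypothetical sketch, not the script behind the numbers above: `countRequests` is a made-up name, and `page` is assumed to be a Puppeteer Page (the helper only uses `setRequestInterception()`, `on('request', ...)` and `goto()`). It tallies requests by `request.resourceType()` so the totals can be compared with the filters on Chrome's Network tab.

```javascript
// Hypothetical helper: count the requests Puppeteer sees during one
// navigation, grouped by resource type. `page` is assumed to be a
// Puppeteer Page; use a fresh page per run, because the handler stays attached.
async function countRequests(page, url, waitUntil = 'networkidle2') {
  const byType = {};
  let total = 0;

  await page.setRequestInterception(true);
  page.on('request', (request) => {
    total += 1;
    // resourceType() returns strings such as 'document', 'script', 'xhr', 'fetch'
    const type = request.resourceType();
    byType[type] = (byType[type] || 0) + 1;
    // With interception on, every request must be continued or the page stalls.
    request.continue();
  });

  await page.goto(url, { waitUntil });
  return { total, byType };
}
```

In a real script you would get `page` from `await browser.newPage()` and call `await countRequests(page, url, 'networkidle0')` once per variant. Note that Chrome's "Fetch/XHR" filter groups what Puppeteer reports separately as 'xhr' and 'fetch'.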


Since there was no difference between the number of requests on the Network tab and in Puppeteer, and neither the waitUntil setting nor the way we collect the requests changed anything, my guess is one of the following:

  • Either you have accepted the cookie consent in your Chrome, so the Network tab shows more requests (these requests only happen after the cookies are accepted). On this site you accept the cookie policy simply by navigating, so once you've navigated within their pages, more requests appear on the Network tab immediately.

    [...] By continuing to use our website, you consent to cookies being used.

Solution: do not visit the desired page directly; navigate there through clicks, so that Puppeteer's Chromium accepts the cookie consent and you get all the analytics requests as well.

  • Or some Chrome extension affects the number of requests on the page.

Advice: compare your Puppeteer requests against the Network tab of an incognito Chrome window, and make sure all extensions/add-ons are disabled.
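The "navigate there through clicks" suggestion could look roughly like the sketch below. It is hypothetical: `navigateByClicks` is a made-up name, and the start URL and CSS selectors you pass in are placeholders you would have to look up on the real site. It only uses Puppeteer's `page.goto()`, `page.click()`, `page.waitForNavigation()` and `page.url()`.

```javascript
// Hypothetical sketch: reach the target page through clicks instead of a
// direct page.goto(), so the site's "continuing to use = consent" cookie
// behaviour applies. `linkSelectors` are placeholder CSS selectors.
async function navigateByClicks(page, startUrl, linkSelectors) {
  await page.goto(startUrl, { waitUntil: 'networkidle2' });
  for (const selector of linkSelectors) {
    // Start waiting for the navigation before clicking so it is not missed.
    await Promise.all([
      page.waitForNavigation({ waitUntil: 'networkidle2' }),
      page.click(selector),
    ]);
  }
  // URL of the page we ended up on after the last click.
  return page.url();
}
```

You would attach the request logger first, then call something like `navigateByClicks(page, 'https://www.onthehouse.com.au/', [/* selectors of the links a user would click */])`.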


+ If you are only interested in XHR, you can check request.resourceType() in your handler to tell those requests apart from the others (see the docs).
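A minimal sketch of that filter, assuming request interception is enabled as in the question's code (calling `request.continue()` without interception throws). `logApiRequests` and `API_TYPES` are hypothetical names. Puppeteer reports `fetch()` calls as 'fetch', separately from 'xhr', so a request like compositeSearch may show up under either; this sketch keeps both.

```javascript
// Resource types treated as "API calls": Puppeteer distinguishes XHR from fetch().
const API_TYPES = new Set(['xhr', 'fetch']);

// Hypothetical helper: log only XHR/fetch requests while letting every
// request through. Assumes `page` is a Puppeteer Page with interception on.
function logApiRequests(page, log = console.log) {
  const seen = [];
  page.on('request', (request) => {
    if (API_TYPES.has(request.resourceType())) {
      seen.push(request.url());
      log(`${request.method()} ${request.url()}`);
    }
    // Continue every request, API or not, or the page stalls.
    request.continue();
  });
  return seen; // filled in as requests arrive
}
```

Call it once right after `page.setRequestInterception(true)` and before `page.goto(...)`; the returned array then holds the XHR/fetch URLs seen during the navigation.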

answered Sep 30 '22 18:09 by theDavidBarton
