I am using puppeteer
to load a web page.
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.setRequestInterception(true);
page.on('request', (request) => {
console.log(request.url())
request.continue();
...
}
}
await page.goto(
'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
{ waitUntil: 'networkidle2' },
);
I set the request interception to true
and log all requests urls. The requests I logged is a lot less than the requests when I load the url in chrome browser.
At least there is one request https://www.onthehouse.com.au/odin/api/compositeSearch
which can be found in chrome dev tool console but not show in above code.
I wonder how I can log all requests?
I did some benchmarking between 4 variants of this script. And for me the results were the same. Note: I did multiple tests, sometimes due to local network speed it was less calls. But after 2-3 tries Puppeteer was able to catch all requests.
On the https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195 page there are some async
and defer
scripts, my hypothesis was that may load differently when we use different Puppeteer settings, or async vs. sync functions inside page.on
.
Note 2: I tested another page, not the one in the original question as I already needed a VPN to visit this Australian website, it was easy from Chrome, with Puppeteer it would take more: trust me the page I tested has similarly tons of analytics and tracking requests.
First I've visited xy webpage, the results were 28 calls on the Network tab.
await page.setRequestInterception(true);
page.on('request', (request) => {
console.log(request.url())
request.continue();
...
}
}
await page.goto(
'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
{ waitUntil: 'networkidle2' },
);
Result: 28 calls
The page.on
has an async function inside so we can await the request.url()
await page.setRequestInterception(true);
page.on('request', async request => {
console.log(await request.url())
request.continue();
...
}
}
await page.goto(
'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
{ waitUntil: 'networkidle2' },
);
Result: 28 calls
Similar as the original, but with networkidle0
.
await page.setRequestInterception(true);
page.on('request', (request) => {
console.log(request.url())
request.continue();
...
}
}
await page.goto(
'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
{ waitUntil: 'networkidle0' },
);
Result: 28 calls
The page.on
has an async function inside so we can await the request.url()
. Plus networkidle0
.
await page.setRequestInterception(true);
page.on('request', async request => {
console.log(await request.url())
request.continue();
...
}
}
await page.goto(
'https://www.onthehouse.com.au/property-for-rent/vic/aspendale-gardens-3195',
{ waitUntil: 'networkidle0' },
);
Result: 28 calls
As there was no difference between the number of requests on the Network tab and from Puppeteer, neither the way we launch puppeteer or how we collect the requests my idea is:
[...] By continuing to use our website, you consent to cookies being used.
Solution: Do not directly visit the desired page, but navigate there through clicks, so your Puppeteer's Chromium will accept the cookie consent, hence you will have all analytics requests as well.
Advise: Check your Puppeteer requests against an incognito Chrome's Network tab, make sure all Extensions/Addons are disabled.
+ If you are only interested in XHR then you may need to add request.resourceType
to your code to differentiate them from others docs.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With