
Programmatically capturing AJAX traffic with headless Chrome

Chrome officially supports running the browser in headless mode (including programmatic control via the Puppeteer API and/or the CRI library).

I've searched the documentation, but I haven't found how to programmatically capture the AJAX traffic from these instances, i.e. start an instance of Chrome from code, navigate to a page, and access the background requests, responses, and raw data (all from code, without using the developer tools or extensions).

Do you have any suggestions or examples detailing how this could be achieved? Thanks!

Andrei asked Sep 06 '17 12:09


2 Answers

Update

As @Alejandro pointed out in the comments, resourceType is a function, and its return value is lowercase:

page.on('request', request => {
  if (request.resourceType() === 'xhr') {
    // do something
  }
});
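To fill in the "do something" part, the handler can record the details of each AJAX request. The following is a minimal sketch, assuming a Puppeteer page object already exists; describeRequest and captureAjaxRequests are hypothetical helper names, not part of Puppeteer.

```javascript
// Hypothetical helpers (not part of Puppeteer): collect metadata for XHR/fetch requests.
const AJAX_TYPES = ['xhr', 'fetch'];

// Turn a Puppeteer request into a plain object that is easy to log or store.
function describeRequest(request) {
  return {
    url: request.url(),
    method: request.method(),
    headers: request.headers(),
    postData: request.postData(), // undefined when the request has no body
  };
}

// Attach a listener to `page` and return the array that fills up as requests fire.
function captureAjaxRequests(page) {
  const captured = [];
  page.on('request', request => {
    if (AJAX_TYPES.includes(request.resourceType())) {
      captured.push(describeRequest(request));
    }
  });
  return captured;
}
```

Call captureAjaxRequests(page) before page.goto(url) so that requests made early in the page load are not missed.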

Original answer

Puppeteer's API makes this really easy:

// Older Puppeteer versions only; see the update above.
page.on('request', request => {
  if (request.resourceType === 'XHR') {
    // do something
  }
});

You can also intercept requests with setRequestInterception, but it's not needed in this example if you're not going to modify the requests.
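For the case where you do want to modify or block requests, interception could look roughly like the sketch below. It assumes an existing Puppeteer page; interceptAjax and shouldBlock are hypothetical names chosen for illustration.

```javascript
// A sketch of request interception; `interceptAjax` and `shouldBlock` are hypothetical names.
async function interceptAjax(page, shouldBlock) {
  // Once interception is on, every request must be continued, aborted, or responded to,
  // otherwise it stalls.
  await page.setRequestInterception(true);
  page.on('request', request => {
    const isAjax = ['xhr', 'fetch'].includes(request.resourceType());
    if (isAjax && shouldBlock(request.url())) {
      request.abort(); // drop the matching AJAX call
    } else {
      request.continue(); // let everything else through unchanged
    }
  });
}
```

For example, interceptAjax(page, url => url.includes('/ads')) would block only the AJAX calls whose URL contains /ads (an illustrative path).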

There's an example of intercepting image requests that you can adapt.

resourceTypes are defined here.

ebidel answered Oct 04 '22 11:10


Puppeteer's event listeners let you capture XHR traffic through the request and response events.

First check whether request.resourceType() is xhr or fetch.

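page.on('response', response => {
  // Only handle responses to XHR or fetch requests.
  const isXhr = ['xhr', 'fetch'].includes(response.request().resourceType());
  if (isXhr) {
    console.log(response.url());
    // text() resolves with the response body; log a failure instead of leaving it unhandled.
    response.text().then(console.log).catch(console.error);
  }
});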
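If you need the raw data in code rather than in the console, the same listener can store each response instead of logging it. This is a rough sketch under the assumption that a Puppeteer page already exists; captureAjaxResponses is a hypothetical helper name, not a Puppeteer API.

```javascript
// Hypothetical helper (not part of Puppeteer): record status and body of XHR/fetch responses.
function captureAjaxResponses(page) {
  const captured = [];
  page.on('response', async response => {
    const type = response.request().resourceType();
    if (!['xhr', 'fetch'].includes(type)) return;
    const entry = { url: response.url(), status: response.status(), body: null };
    captured.push(entry);
    try {
      entry.body = await response.text();
    } catch (err) {
      // Some responses (for example redirects) have no readable body.
      entry.error = err.message;
    }
  });
  return captured;
}
```

Bodies arrive asynchronously, so read the returned array only after the page has settled, for instance after awaiting page.goto(url, { waitUntil: 'networkidle0' }).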
ahuigo answered Oct 04 '22 11:10