
Missing request headers in puppeteer

I want to read the request cookies in a test written with Puppeteer, but most of the requests I inspect have only the referer and user-agent headers. If I look at the same requests in Chrome DevTools, they have many more headers, including Cookie. To reproduce, paste the code below into https://try-puppeteer.appspot.com/.

const browser = await puppeteer.launch();
const page = await browser.newPage();

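page.on('request', function(request) {
  // headers() is a method in later Puppeteer releases (it was a property in early 0.x versions).
  console.log(JSON.stringify(request.headers(), null, 2));
});

// Later Puppeteer releases replaced 'networkidle' with 'networkidle0' and 'networkidle2'.
await page.goto('https://google.com/', {waitUntil: 'networkidle0'});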

await browser.close();

Is there a restriction on which request headers you can and cannot access? Is it a limitation of Chrome itself or of Puppeteer?

Thanks for suggestions!

Bardt asked Nov 02 '17 15:11


2 Answers

I also saw this when I was trying to use Puppeteer to test some CORS behaviour - I found the Origin header was missing from some requests.

Looking through the GitHub issues, I found one mentioning that Puppeteer does not listen to the Network.responseReceivedExtraInfo event of the underlying Chrome DevTools Protocol. This event provides extra response headers that are not available in the Network.responseReceived event. There is a similar Network.requestWillBeSentExtraInfo event for requests.

Hooking up to these events seemed to get me all the headers I needed. Here is some sample code which captures the data from all four events and merges it into a single object keyed by request ID:

// Setup.
const browser = await puppeteer.launch()
const page = await browser.newPage()
const cdpRequestDataRaw = await setupLoggingOfAllNetworkData(page)

// Make requests.
await page.goto('http://google.com/')

// Log captured request data.
console.log(JSON.stringify(cdpRequestDataRaw, null, 2))

await browser.close()

// Returns map of request ID to raw CDP request data. This will be populated as requests are made.
async function setupLoggingOfAllNetworkData(page) {
    const cdpSession = await page.target().createCDPSession()
    await cdpSession.send('Network.enable')
    const cdpRequestDataRaw = {}
    const addCDPRequestDataListener = (eventName) => {
        cdpSession.on(eventName, request => {
            cdpRequestDataRaw[request.requestId] = cdpRequestDataRaw[request.requestId] || {}
            Object.assign(cdpRequestDataRaw[request.requestId], { [eventName]: request })
        })
    }
    addCDPRequestDataListener('Network.requestWillBeSent')
    addCDPRequestDataListener('Network.requestWillBeSentExtraInfo')
    addCDPRequestDataListener('Network.responseReceived')
    addCDPRequestDataListener('Network.responseReceivedExtraInfo')
    return cdpRequestDataRaw
}
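
To read a header such as Cookie out of the captured data, the basic and extra-info events for one request can be combined. The following is a minimal sketch, not part of the original answer: `mergedRequestHeaders` is a hypothetical helper name, and it assumes the CDP event shapes in which `Network.requestWillBeSent` carries headers under `request.headers` and `Network.requestWillBeSentExtraInfo` carries them under `headers`.

```javascript
// Hypothetical helper: merge the request headers captured for one request ID.
// `entry` is one value from the cdpRequestDataRaw map built above.
function mergedRequestHeaders(entry) {
  const basic = entry['Network.requestWillBeSent'];
  const extra = entry['Network.requestWillBeSentExtraInfo'];
  // Either event may be missing (for example, if it has not fired yet).
  const sources = [
    basic && basic.request && basic.request.headers,
    extra && extra.headers,
  ];
  const merged = {};
  for (const source of sources) {
    for (const [name, value] of Object.entries(source || {})) {
      // Header names are case-insensitive, so normalise to lower case.
      // Extra-info values are applied last and win on conflict.
      merged[name.toLowerCase()] = value;
    }
  }
  return merged;
}
```

For example, `mergedRequestHeaders(cdpRequestDataRaw[someRequestId]).cookie` would give the Cookie header if Chrome sent one, where `someRequestId` is any key of the map. Redirects appear to reuse the same request ID, so the map above keeps only the most recent event of each type for such requests.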

Hugo answered Nov 12 '22 04:11


That's because your browser sets a bunch of headers depending on settings and capabilities, and also includes e.g. the cookies that it has stored locally for the specific page.

If you want to add additional headers, you can use methods such as:

page.setExtraHTTPHeaders

page.setUserAgent

page.setCookie

With these you can mimic the extra headers that you see your Chrome browser sending.
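
As a rough sketch of how those three calls might be combined before navigating, assuming a Puppeteer `page` object: the helper name `mimicBrowserHeaders`, the `options` shape, and all example values below are illustrative, not taken from the answer. Depending on your Puppeteer version, cookie handling may be preferred at the browser or context level, so check the API docs for the release you use.

```javascript
// Hypothetical helper: apply a user agent, extra headers, and cookies to a page.
// Call it before page.goto() so the settings apply to the navigation requests.
async function mimicBrowserHeaders(page, options) {
  await page.setUserAgent(options.userAgent);
  // Header values must be strings.
  await page.setExtraHTTPHeaders(options.extraHeaders);
  // setCookie takes one or more cookie objects as separate arguments.
  await page.setCookie(...options.cookies);
}

// Illustrative values only; replace them with what you see in Chrome DevTools.
const exampleOptions = {
  userAgent: 'Mozilla/5.0 (X11; Linux x86_64) ExampleAgent/1.0',
  extraHeaders: { 'Accept-Language': 'en-US,en;q=0.9' },
  cookies: [{ name: 'session', value: 'abc123', url: 'https://example.com/' }],
};

// Usage, inside an async context with a Puppeteer page:
//   await mimicBrowserHeaders(page, exampleOptions);
//   await page.goto('https://example.com/');
```

Giving each cookie an explicit `url` (or `domain`) avoids relying on the page's current URL, which is `about:blank` on a fresh page.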

tomahaug answered Nov 12 '22 06:11