
How to scrape JSON from puppeteer?

I log in to a site and it sets a browser cookie.

I then go to a URL whose response is JSON.

How do I scrape the page after calling await page.goto('blahblahblah.json'); ?

Amy Coin asked Jan 29 '18 22:01


1 Answer

One way that avoids intermittent issues is to wait until the body is available, then evaluate it in the page and return it as parsed JSON, e.g.:

const puppeteer = require('puppeteer');

async function run() {

    const browser = await puppeteer.launch({
        headless: false // change to true in prod!
    });

    const page = await browser.newPage();

    await page.goto('https://raw.githubusercontent.com/GoogleChrome/puppeteer/master/package.json');

    // I would leave this here as a fail-safe
    await page.content();

    // The browser shows the raw JSON response as text in the page body,
    // so read that text and parse it. Declared with const to avoid an implicit global.
    const innerText = await page.evaluate(() => {
        return JSON.parse(document.querySelector('body').innerText);
    });

    console.log('innerText now contains the JSON');
    console.log(innerText);

    // I will leave this as an exercise for you to
    //   write out to FS...

    await browser.close();
}

// Log any failure instead of leaving an unhandled promise rejection
run().catch(console.error);
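
As a possible alternative to reading the body text, page.goto resolves to the response for the main resource, and that response has a json() method that parses the body directly. The sketch below is not part of the answer above: the helper names fetchJson and saveJson are hypothetical, and it assumes the page passed in is the same one used to log in, so the session cookie is sent with the request. The second helper covers the "write out to FS" step.

```javascript
const fs = require('fs');

// Hypothetical helper: navigate `page` to `url` and parse the response body as JSON.
// `page` is assumed to be a Puppeteer Page that has already logged in.
async function fetchJson(page, url) {
    const response = await page.goto(url);

    // page.goto can resolve to null (for example, for about:blank),
    // and ok() is false for statuses outside 200-299.
    if (!response || !response.ok()) {
        const status = response ? response.status() : 'no response';
        throw new Error(`Request for ${url} failed: ${status}`);
    }

    // Parses the raw response body, independent of how the browser renders it.
    return response.json();
}

// Hypothetical helper: write the parsed data to disk as pretty-printed JSON.
function saveJson(data, filePath) {
    fs.writeFileSync(filePath, JSON.stringify(data, null, 2));
}
```

Inside run(), after the login step, this could be used as: const data = await fetchJson(page, url); saveJson(data, 'output.json'); where 'output.json' is only a placeholder path.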
Rippo answered Oct 14 '22 00:10