
How do I return a value from page.evaluate() in Puppeteer?

I am trying to get a value out of the page.evaluate() callback in a YouTube scraper that I've built using Puppeteer, but I am unable to return the result from page.evaluate(). How do I achieve this? Here's the code:

let boxes2 = []
        const getData = async() => {
            return await page.evaluate(async () => { // scroll till there's no more room to scroll or you get at least 250 boxes  
                console.log(await new Promise(resolve => {

                    var scrolledHeight = 0  
                    var distance = 100 
                    var timer = setInterval(() => {
                        boxes = document.querySelectorAll("div.style-scope.ytd-item-section-renderer#contents > ytd-video-renderer > div.style-scope.ytd-video-renderer#dismissable")
                        console.log(`${boxes.length} boxes`)
                        var scrollHeight = document.documentElement.scrollHeight
                        window.scrollBy(0, distance)
                        scrolledHeight += distance
                        if(scrolledHeight >= scrollHeight || boxes.length >= 50){
                            clearInterval(timer)
                            resolve(Array.from(boxes))
                        }
                    }, 500)
                }))
            })
        }
        boxes2 = await getData()
        console.log(boxes2)

The console.log wrapping the promise prints the resulting array in the browser's console. I just cannot get that array into boxes2 further down, where I call getData(). I feel like I'm missing something small, but I can't figure out what it is. I'd appreciate any tips.

roitmi asked Aug 20 '19 09:08


3 Answers

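The issue is that the callback you pass to page.evaluate never returns the data. The promise's result is handed to console.log inside the browser, so the callback itself resolves to undefined: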

const getData = () => {
    return page.evaluate(async () => {
        // Returning the promise's result sends the data from the browser back to Node.js
        return await new Promise(resolve => {
            // scraping
        })
    })
}
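The effect of the missing return can be reproduced without a browser. The sketch below is only an illustration: fakeEvaluate is a hypothetical stand-in that calls a function and awaits its result, not Puppeteer's API, and the other names are made up as well. It shows that an async callback which only logs the awaited value resolves to undefined, while one that returns it hands the value back to the caller.

```javascript
// Hypothetical stand-in for page.evaluate (NOT Puppeteer's API):
// it just calls the function and awaits whatever it returns.
const fakeEvaluate = async (fn) => await fn();

// A promise that resolves to an array after a short delay.
const scrape = () =>
  new Promise(resolve => setTimeout(() => resolve([1, 2, 3]), 10));

// Mirrors the question: the awaited value is logged but never returned.
const withoutReturn = () =>
  fakeEvaluate(async () => {
    console.log(await scrape());
  });

// Mirrors the fix: the awaited value is returned to the caller.
const withReturn = () =>
  fakeEvaluate(async () => {
    return await scrape();
  });

const demo = async () => {
  const missing = await withoutReturn(); // undefined, nothing was returned
  const present = await withReturn();    // the array from scrape()
  return { missing, present };
};

demo().then(result => console.log(result));
```

In the question's code the same thing happens one level deeper: the array exists in the browser, but the callback never passes it on.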

Here is a full minimal working example for Puppeteer that prints the array [ 1, 2, 3 ] after three seconds:

const puppeteer = require('puppeteer');

puppeteer.launch().then(async browser => {
  const page = await browser.newPage();

  // Declared with let to avoid creating an implicit global
  let boxes2 = [];

  const getData = async () => {
    // The value resolved inside the page is returned to Node.js
    return await page.evaluate(async () => {
      return await new Promise(resolve => {
        setTimeout(() => {
          resolve([1, 2, 3]);
        }, 3000);
      });
    });
  };

  boxes2 = await getData();
  console.log(boxes2);

  await browser.close();
});
Vaviloff answered Sep 21 '22 06:09


To pass parameters into page.evaluate and still get a result back, pass them as extra arguments after the function. Here is what you need:

const results = await page.evaluate(new Function('name', "return new Promise(resolve => {resolve('done')});"), name);
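To see what such a function does once it receives its argument, it can be called directly in Node.js without a browser. This is a rough sketch, not the answer's exact code: the resolved value is changed to include name so the argument's effect is visible, and greet, greetArrow, and runBoth are hypothetical names. In Puppeteer, the arguments listed after the function in page.evaluate are forwarded to it in the same way.

```javascript
// Same shape as the function in the answer, but the resolved value uses the
// `name` parameter so the effect of the argument is visible.
const greet = new Function(
  'name',
  "return new Promise(resolve => { resolve('done: ' + name); });"
);

// An equivalent arrow function, which is usually easier to read.
const greetArrow = name =>
  new Promise(resolve => resolve('done: ' + name));

// Calling the functions directly stands in for page.evaluate(fn, name).
const runBoth = async (name) => [await greet(name), await greetArrow(name)];

runBoth('roitmi').then(results => console.log(results));
```

Both forms resolve to the same string, so a plain arrow function is likely enough unless the function body has to be built from a string at runtime.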
Rick answered Sep 22 '22 06:09


let videoURLs = await page.evaluate(async () => {
    // Scroll until there is no more room to scroll or at least 50 videos are found
    return await new Promise(resolve => {
        let boxes = []
        let scrolledHeight = 0
        const distance = 100
        const timer = setInterval(() => {
            // Map each element to its href: DOM nodes cannot be serialized back to Node.js, but strings can
            boxes = Array.from(document.querySelectorAll("div.style-scope.ytd-item-section-renderer#contents > ytd-video-renderer > div.style-scope.ytd-video-renderer#dismissable a#video-title")).map(vid => vid.href)
            const scrollHeight = document.documentElement.scrollHeight
            window.scrollBy(0, distance)
            scrolledHeight += distance
            if (scrolledHeight >= scrollHeight || boxes.length >= 50) {
                clearInterval(timer)
                resolve(boxes)
            }
        }, 500)
    })
})
console.log(videoURLs)
roitmi answered Sep 18 '22 06:09