Scrolling to the bottom of a div in puppeteer not working

So I'm trying to scrape all the concerts in the boxed off area in the picture below:

https://i.stack.imgur.com/7QIMM.jpg

The problem is that the list only shows the first 10 results until you scroll to the bottom of that specific div, at which point it dynamically loads more until there are no more results. I tried following the answer to the question linked below, but couldn't get it to scroll down far enough to load all the 'concerts':

How to scroll inside a div with Puppeteer?

Here's my basic code:

const browser = await puppeteerExtra.launch({ args: ['--no-sandbox'] });

async function functionName() {
    const page = await browser.newPage();
    await preparePageForTests(page);
    page.once('load', () => console.log('Page loaded!'));
    await page.goto(`https://www.google.com/search?q=concerts+near+poughkeepsie&client=safari&rls=en&uact=5&ibp=htl;events&rciv=evn&sa=X&fpstate=tldetail`);   

    const resultList = await page.waitForSelector(".odIJnf");
    const scrollableSection = await page.waitForSelector("#Q5Vznb");    // I think this is the div that contains all the concert items.
    const results = await page.$$(".odIJnf");  // this needs to be iterable to be used in the for loop

    // this is where I'd like to scroll down the div all the way to the bottom

    for (let i = 0; i < results.length; i++) {
      const result = await (await results[i].getProperty('innerText')).jsonValue();
      console.log(result)
    }
}
asked May 10 '21 15:05 by nickcoding2


2 Answers

Try this to scroll down on the list of concerts. You can keep looping until the number of results stops increasing, or you find the concert you are looking for:

await page.evaluate(()=>{
  document.querySelector("#Q5Vznb").scrollIntoView(false);
});
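
To make the "keep looping until the number of results stops increasing" part concrete, here is a minimal sketch. It is not tested against the live page. The function name scrollUntilStable and the maxStalls / delayMs knobs are my own illustrative names, not Puppeteer options, and it sets scrollTop on the container instead of calling scrollIntoView, which is a substitution on my part:

```javascript
// Sketch: scroll a container until the number of matching items stops growing.
// `page` is a Puppeteer Page; the selectors are whatever matches your page.
// `maxStalls` and `delayMs` are illustrative tuning values (assumptions).
async function scrollUntilStable(page, containerSelector, itemSelector,
                                 { maxStalls = 3, delayMs = 500 } = {}) {
  let previousCount = 0;
  let stalls = 0;

  while (stalls < maxStalls) {
    // Scroll the container itself to its bottom to trigger lazy loading.
    await page.$eval(containerSelector, (el) => {
      el.scrollTop = el.scrollHeight;
    });

    // Give the page a moment to fetch and render the next batch.
    await new Promise((resolve) => setTimeout(resolve, delayMs));

    // Count the items currently in the DOM.
    const count = await page.$$eval(itemSelector, (els) => els.length);
    if (count > previousCount) {
      previousCount = count;
      stalls = 0; // progress was made, so reset the stall counter
    } else {
      stalls += 1; // nothing new appeared this round
    }
  }
  return previousCount;
}
```

With the selectors from the question, the call would look like await scrollUntilStable(page, "#Q5Vznb", ".odIJnf"), assuming those selectors still match the page.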
answered Nov 15 '22 00:11 by Benny



As you mention in your question, when you run page.$$, you get back an array of ElementHandles. From Puppeteer's documentation:

ElementHandle represents an in-page DOM element. ElementHandles can be created with the page.$ method.

This means you can iterate over them, but you also have to run evaluate() or $eval() over each element to access the DOM element.
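
As a rough illustration of the two routes, the sketch below reads innerText either in a single page.$$eval call or by looping over the handles and calling evaluate() on each. The helper names readTexts and readTextsFromHandles are made up for this example:

```javascript
// Route 1: one round trip. The callback runs inside the page and receives
// the matched DOM nodes as a plain array.
async function readTexts(page, selector) {
  return page.$$eval(selector, (nodes) => nodes.map((n) => n.innerText.trim()));
}

// Route 2: get ElementHandles first, then evaluate against each one.
// Slower (one round trip per element) but handy if you need the handles anyway.
async function readTextsFromHandles(page, selector) {
  const handles = await page.$$(selector);
  const texts = [];
  for (const handle of handles) {
    texts.push(await handle.evaluate((node) => node.innerText.trim()));
  }
  return texts;
}
```

Both should return the same array of strings for the same selector; the first is usually the simpler choice when you only need the text.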

I see from your snippet that you are trying to access the parent div that handles the list's scroll event. The problem is that this page seems to use auto-generated classes and ids, so selectors built on them are brittle and may stop matching when the markup is regenerated. It would be better to reach the ul, li, and div elements directly by tag and structure.
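
One way to avoid depending on a generated id is to start from a stable element and walk up until you hit an ancestor that actually scrolls. The function below is only a sketch of that idea: findScrollableAncestor is a hypothetical helper meant to run in the page context, and I'm assuming the scroll container has overflow-y set to auto or scroll, which I haven't verified for this page:

```javascript
// Runs in the browser context: walks up from `node` and returns the nearest
// ancestor that is vertically scrollable, or null if there is none.
// Assumption: the container uses overflow-y: auto or scroll.
function findScrollableAncestor(node) {
  let current = node.parentElement;
  while (current) {
    const overflowY = getComputedStyle(current).overflowY;
    const canScroll = overflowY === 'auto' || overflowY === 'scroll';
    // Only count it if there is more content than fits in the visible box.
    if (canScroll && current.scrollHeight > current.clientHeight) {
      return current;
    }
    current = current.parentElement;
  }
  return null;
}
```

From Puppeteer you could pass it to evaluateHandle on an ElementHandle, for example (await ul.evaluateHandle(findScrollableAncestor)).asElement(), and check for null before scrolling.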

I've put together this snippet, which keeps scrolling until it has collected at least ITEMS concerts from the site:

const puppeteer = require('puppeteer')

/**
 * Constants
 */
const ITEMS = Number(process.env.ITEMS) || 50
const URL   = process.env.URL     || "https://www.google.com/search?q=concerts+near+poughkeepsie&client=safari&rls=en&uact=5&ibp=htl;events&rciv=evn&sa=X&fpstate=tldetail"

/**
 * Main
 */
main()
  .then( ()    => console.log("Done"))
  .catch((err) => console.error(err))

/**
 * Functions
 */
async function main() {
  const browser = await puppeteer.launch({ args: ["--no-sandbox"] })
  const page = await browser.newPage()
  
  await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36')
  await page.goto(URL)
 
  const results = await getResults(page)
  console.log(results)
  
  await browser.close()
}

async function getResults(page) {
  await page.waitForSelector("ul")
  const ul  = (await page.$$("ul"))[0]
  const div = (await ul.$x("../../.."))[0]
  const results = []
  
  const recurse = async () => {
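    // Exit clause: stop once enough items are collected. Note there is no
    // other stop condition, so a list shorter than ITEMS recurses forever.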
    if (ITEMS <= results.length) {
      return
    }

    const $lis = await page.$$("li")
    // Slicing this way will avoid duplicating the result. It also has
    // the benefit of not having to handle the refresh interval until
    // new concerts are available.
    const lis = $lis.slice(results.length)
    for (let li of lis) {
      const result = await li.evaluate(node => node.innerText)
      results.push(result)
    }
    // Move the scroll of the parent-parent-parent div to the bottom
    await div.evaluate(node => node.scrollTo(0, node.scrollHeight))
    await recurse()
  }
  // Start the recursive function
  await recurse()
 
  return results
}

By studying the page structure, we can see that the ul for the list is nested three levels below the div that handles the scroll. We also know that there are only two uls on the page, and the first is the one we want. These lines select that ul and climb up to the scrolling div:

  const ul  = (await page.$$("ul"))[0]
  const div = (await ul.$x("../../.."))[0]

When called on an ElementHandle, $x evaluates the XPath expression with that element as its context node (per the Puppeteer docs), so "../../.." climbs three levels up from the ul to the div that we need. We then run a recursive function until we have collected the items that we want.
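
If the list might hold fewer entries than you ask for, a loop with an explicit stop condition is safer than open-ended recursion. Here is a hedged sketch of that variant. collectUpTo, scroller and timeoutMs are names I made up; it uses page.waitForFunction to wait for new items after each scroll, and it treats a failed wait (typically a timeout) as "no more results", which is an assumption rather than something the page guarantees:

```javascript
// Sketch: collect up to `limit` item texts, scrolling `scroller` (an
// ElementHandle for the scrolling div) until the limit is hit or no new
// items show up within `timeoutMs`.
async function collectUpTo(page, scroller, itemSelector, limit, timeoutMs = 5000) {
  const results = [];

  while (results.length < limit) {
    // Only read the items we have not seen yet, to avoid duplicates.
    const handles = await page.$$(itemSelector);
    for (const handle of handles.slice(results.length)) {
      results.push(await handle.evaluate((node) => node.innerText));
    }
    if (results.length >= limit) break;

    // Scroll the container to the bottom to request the next batch.
    await scroller.evaluate((node) => node.scrollTo(0, node.scrollHeight));

    try {
      // Wait until the page holds more items than we have collected so far.
      await page.waitForFunction(
        (selector, seen) => document.querySelectorAll(selector).length > seen,
        { timeout: timeoutMs },
        itemSelector,
        results.length
      );
    } catch (err) {
      // Assumption: any failure here (usually a timeout) means the list is exhausted.
      break;
    }
  }
  return results.slice(0, limit);
}
```

In the snippet above it would be called as collectUpTo(page, div, "li", ITEMS) in place of the recursive function.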
answered Nov 14 '22 22:11 by guzmonne
