
I want to scrape a table using Puppeteer. How can I get all rows, iterate through them, and then get the td elements for each row?

I have Puppeteer set up, and I was able to get all of the rows using:

let rows = await page.$$eval('#myTable tr', row => row);

Now, for each row, I want to get the td elements and then read the innerText of each one.

Basically I want to do this:

var tds = myRow.querySelectorAll("td");

Here myRow is a table row, and I want to do the equivalent with Puppeteer.

user838426 asked Mar 12 '18 13:03



1 Answer

One way to achieve this is to use page.evaluate to first collect all the td elements into an array and then return the innerText of each one:

const puppeteer = require('puppeteer');

const html = `
<html>
    <body>
      <table>
      <tr><td>One</td><td>Two</td></tr>
      <tr><td>Three</td><td>Four</td></tr>
      </table>
    </body>
</html>`;

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(`data:text/html,${html}`);

  const data = await page.evaluate(() => {
    const tds = Array.from(document.querySelectorAll('table tr td'))
    return tds.map(td => td.innerText)
  });

  //You will now have an array of strings
  //[ 'One', 'Two', 'Three', 'Four' ]
  console.log(data);
  //One
  console.log(data[0]);
  await browser.close();
})();

You could also use page.$$eval for the same result:

const data = await page.$$eval('table tr td', tds => tds.map((td) => {
  return td.innerText;
}));

//[ 'One', 'Two', 'Three', 'Four' ]
console.log(data);
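
Both snippets return one flat array, so the row boundaries are lost. If you need the cells grouped per row, as the question asks, a minimal sketch could look like the following. The names extractRows, scrapeTable and scrapeTableByHandles are hypothetical helpers (not part of Puppeteer), the page argument is assumed to be an already opened Puppeteer Page, and the trim() call is an addition that is not in the code above.

```javascript
// Runs inside the browser: turns an array of <tr> elements into an array of
// rows, each row being an array of cell strings. Puppeteer serializes this
// function, so it must not reference variables from the Node.js scope.
const extractRows = rows =>
  rows.map(row =>
    Array.from(row.querySelectorAll('td'), td => td.innerText.trim())
  );

// Variant 1: one round trip. `selector` should match the table's <tr> elements.
async function scrapeTable(page, selector) {
  return page.$$eval(selector, extractRows);
}

// Variant 2: iterate over row handles from Node.js, one call per row.
// Slower, but convenient if you want to act on individual rows.
async function scrapeTableByHandles(page, selector) {
  const rowHandles = await page.$$(selector);
  const result = [];
  for (const row of rowHandles) {
    // ElementHandle.$$eval runs querySelectorAll relative to this row.
    const cells = await row.$$eval('td', tds =>
      tds.map(td => td.innerText.trim())
    );
    result.push(cells);
  }
  return result;
}
```

With the example HTML from this answer, await scrapeTable(page, 'table tr') should give [ [ 'One', 'Two' ], [ 'Three', 'Four' ] ]. For the table in the question, the selector would be '#myTable tr'. Rows that contain only th cells come back as empty arrays, so filter those out if the table has a header row.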
Rippo answered Sep 19 '22 15:09
