 

Node.js Puppeteer metadata

I am new to Puppeteer, and I am trying to extract metadata from a website using Node.js and Puppeteer. I just can't seem to get the syntax right. The code below extracts the title tag perfectly, using two different methods, as well as text from a paragraph tag. How would I extract the content text of the meta tag with the name "description", for example?

<meta name="description" content="Stack Overflow is the largest, etc">

I would be seriously grateful for any suggestions! I can't seem to find any examples of this anywhere (5 hours of searching and code hacking later). My sample code:

const puppeteer = require('puppeteer');

async function main() {
  const browser = await puppeteer.launch({headless: false});
  try {
    const page = await browser.newPage();
    await page.goto('https://stackoverflow.com/', {waitUntil: 'networkidle2'});

    // Two ways to read the <title> text, plus the text of the first <p>.
    const pageTitle1 = await page.evaluate(() => document.querySelector('title').textContent);
    const pageTitle2 = await page.title();
    const innerText = await page.evaluate(() => document.querySelector('p').innerText);
    console.log(pageTitle1);
    console.log(pageTitle2);
    console.log(innerText);
  } finally {
    // Close the browser so the Node process can exit.
    await browser.close();
  }
}

main().catch(console.error);
Lauren Kay asked Feb 21 '18 07:02


2 Answers

You need a solid understanding of CSS selectors; MDN's CSS Selectors guide is a good place to start.

I highly recommend testing your selectors directly in the browser's developer console on the page you are automating; this will save you hours of repeatedly starting and stopping your script. Try this:

document.querySelectorAll("head > meta[name='description']")[0].content;

Once the selector works there, copy it into a Puppeteer call. I prefer this notation:

const description = await page.$eval("head > meta[name='description']", element => element.content);
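One caveat with `page.$eval`: it rejects with an error when no element matches the selector, which will happen on pages that have no description tag. A minimal sketch of a variant that returns `null` instead is below; the helper name `getMetaContent` is hypothetical, and it assumes a Puppeteer version in which element handles support `evaluate()` and `dispose()`.

```javascript
// Hypothetical helper: read the content attribute of the first element
// matching `selector`, returning null instead of throwing when none matches.
// `page` is assumed to be a Puppeteer Page (anything with a compatible `$`).
async function getMetaContent(page, selector) {
  // page.$ resolves to null when nothing matches, unlike page.$eval,
  // which rejects with an error in that case.
  const handle = await page.$(selector);
  if (handle === null) {
    return null;
  }
  try {
    // The callback runs in the browser and receives the matched element.
    return await handle.evaluate((el) => el.getAttribute('content'));
  } finally {
    // Release the browser-side reference once we have the value.
    await handle.dispose();
  }
}
```

Inside the question's `main()`, after `page.goto(...)`, you would call it as `const description = await getMetaContent(page, "head > meta[name='description']");` and then check for `null` before using the value.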

If you have any other questions or problems, just comment.

Raul Rueda answered Nov 04 '22 11:11


For anyone struggling to get the Open Graph (OG) tags in Puppeteer, here is the solution:

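const ogDescription = await page.evaluate(() => {
    // Open Graph tags use the "property" attribute rather than "name".
    const meta = document.head.querySelector('meta[property="og:description"]');
    // Return null instead of throwing when the page has no such tag.
    return meta ? meta.getAttribute('content') : null;
});
console.log(ogDescription);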
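Both answers read one tag at a time. If you want every `name`- or `property`-keyed tag at once (for example `description` and `og:description` together), a sketch along these lines should work. The helper names `readMetaElements`, `toMetaMap`, and `getMetaTags` are hypothetical, and keeping only the first tag when a key repeats is an arbitrary choice.

```javascript
// Runs in the browser via page.$$eval: turn each <meta> element into a plain,
// serializable object. It must not reference anything outside its own body.
function readMetaElements(elements) {
  return elements.map((el) => ({
    // Standard tags use "name"; Open Graph tags use "property".
    key: el.getAttribute('name') || el.getAttribute('property'),
    content: el.getAttribute('content'),
  }));
}

// Runs in Node: build a { key: content } object, skipping tags without a key
// or content (e.g. <meta charset>) and keeping the first value for repeated keys.
function toMetaMap(entries) {
  const seen = new Map();
  for (const { key, content } of entries) {
    if (key && content !== null && !seen.has(key)) {
      seen.set(key, content);
    }
  }
  return Object.fromEntries(seen);
}

// `page` is assumed to be a Puppeteer Page.
async function getMetaTags(page) {
  const entries = await page.$$eval('head > meta', readMetaElements);
  return toMetaMap(entries);
}
```

For example, `const meta = await getMetaTags(page);` followed by `console.log(meta.description, meta['og:description']);`. Splitting the browser-side and Node-side steps keeps the function passed to `$$eval` free of closures, since Puppeteer serializes it and runs it in the page.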
starforce answered Nov 04 '22 13:11