Get href attribute in Puppeteer (Node.js)

I know the common methods, such as evaluate, for capturing elements in Puppeteer, but I am curious why I cannot get the href attribute with a plain JavaScript-style approach such as:

    const page = await browser.newPage();
    await page.goto('https://www.example.com');
    let links = await page.$$('a');
    for (let i = 0; i < links.length; i++) {
      console.log(links[i].getAttribute('href'));
      console.log(links[i].href);
    }
asked Mar 28 '19 00:03 by Googlebot


2 Answers

await page.$$('a') returns an array of ElementHandles. These are objects with their own Puppeteer-specific API; they do not expose the usual DOM API of HTML elements or DOM nodes. So you need to either retrieve the attributes/properties in the browser context via page.evaluate() or use the rather complicated ElementHandle API. Here is an example of both ways:

    'use strict';

    const puppeteer = require('puppeteer');

    (async function main() {
      try {
        const browser = await puppeteer.launch();
        const [page] = await browser.pages();

        await page.goto('https://example.org/');

        // way 1
        const hrefs1 = await page.evaluate(
          () => Array.from(
            document.querySelectorAll('a[href]'),
            a => a.getAttribute('href')
          )
        );

        // way 2
        const elementHandles = await page.$$('a');
        const propertyJsHandles = await Promise.all(
          elementHandles.map(handle => handle.getProperty('href'))
        );
        const hrefs2 = await Promise.all(
          propertyJsHandles.map(handle => handle.jsonValue())
        );

        console.log(hrefs1, hrefs2);

        await browser.close();
      } catch (err) {
        console.error(err);
      }
    })();
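For completeness, the loop from the question can be kept almost unchanged if each handle runs its callback in the browser. What follows is only a sketch, assuming a Puppeteer version in which ElementHandle has an evaluate() method; readHrefs is a hypothetical helper name, not part of Puppeteer. It also shows that the two reads can differ: getAttribute('href') returns the value as written in the markup, while the href property is resolved by the browser to an absolute URL.

```javascript
// Hypothetical helper (not a Puppeteer API): reads the raw href attribute and
// the resolved href property from every element matched by `selector`.
// Assumes `page` is a Puppeteer Page whose handles support evaluate().
async function readHrefs(page, selector = 'a') {
  const handles = await page.$$(selector);
  const results = [];
  for (const handle of handles) {
    // Each callback runs inside the browser, where `a` is a real DOM element,
    // so the usual DOM API is available there.
    const attribute = await handle.evaluate(a => a.getAttribute('href'));
    const property = await handle.evaluate(a => a.href);
    results.push({ attribute, property });
  }
  return results;
}
```

With a real page, this would be called as `const hrefs = await readHrefs(page, 'a');` once `page.goto()` has resolved.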
answered Oct 13 '22 00:10 by vsemozhebuty


I don't know why it's such a pain, but this is what I found when I ran into the same problem a while ago.

    async function getHrefs(page, selector) {
      return await page.$$eval(selector, anchors => [].map.call(anchors, a => a.href));
    }
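If other attributes are needed too, the same idea can probably be generalized. The sketch below assumes that page.$$eval() forwards extra arguments to the callback and hands it a plain array of the matched elements; getAttributes is a hypothetical helper name, not something Puppeteer provides.

```javascript
// Hypothetical helper (not a Puppeteer API): collects one attribute from
// every element matching `selector`.
async function getAttributes(page, selector, name) {
  // `name` is passed as an extra argument because the callback is run in the
  // browser context and cannot close over variables from the Node.js side.
  return page.$$eval(
    selector,
    (elements, attr) => elements.map(el => el.getAttribute(attr)),
    name
  );
}
```

For example, `await getAttributes(page, 'a[href]', 'href')` should give the raw href values, while `a => a.href` as used above gives the URLs resolved by the browser.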
answered Oct 13 '22 01:10 by Phix