I know the common methods, such as evaluate, for capturing elements in Puppeteer, but I am curious why I cannot get the href attribute with a plain JavaScript-like approach such as:
    const page = await browser.newPage();
    await page.goto('https://www.example.com');
    let links = await page.$$('a');
    for (let i = 0; i < links.length; i++) {
      console.log(links[i].getAttribute('href'));
      console.log(links[i].href);
    }
Element text can be read in Puppeteer through the textContent property: pass the property name to the ElementHandle's getProperty method, then call jsonValue() on the returned handle to get a plain string.
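As a minimal sketch of the textContent approach described above, the helper below looks up one element and reads its text through getProperty. The name getText is hypothetical (it is not part of Puppeteer), and the sketch assumes `page` is an already opened Puppeteer page.

```javascript
// Sketch only: `getText` is a hypothetical helper, not a Puppeteer API.
// Assumes `page` is a Puppeteer Page that has already navigated somewhere.
async function getText(page, selector) {
  // page.$ resolves to an ElementHandle, or null when nothing matches.
  const handle = await page.$(selector);
  if (handle === null) return null;

  // getProperty resolves to a JSHandle that wraps the property value.
  const property = await handle.getProperty('textContent');

  // jsonValue() copies the value out of the browser into Node.
  return property.jsonValue();
}
```

It could be called as `const heading = await getText(page, 'h1');`. Returning null for a missing element is a choice made for this sketch, not something Puppeteer requires.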
Puppeteer provides the page.url() method to get the URL of the current page.
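A short sketch of how page.url() might be used after navigation. The helper name visitAndReport is hypothetical, and `page` is assumed to be an open Puppeteer page.

```javascript
// Sketch only: `visitAndReport` is a hypothetical helper, not a Puppeteer API.
async function visitAndReport(page, target) {
  await page.goto(target);

  // page.url() is synchronous and returns a string, so no await is needed.
  return { requested: target, current: page.url() };
}
```

Comparing the two fields is one way to notice that a navigation was redirected, since page.url() reports where the page actually is rather than what was requested.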
await page.$$('a') returns an array of ElementHandles. These are objects with their own Puppeteer-specific API; they do not have the usual DOM API for HTML elements or DOM nodes. So you need to either retrieve the attributes/properties in the browser context via page.evaluate() or use the rather complicated ElementHandle API. This is an example with both ways:
    'use strict';

    const puppeteer = require('puppeteer');

    (async function main() {
      try {
        const browser = await puppeteer.launch();
        const [page] = await browser.pages();

        await page.goto('https://example.org/');

        // way 1
        const hrefs1 = await page.evaluate(
          () => Array.from(
            document.querySelectorAll('a[href]'),
            a => a.getAttribute('href')
          )
        );

        // way 2
        const elementHandles = await page.$$('a');
        const propertyJsHandles = await Promise.all(
          elementHandles.map(handle => handle.getProperty('href'))
        );
        const hrefs2 = await Promise.all(
          propertyJsHandles.map(handle => handle.jsonValue())
        );

        console.log(hrefs1, hrefs2);

        await browser.close();
      } catch (err) {
        console.error(err);
      }
    })();
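The loop in the question fails because an ElementHandle lives in Node: it has no getAttribute method and no href property of its own. One way to keep the loop shape is to call evaluate on each handle, so the callback runs in the page against the real DOM element. The sketch below assumes `page` is an open Puppeteer page; collectHrefs is a hypothetical helper name.

```javascript
// Sketch only: `collectHrefs` is a hypothetical helper, not a Puppeteer API.
async function collectHrefs(page) {
  const links = await page.$$('a');
  const results = [];

  for (const link of links) {
    // Each callback runs in the browser, where `el` is a real <a> element.
    const attribute = await link.evaluate(el => el.getAttribute('href'));
    const property = await link.evaluate(el => el.href);
    results.push({ attribute, property });
  }

  return results;
}
```

The two values can differ: getAttribute('href') returns the attribute exactly as written in the markup (possibly a relative path), while the href property returns the resolved absolute URL.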
I don't know why it's such a pain, but this is what I found when I ran into the same problem a while ago:
    async function getHrefs(page, selector) {
      return await page.$$eval(selector, anchors => [].map.call(anchors, a => a.href));
    }
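The same $$eval pattern can return more than one value per element. As a rough sketch, the variant below collects each link's text together with its href. The name getLinks is hypothetical, `page` is assumed to be an open Puppeteer page, and the sketch relies on $$eval handing the matched elements to the callback as an array, which is how the Puppeteer documentation describes it.

```javascript
// Sketch only: `getLinks` is a hypothetical helper, not a Puppeteer API.
async function getLinks(page, selector) {
  // The callback runs in the browser; only its serializable result
  // (an array of plain objects) is sent back to Node.
  return page.$$eval(selector, anchors =>
    anchors.map(a => ({
      text: a.textContent.trim(),
      href: a.href,
    }))
  );
}
```

A call such as `const links = await getLinks(page, 'a[href]');` would then yield objects with `text` and `href` fields, which avoids juggling ElementHandles in Node.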