According to https://github.com/GoogleChrome/puppeteer/issues/628, I should be able to get all links from <a href="xyz"> elements with this single line:
const hrefs = await page.$$eval('a', a => a.href);
But when I try a simple:
console.log(hrefs)
I only get:
http://example.de/index.html
... as output, which means it could only find one link? But the page definitely has 12 links in the source code / DOM. Why does it fail to find them all?
Minimal example:
'use strict';

const puppeteer = require('puppeteer');

crawlPage();

function crawlPage() {
  (async () => {
    const args = [
      "--disable-setuid-sandbox",
      "--no-sandbox",
      "--blink-settings=imagesEnabled=false",
    ];
    const options = {
      args,
      headless: true,
      ignoreHTTPSErrors: true,
    };
    const browser = await puppeteer.launch(options);
    const page = await browser.newPage();
    await page.goto("http://example.de", {
      waitUntil: 'networkidle2',
      timeout: 30000
    });
    const hrefs = await page.$eval('a', a => a.href);
    console.log(hrefs);
    await page.close();
    await browser.close();
  })().catch((error) => {
    console.error(error);
  });
}
For comparison, Selenium can fetch all href links on a page with its find_elements method. All links in an HTML document are enclosed within anchor (<a>) tags, so locating every element with the tag name "a" (for example via find_elements_by_tag_name() in the Python bindings) returns them all.
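A minimal sketch of that approach with the Node selenium-webdriver bindings (assuming a local ChromeDriver is installed; http://example.de is the same placeholder URL as above):

const { Builder, By } = require('selenium-webdriver');

(async () => {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('http://example.de');
    // Locate every element with the tag name "a" ...
    const anchors = await driver.findElements(By.tagName('a'));
    // ... then read each one's href attribute.
    const hrefs = await Promise.all(anchors.map(a => a.getAttribute('href')));
    console.log(hrefs);
  } finally {
    await driver.quit();
  }
})();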
In your example code you're using page.$eval, not page.$$eval. Since the former uses document.querySelector instead of document.querySelectorAll, the behaviour you describe is the expected one.
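To illustrate the difference, here is a minimal sketch you could run in the browser console (it assumes nothing beyond the standard DOM API):

// querySelector returns only the FIRST matching element (or null):
const first = document.querySelector('a');
console.log(first.href); // a single URL string

// querySelectorAll returns EVERY matching element as a NodeList:
const all = document.querySelectorAll('a');
console.log([...all].map(a => a.href)); // an array of all URLs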
Also, you should change your pageFunction in the $$eval arguments:
const hrefs = await page.$$eval('a', as => as.map(a => a.href));
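This mapping is needed because $$eval passes the whole array of matched elements to the pageFunction, so a single-element callback like a => a.href would read .href off an array and yield undefined. As a complete, minimal sketch of the corrected crawl (still using the placeholder URL http://example.de; page.evaluate with document.querySelectorAll is shown as an equivalent alternative):

'use strict';

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('http://example.de', { waitUntil: 'networkidle2' });

  // $$eval runs the callback against the array of ALL matched elements.
  const hrefs = await page.$$eval('a', as => as.map(a => a.href));
  console.log(hrefs); // all 12 links, not just the first

  // Equivalent alternative via page.evaluate:
  const hrefs2 = await page.evaluate(
    () => Array.from(document.querySelectorAll('a'), a => a.href)
  );
  console.log(hrefs2);

  await browser.close();
})();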