According to https://github.com/GoogleChrome/puppeteer/issues/628, I should be able to get all links from <a href="xyz"> elements with this single line:
const hrefs = await page.$$eval('a', a => a.href);
But when I try a simple:
console.log(hrefs)
I only get:
http://example.de/index.html
... as output, which means it could only find one link? But the page definitely has 12 links in the source code / DOM. Why does it fail to find them all?
Minimal example:
'use strict';

const puppeteer = require('puppeteer');

crawlPage();

function crawlPage() {
  (async () => {
    const args = [
      "--disable-setuid-sandbox",
      "--no-sandbox",
      "--blink-settings=imagesEnabled=false",
    ];
    const options = {
      args,
      headless: true,
      ignoreHTTPSErrors: true,
    };
    const browser = await puppeteer.launch(options);
    const page = await browser.newPage();
    await page.goto("http://example.de", {
      waitUntil: 'networkidle2',
      timeout: 30000
    });
    const hrefs = await page.$eval('a', a => a.href);
    console.log(hrefs);
    await page.close();
    await browser.close();
  })().catch((error) => {
    console.error(error);
  });
}
For comparison, Selenium can fetch all href links on a page with its find_elements method. All links in an HTML document are enclosed within anchor (<a>) tags, so locating every element with the tag name "a" (for example via find_elements_by_tag_name() in the Python bindings) returns them all.
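A minimal sketch of that approach with the Node selenium-webdriver bindings (assuming a local ChromeDriver is installed; http://example.de is the same placeholder URL as above):

const { Builder, By } = require('selenium-webdriver');

(async () => {
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('http://example.de');
    // Locate every element with the tag name "a" ...
    const anchors = await driver.findElements(By.tagName('a'));
    // ... then read each one's href attribute.
    const hrefs = await Promise.all(anchors.map(a => a.getAttribute('href')));
    console.log(hrefs);
  } finally {
    await driver.quit();
  }
})();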
In your example code you're using page.$eval, not page.$$eval. Since the former uses document.querySelector instead of document.querySelectorAll, the behaviour you describe is the expected one.
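To illustrate the difference, here is a minimal sketch you could run in the browser console (it assumes nothing beyond the standard DOM API):

// querySelector returns only the FIRST matching element (or null):
const first = document.querySelector('a');
console.log(first.href); // a single URL string

// querySelectorAll returns EVERY matching element as a NodeList:
const all = document.querySelectorAll('a');
console.log([...all].map(a => a.href)); // an array of all URLs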
Also, you should change your pageFunction in the $$eval arguments:
const hrefs = await page.$$eval('a', as => as.map(a => a.href));
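This mapping is needed because $$eval passes the whole array of matched elements to the pageFunction, so a single-element callback like a => a.href would read .href off an array and yield undefined. As a complete, minimal sketch of the corrected crawl (still using the placeholder URL http://example.de; page.evaluate with document.querySelectorAll is shown as an equivalent alternative):

'use strict';

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto('http://example.de', { waitUntil: 'networkidle2' });

  // $$eval runs the callback against the array of ALL matched elements.
  const hrefs = await page.$$eval('a', as => as.map(a => a.href));
  console.log(hrefs); // all 12 links, not just the first

  // Equivalent alternative via page.evaluate:
  const hrefs2 = await page.evaluate(
    () => Array.from(document.querySelectorAll('a'), a => a.href)
  );
  console.log(hrefs2);

  await browser.close();
})();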