Puphpeteer - Get text and href-attribute from link

Tags:

php

I am using "@nesk/puphpeteer": "^2.0.0" (link to the GitHub repo) and want to get the text and the href attribute from a link.

I tried the following:

<?php

require_once '../vendor/autoload.php';

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

$debug = true;

$puppeteer = new Puppeteer([
    'read_timeout' => 100,
    'debug' => $debug,
]);
$browser = $puppeteer->launch([
    'headless' => !$debug,
    'ignoreHTTPSErrors' => true,
]);

$page = $browser->newPage();
$page->goto('http://example.python-scraping.com/');

//get text and link
$links = $page->querySelectorXPath('//*[@id="results"]/table/tbody/tr/td/div/a', JsFunction::createWithParameters(['node'])
    ->body('return node.textContent;'));

// iterate over links and print each link and its text

// get single text
$singleText = $page->querySelectorXPath('//*[@id="pagination"]/a', JsFunction::createWithParameters(['node'])
    ->body('return node.textContent;'));


$browser->close();

When I run the script above, I get the nodes from the page, but I cannot access their attributes or their text.

Any suggestions on how to do this?

I appreciate your replies!

Carol.Kar asked Aug 09 '21 19:08




1 Answer

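querySelectorXPath returns an array of ElementHandle objects, not the values you want. Also, querySelectorXPath does not accept a callback function: it wraps Puppeteer's page.$x(), which takes only the XPath expression, so the JsFunction you pass as a second argument is never applied.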
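If a callback is more convenient, Puphpeteer also exposes Puppeteer's $eval and $$eval as querySelectorEval and querySelectorAllEval, which do take a JsFunction but use CSS selectors instead of XPath. The sketch below assumes that the CSS selectors are straight translations of the XPath expressions from the question (not checked against the live page), reuses the question's autoload path, and returns each link as a plain [text, href] array so the result crosses back into PHP as ordinary nested arrays.

```php
<?php

// Assumption: same Composer autoload path as in the question.
require_once '../vendor/autoload.php';

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

$browser = (new Puppeteer(['read_timeout' => 100]))->launch();
$pairs = [];
$singleText = null;

try {
    $page = $browser->newPage();
    $page->goto('http://example.python-scraping.com/');

    // querySelectorAllEval() maps to page.$$eval(): the callback runs in the browser
    // and receives an array of every element matching the CSS selector.
    // Assumed CSS equivalent of //*[@id="results"]/table/tbody/tr/td/div/a
    $pairs = $page->querySelectorAllEval(
        '#results > table > tbody > tr > td > div > a',
        JsFunction::createWithParameters(['links'])
            ->body('return links.map(a => [a.textContent.trim(), a.href]);')
    );

    // querySelectorEval() maps to page.$eval(): the callback receives the first match
    // only (Puppeteer throws an error if nothing matches).
    // Assumed CSS equivalent of //*[@id="pagination"]/a
    $singleText = $page->querySelectorEval(
        '#pagination > a',
        JsFunction::createWithParameters(['a'])->body('return a.textContent.trim();')
    );
} finally {
    // Always shut the browser down, even if navigation or evaluation fails.
    $browser->close();
}

foreach ($pairs as [$text, $href]) {
    echo $text . ' => ' . $href . PHP_EOL;
}
echo 'Pagination link text: ' . $singleText . PHP_EOL;
```

This needs one browser round trip per selector instead of two per link, at the cost of switching from XPath to CSS.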

First, get an ElementHandle for every matching node:

$links = $page->querySelectorXPath('//*[@id="results"]/table/tbody/tr/td/div/a');

Then loop over the handles and evaluate a function on each node to read its text or attributes:

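foreach ($links as $link) {
    // Text of the anchor element
    $text = $link->evaluate(JsFunction::createWithParameters(['node'])
        ->body('return node.innerText;'));

    // href of the anchor element (node.href is the resolved, absolute URL;
    // use node.getAttribute("href") for the raw attribute value)
    $href = $link->evaluate(JsFunction::createWithParameters(['node'])
        ->body('return node.href;'));

    echo $text . ' => ' . $href . PHP_EOL;
}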
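Put together with the setup from the question, a complete script might look like the sketch below. It keeps the XPath approach from this answer, defines the two page functions once and reuses them, and also covers the question's "single text" case: querySelectorXPath still returns an array there, so the first handle is taken if one exists. The autoload path and URL are taken from the question; the $pairs and $singleText names are only illustrative.

```php
<?php

// Assumption: same Composer autoload path as in the question.
require_once '../vendor/autoload.php';

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

$puppeteer = new Puppeteer(['read_timeout' => 100]);
$browser = $puppeteer->launch(['ignoreHTTPSErrors' => true]);

// Page functions evaluated against each ElementHandle, which is passed in as `node`.
$getText = JsFunction::createWithParameters(['node'])->body('return node.textContent.trim();');
$getHref = JsFunction::createWithParameters(['node'])->body('return node.href;');

$pairs = [];
$singleText = null;

try {
    $page = $browser->newPage();
    $page->goto('http://example.python-scraping.com/');

    // querySelectorXPath() returns an array of ElementHandle objects.
    $links = $page->querySelectorXPath('//*[@id="results"]/table/tbody/tr/td/div/a');

    foreach ($links as $link) {
        $pairs[] = [$link->evaluate($getText), $link->evaluate($getHref)];
    }

    // Single element: still an array, so use the first handle if there is one.
    $pagination = $page->querySelectorXPath('//*[@id="pagination"]/a');
    if (count($pagination) > 0) {
        $singleText = $pagination[0]->evaluate($getText);
    }
} finally {
    // Close the browser even if navigation or evaluation throws.
    $browser->close();
}

foreach ($pairs as [$text, $href]) {
    echo $text . ' => ' . $href . PHP_EOL;
}
echo 'Pagination link text: ' . var_export($singleText, true) . PHP_EOL;
```

Each evaluate() call is a separate round trip to the browser, which is fine for a page of links but adds up on large result sets.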
Zeeshan Anjum answered Oct 12 '22 04:10