I am using "@nesk/puphpeteer": "^2.0.0" (link to GitHub repo) and want to get the text and the href attribute from a link.
I tried the following:
<?php
require_once '../vendor/autoload.php';
use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;
$debug = true;
$puppeteer = new Puppeteer([
'read_timeout' => 100,
'debug' => $debug,
]);
$browser = $puppeteer->launch([
'headless' => !$debug,
'ignoreHTTPSErrors' => true,
]);
$page = $browser->newPage();
$page->goto('http://example.python-scraping.com/');
//get text and link
$links = $page->querySelectorXPath('//*[@id="results"]/table/tbody/tr/td/div/a', JsFunction::createWithParameters(['node'])
->body('return node.textContent;'));
// iterate over links and print each link and its text
// get single text
$singleText = $page->querySelectorXPath('//*[@id="pagination"]/a', JsFunction::createWithParameters(['node'])
->body('return node.textContent;'));
$browser->close();
When I run the above script, I get the nodes from the page, but I cannot access their attributes or their text.
Any suggestions on how to do this?
I appreciate your replies!
We can get attribute values of an element using Puppeteer. The attributes are added within the HTML tag. They are used to describe the properties of an element. An attribute and its value are defined in a key-value pair.
In Selenium, we can get the value of the href attribute from a link. First, identify the anchor element with any of the locators (CSS selector, id, class, and so on). Next, call the getAttribute method and pass href as its parameter.
We can read an attribute with getAttribute(attribute_name), which fetches the value of the named attribute. In HTML, whatever appears on the left side of '=' is the attribute name, and whatever appears on the right side is the attribute value.
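To make the name/value split concrete without a browser, here is a minimal sketch that uses PHP's built-in DOM extension instead of Puppeteer. The helper name extractLinks and the sample HTML are made up for illustration; only DOMDocument, DOMXPath and getAttribute come from PHP itself.

```php
<?php
// Hypothetical helper: returns the text and the raw href value of every
// element matched by the given XPath expression.
function extractLinks(string $html, string $xpathExpression): array
{
    $document = new DOMDocument();
    // Suppress the warnings libxml emits for imperfect real-world HTML.
    @$document->loadHTML($html);
    $xpath = new DOMXPath($document);

    $links = [];
    foreach ($xpath->query($xpathExpression) as $anchor) {
        $links[] = [
            'text' => trim($anchor->textContent),
            // 'href' is the attribute name; the returned string is its value.
            'href' => $anchor->getAttribute('href'),
        ];
    }
    return $links;
}

// Sample markup, invented for this example.
$html = '<div id="results"><a href="/view/1">Afghanistan</a><a href="/view/2">Albania</a></div>';
foreach (extractLinks($html, '//*[@id="results"]//a') as $link) {
    printf("%s -> %s\n", $link['text'], $link['href']);
}
```

Note that getAttribute('href') returns the value exactly as written in the markup, whereas the node.href property read inside a browser returns the resolved absolute URL.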
querySelectorXPath returns an array of ElementHandle objects. One more thing: querySelectorXPath does not support a callback function.
First, get the ElementHandle of every matching node:
$links = $page->querySelectorXPath('//*[@id="results"]/table/tbody/tr/td/div/a');
Then loop over the links to access the attributes or the text of each node:
foreach ($links as $link) {
    // Text of the node
    $text = $link->evaluate(JsFunction::createWithParameters(['node'])
        ->body('return node.innerText;'));
    // href of the node, stored in $href so the loop variable $link is not overwritten
    $href = $link->evaluate(JsFunction::createWithParameters(['node'])
        ->body('return node.href;'));
    echo $text . ' => ' . $href . PHP_EOL;
}
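If one round trip per link is too slow, a possible alternative is to collect the text and the href of all links in a single call. This is only a sketch. It relies on querySelectorAllEval, PuPHPeteer's name for Puppeteer's $$eval, which takes a CSS selector rather than an XPath expression. The selector below is an assumed translation of the XPath from the question and has not been checked against the live page.

```php
<?php
require_once '../vendor/autoload.php';

use Nesk\Puphpeteer\Puppeteer;
use Nesk\Rialto\Data\JsFunction;

$puppeteer = new Puppeteer();
$browser = $puppeteer->launch();
$page = $browser->newPage();
$page->goto('http://example.python-scraping.com/');

// Assumed CSS equivalent of //*[@id="results"]/table/tbody/tr/td/div/a
$selector = '#results > table > tbody > tr > td > div > a';

// The JavaScript function receives all matched nodes at once and returns
// one plain object per link, so only serializable data crosses back to PHP.
$links = $page->querySelectorAllEval($selector, JsFunction::createWithParameters(['nodes'])
    ->body('return nodes.map(node => ({ text: node.innerText, href: node.href }));'));

// Print whatever structure came back, without assuming its exact PHP type.
print_r($links);

$browser->close();
```

The loop over ElementHandle objects is still the better fit when you need to interact with each element afterwards, for example to click it.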