Issue with scraping a list to get href using Goutte and PHP

I am trying to scrape the list item shown below using Goutte with PHP. I want both the link text and the link URL. I can get the text fine, but I cannot get the href value with the following code. Any help would be appreciated.

$crawler->filter('#most-popular > div > ol > li > a')->each(function ($node) {
    var_dump($node->getAttribute('href'));
});


<li class="first-child ol1">
  <a href="http://www.bbc.co.uk/news/uk-england-south-yorkshire-31895703" class="story">
    <span class="livestats-icon livestats-1">1: </span>MP claims £17 poppy wreath expenses</a>
</li>
Oliver Bayes-Shelton asked Nov 27 '22 16:11


2 Answers

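Inside each(), every $node is a Crawler instance rather than a DOMElement, so getAttribute() is not available on it. The Crawler class provides the same functionality as attr().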

$crawler->filter('#most-popular > div.panel.open > ol > li.first-child.ol1 > a')->each(function ($node) {
    var_dump($node->attr('href'));
});
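To return the text and the link together, as the question asks, the callback passed to each() can return a value per node. The following is a minimal sketch, not a definitive implementation. It assumes symfony/dom-crawler and symfony/css-selector (the components Goutte builds on) are installed via Composer, and it parses an inline copy of the question's markup instead of a live page. The wrapper elements around the li are hypothetical, added only so the original selector matches.

```php
<?php
// Assumption: dependencies were installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Markup from the question; the outer div/ol wrappers are hypothetical.
$html = <<<'HTML'
<div id="most-popular"><div class="panel open"><ol>
<li class="first-child ol1">
  <a href="http://www.bbc.co.uk/news/uk-england-south-yorkshire-31895703" class="story">
    <span class="livestats-icon livestats-1">1: </span>MP claims £17 poppy wreath expenses</a>
</li>
</ol></div></div>
HTML;

$crawler = new Crawler($html);

// each() passes a Crawler wrapping one node and collects the return values.
$links = $crawler->filter('#most-popular > div > ol > li > a')->each(function (Crawler $node) {
    return [
        'text' => trim($node->text()),
        // attr() is the Crawler method for reading an attribute.
        'href' => $node->attr('href'),
        // getNode(0) returns the underlying DOMElement, where getAttribute() exists.
        'href_via_dom' => $node->getNode(0)->getAttribute('href'),
    ];
});

var_dump($links);
```

Both 'href' and 'href_via_dom' hold the same URL; the second form is only there to show where getAttribute() actually lives.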
Burak answered Dec 23 '22 00:12


The code below also fixes the issue. extract() returns an array of attribute values, so the href is its first element.

$crawler->filter('#most-popular > div.panel.open > ol > li.first-child.ol1 > a')->each(function ($node) {
    $href = $node->extract(array('href'));
    var_dump($href[0]);
});
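extract() can also be called once on the whole filtered list, and the special attribute name '_text' asks for the node text alongside real attributes. The sketch below assumes symfony/dom-crawler and symfony/css-selector are installed via Composer. The markup is a shortened stand-in, the second list item and its example.com URL are hypothetical, and the selector '#most-popular a.story' is a simplification chosen for this example.

```php
<?php
// Assumption: dependencies were installed with Composer in this directory.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// First item is from the question; the second item is hypothetical.
$html = <<<'HTML'
<div id="most-popular"><div class="panel open"><ol>
<li class="first-child ol1">
  <a href="http://www.bbc.co.uk/news/uk-england-south-yorkshire-31895703" class="story">MP claims £17 poppy wreath expenses</a>
</li>
<li class="ol2">
  <a href="http://example.com/second-story" class="story">Second story</a>
</li>
</ol></div></div>
HTML;

$crawler = new Crawler($html);

// With more than one attribute requested, each row is an array: [text, href].
$rows = $crawler->filter('#most-popular a.story')->extract(['_text', 'href']);

foreach ($rows as [$text, $href]) {
    echo trim($text), ' => ', $href, PHP_EOL;
}
```

This avoids the each() callback entirely when only plain values are needed.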
Oliver Bayes-Shelton answered Dec 23 '22 00:12