Simple HTML DOM getting all attributes from a tag

Tags: html, dom, php

Sort of a two-part question, but maybe one answers the other. I'm trying to get a piece of information out of an <a> tag in markup like this:

<div id="foo">
<div class="bar"><a data1="xxxx" data2="xxxx" href="http://foo.bar">Inner text</a></div>
<div class="bar2"><a data3="xxxx" data4="xxxx" href="http://foo.bar">more text</a></div>
</div>

Here is what I'm using now.

$articles = array();
$html=file_get_html('http://foo.bar');
foreach($html->find('div[class=bar] a') as $a){
    $articles[] = array($a->href,$a->innertext);
}

This works perfectly to grab the href and the inner text from the first div class. I tried adding a $a->data1 to the foreach but that didn't work.

How do I grab those data attributes at the same time as the href and innertext?

Also, is there a good way to get both classes with one statement? I assume I could build the find off of the id and grab all the div information.

Thanks

TheEditor asked Jan 22 '13 10:01


3 Answers

To grab all those attributes, you should first inspect the parsed element, like this:

foreach($html->find('div[class=bar] a') as $a){
  var_dump($a->attr);
}

...and check whether those attributes exist. Names like data1 are not valid HTML attributes (custom attributes need the data- prefix), so the parser may discard them.

If they exist, you can read them like this:

foreach($html->find('div[class=bar] a') as $a){
  $article = array($a->href, $a->innertext);
  if (isset($a->attr['data1'])) {
    $article['data1'] = $a->attr['data1'];
  }
  if (isset($a->attr['data2'])) {
    $article['data2'] = $a->attr['data2'];
  }
  //...
  $articles[] = $article;
}

To get both classes you can use a multiple selector, separated by a comma:

foreach($html->find('div[class=bar] a, div[class=bar2] a') as $a){
...
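
For comparison, here is a minimal sketch of the same idea (collect every attribute of each link, from both div classes in one pass) using PHP's built-in DOM extension instead of Simple HTML DOM, so it runs without the library. The sample markup is modeled on the question, and the 'text' key and the XPath expression are illustrative choices, not part of the original answer.

```php
<?php
// Sample markup modeled on the question; data1..data4 are the asker's attribute names.
$markup = <<<HTML
<div id="foo">
  <div class="bar"><a data1="xxxx" data2="xxxx" href="http://foo.bar">Inner text</a></div>
  <div class="bar2"><a data3="xxxx" data4="xxxx" href="http://foo.bar">more text</a></div>
</div>
HTML;

$doc = new DOMDocument();
// Collect libxml parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$articles = array();

// The XPath union (|) plays the role of the comma in the CSS-style selector.
foreach ($xpath->query('//div[@class="bar"]/a | //div[@class="bar2"]/a') as $a) {
    $article = array('text' => $a->textContent);
    // Copy every attribute on the element, whatever its name.
    foreach ($a->attributes as $attr) {
        $article[$attr->nodeName] = $attr->nodeValue;
    }
    $articles[] = $article;
}

print_r($articles);
```

Looping over the attribute list avoids writing one isset() check per attribute name, which helps when the names differ between links as they do here.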
ermannob answered Oct 16 '22 20:10


I know this question is old, but the OP asked how they could get all the attributes in one statement. I just did this for a project I'm working on.

You can get all the attributes for an element with the getAllAttributes() method, which returns them as an array. The same values are also available through the element's array property called attr.

In the example below I am grabbing all links but you can use this with whatever you want. NOTE: This also works with data- attributes. So if there is an attribute called data-url it will be accessible with $e->attr['data-url'] after you run the getAllAttributes method.

In your case, the attributes you're looking for will be $e->attr['data1'] and $e->attr['data2']. Hope this helps someone, if not the OP.

Get all Attributes

$html = file_get_html('somefile.html');
foreach ($html->find('a') as $e) {   //used a tag here, but use whatever you want
    $e->getAllAttributes();

    //testing that it worked
    print_r($e->attr);
}
Tech Savant answered Oct 16 '22 20:10


Check this code

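<?php
$html = file_get_html('somefile.html');
$filters = array();
foreach ($html->find('a') as $e) {
    // getAttribute() reads a single named attribute from the element;
    // collect the values instead of overwriting one variable each pass
    $filters[] = $e->getAttribute('data-filter-string');
}
?>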
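
A related sketch, again assuming PHP's built-in DOM extension rather than Simple HTML DOM: DOMElement also offers getAttribute(), plus hasAttribute() to skip elements that lack the attribute. The markup and the data-filter-string values below are made up for illustration.

```php
<?php
// Hypothetical sample markup; the data-filter-string values are invented.
$markup = '<a href="/a" data-filter-string="red">A</a>'
        . '<a href="/b">B</a>'
        . '<a href="/c" data-filter-string="blue">C</a>';

$doc = new DOMDocument();
// Collect libxml parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$filters = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    // DOMElement::getAttribute() returns an empty string when the attribute
    // is absent, so check hasAttribute() first to skip links without it.
    if ($a->hasAttribute('data-filter-string')) {
        $filters[$a->getAttribute('href')] = $a->getAttribute('data-filter-string');
    }
}

print_r($filters);
```

Keying the result by href is only one option; a plain list works just as well if the link target is not needed.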
Stepan Chopko answered Oct 16 '22 19:10