Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

how to use dom php parser

I'm new to DOM parsing in PHP:
I have a HTML file that I'm trying to parse. It has a bunch of DIVs like this:

<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div id="interestingbox"> 
......

I'm trying to get the contents of the many div boxes using php. How can I use the DOM parser to do this?

Thanks!

like image 583
chris Avatar asked Jun 06 '09 23:06

chris


People also ask

What is DOM parser in PHP?

Dom Parser is very good at dealing with XML as well as HTML. Dom parser travels based on tree based and before access the data, it will load the data into dom object and it will update the data to the web browser. Below Example shows how to get access to the HTML data in web browser.

Can we use DOM in PHP?

The built-in DOM parser makes it possible to process XML documents in PHP.

What is Simple_html_dom PHP?

A simple PHP HTML DOM parser written in PHP5+, supports invalid HTML, and provides a very easy way to find, extract and modify the HTML elements of the dom. jquery like syntax allow sophisticated finding methods for locating the elements you care about.


2 Answers

First i have to tell you that you can't use the same id on two different divs; there are classes for that point. Every element should have an unique id.

Code to get the contents of the div with id="interestingbox"

$html = '
<html>
<head></head>
<body>
<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div id="interestingbox2"><a href="#">a link</a></div>
</body>
</html>';


$dom_document = new DOMDocument();

$dom_document->loadHTML($html);

//use DOMXpath to navigate the html with the DOM
$dom_xpath = new DOMXpath($dom_document);

// if you want to get the div with id=interestingbox
$elements = $dom_xpath->query("*/div[@id='interestingbox']");

if (!is_null($elements)) {

  foreach ($elements as $element) {
    echo "\n[". $element->nodeName. "]";

    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue. "\n";
    }

  }
}

//OUTPUT
[div]  {
        Content1
        Content2
}

Example with classes:

$html = '
<html>
<head></head>
<body>
<div class="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div class="interestingbox"><a href="#">a link</a></div>
</body>
</html>';

//the same as before.. just change the xpath

[...]

$elements = $dom_xpath->query("*/div[@class='interestingbox']");

[...]

//OUTPUT
[div]  {
        Content1
        Content2
}

[div]  {
a link
}

Refer to the DOMXPath page for more details.

like image 68
apelliciari Avatar answered Oct 05 '22 22:10

apelliciari


I got this to work using simplehtmldom as a start:

$html = file_get_html('example.com');
foreach ($html->find('div[id=interestingbox]') as $result)
{
    echo $result->innertext;
}
like image 34
chris Avatar answered Oct 05 '22 23:10

chris