
PHP: how to get the correct closing tag of an HTML element

Assuming that I have an HTML page as follows:

<!-- This is the opening tag -->
<div class="content_text">
  <div>Title</div>
  <div>Author Name</div>
  <div>Some complicated HTML elements correctly validated</div>
  <b>Some more text</b>
  <img ... />
  <div> more and more text </div>
</div><!-- This is the correct closing tag -->

How do I get the content between the opening of the div with class="content_text" and its correct closing tag?

I tried regular expressions, but I couldn't find any way, easy or hard, to match the correct closing tag.

I tried XPath, but I still couldn't get the content. Instead I got the text inside the outer div.

Shehabic asked Dec 21 '22 07:12


2 Answers

You can use the PHP Simple HTML DOM Parser, which parses HTML in much the same way that DOMDocument parses XML.

Note: PHP's built-in DOMDocument can also parse HTML directly, via its loadHTML() method.
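
As a rough illustration of the DOMDocument route, the sketch below selects the div by its class with DOMXPath instead of a regular expression. The sample markup is adapted from the question, and `findFirstByClass` is a hypothetical helper name, not part of PHP. It assumes the class name is a simple token with no quotes in it.

```php
<?php
// Sample markup modeled on the question; the nested <div>s are what defeat regexes.
$html = <<<HTML
<div class="content_text">
  <div>Title</div>
  <div>Author Name</div>
  <b>Some more text</b>
  <div> more and more text </div>
</div>
<div class="footer">Unrelated</div>
HTML;

/**
 * Hypothetical helper: returns the first element carrying the given class, or null.
 */
function findFirstByClass(DOMDocument $doc, string $class): ?DOMElement
{
    $xpath = new DOMXPath($doc);
    // Match a whole class token, so "content_text" does not match "content_text_small".
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $node = $nodes->item(0);
    return $node instanceof DOMElement ? $node : null;
}

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$div = findFirstByClass($doc, 'content_text');

// saveHTML($node) serializes just that element, including its own tags.
echo $div !== null ? $doc->saveHTML($div) : "Nothing found";
```

Because the parser builds a tree, the element's closing tag is resolved by the parser itself, so there is nothing to match by hand.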

Shoe answered Dec 22 '22 21:12


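    $scrape_address = "http://www.al-madina.com/node/444862";

    // Fetch the page; CURLOPT_RETURNTRANSFER makes curl_exec() return the body as a string
    $ch = curl_init($scrape_address);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, false);
    curl_setopt($ch, CURLOPT_ENCODING, ""); // accept any encoding cURL supports
    $data = curl_exec($ch);

    if ($data === false) {
      exit("Request failed: " . curl_error($ch));
    }

    // I couldn't get an element by attribute, so I just replaced the class with an id
    $data = str_replace('class="content_text"', 'id="my_unique_id"', $data);

    $domd = new DOMDocument();
    libxml_use_internal_errors(true);  // keep warnings about real-world markup quiet
    $domd->loadHTML($data);
    libxml_clear_errors();
    libxml_use_internal_errors(false);
    $div = $domd->getElementById("my_unique_id");

    if ($div) {
      // Copy the div and all of its descendants into a fresh document and print it
      $dom2 = new DOMDocument();
      $dom2->appendChild($dom2->importNode($div, true));
      echo $dom2->saveHTML();
    } else {
      echo "Nothing found";
    }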
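
The snippet above prints the div together with its own opening and closing tags. If only the content between those tags is wanted, one possible approach is to serialize each child node in turn. This is a minimal sketch with made-up sample markup; `innerHtml` is a hypothetical helper name, not a built-in.

```php
<?php
/**
 * Hypothetical helper: concatenates the serialized children of a node,
 * i.e. everything between its opening and closing tags.
 */
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) serializes one child, including nested elements.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet
$doc->loadHTML('<div id="my_unique_id"><div>Title</div><b>Some more text</b></div>');
libxml_clear_errors();
libxml_use_internal_errors(false);

$div = $doc->getElementById('my_unique_id');
echo $div !== null ? innerHtml($div) : "Nothing found";
```

Serializing children this way also avoids the second DOMDocument and the importNode() copy.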

Shehabic answered Dec 22 '22 19:12