
Using PHP DOM document, to select HTML element by its class and get its text

I am trying to get the text from the div with class 'review-text' using PHP's DOMDocument, with the following HTML (same structure as the real page) and the following code.

However, this doesn't seem to work.

  1. HTML

    $html = '
        <div class="page-wrapper">
            <section class="page single-review" itemtype="http://schema.org/Review" itemscope="" itemprop="review">
                <article class="review clearfix">
                    <div class="review-content">
                        <div class="review-text" itemprop="reviewBody">
                        Outstanding ... 
                        </div>
                    </div>
                </article>
            </section>
        </div>
    ';
    
  2. PHP Code

        $classname = 'review-text';
        $dom = new DOMDocument;
        $dom->loadHTML($html);
        $xpath     = new DOMXPath($dom);
        $results = $xpath->query("//*[@class and contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
    
        if ($results->length > 0) {
            echo $review = $results->item(0)->nodeValue;
        }
    

The XPath syntax for selecting an element by class is taken from this blog.

I have tried many examples from Stack Overflow and online tutorials, but none seem to work. Am I missing something?

Abhishek Madhani asked Aug 12 '13 08:08




1 Answer

The following XPath query does what you want. Just replace the argument provided to $xpath->query with the following:

//div[@class="review-text"]
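One caveat worth keeping in mind: an exact comparison such as @class="review-text" only matches elements whose class attribute is exactly that string, whereas the contains/concat expression from the question matches the class as one token among several. A minimal sketch of the difference, assuming hypothetical markup (not from the question) in which a second div carries an extra class:

```php
<?php
// Hypothetical markup for illustration: the second div has two classes.
$html = '<div><div class="review-text">A</div><div class="review-text featured">B</div></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact match: the whole class attribute must equal "review-text".
$exact = $xpath->query('//div[@class="review-text"]');

// Token match: "review-text" may be one of several space-separated classes.
$token = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' review-text ')]");

echo $exact->length, "\n"; // 1
echo $token->length, "\n"; // 2
```

For the HTML in the question, both forms select the same single div, since its class attribute contains only review-text.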

Edit: For easier development, you can test your own XPath queries online at http://www.xpathtester.com/test.

Edit2: Tested this code; it worked perfectly.

<?php

$html = '
    <div class="page-wrapper">
        <section class="page single-review" itemtype="http://schema.org/Review" itemscope="" itemprop="review">
            <article class="review clearfix">
                <div class="review-content">
                    <div class="review-text" itemprop="reviewBody">
                    Outstanding ... 
                    </div>
                </div>
            </article>
        </section>
    </div>
';

$classname = 'review-text';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='" . $classname . "']");

if ($results->length > 0) {
    echo $review = $results->item(0)->nodeValue;
}

?>
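Two practical details may be worth handling around code like this. Depending on the libxml version, loadHTML can emit warnings for HTML5 elements such as section and article, and nodeValue returns the text with its surrounding whitespace. A minimal sketch, assuming a hypothetical helper name firstTextByClass (not part of the original answer) and a class name that contains no quote characters:

```php
<?php
// Hypothetical helper: returns the trimmed text of the first element
// carrying the given class, or null if nothing matches.
function firstTextByClass(string $html, string $classname): ?string
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Assumes $classname contains no quotes; it is interpolated into the query.
    $results = $xpath->query(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]"
    );

    if ($results === false || $results->length === 0) {
        return null;
    }

    // trim() drops the leading and trailing whitespace from the markup.
    return trim($results->item(0)->textContent);
}

$sample = '<section><div class="review-text"> Outstanding ... </div></section>';
echo firstTextByClass($sample, 'review-text'), "\n";
```

Returning null when nothing matches lets the caller tell "no such element" apart from an element with empty text.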
Frank Houweling answered Sep 22 '22 06:09