
Use PHP Simple HTML DOM Parser to find table cell and get contents of next sibling

I am trying to use PHP Simple HTML DOM Parser to grab the HTML of an external file. The file contains a table, and the goal is to find a table cell with specific contents and then get the next sibling cell's data. This data needs to be placed into a PHP variable.

Based on the research and info found in articles like "How to parse and process HTML/XML with PHP?", "Grabbing the href attribute of an A element", "Scraping Data: PHP Simple HTML DOM Parser", and of course the PHP Simple HTML DOM Parser Manual, I've been able to produce some results, but I'm afraid I may be on the wrong track.

The table row looks like this:

<tr>
<td>fluff</td>  
<td>irrelevant</td> 
<td>etc</td>   
<td><a href="one">Hello world</a></td>                        
<td>123.456</td> 
<td>fluff</td>          
<td>irrelevant</td>   
<td>etc</td>
</tr>

What I'm trying to accomplish is to find the table cell that contains "Hello world", and then get the number from within the next td cell. The following code finds that table cell and echoes its contents, but my attempts to use it as a landmark in order to get the next cell's data have failed...

$html = file_get_html("http://site.com/stuff.htm");
$e = $html->find('td',0)->innertext = 'Hello world';
echo $e;

So ultimately, in the example above the value of 123.456 needs to somehow get into a PHP variable.

Thanks for your help!

stotrami asked Apr 02 '13 18:04


2 Answers

It can be done using the DOMXPath class. You won't need an external library for this.

Here is an example:

<?php

$html = <<<EOF
<tr>
<td>fluff</td>  
<td>irrelevant</td> 
<td>etc</td>   
<td><a href="one">Hello world</a></td>                        
<td>123.456</td> 
<td>fluff</td>          
<td>irrelevant</td>   
<td>etc</td>
</tr>
EOF;


// create empty document 
$document = new DOMDocument();

// load html
$document->loadHTML($html);

// create xpath selector
$selector = new DOMXPath($document);

// selects the parent node of <a> nodes
// whose content is 'Hello world'
$results = $selector->query('//td/a[text()="Hello world"]/..');

// output the results 
foreach($results as $node) {
    echo $node->nodeValue . PHP_EOL;
}
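
The query above returns the cell that contains the link, so it prints "Hello world". To reach the neighbouring cell the question asks about, one option is the XPath following-sibling axis. The snippet below is a minimal sketch along those lines, not part of the original answer: the shortened sample row, the wrapping <table> element, and the variable name $value are assumptions made for illustration.

```php
<?php

// Shortened version of the sample row from the question,
// wrapped in a <table> so the markup is well-formed.
$html = <<<EOF
<table><tr>
<td>etc</td>
<td><a href="one">Hello world</a></td>
<td>123.456</td>
<td>fluff</td>
</tr></table>
EOF;

$document = new DOMDocument();
$document->loadHTML($html);

$selector = new DOMXPath($document);

// Find the <td> that holds an <a> whose text is "Hello world",
// then step to the first <td> that follows it in the same row.
$results = $selector->query(
    '//td[a[text()="Hello world"]]/following-sibling::td[1]'
);

// $value stays null when no matching cell is found.
$value = null;
if ($results !== false && $results->length > 0) {
    $value = trim($results->item(0)->nodeValue);
}

echo $value . PHP_EOL;
```

With the sample row, $value should end up holding the string 123.456. The [1] predicate limits the result to the cell immediately after the match, so later cells in the row are ignored.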

hek2mgl answered Oct 15 '22 09:10


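Using Simple HTML DOM Parser:

// assumption: the library file simple_html_dom.php sits next to this script
require_once 'simple_html_dom.php';

$str = "<table><tr>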
<td>fluff</td>  
<td>irrelevant</td> 
<td>etc</td>   
<td><a href=\"one\">Hello world</a></td>                        
<td>123.456</td> 
<td>fluff</td>          
<td>irrelevant</td>   
<td>etc</td>
</tr></table>";

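$html = str_get_html($str);

// collect every cell of the first table
$tds = $html->find('table', 0)->find('td');
$num = null;
foreach ($tds as $td) {
    // trim() guards against stray whitespace around the cell text
    if (trim($td->plaintext) == 'Hello world') {
        // the cell right after the match holds the number
        $next_td = $td->next_sibling();
        if ($next_td !== null) {
            $num = trim($next_td->plaintext);
        }
        break;
    }
}

echo $num;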

Adidi answered Oct 15 '22 10:10