I am trying to use PHP Simple HTML DOM Parser to grab the HTML of an external file. The file contains a table, and the goal is to find a table cell with specific contents and then get the data from the next sibling cell. This data needs to be placed into a PHP variable.
Based on research and articles like "How to parse and process HTML/XML with PHP?", "Grabbing the href attribute of an A element", "Scraping Data: PHP Simple HTML DOM Parser" and of course the PHP Simple HTML DOM Parser Manual, I've been able to produce some results, but I'm afraid I may be on the wrong track.
The table row looks like this:
<tr>
<td>fluff</td>
<td>irrelevant</td>
<td>etc</td>
<td><a href="one">Hello world</a></td>
<td>123.456</td>
<td>fluff</td>
<td>irrelevant</td>
<td>etc</td>
</tr>
What I'm trying to accomplish is to find the table cell that contains "Hello world", and then get the number from within the next td cell. The following code finds that table cell and echoes its contents, but my attempts to use it as a landmark in order to get the next cell's data have failed...
$html = file_get_html("http://site.com/stuff.htm");
$e = $html->find('td',0)->innertext = 'Hello world';
echo $e;
So ultimately, in the example above the value of 123.456 needs to somehow get into a PHP variable.
Thanks for your help!
It can be done using the DOMXPath class. You won't need an external library for this.
Here is an example:
<?php
$html = <<<EOF
<tr>
<td>fluff</td>
<td>irrelevant</td>
<td>etc</td>
<td><a href="one">Hello world</a></td>
<td>123.456</td>
<td>fluff</td>
<td>irrelevant</td>
<td>etc</td>
</tr>
EOF;
// create empty document
$document = new DOMDocument();
// load html
$document->loadHTML($html);
// create xpath selector
$selector = new DOMXPath($document);
// select the parent node of each <a> element
// whose text is 'Hello world'
$results = $selector->query('//td/a[text()="Hello world"]/..');
// output the results
foreach ($results as $node) {
    echo $node->nodeValue . PHP_EOL;
}
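Note that the query above returns the cell that contains the link itself, not the cell after it. To reach the value in the next cell (123.456 in the question), the XPath following-sibling axis can be used. The following is a minimal sketch, assuming the same row structure as in the question; the shortened sample row, the surrounding `<table>` wrapper and the `$value` variable are additions for illustration only.

```php
<?php
// Sample markup based on the question, shortened and wrapped in <table>.
$html = <<<EOF
<table><tr>
<td>fluff</td>
<td><a href="one">Hello world</a></td>
<td>123.456</td>
<td>etc</td>
</tr></table>
EOF;

$document = new DOMDocument();
$document->loadHTML($html);
$selector = new DOMXPath($document);

// Find the <td> that has an <a> child with the text "Hello world",
// then step to the first <td> that follows it in the same row.
$results = $selector->query('//td[a[text()="Hello world"]]/following-sibling::td[1]');

// $value stays null when no matching cell is found.
$value = $results->length > 0 ? trim($results->item(0)->nodeValue) : null;

echo $value . PHP_EOL;
```

With the sample row above, `$value` should hold the string 123.456. For a remote page, `loadHTMLFile()` could be used instead of `loadHTML()`, provided the server configuration allows opening URLs.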
Using Simple HTML DOM Parser:
$str = "<table><tr>
<td>fluff</td>
<td>irrelevant</td>
<td>etc</td>
<td><a href=\"one\">Hello world</a></td>
<td>123.456</td>
<td>fluff</td>
<td>irrelevant</td>
<td>etc</td>
</tr></table>";
$html = str_get_html($str);
$tds = $html->find('table',0)->find('td');
$num = null;
foreach ($tds as $td) {
    if ($td->plaintext == 'Hello world') {
        $next_td = $td->next_sibling();
        $num = $next_td->plaintext;
        break;
    }
}
echo($num);