I am using the PHP Simple HTML DOM Parser, but it does not seem to have any functionality for searching by text. I need to search for a string and find the id of its parent element, essentially the reverse of normal usage.
Does anyone know how?
$html = file_get_html('http://www.google.com/');
$eles = $html->find('*');
foreach($eles as $e) {
if(strpos($e->innertext, 'theString') !== false) {
echo $e->id;
}
}
http://simplehtmldom.sourceforge.net/manual.htm
Just imagine that every tag has a "plaintext" attribute and use standard attribute selectors.
So, HTML:
<div id="div1">
<span>London is the capital</span> of Great Britain
</div>
<div id="div2">
<span>Washington is the capital</span> of the USA
</div>
can be thought of as:
<div id="div1" plaintext="London is the capital of Great Britain">
<span plaintext="London is the capital ">London is the capital</span> of Great Britain
</div>
<div id="div2" plaintext="Washington is the capital of the USA">
<span plaintext="Washington is the capital ">Washington is the capital</span> of the USA
</div>
And the PHP to solve your task is just:
<?php
$t = '
<div id="div1">
<span>London is the capital</span> of Great Britain
</div>
<div id="div2">
<span>Washington is the capital</span> of the USA
</div>';
$html = str_get_html($t);
$foo = $html->find('span[plaintext^=London]');
echo "ID: " . $foo[0]->parent()->id; // div1
?>
(Keep in mind that "plaintext" for <span> tags is right-padded with a space; this is the default behaviour of Simple HTML DOM, defined by the constant DEFAULT_SPAN_TEXT.)
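For comparison, the same "text starts with" match can be sketched with PHP's built-in DOM extension instead of Simple HTML DOM. This is only a minimal sketch under stated assumptions: the function name parentIdOfSpanStartingWith is hypothetical, PHP 7.1 or later is assumed for the nullable return type, and the prefix must not contain a single quote because it is placed directly inside an XPath string literal.

```php
<?php
// Hypothetical helper (not part of any library): returns the id of the parent
// of the first <span> whose text starts with $prefix, or null if none is found.
function parentIdOfSpanStartingWith(string $html, string $prefix): ?string
{
    // Assumption: the prefix has no single quote, so it can sit inside '...'.
    if (strpos($prefix, "'") !== false) {
        throw new InvalidArgumentException('prefix must not contain a single quote');
    }

    // Silence libxml parser warnings while loading, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // normalize-space(.) is the trimmed text of the span; /.. steps to its parent.
    $query = "//span[starts-with(normalize-space(.), '" . $prefix . "')]/..";
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $parent = $nodes->item(0);
    if ($parent instanceof DOMElement && $parent->hasAttribute('id')) {
        return $parent->getAttribute('id');
    }
    return null;
}
```

Called with the two-div HTML string from the answer above and the prefix 'London', this returns div1. Unlike the "plaintext" selector, there is no trailing-space padding to account for, because normalize-space() trims the text before the comparison.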
$d = new DOMDocument();
$d->loadXML($xml);
$x = new DOMXPath($d);
$result = $x->evaluate("//text()[contains(.,'617.99')]/ancestor::*/@id");
$unique = null;
for($i = $result->length -1;$i >= 0 && $item = $result->item($i);$i--){
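// Note: addslashes() does not escape quotes for XPath (XPath 1.0 has no escape
// character), so this assumes the id value contains no single quote.
if($x->query("//*[@id='".addslashes($item->value)."']")->length == 1){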
echo 'Unique ID is '.$item->value."\n";
$unique = $item->value;
break;
}
}
if(is_null($unique)) echo 'no unique ID found';
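A self-contained variant of the XPath approach, as a minimal sketch rather than a drop-in replacement. It assumes the input is HTML rather than well-formed XML (so it uses loadHTML instead of loadXML), and it returns the id of the nearest ancestor that has an id, without the uniqueness check done above. The names xpathLiteral and findParentIds are hypothetical, introduced here for illustration. Because XPath 1.0 string literals have no escape character, the helper quotes the search string with whichever quote type it does not contain, and falls back to concat() when it contains both.

```php
<?php
// Hypothetical helper: quote an arbitrary string as an XPath 1.0 string literal.
function xpathLiteral(string $s): string
{
    if (strpos($s, "'") === false) {
        return "'" . $s . "'";
    }
    if (strpos($s, '"') === false) {
        return '"' . $s . '"';
    }
    // Both quote types present: split on ' and glue the pieces with concat().
    $parts = array_map(
        function ($piece) { return "'" . $piece . "'"; },
        explode("'", $s)
    );
    return 'concat(' . implode(", \"'\", ", $parts) . ')';
}

// Hypothetical helper: for every text node containing $needle, collect the id
// of its nearest ancestor that has an id attribute.
function findParentIds(string $html, string $needle): array
{
    // Silence libxml parser warnings while loading, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // ancestor:: is a reverse axis, so [1] picks the closest matching ancestor.
    $query = '//text()[contains(., ' . xpathLiteral($needle) . ')]/ancestor::*[@id][1]';

    $ids = array();
    foreach ($xpath->query($query) as $element) {
        $ids[] = $element->getAttribute('id');
    }
    return $ids;
}
```

For the two-div London/Washington snippet used earlier, findParentIds($t, 'Washington') yields an array holding only div2, since the matching text sits in a span without an id and the closest ancestor with one is the surrounding div.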
Got the answer. The entire example is a little long but it works. I also show the output.
The HTML for what we are going to look at:
<html>
<head>
<title>Simple HTML DOM - Find Text</title>
</head>
<body>
<h3>Simple HTML DOM - Find Text</h3>
<div id="first">
<p>This is a paragraph inside of div 'first'.
This paragraph does not have the text we are looking for.</p>
<p>As a matter of fact this div does not have the text we are looking for</p>
</div>
<div id="second">
<ul>
<li>This is an unordered list.
<li id="love1">We are looking for the following word love.
<li>Does not contain the word.
</ul>
<p id="love2">This paragraph which is in div second contains the word love.</p>
</div>
<div id="third">
<a id="love3" href="goes.nowhere.com">link to love site</a>
</div>
</body>
</html>
The PHP:
<?php
include_once('simple_html_dom.php');
function scraping_for_text($iUrl,$iText)
{
echo "iUrl=".$iUrl."<br />";
echo "iText=".$iText."<br />";
// create HTML DOM
$html = file_get_html($iUrl);
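// get all text nodes
$aObj = $html->find('text');
// keep only the text nodes that contain the search string
// (array keys are preserved so the original node index is still reported)
$aMatches = array();
foreach ($aObj as $key=>$oText)
{
if (strpos($oText->plaintext,$iText) !== FALSE)
{
$aMatches[$key] = $oText;
}
}
if (count($aMatches) > 0)
{
echo "<h4>Found ".$iText."</h4>";
}
else
{
echo "<h4>No ".$iText." found"."</h4>";
}
foreach ($aMatches as $key=>$oLove)
{
echo $key.": text=".$oLove->plaintext."<br />"
."--- parent tag=".$oLove->parent()->tag."<br />"
."--- parent id=".$oLove->parent()->id."<br />";
}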
// clean up memory
$html->clear();
unset($html);
return;
}
// -------------------------------------------------------------
// test it!
// user_agent header...
ini_set('user_agent', 'My-Application/2.5');
scraping_for_text("test_text.htm","love");
?>
The output:
iUrl=test_text.htm
iText=love
Found love
18: text=We are looking for the following word love.
--- parent tag=li
--- parent id=love1
21: text=This paragraph which is in div second contains the word love.
--- parent tag=p
--- parent id=love2
25: text=link to love site
--- parent tag=a
--- parent id=love3
That's all they wrote!!!!