I'm looking to match a specific table nested inside another table. Here's the sample HTML and a summary of my failed attempts so far:
<table id="parent">
<table class="possible_target">
<tr><td>We're targeting this table</td></tr>
</table>
</table>
<table class="possible_target">
<tr><td>We're not targeting this table</td></tr>
</table>
Here was my initial attempt. But even if it worked, it would likely match the second, unnested table as well:
~(?=<table.*?)<table class="possible_target".*?</table>~si
Here is my pseudo expression for what I'm trying to accomplish. It would assert the presence of an opening table tag as well as the absence of a closing table tag before making the match:
~(?=<table.*?)(?!</table>)<table class="possible_target".*?</table>~si
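To make the failure concrete, here is a small check of both expressions against the sample HTML (only a sketch; the array keys `initial` and `pseudo` are labels made up for this test). Both lookarounds are evaluated at the position where the match starts, looking forward, so neither can see whether an enclosing table was opened earlier:

```php
<?php
// Sample HTML from the question.
$html = <<<'HTML'
<table id="parent">
<table class="possible_target">
<tr><td>We're targeting this table</td></tr>
</table>
</table>
<table class="possible_target">
<tr><td>We're not targeting this table</td></tr>
</table>
HTML;

// The two expressions from the question, unchanged.
$patterns = [
    'initial' => '~(?=<table.*?)<table class="possible_target".*?</table>~si',
    'pseudo'  => '~(?=<table.*?)(?!</table>)<table class="possible_target".*?</table>~si',
];

$counts = [];
foreach ($patterns as $name => $pattern) {
    // preg_match_all returns the number of full matches.
    $counts[$name] = preg_match_all($pattern, $html, $m);
    echo $name, ': ', $counts[$name], " matches\n";
}
```

Both expressions report 2 matches, so the unnested table is picked up as well.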
I found this interesting, since nested HTML tags are challenging to handle with regex.
My attempt does (should do) the following:
1.) Number the table tags by nesting depth using a callback function. The outermost depth is 1.
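// HTML to process
$source = "your input";
// tag to match
$rx_tag = "table";
// nesting depth to match (outermost = 1)
$rx_depth = 2;
// current nesting depth, updated by the callback
$tag_depth = 0;
// ----------------------------
// mark every opening and closing tag with its depth using a callback function;
// (/?) always participates in the match, so $out[1] is always set
$source = preg_replace_callback('~<(/?)'.$rx_tag.'\b~i','set_tag_depth',$source);
function set_tag_depth($out)
{
    global $tag_depth;
    if($out[1]=="/") {
        // closing tag: label with the current depth, then step out
        $tag_depth--; return $out[0].($tag_depth+1);
    }
    // opening tag: step in, then label with the new depth
    $tag_depth++; return $out[0].$tag_depth;
}
#echo nl2br(htmlspecialchars($source));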
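To illustrate what step 1 produces, here is a self-contained sketch run on a compact version of the question's sample. It uses a closure with a by-reference counter instead of the global (my substitution, so the snippet stands alone), and the cell texts `inner` and `outer` are placeholders:

```php
<?php
// Compact stand-in for the sample HTML: one nested table, one unnested table.
$html = '<table id="parent"><table class="possible_target"><tr><td>inner</td></tr></table></table>'
      . '<table class="possible_target"><tr><td>outer</td></tr></table>';

// Depth counter shared with the callback by reference instead of a global.
$depth = 0;

$marked = preg_replace_callback(
    '~<(/?)table\b~i',
    function (array $m) use (&$depth) {
        if ($m[1] === '/') {
            // Closing tag: label with the current depth, then step out.
            return $m[0] . $depth--;
        }
        // Opening tag: step in, then label with the new depth.
        return $m[0] . ++$depth;
    },
    $html
);

echo $marked, "\n";
```

This prints `<table1 id="parent"><table2 class="possible_target"><tr><td>inner</td></tr></table2></table1><table1 class="possible_target"><tr><td>outer</td></tr></table1>`, so only the nested table carries the marker 2.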
2.) The tags are now renamed by depth: <table2 ... </table2> for every table inside a <table1, <table3 ... </table3> for every table inside a <table2, and so on. That makes it easy to match the tables at the desired depth. Afterwards, strip the numbering again in case you need the original source.
// get the specified tags at the desired depth (U = ungreedy, so each match ends at the first closing marker);
// \b keeps depth 1 from also matching depth 10, 11, ...
$pattern = '~<'.$rx_tag.$rx_depth.'\b.*</'.$rx_tag.$rx_depth.'>~Uis';
preg_match_all($pattern,$source,$out);
// strip markers from each match
if(!empty($out[0]))
{
    foreach($out[0] AS $v)
    {
        $v = preg_replace('~(</?'.$rx_tag.')\d+~i','\1',$v);
        // test output
        echo nl2br(htmlspecialchars($v))."<br>------------------------------<br>";
    }
}
// strip markers from the source to restore the original markup
$source = preg_replace('~(</?'.$rx_tag.')\d+~i','\1',$source);
It's intended that a tag containing tags of the same kind keeps them: a match for <table2>...</table2> might still contain <table3>...</table3> ... <table3>...</table3>. Set $rx_depth = 3; to get those on their own.
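For reuse, the same idea could be wrapped in a function. This is only a sketch, assuming PHP 7 or later; the name `match_tags_at_depth` is hypothetical, and a closure with a by-reference counter stands in for the global:

```php
<?php
/**
 * Hypothetical helper: return every <$tag> element found at nesting depth $depth
 * (outermost = 1), using the depth-marker technique described above.
 */
function match_tags_at_depth(string $html, string $tag, int $depth): array
{
    $t = preg_quote($tag, '~');
    $level = 0;

    // Step 1: append the nesting depth to every opening and closing tag name.
    $marked = preg_replace_callback(
        '~<(/?)' . $t . '\b~i',
        function (array $m) use (&$level) {
            return ($m[1] === '/') ? ($m[0] . $level--) : ($m[0] . ++$level);
        },
        $html
    );

    // Step 2: lazily match from each opening marker to the first closing marker of that depth.
    preg_match_all('~<' . $t . $depth . '\b.*?</' . $t . $depth . '>~is', $marked, $found);

    // Step 3: strip the markers from each match, including those on deeper nested tags.
    return array_map(
        function (string $v) use ($t) {
            return preg_replace('~(</?' . $t . ')\d+~i', '$1', $v);
        },
        $found[0]
    );
}

// Usage on a compact stand-in for the sample HTML ("inner"/"outer" are placeholders).
$html = '<table id="parent"><table class="possible_target"><tr><td>inner</td></tr></table></table>'
      . '<table class="possible_target"><tr><td>outer</td></tr></table>';
$result = match_tags_at_depth($html, 'table', 2);
print_r($result);
```

With depth 2 this returns only the nested table; with depth 1 it returns the parent (including the nested table) and the unnested table. Like the original, it assumes well-formed, balanced markup, so a stray or missing closing tag will throw the numbering off.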
I hope it works as it should; I was already very tired :-) It's designed to work with any kind of tag, but I didn't test it much. At least it's an idea.