I have a PHP page mixed with HTML. Some example code:
<?php echo "<p>some text</p>"; ?>/* <? some php in comments ?> */
<p>some HTML text</p> <!-- <h1>some HTML in comments</h1> -->
<? $header_info = <<<END
\$some="<?php @ob_start(); @session_set_save_handler(); ?>";
END; ?>
<h2>Some more HTML</h2>
I would like to split at each PHP and HTML tag but leave any PHP tags or HTML tags in quotes or comments untouched/ignored. This is what I have so far:
$array = preg_split("/((^<\?php)|([^'|\"]<\?php)|([^'|\"]<\?)|([^'|\"]\?>)|(<\%)|(\%>))/i", $string, -1);
The issue I have is that some of the HTML closing brackets '>' are missing in the final $array. I would like to keep the HTML open and closing tags intact. Sometimes I end up with
<p></p instead of <p></p>
It should look like this:
[0] echo "<p>some text</p>";
[1] <p>some HTML text</p>
[2] $header_info = <<<END
\$some="<?php @ob_start(); @session_set_save_handler(); ?>";
END;
[3] <h2>Some more HTML</h2>
Comments do not need to be part of the array, as long as preg_split does not treat anything inside them as a delimiter.
I also just realized that some of the PHP tags, especially when using eval(), can end up like this:
"?> <p>some HTML text</p> <?";
which would mean that the quotations in my regex would not match any of those cases.
preg_match() might be a better option, but I am not sure.
Any help would be very much appreciated as I am not very ingenious when it comes to regex and am rather stuck at this point.
Thanks a lot :)
PREAMBLE
Since a regular expression solution was asked for, the following solution relies on regular expressions. However, in this particular case, a PHP parser would be better suited.
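As a rough illustration of the parser route mentioned above, here is a minimal sketch built on PHP's tokenizer extension (token_get_all()). The function name splitPhpAndHtml and the sample input are hypothetical and not part of the question. The sample uses full <?php tags only because, as far as I know, the tokenizer recognises short <? tags only when short_open_tag is enabled. PHP comments are dropped; HTML comments are plain inline HTML to the tokenizer and would need separate handling.

```php
<?php
// Sketch: split a mixed PHP/HTML source into chunks using the tokenizer.
// splitPhpAndHtml() is a hypothetical helper name, not an existing API.
function splitPhpAndHtml(string $source): array
{
    $chunks = [];
    $buffer = '';

    // Store the trimmed buffer as a chunk (if non-empty) and reset it.
    $flush = function () use (&$chunks, &$buffer) {
        $text = trim($buffer);
        if ($text !== '') {
            $chunks[] = $text;
        }
        $buffer = '';
    };

    foreach (token_get_all($source) as $token) {
        // Tokens are either [id, text, line] arrays or single-character strings.
        $id   = is_array($token) ? $token[0] : null;
        $text = is_array($token) ? $token[1] : $token;

        // Real open/close tags end the current chunk. Tags inside strings
        // never get here: they are part of a string token.
        if ($id === T_OPEN_TAG || $id === T_OPEN_TAG_WITH_ECHO || $id === T_CLOSE_TAG) {
            $flush();
            continue;
        }
        // PHP comments are skipped entirely.
        if ($id === T_COMMENT || $id === T_DOC_COMMENT) {
            continue;
        }
        $buffer .= $text;
    }
    $flush();

    return $chunks;
}

// Hypothetical sample input, kept in a nowdoc so nothing is interpolated.
$source = <<<'SRC'
<?php echo "<p>some text</p>"; /* <?php some php in comments ?> */ ?>
<p>some HTML text</p>
<?php $some = "<?php @ob_start(); ?>"; ?>
<h2>Some more HTML</h2>
SRC;

$chunks = splitPhpAndHtml($source);
print_r($chunks);
```

Because the tokenizer already knows where strings and comments begin and end, no lookbehind or lookahead on quotes is needed. Note that this sketch discards the implicit echo of a <?= tag.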
Regular Expression
#(?<!"|\')<\\?(?:php)?\\s+(.+?)\\?>(?!"|\')|/\*.+\*/|<!--.+-->#is
Scriptlet
$subject = '<?php echo "<p>some text</p>"; ?>/* <? some php in comments ?> */
<p>some HTML text</p> <!-- <h1>some HTML in comments</h1> -->
<? $header_info = <<<END
\\$some="<?php @ob_start(); @session_set_save_handler(); ?>";
END; ?>
<h2>Some more HTML</h2>';
$returnValue = preg_replace('#(?<!"|\')<\\?(?:php)?\\s+(.+?)\\?>(?!"|\')|/\*.+\*/|<!--.+-->#is', '$1', $subject, -1);
var_dump(preg_split('#\\r?\\n#s', $returnValue));
Result
array(6) {
[0]=>
string(25) "echo "<p>some text</p>"; "
[1]=>
string(22) "<p>some HTML text</p> "
[2]=>
string(21) "$header_info = <<<END"
[3]=>
string(60) "\$some="<?php @ob_start(); @session_set_save_handler(); ?>";"
[4]=>
string(5) "END; "
[5]=>
string(23) "<h2>Some more HTML</h2>"
}
DEMO
http://sandbox.onlinephpfunctions.com/code/017a51877b50f272f151feade7b59e142757481e
Discussion
1. #
2. (?<!"|\')
3. <\\?(?:php)?\\s+
4. (.+?)
5. \\?>
6. (?!"|\')
7. |/\*.+\*/
8. |<!--.+-->
9. #is
line 1 I use # as the regex delimiter since it avoids having to escape /.
line 2 This is the key to the regex. A negative lookbehind ensures that the opening PHP tag is NOT preceded by a single or double quote.
line 3 This defines what an opening PHP tag is. To support ASP tags too, this line can be changed to <(?:\\?(?:php)?|%)\\s+ and line 5 to (?:\\?|%)>
line 4 Once the start of a PHP code sequence has been detected, we lazily match any characters appearing in it. Note that on line 9 we use the s flag so that . also matches newlines inside the PHP code sequence.
line 5 We mark the end of the PHP code sequence.
line 6 A negative lookahead ensures that the closing PHP tag just matched is not followed by a single or double quote.
line 7,8 PHP-style and HTML comments are matched without a capture group, so the '$1' replacement simply removes them. Note that .+ is greedy here: if the input contains several comments, the match runs from the first opener to the last closer, so .+? is the safer choice.
line 9 End of the regex. The i flag makes matching case-insensitive and the s flag lets . match newlines.
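To make the point about line 2 concrete, here is a small sketch of why the lookbehind matters. The pattern in the question uses a character class such as [^'"] in front of the tag; that class consumes one character, so the > that closes the preceding HTML tag becomes part of the delimiter and disappears. A lookbehind only checks the character without consuming it. The sample string and variable names are made up for illustration.

```php
<?php
// Hypothetical one-line sample: HTML immediately followed by a PHP tag.
$html = '<p>text</p><?php echo 1; ?>';

// Consuming variant (as in the question): [^'"] eats the ">" before "<?php".
$consuming = preg_split('/[^\'"]<\?php/', $html);

// Lookbehind variant (as in the answer): the ">" is only inspected, not consumed.
$lookbehind = preg_split('/(?<![\'"])<\?php/', $html);

var_dump($consuming[0]);  // string(10) "<p>text</p"
var_dump($lookbehind[0]); // string(11) "<p>text</p>"
```

The same reasoning applies to the closing tag, where the answer uses a negative lookahead instead of a trailing character class.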
Known issues
After the replacement, the lines of $subject are simply split on a newline delimiter (preceded by an optional carriage return), so a multi-line PHP block such as the heredoc is spread over several array elements.
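If one array element per PHP or HTML block is wanted, as in the question, one possible variation is to split on the pattern itself rather than on newlines. The sketch below is an assumption on my part, not the original answer's method: it reuses the same pattern (with a character class in the lookarounds and lazy comment alternatives) and relies on preg_split() with PREG_SPLIT_DELIM_CAPTURE, which returns the captured PHP code along with the HTML found between matches. The variable names are illustrative.

```php
<?php
// The sample from the question, in a nowdoc so nothing is interpolated.
$subject = <<<'SRC'
<?php echo "<p>some text</p>"; ?>/* <? some php in comments ?> */
<p>some HTML text</p> <!-- <h1>some HTML in comments</h1> -->
<? $header_info = <<<END
\$some="<?php @ob_start(); @session_set_save_handler(); ?>";
END; ?>
<h2>Some more HTML</h2>
SRC;

// Same idea as the answer's regex; comment alternatives are lazy (.+?).
$pattern = '#(?<!["\'])<\?(?:php)?\s+(.+?)\?>(?!["\'])|/\*.+?\*/|<!--.+?-->#is';

// DELIM_CAPTURE keeps group 1 (the PHP code); comments have no group,
// so they vanish. NO_EMPTY drops zero-length pieces.
$pieces = preg_split($pattern, $subject, -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

// Trim whitespace and discard pieces that were only newlines.
$parts = array_values(array_filter(
    array_map('trim', $pieces),
    function ($piece) {
        return $piece !== '';
    }
));

print_r($parts);
```

On this sample the result has four elements, with the whole heredoc block kept together in a single element. It inherits the limits of the quote-based heuristic, so input where a real tag happens to sit next to a quote will still be misread.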