I'm using HTML Tidy in PHP and it's producing unexpected results because of a <script>
tag in a JavaScript string literal. Here's a sample input:
<html>
<script>
var t='<script><'+'/script>';
</script>
</html>
HTML Tidy's output:
<html>
<script>
//<![CDATA[
var t='<script><'+'/script>';
<\/script>
<\/html>
//]]>
</script>
</html>
It's interpreting </script></html> as part of the script. Then it adds another </script></html> to close the open tags.
I tried this on an online version of HTML Tidy (http://www.dirtymarkup.com/) and it produces the same error.
How do I prevent this error from occurring in PHP?
After playing around with it a bit, I discovered that appending the comment //'<\/script>' confuses the algorithm enough to prevent this bug from occurring:
<html>
<script>
var t='<script><'+'/script>'; //'<\/script>'
</script>
</html>
After clean-up:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<script>
var t='<script><'+'/script>'; //'<\/script>'
</script>
<title></title>
</head>
<body>
</body>
</html>
My guess is that when the clean-up algorithm scans the code and detects the string <script> twice, it immediately looks for a matching </script> for each one. Separating < from /script> makes the second </script> go undetected, which is why it decides to add another </script> at the end of the code and somehow also closes it with another </html>. (Poor design indeed!)
So I made a second assumption: that the algorithm has no check for whether a </script> is inside a comment. I was right! Adding the string <\/script> in a JavaScript comment does make the algorithm think there are two </script> tags in total.
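There's no need for string concatenation to avoid the closing </script>. Simply escaping the / character is enough to "fool" the parsers in browsers and, it seems, HTML Tidy's parser as well. Inside a JavaScript string literal, \/ evaluates to a plain /, so the value of the string does not change: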
<html>
<script>
var t='<script><\/script>';
</script>
</html>
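As a rough illustration of why the escape is safe, the sketch below checks that both spellings give the same string at runtime. It also includes a small helper, escapeScriptClose, that applies the same escape to a chunk of JavaScript source before it is embedded inline. The helper and its name are hypothetical, not part of HTML Tidy or PHP, and it assumes every "</script" in the source sits inside a string literal, regex, or comment.

```javascript
// In a JavaScript string literal, "\/" is just "/", so both spellings
// produce the same runtime value.
const escaped = '<\/script>';
const concatenated = '<' + '/script>';
const sameValue = escaped === concatenated;

// Hypothetical helper (an assumption for illustration): rewrite "</script"
// as "<\/script" in JavaScript source before embedding it in a <script> block.
// Assumes each "</script" occurs inside a string literal, regex, or comment.
function escapeScriptClose(jsSource) {
  return jsSource.replace(/<\/(script)/gi, '<\\/$1');
}

console.log(sameValue);
console.log(escapeScriptClose("var t='<script></script>';"));
```

Running the escaped source through HTML Tidy should then behave like the hand-escaped sample above, though that depends on the Tidy version in use.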