Why doesn't PHP DOM include slash on self closing tags?

Tags: dom, php

I have been using PHP's DOM to load an HTML template, modify it and output it. Recently I discovered that self-closing (empty) tags don't include a closing slash, even though the template file did.

e.g.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
</body>
</html>

becomes:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
</body>
</html>

Is this a bug or a setting, or a doctype issue?

asked Jun 29 '10 at 22:06 by peterjwest



2 Answers

DOMDocument->saveHTML() takes your XML DOM infoset and writes it out as old-school HTML, not XML. You should not use saveHTML() together with an XHTML doctype, as its output won't be well-formed XML.

If you use saveXML() instead, you'll get proper XHTML. It's fine to serve this XML output to standards-compliant browsers if you give it a Content-Type: application/xhtml+xml header. But unfortunately IE6-8 won't be able to read that, as they can still only handle old-school HTML, under the text/html media type.
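A minimal way to see the difference between the two serialisers on the same tree. This assumes the template is parsed with loadXML() (the question doesn't say which loader is used), and the markup is a trimmed-down stand-in for the template, not the original file:

```php
<?php
// Trimmed-down stand-in for the question's template (illustrative markup).
$doc = new DOMDocument();
$doc->loadXML(
    '<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>'
    . '</head><body/></html>'
);

// HTML serialiser: void elements such as <meta> are written without a slash,
// and the empty <body> gets an explicit end tag.
$asHtml = $doc->saveHTML();

// XML serialiser: every element without children uses the self-closing form.
$asXml = $doc->saveXML();

echo $asHtml, "\n", $asXml, "\n";
```

The tree is identical in both cases; only the output method decides whether you get `<meta ...>` or `<meta .../>`.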

The usual compromise solution is to serve text/html and use ‘HTML-compatible XHTML’ as outlined in Appendix C of the XHTML 1.0 spec. But sadly there is no PHP DOMDocument->saveXHTML() method to generate the correct output for this.
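A rough sketch of the choice described in the last two paragraphs: send application/xhtml+xml only to browsers that advertise it in their Accept header, and text/html (with HTML-compatible markup) to everyone else. The function name is hypothetical, and the substring check is deliberately naive (it ignores q-values):

```php
<?php
// Hypothetical helper: pick a media type from the request's Accept header.
// Naive: it only checks whether application/xhtml+xml is mentioned at all.
function chooseContentType(string $accept): string
{
    return strpos($accept, 'application/xhtml+xml') !== false
        ? 'application/xhtml+xml; charset=utf-8'
        : 'text/html; charset=utf-8';
}

// In a request handler (assumed usage, not from the answer):
// header('Content-Type: ' . chooseContentType($_SERVER['HTTP_ACCEPT'] ?? ''));
// echo $doc->saveXML();
```

IE6-8 don't list application/xhtml+xml in their Accept header, so they fall through to text/html, which is why the saveXML() output then has to be HTML-compatible.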

There are some things you can do to persuade saveXML() to produce HTML-compatible output for some common cases. The main one is that you have to ensure that only elements defined by HTML4 as having an EMPTY content model (<img>, <br>, etc.) actually do have empty content, causing the self-closing syntax (<img/>) to be used. Other elements must not use the self-closing syntax, so if they're empty you should put a space in their text content to stop them being so:

<script src="x.js"/>           <-- no good, confuses HTML parser and breaks page
<script src="x.js"> </script>  <-- fine
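One way to apply the "put a space in it" trick across a whole document before calling saveXML(). This is a sketch, not a PHP DOM feature: padNonEmptyElements() is a hypothetical helper, and the list is the set of elements declared EMPTY in the HTML 4.01 DTDs:

```php
<?php
// Hypothetical helper (not part of PHP's DOM API): give every childless
// element that is NOT declared EMPTY in HTML 4 a single-space text child,
// so saveXML() writes <script ...> </script> instead of <script .../>.
function padNonEmptyElements(DOMDocument $doc): void
{
    // Elements with an EMPTY content model in the HTML 4.01 DTDs.
    $htmlEmpty = ['area', 'base', 'basefont', 'br', 'col', 'frame', 'hr',
                  'img', 'input', 'isindex', 'link', 'meta', 'param'];

    // Snapshot the live node list before mutating the tree.
    $elements = [];
    foreach ($doc->getElementsByTagName('*') as $el) {
        $elements[] = $el;
    }

    foreach ($elements as $el) {
        $name = strtolower($el->localName);
        if (!$el->hasChildNodes() && !in_array($name, $htmlEmpty, true)) {
            $el->appendChild($doc->createTextNode(' '));
        }
    }
}

$doc = new DOMDocument();
$doc->loadXML('<html><head><script src="x.js"/><meta charset="utf-8"/></head>'
    . '<body><br/><p/></body></html>');
padNonEmptyElements($doc);
echo $doc->saveXML($doc->documentElement), "\n";
```

After padding, <script> and <p> should come out with a space inside, while <meta> and <br> keep the self-closing form.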

The other one to look out for is handling of the inline <script> and <style> elements, which are normal elements in XHTML but special CDATA-content elements in HTML. Some /*<![CDATA[*/.../*]]>*/ wrapping is required to make any < or & characters inside them behave mostly-consistently, though note you still have to avoid the ]]> and </ sequences.
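A sketch of building that /*<![CDATA[*/.../*]]>*/ wrapping through the DOM rather than by string concatenation. setInlineCode() is a hypothetical helper, and it assumes the content is JavaScript or CSS, where /* */ is a comment:

```php
<?php
// Hypothetical helper: fill an inline <script> or <style> element so that
// saveXML() emits /*<![CDATA[*/ ... /*]]>*/ around the code.
function setInlineCode(DOMElement $el, string $code): void
{
    // The wrapping cannot protect these sequences, so refuse them outright.
    if (strpos($code, ']]>') !== false || strpos($code, '</') !== false) {
        throw new InvalidArgumentException('code must not contain "]]>" or "</"');
    }

    $doc = $el->ownerDocument;
    while ($el->firstChild !== null) {
        $el->removeChild($el->firstChild);
    }

    // Text "/*" + CDATA "*/ ... /*" + text "*/": an XML parser sees
    // "/**/ code /**/", an HTML parser sees the CDATA markers inside comments.
    $el->appendChild($doc->createTextNode('/*'));
    $el->appendChild($doc->createCDATASection("*/\n" . $code . "\n/*"));
    $el->appendChild($doc->createTextNode('*/'));
}

$doc = new DOMDocument();
$script = $doc->appendChild($doc->createElement('script'));
setInlineCode($script, 'if (a < b && c) { run(); }');
echo $doc->saveXML($script), "\n";
```

The < and & in the code stay raw because they sit inside the CDATA section, which is what keeps both parsers reading the same script.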

If you want to really do it properly you would have to write your own HTML-compatible-XHTML serialiser. Long-term that would probably be a better option. But for small simple cases, hacking your input so that it doesn't contain anything that would come out the other end of an XML serialiser as incompatible with HTML is probably the quick solution.

That or just suck it up and live with old-school non-XML HTML, obviously.

answered Sep 20 '22 at 21:09 by bobince


It's a doctype issue: as it's text/html, the closing slash isn't needed. You only need the closing slash if it is an XHTML doc.

I noticed you've updated the question to add the doctype, but PHP DOM also looks at that meta tag you've got in there, and content="text/html; charset=utf-8" clearly isn't XML-based; it's just text/html :)

Aside: the DOM API also picks up the charset from there.
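A small sketch of that aside, assuming the document is parsed with loadHTML(): the charset in the meta http-equiv tag tells libxml how to decode the bytes that follow. The markup is illustrative, and the PHP file itself is assumed to be saved as UTF-8:

```php
<?php
// Illustrative markup: the meta tag declares UTF-8 before any text appears.
$markup = '<html><head>'
    . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    . '</head><body><p>café</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($markup);

// With the charset declared, the UTF-8 bytes of "é" are decoded as UTF-8.
// Without the meta tag, libxml may fall back to a Latin-1 default instead.
echo $doc->getElementsByTagName('p')->item(0)->textContent, "\n";
```

Dropping the meta tag and re-running is a quick way to check what your libxml version does when no charset is declared.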

answered Sep 21 '22 at 21:09 by nathan