
What is an empty element?


According to the XML spec, this is the definition of an empty element:

[Definition: An element with no content is said to be empty.] The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag.

(see: http://www.w3.org/TR/REC-xml/#NT-content)

Now, I have no problem understanding empty-element tags: <i-am-empty/> leaves no room for misunderstanding. But it seems to me the standard contradicts itself in the other case: on the one hand it says that any element with no content is empty, on the other hand it says that this can be represented by a start-tag followed immediately by an end-tag. But if we look at the definition of content:

[43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*

It seems to me that content consists of two optional parts, CharData? and a group ()*. But since both these parts are optional, it would mean that nothing (as in, absence of characters) matches this production. So if I try to match this definition of content against whatever is inside <am-i-empty-or-not></am-i-empty-or-not>, I get a positive match. On the one hand this is an empty element because it is "a start-tag immediately followed by an end-tag"; on the other hand it is not empty, because between the tags I can positively match production rule [43] for content, in which case it contains content, which means it can't be empty.
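To make that observation concrete, here is a minimal Python sketch under loud assumptions: CHAR_DATA ignores the "]]>" exclusion of the real CharData production, and ITEM is a hypothetical stand-in for the (element | Reference | CDSect | PI | Comment) alternatives. It only illustrates that a pattern shaped like production [43] accepts the empty string; it is not a real XML grammar.

```python
import re

# Simplified stand-ins for the spec productions (assumptions, not the real grammar).
CHAR_DATA = r"[^<&]*"    # CharData without the ']]>' exclusion
ITEM = r"<[^<>]*>"       # hypothetical placeholder for element | Reference | CDSect | PI | Comment

# content ::= CharData? ((ITEM) CharData?)*
CONTENT = re.compile(rf"(?:{CHAR_DATA})?(?:{ITEM}(?:{CHAR_DATA})?)*")


def matches_content(text: str) -> bool:
    """Return True if text matches the simplified content production."""
    return CONTENT.fullmatch(text) is not None


print(matches_content(""))       # the empty string is accepted
print(matches_content("hello"))  # so is plain character data
```

Because every part of the production is optional, the empty string is one of the strings it accepts, which is exactly the situation between <am-i-empty-or-not> and </am-i-empty-or-not>.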

Can anybody explain which rule takes precedence? Does anybody know of any DOM or parser implementations that have different opinions on this?


asked Feb 17 '10 09:02

Roland Bouman

2 Answers

"But since both these parts are optional, it would mean that nothing (as in, absence of characters) matches this production."

That may be true, but the wording in the spec on this issue is quite clear. There are even examples for empty elements in the next paragraph.

<IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
<br></br>
<br/>

So the only way (in this context, with the surrounding wording and examples) to read "An element with no content" would be to include "content that (while matching the production) is completely empty" (i.e. zero-length, not even whitespace).
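As a rough check of that reading, here is a small sketch using Python's xml.dom.minidom (chosen only for illustration; the helper name child_count is made up here). It parses the two representations from the spec's examples and counts the child nodes of the root element.

```python
from xml.dom import minidom


def child_count(xml_text: str) -> int:
    """Parse xml_text and return the number of child nodes of the root element."""
    doc = minidom.parseString(xml_text)
    return len(doc.documentElement.childNodes)


# Start-tag immediately followed by end-tag, and the empty-element tag.
for sample in ("<br></br>", "<br/>"):
    print(sample, "->", child_count(sample), "child nodes")
```

With this parser both forms end up as an element without children, so the two representations are indistinguishable once parsed.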


answered Sep 19 '22 02:09

Thilo

I wanted to check what different variations of "empty" actually are empty.

Variation A

<Santa/>

gives a tree of

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

Variation B

<Santa></Santa>

gives a DOM tree of:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

Variation C

<Santa>[Space]</Santa>

(where [Space] stands for a single literal space character) gives a DOM tree of:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

Variation D

<Santa>[Tab]</Santa>

(where [Tab] stands for a single literal tab character) gives a DOM tree of:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

Variation E

<Santa>[CRLF]
</Santa>

(where [CRLF] stands for a literal carriage return and line feed) gives a DOM tree of:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

All variations of the text give the same DOM tree. When an XML document is asked to serialize itself, the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

results in the serialized text:

<?xml version="1.0"?>
<Santa/>
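Whether the whitespace variations really collapse to the same tree depends on the parser's whitespace handling. As a comparison, here is a sketch that runs the five variations through Python's xml.dom.minidom, which is an assumption for illustration and not the DOM used above. The helper name describe is made up here.

```python
from xml.dom import minidom

# The five variations from above, written with explicit escape sequences.
VARIATIONS = {
    "A": "<Santa/>",
    "B": "<Santa></Santa>",
    "C": "<Santa> </Santa>",
    "D": "<Santa>\t</Santa>",
    "E": "<Santa>\r\n</Santa>",
}


def describe(xml_text: str) -> list:
    """Return the text data of each child node of the root element."""
    root = minidom.parseString(xml_text).documentElement
    return [child.data for child in root.childNodes]


for name, xml_text in VARIATIONS.items():
    print(name, describe(xml_text))
```

With minidom, A and B have no children, while C, D and E each keep one whitespace-only text node. So the result shown above presumably reflects a parser that discards whitespace-only text, rather than something every DOM implementation does.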

Manually adding an empty text node

I wanted to see what happens if I build the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
      |- NODE_TEXT #text ""

using the pseudo-code:

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
santa.appendChild(doc.CreateText(""));

When that DOM document is saved to a stream, it comes out as:

<?xml version="1.0"?>
<Santa/>

Even when the element is forced to have a child (i.e. forced to be non-empty), the serializer still writes it as an empty element.
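Other DOM implementations can behave differently here. The sketch below repeats the experiment with Python's xml.dom.minidom, which is an assumption for comparison and not the library behind the pseudo-code above. It serializes the element once with the zero-length text node attached and once after normalize().

```python
from xml.dom import minidom

doc = minidom.Document()
santa = doc.appendChild(doc.createElement("Santa"))
santa.appendChild(doc.createTextNode(""))  # zero-length text node

before = santa.toxml()  # serialized with the empty text node still attached
santa.normalize()       # minidom's normalize() discards empty text nodes
after = santa.toxml()

print(before)
print(after)
```

In minidom the first form is written as a start-tag followed by an end-tag and the second as an empty-element tag. Both are representations of an empty element according to the spec text quoted in the question.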

Force text node with whitespace

And then if I make sure to put some whitespace in the TEXT node:

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
santa.appendChild(doc.CreateText(" "));

It comes out as the XML:

<?xml version="1.0" ?>
<Santa> </Santa>

with the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
      |- NODE_TEXT #text " "

Interesting: it's not round-trippable, because parsing that XML again gives an element without a text node (see Variation C).

Force a TAB CRLF

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
santa.appendChild(doc.CreateText(TAB+LF+CR));

It comes out as the XML (bracketed names stand for the literal characters):

<?xml version="1.0"?>
<Santa>[TAB][LF]
[CR]
</Santa>

with the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
      |- NODE_TEXT #text "\t\n\n"

Yes, XML line-end handling turns every CR into LF, and yes, it's not round-trippable. If you parse:

<?xml version="1.0"?>
<Santa>[TAB][LF]
[CR]
</Santa>

you will get the DOM tree of:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
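The CR-to-LF conversion can be observed in isolation. This is a minimal sketch using Python's xml.dom.minidom (an assumption for illustration; unlike the parser above, it keeps whitespace-only text, which is what makes the normalized characters visible).

```python
from xml.dom import minidom

# TAB, LF, CR between the tags, spelled out as escape sequences.
source = "<Santa>\t\n\r</Santa>"

root = minidom.parseString(source).documentElement
text = root.firstChild.data  # character data as reported by the parser

print(repr(text))
```

The lone CR comes back as LF, matching the "\t\n\n" text node shown above.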

Setting element.text

Finally we come to what happens if you set an element's text through its .text property.

Set no text:

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
//santa.text = ""; example where we don't set the text

gives the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

and the XML:

<?xml version="1.0"?>
<Santa/>

Setting empty text

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
santa.text = ""; //example where we do set the text

gives the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
      |- NODE_TEXT #text ""

and the XML:

<?xml version="1.0"?>
<Santa/>

Setting single space

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
santa.text = " ";

gives the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
      |- NODE_TEXT #text " "

and the XML:

<?xml version="1.0"?>
<Santa> </Santa>

Setting more whitespace

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
santa.text = LF+TAB+CR;

gives the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
      |- NODE_TEXT #text "\n\t\n"

and the XML (bracketed names stand for the literal characters):

<?xml version="1.0"?>
<Santa>[LF]
[TAB][LF]
</Santa>
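For readers without the library used above, a loosely analogous experiment can be run with Python's xml.etree.ElementTree. Its Element.text attribute is similar in spirit to the .text property in the pseudo-code but is a different API, so treat this as a sketch under that assumption. The helper name serialize_with_text is made up here.

```python
import xml.etree.ElementTree as ET


def serialize_with_text(text):
    """Build <Santa> with the given .text value and return its serialized form."""
    santa = ET.Element("Santa")
    santa.text = text
    return ET.tostring(santa, encoding="unicode")


# No text, zero-length text, and a single space.
for value in (None, "", " "):
    print(repr(value), "->", serialize_with_text(value))
```

ElementTree shows the same pattern as above: no text and zero-length text both serialize as an empty-element tag, while a single space is written out between a start-tag and an end-tag.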

So what they told you was true, from a certain point of view.

- an XML string that contains only whitespace in the element will be empty when parsed (with the parser used here)
- a DOM element that contains only whitespace in its text node will render the whitespace when converted to an XML string

answered Sep 19 '22 02:09

Ian Boyd
