 

What is an empty element?

According to the XML spec, this is the definition of an empty element:

An element with no content is said to be empty. The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag.

(see: http://www.w3.org/TR/REC-xml/#NT-content)

Now, I have no problem understanding empty-element tags: <i-am-empty/> and no misunderstanding is possible. But it seems to me the standard contradicts itself in the other case: on the one hand it says that any tag with no content is empty, on the other hand it says that this can be represented by a start-tag followed immediately by an end-tag. But if we look at the definition of content:

[43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)* 

It seems to me that content consists of two optional parts, CharData? and a group ()*. But since both these parts are optional, it would mean that nothing (as in, absence of characters) matches this production. So if I tried to match this definition of content against whatever is inside <am-i-empty-or-not></am-i-empty-or-not>, I would get a positive match. So, on the one hand this is an empty element because it is "a start-tag immediately followed by an end-tag"; on the other hand it is not empty, because between the tags I can positively match production rule [43] for content, in which case it contains content, which means it can't be empty.
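A rough Python sketch of that reading of production [43] follows. The regular expressions are simplified stand-ins for CharData and for the bracketed alternatives, not the spec's full grammar, and the names CHAR_DATA, CHILD and matches_content are hypothetical. It only illustrates that a production built entirely from optional parts also accepts the empty string.

```python
import re

# Assumption: crude stand-ins for the grammar's nonterminals, not the real XML grammar.
CHAR_DATA = r"[^<&]*"   # CharData: any run of text without '<' or '&'
CHILD = r"<[^>]*>"      # placeholder for element | Reference | CDSect | PI | Comment

# content ::= CharData? ((element | ...) CharData?)*
CONTENT = re.compile(rf"(?:{CHAR_DATA})?(?:{CHILD}(?:{CHAR_DATA})?)*")

def matches_content(text: str) -> bool:
    """Return True if the whole string matches the simplified production."""
    return CONTENT.fullmatch(text) is not None

print(matches_content(""))       # the empty string is tested against the production
print(matches_content("hello"))  # plain character data is tested as well
```

Under these assumptions, matching the production does not by itself say whether anything is actually there; a zero-length match and "no content" describe the same situation.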

Can anybody explain which rule takes precedence? Does anybody know of any DOM or parser implementations that have different opinions on this?

Roland Bouman asked Feb 17 '10 09:02



2 Answers

But since both these parts are optional, it would mean that nothing (as in, absence of characters) matches this production.

That may be true, but the wording in the spec on this issue is quite clear. There are even examples for empty elements in the next paragraph.

<IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
<br></br>
<br/>

So the only way (in this context, with the surrounding wording and examples) to read

An element with no content

would be to include "content that (while matching the production) is completely empty" (i.e. zero-length, not even white-space).
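As a quick check of that reading, here is a minimal sketch using Python's standard xml.dom.minidom (one parser among many; the helper name child_count is hypothetical). It counts the child nodes the parser reports for each spelling of an element.

```python
from xml.dom import minidom

def child_count(xml_text: str) -> int:
    """Parse the document and return how many child nodes its root element has."""
    doc = minidom.parseString(xml_text)
    return len(doc.documentElement.childNodes)

# The two representations of an empty element from the spec's examples.
print(child_count("<br></br>"))
print(child_count("<br/>"))
# A single space between the tags is character data, so this one is not zero-length.
print(child_count("<br> </br>"))
```

With this parser the first two forms are indistinguishable after parsing, while the third carries a text node.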

Thilo answered Sep 19 '22 02:09


I wanted to check which variations of "empty" actually are empty. (In the examples below, Space, Tab, CR and LF stand for the literal whitespace characters.)

Variation A

<Santa/>

gives a DOM tree of:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

Variation B

<Santa></Santa>

gives a DOM tree of:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

Variation C

<Santa>Space</Santa>

gives a DOM tree of:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

Variation D

<Santa>Tab</Santa>

gives a DOM tree of:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

Variation E

<Santa>CRLF
</Santa>

gives a DOM tree of:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

All variations of the text give the same DOM tree. When an XML document is asked to serialize itself, the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

results in the serialized text:

<?xml version="1.0"?>
<Santa/>
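Other parsers can behave differently on the whitespace variations. The sketch below uses Python's standard xml.dom.minidom, which is not the parser used for the trees above and which keeps whitespace-only text nodes. The helpers strip_whitespace_text and roundtrip are hypothetical names; the strip option is only an assumption about how a whitespace-discarding parser could be imitated.

```python
from xml.dom import Node, minidom

def strip_whitespace_text(node):
    """Remove whitespace-only text children, recursively."""
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_text(child)

def roundtrip(xml_text: str, strip: bool = False) -> str:
    """Parse, optionally drop whitespace-only text nodes, and serialize the root element."""
    doc = minidom.parseString(xml_text)
    if strip:
        strip_whitespace_text(doc)
    return doc.documentElement.toxml()

variations = ["<Santa/>", "<Santa></Santa>", "<Santa> </Santa>",
              "<Santa>\t</Santa>", "<Santa>\r\n</Santa>"]
for xml_text in variations:
    print(repr(xml_text), repr(roundtrip(xml_text)), repr(roundtrip(xml_text, strip=True)))
```

With strip=True every variation collapses to the same empty element, as in the trees above; without it, minidom reports the whitespace as content, so which variations count as "empty" depends on the parser's whitespace handling.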

Manually adding an empty text node

I wanted to see what happens if I build the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
      |- NODE_TEXT #text ""

using the pseudo-code:

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
santa.appendChild(doc.CreateText(""));

When that DOM document is saved to a stream, it comes out as:

<?xml version="1.0"?>
<Santa/>

Even when the element is forced to have a child (i.e. forced to not be empty), the DOM takes it to be empty.
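The same experiment can be sketched with Python's standard xml.dom.minidom, which is a different DOM implementation from the one behind the pseudo-code, so its serializer may pick a different spelling. The variable names are illustrative only.

```python
from xml.dom import minidom

# Build <Santa> with a single, zero-length text node as its only child.
doc = minidom.Document()
santa = doc.appendChild(doc.createElement("Santa"))
santa.appendChild(doc.createTextNode(""))

# Serialize the element, then parse the result back into a new tree.
serialized = santa.toxml()
reparsed = minidom.parseString(serialized).documentElement

print(repr(serialized))
print(len(santa.childNodes), len(reparsed.childNodes))
```

Whichever spelling the serializer chooses, a start-tag/end-tag pair or an empty-element tag, both denote an empty element, so the zero-length text node is not there after reparsing.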

Force text node with whitespace

And then if I make sure to put some whitespace in the TEXT node:

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
santa.appendChild(doc.CreateText(" "));

It comes out as the XML:

<?xml version="1.0" ?>
<Santa> </Santa>

with the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
      |- NODE_TEXT #text " "

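Interesting; it's not round-trippable: parsing this XML again gives an empty element (as in Variation C), so the whitespace text node is lost.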

Force a TAB CRLF

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
santa.appendChild(doc.CreateText(TAB+LF+CR));

It comes out as the XML:

<?xml version="1.0"?>
<Santa>TAB LF CR</Santa>

with the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
      |- NODE_TEXT #text "\t\n\n"

Yes, XML converts all CR into LF, and yes, it's not round-trippable. If you parse:

<?xml version="1.0"?>
<Santa>TAB LF CR</Santa>

you will get the DOM tree of:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
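The CR-to-LF conversion is part of XML's end-of-line handling on parsing and can be observed in other parsers too. A small sketch with Python's standard xml.dom.minidom (which, unlike the parser above, keeps the whitespace text node; the variable names are illustrative):

```python
from xml.dom import minidom

# TAB, LF and CR written as literal characters inside the element.
source = "<Santa>\t\n\r</Santa>"
text_node = minidom.parseString(source).documentElement.firstChild

# Show the character data as the parser hands it to the DOM.
print(repr(text_node.data))
```

The lone CR reaches the DOM as an LF, which matches the "\t\n\n" text node shown earlier.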

Setting element.text

Finally we come to what happens if you set an element's text through its .text property.

Set no text:

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
//santa.text = ""; example where we don't set the text

gives the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""

and the XML:

<?xml version="1.0"?>
<Santa/>

Setting empty text

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
santa.text = ""; //example where we do set the text

gives the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
      |- NODE_TEXT #text ""

and the XML:

<?xml version="1.0"?>
<Santa/>

Setting single space

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
santa.text = " ";

gives the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
      |- NODE_TEXT #text " "

and the XML:

<?xml version="1.0"?>
<Santa> </Santa>

Setting more whitespace

XmlDocument doc = new XmlDocument();
XmlElement santa = doc.appendChild(doc.CreateElement("Santa"));
santa.text = LF+TAB+CR;

gives the DOM tree:

|- NODE_DOCUMENT #document ""
   |- NODE_ELEMENT Santa ""
      |- NODE_TEXT #text "\n\t\n"

and the XML:

<?xml version="1.0"?>
<Santa>LF TAB LF</Santa>

So what they told you was true, from a certain point of view.

  • an XML string that contains only whitespace in the element will be empty when parsed
  • a DOM element that contains only whitespace in its text node will render the whitespace when converted to an XML string
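
For comparison, Python's standard xml.etree.ElementTree also exposes a .text attribute, so the last set of experiments can be sketched there. This is a different API from the pseudo-code above, and the helper name serialize_with_text is hypothetical.

```python
import xml.etree.ElementTree as ET

def serialize_with_text(text):
    """Create <Santa>, optionally set its text, and return the serialized XML."""
    santa = ET.Element("Santa")
    if text is not None:
        santa.text = text
    return ET.tostring(santa, encoding="unicode")

# No text set, empty text, and a single space.
for value in (None, "", " "):
    print(repr(value), repr(serialize_with_text(value)))

# Unlike the parser used above, this one keeps whitespace-only text when parsing.
print(repr(ET.fromstring("<Santa> </Santa>").text))
```

Here unset and empty text both serialize as an empty-element tag, while whitespace text is written out and, with this particular parser, also survives parsing, so the first bullet above is a property of the parser rather than of XML itself.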

Ian Boyd answered Sep 19 '22 02:09