
What is an XML infoset and in what ways is it different to an XML document?

I've tried to read http://www.w3.org/TR/xml-infoset/ and the Wikipedia entry, but frankly I'm still not sure what the difference is.

This quote from the Wikipedia entry does not seem to make sense:

An XML document has an information set if it is well-formed and satisfies the namespace constraints. There is no requirement for an XML document to be valid in order to have an information set.

How can a non-valid document have any semantics, and thus how can it be an 'information' set?

What is this 'infoset' that XML which is "well-formed and satisfies the namespace constraints" has? And in what way is it useful in itself? In other words, why is it, semantically speaking, necessary to define the XML infoset? Is there any information that cannot be represented in XML? If so, I can see the XML Infoset acting as a limiting set, but if not, surely the XML Infoset is as meaningless as the term 'information'?

Thank you for the interesting answers. I still cannot grasp why the XML infoset has any purpose as opposed to the term infoset, but you guys have given me the direct answer to the question.

Preet Sangha asked May 08 '09 10:05




1 Answer

XML is not text. XML "is" the XML infoset. The infoset may then be serialized into text as an XML document, but it is the XML infoset that is the reality.

The infoset may exist in memory as a DOM tree, for instance. There it exists as an implementation of an abstract object model.

What if I serialized it as UTF-8 and then as UTF-16? Chances are the results would be two different sets of bits, but the same infoset.
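To make the encoding point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree (my choice for illustration, not something from the question). The element name, attribute, and text are made up, and ElementTree exposes only part of the infoset, so the comparison covers just the element's name, attributes, and text.

```python
import xml.etree.ElementTree as ET

# Build one small tree in memory (names and content are hypothetical).
root = ET.Element("greeting", {"lang": "fr"})
root.text = "café"

# Serialize the same tree twice, with different character encodings.
utf8_bytes = ET.tostring(root, encoding="utf-8")
utf16_bytes = ET.tostring(root, encoding="utf-16")


def summarize(element):
    """Return the parts of the element that ElementTree exposes."""
    return (element.tag, dict(element.attrib), element.text)


# The byte sequences differ...
same_bytes = utf8_bytes == utf16_bytes

# ...but parsing either one yields the same element content.
same_info = summarize(ET.fromstring(utf8_bytes)) == summarize(
    ET.fromstring(utf16_bytes)
)

print("identical bytes:", same_bytes)
print("identical parsed content:", same_info)
```

The two byte strings are not equal, yet both parse back to the same tag, attributes, and text, which is the sense in which the serialization is secondary.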

Consider also that with text it makes sense to do things like string concatenation. But you don't want to concatenate a "<" into the middle of an XML element; you have to encode it first. Why would you have to do this if it were just text? If you used the DOM, for instance, you'd just say element.InnerText = "<", and when serialized, the "<" would be encoded as "&lt;". Yet it's the same infoset.
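The same idea can be sketched in Python, assuming the standard xml.etree.ElementTree module as a stand-in for the DOM API mentioned above. The element name `note` and the helper `is_well_formed` are hypothetical, introduced only for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Treating XML as plain text: concatenating a raw "<" breaks the document.
naive = "<note>" + "<" + "</note>"
naive_is_well_formed = is_well_formed(naive)

# Working with the tree instead: assign the character and let the
# serializer decide how to encode it.
element = ET.Element("note")
element.text = "<"
serialized = ET.tostring(element, encoding="unicode")

# Parsing the serialized form gives back the original character.
round_tripped = ET.fromstring(serialized)

print(naive_is_well_formed)  # False
print(serialized)            # <note>&lt;</note>
print(round_tripped.text)    # <
```

The text form contains "&lt;", but the value stored in the tree before serializing and after parsing is the single character "<".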

John Saunders answered Sep 20 '22 16:09