I've tried to read http://www.w3.org/TR/xml-infoset/ and the Wikipedia entry, but frankly I'm still not sure what the difference is.
This quote from the Wikipedia entry does not seem to make sense:
An XML document has an information set if it is well-formed and satisfies the namespace constraints. There is no requirement for an XML document to be valid in order to have an information set.
How can a non-valid document have any semantics, and thus how can it be an 'information' set?
What is this 'infoset' that XML which is "well-formed and satisfies the namespace constraints" has? And in what way is it useful in itself? In other words, why is it, semantically speaking, necessary to define the XML Infoset? Is there any information that cannot be represented in XML? If so, I can see the XML Infoset as a limiting set, but if not, surely the XML Infoset is as meaningless as the term 'information'?
Thank you for the interesting answers. I still cannot grasp why the XML Infoset has any purpose as opposed to the term infoset, but you guys have given me a direct answer to the question.
The XML information set is a description of the information that is available in a well-formed XML document, and it describes an abstract data model of an XML document in terms of a set of information set items.
An attribute information item has the following properties: [namespace name] The namespace name, if any, of the attribute. Otherwise, this property has no value. [local name] The local part of the attribute name. This does not include any namespace prefix or following colon.
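As a rough illustration of those two properties, here is a minimal sketch using Python's standard `xml.dom.minidom`, whose `namespaceURI` and `localName` attributes correspond approximately to [namespace name] and [local name]. The document, the prefix `x`, and the namespace URI `http://example.com/ns` are made up for the example.

```python
from xml.dom import minidom

# Hypothetical document: one namespaced attribute, one plain attribute.
doc = minidom.parseString(
    '<root xmlns:x="http://example.com/ns" x:id="1" plain="2"/>'
)
root = doc.documentElement

# Look up the namespaced attribute by (namespace name, local name).
x_attr = root.getAttributeNodeNS("http://example.com/ns", "id")
# Look up the un-namespaced attribute by its plain name.
plain_attr = root.getAttributeNode("plain")

# The prefix "x" is not part of the local name.
print(x_attr.namespaceURI, x_attr.localName)          # http://example.com/ns id
# An attribute without a prefix has no namespace name.
print(plain_attr.namespaceURI, plain_attr.localName)  # None plain
```

Note that the prefix is only a serialization detail here: changing `x` to any other prefix bound to the same URI would leave both reported values unchanged.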
XML is not text. XML "is" the XML infoset. This may then be serialized into text in an XML document, but it is the XML infoset that is the reality.
The infoset may exist in memory as a DOM tree, for instance. It exists in memory as the implementation of an abstract object model.
What if I serialized it as UTF-8 and then as UTF-16? Chances are the results would be two different sets of bits, but the same infoset.
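To make that concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The element name `greeting`, the attribute, and the `summary` helper are invented for the example, and the comparison only looks at the element name, attributes, and character data, which is a simplification of the full infoset.

```python
import xml.etree.ElementTree as ET

# Build one in-memory tree (hypothetical content).
root = ET.Element("greeting", {"lang": "en"})
root.text = "h\u00e9llo"  # includes a non-ASCII character

# Serialize the same tree in two different encodings.
utf8_bytes = ET.tostring(root, encoding="utf-8")
utf16_bytes = ET.tostring(root, encoding="utf-16")

def summary(data):
    """Parse serialized XML and return the parts we compare (illustrative helper)."""
    elem = ET.fromstring(data)
    return (elem.tag, dict(elem.attrib), elem.text)

same_bytes = utf8_bytes == utf16_bytes
same_content = summary(utf8_bytes) == summary(utf16_bytes)

print(same_bytes)    # False: the byte sequences differ
print(same_content)  # True: both parse back to the same element, attributes, and text
```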
Consider also that with text it makes sense to do things like string concatenation. You don't want to concatenate a "<" into the middle of an XML element. You have to encode it first. Why would you have to do this if it were just text? If you used the DOM, for instance, you'd just say element.InnerText = "<"; When serialized, the "<" would be encoded into "&lt;". Yet it's the same infoset.
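The same point can be sketched in Python with the standard `xml.etree.ElementTree` (standing in here for the .NET DOM; the element name `note` is made up). The object model holds the raw character, and the escaping only appears in the serialized text.

```python
import xml.etree.ElementTree as ET

elem = ET.Element("note")
elem.text = "<"  # assign the raw character; no manual escaping

# Serializing produces markup in which the character is escaped.
serialized = ET.tostring(elem, encoding="unicode")
print(serialized)  # <note>&lt;</note>

# Parsing the text back recovers the original character.
reparsed = ET.fromstring(serialized)
print(reparsed.text)  # <
```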