The javadoc for the <code>Document</code> class has the following note under <code>getElementById</code>: <blockquote> Note: Attributes with the name "ID" or "id" are not of type ID unless so defined. </blockquote> So, I read an XHTML doc into the DOM (using Xerces 2.9.1). The doc has a plain old <code>&lt;p id='fribble'&gt;</code> in it. I call <code>getElementById("fribble")</code>, and it returns null. I use XPath to get <code>//*[@id='fribble']</code>, and all is well. So, the question is: what causes the <code>DocumentBuilder</code> to actually mark ID attributes as 'so defined'?


Java XML DOM: how are id Attributes special?

2 Answers

These attributes are special because of their type and not because of their name.

IDs in XML

Although it is easy to think of attributes as name="value" with the value being a simple string, that is not the full story: every attribute also has an attribute type associated with it.

This is easy to appreciate when there is an XML Schema involved, since XML Schema supports datatypes for both XML elements and XML attributes. The XML attributes are defined to be of a simple type (e.g. xs:string, xs:integer, xs:dateTime, xs:anyURI). The attributes being discussed here are defined with the xs:ID built-in datatype (see section 3.3.8 of the XML Schema Part 2: Datatypes).

    <xs:element name="foo">
      <xs:complexType>
        ...
        <xs:attribute name="bar" type="xs:ID"/>
        ...
      </xs:complexType>
    </xs:element>

Although DTDs don't support the rich datatypes of XML Schema, they do support a limited set of attribute types (defined in section 3.3.1 of XML 1.0). The attributes being discussed here are declared with an attribute type of ID.

<!ATTLIST foo  bar ID #IMPLIED>

With either the above XML Schema or DTD, the following element will be identified by the ID value of "xyz".

<foo bar="xyz"/>

Without knowing the XML Schema or DTD, there is no way to tell what is an ID and what is not:

Attributes with the name of "id" do not necessarily have an attribute type of ID; and
Attributes with names that are not "id" might have an attribute type of ID!

To improve this situation, xml:id was subsequently invented (see the xml:id W3C Recommendation). This is an attribute that always has the same prefix and name, and is intended to be treated as an attribute with attribute type ID. However, whether it actually is treated that way depends on whether the parser being used is aware of xml:id. Since many parsers were initially written before xml:id was defined, it might not be supported.

IDs in Java

In Java, getElementById() finds elements by looking for attributes of type ID, not for attributes with the name of "id".

In the above example, getElementById("xyz") will return that foo element, even though the name of the attribute on it is not "id" (assuming the DOM knows that bar has an attribute type of ID).

So how does the DOM know what attribute type an attribute has? There are three ways:

1. Provide an XML Schema to the parser
2. Provide a DTD to the parser
3. Explicitly tell the DOM that a particular attribute is to be treated as having attribute type ID.
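As a minimal sketch of the DTD route, the following parses a small document whose internal DTD subset declares foo/@bar with attribute type ID. The class name DtdIdExample and its parse helper are illustrative only, and the sketch assumes the JAXP parser bundled with the JDK, which reads the internal subset even when validation is switched off.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DtdIdExample {
    // Internal DTD subset declaring foo/@bar with attribute type ID.
    static final String XML =
          "<!DOCTYPE foo [\n"
        + "  <!ELEMENT foo EMPTY>\n"
        + "  <!ATTLIST foo bar ID #IMPLIED>\n"
        + "]>\n"
        + "<foo bar=\"xyz\"/>";

    // Parses the given XML text with a default (non-validating) builder.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        // The lookup goes by attribute type, so the attribute name "bar" does not matter.
        Element found = doc.getElementById("xyz");
        System.out.println(found == null ? "not found" : "found <" + found.getTagName() + ">");
    }
}
```

Removing the ATTLIST line from the DTD is a quick way to see the lookup stop working, since nothing then tells the parser that bar is an ID.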

The third option is done using the setIdAttribute(), setIdAttributeNS() or setIdAttributeNode() methods of the org.w3c.dom.Element interface.

    Document doc;
    Element fooElem;

    doc = ...;     // load XML document instance
    fooElem = ...; // locate the element node "foo" in doc

    fooElem.setIdAttribute("bar", true); // without this, 'found' would be null

    Element found = doc.getElementById("xyz");

This has to be done for each element node that carries an attribute of this kind. There is no simple built-in method to make all occurrences of attributes with a given name (e.g. "id") be of attribute type ID.
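A small helper that walks the tree can fill that gap. The sketch below is one possible approach, not a standard API: MarkIdAttributes and markAll are hypothetical names, and it assumes the attribute has no namespace. Note that setIdAttribute() does not check that the values are unique.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MarkIdAttributes {
    // Marks every attribute called attrName as an ID; returns how many were marked.
    static int markAll(Document doc, String attrName) {
        NodeList all = doc.getElementsByTagName("*"); // "*" matches every element
        int marked = 0;
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute(attrName)) {   // setIdAttribute throws if the attribute is absent
                e.setIdAttribute(attrName, true);
                marked++;
            }
        }
        return marked;
    }

    // Parses XML text with a default builder (no DTD, no schema).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body><p id=\"fribble\">text</p></body></html>");
        System.out.println("before: " + doc.getElementById("fribble"));
        markAll(doc, "id");
        System.out.println("after:  " + doc.getElementById("fribble"));
    }
}
```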

This third approach is only useful when the code calling getElementById() is separate from the code that creates the DOM. If it were the same code, it would already have located the element in order to set the ID attribute, so it would be unlikely to need getElementById().

Also, be aware that these methods were not in the original DOM specification: the setIdAttribute methods were added in DOM Level 3, and getElementById() was introduced in DOM Level 2.

IDs in XPath

The XPath in the original question gave a result because it was only matching the attribute name.

To match on attribute type ID values, the XPath id function needs to be used (it is one of the Node Set Functions from XPath 1.0):

id("xyz")

If that had been used, the XPath would have given the same result as getElementById() (i.e. no match found).
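The difference between matching by name and matching by type can be sketched with the javax.xml.xpath API. XPathIdExample and its select helper are illustrative names, and the result for id() assumes the JDK's built-in XPath engine, which resolves IDs through the DOM.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class XPathIdExample {
    // Same element twice: once with no DTD, once with bar declared as type ID.
    static final String PLAIN = "<foo bar=\"xyz\"/>";
    static final String WITH_DTD =
          "<!DOCTYPE foo [<!ELEMENT foo EMPTY><!ATTLIST foo bar ID #IMPLIED>]>"
        + "<foo bar=\"xyz\"/>";

    // Parses xml and returns the first node selected by expr, or null if none.
    static Node select(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        // Matches on the attribute name, so no type information is needed.
        System.out.println("by name, no DTD: " + select(PLAIN, "//*[@bar='xyz']"));
        // Matches on attribute type ID, so it depends on the DTD.
        System.out.println("id(), no DTD:    " + select(PLAIN, "id('xyz')"));
        System.out.println("id(), with DTD:  " + select(WITH_DTD, "id('xyz')"));
    }
}
```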

IDs in XML continued

Two important features of ID should be highlighted.

Firstly, the values of all attributes of attribute type ID must be unique within the whole XML document. In the following example, if personId and companyId both have attribute type ID, it would be an error to add another company with a companyId of id24601, because that would duplicate an existing ID value. Even though the attribute names are different, it is the attribute type that matters.

    <test1>
      <person personId="id24600">...</person>
      <person personId="id24601">...</person>
      <company companyId="id12345">...</company>
      <company companyId="id12346">...</company>
    </test1>
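The uniqueness rule is only enforced when the parser validates. As a sketch under that assumption, the following turns on DTD validation and collects the errors reported. IdUniquenessExample, the DTD text and the element content are made up for illustration, and the exact error messages depend on the parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class IdUniquenessExample {
    // Illustrative DTD: personId and companyId are both of attribute type ID.
    static final String DTD =
          "<!DOCTYPE test1 [\n"
        + "  <!ELEMENT test1 (person | company)*>\n"
        + "  <!ELEMENT person (#PCDATA)>\n"
        + "  <!ELEMENT company (#PCDATA)>\n"
        + "  <!ATTLIST person personId ID #REQUIRED>\n"
        + "  <!ATTLIST company companyId ID #REQUIRED>\n"
        + "]>\n";

    // Parses with DTD validation on and returns the validation error messages.
    static List<String> validate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String ok = DTD + "<test1><person personId=\"id24601\">p</person>"
                        + "<company companyId=\"id12345\">c</company></test1>";
        // Same ID value on a differently named attribute: still a duplicate.
        String dup = DTD + "<test1><person personId=\"id24601\">p</person>"
                         + "<company companyId=\"id24601\">c</company></test1>";
        System.out.println("unique IDs:    " + validate(ok));
        System.out.println("duplicate IDs: " + validate(dup));
    }
}
```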

Secondly, the attributes are defined on elements rather than on the entire XML document, so attributes with the same name on different elements might have different attribute types. In the following example XML document, if only alpha/@bar has attribute type ID (and no other attribute does), getElementById("xyz") will return an element, but getElementById("abc") will not (since beta/@bar is not of attribute type ID). Also, it is not an error for gamma/@bar to have the same value as alpha/@bar: that value is not considered when checking the uniqueness of IDs in the XML document, because it is not of attribute type ID.

    <test2>
      <alpha bar="xyz"/>
      <beta bar="abc"/>
      <gamma bar="xyz"/>
    </test2>
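This per-element behaviour can be tried out with a DTD that declares the ID type on alpha only. The sketch below uses an illustrative class name (PerElementIdExample) and a non-validating parse, so the missing element declarations are not reported.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class PerElementIdExample {
    // Only alpha/@bar is declared as type ID; beta/@bar and gamma/@bar are untyped.
    static final String XML =
          "<!DOCTYPE test2 [\n"
        + "  <!ATTLIST alpha bar ID #IMPLIED>\n"
        + "]>\n"
        + "<test2>\n"
        + "  <alpha bar=\"xyz\"/>\n"
        + "  <beta bar=\"abc\"/>\n"
        + "  <gamma bar=\"xyz\"/>\n"
        + "</test2>";

    // Parses XML text with a default (non-validating) builder.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        Element xyz = doc.getElementById("xyz"); // alpha/@bar is an ID
        Element abc = doc.getElementById("abc"); // beta/@bar is not
        System.out.println("xyz -> " + (xyz == null ? null : xyz.getTagName()));
        System.out.println("abc -> " + (abc == null ? null : abc.getTagName()));
    }
}
```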


answered Sep 20 '22 23:09

Hoylen

For the getElementById() call to work, the Document has to know the types of its nodes, and the target node must be of the XML ID type for the method to find it. It knows about the types of its elements via an associated schema. If the schema is not set, or does not declare the id attribute to be of the XML ID type, getElementById() will never find it.

My guess is that your document doesn't know the p element's id attribute is of the XML ID type (is it?). You can navigate to the node in the DOM using getChildNodes() and other DOM-traversal functions, and try calling Attr.isId() on the id attribute to tell for sure.
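A quick check along those lines might look like the following sketch. IsIdExample and isIdAttribute are hypothetical names, and the document is a stand-in for the XHTML in the question, parsed without any DTD or schema.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class IsIdExample {
    // True only if the element has the named attribute and the DOM regards it as type ID.
    static boolean isIdAttribute(Element element, String name) {
        Attr attr = element.getAttributeNode(name);
        return attr != null && attr.isId();
    }

    // Parses XML text with a default builder (no DTD, no schema).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<html><body><p id=\"fribble\">text</p></body></html>");
        Element p = (Element) doc.getElementsByTagName("p").item(0);
        System.out.println("isId as parsed:            " + isIdAttribute(p, "id"));
        p.setIdAttribute("id", true); // explicitly declare the attribute to be an ID
        System.out.println("isId after setIdAttribute: " + isIdAttribute(p, "id"));
    }
}
```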

From the getElementById javadoc:

The DOM implementation is expected to use the attribute Attr.isId to determine if an attribute is of type ID.

Note: Attributes with the name "ID" or "id" are not of type ID unless so defined.

If you are using a DocumentBuilder to parse your XML into a DOM, be sure to call setSchema(schema) on the DocumentBuilderFactory before calling newDocumentBuilder(), to ensure that the builder you get from the factory is aware of element types.
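A minimal sketch of that setup, using an inline schema that declares foo/@bar as xs:ID, could look like this. SchemaIdExample and parseWithSchema are illustrative names. The sketch assumes the JDK's bundled parser and makes the factory namespace-aware, which schema processing needs; whether the ID type reaches the DOM may vary with other parser implementations.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class SchemaIdExample {
    // Inline schema for illustration: foo/@bar has the built-in type xs:ID.
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n"
        + "  <xs:element name=\"foo\">\n"
        + "    <xs:complexType>\n"
        + "      <xs:attribute name=\"bar\" type=\"xs:ID\"/>\n"
        + "    </xs:complexType>\n"
        + "  </xs:element>\n"
        + "</xs:schema>";

    // Compiles the schema, sets it on the factory, then creates the builder and parses.
    static Document parseWithSchema(String xsd, String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema); // must happen before newDocumentBuilder()
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWithSchema(XSD, "<foo bar=\"xyz\"/>");
        Element found = doc.getElementById("xyz");
        System.out.println(found == null ? "not found" : "found <" + found.getTagName() + ">");
    }
}
```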

answered Sep 20 '22 23:09

Tom Tresansky
