
How to document the structure of XML files

When it comes to documenting the structure of XML files...

One of my co-workers does it in a Word table.

Another pastes the elements into a Word document with comments like this:

    <learningobject id="{Learning Object Id (same value as the loid tag)}"
                    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                    xsi:noNamespaceSchemaLocation="http://www.aicpcu.org/schemas/cms_lo.xsd">
        <objectRoot>
            <v>
                <!-- Current version of the object from the repository. !-->
                <!-- (Occurrence: 1) -->
            </v>
            <label>
                <!-- Name of the object from the repository. !-->
                <!-- (Occurrence: 0 or 1 or Many) -->
            </label>
        </objectRoot>
    </learningobject>

Which one of these methods is preferred? Is there a better way?

Are there other options that do not require third-party schema documenter tools to keep the documentation up to date?

asked Nov 17 '09 23:11 by joe


2 Answers

I'd write an XML Schema (XSD) file to define the structure of the XML document. xs:annotation and xs:documentation tags can be included to describe the elements. The XSD file can be transformed into documentation using XSLT stylesheets such as xs3p or tools such as XML Schema Documenter.

For an introduction to XML Schema, see the W3Schools XML Schema tutorial.

Here is your example, expressed as XML Schema with xs:annotation tags:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <!-- Element names are case-sensitive: objectRoot, as in the sample XML -->
      <xs:element name="objectRoot">
        <xs:complexType>
          <xs:sequence>

            <xs:element name="v" type="xs:string">
              <xs:annotation>
                <xs:documentation>Current version of the object from the repository.</xs:documentation>
              </xs:annotation>
            </xs:element>

            <xs:element name="label" minOccurs="0" maxOccurs="unbounded" type="xs:string">
              <xs:annotation>
                <xs:documentation>Name of the object from the repository.</xs:documentation>
              </xs:annotation>
            </xs:element>

          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
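
If you would rather not depend on a third-party documenter, the annotations can also be pulled out with a few lines of script. The following is only a minimal sketch using Python's standard library; the helper name describe_elements is made up for illustration, and the schema is inlined (with the root element spelled objectRoot, as in the question's XML) so the example runs on its own.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Copy of the schema from this answer, inlined so the sketch is self-contained.
XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="objectRoot">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="v" type="xs:string">
          <xs:annotation>
            <xs:documentation>Current version of the object from the repository.</xs:documentation>
          </xs:annotation>
        </xs:element>
        <xs:element name="label" minOccurs="0" maxOccurs="unbounded" type="xs:string">
          <xs:annotation>
            <xs:documentation>Name of the object from the repository.</xs:documentation>
          </xs:annotation>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def describe_elements(xsd_text):
    """Return (name, occurrence, documentation) for every named xs:element."""
    root = ET.fromstring(xsd_text)
    rows = []
    for el in root.iter(XS + "element"):
        name = el.get("name")
        if name is None:  # skip <xs:element ref="..."/> references
            continue
        low = el.get("minOccurs", "1")   # XSD default is 1
        high = el.get("maxOccurs", "1")  # XSD default is 1
        # Only an annotation that is a direct child of this element counts.
        doc = el.find(XS + "annotation/" + XS + "documentation")
        text = " ".join(doc.text.split()) if doc is not None and doc.text else ""
        rows.append((name, f"{low}..{high}", text))
    return rows


for name, occurs, text in describe_elements(XSD):
    print(f"{name:12} {occurs:14} {text}")
```

The occurrence column simply echoes minOccurs/maxOccurs with their defaults, so for a top-level element such as objectRoot the "1..1" is a placeholder rather than a real constraint.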
answered Sep 23 '22 06:09 by Phil Ross


Enjoy RELAX NG compact syntax

After experimenting with various XML schema languages, I have found RELAX NG to be the best fit in most cases (reasoning at the end).

Requirements

  • Allow documenting XML document structure
  • Do it in readable form
  • Keep it simple for the author

Modified sample XML (doc.xml)

I have added one attribute to show how attributes are documented as well.

    <objectRoot created="2015-05-06T20:46:56+02:00">
        <v>
            <!-- Current version of the object from the repository. !-->
            <!-- (Occurrence: 1) -->
        </v>
        <label>
            <!-- Name of the object from the repository. !-->
            <!-- (Occurrence: 0 or 1 or Many) -->
        </label>
    </objectRoot>

Use RELAX NG Compact syntax with comments (schema.rnc)

RELAX NG lets you describe the structure of the sample XML as follows:

    start =

    ## Container for one object
    element objectRoot {

        ## datetime of object creation
        attribute created { xsd:dateTime },

        ## Current version of the object from the repository
        ## Occurrence 1 is assumed by default
        element v {
            text
        },

        ## Name of the object from the repository
        ## Note: the occurrence is denoted by the "*" and means 0 or more
        element label {
            text
        }*
    }

I think it is very hard to beat this simplicity at the same level of expressiveness.

How to comment the structure

  • always place the comment before the relevant element, not after it.
  • for readability, put one blank line before the comment block.
  • use the ## prefix, which is automatically translated into a documentation element in other schema formats. A single hash # is translated into an XML comment, not a documentation element.
  • multiple consecutive comments (as in the example) turn into a single multi-line documentation string within one element.
  • obvious fact: the inline XML comments in doc.xml are irrelevant; only what is in schema.rnc counts.
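
Because the ## comments follow such a regular pattern, they are also easy to harvest without any schema tooling. Here is a rough sketch in Python: it is a naive line-based scan, not a real RELAX NG parser, and the function name doc_comments is invented for this example. It assumes each declaration starts a line with element or attribute and that its ## lines sit directly above it.

```python
import re

# The schema from above, inlined so the sketch runs on its own.
RNC = """\
start =

## Container for one object
element objectRoot {

    ## datetime of object creation
    attribute created { xsd:dateTime },

    ## Current version of the object from the repository
    ## Occurrence 1 is assumed by default
    element v {
        text
    },

    ## Name of the object from the repository
    ## Note: the occurrence is denoted by the "*" and means 0 or more
    element label {
        text
    }*
}
"""

# Matches "element NAME" or "attribute NAME" at the start of a stripped line.
DECL = re.compile(r"^(element|attribute)\s+([\w.:-]+)")


def doc_comments(rnc_text):
    """Map each declared element/attribute name to the ## lines above it."""
    docs = {}
    pending = []
    for raw in rnc_text.splitlines():
        line = raw.strip()
        if line.startswith("##"):
            pending.append(line[2:].strip())
            continue
        match = DECL.match(line)
        if match and pending:
            docs[match.group(2)] = "\n".join(pending)
        # Any other line (blank, single-# comment, braces) ends the block.
        pending = []
    return docs


for name, text in doc_comments(RNC).items():
    print(name)
    for doc_line in text.splitlines():
        print("    " + doc_line)
```

Single-hash comments are deliberately ignored, mirroring the rule that only ## becomes documentation.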

If XML Schema 1.0 is required, generate it (schema.xsd)

Assuming you have the open-source tool trang available, you can create an XML Schema file as follows:

$ trang schema.rnc schema.xsd 

The resulting schema looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
      <xs:element name="objectRoot">
        <xs:annotation>
          <xs:documentation>Container for one object</xs:documentation>
        </xs:annotation>
        <xs:complexType>
          <xs:sequence>
            <xs:element ref="v"/>
            <xs:element minOccurs="0" maxOccurs="unbounded" ref="label"/>
          </xs:sequence>
          <xs:attribute name="created" use="required" type="xs:dateTime">
            <xs:annotation>
              <xs:documentation>datetime of object creation</xs:documentation>
            </xs:annotation>
          </xs:attribute>
        </xs:complexType>
      </xs:element>
      <xs:element name="v" type="xs:string">
        <xs:annotation>
          <xs:documentation>Current version of the object from the repository Occurrence 1 is assumed by default</xs:documentation>
        </xs:annotation>
      </xs:element>
      <xs:element name="label" type="xs:string">
        <xs:annotation>
          <xs:documentation>Name of the object from the repository Note: the occurrence is denoted by the "*" and means 0 or more</xs:documentation>
        </xs:annotation>
      </xs:element>
    </xs:schema>

Now clients who insist on using only XML Schema 1.0 can use your XML document specification too.

Validating doc.xml against schema.rnc

There are open-source tools such as jing and rnv that support the RELAX NG Compact syntax and run on both Linux and MS Windows.

Note: these tools are rather old but very stable. Read that as a sign of maturity, not of obsolescence.

Using jing:

$ jing -c schema.rnc doc.xml 

The -c flag is important: by default, jing assumes RELAX NG in its XML form.

Using rnv to check that schema.rnc itself is valid:

$ rnv -c schema.rnc 

and to validate doc.xml:

$ rnv schema.rnc doc.xml 

rnv allows validating multiple documents at once:

$ rnv schema.rnc doc.xml otherdoc.xml anotherone.xml 
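
If you want to run these checks from a script (for example before committing a changed schema), the commands above can be wrapped in a few lines of Python. This is just a sketch: the function names are invented here, and it assumes both tools are on the PATH and report validation failure through a non-zero exit status.

```python
import shutil
import subprocess


def jing_command(schema, *documents):
    """Build the jing call; -c selects the compact syntax."""
    return ["jing", "-c", schema, *documents]


def rnv_command(schema, *documents):
    """Build the rnv call; several documents may follow the schema."""
    return ["rnv", schema, *documents]


def validate(command):
    """Run a validator; return True/False, or None if it is not installed."""
    if shutil.which(command[0]) is None:
        return None
    # Assumption: a non-zero exit status means the validation failed.
    return subprocess.run(command).returncode == 0


if __name__ == "__main__":
    for cmd in (jing_command("schema.rnc", "doc.xml"),
                rnv_command("schema.rnc", "doc.xml", "otherdoc.xml")):
        print(" ".join(cmd), "->", validate(cmd))
```

Returning None for a missing tool keeps "not installed" apart from "document is invalid", which matters if only one of the two validators is available on a given machine.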

RELAX NG Compact syntax - pros

  • very readable; even a newcomer should understand the text
  • easy to learn (RELAX NG comes with a good tutorial, and one can learn most of it within a day)
  • very flexible (although it looks simple, it covers many situations, some of which cannot be handled by XML Schema 1.0 at all)
  • tools exist for converting it into other formats (RELAX NG XML form, XML Schema 1.0, DTD) and even for generating sample XML documents

RELAX NG limitations

  • multiplicity can only be "zero or one", "exactly one", "zero or more" or "one or more". (Other multiplicities for a small number of elements can be described by "stupid repetition" of "zero or one" definitions.)
  • there are XML Schema 1.0 constructs that cannot be described in RELAX NG.
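
To make the "stupid repetition" concrete: a bounded multiplicity such as "1 to 3" is written as the required copies followed by optional ones. The small Python helper below merely generates that compact-syntax text; the name bounded and the element name item are hypothetical, and the pattern passed in should be a single element pattern (or be wrapped in parentheses).

```python
def bounded(pattern, low, high):
    """Expand `pattern` into an RNC sequence matching low..high occurrences."""
    if not 0 <= low <= high or high == 0:
        raise ValueError("need 0 <= low <= high and high >= 1")
    # `low` required copies, then (high - low) optional copies marked with "?".
    parts = [pattern] * low + [pattern + "?"] * (high - low)
    return ", ".join(parts)


print(bounded("element item { text }", 1, 3))
# element item { text }, element item { text }?, element item { text }?
```

This stays readable for small bounds only; for something like "up to 50" a comment in the schema is probably the more honest option.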

Conclusions

For the requirements defined above, RELAX NG Compact syntax looks like the best fit. With RELAX NG you get both: a human-readable schema that is also usable for automated validation.

The existing limitations rarely come into play and can in many cases be worked around with comments or other means.

answered Sep 26 '22 06:09 by Jan Vlcinsky