As an amateur software developer (I'm still in academia), I've written a few schemas for XML documents. I routinely run into design flubs that produce ugly-looking XML documents because I'm not entirely certain what the semantics of XML are.
My assumptions:
<property> value </property>
property = value
<property attribute="attval"> value </property>
A property with a special descriptor, the attribute.
<parent> <child> value </child> </parent>
The parent has a characteristic "child" which has the value "value."
<tag />
"Tag" is a flag or it directly translates to text. I'm not sure on this one.
<parent> <child /> </parent>
"child" describes "parent." "child" is a flag or boolean. I'm not sure on this one, either.
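To see what a parser actually reports for each of these shapes, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The `snippets` dictionary and the `describe` helper are hypothetical names for illustration. The parser only exposes tag, text, attributes and children; it does not say whether something is a "flag" or a "characteristic", because that meaning comes from your schema or application.

```python
import xml.etree.ElementTree as ET

# Each snippet from the list of assumptions above (hypothetical keys).
snippets = {
    "text": "<property> value </property>",
    "attribute": '<property attribute="attval"> value </property>',
    "child": "<parent> <child> value </child> </parent>",
    "empty": "<tag />",
    "empty_child": "<parent> <child /> </parent>",
}

def describe(xml_text):
    """Return what the parser exposes for the root element of xml_text."""
    root = ET.fromstring(xml_text)
    return {
        "tag": root.tag,
        # Text before the first child; whitespace is kept, None if there is none.
        "text": root.text,
        "attrib": dict(root.attrib),
        # Tags of the direct child elements, in document order.
        "children": [child.tag for child in root],
    }

for name, xml_text in snippets.items():
    print(name, describe(xml_text))
```

Note that `<tag />` comes back with no text at all, and `<parent> <child /> </parent>` only tells you that a `child` element is present. Treating that presence as a boolean is a convention your schema has to define, not something XML itself implies.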
Ambiguity arises if you want to do something like representing Cartesian coordinates:
<coordinate x="0" y="1" />
<coordinate> 0,1 </coordinate>
<coordinate> <x> 0 </x> <y> 1 </y> </coordinate>
Which one of these options is most correct? I would lean towards the third based upon my current conception of XML schema design, but I really don't know.
What are some resources that succinctly describe how to design XML schemas effectively?
XML data must be designed and structured carefully to ensure that it is accurate, flexible, performant and reusable. Design decisions can seriously impact the quality, usability and shelf life of XML applications.
One general (but important!) recommendation is never to store multiple logical pieces of data in a single node (be it a text node or an attribute node). Otherwise, you end up needing your own parsing logic on top of the XML parsing logic you normally get for free from your framework.
So in your coordinate example,
<coordinate x="0" y="1" />
and
<coordinate> <x>0</x> <y>1</y> </coordinate>
are both reasonable to me.
But
<coordinate> 0,1 </coordinate>
isn't very good, because it stores two logical pieces of data (the X-coordinate and the Y-coordinate) in a single XML node, forcing the consumer to parse the data outside of their XML parser. And while splitting a string on a comma is pretty simple, there are still ambiguities, such as what should happen if there's an extra comma at the end.
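The difference is easy to see in code. This is a minimal sketch using Python's standard-library `xml.etree.ElementTree`; the three function names are hypothetical. The attribute and child-element forms are read entirely through the parser, while the packed form needs hand-written splitting plus a decision about malformed input such as a trailing comma.

```python
import xml.etree.ElementTree as ET

def from_attributes(xml_text):
    # <coordinate x="0" y="1" />: each value lives in its own attribute node.
    root = ET.fromstring(xml_text)
    return int(root.get("x")), int(root.get("y"))

def from_children(xml_text):
    # <coordinate><x>0</x><y>1</y></coordinate>: each value is its own element.
    root = ET.fromstring(xml_text)
    # int() tolerates surrounding whitespace such as " 0 ".
    return int(root.findtext("x")), int(root.findtext("y"))

def from_packed_text(xml_text):
    # <coordinate> 0,1 </coordinate>: the parser only hands back " 0,1 ",
    # so splitting it into X and Y is extra, hand-written parsing logic.
    parts = ET.fromstring(xml_text).text.split(",")
    if len(parts) != 2:
        # e.g. "0,1," splits into ['0', '1', '']; this sketch chooses to reject it.
        raise ValueError(f"expected exactly two values, got {parts!r}")
    return int(parts[0]), int(parts[1])

print(from_attributes('<coordinate x="0" y="1" />'))
print(from_children("<coordinate> <x> 0 </x> <y> 1 </y> </coordinate>"))
print(from_packed_text("<coordinate> 0,1 </coordinate>"))
```

Rejecting the trailing comma is an arbitrary choice made here for illustration; the point is that the first two forms never force you to make that choice, and in a schema their `x` and `y` values can each be given their own numeric type.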
See the tutorial:
I also recommend:
Priscilla Walmsley's book "Definitive XML Schema".
Jeni Tennison's XML Schema pages