
New and improved with clarification: XML feed design best practice for structured data when there is no pre-existing DTD/Schema

When designing an XML feed for structured data, what is good practice, and what anti-patterns are there?

I'd like answers that cover XML structure and content, and/or transport mechanisms.

Transport Mechanisms

With current technologies is FTP/SFTP a good technology? Are there cases where it is the best fit as a solution?

Generally I prefer HTTP pull feeds, but what weaknesses does using HTTP have?

What other feed mechanisms should be considered with their pros and cons?

XML Structure Content

When no suitable DTD/schema already exists, what practices can be followed to come up with a good XML design?

I have already given two anti-patterns for this in my own answer.

But what should I be doing when designing a feed? I'd like to hear about tags vs attributes, how relational data (esp. many-to-many relationships) should be conveyed in XML, etc.

Note: I have completely rewritten the question, as even with the bounty offered it wasn't getting a lot of love. (The old version is in the edit history if you want to see it. This version should be pertinent to the answers already given)

DanSingerman asked Mar 12 '09 10:03


4 Answers

A good feed:

1) Has a schema, because that way you can check it programmatically and you know when it's been changed. This saves lots of arguments.

2) Tells you when it's down

3) Works consistently

4) Handles stops, starts, pause and rewind gracefully

5) Has a test service that fully exercises all the existing feed features

6) Has a new-features service for sandbox development

Realistically I've only worked with feeds that deliver 1 and sometimes 2, but we can dream.

MrTelly answered Nov 12 '22 07:11


Without a DTD/schema you have no way of knowing whether a feed is valid until your code encounters a problem. So for me schemas are very important, both as an XML consumer and as a producer.

Even a simple schema is useful: it defines the elements, how many times they occur, and so on. A detailed schema, with restrictions or enumerations as needed, is even nicer. When I have those I can minimise the number of errors in the XML I produce, or I can validate the whole file if it's sent to me and reject it as non-compliant if necessary. It's just a neat, standard way of performing input validation.
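To make the "which elements, how many times" idea concrete, here is a rough Python sketch of the kind of rule a simple schema expresses. It is not real DTD/XSD validation (the Python standard library has no schema validator), and the `<feed>`/`<item>` layout and the `ITEM_RULES` table are assumptions invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for one <item>: child tag -> (min occurrences, max or None).
ITEM_RULES = {"id": (1, 1), "title": (1, 1), "category": (0, None)}


def validate_item(item):
    """Return a list of human-readable problems found in one <item> element."""
    counts = {}
    for child in item:
        counts[child.tag] = counts.get(child.tag, 0) + 1

    problems = []
    for tag in counts:
        if tag not in ITEM_RULES:
            problems.append(f"unexpected element <{tag}>")
    for tag, (lowest, highest) in ITEM_RULES.items():
        seen = counts.get(tag, 0)
        if seen < lowest:
            problems.append(f"<{tag}> occurs {seen} times, expected at least {lowest}")
        if highest is not None and seen > highest:
            problems.append(f"<{tag}> occurs {seen} times, expected at most {highest}")
    return problems


def validate_feed(xml_text):
    """Check every <item> under the root and collect all problems."""
    root = ET.fromstring(xml_text)
    problems = []
    for index, item in enumerate(root.findall("item")):
        problems.extend(f"item {index}: {p}" for p in validate_item(item))
    return problems
```

An empty list means the feed passed. In practice a published schema plus an off-the-shelf validator does this job better, since both producer and consumer can run exactly the same check.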

blowdart answered Nov 12 '22 07:11


It's a good question, but I don't know how much further it goes than schema good, !schema bad.

I've had to consume feeds which failed to provide or provided broken schemas and realistically all you can do is transform those into namespace-less clones, which is workable but risky as hell.

I18N and especially number formats and datestamps are a massive problem. Best practice is of course declaring your format in the doc, and preferably defaulting to UTC time.
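As a small illustration of defaulting to UTC, here is a Python sketch that writes and reads timestamps in one fixed, documented format. The function names are made up for the example, and the `YYYY-MM-DDTHH:MM:SSZ` layout is just one reasonable choice, not something the question prescribes.

```python
from datetime import datetime, timezone

# One fixed format for the whole feed; the trailing Z marks UTC.
FEED_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_feed_timestamp(moment):
    """Serialise an aware datetime as a UTC string for the feed."""
    if moment.tzinfo is None:
        # A naive datetime has no zone, so its UTC equivalent is a guess.
        raise ValueError("refusing to serialise a naive datetime")
    return moment.astimezone(timezone.utc).strftime(FEED_TIME_FORMAT)


def from_feed_timestamp(text):
    """Parse a feed timestamp back into an aware UTC datetime."""
    return datetime.strptime(text, FEED_TIME_FORMAT).replace(tzinfo=timezone.utc)
```

Conversion to a reader's local time then happens only at display time, so the feed itself never carries ambiguous local times.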

The only other good practice I can suggest is this: when consuming multiple feeds that need to interact, don't try to deal with them on their own terms. Instead, the first thing to do is deserialise them to a standard object or transform them to a standard internal schema.
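A minimal Python sketch of that normalisation step follows. Both supplier formats, the `Product` class and the parser names are hypothetical; the point is only that each feed gets its own small adapter and everything downstream sees a single type.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Product:
    """Standard internal representation shared by all feeds."""
    sku: str
    name: str
    price_cents: int


def parse_supplier_a(xml_text):
    # Hypothetical feed A: <product sku="X1"><name>..</name><price>12.50</price></product>
    root = ET.fromstring(xml_text)
    return [
        Product(
            sku=node.get("sku"),
            name=node.findtext("name"),
            # Decimal avoids float rounding when converting to whole cents.
            price_cents=int(Decimal(node.findtext("price")) * 100),
        )
        for node in root.iter("product")
    ]


def parse_supplier_b(xml_text):
    # Hypothetical feed B: <item code="X1" label=".." pence="1250"/>
    root = ET.fromstring(xml_text)
    return [
        Product(
            sku=node.get("code"),
            name=node.get("label"),
            price_cents=int(node.get("pence")),
        )
        for node in root.iter("item")
    ]
```

Code that merges or compares the feeds then works on `Product` values and never needs to know which supplier layout a record came from.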

annakata answered Nov 12 '22 07:11


Without knowing your real requirements, it is difficult to make recommendations for transport mechanisms or styles. For instance, if you're doing pull-based syndication, HTTP can offer features that assist with caching. If you're doing push-based delivery or publish/subscribe, protocols like XMPP could be used.
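One of the HTTP caching features is the conditional GET: the consumer sends back the `ETag` or `Last-Modified` value from its previous fetch and the server answers `304 Not Modified` if nothing changed. A rough Python sketch with the standard library is below; the function names are invented for the example, and it assumes the feed server actually emits those headers.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request carrying the validators from the previous fetch."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None when the feed is unchanged."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        if error.code == 304:
            # Not modified: keep using the cached copy and the old validators.
            return None, etag, last_modified
        raise
```

The caller stores the returned validators alongside its cached copy and passes them in on the next poll, so an unchanged feed costs only a short exchange of headers.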

For your feed itself, I'd recommend sticking to a public specification such as Atom (or maybe an RSS variant if you want). Atom incorporates some of the items you mentioned such as encoding content and date formats (using UTC is easiest in most cases, then convert to a user's local time for display). By sticking to standard formats, you also allow use of feed parsers that support that spec.

Atom and RSS are flexible enough to allow you to define your own XML namespaces to add whatever elements and attributes you need. If the data you produce doesn't map onto the feed/entry data model, then maybe they aren't the best fit for you.
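As an illustration of extending Atom through a namespace, here is a Python sketch that builds one entry with a custom element. The Atom namespace URI is the real one; the `stock` namespace, the `quantity` element and the `build_entry` helper are hypothetical, and a complete feed would need more than this (for example an enclosing `feed` element and author information).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical extension namespace for data Atom has no element for.
STOCK_NS = "http://example.com/ns/stock"

# Serialise Atom as the default namespace and the extension with a prefix.
ET.register_namespace("", ATOM_NS)
ET.register_namespace("stock", STOCK_NS)


def build_entry(entry_id, title, updated, quantity):
    """Build an Atom entry carrying one extension element."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    # Extension element: generic Atom parsers can skip it safely.
    ET.SubElement(entry, f"{{{STOCK_NS}}}quantity").text = str(quantity)
    return entry


if __name__ == "__main__":
    entry = build_entry("urn:example:item:1", "Widget", "2009-03-12T10:03:00Z", 7)
    print(ET.tostring(entry, encoding="unicode"))
```

Consumers that know the extension namespace can read the extra element, while a standard feed parser still sees a valid entry.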

If you are using XML, parent/child relationships (where the child has only one parent) can easily be modeled as parent/child elements. If the child has multiple parents, you can use references and attributes to link elements.
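Here is a sketch of the reference-and-attribute approach for a many-to-many case, with the consumer side in Python. The `catalog` layout, the `id`/`ref` attribute names and the `authors_by_book` helper are all assumptions made up for the example: each author is written once and books point at authors by identifier.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: authors appear once, books refer to them by id.
SAMPLE = """
<catalog>
  <authors>
    <author id="a1"><name>Ann</name></author>
    <author id="a2"><name>Bob</name></author>
  </authors>
  <books>
    <book id="b1"><title>XML Feeds</title><authorRef ref="a1"/><authorRef ref="a2"/></book>
    <book id="b2"><title>Schemas</title><authorRef ref="a1"/></book>
  </books>
</catalog>
"""


def authors_by_book(xml_text):
    """Resolve each book's author references into a list of author names."""
    root = ET.fromstring(xml_text)
    names_by_id = {a.get("id"): a.findtext("name") for a in root.iter("author")}

    result = {}
    for book in root.iter("book"):
        names = []
        for reference in book.findall("authorRef"):
            key = reference.get("ref")
            if key not in names_by_id:
                # A dangling reference means the feed is internally inconsistent.
                raise ValueError(f"book {book.get('id')} refers to unknown author {key}")
            names.append(names_by_id[key])
        result[book.findtext("title")] = names
    return result
```

Nothing is duplicated in the feed, at the cost of a lookup step on the consumer side. A DTD or schema can enforce the links (for instance with ID/IDREF attribute types), which catches dangling references before the consumer code runs.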

David Schlosnagle answered Nov 12 '22 07:11