(This post is soliciting personal experiences about storing XML; please share what you know. :-) )
I am working on a service application that communicates with an external service using XML. I am planning to use SQL Server 2008 to store the XML that is received and sent to the external service. I'm exploring my options for storing the XML in the database. The three options that I have identified are:
I'm looking for any advice, based on your personal experience, with storing and retrieving XML data in SQL Server.
Some additional background: I've used an 'xsd.exe' equivalent called XsdObjectgenerator to create .net classes based on the XML schemas. When the service receives the XML file, it is deserialized into an instance of .net class. This instance is used to perform the operations of the service. My original plan was to then use option #1 above to store the XML. If I needed to update or report on the data, I would simply deserialize the db record back into one of my .net classes.
Although this approach works and makes working with the xml very simple, I have concerns that as the volume of data increases, the performance of querying XML data type records will decrease. This is why I've explored options 2. & 3. above.
In addition to storing the XML, the XML will be queried for use in both reports and a separate web application. The db records will be queried, sorted, filtered, grouped, summaried and possibly updated by the end users.
XML documents you insert into columns of type XML can reside either in the default storage object, or directly in the base table row. Base table row storage is under your control and is available only for small documents; larger documents are always stored in the default storage object.
XML stores data in plain text format. This provides a software- and hardware-independent way of storing, transporting, and sharing data.
An appropriate storage model for your XML documents is an xml data type column. This preserves the InfoSet content of your XML data.
Explanation: XML is a good choice if you want a platform-independent model in order to ensure the portability of the data by using structural and semantic markup.
I guess it depends on what you want to do with your XML in your database.
If you're mostly only storing it, and possibly retrieving it later as a whole and sending it out again, then I'd definitely use the XML datatype - no point in shredding it into bits and pieces.
If you however need to mostly work with the contents of the XML file, and possibly also manipulate and change that content, then it might be advisable to create tables with columns to match your XML content, and shred it when storing it, use it, and when you need to, reassemble it from the relational pieces using something like SELECT (columns) FROM dbo.Table FOR XML.....
There is an overhead involved in shredding and reassembling - so you need to ask yourself if that's worth doing. But there's also an overhead involved if you need to manipulate the XML column too much.
If you only need read-only access to a few attributes in your XML, I've come to appreciate the ability to wrap those into a UDF and surface it as a computed column in your table. That way, you can easily select something from your table, based on values that are stored somewhere inside your XML - quite handy! But do not overuse this approach - works fine for 2, 3 attributes - but if you need to access your XML over and over again (and most or all of it), then you might be better off shredding it into relational pieces to begin with.
While continuing to explore solutions, a collegue forwarded the following applicable links:
Some preliminary conclusions from these articles and other research:
I will be mocking up each solution with test data and performing some benchmarking. I'll post the results here once they are available.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With