I have a system which generates a large number of XML documents every day (of the order of 1 million) and I would like to be able to store and index these so that I can, for example, search for all documents with a certain field set to a given value.
I understand that there are fundamentally two types of XML database: those that provide XML support on top of a conventional relational database, and those that are "native" XML databases. Given that I am open to using either, what would you recommend?
Microsoft SQL Server has support for XML columns. This is more than just BLOB/TEXT support.
You can use XML columns in an unstructured (untyped) manner, in which case SQL Server only checks that the content is well-formed XML. This lets you store arbitrary XML documents inside SQL Server while still guaranteeing that you are dealing with XML and not just arbitrary bytes or characters. SQL Server lets you query these columns using XQuery.
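As a rough illustration of the untyped case, the T-SQL sketch below stores documents in an `xml` column and finds those where a given field has a given value. The table name `dbo.Documents`, the column `Body`, and the `<order>`/`<status>` document shape are all hypothetical, chosen only for the example.

```sql
-- Hypothetical table: one row per XML document, stored in an untyped xml column.
CREATE TABLE dbo.Documents (
    DocumentId INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Body       XML NOT NULL
);

-- SQL Server rejects the insert if the text is not well-formed XML.
INSERT INTO dbo.Documents (Body)
VALUES (N'<order><status>shipped</status><customer>42</customer></order>'),
       (N'<order><status>pending</status><customer>7</customer></order>');

-- Find every document whose <status> field equals a given value.
-- exist() returns 1 when the XQuery expression matches at least one node;
-- sql:variable() passes the T-SQL variable into the XQuery expression.
DECLARE @status NVARCHAR(20) = N'shipped';

SELECT DocumentId
FROM dbo.Documents
WHERE Body.exist('/order/status[. = sql:variable("@status")]') = 1;
```

With the two sample rows above, the query should return only the row for the shipped order.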
You can also create typed XML columns, whose content is validated against an XSD schema. More interestingly, SQL Server allows indexing XML columns so that XQuery and XPath expressions over them can perform well.
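A minimal T-SQL sketch of XML indexing follows, assuming a hypothetical table `dbo.IndexedDocuments` with an `xml` column named `Body`; the index names are also made up. A primary XML index requires the table to have a clustered primary key, and secondary XML indexes are built on top of the primary one.

```sql
-- Hypothetical table; the clustered primary key is required
-- before a primary XML index can be created.
CREATE TABLE dbo.IndexedDocuments (
    DocumentId INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    Body       XML NOT NULL
);

-- Primary XML index: a shredded, persisted representation of the XML
-- that the query processor can use instead of parsing each document at query time.
CREATE PRIMARY XML INDEX PXML_IndexedDocuments_Body
    ON dbo.IndexedDocuments (Body);

-- Secondary PATH index: intended to help path-based predicates,
-- such as exist() checks on a known element path.
CREATE XML INDEX IXML_IndexedDocuments_Body_Path
    ON dbo.IndexedDocuments (Body)
    USING XML INDEX PXML_IndexedDocuments_Body
    FOR PATH;
```

XML indexes add storage and insert overhead, so at a volume of around a million documents a day it would be worth measuring whether the query speed-up justifies that cost for your workload.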
See "What's New for XML in SQL Server 2008" for more information (although most of the XML support already exists in SQL Server 2005).