We currently have a nicely relational SQL Server 2008 database that is our master application database. We are looking to replace an existing document storage mechanism, which uses XML data types, with something more schemaless that can handle similar but not identical documents, and thought that CouchDB would be a good fit.
The idea is that the common metadata about the documents could be stored in SQL Server for ease of display/aggregation/reporting, while the actual documents are stored in CouchDB to handle the subtle differences between them. The aim is to make the most of the two different technologies.
For example, the status, type, related person and date created would be common across all documents and stored in SQL Server, but an email and a letter (obviously with different fields) could be stored in CouchDB.
Then we can display our document grid for all types of document (thousands of docs) by querying SQL Server, while the display of an individual doc gets its data from CouchDB when the user requests to view it.
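To make the split concrete, here is a rough sketch of the idea. It is in Python for brevity rather than our actual C# stack, and the stores are stand-ins: `sqlite3` plays the part of SQL Server and a plain dict plays the part of CouchDB. The class, table and column names are made up for illustration.

```python
import json
import sqlite3
import uuid


class DocumentRepository:
    """Common metadata in a relational table, document bodies in a document store."""

    def __init__(self):
        # Stand-in for SQL Server (assumption): an in-memory SQLite database.
        self.sql = sqlite3.connect(":memory:")
        self.sql.execute(
            "CREATE TABLE document_meta ("
            " id TEXT PRIMARY KEY, status TEXT, type TEXT,"
            " person TEXT, created TEXT)"
        )
        # Stand-in for CouchDB (assumption): a dict of JSON strings keyed by id.
        self.doc_store = {}

    def save(self, status, doc_type, person, created, body):
        doc_id = uuid.uuid4().hex
        # Write the body first so a metadata row never points at a missing body.
        self.doc_store[doc_id] = json.dumps(body)
        self.sql.execute(
            "INSERT INTO document_meta VALUES (?, ?, ?, ?, ?)",
            (doc_id, status, doc_type, person, created),
        )
        self.sql.commit()
        return doc_id

    def grid(self, person):
        # The grid only touches the relational side.
        rows = self.sql.execute(
            "SELECT id, status, type, created FROM document_meta"
            " WHERE person = ? ORDER BY created",
            (person,),
        )
        return rows.fetchall()

    def view(self, doc_id):
        # The body is fetched from the document store only when requested.
        return json.loads(self.doc_store[doc_id])
```

An email and a letter would then share the same metadata columns while keeping completely different body fields, and `grid` never needs to know what those fields are.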
Something to bear in mind is that some document types are generated from templates that are also documents themselves (think mail merge/find and replace).
The application layer is ASP.NET 4.5, C#, the repository pattern, Windsor for IoC, and JavaScript.
So, to the question...
Is this approach a sensible way to make the most of the two differing data storage paradigms?
Are we making our programming lives needlessly complex in the desire to "use the most appropriate technology for the problem"?
Does anyone have any experience of trying something similar and, if so, how did it go?
It's really not uncommon to use two different storage formats for a document: one for searchable aspects and metadata, and another for presentation.
Looking at it in a more general way, the approach is somewhat similar to the one we developed at the Royal Danish Library and pushed in the Planets EU project:
http://www.researchgate.net/publication/221176211_Archive_Design_Based_on_Planets_Inspired_Logical_Object_Model
Here's another paper that discusses this approach in a more general way: "Opening Schrödingers Library"
The goal was archiving. We recognized that when converting documents for archiving or preservation, no single storage format was superior in all aspects of preserving the attributes, formats, looks, contents etc. of the original document. Solution: convert to several formats, and use a sophisticated digital object to track the conversions and which aspects of the original were best preserved in which conversion.
So in my opinion the approach is theoretically and practically sound.
Practical issues: you will probably need some sort of digital object that keeps track of the various parts of a document, e.g. whether it occurs in one system only (and if so, which one) or in both. It seems that you are going to use SQL Server for this aspect, and that sounds sensible.
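As a minimal sketch of what such a tracking object could look like (this is not the object model from the paper, and it is written in Python rather than your C# for brevity), assume you can list the document ids known to each system. The names `Location`, `TrackedDocument`, `reconcile` and `incomplete` are hypothetical.

```python
from dataclasses import dataclass
from enum import Flag, auto


class Location(Flag):
    """Which system(s) hold a part of the document."""
    NONE = 0
    SQL = auto()        # metadata row exists in the relational database
    DOC_STORE = auto()  # body exists in the document store


@dataclass
class TrackedDocument:
    doc_id: str
    location: Location = Location.NONE


def reconcile(sql_ids, doc_store_ids):
    """Build one tracking record per id seen in either system."""
    tracked = {}
    for doc_id in sql_ids:
        record = tracked.setdefault(doc_id, TrackedDocument(doc_id))
        record.location |= Location.SQL
    for doc_id in doc_store_ids:
        record = tracked.setdefault(doc_id, TrackedDocument(doc_id))
        record.location |= Location.DOC_STORE
    return tracked


def incomplete(tracked):
    """Ids that are present in only one of the two systems."""
    both = Location.SQL | Location.DOC_STORE
    return sorted(d.doc_id for d in tracked.values() if d.location != both)
```

Running something like `incomplete(reconcile(sql_ids, doc_store_ids))` periodically would surface metadata rows without a body and bodies without a metadata row, which is the main consistency risk once a document lives in two stores.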
We actually did implement the object model we describe in the paper, and last I heard they are still using it.