I've been trying to see if I can accomplish some requirements with a document based database, in this case CouchDB. Two generic requirements:
And I'm begining to think that a Document-based database isn't the best choice to address these requirements. Furthermore, I can't imagine a use for a Document based database (maybe my imagination is too limited).
Can you explain to me if I am asking pears from an elm when I try to use a Document oriented database for these requirements?
If you're working with lots of unorganized data then a document database might suit you better, if your data is more structured and you application needs to access specific information and how it related to other data-points then a relational database is a better fit.
The main benefits of using relational databases are that they can be easily queried, allow for the use of stored procedures to manipulate data, and provide a consistent database design. They also have limitations when it comes to high volume transactions or large amounts of data storage, the issue of speed can arise.
Relational databases generally store data in separate tables that are defined by the programmer, and a single object may be spread across several tables. Document databases store all information for a given object in a single instance in the database, and every stored object can be different from every other.
You need to think of how you approach the application in a document oriented way. If you simply try to replicate how you would model the problem in an RDBMS then you will fail. There are also different trade-offs that you might want to make. ([ed: not sure how this ties into the argument but:] Remember that CouchDB's design assumes you will have an active cluster of many nodes that could fail at any time. How is your app going to handle one of the database nodes disappearing from under it?)
One way to think about it is to imagine you didn't have any computers, just paper documents. How would you create an efficient business process using bits of paper being passed around? How can you avoid bottlenecks? What if something goes wrong?
Another angle you should think about is eventual consistency, where you will get into a consistent state eventually, but you may be inconsistent for some period of time. This is anathema in RDBMS land, but extremely common in the real world. The canonical transaction example is of transferring money from bank accounts. How does this actually happen in the real world - through a single atomic transactions or through different banks issuing credit and debit notices to each other? What happens when you write a cheque?
So lets look at your examples:
If I understand this correctly in CouchDB terms, you want to have a collection of documents where some named value is guaranteed to be unique across all those documents? That case isn't generally supportable because documents may be created on different replicas.
So we need to look at the real world problem and see if we can model that. Do you really need them to be unique? Can your application handle multiple docs with the same value? Do you need to assign a unique identifier? Can you do that deterministically? A common scenario where this is required is where you need a unique sequential identifier. This is tough to solve in a replicated environment. In fact if the unique id is required to be strictly sequential with respect to time created it's impossible if you need the id straight away. You need to relax at least one of those constraints.
I'm not sure what to add here as the last comment you made on that post was to say "very useful! thanks". Was there something missing from the approach outlined there that is still causing you a problem? I thought MrKurt's answer was pretty full and I added a little enhancement that would reduce contention.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With